Presentation Schedule
All times are in the Central time zone.
Date: Friday, June 24, 2022 8:30AM – 10:18AM
Session Title: Representation Learning
Session Chairs: Jiajun Wu (Stanford Univ.), Pablo Arbelaez (Universidad de los Andes)
| Poster ID | Title | Authors |
| 1a | Masked Autoencoders Are Scalable Vision Learners | Kaiming He; Xinlei Chen; Saining Xie; Yanghao Li; Piotr Dollár; Ross Girshick |
| 2a | Learning ABCs: Approximate Bijective Correspondence for Isolating Factors of Variation With Weak Supervision | Kieran A. Murphy; Varun Jampani; Srikumar Ramalingam; Ameesh Makadia |
| 3a | Bayesian Invariant Risk Minimization | Yong Lin; Hanze Dong; Hao Wang; Tong Zhang |
| 4a | Crafting Better Contrastive Views for Siamese Representation Learning | Xiangyu Peng; Kai Wang; Zheng Zhu; Mang Wang; Yang You |
| 5a | Rethinking Minimal Sufficient Representation in Contrastive Learning | Haoqing Wang; Xun Guo; Zhi-Hong Deng; Yan Lu |
| 6a | Multi-Level Feature Learning for Contrastive Multi-View Clustering | Jie Xu; Huayi Tang; Yazhou Ren; Liang Peng; Xiaofeng Zhu; Lifang He |
| 7a | Point-Level Region Contrast for Object Detection Pre-Training | Yutong Bai; Xinlei Chen; Alexander Kirillov; Alan Yuille; Alexander C. Berg |
| 8a | Class-Incremental Learning by Knowledge Distillation With Adaptive Feature Consolidation | Minsoo Kang; Jaeyoo Park; Bohyung Han |
| 9a | A Stitch in Time Saves Nine: A Train-Time Regularizing Loss for Improved Neural Network Calibration | Ramya Hebbalaguppe; Jatin Prakash; Neelabh Madan; Chetan Arora |
| 10a | SLIC: Self-Supervised Learning With Iterative Clustering for Human Action Videos | Salar Hosseini Khorasgani; Yuxuan Chen; Florian Shkurti |
| 11a | Omnivore: A Single Model for Many Visual Modalities | Rohit Girdhar; Mannat Singh; Nikhila Ravi; Laurens van der Maaten; Armand Joulin; Ishan Misra |
| 12a | DPICT: Deep Progressive Image Compression Using Trit-Planes | Jae-Han Lee; Seungmin Jeon; Kwang Pyo Choi; Youngo Park; Chang-Su Kim |
| 13a | Efficient Geometry-Aware 3D Generative Adversarial Networks | Eric R. Chan; Connor Z. Lin; Matthew A. Chan; Koki Nagano; Boxiao Pan; Shalini De Mello; Orazio Gallo; Leonidas J. Guibas; Jonathan Tremblay; Sameh Khamis; Tero Karras; Gordon Wetzstein |
| 14a | Geometric Anchor Correspondence Mining With Uncertainty Modeling for Universal Domain Adaptation | Liang Chen; Yihang Lou; Jianzhong He; Tao Bai; Minghua Deng |
| 15a | Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning | Richard J. Chen; Chengkuan Chen; Yicong Li; Tiffany Y. Chen; Andrew D. Trister; Rahul G. Krishnan; Faisal Mahmood |
| 16a | Versatile Multi-Modal Pre-Training for Human-Centric Perception | Fangzhou Hong; Liang Pan; Zhongang Cai; Ziwei Liu |
| 17a | Bridging Video-Text Retrieval With Multiple Choice Questions | Yuying Ge; Yixiao Ge; Xihui Liu; Dian Li; Ying Shan; Xiaohu Qie; Ping Luo |
| 18a | Integrating Language Guidance Into Vision-Based Deep Metric Learning | Karsten Roth; Oriol Vinyals; Zeynep Akata |
Date: Friday, June 24, 2022 8:30AM – 10:18AM
Session Title: Computational Photography
Session Chairs: Jinwei Ye (Louisiana State Univ.), Qi Shan (Apple)
| Poster ID | Title | Authors |
| 19a | NeRF in the Dark: High Dynamic Range View Synthesis From Noisy Raw Images | Ben Mildenhall; Peter Hedman; Ricardo Martin-Brualla; Pratul P. Srinivasan; Jonathan T. Barron |
| 20a | DIVeR: Real-Time and Accurate Neural Radiance Fields With Deterministic Integration for Volume Rendering | Liwen Wu; Jae Yong Lee; Anand Bhattad; Yu-Xiong Wang; David Forsyth |
| 21a | HumanNeRF: Free-Viewpoint Rendering of Moving People From Monocular Video | Chung-Yi Weng; Brian Curless; Pratul P. Srinivasan; Jonathan T. Barron; Ira Kemelmacher-Shlizerman |
| 22a | Neural Reflectance for Shape Recovery With Shadow Handling | Junxuan Li; Hongdong Li |
| 23a | Visual Vibration Tomography: Estimating Interior Material Properties From Monocular Video | Berthy T. Feng; Alexander C. Ogren; Chiara Daraio; Katherine L. Bouman |
| 24a | Dancing Under the Stars: Video Denoising in Starlight | Kristina Monakhova; Stephan R. Richter; Laura Waller; Vladlen Koltun |
| 25a | BACON: Band-Limited Coordinate Networks for Multiscale Scene Representation | David B. Lindell; Dave Van Veen; Jeong Joon Park; Gordon Wetzstein |
| 26a | Practical Stereo Matching via Cascaded Recurrent Network With Adaptive Correlation | Jiankun Li; Peisen Wang; Pengfei Xiong; Tao Cai; Ziwei Yan; Lei Yang; Jiangyu Liu; Haoqiang Fan; Shuaicheng Liu |
| 27a | 3D Photo Stylization: Learning To Generate Stylized Novel Views From a Single Image | Fangzhou Mu; Jian Wang; Yicheng Wu; Yin Li |
| 28a | BokehMe: When Neural Rendering Meets Classical Rendering | Juewen Peng; Zhiguo Cao; Xianrui Luo; Hao Lu; Ke Xian; Jianming Zhang |
| 29a | Deblurring via Stochastic Refinement | Jay Whang; Mauricio Delbracio; Hossein Talebi; Chitwan Saharia; Alexandros G. Dimakis; Peyman Milanfar |
| 30a | Learning to Deblur Using Light Field Generated and Real Defocus Images | Lingyan Ruan; Bin Chen; Jizhou Li; Miuling Lam |
| 31a | Towards Layer-Wise Image Vectorization | Xu Ma; Yuqian Zhou; Xingqian Xu; Bin Sun; Valerii Filev; Nikita Orlov; Yun Fu; Humphrey Shi |
| 32a | Dual-Shutter Optical Vibration Sensing | Mark Sheinin; Dorian Chan; Matthew O'Toole; Srinivasa G. Narasimhan |
| 33a | Fisher Information Guidance for Learned Time-of-Flight Imaging | Jiaqu Li; Tao Yue; Sijie Zhao; Xuemei Hu |
| 34a | Autofocus for Event Cameras | Shijie Lin; Yinqiang Zhang; Lei Yu; Bin Zhou; Xiaowei Luo; Jia Pan |
| 35a | Adaptive Gating for Single-Photon 3D Imaging | Ryan Po; Adithya Pediredla; Ioannis Gkioulekas |
| 36a | LiDAR Snowfall Simulation for Robust 3D Object Detection | Martin Hahner; Christos Sakaridis; Mario Bijelic; Felix Heide; Fisher Yu; Dengxin Dai; Luc Van Gool |
Date: Friday, June 24, 2022 8:30AM – 10:18AM
Session Title: Vision & Language
Session Chairs: Zicheng Liu (Microsoft), Gul Varol (Ecole des Ponts ParisTech)
| Poster ID | Title | Authors |
| 37a | MERLOT Reserve: Neural Script Knowledge Through Vision and Language and Sound | Rowan Zellers; Jiasen Lu; Ximing Lu; Youngjae Yu; Yanpeng Zhao; Mohammadreza Salehi; Aditya Kusupati; Jack Hessel; Ali Farhadi; Yejin Choi |
| 38a | Joint Video Summarization and Moment Localization by Cross-Task Sample Transfer | Hao Jiang; Yadong Mu |
| 39a | Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture | Tanmay Gupta; Amita Kamath; Aniruddha Kembhavi; Derek Hoiem |
| 40a | Disentangling Visual and Written Concepts in CLIP | Joanna Materzyńska; Antonio Torralba; David Bau |
| 41a | CLIP-Event: Connecting Text and Images With Event Structures | Manling Li; Ruochen Xu; Shuohang Wang; Luowei Zhou; Xudong Lin; Chenguang Zhu; Michael Zeng; Heng Ji; Shih-Fu Chang |
| 42a | Robust Cross-Modal Representation Learning With Progressive Self-Distillation | Alex Andonian; Shixing Chen; Raffay Hamid |
| 43a | TubeDETR: Spatio-Temporal Video Grounding With Transformers | Antoine Yang; Antoine Miech; Josef Sivic; Ivan Laptev; Cordelia Schmid |
| 44a | 3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection | Junyu Luo; Jiahui Fu; Xianghao Kong; Chen Gao; Haibing Ren; Hao Shen; Huaxia Xia; Si Liu |
| 45a | 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds | Daigang Cai; Lichen Zhao; Jing Zhang; Lu Sheng; Dong Xu |
| 46a | Globetrotter: Connecting Languages by Connecting Images | Dídac Surís; Dave Epstein; Carl Vondrick |
| 47a | Unsupervised Vision-and-Language Pre-Training via Retrieval-Based Multi-Granular Alignment | Mingyang Zhou; Licheng Yu; Amanpreet Singh; Mengjiao Wang; Zhou Yu; Ning Zhang |
| 48a | WebQA: Multihop and Multimodal QA | Yingshan Chang; Mridu Narang; Hisami Suzuki; Guihong Cao; Jianfeng Gao; Yonatan Bisk |
| 49a | PartGlot: Learning Shape Part Segmentation From Language Reference Games | Juil Koo; Ian Huang; Panos Achlioptas; Leonidas J. Guibas; Minhyuk Sung |
| 50a | DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis | Ming Tao; Hao Tang; Fei Wu; Xiao-Yuan Jing; Bing-Kun Bao; Changsheng Xu |
| 51a | L-Verse: Bidirectional Generation Between Image and Text | Taehoon Kim; Gwangmo Song; Sihaeng Lee; Sangyun Kim; Yewon Seo; Soonyoung Lee; Seung Hwan Kim; Honglak Lee; Kyunghoon Bae |
| 52a | Think Global, Act Local: Dual-Scale Graph Transformer for Vision-and-Language Navigation | Shizhe Chen; Pierre-Louis Guhur; Makarand Tapaswi; Cordelia Schmid; Ivan Laptev |
| 53a | LaTr: Layout-Aware Transformer for Scene-Text VQA | Ali Furkan Biten; Ron Litman; Yusheng Xie; Srikar Appalaraju; R. Manmatha |
| 54a | Learning Program Representations for Food Images and Cooking Recipes | Dim P. Papadopoulos; Enrique Mora; Nadiia Chepurko; Kuan Wei Huang; Ferda Ofli; Antonio Torralba |