Presentation Schedule
All times are in the Central time zone.
Date: Friday, June 24, 2022 8:30AM – 10:18AM
Session Title: Representation Learning
Session Chairs: Jiajun Wu (Stanford Univ.), Pablo Arbelaez (Universidad de los Andes)
Poster ID | Title | Authors |
1a | Masked Autoencoders Are Scalable Vision Learners | Kaiming He; Xinlei Chen; Saining Xie; Yanghao Li; Piotr Dollár; Ross Girshick |
2a | Learning ABCs: Approximate Bijective Correspondence for Isolating Factors of Variation With Weak Supervision | Kieran A. Murphy; Varun Jampani; Srikumar Ramalingam; Ameesh Makadia |
3a | Bayesian Invariant Risk Minimization | Yong Lin; Hanze Dong; Hao Wang; Tong Zhang |
4a | Crafting Better Contrastive Views for Siamese Representation Learning | Xiangyu Peng; Kai Wang; Zheng Zhu; Mang Wang; Yang You |
5a | Rethinking Minimal Sufficient Representation in Contrastive Learning | Haoqing Wang; Xun Guo; Zhi-Hong Deng; Yan Lu |
6a | Multi-Level Feature Learning for Contrastive Multi-View Clustering | Jie Xu; Huayi Tang; Yazhou Ren; Liang Peng; Xiaofeng Zhu; Lifang He |
7a | Point-Level Region Contrast for Object Detection Pre-Training | Yutong Bai; Xinlei Chen; Alexander Kirillov; Alan Yuille; Alexander C. Berg |
8a | Class-Incremental Learning by Knowledge Distillation With Adaptive Feature Consolidation | Minsoo Kang; Jaeyoo Park; Bohyung Han |
9a | A Stitch in Time Saves Nine: A Train-Time Regularizing Loss for Improved Neural Network Calibration | Ramya Hebbalaguppe; Jatin Prakash; Neelabh Madan; Chetan Arora |
10a | SLIC: Self-Supervised Learning With Iterative Clustering for Human Action Videos | Salar Hosseini Khorasgani; Yuxuan Chen; Florian Shkurti |
11a | Omnivore: A Single Model for Many Visual Modalities | Rohit Girdhar; Mannat Singh; Nikhila Ravi; Laurens van der Maaten; Armand Joulin; Ishan Misra |
12a | DPICT: Deep Progressive Image Compression Using Trit-Planes | Jae-Han Lee; Seungmin Jeon; Kwang Pyo Choi; Youngo Park; Chang-Su Kim |
13a | Efficient Geometry-Aware 3D Generative Adversarial Networks | Eric R. Chan; Connor Z. Lin; Matthew A. Chan; Koki Nagano; Boxiao Pan; Shalini De Mello; Orazio Gallo; Leonidas J. Guibas; Jonathan Tremblay; Sameh Khamis; Tero Karras; Gordon Wetzstein |
14a | Geometric Anchor Correspondence Mining With Uncertainty Modeling for Universal Domain Adaptation | Liang Chen; Yihang Lou; Jianzhong He; Tao Bai; Minghua Deng |
15a | Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning | Richard J. Chen; Chengkuan Chen; Yicong Li; Tiffany Y. Chen; Andrew D. Trister; Rahul G. Krishnan; Faisal Mahmood |
16a | Versatile Multi-Modal Pre-Training for Human-Centric Perception | Fangzhou Hong; Liang Pan; Zhongang Cai; Ziwei Liu |
17a | Bridging Video-Text Retrieval With Multiple Choice Questions | Yuying Ge; Yixiao Ge; Xihui Liu; Dian Li; Ying Shan; Xiaohu Qie; Ping Luo |
18a | Integrating Language Guidance Into Vision-Based Deep Metric Learning | Karsten Roth; Oriol Vinyals; Zeynep Akata |
Date: Friday, June 24, 2022 8:30AM – 10:18AM
Session Title: Computational Photography
Session Chairs: Jinwei Ye (Louisiana State Univ.), Qi Shan (Apple)
Poster ID | Title | Authors |
19a | NeRF in the Dark: High Dynamic Range View Synthesis From Noisy Raw Images | Ben Mildenhall; Peter Hedman; Ricardo Martin-Brualla; Pratul P. Srinivasan; Jonathan T. Barron |
20a | DIVeR: Real-Time and Accurate Neural Radiance Fields With Deterministic Integration for Volume Rendering | Liwen Wu; Jae Yong Lee; Anand Bhattad; Yu-Xiong Wang; David Forsyth |
21a | HumanNeRF: Free-Viewpoint Rendering of Moving People From Monocular Video | Chung-Yi Weng; Brian Curless; Pratul P. Srinivasan; Jonathan T. Barron; Ira Kemelmacher-Shlizerman |
22a | Neural Reflectance for Shape Recovery With Shadow Handling | Junxuan Li; Hongdong Li |
23a | Visual Vibration Tomography: Estimating Interior Material Properties From Monocular Video | Berthy T. Feng; Alexander C. Ogren; Chiara Daraio; Katherine L. Bouman |
24a | Dancing Under the Stars: Video Denoising in Starlight | Kristina Monakhova; Stephan R. Richter; Laura Waller; Vladlen Koltun |
25a | BACON: Band-Limited Coordinate Networks for Multiscale Scene Representation | David B. Lindell; Dave Van Veen; Jeong Joon Park; Gordon Wetzstein |
26a | Practical Stereo Matching via Cascaded Recurrent Network With Adaptive Correlation | Jiankun Li; Peisen Wang; Pengfei Xiong; Tao Cai; Ziwei Yan; Lei Yang; Jiangyu Liu; Haoqiang Fan; Shuaicheng Liu |
27a | 3D Photo Stylization: Learning To Generate Stylized Novel Views From a Single Image | Fangzhou Mu; Jian Wang; Yicheng Wu; Yin Li |
28a | BokehMe: When Neural Rendering Meets Classical Rendering | Juewen Peng; Zhiguo Cao; Xianrui Luo; Hao Lu; Ke Xian; Jianming Zhang |
29a | Deblurring via Stochastic Refinement | Jay Whang; Mauricio Delbracio; Hossein Talebi; Chitwan Saharia; Alexandros G. Dimakis; Peyman Milanfar |
30a | Learning to Deblur Using Light Field Generated and Real Defocus Images | Lingyan Ruan; Bin Chen; Jizhou Li; Miuling Lam |
31a | Towards Layer-Wise Image Vectorization | Xu Ma; Yuqian Zhou; Xingqian Xu; Bin Sun; Valerii Filev; Nikita Orlov; Yun Fu; Humphrey Shi |
32a | Dual-Shutter Optical Vibration Sensing | Mark Sheinin; Dorian Chan; Matthew O'Toole; Srinivasa G. Narasimhan |
33a | Fisher Information Guidance for Learned Time-of-Flight Imaging | Jiaqu Li; Tao Yue; Sijie Zhao; Xuemei Hu |
34a | Autofocus for Event Cameras | Shijie Lin; Yinqiang Zhang; Lei Yu; Bin Zhou; Xiaowei Luo; Jia Pan |
35a | Adaptive Gating for Single-Photon 3D Imaging | Ryan Po; Adithya Pediredla; Ioannis Gkioulekas |
36a | LiDAR Snowfall Simulation for Robust 3D Object Detection | Martin Hahner; Christos Sakaridis; Mario Bijelic; Felix Heide; Fisher Yu; Dengxin Dai; Luc Van Gool |
Date: Friday, June 24, 2022 8:30AM – 10:18AM
Session Title: Vision & Language
Session Chairs: Zicheng Liu (Microsoft), Gul Varol (Ecole des Ponts ParisTech)
Poster ID | Title | Authors |
37a | MERLOT Reserve: Neural Script Knowledge Through Vision and Language and Sound | Rowan Zellers; Jiasen Lu; Ximing Lu; Youngjae Yu; Yanpeng Zhao; Mohammadreza Salehi; Aditya Kusupati; Jack Hessel; Ali Farhadi; Yejin Choi |
38a | Joint Video Summarization and Moment Localization by Cross-Task Sample Transfer | Hao Jiang; Yadong Mu |
39a | Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture | Tanmay Gupta; Amita Kamath; Aniruddha Kembhavi; Derek Hoiem |
40a | Disentangling Visual and Written Concepts in CLIP | Joanna Materzyńska; Antonio Torralba; David Bau |
41a | CLIP-Event: Connecting Text and Images With Event Structures | Manling Li; Ruochen Xu; Shuohang Wang; Luowei Zhou; Xudong Lin; Chenguang Zhu; Michael Zeng; Heng Ji; Shih-Fu Chang |
42a | Robust Cross-Modal Representation Learning With Progressive Self-Distillation | Alex Andonian; Shixing Chen; Raffay Hamid |
43a | TubeDETR: Spatio-Temporal Video Grounding With Transformers | Antoine Yang; Antoine Miech; Josef Sivic; Ivan Laptev; Cordelia Schmid |
44a | 3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection | Junyu Luo; Jiahui Fu; Xianghao Kong; Chen Gao; Haibing Ren; Hao Shen; Huaxia Xia; Si Liu |
45a | 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds | Daigang Cai; Lichen Zhao; Jing Zhang; Lu Sheng; Dong Xu |
46a | Globetrotter: Connecting Languages by Connecting Images | Dídac Surís; Dave Epstein; Carl Vondrick |
47a | Unsupervised Vision-and-Language Pre-Training via Retrieval-Based Multi-Granular Alignment | Mingyang Zhou; Licheng Yu; Amanpreet Singh; Mengjiao Wang; Zhou Yu; Ning Zhang |
48a | WebQA: Multihop and Multimodal QA | Yingshan Chang; Mridu Narang; Hisami Suzuki; Guihong Cao; Jianfeng Gao; Yonatan Bisk |
49a | PartGlot: Learning Shape Part Segmentation From Language Reference Games | Juil Koo; Ian Huang; Panos Achlioptas; Leonidas J. Guibas; Minhyuk Sung |
50a | DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis | Ming Tao; Hao Tang; Fei Wu; Xiao-Yuan Jing; Bing-Kun Bao; Changsheng Xu |
51a | L-Verse: Bidirectional Generation Between Image and Text | Taehoon Kim; Gwangmo Song; Sihaeng Lee; Sangyun Kim; Yewon Seo; Soonyoung Lee; Seung Hwan Kim; Honglak Lee; Kyunghoon Bae |
52a | Think Global, Act Local: Dual-Scale Graph Transformer for Vision-and-Language Navigation | Shizhe Chen; Pierre-Louis Guhur; Makarand Tapaswi; Cordelia Schmid; Ivan Laptev |
53a | LaTr: Layout-Aware Transformer for Scene-Text VQA | Ali Furkan Biten; Ron Litman; Yusheng Xie; Srikar Appalaraju; R. Manmatha |
54a | Learning Program Representations for Food Images and Cooking Recipes | Dim P. Papadopoulos; Enrique Mora; Nadiia Chepurko; Kuan Wei Huang; Ferda Ofli; Antonio Torralba |