Dataset Contributions

The following CVPR 2022 papers claim a dataset contribution or were identified during the review process as making a dataset contribution. Their authors committed to and are accountable for making these datasets public by the start of the conference. They have supplied links to the dataset(s), provided here for the convenience of the CVPR community. Authors of these papers are responsible for the validity and accuracy of the datasets. Please contact the authors of the respective paper with any issues.

Paper ID Title Authors Dataset URL 2nd Dataset URL
19 Compositional Temporal Grounding With Structured Variational Cross-Graph Correspondence Learning Juncheng Li, Junlin Xie, Long Qian, Linchao Zhu, Siliang Tang, Fei Wu, Yi Yang, Yueting Zhuang, Xin Eric Wang Link  
35 MLSLT: Towards Multilingual Sign Language Translation Aoxiong Yin, Zhou Zhao, Weike Jin, Meng Zhang, Xingshan Zeng, Xiaofei He Link  
41 360MonoDepth: High-Resolution 360° Monocular Depth Estimation Manuel Rey-Area, Mingze Yuan, Christian Richardt Link  
42 Generating Diverse and Natural 3D Human Motions From Text Chuan Guo, Shihao Zou, Xinxin Zuo, Sen Wang, Wei Ji, Xingyu Li, Li Cheng Link  
84 De-Rendering 3D Objects in the Wild Felix Wimbauer, Shangzhe Wu, Christian Rupprecht Link  
103 Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks Peri Akiva, Matthew Purri, Matthew Leotta Link  
141 Connecting the Complementary-View Videos: Joint Camera Identification and Subject Association Ruize Han, Yiyang Gan, Jiacheng Li, Feifan Wang, Wei Feng, Song Wang Link  
155 Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation Yanwu Xu, Shaoan Xie, Wenhao Wu, Kun Zhang, Mingming Gong, Kayhan Batmanghelich Link  
211 IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding Alignment Yiming Zeng, Yue Qian, Qijian Zhang, Junhui Hou, Yixuan Yuan, Ying He Link  
229 Deep Decomposition for Stochastic Normal-Abnormal Transport Peirong Liu, Yueh Lee, Stephen Aylward, Marc Niethammer Link  
258 f-SfT: Shape-From-Template With a Physics-Based Deformation Model Navami Kairanda, Edith Tretschk, Mohamed Elgharib, Christian Theobalt, Vladislav Golyanik Link Link 2
348 Beyond Cross-View Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image Yujiao Shi, Hongdong Li Link  
357 Forward Propagation, Backward Regression, and Pose Association for Hand Tracking in the Wild Mingzhen Huang, Supreeth Narasimhaswamy, Saif Vazir, Haibin Ling, Minh Hoai Link  
378 FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos Yan Wang, Yixuan Sun, Yiwen Huang, Zhongying Liu, Shuyong Gao, Wei Zhang, Weifeng Ge, Wenqiang Zhang Link  
436 Replacing Labeled Real-Image Datasets With Auto-Generated Contours Hirokatsu Kataoka, Ryo Hayamizu, Ryosuke Yamada, Kodai Nakashima, Sora Takashima, Xinyu Zhang, Edgar Josafat Martinez-Noriega, Nakamasa Inoue, Rio Yokota Link  
441 SOMSI: Spherical Novel View Synthesis With Soft Occlusion Multi-Sphere Images Tewodros Habtegebrial, Christiano Gava, Marcel Rogge, Didier Stricker, Varun Jampani Link  
469 MAD: A Scalable Dataset for Language Grounding in Videos From Movie Audio Descriptions Mattia Soldan, Alejandro Pardo, Juan León Alcázar, Fabian Caba, Chen Zhao, Silvio Giancola, Bernard Ghanem Link  
516 Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields Dor Verbin, Peter Hedman, Ben Mildenhall, Todd Zickler, Jonathan T. Barron, Pratul P. Srinivasan Link Link 2
573 Weakly Supervised Semantic Segmentation Using Out-of-Distribution Data Jungbeom Lee, Seong Joon Oh, Sangdoo Yun, Junsuk Choe, Eunji Kim, Sungroh Yoon Link  
576 Investigating Tradeoffs in Real-World Video Super-Resolution Kelvin C.K. Chan, Shangchen Zhou, Xiangyu Xu, Chen Change Loy Link  
583 OakInk: A Large-Scale Knowledge Repository for Understanding Hand-Object Interaction Lixin Yang, Kailin Li, Xinyu Zhan, Fei Wu, Anran Xu, Liu Liu, Cewu Lu Link  
628 Cloning Outfits From Real-World Images to 3D Characters for Generalizable Person Re-Identification Yanan Wang, Xuezhi Liang, Shengcai Liao Link  
651 Versatile Multi-Modal Pre-Training for Human-Centric Perception Fangzhou Hong, Liang Pan, Zhongang Cai, Ziwei Liu Link  
656 Instance-Wise Occlusion and Depth Orders in Natural Scenes Hyunmin Lee, Jaesik Park Link  
675 Multi-Dimensional, Nuanced and Subjective – Measuring the Perception of Facial Expressions De'Aira Bryant, Siqi Deng, Nashlie Sephus, Wei Xia, Pietro Perona Link  
715 Mix and Localize: Localizing Sound Sources in Mixtures Xixi Hu, Ziyang Chen, Andrew Owens Link  
718 Point Cloud Pre-Training With Natural 3D Structures Ryosuke Yamada, Hirokatsu Kataoka, Naoya Chiba, Yukiyasu Domae, Tetsuya Ogata Link  
754 Learning Affordance Grounding From Exocentric Images Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, Dacheng Tao Link  
769 Visual Abductive Reasoning Chen Liang, Wenguan Wang, Tianfei Zhou, Yi Yang Link  
770 Putting People in Their Place: Monocular Regression of 3D People in Depth Yu Sun, Wu Liu, Qian Bao, Yili Fu, Tao Mei, Michael J. Black Link  
793 DiLiGenT102: A Photometric Stereo Benchmark Dataset With Controlled Shape and Material Variation Jieji Ren, Feishi Wang, Jiahao Zhang, Qian Zheng, Mingjun Ren, Boxin Shi Link  
804 Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations Mehdi S. M. Sajjadi, Henning Meyer, Etienne Pot, Urs Bergmann, Klaus Greff, Noha Radwan, Suhani Vora, Mario Lučić, Daniel Duckworth, Alexey Dosovitskiy, Jakob Uszkoreit, Thomas Funkhouser, Andrea Tagliasacchi Link  
812 Learning To Listen: Modeling Non-Deterministic Dyadic Facial Motion Evonne Ng, Hanbyul Joo, Liwen Hu, Hao Li, Trevor Darrell, Angjoo Kanazawa, Shiry Ginosar Link  
818 Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering Sateesh Kumar, Sanjay Haresh, Awais Ahmed, Andrey Konin, M. Zeeshan Zia, Quoc-Huy Tran Link  
847 SketchEdit: Mask-Free Local Image Manipulation With Partial Sketches Yu Zeng, Zhe Lin, Vishal M. Patel Link  
869 BEHAVE: Dataset and Method for Tracking Human Object Interactions Bharat Lal Bhatnagar, Xianghui Xie, Ilya A. Petrov, Cristian Sminchisescu, Christian Theobalt, Gerard Pons-Moll Link  
889 Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction Yining Hong, Kaichun Mo, Li Yi, Leonidas J. Guibas, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan Link  
912 Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline Pengyu Zhang, Jie Zhao, Dong Wang, Huchuan Lu, Xiang Ruan Link  
928 Accurate 3D Body Shape Regression Using Metric and Semantic Attributes Vasileios Choutas, Lea Müller, Chun-Hao P. Huang, Siyu Tang, Dimitrios Tzionas, Michael J. Black Link  
933 Capturing and Inferring Dense Full-Body Human-Scene Contact Chun-Hao P. Huang, Hongwei Yi, Markus Höschle, Matvey Safroshkin, Tsvetelina Alexiadis, Senya Polikovsky, Daniel Scharstein, Michael J. Black Link  
1007 Neural Inertial Localization Sachini Herath, David Caruso, Chen Liu, Yufan Chen, Yasutaka Furukawa Link  
1021 MNSRNet: Multimodal Transformer Network for 3D Surface Super-Resolution Wuyuan Xie, Tengcong Huang, Miaohui Wang Link  
1091 EvUnroll: Neuromorphic Events Based Rolling Shutter Image Correction Xinyu Zhou, Peiqi Duan, Yi Ma, Boxin Shi Link  
1100 Understanding 3D Object Articulation in Internet Videos Shengyi Qian, Linyi Jin, Chris Rockwell, Siyi Chen, David F. Fouhey Link  
1104 Synthetic Generation of Face Videos With Plethysmograph Physiology Zhen Wang, Yunhao Ba, Pradyumna Chari, Oyku Deniz Bozkurt, Gianna Brown, Parth Patwa, Niranjan Vaddi, Laleh Jalilian, Achuta Kadambi Link  
1146 Meta Distribution Alignment for Generalizable Person Re-Identification Hao Ni, Jingkuan Song, Xiaopeng Luo, Feng Zheng, Wen Li, Heng Tao Shen Link  
1157 Style-Based Global Appearance Flow for Virtual Try-On Sen He, Yi-Zhe Song, Tao Xiang Link  
1181 GLAMR: Global Occlusion-Aware Human Mesh Recovery With Dynamic Cameras Ye Yuan, Umar Iqbal, Pavlo Molchanov, Kris Kitani, Jan Kautz Link  
1254 Modeling 3D Layout for Group Re-Identification Quan Zhang, Kaiheng Dang, Jian-Huang Lai, Zhanxiang Feng, Xiaohua Xie Link  
1319 Audio-Adaptive Activity Recognition Across Video Domains Yunhua Zhang, Hazel Doughty, Ling Shao, Cees G. M. Snoek Link  
1333 Joint Hand Motion and Interaction Hotspots Prediction From Egocentric Videos Shaowei Liu, Subarna Tripathi, Somdeb Majumdar, Xiaolong Wang Link  
1453 Universal Photometric Stereo Network Using Global Lighting Contexts Satoshi Ikehata Link  
1501 PTTR: Relational 3D Point Cloud Object Tracking With Transformer Changqing Zhou, Zhipeng Luo, Yueru Luo, Tianrui Liu, Liang Pan, Zhongang Cai, Haiyu Zhao, Shijian Lu Link  
1503 Deformation and Correspondence Aware Unsupervised Synthetic-to-Real Scene Flow Estimation for Point Clouds Zhao Jin, Yinjie Lei, Naveed Akhtar, Haifeng Li, Munawar Hayat Link  
1508 Lifelong Unsupervised Domain Adaptive Person Re-Identification With Coordinated Anti-Forgetting and Adaptation Zhipeng Huang, Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Peng Chu, Quanzeng You, Jiang Wang, Zicheng Liu, Zheng-Jun Zha Link  
1514 Object Localization Under Single Coarse Point Supervision Xuehui Yu, Pengfei Chen, Di Wu, Najmul Hassan, Guorong Li, Junchi Yan, Humphrey Shi, Qixiang Ye, Zhenjun Han Link  
1533 Learning Program Representations for Food Images and Cooking Recipes Dim P. Papadopoulos, Enrique Mora, Nadiia Chepurko, Kuan Wei Huang, Ferda Ofli, Antonio Torralba Link  
1557 Shape From Polarization for Complex Scenes in the Wild Chenyang Lei, Chenyang Qi, Jiaxin Xie, Na Fan, Vladlen Koltun, Qifeng Chen Link  
1637 Exploring Endogenous Shift for Cross-Domain Detection: A Large-Scale Benchmark and Perturbation Suppression Network Renshuai Tao, Hainan Li, Tianbo Wang, Yanlu Wei, Yifu Ding, Bowei Jin, Hongping Zhi, Xianglong Liu, Aishan Liu Link  
1658 The Auto Arborist Dataset: A Large-Scale Benchmark for Multiview Urban Forest Monitoring Under Domain Shift Sara Beery, Guanhang Wu, Trevor Edwards, Filip Pavetic, Bo Majewski, Shreyasee Mukherjee, Stanley Chan, John Morgan, Vivek Rathod, Jonathan Huang Link  
1667 JoinABLe: Learning Bottom-Up Assembly of Parametric CAD Joints Karl D.D. Willis, Pradeep Kumar Jayaraman, Hang Chu, Yunsheng Tian, Yifei Li, Daniele Grandi, Aditya Sanghi, Linh Tran, Joseph G. Lambourne, Armando Solar-Lezama, Wojciech Matusik Link  
1672 DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion Peize Sun, Jinkun Cao, Yi Jiang, Zehuan Yuan, Song Bai, Kris Kitani, Ping Luo Link  
1698 UniCoRN: A Unified Conditional Image Repainting Network Jimeng Sun, Shuchen Weng, Zheng Chang, Si Li, Boxin Shi Link  
1762 Time Lens++: Event-Based Frame Interpolation With Parametric Non-Linear Flow and Multi-Scale Fusion Stepan Tulyakov, Alfredo Bochicchio, Daniel Gehrig, Stamatios Georgoulis, Yuanyou Li, Davide Scaramuzza Link  
1780 Episodic Memory Question Answering Samyak Datta, Sameer Dharur, Vincent Cartillier, Ruta Desai, Mukul Khanna, Dhruv Batra, Devi Parikh Link  
1783 REX: Reasoning-Aware and Grounded Explanation Shi Chen, Qi Zhao Link  
1795 Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions Huaizu Jiang, Xiaojian Ma, Weili Nie, Zhiding Yu, Yuke Zhu, Anima Anandkumar Link  
1800 CoNeRF: Controllable Neural Radiance Fields Kacper Kania, Kwang Moo Yi, Marek Kowalski, Tomasz Trzciński, Andrea Tagliasacchi Link  
1811 UnweaveNet: Unweaving Activity Stories Will Price, Carl Vondrick, Dima Damen Link  
1820 VisualHow: Multimodal Problem Solving Jinhui Yang, Xianyu Chen, Ming Jiang, Shi Chen, Louis Wang, Qi Zhao Link  
1836 Multi-Modal Extreme Classification Anshul Mittal, Kunal Dahiya, Shreya Malani, Janani Ramaswamy, Seba Kuruvilla, Jitendra Ajmera, Keng-hao Chang, Sumeet Agarwal, Purushottam Kar, Manik Varma Link  
1950 HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction Yunze Liu, Yun Liu, Che Jiang, Kangbo Lyu, Weikang Wan, Hao Shen, Boqiang Liang, Zhoujie Fu, He Wang, Li Yi Link  
2027 Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task Xiaoqing Ye, Mao Shu, Hanyu Li, Yifeng Shi, Yingying Li, Guangjie Wang, Xiao Tan, Errui Ding Link  
2077 Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities Fadime Sener, Dibyadip Chatterjee, Daniel Shelepov, Kun He, Dipika Singhania, Robert Wang, Angela Yao Link  
2086 Autofocus for Event Cameras Shijie Lin, Yinqiang Zhang, Lei Yu, Bin Zhou, Xiaowei Luo, Jia Pan Link  
2102 Programmatic Concept Learning for Human Motion Description and Synthesis Sumith Kulal, Jiayuan Mao, Alex Aiken, Jiajun Wu Link  
2107 Temporal Alignment Networks for Long-Term Video Tengda Han, Weidi Xie, Andrew Zisserman Link  
2170 Point Cloud Color Constancy Xiaoyan Xing, Yanlin Qian, Sibo Feng, Yuhan Dong, Jiří Matas Link  
2206 Towards Unsupervised Domain Generalization Xingxuan Zhang, Linjun Zhou, Renzhe Xu, Peng Cui, Zheyan Shen, Haoxin Liu Link  
2220 Text2Pos: Text-to-Point-Cloud Cross-Modal Localization Manuel Kolmet, Qunjie Zhou, Aljoša Ošep, Laura Leal-Taixé Link  
2221 Opening Up Open World Tracking Yang Liu, Idil Esen Zulfikar, Jonathon Luiten, Achal Dave, Deva Ramanan, Bastian Leibe, Aljoša Ošep, Laura Leal-Taixé Link  
2245 Robust Image Forgery Detection Over Online Social Network Shared Images Haiwei Wu, Jiantao Zhou, Jinyu Tian, Jun Liu Link  
2264 ETHSeg: An Amodel Instance Segmentation Network and a Real-World Dataset for X-Ray Waste Inspection Lingteng Qiu, Zhangyang Xiong, Xuhao Wang, Kenkun Liu, Yihan Li, Guanying Chen, Xiaoguang Han, Shuguang Cui Link  
2290 FaceVerse: A Fine-Grained and Detail-Controllable 3D Face Morphable Model From a Hybrid Dataset Lizhen Wang, Zhiyuan Chen, Tao Yu, Chenguang Ma, Liang Li, Yebin Liu Link  
2344 Day-to-Night Image Synthesis for Training Nighttime Neural ISPs Abhijith Punnappurath, Abdullah Abuolaim, Abdelrahman Abdelhamed, Alex Levinshtein, Michael S. Brown Link  
2345 Playable Environments: Video Manipulation in Space and Time Willi Menapace, Stéphane Lathuilière, Aliaksandr Siarohin, Christian Theobalt, Sergey Tulyakov, Vladislav Golyanik, Elisa Ricci Link Link 2
2373 RIO: Rotation-Equivariance Supervised Learning of Robust Inertial Odometry Xiya Cao, Caifa Zhou, Dandan Zeng, Yongliang Wang Link  
2468 ONCE-3DLanes: Building Monocular 3D Lane Detection Fan Yan, Ming Nie, Xinyue Cai, Jianhua Han, Hang Xu, Zhen Yang, Chaoqiang Ye, Yanwei Fu, Michael Bi Mi, Li Zhang Link  
2475 ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer Ruohan Gao, Zilin Si, Yen-Yu Chang, Samuel Clarke, Jeannette Bohg, Li Fei-Fei, Wenzhen Yuan, Jiajun Wu Link  
2485 HairMapper: Removing Hair From Portraits Using GANs Yiqian Wu, Yong-Liang Yang, Xiaogang Jin Link  
2532 Stable Long-Term Recurrent Video Super-Resolution Benjamin Naoto Chiche, Arnaud Woiselle, Joana Frontera-Pons, Jean-Luc Starck Link  
2533 Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions With Superior OOD Generalization Damien Teney, Ehsan Abbasnejad, Simon Lucey, Anton van den Hengel Link  
2547 Exploring and Evaluating Image Restoration Potential in Dynamic Scenes Cheng Zhang, Shaolin Su, Yu Zhu, Qingsen Yan, Jinqiu Sun, Yanning Zhang Link  
2549 Commonality in Natural Images Rescues GANs: Pretraining GANs With Generic and Privacy-Free Synthetic Data Kyungjune Baek, Hyunjung Shim Link  
2614 Neural Global Shutter: Learn To Restore Video From a Rolling Shutter Camera With Global Reset Feature Zhixiang Wang, Xiang Ji, Jia-Bin Huang, Shin'ichi Satoh, Xiao Zhou, Yinqiang Zheng Link  
2630 Stability-Driven Contact Reconstruction From Monocular Color Images Zimeng Zhao, Binghui Zuo, Wei Xie, Yangang Wang Link  
2657 Toward Practical Monocular Indoor Depth Estimation Cho-Ying Wu, Jialiang Wang, Michael Hall, Ulrich Neumann, Shuochen Su Link  
2658 Cross-Modal Perceptionist: Can Face Geometry Be Gleaned From Voices? Cho-Ying Wu, Chin-Cheng Hsu, Ulrich Neumann Link  
2683 Unsupervised Domain Adaptation for Nighttime Aerial Tracking Junjie Ye, Changhong Fu, Guangze Zheng, Danda Pani Paudel, Guang Chen Link  
2709 BokehMe: When Neural Rendering Meets Classical Rendering Juewen Peng, Zhiguo Cao, Xianrui Luo, Hao Lu, Ke Xian, Jianming Zhang Link Link 2
2729 FineDiving: A Fine-Grained Dataset for Procedure-Aware Action Quality Assessment Jinglin Xu, Yongming Rao, Xumin Yu, Guangyi Chen, Jie Zhou, Jiwen Lu Link  
2769 Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs With Language Structures via Dependency Relationships Chao Lou, Wenjuan Han, Yuhuan Lin, Zilong Zheng Link  
2810 How Good Is Aesthetic Ability of a Fashion Model? Xingxing Zou, Kaicheng Pang, Wen Zhang, Waikeung Wong Link  
2836 PINA: Learning a Personalized Implicit Neural Avatar From a Single RGB-D Video Sequence Zijian Dong, Chen Guo, Jie Song, Xu Chen, Andreas Geiger, Otmar Hilliges Link  
2846 AKB-48: A Real-World Articulated Object Knowledge Base Liu Liu, Wenqiang Xu, Haoyuan Fu, Sucheng Qian, Qiaojun Yu, Yang Han, Cewu Lu Link  
2870 Multi-Grained Spatio-Temporal Features Perceived Network for Event-Based Lip-Reading Ganchao Tan, Yang Wang, Han Han, Yang Cao, Feng Wu, Zheng-Jun Zha Link  
2898 Open-Set Text Recognition via Character-Context Decoupling Chang Liu, Chun Yang, Xu-Cheng Yin Link  
2926 It’s About Time: Analog Clock Reading in the Wild Charig Yang, Weidi Xie, Andrew Zisserman Link  
2974 Human Hands As Probes for Interactive Object Understanding Mohit Goyal, Sahil Modi, Rishabh Goyal, Saurabh Gupta Link  
3056 FS6D: Few-Shot 6D Pose Estimation of Novel Objects Yisheng He, Yao Wang, Haoqiang Fan, Jian Sun, Qifeng Chen Link  
3073 CLIP-Event: Connecting Text and Images With Event Structures Manling Li, Ruochen Xu, Shuohang Wang, Luowei Zhou, Xudong Lin, Chenguang Zhu, Michael Zeng, Heng Ji, Shih-Fu Chang Link  
3124 HSC4D: Human-Centered 4D Scene Capture in Large-Scale Indoor-Outdoor Space Using Wearable IMUs and LiDAR Yudi Dai, Yitai Lin, Chenglu Wen, Siqi Shen, Lan Xu, Jingyi Yu, Yuexin Ma, Cheng Wang Link  
3296 SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation Tao Sun, Mattia Segu, Janis Postels, Yuxuan Wang, Luc Van Gool, Bernt Schiele, Federico Tombari, Fisher Yu Link  
3373 Gait Recognition in the Wild With Dense 3D Representations and a Benchmark Jinkai Zheng, Xinchen Liu, Wu Liu, Lingxiao He, Chenggang Yan, Tao Mei Link  
3384 MobRecon: Mobile-Friendly Hand Mesh Reconstruction From Monocular Image Xingyu Chen, Yufeng Liu, Yajiao Dong, Xiong Zhang, Chongyang Ma, Yanmin Xiong, Yuan Zhang, Xiaoyan Guo Link  
3405 Pyramid Grafting Network for One-Stage High Resolution Saliency Detection Chenxi Xie, Changqun Xia, Mingcan Ma, Zhirui Zhao, Xiaowu Chen, Jia Li Link  
3411 Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation Shreyas Hampali, Sayan Deb Sarkar, Mahdi Rad, Vincent Lepetit Link  
3422 FocalClick: Towards Practical Interactive Image Segmentation Xi Chen, Zhiyan Zhao, Yilei Zhang, Manni Duan, Donglian Qi, Hengshuang Zhao Link  
3549 Primitive3D: 3D Object Dataset Synthesis From Randomly Assembled Primitives Xinke Li, Henghui Ding, Zekun Tong, Yuwei Wu, Yeow Meng Chee Link  
3564 JRDB-Act: A Large-Scale Dataset for Spatio-Temporal Action, Social Group and Activity Detection Mahsa Ehsanpour, Fatemeh Saleh, Silvio Savarese, Ian Reid, Hamid Rezatofighi Link  
3594 Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning Ligong Han, Jian Ren, Hsin-Ying Lee, Francesco Barbieri, Kyle Olszewski, Shervin Minaee, Dimitris Metaxas, Sergey Tulyakov Link  
3597 Portrait Eyeglasses and Shadow Removal by Leveraging 3D Synthetic Data Junfeng Lyu, Zhibo Wang, Feng Xu Link  
3621 Aesthetic Text Logo Synthesis via Content-Aware Layout Inferring Yizhi Wang, Guo Pu, Wenhan Luo, Yexin Wang, Pengfei Xiong, Hongwen Kang, Zhouhui Lian Link  
3656 Look for the Change: Learning Object States and State-Modifying Actions From Untrimmed Web Videos Tomáš Souček, Jean-Baptiste Alayrac, Antoine Miech, Ivan Laptev, Josef Sivic Link  
3680 Scribble-Supervised LiDAR Semantic Segmentation Ozan Unal, Dengxin Dai, Luc Van Gool Link  
3686 TableFormer: Table Structure Understanding With Transformers Ahmed Nassar, Nikolaos Livathinos, Maksym Lysak, Peter Staar Link  
3693 Transformer Based Line Segment Classifier With Image Context for Real-Time Vanishing Point Detection in Manhattan World Xin Tong, Xianghua Ying, Yongjie Shi, Ruibin Wang, Jinfa Yang Link  
3703 NeRFReN: Neural Radiance Fields With Reflections Yuan-Chen Guo, Di Kang, Linchao Bao, Yu He, Song-Hai Zhang Link  
3717 CroMo: Cross-Modal Learning for Monocular Depth Estimation Yannick Verdié, Jifei Song, Barnabé Mas, Benjamin Busam, Aleš Leonardis, Steven McDonagh Link  
3784 SIOD: Single Instance Annotated per Category per Image for Object Detection Hanjun Li, Xingjia Pan, Ke Yan, Fan Tang, Wei-Shi Zheng Link  
3846 3D Common Corruptions and Data Augmentation Oğuzhan Fatih Kar, Teresa Yeo, Andrei Atanov, Amir Zamir Link  
3942 Deep Rectangling for Image Stitching: A Learning Baseline Lang Nie, Chunyu Lin, Kang Liao, Shuaicheng Liu, Yao Zhao Link  
3978 Discovering Objects That Can Move Zhipeng Bao, Pavel Tokmakov, Allan Jabri, Yu-Xiong Wang, Adrien Gaidon, Martial Hebert Link  
4049 Structure-Aware Flow Generation for Human Body Reshaping Jianqiang Ren, Yuan Yao, Biwen Lei, Miaomiao Cui, Xuansong Xie Link  
4070 YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset Donglai Wei, Siddhant Kharbanda, Sarthak Arora, Roshan Roy, Nishant Jain, Akash Palrecha, Tanav Shah, Shray Mathur, Ritik Mathur, Abhijay Kemkar, Anirudh Chakravarthy, Zudi Lin, Won-Dong Jang, Yansong Tang, Song Bai, James Tompkin, Philip H.S. Torr, Hanspeter Pfister Link  
4112 Degree-of-Linear-Polarization-Based Color Constancy Taishi Ono, Yuhi Kondo, Legong Sun, Teppei Kurita, Yusuke Moriuchi Link Link 2
4161 Syntax-Aware Network for Handwritten Mathematical Expression Recognition Ye Yuan, Xiao Liu, Wondimu Dikubab, Hui Liu, Zhilong Ji, Zhongqin Wu, Xiang Bai Link  
4199 Wnet: Audio-Guided Video Object Segmentation via Wavelet-Based Cross-Modal Denoising Networks Wenwen Pan, Haonan Shi, Zhou Zhao, Jieming Zhu, Xiuqiang He, Zhigeng Pan, Lianli Gao, Jun Yu, Fei Wu, Qi Tian Link  
4213 Semi-Supervised Wide-Angle Portraits Correction by Multi-Scale Transformer Fushun Zhu, Shan Zhao, Peng Wang, Hao Wang, Hua Yan, Shuaicheng Liu Link  
4238 IDR: Self-Supervised Image Denoising via Iterative Data Refinement Yi Zhang, Dasong Li, Ka Lung Law, Xiaogang Wang, Hongwei Qin, Hongsheng Li Link  
4286 Video Demoiréing With Relation-Based Temporal Consistency Peng Dai, Xin Yu, Lan Ma, Baoheng Zhang, Jia Li, Wenbo Li, Jiajun Shen, Xiaojuan Qi Link  
4307 NeRF in the Dark: High Dynamic Range View Synthesis From Noisy Raw Images Ben Mildenhall, Peter Hedman, Ricardo Martin-Brualla, Pratul P. Srinivasan, Jonathan T. Barron Link  
4313 DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation Aysim Toker, Lukas Kondmann, Mark Weber, Marvin Eisenberger, Andrés Camero, Jingliang Hu, Ariadna Pregel Hoderlein, Çağlar Şenaras, Timothy Davis, Daniel Cremers, Giovanni Marchisio, Xiao Xiang Zhu, Laura Leal-Taixé Link  
4315 UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection Andra Acsintoae, Andrei Florescu, Mariana-Iuliana Georgescu, Tudor Mare, Paul Sumedrea, Radu Tudor Ionescu, Fahad Shahbaz Khan, Mubarak Shah Link  
4346 ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered Scenes Dina Bashkirova, Mohamed Abdelfattah, Ziliang Zhu, James Akl, Fadi Alladkani, Ping Hu, Vitaly Ablavsky, Berk Calli, Sarah Adel Bargal, Kate Saenko Link  
4376 ADeLA: Automatic Dense Labeling With Attention for Viewpoint Shift in Semantic Segmentation Hanxiang Ren, Yanchao Yang, He Wang, Bokui Shen, Qingnan Fan, Youyi Zheng, C. Karen Liu, Leonidas J. Guibas Link  
4399 Towards Multimodal Depth Estimation From Light Fields Titus Leistner, Radek Mackowiak, Lynton Ardizzone, Ullrich Köthe, Carsten Rother Link  
4513 Clothes-Changing Person Re-Identification With RGB Modality Only Xinqian Gu, Hong Chang, Bingpeng Ma, Shutao Bai, Shiguang Shan, Xilin Chen Link  
4520 From Representation to Reasoning: Towards Both Evidence and Commonsense Reasoning for Video Question-Answering Jiangtong Li, Li Niu, Liqing Zhang Link  
4623 Single-Domain Generalized Object Detection in Urban Scene via Cyclic-Disentangled Self-Distillation Aming Wu, Cheng Deng Link  
4626 Human Instance Matting via Mutual Guidance and Multi-Instance Refinement Yanan Sun, Chi-Keung Tang, Yu-Wing Tai Link  
4630 Delving Deep Into the Generalization of Vision Transformers Under Distribution Shifts Chongzhi Zhang, Mingyuan Zhang, Shanghang Zhang, Daisheng Jin, Qiang Zhou, Zhongang Cai, Haiyu Zhao, Xianglong Liu, Ziwei Liu Link  
4685 High-Fidelity Human Avatars From a Single RGB Camera Hao Zhao, Jinsong Zhang, Yu-Kun Lai, Zerong Zheng, Yingdi Xie, Yebin Liu, Kun Li Link  
4702 Kubric: A Scalable Dataset Generator Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duckworth, David J. Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam Laradji, Hsueh-Ti (Derek) Liu, Henning Meyer, Yishu Miao, Derek Nowrouzezahrai, Cengiz Oztireli, Etienne Pot, Noha Radwan, Daniel Rebain, Sara Sabour, Mehdi S. M. Sajjadi, Matan Sela, Vincent Sitzmann, Austin Stone, Deqing Sun, Suhani Vora, Ziyu Wang, Tianhao Wu, Kwang Moo Yi, Fangcheng Zhong, Andrea Tagliasacchi Link Link 2
4732 A Large-Scale Comprehensive Dataset and Copy-Overlap Aware Evaluation Protocol for Segment-Level Video Copy Detection Sifeng He, Xudong Yang, Chen Jiang, Gang Liang, Wei Zhang, Tan Pan, Qing Wang, Furong Xu, Chunguang Li, JinXiong Liu, Hui Xu, Kaiming Huang, Yuan Cheng, Feng Qian, Xiaobo Zhang, Lei Yang Link  
4763 Physically Disentangled Intra- and Inter-Domain Adaptation for Varicolored Haze Removal Yi Li, Yi Chang, Yan Gao, Changfeng Yu, Luxin Yan Link  
4793 Open Challenges in Deep Stereo: The Booster Dataset Pierluigi Zama Ramirez, Fabio Tosi, Matteo Poggi, Samuele Salti, Stefano Mattoccia, Luigi Di Stefano Link  
4797 BigDatasetGAN: Synthesizing ImageNet With Pixel-Wise Annotations Daiqing Li, Huan Ling, Seung Wook Kim, Karsten Kreis, Sanja Fidler, Antonio Torralba Link  
4844 Habitat-Web: Learning Embodied Object-Search Strategies From Human Demonstrations at Scale Ram Ramrakhya, Eric Undersander, Dhruv Batra, Abhishek Das Link  
4846 WALT: Watch and Learn 2D Amodal Representation From Time-Lapse Imagery N. Dinesh Reddy, Robert Tamburo, Srinivasa G. Narasimhan Link  
4884 SimVQA: Exploring Simulated Environments for Visual Question Answering Paola Cascante-Bonilla, Hui Wu, Letao Wang, Rogerio S. Feris, Vicente Ordonez Link  
4901 Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, Derek Hoiem Link  
4909 3D Human Tongue Reconstruction From Single “In-the-Wild” Images Stylianos Ploumpis, Stylianos Moschoglou, Vasileios Triantafyllou, Stefanos Zafeiriou Link  
4952 EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching Yaya Shi, Xu Yang, Haiyang Xu, Chunfeng Yuan, Bing Li, Weiming Hu, Zheng-Jun Zha Link  
4990 Glass Segmentation Using Intensity and Spectral Polarization Cues Haiyang Mei, Bo Dong, Wen Dong, Jiaxi Yang, Seung-Hwan Baek, Felix Heide, Pieter Peers, Xiaopeng Wei, Xin Yang Link  
5016 Depth Estimation by Combining Binocular Stereo and Monocular Structured-Light Yuhua Xu, Xiaoli Yang, Yushan Yu, Wei Jia, Zhaobi Chu, Yulan Guo Link  
5042 SVIP: Sequence VerIfication for Procedures in Videos Yicheng Qian, Weixin Luo, Dongze Lian, Xu Tang, Peilin Zhao, Shenghua Gao Link  
5058 Deep Saliency Prior for Reducing Visual Distraction Kfir Aberman, Junfeng He, Yossi Gandelsman, Inbar Mosseri, David E. Jacobs, Kai Kohlhoff, Yael Pritch, Michael Rubinstein Link  
5075 Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding Xun Long Ng, Kian Eng Ong, Qichen Zheng, Yun Ni, Si Yong Yeo, Jun Liu Link  
5115 PoseKernelLifter: Metric Lifting of 3D Human Pose Using Sound Zhijian Yang, Xiaoran Fan, Volkan Isler, Hyun Soo Park Link  
5128 Make It Move: Controllable Image-to-Video Generation With Text Descriptions Yaosi Hu, Chong Luo, Zhenzhong Chen Link  
5174 HDR-NeRF: High Dynamic Range Neural Radiance Fields Xin Huang, Qi Zhang, Ying Feng, Hongdong Li, Xuan Wang, Qing Wang Link  
5175 Neural Volumetric Object Selection Zhongzheng Ren, Aseem Agarwala, Bryan Russell, Alexander G. Schwing, Oliver Wang Link  
5258 Fourier Document Restoration for Robust Document Dewarping and Recognition Chuhui Xue, Zichen Tian, Fangneng Zhan, Shijian Lu, Song Bai Link  
5267 What Matters for Meta-Learning Vision Regression Tasks? Ning Gao, Hanna Ziesche, Ngo Anh Vien, Michael Volpp, Gerhard Neumann Link  
5276 Ray Priors Through Reprojection: Improving Neural Radiance Fields for Novel View Extrapolation Jian Zhang, Yuanqing Zhang, Huan Fu, Xiaowei Zhou, Bowen Cai, Jinchi Huang, Rongfei Jia, Binqiang Zhao, Xing Tang Link  
5303 Advancing High-Resolution Video-Language Representation With Large-Scale Video Transcriptions Hongwei Xue, Tiankai Hang, Yanhong Zeng, Yuchong Sun, Bei Liu, Huan Yang, Jianlong Fu, Baining Guo Link  
5331 Optical Flow Estimation for Spiking Camera Liwen Hu, Rui Zhao, Ziluo Ding, Lei Ma, Boxin Shi, Ruiqin Xiong, Tiejun Huang Link  
5336 Large-Scale Pre-Training for Person Re-Identification With Noisy Labels Dengpan Fu, Dongdong Chen, Hao Yang, Jianmin Bao, Lu Yuan, Lei Zhang, Houqiang Li, Fang Wen, Dong Chen Link  
5355 Finding Fallen Objects via Asynchronous Audio-Visual Integration Chuang Gan, Yi Gu, Siyuan Zhou, Jeremy Schwartz, Seth Alter, James Traer, Dan Gutfreund, Joshua B. Tenenbaum, Josh H. McDermott, Antonio Torralba Link  
5401 ViM: Out-of-Distribution With Virtual-Logit Matching Haoqi Wang, Zhizhong Li, Litong Feng, Wayne Zhang Link  
5409 Depth-Guided Sparse Structure-From-Motion for Movies and TV Shows Sheng Liu, Xiaohan Nie, Raffay Hamid Link  
5429 FvOR: Robust Joint Shape and Pose Optimization for Few-View Object Reconstruction Zhenpei Yang, Zhile Ren, Miguel Angel Bautista, Zaiwei Zhang, Qi Shan, Qixing Huang Link  
5449 InOut: Diverse Image Outpainting via GAN Inversion Yen-Chi Cheng, Chieh Hubert Lin, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Ming-Hsuan Yang Link  
5457 Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality Tristan Thrush, Ryan Jiang, Max Bartolo, Amanpreet Singh, Adina Williams, Douwe Kiela, Candace Ross Link  
5459 CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data Qi Yan, Jianhao Zheng, Simon Reding, Shanci Li, Iordan Doytchinov Link  
5491 Dancing Under the Stars: Video Denoising in Starlight Kristina Monakhova, Stephan R. Richter, Laura Waller, Vladlen Koltun Link  
5508 BCOT: A Markerless High-Precision 3D Object Tracking Benchmark Jiachen Li, Bin Wang, Shiqiang Zhu, Xin Cao, Fan Zhong, Wenxuan Chen, Te Li, Jason Gu, Xueying Qin Link  
5557 GazeOnce: Real-Time Multi-Person Gaze Estimation Mingfang Zhang, Yunfei Liu, Feng Lu Link  
5686 Synthetic Aperture Imaging With Events and Frames Wei Liao, Xiang Zhang, Lei Yu, Shijie Lin, Wen Yang, Ning Qiao Link  
5710 Lepard: Learning Partial Point Cloud Matching in Rigid and Deformable Scenes Yang Li, Tatsuya Harada Link  
5728 ISNet: Shape Matters for Infrared Small Target Detection Mingjin Zhang, Rui Zhang, Yuxiang Yang, Haichen Bai, Jing Zhang, Jie Guo Link  
5780 Deep Hyperspectral-Depth Reconstruction Using Single Color-Dot Projection Chunyu Li, Yusuke Monno, Masatoshi Okutomi Link  
5827 Forecasting Characteristic 3D Poses of Human Actions Christian Diller, Thomas Funkhouser, Angela Dai Link  
5863 SoftCollage: A Differentiable Probabilistic Tree Generator for Image Collage Jiahao Yu, Li Chen, Mingrui Zhang, Mading Li Link  
5903 Gated2Gated: Self-Supervised Depth Estimation From Gated Images Amanpreet Walia, Stefanie Walz, Mario Bijelic, Fahim Mannan, Frank Julca-Aguilar, Michael Langer, Werner Ritter, Felix Heide Link  
5927 Raw High-Definition Radar for Multi-Task Learning Julien Rebut, Arthur Ouaknine, Waqas Malik, Patrick Pérez Link  
5938 Revealing Occlusions With 4D Neural Fields Basile Van Hoorick, Purva Tendulkar, Dídac Surís, Dennis Park, Simon Stent, Carl Vondrick Link  
5943 Learning Part Segmentation Through Unsupervised Domain Adaptation From Synthetic Vehicles Qing Liu, Adam Kortylewski, Zhishuai Zhang, Zizhang Li, Mengqi Guo, Qihao Liu, Xiaoding Yuan, Jiteng Mu, Weichao Qiu, Alan Yuille Link  
5948 CellTypeGraph: A New Geometric Computer Vision Benchmark Lorenzo Cerrone, Athul Vijayan, Tejasvinee Mody, Kay Schneitz, Fred A. Hamprecht Link  
5950 Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning Xiangyu Li, Xu Yang, Kun Wei, Cheng Deng, Muli Yang Link  
5956 Reference-Based Video Super-Resolution Using Multi-Camera Video Triplets Junyong Lee, Myeonghee Lee, Sunghyun Cho, Seungyong Lee Link  
5970 Dual-Key Multimodal Backdoors for Visual Question Answering Matthew Walmer, Karan Sikka, Indranil Sur, Abhinav Shrivastava, Susmit Jha Link  
6061 ABO: Dataset and Benchmarks for Real-World 3D Object Understanding Jasmine Collins, Shubham Goel, Kenan Deng, Achleshwar Luthra, Leon Xu, Erhan Gundogdu, Xi Zhang, Tomas F. Yago Vicente, Thomas Dideriksen, Himanshu Arora, Matthieu Guillaumin, Jitendra Malik Link  
6070 Explore Spatio-Temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline Kailai Zhou, Yibo Wang, Tao Lv, Yunqian Li, Linsen Chen, Qiu Shen, Xun Cao Link  
6128 Video Shadow Detection via Spatio-Temporal Interpolation Consistency Training Xiao Lu, Yihong Cao, Sheng Liu, Chengjiang Long, Zipei Chen, Xuanyu Zhou, Yimin Yang, Chunxia Xiao Link  
6242 STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes Peishan Cong, Xinge Zhu, Feng Qiao, Yiming Ren, Xidong Peng, Yuenan Hou, Lan Xu, Ruigang Yang, Dinesh Manocha, Yuexin Ma Link  
6336 Learning Modal-Invariant and Temporal-Memory for Video-Based Visible-Infrared Person Re-Identification Xinyu Lin, Jinxing Li, Zeyu Ma, Huafeng Li, Shuang Li, Kaixiong Xu, Guangming Lu, David Zhang Link  
6337 Continual Predictive Learning From Videos Geng Chen, Wendong Zhang, Han Lu, Siyu Gao, Yunbo Wang, Mingsheng Long, Xiaokang Yang Link  
6350 PhyIR: Physics-Based Inverse Rendering for Panoramic Indoor Images Zhen Li, Lingli Wang, Xiang Huang, Cihui Pan, Jiaqi Yang Link  
6449 BoostMIS: Boosting Medical Image Semi-Supervised Learning With Adaptive Pseudo Labeling and Informative Active Annotation Wenqiao Zhang, Lei Zhu, James Hallinan, Shengyu Zhang, Andrew Makmur, Qingpeng Cai, Beng Chin Ooi Link  
6482 Coopernaut: End-to-End Driving With Cooperative Perception for Networked Vehicles Jiaxun Cui, Hang Qiu, Dian Chen, Peter Stone, Yuke Zhu Link  
6512 Practical Stereo Matching via Cascaded Recurrent Network With Adaptive Correlation Jiankun Li, Peisen Wang, Pengfei Xiong, Tao Cai, Ziwei Yan, Lei Yang, Jiangyu Liu, Haoqiang Fan, Shuaicheng Liu Link  
6518 Rethinking Visual Geo-Localization for Large-Scale Applications Gabriele Berton, Carlo Masone, Barbara Caputo Link  
6545 I M Avatar: Implicit Morphable Head Avatars From Videos Yufeng Zheng, Victoria Fernández Abrevaya, Marcel C. Bühler, Xu Chen, Michael J. Black, Otmar Hilliges Link  
6567 Grounding Answers for Visual Questions Asked by Visually Impaired People Chongyan Chen, Samreen Anjum, Danna Gurari Link  
6607 Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis Karren Yang, Dejan Marković, Steven Krenn, Vasu Agrawal, Alexander Richard Link  
6633 The Implicit Values of a Good Hand Shake: Handheld Multi-Frame Neural Depth Refinement Ilya Chugunov, Yuxuan Zhang, Zhihao Xia, Xuaner Zhang, Jiawen Chen, Felix Heide Link  
6645 Virtual Elastic Objects Hsiao-yu Chen, Edith Tretschk, Tuur Stuyck, Petr Kadlecek, Ladislav Kavan, Etienne Vouga, Christoph Lassner Link  
6684 Towards End-to-End Unified Scene Text Detection and Layout Analysis Shangbang Long, Siyang Qin, Dmitry Panteleev, Alessandro Bissacco, Yasuhisa Fujii, Michalis Raptis Link  
6719 Learning To Answer Questions in Dynamic Audio-Visual Scenarios Guangyao Li, Yake Wei, Yapeng Tian, Chenliang Xu, Ji-Rong Wen, Di Hu Link  
6761 Implicit Motion Handling for Video Camouflaged Object Detection Xuelian Cheng, Huan Xiong, Deng-Ping Fan, Yiran Zhong, Mehrtash Harandi, Tom Drummond, Zongyuan Ge Link  
6786 M5Product: Self-Harmonized Contrastive Learning for E-Commercial Multi-Modal Pretraining Xiao Dong, Xunlin Zhan, Yangxin Wu, Yunchao Wei, Michael C. Kampffmeyer, Xiaoyong Wei, Minlong Lu, Yaowei Wang, Xiaodan Liang Link  
6790 CycleMix: A Holistic Strategy for Medical Image Segmentation From Scribble Supervision Ke Zhang, Xiahai Zhuang Link  
6883 OSSO: Obtaining Skeletal Shape From Outside Marilyn Keller, Silvia Zuffi, Michael J. Black, Sergi Pujades Link  
6891 How Do You Do It? Fine-Grained Action Understanding With Pseudo-Adverbs Hazel Doughty, Cees G. M. Snoek Link  
6918 Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes Dongkwon Jin, Wonhui Park, Seong-Gyun Jeong, Heeyeon Kwon, Chang-Su Kim Link  
6960 WebQA: Multihop and Multimodal QA Yingshan Chang, Mridu Narang, Hisami Suzuki, Guihong Cao, Jianfeng Gao, Yonatan Bisk Link  
6991 Relative Pose From a Calibrated and an Uncalibrated Smartphone Image Yaqing Ding, Daniel Barath, Jian Yang, Zuzana Kukelova Link  
7001 ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior Metin Ersin Arican, Ozgur Kara, Gustav Bredell, Ender Konukoglu Link  
7005 Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation Zhaoyang Zeng, Yongsheng Luo, Zhenhua Liu, Fengyun Rao, Dian Li, Weidong Guo, Zhen Wen Link  
7060 Explaining Deep Convolutional Neural Networks via Latent Visual-Semantic Filter Attention Yu Yang, Seungbae Kim, Jungseock Joo Link  
7104 Estimating Egocentric 3D Human Pose in the Wild With External Weak Supervision Jian Wang, Lingjie Liu, Weipeng Xu, Kripasindhu Sarkar, Diogo Luvizon, Christian Theobalt Link  
7121 Less Is More: Generating Grounded Navigation Instructions From Landmarks Su Wang, Ceslee Montgomery, Jordi Orbay, Vighnesh Birodkar, Aleksandra Faust, Izzeddin Gur, Natasha Jaques, Austin Waters, Jason Baldridge, Peter Anderson Link  
7201 Speech Driven Tongue Animation Salvador Medina, Denis Tome, Carsten Stoll, Mark Tiede, Kevin Munhall, Alexander G. Hauptmann, Iain Matthews Link  
7217 IntentVizor: Towards Generic Query Guided Interactive Video Summarization Guande Wu, Jianzhe Lin, Claudio T. Silva Link  
7236 Target-Aware Dual Adversarial Learning and a Multi-Scenario Multi-Modality Benchmark To Fuse Infrared and Visible for Object Detection Jinyuan Liu, Xin Fan, Zhanbo Huang, Guanyao Wu, Risheng Liu, Wei Zhong, Zhongxuan Luo Link  
7245 Dynamic 3D Gaze From Afar: Deep Gaze Estimation From Temporal Eye-Head-Body Coordination Soma Nonaka, Shohei Nobuhara, Ko Nishino Link  
7259 Dressing in the Wild by Watching Dance Videos Xin Dong, Fuwei Zhao, Zhenyu Xie, Xijin Zhang, Daniel K. Du, Min Zheng, Xiang Long, Xiaodan Liang, Jianchao Yang Link  
7315 V2C: Visual Voice Cloning Qi Chen, Mingkui Tan, Yuankai Qi, Jiaqiu Zhou, Yuanqing Li, Qi Wu Link  
7354 Reflection and Rotation Symmetry Detection via Equivariant Learning Ahyun Seo, Byungjin Kim, Suha Kwak, Minsu Cho Link  
7381 Optimal LED Spectral Multiplexing for NIR2RGB Translation Lei Liu, Yuze Chen, Junchi Yan, Yinqiang Zheng Link  
7457 Open-Domain, Content-Based, Multi-Modal Fact-Checking of Out-of-Context Images via Online Resources Sahar Abdelnabi, Rakibul Hasan, Mario Fritz Link  
7477 E2(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition Chiara Plizzari, Mirco Planamente, Gabriele Goletto, Marco Cannici, Emanuele Gusso, Matteo Matteucci, Barbara Caputo Link  
7484 Ego4D: Around the World in 3,000 Hours of Egocentric Video Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina González, James Hillis, Xuhua Huang, Yifei Huang, Wenqi Jia, Weslie Khoo, Jáchym Kolář, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbeláez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik Link  
7524 Learning Adaptive Warping for Real-World Rolling Shutter Correction Mingdeng Cao, Zhihang Zhong, Jiahao Wang, Yinqiang Zheng, Yujiu Yang Link  
7579 Whose Hands Are These? Hand Detection and Hand-Body Association in the Wild Supreeth Narasimhaswamy, Thanh Nguyen, Mingzhen Huang, Minh Hoai Link  
7596 Multimodal Material Segmentation Yupeng Liang, Ryosuke Wakaki, Shohei Nobuhara, Ko Nishino Link  
7612 ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic Yoad Tewel, Yoav Shalev, Idan Schwartz, Lior Wolf Link  
7655 PhoCaL: A Multi-Modal Dataset for Category-Level Object Pose Estimation With Photometrically Challenging Objects Pengyuan Wang, HyunJun Jung, Yitong Li, Siyuan Shen, Rahul Parthasarathy Srikanth, Lorenzo Garattoni, Sven Meier, Nassir Navab, Benjamin Busam Link  
7747 Modular Action Concept Grounding in Semantic Video Prediction Wei Yu, Wenxin Chen, Songheng Yin, Steve Easterbrook, Animesh Garg Link  
7887 Learning Robust Image-Based Rendering on Sparse Scene Geometry via Depth Completion Yuqi Sun, Shili Zhou, Ri Cheng, Weimin Tan, Bo Yan, Lang Fu Link  
7916 ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Photo Biwen Lei, Xiefan Guo, Hongyu Yang, Miaomiao Cui, Xuansong Xie, Di Huang Link  
7936 GrainSpace: A Large-Scale Dataset for Fine-Grained and Domain-Adaptive Recognition of Cereal Grains Lei Fan, Yiwen Ding, Dongdong Fan, Donglin Di, Maurice Pagnucco, Yang Song Link  
8083 Highly-Efficient Incomplete Large-Scale Multi-View Clustering With Consensus Bipartite Graph Siwei Wang, Xinwang Liu, Li Liu, Wenxuan Tu, Xinzhong Zhu, Jiyuan Liu, Sihang Zhou, En Zhu Link  
8115 Scaling Up Vision-Language Pre-Training for Image Captioning Xiaowei Hu, Zhe Gan, Jianfeng Wang, Zhengyuan Yang, Zicheng Liu, Yumao Lu, Lijuan Wang Link  
8152 Learning To Detect Scene Landmarks for Camera Localization Tien Do, Ondrej Miksik, Joseph DeGol, Hyun Soo Park, Sudipta N. Sinha Link  
8159 Egocentric Scene Understanding via Multimodal Spatial Rectifier Tien Do, Khiem Vuong, Hyun Soo Park Link  
8175 Large-Scale Video Panoptic Segmentation in the Wild: A Benchmark Jiaxu Miao, Xiaohan Wang, Yu Wu, Wei Li, Xu Zhang, Yunchao Wei, Yi Yang Link  
8217 LD-ConGR: A Large RGB-D Video Dataset for Long-Distance Continuous Gesture Recognition Dan Liu, Libo Zhang, Yanjun Wu Link  
8222 Unifying Panoptic Segmentation for Autonomous Driving Oliver Zendel, Matthias Schörghuber, Bernhard Rainer, Markus Murschitz, Csaba Beleznai Link  
8233 NightLab: A Dual-Level Architecture With Hardness Detection for Segmentation at Night Xueqing Deng, Peng Wang, Xiaochen Lian, Shawn Newsam Link  
8277 Neural 3D Video Synthesis From Multi-View Video Tianye Li, Mira Slavcheva, Michael Zollhöfer, Simon Green, Christoph Lassner, Changil Kim, Tanner Schmidt, Steven Lovegrove, Michael Goesele, Richard Newcombe, Zhaoyang Lv Link  
8338 Modeling Indirect Illumination for Inverse Rendering Yuanqing Zhang, Jiaming Sun, Xingyi He, Huan Fu, Rongfei Jia, Xiaowei Zhou Link  
8352 Knowledge Mining With Scene Text for Fine-Grained Recognition Hao Wang, Junchao Liao, Tianheng Cheng, Zewen Gao, Hao Liu, Bo Ren, Xiang Bai, Wenyu Liu Link  
8360 Surpassing the Human Accuracy: Detecting Gallbladder Cancer From USG Images With Curriculum Learning Soumen Basu, Mayank Gupta, Pratyaksha Rana, Pankaj Gupta, Chetan Arora Link  
8367 Multi-Person Extreme Motion Prediction Wen Guo, Xiaoyu Bie, Xavier Alameda-Pineda, Francesc Moreno-Noguer Link  
8373 Does Text Attract Attention on E-Commerce Images: A Novel Saliency Prediction Dataset and Method Lai Jiang, Yifei Li, Shengxi Li, Mai Xu, Se Lei, Yichen Guo, Bo Huang Link  
8388 Personalized Image Aesthetics Assessment With Rich Attributes Yuzhe Yang, Liwu Xu, Leida Li, Nan Qie, Yaqian Li, Peng Zhang, Yandong Guo Link  
8504 Geometric Structure Preserving Warp for Natural Image Stitching Peng Du, Jifeng Ning, Jiguang Cui, Shaoli Huang, Xinchao Wang, Jiaxin Wang Link  
8572 Abandoning the Bayer-Filter To See in the Dark Xingbo Dong, Wanyan Xu, Zhihui Miao, Lan Ma, Chao Zhang, Jiewen Yang, Zhe Jin, Andrew Beng Jin Teoh, Jiajun Shen Link  
8650 RADU: Ray-Aligned Depth Update Convolutions for ToF Data Denoising Michael Schelling, Pedro Hermosilla, Timo Ropinski Link  
8775 Learning To Learn and Remember Super Long Multi-Domain Task Sequence Zhenyi Wang, Li Shen, Tiehang Duan, Donglin Zhan, Le Fang, Mingchen Gao Link  
8783 FLOAT: Factorized Learning of Object Attributes for Improved Multi-Object Multi-Part Scene Parsing Rishubh Singh, Pranav Gupta, Pradeep Shenoy, Ravikiran Sarvadevabhatla Link  
8832 Self-Supervised Super-Resolution for Multi-Exposure Push-Frame Satellites Ngoc Long Nguyen, Jérémy Anger, Axel Davy, Pablo Arias, Gabriele Facciolo Link  
8857 Geometry-Aware Guided Loss for Deep Crack Recognition Zhuangzhuang Chen, Jin Zhang, Zhuonan Lai, Jie Chen, Zun Liu, Jianqiang Li Link  
8893 BTS: A Bi-Lingual Benchmark for Text Segmentation in the Wild Xixi Xu, Zhongang Qi, Jianqi Ma, Honglun Zhang, Ying Shan, Xiaohu Qie Link Link 2
8894 Stereo Magnification With Multi-Layer Images Taras Khakhulin, Denis Korzhenkov, Pavel Solovev, Gleb Sterkin, Andrei-Timotei Ardelean, Victor Lempitsky Link  
8898 Segment and Complete: Defending Object Detectors Against Adversarial Patch Attacks With Robust Patch Detection Jiang Liu, Alexander Levine, Chun Pong Lau, Rama Chellappa, Soheil Feizi Link  
8910 CNN Filter DB: An Empirical Investigation of Trained Convolutional Filters Paul Gavrikov, Janis Keuper Link  
8956 RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation Fabio Tosi, Pierluigi Zama Ramirez, Matteo Poggi, Samuele Salti, Stefano Mattoccia, Luigi Di Stefano Link  
8979 Maintaining Reasoning Consistency in Compositional Visual Question Answering Chenchen Jing, Yunde Jia, Yuwei Wu, Xinyu Liu, Qi Wu Link  
9012 Weakly-Supervised Metric Learning With Cross-Module Communications for the Classification of Anterior Chamber Angle Images Jingqi Huang, Yue Ning, Dong Nie, Linan Guan, Xiping Jia Link  
9027 Towards Low-Cost and Efficient Malaria Detection Waqas Sultani, Wajahat Nawaz, Syed Javed, Muhammad Sohail Danish, Asma Saadia, Mohsen Ali Link  
9029 PoseTrack21: A Dataset for Person Search, Multi-Object Tracking and Multi-Person Pose Tracking Andreas Döring, Di Chen, Shanshan Zhang, Bernt Schiele, Jürgen Gall Link  
9179 The DEVIL Is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting Ryan Szeto, Jason J. Corso Link  
9212 Scanline Homographies for Rolling-Shutter Plane Absolute Pose Fang Bai, Agniva Sengupta, Adrien Bartoli Link  
9326 Towards Principled Disentanglement for Domain Generalization Hanlin Zhang, Yi-Fan Zhang, Weiyang Liu, Adrian Weller, Bernhard Schölkopf, Eric P. Xing Link  
9349 Image Based Reconstruction of Liquids From 2D Surface Detections Florian Richter, Ryan K. Orosco, Michael C. Yip Link  
9391 3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection Alexander Lehner, Stefano Gasperini, Alvaro Marcos-Ramiro, Michael Schmidt, Mohammad-Ali Nikouei Mahani, Nassir Navab, Benjamin Busam, Federico Tombari Link  
9398 A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes Mazda Moayeri, Phillip Pope, Yogesh Balaji, Soheil Feizi Link  
9405 Mega-NERF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs Haithem Turki, Deva Ramanan, Mahadev Satyanarayanan Link  
9422 Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, Peter Hedman Link  
9480 SIMBAR: Single Image-Based Scene Relighting for Effective Data Augmentation for Automated Driving Vision Tasks Xianling Zhang, Nathan Tseng, Ameerah Syed, Rohan Bhasin, Nikita Jaipuria Link  
9567 Deep Image-Based Illumination Harmonization Zhongyun Bao, Chengjiang Long, Gang Fu, Daquan Liu, Yuanzhen Li, Jiaming Wu, Chunxia Xiao Link  
9590 MUSE-VAE: Multi-Scale VAE for Environment-Aware Long Term Trajectory Prediction Mihee Lee, Samuel S. Sohn, Seonghyeon Moon, Sejong Yoon, Mubbasir Kapadia, Vladimir Pavlovic Link  
9736 Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation Nathaniel Merrill, Yuliang Guo, Xingxing Zuo, Xinyu Huang, Stefan Leutenegger, Xi Peng, Liu Ren, Guoquan Huang Link  
9778 Egocentric Prediction of Action Target in 3D Yiming Li, Ziang Cao, Andrew Liang, Benjamin Liang, Luoyao Chen, Hang Zhao, Chen Feng Link  
9802 MERLOT Reserve: Neural Script Knowledge Through Vision and Language and Sound Rowan Zellers, Jiasen Lu, Ximing Lu, Youngjae Yu, Yanpeng Zhao, Mohammadreza Salehi, Aditya Kusupati, Jack Hessel, Ali Farhadi, Yejin Choi Link  
9856 Disentangling Visual Embeddings for Attributes and Objects Nirat Saini, Khoi Pham, Abhinav Shrivastava Link  
9913 AutoMine: An Unmanned Mine Dataset Yuchen Li, Zixuan Li, Siyu Teng, Yu Zhang, Yuhang Zhou, Yuchang Zhu, Dongpu Cao, Bin Tian, Yunfeng Ai, Zhe Xuanyuan, Long Chen Link  
9930 Memory-Augmented Non-Local Attention for Video Super-Resolution Jiyang Yu, Jingen Liu, Liefeng Bo, Tao Mei Link  
10074 Domain Adaptation on Point Clouds via Geometry-Aware Implicits Yuefan Shen, Yanchao Yang, Mi Yan, He Wang, Youyi Zheng, Leonidas J. Guibas Link  
10156 Brain-Supervised Image Editing Keith M. Davis III, Carlos de la Torre-Ortiz, Tuukka Ruotsalo Link  
10159 Neural Shape Mating: Self-Supervised Object Assembly With Adversarial Shape Priors Yun-Chun Chen, Haoda Li, Dylan Turpin, Alec Jacobson, Animesh Garg Link  
10327 DAD-3DHeads: A Large-Scale Dense, Accurate and Diverse Dataset for 3D Head Alignment From a Single Image Tetiana Martyniuk, Orest Kupyn, Yana Kurlyak, Igor Krashenyi, Jiří Matas, Viktoriia Sharmanska Link  
10369 Spatial Commonsense Graph for Object Localisation in Partial Scenes Francesco Giuliari, Geri Skenderi, Marco Cristani, Yiming Wang, Alessio Del Bue Link  
10392 3MASSIV: Multilingual, Multimodal and Multi-Aspect Dataset of Social Media Short Videos Vikram Gupta, Trisha Mittal, Puneet Mathur, Vaibhav Mishra, Mayank Maheshwari, Aniket Bera, Debdoot Mukherjee, Dinesh Manocha Link Link 2
10404 Upright-Net: Learning Upright Orientation for 3D Point Cloud Xufang Pang, Feng Li, Ning Ding, Xiaopin Zhong Link  
10407 DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection Haibao Yu, Yizhen Luo, Mao Shu, Yiyi Huo, Zebang Yang, Yifeng Shi, Zhenglong Guo, Hanyu Li, Xing Hu, Jirui Yuan, Zaiqing Nie Link  
10467 Enabling Equivariance for Arbitrary Lie Groups Lachlan E. MacDonald, Sameera Ramasinghe, Simon Lucey Link  
10504 TransRAC: Encoding Multi-Scale Temporal Correlation With Transformers for Repetitive Action Counting Huazhang Hu, Sixun Dong, Yiqun Zhao, Dongze Lian, Zhengxin Li, Shenghua Gao Link  
10801 Amodal Panoptic Segmentation Rohit Mohan, Abhinav Valada Link  
10849 Audio-Visual Generalised Zero-Shot Learning With Cross-Modal Attention and Language Otniel-Bogdan Mercea, Lukas Riesch, A. Sophia Koepke, Zeynep Akata Link  
10899 Towards Driving-Oriented Metric for Lane Detection Models Takami Sato, Qi Alfred Chen Link  
10995 Globetrotter: Connecting Languages by Connecting Images Dídac Surís, Dave Epstein, Carl Vondrick Link  
11034 KeyTr: Keypoint Transporter for 3D Reconstruction of Deformable Objects in Videos David Novotny, Ignacio Rocco, Samarth Sinha, Alexandre Carlier, Gael Kerchenbaum, Roman Shapovalov, Nikita Smetanin, Natalia Neverova, Benjamin Graham, Andrea Vedaldi Link  
11067 It Is Okay To Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection Youssef Mohamed, Faizan Farooq Khan, Kilichbek Haydarov, Mohamed Elhoseiny Link  
11094 Real-Time Hyperspectral Imaging in Hardware via Trained Metasurface Encoders Maksim Makarenko, Arturo Burguete-Lopez, Qizhou Wang, Fedor Getman, Silvio Giancola, Bernard Ghanem, Andrea Fratalocchi Link Link 2
11097 SmartPortraits: Depth Powered Handheld Smartphone Dataset of Human Portraits for State Estimation, Reconstruction and Synthesis Anastasiia Kornilova, Marsel Faizullin, Konstantin Pakulev, Andrey Sadkov, Denis Kukushkin, Azat Akhmetyanov, Timur Akhtyamov, Hekmat Taherinejad, Gonzalo Ferrer Link  
11100 Improving Segmentation of the Inferior Alveolar Nerve Through Deep Label Propagation Marco Cipriano, Stefano Allegretti, Federico Bolelli, Federico Pollastri, Costantino Grana Link  
11143 Do Learned Representations Respect Causal Relationships? Lan Wang, Vishnu Naresh Boddeti Link  
11269 Measuring Compositional Consistency for Video Question Answering Mona Gandhi, Mustafa Omer Gul, Eva Prakash, Madeleine Grunde-McLaughlin, Ranjay Krishna, Maneesh Agrawala Link  
11454 PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents Brandon Smock, Rohith Pesala, Robin Abraham Link  
11477 LiDARCap: Long-Range Marker-Less 3D Human Motion Capture With LiDAR Point Clouds Jialian Li, Jingyi Zhang, Zhiyong Wang, Siqi Shen, Chenglu Wen, Yuexin Ma, Lan Xu, Jingyi Yu, Cheng Wang Link  
11529 Dual Task Learning by Leveraging Both Dense Correspondence and Mis-Correspondence for Robust Change Detection With Imperfect Matches Jin-Man Park, Ue-Hwan Kim, Seon-Hoon Lee, Jong-Hwan Kim Link  
11551 M3L: Language-Based Video Editing via Multi-Modal Multi-Level Transformers Tsu-Jui Fu, Xin Eric Wang, Scott T. Grafton, Miguel P. Eckstein, William Yang Wang Link  
11561 ScanQA: 3D Question Answering for Spatial Scene Understanding Daichi Azuma, Taiki Miyanishi, Shuhei Kurita, Motoaki Kawanabe Link  
11631 OnePose: One-Shot Object Pose Estimation Without CAD Models Jiaming Sun, Zihao Wang, Siyu Zhang, Xingyi He, Hongcheng Zhao, Guofeng Zhang, Xiaowei Zhou Link  
11670 Ithaca365: Dataset and Driving Perception Under Repeated and Challenging Weather Conditions Carlos A. Diaz-Ruiz, Youya Xia, Yurong You, Jose Nino, Junan Chen, Josephine Monica, Xiangyu Chen, Katie Luo, Yan Wang, Marc Emond, Wei-Lun Chao, Bharath Hariharan, Kilian Q. Weinberger, Mark Campbell Link  
11674 Zero-Shot Text-Guided Object Generation With Dream Fields Ajay Jain, Ben Mildenhall, Jonathan T. Barron, Pieter Abbeel, Ben Poole Link  
11846 Learning Video Representations of Human Motion From Synthetic Data Xi Guo, Wei Wu, Dongliang Wang, Jing Su, Haisheng Su, Weihao Gan, Jian Huang, Qin Yang Link