The following CVPR 2022 papers either claim a dataset contribution or were identified during the review process as making one. Their authors committed to making these datasets public by the start of the conference and are accountable for doing so. They have supplied the dataset links listed here for the convenience of the CVPR community. The authors of each paper are responsible for the validity and accuracy of its datasets; please contact them if you encounter any issues.
Paper ID | Title | Authors | Dataset URL | 2nd Dataset URL |
19 | Compositional Temporal Grounding With Structured Variational Cross-Graph Correspondence Learning | Juncheng Li, Junlin Xie, Long Qian, Linchao Zhu, Siliang Tang, Fei Wu, Yi Yang, Yueting Zhuang, Xin Eric Wang | Link | |
35 | MLSLT: Towards Multilingual Sign Language Translation | Aoxiong Yin, Zhou Zhao, Weike Jin, Meng Zhang, Xingshan Zeng, Xiaofei He | Link | |
41 | 360MonoDepth: High-Resolution 360° Monocular Depth Estimation | Manuel Rey-Area, Mingze Yuan, Christian Richardt | Link | |
42 | Generating Diverse and Natural 3D Human Motions From Text | Chuan Guo, Shihao Zou, Xinxin Zuo, Sen Wang, Wei Ji, Xingyu Li, Li Cheng | Link | |
84 | De-Rendering 3D Objects in the Wild | Felix Wimbauer, Shangzhe Wu, Christian Rupprecht | Link | |
103 | Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks | Peri Akiva, Matthew Purri, Matthew Leotta | Link | |
141 | Connecting the Complementary-View Videos: Joint Camera Identification and Subject Association | Ruize Han, Yiyang Gan, Jiacheng Li, Feifan Wang, Wei Feng, Song Wang | Link | |
155 | Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation | Yanwu Xu, Shaoan Xie, Wenhao Wu, Kun Zhang, Mingming Gong, Kayhan Batmanghelich | Link | |
211 | IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding Alignment | Yiming Zeng, Yue Qian, Qijian Zhang, Junhui Hou, Yixuan Yuan, Ying He | Link | |
229 | Deep Decomposition for Stochastic Normal-Abnormal Transport | Peirong Liu, Yueh Lee, Stephen Aylward, Marc Niethammer | Link | |
258 | f-SfT: Shape-From-Template With a Physics-Based Deformation Model | Navami Kairanda, Edith Tretschk, Mohamed Elgharib, Christian Theobalt, Vladislav Golyanik | Link | Link 2 |
348 | Beyond Cross-View Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image | Yujiao Shi, Hongdong Li | Link | |
357 | Forward Propagation, Backward Regression, and Pose Association for Hand Tracking in the Wild | Mingzhen Huang, Supreeth Narasimhaswamy, Saif Vazir, Haibin Ling, Minh Hoai | Link | |
378 | FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos | Yan Wang, Yixuan Sun, Yiwen Huang, Zhongying Liu, Shuyong Gao, Wei Zhang, Weifeng Ge, Wenqiang Zhang | Link | |
436 | Replacing Labeled Real-Image Datasets With Auto-Generated Contours | Hirokatsu Kataoka, Ryo Hayamizu, Ryosuke Yamada, Kodai Nakashima, Sora Takashima, Xinyu Zhang, Edgar Josafat Martinez-Noriega, Nakamasa Inoue, Rio Yokota | Link | |
441 | SOMSI: Spherical Novel View Synthesis With Soft Occlusion Multi-Sphere Images | Tewodros Habtegebrial, Christiano Gava, Marcel Rogge, Didier Stricker, Varun Jampani | Link | |
469 | MAD: A Scalable Dataset for Language Grounding in Videos From Movie Audio Descriptions | Mattia Soldan, Alejandro Pardo, Juan León Alcázar, Fabian Caba, Chen Zhao, Silvio Giancola, Bernard Ghanem | Link | |
516 | Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields | Dor Verbin, Peter Hedman, Ben Mildenhall, Todd Zickler, Jonathan T. Barron, Pratul P. Srinivasan | Link | Link 2 |
573 | Weakly Supervised Semantic Segmentation Using Out-of-Distribution Data | Jungbeom Lee, Seong Joon Oh, Sangdoo Yun, Junsuk Choe, Eunji Kim, Sungroh Yoon | Link | |
576 | Investigating Tradeoffs in Real-World Video Super-Resolution | Kelvin C.K. Chan, Shangchen Zhou, Xiangyu Xu, Chen Change Loy | Link | |
583 | OakInk: A Large-Scale Knowledge Repository for Understanding Hand-Object Interaction | Lixin Yang, Kailin Li, Xinyu Zhan, Fei Wu, Anran Xu, Liu Liu, Cewu Lu | Link | |
628 | Cloning Outfits From Real-World Images to 3D Characters for Generalizable Person Re-Identification | Yanan Wang, Xuezhi Liang, Shengcai Liao | Link | |
651 | Versatile Multi-Modal Pre-Training for Human-Centric Perception | Fangzhou Hong, Liang Pan, Zhongang Cai, Ziwei Liu | Link | |
656 | Instance-Wise Occlusion and Depth Orders in Natural Scenes | Hyunmin Lee, Jaesik Park | Link | |
675 | Multi-Dimensional, Nuanced and Subjective – Measuring the Perception of Facial Expressions | De'Aira Bryant, Siqi Deng, Nashlie Sephus, Wei Xia, Pietro Perona | Link | |
715 | Mix and Localize: Localizing Sound Sources in Mixtures | Xixi Hu, Ziyang Chen, Andrew Owens | Link | |
718 | Point Cloud Pre-Training With Natural 3D Structures | Ryosuke Yamada, Hirokatsu Kataoka, Naoya Chiba, Yukiyasu Domae, Tetsuya Ogata | Link | |
754 | Learning Affordance Grounding From Exocentric Images | Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, Dacheng Tao | Link | |
769 | Visual Abductive Reasoning | Chen Liang, Wenguan Wang, Tianfei Zhou, Yi Yang | Link | |
770 | Putting People in Their Place: Monocular Regression of 3D People in Depth | Yu Sun, Wu Liu, Qian Bao, Yili Fu, Tao Mei, Michael J. Black | Link | |
793 | DiLiGenT10²: A Photometric Stereo Benchmark Dataset With Controlled Shape and Material Variation | Jieji Ren, Feishi Wang, Jiahao Zhang, Qian Zheng, Mingjun Ren, Boxin Shi | Link | |
804 | Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations | Mehdi S. M. Sajjadi, Henning Meyer, Etienne Pot, Urs Bergmann, Klaus Greff, Noha Radwan, Suhani Vora, Mario Lučić, Daniel Duckworth, Alexey Dosovitskiy, Jakob Uszkoreit, Thomas Funkhouser, Andrea Tagliasacchi | Link | |
812 | Learning To Listen: Modeling Non-Deterministic Dyadic Facial Motion | Evonne Ng, Hanbyul Joo, Liwen Hu, Hao Li, Trevor Darrell, Angjoo Kanazawa, Shiry Ginosar | Link | |
818 | Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering | Sateesh Kumar, Sanjay Haresh, Awais Ahmed, Andrey Konin, M. Zeeshan Zia, Quoc-Huy Tran | Link | |
847 | SketchEdit: Mask-Free Local Image Manipulation With Partial Sketches | Yu Zeng, Zhe Lin, Vishal M. Patel | Link | |
869 | BEHAVE: Dataset and Method for Tracking Human Object Interactions | Bharat Lal Bhatnagar, Xianghui Xie, Ilya A. Petrov, Cristian Sminchisescu, Christian Theobalt, Gerard Pons-Moll | Link | |
889 | Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction | Yining Hong, Kaichun Mo, Li Yi, Leonidas J. Guibas, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan | Link | |
912 | Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline | Pengyu Zhang, Jie Zhao, Dong Wang, Huchuan Lu, Xiang Ruan | Link | |
928 | Accurate 3D Body Shape Regression Using Metric and Semantic Attributes | Vasileios Choutas, Lea Müller, Chun-Hao P. Huang, Siyu Tang, Dimitrios Tzionas, Michael J. Black | Link | |
933 | Capturing and Inferring Dense Full-Body Human-Scene Contact | Chun-Hao P. Huang, Hongwei Yi, Markus Höschle, Matvey Safroshkin, Tsvetelina Alexiadis, Senya Polikovsky, Daniel Scharstein, Michael J. Black | Link | |
1007 | Neural Inertial Localization | Sachini Herath, David Caruso, Chen Liu, Yufan Chen, Yasutaka Furukawa | Link | |
1021 | MNSRNet: Multimodal Transformer Network for 3D Surface Super-Resolution | Wuyuan Xie, Tengcong Huang, Miaohui Wang | Link | |
1091 | EvUnroll: Neuromorphic Events Based Rolling Shutter Image Correction | Xinyu Zhou, Peiqi Duan, Yi Ma, Boxin Shi | Link | |
1100 | Understanding 3D Object Articulation in Internet Videos | Shengyi Qian, Linyi Jin, Chris Rockwell, Siyi Chen, David F. Fouhey | Link | |
1104 | Synthetic Generation of Face Videos With Plethysmograph Physiology | Zhen Wang, Yunhao Ba, Pradyumna Chari, Oyku Deniz Bozkurt, Gianna Brown, Parth Patwa, Niranjan Vaddi, Laleh Jalilian, Achuta Kadambi | Link | |
1146 | Meta Distribution Alignment for Generalizable Person Re-Identification | Hao Ni, Jingkuan Song, Xiaopeng Luo, Feng Zheng, Wen Li, Heng Tao Shen | Link | |
1157 | Style-Based Global Appearance Flow for Virtual Try-On | Sen He, Yi-Zhe Song, Tao Xiang | Link | |
1181 | GLAMR: Global Occlusion-Aware Human Mesh Recovery With Dynamic Cameras | Ye Yuan, Umar Iqbal, Pavlo Molchanov, Kris Kitani, Jan Kautz | Link | |
1254 | Modeling 3D Layout for Group Re-Identification | Quan Zhang, Kaiheng Dang, Jian-Huang Lai, Zhanxiang Feng, Xiaohua Xie | Link | |
1319 | Audio-Adaptive Activity Recognition Across Video Domains | Yunhua Zhang, Hazel Doughty, Ling Shao, Cees G. M. Snoek | Link | |
1333 | Joint Hand Motion and Interaction Hotspots Prediction From Egocentric Videos | Shaowei Liu, Subarna Tripathi, Somdeb Majumdar, Xiaolong Wang | Link | |
1453 | Universal Photometric Stereo Network Using Global Lighting Contexts | Satoshi Ikehata | Link | |
1501 | PTTR: Relational 3D Point Cloud Object Tracking With Transformer | Changqing Zhou, Zhipeng Luo, Yueru Luo, Tianrui Liu, Liang Pan, Zhongang Cai, Haiyu Zhao, Shijian Lu | Link | |
1503 | Deformation and Correspondence Aware Unsupervised Synthetic-to-Real Scene Flow Estimation for Point Clouds | Zhao Jin, Yinjie Lei, Naveed Akhtar, Haifeng Li, Munawar Hayat | Link | |
1508 | Lifelong Unsupervised Domain Adaptive Person Re-Identification With Coordinated Anti-Forgetting and Adaptation | Zhipeng Huang, Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Peng Chu, Quanzeng You, Jiang Wang, Zicheng Liu, Zheng-Jun Zha | Link | |
1514 | Object Localization Under Single Coarse Point Supervision | Xuehui Yu, Pengfei Chen, Di Wu, Najmul Hassan, Guorong Li, Junchi Yan, Humphrey Shi, Qixiang Ye, Zhenjun Han | Link | |
1533 | Learning Program Representations for Food Images and Cooking Recipes | Dim P. Papadopoulos, Enrique Mora, Nadiia Chepurko, Kuan Wei Huang, Ferda Ofli, Antonio Torralba | Link | |
1557 | Shape From Polarization for Complex Scenes in the Wild | Chenyang Lei, Chenyang Qi, Jiaxin Xie, Na Fan, Vladlen Koltun, Qifeng Chen | Link | |
1637 | Exploring Endogenous Shift for Cross-Domain Detection: A Large-Scale Benchmark and Perturbation Suppression Network | Renshuai Tao, Hainan Li, Tianbo Wang, Yanlu Wei, Yifu Ding, Bowei Jin, Hongping Zhi, Xianglong Liu, Aishan Liu | Link | |
1658 | The Auto Arborist Dataset: A Large-Scale Benchmark for Multiview Urban Forest Monitoring Under Domain Shift | Sara Beery, Guanhang Wu, Trevor Edwards, Filip Pavetic, Bo Majewski, Shreyasee Mukherjee, Stanley Chan, John Morgan, Vivek Rathod, Jonathan Huang | Link | |
1667 | JoinABLe: Learning Bottom-Up Assembly of Parametric CAD Joints | Karl D.D. Willis, Pradeep Kumar Jayaraman, Hang Chu, Yunsheng Tian, Yifei Li, Daniele Grandi, Aditya Sanghi, Linh Tran, Joseph G. Lambourne, Armando Solar-Lezama, Wojciech Matusik | Link | |
1672 | DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion | Peize Sun, Jinkun Cao, Yi Jiang, Zehuan Yuan, Song Bai, Kris Kitani, Ping Luo | Link | |
1698 | UniCoRN: A Unified Conditional Image Repainting Network | Jimeng Sun, Shuchen Weng, Zheng Chang, Si Li, Boxin Shi | Link | |
1762 | Time Lens++: Event-Based Frame Interpolation With Parametric Non-Linear Flow and Multi-Scale Fusion | Stepan Tulyakov, Alfredo Bochicchio, Daniel Gehrig, Stamatios Georgoulis, Yuanyou Li, Davide Scaramuzza | Link | |
1780 | Episodic Memory Question Answering | Samyak Datta, Sameer Dharur, Vincent Cartillier, Ruta Desai, Mukul Khanna, Dhruv Batra, Devi Parikh | Link | |
1783 | REX: Reasoning-Aware and Grounded Explanation | Shi Chen, Qi Zhao | Link | |
1795 | Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions | Huaizu Jiang, Xiaojian Ma, Weili Nie, Zhiding Yu, Yuke Zhu, Anima Anandkumar | Link | |
1800 | CoNeRF: Controllable Neural Radiance Fields | Kacper Kania, Kwang Moo Yi, Marek Kowalski, Tomasz Trzciński, Andrea Tagliasacchi | Link | |
1811 | UnweaveNet: Unweaving Activity Stories | Will Price, Carl Vondrick, Dima Damen | Link | |
1820 | VisualHow: Multimodal Problem Solving | Jinhui Yang, Xianyu Chen, Ming Jiang, Shi Chen, Louis Wang, Qi Zhao | Link | |
1836 | Multi-Modal Extreme Classification | Anshul Mittal, Kunal Dahiya, Shreya Malani, Janani Ramaswamy, Seba Kuruvilla, Jitendra Ajmera, Keng-hao Chang, Sumeet Agarwal, Purushottam Kar, Manik Varma | Link | |
1950 | HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction | Yunze Liu, Yun Liu, Che Jiang, Kangbo Lyu, Weikang Wan, Hao Shen, Boqiang Liang, Zhoujie Fu, He Wang, Li Yi | Link | |
2027 | Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task | Xiaoqing Ye, Mao Shu, Hanyu Li, Yifeng Shi, Yingying Li, Guangjie Wang, Xiao Tan, Errui Ding | Link | |
2077 | Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities | Fadime Sener, Dibyadip Chatterjee, Daniel Shelepov, Kun He, Dipika Singhania, Robert Wang, Angela Yao | Link | |
2086 | Autofocus for Event Cameras | Shijie Lin, Yinqiang Zhang, Lei Yu, Bin Zhou, Xiaowei Luo, Jia Pan | Link | |
2102 | Programmatic Concept Learning for Human Motion Description and Synthesis | Sumith Kulal, Jiayuan Mao, Alex Aiken, Jiajun Wu | Link | |
2107 | Temporal Alignment Networks for Long-Term Video | Tengda Han, Weidi Xie, Andrew Zisserman | Link | |
2170 | Point Cloud Color Constancy | Xiaoyan Xing, Yanlin Qian, Sibo Feng, Yuhan Dong, Jiří Matas | Link | |
2206 | Towards Unsupervised Domain Generalization | Xingxuan Zhang, Linjun Zhou, Renzhe Xu, Peng Cui, Zheyan Shen, Haoxin Liu | Link | |
2220 | Text2Pos: Text-to-Point-Cloud Cross-Modal Localization | Manuel Kolmet, Qunjie Zhou, Aljoša Ošep, Laura Leal-Taixé | Link | |
2221 | Opening Up Open World Tracking | Yang Liu, Idil Esen Zulfikar, Jonathon Luiten, Achal Dave, Deva Ramanan, Bastian Leibe, Aljoša Ošep, Laura Leal-Taixé | Link | |
2245 | Robust Image Forgery Detection Over Online Social Network Shared Images | Haiwei Wu, Jiantao Zhou, Jinyu Tian, Jun Liu | Link | |
2264 | ETHSeg: An Amodel Instance Segmentation Network and a Real-World Dataset for X-Ray Waste Inspection | Lingteng Qiu, Zhangyang Xiong, Xuhao Wang, Kenkun Liu, Yihan Li, Guanying Chen, Xiaoguang Han, Shuguang Cui | Link | |
2290 | FaceVerse: A Fine-Grained and Detail-Controllable 3D Face Morphable Model From a Hybrid Dataset | Lizhen Wang, Zhiyuan Chen, Tao Yu, Chenguang Ma, Liang Li, Yebin Liu | Link | |
2344 | Day-to-Night Image Synthesis for Training Nighttime Neural ISPs | Abhijith Punnappurath, Abdullah Abuolaim, Abdelrahman Abdelhamed, Alex Levinshtein, Michael S. Brown | Link | |
2345 | Playable Environments: Video Manipulation in Space and Time | Willi Menapace, Stéphane Lathuilière, Aliaksandr Siarohin, Christian Theobalt, Sergey Tulyakov, Vladislav Golyanik, Elisa Ricci | Link | Link 2 |
2373 | RIO: Rotation-Equivariance Supervised Learning of Robust Inertial Odometry | Xiya Cao, Caifa Zhou, Dandan Zeng, Yongliang Wang | Link | |
2468 | ONCE-3DLanes: Building Monocular 3D Lane Detection | Fan Yan, Ming Nie, Xinyue Cai, Jianhua Han, Hang Xu, Zhen Yang, Chaoqiang Ye, Yanwei Fu, Michael Bi Mi, Li Zhang | Link | |
2475 | ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer | Ruohan Gao, Zilin Si, Yen-Yu Chang, Samuel Clarke, Jeannette Bohg, Li Fei-Fei, Wenzhen Yuan, Jiajun Wu | Link | |
2485 | HairMapper: Removing Hair From Portraits Using GANs | Yiqian Wu, Yong-Liang Yang, Xiaogang Jin | Link | |
2532 | Stable Long-Term Recurrent Video Super-Resolution | Benjamin Naoto Chiche, Arnaud Woiselle, Joana Frontera-Pons, Jean-Luc Starck | Link | |
2533 | Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions With Superior OOD Generalization | Damien Teney, Ehsan Abbasnejad, Simon Lucey, Anton van den Hengel | Link | |
2547 | Exploring and Evaluating Image Restoration Potential in Dynamic Scenes | Cheng Zhang, Shaolin Su, Yu Zhu, Qingsen Yan, Jinqiu Sun, Yanning Zhang | Link | |
2549 | Commonality in Natural Images Rescues GANs: Pretraining GANs With Generic and Privacy-Free Synthetic Data | Kyungjune Baek, Hyunjung Shim | Link | |
2614 | Neural Global Shutter: Learn To Restore Video From a Rolling Shutter Camera With Global Reset Feature | Zhixiang Wang, Xiang Ji, Jia-Bin Huang, Shin'ichi Satoh, Xiao Zhou, Yinqiang Zheng | Link | |
2630 | Stability-Driven Contact Reconstruction From Monocular Color Images | Zimeng Zhao, Binghui Zuo, Wei Xie, Yangang Wang | Link | |
2657 | Toward Practical Monocular Indoor Depth Estimation | Cho-Ying Wu, Jialiang Wang, Michael Hall, Ulrich Neumann, Shuochen Su | Link | |
2658 | Cross-Modal Perceptionist: Can Face Geometry Be Gleaned From Voices? | Cho-Ying Wu, Chin-Cheng Hsu, Ulrich Neumann | Link | |
2683 | Unsupervised Domain Adaptation for Nighttime Aerial Tracking | Junjie Ye, Changhong Fu, Guangze Zheng, Danda Pani Paudel, Guang Chen | Link | |
2709 | BokehMe: When Neural Rendering Meets Classical Rendering | Juewen Peng, Zhiguo Cao, Xianrui Luo, Hao Lu, Ke Xian, Jianming Zhang | Link | Link 2 |
2729 | FineDiving: A Fine-Grained Dataset for Procedure-Aware Action Quality Assessment | Jinglin Xu, Yongming Rao, Xumin Yu, Guangyi Chen, Jie Zhou, Jiwen Lu | Link | |
2769 | Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs With Language Structures via Dependency Relationships | Chao Lou, Wenjuan Han, Yuhuan Lin, Zilong Zheng | Link | |
2810 | How Good Is Aesthetic Ability of a Fashion Model? | Xingxing Zou, Kaicheng Pang, Wen Zhang, Waikeung Wong | Link | |
2836 | PINA: Learning a Personalized Implicit Neural Avatar From a Single RGB-D Video Sequence | Zijian Dong, Chen Guo, Jie Song, Xu Chen, Andreas Geiger, Otmar Hilliges | Link | |
2846 | AKB-48: A Real-World Articulated Object Knowledge Base | Liu Liu, Wenqiang Xu, Haoyuan Fu, Sucheng Qian, Qiaojun Yu, Yang Han, Cewu Lu | Link | |
2870 | Multi-Grained Spatio-Temporal Features Perceived Network for Event-Based Lip-Reading | Ganchao Tan, Yang Wang, Han Han, Yang Cao, Feng Wu, Zheng-Jun Zha | Link | |
2898 | Open-Set Text Recognition via Character-Context Decoupling | Chang Liu, Chun Yang, Xu-Cheng Yin | Link | |
2926 | It’s About Time: Analog Clock Reading in the Wild | Charig Yang, Weidi Xie, Andrew Zisserman | Link | |
2974 | Human Hands As Probes for Interactive Object Understanding | Mohit Goyal, Sahil Modi, Rishabh Goyal, Saurabh Gupta | Link | |
3056 | FS6D: Few-Shot 6D Pose Estimation of Novel Objects | Yisheng He, Yao Wang, Haoqiang Fan, Jian Sun, Qifeng Chen | Link | |
3073 | CLIP-Event: Connecting Text and Images With Event Structures | Manling Li, Ruochen Xu, Shuohang Wang, Luowei Zhou, Xudong Lin, Chenguang Zhu, Michael Zeng, Heng Ji, Shih-Fu Chang | Link | |
3124 | HSC4D: Human-Centered 4D Scene Capture in Large-Scale Indoor-Outdoor Space Using Wearable IMUs and LiDAR | Yudi Dai, Yitai Lin, Chenglu Wen, Siqi Shen, Lan Xu, Jingyi Yu, Yuexin Ma, Cheng Wang | Link | |
3296 | SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation | Tao Sun, Mattia Segu, Janis Postels, Yuxuan Wang, Luc Van Gool, Bernt Schiele, Federico Tombari, Fisher Yu | Link | |
3373 | Gait Recognition in the Wild With Dense 3D Representations and a Benchmark | Jinkai Zheng, Xinchen Liu, Wu Liu, Lingxiao He, Chenggang Yan, Tao Mei | Link | |
3384 | MobRecon: Mobile-Friendly Hand Mesh Reconstruction From Monocular Image | Xingyu Chen, Yufeng Liu, Yajiao Dong, Xiong Zhang, Chongyang Ma, Yanmin Xiong, Yuan Zhang, Xiaoyan Guo | Link | |
3405 | Pyramid Grafting Network for One-Stage High Resolution Saliency Detection | Chenxi Xie, Changqun Xia, Mingcan Ma, Zhirui Zhao, Xiaowu Chen, Jia Li | Link | |
3411 | Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation | Shreyas Hampali, Sayan Deb Sarkar, Mahdi Rad, Vincent Lepetit | Link | |
3422 | FocalClick: Towards Practical Interactive Image Segmentation | Xi Chen, Zhiyan Zhao, Yilei Zhang, Manni Duan, Donglian Qi, Hengshuang Zhao | Link | |
3549 | Primitive3D: 3D Object Dataset Synthesis From Randomly Assembled Primitives | Xinke Li, Henghui Ding, Zekun Tong, Yuwei Wu, Yeow Meng Chee | Link | |
3564 | JRDB-Act: A Large-Scale Dataset for Spatio-Temporal Action, Social Group and Activity Detection | Mahsa Ehsanpour, Fatemeh Saleh, Silvio Savarese, Ian Reid, Hamid Rezatofighi | Link | |
3594 | Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning | Ligong Han, Jian Ren, Hsin-Ying Lee, Francesco Barbieri, Kyle Olszewski, Shervin Minaee, Dimitris Metaxas, Sergey Tulyakov | Link | |
3597 | Portrait Eyeglasses and Shadow Removal by Leveraging 3D Synthetic Data | Junfeng Lyu, Zhibo Wang, Feng Xu | Link | |
3621 | Aesthetic Text Logo Synthesis via Content-Aware Layout Inferring | Yizhi Wang, Guo Pu, Wenhan Luo, Yexin Wang, Pengfei Xiong, Hongwen Kang, Zhouhui Lian | Link | |
3656 | Look for the Change: Learning Object States and State-Modifying Actions From Untrimmed Web Videos | Tomáš Souček, Jean-Baptiste Alayrac, Antoine Miech, Ivan Laptev, Josef Sivic | Link | |
3680 | Scribble-Supervised LiDAR Semantic Segmentation | Ozan Unal, Dengxin Dai, Luc Van Gool | Link | |
3686 | TableFormer: Table Structure Understanding With Transformers | Ahmed Nassar, Nikolaos Livathinos, Maksym Lysak, Peter Staar | Link | |
3693 | Transformer Based Line Segment Classifier With Image Context for Real-Time Vanishing Point Detection in Manhattan World | Xin Tong, Xianghua Ying, Yongjie Shi, Ruibin Wang, Jinfa Yang | Link | |
3703 | NeRFReN: Neural Radiance Fields With Reflections | Yuan-Chen Guo, Di Kang, Linchao Bao, Yu He, Song-Hai Zhang | Link | |
3717 | CroMo: Cross-Modal Learning for Monocular Depth Estimation | Yannick Verdié, Jifei Song, Barnabé Mas, Benjamin Busam, Aleš Leonardis, Steven McDonagh | Link | |
3784 | SIOD: Single Instance Annotated per Category per Image for Object Detection | Hanjun Li, Xingjia Pan, Ke Yan, Fan Tang, Wei-Shi Zheng | Link | |
3846 | 3D Common Corruptions and Data Augmentation | Oğuzhan Fatih Kar, Teresa Yeo, Andrei Atanov, Amir Zamir | Link | |
3942 | Deep Rectangling for Image Stitching: A Learning Baseline | Lang Nie, Chunyu Lin, Kang Liao, Shuaicheng Liu, Yao Zhao | Link | |
3978 | Discovering Objects That Can Move | Zhipeng Bao, Pavel Tokmakov, Allan Jabri, Yu-Xiong Wang, Adrien Gaidon, Martial Hebert | Link | |
4049 | Structure-Aware Flow Generation for Human Body Reshaping | Jianqiang Ren, Yuan Yao, Biwen Lei, Miaomiao Cui, Xuansong Xie | Link | |
4070 | YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset | Donglai Wei, Siddhant Kharbanda, Sarthak Arora, Roshan Roy, Nishant Jain, Akash Palrecha, Tanav Shah, Shray Mathur, Ritik Mathur, Abhijay Kemkar, Anirudh Chakravarthy, Zudi Lin, Won-Dong Jang, Yansong Tang, Song Bai, James Tompkin, Philip H.S. Torr, Hanspeter Pfister | Link | |
4112 | Degree-of-Linear-Polarization-Based Color Constancy | Taishi Ono, Yuhi Kondo, Legong Sun, Teppei Kurita, Yusuke Moriuchi | Link | Link 2 |
4161 | Syntax-Aware Network for Handwritten Mathematical Expression Recognition | Ye Yuan, Xiao Liu, Wondimu Dikubab, Hui Liu, Zhilong Ji, Zhongqin Wu, Xiang Bai | Link | |
4199 | Wnet: Audio-Guided Video Object Segmentation via Wavelet-Based Cross-Modal Denoising Networks | Wenwen Pan, Haonan Shi, Zhou Zhao, Jieming Zhu, Xiuqiang He, Zhigeng Pan, Lianli Gao, Jun Yu, Fei Wu, Qi Tian | Link | |
4213 | Semi-Supervised Wide-Angle Portraits Correction by Multi-Scale Transformer | Fushun Zhu, Shan Zhao, Peng Wang, Hao Wang, Hua Yan, Shuaicheng Liu | Link | |
4238 | IDR: Self-Supervised Image Denoising via Iterative Data Refinement | Yi Zhang, Dasong Li, Ka Lung Law, Xiaogang Wang, Hongwei Qin, Hongsheng Li | Link | |
4286 | Video Demoiréing With Relation-Based Temporal Consistency | Peng Dai, Xin Yu, Lan Ma, Baoheng Zhang, Jia Li, Wenbo Li, Jiajun Shen, Xiaojuan Qi | Link | |
4307 | NeRF in the Dark: High Dynamic Range View Synthesis From Noisy Raw Images | Ben Mildenhall, Peter Hedman, Ricardo Martin-Brualla, Pratul P. Srinivasan, Jonathan T. Barron | Link | |
4313 | DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation | Aysim Toker, Lukas Kondmann, Mark Weber, Marvin Eisenberger, Andrés Camero, Jingliang Hu, Ariadna Pregel Hoderlein, Çağlar Şenaras, Timothy Davis, Daniel Cremers, Giovanni Marchisio, Xiao Xiang Zhu, Laura Leal-Taixé | Link | |
4315 | UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection | Andra Acsintoae, Andrei Florescu, Mariana-Iuliana Georgescu, Tudor Mare, Paul Sumedrea, Radu Tudor Ionescu, Fahad Shahbaz Khan, Mubarak Shah | Link | |
4346 | ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered Scenes | Dina Bashkirova, Mohamed Abdelfattah, Ziliang Zhu, James Akl, Fadi Alladkani, Ping Hu, Vitaly Ablavsky, Berk Calli, Sarah Adel Bargal, Kate Saenko | Link | |
4376 | ADeLA: Automatic Dense Labeling With Attention for Viewpoint Shift in Semantic Segmentation | Hanxiang Ren, Yanchao Yang, He Wang, Bokui Shen, Qingnan Fan, Youyi Zheng, C. Karen Liu, Leonidas J. Guibas | Link | |
4399 | Towards Multimodal Depth Estimation From Light Fields | Titus Leistner, Radek Mackowiak, Lynton Ardizzone, Ullrich Köthe, Carsten Rother | Link | |
4513 | Clothes-Changing Person Re-Identification With RGB Modality Only | Xinqian Gu, Hong Chang, Bingpeng Ma, Shutao Bai, Shiguang Shan, Xilin Chen | Link | |
4520 | From Representation to Reasoning: Towards Both Evidence and Commonsense Reasoning for Video Question-Answering | Jiangtong Li, Li Niu, Liqing Zhang | Link | |
4623 | Single-Domain Generalized Object Detection in Urban Scene via Cyclic-Disentangled Self-Distillation | Aming Wu, Cheng Deng | Link | |
4626 | Human Instance Matting via Mutual Guidance and Multi-Instance Refinement | Yanan Sun, Chi-Keung Tang, Yu-Wing Tai | Link | |
4630 | Delving Deep Into the Generalization of Vision Transformers Under Distribution Shifts | Chongzhi Zhang, Mingyuan Zhang, Shanghang Zhang, Daisheng Jin, Qiang Zhou, Zhongang Cai, Haiyu Zhao, Xianglong Liu, Ziwei Liu | Link | |
4685 | High-Fidelity Human Avatars From a Single RGB Camera | Hao Zhao, Jinsong Zhang, Yu-Kun Lai, Zerong Zheng, Yingdi Xie, Yebin Liu, Kun Li | Link | |
4702 | Kubric: A Scalable Dataset Generator | Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duckworth, David J. Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam Laradji, Hsueh-Ti (Derek) Liu, Henning Meyer, Yishu Miao, Derek Nowrouzezahrai, Cengiz Oztireli, Etienne Pot, Noha Radwan, Daniel Rebain, Sara Sabour, Mehdi S. M. Sajjadi, Matan Sela, Vincent Sitzmann, Austin Stone, Deqing Sun, Suhani Vora, Ziyu Wang, Tianhao Wu, Kwang Moo Yi, Fangcheng Zhong, Andrea Tagliasacchi | Link | Link 2 |
4732 | A Large-Scale Comprehensive Dataset and Copy-Overlap Aware Evaluation Protocol for Segment-Level Video Copy Detection | Sifeng He, Xudong Yang, Chen Jiang, Gang Liang, Wei Zhang, Tan Pan, Qing Wang, Furong Xu, Chunguang Li, JinXiong Liu, Hui Xu, Kaiming Huang, Yuan Cheng, Feng Qian, Xiaobo Zhang, Lei Yang | Link | |
4763 | Physically Disentangled Intra- and Inter-Domain Adaptation for Varicolored Haze Removal | Yi Li, Yi Chang, Yan Gao, Changfeng Yu, Luxin Yan | Link | |
4793 | Open Challenges in Deep Stereo: The Booster Dataset | Pierluigi Zama Ramirez, Fabio Tosi, Matteo Poggi, Samuele Salti, Stefano Mattoccia, Luigi Di Stefano | Link | |
4797 | BigDatasetGAN: Synthesizing ImageNet With Pixel-Wise Annotations | Daiqing Li, Huan Ling, Seung Wook Kim, Karsten Kreis, Sanja Fidler, Antonio Torralba | Link | |
4844 | Habitat-Web: Learning Embodied Object-Search Strategies From Human Demonstrations at Scale | Ram Ramrakhya, Eric Undersander, Dhruv Batra, Abhishek Das | Link | |
4846 | WALT: Watch and Learn 2D Amodal Representation From Time-Lapse Imagery | N. Dinesh Reddy, Robert Tamburo, Srinivasa G. Narasimhan | Link | |
4884 | SimVQA: Exploring Simulated Environments for Visual Question Answering | Paola Cascante-Bonilla, Hui Wu, Letao Wang, Rogerio S. Feris, Vicente Ordonez | Link | |
4901 | Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture | Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, Derek Hoiem | Link | |
4909 | 3D Human Tongue Reconstruction From Single “In-the-Wild” Images | Stylianos Ploumpis, Stylianos Moschoglou, Vasileios Triantafyllou, Stefanos Zafeiriou | Link | |
4952 | EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching | Yaya Shi, Xu Yang, Haiyang Xu, Chunfeng Yuan, Bing Li, Weiming Hu, Zheng-Jun Zha | Link | |
4990 | Glass Segmentation Using Intensity and Spectral Polarization Cues | Haiyang Mei, Bo Dong, Wen Dong, Jiaxi Yang, Seung-Hwan Baek, Felix Heide, Pieter Peers, Xiaopeng Wei, Xin Yang | Link | |
5016 | Depth Estimation by Combining Binocular Stereo and Monocular Structured-Light | Yuhua Xu, Xiaoli Yang, Yushan Yu, Wei Jia, Zhaobi Chu, Yulan Guo | Link | |
5042 | SVIP: Sequence VerIfication for Procedures in Videos | Yicheng Qian, Weixin Luo, Dongze Lian, Xu Tang, Peilin Zhao, Shenghua Gao | Link | |
5058 | Deep Saliency Prior for Reducing Visual Distraction | Kfir Aberman, Junfeng He, Yossi Gandelsman, Inbar Mosseri, David E. Jacobs, Kai Kohlhoff, Yael Pritch, Michael Rubinstein | Link | |
5075 | Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding | Xun Long Ng, Kian Eng Ong, Qichen Zheng, Yun Ni, Si Yong Yeo, Jun Liu | Link | |
5115 | PoseKernelLifter: Metric Lifting of 3D Human Pose Using Sound | Zhijian Yang, Xiaoran Fan, Volkan Isler, Hyun Soo Park | Link | |
5128 | Make It Move: Controllable Image-to-Video Generation With Text Descriptions | Yaosi Hu, Chong Luo, Zhenzhong Chen | Link | |
5174 | HDR-NeRF: High Dynamic Range Neural Radiance Fields | Xin Huang, Qi Zhang, Ying Feng, Hongdong Li, Xuan Wang, Qing Wang | Link | |
5175 | Neural Volumetric Object Selection | Zhongzheng Ren, Aseem Agarwala, Bryan Russell, Alexander G. Schwing, Oliver Wang | Link | |
5258 | Fourier Document Restoration for Robust Document Dewarping and Recognition | Chuhui Xue, Zichen Tian, Fangneng Zhan, Shijian Lu, Song Bai | Link | |
5267 | What Matters for Meta-Learning Vision Regression Tasks? | Ning Gao, Hanna Ziesche, Ngo Anh Vien, Michael Volpp, Gerhard Neumann | Link | |
5276 | Ray Priors Through Reprojection: Improving Neural Radiance Fields for Novel View Extrapolation | Jian Zhang, Yuanqing Zhang, Huan Fu, Xiaowei Zhou, Bowen Cai, Jinchi Huang, Rongfei Jia, Binqiang Zhao, Xing Tang | Link | |
5303 | Advancing High-Resolution Video-Language Representation With Large-Scale Video Transcriptions | Hongwei Xue, Tiankai Hang, Yanhong Zeng, Yuchong Sun, Bei Liu, Huan Yang, Jianlong Fu, Baining Guo | Link | |
5331 | Optical Flow Estimation for Spiking Camera | Liwen Hu, Rui Zhao, Ziluo Ding, Lei Ma, Boxin Shi, Ruiqin Xiong, Tiejun Huang | Link | |
5336 | Large-Scale Pre-Training for Person Re-Identification With Noisy Labels | Dengpan Fu, Dongdong Chen, Hao Yang, Jianmin Bao, Lu Yuan, Lei Zhang, Houqiang Li, Fang Wen, Dong Chen | Link | |
5355 | Finding Fallen Objects via Asynchronous Audio-Visual Integration | Chuang Gan, Yi Gu, Siyuan Zhou, Jeremy Schwartz, Seth Alter, James Traer, Dan Gutfreund, Joshua B. Tenenbaum, Josh H. McDermott, Antonio Torralba | Link | |
5401 | ViM: Out-of-Distribution With Virtual-Logit Matching | Haoqi Wang, Zhizhong Li, Litong Feng, Wayne Zhang | Link | |
5409 | Depth-Guided Sparse Structure-From-Motion for Movies and TV Shows | Sheng Liu, Xiaohan Nie, Raffay Hamid | Link | |
5429 | FvOR: Robust Joint Shape and Pose Optimization for Few-View Object Reconstruction | Zhenpei Yang, Zhile Ren, Miguel Angel Bautista, Zaiwei Zhang, Qi Shan, Qixing Huang | Link | |
5449 | InOut: Diverse Image Outpainting via GAN Inversion | Yen-Chi Cheng, Chieh Hubert Lin, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Ming-Hsuan Yang | Link | |
5457 | Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality | Tristan Thrush, Ryan Jiang, Max Bartolo, Amanpreet Singh, Adina Williams, Douwe Kiela, Candace Ross | Link | |
5459 | CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data | Qi Yan, Jianhao Zheng, Simon Reding, Shanci Li, Iordan Doytchinov | Link | |
5491 | Dancing Under the Stars: Video Denoising in Starlight | Kristina Monakhova, Stephan R. Richter, Laura Waller, Vladlen Koltun | Link | |
5508 | BCOT: A Markerless High-Precision 3D Object Tracking Benchmark | Jiachen Li, Bin Wang, Shiqiang Zhu, Xin Cao, Fan Zhong, Wenxuan Chen, Te Li, Jason Gu, Xueying Qin | Link | |
5557 | GazeOnce: Real-Time Multi-Person Gaze Estimation | Mingfang Zhang, Yunfei Liu, Feng Lu | Link | |
5686 | Synthetic Aperture Imaging With Events and Frames | Wei Liao, Xiang Zhang, Lei Yu, Shijie Lin, Wen Yang, Ning Qiao | Link | |
5710 | Lepard: Learning Partial Point Cloud Matching in Rigid and Deformable Scenes | Yang Li, Tatsuya Harada | Link | |
5728 | ISNet: Shape Matters for Infrared Small Target Detection | Mingjin Zhang, Rui Zhang, Yuxiang Yang, Haichen Bai, Jing Zhang, Jie Guo | Link | |
5780 | Deep Hyperspectral-Depth Reconstruction Using Single Color-Dot Projection | Chunyu Li, Yusuke Monno, Masatoshi Okutomi | Link | |
5827 | Forecasting Characteristic 3D Poses of Human Actions | Christian Diller, Thomas Funkhouser, Angela Dai | Link | |
5863 | SoftCollage: A Differentiable Probabilistic Tree Generator for Image Collage | Jiahao Yu, Li Chen, Mingrui Zhang, Mading Li | Link | |
5903 | Gated2Gated: Self-Supervised Depth Estimation From Gated Images | Amanpreet Walia, Stefanie Walz, Mario Bijelic, Fahim Mannan, Frank Julca-Aguilar, Michael Langer, Werner Ritter, Felix Heide | Link | |
5927 | Raw High-Definition Radar for Multi-Task Learning | Julien Rebut, Arthur Ouaknine, Waqas Malik, Patrick Pérez | Link | |
5938 | Revealing Occlusions With 4D Neural Fields | Basile Van Hoorick, Purva Tendulkar, Dídac Surís, Dennis Park, Simon Stent, Carl Vondrick | Link | |
5943 | Learning Part Segmentation Through Unsupervised Domain Adaptation From Synthetic Vehicles | Qing Liu, Adam Kortylewski, Zhishuai Zhang, Zizhang Li, Mengqi Guo, Qihao Liu, Xiaoding Yuan, Jiteng Mu, Weichao Qiu, Alan Yuille | Link | |
5948 | CellTypeGraph: A New Geometric Computer Vision Benchmark | Lorenzo Cerrone, Athul Vijayan, Tejasvinee Mody, Kay Schneitz, Fred A. Hamprecht | Link | |
5950 | Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning | Xiangyu Li, Xu Yang, Kun Wei, Cheng Deng, Muli Yang | Link | |
5956 | Reference-Based Video Super-Resolution Using Multi-Camera Video Triplets | Junyong Lee, Myeonghee Lee, Sunghyun Cho, Seungyong Lee | Link | |
5970 | Dual-Key Multimodal Backdoors for Visual Question Answering | Matthew Walmer, Karan Sikka, Indranil Sur, Abhinav Shrivastava, Susmit Jha | Link | |
6061 | ABO: Dataset and Benchmarks for Real-World 3D Object Understanding | Jasmine Collins, Shubham Goel, Kenan Deng, Achleshwar Luthra, Leon Xu, Erhan Gundogdu, Xi Zhang, Tomas F. Yago Vicente, Thomas Dideriksen, Himanshu Arora, Matthieu Guillaumin, Jitendra Malik | Link | |
6070 | Explore Spatio-Temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline | Kailai Zhou, Yibo Wang, Tao Lv, Yunqian Li, Linsen Chen, Qiu Shen, Xun Cao | Link | |
6128 | Video Shadow Detection via Spatio-Temporal Interpolation Consistency Training | Xiao Lu, Yihong Cao, Sheng Liu, Chengjiang Long, Zipei Chen, Xuanyu Zhou, Yimin Yang, Chunxia Xiao | Link | |
6242 | STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes | Peishan Cong, Xinge Zhu, Feng Qiao, Yiming Ren, Xidong Peng, Yuenan Hou, Lan Xu, Ruigang Yang, Dinesh Manocha, Yuexin Ma | Link | |
6336 | Learning Modal-Invariant and Temporal-Memory for Video-Based Visible-Infrared Person Re-Identification | Xinyu Lin, Jinxing Li, Zeyu Ma, Huafeng Li, Shuang Li, Kaixiong Xu, Guangming Lu, David Zhang | Link | |
6337 | Continual Predictive Learning From Videos | Geng Chen, Wendong Zhang, Han Lu, Siyu Gao, Yunbo Wang, Mingsheng Long, Xiaokang Yang | Link | |
6350 | PhyIR: Physics-Based Inverse Rendering for Panoramic Indoor Images | Zhen Li, Lingli Wang, Xiang Huang, Cihui Pan, Jiaqi Yang | Link | |
6449 | BoostMIS: Boosting Medical Image Semi-Supervised Learning With Adaptive Pseudo Labeling and Informative Active Annotation | Wenqiao Zhang, Lei Zhu, James Hallinan, Shengyu Zhang, Andrew Makmur, Qingpeng Cai, Beng Chin Ooi | Link | |
6482 | Coopernaut: End-to-End Driving With Cooperative Perception for Networked Vehicles | Jiaxun Cui, Hang Qiu, Dian Chen, Peter Stone, Yuke Zhu | Link | |
6512 | Practical Stereo Matching via Cascaded Recurrent Network With Adaptive Correlation | Jiankun Li, Peisen Wang, Pengfei Xiong, Tao Cai, Ziwei Yan, Lei Yang, Jiangyu Liu, Haoqiang Fan, Shuaicheng Liu | Link | |
6518 | Rethinking Visual Geo-Localization for Large-Scale Applications | Gabriele Berton, Carlo Masone, Barbara Caputo | Link | |
6545 | I M Avatar: Implicit Morphable Head Avatars From Videos | Yufeng Zheng, Victoria Fernández Abrevaya, Marcel C. Bühler, Xu Chen, Michael J. Black, Otmar Hilliges | Link | |
6567 | Grounding Answers for Visual Questions Asked by Visually Impaired People | Chongyan Chen, Samreen Anjum, Danna Gurari | Link | |
6607 | Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis | Karren Yang, Dejan Marković, Steven Krenn, Vasu Agrawal, Alexander Richard | Link | |
6633 | The Implicit Values of a Good Hand Shake: Handheld Multi-Frame Neural Depth Refinement | Ilya Chugunov, Yuxuan Zhang, Zhihao Xia, Xuaner Zhang, Jiawen Chen, Felix Heide | Link | |
6645 | Virtual Elastic Objects | Hsiao-yu Chen, Edith Tretschk, Tuur Stuyck, Petr Kadlecek, Ladislav Kavan, Etienne Vouga, Christoph Lassner | Link | |
6684 | Towards End-to-End Unified Scene Text Detection and Layout Analysis | Shangbang Long, Siyang Qin, Dmitry Panteleev, Alessandro Bissacco, Yasuhisa Fujii, Michalis Raptis | Link | |
6719 | Learning To Answer Questions in Dynamic Audio-Visual Scenarios | Guangyao Li, Yake Wei, Yapeng Tian, Chenliang Xu, Ji-Rong Wen, Di Hu | Link | |
6761 | Implicit Motion Handling for Video Camouflaged Object Detection | Xuelian Cheng, Huan Xiong, Deng-Ping Fan, Yiran Zhong, Mehrtash Harandi, Tom Drummond, Zongyuan Ge | Link | |
6786 | M5Product: Self-Harmonized Contrastive Learning for E-Commercial Multi-Modal Pretraining | Xiao Dong, Xunlin Zhan, Yangxin Wu, Yunchao Wei, Michael C. Kampffmeyer, Xiaoyong Wei, Minlong Lu, Yaowei Wang, Xiaodan Liang | Link | |
6790 | CycleMix: A Holistic Strategy for Medical Image Segmentation From Scribble Supervision | Ke Zhang, Xiahai Zhuang | Link | |
6883 | OSSO: Obtaining Skeletal Shape From Outside | Marilyn Keller, Silvia Zuffi, Michael J. Black, Sergi Pujades | Link | |
6891 | How Do You Do It? Fine-Grained Action Understanding With Pseudo-Adverbs | Hazel Doughty, Cees G. M. Snoek | Link | |
6918 | Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes | Dongkwon Jin, Wonhui Park, Seong-Gyun Jeong, Heeyeon Kwon, Chang-Su Kim | Link | |
6960 | WebQA: Multihop and Multimodal QA | Yingshan Chang, Mridu Narang, Hisami Suzuki, Guihong Cao, Jianfeng Gao, Yonatan Bisk | Link | |
6991 | Relative Pose From a Calibrated and an Uncalibrated Smartphone Image | Yaqing Ding, Daniel Barath, Jian Yang, Zuzana Kukelova | Link | |
7001 | ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior | Metin Ersin Arican, Ozgur Kara, Gustav Bredell, Ender Konukoglu | Link | |
7005 | Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation | Zhaoyang Zeng, Yongsheng Luo, Zhenhua Liu, Fengyun Rao, Dian Li, Weidong Guo, Zhen Wen | Link | |
7060 | Explaining Deep Convolutional Neural Networks via Latent Visual-Semantic Filter Attention | Yu Yang, Seungbae Kim, Jungseock Joo | Link | |
7104 | Estimating Egocentric 3D Human Pose in the Wild With External Weak Supervision | Jian Wang, Lingjie Liu, Weipeng Xu, Kripasindhu Sarkar, Diogo Luvizon, Christian Theobalt | Link | |
7121 | Less Is More: Generating Grounded Navigation Instructions From Landmarks | Su Wang, Ceslee Montgomery, Jordi Orbay, Vighnesh Birodkar, Aleksandra Faust, Izzeddin Gur, Natasha Jaques, Austin Waters, Jason Baldridge, Peter Anderson | Link | |
7201 | Speech Driven Tongue Animation | Salvador Medina, Denis Tome, Carsten Stoll, Mark Tiede, Kevin Munhall, Alexander G. Hauptmann, Iain Matthews | Link | |
7217 | IntentVizor: Towards Generic Query Guided Interactive Video Summarization | Guande Wu, Jianzhe Lin, Claudio T. Silva | Link | |
7236 | Target-Aware Dual Adversarial Learning and a Multi-Scenario Multi-Modality Benchmark To Fuse Infrared and Visible for Object Detection | Jinyuan Liu, Xin Fan, Zhanbo Huang, Guanyao Wu, Risheng Liu, Wei Zhong, Zhongxuan Luo | Link | |
7245 | Dynamic 3D Gaze From Afar: Deep Gaze Estimation From Temporal Eye-Head-Body Coordination | Soma Nonaka, Shohei Nobuhara, Ko Nishino | Link | |
7259 | Dressing in the Wild by Watching Dance Videos | Xin Dong, Fuwei Zhao, Zhenyu Xie, Xijin Zhang, Daniel K. Du, Min Zheng, Xiang Long, Xiaodan Liang, Jianchao Yang | Link | |
7315 | V2C: Visual Voice Cloning | Qi Chen, Mingkui Tan, Yuankai Qi, Jiaqiu Zhou, Yuanqing Li, Qi Wu | Link | |
7354 | Reflection and Rotation Symmetry Detection via Equivariant Learning | Ahyun Seo, Byungjin Kim, Suha Kwak, Minsu Cho | Link | |
7381 | Optimal LED Spectral Multiplexing for NIR2RGB Translation | Lei Liu, Yuze Chen, Junchi Yan, Yinqiang Zheng | Link | |
7457 | Open-Domain, Content-Based, Multi-Modal Fact-Checking of Out-of-Context Images via Online Resources | Sahar Abdelnabi, Rakibul Hasan, Mario Fritz | Link | |
7477 | E2(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition | Chiara Plizzari, Mirco Planamente, Gabriele Goletto, Marco Cannici, Emanuele Gusso, Matteo Matteucci, Barbara Caputo | Link | |
7484 | Ego4D: Around the World in 3,000 Hours of Egocentric Video | Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina González, James Hillis, Xuhua Huang, Yifei Huang, Wenqi Jia, Weslie Khoo, Jáchym Kolář, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbeláez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik | Link | |
7524 | Learning Adaptive Warping for Real-World Rolling Shutter Correction | Mingdeng Cao, Zhihang Zhong, Jiahao Wang, Yinqiang Zheng, Yujiu Yang | Link | |
7579 | Whose Hands Are These? Hand Detection and Hand-Body Association in the Wild | Supreeth Narasimhaswamy, Thanh Nguyen, Mingzhen Huang, Minh Hoai | Link | |
7596 | Multimodal Material Segmentation | Yupeng Liang, Ryosuke Wakaki, Shohei Nobuhara, Ko Nishino | Link | |
7612 | ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic | Yoad Tewel, Yoav Shalev, Idan Schwartz, Lior Wolf | Link | |
7655 | PhoCaL: A Multi-Modal Dataset for Category-Level Object Pose Estimation With Photometrically Challenging Objects | Pengyuan Wang, HyunJun Jung, Yitong Li, Siyuan Shen, Rahul Parthasarathy Srikanth, Lorenzo Garattoni, Sven Meier, Nassir Navab, Benjamin Busam | Link | |
7747 | Modular Action Concept Grounding in Semantic Video Prediction | Wei Yu, Wenxin Chen, Songheng Yin, Steve Easterbrook, Animesh Garg | Link | |
7887 | Learning Robust Image-Based Rendering on Sparse Scene Geometry via Depth Completion | Yuqi Sun, Shili Zhou, Ri Cheng, Weimin Tan, Bo Yan, Lang Fu | Link | |
7916 | ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Photo | Biwen Lei, Xiefan Guo, Hongyu Yang, Miaomiao Cui, Xuansong Xie, Di Huang | Link | |
7936 | GrainSpace: A Large-Scale Dataset for Fine-Grained and Domain-Adaptive Recognition of Cereal Grains | Lei Fan, Yiwen Ding, Dongdong Fan, Donglin Di, Maurice Pagnucco, Yang Song | Link | |
8083 | Highly-Efficient Incomplete Large-Scale Multi-View Clustering With Consensus Bipartite Graph | Siwei Wang, Xinwang Liu, Li Liu, Wenxuan Tu, Xinzhong Zhu, Jiyuan Liu, Sihang Zhou, En Zhu | Link | |
8115 | Scaling Up Vision-Language Pre-Training for Image Captioning | Xiaowei Hu, Zhe Gan, Jianfeng Wang, Zhengyuan Yang, Zicheng Liu, Yumao Lu, Lijuan Wang | Link | |
8152 | Learning To Detect Scene Landmarks for Camera Localization | Tien Do, Ondrej Miksik, Joseph DeGol, Hyun Soo Park, Sudipta N. Sinha | Link | |
8159 | Egocentric Scene Understanding via Multimodal Spatial Rectifier | Tien Do, Khiem Vuong, Hyun Soo Park | Link | |
8175 | Large-Scale Video Panoptic Segmentation in the Wild: A Benchmark | Jiaxu Miao, Xiaohan Wang, Yu Wu, Wei Li, Xu Zhang, Yunchao Wei, Yi Yang | Link | |
8217 | LD-ConGR: A Large RGB-D Video Dataset for Long-Distance Continuous Gesture Recognition | Dan Liu, Libo Zhang, Yanjun Wu | Link | |
8222 | Unifying Panoptic Segmentation for Autonomous Driving | Oliver Zendel, Matthias Schörghuber, Bernhard Rainer, Markus Murschitz, Csaba Beleznai | Link | |
8233 | NightLab: A Dual-Level Architecture With Hardness Detection for Segmentation at Night | Xueqing Deng, Peng Wang, Xiaochen Lian, Shawn Newsam | Link | |
8277 | Neural 3D Video Synthesis From Multi-View Video | Tianye Li, Mira Slavcheva, Michael Zollhöfer, Simon Green, Christoph Lassner, Changil Kim, Tanner Schmidt, Steven Lovegrove, Michael Goesele, Richard Newcombe, Zhaoyang Lv | Link | |
8338 | Modeling Indirect Illumination for Inverse Rendering | Yuanqing Zhang, Jiaming Sun, Xingyi He, Huan Fu, Rongfei Jia, Xiaowei Zhou | Link | |
8352 | Knowledge Mining With Scene Text for Fine-Grained Recognition | Hao Wang, Junchao Liao, Tianheng Cheng, Zewen Gao, Hao Liu, Bo Ren, Xiang Bai, Wenyu Liu | Link | |
8360 | Surpassing the Human Accuracy: Detecting Gallbladder Cancer From USG Images With Curriculum Learning | Soumen Basu, Mayank Gupta, Pratyaksha Rana, Pankaj Gupta, Chetan Arora | Link | |
8367 | Multi-Person Extreme Motion Prediction | Wen Guo, Xiaoyu Bie, Xavier Alameda-Pineda, Francesc Moreno-Noguer | Link | |
8373 | Does Text Attract Attention on E-Commerce Images: A Novel Saliency Prediction Dataset and Method | Lai Jiang, Yifei Li, Shengxi Li, Mai Xu, Se Lei, Yichen Guo, Bo Huang | Link | |
8388 | Personalized Image Aesthetics Assessment With Rich Attributes | Yuzhe Yang, Liwu Xu, Leida Li, Nan Qie, Yaqian Li, Peng Zhang, Yandong Guo | Link | |
8504 | Geometric Structure Preserving Warp for Natural Image Stitching | Peng Du, Jifeng Ning, Jiguang Cui, Shaoli Huang, Xinchao Wang, Jiaxin Wang | Link | |
8572 | Abandoning the Bayer-Filter To See in the Dark | Xingbo Dong, Wanyan Xu, Zhihui Miao, Lan Ma, Chao Zhang, Jiewen Yang, Zhe Jin, Andrew Beng Jin Teoh, Jiajun Shen | Link | |
8650 | RADU: Ray-Aligned Depth Update Convolutions for ToF Data Denoising | Michael Schelling, Pedro Hermosilla, Timo Ropinski | Link | |
8775 | Learning To Learn and Remember Super Long Multi-Domain Task Sequence | Zhenyi Wang, Li Shen, Tiehang Duan, Donglin Zhan, Le Fang, Mingchen Gao | Link | |
8783 | FLOAT: Factorized Learning of Object Attributes for Improved Multi-Object Multi-Part Scene Parsing | Rishubh Singh, Pranav Gupta, Pradeep Shenoy, Ravikiran Sarvadevabhatla | Link | |
8832 | Self-Supervised Super-Resolution for Multi-Exposure Push-Frame Satellites | Ngoc Long Nguyen, Jérémy Anger, Axel Davy, Pablo Arias, Gabriele Facciolo | Link | |
8857 | Geometry-Aware Guided Loss for Deep Crack Recognition | Zhuangzhuang Chen, Jin Zhang, Zhuonan Lai, Jie Chen, Zun Liu, Jianqiang Li | Link | |
8893 | BTS: A Bi-Lingual Benchmark for Text Segmentation in the Wild | Xixi Xu, Zhongang Qi, Jianqi Ma, Honglun Zhang, Ying Shan, Xiaohu Qie | Link | Link 2 |
8894 | Stereo Magnification With Multi-Layer Images | Taras Khakhulin, Denis Korzhenkov, Pavel Solovev, Gleb Sterkin, Andrei-Timotei Ardelean, Victor Lempitsky | Link | |
8898 | Segment and Complete: Defending Object Detectors Against Adversarial Patch Attacks With Robust Patch Detection | Jiang Liu, Alexander Levine, Chun Pong Lau, Rama Chellappa, Soheil Feizi | Link | |
8910 | CNN Filter DB: An Empirical Investigation of Trained Convolutional Filters | Paul Gavrikov, Janis Keuper | Link | |
8956 | RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation | Fabio Tosi, Pierluigi Zama Ramirez, Matteo Poggi, Samuele Salti, Stefano Mattoccia, Luigi Di Stefano | Link | |
8979 | Maintaining Reasoning Consistency in Compositional Visual Question Answering | Chenchen Jing, Yunde Jia, Yuwei Wu, Xinyu Liu, Qi Wu | Link | |
9012 | Weakly-Supervised Metric Learning With Cross-Module Communications for the Classification of Anterior Chamber Angle Images | Jingqi Huang, Yue Ning, Dong Nie, Linan Guan, Xiping Jia | Link | |
9027 | Towards Low-Cost and Efficient Malaria Detection | Waqas Sultani, Wajahat Nawaz, Syed Javed, Muhammad Sohail Danish, Asma Saadia, Mohsen Ali | Link | |
9029 | PoseTrack21: A Dataset for Person Search, Multi-Object Tracking and Multi-Person Pose Tracking | Andreas Döring, Di Chen, Shanshan Zhang, Bernt Schiele, Jürgen Gall | Link | |
9179 | The DEVIL Is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting | Ryan Szeto, Jason J. Corso | Link | |
9212 | Scanline Homographies for Rolling-Shutter Plane Absolute Pose | Fang Bai, Agniva Sengupta, Adrien Bartoli | Link | |
9326 | Towards Principled Disentanglement for Domain Generalization | Hanlin Zhang, Yi-Fan Zhang, Weiyang Liu, Adrian Weller, Bernhard Schölkopf, Eric P. Xing | Link | |
9349 | Image Based Reconstruction of Liquids From 2D Surface Detections | Florian Richter, Ryan K. Orosco, Michael C. Yip | Link | |
9391 | 3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection | Alexander Lehner, Stefano Gasperini, Alvaro Marcos-Ramiro, Michael Schmidt, Mohammad-Ali Nikouei Mahani, Nassir Navab, Benjamin Busam, Federico Tombari | Link | |
9398 | A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes | Mazda Moayeri, Phillip Pope, Yogesh Balaji, Soheil Feizi | Link | |
9405 | Mega-NERF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs | Haithem Turki, Deva Ramanan, Mahadev Satyanarayanan | Link | |
9422 | Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields | Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, Peter Hedman | Link | |
9480 | SIMBAR: Single Image-Based Scene Relighting for Effective Data Augmentation for Automated Driving Vision Tasks | Xianling Zhang, Nathan Tseng, Ameerah Syed, Rohan Bhasin, Nikita Jaipuria | Link | |
9567 | Deep Image-Based Illumination Harmonization | Zhongyun Bao, Chengjiang Long, Gang Fu, Daquan Liu, Yuanzhen Li, Jiaming Wu, Chunxia Xiao | Link | |
9590 | MUSE-VAE: Multi-Scale VAE for Environment-Aware Long Term Trajectory Prediction | Mihee Lee, Samuel S. Sohn, Seonghyeon Moon, Sejong Yoon, Mubbasir Kapadia, Vladimir Pavlovic | Link | |
9736 | Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation | Nathaniel Merrill, Yuliang Guo, Xingxing Zuo, Xinyu Huang, Stefan Leutenegger, Xi Peng, Liu Ren, Guoquan Huang | Link | |
9778 | Egocentric Prediction of Action Target in 3D | Yiming Li, Ziang Cao, Andrew Liang, Benjamin Liang, Luoyao Chen, Hang Zhao, Chen Feng | Link | |
9802 | MERLOT Reserve: Neural Script Knowledge Through Vision and Language and Sound | Rowan Zellers, Jiasen Lu, Ximing Lu, Youngjae Yu, Yanpeng Zhao, Mohammadreza Salehi, Aditya Kusupati, Jack Hessel, Ali Farhadi, Yejin Choi | Link | |
9856 | Disentangling Visual Embeddings for Attributes and Objects | Nirat Saini, Khoi Pham, Abhinav Shrivastava | Link | |
9913 | AutoMine: An Unmanned Mine Dataset | Yuchen Li, Zixuan Li, Siyu Teng, Yu Zhang, Yuhang Zhou, Yuchang Zhu, Dongpu Cao, Bin Tian, Yunfeng Ai, Zhe Xuanyuan, Long Chen | Link | |
9930 | Memory-Augmented Non-Local Attention for Video Super-Resolution | Jiyang Yu, Jingen Liu, Liefeng Bo, Tao Mei | Link | |
10074 | Domain Adaptation on Point Clouds via Geometry-Aware Implicits | Yuefan Shen, Yanchao Yang, Mi Yan, He Wang, Youyi Zheng, Leonidas J. Guibas | Link | |
10156 | Brain-Supervised Image Editing | Keith M. Davis III, Carlos de la Torre-Ortiz, Tuukka Ruotsalo | Link | |
10159 | Neural Shape Mating: Self-Supervised Object Assembly With Adversarial Shape Priors | Yun-Chun Chen, Haoda Li, Dylan Turpin, Alec Jacobson, Animesh Garg | Link | |
10327 | DAD-3DHeads: A Large-Scale Dense, Accurate and Diverse Dataset for 3D Head Alignment From a Single Image | Tetiana Martyniuk, Orest Kupyn, Yana Kurlyak, Igor Krashenyi, Jiří Matas, Viktoriia Sharmanska | Link | |
10369 | Spatial Commonsense Graph for Object Localisation in Partial Scenes | Francesco Giuliari, Geri Skenderi, Marco Cristani, Yiming Wang, Alessio Del Bue | Link | |
10392 | 3MASSIV: Multilingual, Multimodal and Multi-Aspect Dataset of Social Media Short Videos | Vikram Gupta, Trisha Mittal, Puneet Mathur, Vaibhav Mishra, Mayank Maheshwari, Aniket Bera, Debdoot Mukherjee, Dinesh Manocha | Link | Link 2 |
10404 | Upright-Net: Learning Upright Orientation for 3D Point Cloud | Xufang Pang, Feng Li, Ning Ding, Xiaopin Zhong | Link | |
10407 | DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection | Haibao Yu, Yizhen Luo, Mao Shu, Yiyi Huo, Zebang Yang, Yifeng Shi, Zhenglong Guo, Hanyu Li, Xing Hu, Jirui Yuan, Zaiqing Nie | Link | |
10467 | Enabling Equivariance for Arbitrary Lie Groups | Lachlan E. MacDonald, Sameera Ramasinghe, Simon Lucey | Link | |
10504 | TransRAC: Encoding Multi-Scale Temporal Correlation With Transformers for Repetitive Action Counting | Huazhang Hu, Sixun Dong, Yiqun Zhao, Dongze Lian, Zhengxin Li, Shenghua Gao | Link | |
10801 | Amodal Panoptic Segmentation | Rohit Mohan, Abhinav Valada | Link | |
10849 | Audio-Visual Generalised Zero-Shot Learning With Cross-Modal Attention and Language | Otniel-Bogdan Mercea, Lukas Riesch, A. Sophia Koepke, Zeynep Akata | Link | |
10899 | Towards Driving-Oriented Metric for Lane Detection Models | Takami Sato, Qi Alfred Chen | Link | |
10995 | Globetrotter: Connecting Languages by Connecting Images | Dídac Surís, Dave Epstein, Carl Vondrick | Link | |
11034 | KeyTr: Keypoint Transporter for 3D Reconstruction of Deformable Objects in Videos | David Novotny, Ignacio Rocco, Samarth Sinha, Alexandre Carlier, Gael Kerchenbaum, Roman Shapovalov, Nikita Smetanin, Natalia Neverova, Benjamin Graham, Andrea Vedaldi | Link | |
11067 | It Is Okay To Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection | Youssef Mohamed, Faizan Farooq Khan, Kilichbek Haydarov, Mohamed Elhoseiny | Link | |
11094 | Real-Time Hyperspectral Imaging in Hardware via Trained Metasurface Encoders | Maksim Makarenko, Arturo Burguete-Lopez, Qizhou Wang, Fedor Getman, Silvio Giancola, Bernard Ghanem, Andrea Fratalocchi | Link | Link 2 |
11097 | SmartPortraits: Depth Powered Handheld Smartphone Dataset of Human Portraits for State Estimation, Reconstruction and Synthesis | Anastasiia Kornilova, Marsel Faizullin, Konstantin Pakulev, Andrey Sadkov, Denis Kukushkin, Azat Akhmetyanov, Timur Akhtyamov, Hekmat Taherinejad, Gonzalo Ferrer | Link | |
11100 | Improving Segmentation of the Inferior Alveolar Nerve Through Deep Label Propagation | Marco Cipriano, Stefano Allegretti, Federico Bolelli, Federico Pollastri, Costantino Grana | Link | |
11143 | Do Learned Representations Respect Causal Relationships? | Lan Wang, Vishnu Naresh Boddeti | Link | |
11269 | Measuring Compositional Consistency for Video Question Answering | Mona Gandhi, Mustafa Omer Gul, Eva Prakash, Madeleine Grunde-McLaughlin, Ranjay Krishna, Maneesh Agrawala | Link | |
11454 | PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents | Brandon Smock, Rohith Pesala, Robin Abraham | Link | |
11477 | LiDARCap: Long-Range Marker-Less 3D Human Motion Capture With LiDAR Point Clouds | Jialian Li, Jingyi Zhang, Zhiyong Wang, Siqi Shen, Chenglu Wen, Yuexin Ma, Lan Xu, Jingyi Yu, Cheng Wang | Link | |
11529 | Dual Task Learning by Leveraging Both Dense Correspondence and Mis-Correspondence for Robust Change Detection With Imperfect Matches | Jin-Man Park, Ue-Hwan Kim, Seon-Hoon Lee, Jong-Hwan Kim | Link | |
11551 | M3L: Language-Based Video Editing via Multi-Modal Multi-Level Transformers | Tsu-Jui Fu, Xin Eric Wang, Scott T. Grafton, Miguel P. Eckstein, William Yang Wang | Link | |
11561 | ScanQA: 3D Question Answering for Spatial Scene Understanding | Daichi Azuma, Taiki Miyanishi, Shuhei Kurita, Motoaki Kawanabe | Link | |
11631 | OnePose: One-Shot Object Pose Estimation Without CAD Models | Jiaming Sun, Zihao Wang, Siyu Zhang, Xingyi He, Hongcheng Zhao, Guofeng Zhang, Xiaowei Zhou | Link | |
11670 | Ithaca365: Dataset and Driving Perception Under Repeated and Challenging Weather Conditions | Carlos A. Diaz-Ruiz, Youya Xia, Yurong You, Jose Nino, Junan Chen, Josephine Monica, Xiangyu Chen, Katie Luo, Yan Wang, Marc Emond, Wei-Lun Chao, Bharath Hariharan, Kilian Q. Weinberger, Mark Campbell | Link | |
11674 | Zero-Shot Text-Guided Object Generation With Dream Fields | Ajay Jain, Ben Mildenhall, Jonathan T. Barron, Pieter Abbeel, Ben Poole | Link | |
11846 | Learning Video Representations of Human Motion From Synthetic Data | Xi Guo, Wei Wu, Dongliang Wang, Jing Su, Haisheng Su, Weihao Gan, Jian Huang, Qin Yang | Link | |