Presentation Schedule
All times are in the Central time zone
Date: Tuesday, June 21, 2022, 2:30 PM – 5:00 PM
Session Title | Poster ID | Paper Title | Authors |
Video Analysis & Understanding | 46b | Compositional Temporal Grounding With Structured Variational Cross-Graph Correspondence Learning |
Juncheng Li; Junlin Xie; Long Qian; Linchao Zhu; Siliang Tang; Fei Wu; Yi Yang; Yueting Zhuang; Xin Eric Wang |
47b | UMT: Unified Multi-Modal Transformers for Joint Video Moment Retrieval and Highlight Detection |
Ye Liu; Siyuan Li; Yang Wu; Chang-Wen Chen; Ying Shan; Xiaohu Qie |
|
48b | Future Transformer for Long-Term Action Anticipation |
Dayoung Gong; Joonseok Lee; Manjin Kim; Seong Jong Ha; Minsu Cho |
49b | MLP-3D: A MLP-Like 3D Architecture With Grouped Time Mixing |
Zhaofan Qiu; Ting Yao; Chong-Wah Ngo; Tao Mei |
|
50b | Learning Pixel-Level Distinctions for Video Highlight Detection |
Fanyue Wei; Biao Wang; Tiezheng Ge; Yuning Jiang; Wen Li; Lixin Duan |
|
51b | DR.VIC: Decomposition and Reasoning for Video Individual Counting |
Tao Han; Lei Bai; Junyu Gao; Qi Wang; Wanli Ouyang |
|
52b | Slot-VPS: Object-Centric Representation Learning for Video Panoptic Segmentation |
Yi Zhou; Hui Zhang; Hana Lee; Shuyang Sun; Pingjun Li; Yangguang Zhu; ByungIn Yoo; Xiaojuan Qi; Jae-Joon Han |
|
53b | Explore Spatio-Temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline |
Kailai Zhou; Yibo Wang; Tao Lv; Yunqian Li; Linsen Chen; Qiu Shen; Xun Cao |
|
54b | Video Shadow Detection via Spatio-Temporal Interpolation Consistency Training |
Xiao Lu; Yihong Cao; Sheng Liu; Chengjiang Long; Zipei Chen; Xuanyu Zhou; Yimin Yang; Chunxia Xiao |
|
55b | Coarse-To-Fine Feature Mining for Video Semantic Segmentation |
Guolei Sun; Yun Liu; Henghui Ding; Thomas Probst; Luc Van Gool |
|
56b | Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation |
Zhaoyang Zeng; Yongsheng Luo; Zhenhua Liu; Fengyun Rao; Dian Li; Weidong Guo; Zhen Wen |
|
57b | Object-Region Video Transformers |
Roei Herzig; Elad Ben-Avraham; Karttikeya Mangalam; Amir Bar; Gal Chechik; Anna Rohrbach; Trevor Darrell; Amir Globerson |
|
58b | Colar: Effective and Efficient Online Action Detection by Consulting Exemplars | Le Yang; Junwei Han; Dingwen Zhang | |
59b | SimVP: Simpler Yet Better Video Prediction |
Zhangyang Gao; Cheng Tan; Lirong Wu; Stan Z. Li |
|
60b | Imposing Consistency for Optical Flow Estimation |
Jisoo Jeong; Jamie Menjay Lin; Fatih Porikli; Nojun Kwak |
|
61b | Stand-Alone Inter-Frame Attention in Video Models |
Fuchen Long; Zhaofan Qiu; Yingwei Pan; Ting Yao; Jiebo Luo; Tao Mei |
|
62b | Video Swin Transformer |
Ze Liu; Jia Ning; Yue Cao; Yixuan Wei; Zheng Zhang; Stephen Lin; Han Hu |
|
63b | Bayesian Nonparametric Submodular Video Partition for Robust Anomaly Detection | Hitesh Sapkota; Qi Yu | |
64b | Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes |
Zengjie Song; Yuxi Wang; Junsong Fan; Tieniu Tan; Zhaoxiang Zhang |
|
65b | Likert Scoring With Grade Decoupling for Long-Term Action Assessment | Angchi Xu; Ling-An Zeng; Wei-Shi Zheng | |
66b | Complex Video Action Reasoning via Learnable Markov Logic Network | Yang Jin; Linchao Zhu; Yadong Mu | |
67b | Learning From Temporal Gradient for Semi-Supervised Action Recognition |
Junfei Xiao; Longlong Jing; Lin Zhang; Ju He; Qi She; Zongwei Zhou; Alan Yuille; Yingwei Li |
|
68b | Semi-Supervised Video Semantic Segmentation With Inter-Frame Feature Reconstruction | Jiafan Zhuang; Zilei Wang; Yuan Gao | |
69b | Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation | Linjiang Huang; Liang Wang; Hongsheng Li | |
70b | Joint Hand Motion and Interaction Hotspots Prediction From Egocentric Videos |
Shaowei Liu; Subarna Tripathi; Somdeb Majumdar; Xiaolong Wang |
|
71b | Human Hands As Probes for Interactive Object Understanding |
Mohit Goyal; Sahil Modi; Rishabh Goyal; Saurabh Gupta |
|
72b | LD-ConGR: A Large RGB-D Video Dataset for Long-Distance Continuous Gesture Recognition | Dan Liu; Libo Zhang; Yanjun Wu | |
73b | Object-Aware Video-Language Pre-Training for Retrieval |
Jinpeng Wang; Yixiao Ge; Guanyu Cai; Rui Yan; Xudong Lin; Ying Shan; Xiaohu Qie; Mike Zheng Shou |
|
74b | Fast and Unsupervised Action Boundary Detection for Action Segmentation |
Zexing Du; Xue Wang; Guoqing Zhou; Qing Wang |
|
75b | Multiview Transformers for Video Recognition |
Shen Yan; Xuehan Xiong; Anurag Arnab; Zhichao Lu; Mi Zhang; Chen Sun; Cordelia Schmid |
|
76b | Semi-Weakly-Supervised Learning of Complex Actions From Instructional Task Videos | Yuhan Shen; Ehsan Elhamifar | |
77b | Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection |
Jiaqi Tang; Zhaoyang Liu; Chen Qian; Wayne Wu; Limin Wang |
|
78b | Comparing Correspondences: Video Prediction With Correspondence-Wise Losses |
Daniel Geng; Max Hamilton; Andrew Owens |
|
Image & Video Synthesis and Generation | 79b | Sound-Guided Semantic Image Manipulation |
Seung Hyun Lee; Wonseok Roh; Wonmin Byeon; Sang Ho Yoon; Chanyoung Kim; Jinkyu Kim; Sangpil Kim |
80b | Expressive Talking Head Generation With Granular Audio-Visual Control |
Borong Liang; Yan Pan; Zhizhi Guo; Hang Zhou; Zhibin Hong; Xiaoguang Han; Junyu Han; Jingtuo Liu; Errui Ding; Jingdong Wang |
|
81b | Depth-Aware Generative Adversarial Network for Talking Head Video Generation |
Fa-Ting Hong; Longhao Zhang; Li Shen; Dan Xu |
|
82b | Learning Motion-Dependent Appearance for High-Fidelity Rendering of Dynamic Humans From a Single Camera |
Jae Shin Yoon; Duygu Ceylan; Tuanfeng Y. Wang; Jingwan Lu; Jimei Yang; Zhixin Shu; Hyun Soo Park |
|
83b | Audio-Driven Neural Gesture Reenactment With Video Motion Graphs |
Yang Zhou; Jimei Yang; Dingzeyu Li; Jun Saito; Deepali Aneja; Evangelos Kalogerakis |
|
84b | Portrait Eyeglasses and Shadow Removal by Leveraging 3D Synthetic Data | Junfeng Lyu; Zhibo Wang; Feng Xu | |
85b | Weakly Supervised High-Fidelity Clothing Model Generation |
Ruili Feng; Cheng Ma; Chengji Shen; Xin Gao; Zhenjiang Liu; Xiaobo Li; Kairi Ou; Deli Zhao; Zheng-Jun Zha |
|
86b | TemporalUV: Capturing Loose Clothing With Temporally Coherent UV Coordinates |
You Xie; Huiqi Mao; Angela Yao; Nils Thuerey |
|
87b | Full-Range Virtual Try-On With Recurrent Tri-Level Transform | Han Yang; Xinrui Yu; Ziwei Liu | |
88b | Style-Based Global Appearance Flow for Virtual Try-On | Sen He; Yi-Zhe Song; Tao Xiang | |
89b | Dressing in the Wild by Watching Dance Videos |
Xin Dong; Fuwei Zhao; Zhenyu Xie; Xijin Zhang; Daniel K. Du; Min Zheng; Xiang Long; Xiaodan Liang; Jianchao Yang |
|
90b | A Brand New Dance Partner: Music-Conditioned Pluralistic Dancing Controlled by Multiple Dance Genres |
Jinwoo Kim; Heeseok Oh; Seongjean Kim; Hoseok Tong; Sanghoon Lee |
|
91b | Unpaired Cartoon Image Synthesis via Gated Cycle Mapping |
Yifang Men; Yuan Yao; Miaomiao Cui; Zhouhui Lian; Xuansong Xie; Xian-Sheng Hua |
|
92b | DLFormer: Discrete Latent Transformer for Video Inpainting |
Jingjing Ren; Qingqing Zheng; Yuanyuan Zhao; Xuemiao Xu; Chen Li |
|
93b | ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame Interpolation | Duolikun Danier; Fan Zhang; David Bull | |
94b | Video Frame Interpolation With Transformer |
Liying Lu; Ruizheng Wu; Huaijia Lin; Jiangbo Lu; Jiaya Jia |
|
95b | Long-Term Video Frame Interpolation via Feature Propagation | Dawit Mureja Argaw; In So Kweon | |
96b | Many-to-Many Splatting for Efficient Video Frame Interpolation |
Ping Hu; Simon Niklaus; Stan Sclaroff; Kate Saenko |
|
97b | Look Outside the Room: Synthesizing a Consistent Long-Term 3D Scene Video From a Single Image | Xuanchi Ren; Xiaolong Wang | |
98b | Spatial-Temporal Space Hand-in-Hand: Spatial-Temporal Video Super-Resolution via Cycle-Projected Mutual Learning |
Mengshun Hu; Kui Jiang; Liang Liao; Jing Xiao; Junjun Jiang; Zheng Wang |
|
99b | Playable Environments: Video Manipulation in Space and Time |
Willi Menapace; Stéphane Lathuilière; Aliaksandr Siarohin; Christian Theobalt; Sergey Tulyakov; Vladislav Golyanik; Elisa Ricci |
|
100b | Event-Based Video Reconstruction via Potential-Assisted Spiking Neural Network |
Lin Zhu; Xiao Wang; Yi Chang; Jianing Li; Tiejun Huang; Yonghong Tian |
|
101b | Modular Action Concept Grounding in Semantic Video Prediction |
Wei Yu; Wenxin Chen; Songheng Yin; Steve Easterbrook; Animesh Garg |
|
102b | Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning |
Ligong Han; Jian Ren; Hsin-Ying Lee; Francesco Barbieri; Kyle Olszewski; Shervin Minaee; Dimitris Metaxas; Sergey Tulyakov |
|
103b | StyleGAN-V: A Continuous Video Generator With the Price, Image Quality and Perks of StyleGAN2 |
Ivan Skorokhodov; Sergey Tulyakov; Mohamed Elhoseiny |
|
104b | Structure-Aware Motion Transfer With Deformable Anchor Model |
Jiale Tao; Biao Wang; Borun Xu; Tiezheng Ge; Yuning Jiang; Wen Li; Lixin Duan |
|
105b | Image Animation With Perturbed Masks | Yoav Shalev; Lior Wolf | |
106b | Thin-Plate Spline Motion Model for Image Animation | Jian Zhao; Hui Zhang | |
107b | Controllable Animation of Fluid Elements in Still Images | Aniruddha Mahapatra; Kuldeep Kulkarni | |
108b | Watch It Move: Unsupervised Discovery of 3D Joints for Re-Posing of Articulated Objects |
Atsuhiro Noguchi; Umar Iqbal; Jonathan Tremblay; Tatsuya Harada; Orazio Gallo |
|
109b | Geometric Structure Preserving Warp for Natural Image Stitching |
Peng Du; Jifeng Ning; Jiguang Cui; Shaoli Huang; Xinchao Wang; Jiaxin Wang |
|
110b | Few-Shot Incremental Learning for Label-to-Image Translation |
Pei Chen; Yangkang Zhang; Zejian Li; Lingyun Sun |
|
111b | Exemplar-Based Pattern Synthesis With Implicit Periodic Field Network |
Haiwei Chen; Jiayi Liu; Weikai Chen; Shichen Liu; Yajie Zhao |
|
112b | SIMBAR: Single Image-Based Scene Relighting for Effective Data Augmentation for Automated Driving Vision Tasks |
Xianling Zhang; Nathan Tseng; Ameerah Syed; Rohan Bhasin; Nikita Jaipuria |
|
113b | SoftCollage: A Differentiable Probabilistic Tree Generator for Image Collage |
Jiahao Yu; Li Chen; Mingrui Zhang; Mading Li |
|
114b | PILC: Practical Image Lossless Compression With an End-to-End GPU Oriented Neural Framework |
Ning Kang; Shanzhao Qiu; Shifeng Zhang; Zhenguo Li; Shu-Tao Xia |
|
115b | Kubric: A Scalable Dataset Generator |
Klaus Greff; Francois Belletti; Lucas Beyer; Carl Doersch; Yilun Du; Daniel Duckworth; David J. Fleet; Dan Gnanapragasam; Florian Golemo; Charles Herrmann; Thomas Kipf; Abhijit Kundu; Dmitry Lagun; Issam Laradji; Hsueh-Ti (Derek) Liu; Henning Meyer; Yishu Miao; Derek Nowrouzezahrai; Cengiz Oztireli; Etienne Pot; Noha Radwan; Daniel Rebain; Sara Sabour; Mehdi S. M. Sajjadi; Matan Sela; Vincent Sitzmann; Austin Stone; Deqing Sun; Suhani Vora; Ziyu Wang; Tianhao Wu; Kwang Moo Yi; Fangcheng Zhong; Andrea Tagliasacchi |
|
3D From Single Images | 116b | 360MonoDepth: High-Resolution 360° Monocular Depth Estimation |
Manuel Rey-Area; Mingze Yuan; Christian Richardt |
117b | Pre-Train, Self-Train, Distill: A Simple Recipe for Supersizing 3D Reconstruction |
Kalyan Vasudev Alwala; Abhinav Gupta; Shubham Tulsiani |
|
118b | DGECN: A Depth-Guided Edge Convolutional Network for End-to-End 6D Pose Estimation |
Tuo Cao; Fei Luo; Yanping Fu; Wenxiao Zhang; Shengjie Zheng; Chunxia Xiao |
|
119b | MonoGround: Detecting Monocular 3D Objects From the Ground | Zequn Qin; Xi Li | |
120b | 3D Shape Reconstruction From 2D Images With Disentangled Attribute Flow |
Xin Wen; Junsheng Zhou; Yu-Shen Liu; Hua Su; Zhen Dong; Zhizhong Han |
|
121b | Toward Practical Monocular Indoor Depth Estimation |
Cho-Ying Wu; Jialiang Wang; Michael Hall; Ulrich Neumann; Shuochen Su |
|
122b | Focal Length and Object Pose Estimation via Render and Compare |
Georgy Ponimatkin; Yann Labbé; Bryan Russell; Mathieu Aubry; Josef Sivic |
|
123b | CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields |
Can Wang; Menglei Chai; Mingming He; Dongdong Chen; Jing Liao |
|
124b | Registering Explicit to Implicit: Towards High-Fidelity Garment Mesh Reconstruction From Single Images |
Heming Zhu; Lingteng Qiu; Yuda Qiu; Xiaoguang Han |
|
125b | Layered Depth Refinement With Mask Guidance |
Soo Ye Kim; Jianming Zhang; Simon Niklaus; Yifei Fan; Simon Chen; Zhe Lin; Munchurl Kim |
|
126b | HEAT: Holistic Edge Attention Transformer for Structured Reconstruction |
Jiacheng Chen; Yiming Qian; Yasutaka Furukawa |
|
127b | BARC: Learning To Regress 3D Dog Shape From Images by Exploiting Breed Information |
Nadine Rüegg; Silvia Zuffi; Konrad Schindler; Michael J. Black |
|
128b | Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous Driving | Peixuan Li; Jieyu Jin | |
129b | What’s in Your Hands? 3D Reconstruction of Generic Objects in Hands | Yufei Ye; Abhinav Gupta; Shubham Tulsiani | |
130b | 3D Moments From Near-Duplicate Photos |
Qianqian Wang; Zhengqi Li; David Salesin; Noah Snavely; Brian Curless; Janne Kontkanen |
|
131b | Neural Window Fully-Connected CRFs for Monocular Depth Estimation |
Weihao Yuan; Xiaodong Gu; Zuozhuo Dai; Siyu Zhu; Ping Tan |
|
132b | PUMP: Pyramidal and Uniqueness Matching Priors for Unsupervised Learning of Local Descriptors |
Jérôme Revaud; Vincent Leroy; Philippe Weinzaepfel; Boris Chidlovskii |
|
133b | CroMo: Cross-Modal Learning for Monocular Depth Estimation |
Yannick Verdié; Jifei Song; Barnabé Mas; Benjamin Busam; Aleš Leonardis; Steven McDonagh |
|
134b | φ-SfT: Shape-From-Template With a Physics-Based Deformation Model |
Navami Kairanda; Edith Tretschk; Mohamed Elgharib; Christian Theobalt; Vladislav Golyanik |
|
135b | Human-Aware Object Placement for Visual Environment Reconstruction |
Hongwei Yi; Chun-Hao P. Huang; Dimitrios Tzionas; Muhammed Kocabas; Mohamed Hassan; Siyu Tang; Justus Thies; Michael J. Black |
|
136b | AutoRF: Learning 3D Object Radiance Fields From Single View Observations |
Norman Müller; Andrea Simonelli; Lorenzo Porzi; Samuel Rota Bulò; Matthias Nießner; Peter Kontschieder |
|
137b | Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation |
Shengqu Cai; Anton Obukhov; Dengxin Dai; Luc Van Gool |
|
138b | MonoScene: Monocular 3D Semantic Scene Completion | Anh-Quan Cao; Raoul de Charette | |
139b | GenDR: A Generalized Differentiable Renderer |
Felix Petersen; Bastian Goldluecke; Christian Borgelt; Oliver Deussen |
|
140b | MonoDTR: Monocular 3D Object Detection With Depth-Aware Transformer |
Kuan-Chih Huang; Tsung-Han Wu; Hung-Ting Su; Winston H. Hsu |
|
141b | ROCA: Robust CAD Model Retrieval and Alignment From a Single Image | Can Gümeli; Angela Dai; Matthias Nießner | |
Face & Gestures | 142b | HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network |
Chang Yu; Xiangyu Zhu; Xiaomei Zhang; Zidu Wang; Zhaoxiang Zhang; Zhen Lei |
143b | Killing Two Birds With One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC |
Xiang An; Jiankang Deng; Jia Guo; Ziyong Feng; XuHan Zhu; Jing Yang; Tongliang Liu |
|
144b | Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning |
Jiahao Xia; Weiwei Qu; Wenjian Huang; Jianguo Zhang; Xi Wang; Min Xu |
|
145b | Enhancing Face Recognition With Self-Supervised 3D Reconstruction |
Mingjie He; Jie Zhang; Shiguang Shan; Xilin Chen |
|
146b | Learning To Learn Across Diverse Data Biases in Deep Face Recognition |
Chang Liu; Xiang Yu; Yi-Hsuan Tsai; Masoud Faraki; Ramin Moslemi; Manmohan Chandraker; Yun Fu |
|
147b | An Efficient Training Approach for Very Large Scale Face Recognition |
Kai Wang; Shuo Wang; Panpan Zhang; Zhipeng Zhou; Zheng Zhu; Xiaobo Wang; Xiaojiang Peng; Baigui Sun; Hao Li; Yang You |
|
148b | MogFace: Towards a Deeper Appreciation on Face Detection |
Yang Liu; Fei Wang; Jiankang Deng; Zhipeng Zhou; Baigui Sun; Hao Li |
|
149b | Exploring Frequency Adversarial Attacks for Face Forgery Detection |
Shuai Jia; Chao Ma; Taiping Yao; Bangjie Yin; Shouhong Ding; Xiaokang Yang |
|
150b | End-to-End Reconstruction-Classification Learning for Face Forgery Detection |
Junyi Cao; Chao Ma; Taiping Yao; Shen Chen; Shouhong Ding; Xiaokang Yang |
|
151b | Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing |
Zhuo Wang; Zezheng Wang; Zitong Yu; Weihong Deng; Jiahong Li; Tingting Gao; Zhongyuan Wang |
|
152b | Privacy-Preserving Online AutoML for Domain-Specific Face Detection |
Chenqian Yan; Yuge Zhang; Quanlu Zhang; Yaming Yang; Xinyang Jiang; Yuqing Yang; Baoyuan Wang |
|
153b | Simulated Adversarial Testing of Face Recognition Models |
Nataniel Ruiz; Adam Kortylewski; Weichao Qiu; Cihang Xie; Sarah Adel Bargal; Alan Yuille; Stan Sclaroff |
|
154b | Decoupled Multi-Task Learning With Cyclical Self-Regulation for Face Parsing |
Qingping Zheng; Jiankang Deng; Zheng Zhu; Ying Li; Stefanos Zafeiriou |
|
155b | Towards Semi-Supervised Deep Facial Expression Recognition With an Adaptive Confidence Margin |
Hangyu Li; Nannan Wang; Xi Yang; Xiaoyu Wang; Xinbo Gao |
|
156b | Towards Accurate Facial Landmark Detection via Cascaded Transformers |
Hui Li; Zidong Guo; Seon-Min Rhee; Seungju Han; Jae-Joon Han |
|
157b | PhysFormer: Facial Video-Based Physiological Measurement With Temporal Difference Transformer |
Zitong Yu; Yuming Shen; Jingang Shi; Hengshuang Zhao; Philip H.S. Torr; Guoying Zhao |
|
158b | GazeOnce: Real-Time Multi-Person Gaze Estimation | Mingfang Zhang; Yunfei Liu; Feng Lu | |
159b | Generalizing Gaze Estimation With Rotation Consistency |
Yiwei Bao; Yunfei Liu; Haofei Wang; Feng Lu |
|
160b | Face Relighting With Geometrically Consistent Shadows |
Andrew Hou; Michel Sarkis; Ning Bi; Yiying Tong; Xiaoming Liu |
|
161b | HairMapper: Removing Hair From Portraits Using GANs | Yiqian Wu; Yong-Liang Yang; Xiaogang Jin | |
162b | Learning To Restore 3D Face From In-the-Wild Degraded Images |
Zhenyu Zhang; Yanhao Ge; Ying Tai; Xiaoming Huang; Chengjie Wang; Hao Tang; Dongjin Huang; Zhifeng Xie |
|
Segmentation, Grouping and Shape Analysis | 163b | Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels |
Yuchao Wang; Haochen Wang; Yujun Shen; Jingjing Fei; Wei Li; Guoqiang Jin; Liwei Wu; Rui Zhao; Xinyi Le |
164b | Perturbed and Strict Mean Teachers for Semi-Supervised Semantic Segmentation |
Yuyuan Liu; Yu Tian; Yuanhong Chen; Fengbei Liu; Vasileios Belagiannis; Gustavo Carneiro |
|
165b | ST++: Make Self-Training Work Better for Semi-Supervised Semantic Segmentation |
Lihe Yang; Wei Zhuo; Lei Qi; Yinghuan Shi; Yang Gao |
|
166b | Beyond Semantic to Instance Segmentation: Weakly-Supervised Instance Segmentation via Semantic Knowledge Transfer and Self-Refinement |
Beomyoung Kim; YoungJoon Yoo; Chae Eun Rhee; Junmo Kim |
|
167b | Self-Supervised Image-Specific Prototype Exploration for Weakly Supervised Semantic Segmentation |
Qi Chen; Lingxiao Yang; Jian-Huang Lai; Xiaohua Xie |
|
168b | Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation |
Tianfei Zhou; Meijie Zhang; Fang Zhao; Jianwu Li |
|
169b | Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation |
Lian Xu; Wanli Ouyang; Mohammed Bennamoun; Farid Boussaid; Dan Xu |
|
170b | Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast |
Ye Du; Zehua Fu; Qingjie Liu; Yunhong Wang |
|
171b | Threshold Matters in WSSS: Manipulating the Activation for the Robust and Accurate Segmentation Model Against Thresholds |
Minhyun Lee; Dongseob Kim; Hyunjung Shim |
|
172b | Novel Class Discovery in Semantic Segmentation |
Yuyang Zhao; Zhun Zhong; Nicu Sebe; Gim Hee Lee |
|
173b | Pin the Memory: Learning To Generalize Semantic Segmentation |
Jin Kim; Jiyoung Lee; Jungin Park; Dongbo Min; Kwanghoon Sohn |
|
174b | ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-High Resolution Segmentation |
Shaohua Guo; Liang Liu; Zhenye Gan; Yabiao Wang; Wuhao Zhang; Chengjie Wang; Guannan Jiang; Wei Zhang; Ran Yi; Lizhuang Ma; Ke Xu |
|
175b | Incremental Learning in Semantic Segmentation From Image Labels |
Fabio Cermelli; Dario Fontanel; Antonio Tavera; Marco Ciccone; Barbara Caputo |
|
176b | Instance Segmentation With Mask-Supervised Polygonal Boundary Transformers | Justin Lazarow; Weijian Xu; Zhuowen Tu | |
177b | SharpContour: A Contour-Based Boundary Refinement Approach for Efficient and Accurate Instance Segmentation |
Chenming Zhu; Xuanye Zhang; Yanran Li; Liangdong Qiu; Kai Han; Xiaoguang Han |
|
178b | Sparse Object-Level Supervision for Instance Segmentation With Pixel Embeddings |
Adrian Wolny; Qin Yu; Constantin Pape; Anna Kreshuk |
|
179b | Mask Transfiner for High-Quality Instance Segmentation |
Lei Ke; Martin Danelljan; Xia Li; Yu-Wing Tai; Chi-Keung Tang; Fisher Yu |
|
180b | Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity |
Weiyao Wang; Matt Feiszli; Heng Wang; Jitendra Malik; Du Tran |
|
181b | Sparse Instance Activation for Real-Time Instance Segmentation |
Tianheng Cheng; Xinggang Wang; Shaoyu Chen; Wenqiang Zhang; Qian Zhang; Chang Huang; Zhaoxiang Zhang; Wenyu Liu |
|
182b | E2EC: An End-to-End Contour-Based Method for High-Quality High-Speed Instance Segmentation | Tao Zhang; Shiqing Wei; Shunping Ji | |
183b | Hyperbolic Image Segmentation |
Mina Ghadimi Atigh; Julian Schoep; Erman Acar; Nanne van Noord; Pascal Mettes |
|
184b | SeeThroughNet: Resurrection of Auxiliary Loss by Preserving Class Probability Information | Dasol Han; Jaewook Yoo; Dokwan Oh | |
185b | CDGNet: Class Distribution Guided Network for Human Parsing |
Kunliang Liu; Ouk Choi; Jianming Wang; Wonjun Hwang |
|
186b | CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation |
Jinheng Xie; Xianxu Hou; Kai Ye; Linlin Shen |
|
187b | Sparse Non-Local CRF | Olga Veksler; Yuri Boykov | |
188b | Detecting Camouflaged Object in Frequency Domain |
Yijie Zhong; Bo Li; Lv Tang; Senyun Kuang; Shuang Wu; Shouhong Ding |
|
189b | Progressive Minimal Path Method With Embedded CNN | Wei Liao | |
Document Analysis & Understanding | 190b | Open-Set Text Recognition via Character-Context Decoupling | Chang Liu; Chun Yang; Xu-Cheng Yin |
191b | Neural Collaborative Graph Machines for Table Structure Recognition |
Hao Liu; Xin Li; Bing Liu; Deqiang Jiang; Yinsong Liu; Bo Ren |
|
192b | Revisiting Document Image Dewarping by Grid Regularization |
Xiangwei Jiang; Rujiao Long; Nan Xue; Zhibo Yang; Cong Yao; Gui-Song Xia |
|
193b | Syntax-Aware Network for Handwritten Mathematical Expression Recognition |
Ye Yuan; Xiao Liu; Wondimu Dikubab; Hui Liu; Zhilong Ji; Zhongqin Wu; Xiang Bai |
|
194b | Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection |
Jingqun Tang; Wenqing Zhang; Hongye Liu; MingKun Yang; Bo Jiang; Guanglong Hu; Xiang Bai |
|
195b | Fourier Document Restoration for Robust Document Dewarping and Recognition |
Chuhui Xue; Zichen Tian; Fangneng Zhan; Shijian Lu; Song Bai |
|
196b | XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding |
Zhangxuan Gu; Changhua Meng; Ke Wang; Jun Lan; Weiqiang Wang; Ming Gu; Liqing Zhang |
|
197b | SwinTextSpotter: Scene Text Spotting via Better Synergy Between Text Detection and Text Recognition |
Mingxin Huang; Yuliang Liu; Zhenghao Peng; Chongyu Liu; Dahua Lin; Shenggao Zhu; Nicholas Yuan; Kai Ding; Lianwen Jin |
|
198b | Towards Weakly-Supervised Text Spotting Using a Multi-Task Transformer |
Yair Kittenplon; Inbal Lavi; Sharon Fogel; Yarin Bar; R. Manmatha; Pietro Perona |
|
199b | TableFormer: Table Structure Understanding With Transformers |
Ahmed Nassar; Nikolaos Livathinos; Maksym Lysak; Peter Staar |
|
200b | Knowledge Mining With Scene Text for Fine-Grained Recognition |
Hao Wang; Junchao Liao; Tianheng Cheng; Zewen Gao; Hao Liu; Bo Ren; Xiang Bai; Wenyu Liu |
|
201b | PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents |
Brandon Smock; Rohith Pesala; Robin Abraham |
|
Recognition: Detection, Categorization, Retrieval | 202b | Focal and Global Knowledge Distillation for Detectors |
Zhendong Yang; Zhe Li; Xiaohu Jiang; Yuan Gong; Zehuan Yuan; Danpei Zhao; Chun Yuan |
203b | Speed Up Object Detection on Gigapixel-Level Images With Patch Arrangement |
Jiahao Fan; Huabin Liu; Wenjie Yang; John See; Aixin Zhang; Weiyao Lin |
|
204b | Training Object Detectors From Scratch: An Empirical Study in the Era of Vision Transformer |
Weixiang Hong; Jiangwei Lao; Wang Ren; Jian Wang; Jingdong Chen; Wei Chu |
|
205b | Learning With Neighbor Consistency for Noisy Labels |
Ahmet Iscen; Jack Valmadre; Anurag Arnab; Cordelia Schmid |
|
206b | Meta Convolutional Neural Networks for Single Domain Generalization |
Chaoqun Wan; Xu Shen; Yonggang Zhang; Zhiheng Yin; Xinmei Tian; Feng Gao; Jianqiang Huang; Xian-Sheng Hua |
|
207b | Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification |
Haowei Zhu; Wenjing Ke; Dong Li; Ji Liu; Lu Tian; Yi Shan |
|
208b | Geometry-Aware Guided Loss for Deep Crack Recognition |
Zhuangzhuang Chen; Jin Zhang; Zhuonan Lai; Jie Chen; Zun Liu; Jianqiang Li |
|
209b | Segment, Magnify and Reiterate: Detecting Camouflaged Objects the Hard Way |
Qi Jia; Shuilian Yao; Yu Liu; Xin Fan; Risheng Liu; Zhongxuan Luo |
|
210b | Dynamic Sparse R-CNN |
Qinghang Hong; Fengming Liu; Dong Li; Ji Liu; Lu Tian; Yi Shan |
|
211b | Deep Hybrid Models for Out-of-Distribution Detection | Senqi Cao; Zhongfei Zhang | |
212b | AutoLoss-GMS: Searching Generalized Margin-Based Softmax Loss Function for Person Re-Identification |
Hongyang Gu; Jianmin Li; Guangyuan Fu; Chifong Wong; Xinghao Chen; Jun Zhu |
|
213b | Feature Erasing and Diffusion Network for Occluded Person Re-Identification |
Zhikang Wang; Feng Zhu; Shixiang Tang; Rui Zhao; Lihuo He; Jiangning Song |
|
214b | Multi-Label Classification With Partial Annotations Using Class-Aware Selective Loss |
Emanuel Ben-Baruch; Tal Ridnik; Itamar Friedman; Avi Ben-Cohen; Nadav Zamir; Asaf Noy; Lihi Zelnik-Manor |
|
215b | BoxeR: Box-Attention for 2D and 3D Transformers |
Duy-Kien Nguyen; Jihong Ju; Olaf Booij; Martin R. Oswald; Cees G. M. Snoek |
|
216b | Multi-Label Iterated Learning for Image Classification With Label Ambiguity |
Sai Rajeswar; Pau Rodríguez; Soumye Singhal; David Vazquez; Aaron Courville |
|
217b | Vision Transformer With Deformable Attention |
Zhuofan Xia; Xuran Pan; Shiji Song; Li Erran Li; Gao Huang |
|
218b | MViTv2: Improved Multiscale Vision Transformers for Classification and Detection |
Yanghao Li; Chao-Yuan Wu; Haoqi Fan; Karttikeya Mangalam; Bo Xiong; Jitendra Malik; Christoph Feichtenhofer |
|
219b | Dense Learning Based Semi-Supervised Object Detection |
Binghui Chen; Pengyu Li; Xiang Chen; Biao Wang; Lei Zhang; Xian-Sheng Hua |
|
220b | R(Det)²: Randomized Decision Routing for Object Detection | Yali Li; Shengjin Wang | |
221b | GlideNet: Global, Local and Intrinsic Based Dense Embedding NETwork for Multi-Category Attributes Prediction |
Kareem Metwaly; Aerin Kim; Elliot Branson; Vishal Monga |
|
222b | Self-Supervised Equivariant Learning for Oriented Keypoint Detection | Jongmin Lee; Byungjin Kim; Minsu Cho | |
223b | Label Relation Graphs Enhanced Hierarchical Residual Network for Hierarchical Multi-Granularity Classification |
Jingzhou Chen; Peng Wang; Jian Liu; Yuntao Qian |
|
224b | Object Localization Under Single Coarse Point Supervision |
Xuehui Yu; Pengfei Chen; Di Wu; Najmul Hassan; Guorong Li; Junchi Yan; Humphrey Shi; Qixiang Ye; Zhenjun Han |
|
225b | Rethinking Visual Geo-Localization for Large-Scale Applications |
Gabriele Berton; Carlo Masone; Barbara Caputo |
|
226b | Whose Hands Are These? Hand Detection and Hand-Body Association in the Wild |
Supreeth Narasimhaswamy; Thanh Nguyen; Mingzhen Huang; Minh Hoai |
|
227b | Cloning Outfits From Real-World Images to 3D Characters for Generalizable Person Re-Identification | Yanan Wang; Xuezhi Liang; Shengcai Liao | |
228b | Towards Unsupervised Domain Generalization |
Xingxuan Zhang; Linjun Zhou; Renzhe Xu; Peng Cui; Zheyan Shen; Haoxin Liu |
|
229b | ViM: Out-of-Distribution With Virtual-Logit Matching |
Haoqi Wang; Zhizhong Li; Litong Feng; Wayne Zhang |
|
230b | Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space |
Arnav Chavan; Zhiqiang Shen; Zhuang Liu; Zechun Liu; Kwang-Ting Cheng; Eric P. Xing |
|
231b | Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation |
Zechun Liu; Kwang-Ting Cheng; Dong Huang; Eric P. Xing; Zhiqiang Shen |
|
Vision & Language | 232b | Align and Prompt: Video-and-Language Pre-Training With Entity Prompts |
Dongxu Li; Junnan Li; Hongdong Li; Juan Carlos Niebles; Steven C.H. Hoi |
233b | Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation |
Zihan Ding; Tianrui Hui; Junshi Huang; Xiaoming Wei; Jizhong Han; Si Liu |
|
234b | Language As Queries for Referring Video Object Segmentation |
Jiannan Wu; Yi Jiang; Peize Sun; Zehuan Yuan; Ping Luo |
|
235b | End-to-End Referring Video Object Segmentation With Multimodal Transformers |
Adam Botach; Evgenii Zheltonozhskii; Chaim Baskin |
|
236b | Multi-Level Representation Learning With Semantic Alignment for Referring Video Object Segmentation |
Dongming Wu; Xingping Dong; Ling Shao; Jianbing Shen |
|
237b | X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval |
Satya Krishna Gorti; Noël Vouitsis; Junwei Ma; Keyvan Golestan; Maksims Volkovs; Animesh Garg; Guangwei Yu |
|
238b | Video-Text Representation Learning via Differentiable Weak Temporal Alignment |
Dohwan Ko; Joonmyung Choi; Juyeon Ko; Shinyeong Noh; Kyoung-Woon On; Eun-Sol Kim; Hyunwoo J. Kim |
|
239b | MAD: A Scalable Dataset for Language Grounding in Videos From Movie Audio Descriptions |
Mattia Soldan; Alejandro Pardo; Juan León Alcázar; Fabian Caba; Chen Zhao; Silvio Giancola; Bernard Ghanem |
|
240b | Advancing High-Resolution Video-Language Representation With Large-Scale Video Transcriptions |
Hongwei Xue; Tiankai Hang; Yanhong Zeng; Yuchong Sun; Bei Liu; Huan Yang; Jianlong Fu; Baining Guo |
|
241b | Measuring Compositional Consistency for Video Question Answering |
Mona Gandhi; Mustafa Omer Gul; Eva Prakash; Madeleine Grunde-McLaughlin; Ranjay Krishna; Maneesh Agrawala |
|
242b | SimVQA: Exploring Simulated Environments for Visual Question Answering |
Paola Cascante-Bonilla; Hui Wu; Letao Wang; Rogerio S. Feris; Vicente Ordonez |
|
243b | Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering |
Feng Gao; Qing Ping; Govind Thattai; Aishwarya Reganti; Ying Nian Wu; Prem Natarajan |
|
244b | SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering |
Vipul Gupta; Zhuowan Li; Adam Kortylewski; Chenyu Zhang; Yingwei Li; Alan Yuille |
|
245b | MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-Based Visual Question Answering |
Yang Ding; Jing Yu; Bang Liu; Yue Hu; Mingxin Cui; Qi Wu |
|
246b | Maintaining Reasoning Consistency in Compositional Visual Question Answering |
Chenchen Jing; Yunde Jia; Yuwei Wu; Xinyu Liu; Qi Wu |
|
247b | MLSLT: Towards Multilingual Sign Language Translation |
Aoxiong Yin; Zhou Zhao; Weike Jin; Meng Zhang; Xingshan Zeng; Xiaofei He |
|
248b | A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation |
Yutong Chen; Fangyun Wei; Xiao Sun; Zhirong Wu; Stephen Lin |
|
249b | C2SLR: Consistency-Enhanced Continuous Sign Language Recognition | Ronglai Zuo; Brian Mak | |
250b | Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production |
Ben Saunders; Necati Cihan Camgoz; Richard Bowden |
|
251b | Generating Diverse and Natural 3D Human Motions From Text |
Chuan Guo; Shihao Zou; Xinxin Zuo; Sen Wang; Wei Ji; Xingyu Li; Li Cheng |
|
252b | Sub-Word Level Lip Reading With Visual Attention |
K R Prajwal; Triantafyllos Afouras; Andrew Zisserman |
|
253b | Habitat-Web: Learning Embodied Object-Search Strategies From Human Demonstrations at Scale |
Ram Ramrakhya; Eric Undersander; Dhruv Batra; Abhishek Das |
|
254b | ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval |
Mengjun Cheng; Yipeng Sun; Longchao Wang; Xiongwei Zhu; Kun Yao; Jie Chen; Guoli Song; Junyu Han; Jingtuo Liu; Errui Ding; Jingdong Wang |
|
255b | Cross Modal Retrieval With Querybank Normalisation |
Simion-Vlad Bogolin; Ioana Croitoru; Hailin Jin; Yang Liu; Samuel Albanie |
|
256b | Prompt Distribution Learning |
Yuning Lu; Jianzhuang Liu; Yonggang Zhang; Yajing Liu; Xinmei Tian |
|
257b | VALHALLA: Visual Hallucination for Machine Translation |
Yi Li; Rameswar Panda; Yoon Kim; Chun-Fu (Richard) Chen; Rogerio S. Feris; David Cox; Nuno Vasconcelos |
|
258b | VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks | Yi-Lin Sung; Jaemin Cho; Mohit Bansal | |
259b | Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality |
Tristan Thrush; Ryan Jiang; Max Bartolo; Amanpreet Singh; Adina Williams; Douwe Kiela; Candace Ross |