Posters 6/21 PM

Presentation Schedule

All times are in the Central time zone

Date: Tuesday, June 21, 2022   2:30PM – 5:00PM

Session Title Poster ID Title Authors
Video Analysis & Understanding 46b Compositional Temporal Grounding With Structured Variational Cross-Graph Correspondence Learning

Juncheng Li; Junlin Xie; Long Qian; Linchao Zhu; Siliang Tang; Fei Wu; Yi Yang; Yueting Zhuang; Xin Eric Wang

  47b UMT: Unified Multi-Modal Transformers for Joint Video Moment Retrieval and Highlight Detection

Ye Liu; Siyuan Li; Yang Wu; Chang-Wen Chen; Ying Shan; Xiaohu Qie

  48b Future Transformer for Long-Term Action Anticipation

Dayoung Gong; Joonseok Lee; Manjin Kim; Seong Jong Ha; Minsu Cho

  49b MLP-3D: A MLP-Like 3D Architecture With Grouped Time Mixing

Zhaofan Qiu; Ting Yao; Chong-Wah Ngo; Tao Mei

  50b Learning Pixel-Level Distinctions for Video Highlight Detection

Fanyue Wei; Biao Wang; Tiezheng Ge; Yuning Jiang; Wen Li; Lixin Duan

  51b DR.VIC: Decomposition and Reasoning for Video Individual Counting

Tao Han; Lei Bai; Junyu Gao; Qi Wang; Wanli Ouyang

  52b Slot-VPS: Object-Centric Representation Learning for Video Panoptic Segmentation

Yi Zhou; Hui Zhang; Hana Lee; Shuyang Sun; Pingjun Li; Yangguang Zhu; ByungIn Yoo; Xiaojuan Qi; Jae-Joon Han

  53b Explore Spatio-Temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline

Kailai Zhou; Yibo Wang; Tao Lv; Yunqian Li; Linsen Chen; Qiu Shen; Xun Cao

  54b Video Shadow Detection via Spatio-Temporal Interpolation Consistency Training

Xiao Lu; Yihong Cao; Sheng Liu; Chengjiang Long; Zipei Chen; Xuanyu Zhou; Yimin Yang; Chunxia Xiao

  55b Coarse-To-Fine Feature Mining for Video Semantic Segmentation

Guolei Sun; Yun Liu; Henghui Ding; Thomas Probst; Luc Van Gool

  56b Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation

Zhaoyang Zeng; Yongsheng Luo; Zhenhua Liu; Fengyun Rao; Dian Li; Weidong Guo; Zhen Wen

  57b Object-Region Video Transformers

Roei Herzig; Elad Ben-Avraham; Karttikeya Mangalam; Amir Bar; Gal Chechik; Anna Rohrbach; Trevor Darrell; Amir Globerson

  58b Colar: Effective and Efficient Online Action Detection by Consulting Exemplars

Le Yang; Junwei Han; Dingwen Zhang

  59b SimVP: Simpler Yet Better Video Prediction

Zhangyang Gao; Cheng Tan; Lirong Wu; Stan Z. Li

  60b Imposing Consistency for Optical Flow Estimation

Jisoo Jeong; Jamie Menjay Lin; Fatih Porikli; Nojun Kwak

  61b Stand-Alone Inter-Frame Attention in Video Models

Fuchen Long; Zhaofan Qiu; Yingwei Pan; Ting Yao; Jiebo Luo; Tao Mei

  62b Video Swin Transformer

Ze Liu; Jia Ning; Yue Cao; Yixuan Wei; Zheng Zhang; Stephen Lin; Han Hu

  63b Bayesian Nonparametric Submodular Video Partition for Robust Anomaly Detection

Hitesh Sapkota; Qi Yu

  64b Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

Zengjie Song; Yuxi Wang; Junsong Fan; Tieniu Tan; Zhaoxiang Zhang

  65b Likert Scoring With Grade Decoupling for Long-Term Action Assessment

Angchi Xu; Ling-An Zeng; Wei-Shi Zheng

  66b Complex Video Action Reasoning via Learnable Markov Logic Network

Yang Jin; Linchao Zhu; Yadong Mu

  67b Learning From Temporal Gradient for Semi-Supervised Action Recognition

Junfei Xiao; Longlong Jing; Lin Zhang; Ju He; Qi She; Zongwei Zhou; Alan Yuille; Yingwei Li

  68b Semi-Supervised Video Semantic Segmentation With Inter-Frame Feature Reconstruction

Jiafan Zhuang; Zilei Wang; Yuan Gao

  69b Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation

Linjiang Huang; Liang Wang; Hongsheng Li

  70b Joint Hand Motion and Interaction Hotspots Prediction From Egocentric Videos

Shaowei Liu; Subarna Tripathi; Somdeb Majumdar; Xiaolong Wang

  71b Human Hands As Probes for Interactive Object Understanding

Mohit Goyal; Sahil Modi; Rishabh Goyal; Saurabh Gupta

  72b LD-ConGR: A Large RGB-D Video Dataset for Long-Distance Continuous Gesture Recognition

Dan Liu; Libo Zhang; Yanjun Wu

  73b Object-Aware Video-Language Pre-Training for Retrieval

Jinpeng Wang; Yixiao Ge; Guanyu Cai; Rui Yan; Xudong Lin; Ying Shan; Xiaohu Qie; Mike Zheng Shou

  74b Fast and Unsupervised Action Boundary Detection for Action Segmentation

Zexing Du; Xue Wang; Guoqing Zhou; Qing Wang

  75b Multiview Transformers for Video Recognition

Shen Yan; Xuehan Xiong; Anurag Arnab; Zhichao Lu; Mi Zhang; Chen Sun; Cordelia Schmid

  76b Semi-Weakly-Supervised Learning of Complex Actions From Instructional Task Videos

Yuhan Shen; Ehsan Elhamifar

  77b Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection

Jiaqi Tang; Zhaoyang Liu; Chen Qian; Wayne Wu; Limin Wang

  78b Comparing Correspondences: Video Prediction With Correspondence-Wise Losses

Daniel Geng; Max Hamilton; Andrew Owens

Image & Video Synthesis and Generation 79b Sound-Guided Semantic Image Manipulation

Seung Hyun Lee; Wonseok Roh; Wonmin Byeon; Sang Ho Yoon; Chanyoung Kim; Jinkyu Kim; Sangpil Kim

  80b Expressive Talking Head Generation With Granular Audio-Visual Control

Borong Liang; Yan Pan; Zhizhi Guo; Hang Zhou; Zhibin Hong; Xiaoguang Han; Junyu Han; Jingtuo Liu; Errui Ding; Jingdong Wang

  81b Depth-Aware Generative Adversarial Network for Talking Head Video Generation

Fa-Ting Hong; Longhao Zhang; Li Shen; Dan Xu

  82b Learning Motion-Dependent Appearance for High-Fidelity Rendering of Dynamic Humans From a Single Camera

Jae Shin Yoon; Duygu Ceylan; Tuanfeng Y. Wang; Jingwan Lu; Jimei Yang; Zhixin Shu; Hyun Soo Park

  83b Audio-Driven Neural Gesture Reenactment With Video Motion Graphs

Yang Zhou; Jimei Yang; Dingzeyu Li; Jun Saito; Deepali Aneja; Evangelos Kalogerakis

  84b Portrait Eyeglasses and Shadow Removal by Leveraging 3D Synthetic Data

Junfeng Lyu; Zhibo Wang; Feng Xu

  85b Weakly Supervised High-Fidelity Clothing Model Generation

Ruili Feng; Cheng Ma; Chengji Shen; Xin Gao; Zhenjiang Liu; Xiaobo Li; Kairi Ou; Deli Zhao; Zheng-Jun Zha

  86b TemporalUV: Capturing Loose Clothing With Temporally Coherent UV Coordinates

You Xie; Huiqi Mao; Angela Yao; Nils Thuerey

  87b Full-Range Virtual Try-On With Recurrent Tri-Level Transform

Han Yang; Xinrui Yu; Ziwei Liu

  88b Style-Based Global Appearance Flow for Virtual Try-On

Sen He; Yi-Zhe Song; Tao Xiang

  89b Dressing in the Wild by Watching Dance Videos

Xin Dong; Fuwei Zhao; Zhenyu Xie; Xijin Zhang; Daniel K. Du; Min Zheng; Xiang Long; Xiaodan Liang; Jianchao Yang

  90b A Brand New Dance Partner: Music-Conditioned Pluralistic Dancing Controlled by Multiple Dance Genres

Jinwoo Kim; Heeseok Oh; Seongjean Kim; Hoseok Tong; Sanghoon Lee

  91b Unpaired Cartoon Image Synthesis via Gated Cycle Mapping

Yifang Men; Yuan Yao; Miaomiao Cui; Zhouhui Lian; Xuansong Xie; Xian-Sheng Hua

  92b DLFormer: Discrete Latent Transformer for Video Inpainting

Jingjing Ren; Qingqing Zheng; Yuanyuan Zhao; Xuemiao Xu; Chen Li

  93b ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame Interpolation

Duolikun Danier; Fan Zhang; David Bull

  94b Video Frame Interpolation With Transformer

Liying Lu; Ruizheng Wu; Huaijia Lin; Jiangbo Lu; Jiaya Jia

  95b Long-Term Video Frame Interpolation via Feature Propagation

Dawit Mureja Argaw; In So Kweon

  96b Many-to-Many Splatting for Efficient Video Frame Interpolation

Ping Hu; Simon Niklaus; Stan Sclaroff; Kate Saenko

  97b Look Outside the Room: Synthesizing a Consistent Long-Term 3D Scene Video From a Single Image

Xuanchi Ren; Xiaolong Wang

  98b Spatial-Temporal Space Hand-in-Hand: Spatial-Temporal Video Super-Resolution via Cycle-Projected Mutual Learning

Mengshun Hu; Kui Jiang; Liang Liao; Jing Xiao; Junjun Jiang; Zheng Wang

  99b Playable Environments: Video Manipulation in Space and Time

Willi Menapace; Stéphane Lathuilière; Aliaksandr Siarohin; Christian Theobalt; Sergey Tulyakov; Vladislav Golyanik; Elisa Ricci

  100b Event-Based Video Reconstruction via Potential-Assisted Spiking Neural Network

Lin Zhu; Xiao Wang; Yi Chang; Jianing Li; Tiejun Huang; Yonghong Tian

  101b Modular Action Concept Grounding in Semantic Video Prediction

Wei Yu; Wenxin Chen; Songheng Yin; Steve Easterbrook; Animesh Garg

  102b Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning

Ligong Han; Jian Ren; Hsin-Ying Lee; Francesco Barbieri; Kyle Olszewski; Shervin Minaee; Dimitris Metaxas; Sergey Tulyakov

  103b StyleGAN-V: A Continuous Video Generator With the Price, Image Quality and Perks of StyleGAN2

Ivan Skorokhodov; Sergey Tulyakov; Mohamed Elhoseiny

  104b Structure-Aware Motion Transfer With Deformable Anchor Model

Jiale Tao; Biao Wang; Borun Xu; Tiezheng Ge; Yuning Jiang; Wen Li; Lixin Duan

  105b Image Animation With Perturbed Masks

Yoav Shalev; Lior Wolf

  106b Thin-Plate Spline Motion Model for Image Animation

Jian Zhao; Hui Zhang

  107b Controllable Animation of Fluid Elements in Still Images

Aniruddha Mahapatra; Kuldeep Kulkarni

  108b Watch It Move: Unsupervised Discovery of 3D Joints for Re-Posing of Articulated Objects

Atsuhiro Noguchi; Umar Iqbal; Jonathan Tremblay; Tatsuya Harada; Orazio Gallo

  109b Geometric Structure Preserving Warp for Natural Image Stitching

Peng Du; Jifeng Ning; Jiguang Cui; Shaoli Huang; Xinchao Wang; Jiaxin Wang

  110b Few-Shot Incremental Learning for Label-to-Image Translation

Pei Chen; Yangkang Zhang; Zejian Li; Lingyun Sun

  111b Exemplar-Based Pattern Synthesis With Implicit Periodic Field Network

Haiwei Chen; Jiayi Liu; Weikai Chen; Shichen Liu; Yajie Zhao

  112b SIMBAR: Single Image-Based Scene Relighting for Effective Data Augmentation for Automated Driving Vision Tasks

Xianling Zhang; Nathan Tseng; Ameerah Syed; Rohan Bhasin; Nikita Jaipuria

  113b SoftCollage: A Differentiable Probabilistic Tree Generator for Image Collage

Jiahao Yu; Li Chen; Mingrui Zhang; Mading Li

  114b PILC: Practical Image Lossless Compression With an End-to-End GPU Oriented Neural Framework

Ning Kang; Shanzhao Qiu; Shifeng Zhang; Zhenguo Li; Shu-Tao Xia

  115b Kubric: A Scalable Dataset Generator

Klaus Greff; Francois Belletti; Lucas Beyer; Carl Doersch; Yilun Du; Daniel Duckworth; David J. Fleet; Dan Gnanapragasam; Florian Golemo; Charles Herrmann; Thomas Kipf; Abhijit Kundu; Dmitry Lagun; Issam Laradji; Hsueh-Ti (Derek) Liu; Henning Meyer; Yishu Miao; Derek Nowrouzezahrai; Cengiz Oztireli; Etienne Pot; Noha Radwan; Daniel Rebain; Sara Sabour; Mehdi S. M. Sajjadi; Matan Sela; Vincent Sitzmann; Austin Stone; Deqing Sun; Suhani Vora; Ziyu Wang; Tianhao Wu; Kwang Moo Yi; Fangcheng Zhong; Andrea Tagliasacchi

3D From Single Images 116b 360MonoDepth: High-Resolution 360° Monocular Depth Estimation

Manuel Rey-Area; Mingze Yuan; Christian Richardt

  117b Pre-Train, Self-Train, Distill: A Simple Recipe for Supersizing 3D Reconstruction

Kalyan Vasudev Alwala; Abhinav Gupta; Shubham Tulsiani

  118b DGECN: A Depth-Guided Edge Convolutional Network for End-to-End 6D Pose Estimation

Tuo Cao; Fei Luo; Yanping Fu; Wenxiao Zhang; Shengjie Zheng; Chunxia Xiao

  119b MonoGround: Detecting Monocular 3D Objects From the Ground

Zequn Qin; Xi Li

  120b 3D Shape Reconstruction From 2D Images With Disentangled Attribute Flow

Xin Wen; Junsheng Zhou; Yu-Shen Liu; Hua Su; Zhen Dong; Zhizhong Han

  121b Toward Practical Monocular Indoor Depth Estimation

Cho-Ying Wu; Jialiang Wang; Michael Hall; Ulrich Neumann; Shuochen Su

  122b Focal Length and Object Pose Estimation via Render and Compare

Georgy Ponimatkin; Yann Labbé; Bryan Russell; Mathieu Aubry; Josef Sivic

  123b CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields

Can Wang; Menglei Chai; Mingming He; Dongdong Chen; Jing Liao

  124b Registering Explicit to Implicit: Towards High-Fidelity Garment Mesh Reconstruction From Single Images

Heming Zhu; Lingteng Qiu; Yuda Qiu; Xiaoguang Han

  125b Layered Depth Refinement With Mask Guidance

Soo Ye Kim; Jianming Zhang; Simon Niklaus; Yifei Fan; Simon Chen; Zhe Lin; Munchurl Kim

  126b HEAT: Holistic Edge Attention Transformer for Structured Reconstruction

Jiacheng Chen; Yiming Qian; Yasutaka Furukawa

  127b BARC: Learning To Regress 3D Dog Shape From Images by Exploiting Breed Information

Nadine Rüegg; Silvia Zuffi; Konrad Schindler; Michael J. Black

  128b Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous Driving

Peixuan Li; Jieyu Jin

  129b What’s in Your Hands? 3D Reconstruction of Generic Objects in Hands

Yufei Ye; Abhinav Gupta; Shubham Tulsiani

  130b 3D Moments From Near-Duplicate Photos

Qianqian Wang; Zhengqi Li; David Salesin; Noah Snavely; Brian Curless; Janne Kontkanen

  131b Neural Window Fully-Connected CRFs for Monocular Depth Estimation

Weihao Yuan; Xiaodong Gu; Zuozhuo Dai; Siyu Zhu; Ping Tan

  132b PUMP: Pyramidal and Uniqueness Matching Priors for Unsupervised Learning of Local Descriptors

Jérôme Revaud; Vincent Leroy; Philippe Weinzaepfel; Boris Chidlovskii

  133b CroMo: Cross-Modal Learning for Monocular Depth Estimation

Yannick Verdié; Jifei Song; Barnabé Mas; Benjamin Busam; Aleš Leonardis; Steven McDonagh

  134b φ-SfT: Shape-From-Template With a Physics-Based Deformation Model

Navami Kairanda; Edith Tretschk; Mohamed Elgharib; Christian Theobalt; Vladislav Golyanik

  135b Human-Aware Object Placement for Visual Environment Reconstruction

Hongwei Yi; Chun-Hao P. Huang; Dimitrios Tzionas; Muhammed Kocabas; Mohamed Hassan; Siyu Tang; Justus Thies; Michael J. Black

  136b AutoRF: Learning 3D Object Radiance Fields From Single View Observations

Norman Müller; Andrea Simonelli; Lorenzo Porzi; Samuel Rota Bulò; Matthias Nießner; Peter Kontschieder

  137b Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation

Shengqu Cai; Anton Obukhov; Dengxin Dai; Luc Van Gool

  138b MonoScene: Monocular 3D Semantic Scene Completion

Anh-Quan Cao; Raoul de Charette

  139b GenDR: A Generalized Differentiable Renderer

Felix Petersen; Bastian Goldluecke; Christian Borgelt; Oliver Deussen

  140b MonoDTR: Monocular 3D Object Detection With Depth-Aware Transformer

Kuan-Chih Huang; Tsung-Han Wu; Hung-Ting Su; Winston H. Hsu

  141b ROCA: Robust CAD Model Retrieval and Alignment From a Single Image

Can Gümeli; Angela Dai; Matthias Nießner

Face & Gestures 142b HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network

Chang Yu; Xiangyu Zhu; Xiaomei Zhang; Zidu Wang; Zhaoxiang Zhang; Zhen Lei

  143b Killing Two Birds With One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC

Xiang An; Jiankang Deng; Jia Guo; Ziyong Feng; XuHan Zhu; Jing Yang; Tongliang Liu

  144b Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning

Jiahao Xia; Weiwei Qu; Wenjian Huang; Jianguo Zhang; Xi Wang; Min Xu

  145b Enhancing Face Recognition With Self-Supervised 3D Reconstruction

Mingjie He; Jie Zhang; Shiguang Shan; Xilin Chen

  146b Learning To Learn Across Diverse Data Biases in Deep Face Recognition

Chang Liu; Xiang Yu; Yi-Hsuan Tsai; Masoud Faraki; Ramin Moslemi; Manmohan Chandraker; Yun Fu

  147b An Efficient Training Approach for Very Large Scale Face Recognition

Kai Wang; Shuo Wang; Panpan Zhang; Zhipeng Zhou; Zheng Zhu; Xiaobo Wang; Xiaojiang Peng; Baigui Sun; Hao Li; Yang You

  148b MogFace: Towards a Deeper Appreciation on Face Detection

Yang Liu; Fei Wang; Jiankang Deng; Zhipeng Zhou; Baigui Sun; Hao Li

  149b Exploring Frequency Adversarial Attacks for Face Forgery Detection

Shuai Jia; Chao Ma; Taiping Yao; Bangjie Yin; Shouhong Ding; Xiaokang Yang

  150b End-to-End Reconstruction-Classification Learning for Face Forgery Detection

Junyi Cao; Chao Ma; Taiping Yao; Shen Chen; Shouhong Ding; Xiaokang Yang

  151b Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing

Zhuo Wang; Zezheng Wang; Zitong Yu; Weihong Deng; Jiahong Li; Tingting Gao; Zhongyuan Wang

  152b Privacy-Preserving Online AutoML for Domain-Specific Face Detection

Chenqian Yan; Yuge Zhang; Quanlu Zhang; Yaming Yang; Xinyang Jiang; Yuqing Yang; Baoyuan Wang

  153b Simulated Adversarial Testing of Face Recognition Models

Nataniel Ruiz; Adam Kortylewski; Weichao Qiu; Cihang Xie; Sarah Adel Bargal; Alan Yuille; Stan Sclaroff

  154b Decoupled Multi-Task Learning With Cyclical Self-Regulation for Face Parsing

Qingping Zheng; Jiankang Deng; Zheng Zhu; Ying Li; Stefanos Zafeiriou

  155b Towards Semi-Supervised Deep Facial Expression Recognition With an Adaptive Confidence Margin

Hangyu Li; Nannan Wang; Xi Yang; Xiaoyu Wang; Xinbo Gao

  156b Towards Accurate Facial Landmark Detection via Cascaded Transformers

Hui Li; Zidong Guo; Seon-Min Rhee; Seungju Han; Jae-Joon Han

  157b PhysFormer: Facial Video-Based Physiological Measurement With Temporal Difference Transformer

Zitong Yu; Yuming Shen; Jingang Shi; Hengshuang Zhao; Philip H.S. Torr; Guoying Zhao

  158b GazeOnce: Real-Time Multi-Person Gaze Estimation

Mingfang Zhang; Yunfei Liu; Feng Lu

  159b Generalizing Gaze Estimation With Rotation Consistency

Yiwei Bao; Yunfei Liu; Haofei Wang; Feng Lu

  160b Face Relighting With Geometrically Consistent Shadows

Andrew Hou; Michel Sarkis; Ning Bi; Yiying Tong; Xiaoming Liu

  161b HairMapper: Removing Hair From Portraits Using GANs

Yiqian Wu; Yong-Liang Yang; Xiaogang Jin

  162b Learning To Restore 3D Face From In-the-Wild Degraded Images

Zhenyu Zhang; Yanhao Ge; Ying Tai; Xiaoming Huang; Chengjie Wang; Hao Tang; Dongjin Huang; Zhifeng Xie

Segmentation, Grouping and Shape Analysis 163b Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels

Yuchao Wang; Haochen Wang; Yujun Shen; Jingjing Fei; Wei Li; Guoqiang Jin; Liwei Wu; Rui Zhao; Xinyi Le

  164b Perturbed and Strict Mean Teachers for Semi-Supervised Semantic Segmentation

Yuyuan Liu; Yu Tian; Yuanhong Chen; Fengbei Liu; Vasileios Belagiannis; Gustavo Carneiro

  165b ST++: Make Self-Training Work Better for Semi-Supervised Semantic Segmentation

Lihe Yang; Wei Zhuo; Lei Qi; Yinghuan Shi; Yang Gao

  166b Beyond Semantic to Instance Segmentation: Weakly-Supervised Instance Segmentation via Semantic Knowledge Transfer and Self-Refinement

Beomyoung Kim; YoungJoon Yoo; Chae Eun Rhee; Junmo Kim

  167b Self-Supervised Image-Specific Prototype Exploration for Weakly Supervised Semantic Segmentation

Qi Chen; Lingxiao Yang; Jian-Huang Lai; Xiaohua Xie

  168b Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation

Tianfei Zhou; Meijie Zhang; Fang Zhao; Jianwu Li

  169b Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation

Lian Xu; Wanli Ouyang; Mohammed Bennamoun; Farid Boussaid; Dan Xu

  170b Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast

Ye Du; Zehua Fu; Qingjie Liu; Yunhong Wang

  171b Threshold Matters in WSSS: Manipulating the Activation for the Robust and Accurate Segmentation Model Against Thresholds

Minhyun Lee; Dongseob Kim; Hyunjung Shim

  172b Novel Class Discovery in Semantic Segmentation

Yuyang Zhao; Zhun Zhong; Nicu Sebe; Gim Hee Lee

  173b Pin the Memory: Learning To Generalize Semantic Segmentation

Jin Kim; Jiyoung Lee; Jungin Park; Dongbo Min; Kwanghoon Sohn

  174b ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-High Resolution Segmentation

Shaohua Guo; Liang Liu; Zhenye Gan; Yabiao Wang; Wuhao Zhang; Chengjie Wang; Guannan Jiang; Wei Zhang; Ran Yi; Lizhuang Ma; Ke Xu

  175b Incremental Learning in Semantic Segmentation From Image Labels

Fabio Cermelli; Dario Fontanel; Antonio Tavera; Marco Ciccone; Barbara Caputo

  176b Instance Segmentation With Mask-Supervised Polygonal Boundary Transformers

Justin Lazarow; Weijian Xu; Zhuowen Tu

  177b SharpContour: A Contour-Based Boundary Refinement Approach for Efficient and Accurate Instance Segmentation

Chenming Zhu; Xuanye Zhang; Yanran Li; Liangdong Qiu; Kai Han; Xiaoguang Han

  178b Sparse Object-Level Supervision for Instance Segmentation With Pixel Embeddings

Adrian Wolny; Qin Yu; Constantin Pape; Anna Kreshuk

  179b Mask Transfiner for High-Quality Instance Segmentation

Lei Ke; Martin Danelljan; Xia Li; Yu-Wing Tai; Chi-Keung Tang; Fisher Yu

  180b Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity

Weiyao Wang; Matt Feiszli; Heng Wang; Jitendra Malik; Du Tran

  181b Sparse Instance Activation for Real-Time Instance Segmentation

Tianheng Cheng; Xinggang Wang; Shaoyu Chen; Wenqiang Zhang; Qian Zhang; Chang Huang; Zhaoxiang Zhang; Wenyu Liu

  182b E2EC: An End-to-End Contour-Based Method for High-Quality High-Speed Instance Segmentation

Tao Zhang; Shiqing Wei; Shunping Ji

  183b Hyperbolic Image Segmentation

Mina Ghadimi Atigh; Julian Schoep; Erman Acar; Nanne van Noord; Pascal Mettes

  184b SeeThroughNet: Resurrection of Auxiliary Loss by Preserving Class Probability Information

Dasol Han; Jaewook Yoo; Dokwan Oh

  185b CDGNet: Class Distribution Guided Network for Human Parsing

Kunliang Liu; Ouk Choi; Jianming Wang; Wonjun Hwang

  186b CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation

Jinheng Xie; Xianxu Hou; Kai Ye; Linlin Shen

  187b Sparse Non-Local CRF

Olga Veksler; Yuri Boykov

  188b Detecting Camouflaged Object in Frequency Domain

Yijie Zhong; Bo Li; Lv Tang; Senyun Kuang; Shuang Wu; Shouhong Ding

  189b Progressive Minimal Path Method With Embedded CNN

Wei Liao

Document Analysis & Understanding 190b Open-Set Text Recognition via Character-Context Decoupling

Chang Liu; Chun Yang; Xu-Cheng Yin

  191b Neural Collaborative Graph Machines for Table Structure Recognition

Hao Liu; Xin Li; Bing Liu; Deqiang Jiang; Yinsong Liu; Bo Ren

  192b Revisiting Document Image Dewarping by Grid Regularization

Xiangwei Jiang; Rujiao Long; Nan Xue; Zhibo Yang; Cong Yao; Gui-Song Xia

  193b Syntax-Aware Network for Handwritten Mathematical Expression Recognition

Ye Yuan; Xiao Liu; Wondimu Dikubab; Hui Liu; Zhilong Ji; Zhongqin Wu; Xiang Bai

  194b Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection

Jingqun Tang; Wenqing Zhang; Hongye Liu; MingKun Yang; Bo Jiang; Guanglong Hu; Xiang Bai

  195b Fourier Document Restoration for Robust Document Dewarping and Recognition

Chuhui Xue; Zichen Tian; Fangneng Zhan; Shijian Lu; Song Bai

  196b XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding

Zhangxuan Gu; Changhua Meng; Ke Wang; Jun Lan; Weiqiang Wang; Ming Gu; Liqing Zhang

  197b SwinTextSpotter: Scene Text Spotting via Better Synergy Between Text Detection and Text Recognition

Mingxin Huang; Yuliang Liu; Zhenghao Peng; Chongyu Liu; Dahua Lin; Shenggao Zhu; Nicholas Yuan; Kai Ding; Lianwen Jin

  198b Towards Weakly-Supervised Text Spotting Using a Multi-Task Transformer

Yair Kittenplon; Inbal Lavi; Sharon Fogel; Yarin Bar; R. Manmatha; Pietro Perona

  199b TableFormer: Table Structure Understanding With Transformers

Ahmed Nassar; Nikolaos Livathinos; Maksym Lysak; Peter Staar

  200b Knowledge Mining With Scene Text for Fine-Grained Recognition

Hao Wang; Junchao Liao; Tianheng Cheng; Zewen Gao; Hao Liu; Bo Ren; Xiang Bai; Wenyu Liu

  201b PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents

Brandon Smock; Rohith Pesala; Robin Abraham

Recognition: Detection, Categorization, Retrieval 202b Focal and Global Knowledge Distillation for Detectors

Zhendong Yang; Zhe Li; Xiaohu Jiang; Yuan Gong; Zehuan Yuan; Danpei Zhao; Chun Yuan

  203b Speed Up Object Detection on Gigapixel-Level Images With Patch Arrangement

Jiahao Fan; Huabin Liu; Wenjie Yang; John See; Aixin Zhang; Weiyao Lin

  204b Training Object Detectors From Scratch: An Empirical Study in the Era of Vision Transformer

Weixiang Hong; Jiangwei Lao; Wang Ren; Jian Wang; Jingdong Chen; Wei Chu

  205b Learning With Neighbor Consistency for Noisy Labels

Ahmet Iscen; Jack Valmadre; Anurag Arnab; Cordelia Schmid

  206b Meta Convolutional Neural Networks for Single Domain Generalization

Chaoqun Wan; Xu Shen; Yonggang Zhang; Zhiheng Yin; Xinmei Tian; Feng Gao; Jianqiang Huang; Xian-Sheng Hua

  207b Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification

Haowei Zhu; Wenjing Ke; Dong Li; Ji Liu; Lu Tian; Yi Shan

  208b Geometry-Aware Guided Loss for Deep Crack Recognition

Zhuangzhuang Chen; Jin Zhang; Zhuonan Lai; Jie Chen; Zun Liu; Jianqiang Li

  209b Segment, Magnify and Reiterate: Detecting Camouflaged Objects the Hard Way

Qi Jia; Shuilian Yao; Yu Liu; Xin Fan; Risheng Liu; Zhongxuan Luo

  210b Dynamic Sparse R-CNN

Qinghang Hong; Fengming Liu; Dong Li; Ji Liu; Lu Tian; Yi Shan

  211b Deep Hybrid Models for Out-of-Distribution Detection

Senqi Cao; Zhongfei Zhang

  212b AutoLoss-GMS: Searching Generalized Margin-Based Softmax Loss Function for Person Re-Identification

Hongyang Gu; Jianmin Li; Guangyuan Fu; Chifong Wong; Xinghao Chen; Jun Zhu

  213b Feature Erasing and Diffusion Network for Occluded Person Re-Identification

Zhikang Wang; Feng Zhu; Shixiang Tang; Rui Zhao; Lihuo He; Jiangning Song

  214b Multi-Label Classification With Partial Annotations Using Class-Aware Selective Loss

Emanuel Ben-Baruch; Tal Ridnik; Itamar Friedman; Avi Ben-Cohen; Nadav Zamir; Asaf Noy; Lihi Zelnik-Manor

  215b BoxeR: Box-Attention for 2D and 3D Transformers

Duy-Kien Nguyen; Jihong Ju; Olaf Booij; Martin R. Oswald; Cees G. M. Snoek

  216b Multi-Label Iterated Learning for Image Classification With Label Ambiguity

Sai Rajeswar; Pau Rodríguez; Soumye Singhal; David Vazquez; Aaron Courville

  217b Vision Transformer With Deformable Attention

Zhuofan Xia; Xuran Pan; Shiji Song; Li Erran Li; Gao Huang

  218b MViTv2: Improved Multiscale Vision Transformers for Classification and Detection

Yanghao Li; Chao-Yuan Wu; Haoqi Fan; Karttikeya Mangalam; Bo Xiong; Jitendra Malik; Christoph Feichtenhofer

  219b Dense Learning Based Semi-Supervised Object Detection

Binghui Chen; Pengyu Li; Xiang Chen; Biao Wang; Lei Zhang; Xian-Sheng Hua

  220b R(Det)²: Randomized Decision Routing for Object Detection

Yali Li; Shengjin Wang

  221b GlideNet: Global, Local and Intrinsic Based Dense Embedding NETwork for Multi-Category Attributes Prediction

Kareem Metwaly; Aerin Kim; Elliot Branson; Vishal Monga

  222b Self-Supervised Equivariant Learning for Oriented Keypoint Detection

Jongmin Lee; Byungjin Kim; Minsu Cho

  223b Label Relation Graphs Enhanced Hierarchical Residual Network for Hierarchical Multi-Granularity Classification

Jingzhou Chen; Peng Wang; Jian Liu; Yuntao Qian

  224b Object Localization Under Single Coarse Point Supervision

Xuehui Yu; Pengfei Chen; Di Wu; Najmul Hassan; Guorong Li; Junchi Yan; Humphrey Shi; Qixiang Ye; Zhenjun Han

  225b Rethinking Visual Geo-Localization for Large-Scale Applications

Gabriele Berton; Carlo Masone; Barbara Caputo

  226b Whose Hands Are These? Hand Detection and Hand-Body Association in the Wild

Supreeth Narasimhaswamy; Thanh Nguyen; Mingzhen Huang; Minh Hoai

  227b Cloning Outfits From Real-World Images to 3D Characters for Generalizable Person Re-Identification

Yanan Wang; Xuezhi Liang; Shengcai Liao

  228b Towards Unsupervised Domain Generalization

Xingxuan Zhang; Linjun Zhou; Renzhe Xu; Peng Cui; Zheyan Shen; Haoxin Liu

  229b ViM: Out-of-Distribution With Virtual-Logit Matching

Haoqi Wang; Zhizhong Li; Litong Feng; Wayne Zhang

  230b Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space

Arnav Chavan; Zhiqiang Shen; Zhuang Liu; Zechun Liu; Kwang-Ting Cheng; Eric P. Xing

  231b Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation

Zechun Liu; Kwang-Ting Cheng; Dong Huang; Eric P. Xing; Zhiqiang Shen

Vision & Language 232b Align and Prompt: Video-and-Language Pre-Training With Entity Prompts

Dongxu Li; Junnan Li; Hongdong Li; Juan Carlos Niebles; Steven C.H. Hoi

  233b Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation

Zihan Ding; Tianrui Hui; Junshi Huang; Xiaoming Wei; Jizhong Han; Si Liu

  234b Language As Queries for Referring Video Object Segmentation

Jiannan Wu; Yi Jiang; Peize Sun; Zehuan Yuan; Ping Luo

  235b End-to-End Referring Video Object Segmentation With Multimodal Transformers

Adam Botach; Evgenii Zheltonozhskii; Chaim Baskin

  236b Multi-Level Representation Learning With Semantic Alignment for Referring Video Object Segmentation

Dongming Wu; Xingping Dong; Ling Shao; Jianbing Shen

  237b X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval

Satya Krishna Gorti; Noël Vouitsis; Junwei Ma; Keyvan Golestan; Maksims Volkovs; Animesh Garg; Guangwei Yu

  238b Video-Text Representation Learning via Differentiable Weak Temporal Alignment

Dohwan Ko; Joonmyung Choi; Juyeon Ko; Shinyeong Noh; Kyoung-Woon On; Eun-Sol Kim; Hyunwoo J. Kim

  239b MAD: A Scalable Dataset for Language Grounding in Videos From Movie Audio Descriptions

Mattia Soldan; Alejandro Pardo; Juan León Alcázar; Fabian Caba; Chen Zhao; Silvio Giancola; Bernard Ghanem

  240b Advancing High-Resolution Video-Language Representation With Large-Scale Video Transcriptions

Hongwei Xue; Tiankai Hang; Yanhong Zeng; Yuchong Sun; Bei Liu; Huan Yang; Jianlong Fu; Baining Guo

  241b Measuring Compositional Consistency for Video Question Answering

Mona Gandhi; Mustafa Omer Gul; Eva Prakash; Madeleine Grunde-McLaughlin; Ranjay Krishna; Maneesh Agrawala

  242b SimVQA: Exploring Simulated Environments for Visual Question Answering

Paola Cascante-Bonilla; Hui Wu; Letao Wang; Rogerio S. Feris; Vicente Ordonez

  243b Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering

Feng Gao; Qing Ping; Govind Thattai; Aishwarya Reganti; Ying Nian Wu; Prem Natarajan

  244b SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering

Vipul Gupta; Zhuowan Li; Adam Kortylewski; Chenyu Zhang; Yingwei Li; Alan Yuille

  245b MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-Based Visual Question Answering

Yang Ding; Jing Yu; Bang Liu; Yue Hu; Mingxin Cui; Qi Wu

  246b Maintaining Reasoning Consistency in Compositional Visual Question Answering

Chenchen Jing; Yunde Jia; Yuwei Wu; Xinyu Liu; Qi Wu

  247b MLSLT: Towards Multilingual Sign Language Translation

Aoxiong Yin; Zhou Zhao; Weike Jin; Meng Zhang; Xingshan Zeng; Xiaofei He

  248b A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation

Yutong Chen; Fangyun Wei; Xiao Sun; Zhirong Wu; Stephen Lin

  249b C2SLR: Consistency-Enhanced Continuous Sign Language Recognition

Ronglai Zuo; Brian Mak

  250b Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production

Ben Saunders; Necati Cihan Camgoz; Richard Bowden

  251b Generating Diverse and Natural 3D Human Motions From Text

Chuan Guo; Shihao Zou; Xinxin Zuo; Sen Wang; Wei Ji; Xingyu Li; Li Cheng

  252b Sub-Word Level Lip Reading With Visual Attention

K R Prajwal; Triantafyllos Afouras; Andrew Zisserman

  253b Habitat-Web: Learning Embodied Object-Search Strategies From Human Demonstrations at Scale

Ram Ramrakhya; Eric Undersander; Dhruv Batra; Abhishek Das

  254b ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval

Mengjun Cheng; Yipeng Sun; Longchao Wang; Xiongwei Zhu; Kun Yao; Jie Chen; Guoli Song; Junyu Han; Jingtuo Liu; Errui Ding; Jingdong Wang

  255b Cross Modal Retrieval With Querybank Normalisation

Simion-Vlad Bogolin; Ioana Croitoru; Hailin Jin; Yang Liu; Samuel Albanie

  256b Prompt Distribution Learning

Yuning Lu; Jianzhuang Liu; Yonggang Zhang; Yajing Liu; Xinmei Tian

  257b VALHALLA: Visual Hallucination for Machine Translation

Yi Li; Rameswar Panda; Yoon Kim; Chun-Fu (Richard) Chen; Rogerio S. Feris; David Cox; Nuno Vasconcelos

  258b VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks

Yi-Lin Sung; Jaemin Cho; Mohit Bansal

  259b Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality

Tristan Thrush; Ryan Jiang; Max Bartolo; Amanpreet Singh; Adina Williams; Douwe Kiela; Candace Ross