If you find this helpful, we would appreciate a star! Note: Oral papers may appear twice, once as a Poster entry and once as an Oral entry.
ID | Type | Title |
---|---|---|
1 | Workshop | Recovering 6D Object Pose |
2 | Workshop | Half-century of Structure-from-Motion (50SfM) |
3 | Workshop | Dense Neural SLAM Workshop (NeuSLAM) |
4 | Workshop | Geometry in the Large Model Era |
5 | Workshop | Workshop on Spatial AI |
6 | Workshop | Transparent & Reflective objects In the wild Challenges (TRICKY) |
7 | Workshop | Wild3D: 3D Modeling, Reconstruction, and Generation in the Wild |
8 | Workshop | AI3DCC: The Second Workshop of AI for 3D Content Creation |
9 | Workshop | 3D Vision and Modeling Challenges in eCommerce |
10 | Workshop | FashionAI: Exploring the intersection of Fashion and Artificial Intelligence for reshaping the Industry |
11 | Workshop | CV For Ecology Workshop (CV4E) |
12 | Workshop | 9th Workshop on Computer Vision in Plant Phenotyping and Agriculture (CVPPA) |
13 | Workshop | 3rd edition of Computer Vision for Metaverse (CV4Metaverse) |
14 | Workshop | The First Workshop on: Computer Vision for Videogames (CV2) |
15 | Workshop | 2nd Workshop on Vision-based Industrial Inspection (VISION) |
16 | Workshop | AI for Visual Arts Workshop and Challenges (AI4VA) |
17 | Workshop | Vision for Art (VISART) VII Workshop |
18 | Workshop | AI4DH: Artificial Intelligence for Digital Humanities |
19 | Workshop | The Third ROAD Workshop & Challenge: Event Detection for Situation Awareness in Autonomous Driving |
20 | Workshop | Vision-Centric Autonomous Driving (VCAD) Workshop |
21 | Workshop | ROAM: Robust, Out-of-Distribution And Multi-Modal models for Autonomous Driving |
22 | Workshop | ACVR2024 - 12th International Workshop on Assistive Computer Vision and Robotics |
23 | Workshop | Autonomous Vehicles meet Multimodal Foundation Models |
24 | Workshop | Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving: Towards Next-Generation Solutions |
25 | Workshop | Multi-Agent Autonomous Systems Meet Foundation Models: Challenges and Futures |
26 | Workshop | Visual object tracking and segmentation challenge VOTS2024 workshop |
27 | Workshop | 5th Advances in Image Manipulation (AIM) Workshop and Challenges |
28 | Workshop | Instance-Level Recognition |
29 | Workshop | Large-scale Video Object Segmentation |
30 | Workshop | The Second Perception Test Challenge |
31 | Workshop | Efficient Deep Learning for Foundation Models |
32 | Workshop | Computational Aspects of Deep Learning |
33 | Workshop | Foundation Models for 3D Humans |
34 | Workshop | Workshop on Artificial Social Intelligence |
35 | Workshop | T-CAP - Towards a Complete Analysis of People: Fine-grained Understanding for Real-World Applications |
36 | Workshop | Observing and Understanding Hands in Action |
37 | Workshop | 7th Workshop and Competition on Affective Behavior Analysis in-the-wild |
38 | Workshop | The First Workshop on Expressive Encounters: Co-speech gestures across cultures in the wild |
39 | Workshop | BioImage Computing (BIC) |
40 | Workshop | Human-inspired Computer Vision |
41 | Workshop | Knowledge in Generative Models |
42 | Workshop | Self-Supervised Learning - What is next? |
43 | Workshop | Traditional Computer Vision in the Age of Deep Learning (TradiCV) |
44 | Workshop | Uncertainty Quantification for Computer Vision |
45 | Workshop | Emergent Visual Abilities and Limits of Foundation Models (EVAL-FoMo) |
46 | Workshop | Beyond Euclidean: Hyperbolic and Hyperspherical Learning for Computer Vision |
47 | Workshop | Workshop on Unlearning and Model Editing (U&ME'24) |
48 | Workshop | The 3rd Workshop for Out-of-Distribution Generalization in Computer Vision Foundation Models |
49 | Workshop | Workshop on Visual Concepts |
50 | Workshop | Sometimes Less is More: The First Dataset Distillation Challenge |
51 | Workshop | 2nd Workshop on Quantum Computer Vision and Machine Learning (QCVML) |
52 | Workshop | 2nd Workshop on More Exploration, Less Exploitation (MELEX) |
53 | Workshop | Synthetic Data for Computer Vision |
54 | Workshop | International Challenge on Compositional and Multimodal Perception |
55 | Workshop | AVGenL: Audio-Visual Generation and Learning |
57 | Workshop | Multimodal Agents Workshop |
58 | Workshop | 2nd OmniLabel Workshop: Enabling Complex Perception Through Vision and Language Foundational Models |
59 | Workshop | The Dark Side of Generative AIs and Beyond |
61 | Workshop | FOundation models Creators meet USers (FOCUS) |
62 | Workshop | Fairness and ethics towards transparent AI: facing the chalLEnge through model Debiasing (FAILED) |
63 | Workshop | Explainable AI for Computer Vision: Where Are We and Where Are We Going? |
64 | Workshop | TWYN: Trust What You learN. 1st Workshop on Trustworthiness in Computer Vision |
65 | Workshop | Women in Computer Vision |
66 | Workshop | 2nd International Workshop on Privacy-Preserving Computer Vision |
67 | Workshop | Critical Evaluation of Generative Models and their Impact on Society |
68 | Workshop | xAI4Biometrics at ECCV 2024 - 4th Workshop on Explainable & Interpretable Artificial Intelligence for Biometrics |
69 | Workshop | Workshop on Green Foundation Models |
70 | Workshop | Scalable 3D Scene Generation and 3D Geometric Scene Understanding |
71 | Workshop | OpenSUN3D: 3rd Workshop on Open-Vocabulary 3D Scene Understanding |
72 | Workshop | Map-free Visual Relocalization |
73 | Workshop | Workshop on Neuromorphic Vision (NeVi): Advantages and Applications of Event Cameras |
74 | Workshop | 1st Workshop on Neural Fields Beyond Conventional Cameras |
75 | Workshop | GigaVision: When Gigapixel Videography Meets Computer Vision |
76 | Workshop | Eyes of the Future: Integrating Computer Vision in Smart Eyewear |
77 | Tutorial | Large Multimodal Foundation Models |
78 | Tutorial | A Bayesian Odyssey in Uncertainty: from Theoretical Foundations to Real-World Applications |
79 | Tutorial | Third Hands-on Egocentric Research Tutorial with Project Aria, from Meta |
80 | Tutorial | Emerging Trends in Disentanglement and Compositionality |
81 | Tutorial | Efficient Text-to-Image and Text-to-3D modeling |
82 | Tutorial | Responsibly Building Generative Models |
83 | Tutorial | Recent Advances in Video Content Understanding and Generation |
84 | Tutorial | Time is precious: Self-Supervised Learning Beyond Images |
85 | Tutorial | Inside Plato's door: a tour in Multi-view Geometry |
86 | Poster Session | Poster Session 1 |
87 | Oral Session | Oral 1A: Scene Analysis And Understanding |
88 | Oral Session | Oral 1B: Autonomous Driving |
89 | Oral Session | Oral 1C: Low-Level Vision And Imaging |
90 | Poster Session | Poster Session 2 |
91 | Oral Session | Oral 2A: Generative Models I |
92 | Oral Session | Oral 2B: Recognition |
93 | Oral Session | Oral 2C: Multi-View And Visual Odometry |
94 | Poster Session | Poster Session 3 |
95 | Oral Session | Oral 3A: Datasets And Benchmarking |
96 | Oral Session | Oral 3B: Medical And Biological Imaging |
97 | Oral Session | Oral 3C: Point Clouds |
98 | Poster Session | Poster Session 4 |
99 | Oral Session | Oral 4A: Neural 3D Rendering |
100 | Oral Session | Oral 4B: Video Generation / Editing / Prediction |
101 | Oral Session | Oral 4C: Humans: Biometrics, Pose And Motion |
102 | Poster Session | Poster Session 5 |
103 | Oral Session | Oral 5A: Segmentation |
104 | Oral Session | Oral 5B: Vision Applications |
105 | Oral Session | Oral 5C: Representation Learning |
106 | Poster Session | Poster Session 6 |
107 | Oral Session | Oral 6A: Generative Models II |
108 | Oral Session | Oral 6B: Video Understanding |
109 | Oral Session | Oral 6C: Vision And Other Modalities |
110 | Poster Session | Poster Session 7 |
111 | Oral Session | Oral 7A: Learning Architectures, Transfer, Continual And Long-Tail |
112 | Oral Session | Oral 7B: Adversarial Learning And Privacy |
113 | Oral Session | Oral 7C: Optimization And Theory |
114 | Poster | Bi-directional Contextual Attention for 3D Dense Captioning |
115 | Oral | Bi-directional Contextual Attention for 3D Dense Captioning |
116 | Poster | Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention |
117 | Oral | Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention |
118 | Poster | ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting |
119 | Oral | ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting |
120 | Poster | Towards Scene Graph Anticipation |
121 | Oral | Towards Scene Graph Anticipation |
122 | Poster | OP-Align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation |
123 | Oral | OP-Align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation |
124 | Poster | PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers |
125 | Oral | PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers |
126 | Poster | H-V2X: A Large Scale Highway Dataset for BEV Perception |
127 | Oral | H-V2X: A Large Scale Highway Dataset for BEV Perception |
128 | Poster | RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios |
129 | Oral | RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios |
130 | Poster | DriveLM: Driving with Graph Visual Question Answering |
131 | Oral | DriveLM: Driving with Graph Visual Question Answering |
132 | Poster | Making Large Language Models Better Planners with Reasoning-Decision Alignment |
133 | Oral | Making Large Language Models Better Planners with Reasoning-Decision Alignment |
134 | Poster | M^2Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation |
135 | Oral | M^2Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation |
136 | Poster | MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping |
137 | Oral | MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping |
138 | Poster | Adaptive Bounding Box Uncertainties via Two-Step Conformal Prediction |
139 | Oral | Adaptive Bounding Box Uncertainties via Two-Step Conformal Prediction |
140 | Poster | A Simple Low-bit Quantization Framework for Video Snapshot Compressive Imaging |
141 | Oral | A Simple Low-bit Quantization Framework for Video Snapshot Compressive Imaging |
142 | Poster | Photon Inhibition for Energy-Efficient Single-Photon Imaging |
143 | Oral | Photon Inhibition for Energy-Efficient Single-Photon Imaging |
144 | Poster | Latent Diffusion Prior Enhanced Deep Unfolding for Snapshot Spectral Compressive Imaging |
145 | Oral | Latent Diffusion Prior Enhanced Deep Unfolding for Snapshot Spectral Compressive Imaging |
146 | Poster | Minimalist Vision with Freeform Pixels |
147 | Oral | Minimalist Vision with Freeform Pixels |
148 | Poster | SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow |
149 | Oral | SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow |
150 | Poster | Integer-Valued Training and Spike-driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection |
151 | Oral | Integer-Valued Training and Spike-driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection |
152 | Poster | OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects |
153 | Oral | OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects |
154 | Poster | UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model |
155 | Poster | Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture |
156 | Poster | HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting |
157 | Poster | MagicMirror: Fast and High-Quality Avatar Generation with Constrained Search Space |
158 | Poster | Personalized Video Relighting With an At-Home Light Stage |
159 | Poster | Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations |
160 | Poster | Panel-Specific Degradation Representation for Raw Under-Display Camera Image Restoration |
161 | Poster | HoloADMM: High-Quality Holographic Complex Field Recovery |
162 | Poster | Flying with Photons: Rendering Novel Views of Propagating Light |
163 | Oral | Flying with Photons: Rendering Novel Views of Propagating Light |
164 | Poster | Efficient Depth-Guided Urban View Synthesis |
165 | Poster | Ray-Distance Volume Rendering for Neural Scene Reconstruction |
166 | Poster | Taming Latent Diffusion Model for Neural Radiance Field Inpainting |
167 | Poster | Learning Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors |
168 | Poster | GMT: Enhancing Generalizable Neural Rendering via Geometry-Driven Multi-Reference Texture Transfer |
169 | Poster | MaRINeR: Enhancing Novel Views by Matching Rendered Images with Nearby References |
170 | Poster | UNIKD: UNcertainty-Filtered Incremental Knowledge Distillation for Neural Implicit Representation |
171 | Poster | Rethinking Directional Parameterization in Neural Implicit Surface Reconstruction |
172 | Poster | Sur^2f: A Hybrid Representation for High-Quality and Efficient Surface Reconstruction from Multi-view Images |
173 | Poster | Differentiable Convex Polyhedra Optimization from Multi-view Images |
174 | Poster | Combining Generative and Geometry Priors for Wide-Angle Portrait Correction |
175 | Poster | I2-SLAM: Inverting Imaging Process for Robust Photorealistic Dense SLAM |
176 | Poster | Mitigating Perspective Distortion-induced Shape Ambiguity in Image Crops |
177 | Poster | MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo |
178 | Poster | CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians |
179 | Poster | GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting |
180 | Poster | FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally |
181 | Poster | PolyOculus: Simultaneous Multi-view Image-based Novel View Synthesis |
182 | Poster | MegaScenes: Scene-Level View Synthesis at Scale |
183 | Poster | HiFi-123: Towards High-fidelity One Image to 3D Content Generation |
184 | Poster | View-Consistent 3D Editing with Gaussian Splatting |
185 | Poster | Compress3D: a Compressed Latent Space for 3D Generation from a Single Image |
186 | Poster | Forest2Seq: Revitalizing Order Prior for Sequential Indoor Scene Synthesis |
187 | Poster | 3DFG-PIFu: 3D Feature Grids for Human Digitization from Sparse Views |
188 | Poster | Nuvo: Neural UV Mapping for Unruly 3D Representations |
189 | Poster | Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors |
190 | Poster | BlenderAlchemy: Editing 3D Graphics with Vision-Language Models |
191 | Poster | A Diffusion Model for Simulation Ready Coronary Anatomy with Morpho-skeletal Control |
192 | Poster | DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation |
193 | Poster | TPA3D: Triplane Attention for Fast Text-to-3D Generation |
194 | Poster | DECOLLAGE: 3D Detailization by Controllable, Localized, and Learned Geometry Enhancement |
195 | Poster | WordRobe: Text-Guided Generation of Textured 3D Garments |
196 | Poster | AnyHome: Open-Vocabulary Large-Scale Indoor Scene Generation with First-Person View Exploration |
197 | Poster | HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance |
198 | Poster | SENC: Handling Self-collision in Neural Cloth Simulation |
199 | Poster | AnimatableDreamer: Text-Guided Non-rigid 3D Model Generation and Reconstruction with Canonical Score Distillation |
200 | Poster | SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model |
201 | Poster | Diffusion Models as Data Mining Tools |
202 | Poster | ReMatching: Low-Resolution Representations for Scalable Shape Correspondence |
203 | Poster | PolyRoom: Room-aware Transformer for Floorplan Reconstruction |
204 | Poster | WindPoly: Polygonal Mesh Reconstruction via Winding Numbers |
205 | Poster | Hiding Imperceptible Noise in Curvature-Aware Patches for 3D Point Cloud Attack |
206 | Poster | Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion |
207 | Poster | Diffusion Bridges for 3D Point Cloud Denoising |
208 | Poster | Towards a Density Preserving Objective Function for Learning on Point Sets |
209 | Poster | Syn-to-Real Domain Adaptation for Point Cloud Completion via Part-based Approach |
210 | Poster | T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning |
211 | Poster | Text2LiDAR: Text-guided LiDAR Point Clouds Generation via Equirectangular Transformer |
212 | Poster | DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields |
213 | Poster | Computing the Lipschitz constant needed for fast scene recovery from CASSI measurements |
214 | Poster | Regularizing Dynamic Radiance Fields with Kinematic Fields |
215 | Poster | GlobalPointer: Large-Scale Plane Adjustment with Bi-Convex Relaxation |
216 | Poster | iMatching: Imperative Correspondence Learning |
217 | Poster | Fundamental Matrix Estimation Using Relative Depths |
218 | Poster | Track Everything Everywhere Fast and Robustly |
219 | Poster | Learning to Make Keypoints Sub-Pixel Accurate |
220 | Poster | Shape-guided Configuration-aware Learning for Endoscopic-image-based Pose Estimation of Flexible Robotic Instruments |
221 | Poster | FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models |
222 | Poster | Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking |
223 | Poster | Pseudo-keypoint RKHS Learning for Self-supervised 6DoF Pose Estimation |
224 | Poster | Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images |
225 | Poster | GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation |
226 | Poster | D-SCo: Dual-Stream Conditional Diffusion for Monocular Hand-Held Object Reconstruction |
227 | Poster | Event-based Head Pose Estimation: Benchmark and Method |
228 | Poster | Parameterized Quasi-Physical Simulators for Dexterous Manipulations Transfer |
229 | Poster | RAW-Adapter: Adapting Pretrained Visual Model to Camera RAW Images |
230 | Poster | Easing 3D Pattern Reasoning with Side-view Features for Semantic Scene Completion |
231 | Poster | Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions |
232 | Poster | GroCo: Ground Constraint for Metric Self-Supervised Monocular Depth |
233 | Poster | Remove Projective LiDAR Depthmap Artifacts via Exploiting Epipolar Geometry |
234 | Poster | Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network |
235 | Poster | CountFormer: Multi-View Crowd Counting Transformer |
236 | Poster | When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset |
237 | Poster | MapDistill: Boosting Efficient Camera-based HD Map Construction via Camera-LiDAR Fusion Model Distillation |
238 | Poster | 4D Contrastive Superflows are Dense 3D Representation Learners |
239 | Poster | TCC-Det: Temporarily consistent cues for weakly-supervised 3D detection |
240 | Poster | CARB-Net: Camera-Assisted Radar-Based Network for Vulnerable Road User Detection |
241 | Poster | SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving |
242 | Poster | RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception |
243 | Poster | TrafficNight: An Aerial Multimodal Benchmark For Nighttime Vehicle Surveillance |
244 | Poster | RoDUS: Robust Decomposition of Static and Dynamic Elements in Urban Scenes |
245 | Poster | Monocular Occupancy Prediction for Scalable Indoor Scenes |
246 | Poster | nuCraft: Crafting High Resolution 3D Semantic Occupancy for Unified 3D Scene Understanding |
247 | Poster | Mask2Map: Vectorized HD Map Construction Using Bird's Eye View Segmentation Masks |
248 | Oral | Mask2Map: Vectorized HD Map Construction Using Bird's Eye View Segmentation Masks |
249 | Poster | CARFF: Conditional Auto-encoded Radiance Field for 3D Scene Forecasting |
250 | Poster | Neural Volumetric World Models for Autonomous Driving |
251 | Poster | Progressive Pretext Task Learning for Human Trajectory Prediction |
252 | Poster | Risk-Aware Self-Consistent Imitation Learning for Trajectory Planning in Autonomous Driving |
253 | Poster | Safe-Sim: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries |
254 | Poster | Towards Dual Transparent Liquid Level Estimation in Biomedical Lab: Dataset, Methods and Practice |
255 | Poster | TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos |
256 | Poster | Temporally Consistent Stereo Matching |
257 | Poster | Retrieval Robust to Object Motion Blur |
258 | Poster | Deblur e-NeRF: NeRF from Motion-Blurred Events under High-speed or Low-light Conditions |
259 | Poster | CMTA: Cross-Modal Temporal Alignment for Event-guided Video Deblurring |
260 | Poster | Long-range Turbulence Mitigation: A Large-scale Dataset and A Coarse-to-fine Framework |
261 | Poster | Diffusion Reward: Learning Rewards via Conditional Video Diffusion |
262 | Poster | HUMOS: Human Motion Model Conditioned on Body Shape |
263 | Poster | PoseAugment: Generative Human Pose Data Augmentation with Physical Plausibility for IMU-based Motion Capture |
264 | Poster | Large Motion Model for Unified Multi-Modal Motion Generation |
265 | Poster | Realistic Human Motion Generation with Cross-Diffusion Models |
266 | Poster | Text Motion Translator: A Bi-Directional Model for Enhanced 3D Human Motion Generation from Open-Vocabulary Descriptions |
267 | Poster | Generating Human Interaction Motions in Scenes with Text Control |
268 | Poster | Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation |
269 | Poster | Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity |
270 | Poster | PoseCrafter: One-Shot Personalized Video Synthesis Following Flexible Pose Control |
271 | Poster | MoVideo: Motion-Aware Video Generation with Diffusion Models |
272 | Poster | FreeInit: Bridging Initialization Gap in Video Diffusion Models |
273 | Poster | DreamMotion: Space-Time Self-Similar Score Distillation for Zero-Shot Video Editing |
274 | Poster | Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion |
275 | Poster | ReNoise: Real Image Inversion Through Iterative Noising |
276 | Poster | Elegantly Written: Disentangling Writer and Character Styles for Enhancing Online Chinese Handwriting |
277 | Poster | One-Shot Diffusion Mimicker for Handwritten Text Generation |
278 | Poster | Investigating Style Similarity in Diffusion Models |
279 | Poster | DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation |
280 | Poster | PartCraft: Crafting Creative Objects by Parts |
281 | Poster | DreamDrone: Text-to-Image Diffusion Models are Zero-shot Perpetual View Generators |
282 | Poster | WAS: Dataset and Methods for Artistic Text Segmentation |
283 | Poster | GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections |
284 | Poster | PixArt-Sigma: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation |
285 | Poster | HybridBooth: Hybrid Prompt Inversion for Efficient Subject-Driven Generation |
286 | Poster | Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance |
287 | Poster | Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm |
288 | Poster | Diffusion Soup: Model Merging for Text-to-Image Diffusion Models |
289 | Poster | Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention |
290 | Poster | Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers |
291 | Poster | Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control |
292 | Poster | DEPICT: Diffusion-Enabled Permutation Importance for Image Classification Tasks |
293 | Poster | Do text-free diffusion models learn discriminative visual representations? |
294 | Poster | DataDream: Few-shot Guided Dataset Generation |
295 | Poster | DiffuMatting: Synthesizing Arbitrary Objects with Matting-level Annotation |
296 | Poster | ZeST: Zero-Shot Material Transfer from a Single Image |
297 | Poster | FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior |
298 | Poster | Learning Equilibrium Transformation for Gamut Expansion and Color Restoration |
299 | Poster | Timestep-Aware Correction for Quantized Diffusion Models |
300 | Poster | Inf-DiT: Upsampling any-resolution image with memory-efficient diffusion transformer |
301 | Poster | Energy-Calibrated VAE with Test Time Free Lunch |
302 | Poster | Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models |
303 | Poster | Prompt-Based Test-Time Real Image Dehazing: A Novel Pipeline |
304 | Poster | Asymmetric Mask Scheme for Self-Supervised Real Image Denoising |
305 | Poster | GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity |
306 | Poster | Learning Dual-Level Deformable Implicit Representation for Real-World Scale Arbitrary Super-Resolution |
307 | Poster | A New Dataset and Framework for Real-World Blurred Images Super-Resolution |
308 | Poster | Blind image deblurring with noise-robust kernel estimation |
309 | Poster | SMFANet: A Lightweight Self-Modulation Feature Aggregation Network for Efficient Image Super-Resolution |
310 | Poster | MambaIR: A Simple Baseline for Image Restoration with State-Space Model |
311 | Poster | BlazeBVD: Make Scale-Time Equalization Great Again for Blind Video Deflickering |
312 | Poster | Towards Robust Full Low-bit Quantization of Super Resolution Networks |
313 | Poster | Solving the inverse problem of microscopy deconvolution with a residual Beylkin-Coifman-Rokhlin neural network |
314 | Poster | SAH-SCI: Self-Supervised Adapter for Efficient Hyperspectral Snapshot Compressive Imaging |
315 | Poster | Adaptive Compressed Sensing with Diffusion-Based Posterior Sampling |
316 | Poster | DiffuX2CT: Diffusion Learning to Reconstruct CT Images from Biplanar X-Rays |
317 | Poster | BaSIC: BayesNet Structure Learning for Computational Scalable Neural Image Compression |
318 | Poster | SNeRV: Spectra-preserving Neural Representation for Video |
319 | Poster | Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics |
320 | Poster | Multiscale Graph Texture Network |
321 | Poster | DetailSemNet: Elevating Signature Verification through Detail-Semantic Integration |
322 | Poster | Out-of-Bounding-Box Triggers: A Stealthy Approach to Cheat Object Detectors |
323 | Poster | Fake It till You Make It: Curricular Dynamic Forgery Augmentations towards General Deepfake Detection |
324 | Poster | AdversariaLeak: External Information Leakage Attack Using Adversarial Samples on Face Recognition Systems |
325 | Poster | Continual Learning for Remote Physiological Measurement: Minimize Forgetting and Simplify Inference |
326 | Poster | NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation |
327 | Poster | Region-aware Distribution Contrast: A Novel Approach to Multi-Task Partially Supervised Learning |
328 | Poster | Large-Scale Multi-Hypotheses Cell Tracking Using Ultrametric Contours Maps |
329 | Poster | SemTrack: A Large-scale Dataset for Semantic Tracking in the Wild |
330 | Poster | DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition |
331 | Poster | CrossGLG: LLM Guides One-shot Skeleton-based 3D Action Recognition in a Cross-level Manner |
332 | Poster | Towards More Practical Group Activity Detection: A New Benchmark and Model |
333 | Poster | Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs |
334 | Poster | Online Temporal Action Localization with Memory-Augmented Transformer |
335 | Poster | EgoLifter: Open-world 3D Segmentation for Egocentric Perception |
336 | Poster | MeshSegmenter: Zero-Shot Mesh Segmentation via Texture Synthesis |
337 | Poster | Spatial-Temporal Multi-level Association for Video Object Segmentation |
338 | Poster | Gated Temporal Diffusion for Stochastic Long-term Dense Anticipation |
339 | Poster | ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders |
340 | Poster | Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment |
341 | Poster | VideoMamba: State Space Model for Efficient Video Understanding |
342 | Poster | Text-Conditioned Resampler For Long Form Video Understanding |
343 | Poster | SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding |
344 | Poster | Vamos: Versatile Action Models for Video Understanding |
345 | Poster | Goldfish: Vision-Language Understanding of Arbitrarily Long Videos |
346 | Poster | Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning |
347 | Poster | Multi-Sentence Grounding for Long-term Instructional Video |
348 | Poster | CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios |
349 | Poster | CPM: Class-conditional Prompting Machine for Audio-visual Segmentation |
350 | Poster | SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark |
351 | Poster | CityGuessr: City-Level Video Geo-Localization on a Global Scale |
352 | Poster | WRIM-Net: Wide-Ranging Information Mining Network for Visible-Infrared Person Re-Identification |
353 | Poster | Contrastive ground-level image and remote sensing pre-training improves representation learning for natural world imagery |
354 | Poster | AddressCLIP: Empowering Vision-Language Models for City-wide Image Address Localization |
355 | Poster | LingoQA: Video Question Answering for Autonomous Driving |
356 | Poster | Dolphins: Multimodal Language Model for Driving |
357 | Poster | PRET: Planning with Directed Fidelity Trajectory for Vision and Language Navigation |
358 | Poster | LLM as Copilot for Coarse-grained Vision-and-Language Navigation |
359 | Poster | Visual Grounding for Object-Level Generalization in Reinforcement Learning |
360 | Poster | m&m’s: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks |
361 | Poster | Recursive Visual Programming |
362 | Poster | Any2Point: Empowering Any-modality Transformers for Efficient 3D Understanding |
363 | Poster | Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models |
364 | Poster | HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning |
365 | Poster | REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models |
366 | Poster | ViG-Bias: Visually Grounded Bias Discovery and Mitigation |
367 | Poster | GENIXER: Empowering Multimodal Large Language Models as a Powerful Data Generator |
368 | Poster | Adversarial Prompt Tuning for Vision-Language Models |
369 | Poster | MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training |
370 | Poster | Synergy of Sight and Semantics: Visual Intention Understanding with CLIP |
371 | Poster | FlexAttention for Efficient High-Resolution Vision-Language Models |
372 | Poster | VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks |
373 | Poster | Weak-to-Strong Compositional Learning from Generative Models for Language-based Object Detection |
374 | Poster | Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment |
375 | Poster | BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues |
376 | Poster | Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights |
377 | Poster | CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts |
378 | Poster | Elevating All Zero-Shot Sketch-Based Image Retrieval Through Multimodal Prompt Learning |
379 | Poster | GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths |
380 | Oral | GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths |
381 | Poster | Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized Visual Class Discovery |
382 | Poster | Diff-Tracker: Text-to-Image Diffusion Models are Unsupervised Trackers |
383 | Poster | Trackastra: Transformer-based cell tracking for live-cell microscopy |
384 | Poster | Lost and Found: Overcoming Detector Failures in Online Multi-Object Tracking |
385 | Poster | Walker: Self-supervised Multiple Object Tracking by Walking on Temporal Object Appearance Graphs |
386 | Poster | E3V-K5: An Authentic Benchmark for Redefining Video-Based Energy Expenditure Estimation |
387 | Poster | Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition |
388 | Poster | Learning by Aligning 2D Skeleton Sequences and Multi-Modality Fusion |
389 | Poster | Occluded Gait Recognition with Mixture of Experts: An Action Detection Perspective |
390 | Poster | Stepwise Multi-grained Boundary Detector for Point-supervised Temporal Action Localization |
391 | Poster | Finding Meaning in Points: Weakly Supervised Semantic Segmentation for Event Cameras |
392 | Poster | Reliable Spatial-Temporal Voxels For Multi-Modal Test-Time Adaptation |
393 | Poster | X-Pose: Detecting Any Keypoints |
394 | Poster | Open-Set Recognition in the Age of Vision-Language Models |
395 | Poster | Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection |
396 | Poster | A Fair Ranking and New Model for Panoptic Scene Graph Generation |
397 | Oral | A Fair Ranking and New Model for Panoptic Scene Graph Generation |
398 | Poster | Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image |
399 | Poster | Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance |
400 | Poster | A Simple Background Augmentation Method for Object Detection with Diffusion Model |
401 | Poster | OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation |
402 | Poster | Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models |
403 | Poster | Agent Attention: On the Integration of Softmax and Linear Attention |
404 | Poster | WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting |
405 | Poster | Agglomerative Token Clustering |
406 | Poster | Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation |
407 | Poster | 3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance |
408 | Poster | SPIN: Hierarchical Segmentation with Subpart Granularity in Natural Images |
409 | Poster | Open-Vocabulary RGB-Thermal Semantic Segmentation |
410 | Poster | PartSTAD: 2D-to-3D Part Segmentation Task Adaptation |
411 | Poster | Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively |
412 | Poster | FREST: Feature RESToration for Semantic Segmentation under Multiple Adverse Conditions |
413 | Poster | Progressive Proxy Anchor Propagation for Unsupervised Semantic Segmentation |
414 | Poster | Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation |
415 | Poster | Evaluating the Adversarial Robustness of Semantic Segmentation: Trying Harder Pays Off |
416 | Poster | Pseudo-Embedding for Generalized Few-Shot Point Cloud Segmentation |
417 | Poster | Self-supervised co-salient object detection via feature correspondences at multiple scales |
418 | Poster | Unsupervised Dense Prediction using Differentiable Normalized Cuts |
419 | Poster | Robust Zero-Shot Crowd Counting and Localization with Adaptive Resolution SAM |
420 | Poster | Bayesian Detector Combination for Object Detection with Crowdsourced Annotations |
421 | Poster | Bridge Past and Future: Overcoming Information Asymmetry in Incremental Object Detection |
422 | Poster | Bucketed Ranking-based Losses for Efficient Training of Object Detectors |
423 | Poster | Better Regression Makes Better Test-time Adaptive 3D Object Detection |
424 | Poster | MutDet: Mutually Optimizing Pre-training for Remote Sensing Object Detection |
425 | Poster | IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection |
426 | Poster | Semi-supervised Segmentation of Histopathology Images with Noise-Aware Topological Consistency |
427 | Poster | The Devil is in the Statistics: Mitigating and Exploiting Statistics Difference for Generalizable Semi-supervised Medical Image Segmentation |
428 | Poster | A Rotation-invariant Texture ViT for Fine-Grained Recognition of Esophageal Cancer Endoscopic Ultrasound Images |
429 | Poster | Multistain Pretraining for Slide Representation Learning in Pathology |
430 | Poster | Bridging the Pathology Domain Gap: Efficiently Adapting CLIP for Pathology Image Analysis with Limited Labeled Data |
431 | Poster | HERGen: Elevating Radiology Report Generation with Longitudinal Data |
432 | Poster | Defect Spectrum: A Granular Look of Large-scale Defect Datasets with Rich Semantics |
433 | Poster | Towards Open-World Object-based Anomaly Detection via Self-Supervised Outlier Synthesis |
434 | Poster | AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection |
435 | Poster | Hierarchical Gaussian Mixture Normalizing Flow Modeling for Unified Anomaly Detection |
436 | Poster | A Unified Image Compression Method for Human Perception and Multiple Vision Tasks |
437 | Poster | FTBC: Forward Temporal Bias Correction for Optimizing ANN-SNN Conversion |
438 | Poster | Quantization-Friendly Winograd Transformations for Convolutional Neural Networks |
439 | Poster | YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information |
440 | Poster | Stripe Observation Guided Inference Cost-free Attention Mechanism |
441 | Poster | NOVUM: Neural Object Volumes for Robust Object Classification |
442 | Poster | POA: Pre-training Once for Models of All Sizes |
443 | Poster | Deep Feature Surgery: Towards Accurate and Efficient Multi-Exit Networks |
444 | Poster | Learn to Preserve and Diversify: Parameter-Efficient Group with Orthogonal Regularization for Domain Generalization |
445 | Poster | Fisher Calibration for Backdoor-Robust Heterogeneous Federated Learning |
446 | Poster | MultiDelete for Multimodal Machine Unlearning |
447 | Poster | Efficient Unsupervised Visual Representation Learning with Explicit Cluster Balancing |
448 | Poster | Multi-Label Cluster Discrimination for Visual Representation Learning |
449 | Poster | Robustness Preserving Fine-tuning using Neuron Importance |
450 | Poster | Online Zero-Shot Classification with CLIP |
451 | Poster | Understanding Multi-compositional learning in Vision and Language models via Category Theory |
452 | Poster | This Probably Looks Exactly Like That: An Invertible Prototypical Network |
453 | Poster | Rethinking Unsupervised Outlier Detection via Multiple Thresholding |
454 | Poster | Learning Non-Linear Invariants for Unsupervised Out-of-Distribution Detection |
455 | Poster | Multimodal Label Relevance Ranking via Reinforcement Learning |
456 | Poster | Confidence Self-Calibration for Multi-Label Class-Incremental Learning |
457 | Poster | MTaDCS: Moving Trace and Feature Density-based Confidence Sample Selection under Label Noise |
458 | Poster | Bidirectional Uncertainty-Based Active Learning for Open-Set Annotation |
459 | Poster | Online Continuous Generalized Category Discovery |
460 | Poster | Open-set Domain Adaptation via Joint Error based Multi-class Positive and Unlabeled Learning |
461 | Poster | UDA-Bench: Revisiting Common Assumptions in Unsupervised Domain Adaptation Using a Standardized Framework |
462 | Poster | Rethinking Few-shot Class-incremental Learning: Learning from Yourself |
463 | Poster | Versatile Incremental Learning: Towards Class and Domain-Agnostic Incremental Learning |
464 | Poster | Semantic Residual Prompts for Continual Learning |
465 | Poster | Encapsulating Knowledge in One Prompt |
466 | Poster | Representation Enhancement-Stabilization: Reducing Bias-Variance of Domain Generalization |
467 | Poster | Good Teachers Explain: Explanation-Enhanced Knowledge Distillation |
468 | Poster | PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation |
469 | Poster | Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation |
470 | Poster | Dataset Distillation by Automatic Training Trajectories |
471 | Poster | Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction |
472 | Poster | Graph Neural Network Causal Explanation via Neural Causal Models |
473 | Poster | Optimization-based Uncertainty Attribution Via Learning Informative Perturbations |
474 | Poster | Generalizable Symbolic Optimizer Learning |
475 | Poster | CLR-GAN: Improving GANs Stability and Quality via Consistent Latent Representation and Reconstruction |
476 | Poster | Nickel and Diming Your GAN: A Dual-Method Approach to Enhancing GAN Efficiency via Knowledge Distillation |
477 | Poster | Exploiting Supervised Poison Vulnerability to Strengthen Self-Supervised Defense |
478 | Poster | SSL-Cleanse: Trojan Detection and Mitigation in Self-Supervised Learning |
479 | Poster | Zero-Shot Detection of AI-Generated Images |
480 | Oral | Zero-Shot Detection of AI-Generated Images |
481 | Poster | MobileNetV4: Universal Models for the Mobile Ecosystem |
482 | Oral | MobileNetV4: Universal Models for the Mobile Ecosystem |
483 | Poster | Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation |
484 | Oral | Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation |
485 | Poster | Adaptive Parametric Activation |
486 | Oral | Adaptive Parametric Activation |
487 | Poster | CLIFF: Continual Latent Diffusion for Open-Vocabulary Object Detection |
488 | Oral | CLIFF: Continual Latent Diffusion for Open-Vocabulary Object Detection |
489 | Poster | Dataset Enhancement with Instance-Level Augmentations |
490 | Oral | Dataset Enhancement with Instance-Level Augmentations |
491 | Poster | Efficient Bias Mitigation Without Privileged Information |
492 | Oral | Efficient Bias Mitigation Without Privileged Information |
493 | Poster | On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines |
494 | Oral | On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines |
495 | Poster | Momentum Auxiliary Network for Supervised Local Learning |
496 | Oral | Momentum Auxiliary Network for Supervised Local Learning |
497 | Poster | From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition |
498 | Oral | From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition |
499 | Poster | Projecting Points to Axes: Oriented Object Detection via Point-Axis Representation |
500 | Oral | Projecting Points to Axes: Oriented Object Detection via Point-Axis Representation |
501 | Poster | Relation DETR: Exploring Explicit Position Relation Prior for Object Detection |
502 | Oral | Relation DETR: Exploring Explicit Position Relation Prior for Object Detection |
503 | Poster | ConDense: Consistent 2D-3D Pre-training for Dense and Sparse Features from Multi-View Images |
504 | Oral | ConDense: Consistent 2D-3D Pre-training for Dense and Sparse Features from Multi-View Images |
505 | Poster | ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation |
506 | Oral | ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation |
507 | Poster | COMO: Compact Mapping and Odometry |
508 | Oral | COMO: Compact Mapping and Odometry |
509 | Poster | Camera Calibration using a Collimator System |
510 | Oral | Camera Calibration using a Collimator System |
511 | Poster | Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection |
512 | Oral | Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection |
513 | Poster | Physics-Free Spectrally Multiplexed Photometric Stereo under Unknown Spectral Composition |
514 | Oral | Physics-Free Spectrally Multiplexed Photometric Stereo under Unknown Spectral Composition |
515 | Poster | SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments |
516 | Oral | SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments |
517 | Poster | Smoothness, Synthesis, and Sampling: Re-thinking Unsupervised Multi-View Stereo with DIV Loss |
518 | Oral | Smoothness, Synthesis, and Sampling: Re-thinking Unsupervised Multi-View Stereo with DIV Loss |
519 | Poster | Six-Point Method for Multi-Camera Systems with Reduced Solution Space |
520 | Oral | Six-Point Method for Multi-Camera Systems with Reduced Solution Space |
521 | Poster | Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer |
522 | Oral | Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer |
523 | Poster | Grounding Image Matching in 3D with MASt3R |
524 | Oral | Grounding Image Matching in 3D with MASt3R |
525 | Poster | EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis |
526 | Oral | EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis |
527 | Poster | TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering |
528 | Oral | TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering |
529 | Poster | Accelerating Image Generation with Sub-path Linear Approximation Model |
530 | Oral | Accelerating Image Generation with Sub-path Linear Approximation Model |
531 | Poster | SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation |
532 | Oral | SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation |
533 | Poster | Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos |
534 | Oral | Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos |
535 | Poster | LLMGA: Multimodal Large Language Model based Generation Assistant |
536 | Oral | LLMGA: Multimodal Large Language Model based Generation Assistant |
537 | Poster | FlashTex: Fast Relightable Mesh Texturing with LightControlNet |
538 | Oral | FlashTex: Fast Relightable Mesh Texturing with LightControlNet |
539 | Poster | Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture |
540 | Oral | Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture |
541 | Poster | TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation |
542 | Oral | TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation |
543 | Poster | EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions |
544 | Poster | EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head |
545 | Poster | 3D Gaussian Parametric Head Model |
546 | Poster | Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos |
547 | Poster | RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models |
548 | Poster | PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations |
549 | Poster | COMPOSE: Comprehensive Portrait Shadow Editing |
550 | Poster | GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval |
551 | Poster | Optimizing Illuminant Estimation in Dual-Exposure HDR Imaging |
552 | Poster | Holodepth: Programmable Depth-Varying Projection via Computer-Generated Holography |
553 | Poster | BeNeRF: Neural Radiance Fields from a Single Blurry Image and Event Stream |
554 | Poster | VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors |
555 | Poster | G3R: Gradient Guided Generalizable Reconstruction |
556 | Poster | Efficient NeRF Optimization - Not All Samples Remain Equally Hard |
557 | Poster | BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling |
558 | Poster | SlotLifter: Slot-guided Feature Lifting for Learning Object-Centric Radiance Fields |
559 | Poster | RS-NeRF: Neural Radiance Fields from Rolling Shutter Images |
560 | Poster | Geometry Fidelity for Spherical Images |
561 | Poster | CPT-VR: Improving Surface Rendering via Closest Point Transform with View-Reflection Appearance |
562 | Poster | MetaCap: Meta-learning Priors from Multi-View Imagery for Sparse-view Human Performance Capture and Rendering |
563 | Poster | Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis |
564 | Poster | GGRt: Towards Generalizable 3D Gaussians without Pose Priors in Real-Time |
565 | Poster | Neural graphics texture compression supporting random access |
566 | Poster | GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views |
567 | Poster | A Compact Dynamic 3D Gaussian Representation for Real-Time Dynamic View Synthesis |
568 | Poster | Click-Gaussian: Interactive Segmentation to Any 3D Gaussians |
569 | Poster | McGrids: Monte Carlo-Driven Adaptive Grids for Iso-Surface Extraction |
570 | Poster | latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction |
571 | Poster | Non-parametric Sensor Noise Modeling and Synthesis |
572 | Poster | UpFusion: Novel View Diffusion from Unposed Sparse View Observations |
573 | Poster | MVDD: Multi-View Depth Diffusion Models |
574 | Poster | LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation |
575 | Oral | LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation |
576 | Poster | Hypernetworks for Generalizable BRDF Representation |
577 | Poster | High-Fidelity 3D Textured Shapes Generation by Sparse Encoding and Adversarial Decoding |
578 | Poster | Structured-NeRF: Hierarchical Scene Graph with Neural Representation |
579 | Poster | 3D-GOI: 3D GAN Omni-Inversion for Multifaceted and Multi-object Editing |
580 | Poster | Free-Editor: Zero-shot Text-driven 3D Scene Editing |
581 | Poster | Texture-GS: Disentangle the Geometry and Texture for 3D Gaussian Splatting Editing |
582 | Poster | VCD-Texture: Variance Alignment based 3D-2D Co-Denoising for Text-Guided Texturing |
583 | Poster | UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation |
584 | Poster | ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation |
585 | Poster | DreamView: Injecting View-specific Text Guidance into Text-to-3D Generation |
586 | Poster | SceneTeller: Language-to-3D Scene Generation |
587 | Poster | Text to Layer-wise 3D Clothed Human Generation |
588 | Poster | ShoeModel: Learning to Wear on the User-specified Shoes via Diffusion Model |
589 | Poster | D4-VTON: Dynamic Semantics Disentangling for Differential Diffusion based Virtual Try-On |
590 | Poster | Within the Dynamic Context: Inertia-aware 3D Human Modeling with Pose Sequence |
591 | Poster | Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos |
592 | Poster | Temporal Residual Jacobians for Rig-free Motion Transfer |
593 | Poster | PosterLlama: Bridging Design Ability of Language Model to Content-Aware Layout Generation |
594 | Poster | GroundUp: Rapid Sketch-Based 3D City Massing |
595 | Poster | DiscoMatch: Fast Discrete Optimisation for Geometrically Consistent 3D Shape Matching |
596 | Poster | FRI-Net: Floorplan Reconstruction via Room-wise Implicit Representation |
597 | Poster | PointNeRF++: A multi-scale, point-based Neural Radiance Field |
598 | Poster | Continuous SO(3) Equivariant Convolution for 3D Point Cloud Analysis |
599 | Poster | UMERegRobust – Universal Manifold Embedding Compatible Features for Robust Point Cloud Registration |
600 | Poster | FrePolad: Frequency-Rectified Point Latent Diffusion for Point Cloud Generation |
601 | Poster | Learning to Adapt SAM for Segmenting Cross-domain Point Clouds |
602 | Poster | Osmosis: RGBD Diffusion Prior for Underwater Image Restoration |
603 | Poster | Differentiable Product Quantization for Memory Efficient Camera Relocalization |
604 | Poster | RING-NeRF: Rethinking Inductive Biases for Versatile and Efficient Neural Fields |
605 | Poster | Light-in-Flight for a World-in-Motion |
606 | Poster | Binomial Self-compensation for Motion Error in Dynamic 3D Scanning |
607 | Poster | Non-Line-of-Sight Estimation of Fast Human Motion with Slow Scanning Imagers |
608 | Poster | Synchronization of Projective Transformations |
609 | Poster | Semicalibrated Relative Pose from an Affine Correspondence and Monodepth |
610 | Poster | GMM-IKRS: Gaussian Mixture Models for Interpretable Keypoint Refinement and Scoring |
611 | Poster | LRSLAM: Low-rank Representation of Signed Distance Fields in Dense Visual SLAM System |
612 | Poster | SRPose: Two-view Relative Pose Estimation with Sparse Keypoints |
613 | Poster | Alignist: CAD-Informed Orientation Distribution Estimation by Fusing Shape and Correspondences |
614 | Poster | U-COPE: Taking a Further Step to Universal 9D Category-level Object Pose Estimation |
615 | Poster | EgoPoseFormer: A Simple Baseline for Stereo Egocentric 3D Human Pose Estimation |
616 | Poster | Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot |
617 | Poster | Cut out the Middleman: Revisiting Pose-based Gait Recognition |
618 | Poster | Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection? |
619 | Poster | EgoPoser: Robust Real-Time Egocentric Pose Estimation from Sparse and Intermittent Observations Everywhere |
620 | Poster | 3D Hand Sequence Recovery from Real Blurry Images and Event Stream |
621 | Poster | Dense Hand-Object (HO) GraspNet with Full Grasping Taxonomy and Dynamics |
622 | Poster | Learning Cross-hand Policies of High-DOF Reaching and Grasping |
623 | Poster | Free-Viewpoint Video of Outdoor Sports Using a Drone |
624 | Poster | Unsupervised Exposure Correction |
625 | Poster | Improving Domain Generalization in Self-Supervised Monocular Depth Estimation via Stabilized Adversarial Training |
626 | Poster | Deep Cost Ray Fusion for Sparse Depth Video Completion |
627 | Poster | PatchRefiner: Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation |
628 | Poster | Depth on Demand: Streaming Dense Depth from a Low Frame Rate Active Sensor |
629 | Poster | UniCal: Unified Neural Sensor Calibration |
630 | Poster | Multi-modal Crowd Counting via a Broker Modality |
631 | Poster | OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection |
632 | Poster | FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection |
633 | Poster | MARs: Multi-view Attention Regularizations for Patch-based Feature Recognition of Space Terrain |
634 | Poster | SparseRadNet: Sparse Perception Neural Network on Subsampled Radar Data |
635 | Poster | UniM2AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving |
636 | Poster | DeTra: A Unified Model for Object Detection and Trajectory Forecasting |
637 | Poster | RoScenes: A Large-scale Multi-view 3D Dataset for Roadside Perception |
638 | Poster | Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting |
639 | Poster | PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines |
640 | Poster | Sparse Refinement for Efficient High-Resolution Semantic Segmentation |
641 | Poster | InsMapper: Exploring Inner-instance Information for Vectorized HD Mapping |
642 | Poster | PreSight: Enhancing Autonomous Vehicle Perception with City-Scale NeRF Priors |
643 | Poster | Unified Local-Cloud Decision-Making via Reinforcement Learning |
644 | Poster | Generative End-to-End Autonomous Driving |
645 | Poster | MART: MultiscAle Relational Transformer Networks for Multi-agent Trajectory Prediction |
646 | Poster | Improving Agent Behaviors with RL Fine-tuning for Autonomous Driving |
647 | Poster | LayeredFlow: A Real-World Benchmark for Non-Lambertian Multi-Layer Optical Flow |
648 | Poster | Decomposition Betters Tracking Everything Everywhere |
649 | Poster | Match-Stereo-Videos: Bidirectional Alignment for Consistent Dynamic Stereo Matching |
650 | Poster | Efficient Learning of Event-based Dense Representation using Hierarchical Memories with Adaptive Update |
651 | Poster | Towards Real-world Event-guided Low-light Video Enhancement and Deblurring |
652 | Poster | Understanding Physical Dynamics with Counterfactual World Modeling |
653 | Poster | Prompting Future Driven Diffusion Model for Hand Motion Prediction |
654 | Poster | Nymeria: A Massive Collection of Egocentric Multi-modal Human Motion in the Wild |
655 | Poster | Motion Mamba: Efficient and Long Sequence Motion Generation |
656 | Poster | TLControl: Trajectory and Language Control for Human Motion Synthesis |
657 | Poster | ParCo: Part-Coordinating Text-to-Motion Synthesis |
658 | Poster | BAMM: Bidirectional Autoregressive Motion Model |
659 | Poster | Pose Guided Fine-Grained Sign Language Video Generation |
660 | Poster | DreamMover: Leveraging the Prior of Diffusion Models for Image Interpolation with Large Motion |
661 | Poster | Animate Your Motion: Turning Still Images into Dynamic Videos |
662 | Poster | V-Trans4Style: Visual Transition Recommendation for Video Production Style Adaptation |
663 | Poster | DragVideo: Interactive Drag-style Video Editing |
664 | Poster | StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion |
665 | Poster | MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing |
666 | Poster | FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing |
667 | Poster | Lazy Diffusion Transformer for Interactive Image Editing |
668 | Poster | WaSt-3D: Wasserstein-2 Distance for Scene-to-Scene Stylization on 3D Gaussians |
669 | Poster | Layered Rendering Diffusion Model for Controllable Zero-Shot Image Synthesis |
670 | Poster | Commonly Interesting Images |
671 | Poster | InstructGIE: Towards Generalizable Image Editing |
672 | Poster | The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization |
673 | Poster | CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control & Altering of T2I Models |
674 | Poster | Zero-shot Text-guided Infinite Image Synthesis with LLM guidance |
675 | Poster | Improving Text-guided Object Inpainting with Semantic Pre-inpainting |
676 | Poster | Customized Generation Reimagined: Fidelity and Editability Harmonized |
677 | Poster | ColorPeel: Color Prompt Learning with Diffusion Models via Color and Shape Disentanglement |
678 | Poster | ViPer: Visual Personalization of Generative Models via Individual Preference Learning |
679 | Poster | MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices |
680 | Poster | MasterWeaver: Taming Editability and Face Identity for Personalized Text-to-Image Generation |
681 | Poster | Towards Reliable Advertising Image Generation Using Human Feedback |
682 | Poster | IMMA: Immunizing text-to-image Models against Malicious Adaptation |
683 | Poster | PreciseControl: Enhancing Text-To-Image Diffusion Models with Fine-Grained Attribute Control |
684 | Poster | AddMe: Zero-shot Group-photo Synthesis by Inserting People into Scenes |
685 | Poster | UniProcessor: A Text-induced Unified Low-level Image Processor |
686 | Poster | Iterative Ensemble Training with Anti-Gradient Control for Mitigating Memorization in Diffusion Models |
687 | Poster | EBDM: Exemplar-guided Image Translation with Brownian-bridge Diffusion Models |
688 | Poster | Assessing Sample Quality via the Latent Space of Generative Models |
689 | Poster | Mixture of Efficient Diffusion Experts Through Automatic Interval and Sub-Network Selection |
690 | Poster | SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers |
691 | Poster | Efficient Training with Denoised Neural Weights |
692 | Poster | FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis |
693 | Poster | A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting |
694 | Poster | Unleashing the Potential of the Semantic Latent Space in Diffusion Models for Image Dehazing |
695 | Poster | DSMix: Distortion-Induced Saliency Map Based Pre-training for No-Reference Image Quality Assessment |
696 | Poster | DiffBIR: Toward Blind Image Restoration with Generative Diffusion Prior |
697 | Poster | Restoring Images in Adverse Weather Conditions via Histogram Transformer |
698 | Poster | You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation |
699 | Poster | Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution |
700 | Poster | Efficient Cascaded Multiscale Adaptive Network for Image Restoration |
701 | Poster | Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation |
702 | Poster | Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors |
703 | Poster | Taming Lookup Tables for Efficient Image Retouching |
704 | Poster | Quanta Video Restoration |
705 | Poster | Two-Stage Video Shadow Detection via Temporal-Spatial Adaption |
706 | Poster | Handling The Non-Smooth Challenge in Tensor SVD: A Multi-Objective Tensor Recovery Framework |
707 | Poster | Identity-Consistent Diffusion Network for Grading Knee Osteoarthritis Progression in Radiographic Imaging |
708 | Poster | NePhi: Neural Deformation Fields for Approximately Diffeomorphic Medical Image Registration |
709 | Poster | Neural Metamorphosis |
710 | Poster | Online Video Quality Enhancement with Spatial-Temporal Look-up Tables |
711 | Poster | EAS-SNN: End-to-End Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks |
712 | Poster | LaWa: Using Latent Space for In-Generation Image Watermarking |
713 | Poster | PairingNet: A Learning-based Pair-searching and -matching Network for Image Fragments |
714 | Poster | Delving into Adversarial Robustness on Document Tampering Localization |
715 | Poster | Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities |
716 | Poster | Forbes: Face Obfuscation Rendering via Backpropagation Refinement Scheme |
717 | Poster | Prediction Exposes Your Face: Black-box Model Inversion via Prediction Alignment |
718 | Poster | Generalizable Facial Expression Recognition |
719 | Poster | Ex2Eg-MAE: A Framework for Adaptation of Exocentric Video Masked Autoencoders for Egocentric Social Role Understanding |
720 | Poster | MinD-3D: Reconstruct High-quality 3D objects in Human Brain |
721 | Poster | Pathformer3D: A 3D Scanpath Transformer for 360° Images |
722 | Poster | Eliminating Warping Shakes for Unsupervised Online Video Stitching |
723 | Poster | OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework |
724 | Poster | Semantically Guided Representation Learning For Action Anticipation |
725 | Poster | SIGMA: Sinkhorn-Guided Masked Video Modeling |
726 | Poster | Rethinking Image-to-Video Adaptation: An Object-centric Perspective |
727 | Poster | RICA^2: Rubric-Informed, Calibrated Assessment of Actions |
728 | Poster | VideoStudio: Generating Consistent-Content and Multi-Scene Videos |
729 | Poster | Training-free Video Temporal Grounding using Large-scale Pre-trained Models |
730 | Poster | EA-VTR: Event-Aware Video-Text Retrieval |
731 | Poster | Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data |
732 | Poster | FunQA: Towards Surprising Video Comprehension |
733 | Poster | Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment |
734 | Poster | Efficient Pre-training for Localized Instruction Generation of Procedural Videos |
735 | Poster | Learning Trimodal Relation for Audio-Visual Question Answering with Missing Modality |
736 | Poster | Can Textual Semantics Mitigate Sounding Object Segmentation Preference? |
737 | Poster | Visual Alignment Pre-training for Sign Language Translation |
738 | Poster | Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach |
739 | Poster | Spectral Subsurface Scattering for Material Classification |
740 | Poster | MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning |
741 | Poster | MeshVPR: Citywide Visual Place Recognition Using 3D Meshes |
742 | Poster | Frontier-enhanced Topological Memory with Improved Exploration Awareness for Embodied Visual Navigation |
743 | Poster | Asynchronous Large Language Model Enhanced Planner for Autonomous Driving |
744 | Poster | Controllable Navigation Instruction Generation with Chain of Thought Prompting |
745 | Poster | NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models |
746 | Poster | Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching |
747 | Poster | INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding |
748 | Poster | SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding |
749 | Poster | Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs |
750 | Poster | Quality Assured: Rethinking Annotation Strategies in Imaging AI |
751 | Poster | BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models |
752 | Poster | Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training |
753 | Poster | A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis |
754 | Poster | Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training |
755 | Poster | DEAL: Disentangle and Localize Concept-level Explanations for VLMs |
756 | Poster | Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models |
757 | Poster | FineMatch: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction |
758 | Poster | Instruction Tuning-free Visual Token Complement for Multimodal LLMs |
759 | Poster | IVTP: Instruction-guided Visual Token Pruning for Large Vision-Language Models |
760 | Poster | LookupViT: Compressing visual information to a limited number of tokens |
761 | Poster | SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models |
762 | Poster | Integration of Global and Local Representations for Fine-grained Cross-modal Alignment |
763 | Poster | Textual-Visual Logic Challenge: Understanding and Reasoning in Text-to-Image Generation |
764 | Poster | MyVLM: Personalizing VLMs for User-Specific Queries |
765 | Poster | ShareGPT4V: Improving Large Multi-Modal Models with Better Captions |
766 | Poster | View Selection for 3D Captioning via Diffusion Ranking |
767 | Poster | GRiT: A Generative Region-to-text Transformer for Object Understanding |
768 | Poster | FreestyleRet: Retrieving Images from Style-Diversified Queries |
769 | Poster | LG-Gaze: Learning Geometry-aware Continuous Prompts for Language-Guided Gaze Estimation |
770 | Poster | OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction |
771 | Poster | Three Things We Need to Know About Transferring Stable Diffusion to Visual Dense Prediction Tasks |
772 | Poster | TAG: Text Prompt Augmentation for Zero-Shot Out-of-Distribution Detection |
773 | Poster | Centering the Value of Every Modality: Towards Efficient and Resilient Modality-agnostic Semantic Segmentation |
774 | Poster | Textual Grounding for Open-vocabulary Visual Information Extraction in Layout-diversified Documents |
775 | Poster | Region-centric Image-Language Pretraining for Open-Vocabulary Detection |
776 | Poster | Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments |
777 | Poster | Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding |
778 | Poster | Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model |
779 | Poster | Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation |
780 | Poster | SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding |
781 | Poster | PSALM: Pixelwise Segmentation with Large Multi-modal Model |
782 | Poster | Grid-Attention: Enhancing Computational Efficiency of Large Vision Models without Fine-Tuning |
783 | Poster | OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation |
784 | Poster | On the Viability of Monocular Depth Pre-training for Semantic Segmentation |
785 | Poster | Rethinking and Improving Visual Prompt Selection for In-Context Learning Segmentation Framework |
786 | Poster | Open-Vocabulary Camouflaged Object Segmentation |
787 | Poster | From Pixels to Objects: A Hierarchical Approach for Part and Object Segmentation Using Local and Global Aggregation |
788 | Poster | 3x2: 3D Object Part Segmentation by 2D Semantic Correspondences |
789 | Poster | Train Till You Drop: Towards Stable and Robust Source-free Unsupervised 3D Domain Adaptation |
790 | Poster | Make a Strong Teacher with Label Assistance: A Novel Knowledge Distillation Approach for Semantic Segmentation |
791 | Poster | Mitigating Background Shift in Class-Incremental Semantic Segmentation |
792 | Poster | LASS3D: Language-Assisted Semi-Supervised 3D Semantic Segmentation with Progressive Unreliable Data Exploitation |
793 | Poster | Point-supervised Panoptic Segmentation via Estimating Pseudo Labels from Learnable Distance |
794 | Poster | Diffusion Model for Robust Multi-Sensor Fusion in 3D Object Detection and BEV Segmentation |
795 | Poster | Zero-shot Object Counting with Good Exemplars |
796 | Poster | SMILe: Leveraging Submodular Mutual Information For Robust Few-Shot Object Detection |
797 | Poster | Enhancing Source-Free Domain Adaptive Object Detection with Low-confidence Pseudo Label Distillation |
798 | Poster | MonoTTA: Fully Test-Time Adaptation for Monocular 3D Object Detection |
799 | Poster | AugDETR: Improving Multi-scale Learning for Detection Transformer |
800 | Poster | Urban Waterlogging Detection: A Challenging Benchmark and Large-Small Model Co-Adapter |
801 | Poster | DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion |
802 | Poster | PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation |
803 | Poster | ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image |
804 | Poster | Attention-Challenging Multiple Instance Learning for Whole Slide Image Classification |
805 | Poster | GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation |
806 | Poster | R3D-AD: Reconstruction via Diffusion for 3D Anomaly Detection |
807 | Poster | Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation |
808 | Poster | Continuous Memory Representation for Anomaly Detection |
809 | Poster | Learning Anomalies with Normality Prior for Unsupervised Video Anomaly Detection |
810 | Poster | Superpixel-informed Implicit Neural Representation for Multi-Dimensional Data |
811 | Poster | Comprehensive Attribution: Inherently Explainable Vision Model with Feature Detector |
812 | Poster | Fairness-aware Vision Transformer via Debiased Self-Attention |
813 | Poster | AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer |
814 | Poster | LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors |
815 | Poster | Deep Nets with Subsampling Layers Unwittingly Discard Useful Activations at Test-Time |
816 | Poster | Modality Translation for Object Detection Adaptation without forgetting prior knowledge |
817 | Poster | Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition |
818 | Poster | Scaling Backwards: Minimal Synthetic Pre-training? |
819 | Poster | EntAugment: Entropy-Driven Adaptive Data Augmentation Framework for Image Classification |
820 | Poster | Training-Free Model Merging for Multi-target Domain Adaptation |
821 | Poster | CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning |
822 | Poster | Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning |
823 | Poster | Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation |
824 | Poster | Semantic-guided Robustness Tuning for Few-Shot Transfer Across Extreme Domain Shift |
825 | Poster | Explain via Any Concept: Concept Bottleneck Model with Open Vocabulary Concepts |
826 | Poster | Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models |
827 | Poster | FlowCon: Out-of-Distribution Detection using Flow-based Contrastive Learning |
828 | Poster | PixOOD: Pixel-Level Out-of-Distribution Detection |
829 | Poster | Distributionally Robust Loss for Long-Tailed Multi-Label Image Classification |
830 | Poster | Improving 3D Semi-supervised Learning by Effectively Utilizing All Unlabelled Data |
831 | Poster | GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition |
832 | Poster | Generalized Coverage for More Robust Low-Budget Active Learning |
833 | Poster | Robust Nearest Neighbors for Source-Free Domain Adaptation under Class Distribution Shift |
834 | Poster | Category Adaptation Meets Projected Distillation in Generalized Continual Category Discovery |
835 | Poster | CroMo-Mixup: Augmenting Cross-Model Representations for Continual Self-Supervised Learning |
836 | Poster | Disentangling Masked Autoencoders for Unsupervised Domain Generalization |
837 | Poster | Class-Incremental Learning with CLIP: Adaptive Representation Adjustment and Parameter Fusion |
838 | Poster | Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding |
839 | Poster | Information Bottleneck Based Data Correction in Continual Learning |
840 | Poster | Beyond Prompt Learning: Continual Adapter for Efficient Rehearsal-Free Continual Learning |
841 | Poster | Markov Knowledge Distillation: Make Nasty Teachers trained by Self-undermining Knowledge Distillation Fully Distillable |
842 | Poster | FedRA: A Random Allocation Strategy for Federated Tuning to Unleash the Power of Heterogeneous Clients |
843 | Poster | SkyMask: Attack-agnostic Robust Federated Learning with Fine-grained Learnable Masks |
844 | Poster | SuperFedNAS: Cost-Efficient Federated Neural Architecture Search for On-Device Inference |
845 | Poster | Adversarially Robust Distillation by Reducing the Student-Teacher Variance Gap |
846 | Poster | Uncertainty Calibration with Energy Based Instance-wise Scaling in the Wild Dataset |
847 | Poster | Preventing Catastrophic Overfitting in Fast Adversarial Training: A Bi-level Optimization Perspective |
848 | Poster | Catastrophic Overfitting: A Potential Blessing in Disguise |
849 | Poster | Cocktail Universal Adversarial Attack on Deep Neural Networks |
850 | Poster | Unveiling Privacy Risks in Stochastic Neural Networks Training: Effective Image Reconstruction from Gradients |
851 | Poster | Rethinking Data Bias: Dataset Copyright Protection via Embedding Class-wise Hidden Bias |
852 | Poster | CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing |
853 | Poster | Parrot Captions Teach CLIP to Spot Text |
854 | Oral | Parrot Captions Teach CLIP to Spot Text |
855 | Poster | Towards Model-Agnostic Dataset Condensation by Heterogeneous Models |
856 | Oral | Towards Model-Agnostic Dataset Condensation by Heterogeneous Models |
857 | Poster | VETRA: A Dataset for Vehicle Tracking in Aerial Imagery - New Challenges for Multi-Object Tracking |
858 | Oral | VETRA: A Dataset for Vehicle Tracking in Aerial Imagery - New Challenges for Multi-Object Tracking |
859 | Poster | Insect Identification in the Wild: The AMI Dataset |
860 | Oral | Insect Identification in the Wild: The AMI Dataset |
861 | Poster | Towards Open-ended Visual Quality Comparison |
862 | Oral | Towards Open-ended Visual Quality Comparison |
863 | Poster | UniIR: Training and Benchmarking Universal Multimodal Information Retrievers |
864 | Oral | UniIR: Training and Benchmarking Universal Multimodal Information Retrievers |
865 | Poster | MarineInst: A Foundation Model for Marine Image Analysis with Instance Visual Description |
866 | Oral | MarineInst: A Foundation Model for Marine Image Analysis with Instance Visual Description |
867 | Poster | Adaptive Correspondence Scoring for Unsupervised Medical Image Registration |
868 | Oral | Adaptive Correspondence Scoring for Unsupervised Medical Image Registration |
869 | Poster | Revisiting Adaptive Cellular Recognition Under Domain Shifts: A Contextual Correspondence View |
870 | Oral | Revisiting Adaptive Cellular Recognition Under Domain Shifts: A Contextual Correspondence View |
871 | Poster | Knowledge-enhanced Visual-Language Pretraining for Computational Pathology |
872 | Oral | Knowledge-enhanced Visual-Language Pretraining for Computational Pathology |
873 | Poster | SparseSSP: 3D Subcellular Structure Prediction from Sparse-View Transmitted Light Images |
874 | Oral | SparseSSP: 3D Subcellular Structure Prediction from Sparse-View Transmitted Light Images |
875 | Poster | CardiacNet: Learning to Reconstruct Abnormalities for Cardiac Disease Assessment from Echocardiogram Videos |
876 | Oral | CardiacNet: Learning to Reconstruct Abnormalities for Cardiac Disease Assessment from Echocardiogram Videos |
877 | Poster | PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology |
878 | Oral | PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology |
879 | Poster | PointLLM: Empowering Large Language Models to Understand Point Clouds |
880 | Oral | PointLLM: Empowering Large Language Models to Understand Point Clouds |
881 | Poster | HGL: Hierarchical Geometry Learning for Test-time Adaptation in 3D Point Cloud Segmentation |
882 | Oral | HGL: Hierarchical Geometry Learning for Test-time Adaptation in 3D Point Cloud Segmentation |
883 | Poster | Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather |
884 | Oral | Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather |
885 | Poster | RISurConv: Rotation Invariant Surface Attention-Augmented Convolutions for 3D Point Cloud Classification and Segmentation |
886 | Oral | RISurConv: Rotation Invariant Surface Attention-Augmented Convolutions for 3D Point Cloud Classification and Segmentation |
887 | Poster | RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation |
888 | Oral | RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation |
889 | Poster | KeypointDETR: An End-to-End 3D Keypoint Detector |
890 | Oral | KeypointDETR: An End-to-End 3D Keypoint Detector |
891 | Poster | All You Need is Your Voice: Emotional Face Representation with Audio Perspective for Emotional Talking Face Generation |
892 | Poster | TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting |
893 | Poster | HeadGaS: Real-Time Animatable Head Avatars via 3D Gaussian Splatting |
894 | Poster | Stable Video Portraits |
895 | Poster | iHuman: Instant Animatable Digital Humans From Monocular Videos |
896 | Poster | POCA: Post-training Quantization with Temporal Alignment for Codec Avatars |
897 | Poster | Towards Image Ambient Lighting Normalization |
898 | Poster | LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models |
899 | Poster | Efficient Snapshot Spectral Imaging: Calibration-Free Parallel Structure with Aperture Diffraction Fusion |
900 | Poster | Physically Plausible Color Correction for Neural Radiance Fields |
901 | Poster | DecentNeRFs: Decentralized Neural Radiance Fields from Crowdsourced Images |
902 | Poster | Volumetric Rendering with Baked Quadrature Fields |
903 | Poster | Depth-guided NeRF Training via Earth Mover’s Distance |
904 | Poster | RoGUENeRF: A Robust Geometry-Consistent Universal Enhancer for NeRF |
905 | Poster | Deblurring 3D Gaussian Splatting |
906 | Poster | Distractor-Free Novel View Synthesis via Exploiting Memorization Effect in Optimization |
907 | Poster | TriNeRFLet: A Wavelet Based Triplane NeRF Representation |
908 | Poster | LaRa: Efficient Large-Baseline Radiance Fields |
909 | Poster | RANRAC: Robust Neural Scene Representations via Random Ray Consensus |
910 | Poster | SparseCraft: Few-Shot Neural Reconstruction through Stereopsis Guided Geometric Linearization |
911 | Poster | Learning Representations from Foundation Models for Domain Generalized Stereo Matching |
912 | Poster | CoR-GS: Sparse-View 3D Gaussian Splatting via Co-Regularization |
913 | Poster | CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field |
914 | Poster | SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction |
915 | Poster | On the Error Analysis of 3D Gaussian Splatting and an Optimal Projection Strategy |
916 | Poster | Revising Densification in Gaussian Splatting |
917 | Poster | MesonGS: Post-training Compression of 3D Gaussians via Efficient Attribute Transformation |
918 | Poster | Topology-Preserving Downsampling of Binary Images |
919 | Poster | Zero-Shot Multi-Object Scene Completion |
920 | Poster | PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance |
921 | Poster | VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models |
922 | Poster | Analysis-by-Synthesis Transformer for Single-View 3D Reconstruction |
923 | Poster | Decomposition of Neural Discrete Representations for Large-Scale 3D Mapping |
924 | Poster | COSMU: Complete 3D human shape from monocular unconstrained images |
925 | Poster | MeshFeat: Multi-Resolution Features for Neural Fields on Meshes |
926 | Poster | Real-time 3D-aware Portrait Editing from a Single Image |
927 | Poster | An Optimization Framework to Enforce Multi-View Consistency for Texturing 3D Meshes |
928 | Poster | RoomTex: Texturing Compositional Indoor Scenes via Iterative Inpainting |
929 | Poster | Scene-Conditional 3D Object Stylization and Composition |
930 | Poster | DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling |
931 | Poster | BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion |
932 | Poster | Chains of Diffusion Models |
933 | Poster | NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation |
934 | Poster | Learning Neural Deformation Representation for 4D Dynamic Shape Generation |
935 | Poster | Improving Diffusion Models for Authentic Virtual Try-on in the Wild |
936 | Poster | Towards High-Quality 3D Motion Transfer with Realistic Apparel Animation |
937 | Poster | GIVT: Generative Infinite-Vocabulary Transformers |
938 | Poster | Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians |
939 | Poster | LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer |
940 | Poster | ZigMa: A DiT-style Zigzag Mamba Diffusion Model |
941 | Poster | Deep Diffusion Image Prior for Efficient OOD Adaptation in 3D Inverse Problems |
942 | Poster | Neural Surface Detection for Unsigned Distance Fields |
943 | Poster | VF-NeRF: Viewshed Fields for Rigid NeRF Registration |
944 | Poster | Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration |
945 | Oral | Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration |
946 | Poster | Transferable 3D Adversarial Shape Completion using Diffusion Models |
947 | Poster | Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation |
948 | Poster | PointRegGPT: Boosting 3D Point Cloud Registration using Generative Point-Cloud Pairs for Training |
949 | Poster | Progressive Classifier and Feature Extractor Adaptation for Unsupervised Domain Adaptation on Point Clouds |
950 | Poster | Domain Generalization of 3D Object Detection by Density-Resampling |
951 | Poster | Heterogeneous Graph Learning for Scene Graph Prediction in 3D Point Clouds |
952 | Poster | Physics-informed Knowledge Transfer for Underwater Monocular Depth Estimation |
953 | Poster | Improving 2D Feature Representations by 3D-Aware Fine-Tuning |
954 | Poster | SpaceJAM: a Lightweight and Regularization-free Method for Fast Joint Alignment of Images |
955 | Poster | 3D Congealing: 3D-Aware Image Alignment in the Wild |
956 | Poster | Reprojection Errors as Prompts for Efficient Scene Coordinate Regression |
957 | Poster | Revisiting Calibration of Wide-Angle Radially Symmetric Cameras |
958 | Poster | RGBD GS-ICP SLAM |
959 | Poster | FastCAD: Real-Time CAD Retrieval and Alignment from Scans and Videos |
960 | Poster | GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence |
961 | Poster | Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation |
962 | Poster | Rotated Orthographic Projection for Self-Supervised 3D Human Pose Estimation |
963 | Poster | Diffusion Model is a Good Pose Estimator from 3D RF-Vision |
964 | Poster | Occlusion Handling in 3D Human Pose Estimation with Perturbed Positional Encoding |
965 | Poster | Coarse-to-Fine Implicit Representation Learning for 3D Hand-Object Reconstruction from a Single RGB-D Image |
966 | Poster | 3D Reconstruction of Objects in Hands without Real World 3D Supervision |
967 | Poster | Weakly-Supervised 3D Hand Reconstruction with Knowledge Prior and Uncertainty Guidance |
968 | Poster | MANIKIN: Biomechanically Accurate Neural Inverse Kinematics for Human Motion Estimation |
969 | Poster | Local Occupancy-Enhanced Object Grasping with Multiple Triplanar Projection |
970 | Poster | GraspXL: Generating Grasping Motions for Diverse Objects at Scale |
971 | Poster | HSR: Holistic 3D Human-Scene Reconstruction from Monocular Videos |
972 | Poster | Object-Aware NIR-to-Visible Translation |
973 | Poster | SEDiff: Structure Extraction for Domain Adaptive Depth Estimation via Denoising Diffusion Models |
974 | Poster | Sparse Beats Dense: Rethinking Supervision in Radar-Camera Depth Completion |
975 | Poster | Camera Height Doesn't Change: Unsupervised Training for Metric Monocular Road-Scene Depth Estimation |
976 | Poster | Adapting Fine-Grained Cross-View Localization to Areas without Fine Ground Truth |
977 | Poster | DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment |
978 | Oral | DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment |
979 | Poster | Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection |
980 | Poster | DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception |
981 | Poster | LabelDistill: Label-guided Cross-modal Knowledge Distillation for Camera-based 3D Object Detection |
982 | Poster | Detecting As Labeling: Rethinking LiDAR-camera Fusion in 3D Object Detection |
983 | Poster | RecurrentBEV: A Long-term Temporal Fusion Framework for Multi-view 3D Detection |
984 | Poster | JDT3D: Addressing the Gaps in LiDAR-Based Tracking-by-Attention |
985 | Poster | MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception |
986 | Poster | UAV First-Person Viewers Are Radiance Field Learners |
987 | Poster | Caltech Aerial RGB-Thermal Dataset in the Wild |
988 | Poster | V2X-Real: a Large-Scale Dataset for Vehicle-to-Everything Cooperative Perception |
989 | Poster | CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction |
990 | Poster | Revisit Human-Scene Interaction via Space Occupancy |
991 | Poster | Enhancing Vectorized Map Perception with Historical Rasterized Maps |
992 | Poster | RoadPainter: Points Are Ideal Navigators for Topology transformER |
993 | Poster | VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions |
994 | Poster | DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving |
995 | Poster | SLEDGE: Synthesizing Driving Environments with Generative Models and Rule-Based Traffic |
996 | Poster | Self-Supervised Video Desmoking for Laparoscopic Surgery |
997 | Oral | Self-Supervised Video Desmoking for Laparoscopic Surgery |
998 | Poster | BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events |
999 | Poster | LiDAR-Event Stereo Fusion with Hallucinations |
1000 | Poster | Temporal-Mapping Photography for Event Cameras |
1001 | Poster | Motion Aware Event Representation-driven Image Deblurring |
1002 | Poster | Event-Based Motion Magnification |
1003 | Poster | TimeLens-XL: Real-time Event-based Video Frame Interpolation with Large Motion |
1004 | Poster | Bidirectional Progressive Transformer for Interaction Intention Anticipation |
1005 | Poster | Reinforcement Learning via Auxiliary Task Distillation |
1006 | Poster | COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation |
1007 | Poster | EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Human Motion Generation |
1008 | Poster | MotionChain: Conversational Motion Controllers via Multimodal Prompts |
1009 | Poster | M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models |
1010 | Poster | SMooDi: Stylized Motion Diffusion Model |
1011 | Poster | IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation |
1012 | Poster | PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation |
1013 | Poster | SAVE: Protagonist Diversification with Structure Agnostic Video Editing |
1014 | Poster | Kinetic Typography Diffusion Model |
1015 | Poster | DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency |
1016 | Poster | StableDrag: Stable Dragging for Point-based Image Editing |
1017 | Poster | Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing |
1018 | Poster | Curved Diffusion: A Generative Model With Optical Geometry Control |
1019 | Poster | Tuning-Free Image Customization with Image and Text Guidance |
1020 | Poster | StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models |
1021 | Poster | AID-AppEAL: Automatic Image Dataset and Algorithm for Content Appeal Enhancement and Assessment Labeling |
1022 | Poster | DreamDiffusion: High-Quality EEG-to-Image Generation with Temporal Masked Signal Modeling and CLIP Alignment |
1023 | Poster | TP2O: Creative Text Pair-to-Object Generation using Balance Swap-Sampling |
1024 | Poster | Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering |
1025 | Poster | AccDiffusion: An Accurate Method for Higher-Resolution Image Generation |
1026 | Poster | The Fabrication of Reality and Fantasy: Scene Generation with LLM-Assisted Prompt Interpretation |
1027 | Poster | DCDM: Diffusion-Conditioned-Diffusion Model for Scene Text Image Super-Resolution |
1028 | Poster | MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models |
1029 | Poster | ComFusion: Enhancing Personalized Generation by Instance-Scene Compositing and Fusion |
1030 | Poster | PEA-Diffusion: Parameter-Efficient Adapter with Knowledge Distillation in non-English Text-to-Image Generation |
1031 | Poster | Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models |
1032 | Poster | Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models |
1033 | Poster | Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models |
1034 | Poster | Distilling Diffusion Models into Conditional GANs |
1035 | Poster | Responsible Visual Editing |
1036 | Poster | HiEI: A Universal Framework for Generating High-quality Emerging Images from Natural Images |
1037 | Poster | MagicEraser: Erasing Any Objects via Semantics-Aware Control |
1038 | Poster | GenQ: Quantization in Low Data Regimes with Generative Synthetic Data |
1039 | Poster | DiffiT: Diffusion Vision Transformers for Image Generation |
1040 | Poster | DC-Solver: Improving Predictor-Corrector Diffusion Sampler via Dynamic Compensation |
1041 | Poster | ∞-Brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions |
1042 | Poster | Unmasking Bias in Diffusion Model Training |
1043 | Poster | Compensation Sampling for Improved Convergence in Diffusion Models |
1044 | Poster | Unsupervised Variational Translator for Bridging Image Restoration and High-Level Vision Tasks |
1045 | Poster | Teaching Tailored to Talent: Adverse Weather Restoration via Prompt Pool and Depth-Anything Constraint |
1046 | Poster | Dual-Rain: Video Rain Removal using Assertive and Gentle Teachers |
1047 | Poster | A Comparative Study of Image Restoration Networks for General Backbone Network Design |
1048 | Poster | OAPT: Offset-Aware Partition Transformer for Double JPEG Artifacts Removal |
1049 | Poster | Domain-adaptive Video Deblurring via Test-time Blurring |
1050 | Poster | Kernel Diffusion: An Alternate Approach to Blind Deconvolution |
1051 | Poster | Enhancing Perceptual Quality in Video Super-Resolution through Temporally-Consistent Detail Synthesis using Diffusion Models |
1052 | Poster | Kalman-Inspired Feature Propagation for Video Face Super-Resolution |
1053 | Poster | RealViformer: Investigating Attention for Real-World Video Super-Resolution |
1054 | Poster | Learning Exhaustive Correlation for Spectral Super-Resolution: Where Spatial-Spectral Attention Meets Linear Dependence |
1055 | Poster | Zero-Shot Adaptation for Approximate Posterior Sampling of Diffusion Models in Inverse Problems |
1056 | Poster | Task-Driven Uncertainty Quantification in Inverse Problems via Conformal Prediction |
1057 | Poster | Rethinking Deep Unrolled Model for Accelerated MRI Reconstruction |
1058 | Oral | Rethinking Deep Unrolled Model for Accelerated MRI Reconstruction |
1059 | Poster | Wavelet Convolutions for Large Receptive Fields |
1060 | Poster | Long-term Temporal Context Gathering for Neural Video Compression |
1061 | Poster | Implicit Neural Models to Extract Heart Rate from Video |
1062 | Poster | A Watermark-Conditioned Diffusion Model for IP Protection |
1063 | Poster | Representing Topological Self-Similarity Using Fractal Feature Maps for Accurate Segmentation of Tubular Structures |
1064 | Poster | Image Manipulation Detection With Implicit Neural Representation and Limited Supervision |
1065 | Poster | DIFFender: Diffusion-Based Adversarial Defense against Patch Attacks |
1066 | Poster | Learning Natural Consistency Representation for Face Forgery Video Detection |
1067 | Poster | ARoFace: Alignment Robustness to Improve Low-quality Face Recognition |
1068 | Poster | AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors |
1069 | Poster | PetFace: A Large-Scale Dataset and Benchmark for Animal Identification |
1070 | Oral | PetFace: A Large-Scale Dataset and Benchmark for Animal Identification |
1071 | Poster | Enhancing Cross-Subject fMRI-to-Video Decoding with Global-Local Functional Alignment |
1072 | Poster | Occlusion-Aware Seamless Segmentation |
1073 | Poster | Keypoint Promptable Re-Identification |
1074 | Poster | CoTracker: It is Better to Track Together |
1075 | Poster | Free Lunch for Gait Recognition: A Novel Relation Descriptor |
1076 | Poster | S-JEPA: A Joint Embedding Predictive Architecture for Skeletal Action Recognition |
1077 | Poster | SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition |
1078 | Poster | Spatio-Temporal Proximity-Aware Dual-Path Model for Panoramic Activity Recognition |
1079 | Poster | Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition |
1080 | Poster | Long-Tail Temporal Action Segmentation with Group-wise Temporal Logit Adjustment |
1081 | Poster | Look Around and Learn: Self-Training Object Detection by Exploration |
1082 | Poster | Interaction-centric Spatio-Temporal Context Reasoning for Multi-Person Video HOI Recognition |
1083 | Poster | Self-Supervised Video Copy Localization with Regional Token Representation |
1084 | Poster | General and Task-Oriented Video Segmentation |
1085 | Poster | Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation |
1086 | Poster | Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders |
1087 | Poster | RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos |
1088 | Poster | Referring Atomic Video Action Recognition |
1089 | Poster | Elysium: Exploring Object-level Perception in Videos through Semantic Integration Using MLLMs |
1090 | Poster | VideoAgent: Long-form Video Understanding with Large Language Model as Agent |
1091 | Poster | VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models |
1092 | Poster | AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering |
1093 | Poster | Learning Video Context as Interleaved Multimodal Sequences |
1094 | Poster | Multi-Modal Video Dialog State Tracking in the Wild |
1095 | Poster | Towards Multimodal Sentiment Analysis Debiasing via Bias Purification |
1096 | Poster | Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion |
1097 | Poster | Rethinking Normalization Layers for Domain Generalizable Person Re-identification |
1098 | Poster | Dual-stage Hyperspectral Image Classification Model with Spectral Supertoken |
1099 | Poster | Learning Representations of Satellite Images From Metadata Supervision |
1100 | Poster | Get Your Embedding Space in Order: Domain-Adaptive Regression for Forest Monitoring |
1101 | Poster | Close, But Not There: Boosting Geographic Distance Sensitivity in Visual Place Recognition |
1102 | Poster | AdaGlimpse: Active Visual Exploration with Arbitrary Glimpse Position and Scale |
1103 | Poster | QUAR-VLA: Vision-Language-Action Model for Quadruped Robots |
1104 | Poster | Navigation Instruction Generation with BEV Perception and Large Language Models |
1105 | Poster | V-IRL: Grounding Virtual Intelligence in Real Life |
1106 | Poster | M3DBench: Towards Omni 3D Assistant with Interleaved Multi-modal Instructions |
1107 | Poster | OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web |
1108 | Poster | Unifying 3D Vision-Language Understanding via Promptable Queries |
1109 | Poster | UMBRAE: Unified Multimodal Brain Decoding |
1110 | Poster | BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation |
1111 | Poster | CoReS: Orchestrating the Dance of Reasoning and Segmentation |
1112 | Poster | A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment |
1113 | Poster | Grounding Language Models for Visual Entity Recognition |
1114 | Poster | Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models |
1115 | Poster | The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models? |
1116 | Poster | AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting |
1117 | Poster | UniCode: Learning a Unified Codebook for Multimodal Large Language Models |
1118 | Poster | X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs |
1119 | Poster | EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding |
1120 | Poster | EDformer: Transformer-Based Event Denoising Across Varied Noise Levels |
1121 | Poster | Self-Adapting Large Visual-Language Models to Edge Devices across Visual Modalities |
1122 | Poster | The Hard Positive Truth about Vision-Language Compositionality |
1123 | Poster | HiFi-Score: Fine-grained Image Description Evaluation with Hierarchical Parsing Graphs |
1124 | Poster | LLMCO4MR: LLMs-aided Neural Combinatorial Optimization for Ancient Manuscript Restoration from Fragments with Case Studies on Dunhuang |
1125 | Poster | Language-Image Pre-training with Long Captions |
1126 | Poster | IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers |
1127 | Poster | CIC-BART-SSA: Controllable Image Captioning with Structured Semantic Augmentation |
1128 | Poster | Enhancing Recipe Retrieval with Foundation Models: A Data Augmentation Perspective |
1129 | Poster | Object-Aware Query Perturbation for Cross-Modal Image-Text Retrieval |
1130 | Poster | Cascade Prompt Learning for Visual-Language Model Adaptation |
1131 | Poster | Gaze Target Detection Based on Head-Local-Global Coordination |
1132 | Poster | Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model |
1133 | Poster | ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling |
1134 | Poster | Towards Open-Ended Visual Recognition with Large Language Models |
1135 | Poster | AFreeCA: Annotation-Free Counting for All |
1136 | Poster | OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models |
1137 | Poster | MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection |
1138 | Poster | Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding |
1139 | Poster | SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation |
1140 | Poster | Removing Rows and Columns of Tokens in Vision Transformer enables Faster Dense Prediction without Retraining |
1141 | Poster | ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference |
1142 | Poster | Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation |
1143 | Poster | DIAL: Dense Image-text ALignment for Weakly Supervised Semantic Segmentation |
1144 | Poster | N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields |
1145 | Poster | Prioritized Semantic Learning for Zero-shot Instance Navigation |
1146 | Poster | PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model |
1147 | Poster | SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance |
1148 | Poster | Knowledge Transfer with Simulated Inter-Image Erasing for Weakly Supervised Semantic Segmentation |
1149 | Poster | ProMerge: Prompt and Merge for Unsupervised Instance Segmentation |
1150 | Poster | Part2Object: Hierarchical Unsupervised 3D Instance Segmentation |
1151 | Poster | Dual-level Adaptive Self-Labeling for Novel Class Discovery in Point Cloud Segmentation |
1152 | Poster | Diffusion for Out-of-Distribution Detection on Road Scenes and Beyond |
1153 | Poster | UniFS: Universal Few-shot Instance Perception with Point Representations |
1154 | Poster | Crowd-SAM: SAM as a smart annotator for object detection in crowded scenes |
1155 | Poster | Adaptive Multi-task Learning for Few-shot Object Detection |
1156 | Poster | FocusDiffuser: Perceiving Local Disparities for Camouflaged Object Detection |
1157 | Poster | Distilling Knowledge from Large-Scale Image Models for Object Detection |
1158 | Poster | Revisiting Domain-Adaptive Object Detection in Adverse Weather by the Generation and Composition of High-Quality Pseudo-Labels |
1159 | Poster | Operational Open-Set Recognition and PostMax Refinement |
1160 | Poster | InfMAE: A Foundation Model in The Infrared Modality |
1161 | Poster | AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking |
1162 | Poster | Domesticating SAM for Breast Ultrasound Image Segmentation via Spatial-frequency Fusion and Uncertainty Correction |
1163 | Poster | Effective Lymph Nodes Detection in CT Scans Using Location Debiased Query Selection and Contrastive Query Representation in Transformer |
1164 | Poster | Snuffy: Efficient Whole Slide Image Classifier |
1165 | Poster | Unified Medical Image Pre-training in Language-Guided Common Semantic Space |
1166 | Poster | Brain-ID: Learning Contrast-agnostic Anatomical Representations for Brain Imaging |
1167 | Poster | TIP: Tabular-Image Pre-training for Multimodal Classification with Incomplete Data |
1168 | Poster | TransFusion -- A Transparency-Based Diffusion Model for Anomaly Detection |
1169 | Poster | VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation |
1170 | Poster | Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt |
1171 | Poster | Interleaving One-Class and Weakly-Supervised Models with Adaptive Thresholding for Unsupervised Video Anomaly Detection |
1172 | Poster | Asynchronous Bioplausible Neuron for Spiking Neural Networks for Event-Based Vision |
1173 | Poster | SAIR: Learning Semantic-aware Implicit Representation |
1174 | Poster | Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning |
1175 | Poster | Learning with Unmasked Tokens Drives Stronger Vision Learners |
1176 | Poster | Emerging Property of Masked Token for Effective Pre-training |
1177 | Poster | Distributed Semantic Segmentation with Efficient Joint Source and Task Decoding |
1178 | Poster | The Role of Masking for Efficient Supervised Knowledge Distillation of Vision Transformers |
1179 | Poster | SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning |
1180 | Poster | Tight and Efficient Upper Bound on Spectral Norm of Convolutional Layers |
1181 | Poster | FYI: Flip Your Images for Dataset Distillation |
1182 | Poster | Data-to-Model Distillation: Data-Efficient Learning Framework |
1183 | Poster | Overcome Modal Bias in Multi-modal Federated Learning via Balanced Modality Selection |
1184 | Poster | Active Generation for Image Classification |
1185 | Poster | Contrastive Learning with Synthetic Positives |
1186 | Poster | Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models |
1187 | Poster | Robust Calibration of Large Vision-Language Adapters |
1188 | Poster | Deciphering the Role of Representation Disentanglement: Investigating Compositional Generalization in CLIP Models |
1189 | Poster | FroSSL: Frobenius Norm Minimization for Efficient Multiview Self-Supervised Learning |
1190 | Poster | Benchmarking Spurious Bias in Few-Shot Image Classifiers |
1191 | Poster | An Information Theoretical View for Out-Of-Distribution Detection |
1192 | Poster | ProSub: Probabilistic Open-Set Semi-Supervised Learning with Subspace-Based Out-of-Distribution Detection |
1193 | Poster | Adapting to Shifting Correlations with Unlabeled Data Calibration |
1194 | Poster | Distribution-Aware Robust Learning from Long-Tailed Data with Noisy Labels |
1195 | Poster | On Pretraining Data Diversity for Self-Supervised Learning |
1196 | Poster | De-Confusing Pseudo-Labels in Source-Free Domain Adaptation |
1197 | Poster | Improving Unsupervised Domain Adaptation: A Pseudo-Candidate Set Approach |
1198 | Poster | Hierarchical Unsupervised Relation Distillation for Source Free Domain Adaptation |
1199 | Poster | Source-Free Domain-Invariant Performance Prediction |
1200 | Poster | Learning to Complement and to Defer to Multiple Users |
1201 | Poster | Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation |
1202 | Poster | Personalized Federated Domain-Incremental Learning based on Adaptive Knowledge Matching |
1203 | Poster | Revisiting Supervision for Continual Representation Learning |
1204 | Poster | Deep Companion Learning: Enhancing Generalization Through Historical Consistency |
1205 | Poster | Learning Scalable Model Soup on a Single GPU: An Efficient Subspace Training Strategy |
1206 | Poster | Harmonizing Knowledge Transfer in Neural Network with Unified Distillation |
1207 | Poster | Feature Diversification and Adaptation for Federated Domain Generalization |
1208 | Poster | PFedEdit: Personalized Federated Learning via Automated Model Editing |
1209 | Poster | Enhanced Sparsification via Stimulative Training |
1210 | Poster | Dependency-aware Differentiable Neural Architecture Search |
1211 | Poster | Layer-Wise Relevance Propagation with Conservation Property for ResNet |
1212 | Poster | Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning |
1213 | Poster | Training A Secure Model against Data-Free Model Extraction |
1214 | Poster | CLIP-Guided Generative Networks for Transferable Targeted Adversarial Attacks |
1215 | Poster | Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection |
1216 | Poster | Leveraging Imperfect Restoration for Data Availability Attack |
1217 | Poster | Veil Privacy on Visual Data: Concealing Privacy for Humans, Unveiling for DNNs |
1218 | Poster | Augmented Neural Fine-tuning for Efficient Backdoor Purification |
1219 | Poster | MIGS: Multi-Identity Gaussian Splatting via Tensor Decomposition |
1220 | Oral | MIGS: Multi-Identity Gaussian Splatting via Tensor Decomposition |
1221 | Poster | RaFE: Generative Radiance Fields Restoration |
1222 | Oral | RaFE: Generative Radiance Fields Restoration |
1223 | Poster | Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration |
1224 | Oral | Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration |
1225 | Poster | FisherRF: Active View Selection and Mapping with Radiance Fields using Fisher Information |
1226 | Oral | FisherRF: Active View Selection and Mapping with Radiance Fields using Fisher Information |
1227 | Poster | Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields |
1228 | Oral | Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields |
1229 | Poster | RPBG: Towards Robust Neural Point-based Graphics in the Wild |
1230 | Oral | RPBG: Towards Robust Neural Point-based Graphics in the Wild |
1231 | Poster | MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images |
1232 | Oral | MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images |
1233 | Poster | Learning 3D-aware GANs from Unposed Images with Template Feature Field |
1234 | Oral | Learning 3D-aware GANs from Unposed Images with Template Feature Field |
1235 | Poster | Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis |
1236 | Oral | Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis |
1237 | Poster | Watch Your Steps: Local Image and Scene Editing by Text Instructions |
1238 | Oral | Watch Your Steps: Local Image and Scene Editing by Text Instructions |
1239 | Poster | Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering |
1240 | Oral | Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering |
1241 | Poster | Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction |
1242 | Oral | Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction |
1243 | Poster | ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model |
1244 | Oral | ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model |
1245 | Poster | DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors |
1246 | Oral | DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors |
1247 | Poster | Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation |
1248 | Oral | Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation |
1249 | Poster | ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer |
1250 | Oral | ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer |
1251 | Poster | Video Editing via Factorized Diffusion Distillation |
1252 | Oral | Video Editing via Factorized Diffusion Distillation |
1253 | Poster | Efficient Neural Video Representation with Temporally Coherent Modulation |
1254 | Oral | Efficient Neural Video Representation with Temporally Coherent Modulation |
1255 | Poster | SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion |
1256 | Oral | SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion |
1257 | Poster | LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning |
1258 | Oral | LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning |
1259 | Poster | NeRMo: Learning Implicit Neural Representations for 3D Human Motion Prediction |
1260 | Oral | NeRMo: Learning Implicit Neural Representations for 3D Human Motion Prediction |
1261 | Poster | UGG: Unified Generative Grasping |
1262 | Oral | UGG: Unified Generative Grasping |
1263 | Poster | LiveHPS++: Robust and Coherent Motion Capture in Dynamic Free Environment |
1264 | Oral | LiveHPS++: Robust and Coherent Motion Capture in Dynamic Free Environment |
1265 | Poster | Controllable Human-Object Interaction Synthesis |
1266 | Oral | Controllable Human-Object Interaction Synthesis |
1267 | Poster | Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models |
1268 | Oral | Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models |
1269 | Poster | Harnessing Text-to-Image Diffusion Models for Category-Agnostic Pose Estimation |
1270 | Oral | Harnessing Text-to-Image Diffusion Models for Category-Agnostic Pose Estimation |
1271 | Poster | POET: Prompt Offset Tuning for Continual Human Action Adaptation |
1272 | Oral | POET: Prompt Offset Tuning for Continual Human Action Adaptation |
1273 | Poster | NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model |
1274 | Oral | NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model |
1275 | Poster | AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild |
1276 | Oral | AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild |
1277 | Poster | Sapiens: Foundation for Human Vision Models |
1278 | Oral | Sapiens: Foundation for Human Vision Models |
1279 | Poster | KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding |
1280 | Poster | Modeling and Driving Human Body Soundfields through Acoustic Primitives |
1281 | Poster | Let the Avatar Talk using Texts without Paired Training Data |
1282 | Poster | CanonicalFusion: Generating Drivable 3D Human Avatars from Multiple Images |
1283 | Poster | Relightable Neural Actor with Intrinsic Decomposition and Pose Control |
1284 | Poster | 3R-INN: How to be climate friendly while consuming/delivering videos? |
1285 | Poster | Unveiling Advanced Frequency Disentanglement Paradigm for Low-Light Image Enhancement |
1286 | Poster | Intrinsic Single-Image HDR Reconstruction |
1287 | Poster | Domain Reduction Strategy for Non-Line-of-Sight Imaging |
1288 | Poster | Boost Your NeRF: A Model-Agnostic Mixture of Experts Framework for High Quality and Efficient Rendering |
1289 | Poster | Synthesizing Time-varying BRDFs via Latent Space |
1290 | Poster | Parameterization-driven Neural Surface Reconstruction for Object-oriented Editing in Neural Rendering |
1291 | Poster | Instant Uncertainty Calibration of NeRFs Using a Meta-Calibrator |
1292 | Poster | GAURA: Generalizable Approach for Unified Restoration and Rendering of Arbitrary Views |
1293 | Poster | Content-Aware Radiance Fields: Aligning Model Complexity with Scene Intricacy Through Learned Bitwidth Quantization |
1294 | Poster | Collaborative Control for Geometry-Conditioned PBR Image Generation |
1295 | Poster | KFD-NeRF: Rethinking Dynamic NeRF with Kalman Filter |
1296 | Poster | Weight Conditioning for Smooth Optimization of Neural Networks |
1297 | Poster | URS-NeRF: Unordered Rolling Shutter Bundle Adjustment for Neural Radiance Fields |
1298 | Poster | MERLiN: Single-Shot Material Estimation and Relighting for Photometric Stereo |
1299 | Poster | TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks |
1300 | Poster | FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting |
1301 | Poster | Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion |
1302 | Poster | DoubleTake: Geometry Guided Depth Estimation |
1303 | Poster | Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal |
1304 | Poster | SAGS: Structure-Aware 3D Gaussian Splatting |
1305 | Poster | Compact 3D Scene Representation via Self-Organizing Gaussian Grids |
1306 | Poster | HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression |
1307 | Poster | GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction |
1308 | Poster | Concise Plane Arrangements for Low-Poly Surface and Volume Modelling |
1309 | Poster | Gaussian Grouping: Segment and Edit Anything in 3D Scenes |
1310 | Poster | SCP-Diff: Spatial-Categorical Joint Prior for Diffusion Based Semantic Image Synthesis |
1311 | Poster | STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians |
1312 | Poster | Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable Repainting |
1313 | Poster | GOEmbed: Gradient Origin Embeddings for Representation Agnostic 3D Feature Learning |
1314 | Poster | Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation |
1315 | Poster | FAMOUS: High-Fidelity Monocular 3D Human Digitization Using View Synthesis |
1316 | Poster | Retargeting Visual Data with Deformation Fields |
1317 | Poster | LatentEditor: Text Driven Local Editing of 3D Scenes |
1318 | Poster | StyleCity: Large-Scale 3D Urban Scenes Stylization |
1319 | Poster | Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering |
1320 | Poster | Connecting Consistency Distillation to Score Distillation for Text-to-3D Generation |
1321 | Poster | DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors |
1322 | Poster | InterFusion: Text-Driven Generation of 3D Human-Object Interaction |
1323 | Poster | Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models |
1324 | Poster | AWOL: Analysis WithOut synthesis using Language |
1325 | Poster | Improving Virtual Try-On with Garment-focused Diffusion Models |
1326 | Poster | GarmentCodeData: A Dataset of 3D Made-to-Measure Garments With Sewing Patterns |
1327 | Poster | Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance |
1328 | Poster | DoughNet: A Visual Predictive Model for Topological Manipulation of Deformable Objects |
1329 | Poster | Generating 3D House Wireframes with Semantics |
1330 | Poster | LayoutFlow: Flow Matching for Layout Generation |
1331 | Poster | Synchronous Diffusion for Unsupervised Smooth Non-Rigid 3D Shape Matching |
1332 | Poster | Scalar Function Topology Divergence: Comparing Topology of 3D Objects |
1333 | Poster | DynoSurf: Neural Deformation-based Temporally Consistent Dynamic Surface Reconstruction |
1334 | Poster | Fast Point Cloud Geometry Compression with Context-based Residual Coding and INR-based Refinement |
1335 | Poster | FLAT: Flux-aware Imperceptible Adversarial Attacks on 3D Point Clouds |
1336 | Poster | Frugal 3D Point Cloud Model Training via Progressive Near Point Filtering and Fused Aggregation |
1337 | Poster | SemReg: Semantics Constrained Point Cloud Registration |
1338 | Poster | GPSFormer: A Global Perception and Local Structure Fitting-based Transformer for Point Cloud Understanding |
1339 | Poster | Masked Motion Prediction with Semantic Contrast for Point Cloud Sequence Learning |
1340 | Poster | RangeLDM: Fast Realistic LiDAR Point Cloud Generation |
1341 | Poster | Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data |
1342 | Poster | SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs |
1343 | Poster | Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization |
1344 | Poster | Adaptive Annealing for Robust Averaging |
1345 | Poster | Resolving Scale Ambiguity in Multi-view 3D Reconstruction using Dual-Pixel Sensors |
1346 | Poster | Consistent 3D Line Mapping |
1347 | Poster | Robust Incremental Structure-from-Motion with Hybrid Features |
1348 | Poster | Gravity-aligned Rotation Averaging with Circular Regression |
1349 | Poster | GeoCalib: Learning Single-image Calibration with Geometric Optimization |
1350 | Poster | Real-time Holistic Robot Pose Estimation with Unknown States |
1351 | Poster | Learning Neural Volumetric Pose Features for Camera Localization |
1352 | Poster | LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation |
1353 | Poster | SCAPE: A Simple and Strong Category-Agnostic Pose Estimator |
1354 | Poster | Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estimation |
1355 | Poster | UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues |
1356 | Poster | Multi-RoI Human Mesh Recovery with Camera Consistency and Contrastive Losses |
1357 | Poster | MLPHand: Real Time Multi-View 3D Hand Reconstruction via MLP Modeling |
1358 | Poster | WorldPose: A World Cup Dataset for Global 3D Human Pose Estimation |
1359 | Poster | RePOSE: 3D Human Pose Estimation via Spatio-Temporal Depth Relational Consistency |
1360 | Poster | An Economic Framework for 6-DoF Grasp Detection |
1361 | Poster | SemGrasp: Semantic Grasp Generation via Language Aligned Discretization |
1362 | Oral | SemGrasp: Semantic Grasp Generation via Language Aligned Discretization |
1363 | Poster | FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation |
1364 | Poster | OGNI-DC: Robust Depth Completion with Optimization-Guided Neural Iterations |
1365 | Poster | ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion |
1366 | Poster | Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion |
1367 | Poster | SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning |
1368 | Poster | Reinforcement Learning Meets Visual Odometry |
1369 | Poster | Mahalanobis Distance-based Multi-view Optimal Transport for Multi-view Crowd Localization |
1370 | Poster | Camera-LiDAR Cross-modality Gait Recognition |
1371 | Poster | TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Autonomous Driving |
1372 | Poster | 3D Single-object Tracking in Point Clouds with High Temporal Variation |
1373 | Poster | LISO: Lidar-only Self-Supervised 3D Object Detection |
1374 | Poster | MonoWAD: Weather-Adaptive Diffusion Model for Robust Monocular 3D Object Detection |
1375 | Poster | IFTR: An Instance-Level Fusion Transformer for Visual Collaborative Perception |
1376 | Poster | MUSES: The Multi-Sensor Semantic Perception Dataset for Driving under Uncertainty |
1377 | Poster | Reliability in Semantic Segmentation: Can We Use Synthetic Data? |
1378 | Poster | DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control |
1379 | Poster | Fully Sparse 3D Occupancy Prediction |
1380 | Poster | EMIE-MAP: Large-Scale Road Surface Reconstruction Based on Explicit Mesh and Implicit Encoding |
1381 | Poster | Continuity Preserving Online CenterLine Graph Learning |
1382 | Poster | FipTR: A Simple yet Effective Transformer Framework for Future Instance Prediction in Autonomous Driving |
1383 | Poster | Think2Drive: Efficient Reinforcement Learning by Thinking with Latent World Model for Autonomous Driving (in CARLA-v2) |
1384 | Poster | Solving Motion Planning Tasks with a Scalable Generative Model |
1385 | Poster | Enhanced Motion Forecasting with Visual Relation Reasoning |
1386 | Poster | OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding |
1387 | Poster | Event-Aided Time-To-Collision Estimation for Autonomous Driving |
1388 | Poster | Event-based Mosaicing Bundle Adjustment |
1389 | Poster | Revisit Event Generation Model: Self-Supervised Learning of Event-to-Video Reconstruction with Implicit Neural Representations |
1390 | Poster | AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation |
1391 | Poster | Learning-based Axial Video Motion Magnification |
1392 | Poster | Motion Keyframe Interpolation for Any Human Skeleton using Point Cloud-based Human Motion Data Homogenisation |
1393 | Poster | Generating Physically Realistic and Directable Human Motions from Multi-Modal Inputs |
1394 | Poster | Scalable Group Choreography via Variational Phase Manifold Learning |
1395 | Poster | FreeMotion: MoCap-Free Human Motion Synthesis with Multimodal Large Language Models |
1396 | Poster | Plan, Posture and Go: Towards Open-vocabulary Text-to-Motion Generation |
1397 | Poster | Drag Anything: Motion Control for Anything using Entity Representation |
1398 | Poster | Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers’ Opinion Scores |
1399 | Poster | Audio-Synchronized Visual Animation |
1400 | Oral | Audio-Synchronized Visual Animation |
1401 | Poster | E.T. the Exceptional Trajectory: Text-to-camera-trajectory generation with character awareness |
1402 | Poster | MotionDirector: Motion Customization of Text-to-Video Diffusion Models |
1403 | Oral | MotionDirector: Motion Customization of Text-to-Video Diffusion Models |
1404 | Poster | SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models |
1405 | Poster | Object-Centric Diffusion for Efficient Video Editing |
1406 | Poster | GroupDiff: Diffusion-based Group Portrait Editing |
1407 | Poster | Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models |
1408 | Poster | Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing |
1409 | Poster | Towards compact reversible image representations for neural style transfer |
1410 | Poster | InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser |
1411 | Poster | SwapAnything: Enabling Arbitrary Object Swapping in Personalized Image Editing |
1412 | Poster | When and How do negative prompts take effect? |
1413 | Poster | SPIRE: Semantic Prompt-Driven Image Restoration |
1414 | Poster | LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model |
1415 | Poster | UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models |
1416 | Poster | Text-Anchored Score Composition: Tackling Condition Misalignment in Text-to-Image Diffusion Models |
1417 | Poster | Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models |
1418 | Poster | Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation |
1419 | Poster | Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models |
1420 | Poster | LogoSticker: Inserting Logos into Diffusion Models for Customized Generation |
1421 | Poster | Enhancing Diffusion Models with Text-Encoder Reinforcement Learning |
1422 | Poster | SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher |
1423 | Poster | EditShield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models |
1424 | Poster | Implicit Concept Removal of Diffusion Models |
1425 | Poster | NVS-Adapter: Plug-and-Play Novel View Synthesis from a Single Image |
1426 | Poster | Global Counterfactual Directions |
1427 | Poster | Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance |
1428 | Poster | Pixel-Aware Stable Diffusion for Realistic Image Super-Resolution and Personalized Stylization |
1429 | Poster | AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation |
1430 | Poster | Beta-Tuned Timestep Diffusion Model |
1431 | Poster | Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation |
1432 | Poster | Efficient Diffusion-Driven Corruption Editor for Test-Time Adaptation |
1433 | Poster | InstructIR: High-Quality Image Restoration Following Human Instructions |
1434 | Poster | BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion |
1435 | Poster | Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models |
1436 | Poster | OneRestore: A Universal Restoration Framework for Composite Degradation |
1437 | Poster | UCIP: A Universal Framework for Compressed Image Super-Resolution using Dynamic Prompt |
1438 | Poster | Pairwise Distance Distillation for Unsupervised Real-World Image Super-Resolution |
1439 | Poster | When Fast Fourier Transform Meets Transformer for Image Restoration |
1440 | Poster | Dual-Path Adversarial Lifting for Domain Shift Correction in Online Test-time Adaptation |
1441 | Poster | SuperGaussian: Repurposing Video Models for 3D Super Resolution |
1442 | Poster | Temporal As a Plugin: Unsupervised Video Denoising with Pre-Trained Image Denoisers |
1443 | Poster | Toward INT4 Fixed-Point Training via Exploring Quantization Error for Gradients |
1444 | Poster | Imaging Interiors: An Implicit Solution to Electromagnetic Inverse Scattering Problems |
1445 | Poster | Learned Rate Control for Frame-Level Adaptive Neural Video Compression via Dynamic Neural Network |
1446 | Poster | Spike-Temporal Latent Representation for Energy-Efficient Event-to-Video Reconstruction |
1447 | Poster | Exploring Vulnerabilities in Spiking Neural Networks: Direct Adversarial Attacks on Raw Event Data |
1448 | Poster | A Secure Image Watermarking Framework with Statistical Guarantees via Adversarial Attacks on Secret Key Networks |
1449 | Poster | Skeleton Recall Loss for Connectivity Conserving and Resource Efficient Segmentation of Thin Tubular Structures |
1450 | Poster | Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection |
1451 | Poster | Bottom-Up Domain Prompt Tuning for Generalized Face Anti-Spoofing |
1452 | Poster | Real Appearance Modeling for More General Deepfake Detection |
1453 | Poster | SelfSwapper: Self-Supervised Face Swapping via Shape Agnostic Masked AutoEncoder |
1454 | Poster | Norface: Improving Facial Expression Analysis by Identity Normalization |
1455 | Poster | Open-Set Biometrics: Beyond Good Closed-Set Models |
1456 | Poster | Brain Netflix: Scaling Data to Reconstruct Videos from Brain Signals |
1457 | Poster | PCF-Lift: Panoptic Lifting by Probabilistic Contrastive Fusion |
1458 | Poster | Enhancing Tracking Robustness with Auxiliary Adversarial Defense Networks |
1459 | Poster | SLAck: Semantic, Location, and Appearance Aware Open-Vocabulary Tracking |
1460 | Poster | Causality-inspired Discriminative Feature Learning in Triple Domains for Gait Recognition |
1461 | Poster | VSViG: Real-time Video-based Seizure Detection via Skeleton-based Spatiotemporal ViG |
1462 | Poster | Language-Assisted Skeleton Action Understanding for Skeleton-Based Temporal Action Segmentation |
1463 | Poster | Elucidating the Hierarchical Nature of Behavior with Masked Autoencoders |
1464 | Poster | FinePseudo: Improving Pseudo-Labelling through Temporal-Alignablity for Semi-Supervised Fine-Grained Action Recognition |
1465 | Poster | Bayesian Evidential Deep Learning for Online Action Detection |
1466 | Poster | Event Camera Data Dense Pre-training |
1467 | Poster | Unsupervised Moving Object Segmentation with Atmospheric Turbulence |
1468 | Poster | Beyond MOT: Semantic Multi-Object Tracking |
1469 | Poster | MRSP: Learn Multi-Representations of Single Primitive for Compositional Zero-Shot Learning |
1470 | Poster | Optimizing Factorized Encoder Models: Time and Memory Reduction for Scalable and Efficient Action Recognition |
1471 | Poster | Open Vocabulary Multi-Label Video Classification |
1472 | Poster | R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding |
1473 | Poster | Leveraging temporal contextualization for video action recognition |
1474 | Poster | VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding |
1475 | Poster | KDProR: A Knowledge-Decoupling Probabilistic Framework for Video-Text Retrieval |
1476 | Poster | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding |
1477 | Poster | HowToCaption: Prompting LLMs to Transform Video Annotations at Scale |
1478 | Poster | Label-anticipated Event Disentanglement for Audio-Visual Video Parsing |
1479 | Poster | Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation |
1480 | Poster | Uncertainty-aware sign language video retrieval with probability distribution modeling |
1481 | Poster | NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition |
1482 | Poster | Domain Shifting: A Generalized Solution for Heterogeneous Cross-Modality Person Re-Identification |
1483 | Poster | HyTAS: A Hyperspectral Image Transformer Architecture Search Benchmark and Analysis |
1484 | Poster | VLAD-BuFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition |
1485 | Poster | Embodied Understanding of Driving Scenarios |
1486 | Poster | Octopus: Embodied Vision-Language Programmer from Environmental Feedback |
1487 | Poster | Finding Visual Task Vectors |
1488 | Poster | ControlLLM: Augment Language Models with Tools by Searching on Graphs |
1489 | Poster | ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities |
1490 | Poster | Uni3DL: A Unified Model for 3D Vision-Language Understanding |
1491 | Poster | CrossScore: A Multi-View Approach to Image Evaluation and Scoring |
1492 | Poster | Compositional Substitutivity of Visual Reasoning for Visual Question Answering |
1493 | Poster | The All-Seeing Project V2: Towards General Relation Comprehension of the Open World |
1494 | Poster | X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and its Emergent Cross-modal Reasoning |
1495 | Poster | ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling |
1496 | Poster | Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation |
1497 | Poster | Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Models |
1498 | Poster | MoAI: Mixture of All Intelligence for Large Language and Vision Models |
1499 | Poster | Training A Small Emotional Vision Language Model for Visual Art Comprehension |
1500 | Poster | Quantized Prompt for Efficient Generalization of Vision-Language Models |
1501 | Poster | VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding |
1502 | Poster | Getting it Right: Improving Spatial Consistency in Text-to-Image Models |
1503 | Poster | MultiGen: Zero-shot Image Generation from Multi-modal Prompts |
1504 | Poster | Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors |
1505 | Poster | VeCLIP: Improving CLIP Training via Visual-enriched Captions |
1506 | Poster | ControlCap: Controllable Region-level Captioning |
1507 | Poster | Adapt without Forgetting: Distill Proximity from Dual Teachers in Vision-Language Models |
1508 | Poster | Look Hear: Gaze Prediction for Speech-directed Human Attention |
1509 | Poster | Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection |
1510 | Poster | LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models |
1511 | Poster | Unlocking Attributes' Contribution to Successful Camouflage: A Combined Textual and Visual Analysis Strategy |
1512 | Poster | Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection |
1513 | Poster | Multi-Granularity Sparse Relationship Matrix Prediction Network for End-to-End Scene Graph Generation |
1514 | Poster | Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection |
1515 | Poster | Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation |
1516 | Poster | SpatialFormer: Towards Generalizable Vision Transformers with Explicit Spatial Understanding |
1517 | Poster | LoA-Trans: Enhancing Visual Grounding by Location-Aware Transformers |
1518 | Poster | SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference |
1519 | Poster | EAFormer: Scene Text Segmentation with Edge-Aware Transformers |
1520 | Poster | CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentation |
1521 | Poster | Textual Query-Driven Mask Transformer for Domain Generalized Segmentation |
1522 | Poster | Attention Decomposition for Cross-Domain Semantic Segmentation |
1523 | Poster | SegGen: Supercharging Segmentation Models with Text2Mask and Mask2Img Synthesis |
1524 | Poster | A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting |
1525 | Poster | MC-PanDA: Mask Confidence for Panoptic Domain Adaptation |
1526 | Poster | OLAF: A Plug-and-Play Framework for Enhanced Multi-object Multi-part Scene Parsing |
1527 | Poster | Learning from the Web: Language Drives Weakly-Supervised Incremental Learning for Semantic Segmentation |
1528 | Poster | Tendency-driven Mutual Exclusivity for Weakly Supervised Incremental Semantic Segmentation |
1529 | Poster | Cs2K: Class-specific and Class-shared Knowledge Guidance for Incremental Semantic Segmentation |
1530 | Poster | ItTakesTwo: Leveraging Peer Representations for Semi-supervised LiDAR Semantic Segmentation |
1531 | Poster | On-the-fly Category Discovery for LiDAR Semantic Segmentation |
1532 | Poster | CONDA: Condensed Deep Association Learning for Co-Salient Object Detection |
1533 | Poster | General Geometry-aware Weakly Supervised 3D Object Detection |
1534 | Poster | CamoTeacher: Dual-Rotation Consistency Learning for Semi-Supervised Camouflaged Object Detection |
1535 | Poster | MetaAT: Active Testing for Label-Efficient Evaluation of Dense Recognition Tasks |
1536 | Poster | Simplifying Source-Free Domain Adaptation for Object Detection: Effective Self-Training Strategies and Performance Insights |
1537 | Poster | Rethinking Features-Fused-Pyramid-Neck for Object Detection |
1538 | Poster | 3D Small Object Detection with Dynamic Spatial Pruning |
1539 | Poster | Watching it in Dark: A Target-aware Representation Learning Framework for High-Level Vision Tasks in Low Illumination |
1540 | Poster | Gradient-Aware for Class-Imbalanced Semi-supervised Medical Image Segmentation |
1541 | Poster | Test-Time Stain Adaptation with Diffusion Models for Histopathology Image Classification |
1542 | Poster | WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering |
1543 | Poster | ChEX: Interactive Localization and Region Description in Chest X-rays |
1544 | Poster | A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization |
1545 | Poster | Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection |
1546 | Poster | Random Walk on Pixel Manifolds for Anomaly Segmentation of Complex Driving Scenes |
1547 | Poster | FedVAD: Enhancing Federated Video Anomaly Detection with GPT-Driven Semantic Distillation |
1548 | Poster | Efficient Training of Spiking Neural Networks with Multi-Parallel Implicit Stream Architecture |
1549 | Poster | DECIDER: Leveraging Foundation Model Priors for Improved Model Failure Detection and Explanation |
1550 | Poster | SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization |
1551 | Poster | SeiT++: Masked Token Modeling Improves Storage-efficient Training |
1552 | Poster | AMD: Automatic Multi-step Distillation of Large-scale Vision Models |
1553 | Poster | Stitched ViTs are Flexible Vision Backbones |
1554 | Poster | MetaAug: Meta-Data Augmentation for Post-Training Quantization |
1555 | Poster | Straightforward Layer-wise Pruning for More Efficient Visual Adaptation |
1556 | Poster | On Learning Discriminative Features from Synthesized Data for Self-Supervised Fine-Grained Visual Recognition |
1557 | Poster | Robust Multimodal Learning via Representation Decoupling |
1558 | Poster | SUMix: Mixup with Semantic and Uncertain Information |
1559 | Poster | Understanding and Mitigating Human-Labelling Errors in Supervised Contrastive Learning |
1560 | Poster | Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models |
1561 | Poster | SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning |
1562 | Poster | Linking in Style: Understanding learned features in deep learning models |
1563 | Poster | Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort |
1564 | Poster | Image-Feature Weak-to-Strong Consistency: An Enhanced Paradigm for Semi-Supervised Learning |
1565 | Poster | Strike a Balance in Continual Panoptic Segmentation |
1566 | Poster | IGNORE: Information Gap-based False Negative Loss Rejection for Single Positive Multi-Label Learning |
1567 | Poster | Dual-Decoupling Learning and Metric-Adaptive Thresholding for Semi-Supervised Multi-Label Learning |
1568 | Poster | Instance-dependent Noisy-label Learning with Graphical Model Based Noise-rate Estimation |
1569 | Poster | Learning to Distinguish Samples for Generalized Category Discovery |
1570 | Poster | Is user feedback always informative? Retrieval Latent Defending for Semi-Supervised Domain Adaptation without Source Data |
1571 | Poster | HVCLIP: High-dimensional Vector in CLIP for Unsupervised Domain Adaptation |
1572 | Poster | DiffClass: Diffusion-Based Class Incremental Learning |
1573 | Poster | Direct Distillation between Different Domains |
1574 | Poster | MemBN: Robust Test-Time Adaptation via Batch Norm with Statistics Memory |
1575 | Poster | PILoRA: Prototype Guided Incremental LoRA for Federated Class-Incremental Learning |
1576 | Poster | PromptFusion: Decoupling Stability and Plasticity for Continual Learning |
1577 | Poster | One-stage Prompt-based Continual Learning |
1578 | Poster | Is Retain Set All You Need in Machine Unlearning? Restoring Performance of Unlearned Models with Out-Of-Distribution Images |
1579 | Poster | Idling Neurons, Appropriately Lenient Workload During Fine-tuning Leads to Better Generalization |
1580 | Poster | How to Train the Teacher Model for Effective Knowledge Distillation |
1581 | Poster | Local and Global Flatness for Federated Domain Generalization |
1582 | Poster | Dataset Quantization with Active Learning based Adaptive Sampling |
1583 | Poster | DεpS: Delayed ε-Shrinking for Faster Once-For-All Training |
1584 | Poster | Auto-DAS: Automated Proxy Discovery for Training-free Distillation-aware Architecture Search |
1585 | Poster | On Spectral Properties of Gradient-based Explanation Methods |
1586 | Poster | Cross-Input Certified Training for Universal Perturbations |
1587 | Poster | Interpretability-Guided Test-Time Adversarial Defense |
1588 | Poster | Exploring Guided Sampling of Conditional GANs |
1589 | Poster | Self-Supervised Representation Learning for Adversarial Attack Detection |
1590 | Poster | Non-transferable Pruning |
1591 | Poster | On the Vulnerability of Skip Connections to Model Inversion Attacks |
1592 | Poster | Clean & Compact: Efficient Data-Free Backdoor Defense with Model Compactness |
1593 | Poster | Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities |
1594 | Oral | Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities |
1595 | Poster | Diffusion Models for Open-Vocabulary Segmentation |
1596 | Oral | Diffusion Models for Open-Vocabulary Segmentation |
1597 | Poster | Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation |
1598 | Oral | Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation |
1599 | Poster | CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model |
1600 | Oral | CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model |
1601 | Poster | Efficient Active Domain Adaptation for Semantic Segmentation by Selecting Information-rich Superpixels |
1602 | Oral | Efficient Active Domain Adaptation for Semantic Segmentation by Selecting Information-rich Superpixels |
1603 | Poster | ActionVOS: Actions as Prompts for Video Object Segmentation |
1604 | Oral | ActionVOS: Actions as Prompts for Video Object Segmentation |
1605 | Poster | WPS-SAM: Towards Weakly-Supervised Part Segmentation with Foundation Models |
1606 | Oral | WPS-SAM: Towards Weakly-Supervised Part Segmentation with Foundation Models |
1607 | Poster | A Geometric Distortion Immunized Deep Watermarking Framework with Robustness Generalizability |
1608 | Oral | A Geometric Distortion Immunized Deep Watermarking Framework with Robustness Generalizability |
1609 | Poster | COHO: Context-Sensitive City-Scale Hierarchical Urban Layout Generation |
1610 | Oral | COHO: Context-Sensitive City-Scale Hierarchical Urban Layout Generation |
1611 | Poster | Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance |
1612 | Oral | Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance |
1613 | Poster | Geospecific View Generation - Geometry-Context Aware High-resolution Ground View Inference from Satellite Views |
1614 | Oral | Geospecific View Generation - Geometry-Context Aware High-resolution Ground View Inference from Satellite Views |
1615 | Poster | MaxMI: A Maximal Mutual Information Criterion for Manipulation Concept Discovery |
1616 | Oral | MaxMI: A Maximal Mutual Information Criterion for Manipulation Concept Discovery |
1617 | Poster | Faceptor: A Generalist Model for Face Perception |
1618 | Oral | Faceptor: A Generalist Model for Face Perception |
1619 | Poster | Exploring the Feature Extraction and Relation Modeling For Light-Weight Transformer Tracking |
1620 | Oral | Exploring the Feature Extraction and Relation Modeling For Light-Weight Transformer Tracking |
1621 | Poster | Learning Multimodal Latent Generative Models with Energy-Based Prior |
1622 | Oral | Learning Multimodal Latent Generative Models with Energy-Based Prior |
1623 | Poster | Pose-Aware Self-Supervised Learning with Viewpoint Trajectory Regularization |
1624 | Oral | Pose-Aware Self-Supervised Learning with Viewpoint Trajectory Regularization |
1625 | Poster | SINDER: Repairing the Singular Defects of DINOv2 |
1626 | Oral | SINDER: Repairing the Singular Defects of DINOv2 |
1627 | Poster | Emergent Visual-Semantic Hierarchies in Image-Text Representations |
1628 | Oral | Emergent Visual-Semantic Hierarchies in Image-Text Representations |
1629 | Poster | PiTe: Pixel-Temporal Alignment for Large Video-Language Model |
1630 | Oral | PiTe: Pixel-Temporal Alignment for Large Video-Language Model |
1631 | Poster | Decoupling Common and Unique Representations for Multimodal Self-supervised Learning |
1632 | Oral | Decoupling Common and Unique Representations for Multimodal Self-supervised Learning |
1633 | Poster | Denoising Vision Transformers |
1634 | Oral | Denoising Vision Transformers |
1635 | Poster | Audio-driven Talking Face Generation with Stabilized Synchronization Loss |
1636 | Poster | ScanTalk: 3D Talking Heads from Unregistered Scans |
1637 | Poster | Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer |
1638 | Poster | Fast Registration of Photorealistic Avatars for VR Facial Animation |
1639 | Poster | MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos |
1640 | Poster | Learning to Generate Conditional Tri-plane for 3D-aware Expression Controllable Portrait Animation |
1641 | Poster | Learning to Robustly Reconstruct Dynamic Scenes from Low-light Spike Streams |
1642 | Poster | Wavelength-Embedding-guided Filter-Array Transformer for Spectral Demosaicing |
1643 | Poster | Learned HDR Image Compression for Perceptually Optimal Storage and Display |
1644 | Poster | Learning to Enhance Aperture Phasor Field for Non-Line-of-Sight Imaging |
1645 | Poster | Leveraging Thermal Modality to Enhance Reconstruction in Low-Light Conditions |
1646 | Poster | The Sky's the Limit: Relightable Outdoor Scenes via a Sky-pixel Constrained Illumination Prior and Outside-In Visibility |
1647 | Poster | A Probability-guided Sampler for Neural Implicit Surface Rendering |
1648 | Poster | REFRAME: Reflective Surface Real-Time Rendering for Mobile Devices |
1649 | Poster | Dynamic Neural Radiance Field From Defocused Monocular Video |
1650 | Poster | VersatileGaussian: Real-time Neural Rendering for Versatile Tasks using Gaussian Splatting |
1651 | Poster | DMiT: Deformable Mipmapped Tri-Plane Representation for Dynamic Scenes |
1652 | Poster | NeRF-XL: NeRF at Any Scale with Multi-GPU |
1653 | Poster | G2fR: Frequency Regularization in Grid-based Feature Encoding Neural Radiance Fields |
1654 | Poster | InfoNorm: Mutual Information Shaping of Normals for Sparse-View Reconstruction |
1655 | Poster | MirrorGaussian: Reflecting 3D Gaussians for Reconstructing Mirror Reflections |
1656 | Poster | Disentangled Generation and Aggregation for Robust Radiance Fields |
1657 | Poster | CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians |
1658 | Poster | SWAG: Splatting in the Wild images with Appearance-conditioned Gaussians |
1659 | Poster | Surface Reconstruction for 3D Gaussian Splatting via Local Structural Hints |
1660 | Poster | Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting |
1661 | Poster | GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting |
1662 | Poster | SWinGS: Sliding Windows for Dynamic 3D Gaussian Splatting |
1663 | Poster | An Adaptive Screen-Space Meshing Approach for Normal Integration |
1664 | Poster | Fast View Synthesis of Casual Videos with Soup-of-Planes |
1665 | Poster | 4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation |
1666 | Poster | GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image |
1667 | Poster | Viewpoint textual inversion: discovering scene representations and 3D view control in 2D diffusion models |
1668 | Poster | ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance |
1669 | Poster | LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation |
1670 | Poster | External Knowledge Enhanced 3D Scene Generation from Sketch |
1671 | Poster | EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion |
1672 | Poster | 3DEgo: 3D Editing on the Go! |
1673 | Poster | Learning Pseudo 3D Guidance for View-consistent Texturing with 2D Diffusion |
1674 | Poster | JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation |
1675 | Poster | Diverse Text-to-3D Synthesis with Augmented Text Embedding |
1676 | Poster | SweepNet: Unsupervised Learning Shape Abstraction via Neural Sweepers |
1677 | Poster | CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches |
1678 | Poster | Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment |
1679 | Poster | DiffSurf: A Transformer-based Diffusion Model for Generating and Reconstructing 3D Surfaces in Pose |
1680 | Poster | Motion-Oriented Compositional Neural Radiance Fields for Monocular Dynamic Human Modeling |
1681 | Poster | LEIA: Latent View-invariant Embeddings for Implicit 3D Articulation |
1682 | Poster | Learned Neural Physics Simulation for Articulated 3D Human Pose Reconstruction |
1683 | Poster | Layout-Corrector: Alleviating Layout Sticking Phenomenon in Discrete Diffusion Model |
1684 | Poster | Vista3D: Unravel the 3D Darkside of a Single Image |
1685 | Poster | Diff-Reg: Diffusion Model in Doubly Stochastic Matrix Space for Registration Problem |
1686 | Poster | NICP: Neural ICP for 3D Human Registration at Scale |
1687 | Poster | PFGS: High Fidelity Point Cloud Rendering via Feature Splatting |
1688 | Poster | TransCAD: A Hierarchical Transformer for CAD Sequence Inference from Point Clouds |
1689 | Poster | EINet: Point Cloud Completion via Extrapolation and Interpolation |
1690 | Poster | DiffPMAE: Diffusion Masked Autoencoders for Point Cloud Reconstruction |
1691 | Poster | Correspondence-Free SE(3) Point Cloud Registration in RKHS via Unsupervised Equivariant Learning |
1692 | Poster | CMD: A Cross Mechanism Domain Adaptation Dataset for 3D Object Detection |
1693 | Poster | Formula-Supervised Visual-Geometric Pre-training |
1694 | Poster | Canonical Shape Projection is All You Need for 3D Few-shot Class Incremental Learning |
1695 | Poster | Raising the Ceiling: Conflict-Free Local Feature Matching with Dynamic View Switching |
1696 | Poster | DGD: Dynamic 3D Gaussians Distillation |
1697 | Poster | SHIC: Shape-Image Correspondences with no Keypoint Supervision |
1698 | Poster | LineFit: A Geometric Approach for Fitting Line Segments in Images |
1699 | Poster | Global Structure-from-Motion Revisited |
1700 | Poster | Robust Fitting on a Gate Quantum Computer |
1701 | Oral | Robust Fitting on a Gate Quantum Computer |
1702 | Poster | The Nerfect Match: Exploring NeRF Features for Visual Localization |
1703 | Poster | A Cephalometric Landmark Regression Method based on Dual-encoder for High-resolution X-ray Image |
1704 | Poster | FoundPose: Unseen Object Pose Estimation with Foundation Features |
1705 | Poster | PoseSOR: Human Pose Can Guide Our Attention |
1706 | Poster | A Graph-Based Approach for Category-Agnostic Pose Estimation |
1707 | Poster | 3DSA: Multi-View 3D Human Pose Estimation With 3D Space Attention Mechanisms |
1708 | Poster | HPE-Li: WiFi-enabled Lightweight Dual Selective Kernel Convolution for Human Pose Estimation |
1709 | Poster | HandDAGT: A Denoising Adaptive Graph Transformer for 3D Hand Pose Estimation |
1710 | Poster | WHAC: World-grounded Humans and Cameras |
1711 | Poster | EgoBody3M: Egocentric Body Tracking on a VR Headset using a Diverse Dataset |
1712 | Poster | 3D Human Pose Estimation via Non-Causal Retentive Networks |
1713 | Poster | Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation |
1714 | Poster | Rawformer: Unpaired Raw-to-Raw Translation for Learnable Camera ISPs |
1715 | Poster | R3DS: Reality-linked 3D Scenes for Panoramic Scene Understanding |
1716 | Poster | Mono-ViFI: A Unified Learning Framework for Self-supervised Single- and Multi-frame Monocular Depth Estimation |
1717 | Poster | FutureDepth: Learning to Predict the Future Improves Video Depth Estimation |
1718 | Poster | Möbius Transform for Mitigating Perspective Distortions in Representation Learning |
1719 | Poster | UL-VIO: Ultra-lightweight Visual-Inertial Odometry with Noise Robust Test-time Adaptation |
1720 | Poster | DualBEV: Unifying Dual View Transformation with Probabilistic Correspondences |
1721 | Poster | HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras |
1722 | Poster | SimPB: A Single Model for 2D and 3D Object Detection from Multiple Cameras |
1723 | Poster | Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance |
1724 | Poster | Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection |
1725 | Poster | LiDAR-based All-weather 3D Object Detection via Prompting and Distilling 4D Radar |
1726 | Poster | SAMFusion: Sensor-Adaptive Multimodal Fusion for 3D Object Detection in Adverse Weather |
1727 | Poster | Align before Collaborate: Mitigating Feature Misalignment for Robust Multi-Agent Perception |
1728 | Oral | Align before Collaborate: Mitigating Feature Misalignment for Robust Multi-Agent Perception |
1729 | Poster | SkyScenes: A Synthetic Dataset for Aerial Scene Understanding |
1730 | Poster | DrivingDiffusion: Layout-Guided Multi-View Driving Scenarios Video Generation with Latent Diffusion Model |
1731 | Poster | UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction |
1732 | Poster | VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving |
1733 | Poster | OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving |
1734 | Poster | Stream Query Denoising for Vectorized HD-Map Construction |
1735 | Poster | Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention |
1736 | Poster | Early Anticipation of Driving Maneuvers |
1737 | Poster | Adaptive Human Trajectory Prediction via Latent Corridors |
1738 | Poster | Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model |
1739 | Poster | Probabilistic Weather Forecasting with Deterministic Guidance-based Diffusion Model |
1740 | Poster | Motion-prior Contrast Maximization for Dense Continuous-Time Motion Estimation |
1741 | Poster | Temporal Event Stereo via Joint Learning with Stereoscopic Flow |
1742 | Poster | FARSE-CNN: Fully Asynchronous, Recurrent and Sparse Event-Based CNN |
1743 | Poster | Event-Adapted Video Super-Resolution |
1744 | Poster | Diffusion Models as Optimizers for Efficient Planning in Offline RL |
1745 | Poster | Scene-aware Human Motion Forecasting via Mutual Distance Prediction |
1746 | Poster | CoMusion: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion |
1747 | Poster | F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions |
1748 | Poster | Bridging the Gap Between Human Motion and Action Semantics via Kinematics Phrases |
1749 | Poster | CoMo: Controllable Motion Generation through Language Guided Pose Code Editing |
1750 | Poster | Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation |
1751 | Poster | Co-speech Gesture Video Generation with 3D Human Meshes |
1752 | Poster | MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model |
1753 | Poster | MEVG: Multi-event Video Generation with Text-to-Video Models |
1754 | Poster | HARIVO: Harnessing Text-to-Image Models for Video Generation |
1755 | Poster | WAVE: Warping DDIM Inversion Features for Zero-shot Text-to-Video Editing |
1756 | Poster | RegionDrag: Fast Region-Based Image Editing with Diffusion Models |
1757 | Poster | TurboEdit: Real-time text-based disentangled real image editing |
1758 | Poster | Factorized Diffusion: Perceptual Illusions by Noise Decomposition |
1759 | Poster | DiffusionPen: Towards Controlling the Style of Handwritten Text Generation |
1760 | Poster | ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs |
1761 | Poster | Scaling Up Personalized Image Aesthetic Assessment via Task Vector Customization |
1762 | Poster | FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation |
1763 | Poster | AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation |
1764 | Poster | Training-free Composite Scene Generation for Layout-to-Image Synthesis |
1765 | Poster | Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas |
1766 | Poster | Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models |
1767 | Poster | Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation |
1768 | Poster | OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models |
1769 | Poster | Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation |
1770 | Poster | BK-SDM: A Lightweight, Fast, and Cheap Version of Stable Diffusion |
1771 | Poster | Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models |
1772 | Poster | MONTAGE: Monitoring Training for Attribution of Generative Diffusion Models |
1773 | Poster | ProTIP: Probabilistic Robustness Verification on Text-to-Image Diffusion Models against Stochastic Perturbation |
1774 | Poster | Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning |
1775 | Poster | To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now |
1776 | Poster | The Gaussian Discriminant Variational Autoencoder (GdVAE): A Self-Explainable Model with Counterfactual Explanations |
1777 | Poster | Which Model Generated This Image? A Model-Agnostic Approach for Origin Attribution |
1778 | Poster | DomainFusion: Generalizing To Unseen Domains with Latent Diffusion Models |
1779 | Poster | AlignDiff: Aligning Diffusion Models for General Few-Shot Segmentation |
1780 | Oral | AlignDiff: Aligning Diffusion Models for General Few-Shot Segmentation |
1781 | Poster | Memory-Efficient Fine-Tuning for Quantized Diffusion Model |
1782 | Poster | SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow |
1783 | Poster | HiDiffusion: Unlocking Higher-Resolution Creativity and Efficiency in Pretrained Diffusion Models |
1784 | Poster | EGIC: Enhanced Low-Bit-Rate Generative Image Compression Guided by Semantic Segmentation |
1785 | Poster | Diffusion for Natural Image Matting |
1786 | Poster | Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts |
1787 | Poster | MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration |
1788 | Poster | TTT-MIM: Test-Time Training with Masked Image Modeling for Denoising Distribution Shifts |
1789 | Poster | Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration |
1790 | Poster | Confidence-Based Iterative Generation for Real-World Image Super-Resolution |
1791 | Poster | Efficient Frequency-Domain Image Deraining with Contrastive Regularization |
1792 | Poster | Blind Image Deconvolution by Generative-based Kernel Prior and Initializer via Latent Encoding |
1793 | Poster | SAFNet: Selective Alignment Fusion Network for Efficient HDR Imaging |
1794 | Poster | Rethinking Image Super Resolution from Training Data Perspectives |
1795 | Poster | Accelerating Image Super-Resolution Networks with Pixel-Level Classification |
1796 | Poster | Overcoming Distribution Mismatch in Quantizing Image Super-Resolution Networks |
1797 | Poster | Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model |
1798 | Poster | Uncertainty-Driven Spectral Compressive Imaging with Spatial-Frequency Transformer |
1799 | Poster | Adaptive Selection of Sampling-Reconstruction in Fourier Compressed Sensing |
1800 | Poster | Test-time Model Adaptation for Image Reconstruction Using Self-supervised Adaptive Layers |
1801 | Poster | RadEdit: stress-testing biomedical vision models via diffusion image editing |
1802 | Poster | Rate-Distortion-Cognition Controllable Versatile Neural Image Compression |
1803 | Poster | Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design |
1804 | Poster | Fast Encoding and Decoding for Implicit Video Representation |
1805 | Poster | Implicit Steganography Beyond the Constraints of Modality |
1806 | Poster | Certifiably Robust Image Watermark |
1807 | Poster | DSA: Discriminative Scatter Analysis for Early Smoke Segmentation |
1808 | Poster | AdaIFL: Adaptive Image Forgery Localization via a Dynamic and Importance-aware Transformer Network |
1809 | Poster | DiffFAS: Face Anti-Spoofing via Generative Diffusion Models |
1810 | Poster | Face Reconstruction Transfer Attack as Out-of-Distribution Generalization |
1811 | Poster | Toward Tiny and High-quality Facial Makeup with Data Amplify Learning |
1812 | Poster | Facial Affective Behavior Analysis with Instruction Tuning |
1813 | Poster | VideoClusterNet: Self-Supervised and Adaptive Face Clustering for Videos |
1814 | Poster | When Do We Not Need Larger Vision Models? |
1815 | Poster | Open Panoramic Segmentation |
1816 | Poster | PapMOT: Exploring Adversarial Patch Attack against Multiple Object Tracking |
1817 | Poster | Self-Supervised Any-Point Tracking by Contrastive Random Walks |
1818 | Poster | WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing |
1819 | Poster | Idempotent Unsupervised Representation Learning for Skeleton-Based Action Recognition |
1820 | Poster | EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding |
1821 | Poster | Trajectory-aligned Space-time Tokens for Few-shot Action Recognition |
1822 | Poster | ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos |
1823 | Poster | Discovering Novel Actions from Open World Egocentric Videos with Object-Grounded Visual Commonsense Reasoning |
1824 | Poster | OMR: Occlusion-Aware Memory-Based Refinement for Video Lane Detection |
1825 | Poster | Improving Video Segmentation via Dynamic Anchor Queries |
1826 | Poster | VISAGE: Video Instance Segmentation with Appearance-Guided Enhancement |
1827 | Poster | Merlin: Empowering Multimodal LLMs with Foresight Minds |
1828 | Poster | STSP: Spatial-Temporal Subspace Projection for Video Class-incremental Learning |
1829 | Poster | UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection |
1830 | Poster | Contextual Correspondence Matters: Bidirectional Graph Matching for Video Summarization |
1831 | Poster | Weakly-Supervised Spatio-Temporal Video Grounding with Variational Cross-Modal Alignment |
1832 | Poster | AMEGO: Active Memory from long EGOcentric videos |
1833 | Poster | Rethinking Weakly-supervised Video Temporal Grounding From a Game Perspective |
1834 | Poster | TimeCraft: Navigate Weakly-Supervised Temporal Grounded Video Question Answering via Bi-directional Reasoning |
1835 | Poster | Delving Deep into Engagement Prediction of Short Videos |
1836 | Poster | LITA: Language Instructed Temporal-Localization Assistant |
1837 | Poster | CoLeaF: A Contrastive-Collaborative Learning Framework for Weakly Supervised Audio-Visual Video Parsing |
1838 | Poster | Siamese Vision Transformers are Scalable Audio-visual Learners |
1839 | Poster | EvSign: Sign Language Recognition and Translation with Streaming Events |
1840 | Poster | WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding |
1841 | Poster | Multi-Memory Matching for Unsupervised Visible-Infrared Person Re-Identification |
1842 | Poster | Masked Angle-Aware Autoencoder for Remote Sensing Images |
1843 | Poster | Revisit Anything: Visual Place Recognition via Image Segment Retrieval |
1844 | Poster | Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL |
1845 | Poster | Reinforcement Learning Friendly Vision-Language Model for Minecraft |
1846 | Poster | DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control |
1847 | Poster | See and Think: Embodied Agent in Virtual Environment |
1848 | Poster | PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation |
1849 | Poster | HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning |
1850 | Poster | Take A Step Back: Rethinking the Two Stages in Visual Reasoning |
1851 | Poster | Multi-Task Domain Adaptation for Language Grounding with 3D Objects |
1852 | Poster | MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? |
1853 | Poster | Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge |
1854 | Poster | LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models |
1855 | Poster | How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs |
1856 | Poster | MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models |
1857 | Poster | Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory |
1858 | Poster | Object-Oriented Anchoring and Modal Alignment in Multimodal Learning |
1859 | Poster | An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding |
1860 | Poster | Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models |
1861 | Poster | Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks |
1862 | Poster | UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding |
1863 | Poster | ReGround: Improving Textual and Spatial Grounding at No Cost |
1864 | Poster | Platypus: A Generalized Specialist Model for Reading Text in Various Forms |
1865 | Poster | Long-CLIP: Unlocking the Long-Text Capability of CLIP |
1866 | Poster | Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning |
1867 | Poster | RAVE: Residual Vector Embedding for CLIP-Guided Backlit Image Enhancement |
1868 | Poster | Tokenize Anything via Prompting |
1869 | Poster | FuseTeacher: Modality-fused Encoders are Strong Vision Supervisors |
1870 | Poster | De-confounded Gaze Estimation |
1871 | Poster | GalLoP: Learning global and local prompts for vision-language models |
1872 | Poster | OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection |
1873 | Poster | CoLA: Conditional Dropout and Language-driven Robust Dual-modal Salient Object Detection |
1874 | Poster | Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models |
1875 | Poster | Can OOD Object Detectors Learn from Foundation Models? |
1876 | Poster | VEON: Vocabulary-Enhanced Occupancy Prediction |
1877 | Poster | Efficient Vision Transformers with Partial Attention |
1878 | Poster | SAFARI: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation |
1879 | Poster | ReMamber: Referring Image Segmentation with Mamba Twister |
1880 | Poster | Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling |
1881 | Poster | A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties |
1882 | Poster | Enriching Information and Preserving Semantic Congruence in Expanding Curvilinear Object Segmentation Datasets |
1883 | Poster | Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation |
1884 | Poster | View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields |
1885 | Poster | Pro2SAM: Mask Prompt to SAM with Grid Points for Weakly Supervised Object Localization |
1886 | Poster | Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation |
1887 | Poster | PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects |
1888 | Poster | Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels |
1889 | Poster | OpenDistill3D: Open-World 3D Instance Segmentation with Unified Self-Distillation for Continual Learning and Unknown Class Discovery |
1890 | Poster | Weakly Supervised Co-training with Swapping Assignments for Semantic Segmentation |
1891 | Poster | Beyond Pixels: Semi-Supervised Semantic Segmentation with a Multi-scale Patch-based Multi-Label Classifier |
1892 | Poster | Bayesian Self-Training for Semi-Supervised 3D Segmentation |
1893 | Poster | Localization and Expansion: A Decoupled Framework for Point Cloud Few-shot Semantic Segmentation |
1894 | Poster | CSOT: Cross-Scan Object Transfer for Semi-Supervised LiDAR Object Detection |
1895 | Poster | Interactive 3D Object Detection with Prompts |
1896 | Poster | SAM-COD: SAM-guided Unified Framework for Weakly-Supervised Camouflaged Object Detection |
1897 | Poster | Preventing Catastrophic Forgetting through Memory Networks in Continuous Detection |
1898 | Poster | Benchmarking Object Detectors with COCO: A New Path Forward |
1899 | Poster | Frequency-Spatial Entanglement Learning for Camouflaged Object Detection |
1900 | Poster | GRA: Detecting Oriented Objects through Group-wise Rotating and Attention |
1901 | Poster | DQ-DETR: DETR with Dynamic Query for Tiny Object Detection |
1902 | Poster | AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval |
1903 | Poster | Alternate Diverse Teaching for Semi-supervised Medical Image Segmentation |
1904 | Poster | Unleashing the Power of Prompt-driven Nucleus Instance Segmentation |
1905 | Poster | cDP-MIL: Robust Multiple Instance Learning via Cascaded Dirichlet Process |
1906 | Poster | Pathology-knowledge Enhanced Multi-instance Prompt Learning for Few-shot Whole Slide Image Classification |
1907 | Poster | Learning with Counterfactual Explanations for Radiology Report Generation |
1908 | Poster | Improving Medical Multi-modal Contrastive Learning with Expert Annotations |
1909 | Poster | Few-shot Defect Image Generation based on Consistency Modeling |
1910 | Poster | Placing Objects in Context via Inpainting for Out-of-distribution Segmentation |
1911 | Poster | Learning Diffusion Models for Multi-View Anomaly Detection |
1912 | Poster | Learning Unified Reference Representation for Unsupervised Multi-class Anomaly Detection |
1913 | Poster | Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models |
1914 | Poster | Enhancing Optimization Robustness in 1-bit Neural Networks through Stochastic Sign Descent |
1915 | Poster | Salience-Based Adaptive Masking: Revisiting Token Dynamics for Enhanced Pre-training |
1916 | Poster | SNP: Structured Neuron-level Pruning to Preserve Attention Scores |
1917 | Poster | Tiny Models are the Computational Saver for Large Models |
1918 | Poster | Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning |
1919 | Poster | Trainable Highly-expressive Activation Functions |
1920 | Poster | HPFF: Hierarchical Locally Supervised Learning with Patch Feature Fusion |
1921 | Poster | To Supervise or Not to Supervise: Understanding and Addressing the Key Challenges of Point Cloud Transfer Learning |
1922 | Poster | SeA: Semantic Adversarial Augmentation for Last Layer Features from Unsupervised Representation Learning |
1923 | Poster | Linearly Controllable GAN: Unsupervised Feature Categorization and Decomposition for Image Generation and Manipulation |
1924 | Poster | Diagnosing and Re-learning for Balanced Multimodal Learning |
1925 | Poster | Visual Prompting via Partial Optimal Transport |
1926 | Poster | Pseudo-Labelling Should Be Aware of Disguising Channel Activations |
1927 | Poster | Efficient and Versatile Robust Fine-Tuning of Zero-shot Models |
1928 | Poster | Unsupervised Representation Learning by Balanced Self Attention Matching |
1929 | Poster | Optimal Transport of Diverse Unsupervised Tasks for Robust Learning from Noisy Few-Shot Data |
1930 | Poster | Gradient-based Out-of-Distribution Detection |
1931 | Poster | SLIM: Spuriousness Mitigation with Minimal Human Annotations |
1932 | Poster | Modeling Label Correlations with Latent Context for Multi-Label Recognition |
1933 | Poster | Rebalancing Using Estimated Class Distribution for Imbalanced Semi-Supervised Learning under Class Distribution Mismatch |
1934 | Poster | Foster Adaptivity and Balance in Learning with Noisy Labels |
1935 | Poster | Self-Guided Generation of Minority Samples Using Diffusion Models |
1936 | Poster | Self-Cooperation Knowledge Distillation for Novel Class Discovery |
1937 | Poster | Non-Exemplar Domain Incremental Learning via Cross-Domain Concept Integration |
1938 | Poster | Distribution Alignment for Fully Test-Time Adaptation with Dynamic Online Data Streams |
1939 | Poster | Few-shot Class Incremental Learning with Attention-Aware Self-Adaptive Prompt |
1940 | Poster | Exemplar-free Continual Representation Learning via Learnable Drift Compensation |
1941 | Poster | Open-World Dynamic Prompt and Continual Visual Representation Learning |
1942 | Poster | Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks |
1943 | Poster | Simple Unsupervised Knowledge Distillation With Space Similarity |
1944 | Poster | AdaDistill: Adaptive Knowledge Distillation for Deep Face Recognition |
1945 | Poster | Dataset Growth |
1946 | Poster | Leveraging Hierarchical Feature Sharing for Efficient Dataset Condensation |
1947 | Poster | MO-EMT-NAS: Multi-Objective Continuous Transfer of Architectural Knowledge Between Tasks from Different Datasets |
1948 | Poster | BAFFLE: A Baseline of Backpropagation-Free Federated Learning |
1949 | Poster | On the Evaluation Consistency of Attribution-based Explanations |
1950 | Poster | Debiasing surgeon: fantastic weights and how to find them |
1951 | Poster | Auto-GAS: Automated Proxy Discovery for Training-free Generative Architecture Search |
1952 | Poster | Improving Adversarial Transferability via Model Alignment |
1953 | Poster | Learning Differentially Private Diffusion Models via Stochastic Adversarial Distillation |
1954 | Poster | Improving Robustness to Model Inversion Attacks via Sparse Coding Architectures |
1955 | Poster | CipherDM: Secure Three-Party Inference for Diffusion Model Sampling |
1956 | Poster | UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening |
1957 | Poster | Exact Diffusion Inversion via Bidirectional Integration Approximation |
1958 | Oral | Exact Diffusion Inversion via Bidirectional Integration Approximation |
1959 | Poster | ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction |
1960 | Oral | ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction |
1961 | Poster | Tackling Structural Hallucination in Image Translation with Local Diffusion |
1962 | Oral | Tackling Structural Hallucination in Image Translation with Local Diffusion |
1963 | Poster | Adversarial Diffusion Distillation |
1964 | Oral | Adversarial Diffusion Distillation |
1965 | Poster | Pyramid Diffusion for Fine 3D Large Scene Generation |
1966 | Oral | Pyramid Diffusion for Fine 3D Large Scene Generation |
1967 | Poster | Controlling the World by Sleight of Hand |
1968 | Oral | Controlling the World by Sleight of Hand |
1969 | Poster | Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning |
1970 | Oral | Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning |
1971 | Poster | OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model |
1972 | Oral | OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model |
1973 | Poster | MAGR: Manifold-Aligned Graph Regularization for Continual Action Quality Assessment |
1974 | Oral | MAGR: Manifold-Aligned Graph Regularization for Continual Action Quality Assessment |
1975 | Poster | C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition |
1976 | Oral | C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition |
1977 | Poster | Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos |
1978 | Oral | Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos |
1979 | Poster | Towards Neuro-Symbolic Video Understanding |
1980 | Oral | Towards Neuro-Symbolic Video Understanding |
1981 | Poster | DEVIAS: Learning Disentangled Video Representations of Action and Scene |
1982 | Oral | DEVIAS: Learning Disentangled Video Representations of Action and Scene |
1983 | Poster | Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets |
1984 | Oral | Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets |
1985 | Poster | E3M: Zero-Shot Spatio-Temporal Video Grounding with Expectation-Maximization Multimodal Modulation |
1986 | Oral | E3M: Zero-Shot Spatio-Temporal Video Grounding with Expectation-Maximization Multimodal Modulation |
1987 | Poster | Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos |
1988 | Oral | Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos |
1989 | Poster | LongVLM: Efficient Long Video Understanding via Large Language Models |
1990 | Oral | LongVLM: Efficient Long Video Understanding via Large Language Models |
1991 | Poster | Made to Order: Discovering monotonic temporal changes via self-supervised video ordering |
1992 | Oral | Made to Order: Discovering monotonic temporal changes via self-supervised video ordering |
1993 | Poster | Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization |
1994 | Oral | Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization |
1995 | Poster | A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars |
1996 | Oral | A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars |
1997 | Poster | Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models |
1998 | Oral | Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models |
1999 | Poster | Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation |
2000 | Oral | Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation |
2001 | Poster | BRAVE: Broadening the visual encoding of vision-language models |
2002 | Oral | BRAVE: Broadening the visual encoding of vision-language models |
2003 | Poster | MMBENCH: Is Your Multi-Modal Model an All-around Player? |
2004 | Oral | MMBENCH: Is Your Multi-Modal Model an All-around Player? |
2005 | Poster | uCAP: An Unsupervised Prompting Method for Vision-Language Models |
2006 | Oral | uCAP: An Unsupervised Prompting Method for Vision-Language Models |
2007 | Poster | HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts |
2008 | Oral | HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts |
2009 | Poster | An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models |
2010 | Oral | An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models |
2011 | Poster | GiT: Towards Generalist Vision Transformer through Universal Language Interface |
2012 | Oral | GiT: Towards Generalist Vision Transformer through Universal Language Interface |
2013 | Poster | Omniview-Tuning: Boosting Viewpoint Invariance of Vision-Language Pre-training Models |
2014 | Oral | Omniview-Tuning: Boosting Viewpoint Invariance of Vision-Language Pre-training Models |
2015 | Poster | Head360: Learning a Parametric 3D Full-Head for Free-View Synthesis in 360° |
2016 | Poster | Tri²-plane: Thinking Head Avatar via Feature Pyramid |
2017 | Poster | AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos |
2018 | Poster | AnimateMe: 4D Facial Expressions via Diffusion Models |
2019 | Poster | Real-data-driven 2000 FPS Color Video from Mosaicked Chromatic Spikes |
2020 | Poster | Joint RGB-Spectral Decomposition Model Guided Image Enhancement in Mobile Photography |
2021 | Poster | Flash-Splat: 3D Reflection Removal with Flash Cues and Gaussian Splats |
2022 | Poster | Self-Supervised Underwater Caustics Removal and Descattering via Deep Monocular SLAM |
2023 | Poster | Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis |
2024 | Poster | Neural Poisson Solver: A Universal and Continuous Framework for Natural Signal Blending |
2025 | Poster | UniVoxel: Fast Inverse Rendering by Unified Voxelization of Scene Representation |
2026 | Poster | City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web |
2027 | Poster | Few-shot NeRF by Adaptive Rendering Loss Regularization |
2028 | Poster | BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting |
2029 | Poster | Generalizable Human Gaussians for Sparse View Synthesis |
2030 | Poster | Invertible Neural Warp for NeRF |
2031 | Poster | PISR: Polarimetric Neural Implicit Surface Reconstruction for Textureless and Specular Objects |
2032 | Poster | Improving Neural Surface Reconstruction with Feature Priors from Multi-View Images |
2033 | Poster | SG-NeRF: Neural Surface Reconstruction with Scene Graph Optimization |
2034 | Poster | Gaussian in the wild: 3D Gaussian Splatting for Unconstrained Image Collections |
2035 | Poster | 3iGS: Factorised Tensorial Illumination for 3D Gaussian Splatting |
2036 | Poster | HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban Scenes |
2037 | Poster | GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering |
2038 | Poster | EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS |
2039 | Poster | End-to-End Rate-Distortion Optimized 3D Gaussian Representation |
2040 | Poster | DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting |
2041 | Poster | Human Hair Reconstruction with Strand-Aligned 3D Gaussians |
2042 | Poster | Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting |
2043 | Poster | Cascade-Zero123: One Image to Highly Consistent 3D with Self-Prompted Nearby Views |
2044 | Poster | SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer |
2045 | Poster | MVDiffHD: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction |
2046 | Poster | DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting |
2047 | Poster | CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model |
2048 | Poster | Sketch2Vox: Learning 3D Reconstruction from a Single Monocular Sketch Image |
2049 | Poster | Lagrangian Hashing for Compressed Neural Field Representations |
2050 | Poster | GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing |
2051 | Poster | Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts |
2052 | Poster | TetraDiffusion: Tetrahedral Diffusion Models for 3D Shape Generation |
2053 | Poster | TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling |
2054 | Poster | Learn to Optimize Denoising Scores: A Unified and Improved Diffusion Prior for 3D Generation |
2055 | Poster | LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis |
2056 | Poster | Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation |
2057 | Poster | Synthesizing Environment-Specific People in Photographs |
2058 | Poster | Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models |
2059 | Poster | Shapefusion: 3D localized human diffusion models |
2060 | Poster | Fast Sprite Decomposition from Animated Graphics |
2061 | Poster | Hierarchical Conditioning of Diffusion Models Using Tree-of-Life for Studying Species Evolution |
2062 | Poster | WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation |
2063 | Poster | Dolfin: Diffusion Layout Transformers without Autoencoder |
2064 | Poster | MSD: A Benchmark Dataset for Floor Plan Generation of Building Complexes |
2065 | Poster | RoofDiffusion: Constructing Roofs from Severely Corrupted Point Data via Diffusion |
2066 | Poster | Implicit Filtering for Learning Neural Signed Distance Functions from 3D Point Clouds |
2067 | Poster | FastPCI: Motion-Structure Guided Fast Point Cloud Frame Interpolation |
2068 | Poster | T-CorresNet: Template Guided 3D Point Cloud Completion with Correspondence Pooling Query Generation Strategy |
2069 | Poster | SEED: A Simple and Effective 3D DETR in Point Clouds |
2070 | Poster | ProtoComp: Diverse Point Cloud Completion with Controllable Prototype |
2071 | Poster | CloudFixer: Test-Time Adaptation for 3D Point Clouds via Diffusion-Guided Geometric Transformation |
2072 | Poster | Learning Local Pattern Modularization for Point Cloud Reconstruction from Unseen Classes |
2073 | Poster | Rethinking LiDAR Domain Generalization: Single Source as Multiple Density Domains |
2074 | Poster | Multi-modal Relation Distillation for Unified 3D Representation Learning |
2075 | Poster | NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields |
2076 | Poster | Single-Photon 3D Imaging with Equi-Depth Photon Histograms |
2077 | Poster | Power Variable Projection for Initialization-Free Large-Scale Bundle Adjustment |
2078 | Poster | SelfGeo: Self-supervised and Geodesic-consistent Estimation of Keypoints on Deformable Shapes |
2079 | Poster | Leveraging scale- and orientation-covariant features for planar motion estimation |
2080 | Poster | Learn to Memorize and to Forget: A Continual Learning Perspective of Dynamic SLAM |
2081 | Poster | Bones Can't Be Triangles: Accurate and Efficient Vertebrae Keypoint Estimation through Collaborative Error Revision |
2082 | Poster | TreeSBA: Tree-Transformer for Self-Supervised Sequential Brick Assembly |
2083 | Poster | SUP-NeRF: A Streamlined Unification of Pose Estimation and NeRF for Monocular 3D Object Reconstruction |
2084 | Poster | VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space |
2085 | Poster | Human Pose Recognition via Occlusion-Preserving Abstract Images |
2086 | Poster | RT-Pose: A 4D Radar-Tensor based 3D Human Pose Estimation and Localization Benchmark |
2087 | Poster | 6DoF Head Pose Estimation through Explicit Bidirectional Interaction with Face Geometry |
2088 | Poster | HandDGP: Camera-Space Hand Mesh Prediction with Differentiable Global Positioning |
2089 | Poster | On the Utility of 3D Hand Poses for Action Recognition |
2090 | Poster | Multi-Person Pose Forecasting with Individual Interaction Perceptron and Prior Learning |
2091 | Poster | ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation |
2092 | Poster | Revisit Self-supervision with Local Structure-from-Motion |
2093 | Poster | AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation |
2094 | Poster | High-Precision Self-Supervised Monocular Depth Estimation with Rich-Resource Prior |
2095 | Poster | Weakly-supervised Camera Localization by Ground-to-satellite Image Registration |
2096 | Poster | Benchmarking the Robustness of Cross-view Geo-localization Models |
2097 | Poster | Improving Point-based Crowd Counting and Localization Based on Auxiliary Point Guidance |
2098 | Poster | Learning High-resolution Vector Representation from Multi-Camera Images for 3D Object Detection |
2099 | Poster | GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection |
2100 | Poster | Boosting 3D Single Object Tracking with 2D Matching Distillation and 3D Pre-training |
2101 | Poster | LEROjD: Lidar Extended Radar-Only Object Detection |
2102 | Poster | Towards Stable 3D Object Detection |
2103 | Poster | ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers |
2104 | Poster | EgoPet: Egomotion and Interaction Data from an Animal's Perspective |
2105 | Poster | WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation |
2106 | Poster | Beyond the Data Imbalance: Employing the Heterogeneous Datasets for Vehicle Maneuver Prediction |
2107 | Poster | GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction |
2108 | Poster | ADMap: Anti-disturbance Framework for Vectorized HD Map Construction |
2109 | Poster | Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction |
2110 | Poster | CarFormer: Self-Driving with Learned Object-Centric Representations |
2111 | Poster | DySeT: a Dynamic Masked Self-distillation Approach for Robust Trajectory Prediction |
2112 | Poster | NeuroNCAP: Photorealistic Closed-loop Safety Testing for Autonomous Driving |
2113 | Poster | Visual Relationship Transformation |
2114 | Poster | Local All-Pair Correspondence for Point Tracking |
2115 | Poster | Un-EVIMO: Unsupervised Event-based Independent Motion Segmentation |
2116 | Poster | Edge-Guided Fusion and Motion Augmentation for Event-Image Stereo |
2117 | Poster | Physical-Based Event Camera Simulator |
2118 | Poster | REDIR: Refocus-free Event-based De-occlusion Image Reconstruction |
2119 | Poster | Exploiting Dual-Correlation for Multi-frame Time-of-Flight Denoising |
2120 | Poster | Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation |
2121 | Poster | DragAPart: Learning a Part-Level Motion Prior for Articulated Objects |
2122 | Poster | Learning Semantic Latent Directions for Accurate and Controllable Human Motion Prediction |
2123 | Poster | HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects |
2124 | Poster | ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions |
2125 | Poster | Chronologically Accurate Retrieval for Temporal Grounding of Motion-Language Models |
2126 | Poster | MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model |
2127 | Poster | Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos |
2128 | Poster | Self-Supervised Audio-Visual Soundscape Stylization |
2129 | Poster | TC4D: Trajectory-Conditioned Text-to-4D Generation |
2130 | Poster | LivePhoto: Real Image Animation with Text-guided Motion Control |
2131 | Poster | Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models |
2132 | Poster | Photorealistic Video Generation with Diffusion Models |
2133 | Poster | High-Fidelity and Transferable NeRF Editing by Frequency Decomposition |
2134 | Poster | Diffusion-Based Image-to-Image Translation by Noise Correction via Prompt Interpolation |
2135 | Poster | Editable Image Elements for Controllable Synthesis |
2136 | Poster | Implicit Style-Content Separation using B-LoRA |
2137 | Poster | Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression |
2138 | Poster | EraseDraw: Learning to Insert Objects by Erasing Them from Images |
2139 | Poster | Text2Place: Affordance-aware Text Guided Human Placement |
2140 | Poster | ProCreate, Don't Reproduce! Propulsive Energy Diffusion for Creative Generation |
2141 | Poster | Label-free Neural Semantic Image Synthesis |
2142 | Poster | Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators |
2143 | Poster | CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion |
2144 | Poster | Context Diffusion: In-Context Aware Image Generation |
2145 | Poster | An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation |
2146 | Poster | Stable Preference: Redefining training paradigm of human preference model for Text-to-Image Synthesis |
2147 | Poster | SpeedUpNet: A Plug-and-Play Adapter Network for Accelerating Text-to-Image Diffusion Models |
2148 | Poster | Large-scale Reinforcement Learning for Diffusion Models |
2149 | Poster | Latent Guard: a Safety Framework for Text-to-image Generation |
2150 | Poster | Arc2Face: A Foundation Model for ID-Consistent Human Faces |
2151 | Oral | Arc2Face: A Foundation Model for ID-Consistent Human Faces |
2152 | Poster | GAMMA-FACE: GAussian Mixture Models Amend Diffusion Models for Bias Mitigation in Face Images |
2153 | Poster | Closed-Loop Unsupervised Representation Disentanglement with |
2154 | Poster | Revisiting Feature Disentanglement Strategy in Diffusion Training and Breaking Conditional Independence Assumption in Sampling |
2155 | Poster | ByteEdit: Boost, Comply and Accelerate Generative Image Editing |
2156 | Poster | DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation |
2157 | Poster | Few-Shot Image Generation by Conditional Relaxing Diffusion Inversion |
2158 | Poster | Rejection Sampling IMLE: Designing Priors for Better Few-Shot Image Synthesis |
2159 | Poster | FMBoost: Boosting Latent Diffusion with Flow Matching |
2160 | Oral | FMBoost: Boosting Latent Diffusion with Flow Matching |
2161 | Poster | AdaDiff: Accelerating Diffusion Models through Step-Wise Adaptive Computation |
2162 | Poster | Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation |
2163 | Poster | L-DiffER: Single Image Reflection Removal with Language-based Diffusion Model |
2164 | Poster | LMT-GP: Combined Latent Mean-Teacher and Gaussian Process for Semi-supervised Low-light Image Enhancement |
2165 | Poster | Depth-Aware Blind Image Decomposition for Real-World Adverse Weather Recovery |
2166 | Poster | Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal |
2167 | Poster | XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution |
2168 | Poster | AdaDiffSR: Adaptive Region-aware Dynamic acceleration Diffusion Model for Real-World Image Super-Resolution |
2169 | Poster | Seeing the Unseen: A Frequency Prompt Guided Transformer for Image Restoration |
2170 | Poster | Rethinking Video Deblurring with Wavelet-Aware Dynamic Transformer and Diffusion Model |
2171 | Poster | BurstM: Deep Burst Multi-scale SR using Fourier Space with Optical Flow |
2172 | Poster | DualDn: Dual-domain Denoising via Differentiable ISP |
2173 | Poster | Hierarchical Separable Video Transformer for Snapshot Compressive Imaging |
2174 | Poster | Image Compression for Machine and Human Vision With Spatial-Frequency Adaptation |
2175 | Poster | Functional Transform-Based Low-Rank Tensor Factorization for Multi-Dimensional Data Recovery |
2176 | Poster | Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems |
2177 | Oral | Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems |
2178 | Poster | Imaging with Confidence: Uncertainty Quantification for High-dimensional Undersampled MR Images |
2179 | Poster | Energy-induced Explicit quantification for Multi-modality MRI fusion |
2180 | Poster | WeConvene: Learned Image Compression with Wavelet-Domain Convolution and Entropy Model |
2181 | Poster | Aligning Neuronal Coding of Dynamic Visual Scenes with Foundation Vision Models |
2182 | Poster | GeometrySticker: Enabling Ownership Claim of Recolorized Neural Radiance Fields |
2183 | Poster | Rethinking Tree-Ring Watermarking for Enhanced Multi-Key Identification |
2184 | Poster | Enhancing Tampered Text Detection through Frequency Feature Fusion and Decomposition |
2185 | Poster | T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models |
2186 | Poster | Towards Unified Representation of Invariant-Specific Features in Missing Modality Face Anti-Spoofing |
2187 | Poster | Personalized Privacy Protection Mask Against Unauthorized Facial Recognition |
2188 | Poster | GRAPE: Generalizable and Robust Multi-view Facial Capture |
2189 | Poster | Seeing Faces in Things: A Model and Dataset for Pareidolia |
2190 | Poster | Beyond Viewpoint: Robust 3D Object Recognition under Arbitrary Views through Joint Multi-Part Representation |
2191 | Poster | An Optimal Control View of LoRA and Binary Controller Design for Vision Transformers |
2192 | Poster | OneTrack: Demystifying the Conflict Between Detection and Tracking in End-to-End 3D Trackers |
2193 | Poster | DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video |
2194 | Poster | Upper-body Hierarchical Graph for Skeleton Based Emotion Recognition in Assistive Driving |
2195 | Poster | SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders |
2196 | Poster | Context-Aware Action Recognition: Introducing a Comprehensive Dataset for Behavior Contrast |
2197 | Poster | Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition |
2198 | Poster | Semi-Supervised Teacher-Reference-Student Architecture for Action Quality Assessment |
2199 | Poster | Classification Matters: Improving Video Action Detection with Class-Specific Attention |
2200 | Oral | Classification Matters: Improving Video Action Detection with Class-Specific Attention |
2201 | Poster | HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization |
2202 | Poster | Appearance-based Refinement for Object-Centric Motion Segmentation |
2203 | Poster | Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation |
2204 | Poster | Fine-grained Dynamic Network for Generic Event Boundary Detection |
2205 | Poster | Data Collection-free Masked Video Modeling |
2206 | Poster | Self-supervised visual learning from interactions with objects |
2207 | Poster | Efficient Few-Shot Action Recognition via Multi-Level Post-Reasoning |
2208 | Poster | Sequential Representation Learning via Static-Dynamic Conditional Disentanglement |
2209 | Poster | Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised Video Semantic Compression |
2210 | Poster | EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval |
2211 | Poster | Video Question Answering with Procedural Programs |
2212 | Poster | ViLA: Efficient Video-Language Alignment for Video Question Answering |
2213 | Poster | ST-LLM: Large Language Models Are Effective Temporal Learners |
2214 | Poster | RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos |
2215 | Poster | Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations |
2216 | Poster | Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes |
2217 | Poster | Nonverbal Interaction Detection |
2218 | Poster | PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer |
2219 | Poster | Human-in-the-Loop Visual Re-ID for Population Size Estimation |
2220 | Poster | PreLAR: World Model Pre-training with Learnable Action Representation |
2221 | Poster | Learning to Build by Building Your Own Instructions |
2222 | Poster | Situated Instruction Following |
2223 | Poster | Where am I? Scene Retrieval with Language |
2224 | Poster | ShapeLLM: Universal 3D Object Understanding for Embodied Interaction |
2225 | Poster | WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language |
2226 | Poster | SegPoint: Segment Any Point Cloud via Large Language Model |
2227 | Poster | Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions |
2228 | Poster | GRACE: Graph-Based Contextual Debiasing for Fair Visual Question Answering |
2229 | Poster | LLaVA-UHD: an LMM Perceiving any Aspect Ratio and High-Resolution Images |
2230 | Poster | BLINK: Multimodal Large Language Models Can See but Not Perceive |
2231 | Poster | Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models |
2232 | Poster | Teach CLIP to Develop a Number Sense for Ordinal Regression |
2233 | Poster | Common Sense Reasoning for Deep Fake Detection |
2234 | Poster | Efficient Inference of Vision Instruction-Following Models with Elastic Cache |
2235 | Poster | SDPT: Synchronous Dual Prompt Tuning for Fusion-based Visual-Language Pre-trained Models |
2236 | Poster | Improving Vision and Language Concepts Understanding with Multimodal Counterfactual Samples |
2237 | Poster | Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models |
2238 | Poster | CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs |
2239 | Poster | Evaluating Text-to-Visual Generation with Image-to-Text Generation |
2240 | Poster | DOCCI: Descriptions of Connected and Contrasting Images |
2241 | Poster | Removing Distributional Discrepancies in Captions Improves Image-Text Alignment |
2242 | Poster | LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model |
2243 | Poster | Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning |
2244 | Poster | DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism |
2245 | Poster | Conceptual Codebook Learning for Vision-Language Models |
2246 | Poster | Do Generalised Classifiers really work on Human Drawn Sketches? |
2247 | Poster | 3DGazeNet: Generalizing Gaze Estimation with Weak Supervision from Synthetic Views |
2248 | Poster | Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs |
2249 | Poster | PLOT: Text-based Person Search with Part Slot Attention for Corresponding Part Discovery |
2250 | Poster | Discovering Unwritten Visual Classifiers with Large Language Models |
2251 | Poster | DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM |
2252 | Poster | LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction |
2253 | Poster | Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction |
2254 | Poster | OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation |
2255 | Poster | Rotary Position Embedding for Vision Transformer |
2256 | Poster | Multi-branch Collaborative Learning Network for 3D Visual Grounding |
2257 | Poster | SILC: Improving Vision Language Pretraining with Self-Distillation |
2258 | Poster | LiteSAM is Actually what you Need for segment Everything |
2259 | Poster | TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias |
2260 | Poster | In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation |
2261 | Poster | CoPT: Unsupervised Domain Adaptive Segmentation using Domain-Agnostic Text Embeddings |
2262 | Poster | SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation |
2263 | Poster | Click Prompt Learning with Optimal Transport for Interactive Segmentation |
2264 | Poster | 3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation |
2265 | Poster | Segment and Recognize Anything at Any Granularity |
2266 | Poster | SOS: Segment Object System for Open-World Instance Segmentation With Object Priors |
2267 | Poster | Active Coarse-to-Fine Segmentation of Moveable Parts from Real Images |
2268 | Poster | Phase Concentration and Shortcut Suppression for Weakly Supervised Semantic Segmentation |
2269 | Poster | AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation |
2270 | Poster | Weighting Pseudo-Labels via High-Activation Feature Index Similarity and Object Detection for Semi-Supervised Segmentation |
2271 | Poster | SAM-guided Graph Cut for 3D Instance Segmentation |
2272 | Poster | Subspace Prototype Guidance for Mitigating Class Imbalance in Point Cloud Semantic Segmentation |
2273 | Poster | Diff3DETR: Agent-based Diffusion Model for Semi-supervised 3D Object Detection |
2274 | Poster | Shifted Autoencoders for Point Annotation Restoration in Object Counting |
2275 | Poster | Learning Camouflaged Object Detection from Noisy Pseudo Label |
2276 | Poster | Just a Hint: Point-Supervised Camouflaged Object Detection |
2277 | Poster | Rectify the Regression Bias in Long-Tailed Object Detection |
2278 | Poster | PartImageNet++ Dataset: Scaling up Part-based Models for Robust Recognition |
2279 | Poster | Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning |
2280 | Poster | Visible and Clear: Finding Tiny Objects in Difference Map |
2281 | Poster | IRGen: Generative Modeling for Image Retrieval |
2282 | Poster | I-MedSAM: Implicit Medical Image Segmentation with Segment Anything |
2283 | Poster | Style-Extracting Diffusion Models for Semi-Supervised Histopathology Segmentation |
2284 | Poster | Norma: A Noise Robust Memory-Augmented Framework for Whole Slide Image Classification |
2285 | Poster | GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes |
2286 | Poster | BugNIST - a Large Volumetric Dataset for Detection under Domain Shift |
2287 | Poster | AD3: Introducing a score for Anomaly Detection Dataset Difficulty assessment using VIADUCT dataset |
2288 | Poster | GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection |
2289 | Poster | Unsupervised, Online and On-The-Fly Anomaly Detection For Non-Stationary Image Distributions |
2290 | Poster | Cross-Domain Learning for Video Anomaly Detection with Limited Supervision |
2291 | Poster | Attention Beats Linear for Fast Implicit Neural Representation Generation |
2292 | Poster | OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks |
2293 | Poster | ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders |
2294 | Poster | AttnZero: Efficient Attention Discovery for Vision Transformers |
2295 | Poster | Isomorphic Pruning for Vision Models |
2296 | Poster | DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs |
2297 | Poster | Robustness Tokens: Towards Adversarial Robustness of Transformers |
2298 | Poster | Contribution-based Low-Rank Adaptation with Pre-training Model for Real Image Restoration |
2299 | Poster | Neural Spectral Decomposition for Dataset Distillation |
2300 | Poster | Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models |
2301 | Poster | Adaptive Multi-head Contrastive Learning |
2302 | Poster | Unsqueeze [CLS] Bottleneck to Learn Rich Representations |
2303 | Poster | Improving Zero-Shot Generalization for CLIP with Variational Adapter |
2304 | Poster | Learning to Obstruct Few-Shot Image Classification over Restricted Classes |
2305 | Poster | Improving Hyperbolic Representations via Gromov-Wasserstein Regularization |
2306 | Poster | HyperSpaceX: Radial and Angular Exploration of HyperSpherical Dimensions |
2307 | Poster | Regulating Model Reliance on Non-Robust Features by Smoothing Input Marginal Density |
2308 | Poster | SCOD: From Heuristics to Theory |
2309 | Poster | LNL+K: Enhancing Learning with Noisy Labels Through Noise Source Knowledge Integration |
2310 | Poster | SCOMatch: Alleviating Overtrusting in Open-set Semi-supervised Learning |
2311 | Poster | Labeled Data Selection for Category Discovery |
2312 | Poster | PromptCCD: Learning Gaussian Mixture Prompt Pool for Continual Category Discovery |
2313 | Poster | Towards Multimodal Open-Set Domain Generalization and Adaptation through Self-supervision |
2314 | Poster | Forget More to Learn More: Domain-specific Feature Unlearning for Semi-supervised and Unsupervised Domain Adaptation |
2315 | Poster | CLOSER: Towards Better Representation Learning for Few-Shot Class-Incremental Learning |
2316 | Poster | Exploring Active Learning in Meta-Learning: Enhancing Context Set Labeling |
2317 | Poster | MagMax: Leveraging Model Merging for Seamless Continual Learning |
2318 | Poster | Pick-a-back: Selective Device-to-Device Knowledge Transfer in Federated Continual Learning |
2319 | Poster | Learning to Unlearn for Robust Machine Unlearning |
2320 | Poster | UNIC: Universal Classification Models via Multi-teacher Distillation |
2321 | Poster | Distributed Active Client Selection With Noisy Clients Using Model Association Scores |
2322 | Poster | Teddy: Efficient Large-Scale Dataset Distillation via Taylor-Approximated Matching |
2323 | Poster | FedTSA: A Cluster-based Two-Stage Aggregation Method for Model-heterogeneous Federated Learning |
2324 | Poster | Dynamic Guidance Adversarial Distillation with Enhanced Teacher Knowledge |
2325 | Poster | Rethinking Fast Adversarial Training: A Splitting Technique To Overcome Catastrophic Overfitting |
2326 | Poster | A high-quality robust diffusion framework for corrupted dataset |
2327 | Poster | Similarity of Neural Architectures using Adversarial Attack Transferability |
2328 | Poster | Not Just Change the Labels, Learn the Features: Watermarking Deep Neural Networks with Multi-View Data |
2329 | Poster | Resilience of Entropy Model in Distributed Neural Networks |
2330 | Poster | WBP: Training-time Backdoor Attacks through Hardware-based Weight Bit Poisoning |
2331 | Poster | Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models |
2332 | Oral | Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models |
2333 | Poster | Flatness-aware Sequential Learning Generates Resilient Backdoors |
2334 | Oral | Flatness-aware Sequential Learning Generates Resilient Backdoors |
2335 | Poster | Learning a Dynamic Privacy-preserving Camera Robust to Inversion Attacks |
2336 | Oral | Learning a Dynamic Privacy-preserving Camera Robust to Inversion Attacks |
2337 | Poster | Adversarial Robustification via Text-to-Image Diffusion Models |
2338 | Oral | Adversarial Robustification via Text-to-Image Diffusion Models |
2339 | Poster | Privacy-Preserving Adaptive Re-Identification without Image Transfer |
2340 | Oral | Privacy-Preserving Adaptive Re-Identification without Image Transfer |
2341 | Poster | R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model |
2342 | Oral | R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model |
2343 | Poster | Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models |
2344 | Oral | Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models |
2345 | Poster | A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks |
2346 | Oral | A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks |
2347 | Poster | Spline-based Transformers |
2348 | Oral | Spline-based Transformers |
2349 | Poster | Anytime Continual Learning for Open Vocabulary Classification |
2350 | Oral | Anytime Continual Learning for Open Vocabulary Classification |
2351 | Poster | Weighted Ensemble Models Are Strong Continual Learners |
2352 | Oral | Weighted Ensemble Models Are Strong Continual Learners |
2353 | Poster | COD: Learning Conditional Invariant Representation for Domain Adaptation Regression |
2354 | Oral | COD: Learning Conditional Invariant Representation for Domain Adaptation Regression |
2355 | Poster | On the Topology Awareness and Generalization Performance of Graph Neural Networks |
2356 | Oral | On the Topology Awareness and Generalization Performance of Graph Neural Networks |
2357 | Poster | Echoes of the Past: Boosting Long-tail Recognition via Reflective Learning |
2358 | Oral | Echoes of the Past: Boosting Long-tail Recognition via Reflective Learning |
2359 | Poster | Model Stock: All we need is just a few fine-tuned models |
2360 | Oral | Model Stock: All we need is just a few fine-tuned models |
2361 | Poster | A Direct Approach to Viewing Graph Solvability |
2362 | Oral | A Direct Approach to Viewing Graph Solvability |
2363 | Poster | ControlNet-XS: Rethinking the Control of Text-to-Image Diffusion Models as Feedback-Control Systems |
2364 | Oral | ControlNet-XS: Rethinking the Control of Text-to-Image Diffusion Models as Feedback-Control Systems |
2365 | Poster | A Riemannian Approach for Spatiotemporal Analysis and Generation of 4D Tree-shaped Structures |
2366 | Oral | A Riemannian Approach for Spatiotemporal Analysis and Generation of 4D Tree-shaped Structures |
2367 | Poster | Flash Cache: Reducing Bias in Radiance Cache Based Inverse Rendering |
2368 | Oral | Flash Cache: Reducing Bias in Radiance Cache Based Inverse Rendering |
2369 | Poster | Shape from Heat Conduction |
2370 | Oral | Shape from Heat Conduction |
2371 | Poster | Rasterized Edge Gradients: Handling Discontinuities Differentially |
2372 | Oral | Rasterized Edge Gradients: Handling Discontinuities Differentially |
2373 | Poster | Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation |
2374 | Oral | Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation |
2375 | Poster | HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution |
2376 | Oral | HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution |
2377 | Poster | S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis |
2378 | Poster | Loc3Diff: Local Diffusion for 3D Human Head Synthesis and Editing |
2379 | Poster | PAV: Personalized Head Avatar from Unstructured Video Collection |
2380 | Poster | Instant 3D Human Avatar Generation using Image Diffusion Models |
2381 | Poster | Expressive Whole-Body 3D Gaussian Avatar |
2382 | Poster | High-Quality Mesh Blendshape Generation from Face Videos via Neural Inverse Rendering |
2383 | Poster | Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement |
2384 | Poster | Image Demoireing in RAW and sRGB Domains |
2385 | Poster | Multiscale Sliced Wasserstein Distances as Perceptual Color Difference Measures |
2386 | Poster | Soft Shadow Diffusion (SSD): Physics-inspired Learning for 3D Computational Periscopy |
2387 | Poster | Single-Mask Inpainting for Voxel-based Neural Radiance Fields |
2388 | Poster | IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination |
2389 | Poster | DPA-Net: Structured 3D Abstraction from Sparse Views via Differentiable Primitive Assembly |
2390 | Poster | NGP-RT: Fusing Multi-Level Hash Features with Lightweight Attention for Real-Time Novel View Synthesis |
2391 | Poster | CaesarNeRF: Calibrated Semantic Representation for Few-Shot Generalizable Neural Rendering |
2392 | Poster | 2S-ODIS: Two-Stage Omni-Directional Image Synthesis by Geometric Distortion Correction |
2393 | Poster | Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction |
2394 | Poster | Deep Polarization Cues for Single-shot Shape and Subsurface Scattering Estimation |
2395 | Poster | High-Resolution and Few-shot View Synthesis from Asymmetric Dual-lens Inputs |
2396 | Poster | Surface-Centric Modeling for High-Fidelity Generalizable Neural Surface Reconstruction |
2397 | Poster | MVPGS: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views |
2398 | Poster | Dual-Camera Smooth Zoom on Mobile Phones |
2399 | Poster | 6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model |
2400 | Poster | SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM |
2401 | Poster | Relightable 3D Gaussians: Realistic Point Cloud Relighting with BRDF Decomposition and Ray Tracing |
2402 | Poster | Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians |
2403 | Poster | CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization |
2404 | Poster | Segmentation-guided Layer-wise Image Vectorization with Gradient Fills |
2405 | Poster | EpipolarGAN: Omnidirectional Image Synthesis with Explicit Camera Control |
2406 | Poster | SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views |
2407 | Poster | GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation |
2408 | Poster | GenRC: Generative 3D Room Completion from Sparse Image Collections |
2409 | Poster | Freeview Sketching: View-Aware Fine-Grained Sketch-Based Image Retrieval |
2410 | Poster | Convex Relaxations for Manifold-Valued Markov Random Fields with Approximation Guarantees |
2411 | Oral | Convex Relaxations for Manifold-Valued Markov Random Fields with Approximation Guarantees |
2412 | Poster | DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing |
2413 | Poster | Language-Driven Physics-Based Scene Synthesis and Editing via Feature Splatting |
2414 | Poster | GVGEN: Text-to-3D Generation with Volumetric Representation |
2415 | Poster | VividDreamer: Invariant Score Distillation for Hyper-Realistic Text-to-3D Generation |
2416 | Poster | DreamReward: Aligning Human Preference in Text-to-3D Generation |
2417 | Poster | SemanticHuman-HD: High Resolution Semantic disentangled 3D Human Generation |
2418 | Poster | Disentangled Clothed Avatar Generation from Text Descriptions |
2419 | Poster | StructLDM: Structured Latent Diffusion for 3D Human Generation |
2420 | Poster | High-Fidelity Modeling of Generalizable Wrinkle Deformation |
2421 | Poster | ReLoo: Reconstructing Humans Dressed in Loose Garments from Monocular Video in the Wild |
2422 | Poster | Hierarchically Structured Neural Bones for Reconstructing Animatable Objects from Casual Videos |
2423 | Poster | Physics-Based Interaction with 3D Objects via Video Generation |
2424 | Oral | Physics-Based Interaction with 3D Objects via Video Generation |
2425 | Poster | Enhancing Plausibility Evaluation for Generated Designs with Denoising Autoencoder |
2426 | Poster | Tree-D Fusion: Simulation-Ready Tree Dataset from Single Images with Diffusion Priors |
2427 | Poster | Self-supervised Shape Completion via Involution and Implicit Correspondences |
2428 | Poster | Self-Training Room Layout via Geometry-aware Ray-casting |
2429 | Poster | DiffCD: A Symmetric Differentiable Chamfer Distance for Neural Implicit Surface Fitting |
2430 | Poster | GaussReg: Fast 3D Registration with Gaussian Splatting |
2431 | Poster | AEDNet: Adaptive Embedding and Multiview-Aware Disentanglement for Point Cloud Completion |
2432 | Poster | PARE-Net: Position-Aware Rotation-Equivariant Networks for Robust Point Cloud Registration |
2433 | Poster | ML-SemReg: Boosting Point Cloud Registration with Multi-level Semantic Consistency |
2434 | Poster | DG-PIC: Domain Generalized Point-In-Context Learning for Point Cloud Understanding |
2435 | Poster | ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention |
2436 | Poster | SFPNet: Sparse Focal Point Network for Semantic Segmentation on General LiDAR Point Clouds |
2437 | Poster | MAD-DR: Map Compression for Visual Localization with Matchness Aware Descriptor Dimension Reduction |
2438 | Poster | Tensorial template matching for fast cross-correlation with rotations and its application for tomography |
2439 | Poster | Flowed Time of Flight Radiance Fields |
2440 | Poster | Zero-Shot Image Feature Consensus with Deep Functional Maps |
2441 | Poster | RSL-BA: Rolling Shutter Line Bundle Adjustment |
2442 | Poster | How Far Can a 1-Pixel Camera Go? Solving Vision Tasks using Photoreceptors and Computationally Designed Visual Morphology |
2443 | Poster | StereoGlue: Joint Feature Matching and Robust Estimation |
2444 | Poster | Hyperion – A fast, versatile symbolic Gaussian Belief Propagation framework for Continuous-Time SLAM |
2445 | Poster | Learning Where to Look: Self-supervised Viewpoint Selection for Active Localization using Geometrical Information |
2446 | Poster | MAP-ADAPT: Real-Time Quality-Adaptive Semantic 3D Maps |
2447 | Poster | iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning |
2448 | Poster | PACE: Pose Annotations in Cluttered Environments |
2449 | Poster | Global-to-Pixel Regression for Human Mesh Recovery |
2450 | Poster | 3D Hand Pose Estimation in Everyday Egocentric Images |
2451 | Poster | Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects |
2452 | Poster | AddBiomechanics Dataset: Capturing the Physics of Human Motion at Scale |
2453 | Poster | Category-level Object Detection, Pose Estimation and Reconstruction from Stereo Images |
2454 | Poster | Decomposed Vector-Quantized Variational Autoencoder for Human Grasp Generation |
2455 | Poster | CliffPhys: Camera-based Respiratory Measurement using Clifford Neural Networks |
2456 | Poster | Domain-Adaptive 2D Human Pose Estimation via Dual Teachers in Extremely Low-Light Conditions |
2457 | Poster | DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation |
2458 | Poster | Forecasting Future Videos from Novel Views via Disentangled 3D Scene Representation |
2459 | Poster | Deep Patch Visual SLAM |
2460 | Poster | ConGeo: Robust Cross-view Geo-localization across Ground View Variations |
2461 | Poster | GAReT: Cross-view Video Geolocalization with Adapters and Auto-Regressive Transformers |
2462 | Poster | SparseLIF: High-Performance Sparse LiDAR-Camera Fusion for 3D Object Detection |
2463 | Poster | Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression |
2464 | Poster | Image-to-Lidar Relational Distillation for Autonomous Driving Data |
2465 | Poster | Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene |
2466 | Poster | milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing |
2467 | Poster | Hetecooper: Feature Collaboration Graph for Heterogeneous Collaborative Perception |
2468 | Poster | LetsMap: Unsupervised Representation Learning for Label-Efficient Semantic BEV Mapping |
2469 | Poster | Probabilistic Image-Driven Traffic Modeling via Remote Sensing |
2470 | Poster | Occupancy as Set of Points |
2471 | Poster | Exploring Reliable Matching with Phase Enhancement for Night-time Semantic Segmentation |
2472 | Poster | Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction |
2473 | Poster | Online Vectorized HD Map Construction using Geometry |
2474 | Poster | OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving |
2475 | Poster | PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving |
2476 | Poster | Optimizing Diffusion Models for Joint Trajectory Prediction and Controllable Generation |
2477 | Poster | Learning to Drive via Asymmetric Self-Play |
2478 | Poster | Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos |
2479 | Poster | I Can't Believe It's Not Scene Flow! |
2480 | Poster | Motion and Structure from Event-based Normal Flow |
2481 | Poster | Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection |
2482 | Poster | Towards Robust Event-based Networks for Nighttime via Unpaired Day-to-Night Event Translation |
2483 | Poster | UniINR: Event-guided Unified Rolling Shutter Correction, Deblurring, and Interpolation |
2484 | Poster | IAM-VFI: Interpolate Any Motion for Video Frame Interpolation with motion complexity map |
2485 | Poster | Human Motion Forecasting in Dynamic Domain Shifts: A Homeostatic Continual Test-time Adaptation Framework |
2486 | Poster | How Video Meetings Change Your Expression |
2487 | Poster | DIM: Dyadic Interaction Modeling for Social Behavior Generation |
2488 | Poster | Length-Aware Motion Synthesis via Latent Diffusion |
2489 | Poster | Towards Open Domain Text-Driven Synthesis of Multi-Person Motions |
2490 | Poster | FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis |
2491 | Poster | Spherical World-Locking for Audio-Visual Localization in Egocentric Videos |
2492 | Poster | Explorative Inbetweening of Time and Space |
2493 | Poster | TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models |
2494 | Poster | WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models |
2495 | Poster | Pix2Gif: Motion-Guided Diffusion for GIF Generation |
2496 | Poster | Factorizing Text-to-Video Generation by Explicit Image Conditioning |
2497 | Poster | DNI: Dilutional Noise Initialization for Diffusion Video Editing |
2498 | Poster | DATENeRF: Depth-Aware Text-based Editing of NeRFs |
2499 | Poster | FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models |
2500 | Poster | Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models |
2501 | Poster | Using My Artistic Style? You Must Obtain My Authorization |
2502 | Poster | Learned Image Enhancement via Color Naming |
2503 | Poster | Region-Native Visual Tokenization |
2504 | Poster | Improving image synthesis with diffusion-negative sampling |
2505 | Poster | ST-LDM: A Universal Framework for Text-Grounded Object Generation in Real Images |
2506 | Poster | SmartControl: Enhancing ControlNet for Handling Rough Visual Conditions |
2507 | Poster | PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion |
2508 | Poster | Visual Text Generation in the Wild |
2509 | Poster | ReCON: Training-Free Acceleration for Text-to-Image Synthesis with Retrieval of Concept Prompt Trajectories |
2510 | Poster | Idea2Img: Iterative Self-Refinement with GPT-4V for Automatic Image Design and Generation |
2511 | Poster | TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models |
2512 | Poster | Navigating Text-to-Image Generative Bias across Indic Languages |
2513 | Poster | Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning |
2514 | Poster | MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization |
2515 | Poster | Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion |
2516 | Poster | LCM-Lookahead for Encoder-based Text-to-Image Personalization |
2517 | Poster | Robust-Wide: Robust Watermarking against Instruction-driven Image Editing |
2518 | Poster | COIN-Matting: Confounder Intervention for Image Matting |
2519 | Poster | Free-ATM: Harnessing Free Attention Masks for Representation Learning on Diffusion-Generated Images |
2520 | Poster | ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion |
2521 | Poster | Data Augmentation via Latent Diffusion for Saliency Prediction |
2522 | Poster | Score Distillation Sampling with Learned Manifold Corrective |
2523 | Poster | Thinking Outside the BBox: Unconstrained Generative Object Compositing |
2524 | Poster | Learning Quantized Adaptive Conditions for Diffusion Models |
2525 | Poster | FRDiff: Feature Reuse for Universal Training-free Acceleration of Diffusion Models |
2526 | Poster | ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback |
2527 | Poster | Lossy Image Compression with Foundation Diffusion Models |
2528 | Poster | AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion |
2529 | Poster | QueryCDR: Query-based Controllable Distortion Rectification Network for Fisheye Images |
2530 | Poster | MetaWeather: Few-Shot Weather-Degraded Image Restoration |
2531 | Poster | Semi-Supervised Video Desnowing Network via Temporal Decoupling Experts and Distribution-Driven Contrastive Regularization |
2532 | Poster | Spatially-Variant Degradation Model for Dataset-free Super-resolution |
2533 | Poster | Towards Architecture-Agnostic Untrained Network Priors for Image Reconstruction with Frequency Regularization |
2534 | Poster | Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution |
2535 | Poster | Contourlet Residual for Prompt Learning Enhanced Infrared Image Super-Resolution |
2536 | Poster | Image-adaptive 3D Lookup Tables for Real-time Image Enhancement with Bilateral Grids |
2537 | Poster | Improving Feature Stability during Upsampling -- Spectral Artifacts and the Importance of Spatial Context |
2538 | Poster | denoiSplit: a method for joint microscopy image splitting and unsupervised denoising |
2539 | Poster | Region-Aware Sequence-to-Sequence Learning for Hyperspectral Denoising |
2540 | Poster | CoSIGN: Few-Step Guidance of ConSIstency Model to Solve General INverse Problems |
2541 | Poster | Plug-and-Play Learned Proximal Trajectory for 3D Sparse-View X-Ray Computed Tomography |
2542 | Poster | Unsupervised Multi-modal Medical Image Registration via Invertible Translation |
2543 | Poster | Lost in Translation: Modern Neural Networks Still Struggle With Small Realistic Image Transformations |
2544 | Poster | ColorMNet: A Memory-based Deep Spatial-Temporal Feature Propagation Network for Video Colorization |
2545 | Poster | Spiking Wavelet Transformer |
2546 | Poster | Protecting NeRFs' Copyright via Plug-And-Play Watermarking Base Model |
2547 | Poster | Finding a needle in a haystack: A Black-Box Approach to Invisible Watermark Detection |
2548 | Poster | CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching |
2549 | Poster | Noise-assisted Prompt Learning for Image Forgery Detection and Localization |
2550 | Poster | TF-FAS: Twofold-Element Fine-Grained Semantic Guidance for Generalizable Face Anti-Spoofing |
2551 | Poster | Towards Certifiably Robust Face Recognition |
2552 | Poster | Oulu Remote-photoplethysmography Physical Domain Attacks Database (ORPDAD) |
2553 | Poster | Bi-TTA: Bidirectional Test-Time Adapter for Remote Physiological Measurement |
2554 | Poster | Affine steerers for structured keypoint description |
2555 | Poster | A Framework for Efficient Model Evaluation through Stratification, Sampling, and Estimation |
2556 | Poster | You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception |
2557 | Poster | TAPTR: Tracking Any Point with Transformers as Detection |
2558 | Poster | SPAMming Labels: Efficient Annotations for the Trackers of Tomorrow |
2559 | Poster | Towards Physical World Backdoor Attacks against Skeleton Action Recognition |
2560 | Poster | MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion |
2561 | Poster | Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph |
2562 | Poster | DyFADet: Dynamic Feature Aggregation for Temporal Action Detection |
2563 | Poster | Towards Adaptive Pseudo-label Learning for Semi-Supervised Temporal Action Localization |
2564 | Poster | Two-Stage Active Learning for Efficient Temporal Action Segmentation |
2565 | Poster | MOD-UV: Learning Mobile Object Detectors from Unlabeled Videos |
2566 | Poster | PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation |
2567 | Poster | VP-SAM: Taming Segment Anything Model for Video Polyp Segmentation via Disentanglement and Spatio-temporal Side Network |
2568 | Poster | PALM: Predicting Actions through Language Models |
2569 | Poster | ZeroI2V: Zero-Cost Adaptation of Pre-Trained Transformers from Image to Video |
2570 | Poster | Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data |
2571 | Oral | Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data |
2572 | Poster | VideoMamba: Spatio-Temporal Selective State Space Model |
2573 | Poster | Text-Guided Video Masked Autoencoder |
2574 | Poster | Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation |
2575 | Poster | VISA: Reasoning Video Object Segmentation via Large Language Model |
2576 | Poster | LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models |
2577 | Poster | BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos |
2578 | Poster | COM Kitchens: An Unedited Overhead-view Procedural Videos Dataset as a Vision-Language Benchmark |
2579 | Poster | Audio-visual Generalized Zero-shot Learning the Easy Way |
2580 | Poster | Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time |
2581 | Poster | SignGen: End-to-End Sign Language Video Generation with Latent Diffusion |
2582 | Poster | TrajPrompt: Aligning Color Trajectory with Vision-Language Representations |
2583 | Poster | Adaptive High-Frequency Transformer for Diverse Wildlife Re-Identification |
2584 | Poster | OmniSat: Self-Supervised Modality Fusion for Earth Observation |
2585 | Poster | Statewide Visual Geolocalization in the Wild |
2586 | Poster | Pre-trained Visual Dynamics Representations for Efficient Policy Learning |
2587 | Poster | Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving |
2588 | Poster | Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts |
2589 | Poster | ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments |
2590 | Poster | LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents |
2591 | Poster | R^2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations |
2592 | Poster | Agent3D-Zero: An Agent for Zero-shot 3D Understanding |
2593 | Poster | PromptIQA: Boosting the Performance and Generalization for No-Reference Image Quality Assessment via Prompts |
2594 | Poster | An Explainable Vision Question Answer Model via Diffusion Chain-of-Thought |
2595 | Poster | Fully Authentic Visual Question Answering Dataset from Online Communities |
2596 | Poster | SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant |
2597 | Poster | Learning Chain of Counterfactual Thought for Bias-Robust Vision-Language Reasoning |
2598 | Poster | BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models |
2599 | Poster | Paying More Attention to Images: A Training-Free Method for Alleviating Hallucination in LVLMs |
2600 | Poster | TrojVLM: Backdoor Attack Against Vision Language Models |
2601 | Poster | Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks |
2602 | Oral | Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks |
2603 | Poster | Attention Prompting on Image for Large Vision-Language Models |
2604 | Poster | LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model |
2605 | Poster | Generalizing to Unseen Domains via Text-guided Augmentation |
2606 | Poster | MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation |
2607 | Poster | TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes |
2608 | Poster | Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval |
2609 | Poster | Taming CLIP for Fine-grained and Structured Visual Understanding of Museum Exhibits |
2610 | Poster | Prompting Language-Informed Distribution for Compositional Zero-Shot Learning |
2611 | Poster | Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following |
2612 | Poster | FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance |
2613 | Poster | Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild |
2614 | Oral | Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild |
2615 | Poster | Vision-Language Dual-Pattern Matching for Out-of-Distribution Detection |
2616 | Poster | T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy |
2617 | Poster | Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation |
2618 | Poster | OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection |
2619 | Poster | O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation |
2620 | Poster | APL: Anchor-based Prompt Learning for One-stage Weakly Supervised Referring Expression Comprehension |
2621 | Poster | GTMS: A Gradient-driven Tree-guided Mask-free Referring Image Segmentation Method |
2622 | Poster | MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders |
2623 | Poster | ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation |
2624 | Poster | MTA-CLIP: Language-Guided Semantic Segmentation with Mask-Text Alignment |
2625 | Poster | Think before Placement: Common Sense Enhanced Transformer for Object Placement |
2626 | Poster | Eliminating Feature Ambiguity for Few-Shot Segmentation |
2627 | Poster | Diffusion-Guided Weakly Supervised Semantic Segmentation |
2628 | Poster | Cross-Domain Semantic Segmentation on Inconsistent Taxonomy using VLMs |
2629 | Poster | Better Call SAL: Towards Learning to Segment Anything in Lidar |
2630 | Poster | MICDrop: Masking Image and Depth Features via Complementary Dropout for Domain-Adaptive Semantic Segmentation |
2631 | Poster | DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentation |
2632 | Poster | Background Adaptation with Residual Modeling for Exemplar-Free Class-Incremental Semantic Segmentation |
2633 | Poster | Towards Reliable Evaluation and Fast Training of Robust Semantic Segmentation Models |
2634 | Poster | ClusteringSDF: Self-Organized Neural Implicit Surfaces for 3D Decomposition |
2635 | Poster | Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts |
2636 | Poster | EcoMatcher: Efficient Clustering Oriented Matcher for Detector-free Image Matching |
2637 | Poster | Class-Agnostic Object Counting with Text-to-Image Diffusion Model |
2638 | Poster | Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector |
2639 | Poster | Co-Student: Collaborating Strong and Weak Students for Sparsely Annotated Object Detection |
2640 | Poster | Plain-Det: A Plain Multi-Dataset Object Detector |
2641 | Poster | Multi-scale Cross Distillation for Object Detection in Aerial Images |
2642 | Poster | PDT Uav Target Detection Dataset for Pests and Diseases Tree |
2643 | Poster | Region-Adaptive Transform with Segmentation Prior for Image Compression |
2644 | Poster | FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification |
2645 | Poster | CC-SAM: Enhancing SAM with Cross-feature Attention and Context for Ultrasound Image Segmentation |
2646 | Poster | Co-synthesis of Histopathology Nuclei Image-Label Pairs using a Context-Conditioned Joint Diffusion Model |
2647 | Poster | DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification |
2648 | Poster | Mew: Multiplexed Immunofluorescence Image Analysis through an Efficient Multiplex Network |
2649 | Poster | MedRAT: Unpaired Medical Report Generation via Auxiliary Tasks |
2650 | Poster | An Incremental Unified Framework for Small Defect Inspection |
2651 | Poster | Dissolving Is Amplifying: Towards Fine-Grained Anomaly Detection |
2652 | Poster | GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features |
2653 | Poster | MoEAD: A Parameter-efficient Model for Multi-class Anomaly Detection |
2654 | Poster | PQ-SAM: Post-training Quantization for Segment Anything Model |
2655 | Poster | BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation |
2656 | Poster | ELSE: Efficient Deep Neural Network Inference through Line-based Sparsity Exploration |
2657 | Poster | FairViT: Fair Vision Transformer via Adaptive Masking |
2658 | Poster | LPViT: Low-Power Semi-structured Pruning for Vision Transformers |
2659 | Poster | PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference |
2660 | Poster | CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs |
2661 | Poster | Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach |
2662 | Poster | Characterizing Model Robustness via Natural Input Gradients |
2663 | Poster | Dropout Mixture Low-Rank Adaptation for Visual Parameters-Efficient Fine-Tuning |
2664 | Poster | FreeAugment: Data Augmentation Search Across All Degrees of Freedom |
2665 | Poster | Towards Multi-modal Transformers in Federated Learning |
2666 | Poster | Plug and Play: A Representation Enhanced Domain Adapter for Collaborative Perception |
2667 | Poster | GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning |
2668 | Poster | Soft Prompt Generation for Domain Generalization |
2669 | Poster | SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision |
2670 | Poster | Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery |
2671 | Poster | Deep Online Probability Aggregation Clustering |
2672 | Poster | Group Testing for Accurate and Efficient Range-Based Near Neighbor Search for Plagiarism Detection |
2673 | Poster | An accurate detection is not all you need to combat label noise in web-noisy datasets |
2674 | Poster | Flexible Distribution Alignment: Towards Long-tailed Semi-supervised Learning with Proper Calibration |
2675 | Poster | ExMatch: Self-guided Exploitation for Semi-Supervised Learning with Scarce Labeled Samples |
2676 | Poster | Dynamic Data Selection for Efficient SSL via Coarse-to-Fine Refinement |
2677 | Poster | SelEx: Self-Expertise in Fine-Grained Generalized Category Discovery |
2678 | Poster | Dynamic Retraining-Updating Mean Teacher for Source-Free Object Detection |
2679 | Poster | Integrating Markov Blanket Discovery into Causal Representation Learning for Domain Generalization |
2680 | Poster | Learn from the Learnt: Source-Free Active Domain Adaptation via Contrastive Sampling and Visual Persistence |
2681 | Poster | On the Approximation Risk of Few-Shot Class-Incremental Learning |
2682 | Poster | STAMP: Outlier-Aware Test-Time Adaptation with Stable Memory Replay |
2683 | Poster | RCS-Prompt: Learning Prompt to Rearrange Class Space for Prompt-based Continual Learning |
2684 | Poster | CLEO: Continual Learning of Evolving Ontologies |
2685 | Poster | Learning Representation for Multitask Learning through Self-Supervised Auxiliary Learning |
2686 | Poster | Improving Knowledge Distillation via Regularizing Feature Direction and Norm |
2687 | Oral | Improving Knowledge Distillation via Regularizing Feature Direction and Norm |
2688 | Poster | MTKD: Multi-Teacher Knowledge Distillation for Image Super-Resolution |
2689 | Poster | Federated Learning with Local Openset Noisy Labels |
2690 | Poster | Unlocking the Potential of Federated Learning: The Symphony of Dataset Distillation via Deep Generative Latents |
2691 | Poster | FedHARM: Harmonizing Model Architectural Diversity in Federated Learning |
2692 | Poster | Causal Subgraphs and Information Bottlenecks: Redefining OOD Robustness in Graph Neural Networks |
2693 | Poster | Scissorhands: Scrub Data Influence via Connection Sensitivity in Networks |
2694 | Poster | Shedding More Light on Robust Classifiers under the lens of Energy-based Models |
2695 | Poster | Inter-Class Topology Alignment for Efficient Black-Box Substitute Attacks |
2696 | Poster | AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models |
2697 | Poster | FedHide: Federated Learning by Hiding in the Neighbors |
2698 | Poster | SIMBA: Split Inference - Mechanisms, Benchmarks and Attacks |
2699 | Poster | Data Poisoning Quantization Backdoor Attack |
2700 | Poster | Event Trojan: Asynchronous Event-based Backdoor Attacks |
2701 | Keynote | Is distribution shift still an AI problem? |
2702 | Keynote | Fair, transparent, and accountable AI: What is legally required, what is ethically desired, and what is technically feasible? |
2703 | Keynote | Synthesia: From computer vision research to real-world AI avatars |
2704 | Oral Session | Oral 6A: Generative Models II |