We used a web scraper to collect the ECCV 2024 paper list before it was officially announced, making it available to anyone who wants to read the latest papers.

wangjiyuan9/ECCV2024-Full-PaperList

ECCV2024-PaperList

If you find this helpful, we would appreciate a star! Note: Oral papers may appear twice.
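
Because an oral paper is listed once as a Poster and once as an Oral, anyone counting or searching the list may want to merge those rows. The following is a minimal sketch, assuming each row is a plain-text line of the form `ID Type Title` as in the list below. The names `ROW_TYPES`, `parse_row`, and `merge_duplicates` are hypothetical helpers for illustration and are not part of this repository.

```python
import re

# Row types that appear in the list. Multi-word types are placed first so that
# "Poster Session" is tried before "Poster" (and "Oral Session" before "Oral").
ROW_TYPES = ("Poster Session", "Oral Session", "Workshop", "Tutorial", "Poster", "Oral")
ROW_RE = re.compile(r"^(\d+)\s+(" + "|".join(ROW_TYPES) + r")\s+(.+)$")


def parse_row(line):
    """Split one 'ID Type Title' line into (id, type, title); None if it is not a row."""
    match = ROW_RE.match(line.strip())
    if match is None:
        return None  # e.g. the 'ID Type Title' header line
    return int(match.group(1)), match.group(2), match.group(3)


def merge_duplicates(lines):
    """Map each paper title to the sorted list of types (Poster/Oral) it appears under."""
    papers = {}
    for line in lines:
        row = parse_row(line)
        if row is None:
            continue
        _, row_type, title = row
        if row_type in ("Poster", "Oral"):  # skip workshops, tutorials, sessions
            papers.setdefault(title, set()).add(row_type)
    return {title: sorted(types) for title, types in papers.items()}
```

A title whose value is `["Oral", "Poster"]` is an oral paper that appears twice. Note that this sketch would misparse a Poster or Oral paper whose title itself begins with the word "Session", since the regular expression tries the two-word types first.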

ID Type Title
1 Workshop Recovering 6D Object Pose
2 Workshop Half-century of Structure-from-Motion (50SfM)
3 Workshop Dense Neural SLAM Workshop (NeuSLAM)
4 Workshop Geometry in the Large Model Era
5 Workshop Workshop on Spatial AI
6 Workshop Transparent & Reflective objects In the wild Challenges (TRICKY)
7 Workshop Wild3D: 3D Modeling, Reconstruction, and Generation in the Wild
8 Workshop AI3DCC: The Second Workshop of AI for 3D Content Creation
9 Workshop 3D Vision and Modeling Challenges in eCommerce
10 Workshop FashionAI: Exploring the intersection of Fashion and Artificial Intelligence for reshaping the Industry
11 Workshop CV For Ecology Workshop (CV4E)
12 Workshop 9th Workshop on Computer Vision in Plant Phenotyping and Agriculture (CVPPA)
13 Workshop 3rd edition of Computer Vision for Metaverse (CV4Metaverse)
14 Workshop The First Workshop on: Computer Vision for Videogames (CV2)
15 Workshop 2nd Workshop on Vision-based Industrial Inspection (VISION)
16 Workshop AI for Visual Arts Workshop and Challenges (AI4VA)
17 Workshop Vision for Art (VISART) VII Workshop
18 Workshop AI4DH: Artificial Intelligence for Digital Humanities
19 Workshop The Third ROAD Workshop & Challenge: Event Detection for Situation Awareness in Autonomous Driving
20 Workshop Vision-Centric Autonomous Driving (VCAD) Workshop
21 Workshop ROAM: Robust, Out-of-Distribution And Multi-Modal models for Autonomous Driving
22 Workshop ACVR2024 - 12th International Workshop on Assistive Computer Vision and Robotics
23 Workshop Autonomous Vehicles meet Multimodal Foundation Models
24 Workshop Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving: Towards Next-Generation Solutions
25 Workshop Multi-Agent Autonomous Systems Meet Foundation Models: Challenges and Futures
26 Workshop Visual object tracking and segmentation challenge VOTS2024 workshop
27 Workshop 5th Advances in Image Manipulation (AIM) Workshop and Challenges
28 Workshop Instance-Level Recognition
29 Workshop Large-scale Video Object Segmentation
30 Workshop The Second Perception Test Challenge
31 Workshop Efficient Deep Learning for Foundation Models
32 Workshop Computational Aspects of Deep Learning
33 Workshop Foundation Models for 3D Humans
34 Workshop Workshop on Artificial Social Intelligence
35 Workshop T-CAP - Towards a Complete Analysis of People: Fine-grained Understanding for Real-World Applications
36 Workshop Observing and Understanding Hands in Action
37 Workshop 7th Workshop and Competition on Affective Behavior Analysis in-the-wild
38 Workshop The First Workshop on Expressive Encounters: Co-speech gestures across cultures in the wild
39 Workshop BioImage Computing (BIC)
40 Workshop Human-inspired Computer Vision
41 Workshop Knowledge in Generative Models
42 Workshop Self-Supervised Learning - What is next?
43 Workshop Traditional Computer Vision in the Age of Deep Learning (TradiCV)
44 Workshop Uncertainty Quantification for Computer Vision
45 Workshop Emergent Visual Abilities and Limits of Foundation Models (EVAL-FoMo)
46 Workshop Beyond Euclidean: Hyperbolic and Hyperspherical Learning for Computer Vision
47 Workshop Workshop on Unlearning and Model Editing (U&ME'24)
48 Workshop The 3rd Workshop for Out-of-Distribution Generalization in Computer Vision Foundation Models
49 Workshop Workshop on Visual Concepts
50 Workshop Sometimes Less is More: The First Dataset Distillation Challenge
51 Workshop 2nd Workshop on Quantum Computer Vision and Machine Learning (QCVML)
52 Workshop 2nd Workshop on More Exploration, Less Exploitation (MELEX)
53 Workshop Synthetic Data for Computer Vision
54 Workshop International Challenge on Compositional and Multimodal Perception
55 Workshop AVGenL: Audio-Visual Generation and Learning
57 Workshop Multimodal Agents Workshop
58 Workshop 2nd OmniLabel Workshop: Enabling Complex Perception Through Vision and Language Foundational Models
59 Workshop The Dark Side of Generative AIs and Beyond
61 Workshop FOundation models Creators meet USers (FOCUS)
62 Workshop Fairness and ethics towards transparent AI: facing the chalLEnge through model Debiasing (FAILED)
63 Workshop Explainable AI for Computer Vision: Where Are We and Where Are We Going?
64 Workshop TWYN: Trust What You learN. 1st Workshop on Trustworthiness in Computer Vision
65 Workshop Women in Computer Vision
66 Workshop 2nd International Workshop on Privacy-Preserving Computer Vision
67 Workshop Critical Evaluation of Generative Models and their Impact on Society
68 Workshop xAI4Biometrics at ECCV 2024 - 4th Workshop on Explainable & Interpretable Artificial Intelligence for Biometrics
69 Workshop Workshop on Green Foundation Models
70 Workshop Scalable 3D Scene Generation and 3D Geometric Scene Understanding
71 Workshop OpenSUN3D: 3rd Workshop on Open-Vocabulary 3D Scene Understanding
72 Workshop Map-free Visual Relocalization
73 Workshop Workshop on Neuromorphic Vision (NeVi): Advantages and Applications of Event Cameras
74 Workshop 1st Workshop on Neural Fields Beyond Conventional Cameras
75 Workshop GigaVision: When Gigapixel Videography Meets Computer Vision
76 Workshop Eyes of the Future: Integrating Computer Vision in Smart Eyewear
77 Tutorial Large Multimodal Foundation Models
78 Tutorial A Bayesian Odyssey in Uncertainty: from Theoretical Foundations to Real-World Applications
79 Tutorial Third Hands-on Egocentric Research Tutorial with Project Aria, from Meta
80 Tutorial Emerging Trends in Disentanglement and Compositionality
81 Tutorial Efficient Text-to-Image and Text-to-3D modeling
82 Tutorial Responsibly Building Generative Models
83 Tutorial Recent Advances in Video Content Understanding and Generation
84 Tutorial Time is precious: Self-Supervised Learning Beyond Images
85 Tutorial Inside Plato's door: a tour in Multi-view Geometry
86 Poster Session Poster Session 1
87 Oral Session Oral 1A: Scene Analysis And Understanding
88 Oral Session Oral 1B: Autonomous Driving
89 Oral Session Oral 1C: Low-Level Vision And Imaging
90 Poster Session Poster Session 2
91 Oral Session Oral 2A: Generative Models I
92 Oral Session Oral 2B: Recognition
93 Oral Session Oral 2C: Multi-View And Visual Odometry
94 Poster Session Poster Session 3
95 Oral Session Oral 3A: Datasets And Benchmarking
96 Oral Session Oral 3B: Medical And Biological Imaging
97 Oral Session Oral 3C: Point Clouds
98 Poster Session Poster Session 4
99 Oral Session Oral 4A: Neural 3D Rendering
100 Oral Session Oral 4B: Video Generation / Editing / Prediction
101 Oral Session Oral 4C: Humans: Biometrics, Pose And Motion
102 Poster Session Poster Session 5
103 Oral Session Oral 5A: Segmentation
104 Oral Session Oral 5B: Vision Applications
105 Oral Session Oral 5C: Representation Learning
106 Poster Session Poster Session 6
107 Oral Session Oral 6A: Generative Models II
108 Oral Session Oral 6B: Video Understanding
109 Oral Session Oral 6C: Vision And Other Modalities
110 Poster Session Poster Session 7
111 Oral Session Oral 7A: Learning Architectures, Transfer, Continual And Long-Tail
112 Oral Session Oral 7B: Adversarial Learning And Privacy
113 Oral Session Oral 7C: Optimization And Theory
114 Poster Bi-directional Contextual Attention for 3D Dense Captioning
115 Oral Bi-directional Contextual Attention for 3D Dense Captioning
116 Poster Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention
117 Oral Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention
118 Poster ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting
119 Oral ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting
120 Poster Towards Scene Graph Anticipation
121 Oral Towards Scene Graph Anticipation
122 Poster OP-Align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation
123 Oral OP-Align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation
124 Poster PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers
125 Oral PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers
126 Poster H-V2X: A Large Scale Highway Dataset for BEV Perception
127 Oral H-V2X: A Large Scale Highway Dataset for BEV Perception
128 Poster RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios
129 Oral RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios
130 Poster DriveLM: Driving with Graph Visual Question Answering
131 Oral DriveLM: Driving with Graph Visual Question Answering
132 Poster Making Large Language Models Better Planners with Reasoning-Decision Alignment
133 Oral Making Large Language Models Better Planners with Reasoning-Decision Alignment
134 Poster M^2Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation
135 Oral M^2Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation
136 Poster MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping
137 Oral MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping
138 Poster Adaptive Bounding Box Uncertainties via Two-Step Conformal Prediction
139 Oral Adaptive Bounding Box Uncertainties via Two-Step Conformal Prediction
140 Poster A Simple Low-bit Quantization Framework for Video Snapshot Compressive Imaging
141 Oral A Simple Low-bit Quantization Framework for Video Snapshot Compressive Imaging
142 Poster Photon Inhibition for Energy-Efficient Single-Photon Imaging
143 Oral Photon Inhibition for Energy-Efficient Single-Photon Imaging
144 Poster Latent Diffusion Prior Enhanced Deep Unfolding for Snapshot Spectral Compressive Imaging
145 Oral Latent Diffusion Prior Enhanced Deep Unfolding for Snapshot Spectral Compressive Imaging
146 Poster Minimalist Vision with Freeform Pixels
147 Oral Minimalist Vision with Freeform Pixels
148 Poster SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow
149 Oral SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow
150 Poster Integer-Valued Training and Spike-driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection
151 Oral Integer-Valued Training and Spike-driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection
152 Poster OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects
153 Oral OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects
154 Poster UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model
155 Poster Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture
156 Poster HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting
157 Poster MagicMirror: Fast and High-Quality Avatar Generation with Constrained Search Space
158 Poster Personalized Video Relighting With an At-Home Light Stage
159 Poster Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations
160 Poster Panel-Specific Degradation Representation for Raw Under-Display Camera Image Restoration
161 Poster HoloADMM: High-Quality Holographic Complex Field Recovery
162 Poster Flying with Photons: Rendering Novel Views of Propagating Light
163 Oral Flying with Photons: Rendering Novel Views of Propagating Light
164 Poster Efficient Depth-Guided Urban View Synthesis
165 Poster Ray-Distance Volume Rendering for Neural Scene Reconstruction
166 Poster Taming Latent Diffusion Model for Neural Radiance Field Inpainting
167 Poster Learning Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors
168 Poster GMT: Enhancing Generalizable Neural Rendering via Geometry-Driven Multi-Reference Texture Transfer
169 Poster MaRINeR: Enhancing Novel Views by Matching Rendered Images with Nearby References
170 Poster UNIKD: UNcertainty-Filtered Incremental Knowledge Distillation for Neural Implicit Representation
171 Poster Rethinking Directional Parameterization in Neural Implicit Surface Reconstruction
172 Poster Sur^2f: A Hybrid Representation for High-Quality and Efficient Surface Reconstruction from Multi-view Images
173 Poster Differentiable Convex Polyhedra Optimization from Multi-view Images
174 Poster Combining Generative and Geometry Priors for Wide-Angle Portrait Correction
175 Poster I2-SLAM: Inverting Imaging Process for Robust Photorealistic Dense SLAM
176 Poster Mitigating Perspective Distortion-induced Shape Ambiguity in Image Crops
177 Poster MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo
178 Poster CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians
179 Poster GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting
180 Poster FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally
181 Poster PolyOculus: Simultaneous Multi-view Image-based Novel View Synthesis
182 Poster MegaScenes: Scene-Level View Synthesis at Scale
183 Poster HiFi-123: Towards High-fidelity One Image to 3D Content Generation
184 Poster View-Consistent 3D Editing with Gaussian Splatting
185 Poster Compress3D: a Compressed Latent Space for 3D Generation from a Single Image
186 Poster Forest2Seq: Revitalizing Order Prior for Sequential Indoor Scene Synthesis
187 Poster 3DFG-PIFu: 3D Feature Grids for Human Digitization from Sparse Views
188 Poster Nuvo: Neural UV Mapping for Unruly 3D Representations
189 Poster Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors
190 Poster BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
191 Poster A Diffusion Model for Simulation Ready Coronary Anatomy with Morpho-skeletal Control
192 Poster DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation
193 Poster TPA3D: Triplane Attention for Fast Text-to-3D Generation
194 Poster DECOLLAGE: 3D Detailization by Controllable, Localized, and Learned Geometry Enhancement
195 Poster WordRobe: Text-Guided Generation of Textured 3D Garments
196 Poster AnyHome: Open-Vocabulary Large-Scale Indoor Scene Generation with First-Person View Exploration
197 Poster HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance
198 Poster SENC: Handling Self-collision in Neural Cloth Simulation
199 Poster AnimatableDreamer: Text-Guided Non-rigid 3D Model Generation and Reconstruction with Canonical Score Distillation
200 Poster SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model
201 Poster Diffusion Models as Data Mining Tools
202 Poster ReMatching: Low-Resolution Representations for Scalable Shape Correspondence
203 Poster PolyRoom: Room-aware Transformer for Floorplan Reconstruction
204 Poster WindPoly: Polygonal Mesh Reconstruction via Winding Numbers
205 Poster Hiding Imperceptible Noise in Curvature-Aware Patches for 3D Point Cloud Attack
206 Poster Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion
207 Poster Diffusion Bridges for 3D Point Cloud Denoising
208 Poster Towards a Density Preserving Objective Function for Learning on Point Sets
209 Poster Syn-to-Real Domain Adaptation for Point Cloud Completion via Part-based Approach
210 Poster T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning
211 Poster Text2LiDAR: Text-guided LiDAR Point Clouds Generation via Equirectangular Transformer
212 Poster DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields
213 Poster Computing the Lipschitz constant needed for fast scene recovery from CASSI measurements
214 Poster Regularizing Dynamic Radiance Fields with Kinematic Fields
215 Poster GlobalPointer: Large-Scale Plane Adjustment with Bi-Convex Relaxation
216 Poster iMatching: Imperative Correspondence Learning
217 Poster Fundamental Matrix Estimation Using Relative Depths
218 Poster Track Everything Everywhere Fast and Robustly
219 Poster Learning to Make Keypoints Sub-Pixel Accurate
220 Poster Shape-guided Configuration-aware Learning for Endoscopic-image-based Pose Estimation of Flexible Robotic Instruments
221 Poster FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models
222 Poster Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking
223 Poster Pseudo-keypoint RKHS Learning for Self-supervised 6DoF Pose Estimation
224 Poster Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images
225 Poster GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation
226 Poster D-SCo: Dual-Stream Conditional Diffusion for Monocular Hand-Held Object Reconstruction
227 Poster Event-based Head Pose Estimation: Benchmark and Method
228 Poster Parameterized Quasi-Physical Simulators for Dexterous Manipulations Transfer
229 Poster RAW-Adapter: Adapting Pretrained Visual Model to Camera RAW Images
230 Poster Easing 3D Pattern Reasoning with Side-view Features for Semantic Scene Completion
231 Poster Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions
232 Poster GroCo: Ground Constraint for Metric Self-Supervised Monocular Depth
233 Poster Remove Projective LiDAR Depthmap Artifacts via Exploiting Epipolar Geometry
234 Poster Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network
235 Poster CountFormer: Multi-View Crowd Counting Transformer
236 Poster When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset
237 Poster MapDistill: Boosting Efficient Camera-based HD Map Construction via Camera-LiDAR Fusion Model Distillation
238 Poster 4D Contrastive Superflows are Dense 3D Representation Learners
239 Poster TCC-Det: Temporarily consistent cues for weakly-supervised 3D detection
240 Poster CARB-Net: Camera-Assisted Radar-Based Network for Vulnerable Road User Detection
241 Poster SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving
242 Poster RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception
243 Poster TrafficNight: An Aerial Multimodal Benchmark For Nighttime Vehicle Surveillance
244 Poster RoDUS: Robust Decomposition of Static and Dynamic Elements in Urban Scenes
245 Poster Monocular Occupancy Prediction for Scalable Indoor Scenes
246 Poster nuCraft: Crafting High Resolution 3D Semantic Occupancy for Unified 3D Scene Understanding
247 Poster Mask2Map: Vectorized HD Map Construction Using Bird's Eye View Segmentation Masks
248 Oral Mask2Map: Vectorized HD Map Construction Using Bird's Eye View Segmentation Masks
249 Poster CARFF: Conditional Auto-encoded Radiance Field for 3D Scene Forecasting
250 Poster Neural Volumetric World Models for Autonomous Driving
251 Poster Progressive Pretext Task Learning for Human Trajectory Prediction
252 Poster Risk-Aware Self-Consistent Imitation Learning for Trajectory Planning in Autonomous Driving
253 Poster Safe-Sim: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries
254 Poster Towards Dual Transparent Liquid Level Estimation in Biomedical Lab: Dataset, Methods and Practice
255 Poster TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos
256 Poster Temporally Consistent Stereo Matching
257 Poster Retrieval Robust to Object Motion Blur
258 Poster Deblur e-NeRF: NeRF from Motion-Blurred Events under High-speed or Low-light Conditions
259 Poster CMTA: Cross-Modal Temporal Alignment for Event-guided Video Deblurring
260 Poster Long-range Turbulence Mitigation: A Large-scale Dataset and A Coarse-to-fine Framework
261 Poster Diffusion Reward: Learning Rewards via Conditional Video Diffusion
262 Poster HUMOS: Human Motion Model Conditioned on Body Shape
263 Poster PoseAugment: Generative Human Pose Data Augmentation with Physical Plausibility for IMU-based Motion Capture
264 Poster Large Motion Model for Unified Multi-Modal Motion Generation
265 Poster Realistic Human Motion Generation with Cross-Diffusion Models
266 Poster Text Motion Translator: A Bi-Directional Model for Enhanced 3D Human Motion Generation from Open-Vocabulary Descriptions
267 Poster Generating Human Interaction Motions in Scenes with Text Control
268 Poster Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
269 Poster Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
270 Poster PoseCrafter: One-Shot Personalized Video Synthesis Following Flexible Pose Control
271 Poster MoVideo: Motion-Aware Video Generation with Diffusion Models
272 Poster FreeInit: Bridging Initialization Gap in Video Diffusion Models
273 Poster DreamMotion: Space-Time Self-Similar Score Distillation for Zero-Shot Video Editing
274 Poster Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion
275 Poster ReNoise: Real Image Inversion Through Iterative Noising
276 Poster Elegantly Written: Disentangling Writer and Character Styles for Enhancing Online Chinese Handwriting
277 Poster One-Shot Diffusion Mimicker for Handwritten Text Generation
278 Poster Investigating Style Similarity in Diffusion Models
279 Poster DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation
280 Poster PartCraft: Crafting Creative Objects by Parts
281 Poster DreamDrone: Text-to-Image Diffusion Models are Zero-shot Perpetual View Generators
282 Poster WAS: Dataset and Methods for Artistic Text Segmentation
283 Poster GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections
284 Poster PixArt-Sigma: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
285 Poster HybridBooth: Hybrid Prompt Inversion for Efficient Subject-Driven Generation
286 Poster Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance
287 Poster Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm
288 Poster Diffusion Soup: Model Merging for Text-to-Image Diffusion Models
289 Poster Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention
290 Poster Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers
291 Poster Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control
292 Poster DEPICT: Diffusion-Enabled Permutation Importance for Image Classification Tasks
293 Poster Do text-free diffusion models learn discriminative visual representations?
294 Poster DataDream: Few-shot Guided Dataset Generation
295 Poster DiffuMatting: Synthesizing Arbitrary Objects with Matting-level Annotation
296 Poster ZeST: Zero-Shot Material Transfer from a Single Image
297 Poster FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior
298 Poster Learning Equilibrium Transformation for Gamut Expansion and Color Restoration
299 Poster Timestep-Aware Correction for Quantized Diffusion Models
300 Poster Inf-DiT: Upsampling any-resolution image with memory-efficient diffusion transformer
301 Poster Energy-Calibrated VAE with Test Time Free Lunch
302 Poster Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models
303 Poster Prompt-Based Test-Time Real Image Dehazing: A Novel Pipeline
304 Poster Asymmetric Mask Scheme for Self-Supervised Real Image Denoising
305 Poster GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity
306 Poster Learning Dual-Level Deformable Implicit Representation for Real-World Scale Arbitrary Super-Resolution
307 Poster A New Dataset and Framework for Real-World Blurred Images Super-Resolution
308 Poster Blind image deblurring with noise-robust kernel estimation
309 Poster SMFANet: A Lightweight Self-Modulation Feature Aggregation Network for Efficient Image Super-Resolution
310 Poster MambaIR: A Simple Baseline for Image Restoration with State-Space Model
311 Poster BlazeBVD: Make Scale-Time Equalization Great Again for Blind Video Deflickering
312 Poster Towards Robust Full Low-bit Quantization of Super Resolution Networks
313 Poster Solving the inverse problem of microscopy deconvolution with a residual Beylkin-Coifman-Rokhlin neural network
314 Poster SAH-SCI: Self-Supervised Adapter for Efficient Hyperspectral Snapshot Compressive Imaging
315 Poster Adaptive Compressed Sensing with Diffusion-Based Posterior Sampling
316 Poster DiffuX2CT: Diffusion Learning to Reconstruct CT Images from Biplanar X-Rays
317 Poster BaSIC: BayesNet Structure Learning for Computational Scalable Neural Image Compression
318 Poster SNeRV: Spectra-preserving Neural Representation for Video
319 Poster Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics
320 Poster Multiscale Graph Texture Network
321 Poster DetailSemNet: Elevating Signature Verification through Detail-Semantic Integration
322 Poster Out-of-Bounding-Box Triggers: A Stealthy Approach to Cheat Object Detectors
323 Poster Fake It till You Make It: Curricular Dynamic Forgery Augmentations towards General Deepfake Detection
324 Poster AdversariaLeak: External Information Leakage Attack Using Adversarial Samples on Face Recognition Systems
325 Poster Continual Learning for Remote Physiological Measurement: Minimize Forgetting and Simplify Inference
326 Poster NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation
327 Poster Region-aware Distribution Contrast: A Novel Approach to Multi-Task Partially Supervised Learning
328 Poster Large-Scale Multi-Hypotheses Cell Tracking Using Ultrametric Contours Maps
329 Poster SemTrack: A Large-scale Dataset for Semantic Tracking in the Wild
330 Poster DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition
331 Poster CrossGLG: LLM Guides One-shot Skeleton-based 3D Action Recognition in a Cross-level Manner
332 Poster Towards More Practical Group Activity Detection: A New Benchmark and Model
333 Poster Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs
334 Poster Online Temporal Action Localization with Memory-Augmented Transformer
335 Poster EgoLifter: Open-world 3D Segmentation for Egocentric Perception
336 Poster MeshSegmenter: Zero-Shot Mesh Segmentation via Texture Synthesis
337 Poster Spatial-Temporal Multi-level Association for Video Object Segmentation
338 Poster Gated Temporal Diffusion for Stochastic Long-term Dense Anticipation
339 Poster ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders
340 Poster Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment
341 Poster VideoMamba: State Space Model for Efficient Video Understanding
342 Poster Text-Conditioned Resampler For Long Form Video Understanding
343 Poster SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding
344 Poster Vamos: Versatile Action Models for Video Understanding
345 Poster Goldfish: Vision-Language Understanding of Arbitrarily Long Videos
346 Poster Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning
347 Poster Multi-Sentence Grounding for Long-term Instructional Video
348 Poster CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
349 Poster CPM: Class-conditional Prompting Machine for Audio-visual Segmentation
350 Poster SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark
351 Poster CityGuessr: City-Level Video Geo-Localization on a Global Scale
352 Poster WRIM-Net: Wide-Ranging Information Mining Network for Visible-Infrared Person Re-Identification
353 Poster Contrastive ground-level image and remote sensing pre-training improves representation learning for natural world imagery
354 Poster AddressCLIP: Empowering Vision-Language Models for City-wide Image Address Localization
355 Poster LingoQA: Video Question Answering for Autonomous Driving
356 Poster Dolphins: Multimodal Language Model for Driving
357 Poster PRET: Planning with Directed Fidelity Trajectory for Vision and Language Navigation
358 Poster LLM as Copilot for Coarse-grained Vision-and-Language Navigation
359 Poster Visual Grounding for Object-Level Generalization in Reinforcement Learning
360 Poster m&m’s: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
361 Poster Recursive Visual Programming
362 Poster Any2Point: Empowering Any-modality Transformers for Efficient 3D Understanding
363 Poster Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models
364 Poster HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
365 Poster REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models
366 Poster ViG-Bias: Visually Grounded Bias Discovery and Mitigation
367 Poster GENIXER: Empowering Multimodal Large Language Models as a Powerful Data Generator
368 Poster Adversarial Prompt Tuning for Vision-Language Models
369 Poster MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
370 Poster Synergy of Sight and Semantics: Visual Intention Understanding with CLIP
371 Poster FlexAttention for Efficient High-Resolution Vision-Language Models
372 Poster VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
373 Poster Weak-to-Strong Compositional Learning from Generative Models for Language-based Object Detection
374 Poster Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
375 Poster BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
376 Poster Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights
377 Poster CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts
378 Poster Elevating All Zero-Shot Sketch-Based Image Retrieval Through Multimodal Prompt Learning
379 Poster GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths
380 Oral GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths
381 Poster Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized Visual Class Discovery
382 Poster Diff-Tracker: Text-to-Image Diffusion Models are Unsupervised Trackers
383 Poster Trackastra: Transformer-based cell tracking for live-cell microscopy
384 Poster Lost and Found: Overcoming Detector Failures in Online Multi-Object Tracking
385 Poster Walker: Self-supervised Multiple Object Tracking by Walking on Temporal Object Appearance Graphs
386 Poster E3V-K5: An Authentic Benchmark for Redefining Video-Based Energy Expenditure Estimation
387 Poster Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition
388 Poster Learning by Aligning 2D Skeleton Sequences and Multi-Modality Fusion
389 Poster Occluded Gait Recognition with Mixture of Experts: An Action Detection Perspective
390 Poster Stepwise Multi-grained Boundary Detector for Point-supervised Temporal Action Localization
391 Poster Finding Meaning in Points: Weakly Supervised Semantic Segmentation for Event Cameras
392 Poster Reliable Spatial-Temporal Voxels For Multi-Modal Test-Time Adaptation
393 Poster X-Pose: Detecting Any Keypoints
394 Poster Open-Set Recognition in the Age of Vision-Language Models
395 Poster Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
396 Poster A Fair Ranking and New Model for Panoptic Scene Graph Generation
397 Oral A Fair Ranking and New Model for Panoptic Scene Graph Generation
398 Poster Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image
399 Poster Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance
400 Poster A Simple Background Augmentation Method for Object Detection with Diffusion Model
401 Poster OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
402 Poster Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models
403 Poster Agent Attention: On the Integration of Softmax and Linear Attention
404 Poster WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting
405 Poster Agglomerative Token Clustering
406 Poster Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation
407 Poster 3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance
408 Poster SPIN: Hierarchical Segmentation with Subpart Granularity in Natural Images
409 Poster Open-Vocabulary RGB-Thermal Semantic Segmentation
410 Poster PartSTAD: 2D-to-3D Part Segmentation Task Adaptation
411 Poster Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
412 Poster FREST: Feature RESToration for Semantic Segmentation under Multiple Adverse Conditions
413 Poster Progressive Proxy Anchor Propagation for Unsupervised Semantic Segmentation
414 Poster Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation
415 Poster Evaluating the Adversarial Robustness of Semantic Segmentation: Trying Harder Pays Off
416 Poster Pseudo-Embedding for Generalized Few-Shot Point Cloud Segmentation
417 Poster Self-supervised co-salient object detection via feature correspondences at multiple scales
418 Poster Unsupervised Dense Prediction using Differentiable Normalized Cuts
419 Poster Robust Zero-Shot Crowd Counting and Localization with Adaptive Resolution SAM
420 Poster Bayesian Detector Combination for Object Detection with Crowdsourced Annotations
421 Poster Bridge Past and Future: Overcoming Information Asymmetry in Incremental Object Detection
422 Poster Bucketed Ranking-based Losses for Efficient Training of Object Detectors
423 Poster Better Regression Makes Better Test-time Adaptive 3D Object Detection
424 Poster MutDet: Mutually Optimizing Pre-training for Remote Sensing Object Detection
425 Poster IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection
426 Poster Semi-supervised Segmentation of Histopathology Images with Noise-Aware Topological Consistency
427 Poster The Devil is in the Statistics: Mitigating and Exploiting Statistics Difference for Generalizable Semi-supervised Medical Image Segmentation
428 Poster A Rotation-invariant Texture ViT for Fine-Grained Recognition of Esophageal Cancer Endoscopic Ultrasound Images
429 Poster Multistain Pretraining for Slide Representation Learning in Pathology
430 Poster Bridging the Pathology Domain Gap: Efficiently Adapting CLIP for Pathology Image Analysis with Limited Labeled Data
431 Poster HERGen: Elevating Radiology Report Generation with Longitudinal Data
432 Poster Defect Spectrum: A Granular Look of Large-scale Defect Datasets with Rich Semantics
433 Poster Towards Open-World Object-based Anomaly Detection via Self-Supervised Outlier Synthesis
434 Poster AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection
435 Poster Hierarchical Gaussian Mixture Normalizing Flow Modeling for Unified Anomaly Detection
436 Poster A Unified Image Compression Method for Human Perception and Multiple Vision Tasks
437 Poster FTBC: Forward Temporal Bias Correction for Optimizing ANN-SNN Conversion
438 Poster Quantization-Friendly Winograd Transformations for Convolutional Neural Networks
439 Poster YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
440 Poster Stripe Observation Guided Inference Cost-free Attention Mechanism
441 Poster NOVUM: Neural Object Volumes for Robust Object Classification
442 Poster POA: Pre-training Once for Models of All Sizes
443 Poster Deep Feature Surgery: Towards Accurate and Efficient Multi-Exit Networks
444 Poster Learn to Preserve and Diversify: Parameter-Efficient Group with Orthogonal Regularization for Domain Generalization
445 Poster Fisher Calibration for Backdoor-Robust Heterogeneous Federated Learning
446 Poster MultiDelete for Multimodal Machine Unlearning
447 Poster Efficient Unsupervised Visual Representation Learning with Explicit Cluster Balancing
448 Poster Multi-Label Cluster Discrimination for Visual Representation Learning
449 Poster Robustness Preserving Fine-tuning using Neuron Importance
450 Poster Online Zero-Shot Classification with CLIP
451 Poster Understanding Multi-compositional learning in Vision and Language models via Category Theory
452 Poster This Probably Looks Exactly Like That: An Invertible Prototypical Network
453 Poster Rethinking Unsupervised Outlier Detection via Multiple Thresholding
454 Poster Learning Non-Linear Invariants for Unsupervised Out-of-Distribution Detection
455 Poster Multimodal Label Relevance Ranking via Reinforcement Learning
456 Poster Confidence Self-Calibration for Multi-Label Class-Incremental Learning
457 Poster MTaDCS: Moving Trace and Feature Density-based Confidence Sample Selection under Label Noise
458 Poster Bidirectional Uncertainty-Based Active Learning for Open-Set Annotation
459 Poster Online Continuous Generalized Category Discovery
460 Poster Open-set Domain Adaptation via Joint Error based Multi-class Positive and Unlabeled Learning
461 Poster UDA-Bench: Revisiting Common Assumptions in Unsupervised Domain Adaptation Using a Standardized Framework
462 Poster Rethinking Few-shot Class-incremental Learning: Learning from Yourself
463 Poster Versatile Incremental Learning: Towards Class and Domain-Agnostic Incremental Learning
464 Poster Semantic Residual Prompts for Continual Learning
465 Poster Encapsulating Knowledge in One Prompt
466 Poster Representation Enhancement-Stabilization: Reducing Bias-Variance of Domain Generalization
467 Poster Good Teachers Explain: Explanation-Enhanced Knowledge Distillation
468 Poster PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation
469 Poster Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation
470 Poster Dataset Distillation by Automatic Training Trajectories
471 Poster Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction
472 Poster Graph Neural Network Causal Explanation via Neural Causal Models
473 Poster Optimization-based Uncertainty Attribution Via Learning Informative Perturbations
474 Poster Generalizable Symbolic Optimizer Learning
475 Poster CLR-GAN: Improving GANs Stability and Quality via Consistent Latent Representation and Reconstruction
476 Poster Nickel and Diming Your GAN: A Dual-Method Approach to Enhancing GAN Efficiency via Knowledge Distillation
477 Poster Exploiting Supervised Poison Vulnerability to Strengthen Self-Supervised Defense
478 Poster SSL-Cleanse: Trojan Detection and Mitigation in Self-Supervised Learning
479 Poster Zero-Shot Detection of AI-Generated Images
480 Oral Zero-Shot Detection of AI-Generated Images
481 Poster MobileNetV4: Universal Models for the Mobile Ecosystem
482 Oral MobileNetV4: Universal Models for the Mobile Ecosystem
483 Poster Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation
484 Oral Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation
485 Poster Adaptive Parametric Activation
486 Oral Adaptive Parametric Activation
487 Poster CLIFF: Continual Latent Diffusion for Open-Vocabulary Object Detection
488 Oral CLIFF: Continual Latent Diffusion for Open-Vocabulary Object Detection
489 Poster Dataset Enhancement with Instance-Level Augmentations
490 Oral Dataset Enhancement with Instance-Level Augmentations
491 Poster Efficient Bias Mitigation Without Privileged Information
492 Oral Efficient Bias Mitigation Without Privileged Information
493 Poster On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines
494 Oral On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines
495 Poster Momentum Auxiliary Network for Supervised Local Learning
496 Oral Momentum Auxiliary Network for Supervised Local Learning
497 Poster From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition
498 Oral From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition
499 Poster Projecting Points to Axes: Oriented Object Detection via Point-Axis Representation
500 Oral Projecting Points to Axes: Oriented Object Detection via Point-Axis Representation
501 Poster Relation DETR: Exploring Explicit Position Relation Prior for Object Detection
502 Oral Relation DETR: Exploring Explicit Position Relation Prior for Object Detection
503 Poster ConDense: Consistent 2D-3D Pre-training for Dense and Sparse Features from Multi-View Images
504 Oral ConDense: Consistent 2D-3D Pre-training for Dense and Sparse Features from Multi-View Images
505 Poster ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation
506 Oral ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation
507 Poster COMO: Compact Mapping and Odometry
508 Oral COMO: Compact Mapping and Odometry
509 Poster Camera Calibration using a Collimator System
510 Oral Camera Calibration using a Collimator System
511 Poster Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection
512 Oral Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection
513 Poster Physics-Free Spectrally Multiplexed Photometric Stereo under Unknown Spectral Composition
514 Oral Physics-Free Spectrally Multiplexed Photometric Stereo under Unknown Spectral Composition
515 Poster SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments
516 Oral SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments
517 Poster Smoothness, Synthesis, and Sampling: Re-thinking Unsupervised Multi-View Stereo with DIV Loss
518 Oral Smoothness, Synthesis, and Sampling: Re-thinking Unsupervised Multi-View Stereo with DIV Loss
519 Poster Six-Point Method for Multi-Camera Systems with Reduced Solution Space
520 Oral Six-Point Method for Multi-Camera Systems with Reduced Solution Space
521 Poster Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer
522 Oral Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer
523 Poster Grounding Image Matching in 3D with MASt3R
524 Oral Grounding Image Matching in 3D with MASt3R
525 Poster EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis
526 Oral EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis
527 Poster TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
528 Oral TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
529 Poster Accelerating Image Generation with Sub-path Linear Approximation Model
530 Oral Accelerating Image Generation with Sub-path Linear Approximation Model
531 Poster SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation
532 Oral SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation
533 Poster Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
534 Oral Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
535 Poster LLMGA: Multimodal Large Language Model based Generation Assistant
536 Oral LLMGA: Multimodal Large Language Model based Generation Assistant
537 Poster FlashTex: Fast Relightable Mesh Texturing with LightControlNet
538 Oral FlashTex: Fast Relightable Mesh Texturing with LightControlNet
539 Poster Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture
540 Oral Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture
541 Poster TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation
542 Oral TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation
543 Poster EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
544 Poster EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head
545 Poster 3D Gaussian Parametric Head Model
546 Poster Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos
547 Poster RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models
548 Poster PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations
549 Poster COMPOSE: Comprehensive Portrait Shadow Editing
550 Poster GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval
551 Poster Optimizing Illuminant Estimation in Dual-Exposure HDR Imaging
552 Poster Holodepth: Programmable Depth-Varying Projection via Computer-Generated Holography
553 Poster BeNeRF: Neural Radiance Fields from a Single Blurry Image and Event Stream
554 Poster VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors
555 Poster G3R: Gradient Guided Generalizable Reconstruction
556 Poster Efficient NeRF Optimization - Not All Samples Remain Equally Hard
557 Poster BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling
558 Poster SlotLifter: Slot-guided Feature Lifting for Learning Object-Centric Radiance Fields
559 Poster RS-NeRF: Neural Radiance Fields from Rolling Shutter Images
560 Poster Geometry Fidelity for Spherical Images
561 Poster CPT-VR: Improving Surface Rendering via Closest Point Transform with View-Reflection Appearance
562 Poster MetaCap: Meta-learning Priors from Multi-View Imagery for Sparse-view Human Performance Capture and Rendering
563 Poster Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis
564 Poster GGRt: Towards Generalizable 3D Gaussians without Pose Priors in Real-Time
565 Poster Neural graphics texture compression supporting random access
566 Poster GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views
567 Poster A Compact Dynamic 3D Gaussian Representation for Real-Time Dynamic View Synthesis
568 Poster Click-Gaussian: Interactive Segmentation to Any 3D Gaussians
569 Poster McGrids: Monte Carlo-Driven Adaptive Grids for Iso-Surface Extraction
570 Poster latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction
571 Poster Non-parametric Sensor Noise Modeling and Synthesis
572 Poster UpFusion: Novel View Diffusion from Unposed Sparse View Observations
573 Poster MVDD: Multi-View Depth Diffusion Models
574 Poster LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
575 Oral LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
576 Poster Hypernetworks for Generalizable BRDF Representation
577 Poster High-Fidelity 3D Textured Shapes Generation by Sparse Encoding and Adversarial Decoding
578 Poster Structured-NeRF: Hierarchical Scene Graph with Neural Representation
579 Poster 3D-GOI: 3D GAN Omni-Inversion for Multifaceted and Multi-object Editing
580 Poster Free-Editor: Zero-shot Text-driven 3D Scene Editing
581 Poster Texture-GS: Disentangle the Geometry and Texture for 3D Gaussian Splatting Editing
582 Poster VCD-Texture: Variance Alignment based 3D-2D Co-Denoising for Text-Guided Texturing
583 Poster UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation
584 Poster ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation
585 Poster DreamView: Injecting View-specific Text Guidance into Text-to-3D Generation
586 Poster SceneTeller: Language-to-3D Scene Generation
587 Poster Text to Layer-wise 3D Clothed Human Generation
588 Poster ShoeModel: Learning to Wear on the User-specified Shoes via Diffusion Model
589 Poster D4-VTON: Dynamic Semantics Disentangling for Differential Diffusion based Virtual Try-On
590 Poster Within the Dynamic Context: Inertia-aware 3D Human Modeling with Pose Sequence
591 Poster Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos
592 Poster Temporal Residual Jacobians for Rig-free Motion Transfer
593 Poster PosterLlama: Bridging Design Ability of Language Model to Content-Aware Layout Generation
594 Poster GroundUp: Rapid Sketch-Based 3D City Massing
595 Poster DiscoMatch: Fast Discrete Optimisation for Geometrically Consistent 3D Shape Matching
596 Poster FRI-Net: Floorplan Reconstruction via Room-wise Implicit Representation
597 Poster PointNeRF++: A multi-scale, point-based Neural Radiance Field
598 Poster Continuous SO(3) Equivariant Convolution for 3D Point Cloud Analysis
599 Poster UMERegRobust – Universal Manifold Embedding Compatible Features for Robust Point Cloud Registration
600 Poster FrePolad: Frequency-Rectified Point Latent Diffusion for Point Cloud Generation
601 Poster Learning to Adapt SAM for Segmenting Cross-domain Point Clouds
602 Poster Osmosis: RGBD Diffusion Prior for Underwater Image Restoration
603 Poster Differentiable Product Quantization for Memory Efficient Camera Relocalization
604 Poster RING-NeRF: Rethinking Inductive Biases for Versatile and Efficient Neural Fields
605 Poster Light-in-Flight for a World-in-Motion
606 Poster Binomial Self-compensation for Motion Error in Dynamic 3D Scanning
607 Poster Non-Line-of-Sight Estimation of Fast Human Motion with Slow Scanning Imagers
608 Poster Synchronization of Projective Transformations
609 Poster Semicalibrated Relative Pose from an Affine Correspondence and Monodepth
610 Poster GMM-IKRS: Gaussian Mixture Models for Interpretable Keypoint Refinement and Scoring
611 Poster LRSLAM: Low-rank Representation of Signed Distance Fields in Dense Visual SLAM System
612 Poster SRPose: Two-view Relative Pose Estimation with Sparse Keypoints
613 Poster Alignist: CAD-Informed Orientation Distribution Estimation by Fusing Shape and Correspondences
614 Poster U-COPE: Taking a Further Step to Universal 9D Category-level Object Pose Estimation
615 Poster EgoPoseFormer: A Simple Baseline for Stereo Egocentric 3D Human Pose Estimation
616 Poster Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot
617 Poster Cut out the Middleman: Revisiting Pose-based Gait Recognition
618 Poster Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection?
619 Poster EgoPoser: Robust Real-Time Egocentric Pose Estimation from Sparse and Intermittent Observations Everywhere
620 Poster 3D Hand Sequence Recovery from Real Blurry Images and Event Stream
621 Poster Dense Hand-Object (HO) GraspNet with Full Grasping Taxonomy and Dynamics
622 Poster Learning Cross-hand Policies of High-DOF Reaching and Grasping
623 Poster Free-Viewpoint Video of Outdoor Sports Using a Drone
624 Poster Unsupervised Exposure Correction
625 Poster Improving Domain Generalization in Self-Supervised Monocular Depth Estimation via Stabilized Adversarial Training
626 Poster Deep Cost Ray Fusion for Sparse Depth Video Completion
627 Poster PatchRefiner: Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation
628 Poster Depth on Demand: Streaming Dense Depth from a Low Frame Rate Active Sensor
629 Poster UniCal: Unified Neural Sensor Calibration
630 Poster Multi-modal Crowd Counting via a Broker Modality
631 Poster OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection
632 Poster FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection
633 Poster MARs: Multi-view Attention Regularizations for Patch-based Feature Recognition of Space Terrain
634 Poster SparseRadNet: Sparse Perception Neural Network on Subsampled Radar Data
635 Poster UniM2AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving
636 Poster DeTra: A Unified Model for Object Detection and Trajectory Forecasting
637 Poster RoScenes: A Large-scale Multi-view 3D Dataset for Roadside Perception
638 Poster Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting
639 Poster PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines
640 Poster Sparse Refinement for Efficient High-Resolution Semantic Segmentation
641 Poster InsMapper: Exploring Inner-instance Information for Vectorized HD Mapping
642 Poster PreSight: Enhancing Autonomous Vehicle Perception with City-Scale NeRF Priors
643 Poster Unified Local-Cloud Decision-Making via Reinforcement Learning
644 Poster Generative End-to-End Autonomous Driving
645 Poster MART: MultiscAle Relational Transformer Networks for Multi-agent Trajectory Prediction
646 Poster Improving Agent Behaviors with RL Fine-tuning for Autonomous Driving
647 Poster LayeredFlow: A Real-World Benchmark for Non-Lambertian Multi-Layer Optical Flow
648 Poster Decomposition Betters Tracking Everything Everywhere
649 Poster Match-Stereo-Videos: Bidirectional Alignment for Consistent Dynamic Stereo Matching
650 Poster Efficient Learning of Event-based Dense Representation using Hierarchical Memories with Adaptive Update
651 Poster Towards Real-world Event-guided Low-light Video Enhancement and Deblurring
652 Poster Understanding Physical Dynamics with Counterfactual World Modeling
653 Poster Prompting Future Driven Diffusion Model for Hand Motion Prediction
654 Poster Nymeria: A Massive Collection of Egocentric Multi-modal Human Motion in the Wild
655 Poster Motion Mamba: Efficient and Long Sequence Motion Generation
656 Poster TLControl: Trajectory and Language Control for Human Motion Synthesis
657 Poster ParCo: Part-Coordinating Text-to-Motion Synthesis
658 Poster BAMM: Bidirectional Autoregressive Motion Model
659 Poster Pose Guided Fine-Grained Sign Language Video Generation
660 Poster DreamMover: Leveraging the Prior of Diffusion Models for Image Interpolation with Large Motion
661 Poster Animate Your Motion: Turning Still Images into Dynamic Videos
662 Poster V-Trans4Style: Visual Transition Recommendation for Video Production Style Adaptation
663 Poster DragVideo: Interactive Drag-style Video Editing
664 Poster StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion
665 Poster MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing
666 Poster FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing
667 Poster Lazy Diffusion Transformer for Interactive Image Editing
668 Poster WaSt-3D: Wasserstein-2 Distance for Scene-to-Scene Stylization on 3D Gaussians
669 Poster Layered Rendering Diffusion Model for Controllable Zero-Shot Image Synthesis
670 Poster Commonly Interesting Images
671 Poster InstructGIE: Towards Generalizable Image Editing
672 Poster The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization
673 Poster CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control & Altering of T2I Models
674 Poster Zero-shot Text-guided Infinite Image Synthesis with LLM guidance
675 Poster Improving Text-guided Object Inpainting with Semantic Pre-inpainting
676 Poster Customized Generation Reimagined: Fidelity and Editability Harmonized
677 Poster ColorPeel: Color Prompt Learning with Diffusion Models via Color and Shape Disentanglement
678 Poster ViPer: Visual Personalization of Generative Models via Individual Preference Learning
679 Poster MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices
680 Poster MasterWeaver: Taming Editability and Face Identity for Personalized Text-to-Image Generation
681 Poster Towards Reliable Advertising Image Generation Using Human Feedback
682 Poster IMMA: Immunizing text-to-image Models against Malicious Adaptation
683 Poster PreciseControl: Enhancing Text-To-Image Diffusion Models with Fine-Grained Attribute Control
684 Poster AddMe: Zero-shot Group-photo Synthesis by Inserting People into Scenes
685 Poster UniProcessor: A Text-induced Unified Low-level Image Processor
686 Poster Iterative Ensemble Training with Anti-Gradient Control for Mitigating Memorization in Diffusion Models
687 Poster EBDM: Exemplar-guided Image Translation with Brownian-bridge Diffusion Models
688 Poster Assessing Sample Quality via the Latent Space of Generative Models
689 Poster Mixture of Efficient Diffusion Experts Through Automatic Interval and Sub-Network Selection
690 Poster SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
691 Poster Efficient Training with Denoised Neural Weights
692 Poster FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
693 Poster A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting
694 Poster Unleashing the Potential of the Semantic Latent Space in Diffusion Models for Image Dehazing
695 Poster DSMix: Distortion-Induced Saliency Map Based Pre-training for No-Reference Image Quality Assessment
696 Poster DiffBIR: Toward Blind Image Restoration with Generative Diffusion Prior
697 Poster Restoring Images in Adverse Weather Conditions via Histogram Transformer
698 Poster You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation
699 Poster Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution
700 Poster Efficient Cascaded Multiscale Adaptive Network for Image Restoration
701 Poster Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation
702 Poster Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors
703 Poster Taming Lookup Tables for Efficient Image Retouching
704 Poster Quanta Video Restoration
705 Poster Two-Stage Video Shadow Detection via Temporal-Spatial Adaption
706 Poster Handling The Non-Smooth Challenge in Tensor SVD: A Multi-Objective Tensor Recovery Framework
707 Poster Identity-Consistent Diffusion Network for Grading Knee Osteoarthritis Progression in Radiographic Imaging
708 Poster NePhi: Neural Deformation Fields for Approximately Diffeomorphic Medical Image Registration
709 Poster Neural Metamorphosis
710 Poster Online Video Quality Enhancement with Spatial-Temporal Look-up Tables
711 Poster EAS-SNN: End-to-End Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks
712 Poster LaWa: Using Latent Space for In-Generation Image Watermarking
713 Poster PairingNet: A Learning-based Pair-searching and -matching Network for Image Fragments
714 Poster Delving into Adversarial Robustness on Document Tampering Localization
715 Poster Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
716 Poster Forbes: Face Obfuscation Rendering via Backpropagation Refinement Scheme
717 Poster Prediction Exposes Your Face: Black-box Model Inversion via Prediction Alignment
718 Poster Generalizable Facial Expression Recognition
719 Poster Ex2Eg-MAE: A Framework for Adaptation of Exocentric Video Masked Autoencoders for Egocentric Social Role Understanding
720 Poster MinD-3D: Reconstruct High-quality 3D objects in Human Brain
721 Poster Pathformer3D: A 3D Scanpath Transformer for 360° Images
722 Poster Eliminating Warping Shakes for Unsupervised Online Video Stitching
723 Poster OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework
724 Poster Semantically Guided Representation Learning For Action Anticipation
725 Poster SIGMA: Sinkhorn-Guided Masked Video Modeling
726 Poster Rethinking Image-to-Video Adaptation: An Object-centric Perspective
727 Poster RICA^2: Rubric-Informed, Calibrated Assessment of Actions
728 Poster VideoStudio: Generating Consistent-Content and Multi-Scene Videos
729 Poster Training-free Video Temporal Grounding using Large-scale Pre-trained Models
730 Poster EA-VTR: Event-Aware Video-Text Retrieval
731 Poster Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data
732 Poster FunQA: Towards Surprising Video Comprehension
733 Poster Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment
734 Poster Efficient Pre-training for Localized Instruction Generation of Procedural Videos
735 Poster Learning Trimodal Relation for Audio-Visual Question Answering with Missing Modality
736 Poster Can Textual Semantics Mitigate Sounding Object Segmentation Preference?
737 Poster Visual Alignment Pre-training for Sign Language Translation
738 Poster Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach
739 Poster Spectral Subsurface Scattering for Material Classification
740 Poster MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning
741 Poster MeshVPR: Citywide Visual Place Recognition Using 3D Meshes
742 Poster Frontier-enhanced Topological Memory with Improved Exploration Awareness for Embodied Visual Navigation
743 Poster Asynchronous Large Language Model Enhanced Planner for Autonomous Driving
744 Poster Controllable Navigation Instruction Generation with Chain of Thought Prompting
745 Poster NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
746 Poster Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching
747 Poster INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding
748 Poster SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
749 Poster Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
750 Poster Quality Assured: Rethinking Annotation Strategies in Imaging AI
751 Poster BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
752 Poster Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training
753 Poster A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis
754 Poster Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training
755 Poster DEAL: Disentangle and Localize Concept-level Explanations for VLMs
756 Poster Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models
757 Poster FineMatch: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
758 Poster Instruction Tuning-free Visual Token Complement for Multimodal LLMs
759 Poster IVTP: Instruction-guided Visual Token Pruning for Large Vision-Language Models
760 Poster LookupViT: Compressing visual information to a limited number of tokens
761 Poster SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models
762 Poster Integration of Global and Local Representations for Fine-grained Cross-modal Alignment
763 Poster Textual-Visual Logic Challenge: Understanding and Reasoning in Text-to-Image Generation
764 Poster MyVLM: Personalizing VLMs for User-Specific Queries
765 Poster ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
766 Poster View Selection for 3D Captioning via Diffusion Ranking
767 Poster GRiT: A Generative Region-to-text Transformer for Object Understanding
768 Poster FreestyleRet: Retrieving Images from Style-Diversified Queries
769 Poster LG-Gaze: Learning Geometry-aware Continuous Prompts for Language-Guided Gaze Estimation
770 Poster OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction
771 Poster Three Things We Need to Know About Transferring Stable Diffusion to Visual Dense Prediction Tasks
772 Poster TAG: Text Prompt Augmentation for Zero-Shot Out-of-Distribution Detection
773 Poster Centering the Value of Every Modality: Towards Efficient and Resilient Modality-agnostic Semantic Segmentation
774 Poster Textual Grounding for Open-vocabulary Visual Information Extraction in Layout-diversified Documents
775 Poster Region-centric Image-Language Pretraining for Open-Vocabulary Detection
776 Poster Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments
777 Poster Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding
778 Poster Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model
779 Poster Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation
780 Poster SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
781 Poster PSALM: Pixelwise Segmentation with Large Multi-modal Model
782 Poster Grid-Attention: Enhancing Computational Efficiency of Large Vision Models without Fine-Tuning
783 Poster OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation
784 Poster On the Viability of Monocular Depth Pre-training for Semantic Segmentation
785 Poster Rethinking and Improving Visual Prompt Selection for In-Context Learning Segmentation Framework
786 Poster Open-Vocabulary Camouflaged Object Segmentation
787 Poster From Pixels to Objects: A Hierarchical Approach for Part and Object Segmentation Using Local and Global Aggregation
788 Poster 3x2: 3D Object Part Segmentation by 2D Semantic Correspondences
789 Poster Train Till You Drop: Towards Stable and Robust Source-free Unsupervised 3D Domain Adaptation
790 Poster Make a Strong Teacher with Label Assistance: A Novel Knowledge Distillation Approach for Semantic Segmentation
791 Poster Mitigating Background Shift in Class-Incremental Semantic Segmentation
792 Poster LASS3D: Language-Assisted Semi-Supervised 3D Semantic Segmentation with Progressive Unreliable Data Exploitation
793 Poster Point-supervised Panoptic Segmentation via Estimating Pseudo Labels from Learnable Distance
794 Poster Diffusion Model for Robust Multi-Sensor Fusion in 3D Object Detection and BEV Segmentation
795 Poster Zero-shot Object Counting with Good Exemplars
796 Poster SMILe: Leveraging Submodular Mutual Information For Robust Few-Shot Object Detection
797 Poster Enhancing Source-Free Domain Adaptive Object Detection with Low-confidence Pseudo Label Distillation
798 Poster MonoTTA: Fully Test-Time Adaptation for Monocular 3D Object Detection
799 Poster AugDETR: Improving Multi-scale Learning for Detection Transformer
800 Poster Urban Waterlogging Detection: A Challenging Benchmark and Large-Small Model Co-Adapter
801 Poster DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion
802 Poster PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation
803 Poster ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image
804 Poster Attention-Challenging Multiple Instance Learning for Whole Slide Image Classification
805 Poster GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation
806 Poster R3D-AD: Reconstruction via Diffusion for 3D Anomaly Detection
807 Poster Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation
808 Poster Continuous Memory Representation for Anomaly Detection
809 Poster Learning Anomalies with Normality Prior for Unsupervised Video Anomaly Detection
810 Poster Superpixel-informed Implicit Neural Representation for Multi-Dimensional Data
811 Poster Comprehensive Attribution: Inherently Explainable Vision Model with Feature Detector
812 Poster Fairness-aware Vision Transformer via Debiased Self-Attention
813 Poster AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer
814 Poster LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors
815 Poster Deep Nets with Subsampling Layers Unwittingly Discard Useful Activations at Test-Time
816 Poster Modality Translation for Object Detection Adaptation without forgetting prior knowledge
817 Poster Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition
818 Poster Scaling Backwards: Minimal Synthetic Pre-training?
819 Poster EntAugment: Entropy-Driven Adaptive Data Augmentation Framework for Image Classification
820 Poster Training-Free Model Merging for Multi-target Domain Adaptation
821 Poster CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning
822 Poster Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning
823 Poster Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation
824 Poster Semantic-guided Robustness Tuning for Few-Shot Transfer Across Extreme Domain Shift
825 Poster Explain via Any Concept: Concept Bottleneck Model with Open Vocabulary Concepts
826 Poster Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models
827 Poster FlowCon: Out-of-Distribution Detection using Flow-based Contrastive Learning
828 Poster PixOOD: Pixel-Level Out-of-Distribution Detection
829 Poster Distributionally Robust Loss for Long-Tailed Multi-Label Image Classification
830 Poster Improving 3D Semi-supervised Learning by Effectively Utilizing All Unlabelled Data
831 Poster GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition
832 Poster Generalized Coverage for More Robust Low-Budget Active Learning
833 Poster Robust Nearest Neighbors for Source-Free Domain Adaptation under Class Distribution Shift
834 Poster Category Adaptation Meets Projected Distillation in Generalized Continual Category Discovery
835 Poster CroMo-Mixup: Augmenting Cross-Model Representations for Continual Self-Supervised Learning
836 Poster Disentangling Masked Autoencoders for Unsupervised Domain Generalization
837 Poster Class-Incremental Learning with CLIP: Adaptive Representation Adjustment and Parameter Fusion
838 Poster Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
839 Poster Information Bottleneck Based Data Correction in Continual Learning
840 Poster Beyond Prompt Learning: Continual Adapter for Efficient Rehearsal-Free Continual Learning
841 Poster Markov Knowledge Distillation: Make Nasty Teachers trained by Self-undermining Knowledge Distillation Fully Distillable
842 Poster FedRA: A Random Allocation Strategy for Federated Tuning to Unleash the Power of Heterogeneous Clients
843 Poster SkyMask: Attack-agnostic Robust Federated Learning with Fine-grained Learnable Masks
844 Poster SuperFedNAS: Cost-Efficient Federated Neural Architecture Search for On-Device Inference
845 Poster Adversarially Robust Distillation by Reducing the Student-Teacher Variance Gap
846 Poster Uncertainty Calibration with Energy Based Instance-wise Scaling in the Wild Dataset
847 Poster Preventing Catastrophic Overfitting in Fast Adversarial Training: A Bi-level Optimization Perspective
848 Poster Catastrophic Overfitting: A Potential Blessing in Disguise
849 Poster Cocktail Universal Adversarial Attack on Deep Neural Networks
850 Poster Unveiling Privacy Risks in Stochastic Neural Networks Training: Effective Image Reconstruction from Gradients
851 Poster Rethinking Data Bias: Dataset Copyright Protection via Embedding Class-wise Hidden Bias
852 Poster CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing
853 Poster Parrot Captions Teach CLIP to Spot Text
854 Oral Parrot Captions Teach CLIP to Spot Text
855 Poster Towards Model-Agnostic Dataset Condensation by Heterogeneous Models
856 Oral Towards Model-Agnostic Dataset Condensation by Heterogeneous Models
857 Poster VETRA: A Dataset for Vehicle Tracking in Aerial Imagery - New Challenges for Multi-Object Tracking
858 Oral VETRA: A Dataset for Vehicle Tracking in Aerial Imagery - New Challenges for Multi-Object Tracking
859 Poster Insect Identification in the Wild: The AMI Dataset
860 Oral Insect Identification in the Wild: The AMI Dataset
861 Poster Towards Open-ended Visual Quality Comparison
862 Oral Towards Open-ended Visual Quality Comparison
863 Poster UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
864 Oral UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
865 Poster MarineInst: A Foundation Model for Marine Image Analysis with Instance Visual Description
866 Oral MarineInst: A Foundation Model for Marine Image Analysis with Instance Visual Description
867 Poster Adaptive Correspondence Scoring for Unsupervised Medical Image Registration
868 Oral Adaptive Correspondence Scoring for Unsupervised Medical Image Registration
869 Poster Revisiting Adaptive Cellular Recognition Under Domain Shifts: A Contextual Correspondence View
870 Oral Revisiting Adaptive Cellular Recognition Under Domain Shifts: A Contextual Correspondence View
871 Poster Knowledge-enhanced Visual-Language Pretraining for Computational Pathology
872 Oral Knowledge-enhanced Visual-Language Pretraining for Computational Pathology
873 Poster SparseSSP: 3D Subcellular Structure Prediction from Sparse-View Transmitted Light Images
874 Oral SparseSSP: 3D Subcellular Structure Prediction from Sparse-View Transmitted Light Images
875 Poster CardiacNet: Learning to Reconstruct Abnormalities for Cardiac Disease Assessment from Echocardiogram Videos
876 Oral CardiacNet: Learning to Reconstruct Abnormalities for Cardiac Disease Assessment from Echocardiogram Videos
877 Poster PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
878 Oral PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
879 Poster PointLLM: Empowering Large Language Models to Understand Point Clouds
880 Oral PointLLM: Empowering Large Language Models to Understand Point Clouds
881 Poster HGL: Hierarchical Geometry Learning for Test-time Adaptation in 3D Point Cloud Segmentation
882 Oral HGL: Hierarchical Geometry Learning for Test-time Adaptation in 3D Point Cloud Segmentation
883 Poster Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather
884 Oral Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather
885 Poster RISurConv: Rotation Invariant Surface Attention-Augmented Convolutions for 3D Point Cloud Classification and Segmentation
886 Oral RISurConv: Rotation Invariant Surface Attention-Augmented Convolutions for 3D Point Cloud Classification and Segmentation
887 Poster RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation
888 Oral RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation
889 Poster KeypointDETR: An End-to-End 3D Keypoint Detector
890 Oral KeypointDETR: An End-to-End 3D Keypoint Detector
891 Poster All You Need is Your Voice: Emotional Face Representation with Audio Perspective for Emotional Talking Face Generation
892 Poster TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting
893 Poster HeadGaS: Real-Time Animatable Head Avatars via 3D Gaussian Splatting
894 Poster Stable Video Portraits
895 Poster iHuman: Instant Animatable Digital Humans From Monocular Videos
896 Poster POCA: Post-training Quantization with Temporal Alignment for Codec Avatars
897 Poster Towards Image Ambient Lighting Normalization
898 Poster LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models
899 Poster Efficient Snapshot Spectral Imaging: Calibration-Free Parallel Structure with Aperture Diffraction Fusion
900 Poster Physically Plausible Color Correction for Neural Radiance Fields
901 Poster DecentNeRFs: Decentralized Neural Radiance Fields from Crowdsourced Images
902 Poster Volumetric Rendering with Baked Quadrature Fields
903 Poster Depth-guided NeRF Training via Earth Mover’s Distance
904 Poster RoGUENeRF: A Robust Geometry-Consistent Universal Enhancer for NeRF
905 Poster Deblurring 3D Gaussian Splatting
906 Poster Distractor-Free Novel View Synthesis via Exploiting Memorization Effect in Optimization
907 Poster TriNeRFLet: A Wavelet Based Triplane NeRF Representation
908 Poster LaRa: Efficient Large-Baseline Radiance Fields
909 Poster RANRAC: Robust Neural Scene Representations via Random Ray Consensus
910 Poster SparseCraft: Few-Shot Neural Reconstruction through Stereopsis Guided Geometric Linearization
911 Poster Learning Representations from Foundation Models for Domain Generalized Stereo Matching
912 Poster CoR-GS: Sparse-View 3D Gaussian Splatting via Co-Regularization
913 Poster CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field
914 Poster SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction
915 Poster On the Error Analysis of 3D Gaussian Splatting and an Optimal Projection Strategy
916 Poster Revising Densification in Gaussian Splatting
917 Poster MesonGS: Post-training Compression of 3D Gaussians via Efficient Attribute Transformation
918 Poster Topology-Preserving Downsampling of Binary Images
919 Poster Zero-Shot Multi-Object Scene Completion
920 Poster PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance
921 Poster VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models
922 Poster Analysis-by-Synthesis Transformer for Single-View 3D Reconstruction
923 Poster Decomposition of Neural Discrete Representations for Large-Scale 3D Mapping
924 Poster COSMU: Complete 3D human shape from monocular unconstrained images
925 Poster MeshFeat: Multi-Resolution Features for Neural Fields on Meshes
926 Poster Real-time 3D-aware Portrait Editing from a Single Image
927 Poster An Optimization Framework to Enforce Multi-View Consistency for Texturing 3D Meshes
928 Poster RoomTex: Texturing Compositional Indoor Scenes via Iterative Inpainting
929 Poster Scene-Conditional 3D Object Stylization and Composition
930 Poster DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling
931 Poster BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion
932 Poster Chains of Diffusion Models
933 Poster NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation
934 Poster Learning Neural Deformation Representation for 4D Dynamic Shape Generation
935 Poster Improving Diffusion Models for Authentic Virtual Try-on in the Wild
936 Poster Towards High-Quality 3D Motion Transfer with Realistic Apparel Animation
937 Poster GIVT: Generative Infinite-Vocabulary Transformers
938 Poster Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians
939 Poster LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer
940 Poster ZigMa: A DiT-style Zigzag Mamba Diffusion Model
941 Poster Deep Diffusion Image Prior for Efficient OOD Adaptation in 3D Inverse Problems
942 Poster Neural Surface Detection for Unsigned Distance Fields
943 Poster VF-NeRF: Viewshed Fields for Rigid NeRF Registration
944 Poster Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration
945 Oral Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration
946 Poster Transferable 3D Adversarial Shape Completion using Diffusion Models
947 Poster Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation
948 Poster PointRegGPT: Boosting 3D Point Cloud Registration using Generative Point-Cloud Pairs for Training
949 Poster Progressive Classifier and Feature Extractor Adaptation for Unsupervised Domain Adaptation on Point Clouds
950 Poster Domain Generalization of 3D Object Detection by Density-Resampling
951 Poster Heterogeneous Graph Learning for Scene Graph Prediction in 3D Point Clouds
952 Poster Physics-informed Knowledge Transfer for Underwater Monocular Depth Estimation
953 Poster Improving 2D Feature Representations by 3D-Aware Fine-Tuning
954 Poster SpaceJAM: a Lightweight and Regularization-free Method for Fast Joint Alignment of Images
955 Poster 3D Congealing: 3D-Aware Image Alignment in the Wild
956 Poster Reprojection Errors as Prompts for Efficient Scene Coordinate Regression
957 Poster Revisiting Calibration of Wide-Angle Radially Symmetric Cameras
958 Poster RGBD GS-ICP SLAM
959 Poster FastCAD: Real-Time CAD Retrieval and Alignment from Scans and Videos
960 Poster GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence
961 Poster Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation
962 Poster Rotated Orthographic Projection for Self-Supervised 3D Human Pose Estimation
963 Poster Diffusion Model is a Good Pose Estimator from 3D RF-Vision
964 Poster Occlusion Handling in 3D Human Pose Estimation with Perturbed Positional Encoding
965 Poster Coarse-to-Fine Implicit Representation Learning for 3D Hand-Object Reconstruction from a Single RGB-D Image
966 Poster 3D Reconstruction of Objects in Hands without Real World 3D Supervision
967 Poster Weakly-Supervised 3D Hand Reconstruction with Knowledge Prior and Uncertainty Guidance
968 Poster MANIKIN: Biomechanically Accurate Neural Inverse Kinematics for Human Motion Estimation
969 Poster Local Occupancy-Enhanced Object Grasping with Multiple Triplanar Projection
970 Poster GraspXL: Generating Grasping Motions for Diverse Objects at Scale
971 Poster HSR: Holistic 3D Human-Scene Reconstruction from Monocular Videos
972 Poster Object-Aware NIR-to-Visible Translation
973 Poster SEDiff: Structure Extraction for Domain Adaptive Depth Estimation via Denoising Diffusion Models
974 Poster Sparse Beats Dense: Rethinking Supervision in Radar-Camera Depth Completion
975 Poster Camera Height Doesn't Change: Unsupervised Training for Metric Monocular Road-Scene Depth Estimation
976 Poster Adapting Fine-Grained Cross-View Localization to Areas without Fine Ground Truth
977 Poster DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment
978 Oral DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment
979 Poster Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection
980 Poster DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception
981 Poster LabelDistill: Label-guided Cross-modal Knowledge Distillation for Camera-based 3D Object Detection
982 Poster Detecting As Labeling: Rethinking LiDAR-camera Fusion in 3D Object Detection
983 Poster RecurrentBEV: A Long-term Temporal Fusion Framework for Multi-view 3D Detection
984 Poster JDT3D: Addressing the Gaps in LiDAR-Based Tracking-by-Attention
985 Poster MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception
986 Poster UAV First-Person Viewers Are Radiance Field Learners
987 Poster Caltech Aerial RGB-Thermal Dataset in the Wild
988 Poster V2X-Real: a Large-Scale Dataset for Vehicle-to-Everything Cooperative Perception
989 Poster CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction
990 Poster Revisit Human-Scene Interaction via Space Occupancy
991 Poster Enhancing Vectorized Map Perception with Historical Rasterized Maps
992 Poster RoadPainter: Points Are Ideal Navigators for Topology transformER
993 Poster VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions
994 Poster DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving
995 Poster SLEDGE: Synthesizing Driving Environments with Generative Models and Rule-Based Traffic
996 Poster Self-Supervised Video Desmoking for Laparoscopic Surgery
997 Oral Self-Supervised Video Desmoking for Laparoscopic Surgery
998 Poster BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events
999 Poster LiDAR-Event Stereo Fusion with Hallucinations
1000 Poster Temporal-Mapping Photography for Event Cameras
1001 Poster Motion Aware Event Representation-driven Image Deblurring
1002 Poster Event-Based Motion Magnification
1003 Poster TimeLens-XL: Real-time Event-based Video Frame Interpolation with Large Motion
1004 Poster Bidirectional Progressive Transformer for Interaction Intention Anticipation
1005 Poster Reinforcement Learning via Auxiliary Task Distillation
1006 Poster COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation
1007 Poster EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Human Motion Generation
1008 Poster MotionChain: Conversational Motion Controllers via Multimodal Prompts
1009 Poster M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models
1010 Poster SMooDi: Stylized Motion Diffusion Model
1011 Poster IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation
1012 Poster PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation
1013 Poster SAVE: Protagonist Diversification with Structure Agnostic Video Editing
1014 Poster Kinetic Typography Diffusion Model
1015 Poster DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency
1016 Poster StableDrag: Stable Dragging for Point-based Image Editing
1017 Poster Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing
1018 Poster Curved Diffusion: A Generative Model With Optical Geometry Control
1019 Poster Tuning-Free Image Customization with Image and Text Guidance
1020 Poster StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models
1021 Poster AID-AppEAL: Automatic Image Dataset and Algorithm for Content Appeal Enhancement and Assessment Labeling
1022 Poster DreamDiffusion: High-Quality EEG-to-Image Generation with Temporal Masked Signal Modeling and CLIP Alignment
1023 Poster TP2O: Creative Text Pair-to-Object Generation using Balance Swap-Sampling
1024 Poster Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
1025 Poster AccDiffusion: An Accurate Method for Higher-Resolution Image Generation
1026 Poster The Fabrication of Reality and Fantasy: Scene Generation with LLM-Assisted Prompt Interpretation
1027 Poster DCDM: Diffusion-Conditioned-Diffusion Model for Scene Text Image Super-Resolution
1028 Poster MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models
1029 Poster ComFusion: Enhancing Personalized Generation by Instance-Scene Compositing and Fusion
1030 Poster PEA-Diffusion: Parameter-Efficient Adapter with Knowledge Distillation in non-English Text-to-Image Generation
1031 Poster Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models
1032 Poster Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models
1033 Poster Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models
1034 Poster Distilling Diffusion Models into Conditional GANs
1035 Poster Responsible Visual Editing
1036 Poster HiEI: A Universal Framework for Generating High-quality Emerging Images from Natural Images
1037 Poster MagicEraser: Erasing Any Objects via Semantics-Aware Control
1038 Poster GenQ: Quantization in Low Data Regimes with Generative Synthetic Data
1039 Poster DiffiT: Diffusion Vision Transformers for Image Generation
1040 Poster DC-Solver: Improving Predictor-Corrector Diffusion Sampler via Dynamic Compensation
1041 Poster ∞-Brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions
1042 Poster Unmasking Bias in Diffusion Model Training
1043 Poster Compensation Sampling for Improved Convergence in Diffusion Models
1044 Poster Unsupervised Variational Translator for Bridging Image Restoration and High-Level Vision Tasks
1045 Poster Teaching Tailored to Talent: Adverse Weather Restoration via Prompt Pool and Depth-Anything Constraint
1046 Poster Dual-Rain: Video Rain Removal using Assertive and Gentle Teachers
1047 Poster A Comparative Study of Image Restoration Networks for General Backbone Network Design
1048 Poster OAPT: Offset-Aware Partition Transformer for Double JPEG Artifacts Removal
1049 Poster Domain-adaptive Video Deblurring via Test-time Blurring
1050 Poster Kernel Diffusion: An Alternate Approach to Blind Deconvolution
1051 Poster Enhancing Perceptual Quality in Video Super-Resolution through Temporally-Consistent Detail Synthesis using Diffusion Models
1052 Poster Kalman-Inspired Feature Propagation for Video Face Super-Resolution
1053 Poster RealViformer: Investigating Attention for Real-World Video Super-Resolution
1054 Poster Learning Exhaustive Correlation for Spectral Super-Resolution: Where Spatial-Spectral Attention Meets Linear Dependence
1055 Poster Zero-Shot Adaptation for Approximate Posterior Sampling of Diffusion Models in Inverse Problems
1056 Poster Task-Driven Uncertainty Quantification in Inverse Problems via Conformal Prediction
1057 Poster Rethinking Deep Unrolled Model for Accelerated MRI Reconstruction
1058 Oral Rethinking Deep Unrolled Model for Accelerated MRI Reconstruction
1059 Poster Wavelet Convolutions for Large Receptive Fields
1060 Poster Long-term Temporal Context Gathering for Neural Video Compression
1061 Poster Implicit Neural Models to Extract Heart Rate from Video
1062 Poster A Watermark-Conditioned Diffusion Model for IP Protection
1063 Poster Representing Topological Self-Similarity Using Fractal Feature Maps for Accurate Segmentation of Tubular Structures
1064 Poster Image Manipulation Detection With Implicit Neural Representation and Limited Supervision
1065 Poster DIFFender: Diffusion-Based Adversarial Defense against Patch Attacks
1066 Poster Learning Natural Consistency Representation for Face Forgery Video Detection
1067 Poster ARoFace: Alignment Robustness to Improve Low-quality Face Recognition
1068 Poster AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors
1069 Poster PetFace: A Large-Scale Dataset and Benchmark for Animal Identification
1070 Oral PetFace: A Large-Scale Dataset and Benchmark for Animal Identification
1071 Poster Enhancing Cross-Subject fMRI-to-Video Decoding with Global-Local Functional Alignment
1072 Poster Occlusion-Aware Seamless Segmentation
1073 Poster Keypoint Promptable Re-Identification
1074 Poster CoTracker: It is Better to Track Together
1075 Poster Free Lunch for Gait Recognition: A Novel Relation Descriptor
1076 Poster S-JEPA: A Joint Embedding Predictive Architecture for Skeletal Action Recognition
1077 Poster SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition
1078 Poster Spatio-Temporal Proximity-Aware Dual-Path Model for Panoramic Activity Recognition
1079 Poster Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition
1080 Poster Long-Tail Temporal Action Segmentation with Group-wise Temporal Logit Adjustment
1081 Poster Look Around and Learn: Self-Training Object Detection by Exploration
1082 Poster Interaction-centric Spatio-Temporal Context Reasoning for Multi-Person Video HOI Recognition
1083 Poster Self-Supervised Video Copy Localization with Regional Token Representation
1084 Poster General and Task-Oriented Video Segmentation
1085 Poster Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation
1086 Poster Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders
1087 Poster RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos
1088 Poster Referring Atomic Video Action Recognition
1089 Poster Elysium: Exploring Object-level Perception in Videos through Semantic Integration Using MLLMs
1090 Poster VideoAgent: Long-form Video Understanding with Large Language Model as Agent
1091 Poster VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models
1092 Poster AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering
1093 Poster Learning Video Context as Interleaved Multimodal Sequences
1094 Poster Multi-Modal Video Dialog State Tracking in the Wild
1095 Poster Towards Multimodal Sentiment Analysis Debiasing via Bias Purification
1096 Poster Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion
1097 Poster Rethinking Normalization Layers for Domain Generalizable Person Re-identification
1098 Poster Dual-stage Hyperspectral Image Classification Model with Spectral Supertoken
1099 Poster Learning Representations of Satellite Images From Metadata Supervision
1100 Poster Get Your Embedding Space in Order: Domain-Adaptive Regression for Forest Monitoring
1101 Poster Close, But Not There: Boosting Geographic Distance Sensitivity in Visual Place Recognition
1102 Poster AdaGlimpse: Active Visual Exploration with Arbitrary Glimpse Position and Scale
1103 Poster QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
1104 Poster Navigation Instruction Generation with BEV Perception and Large Language Models
1105 Poster V-IRL: Grounding Virtual Intelligence in Real Life
1106 Poster M3DBench: Towards Omni 3D Assistant with Interleaved Multi-modal Instructions
1107 Poster OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web
1108 Poster Unifying 3D Vision-Language Understanding via Promptable Queries
1109 Poster UMBRAE: Unified Multimodal Brain Decoding
1110 Poster BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation
1111 Poster CoReS: Orchestrating the Dance of Reasoning and Segmentation
1112 Poster A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment
1113 Poster Grounding Language Models for Visual Entity Recognition
1114 Poster Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
1115 Poster The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?
1116 Poster AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting
1117 Poster UniCode: Learning a Unified Codebook for Multimodal Large Language Models
1118 Poster X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
1119 Poster EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding
1120 Poster EDformer: Transformer-Based Event Denoising Across Varied Noise Levels
1121 Poster Self-Adapting Large Visual-Language Models to Edge Devices across Visual Modalities
1122 Poster The Hard Positive Truth about Vision-Language Compositionality
1123 Poster HiFi-Score: Fine-grained Image Description Evaluation with Hierarchical Parsing Graphs
1124 Poster LLMCO4MR: LLMs-aided Neural Combinatorial Optimization for Ancient Manuscript Restoration from Fragments with Case Studies on Dunhuang
1125 Poster Language-Image Pre-training with Long Captions
1126 Poster IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers
1127 Poster CIC-BART-SSA: Controllable Image Captioning with Structured Semantic Augmentation
1128 Poster Enhancing Recipe Retrieval with Foundation Models: A Data Augmentation Perspective
1129 Poster Object-Aware Query Perturbation for Cross-Modal Image-Text Retrieval
1130 Poster Cascade Prompt Learning for Visual-Language Model Adaptation
1131 Poster Gaze Target Detection Based on Head-Local-Global Coordination
1132 Poster Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model
1133 Poster ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
1134 Poster Towards Open-Ended Visual Recognition with Large Language Models
1135 Poster AFreeCA: Annotation-Free Counting for All
1136 Poster OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
1137 Poster MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection
1138 Poster Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding
1139 Poster SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation
1140 Poster Removing Rows and Columns of Tokens in Vision Transformer enables Faster Dense Prediction without Retraining
1141 Poster ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
1142 Poster Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation
1143 Poster DIAL: Dense Image-text ALignment for Weakly Supervised Semantic Segmentation
1144 Poster N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields
1145 Poster Prioritized Semantic Learning for Zero-shot Instance Navigation
1146 Poster PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model
1147 Poster SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance
1148 Poster Knowledge Transfer with Simulated Inter-Image Erasing for Weakly Supervised Semantic Segmentation
1149 Poster ProMerge: Prompt and Merge for Unsupervised Instance Segmentation
1150 Poster Part2Object: Hierarchical Unsupervised 3D Instance Segmentation
1151 Poster Dual-level Adaptive Self-Labeling for Novel Class Discovery in Point Cloud Segmentation
1152 Poster Diffusion for Out-of-Distribution Detection on Road Scenes and Beyond
1153 Poster UniFS: Universal Few-shot Instance Perception with Point Representations
1154 Poster Crowd-SAM: SAM as a smart annotator for object detection in crowded scenes
1155 Poster Adaptive Multi-task Learning for Few-shot Object Detection
1156 Poster FocusDiffuser: Perceiving Local Disparities for Camouflaged Object Detection
1157 Poster Distilling Knowledge from Large-Scale Image Models for Object Detection
1158 Poster Revisiting Domain-Adaptive Object Detection in Adverse Weather by the Generation and Composition of High-Quality Pseudo-Labels
1159 Poster Operational Open-Set Recognition and PostMax Refinement
1160 Poster InfMAE: A Foundation Model in The Infrared Modality
1161 Poster AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking
1162 Poster Domesticating SAM for Breast Ultrasound Image Segmentation via Spatial-frequency Fusion and Uncertainty Correction
1163 Poster Effective Lymph Nodes Detection in CT Scans Using Location Debiased Query Selection and Contrastive Query Representation in Transformer
1164 Poster Snuffy: Efficient Whole Slide Image Classifier
1165 Poster Unified Medical Image Pre-training in Language-Guided Common Semantic Space
1166 Poster Brain-ID: Learning Contrast-agnostic Anatomical Representations for Brain Imaging
1167 Poster TIP: Tabular-Image Pre-training for Multimodal Classification with Incomplete Data
1168 Poster TransFusion -- A Transparency-Based Diffusion Model for Anomaly Detection
1169 Poster VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation
1170 Poster Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt
1171 Poster Interleaving One-Class and Weakly-Supervised Models with Adaptive Thresholding for Unsupervised Video Anomaly Detection
1172 Poster Asynchronous Bioplausible Neuron for Spiking Neural Networks for Event-Based Vision
1173 Poster SAIR: Learning Semantic-aware Implicit Representation
1174 Poster Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning
1175 Poster Learning with Unmasked Tokens Drives Stronger Vision Learners
1176 Poster Emerging Property of Masked Token for Effective Pre-training
1177 Poster Distributed Semantic Segmentation with Efficient Joint Source and Task Decoding
1178 Poster The Role of Masking for Efficient Supervised Knowledge Distillation of Vision Transformers
1179 Poster SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning
1180 Poster Tight and Efficient Upper Bound on Spectral Norm of Convolutional Layers
1181 Poster FYI: Flip Your Images for Dataset Distillation
1182 Poster Data-to-Model Distillation: Data-Efficient Learning Framework
1183 Poster Overcome Modal Bias in Multi-modal Federated Learning via Balanced Modality Selection
1184 Poster Active Generation for Image Classification
1185 Poster Contrastive Learning with Synthetic Positives
1186 Poster Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models
1187 Poster Robust Calibration of Large Vision-Language Adapters
1188 Poster Deciphering the Role of Representation Disentanglement: Investigating Compositional Generalization in CLIP Models
1189 Poster FroSSL: Frobenius Norm Minimization for Efficient Multiview Self-Supervised Learning
1190 Poster Benchmarking Spurious Bias in Few-Shot Image Classifiers
1191 Poster An Information Theoretical View for Out-Of-Distribution Detection
1192 Poster ProSub: Probabilistic Open-Set Semi-Supervised Learning with Subspace-Based Out-of-Distribution Detection
1193 Poster Adapting to Shifting Correlations with Unlabeled Data Calibration
1194 Poster Distribution-Aware Robust Learning from Long-Tailed Data with Noisy Labels
1195 Poster On Pretraining Data Diversity for Self-Supervised Learning
1196 Poster De-Confusing Pseudo-Labels in Source-Free Domain Adaptation
1197 Poster Improving Unsupervised Domain Adaptation: A Pseudo-Candidate Set Approach
1198 Poster Hierarchical Unsupervised Relation Distillation for Source Free Domain Adaptation
1199 Poster Source-Free Domain-Invariant Performance Prediction
1200 Poster Learning to Complement and to Defer to Multiple Users
1201 Poster Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation
1202 Poster Personalized Federated Domain-Incremental Learning based on Adaptive Knowledge Matching
1203 Poster Revisiting Supervision for Continual Representation Learning
1204 Poster Deep Companion Learning: Enhancing Generalization Through Historical Consistency
1205 Poster Learning Scalable Model Soup on a Single GPU: An Efficient Subspace Training Strategy
1206 Poster Harmonizing Knowledge Transfer in Neural Network with Unified Distillation
1207 Poster Feature Diversification and Adaptation for Federated Domain Generalization
1208 Poster PFedEdit: Personalized Federated Learning via Automated Model Editing
1209 Poster Enhanced Sparsification via Stimulative Training
1210 Poster Dependency-aware Differentiable Neural Architecture Search
1211 Poster Layer-Wise Relevance Propagation with Conservation Property for ResNet
1212 Poster Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning
1213 Poster Training A Secure Model against Data-Free Model Extraction
1214 Poster CLIP-Guided Generative Networks for Transferable Targeted Adversarial Attacks
1215 Poster Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection
1216 Poster Leveraging Imperfect Restoration for Data Availability Attack
1217 Poster Veil Privacy on Visual Data: Concealing Privacy for Humans, Unveiling for DNNs
1218 Poster Augmented Neural Fine-tuning for Efficient Backdoor Purification
1219 Poster MIGS: Multi-Identity Gaussian Splatting via Tensor Decomposition
1220 Oral MIGS: Multi-Identity Gaussian Splatting via Tensor Decomposition
1221 Poster RaFE: Generative Radiance Fields Restoration
1222 Oral RaFE: Generative Radiance Fields Restoration
1223 Poster Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration
1224 Oral Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration
1225 Poster FisherRF: Active View Selection and Mapping with Radiance Fields using Fisher Information
1226 Oral FisherRF: Active View Selection and Mapping with Radiance Fields using Fisher Information
1227 Poster Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields
1228 Oral Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields
1229 Poster RPBG: Towards Robust Neural Point-based Graphics in the Wild
1230 Oral RPBG: Towards Robust Neural Point-based Graphics in the Wild
1231 Poster MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
1232 Oral MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
1233 Poster Learning 3D-aware GANs from Unposed Images with Template Feature Field
1234 Oral Learning 3D-aware GANs from Unposed Images with Template Feature Field
1235 Poster Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis
1236 Oral Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis
1237 Poster Watch Your Steps: Local Image and Scene Editing by Text Instructions
1238 Oral Watch Your Steps: Local Image and Scene Editing by Text Instructions
1239 Poster Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering
1240 Oral Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering
1241 Poster Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction
1242 Oral Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction
1243 Poster ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model
1244 Oral ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model
1245 Poster DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
1246 Oral DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
1247 Poster Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation
1248 Oral Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation
1249 Poster ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer
1250 Oral ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer
1251 Poster Video Editing via Factorized Diffusion Distillation
1252 Oral Video Editing via Factorized Diffusion Distillation
1253 Poster Efficient Neural Video Representation with Temporally Coherent Modulation
1254 Oral Efficient Neural Video Representation with Temporally Coherent Modulation
1255 Poster SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion
1256 Oral SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion
1257 Poster LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning
1258 Oral LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning
1259 Poster NeRMo: Learning Implicit Neural Representations for 3D Human Motion Prediction
1260 Oral NeRMo: Learning Implicit Neural Representations for 3D Human Motion Prediction
1261 Poster UGG: Unified Generative Grasping
1262 Oral UGG: Unified Generative Grasping
1263 Poster LiveHPS++: Robust and Coherent Motion Capture in Dynamic Free Environment
1264 Oral LiveHPS++: Robust and Coherent Motion Capture in Dynamic Free Environment
1265 Poster Controllable Human-Object Interaction Synthesis
1266 Oral Controllable Human-Object Interaction Synthesis
1267 Poster Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models
1268 Oral Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models
1269 Poster Harnessing Text-to-Image Diffusion Models for Category-Agnostic Pose Estimation
1270 Oral Harnessing Text-to-Image Diffusion Models for Category-Agnostic Pose Estimation
1271 Poster POET: Prompt Offset Tuning for Continual Human Action Adaptation
1272 Oral POET: Prompt Offset Tuning for Continual Human Action Adaptation
1273 Poster NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model
1274 Oral NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model
1275 Poster AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild
1276 Oral AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild
1277 Poster Sapiens: Foundation for Human Vision Models
1278 Oral Sapiens: Foundation for Human Vision Models
1279 Poster KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding
1280 Poster Modeling and Driving Human Body Soundfields through Acoustic Primitives
1281 Poster Let the Avatar Talk using Texts without Paired Training Data
1282 Poster CanonicalFusion: Generating Drivable 3D Human Avatars from Multiple Images
1283 Poster Relightable Neural Actor with Intrinsic Decomposition and Pose Control
1284 Poster 3R-INN: How to be climate friendly while consuming/delivering videos?
1285 Poster Unveiling Advanced Frequency Disentanglement Paradigm for Low-Light Image Enhancement
1286 Poster Intrinsic Single-Image HDR Reconstruction
1287 Poster Domain Reduction Strategy for Non-Line-of-Sight Imaging
1288 Poster Boost Your NeRF: A Model-Agnostic Mixture of Experts Framework for High Quality and Efficient Rendering
1289 Poster Synthesizing Time-varying BRDFs via Latent Space
1290 Poster Parameterization-driven Neural Surface Reconstruction for Object-oriented Editing in Neural Rendering
1291 Poster Instant Uncertainty Calibration of NeRFs Using a Meta-Calibrator
1292 Poster GAURA: Generalizable Approach for Unified Restoration and Rendering of Arbitrary Views
1293 Poster Content-Aware Radiance Fields: Aligning Model Complexity with Scene Intricacy Through Learned Bitwidth Quantization
1294 Poster Collaborative Control for Geometry-Conditioned PBR Image Generation
1295 Poster KFD-NeRF: Rethinking Dynamic NeRF with Kalman Filter
1296 Poster Weight Conditioning for Smooth Optimization of Neural Networks
1297 Poster URS-NeRF: Unordered Rolling Shutter Bundle Adjustment for Neural Radiance Fields
1298 Poster MERLiN: Single-Shot Material Estimation and Relighting for Photometric Stereo
1299 Poster TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks
1300 Poster FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting
1301 Poster Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion
1302 Poster DoubleTake: Geometry Guided Depth Estimation
1303 Poster Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal
1304 Poster SAGS: Structure-Aware 3D Gaussian Splatting
1305 Poster Compact 3D Scene Representation via Self-Organizing Gaussian Grids
1306 Poster HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression
1307 Poster GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction
1308 Poster Concise Plane Arrangements for Low-Poly Surface and Volume Modelling
1309 Poster Gaussian Grouping: Segment and Edit Anything in 3D Scenes
1310 Poster SCP-Diff: Spatial-Categorical Joint Prior for Diffusion Based Semantic Image Synthesis
1311 Poster STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians
1312 Poster Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable Repainting
1313 Poster GOEmbed: Gradient Origin Embeddings for Representation Agnostic 3D Feature Learning
1314 Poster Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation
1315 Poster FAMOUS: High-Fidelity Monocular 3D Human Digitization Using View Synthesis
1316 Poster Retargeting Visual Data with Deformation Fields
1317 Poster LatentEditor: Text Driven Local Editing of 3D Scenes
1318 Poster StyleCity: Large-Scale 3D Urban Scenes Stylization
1319 Poster Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering
1320 Poster Connecting Consistency Distillation to Score Distillation for Text-to-3D Generation
1321 Poster DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors
1322 Poster InterFusion: Text-Driven Generation of 3D Human-Object Interaction
1323 Poster Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models
1324 Poster AWOL: Analysis WithOut synthesis using Language
1325 Poster Improving Virtual Try-On with Garment-focused Diffusion Models
1326 Poster GarmentCodeData: A Dataset of 3D Made-to-Measure Garments With Sewing Patterns
1327 Poster Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
1328 Poster DoughNet: A Visual Predictive Model for Topological Manipulation of Deformable Objects
1329 Poster Generating 3D House Wireframes with Semantics
1330 Poster LayoutFlow: Flow Matching for Layout Generation
1331 Poster Synchronous Diffusion for Unsupervised Smooth Non-Rigid 3D Shape Matching
1332 Poster Scalar Function Topology Divergence: Comparing Topology of 3D Objects
1333 Poster DynoSurf: Neural Deformation-based Temporally Consistent Dynamic Surface Reconstruction
1334 Poster Fast Point Cloud Geometry Compression with Context-based Residual Coding and INR-based Refinement
1335 Poster FLAT: Flux-aware Imperceptible Adversarial Attacks on 3D Point Clouds
1336 Poster Frugal 3D Point Cloud Model Training via Progressive Near Point Filtering and Fused Aggregation
1337 Poster SemReg: Semantics Constrained Point Cloud Registration
1338 Poster GPSFormer: A Global Perception and Local Structure Fitting-based Transformer for Point Cloud Understanding
1339 Poster Masked Motion Prediction with Semantic Contrast for Point Cloud Sequence Learning
1340 Poster RangeLDM: Fast Realistic LiDAR Point Cloud Generation
1341 Poster Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data
1342 Poster SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs
1343 Poster Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization
1344 Poster Adaptive Annealing for Robust Averaging
1345 Poster Resolving Scale Ambiguity in Multi-view 3D Reconstruction using Dual-Pixel Sensors
1346 Poster Consistent 3D Line Mapping
1347 Poster Robust Incremental Structure-from-Motion with Hybrid Features
1348 Poster Gravity-aligned Rotation Averaging with Circular Regression
1349 Poster GeoCalib: Learning Single-image Calibration with Geometric Optimization
1350 Poster Real-time Holistic Robot Pose Estimation with Unknown States
1351 Poster Learning Neural Volumetric Pose Features for Camera Localization
1352 Poster LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation
1353 Poster SCAPE: A Simple and Strong Category-Agnostic Pose Estimator
1354 Poster Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estimation
1355 Poster UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues
1356 Poster Multi-RoI Human Mesh Recovery with Camera Consistency and Contrastive Losses
1357 Poster MLPHand: Real Time Multi-View 3D Hand Reconstruction via MLP Modeling
1358 Poster WorldPose: A World Cup Dataset for Global 3D Human Pose Estimation
1359 Poster RePOSE: 3D Human Pose Estimation via Spatio-Temporal Depth Relational Consistency
1360 Poster An Economic Framework for 6-DoF Grasp Detection
1361 Poster SemGrasp: Semantic Grasp Generation via Language Aligned Discretization
1362 Oral SemGrasp: Semantic Grasp Generation via Language Aligned Discretization
1363 Poster FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation
1364 Poster OGNI-DC: Robust Depth Completion with Optimization-Guided Neural Iterations
1365 Poster ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion
1366 Poster Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion
1367 Poster SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning
1368 Poster Reinforcement Learning Meets Visual Odometry
1369 Poster Mahalanobis Distance-based Multi-view Optimal Transport for Multi-view Crowd Localization
1370 Poster Camera-LiDAR Cross-modality Gait Recognition
1371 Poster TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Autonomous Driving
1372 Poster 3D Single-object Tracking in Point Clouds with High Temporal Variation
1373 Poster LISO: Lidar-only Self-Supervised 3D Object Detection
1374 Poster MonoWAD: Weather-Adaptive Diffusion Model for Robust Monocular 3D Object Detection
1375 Poster IFTR: An Instance-Level Fusion Transformer for Visual Collaborative Perception
1376 Poster MUSES: The Multi-Sensor Semantic Perception Dataset for Driving under Uncertainty
1377 Poster Reliability in Semantic Segmentation: Can We Use Synthetic Data?
1378 Poster DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control
1379 Poster Fully Sparse 3D Occupancy Prediction
1380 Poster EMIE-MAP: Large-Scale Road Surface Reconstruction Based on Explicit Mesh and Implicit Encoding
1381 Poster Continuity Preserving Online CenterLine Graph Learning
1382 Poster FipTR: A Simple yet Effective Transformer Framework for Future Instance Prediction in Autonomous Driving
1383 Poster Think2Drive: Efficient Reinforcement Learning by Thinking with Latent World Model for Autonomous Driving (in CARLA-v2)
1384 Poster Solving Motion Planning Tasks with a Scalable Generative Model
1385 Poster Enhanced Motion Forecasting with Visual Relation Reasoning
1386 Poster OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding
1387 Poster Event-Aided Time-To-Collision Estimation for Autonomous Driving
1388 Poster Event-based Mosaicing Bundle Adjustment
1389 Poster Revisit Event Generation Model: Self-Supervised Learning of Event-to-Video Reconstruction with Implicit Neural Representations
1390 Poster AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation
1391 Poster Learning-based Axial Video Motion Magnification
1392 Poster Motion Keyframe Interpolation for Any Human Skeleton using Point Cloud-based Human Motion Data Homogenisation
1393 Poster Generating Physically Realistic and Directable Human Motions from Multi-Modal Inputs
1394 Poster Scalable Group Choreography via Variational Phase Manifold Learning
1395 Poster FreeMotion: MoCap-Free Human Motion Synthesis with Multimodal Large Language Models
1396 Poster Plan, Posture and Go: Towards Open-vocabulary Text-to-Motion Generation
1397 Poster Drag Anything: Motion Control for Anything using Entity Representation
1398 Poster Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers’ Opinion Scores
1399 Poster Audio-Synchronized Visual Animation
1400 Oral Audio-Synchronized Visual Animation
1401 Poster E.T. the Exceptional Trajectory: Text-to-camera-trajectory generation with character awareness
1402 Poster MotionDirector: Motion Customization of Text-to-Video Diffusion Models
1403 Oral MotionDirector: Motion Customization of Text-to-Video Diffusion Models
1404 Poster SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models
1405 Poster Object-Centric Diffusion for Efficient Video Editing
1406 Poster GroupDiff: Diffusion-based Group Portrait Editing
1407 Poster Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models
1408 Poster Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing
1409 Poster Towards compact reversible image representations for neural style transfer
1410 Poster InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser
1411 Poster SwapAnything: Enabling Arbitrary Object Swapping in Personalized Image Editing
1412 Poster When and How do negative prompts take effect?
1413 Poster SPIRE: Semantic Prompt-Driven Image Restoration
1414 Poster LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model
1415 Poster UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models
1416 Poster Text-Anchored Score Composition: Tackling Condition Misalignment in Text-to-Image Diffusion Models
1417 Poster Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models
1418 Poster Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
1419 Poster Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models
1420 Poster LogoSticker: Inserting Logos into Diffusion Models for Customized Generation
1421 Poster Enhancing Diffusion Models with Text-Encoder Reinforcement Learning
1422 Poster SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher
1423 Poster EditShield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models
1424 Poster Implicit Concept Removal of Diffusion Models
1425 Poster NVS-Adapter: Plug-and-Play Novel View Synthesis from a Single Image
1426 Poster Global Counterfactual Directions
1427 Poster Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance
1428 Poster Pixel-Aware Stable Diffusion for Realistic Image Super-Resolution and Personalized Stylization
1429 Poster AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation
1430 Poster Beta-Tuned Timestep Diffusion Model
1431 Poster Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
1432 Poster Efficient Diffusion-Driven Corruption Editor for Test-Time Adaptation
1433 Poster InstructIR: High-Quality Image Restoration Following Human Instructions
1434 Poster BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion
1435 Poster Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models
1436 Poster OneRestore: A Universal Restoration Framework for Composite Degradation
1437 Poster UCIP: A Universal Framework for Compressed Image Super-Resolution using Dynamic Prompt
1438 Poster Pairwise Distance Distillation for Unsupervised Real-World Image Super-Resolution
1439 Poster When Fast Fourier Transform Meets Transformer for Image Restoration
1440 Poster Dual-Path Adversarial Lifting for Domain Shift Correction in Online Test-time Adaptation
1441 Poster SuperGaussian: Repurposing Video Models for 3D Super Resolution
1442 Poster Temporal As a Plugin: Unsupervised Video Denoising with Pre-Trained Image Denoisers
1443 Poster Toward INT4 Fixed-Point Training via Exploring Quantization Error for Gradients
1444 Poster Imaging Interiors: An Implicit Solution to Electromagnetic Inverse Scattering Problems
1445 Poster Learned Rate Control for Frame-Level Adaptive Neural Video Compression via Dynamic Neural Network
1446 Poster Spike-Temporal Latent Representation for Energy-Efficient Event-to-Video Reconstruction
1447 Poster Exploring Vulnerabilities in Spiking Neural Networks: Direct Adversarial Attacks on Raw Event Data
1448 Poster A Secure Image Watermarking Framework with Statistical Guarantees via Adversarial Attacks on Secret Key Networks
1449 Poster Skeleton Recall Loss for Connectivity Conserving and Resource Efficient Segmentation of Thin Tubular Structures
1450 Poster Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection
1451 Poster Bottom-Up Domain Prompt Tuning for Generalized Face Anti-Spoofing
1452 Poster Real Appearance Modeling for More General Deepfake Detection
1453 Poster SelfSwapper: Self-Supervised Face Swapping via Shape Agnostic Masked AutoEncoder
1454 Poster Norface: Improving Facial Expression Analysis by Identity Normalization
1455 Poster Open-Set Biometrics: Beyond Good Closed-Set Models
1456 Poster Brain Netflix: Scaling Data to Reconstruct Videos from Brain Signals
1457 Poster PCF-Lift: Panoptic Lifting by Probabilistic Contrastive Fusion
1458 Poster Enhancing Tracking Robustness with Auxiliary Adversarial Defense Networks
1459 Poster SLAck: Semantic, Location, and Appearance Aware Open-Vocabulary Tracking
1460 Poster Causality-inspired Discriminative Feature Learning in Triple Domains for Gait Recognition
1461 Poster VSViG: Real-time Video-based Seizure Detection via Skeleton-based Spatiotemporal ViG
1462 Poster Language-Assisted Skeleton Action Understanding for Skeleton-Based Temporal Action Segmentation
1463 Poster Elucidating the Hierarchical Nature of Behavior with Masked Autoencoders
1464 Poster FinePseudo: Improving Pseudo-Labelling through Temporal-Alignablity for Semi-Supervised Fine-Grained Action Recognition
1465 Poster Bayesian Evidential Deep Learning for Online Action Detection
1466 Poster Event Camera Data Dense Pre-training
1467 Poster Unsupervised Moving Object Segmentation with Atmospheric Turbulence
1468 Poster Beyond MOT: Semantic Multi-Object Tracking
1469 Poster MRSP: Learn Multi-Representations of Single Primitive for Compositional Zero-Shot Learning
1470 Poster Optimizing Factorized Encoder Models: Time and Memory Reduction for Scalable and Efficient Action Recognition
1471 Poster Open Vocabulary Multi-Label Video Classification
1472 Poster R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
1473 Poster Leveraging temporal contextualization for video action recognition
1474 Poster VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
1475 Poster KDProR: A Knowledge-Decoupling Probabilistic Framework for Video-Text Retrieval
1476 Poster InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
1477 Poster HowToCaption: Prompting LLMs to Transform Video Annotations at Scale
1478 Poster Label-anticipated Event Disentanglement for Audio-Visual Video Parsing
1479 Poster Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation
1480 Poster Uncertainty-aware sign language video retrieval with probability distribution modeling
1481 Poster NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition
1482 Poster Domain Shifting: A Generalized Solution for Heterogeneous Cross-Modality Person Re-Identification
1483 Poster HyTAS: A Hyperspectral Image Transformer Architecture Search Benchmark and Analysis
1484 Poster VLAD-BuFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition
1485 Poster Embodied Understanding of Driving Scenarios
1486 Poster Octopus: Embodied Vision-Language Programmer from Environmental Feedback
1487 Poster Finding Visual Task Vectors
1488 Poster ControlLLM: Augment Language Models with Tools by Searching on Graphs
1489 Poster ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities
1490 Poster Uni3DL: A Unified Model for 3D Vision-Language Understanding
1491 Poster CrossScore: A Multi-View Approach to Image Evaluation and Scoring
1492 Poster Compositional Substitutivity of Visual Reasoning for Visual Question Answering
1493 Poster The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
1494 Poster X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and its Emergent Cross-modal Reasoning
1495 Poster ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
1496 Poster Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation
1497 Poster Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Models
1498 Poster MoAI: Mixture of All Intelligence for Large Language and Vision Models
1499 Poster Training A Small Emotional Vision Language Model for Visual Art Comprehension
1500 Poster Quantized Prompt for Efficient Generalization of Vision-Language Models
1501 Poster VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
1502 Poster Getting it Right: Improving Spatial Consistency in Text-to-Image Models
1503 Poster MultiGen: Zero-shot Image Generation from Multi-modal Prompts
1504 Poster Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors
1505 Poster VeCLIP: Improving CLIP Training via Visual-enriched Captions
1506 Poster ControlCap: Controllable Region-level Captioning
1507 Poster Adapt without Forgetting: Distill Proximity from Dual Teachers in Vision-Language Models
1508 Poster Look Hear: Gaze Prediction for Speech-directed Human Attention
1509 Poster Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection
1510 Poster LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models
1511 Poster Unlocking Attributes' Contribution to Successful Camouflage: A Combined Textual and Visual Analysis Strategy
1512 Poster Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection
1513 Poster Multi-Granularity Sparse Relationship Matrix Prediction Network for End-to-End Scene Graph Generation
1514 Poster Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection
1515 Poster Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation
1516 Poster SpatialFormer: Towards Generalizable Vision Transformers with Explicit Spatial Understanding
1517 Poster LoA-Trans: Enhancing Visual Grounding by Location-Aware Transformers
1518 Poster SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
1519 Poster EAFormer: Scene Text Segmentation with Edge-Aware Transformers
1520 Poster CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentation
1521 Poster Textual Query-Driven Mask Transformer for Domain Generalized Segmentation
1522 Poster Attention Decomposition for Cross-Domain Semantic Segmentation
1523 Poster SegGen: Supercharging Segmentation Models with Text2Mask and Mask2Img Synthesis
1524 Poster A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting
1525 Poster MC-PanDA: Mask Confidence for Panoptic Domain Adaptation
1526 Poster OLAF: A Plug-and-Play Framework for Enhanced Multi-object Multi-part Scene Parsing
1527 Poster Learning from the Web: Language Drives Weakly-Supervised Incremental Learning for Semantic Segmentation
1528 Poster Tendency-driven Mutual Exclusivity for Weakly Supervised Incremental Semantic Segmentation
1529 Poster Cs2K: Class-specific and Class-shared Knowledge Guidance for Incremental Semantic Segmentation
1530 Poster ItTakesTwo: Leveraging Peer Representations for Semi-supervised LiDAR Semantic Segmentation
1531 Poster On-the-fly Category Discovery for LiDAR Semantic Segmentation
1532 Poster CONDA: Condensed Deep Association Learning for Co-Salient Object Detection
1533 Poster General Geometry-aware Weakly Supervised 3D Object Detection
1534 Poster CamoTeacher: Dual-Rotation Consistency Learning for Semi-Supervised Camouflaged Object Detection
1535 Poster MetaAT: Active Testing for Label-Efficient Evaluation of Dense Recognition Tasks
1536 Poster Simplifying Source-Free Domain Adaptation for Object Detection: Effective Self-Training Strategies and Performance Insights
1537 Poster Rethinking Features-Fused-Pyramid-Neck for Object Detection
1538 Poster 3D Small Object Detection with Dynamic Spatial Pruning
1539 Poster Watching it in Dark: A Target-aware Representation Learning Framework for High-Level Vision Tasks in Low Illumination
1540 Poster Gradient-Aware for Class-Imbalanced Semi-supervised Medical Image Segmentation
1541 Poster Test-Time Stain Adaptation with Diffusion Models for Histopathology Image Classification
1542 Poster WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering
1543 Poster ChEX: Interactive Localization and Region Description in Chest X-rays
1544 Poster A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization
1545 Poster Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection
1546 Poster Random Walk on Pixel Manifolds for Anomaly Segmentation of Complex Driving Scenes
1547 Poster FedVAD: Enhancing Federated Video Anomaly Detection with GPT-Driven Semantic Distillation
1548 Poster Efficient Training of Spiking Neural Networks with Multi-Parallel Implicit Stream Architecture
1549 Poster DECIDER: Leveraging Foundation Model Priors for Improved Model Failure Detection and Explanation
1550 Poster SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization
1551 Poster SeiT++: Masked Token Modeling Improves Storage-efficient Training
1552 Poster AMD: Automatic Multi-step Distillation of Large-scale Vision Models
1553 Poster Stitched ViTs are Flexible Vision Backbones
1554 Poster MetaAug: Meta-Data Augmentation for Post-Training Quantization
1555 Poster Straightforward Layer-wise Pruning for More Efficient Visual Adaptation
1556 Poster On Learning Discriminative Features from Synthesized Data for Self-Supervised Fine-Grained Visual Recognition
1557 Poster Robust Multimodal Learning via Representation Decoupling
1558 Poster SUMix: Mixup with Semantic and Uncertain Information
1559 Poster Understanding and Mitigating Human-Labelling Errors in Supervised Contrastive Learning
1560 Poster Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models
1561 Poster SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning
1562 Poster Linking in Style: Understanding learned features in deep learning models
1563 Poster Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort
1564 Poster Image-Feature Weak-to-Strong Consistency: An Enhanced Paradigm for Semi-Supervised Learning
1565 Poster Strike a Balance in Continual Panoptic Segmentation
1566 Poster IGNORE: Information Gap-based False Negative Loss Rejection for Single Positive Multi-Label Learning
1567 Poster Dual-Decoupling Learning and Metric-Adaptive Thresholding for Semi-Supervised Multi-Label Learning
1568 Poster Instance-dependent Noisy-label Learning with Graphical Model Based Noise-rate Estimation
1569 Poster Learning to Distinguish Samples for Generalized Category Discovery
1570 Poster Is user feedback always informative? Retrieval Latent Defending for Semi-Supervised Domain Adaptation without Source Data
1571 Poster HVCLIP: High-dimensional Vector in CLIP for Unsupervised Domain Adaptation
1572 Poster DiffClass: Diffusion-Based Class Incremental Learning
1573 Poster Direct Distillation between Different Domains
1574 Poster MemBN: Robust Test-Time Adaptation via Batch Norm with Statistics Memory
1575 Poster PILoRA: Prototype Guided Incremental LoRA for Federated Class-Incremental Learning
1576 Poster PromptFusion: Decoupling Stability and Plasticity for Continual Learning
1577 Poster One-stage Prompt-based Continual Learning
1578 Poster Is Retain Set All You Need in Machine Unlearning? Restoring Performance of Unlearned Models with Out-Of-Distribution Images
1579 Poster Idling Neurons, Appropriately Lenient Workload During Fine-tuning Leads to Better Generalization
1580 Poster How to Train the Teacher Model for Effective Knowledge Distillation
1581 Poster Local and Global Flatness for Federated Domain Generalization
1582 Poster Dataset Quantization with Active Learning based Adaptive Sampling
1583 Poster DεpS: Delayed ε-Shrinking for Faster Once-For-All Training
1584 Poster Auto-DAS: Automated Proxy Discovery for Training-free Distillation-aware Architecture Search
1585 Poster On Spectral Properties of Gradient-based Explanation Methods
1586 Poster Cross-Input Certified Training for Universal Perturbations
1587 Poster Interpretability-Guided Test-Time Adversarial Defense
1588 Poster Exploring Guided Sampling of Conditional GANs
1589 Poster Self-Supervised Representation Learning for Adversarial Attack Detection
1590 Poster Non-transferable Pruning
1591 Poster On the Vulnerability of Skip Connections to Model Inversion Attacks
1592 Poster Clean & Compact: Efficient Data-Free Backdoor Defense with Model Compactness
1593 Poster Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities
1594 Oral Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities
1595 Poster Diffusion Models for Open-Vocabulary Segmentation
1596 Oral Diffusion Models for Open-Vocabulary Segmentation
1597 Poster Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation
1598 Oral Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation
1599 Poster CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model
1600 Oral CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model
1601 Poster Efficient Active Domain Adaptation for Semantic Segmentation by Selecting Information-rich Superpixels
1602 Oral Efficient Active Domain Adaptation for Semantic Segmentation by Selecting Information-rich Superpixels
1603 Poster ActionVOS: Actions as Prompts for Video Object Segmentation
1604 Oral ActionVOS: Actions as Prompts for Video Object Segmentation
1605 Poster WPS-SAM: Towards Weakly-Supervised Part Segmentation with Foundation Models
1606 Oral WPS-SAM: Towards Weakly-Supervised Part Segmentation with Foundation Models
1607 Poster A Geometric Distortion Immunized Deep Watermarking Framework with Robustness Generalizability
1608 Oral A Geometric Distortion Immunized Deep Watermarking Framework with Robustness Generalizability
1609 Poster COHO: Context-Sensitive City-Scale Hierarchical Urban Layout Generation
1610 Oral COHO: Context-Sensitive City-Scale Hierarchical Urban Layout Generation
1611 Poster Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance
1612 Oral Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance
1613 Poster Geospecific View Generation - Geometry-Context Aware High-resolution Ground View Inference from Satellite Views
1614 Oral Geospecific View Generation - Geometry-Context Aware High-resolution Ground View Inference from Satellite Views
1615 Poster MaxMI: A Maximal Mutual Information Criterion for Manipulation Concept Discovery
1616 Oral MaxMI: A Maximal Mutual Information Criterion for Manipulation Concept Discovery
1617 Poster Faceptor: A Generalist Model for Face Perception
1618 Oral Faceptor: A Generalist Model for Face Perception
1619 Poster Exploring the Feature Extraction and Relation Modeling For Light-Weight Transformer Tracking
1620 Oral Exploring the Feature Extraction and Relation Modeling For Light-Weight Transformer Tracking
1621 Poster Learning Multimodal Latent Generative Models with Energy-Based Prior
1622 Oral Learning Multimodal Latent Generative Models with Energy-Based Prior
1623 Poster Pose-Aware Self-Supervised Learning with Viewpoint Trajectory Regularization
1624 Oral Pose-Aware Self-Supervised Learning with Viewpoint Trajectory Regularization
1625 Poster SINDER: Repairing the Singular Defects of DINOv2
1626 Oral SINDER: Repairing the Singular Defects of DINOv2
1627 Poster Emergent Visual-Semantic Hierarchies in Image-Text Representations
1628 Oral Emergent Visual-Semantic Hierarchies in Image-Text Representations
1629 Poster PiTe: Pixel-Temporal Alignment for Large Video-Language Model
1630 Oral PiTe: Pixel-Temporal Alignment for Large Video-Language Model
1631 Poster Decoupling Common and Unique Representations for Multimodal Self-supervised Learning
1632 Oral Decoupling Common and Unique Representations for Multimodal Self-supervised Learning
1633 Poster Denoising Vision Transformers
1634 Oral Denoising Vision Transformers
1635 Poster Audio-driven Talking Face Generation with Stabilized Synchronization Loss
1636 Poster ScanTalk: 3D Talking Heads from Unregistered Scans
1637 Poster Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer
1638 Poster Fast Registration of Photorealistic Avatars for VR Facial Animation
1639 Poster MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos
1640 Poster Learning to Generate Conditional Tri-plane for 3D-aware Expression Controllable Portrait Animation
1641 Poster Learning to Robustly Reconstruct Dynamic Scenes from Low-light Spike Streams
1642 Poster Wavelength-Embedding-guided Filter-Array Transformer for Spectral Demosaicing
1643 Poster Learned HDR Image Compression for Perceptually Optimal Storage and Display
1644 Poster Learning to Enhance Aperture Phasor Field for Non-Line-of-Sight Imaging
1645 Poster Leveraging Thermal Modality to Enhance Reconstruction in Low-Light Conditions
1646 Poster The Sky's the Limit: Relightable Outdoor Scenes via a Sky-pixel Constrained Illumination Prior and Outside-In Visibility
1647 Poster A Probability-guided Sampler for Neural Implicit Surface Rendering
1648 Poster REFRAME: Reflective Surface Real-Time Rendering for Mobile Devices
1649 Poster Dynamic Neural Radiance Field From Defocused Monocular Video
1650 Poster VersatileGaussian: Real-time Neural Rendering for Versatile Tasks using Gaussian Splatting
1651 Poster DMiT: Deformable Mipmapped Tri-Plane Representation for Dynamic Scenes
1652 Poster NeRF-XL: NeRF at Any Scale with Multi-GPU
1653 Poster G2fR: Frequency Regularization in Grid-based Feature Encoding Neural Radiance Fields
1654 Poster InfoNorm: Mutual Information Shaping of Normals for Sparse-View Reconstruction
1655 Poster MirrorGaussian: Reflecting 3D Gaussians for Reconstructing Mirror Reflections
1656 Poster Disentangled Generation and Aggregation for Robust Radiance Fields
1657 Poster CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians
1658 Poster SWAG: Splatting in the Wild images with Appearance-conditioned Gaussians
1659 Poster Surface Reconstruction for 3D Gaussian Splatting via Local Structural Hints
1660 Poster Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting
1661 Poster GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting
1662 Poster SWinGS: Sliding Windows for Dynamic 3D Gaussian Splatting
1663 Poster An Adaptive Screen-Space Meshing Approach for Normal Integration
1664 Poster Fast View Synthesis of Casual Videos with Soup-of-Planes
1665 Poster 4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation
1666 Poster GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image
1667 Poster Viewpoint textual inversion: discovering scene representations and 3D view control in 2D diffusion models
1668 Poster ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance
1669 Poster LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation
1670 Poster External Knowledge Enhanced 3D Scene Generation from Sketch
1671 Poster EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion
1672 Poster 3DEgo: 3D Editing on the Go!
1673 Poster Learning Pseudo 3D Guidance for View-consistent Texturing with 2D Diffusion
1674 Poster JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation
1675 Poster Diverse Text-to-3D Synthesis with Augmented Text Embedding
1676 Poster SweepNet: Unsupervised Learning Shape Abstraction via Neural Sweepers
1677 Poster CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches
1678 Poster Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment
1679 Poster DiffSurf: A Transformer-based Diffusion Model for Generating and Reconstructing 3D Surfaces in Pose
1680 Poster Motion-Oriented Compositional Neural Radiance Fields for Monocular Dynamic Human Modeling
1681 Poster LEIA: Latent View-invariant Embeddings for Implicit 3D Articulation
1682 Poster Learned Neural Physics Simulation for Articulated 3D Human Pose Reconstruction
1683 Poster Layout-Corrector: Alleviating Layout Sticking Phenomenon in Discrete Diffusion Model
1684 Poster Vista3D: unravel the 3d darkside of a single image
1685 Poster Diff-Reg: Diffusion Model in Doubly Stochastic Matrix Space for Registration Problem
1686 Poster NICP: Neural ICP for 3D Human Registration at Scale
1687 Poster PFGS: High Fidelity Point Cloud Rendering via Feature Splatting
1688 Poster TransCAD: A Hierarchical Transformer for CAD Sequence Inference from Point Clouds
1689 Poster EINet: Point Cloud Completion via Extrapolation and Interpolation
1690 Poster DiffPMAE: Diffusion Masked Autoencoders for Point Cloud Reconstruction
1691 Poster Correspondence-Free SE(3) Point Cloud Registration in RKHS via Unsupervised Equivariant Learning
1692 Poster CMD: A Cross Mechanism Domain Adaptation Dataset for 3D Object Detection
1693 Poster Formula-Supervised Visual-Geometric Pre-training
1694 Poster Canonical Shape Projection is All You Need for 3D Few-shot Class Incremental Learning
1695 Poster Raising the Ceiling: Conflict-Free Local Feature Matching with Dynamic View Switching
1696 Poster DGD: Dynamic 3D Gaussians Distillation
1697 Poster SHIC: Shape-Image Correspondences with no Keypoint Supervision
1698 Poster LineFit: A Geometric Approach for Fitting Line Segments in Images
1699 Poster Global Structure-from-Motion Revisited
1700 Poster Robust Fitting on a Gate Quantum Computer
1701 Oral Robust Fitting on a Gate Quantum Computer
1702 Poster The Nerfect Match: Exploring NeRF Features for Visual Localization
1703 Poster A Cephalometric Landmark Regression Method based on Dual-encoder for High-resolution X-ray Image
1704 Poster FoundPose: Unseen Object Pose Estimation with Foundation Features
1705 Poster PoseSOR: Human Pose Can Guide Our Attention
1706 Poster A Graph-Based Approach for Category-Agnostic Pose Estimation
1707 Poster 3DSA: Multi-View 3D Human Pose Estimation With 3D Space Attention Mechanisms
1708 Poster HPE-Li: WiFi-enabled Lightweight Dual Selective Kernel Convolution for Human Pose Estimation
1709 Poster HandDAGT: A Denoising Adaptive Graph Transformer for 3D Hand Pose Estimation
1710 Poster WHAC: World-grounded Humans and Cameras
1711 Poster EgoBody3M: Egocentric Body Tracking on a VR Headset using a Diverse Dataset
1712 Poster 3D Human Pose Estimation via Non-Causal Retentive Networks
1713 Poster Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation
1714 Poster Rawformer: Unpaired Raw-to-Raw Translation for Learnable Camera ISPs
1715 Poster R3DS: Reality-linked 3D Scenes for Panoramic Scene Understanding
1716 Poster Mono-ViFI: A Unified Learning Framework for Self-supervised Single- and Multi-frame Monocular Depth Estimation
1717 Poster FutureDepth: Learning to Predict the Future Improves Video Depth Estimation
1718 Poster Möbius Transform for Mitigating Perspective Distortions in Representation Learning
1719 Poster UL-VIO: Ultra-lightweight Visual-Inertial Odometry with Noise Robust Test-time Adaptation
1720 Poster DualBEV: Unifying Dual View Transformation with Probabilistic Correspondences
1721 Poster HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras
1722 Poster SimPB: A Single Model for 2D and 3D Object Detection from Multiple Cameras
1723 Poster Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance
1724 Poster Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection
1725 Poster LiDAR-based All-weather 3D Object Detection via Prompting and Distilling 4D Radar
1726 Poster SAMFusion: Sensor-Adaptive Multimodal Fusion for 3D Object Detection in Adverse Weather
1727 Poster Align before Collaborate: Mitigating Feature Misalignment for Robust Multi-Agent Perception
1728 Oral Align before Collaborate: Mitigating Feature Misalignment for Robust Multi-Agent Perception
1729 Poster SkyScenes: A Synthetic Dataset for Aerial Scene Understanding
1730 Poster DrivingDiffusion: Layout-Guided Multi-View Driving Scenarios Video Generation with Latent Diffusion Model
1731 Poster UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction
1732 Poster VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving
1733 Poster OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving
1734 Poster Stream Query Denoising for Vectorized HD-Map Construction
1735 Poster Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention
1736 Poster Early Anticipation of Driving Maneuvers
1737 Poster Adaptive Human Trajectory Prediction via Latent Corridors
1738 Poster Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model
1739 Poster Probabilistic Weather Forecasting with Deterministic Guidance-based Diffusion Model
1740 Poster Motion-prior Contrast Maximization for Dense Continuous-Time Motion Estimation
1741 Poster Temporal Event Stereo via Joint Learning with Stereoscopic Flow
1742 Poster FARSE-CNN: Fully Asynchronous, Recurrent and Sparse Event-Based CNN
1743 Poster Event-Adapted Video Super-Resolution
1744 Poster Diffusion Models as Optimizers for Efficient Planning in Offline RL
1745 Poster Scene-aware Human Motion Forecasting via Mutual Distance Prediction
1746 Poster CoMusion: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion
1747 Poster F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions
1748 Poster Bridging the Gap Between Human Motion and Action Semantics via Kinematics Phrases
1749 Poster CoMo: Controllable Motion Generation through Language Guided Pose Code Editing
1750 Poster Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation
1751 Poster Co-speech Gesture Video Generation with 3D Human Meshes
1752 Poster MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
1753 Poster MEVG: Multi-event Video Generation with Text-to-Video Models
1754 Poster HARIVO: Harnessing Text-to-Image Models for Video Generation
1755 Poster WAVE: Warping DDIM Inversion Features for Zero-shot Text-to-Video Editing
1756 Poster RegionDrag: Fast Region-Based Image Editing with Diffusion Models
1757 Poster TurboEdit: Real-time text-based disentangled real image editing
1758 Poster Factorized Diffusion: Perceptual Illusions by Noise Decomposition
1759 Poster DiffusionPen: Towards Controlling the Style of Handwritten Text Generation
1760 Poster ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
1761 Poster Scaling Up Personalized Image Aesthetic Assessment via Task Vector Customization
1762 Poster FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation
1763 Poster AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation
1764 Poster Training-free Composite Scene Generation for Layout-to-Image Synthesis
1765 Poster Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas
1766 Poster Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models
1767 Poster Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation
1768 Poster OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models
1769 Poster Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation
1770 Poster BK-SDM: A Lightweight, Fast, and Cheap Version of Stable Diffusion
1771 Poster Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models
1772 Poster MONTAGE: Monitoring Training for Attribution of Generative Diffusion Models
1773 Poster ProTIP: Probabilistic Robustness Verification on Text-to-Image Diffusion Models against Stochastic Perturbation
1774 Poster Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning
1775 Poster To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now
1776 Poster The Gaussian Discriminant Variational Autoencoder (GdVAE): A Self-Explainable Model with Counterfactual Explanations
1777 Poster Which Model Generated This Image? A Model-Agnostic Approach for Origin Attribution
1778 Poster DomainFusion: Generalizing To Unseen Domains with Latent Diffusion Models
1779 Poster AlignDiff: Aligning Diffusion Models for General Few-Shot Segmentation
1780 Oral AlignDiff: Aligning Diffusion Models for General Few-Shot Segmentation
1781 Poster Memory-Efficient Fine-Tuning for Quantized Diffusion Model
1782 Poster SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow
1783 Poster HiDiffusion: Unlocking Higher-Resolution Creativity and Efficiency in Pretrained Diffusion Models
1784 Poster EGIC: Enhanced Low-Bit-Rate Generative Image Compression Guided by Semantic Segmentation
1785 Poster Diffusion for Natural Image Matting
1786 Poster Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts
1787 Poster MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration
1788 Poster TTT-MIM: Test-Time Training with Masked Image Modeling for Denoising Distribution Shifts
1789 Poster Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration
1790 Poster Confidence-Based Iterative Generation for Real-World Image Super-Resolution
1791 Poster Efficient Frequency-Domain Image Deraining with Contrastive Regularization
1792 Poster Blind Image Deconvolution by Generative-based Kernel Prior and Initializer via Latent Encoding
1793 Poster SAFNet: Selective Alignment Fusion Network for Efficient HDR Imaging
1794 Poster Rethinking Image Super Resolution from Training Data Perspectives
1795 Poster Accelerating Image Super-Resolution Networks with Pixel-Level Classification
1796 Poster Overcoming Distribution Mismatch in Quantizing Image Super-Resolution Networks
1797 Poster Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model
1798 Poster Uncertainty-Driven Spectral Compressive Imaging with Spatial-Frequency Transformer
1799 Poster Adaptive Selection of Sampling-Reconstruction in Fourier Compressed Sensing
1800 Poster Test-time Model Adaptation for Image Reconstruction Using Self-supervised Adaptive Layers
1801 Poster RadEdit: stress-testing biomedical vision models via diffusion image editing
1802 Poster Rate-Distortion-Cognition Controllable Versatile Neural Image Compression
1803 Poster Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design
1804 Poster Fast Encoding and Decoding for Implicit Video Representation
1805 Poster Implicit Steganography Beyond the Constraints of Modality
1806 Poster Certifiably Robust Image Watermark
1807 Poster DSA: Discriminative Scatter Analysis for Early Smoke Segmentation
1808 Poster AdaIFL: Adaptive Image Forgery Localization via a Dynamic and Importance-aware Transformer Network
1809 Poster DiffFAS: Face Anti-Spoofing via Generative Diffusion Models
1810 Poster Face Reconstruction Transfer Attack as Out-of-Distribution Generalization
1811 Poster Toward Tiny and High-quality Facial Makeup with Data Amplify Learning
1812 Poster Facial Affective Behavior Analysis with Instruction Tuning
1813 Poster VideoClusterNet: Self-Supervised and Adaptive Face Clustering for Videos
1814 Poster When Do We Not Need Larger Vision Models?
1815 Poster Open Panoramic Segmentation
1816 Poster PapMOT: Exploring Adversarial Patch Attack against Multiple Object Tracking
1817 Poster Self-Supervised Any-Point Tracking by Contrastive Random Walks
1818 Poster WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing
1819 Poster Idempotent Unsupervised Representation Learning for Skeleton-Based Action Recognition
1820 Poster EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding
1821 Poster Trajectory-aligned Space-time Tokens for Few-shot Action Recognition
1822 Poster ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos
1823 Poster Discovering Novel Actions from Open World Egocentric Videos with Object-Grounded Visual Commonsense Reasoning
1824 Poster OMR: Occlusion-Aware Memory-Based Refinement for Video Lane Detection
1825 Poster Improving Video Segmentation via Dynamic Anchor Queries
1826 Poster VISAGE: Video Instance Segmentation with Appearance-Guided Enhancement
1827 Poster Merlin: Empowering Multimodal LLMs with Foresight Minds
1828 Poster STSP: Spatial-Temporal Subspace Projection for Video Class-incremental Learning
1829 Poster UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
1830 Poster Contextual Correspondence Matters: Bidirectional Graph Matching for Video Summarization
1831 Poster Weakly-Supervised Spatio-Temporal Video Grounding with Variational Cross-Modal Alignment
1832 Poster AMEGO: Active Memory from long EGOcentric videos
1833 Poster Rethinking Weakly-supervised Video Temporal Grounding From a Game Perspective
1834 Poster TimeCraft: Navigate Weakly-Supervised Temporal Grounded Video Question Answering via Bi-directional Reasoning
1835 Poster Delving Deep into Engagement Prediction of Short Videos
1836 Poster LITA: Language Instructed Temporal-Localization Assistant
1837 Poster CoLeaF: A Contrastive-Collaborative Learning Framework for Weakly Supervised Audio-Visual Video Parsing
1838 Poster Siamese Vision Transformers are Scalable Audio-visual Learners
1839 Poster EvSign: Sign Language Recognition and Translation with Streaming Events
1840 Poster WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding
1841 Poster Multi-Memory Matching for Unsupervised Visible-Infrared Person Re-Identification
1842 Poster Masked Angle-Aware Autoencoder for Remote Sensing Images
1843 Poster Revisit Anything: Visual Place Recognition via Image Segment Retrieval
1844 Poster Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL
1845 Poster Reinforcement Learning Friendly Vision-Language Model for Minecraft
1846 Poster DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control
1847 Poster See and Think: Embodied Agent in Virtual Environment
1848 Poster PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation
1849 Poster HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
1850 Poster Take A Step Back: Rethinking the Two Stages in Visual Reasoning
1851 Poster Multi-Task Domain Adaptation for Language Grounding with 3D Objects
1852 Poster MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
1853 Poster Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge
1854 Poster LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
1855 Poster How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
1856 Poster MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
1857 Poster Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory
1858 Poster Object-Oriented Anchoring and Modal Alignment in Multimodal Learning
1859 Poster An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding
1860 Poster Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models
1861 Poster Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks
1862 Poster UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding
1863 Poster ReGround: Improving Textual and Spatial Grounding at No Cost
1864 Poster Platypus: A Generalized Specialist Model for Reading Text in Various Forms
1865 Poster Long-CLIP: Unlocking the Long-Text Capability of CLIP
1866 Poster Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
1867 Poster RAVE: Residual Vector Embedding for CLIP-Guided Backlit Image Enhancement
1868 Poster Tokenize Anything via Prompting
1869 Poster FuseTeacher: Modality-fused Encoders are Strong Vision Supervisors
1870 Poster De-confounded Gaze Estimation
1871 Poster GalLop: Learning global and local prompts for vision-language models
1872 Poster OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection
1873 Poster CoLA: Conditional Dropout and Language-driven Robust Dual-modal Salient Object Detection
1874 Poster Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models
1875 Poster Can OOD Object Detectors Learn from Foundation Models?
1876 Poster VEON: Vocabulary-Enhanced Occupancy Prediction
1877 Poster Efficient Vision Transformers with Partial Attention
1878 Poster SAFARI: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation
1879 Poster ReMamber: Referring Image Segmentation with Mamba Twister
1880 Poster Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling
1881 Poster A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties
1882 Poster Enriching Information and Preserving Semantic Congruence in Expanding Curvilinear Object Segmentation Datasets
1883 Poster Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation
1884 Poster View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields
1885 Poster Pro2SAM: Mask Prompt to SAM with Grid Points for Weakly Supervised Object Localization
1886 Poster Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation
1887 Poster PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
1888 Poster Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels
1889 Poster OpenDistill3D: Open-World 3D Instance Segmentation with Unified Self-Distillation for Continual Learning and Unknown Class Discovery
1890 Poster Weakly Supervised Co-training with Swapping Assignments for Semantic Segmentation
1891 Poster Beyond Pixels: Semi-Supervised Semantic Segmentation with a Multi-scale Patch-based Multi-Label Classifier
1892 Poster Bayesian Self-Training for Semi-Supervised 3D Segmentation
1893 Poster Localization and Expansion: A Decoupled Framework for Point Cloud Few-shot Semantic Segmentation
1894 Poster CSOT: Cross-Scan Object Transfer for Semi-Supervised LiDAR Object Detection
1895 Poster Interactive 3D Object Detection with Prompts
1896 Poster SAM-COD: SAM-guided Unified Framework for Weakly-Supervised Camouflaged Object Detection
1897 Poster Preventing Catastrophic Forgetting through Memory Networks in Continuous Detection
1898 Poster Benchmarking Object Detectors with COCO: A New Path Forward
1899 Poster Frequency-Spatial Entanglement Learning for Camouflaged Object Detection
1900 Poster GRA: Detecting Oriented Objects through Group-wise Rotating and Attention
1901 Poster DQ-DETR: DETR with Dynamic Query for Tiny Object Detection
1902 Poster AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval
1903 Poster Alternate Diverse Teaching for Semi-supervised Medical Image Segmentation
1904 Poster Unleashing the Power of Prompt-driven Nucleus Instance Segmentation
1905 Poster cDP-MIL: Robust Multiple Instance Learning via Cascaded Dirichlet Process
1906 Poster Pathology-knowledge Enhanced Multi-instance Prompt Learning for Few-shot Whole Slide Image Classification
1907 Poster Learning with Counterfactual Explanations for Radiology Report Generation
1908 Poster Improving Medical Multi-modal Contrastive Learning with Expert Annotations
1909 Poster Few-shot Defect Image Generation based on Consistency Modeling
1910 Poster Placing Objects in Context via Inpainting for Out-of-distribution Segmentation
1911 Poster Learning Diffusion Models for Multi-View Anomaly Detection
1912 Poster Learning Unified Reference Representation for Unsupervised Multi-class Anomaly Detection
1913 Poster Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models
1914 Poster Enhancing Optimization Robustness in 1-bit Neural Networks through Stochastic Sign Descent
1915 Poster Salience-Based Adaptive Masking: Revisiting Token Dynamics for Enhanced Pre-training
1916 Poster SNP: Structured Neuron-level Pruning to Preserve Attention Scores
1917 Poster Tiny Models are the Computational Saver for Large Models
1918 Poster Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning
1919 Poster Trainable Highly-expressive Activation Functions
1920 Poster HPFF: Hierarchical Locally Supervised Learning with Patch Feature Fusion
1921 Poster To Supervise or Not to Supervise: Understanding and Addressing the Key Challenges of Point Cloud Transfer Learning
1922 Poster SeA: Semantic Adversarial Augmentation for Last Layer Features from Unsupervised Representation Learning
1923 Poster Linearly Controllable GAN: Unsupervised Feature Categorization and Decomposition for Image Generation and Manipulation
1924 Poster Diagnosing and Re-learning for Balanced Multimodal Learning
1925 Poster Visual Prompting via Partial Optimal Transport
1926 Poster Pseudo-Labelling Should Be Aware of Disguising Channel Activations
1927 Poster Efficient and Versatile Robust Fine-Tuning of Zero-shot Models
1928 Poster Unsupervised Representation Learning by Balanced Self Attention Matching
1929 Poster Optimal Transport of Diverse Unsupervised Tasks for Robust Learning from Noisy Few-Shot Data
1930 Poster Gradient-based Out-of-Distribution Detection
1931 Poster SLIM: Spuriousness Mitigation with Minimal Human Annotations
1932 Poster Modeling Label Correlations with Latent Context for Multi-Label Recognition
1933 Poster Rebalancing Using Estimated Class Distribution for Imbalanced Semi-Supervised Learning under Class Distribution Mismatch
1934 Poster Foster Adaptivity and Balance in Learning with Noisy Labels
1935 Poster Self-Guided Generation of Minority Samples Using Diffusion Models
1936 Poster Self-Cooperation Knowledge Distillation for Novel Class Discovery
1937 Poster Non-Exemplar Domain Incremental Learning via Cross-Domain Concept Integration
1938 Poster Distribution Alignment for Fully Test-Time Adaptation with Dynamic Online Data Streams
1939 Poster Few-shot Class Incremental Learning with Attention-Aware Self-Adaptive Prompt
1940 Poster Exemplar-free Continual Representation Learning via Learnable Drift Compensation
1941 Poster Open-World Dynamic Prompt and Continual Visual Representation Learning
1942 Poster Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks
1943 Poster Simple Unsupervised Knowledge Distillation With Space Similarity
1944 Poster AdaDistill: Adaptive Knowledge Distillation for Deep Face Recognition
1945 Poster Dataset Growth
1946 Poster Leveraging Hierarchical Feature Sharing for Efficient Dataset Condensation
1947 Poster MO-EMT-NAS: Multi-Objective Continuous Transfer of Architectural Knowledge Between Tasks from Different Datasets
1948 Poster BAFFLE: A Baseline of Backpropagation-Free Federated Learning
1949 Poster On the Evaluation Consistency of Attribution-based Explanations
1950 Poster Debiasing surgeon: fantastic weights and how to find them
1951 Poster Auto-GAS: Automated Proxy Discovery for Training-free Generative Architecture Search
1952 Poster Improving Adversarial Transferability via Model Alignment
1953 Poster Learning Differentially Private Diffusion Models via Stochastic Adversarial Distillation
1954 Poster Improving Robustness to Model Inversion Attacks via Sparse Coding Architectures
1955 Poster CipherDM: Secure Three-Party Inference for Diffusion Model Sampling
1956 Poster UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening
1957 Poster Exact Diffusion Inversion via Bidirectional Integration Approximation
1958 Oral Exact Diffusion Inversion via Bidirectional Integration Approximation
1959 Poster ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction
1960 Oral ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction
1961 Poster Tackling Structural Hallucination in Image Translation with Local Diffusion
1962 Oral Tackling Structural Hallucination in Image Translation with Local Diffusion
1963 Poster Adversarial Diffusion Distillation
1964 Oral Adversarial Diffusion Distillation
1965 Poster Pyramid Diffusion for Fine 3D Large Scene Generation
1966 Oral Pyramid Diffusion for Fine 3D Large Scene Generation
1967 Poster Controlling the World by Sleight of Hand
1968 Oral Controlling the World by Sleight of Hand
1969 Poster Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning
1970 Oral Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning
1971 Poster OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model
1972 Oral OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model
1973 Poster MAGR: Manifold-Aligned Graph Regularization for Continual Action Quality Assessment
1974 Oral MAGR: Manifold-Aligned Graph Regularization for Continual Action Quality Assessment
1975 Poster C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
1976 Oral C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
1977 Poster Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos
1978 Oral Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos
1979 Poster Towards Neuro-Symbolic Video Understanding
1980 Oral Towards Neuro-Symbolic Video Understanding
1981 Poster DEVIAS: Learning Disentangled Video Representations of Action and Scene
1982 Oral DEVIAS: Learning Disentangled Video Representations of Action and Scene
1983 Poster Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets
1984 Oral Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets
1985 Poster E3M: Zero-Shot Spatio-Temporal Video Grounding with Expectation-Maximization Multimodal Modulation
1986 Oral E3M: Zero-Shot Spatio-Temporal Video Grounding with Expectation-Maximization Multimodal Modulation
1987 Poster Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos
1988 Oral Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos
1989 Poster LongVLM: Efficient Long Video Understanding via Large Language Models
1990 Oral LongVLM: Efficient Long Video Understanding via Large Language Models
1991 Poster Made to Order: Discovering monotonic temporal changes via self-supervised video ordering
1992 Oral Made to Order: Discovering monotonic temporal changes via self-supervised video ordering
1993 Poster Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
1994 Oral Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
1995 Poster A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars
1996 Oral A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars
1997 Poster Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models
1998 Oral Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models
1999 Poster Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation
2000 Oral Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation
2001 Poster BRAVE: Broadening the visual encoding of vision-language models
2002 Oral BRAVE: Broadening the visual encoding of vision-language models
2003 Poster MMBENCH: Is Your Multi-Modal Model an All-around Player?
2004 Oral MMBENCH: Is Your Multi-Modal Model an All-around Player?
2005 Poster uCAP: An Unsupervised Prompting Method for Vision-Language Models
2006 Oral uCAP: An Unsupervised Prompting Method for Vision-Language Models
2007 Poster HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts
2008 Oral HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts
2009 Poster An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
2010 Oral An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
2011 Poster GiT: Towards Generalist Vision Transformer through Universal Language Interface
2012 Oral GiT: Towards Generalist Vision Transformer through Universal Language Interface
2013 Poster Omniview-Tuning: Boosting Viewpoint Invariance of Vision-Language Pre-training Models
2014 Oral Omniview-Tuning: Boosting Viewpoint Invariance of Vision-Language Pre-training Models
2015 Poster Head360: Learning a Parametric 3D Full-Head for Free-View Synthesis in 360°
2016 Poster Tri²-plane: Thinking Head Avatar via Feature Pyramid
2017 Poster AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos
2018 Poster AnimateMe: 4D Facial Expressions via Diffusion Models
2019 Poster Real-data-driven 2000 FPS Color Video from Mosaicked Chromatic Spikes
2020 Poster Joint RGB-Spectral Decomposition Model Guided Image Enhancement in Mobile Photography
2021 Poster Flash-Splat: 3D Reflection Removal with Flash Cues and Gaussian Splats
2022 Poster Self-Supervised Underwater Caustics Removal and Descattering via Deep Monocular SLAM
2023 Poster Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis
2024 Poster Neural Poisson Solver: A Universal and Continuous Framework for Natural Signal Blending
2025 Poster UniVoxel: Fast Inverse Rendering by Unified Voxelization of Scene Representation
2026 Poster City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web
2027 Poster Few-shot NeRF by Adaptive Rendering Loss Regularization
2028 Poster BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting
2029 Poster Generalizable Human Gaussians for Sparse View Synthesis
2030 Poster Invertible Neural Warp for NeRF
2031 Poster PISR: Polarimetric Neural Implicit Surface Reconstruction for Textureless and Specular Objects
2032 Poster Improving Neural Surface Reconstruction with Feature Priors from Multi-View Images
2033 Poster SG-NeRF: Neural Surface Reconstruction with Scene Graph Optimization
2034 Poster Gaussian in the wild: 3D Gaussian Splatting for Unconstrained Image Collections
2035 Poster 3iGS: Factorised Tensorial Illumination for 3D Gaussian Splatting
2036 Poster HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban Scenes
2037 Poster GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering
2038 Poster EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS
2039 Poster End-to-End Rate-Distortion Optimized 3D Gaussian Representation
2040 Poster DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting
2041 Poster Human Hair Reconstruction with Strand-Aligned 3D Gaussians
2042 Poster Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting
2043 Poster Cascade-Zero123: One Image to Highly Consistent 3D with Self-Prompted Nearby Views
2044 Poster SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer
2045 Poster MVDiffHD: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction
2046 Poster DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting
2047 Poster CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model
2048 Poster Sketch2Vox: Learning 3D Reconstruction from a Single Monocular Sketch Image
2049 Poster Lagrangian Hashing for Compressed Neural Field Representations
2050 Poster GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing
2051 Poster Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts
2052 Poster TetraDiffusion: Tetrahedral Diffusion Models for 3D Shape Generation
2053 Poster TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling
2054 Poster Learn to Optimize Denoising Scores: A Unified and Improved Diffusion Prior for 3D Generation
2055 Poster LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis
2056 Poster Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation
2057 Poster Synthesizing Environment-Specific People in Photographs
2058 Poster Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models
2059 Poster Shapefusion: 3D localized human diffusion models
2060 Poster Fast Sprite Decomposition from Animated Graphics
2061 Poster Hierarchical Conditioning of Diffusion Models Using Tree-of-Life for Studying Species Evolution
2062 Poster WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation
2063 Poster Dolfin: Diffusion Layout Transformers without Autoencoder
2064 Poster MSD: A Benchmark Dataset for Floor Plan Generation of Building Complexes
2065 Poster RoofDiffusion: Constructing Roofs from Severely Corrupted Point Data via Diffusion
2066 Poster Implicit Filtering for Learning Neural Signed Distance Functions from 3D Point Clouds
2067 Poster FastPCI: Motion-Structure Guided Fast Point Cloud Frame Interpolation
2068 Poster T-CorresNet: Template Guided 3D Point Cloud Completion with Correspondence Pooling Query Generation Strategy
2069 Poster SEED: A Simple and Effective 3D DETR in Point Clouds
2070 Poster ProtoComp: Diverse Point Cloud Completion with Controllable Prototype
2071 Poster CloudFixer: Test-Time Adaptation for 3D Point Clouds via Diffusion-Guided Geometric Transformation
2072 Poster Learning Local Pattern Modularization for Point Cloud Reconstruction from Unseen Classes
2073 Poster Rethinking LiDAR Domain Generalization: Single Source as Multiple Density Domains
2074 Poster Multi-modal Relation Distillation for Unified 3D Representation Learning
2075 Poster NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields
2076 Poster Single-Photon 3D Imaging with Equi-Depth Photon Histograms
2077 Poster Power Variable Projection for Initialization-Free Large-Scale Bundle Adjustment
2078 Poster SelfGeo: Self-supervised and Geodesic-consistent Estimation of Keypoints on Deformable Shapes
2079 Poster Leveraging scale- and orientation-covariant features for planar motion estimation
2080 Poster Learn to Memorize and to Forget: A Continual Learning Perspective of Dynamic SLAM
2081 Poster Bones Can't Be Triangles: Accurate and Efficient Vertebrae Keypoint Estimation through Collaborative Error Revision
2082 Poster TreeSBA: Tree-Transformer for Self-Supervised Sequential Brick Assembly
2083 Poster SUP-NeRF: A Streamlined Unification of Pose Estimation and NeRF for Monocular 3D Object Reconstruction
2084 Poster VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space
2085 Poster Human Pose Recognition via Occlusion-Preserving Abstract Images
2086 Poster RT-Pose: A 4D Radar-Tensor based 3D Human Pose Estimation and Localization Benchmark
2087 Poster 6DoF Head Pose Estimation through Explicit Bidirectional Interaction with Face Geometry
2088 Poster HandDGP: Camera-Space Hand Mesh Prediction with Differentiable Global Positioning
2089 Poster On the Utility of 3D Hand Poses for Action Recognition
2090 Poster Multi-Person Pose Forecasting with Individual Interaction Perceptron and Prior Learning
2091 Poster ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
2092 Poster Revisit Self-supervision with Local Structure-from-Motion
2093 Poster AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation
2094 Poster High-Precision Self-Supervised Monocular Depth Estimation with Rich-Resource Prior
2095 Poster Weakly-supervised Camera Localization by Ground-to-satellite Image Registration
2096 Poster Benchmarking the Robustness of Cross-view Geo-localization Models
2097 Poster Improving Point-based Crowd Counting and Localization Based on Auxiliary Point Guidance
2098 Poster Learning High-resolution Vector Representation from Multi-Camera Images for 3D Object Detection
2099 Poster GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection
2100 Poster Boosting 3D Single Object Tracking with 2D Matching Distillation and 3D Pre-training
2101 Poster LEROjD: Lidar Extended Radar-Only Object Detection
2102 Poster Towards Stable 3D Object Detection
2103 Poster ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers
2104 Poster EgoPet: Egomotion and Interaction Data from an Animal's Perspective
2105 Poster WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation
2106 Poster Beyond the Data Imbalance: Employing the Heterogeneous Datasets for Vehicle Maneuver Prediction
2107 Poster GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
2108 Poster ADMap: Anti-disturbance Framework for Vectorized HD Map Construction
2109 Poster Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction
2110 Poster CarFormer: Self-Driving with Learned Object-Centric Representations
2111 Poster DySeT: a Dynamic Masked Self-distillation Approach for Robust Trajectory Prediction
2112 Poster NeuroNCAP: Photorealistic Closed-loop Safety Testing for Autonomous Driving
2113 Poster Visual Relationship Transformation
2114 Poster Local All-Pair Correspondence for Point Tracking
2115 Poster Un-EVIMO: Unsupervised Event-based Independent Motion Segmentation
2116 Poster Edge-Guided Fusion and Motion Augmentation for Event-Image Stereo
2117 Poster Physical-Based Event Camera Simulator
2118 Poster REDIR: Refocus-free Event-based De-occlusion Image Reconstruction
2119 Poster Exploiting Dual-Correlation for Multi-frame Time-of-Flight Denoising
2120 Poster Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation
2121 Poster DragAPart: Learning a Part-Level Motion Prior for Articulated Objects
2122 Poster Learning Semantic Latent Directions for Accurate and Controllable Human Motion Prediction
2123 Poster HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects
2124 Poster ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions
2125 Poster Chronologically Accurate Retrieval for Temporal Grounding of Motion-Language Models
2126 Poster MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model
2127 Poster Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos
2128 Poster Self-Supervised Audio-Visual Soundscape Stylization
2129 Poster TC4D: Trajectory-Conditioned Text-to-4D Generation
2130 Poster LivePhoto: Real Image Animation with Text-guided Motion Control
2131 Poster Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models
2132 Poster Photorealistic Video Generation with Diffusion Models
2133 Poster High-Fidelity and Transferable NeRF Editing by Frequency Decomposition
2134 Poster Diffusion-Based Image-to-Image Translation by Noise Correction via Prompt Interpolation
2135 Poster Editable Image Elements for Controllable Synthesis
2136 Poster Implicit Style-Content Separation using B-LoRA
2137 Poster Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression
2138 Poster EraseDraw: Learning to Insert Objects by Erasing Them from Images
2139 Poster Text2Place: Affordance-aware Text Guided Human Placement
2140 Poster ProCreate, Don't Reproduce! Propulsive Energy Diffusion for Creative Generation
2141 Poster Label-free Neural Semantic Image Synthesis
2142 Poster Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
2143 Poster CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
2144 Poster Context Diffusion: In-Context Aware Image Generation
2145 Poster An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation
2146 Poster Stable Preference: Redefining training paradigm of human preference model for Text-to-Image Synthesis
2147 Poster SpeedUpNet: A Plug-and-Play Adapter Network for Accelerating Text-to-Image Diffusion Models
2148 Poster Large-scale Reinforcement Learning for Diffusion Models
2149 Poster Latent Guard: a Safety Framework for Text-to-image Generation
2150 Poster Arc2Face: A Foundation Model for ID-Consistent Human Faces
2151 Oral Arc2Face: A Foundation Model for ID-Consistent Human Faces
2152 Poster GAMMA-FACE: GAussian Mixture Models Amend Diffusion Models for Bias Mitigation in Face Images
2153 Poster Closed-Loop Unsupervised Representation Disentanglement with β-VAE Distillation and Diffusion Probabilistic Feedback
2154 Poster Revisiting Feature Disentanglement Strategy in Diffusion Training and Breaking Conditional Independence Assumption in Sampling
2155 Poster ByteEdit: Boost, Comply and Accelerate Generative Image Editing
2156 Poster DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation
2157 Poster Few-Shot Image Generation by Conditional Relaxing Diffusion Inversion
2158 Poster Rejection Sampling IMLE: Designing Priors for Better Few-Shot Image Synthesis
2159 Poster FMBoost: Boosting Latent Diffusion with Flow Matching
2160 Oral FMBoost: Boosting Latent Diffusion with Flow Matching
2161 Poster AdaDiff: Accelerating Diffusion Models through Step-Wise Adaptive Computation
2162 Poster Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
2163 Poster L-DiffER: Single Image Reflection Removal with Language-based Diffusion Model
2164 Poster LMT-GP: Combined Latent Mean-Teacher and Gaussian Process for Semi-supervised Low-light Image Enhancement
2165 Poster Depth-Aware Blind Image Decomposition for Real-World Adverse Weather Recovery
2166 Poster Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal
2167 Poster XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution
2168 Poster AdaDiffSR: Adaptive Region-aware Dynamic acceleration Diffusion Model for Real-World Image Super-Resolution
2169 Poster Seeing the Unseen: A Frequency Prompt Guided Transformer for Image Restoration
2170 Poster Rethinking Video Deblurring with Wavelet-Aware Dynamic Transformer and Diffusion Model
2171 Poster BurstM: Deep Burst Multi-scale SR using Fourier Space with Optical Flow
2172 Poster DualDn: Dual-domain Denoising via Differentiable ISP
2173 Poster Hierarchical Separable Video Transformer for Snapshot Compressive Imaging
2174 Poster Image Compression for Machine and Human Vision With Spatial-Frequency Adaptation
2175 Poster Functional Transform-Based Low-Rank Tensor Factorization for Multi-Dimensional Data Recovery
2176 Poster Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems
2177 Oral Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems
2178 Poster Imaging with Confidence: Uncertainty Quantification for High-dimensional Undersampled MR Images
2179 Poster Energy-induced Explicit quantification for Multi-modality MRI fusion
2180 Poster WeConvene: Learned Image Compression with Wavelet-Domain Convolution and Entropy Model
2181 Poster Aligning Neuronal Coding of Dynamic Visual Scenes with Foundation Vision Models
2182 Poster GeometrySticker: Enabling Ownership Claim of Recolorized Neural Radiance Fields
2183 Poster Rethinking Tree-Ring Watermarking for Enhanced Multi-Key Identification
2184 Poster Enhancing Tampered Text Detection through Frequency Feature Fusion and Decomposition
2185 Poster T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models
2186 Poster Towards Unified Representation of Invariant-Specific Features in Missing Modality Face Anti-Spoofing
2187 Poster Personalized Privacy Protection Mask Against Unauthorized Facial Recognition
2188 Poster GRAPE: Generalizable and Robust Multi-view Facial Capture
2189 Poster Seeing Faces in Things: A Model and Dataset for Pareidolia
2190 Poster Beyond Viewpoint: Robust 3D Object Recognition under Arbitrary Views through Joint Multi-Part Representation
2191 Poster An Optimal Control View of LoRA and Binary Controller Design for Vision Transformers
2192 Poster OneTrack: Demystifying the Conflict Between Detection and Tracking in End-to-End 3D Trackers
2193 Poster DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video
2194 Poster Upper-body Hierarchical Graph for Skeleton Based Emotion Recognition in Assistive Driving
2195 Poster SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders
2196 Poster Context-Aware Action Recognition: Introducing a Comprehensive Dataset for Behavior Contrast
2197 Poster Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition
2198 Poster Semi-Supervised Teacher-Reference-Student Architecture for Action Quality Assessment
2199 Poster Classification Matters: Improving Video Action Detection with Class-Specific Attention
2200 Oral Classification Matters: Improving Video Action Detection with Class-Specific Attention
2201 Poster HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization
2202 Poster Appearance-based Refinement for Object-Centric Motion Segmentation
2203 Poster Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation
2204 Poster Fine-grained Dynamic Network for Generic Event Boundary Detection
2205 Poster Data Collection-free Masked Video Modeling
2206 Poster Self-supervised visual learning from interactions with objects
2207 Poster Efficient Few-Shot Action Recognition via Multi-Level Post-Reasoning
2208 Poster Sequential Representation Learning via Static-Dynamic Conditional Disentanglement
2209 Poster Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised Video Semantic Compression
2210 Poster EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval
2211 Poster Video Question Answering with Procedural Programs
2212 Poster ViLA: Efficient Video-Language Alignment for Video Question Answering
2213 Poster ST-LLM: Large Language Models Are Effective Temporal Learners
2214 Poster RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos
2215 Poster Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations
2216 Poster Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes
2217 Poster Nonverbal Interaction Detection
2218 Poster PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer
2219 Poster Human-in-the-Loop Visual Re-ID for Population Size Estimation
2220 Poster PreLAR: World Model Pre-training with Learnable Action Representation
2221 Poster Learning to Build by Building Your Own Instructions
2222 Poster Situated Instruction Following
2223 Poster Where am I? Scene Retrieval with Language
2224 Poster ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
2225 Poster WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language
2226 Poster SegPoint: Segment Any Point Cloud via Large Language Model
2227 Poster Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions
2228 Poster GRACE: Graph-Based Contextual Debiasing for Fair Visual Question Answering
2229 Poster LLaVA-UHD: an LMM Perceiving any Aspect Ratio and High-Resolution Images
2230 Poster BLINK: Multimodal Large Language Models Can See but Not Perceive
2231 Poster Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
2232 Poster Teach CLIP to Develop a Number Sense for Ordinal Regression
2233 Poster Common Sense Reasoning for Deep Fake Detection
2234 Poster Efficient Inference of Vision Instruction-Following Models with Elastic Cache
2235 Poster SDPT: Synchronous Dual Prompt Tuning for Fusion-based Visual-Language Pre-trained Models
2236 Poster Improving Vision and Language Concepts Understanding with Multimodal Counterfactual Samples
2237 Poster Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models
2238 Poster CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs
2239 Poster Evaluating Text-to-Visual Generation with Image-to-Text Generation
2240 Poster DOCCI: Descriptions of Connected and Contrasting Images
2241 Poster Removing Distributional Discrepancies in Captions Improves Image-Text Alignment
2242 Poster LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model
2243 Poster Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning
2244 Poster DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism
2245 Poster Conceptual Codebook Learning for Vision-Language Models
2246 Poster Do Generalised Classifiers really work on Human Drawn Sketches?
2247 Poster 3DGazeNet: Generalizing Gaze Estimation with Weak Supervision from Synthetic Views
2248 Poster Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs
2249 Poster PLOT: Text-based Person Search with Part Slot Attention for Corresponding Part Discovery
2250 Poster Discovering Unwritten Visual Classifiers with Large Language Models
2251 Poster DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
2252 Poster LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction
2253 Poster Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction
2254 Poster OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation
2255 Poster Rotary Position Embedding for Vision Transformer
2256 Poster Multi-branch Collaborative Learning Network for 3D Visual Grounding
2257 Poster SILC: Improving Vision Language Pretraining with Self-Distillation
2258 Poster LiteSAM is Actually What You Need for Segment Everything
2259 Poster TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias
2260 Poster In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
2261 Poster CoPT: Unsupervised Domain Adaptive Segmentation using Domain-Agnostic Text Embeddings
2262 Poster SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation
2263 Poster Click Prompt Learning with Optimal Transport for Interactive Segmentation
2264 Poster 3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation
2265 Poster Segment and Recognize Anything at Any Granularity
2266 Poster SOS: Segment Object System for Open-World Instance Segmentation With Object Priors
2267 Poster Active Coarse-to-Fine Segmentation of Moveable Parts from Real Images
2268 Poster Phase Concentration and Shortcut Suppression for Weakly Supervised Semantic Segmentation
2269 Poster AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation
2270 Poster Weighting Pseudo-Labels via High-Activation Feature Index Similarity and Object Detection for Semi-Supervised Segmentation
2271 Poster SAM-guided Graph Cut for 3D Instance Segmentation
2272 Poster Subspace Prototype Guidance for Mitigating Class Imbalance in Point Cloud Semantic Segmentation
2273 Poster Diff3DETR: Agent-based Diffusion Model for Semi-supervised 3D Object Detection
2274 Poster Shifted Autoencoders for Point Annotation Restoration in Object Counting
2275 Poster Learning Camouflaged Object Detection from Noisy Pseudo Label
2276 Poster Just a Hint: Point-Supervised Camouflaged Object Detection
2277 Poster Rectify the Regression Bias in Long-Tailed Object Detection
2278 Poster PartImageNet++ Dataset: Scaling up Part-based Models for Robust Recognition
2279 Poster Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning
2280 Poster Visible and Clear: Finding Tiny Objects in Difference Map
2281 Poster IRGen: Generative Modeling for Image Retrieval
2282 Poster I-MedSAM: Implicit Medical Image Segmentation with Segment Anything
2283 Poster Style-Extracting Diffusion Models for Semi-Supervised Histopathology Segmentation
2284 Poster Norma: A Noise Robust Memory-Augmented Framework for Whole Slide Image Classification
2285 Poster GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes
2286 Poster BugNIST - a Large Volumetric Dataset for Detection under Domain Shift
2287 Poster AD3: Introducing a score for Anomaly Detection Dataset Difficulty assessment using VIADUCT dataset
2288 Poster GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection
2289 Poster Unsupervised, Online and On-The-Fly Anomaly Detection For Non-Stationary Image Distributions
2290 Poster Cross-Domain Learning for Video Anomaly Detection with Limited Supervision
2291 Poster Attention Beats Linear for Fast Implicit Neural Representation Generation
2292 Poster OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks
2293 Poster ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders
2294 Poster AttnZero: Efficient Attention Discovery for Vision Transformers
2295 Poster Isomorphic Pruning for Vision Models
2296 Poster DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs
2297 Poster Robustness Tokens: Towards Adversarial Robustness of Transformers
2298 Poster Contribution-based Low-Rank Adaptation with Pre-training Model for Real Image Restoration
2299 Poster Neural Spectral Decomposition for Dataset Distillation
2300 Poster Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models
2301 Poster Adaptive Multi-head Contrastive Learning
2302 Poster Unsqueeze [CLS] Bottleneck to Learn Rich Representations
2303 Poster Improving Zero-Shot Generalization for CLIP with Variational Adapter
2304 Poster Learning to Obstruct Few-Shot Image Classification over Restricted Classes
2305 Poster Improving Hyperbolic Representations via Gromov-Wasserstein Regularization
2306 Poster HyperSpaceX: Radial and Angular Exploration of HyperSpherical Dimensions
2307 Poster Regulating Model Reliance on Non-Robust Features by Smoothing Input Marginal Density
2308 Poster SCOD: From Heuristics to Theory
2309 Poster LNL+K: Enhancing Learning with Noisy Labels Through Noise Source Knowledge Integration
2310 Poster SCOMatch: Alleviating Overtrusting in Open-set Semi-supervised Learning
2311 Poster Labeled Data Selection for Category Discovery
2312 Poster PromptCCD: Learning Gaussian Mixture Prompt Pool for Continual Category Discovery
2313 Poster Towards Multimodal Open-Set Domain Generalization and Adaptation through Self-supervision
2314 Poster Forget More to Learn More: Domain-specific Feature Unlearning for Semi-supervised and Unsupervised Domain Adaptation
2315 Poster CLOSER: Towards Better Representation Learning for Few-Shot Class-Incremental Learning
2316 Poster Exploring Active Learning in Meta-Learning: Enhancing Context Set Labeling
2317 Poster MagMax: Leveraging Model Merging for Seamless Continual Learning
2318 Poster Pick-a-back: Selective Device-to-Device Knowledge Transfer in Federated Continual Learning
2319 Poster Learning to Unlearn for Robust Machine Unlearning
2320 Poster UNIC: Universal Classification Models via Multi-teacher Distillation
2321 Poster Distributed Active Client Selection With Noisy Clients Using Model Association Scores
2322 Poster Teddy: Efficient Large-Scale Dataset Distillation via Taylor-Approximated Matching
2323 Poster FedTSA: A Cluster-based Two-Stage Aggregation Method for Model-heterogeneous Federated Learning
2324 Poster Dynamic Guidance Adversarial Distillation with Enhanced Teacher Knowledge
2325 Poster Rethinking Fast Adversarial Training: A Splitting Technique To Overcome Catastrophic Overfitting
2326 Poster A high-quality robust diffusion framework for corrupted dataset
2327 Poster Similarity of Neural Architectures using Adversarial Attack Transferability
2328 Poster Not Just Change the Labels, Learn the Features: Watermarking Deep Neural Networks with Multi-View Data
2329 Poster Resilience of Entropy Model in Distributed Neural Networks
2330 Poster WBP: Training-time Backdoor Attacks through Hardware-based Weight Bit Poisoning
2331 Poster Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models
2332 Oral Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models
2333 Poster Flatness-aware Sequential Learning Generates Resilient Backdoors
2334 Oral Flatness-aware Sequential Learning Generates Resilient Backdoors
2335 Poster Learning a Dynamic Privacy-preserving Camera Robust to Inversion Attacks
2336 Oral Learning a Dynamic Privacy-preserving Camera Robust to Inversion Attacks
2337 Poster Adversarial Robustification via Text-to-Image Diffusion Models
2338 Oral Adversarial Robustification via Text-to-Image Diffusion Models
2339 Poster Privacy-Preserving Adaptive Re-Identification without Image Transfer
2340 Oral Privacy-Preserving Adaptive Re-Identification without Image Transfer
2341 Poster R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model
2342 Oral R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model
2343 Poster Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
2344 Oral Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
2345 Poster A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks
2346 Oral A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks
2347 Poster Spline-based Transformers
2348 Oral Spline-based Transformers
2349 Poster Anytime Continual Learning for Open Vocabulary Classification
2350 Oral Anytime Continual Learning for Open Vocabulary Classification
2351 Poster Weighted Ensemble Models Are Strong Continual Learners
2352 Oral Weighted Ensemble Models Are Strong Continual Learners
2353 Poster COD: Learning Conditional Invariant Representation for Domain Adaptation Regression
2354 Oral COD: Learning Conditional Invariant Representation for Domain Adaptation Regression
2355 Poster On the Topology Awareness and Generalization Performance of Graph Neural Networks
2356 Oral On the Topology Awareness and Generalization Performance of Graph Neural Networks
2357 Poster Echoes of the Past: Boosting Long-tail Recognition via Reflective Learning
2358 Oral Echoes of the Past: Boosting Long-tail Recognition via Reflective Learning
2359 Poster Model Stock: All we need is just a few fine-tuned models
2360 Oral Model Stock: All we need is just a few fine-tuned models
2361 Poster A Direct Approach to Viewing Graph Solvability
2362 Oral A Direct Approach to Viewing Graph Solvability
2363 Poster ControlNet-XS: Rethinking the Control of Text-to-Image Diffusion Models as Feedback-Control Systems
2364 Oral ControlNet-XS: Rethinking the Control of Text-to-Image Diffusion Models as Feedback-Control Systems
2365 Poster A Riemannian Approach for Spatiotemporal Analysis and Generation of 4D Tree-shaped Structures
2366 Oral A Riemannian Approach for Spatiotemporal Analysis and Generation of 4D Tree-shaped Structures
2367 Poster Flash Cache: Reducing Bias in Radiance Cache Based Inverse Rendering
2368 Oral Flash Cache: Reducing Bias in Radiance Cache Based Inverse Rendering
2369 Poster Shape from Heat Conduction
2370 Oral Shape from Heat Conduction
2371 Poster Rasterized Edge Gradients: Handling Discontinuities Differentially
2372 Oral Rasterized Edge Gradients: Handling Discontinuities Differentially
2373 Poster Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation
2374 Oral Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation
2375 Poster HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution
2376 Oral HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution
2377 Poster S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis
2378 Poster Loc3Diff: Local Diffusion for 3D Human Head Synthesis and Editing
2379 Poster PAV: Personalized Head Avatar from Unstructured Video Collection
2380 Poster Instant 3D Human Avatar Generation using Image Diffusion Models
2381 Poster Expressive Whole-Body 3D Gaussian Avatar
2382 Poster High-Quality Mesh Blendshape Generation from Face Videos via Neural Inverse Rendering
2383 Poster Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement
2384 Poster Image Demoireing in RAW and sRGB Domains
2385 Poster Multiscale Sliced Wasserstein Distances as Perceptual Color Difference Measures
2386 Poster Soft Shadow Diffusion (SSD): Physics-inspired Learning for 3D Computational Periscopy
2387 Poster Single-Mask Inpainting for Voxel-based Neural Radiance Fields
2388 Poster IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination
2389 Poster DPA-Net: Structured 3D Abstraction from Sparse Views via Differentiable Primitive Assembly
2390 Poster NGP-RT: Fusing Multi-Level Hash Features with Lightweight Attention for Real-Time Novel View Synthesis
2391 Poster CaesarNeRF: Calibrated Semantic Representation for Few-Shot Generalizable Neural Rendering
2392 Poster 2S-ODIS: Two-Stage Omni-Directional Image Synthesis by Geometric Distortion Correction
2393 Poster Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction
2394 Poster Deep Polarization Cues for Single-shot Shape and Subsurface Scattering Estimation
2395 Poster High-Resolution and Few-shot View Synthesis from Asymmetric Dual-lens Inputs
2396 Poster Surface-Centric Modeling for High-Fidelity Generalizable Neural Surface Reconstruction
2397 Poster MVPGS: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views
2398 Poster Dual-Camera Smooth Zoom on Mobile Phones
2399 Poster 6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model
2400 Poster SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM
2401 Poster Relightable 3D Gaussians: Realistic Point Cloud Relighting with BRDF Decomposition and Ray Tracing
2402 Poster Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians
2403 Poster CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization
2404 Poster Segmentation-guided Layer-wise Image Vectorization with Gradient Fills
2405 Poster EpipolarGAN: Omnidirectional Image Synthesis with Explicit Camera Control
2406 Poster SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views
2407 Poster GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation
2408 Poster GenRC: Generative 3D Room Completion from Sparse Image Collections
2409 Poster Freeview Sketching: View-Aware Fine-Grained Sketch-Based Image Retrieval
2410 Poster Convex Relaxations for Manifold-Valued Markov Random Fields with Approximation Guarantees
2411 Oral Convex Relaxations for Manifold-Valued Markov Random Fields with Approximation Guarantees
2412 Poster DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing
2413 Poster Language-Driven Physics-Based Scene Synthesis and Editing via Feature Splatting
2414 Poster GVGEN: Text-to-3D Generation with Volumetric Representation
2415 Poster VividDreamer: Invariant Score Distillation for Hyper-Realistic Text-to-3D Generation
2416 Poster DreamReward: Aligning Human Preference in Text-to-3D Generation
2417 Poster SemanticHuman-HD: High Resolution Semantic disentangled 3D Human Generation
2418 Poster Disentangled Clothed Avatar Generation from Text Descriptions
2419 Poster StructLDM: Structured Latent Diffusion for 3D Human Generation
2420 Poster High-Fidelity Modeling of Generalizable Wrinkle Deformation
2421 Poster ReLoo: Reconstructing Humans Dressed in Loose Garments from Monocular Video in the Wild
2422 Poster Hierarchically Structured Neural Bones for Reconstructing Animatable Objects from Casual Videos
2423 Poster Physics-Based Interaction with 3D Objects via Video Generation
2424 Oral Physics-Based Interaction with 3D Objects via Video Generation
2425 Poster Enhancing Plausibility Evaluation for Generated Designs with Denoising Autoencoder
2426 Poster Tree-D Fusion: Simulation-Ready Tree Dataset from Single Images with Diffusion Priors
2427 Poster Self-supervised Shape Completion via Involution and Implicit Correspondences
2428 Poster Self-Training Room Layout via Geometry-aware Ray-casting
2429 Poster DiffCD: A Symmetric Differentiable Chamfer Distance for Neural Implicit Surface Fitting
2430 Poster GaussReg: Fast 3D Registration with Gaussian Splatting
2431 Poster AEDNet: Adaptive Embedding and Multiview-Aware Disentanglement for Point Cloud Completion
2432 Poster PARE-Net: Position-Aware Rotation-Equivariant Networks for Robust Point Cloud Registration
2433 Poster ML-SemReg: Boosting Point Cloud Registration with Multi-level Semantic Consistency
2434 Poster DG-PIC: Domain Generalized Point-In-Context Learning for Point Cloud Understanding
2435 Poster ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention
2436 Poster SFPNet: Sparse Focal Point Network for Semantic Segmentation on General LiDAR Point Clouds
2437 Poster MAD-DR: Map Compression for Visual Localization with Matchness Aware Descriptor Dimension Reduction
2438 Poster Tensorial template matching for fast cross-correlation with rotations and its application for tomography
2439 Poster Flowed Time of Flight Radiance Fields
2440 Poster Zero-Shot Image Feature Consensus with Deep Functional Maps
2441 Poster RSL-BA: Rolling Shutter Line Bundle Adjustment
2442 Poster How Far Can a 1-Pixel Camera Go? Solving Vision Tasks using Photoreceptors and Computationally Designed Visual Morphology
2443 Poster StereoGlue: Joint Feature Matching and Robust Estimation
2444 Poster Hyperion – A fast, versatile symbolic Gaussian Belief Propagation framework for Continuous-Time SLAM
2445 Poster Learning Where to Look: Self-supervised Viewpoint Selection for Active Localization using Geometrical Information
2446 Poster MAP-ADAPT: Real-Time Quality-Adaptive Semantic 3D Maps
2447 Poster iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning
2448 Poster PACE: Pose Annotations in Cluttered Environments
2449 Poster Global-to-Pixel Regression for Human Mesh Recovery
2450 Poster 3D Hand Pose Estimation in Everyday Egocentric Images
2451 Poster Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects
2452 Poster AddBiomechanics Dataset: Capturing the Physics of Human Motion at Scale
2453 Poster Category-level Object Detection, Pose Estimation and Reconstruction from Stereo Images
2454 Poster Decomposed Vector-Quantized Variational Autoencoder for Human Grasp Generation
2455 Poster CliffPhys: Camera-based Respiratory Measurement using Clifford Neural Networks
2456 Poster Domain-Adaptive 2D Human Pose Estimation via Dual Teachers in Extremely Low-Light Conditions
2457 Poster DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation
2458 Poster Forecasting Future Videos from Novel Views via Disentangled 3D Scene Representation
2459 Poster Deep Patch Visual SLAM
2460 Poster ConGeo: Robust Cross-view Geo-localization across Ground View Variations
2461 Poster GAReT: Cross-view Video Geolocalization with Adapters and Auto-Regressive Transformers
2462 Poster SparseLIF: High-Performance Sparse LiDAR-Camera Fusion for 3D Object Detection
2463 Poster Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression
2464 Poster Image-to-Lidar Relational Distillation for Autonomous Driving Data
2465 Poster Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene
2466 Poster milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing
2467 Poster Hetecooper: Feature Collaboration Graph for Heterogeneous Collaborative Perception
2468 Poster LetsMap: Unsupervised Representation Learning for Label-Efficient Semantic BEV Mapping
2469 Poster Probabilistic Image-Driven Traffic Modeling via Remote Sensing
2470 Poster Occupancy as Set of Points
2471 Poster Exploring Reliable Matching with Phase Enhancement for Night-time Semantic Segmentation
2472 Poster Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction
2473 Poster Online Vectorized HD Map Construction using Geometry
2474 Poster OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving
2475 Poster PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving
2476 Poster Optimizing Diffusion Models for Joint Trajectory Prediction and Controllable Generation
2477 Poster Learning to Drive via Asymmetric Self-Play
2478 Poster Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos
2479 Poster I Can't Believe It's Not Scene Flow!
2480 Poster Motion and Structure from Event-based Normal Flow
2481 Poster Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection
2482 Poster Towards Robust Event-based Networks for Nighttime via Unpaired Day-to-Night Event Translation
2483 Poster UniINR: Event-guided Unified Rolling Shutter Correction, Deblurring, and Interpolation
2484 Poster IAM-VFI: Interpolate Any Motion for Video Frame Interpolation with motion complexity map
2485 Poster Human Motion Forecasting in Dynamic Domain Shifts: A Homeostatic Continual Test-time Adaptation Framework
2486 Poster How Video Meetings Change Your Expression
2487 Poster DIM: Dyadic Interaction Modeling for Social Behavior Generation
2488 Poster Length-Aware Motion Synthesis via Latent Diffusion
2489 Poster Towards Open Domain Text-Driven Synthesis of Multi-Person Motions
2490 Poster FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis
2491 Poster Spherical World-Locking for Audio-Visual Localization in Egocentric Videos
2492 Poster Explorative Inbetweening of Time and Space
2493 Poster TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
2494 Poster WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models
2495 Poster Pix2Gif: Motion-Guided Diffusion for GIF Generation
2496 Poster Factorizing Text-to-Video Generation by Explicit Image Conditioning
2497 Poster DNI: Dilutional Noise Initialization for Diffusion Video Editing
2498 Poster DATENeRF: Depth-Aware Text-based Editing of NeRFs
2499 Poster FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models
2500 Poster Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
2501 Poster Using My Artistic Style? You Must Obtain My Authorization
2502 Poster Learned Image Enhancement via Color Naming
2503 Poster Region-Native Visual Tokenization
2504 Poster Improving image synthesis with diffusion-negative sampling
2505 Poster ST-LDM: A Universal Framework for Text-Grounded Object Generation in Real Images
2506 Poster SmartControl: Enhancing ControlNet for Handling Rough Visual Conditions
2507 Poster PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion
2508 Poster Visual Text Generation in the Wild
2509 Poster ReCON: Training-Free Acceleration for Text-to-Image Synthesis with Retrieval of Concept Prompt Trajectories
2510 Poster Idea2Img: Iterative Self-Refinement with GPT-4V for Automatic Image Design and Generation
2511 Poster TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models
2512 Poster Navigating Text-to-Image Generative Bias across Indic Languages
2513 Poster Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning
2514 Poster MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
2515 Poster Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion
2516 Poster LCM-Lookahead for Encoder-based Text-to-Image Personalization
2517 Poster Robust-Wide: Robust Watermarking against Instruction-driven Image Editing
2518 Poster COIN-Matting: Confounder Intervention for Image Matting
2519 Poster Free-ATM: Harnessing Free Attention Masks for Representation Learning on Diffusion-Generated Images
2520 Poster ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
2521 Poster Data Augmentation via Latent Diffusion for Saliency Prediction
2522 Poster Score Distillation Sampling with Learned Manifold Corrective
2523 Poster Thinking Outside the BBox: Unconstrained Generative Object Compositing
2524 Poster Learning Quantized Adaptive Conditions for Diffusion Models
2525 Poster FRDiff: Feature Reuse for Universal Training-free Acceleration of Diffusion Models
2526 Poster ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
2527 Poster Lossy Image Compression with Foundation Diffusion Models
2528 Poster AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion
2529 Poster QueryCDR: Query-based Controllable Distortion Rectification Network for Fisheye Images
2530 Poster MetaWeather: Few-Shot Weather-Degraded Image Restoration
2531 Poster Semi-Supervised Video Desnowing Network via Temporal Decoupling Experts and Distribution-Driven Contrastive Regularization
2532 Poster Spatially-Variant Degradation Model for Dataset-free Super-resolution
2533 Poster Towards Architecture-Agnostic Untrained Networks Priors for Image Reconstruction with Frequency Regularization
2534 Poster Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution
2535 Poster Contourlet Residual for Prompt Learning Enhanced Infrared Image Super-Resolution
2536 Poster Image-adaptive 3D Lookup Tables for Real-time Image Enhancement with Bilateral Grids
2537 Poster Improving Feature Stability during Upsampling -- Spectral Artifacts and the Importance of Spatial Context
2538 Poster denoiSplit: a method for joint microscopy image splitting and unsupervised denoising
2539 Poster Region-Aware Sequence-to-Sequence Learning for Hyperspectral Denoising
2540 Poster CoSIGN: Few-Step Guidance of ConSIstency Model to Solve General INverse Problems
2541 Poster Plug-and-Play Learned Proximal Trajectory for 3D Sparse-View X-Ray Computed Tomography
2542 Poster Unsupervised Multi-modal Medical Image Registration via Invertible Translation
2543 Poster Lost in Translation: Modern Neural Networks Still Struggle With Small Realistic Image Transformations
2544 Poster ColorMNet: A Memory-based Deep Spatial-Temporal Feature Propagation Network for Video Colorization
2545 Poster Spiking Wavelet Transformer
2546 Poster Protecting NeRFs' Copyright via Plug-And-Play Watermarking Base Model
2547 Poster Finding a needle in a haystack: A Black-Box Approach to Invisible Watermark Detection
2548 Poster CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching
2549 Poster Noise-assisted Prompt Learning for Image Forgery Detection and Localization
2550 Poster TF-FAS: Twofold-Element Fine-Grained Semantic Guidance for Generalizable Face Anti-Spoofing
2551 Poster Towards Certifiably Robust Face Recognition
2552 Poster Oulu Remote-photoplethysmography Physical Domain Attacks Database (ORPDAD)
2553 Poster Bi-TTA: Bidirectional Test-Time Adapter for Remote Physiological Measurement
2554 Poster Affine steerers for structured keypoint description
2555 Poster A Framework for Efficient Model Evaluation through Stratification, Sampling, and Estimation
2556 Poster You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception
2557 Poster TAPTR: Tracking Any Point with Transformers as Detection
2558 Poster SPAMming Labels: Efficient Annotations for the Trackers of Tomorrow
2559 Poster Towards Physical World Backdoor Attacks against Skeleton Action Recognition
2560 Poster MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion
2561 Poster Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph
2562 Poster DyFADet: Dynamic Feature Aggregation for Temporal Action Detection
2563 Poster Towards Adaptive Pseudo-label Learning for Semi-Supervised Temporal Action Localization
2564 Poster Two-Stage Active Learning for Efficient Temporal Action Segmentation
2565 Poster MOD-UV: Learning Mobile Object Detectors from Unlabeled Videos
2566 Poster PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation
2567 Poster VP-SAM: Taming Segment Anything Model for Video Polyp Segmentation via Disentanglement and Spatio-temporal Side Network
2568 Poster PALM: Predicting Actions through Language Models
2569 Poster ZeroI2V: Zero-Cost Adaptation of Pre-Trained Transformers from Image to Video
2570 Poster Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
2571 Oral Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
2572 Poster VideoMamba: Spatio-Temporal Selective State Space Model
2573 Poster Text-Guided Video Masked Autoencoder
2574 Poster Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation
2575 Poster VISA: Reasoning Video Object Segmentation via Large Language Model
2576 Poster LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
2577 Poster BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos
2578 Poster COM Kitchens: An Unedited Overhead-view Procedural Videos Dataset as a Vision-Language Benchmark
2579 Poster Audio-visual Generalized Zero-shot Learning the Easy Way
2580 Poster Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
2581 Poster SignGen: End-to-End Sign Language Video Generation with Latent Diffusion
2582 Poster TrajPrompt: Aligning Color Trajectory with Vision-Language Representations
2583 Poster Adaptive High-Frequency Transformer for Diverse Wildlife Re-Identification
2584 Poster OmniSat: Self-Supervised Modality Fusion for Earth Observation
2585 Poster Statewide Visual Geolocalization in the Wild
2586 Poster Pre-trained Visual Dynamics Representations for Efficient Policy Learning
2587 Poster Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving
2588 Poster Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts
2589 Poster ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments
2590 Poster LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
2591 Poster R^2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations
2592 Poster Agent3D-Zero: An Agent for Zero-shot 3D Understanding
2593 Poster PromptIQA: Boosting the Performance and Generalization for No-Reference Image Quality Assessment via Prompts
2594 Poster An Explainable Vision Question Answer Model via Diffusion Chain-of-Thought
2595 Poster Fully Authentic Visual Question Answering Dataset from Online Communities
2596 Poster SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant
2597 Poster Learning Chain of Counterfactual Thought for Bias-Robust Vision-Language Reasoning
2598 Poster BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models
2599 Poster Paying More Attention to Images: A Training-Free Method for Alleviating Hallucination in LVLMs
2600 Poster TrojVLM: Backdoor Attack Against Vision Language Models
2601 Poster Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks
2602 Oral Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks
2603 Poster Attention Prompting on Image for Large Vision-Language Models
2604 Poster LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model
2605 Poster Generalizing to Unseen Domains via Text-guided Augmentation
2606 Poster MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation
2607 Poster TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
2608 Poster Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval
2609 Poster Taming CLIP for Fine-grained and Structured Visual Understanding of Museum Exhibits
2610 Poster Prompting Language-Informed Distribution for Compositional Zero-Shot Learning
2611 Poster Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following
2612 Poster FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance
2613 Poster Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild
2614 Oral Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild
2615 Poster Vision-Language Dual-Pattern Matching for Out-of-Distribution Detection
2616 Poster T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
2617 Poster Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation
2618 Poster OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection
2619 Poster O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation
2620 Poster APL: Anchor-based Prompt Learning for One-stage Weakly Supervised Referring Expression Comprehension
2621 Poster GTMS: A Gradient-driven Tree-guided Mask-free Referring Image Segmentation Method
2622 Poster MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders
2623 Poster ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation
2624 Poster MTA-CLIP: Language-Guided Semantic Segmentation with Mask-Text Alignment
2625 Poster Think before Placement: Common Sense Enhanced Transformer for Object Placement
2626 Poster Eliminating Feature Ambiguity for Few-Shot Segmentation
2627 Poster Diffusion-Guided Weakly Supervised Semantic Segmentation
2628 Poster Cross-Domain Semantic Segmentation on Inconsistent Taxonomy using VLMs
2629 Poster Better Call SAL: Towards Learning to Segment Anything in Lidar
2630 Poster MICDrop: Masking Image and Depth Features via Complementary Dropout for Domain-Adaptive Semantic Segmentation
2631 Poster DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentation
2632 Poster Background Adaptation with Residual Modeling for Exemplar-Free Class-Incremental Semantic Segmentation
2633 Poster Towards Reliable Evaluation and Fast Training of Robust Semantic Segmentation Models
2634 Poster ClusteringSDF: Self-Organized Neural Implicit Surfaces for 3D Decomposition
2635 Poster Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts
2636 Poster EcoMatcher: Efficient Clustering Oriented Matcher for Detector-free Image Matching
2637 Poster Class-Agnostic Object Counting with Text-to-Image Diffusion Model
2638 Poster Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector
2639 Poster Co-Student: Collaborating Strong and Weak Students for Sparsely Annotated Object Detection
2640 Poster Plain-Det: A Plain Multi-Dataset Object Detector
2641 Poster Multi-scale Cross Distillation for Object Detection in Aerial Images
2642 Poster PDT: Uav Target Detection Dataset for Pests and Diseases Tree
2643 Poster Region-Adaptive Transform with Segmentation Prior for Image Compression
2644 Poster FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification
2645 Poster CC-SAM: Enhancing SAM with Cross-feature Attention and Context for Ultrasound Image Segmentation
2646 Poster Co-synthesis of Histopathology Nuclei Image-Label Pairs using a Context-Conditioned Joint Diffusion Model
2647 Poster DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification
2648 Poster Mew: Multiplexed Immunofluorescence Image Analysis through an Efficient Multiplex Network
2649 Poster MedRAT: Unpaired Medical Report Generation via Auxiliary Tasks
2650 Poster An Incremental Unified Framework for Small Defect Inspection
2651 Poster Dissolving Is Amplifying: Towards Fine-Grained Anomaly Detection
2652 Poster GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features
2653 Poster MoEAD: A Parameter-efficient Model for Multi-class Anomaly Detection
2654 Poster PQ-SAM: Post-training Quantization for Segment Anything Model
2655 Poster BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation
2656 Poster ELSE: Efficient Deep Neural Network Inference through Line-based Sparsity Exploration
2657 Poster FairViT: Fair Vision Transformer via Adaptive Masking
2658 Poster LPViT: Low-Power Semi-structured Pruning for Vision Transformers
2659 Poster PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference
2660 Poster CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs
2661 Poster Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach
2662 Poster Characterizing Model Robustness via Natural Input Gradients
2663 Poster Dropout Mixture Low-Rank Adaptation for Visual Parameters-Efficient Fine-Tuning
2664 Poster FreeAugment: Data Augmentation Search Across All Degrees of Freedom
2665 Poster Towards Multi-modal Transformers in Federated Learning
2666 Poster Plug and Play: A Representation Enhanced Domain Adapter for Collaborative Perception
2667 Poster GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning
2668 Poster Soft Prompt Generation for Domain Generalization
2669 Poster SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision
2670 Poster Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery
2671 Poster Deep Online Probability Aggregation Clustering
2672 Poster Group Testing for Accurate and Efficient Range-Based Near Neighbor Search for Plagiarism Detection
2673 Poster An accurate detection is not all you need to combat label noise in web-noisy datasets
2674 Poster Flexible Distribution Alignment: Towards Long-tailed Semi-supervised Learning with Proper Calibration
2675 Poster ExMatch: Self-guided Exploitation for Semi-Supervised Learning with Scarce Labeled Samples
2676 Poster Dynamic Data Selection for Efficient SSL via Coarse-to-Fine Refinement
2677 Poster SelEx: Self-Expertise in Fine-Grained Generalized Category Discovery
2678 Poster Dynamic Retraining-Updating Mean Teacher for Source-Free Object Detection
2679 Poster Integrating Markov Blanket Discovery into Causal Representation Learning for Domain Generalization
2680 Poster Learn from the Learnt: Source-Free Active Domain Adaptation via Contrastive Sampling and Visual Persistence
2681 Poster On the Approximation Risk of Few-Shot Class-Incremental Learning
2682 Poster STAMP: Outlier-Aware Test-Time Adaptation with Stable Memory Replay
2683 Poster RCS-Prompt: Learning Prompt to Rearrange Class Space for Prompt-based Continual Learning
2684 Poster CLEO: Continual Learning of Evolving Ontologies
2685 Poster Learning Representation for Multitask Learning through Self-Supervised Auxiliary Learning
2686 Poster Improving Knowledge Distillation via Regularizing Feature Direction and Norm
2687 Oral Improving Knowledge Distillation via Regularizing Feature Direction and Norm
2688 Poster MTKD: Multi-Teacher Knowledge Distillation for Image Super-Resolution
2689 Poster Federated Learning with Local Openset Noisy Labels
2690 Poster Unlocking the Potential of Federated Learning: The Symphony of Dataset Distillation via Deep Generative Latents
2691 Poster FedHARM: Harmonizing Model Architectural Diversity in Federated Learning
2692 Poster Causal Subgraphs and Information Bottlenecks: Redefining OOD Robustness in Graph Neural Networks
2693 Poster Scissorhands: Scrub Data Influence via Connection Sensitivity in Networks
2694 Poster Shedding More Light on Robust Classifiers under the lens of Energy-based Models
2695 Poster Inter-Class Topology Alignment for Efficient Black-Box Substitute Attacks
2696 Poster AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models
2697 Poster FedHide: Federated Learning by Hiding in the Neighbors
2698 Poster SIMBA: Split Inference - Mechanisms, Benchmarks and Attacks
2699 Poster Data Poisoning Quantization Backdoor Attack
2700 Poster Event Trojan: Asynchronous Event-based Backdoor Attacks
2701 Keynote Is distribution shift still an AI problem?
2702 Keynote Fair, transparent, and accountable AI: What is legally required, what is ethically desired, and what is technically feasible?
2703 Keynote Synthesia: From computer vision research to real-world AI avatars
2704 Oral Session Oral 6A: Generative Models II
