ECCV2024-PaperList

If you find this helpful, we would appreciate a star! Note: Oral papers may appear twice.

ID	Type	Title
1	Workshop	Recovering 6D Object Pose
2	Workshop	Half-century of Structure-from-Motion (50SfM)
3	Workshop	Dense Neural SLAM Workshop (NeuSLAM)
4	Workshop	Geometry in the Large Model Era
5	Workshop	Workshop on Spatial AI
6	Workshop	Transparent & Reflective objects In the wild Challenges (TRICKY)
7	Workshop	Wild3D: 3D Modeling, Reconstruction, and Generation in the Wild
8	Workshop	AI3DCC: The Second Workshop of AI for 3D Content Creation
9	Workshop	3D Vision and Modeling Challenges in eCommerce
10	Workshop	FashionAI: Exploring the intersection of Fashion and Artificial Intelligence for reshaping the Industry
11	Workshop	CV For Ecology Workshop (CV4E)
12	Workshop	9th Workshop on Computer Vision in Plant Phenotyping and Agriculture (CVPPA)
13	Workshop	3rd edition of Computer Vision for Metaverse (CV4Metaverse)
14	Workshop	The First Workshop on: Computer Vision for Videogames (CV2)
15	Workshop	2nd Workshop on Vision-based Industrial Inspection (VISION)
16	Workshop	AI for Visual Arts Workshop and Challenges (AI4VA)
17	Workshop	Vision for Art (VISART) VII Workshop
18	Workshop	AI4DH: Artificial Intelligence for Digital Humanities
19	Workshop	The Third ROAD Workshop & Challenge: Event Detection for Situation Awareness in Autonomous Driving
20	Workshop	Vision-Centric Autonomous Driving (VCAD) Workshop
21	Workshop	ROAM: Robust, Out-of-Distribution And Multi-Modal models for Autonomous Driving
22	Workshop	ACVR2024 - 12th International Workshop on Assistive Computer Vision and Robotics
23	Workshop	Autonomous Vehicles meet Multimodal Foundation Models
24	Workshop	Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving: Towards Next-Generation Solutions
25	Workshop	Multi-Agent Autonomous Systems Meet Foundation Models: Challenges and Futures
26	Workshop	Visual object tracking and segmentation challenge VOTS2024 workshop
27	Workshop	5th Advances in Image Manipulation (AIM) Workshop and Challenges
28	Workshop	Instance-Level Recognition
29	Workshop	Large-scale Video Object Segmentation
30	Workshop	The Second Perception Test Challenge
31	Workshop	Efficient Deep Learning for Foundation Models
32	Workshop	Computational Aspects of Deep Learning
33	Workshop	Foundation Models for 3D Humans
34	Workshop	Workshop on Artificial Social Intelligence
35	Workshop	T-CAP - Towards a Complete Analysis of People: Fine-grained Understanding for Real-World Applications
36	Workshop	Observing and Understanding Hands in Action
37	Workshop	7th Workshop and Competition on Affective Behavior Analysis in-the-wild
38	Workshop	The First Workshop on Expressive Encounters: Co-speech gestures across cultures in the wild
39	Workshop	BioImage Computing (BIC)
40	Workshop	Human-inspired Computer Vision
41	Workshop	Knowledge in Generative Models
42	Workshop	Self-Supervised Learning - What is next?
43	Workshop	Traditional Computer Vision in the Age of Deep Learning (TradiCV)
44	Workshop	Uncertainty Quantification for Computer Vision
45	Workshop	Emergent Visual Abilities and Limits of Foundation Models (EVAL-FoMo)
46	Workshop	Beyond Euclidean: Hyperbolic and Hyperspherical Learning for Computer Vision
47	Workshop	Workshop on Unlearning and Model Editing (U&ME'24)
48	Workshop	The 3rd Workshop for Out-of-Distribution Generalization in Computer Vision Foundation Models
49	Workshop	Workshop on Visual Concepts
50	Workshop	Sometimes Less is More: The First Dataset Distillation Challenge
51	Workshop	2nd Workshop on Quantum Computer Vision and Machine Learning (QCVML)
52	Workshop	2nd Workshop on More Exploration, Less Exploitation (MELEX)
53	Workshop	Synthetic Data for Computer Vision
54	Workshop	International Challenge on Compositional and Multimodal Perception
55	Workshop	AVGenL: Audio-Visual Generation and Learning
57	Workshop	Multimodal Agents Workshop
58	Workshop	2nd OmniLabel Workshop: Enabling Complex Perception Through Vision and Language Foundational Models
59	Workshop	The Dark Side of Generative AIs and Beyond
61	Workshop	FOundation models Creators meet USers (FOCUS)
62	Workshop	Fairness and ethics towards transparent AI: facing the chalLEnge through model Debiasing (FAILED)
63	Workshop	Explainable AI for Computer Vision: Where Are We and Where Are We Going?
64	Workshop	TWYN: Trust What You learN. 1st Workshop on Trustworthiness in Computer Vision
65	Workshop	Women in Computer Vision
66	Workshop	2nd International Workshop on Privacy-Preserving Computer Vision
67	Workshop	Critical Evaluation of Generative Models and their Impact on Society
68	Workshop	xAI4Biometrics at ECCV 2024 - 4th Workshop on Explainable & Interpretable Artificial Intelligence for Biometrics
69	Workshop	Workshop on Green Foundation Models
70	Workshop	Scalable 3D Scene Generation and 3D Geometric Scene Understanding
71	Workshop	OpenSUN3D: 3rd Workshop on Open-Vocabulary 3D Scene Understanding
72	Workshop	Map-free Visual Relocalization
73	Workshop	Workshop on Neuromorphic Vision (NeVi): Advantages and Applications of Event Cameras
74	Workshop	1st Workshop on Neural Fields Beyond Conventional Cameras
75	Workshop	GigaVision: When Gigapixel Videography Meets Computer Vision
76	Workshop	Eyes of the Future: Integrating Computer Vision in Smart Eyewear
77	Tutorial	Large Multimodal Foundation Models
78	Tutorial	A Bayesian Odyssey in Uncertainty: from Theoretical Foundations to Real-World Applications
79	Tutorial	Third Hands-on Egocentric Research Tutorial with Project Aria, from Meta
80	Tutorial	Emerging Trends in Disentanglement and Compositionality
81	Tutorial	Efficient Text-to-Image and Text-to-3D modeling
82	Tutorial	Responsibly Building Generative Models
83	Tutorial	Recent Advances in Video Content Understanding and Generation
84	Tutorial	Time is precious: Self-Supervised Learning Beyond Images
85	Tutorial	Inside Plato's door: a tour in Multi-view Geometry
86	Poster Session	Poster Session 1
87	Oral Session	Oral 1A: Scene Analysis And Understanding
88	Oral Session	Oral 1B: Autonomous Driving
89	Oral Session	Oral 1C: Low-Level Vision And Imaging
90	Poster Session	Poster Session 2
91	Oral Session	Oral 2A: Generative Models I
92	Oral Session	Oral 2B: Recognition
93	Oral Session	Oral 2C: Multi-View And Visual Odometry
94	Poster Session	Poster Session 3
95	Oral Session	Oral 3A: Datasets And Benchmarking
96	Oral Session	Oral 3B: Medical And Biological Imaging
97	Oral Session	Oral 3C: Point Clouds
98	Poster Session	Poster Session 4
99	Oral Session	Oral 4A: Neural 3D Rendering
100	Oral Session	Oral 4B: Video Generation / Editing / Prediction
101	Oral Session	Oral 4C: Humans: Biometrics, Pose And Motion
102	Poster Session	Poster Session 5
103	Oral Session	Oral 5A: Segmentation
104	Oral Session	Oral 5B: Vision Applications
105	Oral Session	Oral 5C: Representation Learning
106	Poster Session	Poster Session 6
107	Oral Session	Oral 6A: Generative Models II
108	Oral Session	Oral 6B: Video Understanding
109	Oral Session	Oral 6C: Vision And Other Modalities
110	Poster Session	Poster Session 7
111	Oral Session	Oral 7A: Learning Architectures, Transfer, Continual And Long-Tail
112	Oral Session	Oral 7B: Adversarial Learning And Privacy
113	Oral Session	Oral 7C: Optimization And Theory
114	Poster	Bi-directional Contextual Attention for 3D Dense Captioning
115	Oral	Bi-directional Contextual Attention for 3D Dense Captioning
116	Poster	Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention
117	Oral	Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention
118	Poster	ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting
119	Oral	ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting
120	Poster	Towards Scene Graph Anticipation
121	Oral	Towards Scene Graph Anticipation
122	Poster	OP-Align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation
123	Oral	OP-Align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation
124	Poster	PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers
125	Oral	PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers
126	Poster	H-V2X: A Large Scale Highway Dataset for BEV Perception
127	Oral	H-V2X: A Large Scale Highway Dataset for BEV Perception
128	Poster	RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios
129	Oral	RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios
130	Poster	DriveLM: Driving with Graph Visual Question Answering
131	Oral	DriveLM: Driving with Graph Visual Question Answering
132	Poster	Making Large Language Models Better Planners with Reasoning-Decision Alignment
133	Oral	Making Large Language Models Better Planners with Reasoning-Decision Alignment
134	Poster	M^2Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation
135	Oral	M^2Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation
136	Poster	MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping
137	Oral	MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping
138	Poster	Adaptive Bounding Box Uncertainties via Two-Step Conformal Prediction
139	Oral	Adaptive Bounding Box Uncertainties via Two-Step Conformal Prediction
140	Poster	A Simple Low-bit Quantization Framework for Video Snapshot Compressive Imaging
141	Oral	A Simple Low-bit Quantization Framework for Video Snapshot Compressive Imaging
142	Poster	Photon Inhibition for Energy-Efficient Single-Photon Imaging
143	Oral	Photon Inhibition for Energy-Efficient Single-Photon Imaging
144	Poster	Latent Diffusion Prior Enhanced Deep Unfolding for Snapshot Spectral Compressive Imaging
145	Oral	Latent Diffusion Prior Enhanced Deep Unfolding for Snapshot Spectral Compressive Imaging
146	Poster	Minimalist Vision with Freeform Pixels
147	Oral	Minimalist Vision with Freeform Pixels
148	Poster	SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow
149	Oral	SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow
150	Poster	Integer-Valued Training and Spike-driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection
151	Oral	Integer-Valued Training and Spike-driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection
152	Poster	OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects
153	Oral	OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects
154	Poster	UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model
155	Poster	Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture
156	Poster	HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting
157	Poster	MagicMirror: Fast and High-Quality Avatar Generation with Constrained Search Space
158	Poster	Personalized Video Relighting With an At-Home Light Stage
159	Poster	Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations
160	Poster	Panel-Specific Degradation Representation for Raw Under-Display Camera Image Restoration
161	Poster	HoloADMM: High-Quality Holographic Complex Field Recovery
162	Poster	Flying with Photons: Rendering Novel Views of Propagating Light
163	Oral	Flying with Photons: Rendering Novel Views of Propagating Light
164	Poster	Efficient Depth-Guided Urban View Synthesis
165	Poster	Ray-Distance Volume Rendering for Neural Scene Reconstruction
166	Poster	Taming Latent Diffusion Model for Neural Radiance Field Inpainting
167	Poster	Learning Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors
168	Poster	GMT: Enhancing Generalizable Neural Rendering via Geometry-Driven Multi-Reference Texture Transfer
169	Poster	MaRINeR: Enhancing Novel Views by Matching Rendered Images with Nearby References
170	Poster	UNIKD: UNcertainty-Filtered Incremental Knowledge Distillation for Neural Implicit Representation
171	Poster	Rethinking Directional Parameterization in Neural Implicit Surface Reconstruction
172	Poster	Sur^2f: A Hybrid Representation for High-Quality and Efficient Surface Reconstruction from Multi-view Images
173	Poster	Differentiable Convex Polyhedra Optimization from Multi-view Images
174	Poster	Combining Generative and Geometry Priors for Wide-Angle Portrait Correction
175	Poster	I2-SLAM: Inverting Imaging Process for Robust Photorealistic Dense SLAM
176	Poster	Mitigating Perspective Distortion-induced Shape Ambiguity in Image Crops
177	Poster	MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo
178	Poster	CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians
179	Poster	GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting
180	Poster	FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally
181	Poster	PolyOculus: Simultaneous Multi-view Image-based Novel View Synthesis
182	Poster	MegaScenes: Scene-Level View Synthesis at Scale
183	Poster	HiFi-123: Towards High-fidelity One Image to 3D Content Generation
184	Poster	View-Consistent 3D Editing with Gaussian Splatting
185	Poster	Compress3D: a Compressed Latent Space for 3D Generation from a Single Image
186	Poster	Forest2Seq: Revitalizing Order Prior for Sequential Indoor Scene Synthesis
187	Poster	3DFG-PIFu: 3D Feature Grids for Human Digitization from Sparse Views
188	Poster	Nuvo: Neural UV Mapping for Unruly 3D Representations
189	Poster	Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors
190	Poster	BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
191	Poster	A Diffusion Model for Simulation Ready Coronary Anatomy with Morpho-skeletal Control
192	Poster	DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation
193	Poster	TPA3D: Triplane Attention for Fast Text-to-3D Generation
194	Poster	DECOLLAGE: 3D Detailization by Controllable, Localized, and Learned Geometry Enhancement
195	Poster	WordRobe: Text-Guided Generation of Textured 3D Garments
196	Poster	AnyHome: Open-Vocabulary Large-Scale Indoor Scene Generation with First-Person View Exploration
197	Poster	HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance
198	Poster	SENC: Handling Self-collision in Neural Cloth Simulation
199	Poster	AnimatableDreamer: Text-Guided Non-rigid 3D Model Generation and Reconstruction with Canonical Score Distillation
200	Poster	SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model
201	Poster	Diffusion Models as Data Mining Tools
202	Poster	ReMatching: Low-Resolution Representations for Scalable Shape Correspondence
203	Poster	PolyRoom: Room-aware Transformer for Floorplan Reconstruction
204	Poster	WindPoly: Polygonal Mesh Reconstruction via Winding Numbers
205	Poster	Hiding Imperceptible Noise in Curvature-Aware Patches for 3D Point Cloud Attack
206	Poster	Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion
207	Poster	Diffusion Bridges for 3D Point Cloud Denoising
208	Poster	Towards a Density Preserving Objective Function for Learning on Point Sets
209	Poster	Syn-to-Real Domain Adaptation for Point Cloud Completion via Part-based Approach
210	Poster	T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning
211	Poster	Text2LiDAR: Text-guided LiDAR Point Clouds Generation via Equirectangular Transformer
212	Poster	DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields
213	Poster	Computing the Lipschitz constant needed for fast scene recovery from CASSI measurements
214	Poster	Regularizing Dynamic Radiance Fields with Kinematic Fields
215	Poster	GlobalPointer: Large-Scale Plane Adjustment with Bi-Convex Relaxation
216	Poster	iMatching: Imperative Correspondence Learning
217	Poster	Fundamental Matrix Estimation Using Relative Depths
218	Poster	Track Everything Everywhere Fast and Robustly
219	Poster	Learning to Make Keypoints Sub-Pixel Accurate
220	Poster	Shape-guided Configuration-aware Learning for Endoscopic-image-based Pose Estimation of Flexible Robotic Instruments
221	Poster	FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models
222	Poster	Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking
223	Poster	Pseudo-keypoint RKHS Learning for Self-supervised 6DoF Pose Estimation
224	Poster	Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images
225	Poster	GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation
226	Poster	D-SCo: Dual-Stream Conditional Diffusion for Monocular Hand-Held Object Reconstruction
227	Poster	Event-based Head Pose Estimation: Benchmark and Method
228	Poster	Parameterized Quasi-Physical Simulators for Dexterous Manipulations Transfer
229	Poster	RAW-Adapter: Adapting Pretrained Visual Model to Camera RAW Images
230	Poster	Easing 3D Pattern Reasoning with Side-view Features for Semantic Scene Completion
231	Poster	Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions
232	Poster	GroCo: Ground Constraint for Metric Self-Supervised Monocular Depth
233	Poster	Remove Projective LiDAR Depthmap Artifacts via Exploiting Epipolar Geometry
234	Poster	Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network
235	Poster	CountFormer: Multi-View Crowd Counting Transformer
236	Poster	When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset
237	Poster	MapDistill: Boosting Efficient Camera-based HD Map Construction via Camera-LiDAR Fusion Model Distillation
238	Poster	4D Contrastive Superflows are Dense 3D Representation Learners
239	Poster	TCC-Det: Temporarily consistent cues for weakly-supervised 3D detection
240	Poster	CARB-Net: Camera-Assisted Radar-Based Network for Vulnerable Road User Detection
241	Poster	SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving
242	Poster	RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception
243	Poster	TrafficNight: An Aerial Multimodal Benchmark For Nighttime Vehicle Surveillance
244	Poster	RoDUS: Robust Decomposition of Static and Dynamic Elements in Urban Scenes
245	Poster	Monocular Occupancy Prediction for Scalable Indoor Scenes
246	Poster	nuCraft: Crafting High Resolution 3D Semantic Occupancy for Unified 3D Scene Understanding
247	Poster	Mask2Map: Vectorized HD Map Construction Using Bird's Eye View Segmentation Masks
248	Oral	Mask2Map: Vectorized HD Map Construction Using Bird's Eye View Segmentation Masks
249	Poster	CARFF: Conditional Auto-encoded Radiance Field for 3D Scene Forecasting
250	Poster	Neural Volumetric World Models for Autonomous Driving
251	Poster	Progressive Pretext Task Learning for Human Trajectory Prediction
252	Poster	Risk-Aware Self-Consistent Imitation Learning for Trajectory Planning in Autonomous Driving
253	Poster	Safe-Sim: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries
254	Poster	Towards Dual Transparent Liquid Level Estimation in Biomedical Lab: Dataset, Methods and Practice
255	Poster	TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos
256	Poster	Temporally Consistent Stereo Matching
257	Poster	Retrieval Robust to Object Motion Blur
258	Poster	Deblur e-NeRF: NeRF from Motion-Blurred Events under High-speed or Low-light Conditions
259	Poster	CMTA: Cross-Modal Temporal Alignment for Event-guided Video Deblurring
260	Poster	Long-range Turbulence Mitigation: A Large-scale Dataset and A Coarse-to-fine Framework
261	Poster	Diffusion Reward: Learning Rewards via Conditional Video Diffusion
262	Poster	HUMOS: Human Motion Model Conditioned on Body Shape
263	Poster	PoseAugment: Generative Human Pose Data Augmentation with Physical Plausibility for IMU-based Motion Capture
264	Poster	Large Motion Model for Unified Multi-Modal Motion Generation
265	Poster	Realistic Human Motion Generation with Cross-Diffusion Models
266	Poster	Text Motion Translator: A Bi-Directional Model for Enhanced 3D Human Motion Generation from Open-Vocabulary Descriptions
267	Poster	Generating Human Interaction Motions in Scenes with Text Control
268	Poster	Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
269	Poster	Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
270	Poster	PoseCrafter: One-Shot Personalized Video Synthesis Following Flexible Pose Control
271	Poster	MoVideo: Motion-Aware Video Generation with Diffusion Models
272	Poster	FreeInit: Bridging Initialization Gap in Video Diffusion Models
273	Poster	DreamMotion: Space-Time Self-Similar Score Distillation for Zero-Shot Video Editing
274	Poster	Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion
275	Poster	ReNoise: Real Image Inversion Through Iterative Noising
276	Poster	Elegantly Written: Disentangling Writer and Character Styles for Enhancing Online Chinese Handwriting
277	Poster	One-Shot Diffusion Mimicker for Handwritten Text Generation
278	Poster	Investigating Style Similarity in Diffusion Models
279	Poster	DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation
280	Poster	PartCraft: Crafting Creative Objects by Parts
281	Poster	DreamDrone: Text-to-Image Diffusion Models are Zero-shot Perpetual View Generators
282	Poster	WAS: Dataset and Methods for Artistic Text Segmentation
283	Poster	GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections
284	Poster	PixArt-Sigma: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
285	Poster	HybridBooth: Hybrid Prompt Inversion for Efficient Subject-Driven Generation
286	Poster	Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance
287	Poster	Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm
288	Poster	Diffusion Soup: Model Merging for Text-to-Image Diffusion Models
289	Poster	Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention
290	Poster	Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers
291	Poster	Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control
292	Poster	DEPICT: Diffusion-Enabled Permutation Importance for Image Classification Tasks
293	Poster	Do text-free diffusion models learn discriminative visual representations?
294	Poster	DataDream: Few-shot Guided Dataset Generation
295	Poster	DiffuMatting: Synthesizing Arbitrary Objects with Matting-level Annotation
296	Poster	ZeST: Zero-Shot Material Transfer from a Single Image
297	Poster	FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior
298	Poster	Learning Equilibrium Transformation for Gamut Expansion and Color Restoration
299	Poster	Timestep-Aware Correction for Quantized Diffusion Models
300	Poster	Inf-DiT: Upsampling any-resolution image with memory-efficient diffusion transformer
301	Poster	Energy-Calibrated VAE with Test Time Free Lunch
302	Poster	Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models
303	Poster	Prompt-Based Test-Time Real Image Dehazing: A Novel Pipeline
304	Poster	Asymmetric Mask Scheme for Self-Supervised Real Image Denoising
305	Poster	GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity
306	Poster	Learning Dual-Level Deformable Implicit Representation for Real-World Scale Arbitrary Super-Resolution
307	Poster	A New Dataset and Framework for Real-World Blurred Images Super-Resolution
308	Poster	Blind image deblurring with noise-robust kernel estimation
309	Poster	SMFANet: A Lightweight Self-Modulation Feature Aggregation Network for Efficient Image Super-Resolution
310	Poster	MambaIR: A Simple Baseline for Image Restoration with State-Space Model
311	Poster	BlazeBVD: Make Scale-Time Equalization Great Again for Blind Video Deflickering
312	Poster	Towards Robust Full Low-bit Quantization of Super Resolution Networks
313	Poster	Solving the inverse problem of microscopy deconvolution with a residual Beylkin-Coifman-Rokhlin neural network
314	Poster	SAH-SCI: Self-Supervised Adapter for Efficient Hyperspectral Snapshot Compressive Imaging
315	Poster	Adaptive Compressed Sensing with Diffusion-Based Posterior Sampling
316	Poster	DiffuX2CT: Diffusion Learning to Reconstruct CT Images from Biplanar X-Rays
317	Poster	BaSIC: BayesNet Structure Learning for Computational Scalable Neural Image Compression
318	Poster	SNeRV: Spectra-preserving Neural Representation for Video
319	Poster	Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics
320	Poster	Multiscale Graph Texture Network
321	Poster	DetailSemNet: Elevating Signature Verification through Detail-Semantic Integration
322	Poster	Out-of-Bounding-Box Triggers: A Stealthy Approach to Cheat Object Detectors
323	Poster	Fake It till You Make It: Curricular Dynamic Forgery Augmentations towards General Deepfake Detection
324	Poster	AdversariaLeak: External Information Leakage Attack Using Adversarial Samples on Face Recognition Systems
325	Poster	Continual Learning for Remote Physiological Measurement: Minimize Forgetting and Simplify Inference
326	Poster	NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation
327	Poster	Region-aware Distribution Contrast: A Novel Approach to Multi-Task Partially Supervised Learning
328	Poster	Large-Scale Multi-Hypotheses Cell Tracking Using Ultrametric Contours Maps
329	Poster	SemTrack: A Large-scale Dataset for Semantic Tracking in the Wild
330	Poster	DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition
331	Poster	CrossGLG: LLM Guides One-shot Skeleton-based 3D Action Recognition in a Cross-level Manner
332	Poster	Towards More Practical Group Activity Detection: A New Benchmark and Model
333	Poster	Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs
334	Poster	Online Temporal Action Localization with Memory-Augmented Transformer
335	Poster	EgoLifter: Open-world 3D Segmentation for Egocentric Perception
336	Poster	MeshSegmenter: Zero-Shot Mesh Segmentation via Texture Synthesis
337	Poster	Spatial-Temporal Multi-level Association for Video Object Segmentation
338	Poster	Gated Temporal Diffusion for Stochastic Long-term Dense Anticipation
339	Poster	ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders
340	Poster	Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment
341	Poster	VideoMamba: State Space Model for Efficient Video Understanding
342	Poster	Text-Conditioned Resampler For Long Form Video Understanding
343	Poster	SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding
344	Poster	Vamos: Versatile Action Models for Video Understanding
345	Poster	Goldfish: Vision-Language Understanding of Arbitrarily Long Videos
346	Poster	Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning
347	Poster	Multi-Sentence Grounding for Long-term Instructional Video
348	Poster	CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
349	Poster	CPM: Class-conditional Prompting Machine for Audio-visual Segmentation
350	Poster	SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark
351	Poster	CityGuessr: City-Level Video Geo-Localization on a Global Scale
352	Poster	WRIM-Net: Wide-Ranging Information Mining Network for Visible-Infrared Person Re-Identification
353	Poster	Contrastive ground-level image and remote sensing pre-training improves representation learning for natural world imagery
354	Poster	AddressCLIP: Empowering Vision-Language Models for City-wide Image Address Localization
355	Poster	LingoQA: Video Question Answering for Autonomous Driving
356	Poster	Dolphins: Multimodal Language Model for Driving
357	Poster	PRET: Planning with Directed Fidelity Trajectory for Vision and Language Navigation
358	Poster	LLM as Copilot for Coarse-grained Vision-and-Language Navigation
359	Poster	Visual Grounding for Object-Level Generalization in Reinforcement Learning
360	Poster	m&m’s: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
361	Poster	Recursive Visual Programming
362	Poster	Any2Point: Empowering Any-modality Transformers for Efficient 3D Understanding
363	Poster	Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models
364	Poster	HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
365	Poster	REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models
366	Poster	ViG-Bias: Visually Grounded Bias Discovery and Mitigation
367	Poster	GENIXER: Empowering Multimodal Large Language Models as a Powerful Data Generator
368	Poster	Adversarial Prompt Tuning for Vision-Language Models
369	Poster	MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
370	Poster	Synergy of Sight and Semantics: Visual Intention Understanding with CLIP
371	Poster	FlexAttention for Efficient High-Resolution Vision-Language Models
372	Poster	VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
373	Poster	Weak-to-Strong Compositional Learning from Generative Models for Language-based Object Detection
374	Poster	Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
375	Poster	BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
376	Poster	Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights
377	Poster	CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts
378	Poster	Elevating All Zero-Shot Sketch-Based Image Retrieval Through Multimodal Prompt Learning
379	Poster	GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths
380	Oral	GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths
381	Poster	Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized Visual Class Discovery
382	Poster	Diff-Tracker: Text-to-Image Diffusion Models are Unsupervised Trackers
383	Poster	Trackastra: Transformer-based cell tracking for live-cell microscopy
384	Poster	Lost and Found: Overcoming Detector Failures in Online Multi-Object Tracking
385	Poster	Walker: Self-supervised Multiple Object Tracking by Walking on Temporal Object Appearance Graphs
386	Poster	E3V-K5: An Authentic Benchmark for Redefining Video-Based Energy Expenditure Estimation
387	Poster	Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition
388	Poster	Learning by Aligning 2D Skeleton Sequences and Multi-Modality Fusion
389	Poster	Occluded Gait Recognition with Mixture of Experts: An Action Detection Perspective
390	Poster	Stepwise Multi-grained Boundary Detector for Point-supervised Temporal Action Localization
391	Poster	Finding Meaning in Points: Weakly Supervised Semantic Segmentation for Event Cameras
392	Poster	Reliable Spatial-Temporal Voxels For Multi-Modal Test-Time Adaptation
393	Poster	X-Pose: Detecting Any Keypoints
394	Poster	Open-Set Recognition in the Age of Vision-Language Models
395	Poster	Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
396	Poster	A Fair Ranking and New Model for Panoptic Scene Graph Generation
397	Oral	A Fair Ranking and New Model for Panoptic Scene Graph Generation
398	Poster	Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image
399	Poster	Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance
400	Poster	A Simple Background Augmentation Method for Object Detection with Diffusion Model
401	Poster	OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
402	Poster	Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models
403	Poster	Agent Attention: On the Integration of Softmax and Linear Attention
404	Poster	WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting
405	Poster	Agglomerative Token Clustering
406	Poster	Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation
407	Poster	3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance
408	Poster	SPIN: Hierarchical Segmentation with Subpart Granularity in Natural Images
409	Poster	Open-Vocabulary RGB-Thermal Semantic Segmentation
410	Poster	PartSTAD: 2D-to-3D Part Segmentation Task Adaptation
411	Poster	Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
412	Poster	FREST: Feature RESToration for Semantic Segmentation under Multiple Adverse Conditions
413	Poster	Progressive Proxy Anchor Propagation for Unsupervised Semantic Segmentation
414	Poster	Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation
415	Poster	Evaluating the Adversarial Robustness of Semantic Segmentation: Trying Harder Pays Off
416	Poster	Pseudo-Embedding for Generalized Few-Shot Point Cloud Segmentation
417	Poster	Self-supervised co-salient object detection via feature correspondences at multiple scales
418	Poster	Unsupervised Dense Prediction using Differentiable Normalized Cuts
419	Poster	Robust Zero-Shot Crowd Counting and Localization with Adaptive Resolution SAM
420	Poster	Bayesian Detector Combination for Object Detection with Crowdsourced Annotations
421	Poster	Bridge Past and Future: Overcoming Information Asymmetry in Incremental Object Detection
422	Poster	Bucketed Ranking-based Losses for Efficient Training of Object Detectors
423	Poster	Better Regression Makes Better Test-time Adaptive 3D Object Detection
424	Poster	MutDet: Mutually Optimizing Pre-training for Remote Sensing Object Detection
425	Poster	IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection
426	Poster	Semi-supervised Segmentation of Histopathology Images with Noise-Aware Topological Consistency
427	Poster	The Devil is in the Statistics: Mitigating and Exploiting Statistics Difference for Generalizable Semi-supervised Medical Image Segmentation
428	Poster	A Rotation-invariant Texture ViT for Fine-Grained Recognition of Esophageal Cancer Endoscopic Ultrasound Images
429	Poster	Multistain Pretraining for Slide Representation Learning in Pathology
430	Poster	Bridging the Pathology Domain Gap: Efficiently Adapting CLIP for Pathology Image Analysis with Limited Labeled Data
431	Poster	HERGen: Elevating Radiology Report Generation with Longitudinal Data
432	Poster	Defect Spectrum: A Granular Look of Large-scale Defect Datasets with Rich Semantics
433	Poster	Towards Open-World Object-based Anomaly Detection via Self-Supervised Outlier Synthesis
434	Poster	AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection
435	Poster	Hierarchical Gaussian Mixture Normalizing Flow Modeling for Unified Anomaly Detection
436	Poster	A Unified Image Compression Method for Human Perception and Multiple Vision Tasks
437	Poster	FTBC: Forward Temporal Bias Correction for Optimizing ANN-SNN Conversion
438	Poster	Quantization-Friendly Winograd Transformations for Convolutional Neural Networks
439	Poster	YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
440	Poster	Stripe Observation Guided Inference Cost-free Attention Mechanism
441	Poster	NOVUM: Neural Object Volumes for Robust Object Classification
442	Poster	POA: Pre-training Once for Models of All Sizes
443	Poster	Deep Feature Surgery: Towards Accurate and Efficient Multi-Exit Networks
444	Poster	Learn to Preserve and Diversify: Parameter-Efficient Group with Orthogonal Regularization for Domain Generalization
445	Poster	Fisher Calibration for Backdoor-Robust Heterogeneous Federated Learning
446	Poster	MultiDelete for Multimodal Machine Unlearning
447	Poster	Efficient Unsupervised Visual Representation Learning with Explicit Cluster Balancing
448	Poster	Multi-Label Cluster Discrimination for Visual Representation Learning
449	Poster	Robustness Preserving Fine-tuning using Neuron Importance
450	Poster	Online Zero-Shot Classification with CLIP
451	Poster	Understanding Multi-compositional learning in Vision and Language models via Category Theory
452	Poster	This Probably Looks Exactly Like That: An Invertible Prototypical Network
453	Poster	Rethinking Unsupervised Outlier Detection via Multiple Thresholding
454	Poster	Learning Non-Linear Invariants for Unsupervised Out-of-Distribution Detection
455	Poster	Multimodal Label Relevance Ranking via Reinforcement Learning
456	Poster	Confidence Self-Calibration for Multi-Label Class-Incremental Learning
457	Poster	MTaDCS: Moving Trace and Feature Density-based Confidence Sample Selection under Label Noise
458	Poster	Bidirectional Uncertainty-Based Active Learning for Open-Set Annotation
459	Poster	Online Continuous Generalized Category Discovery
460	Poster	Open-set Domain Adaptation via Joint Error based Multi-class Positive and Unlabeled Learning
461	Poster	UDA-Bench: Revisiting Common Assumptions in Unsupervised Domain Adaptation Using a Standardized Framework
462	Poster	Rethinking Few-shot Class-incremental Learning: Learning from Yourself
463	Poster	Versatile Incremental Learning: Towards Class and Domain-Agnostic Incremental Learning
464	Poster	Semantic Residual Prompts for Continual Learning
465	Poster	Encapsulating Knowledge in One Prompt
466	Poster	Representation Enhancement-Stabilization: Reducing Bias-Variance of Domain Generalization
467	Poster	Good Teachers Explain: Explanation-Enhanced Knowledge Distillation
468	Poster	PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation
469	Poster	Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation
470	Poster	Dataset Distillation by Automatic Training Trajectories
471	Poster	Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction
472	Poster	Graph Neural Network Causal Explanation via Neural Causal Models
473	Poster	Optimization-based Uncertainty Attribution Via Learning Informative Perturbations
474	Poster	Generalizable Symbolic Optimizer Learning
475	Poster	CLR-GAN: Improving GANs Stability and Quality via Consistent Latent Representation and Reconstruction
476	Poster	Nickel and Diming Your GAN: A Dual-Method Approach to Enhancing GAN Efficiency via Knowledge Distillation
477	Poster	Exploiting Supervised Poison Vulnerability to Strengthen Self-Supervised Defense
478	Poster	SSL-Cleanse: Trojan Detection and Mitigation in Self-Supervised Learning
479	Poster	Zero-Shot Detection of AI-Generated Images
480	Oral	Zero-Shot Detection of AI-Generated Images
481	Poster	MobileNetV4: Universal Models for the Mobile Ecosystem
482	Oral	MobileNetV4: Universal Models for the Mobile Ecosystem
483	Poster	Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation
484	Oral	Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation
485	Poster	Adaptive Parametric Activation
486	Oral	Adaptive Parametric Activation
487	Poster	CLIFF: Continual Latent Diffusion for Open-Vocabulary Object Detection
488	Oral	CLIFF: Continual Latent Diffusion for Open-Vocabulary Object Detection
489	Poster	Dataset Enhancement with Instance-Level Augmentations
490	Oral	Dataset Enhancement with Instance-Level Augmentations
491	Poster	Efficient Bias Mitigation Without Privileged Information
492	Oral	Efficient Bias Mitigation Without Privileged Information
493	Poster	On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines
494	Oral	On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines
495	Poster	Momentum Auxiliary Network for Supervised Local Learning
496	Oral	Momentum Auxiliary Network for Supervised Local Learning
497	Poster	From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition
498	Oral	From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition
499	Poster	Projecting Points to Axes: Oriented Object Detection via Point-Axis Representation
500	Oral	Projecting Points to Axes: Oriented Object Detection via Point-Axis Representation
501	Poster	Relation DETR: Exploring Explicit Position Relation Prior for Object Detection
502	Oral	Relation DETR: Exploring Explicit Position Relation Prior for Object Detection
503	Poster	ConDense: Consistent 2D-3D Pre-training for Dense and Sparse Features from Multi-View Images
504	Oral	ConDense: Consistent 2D-3D Pre-training for Dense and Sparse Features from Multi-View Images
505	Poster	ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation
506	Oral	ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation
507	Poster	COMO: Compact Mapping and Odometry
508	Oral	COMO: Compact Mapping and Odometry
509	Poster	Camera Calibration using a Collimator System
510	Oral	Camera Calibration using a Collimator System
511	Poster	Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection
512	Oral	Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection
513	Poster	Physics-Free Spectrally Multiplexed Photometric Stereo under Unknown Spectral Composition
514	Oral	Physics-Free Spectrally Multiplexed Photometric Stereo under Unknown Spectral Composition
515	Poster	SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments
516	Oral	SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments
517	Poster	Smoothness, Synthesis, and Sampling: Re-thinking Unsupervised Multi-View Stereo with DIV Loss
518	Oral	Smoothness, Synthesis, and Sampling: Re-thinking Unsupervised Multi-View Stereo with DIV Loss
519	Poster	Six-Point Method for Multi-Camera Systems with Reduced Solution Space
520	Oral	Six-Point Method for Multi-Camera Systems with Reduced Solution Space
521	Poster	Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer
522	Oral	Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer
523	Poster	Grounding Image Matching in 3D with MASt3R
524	Oral	Grounding Image Matching in 3D with MASt3R
525	Poster	EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis
526	Oral	EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis
527	Poster	TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
528	Oral	TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
529	Poster	Accelerating Image Generation with Sub-path Linear Approximation Model
530	Oral	Accelerating Image Generation with Sub-path Linear Approximation Model
531	Poster	SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation
532	Oral	SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation
533	Poster	Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
534	Oral	Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
535	Poster	LLMGA: Multimodal Large Language Model based Generation Assistant
536	Oral	LLMGA: Multimodal Large Language Model based Generation Assistant
537	Poster	FlashTex: Fast Relightable Mesh Texturing with LightControlNet
538	Oral	FlashTex: Fast Relightable Mesh Texturing with LightControlNet
539	Poster	Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture
540	Oral	Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture
541	Poster	TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation
542	Oral	TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation
543	Poster	EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
544	Poster	EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head
545	Poster	3D Gaussian Parametric Head Model
546	Poster	Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos
547	Poster	RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models
548	Poster	PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations
549	Poster	COMPOSE: Comprehensive Portrait Shadow Editing
550	Poster	GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval
551	Poster	Optimizing Illuminant Estimation in Dual-Exposure HDR Imaging
552	Poster	Holodepth: Programmable Depth-Varying Projection via Computer-Generated Holography
553	Poster	BeNeRF: Neural Radiance Fields from a Single Blurry Image and Event Stream
554	Poster	VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors
555	Poster	G3R: Gradient Guided Generalizable Reconstruction
556	Poster	Efficient NeRF Optimization - Not All Samples Remain Equally Hard
557	Poster	BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling
558	Poster	SlotLifter: Slot-guided Feature Lifting for Learning Object-Centric Radiance Fields
559	Poster	RS-NeRF: Neural Radiance Fields from Rolling Shutter Images
560	Poster	Geometry Fidelity for Spherical Images
561	Poster	CPT-VR: Improving Surface Rendering via Closest Point Transform with View-Reflection Appearance
562	Poster	MetaCap: Meta-learning Priors from Multi-View Imagery for Sparse-view Human Performance Capture and Rendering
563	Poster	Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis
564	Poster	GGRt: Towards Generalizable 3D Gaussians without Pose Priors in Real-Time
565	Poster	Neural graphics texture compression supporting random access
566	Poster	GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views
567	Poster	A Compact Dynamic 3D Gaussian Representation for Real-Time Dynamic View Synthesis
568	Poster	Click-Gaussian: Interactive Segmentation to Any 3D Gaussians
569	Poster	McGrids: Monte Carlo-Driven Adaptive Grids for Iso-Surface Extraction
570	Poster	latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction
571	Poster	Non-parametric Sensor Noise Modeling and Synthesis
572	Poster	UpFusion: Novel View Diffusion from Unposed Sparse View Observations
573	Poster	MVDD: Multi-View Depth Diffusion Models
574	Poster	LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
575	Oral	LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
576	Poster	Hypernetworks for Generalizable BRDF Representation
577	Poster	High-Fidelity 3D Textured Shapes Generation by Sparse Encoding and Adversarial Decoding
578	Poster	Structured-NeRF: Hierarchical Scene Graph with Neural Representation
579	Poster	3D-GOI: 3D GAN Omni-Inversion for Multifaceted and Multi-object Editing
580	Poster	Free-Editor: Zero-shot Text-driven 3D Scene Editing
581	Poster	Texture-GS: Disentangle the Geometry and Texture for 3D Gaussian Splatting Editing
582	Poster	VCD-Texture: Variance Alignment based 3D-2D Co-Denoising for Text-Guided Texturing
583	Poster	UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation
584	Poster	ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation
585	Poster	DreamView: Injecting View-specific Text Guidance into Text-to-3D Generation
586	Poster	SceneTeller: Language-to-3D Scene Generation
587	Poster	Text to Layer-wise 3D Clothed Human Generation
588	Poster	ShoeModel: Learning to Wear on the User-specified Shoes via Diffusion Model
589	Poster	D4-VTON: Dynamic Semantics Disentangling for Differential Diffusion based Virtual Try-On
590	Poster	Within the Dynamic Context: Inertia-aware 3D Human Modeling with Pose Sequence
591	Poster	Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos
592	Poster	Temporal Residual Jacobians for Rig-free Motion Transfer
593	Poster	PosterLlama: Bridging Design Ability of Language Model to Content-Aware Layout Generation
594	Poster	GroundUp: Rapid Sketch-Based 3D City Massing
595	Poster	DiscoMatch: Fast Discrete Optimisation for Geometrically Consistent 3D Shape Matching
596	Poster	FRI-Net: Floorplan Reconstruction via Room-wise Implicit Representation
597	Poster	PointNeRF++: A multi-scale, point-based Neural Radiance Field
598	Poster	Continuous SO(3) Equivariant Convolution for 3D Point Cloud Analysis
599	Poster	UMERegRobust – Universal Manifold Embedding Compatible Features for Robust Point Cloud Registration
600	Poster	FrePolad: Frequency-Rectified Point Latent Diffusion for Point Cloud Generation
601	Poster	Learning to Adapt SAM for Segmenting Cross-domain Point Clouds
602	Poster	Osmosis: RGBD Diffusion Prior for Underwater Image Restoration
603	Poster	Differentiable Product Quantization for Memory Efficient Camera Relocalization
604	Poster	RING-NeRF: Rethinking Inductive Biases for Versatile and Efficient Neural Fields
605	Poster	Light-in-Flight for a World-in-Motion
606	Poster	Binomial Self-compensation for Motion Error in Dynamic 3D Scanning
607	Poster	Non-Line-of-Sight Estimation of Fast Human Motion with Slow Scanning Imagers
608	Poster	Synchronization of Projective Transformations
609	Poster	Semicalibrated Relative Pose from an Affine Correspondence and Monodepth
610	Poster	GMM-IKRS: Gaussian Mixture Models for Interpretable Keypoint Refinement and Scoring
611	Poster	LRSLAM: Low-rank Representation of Signed Distance Fields in Dense Visual SLAM System
612	Poster	SRPose: Two-view Relative Pose Estimation with Sparse Keypoints
613	Poster	Alignist: CAD-Informed Orientation Distribution Estimation by Fusing Shape and Correspondences
614	Poster	U-COPE: Taking a Further Step to Universal 9D Category-level Object Pose Estimation
615	Poster	EgoPoseFormer: A Simple Baseline for Stereo Egocentric 3D Human Pose Estimation
616	Poster	Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot
617	Poster	Cut out the Middleman: Revisiting Pose-based Gait Recognition
618	Poster	Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection?
619	Poster	EgoPoser: Robust Real-Time Egocentric Pose Estimation from Sparse and Intermittent Observations Everywhere
620	Poster	3D Hand Sequence Recovery from Real Blurry Images and Event Stream
621	Poster	Dense Hand-Object (HO) GraspNet with Full Grasping Taxonomy and Dynamics
622	Poster	Learning Cross-hand Policies of High-DOF Reaching and Grasping
623	Poster	Free-Viewpoint Video of Outdoor Sports Using a Drone
624	Poster	Unsupervised Exposure Correction
625	Poster	Improving Domain Generalization in Self-Supervised Monocular Depth Estimation via Stabilized Adversarial Training
626	Poster	Deep Cost Ray Fusion for Sparse Depth Video Completion
627	Poster	PatchRefiner: Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation
628	Poster	Depth on Demand: Streaming Dense Depth from a Low Frame Rate Active Sensor
629	Poster	UniCal: Unified Neural Sensor Calibration
630	Poster	Multi-modal Crowd Counting via a Broker Modality
631	Poster	OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection
632	Poster	FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection
633	Poster	MARs: Multi-view Attention Regularizations for Patch-based Feature Recognition of Space Terrain
634	Poster	SparseRadNet: Sparse Perception Neural Network on Subsampled Radar Data
635	Poster	UniM2AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving
636	Poster	DeTra: A Unified Model for Object Detection and Trajectory Forecasting
637	Poster	RoScenes: A Large-scale Multi-view 3D Dataset for Roadside Perception
638	Poster	Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting
639	Poster	PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines
640	Poster	Sparse Refinement for Efficient High-Resolution Semantic Segmentation
641	Poster	InsMapper: Exploring Inner-instance Information for Vectorized HD Mapping
642	Poster	PreSight: Enhancing Autonomous Vehicle Perception with City-Scale NeRF Priors
643	Poster	Unified Local-Cloud Decision-Making via Reinforcement Learning
644	Poster	Generative End-to-End Autonomous Driving
645	Poster	MART: MultiscAle Relational Transformer Networks for Multi-agent Trajectory Prediction
646	Poster	Improving Agent Behaviors with RL Fine-tuning for Autonomous Driving
647	Poster	LayeredFlow: A Real-World Benchmark for Non-Lambertian Multi-Layer Optical Flow
648	Poster	Decomposition Betters Tracking Everything Everywhere
649	Poster	Match-Stereo-Videos: Bidirectional Alignment for Consistent Dynamic Stereo Matching
650	Poster	Efficient Learning of Event-based Dense Representation using Hierarchical Memories with Adaptive Update
651	Poster	Towards Real-world Event-guided Low-light Video Enhancement and Deblurring
652	Poster	Understanding Physical Dynamics with Counterfactual World Modeling
653	Poster	Prompting Future Driven Diffusion Model for Hand Motion Prediction
654	Poster	Nymeria: A Massive Collection of Egocentric Multi-modal Human Motion in the Wild
655	Poster	Motion Mamba: Efficient and Long Sequence Motion Generation
656	Poster	TLControl: Trajectory and Language Control for Human Motion Synthesis
657	Poster	ParCo: Part-Coordinating Text-to-Motion Synthesis
658	Poster	BAMM: Bidirectional Autoregressive Motion Model
659	Poster	Pose Guided Fine-Grained Sign Language Video Generation
660	Poster	DreamMover: Leveraging the Prior of Diffusion Models for Image Interpolation with Large Motion
661	Poster	Animate Your Motion: Turning Still Images into Dynamic Videos
662	Poster	V-Trans4Style: Visual Transition Recommendation for Video Production Style Adaptation
663	Poster	DragVideo: Interactive Drag-style Video Editing
664	Poster	StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion
665	Poster	MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing
666	Poster	FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing
667	Poster	Lazy Diffusion Transformer for Interactive Image Editing
668	Poster	WaSt-3D: Wasserstein-2 Distance for Scene-to-Scene Stylization on 3D Gaussians
669	Poster	Layered Rendering Diffusion Model for Controllable Zero-Shot Image Synthesis
670	Poster	Commonly Interesting Images
671	Poster	InstructGIE: Towards Generalizable Image Editing
672	Poster	The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization
673	Poster	CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control & Altering of T2I Models
674	Poster	Zero-shot Text-guided Infinite Image Synthesis with LLM guidance
675	Poster	Improving Text-guided Object Inpainting with Semantic Pre-inpainting
676	Poster	Customized Generation Reimagined: Fidelity and Editability Harmonized
677	Poster	ColorPeel: Color Prompt Learning with Diffusion Models via Color and Shape Disentanglement
678	Poster	ViPer: Visual Personalization of Generative Models via Individual Preference Learning
679	Poster	MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices
680	Poster	MasterWeaver: Taming Editability and Face Identity for Personalized Text-to-Image Generation
681	Poster	Towards Reliable Advertising Image Generation Using Human Feedback
682	Poster	IMMA: Immunizing text-to-image Models against Malicious Adaptation
683	Poster	PreciseControl: Enhancing Text-To-Image Diffusion Models with Fine-Grained Attribute Control
684	Poster	AddMe: Zero-shot Group-photo Synthesis by Inserting People into Scenes
685	Poster	UniProcessor: A Text-induced Unified Low-level Image Processor
686	Poster	Iterative Ensemble Training with Anti-Gradient Control for Mitigating Memorization in Diffusion Models
687	Poster	EBDM: Exemplar-guided Image Translation with Brownian-bridge Diffusion Models
688	Poster	Assessing Sample Quality via the Latent Space of Generative Models
689	Poster	Mixture of Efficient Diffusion Experts Through Automatic Interval and Sub-Network Selection
690	Poster	SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
691	Poster	Efficient Training with Denoised Neural Weights
692	Poster	FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
693	Poster	A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting
694	Poster	Unleashing the Potential of the Semantic Latent Space in Diffusion Models for Image Dehazing
695	Poster	DSMix: Distortion-Induced Saliency Map Based Pre-training for No-Reference Image Quality Assessment
696	Poster	DiffBIR: Toward Blind Image Restoration with Generative Diffusion Prior
697	Poster	Restoring Images in Adverse Weather Conditions via Histogram Transformer
698	Poster	You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation
699	Poster	Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution
700	Poster	Efficient Cascaded Multiscale Adaptive Network for Image Restoration
701	Poster	Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation
702	Poster	Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors
703	Poster	Taming Lookup Tables for Efficient Image Retouching
704	Poster	Quanta Video Restoration
705	Poster	Two-Stage Video Shadow Detection via Temporal-Spatial Adaption
706	Poster	Handling The Non-Smooth Challenge in Tensor SVD: A Multi-Objective Tensor Recovery Framework
707	Poster	Identity-Consistent Diffusion Network for Grading Knee Osteoarthritis Progression in Radiographic Imaging
708	Poster	NePhi: Neural Deformation Fields for Approximately Diffeomorphic Medical Image Registration
709	Poster	Neural Metamorphosis
710	Poster	Online Video Quality Enhancement with Spatial-Temporal Look-up Tables
711	Poster	EAS-SNN: End-to-End Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks
712	Poster	LaWa: Using Latent Space for In-Generation Image Watermarking
713	Poster	PairingNet: A Learning-based Pair-searching and -matching Network for Image Fragments
714	Poster	Delving into Adversarial Robustness on Document Tampering Localization
715	Poster	Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
716	Poster	Forbes: Face Obfuscation Rendering via Backpropagation Refinement Scheme
717	Poster	Prediction Exposes Your Face: Black-box Model Inversion via Prediction Alignment
718	Poster	Generalizable Facial Expression Recognition
719	Poster	Ex2Eg-MAE: A Framework for Adaptation of Exocentric Video Masked Autoencoders for Egocentric Social Role Understanding
720	Poster	MinD-3D: Reconstruct High-quality 3D objects in Human Brain
721	Poster	Pathformer3D: A 3D Scanpath Transformer for 360° Images
722	Poster	Eliminating Warping Shakes for Unsupervised Online Video Stitching
723	Poster	OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework
724	Poster	Semantically Guided Representation Learning For Action Anticipation
725	Poster	SIGMA: Sinkhorn-Guided Masked Video Modeling
726	Poster	Rethinking Image-to-Video Adaptation: An Object-centric Perspective
727	Poster	RICA^2: Rubric-Informed, Calibrated Assessment of Actions
728	Poster	VideoStudio: Generating Consistent-Content and Multi-Scene Videos
729	Poster	Training-free Video Temporal Grounding using Large-scale Pre-trained Models
730	Poster	EA-VTR: Event-Aware Video-Text Retrieval
731	Poster	Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data
732	Poster	FunQA: Towards Surprising Video Comprehension
733	Poster	Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment
734	Poster	Efficient Pre-training for Localized Instruction Generation of Procedural Videos
735	Poster	Learning Trimodal Relation for Audio-Visual Question Answering with Missing Modality
736	Poster	Can Textual Semantics Mitigate Sounding Object Segmentation Preference?
737	Poster	Visual Alignment Pre-training for Sign Language Translation
738	Poster	Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach
739	Poster	Spectral Subsurface Scattering for Material Classification
740	Poster	MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning
741	Poster	MeshVPR: Citywide Visual Place Recognition Using 3D Meshes
742	Poster	Frontier-enhanced Topological Memory with Improved Exploration Awareness for Embodied Visual Navigation
743	Poster	Asynchronous Large Language Model Enhanced Planner for Autonomous Driving
744	Poster	Controllable Navigation Instruction Generation with Chain of Thought Prompting
745	Poster	NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
746	Poster	Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching
747	Poster	INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding
748	Poster	SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
749	Poster	Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
750	Poster	Quality Assured: Rethinking Annotation Strategies in Imaging AI
751	Poster	BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
752	Poster	Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training
753	Poster	A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis
754	Poster	Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training
755	Poster	DEAL: Disentangle and Localize Concept-level Explanations for VLMs
756	Poster	Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models
757	Poster	FineMatch: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
758	Poster	Instruction Tuning-free Visual Token Complement for Multimodal LLMs
759	Poster	IVTP: Instruction-guided Visual Token Pruning for Large Vision-Language Models
760	Poster	LookupViT: Compressing visual information to a limited number of tokens
761	Poster	SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models
762	Poster	Integration of Global and Local Representations for Fine-grained Cross-modal Alignment
763	Poster	Textual-Visual Logic Challenge: Understanding and Reasoning in Text-to-Image Generation
764	Poster	MyVLM: Personalizing VLMs for User-Specific Queries
765	Poster	ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
766	Poster	View Selection for 3D Captioning via Diffusion Ranking
767	Poster	GRiT: A Generative Region-to-text Transformer for Object Understanding
768	Poster	FreestyleRet: Retrieving Images from Style-Diversified Queries
769	Poster	LG-Gaze: Learning Geometry-aware Continuous Prompts for Language-Guided Gaze Estimation
770	Poster	OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction
771	Poster	Three Things We Need to Know About Transferring Stable Diffusion to Visual Dense Prediction Tasks
772	Poster	TAG: Text Prompt Augmentation for Zero-Shot Out-of-Distribution Detection
773	Poster	Centering the Value of Every Modality: Towards Efficient and Resilient Modality-agnostic Semantic Segmentation
774	Poster	Textual Grounding for Open-vocabulary Visual Information Extraction in Layout-diversified Documents
775	Poster	Region-centric Image-Language Pretraining for Open-Vocabulary Detection
776	Poster	Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments
777	Poster	Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding
778	Poster	Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model
779	Poster	Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation
780	Poster	SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
781	Poster	PSALM: Pixelwise Segmentation with Large Multi-modal Model
782	Poster	Grid-Attention: Enhancing Computational Efficiency of Large Vision Models without Fine-Tuning
783	Poster	OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation
784	Poster	On the Viability of Monocular Depth Pre-training for Semantic Segmentation
785	Poster	Rethinking and Improving Visual Prompt Selection for In-Context Learning Segmentation Framework
786	Poster	Open-Vocabulary Camouflaged Object Segmentation
787	Poster	From Pixels to Objects: A Hierarchical Approach for Part and Object Segmentation Using Local and Global Aggregation
788	Poster	3x2: 3D Object Part Segmentation by 2D Semantic Correspondences
789	Poster	Train Till You Drop: Towards Stable and Robust Source-free Unsupervised 3D Domain Adaptation
790	Poster	Make a Strong Teacher with Label Assistance: A Novel Knowledge Distillation Approach for Semantic Segmentation
791	Poster	Mitigating Background Shift in Class-Incremental Semantic Segmentation
792	Poster	LASS3D: Language-Assisted Semi-Supervised 3D Semantic Segmentation with Progressive Unreliable Data Exploitation
793	Poster	Point-supervised Panoptic Segmentation via Estimating Pseudo Labels from Learnable Distance
794	Poster	Diffusion Model for Robust Multi-Sensor Fusion in 3D Object Detection and BEV Segmentation
795	Poster	Zero-shot Object Counting with Good Exemplars
796	Poster	SMILe: Leveraging Submodular Mutual Information For Robust Few-Shot Object Detection
797	Poster	Enhancing Source-Free Domain Adaptive Object Detection with Low-confidence Pseudo Label Distillation
798	Poster	MonoTTA: Fully Test-Time Adaptation for Monocular 3D Object Detection
799	Poster	AugDETR: Improving Multi-scale Learning for Detection Transformer
800	Poster	Urban Waterlogging Detection: A Challenging Benchmark and Large-Small Model Co-Adapter
801	Poster	DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion
802	Poster	PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation
803	Poster	ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image
804	Poster	Attention-Challenging Multiple Instance Learning for Whole Slide Image Classification
805	Poster	GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation
806	Poster	R3D-AD: Reconstruction via Diffusion for 3D Anomaly Detection
807	Poster	Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation
808	Poster	Continuous Memory Representation for Anomaly Detection
809	Poster	Learning Anomalies with Normality Prior for Unsupervised Video Anomaly Detection
810	Poster	Superpixel-informed Implicit Neural Representation for Multi-Dimensional Data
811	Poster	Comprehensive Attribution: Inherently Explainable Vision Model with Feature Detector
812	Poster	Fairness-aware Vision Transformer via Debiased Self-Attention
813	Poster	AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer
814	Poster	LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors
815	Poster	Deep Nets with Subsampling Layers Unwittingly Discard Useful Activations at Test-Time
816	Poster	Modality Translation for Object Detection Adaptation without forgetting prior knowledge
817	Poster	Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition
818	Poster	Scaling Backwards: Minimal Synthetic Pre-training?
819	Poster	EntAugment: Entropy-Driven Adaptive Data Augmentation Framework for Image Classification
820	Poster	Training-Free Model Merging for Multi-target Domain Adaptation
821	Poster	CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning
822	Poster	Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning
823	Poster	Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation
824	Poster	Semantic-guided Robustness Tuning for Few-Shot Transfer Across Extreme Domain Shift
825	Poster	Explain via Any Concept: Concept Bottleneck Model with Open Vocabulary Concepts
826	Poster	Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models
827	Poster	FlowCon: Out-of-Distribution Detection using Flow-based Contrastive Learning
828	Poster	PixOOD: Pixel-Level Out-of-Distribution Detection
829	Poster	Distributionally Robust Loss for Long-Tailed Multi-Label Image Classification
830	Poster	Improving 3D Semi-supervised Learning by Effectively Utilizing All Unlabelled Data
831	Poster	GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition
832	Poster	Generalized Coverage for More Robust Low-Budget Active Learning
833	Poster	Robust Nearest Neighbors for Source-Free Domain Adaptation under Class Distribution Shift
834	Poster	Category Adaptation Meets Projected Distillation in Generalized Continual Category Discovery
835	Poster	CroMo-Mixup: Augmenting Cross-Model Representations for Continual Self-Supervised Learning
836	Poster	Disentangling Masked Autoencoders for Unsupervised Domain Generalization
837	Poster	Class-Incremental Learning with CLIP: Adaptive Representation Adjustment and Parameter Fusion
838	Poster	Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
839	Poster	Information Bottleneck Based Data Correction in Continual Learning
840	Poster	Beyond Prompt Learning: Continual Adapter for Efficient Rehearsal-Free Continual Learning
841	Poster	Markov Knowledge Distillation: Make Nasty Teachers trained by Self-undermining Knowledge Distillation Fully Distillable
842	Poster	FedRA: A Random Allocation Strategy for Federated Tuning to Unleash the Power of Heterogeneous Clients
843	Poster	SkyMask: Attack-agnostic Robust Federated Learning with Fine-grained Learnable Masks
844	Poster	SuperFedNAS: Cost-Efficient Federated Neural Architecture Search for On-Device Inference
845	Poster	Adversarially Robust Distillation by Reducing the Student-Teacher Variance Gap
846	Poster	Uncertainty Calibration with Energy Based Instance-wise Scaling in the Wild Dataset
847	Poster	Preventing Catastrophic Overfitting in Fast Adversarial Training: A Bi-level Optimization Perspective
848	Poster	Catastrophic Overfitting: A Potential Blessing in Disguise
849	Poster	Cocktail Universal Adversarial Attack on Deep Neural Networks
850	Poster	Unveiling Privacy Risks in Stochastic Neural Networks Training: Effective Image Reconstruction from Gradients
851	Poster	Rethinking Data Bias: Dataset Copyright Protection via Embedding Class-wise Hidden Bias
852	Poster	CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing
853	Poster	Parrot Captions Teach CLIP to Spot Text
854	Oral	Parrot Captions Teach CLIP to Spot Text
855	Poster	Towards Model-Agnostic Dataset Condensation by Heterogeneous Models
856	Oral	Towards Model-Agnostic Dataset Condensation by Heterogeneous Models
857	Poster	VETRA: A Dataset for Vehicle Tracking in Aerial Imagery - New Challenges for Multi-Object Tracking
858	Oral	VETRA: A Dataset for Vehicle Tracking in Aerial Imagery - New Challenges for Multi-Object Tracking
859	Poster	Insect Identification in the Wild: The AMI Dataset
860	Oral	Insect Identification in the Wild: The AMI Dataset
861	Poster	Towards Open-ended Visual Quality Comparison
862	Oral	Towards Open-ended Visual Quality Comparison
863	Poster	UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
864	Oral	UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
865	Poster	MarineInst: A Foundation Model for Marine Image Analysis with Instance Visual Description
866	Oral	MarineInst: A Foundation Model for Marine Image Analysis with Instance Visual Description
867	Poster	Adaptive Correspondence Scoring for Unsupervised Medical Image Registration
868	Oral	Adaptive Correspondence Scoring for Unsupervised Medical Image Registration
869	Poster	Revisiting Adaptive Cellular Recognition Under Domain Shifts: A Contextual Correspondence View
870	Oral	Revisiting Adaptive Cellular Recognition Under Domain Shifts: A Contextual Correspondence View
871	Poster	Knowledge-enhanced Visual-Language Pretraining for Computational Pathology
872	Oral	Knowledge-enhanced Visual-Language Pretraining for Computational Pathology
873	Poster	SparseSSP: 3D Subcellular Structure Prediction from Sparse-View Transmitted Light Images
874	Oral	SparseSSP: 3D Subcellular Structure Prediction from Sparse-View Transmitted Light Images
875	Poster	CardiacNet: Learning to Reconstruct Abnormalities for Cardiac Disease Assessment from Echocardiogram Videos
876	Oral	CardiacNet: Learning to Reconstruct Abnormalities for Cardiac Disease Assessment from Echocardiogram Videos
877	Poster	PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
878	Oral	PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
879	Poster	PointLLM: Empowering Large Language Models to Understand Point Clouds
880	Oral	PointLLM: Empowering Large Language Models to Understand Point Clouds
881	Poster	HGL: Hierarchical Geometry Learning for Test-time Adaptation in 3D Point Cloud Segmentation
882	Oral	HGL: Hierarchical Geometry Learning for Test-time Adaptation in 3D Point Cloud Segmentation
883	Poster	Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather
884	Oral	Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather
885	Poster	RISurConv: Rotation Invariant Surface Attention-Augmented Convolutions for 3D Point Cloud Classification and Segmentation
886	Oral	RISurConv: Rotation Invariant Surface Attention-Augmented Convolutions for 3D Point Cloud Classification and Segmentation
887	Poster	RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation
888	Oral	RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation
889	Poster	KeypointDETR: An End-to-End 3D Keypoint Detector
890	Oral	KeypointDETR: An End-to-End 3D Keypoint Detector
891	Poster	All You Need is Your Voice: Emotional Face Representation with Audio Perspective for Emotional Talking Face Generation
892	Poster	TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting
893	Poster	HeadGaS: Real-Time Animatable Head Avatars via 3D Gaussian Splatting
894	Poster	Stable Video Portraits
895	Poster	iHuman: Instant Animatable Digital Humans From Monocular Videos
896	Poster	POCA: Post-training Quantization with Temporal Alignment for Codec Avatars
897	Poster	Towards Image Ambient Lighting Normalization
898	Poster	LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models
899	Poster	Efficient Snapshot Spectral Imaging: Calibration-Free Parallel Structure with Aperture Diffraction Fusion
900	Poster	Physically Plausible Color Correction for Neural Radiance Fields
901	Poster	DecentNeRFs: Decentralized Neural Radiance Fields from Crowdsourced Images
902	Poster	Volumetric Rendering with Baked Quadrature Fields
903	Poster	Depth-guided NeRF Training via Earth Mover’s Distance
904	Poster	RoGUENeRF: A Robust Geometry-Consistent Universal Enhancer for NeRF
905	Poster	Deblurring 3D Gaussian Splatting
906	Poster	Distractor-Free Novel View Synthesis via Exploiting Memorization Effect in Optimization
907	Poster	TriNeRFLet: A Wavelet Based Triplane NeRF Representation
908	Poster	LaRa: Efficient Large-Baseline Radiance Fields
909	Poster	RANRAC: Robust Neural Scene Representations via Random Ray Consensus
910	Poster	SparseCraft: Few-Shot Neural Reconstruction through Stereopsis Guided Geometric Linearization
911	Poster	Learning Representations from Foundation Models for Domain Generalized Stereo Matching
912	Poster	CoR-GS: Sparse-View 3D Gaussian Splatting via Co-Regularization
913	Poster	CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field
914	Poster	SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction
915	Poster	On the Error Analysis of 3D Gaussian Splatting and an Optimal Projection Strategy
916	Poster	Revising Densification in Gaussian Splatting
917	Poster	MesonGS: Post-training Compression of 3D Gaussians via Efficient Attribute Transformation
918	Poster	Topology-Preserving Downsampling of Binary Images
919	Poster	Zero-Shot Multi-Object Scene Completion
920	Poster	PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance
921	Poster	VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models
922	Poster	Analysis-by-Synthesis Transformer for Single-View 3D Reconstruction
923	Poster	Decomposition of Neural Discrete Representations for Large-Scale 3D Mapping
924	Poster	COSMU: Complete 3D human shape from monocular unconstrained images
925	Poster	MeshFeat: Multi-Resolution Features for Neural Fields on Meshes
926	Poster	Real-time 3D-aware Portrait Editing from a Single Image
927	Poster	An Optimization Framework to Enforce Multi-View Consistency for Texturing 3D Meshes
928	Poster	RoomTex: Texturing Compositional Indoor Scenes via Iterative Inpainting
929	Poster	Scene-Conditional 3D Object Stylization and Composition
930	Poster	DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling
931	Poster	BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion
932	Poster	Chains of Diffusion Models
933	Poster	NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation
934	Poster	Learning Neural Deformation Representation for 4D Dynamic Shape Generation
935	Poster	Improving Diffusion Models for Authentic Virtual Try-on in the Wild
936	Poster	Towards High-Quality 3D Motion Transfer with Realistic Apparel Animation
937	Poster	GIVT: Generative Infinite-Vocabulary Transformers
938	Poster	Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians
939	Poster	LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer
940	Poster	ZigMa: A DiT-style Zigzag Mamba Diffusion Model
941	Poster	Deep Diffusion Image Prior for Efficient OOD Adaptation in 3D Inverse Problems
942	Poster	Neural Surface Detection for Unsigned Distance Fields
943	Poster	VF-NeRF: Viewshed Fields for Rigid NeRF Registration
944	Poster	Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration
945	Oral	Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration
946	Poster	Transferable 3D Adversarial Shape Completion using Diffusion Models
947	Poster	Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation
948	Poster	PointRegGPT: Boosting 3D Point Cloud Registration using Generative Point-Cloud Pairs for Training
949	Poster	Progressive Classifier and Feature Extractor Adaptation for Unsupervised Domain Adaptation on Point Clouds
950	Poster	Domain Generalization of 3D Object Detection by Density-Resampling
951	Poster	Heterogeneous Graph Learning for Scene Graph Prediction in 3D Point Clouds
952	Poster	Physics-informed Knowledge Transfer for Underwater Monocular Depth Estimation
953	Poster	Improving 2D Feature Representations by 3D-Aware Fine-Tuning
954	Poster	SpaceJAM: a Lightweight and Regularization-free Method for Fast Joint Alignment of Images
955	Poster	3D Congealing: 3D-Aware Image Alignment in the Wild
956	Poster	Reprojection Errors as Prompts for Efficient Scene Coordinate Regression
957	Poster	Revisiting Calibration of Wide-Angle Radially Symmetric Cameras
958	Poster	RGBD GS-ICP SLAM
959	Poster	FastCAD: Real-Time CAD Retrieval and Alignment from Scans and Videos
960	Poster	GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence
961	Poster	Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation
962	Poster	Rotated Orthographic Projection for Self-Supervised 3D Human Pose Estimation
963	Poster	Diffusion Model is a Good Pose Estimator from 3D RF-Vision
964	Poster	Occlusion Handling in 3D Human Pose Estimation with Perturbed Positional Encoding
965	Poster	Coarse-to-Fine Implicit Representation Learning for 3D Hand-Object Reconstruction from a Single RGB-D Image
966	Poster	3D Reconstruction of Objects in Hands without Real World 3D Supervision
967	Poster	Weakly-Supervised 3D Hand Reconstruction with Knowledge Prior and Uncertainty Guidance
968	Poster	MANIKIN: Biomechanically Accurate Neural Inverse Kinematics for Human Motion Estimation
969	Poster	Local Occupancy-Enhanced Object Grasping with Multiple Triplanar Projection
970	Poster	GraspXL: Generating Grasping Motions for Diverse Objects at Scale
971	Poster	HSR: Holistic 3D Human-Scene Reconstruction from Monocular Videos
972	Poster	Object-Aware NIR-to-Visible Translation
973	Poster	SEDiff: Structure Extraction for Domain Adaptive Depth Estimation via Denoising Diffusion Models
974	Poster	Sparse Beats Dense: Rethinking Supervision in Radar-Camera Depth Completion
975	Poster	Camera Height Doesn't Change: Unsupervised Training for Metric Monocular Road-Scene Depth Estimation
976	Poster	Adapting Fine-Grained Cross-View Localization to Areas without Fine Ground Truth
977	Poster	DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment
978	Oral	DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment
979	Poster	Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection
980	Poster	DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception
981	Poster	LabelDistill: Label-guided Cross-modal Knowledge Distillation for Camera-based 3D Object Detection
982	Poster	Detecting As Labeling: Rethinking LiDAR-camera Fusion in 3D Object Detection
983	Poster	RecurrentBEV: A Long-term Temporal Fusion Framework for Multi-view 3D Detection
984	Poster	JDT3D: Addressing the Gaps in LiDAR-Based Tracking-by-Attention
985	Poster	MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception
986	Poster	UAV First-Person Viewers Are Radiance Field Learners
987	Poster	Caltech Aerial RGB-Thermal Dataset in the Wild
988	Poster	V2X-Real: a Large-Scale Dataset for Vehicle-to-Everything Cooperative Perception
989	Poster	CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction
990	Poster	Revisit Human-Scene Interaction via Space Occupancy
991	Poster	Enhancing Vectorized Map Perception with Historical Rasterized Maps
992	Poster	RoadPainter: Points Are Ideal Navigators for Topology transformER
993	Poster	VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions
994	Poster	DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving
995	Poster	SLEDGE: Synthesizing Driving Environments with Generative Models and Rule-Based Traffic
996	Poster	Self-Supervised Video Desmoking for Laparoscopic Surgery
997	Oral	Self-Supervised Video Desmoking for Laparoscopic Surgery
998	Poster	BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events
999	Poster	LiDAR-Event Stereo Fusion with Hallucinations
1000	Poster	Temporal-Mapping Photography for Event Cameras
1001	Poster	Motion Aware Event Representation-driven Image Deblurring
1002	Poster	Event-Based Motion Magnification
1003	Poster	TimeLens-XL: Real-time Event-based Video Frame Interpolation with Large Motion
1004	Poster	Bidirectional Progressive Transformer for Interaction Intention Anticipation
1005	Poster	Reinforcement Learning via Auxiliary Task Distillation
1006	Poster	COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation
1007	Poster	EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Human Motion Generation
1008	Poster	MotionChain: Conversational Motion Controllers via Multimodal Prompts
1009	Poster	M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models
1010	Poster	SMooDi: Stylized Motion Diffusion Model
1011	Poster	IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation
1012	Poster	PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation
1013	Poster	SAVE: Protagonist Diversification with Structure Agnostic Video Editing
1014	Poster	Kinetic Typography Diffusion Model
1015	Poster	DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency
1016	Poster	StableDrag: Stable Dragging for Point-based Image Editing
1017	Poster	Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing
1018	Poster	Curved Diffusion: A Generative Model With Optical Geometry Control
1019	Poster	Tuning-Free Image Customization with Image and Text Guidance
1020	Poster	StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models
1021	Poster	AID-AppEAL: Automatic Image Dataset and Algorithm for Content Appeal Enhancement and Assessment Labeling
1022	Poster	DreamDiffusion: High-Quality EEG-to-Image Generation with Temporal Masked Signal Modeling and CLIP Alignment
1023	Poster	TP2O: Creative Text Pair-to-Object Generation using Balance Swap-Sampling
1024	Poster	Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
1025	Poster	AccDiffusion: An Accurate Method for Higher-Resolution Image Generation
1026	Poster	The Fabrication of Reality and Fantasy: Scene Generation with LLM-Assisted Prompt Interpretation
1027	Poster	DCDM: Diffusion-Conditioned-Diffusion Model for Scene Text Image Super-Resolution
1028	Poster	MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models
1029	Poster	ComFusion: Enhancing Personalized Generation by Instance-Scene Compositing and Fusion
1030	Poster	PEA-Diffusion: Parameter-Efficient Adapter with Knowledge Distillation in non-English Text-to-Image Generation
1031	Poster	Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models
1032	Poster	Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models
1033	Poster	Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models
1034	Poster	Distilling Diffusion Models into Conditional GANs
1035	Poster	Responsible Visual Editing
1036	Poster	HiEI: A Universal Framework for Generating High-quality Emerging Images from Natural Images
1037	Poster	MagicEraser: Erasing Any Objects via Semantics-Aware Control
1038	Poster	GenQ: Quantization in Low Data Regimes with Generative Synthetic Data
1039	Poster	DiffiT: Diffusion Vision Transformers for Image Generation
1040	Poster	DC-Solver: Improving Predictor-Corrector Diffusion Sampler via Dynamic Compensation
1041	Poster	∞-Brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions
1042	Poster	Unmasking Bias in Diffusion Model Training
1043	Poster	Compensation Sampling for Improved Convergence in Diffusion Models
1044	Poster	Unsupervised Variational Translator for Bridging Image Restoration and High-Level Vision Tasks
1045	Poster	Teaching Tailored to Talent: Adverse Weather Restoration via Prompt Pool and Depth-Anything Constraint
1046	Poster	Dual-Rain: Video Rain Removal using Assertive and Gentle Teachers
1047	Poster	A Comparative Study of Image Restoration Networks for General Backbone Network Design
1048	Poster	OAPT: Offset-Aware Partition Transformer for Double JPEG Artifacts Removal
1049	Poster	Domain-adaptive Video Deblurring via Test-time Blurring
1050	Poster	Kernel Diffusion: An Alternate Approach to Blind Deconvolution
1051	Poster	Enhancing Perceptual Quality in Video Super-Resolution through Temporally-Consistent Detail Synthesis using Diffusion Models
1052	Poster	Kalman-Inspired Feature Propagation for Video Face Super-Resolution
1053	Poster	RealViformer: Investigating Attention for Real-World Video Super-Resolution
1054	Poster	Learning Exhaustive Correlation for Spectral Super-Resolution: Where Spatial-Spectral Attention Meets Linear Dependence
1055	Poster	Zero-Shot Adaptation for Approximate Posterior Sampling of Diffusion Models in Inverse Problems
1056	Poster	Task-Driven Uncertainty Quantification in Inverse Problems via Conformal Prediction
1057	Poster	Rethinking Deep Unrolled Model for Accelerated MRI Reconstruction
1058	Oral	Rethinking Deep Unrolled Model for Accelerated MRI Reconstruction
1059	Poster	Wavelet Convolutions for Large Receptive Fields
1060	Poster	Long-term Temporal Context Gathering for Neural Video Compression
1061	Poster	Implicit Neural Models to Extract Heart Rate from Video
1062	Poster	A Watermark-Conditioned Diffusion Model for IP Protection
1063	Poster	Representing Topological Self-Similarity Using Fractal Feature Maps for Accurate Segmentation of Tubular Structures
1064	Poster	Image Manipulation Detection With Implicit Neural Representation and Limited Supervision
1065	Poster	DIFFender: Diffusion-Based Adversarial Defense against Patch Attacks
1066	Poster	Learning Natural Consistency Representation for Face Forgery Video Detection
1067	Poster	ARoFace: Alignment Robustness to Improve Low-quality Face Recognition
1068	Poster	AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors
1069	Poster	PetFace: A Large-Scale Dataset and Benchmark for Animal Identification
1070	Oral	PetFace: A Large-Scale Dataset and Benchmark for Animal Identification
1071	Poster	Enhancing Cross-Subject fMRI-to-Video Decoding with Global-Local Functional Alignment
1072	Poster	Occlusion-Aware Seamless Segmentation
1073	Poster	Keypoint Promptable Re-Identification
1074	Poster	CoTracker: It is Better to Track Together
1075	Poster	Free Lunch for Gait Recognition: A Novel Relation Descriptor
1076	Poster	S-JEPA: A Joint Embedding Predictive Architecture for Skeletal Action Recognition
1077	Poster	SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition
1078	Poster	Spatio-Temporal Proximity-Aware Dual-Path Model for Panoramic Activity Recognition
1079	Poster	Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition
1080	Poster	Long-Tail Temporal Action Segmentation with Group-wise Temporal Logit Adjustment
1081	Poster	Look Around and Learn: Self-Training Object Detection by Exploration
1082	Poster	Interaction-centric Spatio-Temporal Context Reasoning for Multi-Person Video HOI Recognition
1083	Poster	Self-Supervised Video Copy Localization with Regional Token Representation
1084	Poster	General and Task-Oriented Video Segmentation
1085	Poster	Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation
1086	Poster	Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders
1087	Poster	RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos
1088	Poster	Referring Atomic Video Action Recognition
1089	Poster	Elysium: Exploring Object-level Perception in Videos through Semantic Integration Using MLLMs
1090	Poster	VideoAgent: Long-form Video Understanding with Large Language Model as Agent
1091	Poster	VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models
1092	Poster	AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering
1093	Poster	Learning Video Context as Interleaved Multimodal Sequences
1094	Poster	Multi-Modal Video Dialog State Tracking in the Wild
1095	Poster	Towards Multimodal Sentiment Analysis Debiasing via Bias Purification
1096	Poster	Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion
1097	Poster	Rethinking Normalization Layers for Domain Generalizable Person Re-identification
1098	Poster	Dual-stage Hyperspectral Image Classification Model with Spectral Supertoken
1099	Poster	Learning Representations of Satellite Images From Metadata Supervision
1100	Poster	Get Your Embedding Space in Order: Domain-Adaptive Regression for Forest Monitoring
1101	Poster	Close, But Not There: Boosting Geographic Distance Sensitivity in Visual Place Recognition
1102	Poster	AdaGlimpse: Active Visual Exploration with Arbitrary Glimpse Position and Scale
1103	Poster	QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
1104	Poster	Navigation Instruction Generation with BEV Perception and Large Language Models
1105	Poster	V-IRL: Grounding Virtual Intelligence in Real Life
1106	Poster	M3DBench: Towards Omni 3D Assistant with Interleaved Multi-modal Instructions
1107	Poster	OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web
1108	Poster	Unifying 3D Vision-Language Understanding via Promptable Queries
1109	Poster	UMBRAE: Unified Multimodal Brain Decoding
1110	Poster	BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation
1111	Poster	CoReS: Orchestrating the Dance of Reasoning and Segmentation
1112	Poster	A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment
1113	Poster	Grounding Language Models for Visual Entity Recognition
1114	Poster	Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
1115	Poster	The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?
1116	Poster	AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting
1117	Poster	UniCode: Learning a Unified Codebook for Multimodal Large Language Models
1118	Poster	X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
1119	Poster	EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding
1120	Poster	EDformer: Transformer-Based Event Denoising Across Varied Noise Levels
1121	Poster	Self-Adapting Large Visual-Language Models to Edge Devices across Visual Modalities
1122	Poster	The Hard Positive Truth about Vision-Language Compositionality
1123	Poster	HiFi-Score: Fine-grained Image Description Evaluation with Hierarchical Parsing Graphs
1124	Poster	LLMCO4MR: LLMs-aided Neural Combinatorial Optimization for Ancient Manuscript Restoration from Fragments with Case Studies on Dunhuang
1125	Poster	Language-Image Pre-training with Long Captions
1126	Poster	IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers
1127	Poster	CIC-BART-SSA: Controllable Image Captioning with Structured Semantic Augmentation
1128	Poster	Enhancing Recipe Retrieval with Foundation Models: A Data Augmentation Perspective
1129	Poster	Object-Aware Query Perturbation for Cross-Modal Image-Text Retrieval
1130	Poster	Cascade Prompt Learning for Visual-Language Model Adaptation
1131	Poster	Gaze Target Detection Based on Head-Local-Global Coordination
1132	Poster	Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model
1133	Poster	ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
1134	Poster	Towards Open-Ended Visual Recognition with Large Language Models
1135	Poster	AFreeCA: Annotation-Free Counting for All
1136	Poster	OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
1137	Poster	MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection
1138	Poster	Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding
1139	Poster	SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation
1140	Poster	Removing Rows and Columns of Tokens in Vision Transformer enables Faster Dense Prediction without Retraining
1141	Poster	ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
1142	Poster	Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation
1143	Poster	DIAL: Dense Image-text ALignment for Weakly Supervised Semantic Segmentation
1144	Poster	N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields
1145	Poster	Prioritized Semantic Learning for Zero-shot Instance Navigation
1146	Poster	PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model
1147	Poster	SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance
1148	Poster	Knowledge Transfer with Simulated Inter-Image Erasing for Weakly Supervised Semantic Segmentation
1149	Poster	ProMerge: Prompt and Merge for Unsupervised Instance Segmentation
1150	Poster	Part2Object: Hierarchical Unsupervised 3D Instance Segmentation
1151	Poster	Dual-level Adaptive Self-Labeling for Novel Class Discovery in Point Cloud Segmentation
1152	Poster	Diffusion for Out-of-Distribution Detection on Road Scenes and Beyond
1153	Poster	UniFS: Universal Few-shot Instance Perception with Point Representations
1154	Poster	Crowd-SAM: SAM as a smart annotator for object detection in crowded scenes
1155	Poster	Adaptive Multi-task Learning for Few-shot Object Detection
1156	Poster	FocusDiffuser: Perceiving Local Disparities for Camouflaged Object Detection
1157	Poster	Distilling Knowledge from Large-Scale Image Models for Object Detection
1158	Poster	Revisiting Domain-Adaptive Object Detection in Adverse Weather by the Generation and Composition of High-Quality Pseudo-Labels
1159	Poster	Operational Open-Set Recognition and PostMax Refinement
1160	Poster	InfMAE: A Foundation Model in The Infrared Modality
1161	Poster	AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking
1162	Poster	Domesticating SAM for Breast Ultrasound Image Segmentation via Spatial-frequency Fusion and Uncertainty Correction
1163	Poster	Effective Lymph Nodes Detection in CT Scans Using Location Debiased Query Selection and Contrastive Query Representation in Transformer
1164	Poster	Snuffy: Efficient Whole Slide Image Classifier
1165	Poster	Unified Medical Image Pre-training in Language-Guided Common Semantic Space
1166	Poster	Brain-ID: Learning Contrast-agnostic Anatomical Representations for Brain Imaging
1167	Poster	TIP: Tabular-Image Pre-training for Multimodal Classification with Incomplete Data
1168	Poster	TransFusion -- A Transparency-Based Diffusion Model for Anomaly Detection
1169	Poster	VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation
1170	Poster	Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt
1171	Poster	Interleaving One-Class and Weakly-Supervised Models with Adaptive Thresholding for Unsupervised Video Anomaly Detection
1172	Poster	Asynchronous Bioplausible Neuron for Spiking Neural Networks for Event-Based Vision
1173	Poster	SAIR: Learning Semantic-aware Implicit Representation
1174	Poster	Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning
1175	Poster	Learning with Unmasked Tokens Drives Stronger Vision Learners
1176	Poster	Emerging Property of Masked Token for Effective Pre-training
1177	Poster	Distributed Semantic Segmentation with Efficient Joint Source and Task Decoding
1178	Poster	The Role of Masking for Efficient Supervised Knowledge Distillation of Vision Transformers
1179	Poster	SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning
1180	Poster	Tight and Efficient Upper Bound on Spectral Norm of Convolutional Layers
1181	Poster	FYI: Flip Your Images for Dataset Distillation
1182	Poster	Data-to-Model Distillation: Data-Efficient Learning Framework
1183	Poster	Overcome Modal Bias in Multi-modal Federated Learning via Balanced Modality Selection
1184	Poster	Active Generation for Image Classification
1185	Poster	Contrastive Learning with Synthetic Positives
1186	Poster	Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models
1187	Poster	Robust Calibration of Large Vision-Language Adapters
1188	Poster	Deciphering the Role of Representation Disentanglement: Investigating Compositional Generalization in CLIP Models
1189	Poster	FroSSL: Frobenius Norm Minimization for Efficient Multiview Self-Supervised Learning
1190	Poster	Benchmarking Spurious Bias in Few-Shot Image Classifiers
1191	Poster	An Information Theoretical View for Out-Of-Distribution Detection
1192	Poster	ProSub: Probabilistic Open-Set Semi-Supervised Learning with Subspace-Based Out-of-Distribution Detection
1193	Poster	Adapting to Shifting Correlations with Unlabeled Data Calibration
1194	Poster	Distribution-Aware Robust Learning from Long-Tailed Data with Noisy Labels
1195	Poster	On Pretraining Data Diversity for Self-Supervised Learning
1196	Poster	De-Confusing Pseudo-Labels in Source-Free Domain Adaptation
1197	Poster	Improving Unsupervised Domain Adaptation: A Pseudo-Candidate Set Approach
1198	Poster	Hierarchical Unsupervised Relation Distillation for Source Free Domain Adaptation
1199	Poster	Source-Free Domain-Invariant Performance Prediction
1200	Poster	Learning to Complement and to Defer to Multiple Users
1201	Poster	Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation
1202	Poster	Personalized Federated Domain-Incremental Learning based on Adaptive Knowledge Matching
1203	Poster	Revisiting Supervision for Continual Representation Learning
1204	Poster	Deep Companion Learning: Enhancing Generalization Through Historical Consistency
1205	Poster	Learning Scalable Model Soup on a Single GPU: An Efficient Subspace Training Strategy
1206	Poster	Harmonizing knowledge Transfer in Neural Network with Unified Distillation
1207	Poster	Feature Diversification and Adaptation for Federated Domain Generalization
1208	Poster	PFedEdit: Personalized Federated Learning via Automated Model Editing
1209	Poster	Enhanced Sparsification via Stimulative Training
1210	Poster	Dependency-aware Differentiable Neural Architecture Search
1211	Poster	Layer-Wise Relevance Propagation with Conservation Property for ResNet
1212	Poster	Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning
1213	Poster	Training A Secure Model against Data-Free Model Extraction
1214	Poster	CLIP-Guided Generative Networks for Transferable Targeted Adversarial Attacks
1215	Poster	Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection
1216	Poster	Leveraging Imperfect Restoration for Data Availability Attack
1217	Poster	Veil Privacy on Visual Data: Concealing Privacy for Humans, Unveiling for DNNs
1218	Poster	Augmented Neural Fine-tuning for Efficient Backdoor Purification
1219	Poster	MIGS: Multi-Identity Gaussian Splatting via Tensor Decomposition
1220	Oral	MIGS: Multi-Identity Gaussian Splatting via Tensor Decomposition
1221	Poster	RaFE: Generative Radiance Fields Restoration
1222	Oral	RaFE: Generative Radiance Fields Restoration
1223	Poster	Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration
1224	Oral	Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration
1225	Poster	FisherRF: Active View Selection and Mapping with Radiance Fields using Fisher Information
1226	Oral	FisherRF: Active View Selection and Mapping with Radiance Fields using Fisher Information
1227	Poster	Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields
1228	Oral	Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields
1229	Poster	RPBG: Towards Robust Neural Point-based Graphics in the Wild
1230	Oral	RPBG: Towards Robust Neural Point-based Graphics in the Wild
1231	Poster	MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
1232	Oral	MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
1233	Poster	Learning 3D-aware GANs from Unposed Images with Template Feature Field
1234	Oral	Learning 3D-aware GANs from Unposed Images with Template Feature Field
1235	Poster	Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis
1236	Oral	Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis
1237	Poster	Watch Your Steps: Local Image and Scene Editing by Text Instructions
1238	Oral	Watch Your Steps: Local Image and Scene Editing by Text Instructions
1239	Poster	Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering
1240	Oral	Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering
1241	Poster	Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction
1242	Oral	Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction
1243	Poster	ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model
1244	Oral	ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model
1245	Poster	DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
1246	Oral	DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
1247	Poster	Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation
1248	Oral	Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation
1249	Poster	ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer
1250	Oral	ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer
1251	Poster	Video Editing via Factorized Diffusion Distillation
1252	Oral	Video Editing via Factorized Diffusion Distillation
1253	Poster	Efficient Neural Video Representation with Temporally Coherent Modulation
1254	Oral	Efficient Neural Video Representation with Temporally Coherent Modulation
1255	Poster	SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion
1256	Oral	SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion
1257	Poster	LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning
1258	Oral	LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning
1259	Poster	NeRMo: Learning Implicit Neural Representations for 3D Human Motion Prediction
1260	Oral	NeRMo: Learning Implicit Neural Representations for 3D Human Motion Prediction
1261	Poster	UGG: Unified Generative Grasping
1262	Oral	UGG: Unified Generative Grasping
1263	Poster	LiveHPS++: Robust and Coherent Motion Capture in Dynamic Free Environment
1264	Oral	LiveHPS++: Robust and Coherent Motion Capture in Dynamic Free Environment
1265	Poster	Controllable Human-Object Interaction Synthesis
1266	Oral	Controllable Human-Object Interaction Synthesis
1267	Poster	Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models
1268	Oral	Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models
1269	Poster	Harnessing Text-to-Image Diffusion Models for Category-Agnostic Pose Estimation
1270	Oral	Harnessing Text-to-Image Diffusion Models for Category-Agnostic Pose Estimation
1271	Poster	POET: Prompt Offset Tuning for Continual Human Action Adaptation
1272	Oral	POET: Prompt Offset Tuning for Continual Human Action Adaptation
1273	Poster	NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model
1274	Oral	NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model
1275	Poster	AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild
1276	Oral	AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild
1277	Poster	Sapiens: Foundation for Human Vision Models
1278	Oral	Sapiens: Foundation for Human Vision Models
1279	Poster	KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding
1280	Poster	Modeling and Driving Human Body Soundfields through Acoustic Primitives
1281	Poster	Let the Avatar Talk using Texts without Paired Training Data
1282	Poster	CanonicalFusion: Generating Drivable 3D Human Avatars from Multiple Images
1283	Poster	Relightable Neural Actor with Intrinsic Decomposition and Pose Control
1284	Poster	3R-INN: How to be climate friendly while consuming/delivering videos?
1285	Poster	Unveiling Advanced Frequency Disentanglement Paradigm for Low-Light Image Enhancement
1286	Poster	Intrinsic Single-Image HDR Reconstruction
1287	Poster	Domain Reduction Strategy for Non-Line-of-Sight Imaging
1288	Poster	Boost Your NeRF: A Model-Agnostic Mixture of Experts Framework for High Quality and Efficient Rendering
1289	Poster	Synthesizing Time-varying BRDFs via Latent Space
1290	Poster	Parameterization-driven Neural Surface Reconstruction for Object-oriented Editing in Neural Rendering
1291	Poster	Instant Uncertainty Calibration of NeRFs Using a Meta-Calibrator
1292	Poster	GAURA: Generalizable Approach for Unified Restoration and Rendering of Arbitrary Views
1293	Poster	Content-Aware Radiance Fields: Aligning Model Complexity with Scene Intricacy Through Learned Bitwidth Quantization
1294	Poster	Collaborative Control for Geometry-Conditioned PBR Image Generation
1295	Poster	KFD-NeRF: Rethinking Dynamic NeRF with Kalman Filter
1296	Poster	Weight Conditioning for Smooth Optimization of Neural Networks
1297	Poster	URS-NeRF: Unordered Rolling Shutter Bundle Adjustment for Neural Radiance Fields
1298	Poster	MERLiN: Single-Shot Material Estimation and Relighting for Photometric Stereo
1299	Poster	TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks
1300	Poster	FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting
1301	Poster	Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion
1302	Poster	DoubleTake: Geometry Guided Depth Estimation
1303	Poster	Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal
1304	Poster	SAGS: Structure-Aware 3D Gaussian Splatting
1305	Poster	Compact 3D Scene Representation via Self-Organizing Gaussian Grids
1306	Poster	HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression
1307	Poster	GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction
1308	Poster	Concise Plane Arrangements for Low-Poly Surface and Volume Modelling
1309	Poster	Gaussian Grouping: Segment and Edit Anything in 3D Scenes
1310	Poster	SCP-Diff: Spatial-Categorical Joint Prior for Diffusion Based Semantic Image Synthesis
1311	Poster	STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians
1312	Poster	Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable Repainting
1313	Poster	GOEmbed: Gradient Origin Embeddings for Representation Agnostic 3D Feature Learning
1314	Poster	Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation
1315	Poster	FAMOUS: High-Fidelity Monocular 3D Human Digitization Using View Synthesis
1316	Poster	Retargeting Visual Data with Deformation Fields
1317	Poster	LatentEditor: Text Driven Local Editing of 3D Scenes
1318	Poster	StyleCity: Large-Scale 3D Urban Scenes Stylization
1319	Poster	Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering
1320	Poster	Connecting Consistency Distillation to Score Distillation for Text-to-3D Generation
1321	Poster	DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors
1322	Poster	InterFusion: Text-Driven Generation of 3D Human-Object Interaction
1323	Poster	Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models
1324	Poster	AWOL: Analysis WithOut synthesis using Language
1325	Poster	Improving Virtual Try-On with Garment-focused Diffusion Models
1326	Poster	GarmentCodeData: A Dataset of 3D Made-to-Measure Garments With Sewing Patterns
1327	Poster	Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
1328	Poster	DoughNet: A Visual Predictive Model for Topological Manipulation of Deformable Objects
1329	Poster	Generating 3D House Wireframes with Semantics
1330	Poster	LayoutFlow: Flow Matching for Layout Generation
1331	Poster	Synchronous Diffusion for Unsupervised Smooth Non-Rigid 3D Shape Matching
1332	Poster	Scalar Function Topology Divergence: Comparing Topology of 3D Objects
1333	Poster	DynoSurf: Neural Deformation-based Temporally Consistent Dynamic Surface Reconstruction
1334	Poster	Fast Point Cloud Geometry Compression with Context-based Residual Coding and INR-based Refinement
1335	Poster	FLAT: Flux-aware Imperceptible Adversarial Attacks on 3D Point Clouds
1336	Poster	Frugal 3D Point Cloud Model Training via Progressive Near Point Filtering and Fused Aggregation
1337	Poster	SemReg: Semantics Constrained Point Cloud Registration
1338	Poster	GPSFormer: A Global Perception and Local Structure Fitting-based Transformer for Point Cloud Understanding
1339	Poster	Masked Motion Prediction with Semantic Contrast for Point Cloud Sequence Learning
1340	Poster	RangeLDM: Fast Realistic LiDAR Point Cloud Generation
1341	Poster	Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data
1342	Poster	SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs
1343	Poster	Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization
1344	Poster	Adaptive Annealing for Robust Averaging
1345	Poster	Resolving Scale Ambiguity in Multi-view 3D Reconstruction using Dual-Pixel Sensors
1346	Poster	Consistent 3D Line Mapping
1347	Poster	Robust Incremental Structure-from-Motion with Hybrid Features
1348	Poster	Gravity-aligned Rotation Averaging with Circular Regression
1349	Poster	GeoCalib: Learning Single-image Calibration with Geometric Optimization
1350	Poster	Real-time Holistic Robot Pose Estimation with Unknown States
1351	Poster	Learning Neural Volumetric Pose Features for Camera Localization
1352	Poster	LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation
1353	Poster	SCAPE: A Simple and Strong Category-Agnostic Pose Estimator
1354	Poster	Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estimation
1355	Poster	UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues
1356	Poster	Multi-RoI Human Mesh Recovery with Camera Consistency and Contrastive Losses
1357	Poster	MLPHand: Real Time Multi-View 3D Hand Reconstruction via MLP Modeling
1358	Poster	WorldPose: A World Cup Dataset for Global 3D Human Pose Estimation
1359	Poster	RePOSE: 3D Human Pose Estimation via Spatio-Temporal Depth Relational Consistency
1360	Poster	An Economic Framework for 6-DoF Grasp Detection
1361	Poster	SemGrasp: Semantic Grasp Generation via Language Aligned Discretization
1362	Oral	SemGrasp: Semantic Grasp Generation via Language Aligned Discretization
1363	Poster	FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation
1364	Poster	OGNI-DC: Robust Depth Completion with Optimization-Guided Neural Iterations
1365	Poster	ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion
1366	Poster	Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion
1367	Poster	SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning
1368	Poster	Reinforcement Learning Meets Visual Odometry
1369	Poster	Mahalanobis Distance-based Multi-view Optimal Transport for Multi-view Crowd Localization
1370	Poster	Camera-LiDAR Cross-modality Gait Recognition
1371	Poster	TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Autonomous Driving
1372	Poster	3D Single-object Tracking in Point Clouds with High Temporal Variation
1373	Poster	LISO: Lidar-only Self-Supervised 3D Object Detection
1374	Poster	MonoWAD: Weather-Adaptive Diffusion Model for Robust Monocular 3D Object Detection
1375	Poster	IFTR: An Instance-Level Fusion Transformer for Visual Collaborative Perception
1376	Poster	MUSES: The Multi-Sensor Semantic Perception Dataset for Driving under Uncertainty
1377	Poster	Reliability in Semantic Segmentation: Can We Use Synthetic Data?
1378	Poster	DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control
1379	Poster	Fully Sparse 3D Occupancy Prediction
1380	Poster	EMIE-MAP: Large-Scale Road Surface Reconstruction Based on Explicit Mesh and Implicit Encoding
1381	Poster	Continuity Preserving Online CenterLine Graph Learning
1382	Poster	FipTR: A Simple yet Effective Transformer Framework for Future Instance Prediction in Autonomous Driving
1383	Poster	Think2Drive: Efficient Reinforcement Learning by Thinking with Latent World Model for Autonomous Driving (in CARLA-v2)
1384	Poster	Solving Motion Planning Tasks with a Scalable Generative Model
1385	Poster	Enhanced Motion Forecasting with Visual Relation Reasoning
1386	Poster	OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding
1387	Poster	Event-Aided Time-To-Collision Estimation for Autonomous Driving
1388	Poster	Event-based Mosaicing Bundle Adjustment
1389	Poster	Revisit Event Generation Model: Self-Supervised Learning of Event-to-Video Reconstruction with Implicit Neural Representations
1390	Poster	AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation
1391	Poster	Learning-based Axial Video Motion Magnification
1392	Poster	Motion Keyframe Interpolation for Any Human Skeleton using Point Cloud-based Human Motion Data Homogenisation
1393	Poster	Generating Physically Realistic and Directable Human Motions from Multi-Modal Inputs
1394	Poster	Scalable Group Choreography via Variational Phase Manifold Learning
1395	Poster	FreeMotion: MoCap-Free Human Motion Synthesis with Multimodal Large Language Models
1396	Poster	Plan, Posture and Go: Towards Open-vocabulary Text-to-Motion Generation
1397	Poster	Drag Anything: Motion Control for Anything using Entity Representation
1398	Poster	Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers’ Opinion Scores
1399	Poster	Audio-Synchronized Visual Animation
1400	Oral	Audio-Synchronized Visual Animation
1401	Poster	E.T. the Exceptional Trajectory: Text-to-camera-trajectory generation with character awareness
1402	Poster	MotionDirector: Motion Customization of Text-to-Video Diffusion Models
1403	Oral	MotionDirector: Motion Customization of Text-to-Video Diffusion Models
1404	Poster	SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models
1405	Poster	Object-Centric Diffusion for Efficient Video Editing
1406	Poster	GroupDiff: Diffusion-based Group Portrait Editing
1407	Poster	Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models
1408	Poster	Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing
1409	Poster	Towards compact reversible image representations for neural style transfer
1410	Poster	InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser
1411	Poster	SwapAnything: Enabling Arbitrary Object Swapping in Personalized Image Editing
1412	Poster	When and How do negative prompts take effect?
1413	Poster	SPIRE: Semantic Prompt-Driven Image Restoration
1414	Poster	LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model
1415	Poster	UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models
1416	Poster	Text-Anchored Score Composition: Tackling Condition Misalignment in Text-to-Image Diffusion Models
1417	Poster	Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models
1418	Poster	Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
1419	Poster	Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models
1420	Poster	LogoSticker: Inserting Logos into Diffusion Models for Customized Generation
1421	Poster	Enhancing Diffusion Models with Text-Encoder Reinforcement Learning
1422	Poster	SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher
1423	Poster	EditShield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models
1424	Poster	Implicit Concept Removal of Diffusion Models
1425	Poster	NVS-Adapter: Plug-and-Play Novel View Synthesis from a Single Image
1426	Poster	Global Counterfactual Directions
1427	Poster	Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance
1428	Poster	Pixel-Aware Stable Diffusion for Realistic Image Super-Resolution and Personalized Stylization
1429	Poster	AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation
1430	Poster	Beta-Tuned Timestep Diffusion Model
1431	Poster	Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
1432	Poster	Efficient Diffusion-Driven Corruption Editor for Test-Time Adaptation
1433	Poster	InstructIR: High-Quality Image Restoration Following Human Instructions
1434	Poster	BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion
1435	Poster	Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models
1436	Poster	OneRestore: A Universal Restoration Framework for Composite Degradation
1437	Poster	UCIP: A Universal Framework for Compressed Image Super-Resolution using Dynamic Prompt
1438	Poster	Pairwise Distance Distillation for Unsupervised Real-World Image Super-Resolution
1439	Poster	When Fast Fourier Transform Meets Transformer for Image Restoration
1440	Poster	Dual-Path Adversarial Lifting for Domain Shift Correction in Online Test-time Adaptation
1441	Poster	SuperGaussian: Repurposing Video Models for 3D Super Resolution
1442	Poster	Temporal As a Plugin: Unsupervised Video Denoising with Pre-Trained Image Denoisers
1443	Poster	Toward INT4 Fixed-Point Training via Exploring Quantization Error for Gradients
1444	Poster	Imaging Interiors: An Implicit Solution to Electromagnetic Inverse Scattering Problems
1445	Poster	Learned Rate Control for Frame-Level Adaptive Neural Video Compression via Dynamic Neural Network
1446	Poster	Spike-Temporal Latent Representation for Energy-Efficient Event-to-Video Reconstruction
1447	Poster	Exploring Vulnerabilities in Spiking Neural Networks: Direct Adversarial Attacks on Raw Event Data
1448	Poster	A Secure Image Watermarking Framework with Statistical Guarantees via Adversarial Attacks on Secret Key Networks
1449	Poster	Skeleton Recall Loss for Connectivity Conserving and Resource Efficient Segmentation of Thin Tubular Structures
1450	Poster	Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection
1451	Poster	Bottom-Up Domain Prompt Tuning for Generalized Face Anti-Spoofing
1452	Poster	Real Appearance Modeling for More General Deepfake Detection
1453	Poster	SelfSwapper: Self-Supervised Face Swapping via Shape Agnostic Masked AutoEncoder
1454	Poster	Norface: Improving Facial Expression Analysis by Identity Normalization
1455	Poster	Open-Set Biometrics: Beyond Good Closed-Set Models
1456	Poster	Brain Netflix: Scaling Data to Reconstruct Videos from Brain Signals
1457	Poster	PCF-Lift: Panoptic Lifting by Probabilistic Contrastive Fusion
1458	Poster	Enhancing Tracking Robustness with Auxiliary Adversarial Defense Networks
1459	Poster	SLAck: Semantic, Location, and Appearance Aware Open-Vocabulary Tracking
1460	Poster	Causality-inspired Discriminative Feature Learning in Triple Domains for Gait Recognition
1461	Poster	VSViG: Real-time Video-based Seizure Detection via Skeleton-based Spatiotemporal ViG
1462	Poster	Language-Assisted Skeleton Action Understanding for Skeleton-Based Temporal Action Segmentation
1463	Poster	Elucidating the Hierarchical Nature of Behavior with Masked Autoencoders
1464	Poster	FinePseudo: Improving Pseudo-Labelling through Temporal-Alignablity for Semi-Supervised Fine-Grained Action Recognition
1465	Poster	Bayesian Evidential Deep Learning for Online Action Detection
1466	Poster	Event Camera Data Dense Pre-training
1467	Poster	Unsupervised Moving Object Segmentation with Atmospheric Turbulence
1468	Poster	Beyond MOT: Semantic Multi-Object Tracking
1469	Poster	MRSP: Learn Multi-Representations of Single Primitive for Compositional Zero-Shot Learning
1470	Poster	Optimizing Factorized Encoder Models: Time and Memory Reduction for Scalable and Efficient Action Recognition
1471	Poster	Open Vocabulary Multi-Label Video Classification
1472	Poster	R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
1473	Poster	Leveraging temporal contextualization for video action recognition
1474	Poster	VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
1475	Poster	KDProR: A Knowledge-Decoupling Probabilistic Framework for Video-Text Retrieval
1476	Poster	InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
1477	Poster	HowToCaption: Prompting LLMs to Transform Video Annotations at Scale
1478	Poster	Label-anticipated Event Disentanglement for Audio-Visual Video Parsing
1479	Poster	Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation
1480	Poster	Uncertainty-aware sign language video retrieval with probability distribution modeling
1481	Poster	NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition
1482	Poster	Domain Shifting: A Generalized Solution for Heterogeneous Cross-Modality Person Re-Identification
1483	Poster	HyTAS: A Hyperspectral Image Transformer Architecture Search Benchmark and Analysis
1484	Poster	VLAD-BuFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition
1485	Poster	Embodied Understanding of Driving Scenarios
1486	Poster	Octopus: Embodied Vision-Language Programmer from Environmental Feedback
1487	Poster	Finding Visual Task Vectors
1488	Poster	ControlLLM: Augment Language Models with Tools by Searching on Graphs
1489	Poster	ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities
1490	Poster	Uni3DL: A Unified Model for 3D Vision-Language Understanding
1491	Poster	CrossScore: A Multi-View Approach to Image Evaluation and Scoring
1492	Poster	Compositional Substitutivity of Visual Reasoning for Visual Question Answering
1493	Poster	The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
1494	Poster	X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and its Emergent Cross-modal Reasoning
1495	Poster	ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
1496	Poster	Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation
1497	Poster	Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Models
1498	Poster	MoAI: Mixture of All Intelligence for Large Language and Vision Models
1499	Poster	Training A Small Emotional Vision Language Model for Visual Art Comprehension
1500	Poster	Quantized Prompt for Efficient Generalization of Vision-Language Models
1501	Poster	VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
1502	Poster	Getting it Right: Improving Spatial Consistency in Text-to-Image Models
1503	Poster	MultiGen: Zero-shot Image Generation from Multi-modal Prompts
1504	Poster	Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors
1505	Poster	VeCLIP: Improving CLIP Training via Visual-enriched Captions
1506	Poster	ControlCap: Controllable Region-level Captioning
1507	Poster	Adapt without Forgetting: Distill Proximity from Dual Teachers in Vision-Language Models
1508	Poster	Look Hear: Gaze Prediction for Speech-directed Human Attention
1509	Poster	Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection
1510	Poster	LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models
1511	Poster	Unlocking Attributes' Contribution to Successful Camouflage: A Combined Textual and Visual Analysis Strategy
1512	Poster	Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection
1513	Poster	Multi-Granularity Sparse Relationship Matrix Prediction Network for End-to-End Scene Graph Generation
1514	Poster	Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection
1515	Poster	Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation
1516	Poster	SpatialFormer: Towards Generalizable Vision Transformers with Explicit Spatial Understanding
1517	Poster	LoA-Trans: Enhancing Visual Grounding by Location-Aware Transformers
1518	Poster	SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
1519	Poster	EAFormer: Scene Text Segmentation with Edge-Aware Transformers
1520	Poster	CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentation
1521	Poster	Textual Query-Driven Mask Transformer for Domain Generalized Segmentation
1522	Poster	Attention Decomposition for Cross-Domain Semantic Segmentation
1523	Poster	SegGen: Supercharging Segmentation Models with Text2Mask and Mask2Img Synthesis
1524	Poster	A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting
1525	Poster	MC-PanDA: Mask Confidence for Panoptic Domain Adaptation
1526	Poster	OLAF: A Plug-and-Play Framework for Enhanced Multi-object Multi-part Scene Parsing
1527	Poster	Learning from the Web: Language Drives Weakly-Supervised Incremental Learning for Semantic Segmentation
1528	Poster	Tendency-driven Mutual Exclusivity for Weakly Supervised Incremental Semantic Segmentation
1529	Poster	Cs2K: Class-specific and Class-shared Knowledge Guidance for Incremental Semantic Segmentation
1530	Poster	ItTakesTwo: Leveraging Peer Representations for Semi-supervised LiDAR Semantic Segmentation
1531	Poster	On-the-fly Category Discovery for LiDAR Semantic Segmentation
1532	Poster	CONDA: Condensed Deep Association Learning for Co-Salient Object Detection
1533	Poster	General Geometry-aware Weakly Supervised 3D Object Detection
1534	Poster	CamoTeacher: Dual-Rotation Consistency Learning for Semi-Supervised Camouflaged Object Detection
1535	Poster	MetaAT: Active Testing for Label-Efficient Evaluation of Dense Recognition Tasks
1536	Poster	Simplifying Source-Free Domain Adaptation for Object Detection: Effective Self-Training Strategies and Performance Insights
1537	Poster	Rethinking Features-Fused-Pyramid-Neck for Object Detection
1538	Poster	3D Small Object Detection with Dynamic Spatial Pruning
1539	Poster	Watching it in Dark: A Target-aware Representation Learning Framework for High-Level Vision Tasks in Low Illumination
1540	Poster	Gradient-Aware for Class-Imbalanced Semi-supervised Medical Image Segmentation
1541	Poster	Test-Time Stain Adaptation with Diffusion Models for Histopathology Image Classification
1542	Poster	WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering
1543	Poster	ChEX: Interactive Localization and Region Description in Chest X-rays
1544	Poster	A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization
1545	Poster	Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection
1546	Poster	Random Walk on Pixel Manifolds for Anomaly Segmentation of Complex Driving Scenes
1547	Poster	FedVAD: Enhancing Federated Video Anomaly Detection with GPT-Driven Semantic Distillation
1548	Poster	Efficient Training of Spiking Neural Networks with Multi-Parallel Implicit Stream Architecture
1549	Poster	DECIDER: Leveraging Foundation Model Priors for Improved Model Failure Detection and Explanation
1550	Poster	SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization
1551	Poster	SeiT++: Masked Token Modeling Improves Storage-efficient Training
1552	Poster	AMD: Automatic Multi-step Distillation of Large-scale Vision Models
1553	Poster	Stitched ViTs are Flexible Vision Backbones
1554	Poster	MetaAug: Meta-Data Augmentation for Post-Training Quantization
1555	Poster	Straightforward Layer-wise Pruning for More Efficient Visual Adaptation
1556	Poster	On Learning Discriminative Features from Synthesized Data for Self-Supervised Fine-Grained Visual Recognition
1557	Poster	Robust Multimodal Learning via Representation Decoupling
1558	Poster	SUMix: Mixup with Semantic and Uncertain Information
1559	Poster	Understanding and Mitigating Human-Labelling Errors in Supervised Contrastive Learning
1560	Poster	Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models
1561	Poster	SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning
1562	Poster	Linking in Style: Understanding learned features in deep learning models
1563	Poster	Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort
1564	Poster	Image-Feature Weak-to-Strong Consistency: An Enhanced Paradigm for Semi-Supervised Learning
1565	Poster	Strike a Balance in Continual Panoptic Segmentation
1566	Poster	IGNORE: Information Gap-based False Negative Loss Rejection for Single Positive Multi-Label Learning
1567	Poster	Dual-Decoupling Learning and Metric-Adaptive Thresholding for Semi-Supervised Multi-Label Learning
1568	Poster	Instance-dependent Noisy-label Learning with Graphical Model Based Noise-rate Estimation
1569	Poster	Learning to Distinguish Samples for Generalized Category Discovery
1570	Poster	Is user feedback always informative? Retrieval Latent Defending for Semi-Supervised Domain Adaptation without Source Data
1571	Poster	HVCLIP: High-dimensional Vector in CLIP for Unsupervised Domain Adaptation
1572	Poster	DiffClass: Diffusion-Based Class Incremental Learning
1573	Poster	Direct Distillation between Different Domains
1574	Poster	MemBN: Robust Test-Time Adaptation via Batch Norm with Statistics Memory
1575	Poster	PILoRA: Prototype Guided Incremental LoRA for Federated Class-Incremental Learning
1576	Poster	PromptFusion: Decoupling Stability and Plasticity for Continual Learning
1577	Poster	One-stage Prompt-based Continual Learning
1578	Poster	Is Retain Set All You Need in Machine Unlearning? Restoring Performance of Unlearned Models with Out-Of-Distribution Images
1579	Poster	Idling Neurons, Appropriately Lenient Workload During Fine-tuning Leads to Better Generalization
1580	Poster	How to Train the Teacher Model for Effective Knowledge Distillation
1581	Poster	Local and Global Flatness for Federated Domain Generalization
1582	Poster	Dataset Quantization with Active Learning based Adaptive Sampling
1583	Poster	DεpS: Delayed ε-Shrinking for Faster Once-For-All Training
1584	Poster	Auto-DAS: Automated Proxy Discovery for Training-free Distillation-aware Architecture Search
1585	Poster	On Spectral Properties of Gradient-based Explanation Methods
1586	Poster	Cross-Input Certified Training for Universal Perturbations
1587	Poster	Interpretability-Guided Test-Time Adversarial Defense
1588	Poster	Exploring Guided Sampling of Conditional GANs
1589	Poster	Self-Supervised Representation Learning for Adversarial Attack Detection
1590	Poster	Non-transferable Pruning
1591	Poster	On the Vulnerability of Skip Connections to Model Inversion Attacks
1592	Poster	Clean & Compact: Efficient Data-Free Backdoor Defense with Model Compactness
1593	Poster	Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities
1594	Oral	Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities
1595	Poster	Diffusion Models for Open-Vocabulary Segmentation
1596	Oral	Diffusion Models for Open-Vocabulary Segmentation
1597	Poster	Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation
1598	Oral	Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation
1599	Poster	CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model
1600	Oral	CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model
1601	Poster	Efficient Active Domain Adaptation for Semantic Segmentation by Selecting Information-rich Superpixels
1602	Oral	Efficient Active Domain Adaptation for Semantic Segmentation by Selecting Information-rich Superpixels
1603	Poster	ActionVOS: Actions as Prompts for Video Object Segmentation
1604	Oral	ActionVOS: Actions as Prompts for Video Object Segmentation
1605	Poster	WPS-SAM: Towards Weakly-Supervised Part Segmentation with Foundation Models
1606	Oral	WPS-SAM: Towards Weakly-Supervised Part Segmentation with Foundation Models
1607	Poster	A Geometric Distortion Immunized Deep Watermarking Framework with Robustness Generalizability
1608	Oral	A Geometric Distortion Immunized Deep Watermarking Framework with Robustness Generalizability
1609	Poster	COHO: Context-Sensitive City-Scale Hierarchical Urban Layout Generation
1610	Oral	COHO: Context-Sensitive City-Scale Hierarchical Urban Layout Generation
1611	Poster	Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance
1612	Oral	Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance
1613	Poster	Geospecific View Generation - Geometry-Context Aware High-resolution Ground View Inference from Satellite Views
1614	Oral	Geospecific View Generation - Geometry-Context Aware High-resolution Ground View Inference from Satellite Views
1615	Poster	MaxMI: A Maximal Mutual Information Criterion for Manipulation Concept Discovery
1616	Oral	MaxMI: A Maximal Mutual Information Criterion for Manipulation Concept Discovery
1617	Poster	Faceptor: A Generalist Model for Face Perception
1618	Oral	Faceptor: A Generalist Model for Face Perception
1619	Poster	Exploring the Feature Extraction and Relation Modeling For Light-Weight Transformer Tracking
1620	Oral	Exploring the Feature Extraction and Relation Modeling For Light-Weight Transformer Tracking
1621	Poster	Learning Multimodal Latent Generative Models with Energy-Based Prior
1622	Oral	Learning Multimodal Latent Generative Models with Energy-Based Prior
1623	Poster	Pose-Aware Self-Supervised Learning with Viewpoint Trajectory Regularization
1624	Oral	Pose-Aware Self-Supervised Learning with Viewpoint Trajectory Regularization
1625	Poster	SINDER: Repairing the Singular Defects of DINOv2
1626	Oral	SINDER: Repairing the Singular Defects of DINOv2
1627	Poster	Emergent Visual-Semantic Hierarchies in Image-Text Representations
1628	Oral	Emergent Visual-Semantic Hierarchies in Image-Text Representations
1629	Poster	PiTe: Pixel-Temporal Alignment for Large Video-Language Model
1630	Oral	PiTe: Pixel-Temporal Alignment for Large Video-Language Model
1631	Poster	Decoupling Common and Unique Representations for Multimodal Self-supervised Learning
1632	Oral	Decoupling Common and Unique Representations for Multimodal Self-supervised Learning
1633	Poster	Denoising Vision Transformers
1634	Oral	Denoising Vision Transformers
1635	Poster	Audio-driven Talking Face Generation with Stabilized Synchronization Loss
1636	Poster	ScanTalk: 3D Talking Heads from Unregistered Scans
1637	Poster	Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer
1638	Poster	Fast Registration of Photorealistic Avatars for VR Facial Animation
1639	Poster	MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos
1640	Poster	Learning to Generate Conditional Tri-plane for 3D-aware Expression Controllable Portrait Animation
1641	Poster	Learning to Robustly Reconstruct Dynamic Scenes from Low-light Spike Streams
1642	Poster	Wavelength-Embedding-guided Filter-Array Transformer for Spectral Demosaicing
1643	Poster	Learned HDR Image Compression for Perceptually Optimal Storage and Display
1644	Poster	Learning to Enhance Aperture Phasor Field for Non-Line-of-Sight Imaging
1645	Poster	Leveraging Thermal Modality to Enhance Reconstruction in Low-Light Conditions
1646	Poster	The Sky's the Limit: Relightable Outdoor Scenes via a Sky-pixel Constrained Illumination Prior and Outside-In Visibility
1647	Poster	A Probability-guided Sampler for Neural Implicit Surface Rendering
1648	Poster	REFRAME: Reflective Surface Real-Time Rendering for Mobile Devices
1649	Poster	Dynamic Neural Radiance Field From Defocused Monocular Video
1650	Poster	VersatileGaussian: Real-time Neural Rendering for Versatile Tasks using Gaussian Splatting
1651	Poster	DMiT: Deformable Mipmapped Tri-Plane Representation for Dynamic Scenes
1652	Poster	NeRF-XL: NeRF at Any Scale with Multi-GPU
1653	Poster	G2fR: Frequency Regularization in Grid-based Feature Encoding Neural Radiance Fields
1654	Poster	InfoNorm: Mutual Information Shaping of Normals for Sparse-View Reconstruction
1655	Poster	MirrorGaussian: Reflecting 3D Gaussians for Reconstructing Mirror Reflections
1656	Poster	Disentangled Generation and Aggregation for Robust Radiance Fields
1657	Poster	CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians
1658	Poster	SWAG: Splatting in the Wild images with Appearance-conditioned Gaussians
1659	Poster	Surface Reconstruction for 3D Gaussian Splatting via Local Structural Hints
1660	Poster	Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting
1661	Poster	GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting
1662	Poster	SWinGS: Sliding Windows for Dynamic 3D Gaussian Splatting
1663	Poster	An Adaptive Screen-Space Meshing Approach for Normal Integration
1664	Poster	Fast View Synthesis of Casual Videos with Soup-of-Planes
1665	Poster	4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation
1666	Poster	GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image
1667	Poster	Viewpoint textual inversion: discovering scene representations and 3D view control in 2D diffusion models
1668	Poster	ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance
1669	Poster	LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation
1670	Poster	External Knowledge Enhanced 3D Scene Generation from Sketch
1671	Poster	EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion
1672	Poster	3DEgo: 3D Editing on the Go!
1673	Poster	Learning Pseudo 3D Guidance for View-consistent Texturing with 2D Diffusion
1674	Poster	JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation
1675	Poster	Diverse Text-to-3D Synthesis with Augmented Text Embedding
1676	Poster	SweepNet: Unsupervised Learning Shape Abstraction via Neural Sweepers
1677	Poster	CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches
1678	Poster	Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment
1679	Poster	DiffSurf: A Transformer-based Diffusion Model for Generating and Reconstructing 3D Surfaces in Pose
1680	Poster	Motion-Oriented Compositional Neural Radiance Fields for Monocular Dynamic Human Modeling
1681	Poster	LEIA: Latent View-invariant Embeddings for Implicit 3D Articulation
1682	Poster	Learned Neural Physics Simulation for Articulated 3D Human Pose Reconstruction
1683	Poster	Layout-Corrector: Alleviating Layout Sticking Phenomenon in Discrete Diffusion Model
1684	Poster	Vista3D: unravel the 3d darkside of a single image
1685	Poster	Diff-Reg: Diffusion Model in Doubly Stochastic Matrix Space for Registration Problem
1686	Poster	NICP: Neural ICP for 3D Human Registration at Scale
1687	Poster	PFGS: High Fidelity Point Cloud Rendering via Feature Splatting
1688	Poster	TransCAD: A Hierarchical Transformer for CAD Sequence Inference from Point Clouds
1689	Poster	EINet: Point Cloud Completion via Extrapolation and Interpolation
1690	Poster	DiffPMAE: Diffusion Masked Autoencoders for Point Cloud Reconstruction
1691	Poster	Correspondence-Free SE(3) Point Cloud Registration in RKHS via Unsupervised Equivariant Learning
1692	Poster	CMD: A Cross Mechanism Domain Adaptation Dataset for 3D Object Detection
1693	Poster	Formula-Supervised Visual-Geometric Pre-training
1694	Poster	Canonical Shape Projection is All You Need for 3D Few-shot Class Incremental Learning
1695	Poster	Raising the Ceiling: Conflict-Free Local Feature Matching with Dynamic View Switching
1696	Poster	DGD: Dynamic 3D Gaussians Distillation
1697	Poster	SHIC: Shape-Image Correspondences with no Keypoint Supervision
1698	Poster	LineFit: A Geometric Approach for Fitting Line Segments in Images
1699	Poster	Global Structure-from-Motion Revisited
1700	Poster	Robust Fitting on a Gate Quantum Computer
1701	Oral	Robust Fitting on a Gate Quantum Computer
1702	Poster	The Nerfect Match: Exploring NeRF Features for Visual Localization
1703	Poster	A Cephalometric Landmark Regression Method based on Dual-encoder for High-resolution X-ray Image
1704	Poster	FoundPose: Unseen Object Pose Estimation with Foundation Features
1705	Poster	PoseSOR: Human Pose Can Guide Our Attention
1706	Poster	A Graph-Based Approach for Category-Agnostic Pose Estimation
1707	Poster	3DSA: Multi-View 3D Human Pose Estimation With 3D Space Attention Mechanisms
1708	Poster	HPE-Li: WiFi-enabled Lightweight Dual Selective Kernel Convolution for Human Pose Estimation
1709	Poster	HandDAGT: A Denoising Adaptive Graph Transformer for 3D Hand Pose Estimation
1710	Poster	WHAC: World-grounded Humans and Cameras
1711	Poster	EgoBody3M: Egocentric Body Tracking on a VR Headset using a Diverse Dataset
1712	Poster	3D Human Pose Estimation via Non-Causal Retentive Networks
1713	Poster	Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation
1714	Poster	Rawformer: Unpaired Raw-to-Raw Translation for Learnable Camera ISPs
1715	Poster	R3DS: Reality-linked 3D Scenes for Panoramic Scene Understanding
1716	Poster	Mono-ViFI: A Unified Learning Framework for Self-supervised Single- and Multi-frame Monocular Depth Estimation
1717	Poster	FutureDepth: Learning to Predict the Future Improves Video Depth Estimation
1718	Poster	Möbius Transform for Mitigating Perspective Distortions in Representation Learning
1719	Poster	UL-VIO: Ultra-lightweight Visual-Inertial Odometry with Noise Robust Test-time Adaptation
1720	Poster	DualBEV: Unifying Dual View Transformation with Probabilistic Correspondences
1721	Poster	HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras
1722	Poster	SimPB: A Single Model for 2D and 3D Object Detection from Multiple Cameras
1723	Poster	Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance
1724	Poster	Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection
1725	Poster	LiDAR-based All-weather 3D Object Detection via Prompting and Distilling 4D Radar
1726	Poster	SAMFusion: Sensor-Adaptive Multimodal Fusion for 3D Object Detection in Adverse Weather
1727	Poster	Align before Collaborate: Mitigating Feature Misalignment for Robust Multi-Agent Perception
1728	Oral	Align before Collaborate: Mitigating Feature Misalignment for Robust Multi-Agent Perception
1729	Poster	SkyScenes: A Synthetic Dataset for Aerial Scene Understanding
1730	Poster	DrivingDiffusion: Layout-Guided Multi-View Driving Scenarios Video Generation with Latent Diffusion Model
1731	Poster	UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction
1732	Poster	VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving
1733	Poster	OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving
1734	Poster	Stream Query Denoising for Vectorized HD-Map Construction
1735	Poster	Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention
1736	Poster	Early Anticipation of Driving Maneuvers
1737	Poster	Adaptive Human Trajectory Prediction via Latent Corridors
1738	Poster	Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model
1739	Poster	Probabilistic Weather Forecasting with Deterministic Guidance-based Diffusion Model
1740	Poster	Motion-prior Contrast Maximization for Dense Continuous-Time Motion Estimation
1741	Poster	Temporal Event Stereo via Joint Learning with Stereoscopic Flow
1742	Poster	FARSE-CNN: Fully Asynchronous, Recurrent and Sparse Event-Based CNN
1743	Poster	Event-Adapted Video Super-Resolution
1744	Poster	Diffusion Models as Optimizers for Efficient Planning in Offline RL
1745	Poster	Scene-aware Human Motion Forecasting via Mutual Distance Prediction
1746	Poster	CoMusion: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion
1747	Poster	F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions
1748	Poster	Bridging the Gap Between Human Motion and Action Semantics via Kinematics Phrases
1749	Poster	CoMo: Controllable Motion Generation through Language Guided Pose Code Editing
1750	Poster	Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation
1751	Poster	Co-speech Gesture Video Generation with 3D Human Meshes
1752	Poster	MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
1753	Poster	MEVG: Multi-event Video Generation with Text-to-Video Models
1754	Poster	HARIVO: Harnessing Text-to-Image Models for Video Generation
1755	Poster	WAVE: Warping DDIM Inversion Features for Zero-shot Text-to-Video Editing
1756	Poster	RegionDrag: Fast Region-Based Image Editing with Diffusion Models
1757	Poster	TurboEdit: Real-time text-based disentangled real image editing
1758	Poster	Factorized Diffusion: Perceptual Illusions by Noise Decomposition
1759	Poster	DiffusionPen: Towards Controlling the Style of Handwritten Text Generation
1760	Poster	ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
1761	Poster	Scaling Up Personalized Image Aesthetic Assessment via Task Vector Customization
1762	Poster	FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation
1763	Poster	AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation
1764	Poster	Training-free Composite Scene Generation for Layout-to-Image Synthesis
1765	Poster	Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas
1766	Poster	Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models
1767	Poster	Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation
1768	Poster	OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models
1769	Poster	Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation
1770	Poster	BK-SDM: A Lightweight, Fast, and Cheap Version of Stable Diffusion
1771	Poster	Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models
1772	Poster	MONTAGE: Monitoring Training for Attribution of Generative Diffusion Models
1773	Poster	ProTIP: Probabilistic Robustness Verification on Text-to-Image Diffusion Models against Stochastic Perturbation
1774	Poster	Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning
1775	Poster	To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now
1776	Poster	The Gaussian Discriminant Variational Autoencoder (GdVAE): A Self-Explainable Model with Counterfactual Explanations
1777	Poster	Which Model Generated This Image? A Model-Agnostic Approach for Origin Attribution
1778	Poster	DomainFusion: Generalizing To Unseen Domains with Latent Diffusion Models
1779	Poster	AlignDiff: Aligning Diffusion Models for General Few-Shot Segmentation
1780	Oral	AlignDiff: Aligning Diffusion Models for General Few-Shot Segmentation
1781	Poster	Memory-Efficient Fine-Tuning for Quantized Diffusion Model
1782	Poster	SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow
1783	Poster	HiDiffusion: Unlocking Higher-Resolution Creativity and Efficiency in Pretrained Diffusion Models
1784	Poster	EGIC: Enhanced Low-Bit-Rate Generative Image Compression Guided by Semantic Segmentation
1785	Poster	Diffusion for Natural Image Matting
1786	Poster	Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts
1787	Poster	MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration
1788	Poster	TTT-MIM: Test-Time Training with Masked Image Modeling for Denoising Distribution Shifts
1789	Poster	Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration
1790	Poster	Confidence-Based Iterative Generation for Real-World Image Super-Resolution
1791	Poster	Efficient Frequency-Domain Image Deraining with Contrastive Regularization
1792	Poster	Blind Image Deconvolution by Generative-based Kernel Prior and Initializer via Latent Encoding
1793	Poster	SAFNet: Selective Alignment Fusion Network for Efficient HDR Imaging
1794	Poster	Rethinking Image Super Resolution from Training Data Perspectives
1795	Poster	Accelerating Image Super-Resolution Networks with Pixel-Level Classification
1796	Poster	Overcoming Distribution Mismatch in Quantizing Image Super-Resolution Networks
1797	Poster	Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model
1798	Poster	Uncertainty-Driven Spectral Compressive Imaging with Spatial-Frequency Transformer
1799	Poster	Adaptive Selection of Sampling-Reconstruction in Fourier Compressed Sensing
1800	Poster	Test-time Model Adaptation for Image Reconstruction Using Self-supervised Adaptive Layers
1801	Poster	RadEdit: stress-testing biomedical vision models via diffusion image editing
1802	Poster	Rate-Distortion-Cognition Controllable Versatile Neural Image Compression
1803	Poster	Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design
1804	Poster	Fast Encoding and Decoding for Implicit Video Representation
1805	Poster	Implicit Steganography Beyond the Constraints of Modality
1806	Poster	Certifiably Robust Image Watermark
1807	Poster	DSA: Discriminative Scatter Analysis for Early Smoke Segmentation
1808	Poster	AdaIFL: Adaptive Image Forgery Localization via a Dynamic and Importance-aware Transformer Network
1809	Poster	DiffFAS: Face Anti-Spoofing via Generative Diffusion Models
1810	Poster	Face Reconstruction Transfer Attack as Out-of-Distribution Generalization
1811	Poster	Toward Tiny and High-quality Facial Makeup with Data Amplify Learning
1812	Poster	Facial Affective Behavior Analysis with Instruction Tuning
1813	Poster	VideoClusterNet: Self-Supervised and Adaptive Face Clustering for Videos
1814	Poster	When Do We Not Need Larger Vision Models?
1815	Poster	Open Panoramic Segmentation
1816	Poster	PapMOT: Exploring Adversarial Patch Attack against Multiple Object Tracking
1817	Poster	Self-Supervised Any-Point Tracking by Contrastive Random Walks
1818	Poster	WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing
1819	Poster	Idempotent Unsupervised Representation Learning for Skeleton-Based Action Recognition
1820	Poster	EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding
1821	Poster	Trajectory-aligned Space-time Tokens for Few-shot Action Recognition
1822	Poster	ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos
1823	Poster	Discovering Novel Actions from Open World Egocentric Videos with Object-Grounded Visual Commonsense Reasoning
1824	Poster	OMR: Occlusion-Aware Memory-Based Refinement for Video Lane Detection
1825	Poster	Improving Video Segmentation via Dynamic Anchor Queries
1826	Poster	VISAGE: Video Instance Segmentation with Appearance-Guided Enhancement
1827	Poster	Merlin: Empowering Multimodal LLMs with Foresight Minds
1828	Poster	STSP: Spatial-Temporal Subspace Projection for Video Class-incremental Learning
1829	Poster	UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
1830	Poster	Contextual Correspondence Matters: Bidirectional Graph Matching for Video Summarization
1831	Poster	Weakly-Supervised Spatio-Temporal Video Grounding with Variational Cross-Modal Alignment
1832	Poster	AMEGO: Active Memory from long EGOcentric videos
1833	Poster	Rethinking Weakly-supervised Video Temporal Grounding From a Game Perspective
1834	Poster	TimeCraft: Navigate Weakly-Supervised Temporal Grounded Video Question Answering via Bi-directional Reasoning
1835	Poster	Delving Deep into Engagement Prediction of Short Videos
1836	Poster	LITA: Language Instructed Temporal-Localization Assistant
1837	Poster	CoLeaF: A Contrastive-Collaborative Learning Framework for Weakly Supervised Audio-Visual Video Parsing
1838	Poster	Siamese Vision Transformers are Scalable Audio-visual Learners
1839	Poster	EvSign: Sign Language Recognition and Translation with Streaming Events
1840	Poster	WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding
1841	Poster	Multi-Memory Matching for Unsupervised Visible-Infrared Person Re-Identification
1842	Poster	Masked Angle-Aware Autoencoder for Remote Sensing Images
1843	Poster	Revisit Anything: Visual Place Recognition via Image Segment Retrieval
1844	Poster	Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL
1845	Poster	Reinforcement Learning Friendly Vision-Language Model for Minecraft
1846	Poster	DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control
1847	Poster	See and Think: Embodied Agent in Virtual Environment
1848	Poster	PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation
1849	Poster	HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
1850	Poster	Take A Step Back: Rethinking the Two Stages in Visual Reasoning
1851	Poster	Multi-Task Domain Adaptation for Language Grounding with 3D Objects
1852	Poster	MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
1853	Poster	Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge
1854	Poster	LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
1855	Poster	How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
1856	Poster	MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
1857	Poster	Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory
1858	Poster	Object-Oriented Anchoring and Modal Alignment in Multimodal Learning
1859	Poster	An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding
1860	Poster	Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models
1861	Poster	Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks
1862	Poster	UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding
1863	Poster	ReGround: Improving Textual and Spatial Grounding at No Cost
1864	Poster	Platypus: A Generalized Specialist Model for Reading Text in Various Forms
1865	Poster	Long-CLIP: Unlocking the Long-Text Capability of CLIP
1866	Poster	Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
1867	Poster	RAVE: Residual Vector Embedding for CLIP-Guided Backlit Image Enhancement
1868	Poster	Tokenize Anything via Prompting
1869	Poster	FuseTeacher: Modality-fused Encoders are Strong Vision Supervisors
1870	Poster	De-confounded Gaze Estimation
1871	Poster	GalLop: Learning global and local prompts for vision-language models
1872	Poster	OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection
1873	Poster	CoLA: Conditional Dropout and Language-driven Robust Dual-modal Salient Object Detection
1874	Poster	Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models
1875	Poster	Can OOD Object Detectors Learn from Foundation Models?
1876	Poster	VEON: Vocabulary-Enhanced Occupancy Prediction
1877	Poster	Efficient Vision Transformers with Partial Attention
1878	Poster	SAFARI: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation
1879	Poster	ReMamber: Referring Image Segmentation with Mamba Twister
1880	Poster	Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling
1881	Poster	A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties
1882	Poster	Enriching Information and Preserving Semantic Congruence in Expanding Curvilinear Object Segmentation Datasets
1883	Poster	Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation
1884	Poster	View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields
1885	Poster	Pro2SAM: Mask Prompt to SAM with Grid Points for Weakly Supervised Object Localization
1886	Poster	Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation
1887	Poster	PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
1888	Poster	Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels
1889	Poster	OpenDistill3D: Open-World 3D Instance Segmentation with Unified Self-Distillation for Continual Learning and Unknown Class Discovery
1890	Poster	Weakly Supervised Co-training with Swapping Assignments for Semantic Segmentation
1891	Poster	Beyond Pixels: Semi-Supervised Semantic Segmentation with a Multi-scale Patch-based Multi-Label Classifier
1892	Poster	Bayesian Self-Training for Semi-Supervised 3D Segmentation
1893	Poster	Localization and Expansion: A Decoupled Framework for Point Cloud Few-shot Semantic Segmentation
1894	Poster	CSOT: Cross-Scan Object Transfer for Semi-Supervised LiDAR Object Detection
1895	Poster	Interactive 3D Object Detection with Prompts
1896	Poster	SAM-COD: SAM-guided Unified Framework for Weakly-Supervised Camouflaged Object Detection
1897	Poster	Preventing Catastrophic Forgetting through Memory Networks in Continuous Detection
1898	Poster	Benchmarking Object Detectors with COCO: A New Path Forward
1899	Poster	Frequency-Spatial Entanglement Learning for Camouflaged Object Detection
1900	Poster	GRA: Detecting Oriented Objects through Group-wise Rotating and Attention
1901	Poster	DQ-DETR: DETR with Dynamic Query for Tiny Object Detection
1902	Poster	AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval
1903	Poster	Alternate Diverse Teaching for Semi-supervised Medical Image Segmentation
1904	Poster	Unleashing the Power of Prompt-driven Nucleus Instance Segmentation
1905	Poster	cDP-MIL: Robust Multiple Instance Learning via Cascaded Dirichlet Process
1906	Poster	Pathology-knowledge Enhanced Multi-instance Prompt Learning for Few-shot Whole Slide Image Classification
1907	Poster	Learning with Counterfactual Explanations for Radiology Report Generation
1908	Poster	Improving Medical Multi-modal Contrastive Learning with Expert Annotations
1909	Poster	Few-shot Defect Image Generation based on Consistency Modeling
1910	Poster	Placing Objects in Context via Inpainting for Out-of-distribution Segmentation
1911	Poster	Learning Diffusion Models for Multi-View Anomaly Detection
1912	Poster	Learning Unified Reference Representation for Unsupervised Multi-class Anomaly Detection
1913	Poster	Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models
1914	Poster	Enhancing Optimization Robustness in 1-bit Neural Networks through Stochastic Sign Descent
1915	Poster	Salience-Based Adaptive Masking: Revisiting Token Dynamics for Enhanced Pre-training
1916	Poster	SNP: Structured Neuron-level Pruning to Preserve Attention Scores
1917	Poster	Tiny Models are the Computational Saver for Large Models
1918	Poster	Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning
1919	Poster	Trainable Highly-expressive Activation Functions
1920	Poster	HPFF: Hierarchical Locally Supervised Learning with Patch Feature Fusion
1921	Poster	To Supervise or Not to Supervise: Understanding and Addressing the Key Challenges of Point Cloud Transfer Learning
1922	Poster	SeA: Semantic Adversarial Augmentation for Last Layer Features from Unsupervised Representation Learning
1923	Poster	Linearly Controllable GAN: Unsupervised Feature Categorization and Decomposition for Image Generation and Manipulation
1924	Poster	Diagnosing and Re-learning for Balanced Multimodal Learning
1925	Poster	Visual Prompting via Partial Optimal Transport
1926	Poster	Pseudo-Labelling Should Be Aware of Disguising Channel Activations
1927	Poster	Efficient and Versatile Robust Fine-Tuning of Zero-shot Models
1928	Poster	Unsupervised Representation Learning by Balanced Self Attention Matching
1929	Poster	Optimal Transport of Diverse Unsupervised Tasks for Robust Learning from Noisy Few-Shot Data
1930	Poster	Gradient-based Out-of-Distribution Detection
1931	Poster	SLIM: Spuriousness Mitigation with Minimal Human Annotations
1932	Poster	Modeling Label Correlations with Latent Context for Multi-Label Recognition
1933	Poster	Rebalancing Using Estimated Class Distribution for Imbalanced Semi-Supervised Learning under Class Distribution Mismatch
1934	Poster	Foster Adaptivity and Balance in Learning with Noisy Labels
1935	Poster	Self-Guided Generation of Minority Samples Using Diffusion Models
1936	Poster	Self-Cooperation Knowledge Distillation for Novel Class Discovery
1937	Poster	Non-Exemplar Domain Incremental Learning via Cross-Domain Concept Integration
1938	Poster	Distribution Alignment for Fully Test-Time Adaptation with Dynamic Online Data Streams
1939	Poster	Few-shot Class Incremental Learning with Attention-Aware Self-Adaptive Prompt
1940	Poster	Exemplar-free Continual Representation Learning via Learnable Drift Compensation
1941	Poster	Open-World Dynamic Prompt and Continual Visual Representation Learning
1942	Poster	Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks
1943	Poster	Simple Unsupervised Knowledge Distillation With Space Similarity
1944	Poster	AdaDistill: Adaptive Knowledge Distillation for Deep Face Recognition
1945	Poster	Dataset Growth
1946	Poster	Leveraging Hierarchical Feature Sharing for Efficient Dataset Condensation
1947	Poster	MO-EMT-NAS: Multi-Objective Continuous Transfer of Architectural Knowledge Between Tasks from Different Datasets
1948	Poster	BAFFLE: A Baseline of Backpropagation-Free Federated Learning
1949	Poster	On the Evaluation Consistency of Attribution-based Explanations
1950	Poster	Debiasing surgeon: fantastic weights and how to find them
1951	Poster	Auto-GAS: Automated Proxy Discovery for Training-free Generative Architecture Search
1952	Poster	Improving Adversarial Transferability via Model Alignment
1953	Poster	Learning Differentially Private Diffusion Models via Stochastic Adversarial Distillation
1954	Poster	Improving Robustness to Model Inversion Attacks via Sparse Coding Architectures
1955	Poster	CipherDM: Secure Three-Party Inference for Diffusion Model Sampling
1956	Poster	UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening
1957	Poster	Exact Diffusion Inversion via Bidirectional Integration Approximation
1958	Oral	Exact Diffusion Inversion via Bidirectional Integration Approximation
1959	Poster	ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction
1960	Oral	ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction
1961	Poster	Tackling Structural Hallucination in Image Translation with Local Diffusion
1962	Oral	Tackling Structural Hallucination in Image Translation with Local Diffusion
1963	Poster	Adversarial Diffusion Distillation
1964	Oral	Adversarial Diffusion Distillation
1965	Poster	Pyramid Diffusion for Fine 3D Large Scene Generation
1966	Oral	Pyramid Diffusion for Fine 3D Large Scene Generation
1967	Poster	Controlling the World by Sleight of Hand
1968	Oral	Controlling the World by Sleight of Hand
1969	Poster	Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning
1970	Oral	Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning
1971	Poster	OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model
1972	Oral	OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model
1973	Poster	MAGR: Manifold-Aligned Graph Regularization for Continual Action Quality Assessment
1974	Oral	MAGR: Manifold-Aligned Graph Regularization for Continual Action Quality Assessment
1975	Poster	C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
1976	Oral	C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
1977	Poster	Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos
1978	Oral	Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos
1979	Poster	Towards Neuro-Symbolic Video Understanding
1980	Oral	Towards Neuro-Symbolic Video Understanding
1981	Poster	DEVIAS: Learning Disentangled Video Representations of Action and Scene
1982	Oral	DEVIAS: Learning Disentangled Video Representations of Action and Scene
1983	Poster	Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets
1984	Oral	Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets
1985	Poster	E3M: Zero-Shot Spatio-Temporal Video Grounding with Expectation-Maximization Multimodal Modulation
1986	Oral	E3M: Zero-Shot Spatio-Temporal Video Grounding with Expectation-Maximization Multimodal Modulation
1987	Poster	Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos
1988	Oral	Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos
1989	Poster	LongVLM: Efficient Long Video Understanding via Large Language Models
1990	Oral	LongVLM: Efficient Long Video Understanding via Large Language Models
1991	Poster	Made to Order: Discovering monotonic temporal changes via self-supervised video ordering
1992	Oral	Made to Order: Discovering monotonic temporal changes via self-supervised video ordering
1993	Poster	Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
1994	Oral	Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
1995	Poster	A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars
1996	Oral	A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars
1997	Poster	Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models
1998	Oral	Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models
1999	Poster	Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation
2000	Oral	Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation
2001	Poster	BRAVE: Broadening the visual encoding of vision-language models
2002	Oral	BRAVE: Broadening the visual encoding of vision-language models
2003	Poster	MMBENCH: Is Your Multi-Modal Model an All-around Player?
2004	Oral	MMBENCH: Is Your Multi-Modal Model an All-around Player?
2005	Poster	uCAP: An Unsupervised Prompting Method for Vision-Language Models
2006	Oral	uCAP: An Unsupervised Prompting Method for Vision-Language Models
2007	Poster	HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts
2008	Oral	HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts
2009	Poster	An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
2010	Oral	An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
2011	Poster	GiT: Towards Generalist Vision Transformer through Universal Language Interface
2012	Oral	GiT: Towards Generalist Vision Transformer through Universal Language Interface
2013	Poster	Omniview-Tuning: Boosting Viewpoint Invariance of Vision-Language Pre-training Models
2014	Oral	Omniview-Tuning: Boosting Viewpoint Invariance of Vision-Language Pre-training Models
2015	Poster	Head360: Learning a Parametric 3D Full-Head for Free-View Synthesis in 360°
2016	Poster	Tri^{2}-plane: Thinking Head Avatar via Feature Pyramid
2017	Poster	AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos
2018	Poster	AnimateMe: 4D Facial Expressions via Diffusion Models
2019	Poster	Real-data-driven 2000 FPS Color Video from Mosaicked Chromatic Spikes
2020	Poster	Joint RGB-Spectral Decomposition Model Guided Image Enhancement in Mobile Photography
2021	Poster	Flash-Splat: 3D Reflection Removal with Flash Cues and Gaussian Splats
2022	Poster	Self-Supervised Underwater Caustics Removal and Descattering via Deep Monocular SLAM
2023	Poster	Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis
2024	Poster	Neural Poisson Solver: A Universal and Continuous Framework for Natural Signal Blending
2025	Poster	UniVoxel: Fast Inverse Rendering by Unified Voxelization of Scene Representation
2026	Poster	City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web
2027	Poster	Few-shot NeRF by Adaptive Rendering Loss Regularization
2028	Poster	BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting
2029	Poster	Generalizable Human Gaussians for Sparse View Synthesis
2030	Poster	Invertible Neural Warp for NeRF
2031	Poster	PISR: Polarimetric Neural Implicit Surface Reconstruction for Textureless and Specular Objects
2032	Poster	Improving Neural Surface Reconstruction with Feature Priors from Multi-View Images
2033	Poster	SG-NeRF: Neural Surface Reconstruction with Scene Graph Optimization
2034	Poster	Gaussian in the wild: 3D Gaussian Splatting for Unconstrained Image Collections
2035	Poster	3iGS: Factorised Tensorial Illumination for 3D Gaussian Splatting
2036	Poster	HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban Scenes
2037	Poster	GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering
2038	Poster	EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS
2039	Poster	End-to-End Rate-Distortion Optimized 3D Gaussian Representation
2040	Poster	DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting
2041	Poster	Human Hair Reconstruction with Strand-Aligned 3D Gaussians
2042	Poster	Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting
2043	Poster	Cascade-Zero123: One Image to Highly Consistent 3D with Self-Prompted Nearby Views
2044	Poster	SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer
2045	Poster	MVDiffHD: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction
2046	Poster	DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting
2047	Poster	CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model
2048	Poster	Sketch2Vox: Learning 3D Reconstruction from a Single Monocular Sketch Image
2049	Poster	Lagrangian Hashing for Compressed Neural Field Representations
2050	Poster	GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing
2051	Poster	Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts
2052	Poster	TetraDiffusion: Tetrahedral Diffusion Models for 3D Shape Generation
2053	Poster	TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling
2054	Poster	Learn to Optimize Denoising Scores: A Unified and Improved Diffusion Prior for 3D Generation
2055	Poster	LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis
2056	Poster	Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation
2057	Poster	Synthesizing Environment-Specific People in Photographs
2058	Poster	Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models
2059	Poster	Shapefusion: 3D localized human diffusion models
2060	Poster	Fast Sprite Decomposition from Animated Graphics
2061	Poster	Hierarchical Conditioning of Diffusion Models Using Tree-of-Life for Studying Species Evolution
2062	Poster	WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation
2063	Poster	Dolfin: Diffusion Layout Transformers without Autoencoder
2064	Poster	MSD: A Benchmark Dataset for Floor Plan Generation of Building Complexes
2065	Poster	RoofDiffusion: Constructing Roofs from Severely Corrupted Point Data via Diffusion
2066	Poster	Implicit Filtering for Learning Neural Signed Distance Functions from 3D Point Clouds
2067	Poster	FastPCI: Motion-Structure Guided Fast Point Cloud Frame Interpolation
2068	Poster	T-CorresNet: Template Guided 3D Point Cloud Completion with Correspondence Pooling Query Generation Strategy
2069	Poster	SEED: A Simple and Effective 3D DETR in Point Clouds
2070	Poster	ProtoComp: Diverse Point Cloud Completion with Controllable Prototype
2071	Poster	CloudFixer: Test-Time Adaptation for 3D Point Clouds via Diffusion-Guided Geometric Transformation
2072	Poster	Learning Local Pattern Modularization for Point Cloud Reconstruction from Unseen Classes
2073	Poster	Rethinking LiDAR Domain Generalization: Single Source as Multiple Density Domains
2074	Poster	Multi-modal Relation Distillation for Unified 3D Representation Learning
2075	Poster	NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields
2076	Poster	Single-Photon 3D Imaging with Equi-Depth Photon Histograms
2077	Poster	Power Variable Projection for Initialization-Free Large-Scale Bundle Adjustment
2078	Poster	SelfGeo: Self-supervised and Geodesic-consistent Estimation of Keypoints on Deformable Shapes
2079	Poster	Leveraging scale- and orientation-covariant features for planar motion estimation
2080	Poster	Learn to Memorize and to Forget: A Continual Learning Perspective of Dynamic SLAM
2081	Poster	Bones Can't Be Triangles: Accurate and Efficient Vertebrae Keypoint Estimation through Collaborative Error Revision
2082	Poster	TreeSBA: Tree-Transformer for Self-Supervised Sequential Brick Assembly
2083	Poster	SUP-NeRF: A Streamlined Unification of Pose Estimation and NeRF for Monocular 3D Object Reconstruction
2084	Poster	VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space
2085	Poster	Human Pose Recognition via Occlusion-Preserving Abstract Images
2086	Poster	RT-Pose: A 4D Radar-Tensor based 3D Human Pose Estimation and Localization Benchmark
2087	Poster	6DoF Head Pose Estimation through Explicit Bidirectional Interaction with Face Geometry
2088	Poster	HandDGP: Camera-Space Hand Mesh Prediction with Differentiable Global Positioning
2089	Poster	On the Utility of 3D Hand Poses for Action Recognition
2090	Poster	Multi-Person Pose Forecasting with Individual Interaction Perceptron and Prior Learning
2091	Poster	ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
2092	Poster	Revisit Self-supervision with Local Structure-from-Motion
2093	Poster	AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation
2094	Poster	High-Precision Self-Supervised Monocular Depth Estimation with Rich-Resource Prior
2095	Poster	Weakly-supervised Camera Localization by Ground-to-satellite Image Registration
2096	Poster	Benchmarking the Robustness of Cross-view Geo-localization Models
2097	Poster	Improving Point-based Crowd Counting and Localization Based on Auxiliary Point Guidance
2098	Poster	Learning High-resolution Vector Representation from Multi-Camera Images for 3D Object Detection
2099	Poster	GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection
2100	Poster	Boosting 3D Single Object Tracking with 2D Matching Distillation and 3D Pre-training
2101	Poster	LEROjD: Lidar Extended Radar-Only Object Detection
2102	Poster	Towards Stable 3D Object Detection
2103	Poster	ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers
2104	Poster	EgoPet: Egomotion and Interaction Data from an Animal's Perspective
2105	Poster	WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation
2106	Poster	Beyond the Data Imbalance: Employing the Heterogeneous Datasets for Vehicle Maneuver Prediction
2107	Poster	GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
2108	Poster	ADMap: Anti-disturbance Framework for Vectorized HD Map Construction
2109	Poster	Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction
2110	Poster	CarFormer: Self-Driving with Learned Object-Centric Representations
2111	Poster	DySeT: a Dynamic Masked Self-distillation Approach for Robust Trajectory Prediction
2112	Poster	NeuroNCAP: Photorealistic Closed-loop Safety Testing for Autonomous Driving
2113	Poster	Visual Relationship Transformation
2114	Poster	Local All-Pair Correspondence for Point Tracking
2115	Poster	Un-EVIMO: Unsupervised Event-based Independent Motion Segmentation
2116	Poster	Edge-Guided Fusion and Motion Augmentation for Event-Image Stereo
2117	Poster	Physical-Based Event Camera Simulator
2118	Poster	REDIR: Refocus-free Event-based De-occlusion Image Reconstruction
2119	Poster	Exploiting Dual-Correlation for Multi-frame Time-of-Flight Denoising
2120	Poster	Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation
2121	Poster	DragAPart: Learning a Part-Level Motion Prior for Articulated Objects
2122	Poster	Learning Semantic Latent Directions for Accurate and Controllable Human Motion Prediction
2123	Poster	HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects
2124	Poster	ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions
2125	Poster	Chronologically Accurate Retrieval for Temporal Grounding of Motion-Language Models
2126	Poster	MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model
2127	Poster	Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos
2128	Poster	Self-Supervised Audio-Visual Soundscape Stylization
2129	Poster	TC4D: Trajectory-Conditioned Text-to-4D Generation
2130	Poster	LivePhoto: Real Image Animation with Text-guided Motion Control
2131	Poster	Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models
2132	Poster	Photorealistic Video Generation with Diffusion Models
2133	Poster	High-Fidelity and Transferable NeRF Editing by Frequency Decomposition
2134	Poster	Diffusion-Based Image-to-Image Translation by Noise Correction via Prompt Interpolation
2135	Poster	Editable Image Elements for Controllable Synthesis
2136	Poster	Implicit Style-Content Separation using B-LoRA
2137	Poster	Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression
2138	Poster	EraseDraw: Learning to Insert Objects by Erasing Them from Images
2139	Poster	Text2Place: Affordance-aware Text Guided Human Placement
2140	Poster	ProCreate, Don't Reproduce! Propulsive Energy Diffusion for Creative Generation
2141	Poster	Label-free Neural Semantic Image Synthesis
2142	Poster	Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
2143	Poster	CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
2144	Poster	Context Diffusion: In-Context Aware Image Generation
2145	Poster	An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation
2146	Poster	Stable Preference: Redefining training paradigm of human preference model for Text-to-Image Synthesis
2147	Poster	SpeedUpNet: A Plug-and-Play Adapter Network for Accelerating Text-to-Image Diffusion Models
2148	Poster	Large-scale Reinforcement Learning for Diffusion Models
2149	Poster	Latent Guard: a Safety Framework for Text-to-image Generation
2150	Poster	Arc2Face: A Foundation Model for ID-Consistent Human Faces
2151	Oral	Arc2Face: A Foundation Model for ID-Consistent Human Faces
2152	Poster	GAMMA-FACE: GAussian Mixture Models Amend Diffusion Models for Bias Mitigation in Face Images
2153	Poster	Closed-Loop Unsupervised Representation Disentanglement with $\beta$-VAE Distillation and Diffusion Probabilistic Feedback
2154	Poster	Revisiting Feature Disentanglement Strategy in Diffusion Training and Breaking Conditional Independence Assumption in Sampling
2155	Poster	ByteEdit: Boost, Comply and Accelerate Generative Image Editing
2156	Poster	DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation
2157	Poster	Few-Shot Image Generation by Conditional Relaxing Diffusion Inversion
2158	Poster	Rejection Sampling IMLE: Designing Priors for Better Few-Shot Image Synthesis
2159	Poster	FMBoost: Boosting Latent Diffusion with Flow Matching
2160	Oral	FMBoost: Boosting Latent Diffusion with Flow Matching
2161	Poster	AdaDiff: Accelerating Diffusion Models through Step-Wise Adaptive Computation
2162	Poster	Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
2163	Poster	L-DiffER: Single Image Reflection Removal with Language-based Diffusion Model
2164	Poster	LMT-GP: Combined Latent Mean-Teacher and Gaussian Process for Semi-supervised Low-light Image Enhancement
2165	Poster	Depth-Aware Blind Image Decomposition for Real-World Adverse Weather Recovery
2166	Poster	Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal
2167	Poster	XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution
2168	Poster	AdaDiffSR: Adaptive Region-aware Dynamic acceleration Diffusion Model for Real-World Image Super-Resolution
2169	Poster	Seeing the Unseen: A Frequency Prompt Guided Transformer for Image Restoration
2170	Poster	Rethinking Video Deblurring with Wavelet-Aware Dynamic Transformer and Diffusion Model
2171	Poster	BurstM: Deep Burst Multi-scale SR using Fourier Space with Optical Flow
2172	Poster	DualDn: Dual-domain Denoising via Differentiable ISP
2173	Poster	Hierarchical Separable Video Transformer for Snapshot Compressive Imaging
2174	Poster	Image Compression for Machine and Human Vision With Spatial-Frequency Adaptation
2175	Poster	Functional Transform-Based Low-Rank Tensor Factorization for Multi-Dimensional Data Recovery
2176	Poster	Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems
2177	Oral	Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems
2178	Poster	Imaging with Confidence: Uncertainty Quantification for High-dimensional Undersampled MR Images
2179	Poster	Energy-induced Explicit quantification for Multi-modality MRI fusion
2180	Poster	WeConvene: Learned Image Compression with Wavelet-Domain Convolution and Entropy Model
2181	Poster	Aligning Neuronal Coding of Dynamic Visual Scenes with Foundation Vision Models
2182	Poster	GeometrySticker: Enabling Ownership Claim of Recolorized Neural Radiance Fields
2183	Poster	Rethinking Tree-Ring Watermarking for Enhanced Multi-Key Identification
2184	Poster	Enhancing Tampered Text Detection through Frequency Feature Fusion and Decomposition
2185	Poster	T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models
2186	Poster	Towards Unified Representation of Invariant-Specific Features in Missing Modality Face Anti-Spoofing
2187	Poster	Personalized Privacy Protection Mask Against Unauthorized Facial Recognition
2188	Poster	GRAPE: Generalizable and Robust Multi-view Facial Capture
2189	Poster	Seeing Faces in Things: A Model and Dataset for Pareidolia
2190	Poster	Beyond Viewpoint: Robust 3D Object Recognition under Arbitrary Views through Joint Multi-Part Representation
2191	Poster	An Optimal Control View of LoRA and Binary Controller Design for Vision Transformers
2192	Poster	OneTrack: Demystifying the Conflict Between Detection and Tracking in End-to-End 3D Trackers
2193	Poster	DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video
2194	Poster	Upper-body Hierarchical Graph for Skeleton Based Emotion Recognition in Assistive Driving
2195	Poster	SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders
2196	Poster	Context-Aware Action Recognition: Introducing a Comprehensive Dataset for Behavior Contrast
2197	Poster	Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition
2198	Poster	Semi-Supervised Teacher-Reference-Student Architecture for Action Quality Assessment
2199	Poster	Classification Matters: Improving Video Action Detection with Class-Specific Attention
2200	Oral	Classification Matters: Improving Video Action Detection with Class-Specific Attention
2201	Poster	HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization
2202	Poster	Appearance-based Refinement for Object-Centric Motion Segmentation
2203	Poster	Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation
2204	Poster	Fine-grained Dynamic Network for Generic Event Boundary Detection
2205	Poster	Data Collection-free Masked Video Modeling
2206	Poster	Self-supervised visual learning from interactions with objects
2207	Poster	Efficient Few-Shot Action Recognition via Multi-Level Post-Reasoning
2208	Poster	Sequential Representation Learning via Static-Dynamic Conditional Disentanglement
2209	Poster	Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised Video Semantic Compression
2210	Poster	EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval
2211	Poster	Video Question Answering with Procedural Programs
2212	Poster	ViLA: Efficient Video-Language Alignment for Video Question Answering
2213	Poster	ST-LLM: Large Language Models Are Effective Temporal Learners
2214	Poster	RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos
2215	Poster	Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations
2216	Poster	Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes
2217	Poster	Nonverbal Interaction Detection
2218	Poster	PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer
2219	Poster	Human-in-the-Loop Visual Re-ID for Population Size Estimation
2220	Poster	PreLAR: World Model Pre-training with Learnable Action Representation
2221	Poster	Learning to Build by Building Your Own Instructions
2222	Poster	Situated Instruction Following
2223	Poster	Where am I? Scene Retrieval with Language
2224	Poster	ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
2225	Poster	WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language
2226	Poster	SegPoint: Segment Any Point Cloud via Large Language Model
2227	Poster	Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions
2228	Poster	GRACE: Graph-Based Contextual Debiasing for Fair Visual Question Answering
2229	Poster	LLaVA-UHD: an LMM Perceiving any Aspect Ratio and High-Resolution Images
2230	Poster	BLINK: Multimodal Large Language Models Can See but Not Perceive
2231	Poster	Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
2232	Poster	Teach CLIP to Develop a Number Sense for Ordinal Regression
2233	Poster	Common Sense Reasoning for Deep Fake Detection
2234	Poster	Efficient Inference of Vision Instruction-Following Models with Elastic Cache
2235	Poster	SDPT: Synchronous Dual Prompt Tuning for Fusion-based Visual-Language Pre-trained Models
2236	Poster	Improving Vision and Language Concepts Understanding with Multimodal Counterfactual Samples
2237	Poster	Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models
2238	Poster	CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs
2239	Poster	Evaluating Text-to-Visual Generation with Image-to-Text Generation
2240	Poster	DOCCI: Descriptions of Connected and Contrasting Images
2241	Poster	Removing Distributional Discrepancies in Captions Improves Image-Text Alignment
2242	Poster	LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model
2243	Poster	Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning
2244	Poster	DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism
2245	Poster	Conceptual Codebook Learning for Vision-Language Models
2246	Poster	Do Generalised Classifiers really work on Human Drawn Sketches?
2247	Poster	3DGazeNet: Generalizing Gaze Estimation with Weak Supervision from Synthetic Views
2248	Poster	Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs
2249	Poster	PLOT: Text-based Person Search with Part Slot Attention for Corresponding Part Discovery
2250	Poster	Discovering Unwritten Visual Classifiers with Large Language Models
2251	Poster	DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
2252	Poster	LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction
2253	Poster	Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction
2254	Poster	OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation
2255	Poster	Rotary Position Embedding for Vision Transformer
2256	Poster	Multi-branch Collaborative Learning Network for 3D Visual Grounding
2257	Poster	SILC: Improving Vision Language Pretraining with Self-Distillation
2258	Poster	LiteSAM is Actually what you Need for segment Everything
2259	Poster	TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias
2260	Poster	In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
2261	Poster	CoPT: Unsupervised Domain Adaptive Segmentation using Domain-Agnostic Text Embeddings
2262	Poster	SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation
2263	Poster	Click Prompt Learning with Optimal Transport for Interactive Segmentation
2264	Poster	3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation
2265	Poster	Segment and Recognize Anything at Any Granularity
2266	Poster	SOS: Segment Object System for Open-World Instance Segmentation With Object Priors
2267	Poster	Active Coarse-to-Fine Segmentation of Moveable Parts from Real Images
2268	Poster	Phase Concentration and Shortcut Suppression for Weakly Supervised Semantic Segmentation
2269	Poster	AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation
2270	Poster	Weighting Pseudo-Labels via High-Activation Feature Index Similarity and Object Detection for Semi-Supervised Segmentation
2271	Poster	SAM-guided Graph Cut for 3D Instance Segmentation
2272	Poster	Subspace Prototype Guidance for Mitigating Class Imbalance in Point Cloud Semantic Segmentation
2273	Poster	Diff3DETR: Agent-based Diffusion Model for Semi-supervised 3D Object Detection
2274	Poster	Shifted Autoencoders for Point Annotation Restoration in Object Counting
2275	Poster	Learning Camouflaged Object Detection from Noisy Pseudo Label
2276	Poster	Just a Hint: Point-Supervised Camouflaged Object Detection
2277	Poster	Rectify the Regression Bias in Long-Tailed Object Detection
2278	Poster	PartImageNet++ Dataset: Scaling up Part-based Models for Robust Recognition
2279	Poster	Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning
2280	Poster	Visible and Clear: Finding Tiny Objects in Difference Map
2281	Poster	IRGen: Generative Modeling for Image Retrieval
2282	Poster	I-MedSAM: Implicit Medical Image Segmentation with Segment Anything
2283	Poster	Style-Extracting Diffusion Models for Semi-Supervised Histopathology Segmentation
2284	Poster	Norma: A Noise Robust Memory-Augmented Framework for Whole Slide Image Classification
2285	Poster	GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes
2286	Poster	BugNIST - a Large Volumetric Dataset for Detection under Domain Shift
2287	Poster	AD3: Introducing a score for Anomaly Detection Dataset Difficulty assessment using VIADUCT dataset
2288	Poster	GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection
2289	Poster	Unsupervised, Online and On-The-Fly Anomaly Detection For Non-Stationary Image Distributions
2290	Poster	Cross-Domain Learning for Video Anomaly Detection with Limited Supervision
2291	Poster	Attention Beats Linear for Fast Implicit Neural Representation Generation
2292	Poster	OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks
2293	Poster	ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders
2294	Poster	AttnZero: Efficient Attention Discovery for Vision Transformers
2295	Poster	Isomorphic Pruning for Vision Models
2296	Poster	DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs
2297	Poster	Robustness Tokens: Towards Adversarial Robustness of Transformers
2298	Poster	Contribution-based Low-Rank Adaptation with Pre-training Model for Real Image Restoration
2299	Poster	Neural Spectral Decomposition for Dataset Distillation
2300	Poster	Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models
2301	Poster	Adaptive Multi-head Contrastive Learning
2302	Poster	Unsqueeze [CLS] Bottleneck to Learn Rich Representations
2303	Poster	Improving Zero-Shot Generalization for CLIP with Variational Adapter
2304	Poster	Learning to Obstruct Few-Shot Image Classification over Restricted Classes
2305	Poster	Improving Hyperbolic Representations via Gromov-Wasserstein Regularization
2306	Poster	HyperSpaceX: Radial and Angular Exploration of HyperSpherical Dimensions
2307	Poster	Regulating Model Reliance on Non-Robust Features by Smoothing Input Marginal Density
2308	Poster	SCOD: From Heuristics to Theory
2309	Poster	LNL+K: Enhancing Learning with Noisy Labels Through Noise Source Knowledge Integration
2310	Poster	SCOMatch: Alleviating Overtrusting in Open-set Semi-supervised Learning
2311	Poster	Labeled Data Selection for Category Discovery
2312	Poster	PromptCCD: Learning Gaussian Mixture Prompt Pool for Continual Category Discovery
2313	Poster	Towards Multimodal Open-Set Domain Generalization and Adaptation through Self-supervision
2314	Poster	Forget More to Learn More: Domain-specific Feature Unlearning for Semi-supervised and Unsupervised Domain Adaptation
2315	Poster	CLOSER: Towards Better Representation Learning for Few-Shot Class-Incremental Learning
2316	Poster	Exploring Active Learning in Meta-Learning: Enhancing Context Set Labeling
2317	Poster	MagMax: Leveraging Model Merging for Seamless Continual Learning
2318	Poster	Pick-a-back: Selective Device-to-Device Knowledge Transfer in Federated Continual Learning
2319	Poster	Learning to Unlearn for Robust Machine Unlearning
2320	Poster	UNIC: Universal Classification Models via Multi-teacher Distillation
2321	Poster	Distributed Active Client Selection With Noisy Clients Using Model Association Scores
2322	Poster	Teddy: Efficient Large-Scale Dataset Distillation via Taylor-Approximated Matching
2323	Poster	FedTSA: A Cluster-based Two-Stage Aggregation Method for Model-heterogeneous Federated Learning
2324	Poster	Dynamic Guidance Adversarial Distillation with Enhanced Teacher Knowledge
2325	Poster	Rethinking Fast Adversarial Training: A Splitting Technique To Overcome Catastrophic Overfitting
2326	Poster	A high-quality robust diffusion framework for corrupted dataset
2327	Poster	Similarity of Neural Architectures using Adversarial Attack Transferability
2328	Poster	Not Just Change the Labels, Learn the Features: Watermarking Deep Neural Networks with Multi-View Data
2329	Poster	Resilience of Entropy Model in Distributed Neural Networks
2330	Poster	WBP: Training-time Backdoor Attacks through Hardware-based Weight Bit Poisoning
2331	Poster	Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models
2332	Oral	Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models
2333	Poster	Flatness-aware Sequential Learning Generates Resilient Backdoors
2334	Oral	Flatness-aware Sequential Learning Generates Resilient Backdoors
2335	Poster	Learning a Dynamic Privacy-preserving Camera Robust to Inversion Attacks
2336	Oral	Learning a Dynamic Privacy-preserving Camera Robust to Inversion Attacks
2337	Poster	Adversarial Robustification via Text-to-Image Diffusion Models
2338	Oral	Adversarial Robustification via Text-to-Image Diffusion Models
2339	Poster	Privacy-Preserving Adaptive Re-Identification without Image Transfer
2340	Oral	Privacy-Preserving Adaptive Re-Identification without Image Transfer
2341	Poster	R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model
2342	Oral	R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model
2343	Poster	Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
2344	Oral	Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
2345	Poster	A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks
2346	Oral	A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks
2347	Poster	Spline-based Transformers
2348	Oral	Spline-based Transformers
2349	Poster	Anytime Continual Learning for Open Vocabulary Classification
2350	Oral	Anytime Continual Learning for Open Vocabulary Classification
2351	Poster	Weighted Ensemble Models Are Strong Continual Learners
2352	Oral	Weighted Ensemble Models Are Strong Continual Learners
2353	Poster	COD: Learning Conditional Invariant Representation for Domain Adaptation Regression
2354	Oral	COD: Learning Conditional Invariant Representation for Domain Adaptation Regression
2355	Poster	On the Topology Awareness and Generalization Performance of Graph Neural Networks
2356	Oral	On the Topology Awareness and Generalization Performance of Graph Neural Networks
2357	Poster	Echoes of the Past: Boosting Long-tail Recognition via Reflective Learning
2358	Oral	Echoes of the Past: Boosting Long-tail Recognition via Reflective Learning
2359	Poster	Model Stock: All we need is just a few fine-tuned models
2360	Oral	Model Stock: All we need is just a few fine-tuned models
2361	Poster	A Direct Approach to Viewing Graph Solvability
2362	Oral	A Direct Approach to Viewing Graph Solvability
2363	Poster	ControlNet-XS: Rethinking the Control of Text-to-Image Diffusion Models as Feedback-Control Systems
2364	Oral	ControlNet-XS: Rethinking the Control of Text-to-Image Diffusion Models as Feedback-Control Systems
2365	Poster	A Riemannian Approach for Spatiotemporal Analysis and Generation of 4D Tree-shaped Structures
2366	Oral	A Riemannian Approach for Spatiotemporal Analysis and Generation of 4D Tree-shaped Structures
2367	Poster	Flash Cache: Reducing Bias in Radiance Cache Based Inverse Rendering
2368	Oral	Flash Cache: Reducing Bias in Radiance Cache Based Inverse Rendering
2369	Poster	Shape from Heat Conduction
2370	Oral	Shape from Heat Conduction
2371	Poster	Rasterized Edge Gradients: Handling Discontinuities Differentially
2372	Oral	Rasterized Edge Gradients: Handling Discontinuities Differentially
2373	Poster	Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation
2374	Oral	Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation
2375	Poster	HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution
2376	Oral	HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution
2377	Poster	S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis
2378	Poster	Loc3Diff: Local Diffusion for 3D Human Head Synthesis and Editing
2379	Poster	PAV: Personalized Head Avatar from Unstructured Video Collection
2380	Poster	Instant 3D Human Avatar Generation using Image Diffusion Models
2381	Poster	Expressive Whole-Body 3D Gaussian Avatar
2382	Poster	High-Quality Mesh Blendshape Generation from Face Videos via Neural Inverse Rendering
2383	Poster	Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement
2384	Poster	Image Demoireing in RAW and sRGB Domains
2385	Poster	Multiscale Sliced Wasserstein Distances as Perceptual Color Difference Measures
2386	Poster	Soft Shadow Diffusion (SSD): Physics-inspired Learning for 3D Computational Periscopy
2387	Poster	Single-Mask Inpainting for Voxel-based Neural Radiance Fields
2388	Poster	IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination
2389	Poster	DPA-Net: Structured 3D Abstraction from Sparse Views via Differentiable Primitive Assembly
2390	Poster	NGP-RT: Fusing Multi-Level Hash Features with Lightweight Attention for Real-Time Novel View Synthesis
2391	Poster	CaesarNeRF: Calibrated Semantic Representation for Few-Shot Generalizable Neural Rendering
2392	Poster	2S-ODIS: Two-Stage Omni-Directional Image Synthesis by Geometric Distortion Correction
2393	Poster	Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction
2394	Poster	Deep Polarization Cues for Single-shot Shape and Subsurface Scattering Estimation
2395	Poster	High-Resolution and Few-shot View Synthesis from Asymmetric Dual-lens Inputs
2396	Poster	Surface-Centric Modeling for High-Fidelity Generalizable Neural Surface Reconstruction
2397	Poster	MVPGS: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views
2398	Poster	Dual-Camera Smooth Zoom on Mobile Phones
2399	Poster	6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model
2400	Poster	SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM
2401	Poster	Relightable 3D Gaussians: Realistic Point Cloud Relighting with BRDF Decomposition and Ray Tracing
2402	Poster	Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians
2403	Poster	CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization
2404	Poster	Segmentation-guided Layer-wise Image Vectorization with Gradient Fills
2405	Poster	EpipolarGAN: Omnidirectional Image Synthesis with Explicit Camera Control
2406	Poster	SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views
2407	Poster	GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation
2408	Poster	GenRC: Generative 3D Room Completion from Sparse Image Collections
2409	Poster	Freeview Sketching: View-Aware Fine-Grained Sketch-Based Image Retrieval
2410	Poster	Convex Relaxations for Manifold-Valued Markov Random Fields with Approximation Guarantees
2411	Oral	Convex Relaxations for Manifold-Valued Markov Random Fields with Approximation Guarantees
2412	Poster	DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing
2413	Poster	Language-Driven Physics-Based Scene Synthesis and Editing via Feature Splatting
2414	Poster	GVGEN: Text-to-3D Generation with Volumetric Representation
2415	Poster	VividDreamer: Invariant Score Distillation for Hyper-Realistic Text-to-3D Generation
2416	Poster	DreamReward: Aligning Human Preference in Text-to-3D Generation
2417	Poster	SemanticHuman-HD: High Resolution Semantic disentangled 3D Human Generation
2418	Poster	Disentangled Clothed Avatar Generation from Text Descriptions
2419	Poster	StructLDM: Structured Latent Diffusion for 3D Human Generation
2420	Poster	High-Fidelity Modeling of Generalizable Wrinkle Deformation
2421	Poster	ReLoo: Reconstructing Humans Dressed in Loose Garments from Monocular Video in the Wild
2422	Poster	Hierarchically Structured Neural Bones for Reconstructing Animatable Objects from Casual Videos
2423	Poster	Physics-Based Interaction with 3D Objects via Video Generation
2424	Oral	Physics-Based Interaction with 3D Objects via Video Generation
2425	Poster	Enhancing Plausibility Evaluation for Generated Designs with Denoising Autoencoder
2426	Poster	Tree-D Fusion: Simulation-Ready Tree Dataset from Single Images with Diffusion Priors
2427	Poster	Self-supervised Shape Completion via Involution and Implicit Correspondences
2428	Poster	Self-Training Room Layout via Geometry-aware Ray-casting
2429	Poster	DiffCD: A Symmetric Differentiable Chamfer Distance for Neural Implicit Surface Fitting
2430	Poster	GaussReg: Fast 3D Registration with Gaussian Splatting
2431	Poster	AEDNet: Adaptive Embedding and Multiview-Aware Disentanglement for Point Cloud Completion
2432	Poster	PARE-Net: Position-Aware Rotation-Equivariant Networks for Robust Point Cloud Registration
2433	Poster	ML-SemReg: Boosting Point Cloud Registration with Multi-level Semantic Consistency
2434	Poster	DG-PIC: Domain Generalized Point-In-Context Learning for Point Cloud Understanding
2435	Poster	ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention
2436	Poster	SFPNet: Sparse Focal Point Network for Semantic Segmentation on General LiDAR Point Clouds
2437	Poster	MAD-DR: Map Compression for Visual Localization with Matchness Aware Descriptor Dimension Reduction
2438	Poster	Tensorial template matching for fast cross-correlation with rotations and its application for tomography
2439	Poster	Flowed Time of Flight Radiance Fields
2440	Poster	Zero-Shot Image Feature Consensus with Deep Functional Maps
2441	Poster	RSL-BA: Rolling Shutter Line Bundle Adjustment
2442	Poster	How Far Can a 1-Pixel Camera Go? Solving Vision Tasks using Photoreceptors and Computationally Designed Visual Morphology
2443	Poster	StereoGlue: Joint Feature Matching and Robust Estimation
2444	Poster	Hyperion – A fast, versatile symbolic Gaussian Belief Propagation framework for Continuous-Time SLAM
2445	Poster	Learning Where to Look: Self-supervised Viewpoint Selection for Active Localization using Geometrical Information
2446	Poster	MAP-ADAPT: Real-Time Quality-Adaptive Semantic 3D Maps
2447	Poster	iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning
2448	Poster	PACE: Pose Annotations in Cluttered Environments
2449	Poster	Global-to-Pixel Regression for Human Mesh Recovery
2450	Poster	3D Hand Pose Estimation in Everyday Egocentric Images
2451	Poster	Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects
2452	Poster	AddBiomechanics Dataset: Capturing the Physics of Human Motion at Scale
2453	Poster	Category-level Object Detection, Pose Estimation and Reconstruction from Stereo Images
2454	Poster	Decomposed Vector-Quantized Variational Autoencoder for Human Grasp Generation
2455	Poster	CliffPhys: Camera-based Respiratory Measurement using Clifford Neural Networks
2456	Poster	Domain-Adaptive 2D Human Pose Estimation via Dual Teachers in Extremely Low-Light Conditions
2457	Poster	DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation
2458	Poster	Forecasting Future Videos from Novel Views via Disentangled 3D Scene Representation
2459	Poster	Deep Patch Visual SLAM
2460	Poster	ConGeo: Robust Cross-view Geo-localization across Ground View Variations
2461	Poster	GAReT: Cross-view Video Geolocalization with Adapters and Auto-Regressive Transformers
2462	Poster	SparseLIF: High-Performance Sparse LiDAR-Camera Fusion for 3D Object Detection
2463	Poster	Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression
2464	Poster	Image-to-Lidar Relational Distillation for Autonomous Driving Data
2465	Poster	Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene
2466	Poster	milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing
2467	Poster	Hetecooper: Feature Collaboration Graph for Heterogeneous Collaborative Perception
2468	Poster	LetsMap: Unsupervised Representation Learning for Label-Efficient Semantic BEV Mapping
2469	Poster	Probabilistic Image-Driven Traffic Modeling via Remote Sensing
2470	Poster	Occupancy as Set of Points
2471	Poster	Exploring Reliable Matching with Phase Enhancement for Night-time Semantic Segmentation
2472	Poster	Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction
2473	Poster	Online Vectorized HD Map Construction using Geometry
2474	Poster	OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving
2475	Poster	PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving
2476	Poster	Optimizing Diffusion Models for Joint Trajectory Prediction and Controllable Generation
2477	Poster	Learning to Drive via Asymmetric Self-Play
2478	Poster	Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos
2479	Poster	I Can't Believe It's Not Scene Flow!
2480	Poster	Motion and Structure from Event-based Normal Flow
2481	Poster	Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection
2482	Poster	Towards Robust Event-based Networks for Nighttime via Unpaired Day-to-Night Event Translation
2483	Poster	UniINR: Event-guided Unified Rolling Shutter Correction, Deblurring, and Interpolation
2484	Poster	IAM-VFI : Interpolate Any Motion for Video Frame Interpolation with motion complexity map
2485	Poster	Human Motion Forecasting in Dynamic Domain Shifts: A Homeostatic Continual Test-time Adaptation Framework
2486	Poster	How Video Meetings Change Your Expression
2487	Poster	DIM: Dyadic Interaction Modeling for Social Behavior Generation
2488	Poster	Length-Aware Motion Synthesis via Latent Diffusion
2489	Poster	Towards Open Domain Text-Driven Synthesis of Multi-Person Motions
2490	Poster	FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis
2491	Poster	Spherical World-Locking for Audio-Visual Localization in Egocentric Videos
2492	Poster	Explorative Inbetweening of Time and Space
2493	Poster	TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
2494	Poster	WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models
2495	Poster	Pix2Gif: Motion-Guided Diffusion for GIF Generation
2496	Poster	Factorizing Text-to-Video Generation by Explicit Image Conditioning
2497	Poster	DNI: Dilutional Noise Initialization for Diffusion Video Editing
2498	Poster	DATENeRF: Depth-Aware Text-based Editing of NeRFs
2499	Poster	FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models
2500	Poster	Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
2501	Poster	Using My Artistic Style? You Must Obtain My Authorization
2502	Poster	Learned Image Enhancement via Color Naming
2503	Poster	Region-Native Visual Tokenization
2504	Poster	Improving image synthesis with diffusion-negative sampling
2505	Poster	ST-LDM: A Universal Framework for Text-Grounded Object Generation in Real Images
2506	Poster	SmartControl: Enhancing ControlNet for Handling Rough Visual Conditions
2507	Poster	PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion
2508	Poster	Visual Text Generation in the Wild
2509	Poster	ReCON: Training-Free Acceleration for Text-to-Image Synthesis with Retrieval of Concept Prompt Trajectories
2510	Poster	Idea2Img: Iterative Self-Refinement with GPT-4V for Automatic Image Design and Generation
2511	Poster	TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models
2512	Poster	Navigating Text-to-Image Generative Bias across Indic Languages
2513	Poster	Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning
2514	Poster	MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
2515	Poster	Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion
2516	Poster	LCM-Lookahead for Encoder-based Text-to-Image Personalization
2517	Poster	Robust-Wide: Robust Watermarking against Instruction-driven Image Editing
2518	Poster	COIN-Matting: Confounder Intervention for Image Matting
2519	Poster	Free-ATM: Harnessing Free Attention Masks for Representation Learning on Diffusion-Generated Images
2520	Poster	ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
2521	Poster	Data Augmentation via Latent Diffusion for Saliency Prediction
2522	Poster	Score Distillation Sampling with Learned Manifold Corrective
2523	Poster	Thinking Outside the BBox: Unconstrained Generative Object Compositing
2524	Poster	Learning Quantized Adaptive Conditions for Diffusion Models
2525	Poster	FRDiff : Feature Reuse for Universal Training-free Acceleration of Diffusion Models
2526	Poster	ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
2527	Poster	Lossy Image Compression with Foundation Diffusion Models
2528	Poster	AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion
2529	Poster	QueryCDR: Query-based Controllable Distortion Rectification Network for Fisheye Images
2530	Poster	MetaWeather: Few-Shot Weather-Degraded Image Restoration
2531	Poster	Semi-Supervised Video Desnowing Network via Temporal Decoupling Experts and Distribution-Driven Contrastive Regularization
2532	Poster	Spatially-Variant Degradation Model for Dataset-free Super-resolution
2533	Poster	Towards Architecture-Agnostic Untrained Networks Priors for Image Reconstruction with Frequency Regularization
2534	Poster	Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution
2535	Poster	Contourlet Residual for Prompt Learning Enhanced Infrared Image Super-Resolution
2536	Poster	Image-adaptive 3D Lookup Tables for Real-time Image Enhancement with Bilateral Grids
2537	Poster	Improving Feature Stability during Upsampling -- Spectral Artifacts and the Importance of Spatial Context
2538	Poster	denoiSplit: a method for joint microscopy image splitting and unsupervised denoising
2539	Poster	Region-Aware Sequence-to-Sequence Learning for Hyperspectral Denoising
2540	Poster	CoSIGN: Few-Step Guidance of ConSIstency Model to Solve General INverse Problems
2541	Poster	Plug-and-Play Learned Proximal Trajectory for 3D Sparse-View X-Ray Computed Tomography
2542	Poster	Unsupervised Multi-modal Medical Image Registration via Invertible Translation
2543	Poster	Lost in Translation: Modern Neural Networks Still Struggle With Small Realistic Image Transformations
2544	Poster	ColorMNet: A Memory-based Deep Spatial-Temporal Feature Propagation Network for Video Colorization
2545	Poster	Spiking Wavelet Transformer
2546	Poster	Protecting NeRFs' Copyright via Plug-And-Play Watermarking Base Model
2547	Poster	Finding a needle in a haystack: A Black-Box Approach to Invisible Watermark Detection
2548	Poster	CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching
2549	Poster	Noise-assisted Prompt Learning for Image Forgery Detection and Localization
2550	Poster	TF-FAS: Twofold-Element Fine-Grained Semantic Guidance for Generalizable Face Anti-Spoofing
2551	Poster	Towards Certifiably Robust Face Recognition
2552	Poster	Oulu Remote-photoplethysmography Physical Domain Attacks Database (ORPDAD)
2553	Poster	Bi-TTA: Bidirectional Test-Time Adapter for Remote Physiological Measurement
2554	Poster	Affine steerers for structured keypoint description
2555	Poster	A Framework for Efficient Model Evaluation through Stratification, Sampling, and Estimation
2556	Poster	You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception
2557	Poster	TAPTR: Tracking Any Point with Transformers as Detection
2558	Poster	SPAMming Labels: Efficient Annotations for the Trackers of Tomorrow
2559	Poster	Towards Physical World Backdoor Attacks against Skeleton Action Recognition
2560	Poster	MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion
2561	Poster	Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph
2562	Poster	DyFADet: Dynamic Feature Aggregation for Temporal Action Detection
2563	Poster	Towards Adaptive Pseudo-label Learning for Semi-Supervised Temporal Action Localization
2564	Poster	Two-Stage Active Learning for Efficient Temporal Action Segmentation
2565	Poster	MOD-UV: Learning Mobile Object Detectors from Unlabeled Videos
2566	Poster	PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation
2567	Poster	VP-SAM: Taming Segment Anything Model for Video Polyp Segmentation via Disentanglement and Spatio-temporal Side Network
2568	Poster	PALM: Predicting Actions through Language Models
2569	Poster	ZeroI2V: Zero-Cost Adaptation of Pre-Trained Transformers from Image to Video
2570	Poster	Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
2571	Oral	Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
2572	Poster	VideoMamba: Spatio-Temporal Selective State Space Model
2573	Poster	Text-Guided Video Masked Autoencoder
2574	Poster	Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation
2575	Poster	VISA: Reasoning Video Object Segmentation via Large Language Model
2576	Poster	LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
2577	Poster	BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos
2578	Poster	COM Kitchens: An Unedited Overhead-view Procedural Videos Dataset a Vision-Language Benchmark
2579	Poster	Audio-visual Generalized Zero-shot Learning the Easy Way
2580	Poster	Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
2581	Poster	SignGen: End-to-End Sign Language Video Generation with Latent Diffusion
2582	Poster	TrajPrompt: Aligning Color Trajectory with Vision-Language Representations
2583	Poster	Adaptive High-Frequency Transformer for Diverse Wildlife Re-Identification
2584	Poster	OmniSat: Self-Supervised Modality Fusion for Earth Observation
2585	Poster	Statewide Visual Geolocalization in the Wild
2586	Poster	Pre-trained Visual Dynamics Representations for Efficient Policy Learning
2587	Poster	Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving
2588	Poster	Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts
2589	Poster	ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments
2590	Poster	LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
2591	Poster	R^2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations
2592	Poster	Agent3D-Zero: An Agent for Zero-shot 3D Understanding
2593	Poster	PromptIQA: Boosting the Performance and Generalization for No-Reference Image Quality Assessment via Prompts
2594	Poster	An Explainable Vision Question Answer Model via Diffusion Chain-of-Thought
2595	Poster	Fully Authentic Visual Question Answering Dataset from Online Communities
2596	Poster	SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant
2597	Poster	Learning Chain of Counterfactual Thought for Bias-Robust Vision-Language Reasoning
2598	Poster	BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models
2599	Poster	Paying More Attention to Images: A Training-Free Method for Alleviating Hallucination in LVLMs
2600	Poster	TrojVLM: Backdoor Attack Against Vision Language Models
2601	Poster	Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks
2602	Oral	Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks
2603	Poster	Attention Prompting on Image for Large Vision-Language Models
2604	Poster	LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model
2605	Poster	Generalizing to Unseen Domains via Text-guided Augmentation
2606	Poster	MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation
2607	Poster	TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
2608	Poster	Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval
2609	Poster	Taming CLIP for Fine-grained and Structured Visual Understanding of Museum Exhibits
2610	Poster	Prompting Language-Informed Distribution for Compositional Zero-Shot Learning
2611	Poster	Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following
2612	Poster	FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance
2613	Poster	Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild
2614	Oral	Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild
2615	Poster	Vision-Language Dual-Pattern Matching for Out-of-Distribution Detection
2616	Poster	T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
2617	Poster	Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation
2618	Poster	OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection
2619	Poster	O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation
2620	Poster	APL: Anchor-based Prompt Learning for One-stage Weakly Supervised Referring Expression Comprehension
2621	Poster	GTMS: A Gradient-driven Tree-guided Mask-free Referring Image Segmentation Method
2622	Poster	MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders
2623	Poster	ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation
2624	Poster	MTA-CLIP: Language-Guided Semantic Segmentation with Mask-Text Alignment
2625	Poster	Think before Placement: Common Sense Enhanced Transformer for Object Placement
2626	Poster	Eliminating Feature Ambiguity for Few-Shot Segmentation
2627	Poster	Diffusion-Guided Weakly Supervised Semantic Segmentation
2628	Poster	Cross-Domain Semantic Segmentation on Inconsistent Taxonomy using VLMs
2629	Poster	Better Call SAL: Towards Learning to Segment Anything in Lidar
2630	Poster	MICDrop: Masking Image and Depth Features via Complementary Dropout for Domain-Adaptive Semantic Segmentation
2631	Poster	DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentation
2632	Poster	Background Adaptation with Residual Modeling for Exemplar-Free Class-Incremental Semantic Segmentation
2633	Poster	Towards Reliable Evaluation and Fast Training of Robust Semantic Segmentation Models
2634	Poster	ClusteringSDF: Self-Organized Neural Implicit Surfaces for 3D Decomposition
2635	Poster	Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts
2636	Poster	EcoMatcher: Efficient Clustering Oriented Matcher for Detector-free Image Matching
2637	Poster	Class-Agnostic Object Counting with Text-to-Image Diffusion Model
2638	Poster	Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector
2639	Poster	Co-Student: Collaborating Strong and Weak Students for Sparsely Annotated Object Detection
2640	Poster	Plain-Det: A Plain Multi-Dataset Object Detector
2641	Poster	Multi-scale Cross Distillation for Object Detection in Aerial Images
2642	Poster	PDT Uav Target Detection Dataset for Pests and Diseases Tree
2643	Poster	Region-Adaptive Transform with Segmentation Prior for Image Compression
2644	Poster	FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification
2645	Poster	CC-SAM: Enhancing SAM with Cross-feature Attention and Context for Ultrasound Image Segmentation
2646	Poster	Co-synthesis of Histopathology Nuclei Image-Label Pairs using a Context-Conditioned Joint Diffusion Model
2647	Poster	DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification
2648	Poster	Mew: Multiplexed Immunofluorescence Image Analysis through an Efficient Multiplex Network
2649	Poster	MedRAT: Unpaired Medical Report Generation via Auxiliary Tasks
2650	Poster	An Incremental Unified Framework for Small Defect Inspection
2651	Poster	Dissolving Is Amplifying: Towards Fine-Grained Anomaly Detection
2652	Poster	GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features
2653	Poster	MoEAD: A Parameter-efficient Model for Multi-class Anomaly Detection
2654	Poster	PQ-SAM: Post-training Quantization for Segment Anything Model
2655	Poster	BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation
2656	Poster	ELSE: Efficient Deep Neural Network Inference through Line-based Sparsity Exploration
2657	Poster	FairViT: Fair Vision Transformer via Adaptive Masking
2658	Poster	LPViT: Low-Power Semi-structured Pruning for Vision Transformers
2659	Poster	PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference
2660	Poster	CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs
2661	Poster	Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach
2662	Poster	Characterizing Model Robustness via Natural Input Gradients
2663	Poster	Dropout Mixture Low-Rank Adaptation for Visual Parameters-Efficient Fine-Tuning
2664	Poster	FreeAugment: Data Augmentation Search Across All Degrees of Freedom
2665	Poster	Towards Multi-modal Transformers in Federated Learning
2666	Poster	Plug and Play: A Representation Enhanced Domain Adapter for Collaborative Perception
2667	Poster	GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning
2668	Poster	Soft Prompt Generation for Domain Generalization
2669	Poster	SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision
2670	Poster	Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery
2671	Poster	Deep Online Probability Aggregation Clustering
2672	Poster	Group Testing for Accurate and Efficient Range-Based Near Neighbor Search for Plagiarism Detection
2673	Poster	An accurate detection is not all you need to combat label noise in web-noisy datasets
2674	Poster	Flexible Distribution Alignment: Towards Long-tailed Semi-supervised Learning with Proper Calibration
2675	Poster	ExMatch: Self-guided Exploitation for Semi-Supervised Learning with Scarce Labeled Samples
2676	Poster	Dynamic Data Selection for Efficient SSL via Coarse-to-Fine Refinement
2677	Poster	SelEx: Self-Expertise in Fine-Grained Generalized Category Discovery
2678	Poster	Dynamic Retraining-Updating Mean Teacher for Source-Free Object Detection
2679	Poster	Integrating Markov Blanket Discovery into Causal Representation Learning for Domain Generalization
2680	Poster	Learn from the Learnt: Source-Free Active Domain Adaptation via Contrastive Sampling and Visual Persistence
2681	Poster	On the Approximation Risk of Few-Shot Class-Incremental Learning
2682	Poster	STAMP: Outlier-Aware Test-Time Adaptation with Stable Memory Replay
2683	Poster	RCS-Prompt: Learning Prompt to Rearrange Class Space for Prompt-based Continual Learning
2684	Poster	CLEO: Continual Learning of Evolving Ontologies
2685	Poster	Learning Representation for Multitask Learning through Self-Supervised Auxiliary Learning
2686	Poster	Improving Knowledge Distillation via Regularizing Feature Direction and Norm
2687	Oral	Improving Knowledge Distillation via Regularizing Feature Direction and Norm
2688	Poster	MTKD: Multi-Teacher Knowledge Distillation for Image Super-Resolution
2689	Poster	Federated Learning with Local Openset Noisy Labels
2690	Poster	Unlocking the Potential of Federated Learning: The Symphony of Dataset Distillation via Deep Generative Latents
2691	Poster	FedHARM: Harmonizing Model Architectural Diversity in Federated Learning
2692	Poster	Causal Subgraphs and Information Bottlenecks: Redefining OOD Robustness in Graph Neural Networks
2693	Poster	Scissorhands: Scrub Data Influence via Connection Sensitivity in Networks
2694	Poster	Shedding More Light on Robust Classifiers under the lens of Energy-based Models
2695	Poster	Inter-Class Topology Alignment for Efficient Black-Box Substitute Attacks
2696	Poster	AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models
2697	Poster	FedHide: Federated Learning by Hiding in the Neighbors
2698	Poster	SIMBA: Split Inference - Mechanisms, Benchmarks and Attacks
2699	Poster	Data Poisoning Quantization Backdoor Attack
2700	Poster	Event Trojan: Asynchronous Event-based Backdoor Attacks
2701	Keynote	Is distribution shift still an AI problem?
2702	Keynote	Fair, transparent, and accountable AI: What is legally required, what is ethically desired, and what is technically feasible?
2703	Keynote	Synthesia: From computer vision research to real-world AI avatars
2704	Oral Session	Oral 6A: Generative Models II

wangjiyuan9/ECCV2024-Full-PaperList
