From bc21e853f26136189cc94ae63e09406ca3b70a5c Mon Sep 17 00:00:00 2001
From: yyyyzy1 <1191867281@qq.com>
Date: Thu, 1 Aug 2024 16:53:14 +0800
Subject: [PATCH] 3.0

---
 index.html | 92 ++++++++++++++++++++++++++----------------------------
 1 file changed, 45 insertions(+), 47 deletions(-)

diff --git a/index.html b/index.html
index a46dee3..f2851e4 100644
--- a/index.html
+++ b/index.html
@@ -2,25 +2,25 @@
-          SDHD-G: Semantic Dual-Hierarchical Dynamic 3D Gaussians
+          Divide-and-Conquer: Dual-Hierarchical Optimization for Semantic 4D Gaussians
@@ -64,7 +64,7 @@
-          SDHD-G: Semantic Dual-Hierarchical Dynamic 3D Gaussians
+          Divide-and-Conquer: Dual-Hierarchical Optimization for Semantic 4D Gaussians

@@ -80,37 +80,37 @@

           SDHD-G: Semantic Dual-Hierarchical Dyna
-          detect:"Cup"
+          Seg "Cup"
-          detect:"Broom"
+          Seg "Broom"
-          detect:"Toy"
+          Seg "Toy"
-          detect:"Cookie"
+          Seg "Cookie"
-          detect:"Chocolate"
+          Seg "Chocolate"
-          detect:"Mitts"
+          Seg "Mitts"

@@ -125,7 +125,8 @@

SDHD-G: Semantic Dual-Hierarchical Dyna


-          Our method enables high-quality rendering and semantic understanding for both simple and complex dynamic scenes, providing a stable foundation for downstream tasks.
+          Our method delivers high-quality rendering and accurate semantic understanding of dynamic scenes,
+          while supporting downstream tasks in 4D scenarios.

@@ -142,19 +143,18 @@

Abstract

-          Dynamic 3D semantic Gaussians can be used for reconstructing and understanding dynamic scenes captured from a monocular camera,
+          Semantic 4D Gaussians can be used for reconstructing and understanding dynamic scenes captured from a monocular camera,
           resulting in a better handling of target information with temporal variations than static scenes.
-          However, most current work focuses on static scenes, directly applying static methods for dynamic scenes is impractical,
-          as static methods fail to capture the temporal behaviors and features of dynamic targets.
-          To the best of our knowledge, only one existing work focuses on semantic comprehension of dynamic scenes based on 3DGS.
-          While this work demonstrates promising capabilities in simple scenes,
-          it struggles to achieve high-fidelity rendering and accurate semantic features in scenarios where the background contains significant noise and the dynamic foreground exhibits substantial deformation and intricate textures.
-          Because it simply combines dynamic reconstruction and understanding without considering the difference between static and dynamic Gaussians, leading to the mixture of static background and dynamic foreground features.
-          To address these limitations, we propose SDHD-G, consists of hierarchical Gaussian flows and hierarchical rendering weights. The former realizes effective separation of static and dynamic rendering and their features.
-          The latter is employed in scenes with complex background noise (e.g. the “broom” scene in Hypernerf) to enhance the rendering quality of dynamic foregrounds.
-          Extensive experiments show that our method consistently outperforms previous method on synthetic and real-world datasets.
+          However, most recent work focuses on the semantics of static scenes; directly applying these methods to dynamic scenes is impractical,
+          as they fail to capture the temporal behaviors and features of dynamic targets.
+          To the best of our knowledge, few existing works focus on semantic comprehension of dynamic scenes based on 3DGS.
+          While demonstrating promising capabilities in simple scenes, they struggle to achieve high-fidelity rendering and accurate semantic features in scenarios where the static background contains significant noise and the dynamic foreground exhibits substantial deformation with intricate textures.
+          This is because a uniform update strategy is applied to all Gaussians, overlooking the distinctions and interaction between dynamic and static distributions,
+          which leads to artifacts and noise during semantic segmentation, especially between the dynamic foreground and the static background.
+          To address these limitations, we propose Dual-Hierarchical Optimization (DHO),
+          which consists of hierarchical Gaussian flow and hierarchical rendering guidance. The former implements effective separation of static and dynamic rendering and their features.
+          The latter helps mitigate dynamic foreground rendering distortion in scenes where the static background contains complex noise (e.g., the “broom” scene in the HyperNeRF dataset).
+          Extensive experiments show that our method consistently outperforms previous methods on both synthetic and real-world datasets.
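Note: to make the "separation of static and dynamic" idea concrete, below is a minimal, illustrative Python sketch of how Gaussians could be partitioned by the motion predicted by a deformation field. The split-by-displacement rule, the epsilon, and the names (split_static_dynamic, deform_field, the toy fake_deform) are assumptions for illustration, not the DHO implementation.

import torch

def split_static_dynamic(xyz, deform_field, timestamps, eps=1e-3):
    # Accumulate per-Gaussian displacement magnitude over sampled timestamps and
    # call a Gaussian "static" if it barely moves (illustrative rule only).
    total_motion = torch.zeros(xyz.shape[0])
    for t in timestamps:
        total_motion += deform_field(xyz, t).norm(dim=-1)
    static_mask = total_motion < eps * len(timestamps)
    return static_mask, ~static_mask

# Toy usage: a fake deformation field that only moves Gaussians with x > 0.
xyz = torch.randn(1000, 3)
fake_deform = lambda pts, t: (pts[:, :1] > 0).float() * t * 0.01 * torch.ones_like(pts)
static_mask, dynamic_mask = split_static_dynamic(xyz, fake_deform, torch.linspace(0, 1, 5))
print(static_mask.sum().item(), "static /", dynamic_mask.sum().item(), "dynamic")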

@@ -172,10 +172,9 @@

Abstract

Method Overview

-          Dynamic 3D Gaussians Distillation utilizes 3D Gaussian representation and optimizes
-          spatial parameters of the Gaussians and their deformation, concurrently with
-          appearance properties with a semantic feature per Gaussian. Our learned representation
-          enables efficient semantic understanding and manipulation of dynamic 3D scenes.
+          The overall pipeline of our model. We add semantic properties to each Gaussian and obtain its geometric deformation at each timestamp t through the deformation field.
+          In the coarse stage, Gaussians are subject to geometric constraints; in the fine stage, the geometric constraints are relaxed and semantic feature constraints are introduced.
+          We utilize dynamic foreground masks obtained from scene priors for hierarchically weighted rendering of the scene, enhancing the rendering quality of the dynamic foreground in complex backgrounds.
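Note: a minimal, illustrative PyTorch-style sketch of the coarse-to-fine schedule and the mask-weighted rendering loss described above. The exact loss form, the stage boundary, and the weights (coarse_steps, fg_weight, lambda_sem) are assumptions for illustration, not the actual training code.

import torch
import torch.nn.functional as F

def staged_loss(rgb, gt_rgb, feat, gt_feat, fg_mask, step,
                coarse_steps=3000, fg_weight=2.0, lambda_sem=0.1):
    # Hierarchical rendering weight: emphasize dynamic-foreground pixels given by the mask.
    pixel_w = 1.0 + (fg_weight - 1.0) * fg_mask
    photometric = (pixel_w * (rgb - gt_rgb).abs()).mean()
    if step < coarse_steps:
        # Coarse stage: geometric (photometric) constraint only.
        return photometric
    # Fine stage: relax the geometric term and add a semantic feature constraint.
    return 0.5 * photometric + lambda_sem * F.l1_loss(feat, gt_feat)

# Toy usage with random tensors standing in for rendered / ground-truth images and features.
rgb, gt_rgb = torch.rand(3, 64, 64), torch.rand(3, 64, 64)
feat, gt_feat = torch.rand(16, 64, 64), torch.rand(16, 64, 64)
fg_mask = torch.zeros(1, 64, 64)
fg_mask[:, 16:48, 16:48] = 1.0
print(staged_loss(rgb, gt_rgb, feat, gt_feat, fg_mask, step=5000))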

@@ -195,8 +194,8 @@

Method Overview

Visual Results

-          The following results show the novel view rendering views and the extracted semantic feature maps using our method,
-          evaluated on both the real-world HyperNeRF dataset and the synthetic D-NeRF dataset. The visualization of the feature maps is displayed using PCA for dimensionality reduction.
+          The following results show novel rendering views and the extracted semantic feature maps produced by our method,
+          evaluated on both the real-world HyperNeRF dataset and the synthetic D-NeRF dataset. The feature maps are visualized using PCA for dimensionality reduction.
@@ -263,7 +262,7 @@
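Note: the PCA visualization mentioned above is a standard step; below is a minimal sketch of projecting a high-dimensional feature map to three display channels (the feature dimension and the normalization are illustrative assumptions).

import numpy as np
from sklearn.decomposition import PCA

def feature_map_to_rgb(feat):
    # Project an (H, W, C) feature map to 3 channels with PCA and normalize to [0, 1] for display.
    h, w, c = feat.shape
    rgb = PCA(n_components=3).fit_transform(feat.reshape(-1, c))
    rgb -= rgb.min(axis=0)
    rgb /= rgb.max(axis=0) + 1e-8
    return rgb.reshape(h, w, 3)

# Toy usage: a random 32-dimensional feature map.
vis = feature_map_to_rgb(np.random.rand(120, 160, 32))
print(vis.shape, vis.min(), vis.max())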

Visual Results


-          Segmentation on Synthetic dataset
+          Segmentation on Synthetic Dataset

           Our method achieves excellent semantic segmentation performance not only on real-world datasets but also on synthetic datasets.
@@ -281,31 +280,31 @@

Segmentation on Synthetic dataset

-          detect"Jacket"
+          Seg "Jacket"
-          detect"Helmet"
+          Seg "Helmet"
-          detect"Skull"
+          Seg "Skull"
-          detect"Lego Toy"
+          Seg "Lego Toy"
-          detect"Hands"
+          Seg "Hands"

@@ -321,10 +320,10 @@

Segmentation on Synthetic dataset


-          comparison
+          Comparison with Baseline
-          Our approach exceeds Baseline in rendering quality, semantic feature integrity, and lexical detection accuracy
+          Our method outperforms the baseline in terms of rendering quality, semantic feature completeness, and semantic segmentation accuracy.
           (Our method is on the left, Baseline is on the right)

@@ -377,17 +376,16 @@

comparison

--> -

Video Example

+

-

Multi-Scale

+

Multi-Scale Semantic Feature and Segmentation

Visualization results of multi-scale dynamic semantic features.

@@ -415,15 +413,15 @@

Multi-Scale


-

Editing

+

Semantic Editing

Visual illustration of our method’s ability to semantically remove objects.

-          remove "Cookie"
+          Remove "Cookie"
-          remove "Lemon"
+          Remove "Lemon"