<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <meta name="description" content="DHO is an integrated method to reconstruct and understand dynamic scenes.">
  <meta name="keywords" content="DHO, Semantics, Segmentation, 4D Gaussian Splatting, Foundation Models">
  <meta name="viewport" content="width=device-width, initial-scale=1">


  <meta property="og:title" content="Divide-and-Conquer: The Dual-Hierarchical Optimization for Semantic 4D Gaussians"/>


  <meta name="twitter:title" content="Divide-and-Conquer: The Dual-Hierarchical Optimization for Semantic 4D Gaussians">
  <meta name="twitter:description" content="DHO is an integrated method to reconstruct and understand dynamic scenes.">

  <meta name="twitter:card" content="summary_large_image">
 
  
  <title>Divide-and-Conquer: Dual-Hierarchical Optimization for Semantic 4D Gaussians</title>


  <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro">
  <link rel="stylesheet" href="./static/css/bulma.min.css">
  <link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
  <link rel="stylesheet" href="./static/css/bulma-slider.min.css">
  <link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
  <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
  <link rel="stylesheet" href="./static/css/index.css">

  <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
  <script src="https://documentcloud.adobe.com/view-sdk/main.js"></script>
  <script defer src="./static/js/fontawesome.all.min.js"></script>
  <script src="./static/js/bulma-carousel.min.js"></script>
  <script src="./static/js/bulma-slider.min.js"></script>
  <script src="./static/js/index.js"></script>
</head>

<body>

<section class="hero">
  <div class="hero-body">
    <div class="container is-max-desktop">
      <div class="columns is-centered">
        <div class="column has-text-centered">
          <h1 class="title is-1 publication-title">Divide-and-Conquer: The Dual-Hierarchical Optimization for Semantic 4D Gaussians</h1>
        </div>
      </div>
    </div>
  </div>
</section>
          

<section class="hero is-light is-small">
  <div class="hero-body">
      
    <div class="container">
      <div id="results-carousel" class="carousel results-carousel">
        <div class="item item-lamp-positive">
          <video poster="" id="lamp-positive" autoplay muted loop playsinline height="100%">
              <source src="./static/videos/segment_hypernerf/americano.mp4" type="video/mp4">
          </video>
          <p class="has-text-centered">Seg "Cup"</p>
        </div>
        <div class="item item-bike-positive">
          <video poster="" id="bike-positive" autoplay muted loop playsinline height="100%">
              <source src="./static/videos/segment_hypernerf/oven-mitts.mp4" type="video/mp4">
          </video>
          <p class="has-text-centered">Seg "Mitts"</p>
        </div>
        <div class="item item-alien-positive">
          <video poster="" id="alien-positive" autoplay muted loop playsinline height="100%">
              <source src="./static/videos/segment_hypernerf/cookie.mp4" type="video/mp4">
          </video>
          <p class="has-text-centered">Seg "Cookie"</p>
        </div>
        <div class="item item-camel-positive">
          <video poster="" id="camel-positive" autoplay muted loop playsinline height="100%">
              <source src="./static/videos/segment_hypernerf/chick.mp4" type="video/mp4">
          </video>
          <p class="has-text-centered">Seg "Toy"</p>
        </div>
        <div class="item item-hammer-positive-negative">
            <video poster="" id="hammer-positive-negative" autoplay muted loop playsinline height="100%">
                <source src="./static/videos/segment_hypernerf/chocolate.mp4" type="video/mp4">
            </video>
            <p class="has-text-centered">Seg "Chocolate"</p>
        </div>
        <div class="item item-new-video">
          <video poster="" id="new-video" autoplay muted loop playsinline height="100%">
              <source src="./static/videos/segment_hypernerf/broom.mp4" type="video/mp4">
          </video>
          <p class="has-text-centered">Seg "Broom"</p>
        </div>

      </div>
    </div>
  </div>
</section>

<section class="hero teaser">
  <div class="container is-max-desktop">
    <div class="hero-body">
      <h2 class="subtitle has-text-centered">
          <br>
          Our method targets high-quality rendering and accurate semantic understanding of dynamic scenes,
          while supporting downstream tasks in 4D scenarios.
      </h2>
    </div>
  </div>
</section>


<section class="section">
  <div class="container is-max-desktop">
    <!-- Abstract -->
    <div class="columns is-centered has-text-centered">
      <div class="column is-four-fifths">
        <h2 class="title is-3">Abstract</h2>
        <div class="content has-text-justified">
          <p>
            Semantic 4D Gaussians can be used to reconstruct and understand dynamic scenes captured by a monocular camera,
            and they handle temporally varying target information better than static-scene representations.
            However, most recent work focuses on the semantics of static scenes. Applying these methods directly to dynamic scenes is impractical,
            because they fail to capture the temporal behaviors and features of dynamic objects.
            To the best of our knowledge, few existing works address semantic comprehension of dynamic scenes based on 3DGS.
            While these works demonstrate promising capabilities in simple scenes, they struggle to achieve high-fidelity rendering and accurate semantic features when the static background contains significant noise and the dynamic foreground exhibits substantial deformation with intricate textures.
            The cause is that the same update strategy is applied to all Gaussians, which overlooks the distinctions and interactions between dynamic and static distributions.
            This leads to artifacts and noise during semantic segmentation, especially between the dynamic foreground and the static background.
            To address these limitations, we propose Dual-Hierarchical Optimization (DHO),
            which consists of hierarchical Gaussian flow and hierarchical rendering guidance. The former effectively separates static and dynamic rendering and their features.
            The latter helps mitigate dynamic foreground rendering distortion in scenes where the static background has complex noise (e.g., the “broom” scene in the HyperNeRF dataset).
            Extensive experiments show that our method consistently outperforms baselines on both synthetic and real-world datasets.
          </p>
        </div>
      </div>
    </div>
    <!--/ end of Abstract -->
  </div>
</section>

<section class="section">
  <div class="container is-max-desktop">
    <!-- Method Overview-->
    <div class="columns is-centered">
      <div class="column is-full-width">
        <hr class="divider" />
        <h2 class="title is-3">Method Overview</h2>
        <div class="content has-text-justified">
          <p>
            The figure below shows the overall pipeline of our model. We add semantic properties to each Gaussian and obtain its geometric deformation at each timestamp t through the deformation field.
            In the coarse stage, the Gaussians are subject to geometric constraints; in the fine stage, the geometric constraints are relaxed and semantic feature constraints are introduced.
            We use dynamic foreground masks obtained from scene priors for hierarchical rendering guidance, which improves the rendering quality of the dynamic foreground in scenes with complex backgrounds.
          </p>
        </div>
        <div class="two-col-image">
          <img src="./static/images/pipeline3.png" alt="Overall pipeline of our model">
        </div>
      </div>
    </div>
    <!-- End of Method Overview-->
    
    
    <!-- Results -->
    <div class="columns is-centered">
      <div class="column is-full-width">
        <hr class="divider" />
        <h2 class="title is-3">Visual Results</h2>
        <div class="content has-text-justified">
          <p>
            The following results show novel-view renderings and the semantic feature maps extracted by our method,
            evaluated on both the real-world HyperNeRF dataset and the synthetic D-NeRF dataset. The feature maps are visualized using PCA for dimensionality reduction.
          </p>
            <table width="200" border="0" align="center">
            <tbody>

              <tr>
                <td align="center">
                    <video width="200" autoplay muted playsinline loop>
                        <source src="./static/videos/render_feature/cookie.mp4" type="video/mp4" />
                    </video>
                </td>
                 <td align="center"><video width="200" autoplay muted playsinline loop>
                    <source src="./static/videos/render_feature/chick.mp4" type="video/mp4" />
                </video></td>
                   <td align="center"><video width="200" autoplay muted playsinline loop>
                    <source src="./static/videos/render_feature/americano.mp4" type="video/mp4" />
                </video></td>
                           <td align="center"><video width="200" autoplay muted playsinline loop>
                    <source src="./static/videos/render_feature/torchocolate.mp4" type="video/mp4" />
                </video></td>
        </tr>
              <tr>
                <td align="center">Split-Cookie</td>
                <td align="center">ChickChicken</td>
                <td align="center">Americano</td>
                <td align="center">Torchocolate</td>
              </tr>
            </tbody>
          </table>
            <p>
            
        <table width="200" border="0" align="center">
            <tbody>
              <tr>
                <td align="center"><video width="400" autoplay muted playsinline loop>
                    <source src="./static/videos/render_feature/jumpingjacks.mp4" type="video/mp4" />
                </video></td>
                 <td align="center"><video width="400" autoplay muted playsinline loop>
                  <source src="./static/videos/render_feature/standup.mp4" type="video/mp4" />
                </video></td>
                   <td align="center"><video width="400" autoplay muted playsinline loop>
                    <source src="./static/videos/render_feature/trex.mp4" type="video/mp4" />
                </video></td>
                    <td align="center"><video width="400" autoplay muted playsinline loop>
                    <source src="./static/videos/render_feature/hook.mp4" type="video/mp4" />
                </video></td>
              </tr>
              <tr>
                  
                <td align="center" width=250>Jumpingjacks</td>
                <td align="center" width=250>Standup</td>
                <td align="center" width=250>Trex</td>
                <td align="center" width=250>Hook</td>
              </tr>
            </tbody>
          </table>
          </p>
        </div>
      </div>
    </div>
    <!-- End of Results -->

   
    <div class="columns is-centered">
      <div class="column is-full-width">
        <hr class="divider" />
        <h2 class="title is-3">Segmentation on Synthetic Dataset</h2>
        <div class="content has-text-justified">
          <p>
            Beyond real-world data, our method also produces accurate semantic segmentation on the synthetic D-NeRF dataset, as the examples below show.
          </p>
        </div>
      </div>
    </div>
  
    <section class="hero is-light is-small">
      <div class="hero-body">
        <div class="container">
          <div id="results-carousel" class="carousel results-carousel">
            <div class="item item-lamp-positive">
              <video poster="" id="lamp-positive" autoplay muted loop playsinline height="100%">
                  <source src="./static/videos/segment_dnerf/detect_jumpingjacks.mp4" type="video/mp4">
              </video>
              <p class="has-text-centered">Seg "Jacket"</p>
            </div>
            <div class="item item-bike-positive">
              <video poster="" id="bike-positive" autoplay muted loop playsinline height="100%">
                  <source src="./static/videos/segment_dnerf/detect_standup.mp4" type="video/mp4">
              </video>
              <p class="has-text-centered">Seg "Helmet"</p>
            </div>
            <div class="item item-alien-positive">
              <video poster="" id="alien-positive" autoplay muted loop playsinline height="100%">
                  <source src="./static/videos/segment_dnerf/detect_trex.mp4" type="video/mp4">
              </video>
              <p class="has-text-centered">Seg "Skull"</p>
            </div>
            <div class="item item-camel-positive">
              <video poster="" id="camel-positive" autoplay muted loop playsinline height="100%">
                  <source src="./static/videos/segment_dnerf/detect_lego.mp4" type="video/mp4">
              </video>
              <p class="has-text-centered">Seg "Shovels"</p>
            </div>
            <div class="item item-hammer-positive-negative">
                <video poster="" id="hammer-positive-negative" autoplay muted loop playsinline height="100%">
                    <source src="./static/videos/segment_dnerf/detect_hook.mp4" type="video/mp4">
                </video>
                <p class="has-text-centered">Seg "Hands"</p>
            </div>
          </div>
        </div>
      </div>
    </section>

    
    <div class="columns is-centered">
      <div class="column is-full-width">
        <hr class="divider" />
        <h2 class="title is-3">Comparison with Baseline</h2>
        <div class="content has-text-justified">
          <p>
            Our method outperforms the baseline in rendering quality, semantic feature completeness, and semantic segmentation accuracy.
            In the video below, our method is on the left and the baseline is on the right.
          </p>
        </div>
      </div> 
    </div> 
    
    <video width="1000" height="536" autoplay muted loop playsinline>
        <source src="./static/videos/compare/page_broom_duibi.mp4" type="video/mp4">
    </video>

  <div class="columns is-centered">
    <div class="column is-full-width">
      <hr class="divider" />
      <h2 class="title is-3">Multi-Scale Semantic Feature and Segmentation</h2>
      <div class="content has-text-justified">
        <p>Visualization results of multi-scale dynamic semantic segmentation.</p>
          <table width="200" border="0" align="center">
              <tbody>
                      <tr>
                  <td align="center">Multi-Scale "Chickchicken"</td>
                  <td></td>
                  <td align="center">Multi-Scale "Broom"</td>
              </tr>
              <tr>
                  <td align="center"><video width="350" controls="controls" autoplay muted playsinline loop>
                      <source src="./static/videos/scales/chick.mp4" type="video/mp4" />
                  </video></td>
                  <td align="center" width="10"></td>
                  <td align="center"><video width="350" controls="controls" autoplay muted playsinline loop>
                      <source src="./static/videos/scales/broom.mp4" type="video/mp4" />
                  </video></td>
              </tr>
              </tbody>
          </table>
      </div>
    </div>
  </div>
            
<!-- Results -->
<div class="columns is-centered">
  <div class="column is-full-width">
    <hr class="divider" />
    <h2 class="title is-3">Semantic Editing</h2>
    <div class="content has-text-justified">
        <p>Visual illustration of our method’s ability to semantically remove objects.</p>
        <table width="200" border="0" align="center">
            <tbody>
                    <tr>
                <td align="center">Remove "Cookie"</td>
                <td></td>
                <td align="center">Remove "Lemon"</td>
            </tr>
            <tr>
                <td align="center"><video width="435" controls="controls" autoplay muted playsinline loop>
                    <source src="./static/videos/remove/remove_cookie.mp4" type="video/mp4" />
                </video></td>
                <td align="center" width="10"></td>
                <td align="center"><video width="343" controls="controls" autoplay muted playsinline loop>
                    <source src="./static/videos/remove/page_remove_lemon.mp4" type="video/mp4" />
                </video></td>
            </tr>
            </tbody>
        </table>
    </div>
  </div>
</div>
  </div>
</section>
        
        
</body>
</html>