<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="description" content="DHO is an integrated method to reconstruct and understand dynamic scenes.">
<meta name="keywords" content="DHO, Semantics, Segmentation, 4D Gaussian Splatting, Foundation Models">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta property="og:title" content="Divide-and-Conquer: The Dual-Hierarchical Optimization for Semantic 4D Gaussians"/>
<meta name="twitter:title" content="Divide-and-Conquer: The Dual-Hierarchical Optimization for Semantic 4D Gaussians">
<meta name="twitter:description" content="DHO is an integrated method to reconstruct and understand dynamic scenes.">
<meta name="twitter:card" content="summary_large_image">
<title>Divide-and-Conquer: Dual-Hierarchical Optimization for Semantic 4D Gaussians</title>
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro">
<link rel="stylesheet" href="./static/css/bulma.min.css">
<link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
<link rel="stylesheet" href="./static/css/bulma-slider.min.css">
<link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
<link rel="stylesheet" href="./static/css/index.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script src="https://documentcloud.adobe.com/view-sdk/main.js"></script>
<script defer src="./static/js/fontawesome.all.min.js"></script>
<script src="./static/js/bulma-carousel.min.js"></script>
<script src="./static/js/bulma-slider.min.js"></script>
<script src="./static/js/index.js"></script>
</head>
<body>
<section class="hero">
<div class="hero-body">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column has-text-centered">
<h1 class="title is-1 publication-title">Divide-and-Conquer: The Dual-Hierarchical Optimization for Semantic 4D Gaussians</h1>
<section class="hero is-light is-small">
<div class="hero-body">
<div class="container">
<div id="results-carousel" class="carousel results-carousel">
<div class="item item-lamp-positive">
<video poster="" id="lamp-positive" autoplay muted loop playsinline height="100%">
<source src="./static/videos/segment_hypernerf/americano.mp4" type="video/mp4">
</video>
<p class="has-text-centered">Seg "Cup"</p>
</div>
<div class="item item-bike-positive">
<video poster="" id="bike-positive" autoplay muted loop playsinline height="100%">
<source src="./static/videos/segment_hypernerf/oven-mitts.mp4" type="video/mp4">
</video>
<p class="has-text-centered">Seg "Mitts"</p>
</div>
<div class="item item-alien-positive">
<video poster="" id="alien-positive" autoplay muted loop playsinline height="100%">
<source src="./static/videos/segment_hypernerf/cookie.mp4" type="video/mp4">
</video>
<p class="has-text-centered">Seg "Cookie"</p>
</div>
<div class="item item-camel-positive">
<video poster="" id="camel-positive" autoplay muted loop playsinline height="100%">
<source src="./static/videos/segment_hypernerf/chick.mp4" type="video/mp4">
</video>
<p class="has-text-centered">Seg "Toy"</p>
</div>
<div class="item item-hammer-positive-negative">
<video poster="" id="hammer-positive-negative" autoplay muted loop playsinline height="100%">
<source src="./static/videos/segment_hypernerf/chocolate.mp4" type="video/mp4">
</video>
<p class="has-text-centered">Seg "Chocolate"</p>
</div>
<div class="item item-new-video">
<video poster="" id="new-video" autoplay muted loop playsinline height="100%">
<source src="./static/videos/segment_hypernerf/broom.mp4" type="video/mp4">
</video>
<p class="has-text-centered">Seg "Broom"</p>
</div>
</div>
</div>
</div>
</section>
<section class="hero teaser">
<div class="container is-max-desktop">
<div class="hero-body">
<h2 class="subtitle has-text-centered">
<br>
Our method is designed to achieve high-quality rendering and accurate semantic understanding of dynamic scenes,
while supporting downstream tasks in 4D scenes.
</h2>
</div>
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<!-- Abstract -->
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-3">Abstract</h2>
<div class="content has-text-justified">
<p>
Semantic 4D Gaussians can be used to reconstruct and understand dynamic scenes captured by a monocular camera,
and they handle temporally varying target information better than static-scene representations.
However, most recent work focuses on the semantics of static scenes, and directly applying these methods to dynamic scenes is impractical
because they fail to capture the temporal behaviors and features of dynamic objects.
To the best of our knowledge, few existing works address semantic comprehension of dynamic scenes based on 3DGS.
Although these works show promising capabilities in simple scenes, they struggle to achieve high-fidelity rendering and accurate semantic features in scenarios where the static background contains significant noise and the dynamic foreground undergoes substantial deformation with intricate textures.
This is because they apply the same update strategy to all Gaussians, overlooking the distinctions and interactions between dynamic and static distributions.
The result is artifacts and noise during semantic segmentation, especially between the dynamic foreground and the static background.
To address these limitations, we propose Dual-Hierarchical Optimization (DHO),
which consists of hierarchical Gaussian flow and hierarchical rendering guidance. The former effectively separates the rendering and features of static and dynamic content.
The latter helps mitigate rendering distortion of the dynamic foreground in scenes where the static background contains complex noise (e.g., the “broom” scene in the HyperNeRF dataset).
Extensive experiments show that our method consistently outperforms baselines on both synthetic and real-world datasets.
</p>
</div>
</div>
</div>
<!--/ end of Abstract -->
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<!-- Method Overview-->
<div class="columns is-centered">
<div class="column is-full-width">
<hr class="divider" />
<h2 class="title is-3">Method Overview</h2>
<div class="content has-text-justified">
<p>
The figure below shows the overall pipeline of our model. We add semantic attributes to each Gaussian and obtain its geometric deformation at each timestamp <i>t</i> through the deformation field.
In the coarse stage, the Gaussians are subject to geometric constraints; in the fine stage, the geometric constraints are relaxed and semantic feature constraints are introduced.
We use dynamic foreground masks obtained from scene priors to provide hierarchical rendering guidance, improving the rendering quality of the dynamic foreground against complex backgrounds.
</p>
</div>
<div class="two-col-image">
<img src="./static/images/pipeline3.png" alt="Overall pipeline of DHO">
</div>
</div>
</div>
<!-- End of Method Overview-->
<!-- Results -->
<div class="columns is-centered">
<div class="column is-full-width">
<hr class="divider" />
<h2 class="title is-3">Visual Results</h2>
<div class="content has-text-justified">
<p>
The following results show novel-view renderings and the corresponding semantic feature maps extracted by our method
on the real-world HyperNeRF dataset and the synthetic D-NeRF dataset. The feature maps are visualized using PCA for dimensionality reduction.
</p>
<table width="200" border="0" align="center">
<tbody>
<tr>
<td align="center">
<video width="200" autoplay muted playsinline loop>
<source src="./static/videos/render_feature/cookie.mp4" type="video/mp4" />
</video>
</td>
<td align="center"><video width="200" autoplay muted playsinline loop>
<source src="./static/videos/render_feature/chick.mp4" type="video/mp4" />
</video></td>
<td align="center"><video width="200" autoplay muted playsinline loop>
<source src="./static/videos/render_feature/americano.mp4" type="video/mp4" />
</video></td>
<td align="center"><video width="200" autoplay muted playsinline loop>
<source src="./static/videos/render_feature/torchocolate.mp4" type="video/mp4" />
</video></td>
</tr>
<tr>
<td align="center">Split-Cookie</td>
<td align="center">Chickchicken</td>
<td align="center">Americano</td>
<td align="center">Torchocolate</td>
</tr>
</tbody>
</table>
<table width="200" border="0" align="center">
<tbody>
<tr>
<td align="center"><video width="400" autoplay muted playsinline loop>
<source src="./static/videos/render_feature/jumpingjacks.mp4" type="video/mp4" />
</video></td>
<td align="center"><video width="400" autoplay muted playsinline loop>
<source src="./static/videos/render_feature/standup.mp4" type="video/mp4" />
</video></td>
<td align="center"><video width="400" autoplay muted playsinline loop>
<source src="./static/videos/render_feature/trex.mp4" type="video/mp4" />
</video></td>
<td align="center"><video width="400" autoplay muted playsinline loop>
<source src="./static/videos/render_feature/hook.mp4" type="video/mp4" />
</video></td>
</tr>
<tr>
<td align="center" width=250>Jumpingjacks</td>
<td align="center" width=250>Standup</td>
<td align="center" width=250>Trex</td>
<td align="center" width=250>Hook</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
<!-- End of Results -->
<div class="columns is-centered">
<div class="column is-full-width">
<hr class="divider" />
<h2 class="title is-3">Segmentation on Synthetic Dataset</h2>
<div class="content has-text-justified">
<p>
Our method achieves accurate semantic segmentation not only on real-world datasets but also on the synthetic D-NeRF dataset.
</p>
</div>
</div>
</div>
<section class="hero is-light is-small">
<div class="hero-body">
<div class="container">
<div id="results-carousel" class="carousel results-carousel">
<div class="item item-lamp-positive">
<video poster="" id="lamp-positive" autoplay muted loop playsinline height="100%">
<source src="./static/videos/segment_dnerf/detect_jumpingjacks.mp4" type="video/mp4">
</video>
<p class="has-text-centered">Seg "Jacket"</p>
</div>
<div class="item item-bike-positive">
<video poster="" id="bike-positive" autoplay muted loop playsinline height="100%">
<source src="./static/videos/segment_dnerf/detect_standup.mp4" type="video/mp4">
</video>
<p class="has-text-centered">Seg "Helmet"</p>
</div>
<div class="item item-alien-positive">
<video poster="" id="alien-positive" autoplay muted loop playsinline height="100%">
<source src="./static/videos/segment_dnerf/detect_trex.mp4" type="video/mp4">
</video>
<p class="has-text-centered">Seg "Skull"</p>
</div>
<div class="item item-camel-positive">
<video poster="" id="camel-positive" autoplay muted loop playsinline height="100%">
<source src="./static/videos/segment_dnerf/detect_lego.mp4" type="video/mp4">
</video>
<p class="has-text-centered">Seg "Shovels"</p>
</div>
<div class="item item-hammer-positive-negative">
<video poster="" id="hammer-positive-negative" autoplay muted loop playsinline height="100%">
<source src="./static/videos/segment_dnerf/detect_hook.mp4" type="video/mp4">
</video>
<p class="has-text-centered">Seg "Hands"</p>
</div>
</div>
</div>
</div>
</section>
<div class="columns is-centered">
<div class="column is-full-width">
<hr class="divider" />
<h2 class="title is-3">Comparison with Baseline</h2>
<div class="content has-text-justified">
<p>
Our method outperforms the baseline in rendering quality, semantic feature completeness, and semantic segmentation accuracy
(ours on the left, baseline on the right).
</p>
</div>
</div>
</div>
<video width="1000" height="536" autoplay muted loop playsinline>
<source src="./static/videos/compare/page_broom_duibi.mp4" type="video/mp4">
</video>
<div class="columns is-centered">
<div class="column is-full-width">
<hr class="divider" />
<h2 class="title is-3">Multi-Scale Semantic Feature and Segmentation</h2>
<div class="content has-text-justified">
<p>Visualization results of multi-scale dynamic semantic segmentation.</p>
<table width="200" border="0" align="center">
<tbody>
<tr>
<td align="center">Multi-Scale "Chickchicken"</td>
<td></td>
<td align="center">Multi-Scale "Broom"</td>
</tr>
<tr>
<td align="center"><video width="350" controls="controls" autoplay muted playsinline loop>
<source src="./static/videos/scales/chick.mp4" type="video/mp4" />
</video></td>
<td width="10"></td>
<td align="center"><video width="350" controls="controls" autoplay muted playsinline loop>
<source src="./static/videos/scales/broom.mp4" type="video/mp4" />
</video></td>
</tr>
</tbody>
</table>
<!-- Results -->
<div class="columns is-centered">
<div class="column is-full-width">
<hr class="divider" />
<h2 class="title is-3">Semantic Editing</h2>
<div class="content has-text-justified">
<p>Visual illustration of our method’s ability to remove objects based on their semantics.</p>
<table width="200" border="0" align="center">
<tbody>
<tr>
<td align="center">Remove "Cookie"</td>
<td></td>
<td align="center">Remove "Lemon"</td>
</tr>
<tr>
<td align="center"><video width="435" controls="controls" autoplay muted playsinline loop>
<source src="./static/videos/remove/remove_cookie.mp4" type="video/mp4" />
</video></td>
<td width="10"></td>
<td align="center"><video width="343" controls="controls" autoplay muted playsinline loop>
<source src="./static/videos/remove/page_remove_lemon.mp4" type="video/mp4" />
</video></td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
</div>
</div>
</div>
</div>
</section>
</body>
</html>