diff --git a/.gitignore b/.gitignore
index 5506c38d..8623d1a9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -11,4 +11,5 @@
 /dist
 /outputs
 /build
-/src
\ No newline at end of file
+/src
+/.vscode
\ No newline at end of file
diff --git a/.vscode/launch.json b/.vscode/launch.json
deleted file mode 100644
index d71d58e7..00000000
--- a/.vscode/launch.json
+++ /dev/null
@@ -1,23 +0,0 @@
-{
-    // Use IntelliSense to learn about possible attributes.
-    // Hover to view descriptions of existing attributes.
-    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
-    "version": "0.2.0",
-    "configurations": [
-        {
-            "name": "Python Debugger: Remote Attach",
-            "type": "debugpy",
-            "request": "attach",
-            "connect": {
-                "host": "localhost",
-                "port": 5678
-            },
-            "pathMappings": [
-                {
-                    "localRoot": "${workspaceFolder}",
-                    "remoteRoot": "."
-                }
-            ]
-        }
-    ]
-}
\ No newline at end of file
diff --git a/README.md b/README.md
index 43185941..d9b3714c 100755
--- a/README.md
+++ b/README.md
@@ -9,9 +9,9 @@
 - We are releasing **[Stable Video 4D (SV4D)](https://huggingface.co/stabilityai/sv4d)**, a video-to-4D diffusion model for novel-view video synthesis. For research purposes:
   - **SV4D** was trained to generate 40 frames (5 video frames x 8 camera views) at 576x576 resolution, given 5 context frames (the input video), and 8 reference views (synthesised from the first frame of the input video, using a multi-view diffusion model like SV3D) of the same size, ideally white-background images with one object.
   - To generate longer novel-view videos (21 frames), we propose a novel sampling method using SV4D, by first sampling 5 anchor frames and then densely sampling the remaining frames while maintaining temporal consistency.
-  - Please check our [project page](), [tech report]() and [video summary]() for more details.
+  - Please check our [project page](https://sv4d.github.io), [tech report](https://sv4d.github.io/static/sv4d_technical_report.pdf) and [video summary](https://www.youtube.com/watch?v=RBP8vdAWTgk) for more details.
 
-**QUICKSTART** : `python scripts/sampling/simple_video_sample_4d.py --input_path assets/test_video1.mp4 --output_folder outputs/sv4d` (after downloading [SV4D](https://huggingface.co/stabilityai/sv4d) and [SV3D_u]((https://huggingface.co/stabilityai/sv3d)) from HuggingFace)
+**QUICKSTART** : `python scripts/sampling/simple_video_sample_4d.py --input_path assets/test_video1.mp4 --output_folder outputs/sv4d` (after downloading [sv4d.safetensors](https://huggingface.co/stabilityai/sv4d) and [sv3d_u.safetensors](https://huggingface.co/stabilityai/sv3d) from HuggingFace into `checkpoints/`)
 
 To run **SV4D** on a single input video of 21 frames:
 - Download SV3D models (`sv3d_u.safetensors` and `sv3d_p.safetensors`) from [here](https://huggingface.co/stabilityai/sv3d) and SV4D model (`sv4d.safetensors`) from [here](https://huggingface.co/stabilityai/sv4d) to `checkpoints/`
diff --git a/sgm/modules/spacetime_attention.py b/sgm/modules/spacetime_attention.py
old mode 100755
new mode 100644