Using Multifocal Data (distinct cameras and resolutions) for Cultural Heritage #11

Open
dberga opened this issue Apr 10, 2024 · 5 comments

Comments

@dberga
Owner

dberga commented Apr 10, 2024

Issues posted in:

Nerfstudio
nerfstudio-project#3059
nerfstudio-project#3057

SDFstudio
autonomousvision/sdfstudio#307

Hloc
cvg/Hierarchical-Localization#383

@dberga
Owner Author

dberga commented Apr 16, 2024

Nerfstudio fails on images of different sizes, regardless of whether they were correctly processed by COLMAP or hloc.

When the transforms.json is generated, the conversion assumes a single camera:
https://github.com/dberga/nerfstudio/blob/main/nerfstudio/process_data/colmap_utils.py#L461
As a result, any transforms.json that applies the first camera's intrinsics to images of different sizes will crash during training (size-mismatch error):

RuntimeError: Error(s) in loading state_dict for NerfactoModel:
        size mismatch for field.embedding_appearance.embedding.weight: copying a param with shape torch.Size([10, 32]) from checkpoint, the shape in current model is torch.Size([17, 32]).
        size mismatch for camera_optimizer.pose_adjustment: copying a param with shape torch.Size([10, 6]) from checkpoint, the shape in current model is torch.Size([17, 6]).

A possible solution is a padding mechanism: choose a single size for all images and zero-pad the smaller ones to match it. This can be dataset-agnostic.
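The padding idea could look roughly like the sketch below. It is not code from Nerfstudio: `pad_to_common_size` is a hypothetical helper, and it assumes the images are already loaded as NumPy arrays of shape (H, W, C). It pads at the bottom and right only, so the pixel origin stays where it was, and it also returns a mask marking the original pixels so the padded area can be ignored.

```python
import numpy as np


def pad_to_common_size(images):
    """Zero-pad (H, W, C) arrays at the bottom/right to one shared size.

    Hypothetical helper for illustration only.
    Returns the padded images and boolean masks (True = original pixel).
    """
    target_h = max(img.shape[0] for img in images)
    target_w = max(img.shape[1] for img in images)

    padded, masks = [], []
    for img in images:
        h, w = img.shape[:2]
        # Canvas of zeros with the shared size and the image's own channels/dtype.
        canvas = np.zeros((target_h, target_w) + img.shape[2:], dtype=img.dtype)
        canvas[:h, :w] = img
        # Mask is True only where real image content was copied.
        mask = np.zeros((target_h, target_w), dtype=bool)
        mask[:h, :w] = True
        padded.append(canvas)
        masks.append(mask)
    return padded, masks


# Two dummy images with the sizes used in the transforms.json examples.
small = np.ones((365, 547, 3), dtype=np.uint8)
large = np.ones((1065, 1420, 3), dtype=np.uint8)
padded, masks = pad_to_common_size([small, large])
```

Because the padding is added only after the last row and column, focal length and principal point expressed in pixels remain valid; only the stored `w` and `h` would need to change to the shared size.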

@dberga
Owner Author

dberga commented Apr 16, 2024

Here are two example transforms.json layouts from LandMark (https://github.com/InternLandMark/LandMark).

Single-focal example (intrinsics declared once at the top level):

{
    "camera_model": "SIMPLE_PINHOLE",
    "fl_x": 427,
    "fl_y": 427,
    "w": 547,
    "h": 365,
    "frames": [
        {
            "file_path": "./images/image_0.png",
            "transform_matrix": []
        }
    ]
}

Multi-focal example (intrinsics declared per frame):

{
    "camera_model": "SIMPLE_PINHOLE",
    "frames": [
        {
            "fl_x": 1116,
            "fl_y": 1116,
            "w": 1420,
            "h": 1065,
            "file_path": "./images/image_0.png",
            "transform_matrix": []
        }
    ]
}
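To make the difference between the two layouts concrete, here is a minimal sketch of a reader that accepts both. It is an illustration, not the Nerfstudio or LandMark parser: `frame_intrinsics` and `INTRINSIC_KEYS` are hypothetical names, and the sketch assumes a per-frame value should take precedence over a top-level one when both exist.

```python
import json

# Intrinsic fields that may appear at the top level or inside each frame.
INTRINSIC_KEYS = ("fl_x", "fl_y", "cx", "cy", "w", "h")


def frame_intrinsics(meta):
    """Return one intrinsics dict per frame (hypothetical helper).

    Per-frame values win; top-level values are used as a fallback.
    Keys missing from both places are simply left out.
    """
    resolved = []
    for frame in meta["frames"]:
        entry = {"file_path": frame["file_path"]}
        for key in INTRINSIC_KEYS:
            if key in frame:
                entry[key] = frame[key]
            elif key in meta:
                entry[key] = meta[key]
        resolved.append(entry)
    return resolved


single = json.loads("""
{"camera_model": "SIMPLE_PINHOLE", "fl_x": 427, "fl_y": 427, "w": 547, "h": 365,
 "frames": [{"file_path": "./images/image_0.png", "transform_matrix": []}]}
""")
multi = json.loads("""
{"camera_model": "SIMPLE_PINHOLE",
 "frames": [{"fl_x": 1116, "fl_y": 1116, "w": 1420, "h": 1065,
             "file_path": "./images/image_0.png", "transform_matrix": []}]}
""")

single_frames = frame_intrinsics(single)
multi_frames = frame_intrinsics(multi)

# Distinct (w, h) pairs; more than one pair means the dataset mixes resolutions.
sizes = {(f["w"], f["h"]) for f in single_frames + multi_frames}
```

Collecting the distinct sizes up front gives a cheap way to detect a mixed-resolution dataset before training starts.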

@dberga
Owner Author

dberga commented Apr 25, 2024

Merged in nerfstudio-project@db93476

@dberga dberga closed this as completed Apr 25, 2024
@dberga dberga reopened this Apr 25, 2024
@dberga
Owner Author

dberga commented Apr 26, 2024

Another solution (padding plus a mask path): nerfstudio-project#1465

@dberga
Owner Author

dberga commented May 23, 2024

A related issue about converting COLMAP output to transforms.json:
nerfstudio-project#2784
