
Metric depth model for outdoor (large) can only predict objects up to 80 meters? #216

Open · Itachi-6 opened this issue Nov 27, 2024 · 2 comments

@Itachi-6

I'm using the 'vitl' encoder with the 'vkitti' (outdoor) metric model. I ran it on a random outdoor image that contains cars, and the predicted depth was not very accurate.

The image I used is attached below (frame).

I viewed the image with OpenCV and got the (x, y) coordinates of the point whose depth I wanted. I passed that coordinate to the model's output and it gave me 6.5 meters. Looking at the car in the image, we can easily say it is at least 80 or 100 meters away.
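For reference, the lookup was roughly like this (a sketch following the metric depth README pattern; the checkpoint path and the pixel coordinates are placeholders):

```python
import cv2
import torch

from depth_anything_v2.dpt import DepthAnythingV2

# Outdoor metric model: 'vitl' encoder with the 'vkitti' checkpoint
# (the README uses max_depth = 80 for the outdoor model).
model = DepthAnythingV2(encoder='vitl', features=256,
                        out_channels=[256, 512, 1024, 1024], max_depth=80)
model.load_state_dict(torch.load(
    'checkpoints/depth_anything_v2_metric_vkitti_vitl.pth', map_location='cpu'))
model.eval()

raw_img = cv2.imread('frame.png')   # the outdoor image above (placeholder path)
depth = model.infer_image(raw_img)  # HxW depth map in meters, numpy array

x, y = 640, 360                     # pixel picked in the OpenCV viewer (placeholder)
print(depth[y, x])                  # numpy indexing is [row, col], i.e. [y, x]
```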

Can this model only predict depth correctly up to 80 meters?

@khushbu-IITK

@Itachi-6 I have two questions:

  1. How did you conclude that the unit is meters?
  2. Did you try to see what could be adjusted to get a correct prediction?

It would be great if you could comment on these. Thanks.

@Itachi-6 (Author)

@khushbu-IITK

  1. If you refer to the metric depth estimation demo code, at the end it mentions that the output is a depth map in meters as a numpy array: depth = model.infer_image(raw_img) # HxW depth map in meters in numpy

  2. Unfortunately, no. I guess this model gives good metric results for indoor environments. Finding depth is a highly complex task in itself, and determining accurate depth depends on many factors. If you want to find the depth of outdoor points, try learning about camera calibration, where you'll understand how depth is actually computed.

A stereo vision setup is better than these models for recovering actual depth. Maybe in the future these models will produce great results, but even then a stereo vision setup will be at the top (at least in my opinion); the datasets these models are trained on are themselves produced with stereo camera rigs. Though setting up stereo vision is a hassle, the outputs will be accurate. Learn how to set up stereo cameras at a fixed baseline, capture frames from the left and right cameras, and find the correspondences and their disparities. Once the disparities are found for all the pixels, then even with just one or two known depth reference points in the scene you can calculate accurate depth for every pixel in the image, roughly as in the sketch below.
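A minimal sketch of that last step, assuming a rectified stereo pair and one reference point with known depth (the image paths, SGBM parameters, and reference coordinates are illustrative):

```python
import cv2
import numpy as np

# Rectified left/right frames from the stereo rig (placeholder paths).
left = cv2.imread('left.png', cv2.IMREAD_GRAYSCALE)
right = cv2.imread('right.png', cv2.IMREAD_GRAYSCALE)

# Semi-global block matching; OpenCV returns disparities as fixed-point values scaled by 16.
stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0

# Pinhole stereo relation: Z = f * B / d, with f = focal length in pixels, B = baseline.
# If f and B are not calibrated, one point with known depth fixes the product f * B.
y_ref, x_ref, Z_ref = 400, 512, 25.0  # reference pixel and its true depth in meters
fB = Z_ref * disparity[y_ref, x_ref]  # recover f * B from the reference point

valid = disparity > 0
depth = np.where(valid, fB / np.where(valid, disparity, 1.0), 0.0)  # per-pixel depth in meters
```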

But nonetheless, Depth Anything V2 is a very good model for finding an indoor scene's depth map; for outdoor scenes, from what I tried, I guess it is only accurate up to a limited number of meters.
