
OSError: No such device (os error 19) #406

Open
ezone1987 opened this issue Sep 2, 2024 · 2 comments

@ezone1987

When testing radio_app_sv4d.py, the problem appears while loading the model. The model is already downloaded and the device is set to the default 'cpu', but it still raises the OSError.
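Note that the "device" in "No such device" is not the PyTorch device: "os error 19" is a raw operating-system errno, which Python's standard `errno` module can decode. A small stdlib-only sketch (the helper name `describe_os_error` is hypothetical):

```python
import errno
import os


def describe_os_error(code: int) -> str:
    """Map a raw OS error code (e.g. the 19 in "os error 19") to its errno name and message."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# On Linux this prints "ENODEV: No such device".
print(describe_os_error(19))
```

The message text for a given code is platform-specific, which is why the comment above only claims the Linux output.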

@janiceylau

janiceylau commented Sep 24, 2024

Hi, I also got the same problem. Here are the logs for reference:

python scripts/sampling/simple_video_sample.py --input_path /net/.../hydrant.jpg --version sv3d_p --elevations_deg 10.0
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
Initialized embedder #0: FrozenOpenCLIPImagePredictionEmbedder with 683800065 params. Trainable: False
Initialized embedder #1: VideoPredictionEmbedderWithEncoder with 83653863 params. Trainable: False
Initialized embedder #2: ConcatTimestepEmbedderND with 0 params. Trainable: False
Initialized embedder #3: ConcatTimestepEmbedderND with 0 params. Trainable: False
Initialized embedder # 4: ConcatTimestepEmbedderND with 0 params. Trainable: False
Traceback (most recent call last):
  File "/net/work/lau/generative-models/scripts/sampling/simple_video_sample.py", line 349, in <module>
    Fire(sample)
  File "/net/work/lau/generative-models/.pt2/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/net/work/lau/generative-models/.pt2/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/net/work/lau/generative-models/.pt2/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/net/work/lau/generative-models/scripts/sampling/simple_video_sample.py", line 98, in sample
    model, filter = load_model(
  File "/net/work/lau/generative-models/scripts/sampling/simple_video_sample.py", line 340, in load_model
    model = instantiate_from_config(config.model).to(device).eval()
  File "/net/work/lau/generative-models/.pt2/lib/python3.10/site-packages/sgm/util.py", line 175, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "/net/work/lau/generative-models/.pt2/lib/python3.10/site-packages/sgm/models/diffusion.py", line 81, in __init__
    self.init_from_ckpt(ckpt_path)
  File "/net/work/lau/generative-models/.pt2/lib/python3.10/site-packages/sgm/models/diffusion.py", line 92, in init_from_ckpt
    sd = load_safetensors(path)
  File "/net/work/lau/generative-models/.pt2/lib/python3.10/site-packages/safetensors/torch.py", line 313, in load_file
    with safe_open(filename, framework="pt", device=device) as f:
OSError: No such device (os error 19)
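The last frame is `safe_open`, called from `load_file`. Since every path in this traceback sits under `/net/...`, which is often (but not always) where network shares get automounted, an assumption about this particular setup, it may help to check which filesystem actually holds the checkpoint. A Linux-only sketch that reads `/proc/mounts`; `filesystem_type` is a hypothetical helper, and it does not decode the octal escapes `/proc/mounts` uses for spaces in mount points:

```python
import os
from typing import Optional


def filesystem_type(path: str, mounts_text: Optional[str] = None) -> Optional[str]:
    """Return the filesystem type (e.g. "ext4", "nfs4", "cifs") of the mount holding `path`.

    Linux-only: parses /proc/mounts unless `mounts_text` is supplied (useful for testing).
    """
    if mounts_text is None:
        with open("/proc/mounts") as f:
            mounts_text = f.read()
    target = os.path.realpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Keep the longest mount point that contains the target path.
        contains = target == mount_point or target.startswith(prefix)
        if contains and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type
```

Calling `filesystem_type(ckpt_path)` on the checkpoint and getting back something like `nfs`, `nfs4`, `cifs` or a `fuse.*` type would point at network or FUSE storage rather than a local disk.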

@naveen-corpusant

I was getting the same error with safetensors. It turns out I was using a network-mounted SSD, and safetensors' memory mapping doesn't play nicely with that.
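This fits the error code: on Linux, mmap(2) fails with ENODEV (errno 19, the "No such device" above) when the underlying filesystem does not support memory mapping. One way to check a specific path is to try mapping it with Python's standard `mmap` module. This is only an approximation, since safetensors maps files from Rust and its exact mapping flags may differ from Python's `ACCESS_READ`; `can_mmap` is a hypothetical name.

```python
import mmap
import os


def can_mmap(path: str) -> bool:
    """Return True if `path` can be memory-mapped read-only, False on OSError (e.g. ENODEV)."""
    if os.path.getsize(path) == 0:
        # mmap refuses zero-length files with ValueError, so report that separately.
        raise ValueError(f"{path} is empty")
    try:
        with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ):
            return True
    except OSError:
        return False
```

If `can_mmap(ckpt_path)` returns False for the checkpoint, reading the file into memory instead of memory-mapping it is the natural workaround.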

A simple fix is to read the file into memory first and then pass the bytes to safetensors:

import safetensors.torch


def load_model_from_network_storage(checkpoint_path):
    """
    Load a safetensors model from network storage by first copying it into memory.

    load_file / safe_open memory-map the file, which can fail on network-mounted
    filesystems with "OSError: No such device (os error 19)".

    Args:
        checkpoint_path: Path to the safetensors file

    Returns:
        dict mapping tensor names to CPU tensors
    """
    print(f"Loading model from: {checkpoint_path}")

    # Read the entire file into memory (needs enough RAM for the whole checkpoint)
    print("Reading file into memory...")
    with open(checkpoint_path, "rb") as f:
        file_content = f.read()
    print(f"Read {len(file_content) / (1024 ** 3):.2f} GB into memory")

    # Deserialize from bytes. safetensors.torch.load takes no device argument and
    # returns CPU tensors; move them with .to(device) afterwards if needed.
    try:
        print("Loading tensors...")
        tensors = safetensors.torch.load(file_content)
        print(f"Successfully loaded {len(tensors)} tensors")
        return tensors
    except Exception as e:
        print(f"Error loading tensors: {e}")
        raise
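A variation on this workaround, sketched under the assumption that the memory-mapped path should still be preferred wherever it works: try the normal loader first and fall back to reading the bytes only when it raises OSError. The two loaders are passed in as callables so the sketch itself needs only the standard library; `load_with_fallback` is a hypothetical name.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def load_with_fallback(
    path: str,
    mmap_loader: Callable[[str], Dict[str, T]],
    bytes_loader: Callable[[bytes], Dict[str, T]],
) -> Dict[str, T]:
    """Try a memory-mapped load first; on OSError (e.g. "No such device (os error 19)"),
    read the whole file into memory and deserialize it from bytes instead."""
    try:
        return mmap_loader(path)
    except OSError as err:
        # A genuinely missing file re-raises FileNotFoundError from open() below.
        print(f"Memory-mapped load failed ({err}); reading {path} into memory instead")
        with open(path, "rb") as f:
            return bytes_loader(f.read())
```

With safetensors installed, the call would look like `load_with_fallback(ckpt_path, safetensors.torch.load_file, safetensors.torch.load)`; `load_file` defaults to `device="cpu"`, and `load` returns CPU tensors, so both branches produce the same kind of result.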
