OSError: No such device (os error 19) #406

ezone1987 · 2024-09-02T01:14:05Z

When testing radio_app_sv4d.py, the error occurs while loading the model. The model is already downloaded and the device is set to the default 'cpu', but loading still raises the OSError.

janiceylau · 2024-09-24T09:58:00Z

Hi, I also got the same problem. Here are the logs for your reference:

python scripts/sampling/simple_video_sample.py --input_path /net/.../hydrant.jpg --version sv3d_p --elevations_deg 10.0
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
Initialized embedder #0: FrozenOpenCLIPImagePredictionEmbedder with 683800065 params. Trainable: False
Initialized embedder # 1: VideoPredictionEmbedderWithEncoder with 83653863 params. Trainable: False
Initialized embedder # 2: ConcatTimestepEmbedderND with 0 params. Trainable: False
Initialized embedder # 3: ConcatTimestepEmbedderND with 0 params. Trainable: False
Initialized embedder # 4: ConcatTimestepEmbedderND with 0 params. Trainable: False
Traceback (most recent call last):
  File "/net/work/lau/generative-models/scripts/sampling/simple_video_sample.py", line 349, in <module>
    Fire(sample)
  File "/net/work/lau/generative-models/.pt2/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/net/work/lau/generative-models/.pt2/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/net/work/lau/generative-models/.pt2/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/net/work/lau/generative-models/scripts/sampling/simple_video_sample.py", line 98, in sample
    model, filter = load_model(
  File "/net/work/lau/generative-models/scripts/sampling/simple_video_sample.py", line 340, in load_model
    model = instantiate_from_config(config.model).to(device).eval()
  File "/net/work/lau/generative-models/.pt2/lib/python3.10/site-packages/sgm/util.py", line 175, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "/net/work/lau/generative-models/.pt2/lib/python3.10/site-packages/sgm/models/diffusion.py", line 81, in __init__
    self.init_from_ckpt(ckpt_path)
  File "/net/work/lau/generative-models/.pt2/lib/python3.10/site-packages/sgm/models/diffusion.py", line 92, in init_from_ckpt
    sd = load_safetensors(path)
  File "/net/work/lau/generative-models/.pt2/lib/python3.10/site-packages/safetensors/torch.py", line 313, in load_file
    with safe_open(filename, framework="pt", device=device) as f:
OSError: No such device (os error 19)

naveen-corpusant · 2024-11-23T05:10:38Z

I was getting the same error with safetensors. It turned out I was loading from a network-mounted SSD, and the memory mapping (mmap) that safetensors uses to open files doesn't play nicely with that.

A simple fix is to read the file into memory, and then pass that to safetensors:

import safetensors.torch


def load_model_from_network_storage(checkpoint_path):
    """
    Load a safetensors model from network storage by first copying to memory

    Args:
        checkpoint_path: Path to the safetensors file

    Returns:
        Dict mapping tensor names to torch tensors
    """
    print(f"Loading model from: {checkpoint_path}")

    # Read the entire file into memory (needs enough RAM for the whole checkpoint)
    print("Reading file into memory...")
    with open(checkpoint_path, 'rb') as f:
        file_content = f.read()
    print(f"Read {len(file_content) / (1024*1024*1024):.2f}GB into memory")

    # safetensors.torch.load takes only the raw bytes (it has no device
    # argument), so no mmap is involved and the tensors are created on the CPU
    try:
        print("Loading tensors...")
        tensors = safetensors.torch.load(file_content)
        print(f"Successfully loaded {len(tensors)} tensors")
        return tensors
    except Exception as e:
        print(f"Error loading tensors: {e}")
        raise
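
If reading the whole file should only happen when memory mapping actually fails, the same idea can be wrapped in a fallback. The following is a minimal sketch, not part of the original workaround: `load_with_fallback` is a hypothetical helper, and the two loaders are passed in as arguments so the sketch itself depends only on the standard library. With safetensors installed, they would presumably be `safetensors.torch.load_file` and `safetensors.torch.load`. It catches `OSError` broadly, on the assumption that the error raised by safetensors may not carry a usable `errno`.

```python
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_with_fallback(
    path: str,
    mmap_loader: Callable[[str], T],
    bytes_loader: Callable[[bytes], T],
) -> T:
    """Try the path-based (memory-mapped) loader; on OSError, load from bytes.

    Both loaders are injected by the caller. This helper is hypothetical and
    only illustrates the fallback pattern described in the thread.
    """
    try:
        return mmap_loader(path)
    except OSError:
        # Assumption: any OSError here may be an mmap failure such as
        # "No such device (os error 19)". A genuinely missing file fails
        # again in read_bytes(), so that error still reaches the caller.
        data = Path(path).read_bytes()
        return bytes_loader(data)
```

Assumed usage with safetensors: `sd = load_with_fallback(ckpt_path, safetensors.torch.load_file, safetensors.torch.load)`. The fallback path holds the raw file contents in RAM while the tensors are built, so it needs enough memory for the whole checkpoint.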

koutini-sony mentioned this issue Dec 17, 2024

Use mmap.MAP_PRIVATE flag for loading .safetensors files on filesystems that don't support SHARED huggingface/safetensors#545

Closed
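
To check whether a given mount is affected by the shared-mapping limitation named in that title, one option is to try mapping a file from it in both modes. This is a minimal sketch using only Python's standard `mmap` module on Unix. `probe_mmap_modes` is a hypothetical helper, and it assumes the failure comes from the filesystem rejecting `MAP_SHARED` mappings. The file must be non-empty, because `mmap` refuses to map empty files.

```python
import mmap


def probe_mmap_modes(path: str) -> dict:
    """Report which mmap sharing modes succeed for the file at `path` (Unix only)."""
    results = {}
    with open(path, "rb") as f:
        for name, flag in (("shared", mmap.MAP_SHARED), ("private", mmap.MAP_PRIVATE)):
            try:
                # Length 0 maps the whole file; PROT_READ asks for a read-only mapping.
                with mmap.mmap(f.fileno(), 0, flags=flag, prot=mmap.PROT_READ):
                    results[name] = True
            except OSError:
                # For example ENODEV ("No such device") on a filesystem
                # that does not support this kind of mapping.
                results[name] = False
    return results
```

If `"shared"` comes back `False` for a checkpoint on the network mount while `"private"` is `True`, that would be consistent with the limitation described above. If both fail, reading the file into memory first remains an option.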
