I am getting the error below while trying to load the model across my 2x RTX 3060 GPUs with the device_map="auto" parameter:
File /home/uad/sandbox/evo/venv/lib/python3.10/site-packages/accelerate/utils/modeling.py:1395, in check_device_map(model, device_map)
1393 if len(all_model_tensors) > 0:
1394 non_covered_params = ", ".join(all_model_tensors)
-> 1395 raise ValueError(
1396 f"The device_map provided does not give any device for the following parameters: {non_covered_params}"
1397 )
ValueError: The device_map provided does not give any device for the following parameters: backbone.unembed.weight
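For context, `check_device_map` just verifies that every parameter in the model is covered by some key in the device map (either the parameter itself or one of its parent modules); any leftover name raises this ValueError. A minimal sketch of that coverage logic, with illustrative names rather than the real StripedHyena layout:

```python
# Sketch of the device-map coverage check: a parameter is "covered" if the
# map assigns a device to the parameter itself or to any parent module.
def uncovered_params(param_names, device_map):
    leftover = []
    for name in param_names:
        prefixes = [name] + [name.rsplit(".", i)[0] for i in range(1, name.count(".") + 1)]
        if not any(p in device_map for p in prefixes):
            leftover.append(name)
    return leftover

params = ["backbone.blocks.0.mlp.weight", "backbone.unembed.weight"]
device_map = {"backbone.blocks": 0}  # nothing covers backbone.unembed.weight
print(uncovered_params(params, device_map))  # ['backbone.unembed.weight']
```

So the error means the automatically inferred map simply has no entry whose prefix matches `backbone.unembed.weight`.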
my code is:
In [2]: from transformers import AutoConfig, AutoModelForCausalLM
   ...:
   ...: model_name = 'togethercomputer/evo-1-8k-base'
   ...: # model_name = "togethercomputer/evo-1-131k-base"
   ...:
   ...: model_config = AutoConfig.from_pretrained(model_name, trust_remote_code=True, revision="1.1_fix")
   ...: model_config.use_cache = True
   ...:
   ...: model = AutoModelForCausalLM.from_pretrained(
   ...:     model_name,
   ...:     config=model_config,
   ...:     trust_remote_code=True,
   ...:     revision="1.1_fix",
   ...:     cache_dir="/llms/evo",
   ...:     low_cpu_mem_usage=True,
   ...:     device_map="auto",  # only updated here from repo code, so that it distributes the weights to multiple GPUs
   ...: )
What could be the root cause here, and what are possible approaches to solve it?
Any help is much appreciated. Thanks!
Here you can check out the whole stderr output:
Loading checkpoint shards: 100%|████████████████| 3/3 [00:03<00:00, 1.11s/it]
Some weights of StripedHyenaModelForCausalLM were not initialized from the model checkpoint at togethercomputer/evo-1-8k-base and are newly initialized: ['backbone.unembed.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[1], line 9
6 model_config = AutoConfig.from_pretrained(model_name, trust_remote_code=True, revision="1.1_fix")
7 model_config.use_cache = True
----> 9 model = AutoModelForCausalLM.from_pretrained(
10 model_name,
11 config=model_config,
12 trust_remote_code=True,
13 revision="1.1_fix",
14 cache_dir="/media/raid/llms/evo",
15 low_cpu_mem_usage=True,
16 device_map="auto"
17 )
File /home/uad/sandbox/evo/venv/lib/python3.10/site-packages/transformers/modeling_utils.py:3820, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
3818 device_map_kwargs["force_hooks"] = True
3819 if not is_fsdp_enabled() and not is_deepspeed_zero3_enabled():
-> 3820 dispatch_model(model, **device_map_kwargs)
3822 if hf_quantizer is not None:
3823 hf_quantizer.postprocess_model(model)
File /home/uad/sandbox/evo/venv/lib/python3.10/site-packages/accelerate/big_modeling.py:351, in dispatch_model(model, device_map, main_device, state_dict, offload_dir, offload_index, offload_buffers, skip_keys, preload_module_classes, force_hooks)
317 """
318 Dispatches a model according to a given device map. Layers of the model might be spread across GPUs, offloaded on
319 the CPU or even the disk.
(...)
348 single device.
349 """
350 # Error early if the device map is incomplete.
--> 351 check_device_map(model, device_map)
353 # for backward compatibility
354 is_bnb_quantized = (
355 getattr(model, "is_quantized", False) or getattr(model, "is_loaded_in_8bit", False)
356 ) and getattr(model, "quantization_method", "bitsandbytes") == "bitsandbytes"
File /home/uad/sandbox/evo/venv/lib/python3.10/site-packages/accelerate/utils/modeling.py:1419, in check_device_map(model, device_map)
1417 if len(all_model_tensors) > 0:
1418 non_covered_params = ", ".join(all_model_tensors)
-> 1419 raise ValueError(
1420 f"The device_map provided does not give any device for the following parameters: {non_covered_params}"
1421 )
ValueError: The device_map provided does not give any device for the following parameters: backbone.unembed.weight
I ran into the same issue. For some reason the backbone.unembed.weight parameter is not included in the default device map. I got it working with a custom device map like the following:

import json
import numpy as np

def make_new_device_map(num_devices: int, out_map_file: str):
    # Read in the default device map as the basis for the new one
    # (DEFAULT_DEVICE_MAP is the path to the model's JSON device map)
    with open(DEFAULT_DEVICE_MAP, 'r') as indm:
        device_map = json.load(indm)
    # Collect all blocks (first three name components) so they can be
    # distributed evenly across as many devices as available
    device_modules = {}
    for layer_name in device_map.keys():
        module = '.'.join(layer_name.split('.')[:3])
        device_modules[module] = None
    # The unembed block is missing from the default map, so add it explicitly
    device_modules['backbone.unembed'] = None
    num_modules = len(device_modules)
    # Assign blocks to devices in contiguous, evenly sized chunks
    even_split = num_modules / num_devices
    for i, key in enumerate(device_modules.keys()):
        device_modules[key] = int(np.floor(i / even_split))
    # Assign individual layers to devices (all layers within a block share the same device)
    for layer_name in device_map.keys():
        module = '.'.join(layer_name.split('.')[:3])
        device_map[layer_name] = device_modules[module]
    device_map['backbone.unembed.weight'] = device_modules['backbone.unembed']
    with open(out_map_file, 'w') as outdm:
        json.dump(device_map, outdm)
And then you supply the new json device map to the load_checkpoint_and_dispatch() function.
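To illustrate, here is how the even-split assignment above behaves on a toy device map. The block names are made up for the example; only the splitting arithmetic matches the function above:

```python
import math

# Spread 5 hypothetical blocks over 2 devices with the same even-split
# arithmetic as make_new_device_map(); the unembed block gets a device too.
blocks = ['backbone.blocks.0', 'backbone.blocks.1', 'backbone.blocks.2',
          'backbone.blocks.3', 'backbone.unembed']
num_devices = 2
even_split = len(blocks) / num_devices  # 2.5 blocks per device
assignment = {b: math.floor(i / even_split) for i, b in enumerate(blocks)}
print(assignment)
# {'backbone.blocks.0': 0, 'backbone.blocks.1': 0, 'backbone.blocks.2': 0,
#  'backbone.blocks.3': 1, 'backbone.unembed': 1}
```

Because `backbone.unembed` is added to the block list before the split, it always ends up with a concrete device, which is exactly what the coverage check in accelerate requires.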