Hi, I have a problem when I run readme_example.py to infer Mixtral-8x7B on an A100 GPU: the generated text comes out garbled. The full output is as follows:
/home/xxx/anaconda3/envs/moe-infinity/lib/python3.9/site-packages/auto_gptq/nn_modules/triton_utils/kernels.py:411: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(ctx, input, qweight, scales, qzeros, g_idx, bits, maxq):
/home/xxx/anaconda3/envs/moe-infinity/lib/python3.9/site-packages/auto_gptq/nn_modules/triton_utils/kernels.py:419: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(ctx, grad_output):
/home/xxx/anaconda3/envs/moe-infinity/lib/python3.9/site-packages/auto_gptq/nn_modules/triton_utils/kernels.py:461: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @custom_fwd(cast_inputs=torch.float16)
CUDA extension not installed.
CUDA extension not installed.
Do not detect pre-installed ops, use JIT mode
[WARNING] FlashAttention is not available in the current environment. Using default attention.
Using /data/xxx/mirror/.cache/torch_extensions/py39_cu124 as PyTorch extensions root...
Emitting ninja build file /data/xxx/mirror/.cache/torch_extensions/py39_cu124/prefetch/build.ninja...
Building extension module prefetch...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module prefetch...
Time to load prefetch op: 2.545353889465332 seconds
SPDLOG_LEVEL : (null)
2024-12-20 10:34:40.267 INFO Create ArcherAioThread for thread: , 0
2024-12-20 10:34:40.268 INFO Loading index file from , /home/xxx/moe-infinity/archer_index
2024-12-20 10:34:40.268 INFO Index file size , 995
2024-12-20 10:34:40.269 INFO Device count , 1
2024-12-20 10:34:40.269 INFO Enabled peer access for all devices
Loading model from offload_path ...
Model create: 76%|████████████████████████████████████████████████████████▌ | 760/994 [00:00<00:00, 2359.42it/s]MixtralConfig {
"_name_or_path": "/data/model_and_dataset/Mixtral-8x7B-Instruct-v0.1",
"architectures": [
"MixtralForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mixtral",
"num_attention_heads": 32,
"num_experts_per_tok": 2,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"num_local_experts": 8,
"output_router_logits": false,
"rms_norm_eps": 1e-05,
"rope_theta": 1000000.0,
"router_aux_loss_coef": 0.02,
"router_jitter_noise": 0.0,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.47.1",
"use_cache": true,
"vocab_size": 32000
}
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:2 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
/home/xxx/anaconda3/envs/moe-infinity/lib/python3.9/site-packages/transformers/generation/utils.py:2134: UserWarning: You are calling .generate() with the input_ids being on a device type different than your model's device. input_ids is on cuda, whereas the model is on cpu. You may experience unexpected behaviors or slower generation. Please make sure that you have put input_ids to the correct device by calling for example input_ids = input_ids.to('cpu') before running .generate().
warnings.warn(
Model create: 94%|█████████████████████████████████████████████████████████████████████▏ | 930/994 [00:21<00:00, 2359.42it/s]translate English to German: How old are you?;
;inchct-- REYetetetetctet
ArcherTaskPool destructor
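For context, the warnings near the end say that no attention_mask or pad_token_id was passed to generate() and that input_ids ended up on cuda while the model was still on cpu, which may be related to the garbled output. Below is a minimal sketch of how those generate() warnings are usually addressed with plain Hugging Face Transformers. It is NOT the moe-infinity API: the model/tokenizer loading, `device_map="auto"`, and `max_new_tokens` are assumptions for illustration; only the model path is taken from the config dump above.

```python
# Minimal sketch, NOT the moe-infinity API: plain Hugging Face Transformers,
# shown only to illustrate how the three generate() warnings above are
# commonly silenced. All names below are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/data/model_and_dataset/Mixtral-8x7B-Instruct-v0.1"  # path from the config dump
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires accelerate; the full model will not fit on a single A100 without offloading
)

prompt = "translate English to German: How old are you?"
inputs = tokenizer(prompt, return_tensors="pt")  # returns input_ids AND attention_mask

# Move the inputs to the same device as the model's input embeddings,
# so generate() does not warn about a cuda/cpu mismatch.
device = model.get_input_embeddings().weight.device
inputs = {k: v.to(device) for k, v in inputs.items()}

outputs = model.generate(
    **inputs,                             # passes attention_mask explicitly
    pad_token_id=tokenizer.eos_token_id,  # Mixtral has no pad token; reuse EOS explicitly
    max_new_tokens=32,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With MoE-Infinity the model object comes from the library's own loader rather than AutoModelForCausalLM, so only the last three steps are the relevant part here: tokenize with `return_tensors="pt"` (which also produces the attention mask), move the tensors to whatever device the model expects its inputs on, and pass `pad_token_id` explicitly.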