CUDA error #95

Symfomany · 2023-08-25T18:48:40Z

Hello,

During my fine-tuning I get this error:

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Here is my code:

`import torch
import shutil
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

from huggingsound import TrainingArguments, ModelArguments, SpeechRecognitionModel, TokenSet
device = "cuda" if torch.cuda.is_available() else "cpu"

model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-french", device=device)
output_dir = "/content/drive/MyDrive/wav-example/output2"

# empty the output directory before training
for filename in os.listdir(output_dir):
    file_path = os.path.join(output_dir, filename)
    try:
        if os.path.isfile(file_path) or os.path.islink(file_path):
            os.unlink(file_path)
        elif os.path.isdir(file_path):
            shutil.rmtree(file_path)
    except Exception as e:
        print(f"Failed to delete {file_path}. Reason: {e}")

# first of all, you need to define your model's token set
# however, the token set is only needed for non-finetuned models
# if you pass a new token set for an already finetuned model, it'll be ignored during training
# Note that adding these tokens is crucial: without them, model performance could suffer,
# or training or inference could even fail with errors.

tokens = [
"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
"n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
"'", "", "|", "", "~~", "~~"
]
token_set = TokenSet(tokens)

# define your train/eval data

train_data = [
{"path": "/content/drive/MyDrive/wav-example/audio4.wav", "transcription": "bonjour je m'appelle Manuel je développe sous Androïd en Kotlin je fais des applications mobiles pour la société forestière je travaille dans la classification et reconnaissance vocale dans les essences et dans le domaine de la foresterie merci"},
]
eval_data = [
{"path": "/content/drive/MyDrive/wav-example/audio5.wav", "transcription": "je m'appelle Julien je développe sous Androïd fullstack pour la société forestière"},
]

# the lines below will load the training and model arguments objects,
# you can check the source code (huggingsound.trainer.TrainingArguments and huggingsound.trainer.ModelArguments) to see all the available arguments
training_args = TrainingArguments(
    learning_rate=3e-4,
    max_steps=1000,
    eval_steps=200,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
)
model_args = ModelArguments(
    activation_dropout=0.1,
    hidden_dropout=0.1,
)

evaluation = model.evaluate(eval_data)

print(evaluation)

# and finally, fine-tune your model
model.finetune(
    output_dir,
    train_data=train_data,
    eval_data=eval_data,  # the eval_data is optional
    token_set=token_set,
    training_args=training_args,
    model_args=model_args,
)`

This is on Google Colab Pro+ with an NVIDIA A100 GPU (CUDA).

Symfomany · 2023-08-28T08:45:20Z

The problem really is specific to the A100 GPU: on a V100 everything works fine!

Symfomany · 2023-08-28T08:45:54Z

Any ideas?

Symfomany · 2023-08-28T12:52:39Z

On the A100 GPU, sorry.
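
For anyone hitting the same message: "no kernel image is available for execution on the device" generally means the installed binary contains no kernels compiled for the GPU's compute capability. An A100 is compute capability 8.0 and a V100 is 7.0, so a PyTorch build that stops at `sm_70` would match the A100-fails / V100-works pattern reported above. That is only a hypothesis for this thread, since the installed PyTorch version is not shown. A minimal sketch to check it is below; `is_arch_supported` and `describe_environment` are hypothetical helper names, and the matching rule (same major version for `sm_XY` binaries, any older-or-equal `compute_XY` PTX) is a simplified reading of CUDA's compatibility rules.

```python
def is_arch_supported(capability, arch_list):
    """Check whether a build's architecture list can serve a given device.

    capability: (major, minor) tuple, e.g. (8, 0) for an A100.
    arch_list:  strings such as "sm_70" or "compute_80".
    Simplified rule (an assumption of this sketch):
      - "sm_XY" binaries run on devices with the same major version
        and a minor version >= Y;
      - "compute_XY" PTX can be JIT-compiled for any capability >= X.Y.
    Entries with non-numeric suffixes (e.g. "sm_90a") are skipped.
    """
    major, minor = capability
    wanted = major * 10 + minor
    for arch in arch_list:
        kind, _, number = arch.partition("_")
        if not number.isdigit():
            continue
        value = int(number)
        if kind == "sm" and value // 10 == major and value % 10 <= minor:
            return True
        if kind == "compute" and value <= wanted:
            return True
    return False


def describe_environment():
    """Collect the PyTorch/CUDA facts relevant to this error."""
    import torch  # imported lazily so the helper above works without PyTorch

    info = {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        capability = torch.cuda.get_device_capability(0)
        arch_list = torch.cuda.get_arch_list()
        info.update(
            device=torch.cuda.get_device_name(0),
            capability=capability,
            arch_list=arch_list,
            supported=is_arch_supported(capability, arch_list),
        )
    return info
```

Running `print(describe_environment())` in the Colab runtime, after installing huggingsound, should show whether the build's `arch_list` covers the A100. If `supported` comes back `False`, the likely direction is a PyTorch build compiled for that architecture, but which versions huggingsound accepts would need checking against its own requirements.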
