CUDA error #95

Open
Symfomany opened this issue Aug 25, 2023 · 3 comments

Comments

@Symfomany

Hello,

During fine-tuning I get the following error:

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Here is my code snippet:

`import torch
import shutil
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

from huggingsound import TrainingArguments, ModelArguments, SpeechRecognitionModel, TokenSet
device = "cuda" if torch.cuda.is_available() else "cpu"

model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-french", device=device)
output_dir = "/content/drive/MyDrive/wav-example/output2"

for filename in os.listdir(output_dir):
    file_path = os.path.join(output_dir, filename)
    try:
        if os.path.isfile(file_path) or os.path.islink(file_path):
            os.unlink(file_path)
        elif os.path.isdir(file_path):
            shutil.rmtree(file_path)
    except Exception as e:
        print(f"Failed to delete {file_path}. Reason: {e}")

# first of all, you need to define your model's token set
# however, the token set is only needed for non-finetuned models
# if you pass a new token set for an already finetuned model, it'll be ignored during training
# Note that adding these tokens is crucial: leaving them out could hurt the model's performance or even cause errors during training or inference.

tokens = [
    "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
    "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
    "'", "", "|", "", "", ""
]
token_set = TokenSet(tokens)

# define your train/eval data
train_data = [
    {"path": "/content/drive/MyDrive/wav-example/audio4.wav", "transcription": "bonjour je m'appelle Manuel je développe sous Androïd en Kotlin je fais des applications mobiles pour la société forestière je travaille dans la classification et reconnaissance vocale dans les essences et dans le domaine de la foresterie merci"},
]
eval_data = [
    {"path": "/content/drive/MyDrive/wav-example/audio5.wav", "transcription": "je m'appelle Julien je développe sous Androïd fullstack pour la société forestière"},
]

# the lines below will load the training and model arguments objects,
# you can check the source code (huggingsound.trainer.TrainingArguments and huggingsound.trainer.ModelArguments) to see all the available arguments
training_args = TrainingArguments(
    learning_rate=3e-4,
    max_steps=1000,
    eval_steps=200,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
)
model_args = ModelArguments(
    activation_dropout=0.1,
    hidden_dropout=0.1,
)

evaluation = model.evaluate(eval_data)

print(evaluation)

# and finally, fine-tune your model
model.finetune(
    output_dir,
    train_data=train_data,
    eval_data=eval_data,  # the eval_data is optional
    token_set=token_set,
    training_args=training_args,
    model_args=model_args,
)`

This is on Google Colab Pro+ with an NVIDIA A100 GPU (CUDA).

image

@Symfomany
Author

The problem is indeed specific to the A100 GPU: on a V100 it works fine!
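
One possible explanation, which is only an assumption since nothing in this thread confirms it, is that the installed PyTorch build does not ship kernels for the A100's compute capability (8.0), while the V100's (7.0) is covered. The sketch below compares what the GPU reports with what the build was compiled for. `has_kernels_for` is a hypothetical helper written for this illustration, and its compatibility rule is simplified (same major version for `sm_XY` binaries, any older or equal `compute_XY` PTX).

```python
def has_kernels_for(capability, arch_list):
    """Return True if a build compiled for `arch_list` can run on `capability`.

    capability: (major, minor) tuple, e.g. (8, 0) for an A100, (7, 0) for a V100.
    arch_list:  strings such as "sm_70" or "compute_70", in the format
                returned by torch.cuda.get_arch_list().
    Simplified rule (an assumption of this sketch):
      - "sm_XY" binaries run on devices with the same major version and a
        minor version >= Y;
      - "compute_XY" PTX can be JIT-compiled for any device >= X.Y.
    """
    major, minor = capability
    for arch in arch_list:
        kind, _, number = arch.partition("_")
        if not number.isdigit():
            continue  # skip entries this sketch does not understand
        arch_major, arch_minor = divmod(int(number), 10)
        if kind == "sm" and arch_major == major and arch_minor <= minor:
            return True
        if kind == "compute" and (arch_major, arch_minor) <= (major, minor):
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        capability = torch.cuda.get_device_capability()
        arch_list = torch.cuda.get_arch_list()
        print("PyTorch:", torch.__version__, "| CUDA:", torch.version.cuda)
        print("Device:", torch.cuda.get_device_name(0), "| capability:", capability)
        print("Compiled architectures:", arch_list)
        print("Kernels available for this GPU:", has_kernels_for(capability, arch_list))
    else:
        print("PyTorch with CUDA is not available in this environment.")
```

If the last line prints `False` on the A100 runtime, installing a PyTorch build whose architecture list includes `sm_80` would be the thing to try first.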

@Symfomany
Author

Any ideas?

@Symfomany
Author

On the A100 GPU, sorry.
