Prompt_ids feature causing repetitions and hallucinations #35603

vchagari · 2025-01-10T07:28:16Z

System Info

Hi @sanchit-gandhi and @gante

Using the prompt feature as described in #22395 causes the model output to contain many repetitions and hallucinations.

I recorded an audio clip and passed it to the Whisper ASR model together with the prompt shown below.

More details:
Transformers Commit: 1c7e5e2

Test case (steps to reproduce the issue):
Audio contents: "The full name of Donald is Donald J. Trump Jr"

from transformers import WhisperFeatureExtractor, WhisperForConditionalGeneration, WhisperProcessor

# model_dir (the Whisper checkpoint) and audio (the recording, sampled at 16 kHz) are defined elsewhere
prompt = "Donald Duck"

model = WhisperForConditionalGeneration.from_pretrained(model_dir).to("cuda")
feature_extractor = WhisperFeatureExtractor.from_pretrained(model_dir)
processor = WhisperProcessor.from_pretrained(model_dir)
# Tokenize the prompt text so it can be passed to generate()
prompt_ids = processor.get_prompt_ids(prompt)
input_features = feature_extractor(audio, sampling_rate=16000, return_tensors="pt").input_features
predicted_ids = model.generate(input_features.to("cuda"), prompt_ids=prompt_ids, num_beams=4)
# Decode every returned sequence, dropping special tokens
text = [processor.decode(predicted_id, skip_special_tokens=True) for predicted_id in predicted_ids]
transcript = text[0]

Output: The full name of Donald is Donald J. Trump Jr. Donald Duck Donald Duck Donal Donald Duck Donald Duck Donald Duck Donal Donald Duck Donald Duck Donald Duck Donald Duck Donal Donald Duck Donald Duck Donald Duck Donald Duck Donald Duck Donal Donald Duck

Link to the audio: https://drive.google.com/file/d/1ud-B0uepD8Sk6ArkvJdqPmFWYpCmAooi/view?usp=drive_link

Who can help?

No response

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

The test case is the same as the one given under System Info above.

Expected behavior

The output should be either "The full name of Donald is Donald J. Trump" or "The full name of Donald is Donald Duck", not an endless run of the prompt keywords.
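One way to turn this expectation into an automatic check is sketched below. The helper names `count_phrase` and `looks_degenerate` and the `max_repeats` threshold are illustrative assumptions, not part of Transformers. The sketch only inspects the decoded string and does not call the model.

```python
def count_phrase(transcript: str, phrase: str) -> int:
    """Count non-overlapping, case-insensitive occurrences of `phrase`."""
    return transcript.lower().count(phrase.lower())


def looks_degenerate(transcript: str, prompt: str, max_repeats: int = 1) -> bool:
    """Flag a transcript in which the prompt text occurs more than `max_repeats` times."""
    return count_phrase(transcript, prompt) > max_repeats


# Shortened copy of the output reported in this issue
observed = (
    "The full name of Donald is Donald J. Trump Jr. "
    "Donald Duck Donald Duck Donal Donald Duck Donald Duck"
)
# The two transcripts named as acceptable in this issue
acceptable = [
    "The full name of Donald is Donald J. Trump",
    "The full name of Donald is Donald Duck",
]

print(looks_degenerate(observed, "Donald Duck"))
print([looks_degenerate(t, "Donald Duck") for t in acceptable])
```

With the default `max_repeats=1`, a transcript that mentions the prompt once still passes, so both acceptable outputs are allowed while the looping output is flagged.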


Rocketknight1 · 2025-01-10T14:59:36Z

cc @eustlb

vchagari added the bug label Jan 10, 2025
