Timestamps out of sync #5

aaron2221 · 2023-04-29T11:10:27Z

aaron2221
Apr 29, 2023

Thank you so much for this! I'm a newbie; I've been using it with Subtitle Edit and it works wonders. There's the usual problem with Whisper of timestamps being out of sync. I saw https://github.com/linto-ai/whisper-timestamped and https://github.com/m-bain/whisperX and thought I could suggest implementing one of these fixes in your standalone version :)

Purfview · 2023-04-29T12:09:42Z

Purfview
Apr 29, 2023
Maintainer

Timestamps that aren't super accurate are expected.

Implementing anything is out of scope for this repository; it just provides the compiled binaries.

1 reply

aaron2221 Apr 29, 2023
Author

Roger that, thank you!

aaron2221 · 2023-05-08T19:10:31Z

aaron2221
May 8, 2023
Author

I noticed that using your standalone version without SE results in much better timestamps!

1 reply

Purfview May 8, 2023
Maintainer

> I noticed that using your standalone version without SE results in much better timestamps!

Yes, it's known that Faster-Whisper produces somewhat better transcription and timestamps when it isn't run through Subtitle Edit.
The cause is SE's audio processing (ffmpeg): Faster-Whisper's internal audio processing is done by PyAV, and for some reason that produces better results (sometimes much better).

Purfview · 2023-05-09T19:46:58Z

Purfview
May 9, 2023
Maintainer

For reference: SYSTRAN/faster-whisper#207

0 replies

aaron2221 · 2023-06-07T09:32:47Z

aaron2221
Jun 7, 2023
Author

I experimented with the settings and wanted to report that for movies it's best to use --word_timestamps True together with the --vad_threshold option set to a value between 0.80 and 0.95 to get nearly perfect sync for subs.

I realized that subtitles go out of sync when an actor says something that even I cannot understand: the software for some reason aligns the text right there, and it stays on screen until the actual matching speech is said. High VAD values work great at ignoring difficult-to-understand speech. Of course, with such high VAD values some important text will go missing, but at least the sync stays correct for the most part.

In theory, extremely low VAD values (0.40 and below) will yield the same accurate timestamps, at the cost of more processing time and more irrelevant text to delete manually, because it will literally transcribe every little mumble there is; however, I didn't test that. Regardless, I feel like the default of 0.45 is not great for movies. Hope that helps!

Edit: I'm using the large-v2 model; it's better than v1 in most cases when it comes to movies. Good luck!
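To make the threshold trade-off concrete, here is a minimal sketch of threshold-based speech detection. It is not the VAD code the tool actually runs: the `speech_regions` helper, the 0.5 s frame length, and the per-frame probabilities are all hypothetical and chosen only to show how a higher threshold drops low-confidence speech.

```python
from typing import List, Tuple


def speech_regions(
    probs: List[float], threshold: float, frame_sec: float = 0.5
) -> List[Tuple[float, float]]:
    """Return (start, end) times of runs of frames with probability >= threshold."""
    regions = []
    start = None  # index of the first frame of the current speech run
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            regions.append((start * frame_sec, i * frame_sec))
            start = None
    if start is not None:  # speech run reaches the end of the audio
        regions.append((start * frame_sec, len(probs) * frame_sec))
    return regions


# Hypothetical speech probabilities: clear speech, a mumble, clear speech.
probs = [0.97, 0.98, 0.10, 0.55, 0.60, 0.05, 0.96, 0.99]

print(speech_regions(probs, 0.45))  # [(0.0, 1.0), (1.5, 2.5), (3.0, 4.0)]
print(speech_regions(probs, 0.90))  # [(0.0, 1.0), (3.0, 4.0)]
```

At 0.45 the mumble between 1.5 s and 2.5 s is passed on for transcription; at 0.90 it is skipped entirely, which matches the observation that high values keep sync but can lose lines.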

0 replies

Purfview · 2023-06-07T18:41:14Z

Purfview
Jun 7, 2023
Maintainer

> I just wanted to report that for movies it's best to use --word_timestamps...

I was hesitant to enable it by default because some lines are sometimes missed with it, but since lines can be missed with VAD too, I think there is no big harm. I can make a release today with it enabled by default.
Currently word_timestamps produces karaoke subtitles; I'll make it output normal subs and move the "karaoke" style to an option.
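For readers unfamiliar with the difference: a "normal" subtitle is a single cue that spans from the first word's start to the last word's end, whereas karaoke-style output typically repeats the line once per word with that word highlighted. The sketch below shows only the plain case. It is an illustration, not the tool's actual code; `srt_time`, `words_to_cue`, and the sample word timings are hypothetical.

```python
from typing import List, Tuple

Word = Tuple[float, float, str]  # (start seconds, end seconds, text)


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_cue(index: int, words: List[Word]) -> str:
    """Build one plain SRT cue from the first word's start to the last word's end."""
    start, end = words[0][0], words[-1][1]
    text = " ".join(w[2] for w in words)
    return f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n"


# Hypothetical word-level timestamps for one line of dialogue.
words = [(1.0, 1.4, "Roger"), (1.4, 1.8, "that,"), (1.9, 2.5, "thank"), (2.5, 2.75, "you!")]
print(words_to_cue(1, words))
```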

> High VAD values works great at ignoring difficult-to-understand-speech...

No, high VAD values are good at ignoring easy-to-understand speech too. 0.45 is a medium value; 0.80-0.95 is extremely high.

> I'm using large v2 model, it's better than v1 in most cases...

Yes, large-v1 is pretty bad, imho.

1 reply

aaron2221 Jun 7, 2023
Author

It would be extremely helpful if you could make word_timestamps produce subs without the karaoke style. What I'm doing to work around this is copying the transcription right out of the cmd window and pasting it into a .srt file; from there I move it to SE so that the segments are corrected automatically. Yeah, it's a pretty long process!

By the way, what does beam_size do? During my testing I was messing with it but couldn't tell the difference all that much.

Purfview · 2023-06-07T21:20:19Z

Purfview
Jun 7, 2023
Maintainer

I've uploaded the new r126 release. Karaoke style can be enabled with the --highlight_words argument.

> I'm doing to combat this is copy & paste the transcription right out of the cmd window then paste it in a .srt file

Ha, you could use the -f=txt option.

> By the way, what does beam_size do? During my testing I was messing with it but couldn't tell the difference all that much.

https://en.wikipedia.org/wiki/Beam_search
https://en.wikipedia.org/wiki/Greedy_algorithm

I set the default to 1. beam_size=5 is more likely to improve the transcription than to make it worse, but it usually makes it about twice as slow, with no clear benefit in my use cases.

beam_size=1 is my own preference for my own use cases; maybe in your use cases you'll find beam_size=5 better.
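As a rough illustration of what the linked articles describe, here is a toy beam search in which beam_size=1 reduces to greedy decoding. The `TOY_MODEL` table and its probabilities are invented for this sketch and have nothing to do with Whisper's real decoder; the point is only that keeping more candidates can find a more probable sequence at the price of scoring more of them.

```python
from typing import Dict, List, Tuple

# Hypothetical toy "model": next-token probabilities given the tokens so far.
TOY_MODEL: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}


def beam_search(beam_size: int, steps: int = 2) -> Tuple[Tuple[str, ...], float]:
    """Keep the beam_size most probable partial sequences at every step."""
    beams: List[Tuple[Tuple[str, ...], float]] = [((), 1.0)]
    for _ in range(steps):
        # Extend every surviving prefix with every possible next token.
        candidates = [
            (prefix + (token,), prob * p)
            for prefix, prob in beams
            for token, p in TOY_MODEL[prefix].items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams[0]


greedy_seq, greedy_prob = beam_search(beam_size=1)
beam_seq, beam_prob = beam_search(beam_size=2)
print(greedy_seq, round(greedy_prob, 2))  # ('a', 'x') 0.33
print(beam_seq, round(beam_prob, 2))      # ('b', 'x') 0.36
```

Greedy decoding commits to "a" because it looks best at the first step, while a beam of 2 keeps "b" alive and ends with the more probable sequence. In the second step the wider beam scores four candidates instead of two, which is where the extra run time comes from.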

2 replies

aaron2221 Jun 7, 2023
Author

Outstanding!! That's mega helpful and highlight_words is off by default, perfect!

-f=txt, oh well.......

aaron2221 Jun 16, 2023
Author

Have you seen that? SYSTRAN/faster-whisper#226
Apparently it can improve timestamps

Purfview · 2023-06-16T13:56:20Z

Purfview
Jun 16, 2023
Maintainer

> Have you seen that? guillaumekln/faster-whisper#226 Apparently it can improve timestamps

Yes.
Did you read the notes on the r126 release? "Plus includes non-merged #225 & #226 PRs."

0 replies
