-
Timestamps that are not perfectly accurate are to be expected. Implementing new features is out of scope for this repository; it only provides the compiled binaries.
-
I noticed that using your standalone version without SE results in much better timestamps!
-
For reference: SYSTRAN/faster-whisper#207
-
I experimented with the settings and wanted to report that for movies it's best to use --word_timestamps True together with the --vad_threshold option set to a value between 0.80 and 0.95 to get nearly perfect sync for subtitles. I realized that subtitles go out of sync when an actor says something that even I cannot understand: for some reason the software aligns the text at that point, and it stays on screen until the actual matching speech is spoken. High VAD values work great at ignoring difficult-to-understand speech. Of course, with such high values some important text will go missing, but at least the sync stays correct for the most part. In theory, extremely low VAD values (0.40 and below) should yield equally accurate timestamps, at the cost of more processing time and more irrelevant text to delete manually, because it will transcribe every little mumble there is. I haven't tested that, however. Regardless, I feel the default of 0.45 is not great for movies. Hope that helps!
Edit: I'm using the large v2 model; it's better than v1 in most cases when it comes to movies. Good luck!
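As a rough sketch of the settings described above, the flags could be assembled like this in Python. Only `--word_timestamps` and `--vad_threshold` come from this thread; the executable name `whisper-standalone`, the helper `build_command`, and passing the media file as a positional argument are assumptions made purely for illustration.

```python
import shlex


def build_command(media_path, vad_threshold=0.90, word_timestamps=True,
                  executable="whisper-standalone"):  # hypothetical executable name
    """Build an argument list using the two options mentioned in this thread."""
    # The threshold is treated as a probability, so keep it inside [0, 1].
    if not 0.0 <= vad_threshold <= 1.0:
        raise ValueError("vad_threshold must be between 0 and 1")
    return [
        executable,
        media_path,  # assumption: the input file is a positional argument
        "--word_timestamps", str(word_timestamps),
        "--vad_threshold", f"{vad_threshold:.2f}",
    ]


cmd = build_command("movie.mkv", vad_threshold=0.85)
# Print a shell-quoted version of the command for copy/paste.
print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run` as-is, which avoids quoting problems with file names that contain spaces.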
-
I was hesitant to enable it by default because some lines are sometimes missed with it, but since lines can be missed with VAD too, I think there is no big harm. I can make a release today with it enabled by default.
No, high VAD values are good at ignoring easy-to-understand speech too; 0.45 is a medium value, and 0.80-0.95 is extremely high.
Yes, large-v1 is pretty bad, imho.
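To illustrate the point about the threshold, here is a toy sketch in Python. It is not the actual VAD used by the tool: each audio chunk simply gets a made-up speech probability, and only chunks at or above the threshold are kept. The chunk labels, the probabilities, and the helper `keep_speech` are invented for illustration.

```python
def keep_speech(chunks, threshold):
    """Return the labels of chunks whose speech probability meets the threshold."""
    return [label for label, prob in chunks if prob >= threshold]


# Invented example data: (description, speech probability).
chunks = [
    ("clear dialogue", 0.97),
    ("quiet dialogue", 0.70),
    ("mumbling", 0.50),
    ("background noise", 0.20),
]

for threshold in (0.45, 0.90):
    print(threshold, keep_speech(chunks, threshold))
```

With these made-up numbers, 0.45 keeps everything except the background noise, while 0.90 keeps only the clear dialogue, so perfectly understandable but quieter lines are dropped as well.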
-
I've uploaded the new r126 release. Karaoke style can be enabled with
Ha, you could use
https://en.wikipedia.org/wiki/Beam_search I set default to
-
Yes.
-
Thank you so much for this! I'm a newbie; I've been using it with Subtitle Edit and it works wonders. There's the usual problem with Whisper of timestamps being out of sync. I saw https://github.com/linto-ai/whisper-timestamped and https://github.com/m-bain/whisperX and thought I could suggest implementing one of these fixes in your standalone version :)