Wrong volume after silence at start of track? #146
Comments
Based on the waveform screenshot, it seems that the file has a very low volume to begin with. Is that correct? Maybe this has something to do with the volume detection not working properly. I'm not sure what the y-axis in Audacity indicates; according to that, normalization results in roughly 50x amplification of the level. As a quick check, does the same issue occur if you first peak-normalize the file?
PS: It may be a bug or a special case not handled by the filter.
Yes, ffmpeg-normalize with …
Ah, there's a way to display the Audacity waveform on a dB scale; the default view seems to be a linear amplitude scale. Here's my original example with the dB view (and also left channel only, to save some screen space), EBU normalized. "The dark blue part of the waveform displays the tallest peak and the light blue part of the waveform displays the average RMS (Root Mean Square) value of the audio" (Source). The volume doesn't have to be that low for this behavior to happen, though it is more noticeable the lower the volume is.
Peak normalized (…): [screenshot omitted]
Peak normalized and then EBU normalized (…): [screenshot omitted]
Understood, and that's fine! It's good just to get a bit of insight into what might be happening. I already had a workaround in mind: detect how much silence there is at the start (with ffmpeg's silencedetect filter), trim off most of that silence, normalize, and then add the silence back.
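A minimal sketch of that workaround, assuming ffmpeg and ffmpeg-normalize are on the PATH; the file names, the -50 dB threshold, and the 11-second offset are placeholders to tune per file:

```shell
#!/bin/sh
# 1) Locate the end of the leading silence. silencedetect logs lines like:
#    [silencedetect @ ...] silence_end: 11.2 | silence_duration: 11.2
ffmpeg -i original.aac -af silencedetect=noise=-50dB:d=2 -f null - 2>&1 \
  | grep silence_end | head -n 1

# 2) Trim most of the silence (here: the first 11 s), then normalize.
ffmpeg -ss 11 -i original.aac trimmed.aac
ffmpeg-normalize trimmed.aac -nt ebu -t -14 -c:a aac -o trimmed_normalized.aac

# 3) Re-add the silence as an 11 s delay applied to all channels.
ffmpeg -i trimmed_normalized.aac -af "adelay=11000:all=1" -c:a aac restored.aac
```

Note that `adelay` pads with digital silence, so this only makes sense when the trimmed leading section really was silence rather than low-level noise.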
Thanks for the quick feedback. Interesting that this still happens for a peak-normalized input, so it's rather a property of that particular file, combined with the silence at the beginning, that causes the error. Could the case be made that the faint increase in RMS over the first few seconds of signal gets amplified by loudnorm? Anyway, good that truncating the silence works. One could think about adding an option that auto-truncates silence at the beginning (basically doing what you're doing, just automatically), but it's a bit of a convoluted solution for an edge case, and it might mess with music files that contain a bit of noise in the beginning, where re-adding pure silence is not an option, etc.
Maybe, although here's an example where RMS doesn't particularly increase during the beginning of the signal:
Peak normalized (…): [screenshot omitted]
Peak normalized and then EBU normalized (…): [screenshot omitted]
For the record, it can be easier to see the differences using the linear scale:
I agree; hopefully this just points to some part of the code.
I can confirm this. It's time for the world to accept that loudnorm's implementation of EBU R128 limiting is dangerously opinionated, or just plain broken. I wouldn't let it near music, that's for sure. I've been checking files in Logic Pro with Fabfilter Pro-L2, and it's two very different worlds. I'm really not sure what this slow ramp-up is, but it's happening on almost everything. I'm trying to figure out how to run Fab's VST2 in DPlug now. Again, this is no fault of ffmpeg-normalize; it's most certainly an upstream failure.
Thanks for your comment. I guess that it's the silence that's throwing it off, no? I see that there is a limiter gain being calculated on a running window of samples, and that might explain why the limiter gain increases frame by frame once that window fills up with samples of a certain volume. I think one might be able to debug the issue by printing some intermediate values in ffmpeg, but I would have to find the time to do that. I've also tried contacting the original author of the filter, but I was unsuccessful. PS: There is a …
I don't think it's the silence. As long as we can agree Fabfilter Pro-L2 is the gold standard for mastering and loudness, at least. With loudnorm I tried almost every variation of target level and LRA target (which seemed to have no effect); Fab was effortless, as always. I noticed pumping on a lot of loudnorm files and decided to investigate. Here's Kenny Rogers' The Gambler:
It's so strange. I can't reproduce the loudnorm behavior in Fab. To be fair, Fabfilter is amazing. I wasn't expecting that level of result, but I also wasn't expecting this weird behavior.
You're right, this looks odd. Sorry there isn't more that I can do …
No worries, and of course thanks again for your work on this. I might conduct some experiments with dynaudnorm as well, just out of curiosity. Edit: Do not use dynaudnorm for music.
Interesting to see that it doesn't need silence at the start to happen.
I don't know much about the audio theory, but I guess a normalizer's job is to get to the desired peak and LUFS without undesired distortion, so if alimiter gets you there, and then some, that should be ideal? As long as you can consistently hit the desired peak and LUFS ranges without trial and error, since (IMO) a big idea behind tools like ffmpeg-normalize is to automatically process files, perhaps in batch, without having to double-check and parameter-tweak each file individually. If I'm reading correctly, the volume filter does RMS normalization, so it's 'simpler' than loudnorm, although that doesn't necessarily mean worse: https://trac.ffmpeg.org/wiki/AudioVolume (Again, if you're reaching your target loudness range, then it seems like you're good to go.) I guess it kind of puts into perspective that there are a lot of ways to approach audio mastering, and loudnorm / ffmpeg-normalize is just one thing in the toolkit.
One could definitely also just do a first pass to get the statistics, then do a second pass with RMS normalization and an added limiter to prevent clipping. This might be another mode that could be implemented in the tool. I think the best way forward would actually be debugging the existing ffmpeg filter: logging the intermediate values of the function that gets called on every frame (perhaps by adding metadata injection), then plotting them to see what causes the error.
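A rough two-pass sketch of that RMS-plus-limiter idea (to be clear, this is not what ffmpeg-normalize does today; volumedetect, volume, and alimiter are real ffmpeg filters, but the gain and limit values here are made-up placeholders):

```shell
#!/bin/sh
# Pass 1: measure levels. volumedetect prints mean_volume (RMS-based)
# and max_volume to stderr.
ffmpeg -i input.wav -af volumedetect -f null - 2>&1 | grep _volume
#   e.g. "mean_volume: -31.0 dB", "max_volume: -9.5 dB"

# Pass 2: apply a constant gain toward the target (placeholder value,
# roughly target_mean minus measured_mean from pass 1), with alimiter
# as a safety net against clipping.
ffmpeg -i input.wav -af "volume=10dB,alimiter=limit=0.95" output.wav
```

Because the gain is a single constant, this cannot produce the slow ramp-up behavior discussed above; the limiter only engages on peaks that would otherwise clip.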
Where are samples so this can be fixed? |
In #146 (comment) the author mentioned a track that could possibly be obtained from, uhm, other sources. There are some waveforms included in the comment that are useful for diagnosis. |
I can confirm that the current loudnorm implementation is not correct at all: the scanner part works well, but the limiter/compressor/expander are buggy, and in the worst cases can produce clipped output.
Thanks for looking into this! Do you think it would be a lot of effort to fix this? I'm afraid I don't know enough about the underlying processing to help with that. As far as I know the original author is no longer actively maintaining the code. |
There is also an issue with timestamp rewriting, which could cause problems for online processing when gaps are present in the timestamps, and with video too, causing loss of A/V sync. It is not a lot of effort; it should just be a matter of rewriting some chunks of code. I'm currently looking at how best to do it.
Well, for now you should use two-pass loudnorm only. One-pass messes with the dynamics too much, IMHO.
Were you able to improve the code with respect to some of the issues? That'd be great! This tool actually only uses two-pass loudnorm, so that is not an issue.
Two-pass loudnorm, if properly used (the report at the end of the summary is still linear and not dynamic), is just volume amplification with a single constant gain value for the whole audio. I got that Gambler audio file, the master mix in OGG format, and I see nothing wrong with how two-pass loudnorm processes the file. Perhaps the reporter really needs heavy dynamics processing, which may impact the loudness range (LRA), where two-pass loudnorm cannot, and should not, do anything at all.
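For reference, the two-pass loudnorm flow looks roughly like this (the measured_* values below are placeholders; in practice you paste in the JSON output of the first pass, which is exactly what ffmpeg-normalize automates):

```shell
#!/bin/sh
# Pass 1: measure only; loudnorm prints a JSON block with input_i,
# input_tp, input_lra, input_thresh, and target_offset to stderr.
ffmpeg -i input.wav -af loudnorm=I=-14:TP=-1:LRA=11:print_format=json -f null -

# Pass 2: feed the measurements back in; linear=true requests a single
# constant gain for the whole file (no dynamic processing).
ffmpeg -i input.wav -af "loudnorm=I=-14:TP=-1:LRA=11:\
measured_I=-23.1:measured_TP=-4.2:measured_LRA=5.0:measured_thresh=-33.5:\
offset=0.2:linear=true" -ar 48000 output.wav
```

The explicit `-ar 48000` is there because loudnorm internally resamples to 192 kHz; without it the output inherits that elevated sample rate.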
I've been using ffmpeg-normalize (EBU R128 method) to normalize the audio of gameplay recordings. Typically the recordings have a peak and LUFS significantly lower than the target volume, and I use ffmpeg-normalize to boost the volume. Sometimes there's silence in the audio, like when the game is loading or paused.
When there are at least 2-3 seconds of silence at the beginning of the audio track, the result I get with ffmpeg-normalize has a lower-than-expected volume right after the silence, and then the volume gradually climbs toward the expected volume over a period of time.
Here's an example. Waveform of original recording:
Zooming in on the original recording, to confirm that the volume is reasonably steady:
Normalization result, using

ffmpeg-normalize.exe original.aac -nt ebu -t -14 -c:a aac -o normalized.aac

It takes roughly 90 seconds to climb to the volume I'd expect from normalization.

If I trim most of the silence off the start and then normalize, the volume seems to be fine throughout the track. Using

ffmpeg -ss 11 -i original.aac -copyts trim_11.aac

and

ffmpeg-normalize.exe trim_11.aac -nt ebu -t -14 -c:a aac -o trim_11_normalized.aac

Windows 10, Python 3.8, ffmpeg 4.3.2. I'm happy to provide audio uploads, stats, more details/examples, etc., but I thought I'd check first: am I missing something obvious? Is this expected behavior, or am I missing a tuning parameter that would help?