Feature Request: Offer audio normalization filter #45

Open
HannesJo0139 opened this issue Aug 6, 2020 · 31 comments

Labels
feature request New idea for project

Comments

@HannesJo0139

Since there is an option to downmix audio streams, it should also be possible to apply at least a basic normalization filter in order to prevent clipping, e.g. "-ar {samplingrate} -af loudnorm".
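
Concretely, the kind of command I have in mind would be something like this (values purely illustrative; the point is just downmix plus a basic loudness filter):
ffmpeg -i input.mkv -c:v copy -c:a aac -ac 2 -ar 48000 -af loudnorm output.mkv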

@cdgriffith added the feature request label Aug 21, 2020
@cdgriffith
Owner

As a heads up, this is currently not a high-priority thing for me to figure out as a standalone option in the GUI, because it's already possible to add audio track changes and filters via the "Custom ffmpeg options".

Filters are based on the output stream number, which can be found beside the audio tracks at their start in the format "incoming_stream_#:outgoing_stream_#"

[screenshot: track info]

In the above example the Stereo track's outdex (output stream index) is 1, so we would use -filter:1 loudnorm -ar:1 48000

[screenshot: custom options]
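
Spelled out as a complete command (illustrative only; the encoder settings and the stream index are assumptions and will differ per file), that would be roughly:

ffmpeg -i input.mkv -map 0:v:0 -map 0:a:0 \
  -c:v libx265 -crf 22 \
  -c:a aac -filter:1 loudnorm -ar:1 48000 \
  output.mkv

(single-pass loudnorm shown; see the note below about the recommended dual pass)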

Though for loudnorm specifically, it looks like a dual (two-pass) run is highly recommended for files, and the ffmpeg-normalize tool can be used for that.

@HannesJo0139
Author

Yeah, I totally agree. My intention behind this feature request was that FastFlix in its current state tempts you (I was actually thinking of newbies) to do downmixing without any clipping filter, and that is a really, really bad idea.

Maybe a temporary solution could be to display a small hint when selecting downmixing, like "We discourage downmixing without a clipping filter". Otherwise there is the danger of people messing up their media library because they just didn't know better. (Somewhat like people transcoding HDR stuff with Handbrake's 10-bit encoder. Quite a similar pitfall.)

@cdgriffith
Owner

Interesting, I thought FFmpeg's built in down-mixing was considered good by default / adheres to the ATSC standards? https://trac.ffmpeg.org/wiki/AudioChannelManipulation

@HannesJo0139
Author

Hmm, I'm a bit confused. I remember that I've read this page too, and I agree it says they stick to the ATSC standard. The standard says that it prevents any overflow, so no clipping. I'm not certain whether I just got it wrong or whether there was another reason for me to insist on that normalization filter. In any case, it seems to be less important than I made it out to be in my last comment.

@cdgriffith
Owner

If you find more info please send it my way, I don't know much on the audio side of things and want to provide the best / safest defaults possible 😄

@MarcoRavich

MarcoRavich commented Nov 2, 2020

Interesting, I thought FFmpeg's built in down-mixing was considered good by default / adheres to the ATSC standards? https://trac.ffmpeg.org/wiki/AudioChannelManipulation

Please check out these interesting discussions about "correct" FFMPEG stereo downmixing (and related approaches):

Hope that helps.

@cdgriffith
Owner

So maybe instead of some advanced filter system / page, it might be easier to just add a few more downmix options / filters and allow the user to generate their own and save them?

For example, we could add nightmode stereo, stereo with LFE, and 2.1 ("nightmode" stereo + LFE)

Then have a list in the config file with those as examples and allow more to be added?
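
For what it's worth, such a saved preset could be as small as a single pan filter. A sketch of a "nightmode"-style stereo downmix (coefficients are only an example, assuming a 5.1 source with back channels; the "<" form makes ffmpeg renormalize the gains so the sum can't clip):

-af "pan=stereo|FL<FC+0.30*FL+0.30*BL|FR<FC+0.30*FR+0.30*BR"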

@HannesJo0139
Author

Sorry closed by mistake lol

@cdgriffith
Owner

After some quick experimentation, the custom filters will probably work for stereo only. "2.1" is a bad idea.

Codecs like AAC don't have a notion of "2.1" so if you do a pan filter to 2.1 with FL+FC+LFE it actually maps those (in order) to C+FL+FR it seems.

@MarcoRavich

After some quick experimentation, the custom filters will probably work for stereo only. "2.1" is a bad idea.

Codecs like AAC don't have a notion of "2.1" so if you do a pan filter to 2.1 with FL+FC+LFE it actually maps those (in order) to C+FL+FR it seems.

Agree, downmix to (hq) STEREO is the way to go.

@MarcoRavich

OK, here are tebasuna51's suggestions about (proper) multichannel audio downmixing with FFMPEG:

  1. If your input is 7.1, first do a 7.1 -> 5.1 downmix with the compand selected (see the previous posts):
    -filter_complex "asplit [f][s]; [f] pan=3.1|c0=c0|c1=c1|c2=c2|c3=c3 [r]; [s] pan=stereo|c0=0.5c4+0.5c6|c1=0.5c5+0.5c7, compand=attacks=0:decays=0:points=-90/-84|-8/-2|-6/-1|-0/-0.1, aformat=channel_layouts=stereo [d]; [r][d] amerge [a]" -map "[a]"

  2. If your audio equipment supports a Dolby Pro Logic decoder (to recover a 5.0 from your 2.0 file), use the DPLII downmix:
    -filter_complex "pan=stereo|FL=.3254FL+.2301FC+.2818BL+.1627BR|FR=.3254FR+.2301FC-.1627BL-.2818BR"

  3. If your audio equipment is stereo only (a TV, for instance), there is no single proper way, because it is not possible to supply the same audio volume with 2 speakers as with 5 speakers, and there are many options. Select the desired one:

a) The formal approach, to preserve the balance between all the channels (of course the LFE channel is ignored, per Dolby recommendation):
-filter_complex "pan=stereo|FL=.3694FL+.2612FC+.3694BL+0.0LFE|FR=.3694FR+.2612FC+.3694BR+0.0LFE, volumedetect"

b) The extreme dialog maximization:
-filter_complex "pan=stereo|FL=.5FL+.5FC+.0BL+0.0LFE|FR=.5FR+.5FC+.0BR+0.0LFE, volumedetect"

c) Any option between a) and b). The coefficients for each channel must sum to 1 to avoid clipping. For instance:
-filter_complex "pan=stereo|FL=.5FL+.4FC+.1BL+0.0LFE|FR=.5FR+.4FC+.1BR+0.0LFE, volumedetect"

I include the volumedetect filter to see if the mix admits a gain without clipping.

Mixes like:
FL=FC+0.30FL+0.30BL (sum of coefficients 1.6)
FL=0.5FC+0.707FL+0.707BL+0.5LFE (sum of coefficients 2.414)
are wrong because they can produce clipping.

Mixes like (note the < instead of the =):
FL < 1.0FL + 0.707FC + 0.707BL
are equivalent (automatic normalization) to:
FL = 0.414FL + 0.293FC + 0.293BL

Hope that inspires !

Source: Audio encoding @ Doom9's Forum

@MarcoRavich

It would be extremely interesting (and almost unique) to implement a "replaygain-based" normalization.

Some interesting resources:
https://github.com/Moonbase59/loudgain

(python)
https://github.com/kepstin/regainer
https://github.com/chaudum/rgain3

Hope that inspires !

@MarcoRavich

MarcoRavich commented Apr 14, 2024

Bump.

Just found some interesting info about Audacity's approach...

Audacity's loudness normalization (which, unlike plain normalization, considers more than peaks) modifies the audio data in the source file. It offers two options, "perceived loudness" (default) and RMS:

  • perceived loudness: the default -23 LUFS (the EBU standard) will produce audio that is approximately 25% of full scale.
  • RMS: this will change the amplitude so that the result has the desired RMS level. The default setting is -20 dB, which will also produce low-level audio. Both LUFS and RMS normalization ensure that different audio projects come out at a relatively uniform volume.

Maybe it's possible to implement those through FFMPEG...
...I'll ask in other places, stay tuned.

EDIT:
Thanks to @Type-Delta's FFnorm, here are some FFMPEG normalization command lines:

  • getting audio loudness
    ffmpeg -hide_banner -i audio.wav -af ebur128=framelog=verbose -f null - 2>&1 | awk '/I:/{print $2}'
  • getting audio bitrate
    ffprobe -v fatal -select_streams a:0 -show_entries stream=bit_rate -of compact=p=0:nk=1 audio.wav
  • modifying audio gain
    ffmpeg -hide_banner -y -i input.wav -movflags use_metadata_tags -map_metadata 0 -id3v2_version 3 -q:a (QSCALE) -af "volume=(GAIN)dB" -b:a (BITRATE) -c:v copy output.wav

EDIT2:
ffmpeg-normalize is another cool (Python) FFMPEG normalization utility by @slhck which supports EBU R128 normalization as well as RMS/peak-based normalization

EDIT3:
@Piklesh's auto-loudnorm performs this 2pass normalization:

  • First manual execution
    ffmpeg -i "my_folder/audio_file.ogg" -af loudnorm=I=-16:dual_mono=true:TP=-1.5:LRA=11:print_format=summary -f null -
  • Second manual execution using the audio information returned
    ffmpeg -i "my_folder/audio_file.ogg" -af loudnorm=I=-16:TP=-1.5:LRA=11:measured_I=-27.2:measured_TP=-14.4:measured_LRA=0.1:measured_thresh=-37.7:offset=-0.5:linear=true:print_format=summary "my_folder/audio_file_normalized.ogg"

Hope that inspires !

@slhck

slhck commented Apr 14, 2024

Thanks for the mention. The only thing I want to leave here is that 2-pass normalization is the preferred way of doing it. Essentially you just parse the first pass output and apply it to the second pass. The complex part is handling multiple audio streams, channel layouts (as mentioned above), metadata, etc., and the fact that not all files and settings will yield nice results. Since this project uses Python it may be a simple call to ffmpeg-normalize's API which deals with most obvious things.
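
For reference, a minimal ffmpeg-normalize run looks roughly like this (the Python API exposes the same options; the default is two-pass EBU R 128 towards -23 LUFS, and the exact flags are documented in the project README):

ffmpeg-normalize input.mkv -o output.mkv -c:a aac -b:a 192k
ffmpeg-normalize input.mkv -o output.mkv -nt rms -t -20 -c:a aac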

@Type-Delta

Type-Delta commented Apr 15, 2024

Thanks for the mention here!

Just want to point out that my commands are a mess. :p

The preferred way is to use 2-pass normalization (as @slhck suggested) as it allows for more control over how normalization is applied.

In my case, FFnorm uses a simpler audio gain command for faster/simpler manual normalization (I couldn't get 2-pass normalization to work as fast), as it mainly focuses on normalizing hundreds of media files.

The actual audio gain command is (EDIT: this is not a normalization command, all it does is simply apply gain to the audio stream; the gain is calculated manually by FFnorm):
ffmpeg -i input.wav -q:a (QSCALE) -af "volume=(GAIN)dB" output.wav

The other stuff is just me trying to preserve as much metadata as possible.

Also, please don't use the -q option together with -b (at least not on the same stream): -q allows for variable bitrate, while -b tries to match the output bitrate to the specified value.

In the actual code I use either of them depending on the situation.

Sorry if I made any mistakes, I'm still a noob when it comes to FFmpeg commands.

@MarcoRavich

MarcoRavich commented Apr 15, 2024

Hi everyone, thanks for all the clarifications you've brought here, I think they will be useful for @cdgriffith's software.

1st of all: I believe it's clear to everyone that normalization should be performed only AFTER the channel downmixing.

According to @cpuimage's FFmpeg_Loudness Overview:

Double pass is ideal for postprocessing; it produces better results but is slower. With double pass you first scan the media file you want to normalize and then apply the target loudness parameters with the measured values.

So I would choose it by default (the first pass could be performed automatically if the source is stereo, or right after the downmix process, even without asking users) but, for normalization algorithms, the most reasonable approach might be to implement only some "commonly used" ones (as with channel downmixing).

Here are some standards listed by @nlebedevinc in his Audio normalization git:

There are several common audio normalization standards used in the industry. Here are a few examples:

  • Peak Normalization: Peak normalization adjusts the audio so that the highest peak in the waveform reaches a specified level, often 0 dB. This method simply amplifies or attenuates the audio uniformly, without considering the overall loudness perception.

  • RMS Normalization: Root Mean Square (RMS) normalization measures the average power of the audio signal over time and adjusts the gain to achieve a desired RMS level. RMS normalization takes into account the overall loudness of the audio, making it more suitable for achieving a consistent perceived loudness.

  • LUFS/LKFS Normalization: Loudness Units Full Scale (LUFS), also known as Loudness K-weighted Full Scale (LKFS), is a widely adopted standard for loudness normalization. It measures the perceived loudness of audio content using specific weighting filters to approximate human hearing. LUFS normalization ensures consistent loudness levels across different audio tracks and platforms, such as broadcasting and streaming services. EBU R 128 is a recommendation based on LUFS normalization.

  • ITU-R BS.1770: ITU-R BS.1770 is a standard developed by the International Telecommunication Union (ITU) that provides guidelines for measuring and normalizing audio program loudness. It specifies algorithms for measuring short-term loudness (momentary), integrated loudness (average over a defined duration), and loudness range (LRA) to achieve consistent loudness levels.

  • AES Streaming Loudness Recommendation (AES-41): AES-41 is an audio loudness recommendation developed by the Audio Engineering Society (AES) for streaming and online delivery. It focuses on delivering audio content with consistent loudness levels across various platforms and devices.

It's important to note that different platforms, industries, and countries may have specific requirements or variations in audio normalization standards. It's recommended to check the guidelines and specifications of the target platform or medium to ensure compliance with their specific standards.

Some have already been implemented (as XML presets) by @hz37 in his r128v3 project:
https://github.com/hz37/r128v3/tree/master/Win32/Release/presets

Hope this knowledge exchange will benefit all projects.

OT
Last but not least, I've collected some (mostly FFMPEG-based) normalization resources here for my HyMPS project: enjoy.

@slhck

slhck commented Apr 15, 2024

That quoted explanation (and the entire repository) seems to have been created by ChatGPT, so take that with a grain of salt. It's at least a bit misleading in terms of what it counts as different "standards" — ITU-R BS.1770 is what simply defines LUFS and common target levels, and EBU R 128 is implementing BS.1770 for its underlying measurements. The de-facto industry standard is R 128.

@MarcoRavich

That quoted explanation (and the entire repository) seems to have been created by ChatGPT

You're probably right; anyway, it gives an idea of how many approaches to audio normalization have been developed.

ITU-R BS.1770 is what simply defines LUFS and common target levels, and EBU R 128 is implementing BS.1770 for its underlying measurements.

...does this mean that the 1st pass (detection/measurement) can be the same for all ?

The de-facto industry standard is R 128.

I see.
So what do you suggest to implement ?
Peak, RMS and R128 ?

Thanks for your efforts !

note: there's another interesting command-line helper for performing audio loudness normalization with ffmpeg's loudnorm audio filter (by @indiscipline) that aims to be a simpler alternative to the ffmpeg-normalize Python script: it performs the loudness scanning pass of the given file and outputs the string of desired loudnorm options to be included in the ffmpeg arguments.

@slhck

slhck commented Apr 15, 2024

I see. So what do you suggest to implement ? Peak, RMS and R128 ?

All of them, ideally, because users may want to choose one over the other for particular use cases.

there's another interesting Command line helper for performing audio loudness normalization

This one is quite simple, yes, and should do the job, but it does not check for the various edge cases in terms of option thresholds etc.

Essentially it's similar to the ffmpeg-supplied one: https://github.com/FFmpeg/FFmpeg/blob/master/tools/normalize.py

@MarcoRavich

All of them, ideally, because users may want to choose one over the other for particular use cases.

So you mean: Peak, RMS, R128, BS.1770 and AES-41 ?

Anyway I would manage them as external (XML ?) files like @hz37 - who is working on dynadapt - did.

@slhck

slhck commented Apr 15, 2024

No, I meant Peak, RMS, and R 128. Both peak and RMS are super easy to compute and implement (just beware of clipping), and R 128 uses BS.1770 under the hood. (AES41 has nothing to do with normalization; it's a standard for embedding of metadata. I guess you can thank ChatGPT for causing the confusion.)
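
For illustration, a bare-bones sketch of both using only stock ffmpeg filters (the dB values are placeholders taken from the measurement output):

# measurement pass: volumedetect reports mean_volume (~ RMS level) and max_volume (peak)
ffmpeg -i input.wav -af volumedetect -f null -
# peak normalization to 0 dBFS: apply the inverse of the reported max_volume (say -6.3 dB)
ffmpeg -i input.wav -af "volume=6.3dB" output.wav
# RMS normalization to e.g. -20 dB: gain = target minus mean_volume, then check the same gain
# doesn't push max_volume above 0 dB (clipping)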

@MarcoRavich

(AES41 has nothing to do with normalization; it's a standard for embedding of metadata. I guess you can thank ChatGPT for causing the confusion.)

You're right, here's the AES page about normalization where - in the Normalization Targets section - they refer to the ANSI A/85, EBU R128 and AES71 standards.
Their latest "Recommendations for Loudness" document seems to be TD1008 instead.

Anyway, Peak, RMS, and R 128 seem to be sufficient for all.

Thanks again.

@MarcoRavich

@slhck: there are two different presets in r128v3...

EBU R128:

<R128preset License="CC Attribution-NonCommercial 4.0 International" Programming="Hens Zimmerman Audio">
  <AlternativeBoundaryMode>0</AlternativeBoundaryMode>
  <ChannelCoupling>1</ChannelCoupling>
  <CompressFactor>0</CompressFactor>
  <DCBiasCorrection>0</DCBiasCorrection>
  <FrameLength>500</FrameLength>
  <GaussFilterSize>31</GaussFilterSize>
  <Loudness>-23</Loudness>
  <LoudnessRange>20</LoudnessRange>
  <MaxGainFactor>10</MaxGainFactor>
  <Memo>EBU R128 European broadcast - Hens Zimmerman Audio - 05-10-2018</Memo>
  <ProcessingMode>2</ProcessingMode>
  <TargetPeak>95</TargetPeak>
  <TargetRMS>0</TargetRMS>
  <Timestamp>October 06, 2018 00:00:57</Timestamp>
  <TruePeak>-1</TruePeak>
  <User>hensz</User>
</R128preset>

And EBU R128 conservative:

<R128preset License="CC Attribution-NonCommercial 4.0 International" Programming="Hens Zimmerman Audio">
  <AlternativeBoundaryMode>0</AlternativeBoundaryMode>
  <ChannelCoupling>1</ChannelCoupling>
  <CompressFactor>0</CompressFactor>
  <DCBiasCorrection>0</DCBiasCorrection>
  <FrameLength>500</FrameLength>
  <GaussFilterSize>31</GaussFilterSize>
  <Loudness>-23</Loudness>
  <LoudnessRange>8</LoudnessRange>
  <MaxGainFactor>10</MaxGainFactor>
  <Memo>EBU R128 European broadcast - Conservative setting! Hens Zimmerman Audio - 05-10-2018</Memo>
  <ProcessingMode>2</ProcessingMode>
  <TargetPeak>95</TargetPeak>
  <TargetRMS>0</TargetRMS>
  <Timestamp>October 06, 2018 00:01:34</Timestamp>
  <TruePeak>-2</TruePeak>
  <User>hensz</User>
</R128preset>

...do you think both are useful ?

The other interesting preset approach from @hz37 is the loudness-check one: it switches the normalizer into a loudness checker just by setting the AlternativeBoundaryMode parameter...

...dunno if @cdgriffith is interested in implementing a dedicated panel to let users customize all those parameters (it would be great for flexibility, even if almost useless for many), but "injecting" such presets into FF's config file shouldn't be that difficult.

@slhck

slhck commented Apr 16, 2024

I think it would be fine to keep the default ffmpeg settings and allow the user to override the individual filter values. The XML preset options are also nice but require a different backend as far as I can tell. I am not passionate about this, in any case I am just weighing in here without really contributing :)

@MarcoRavich

MarcoRavich commented Apr 16, 2024

I am just weighing in here without really contributing :)

Same here (since I HAVEN'T CODED for the last 25 years), but I think discussing these techniques/approaches is a contribution in itself - and not just for FF.

Let's wait for @cdgriffith's opinion on it.

Thanks

@cdgriffith
Owner

A lot of these options sound cool, but way past any scope I envisioned for FastFlix I must admit!

For options that could be added as audio filters, I am currently working towards being able to add those, with #551 finally giving a dedicated audio conversion popup, where those could live.

[screenshot: audio conversion popup]

Things that would need multiple passes would present an issue if the video encode isn't set to two-pass as well, as FastFlix is still heavily video-first (as in how the internal code is driven).

If there are ways I can incorporate any of these as check boxes and just drop them into the ffmpeg command (or nvencc if possible), that would be ideal. The best way to help with that is to give straightforward ffmpeg examples! (Like @Type-Delta's -af "volume=(GAIN)dB", which I did not know about and could be an easier add.)
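
For the record, the single-pass variants really are just one extra audio filter argument (values illustrative):

-af "loudnorm=I=-23:TP=-2:LRA=7"    (one-shot EBU R128 loudnorm, dynamic mode)
-af "volume=5dB"                    (plain gain, value computed beforehand)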

Anything requiring a full page is not out of the question from the "could we ever see it in FastFlix?" side, but it is sadly way outside the time I have to put towards this project myself, and would have to be done by other contributors.

@MarcoRavich

MarcoRavich commented Apr 17, 2024

1st of all, a heartfelt thank you to @cdgriffith, who proves to be a developer that carefully listens to the community's needs/opinions.

Methods aside, the most important thing about audio normalization is to restrict its application to the "right" signals: it can be done on stereo streams, but not on multichannel ones.
That's why I believe it should be selectable only if the source stream is - or the user has enabled the downmix to - stereo (otherwise it should remain greyed out).

About multipass normalization: well, if it doesn't upset the current coding chain too much, the 1st step - analysis/measurement - should be performed BEFORE the video stream encoding (and the 2nd one after it).
To be clear:

flowchart TD;
    A[input file audio stream] --> B{is stereo ?};
    B -- Yes --> C[audio analysis - 1st pass];
    B -- No --> D[perform downmix];
    D-->B;
    C-->E[video encoding];
    E-->F[audio normalization - 2nd pass];
    F-->G[audio encoding]
    G-->H[mux a/v streams];

(note: flowcharting through markup is fun !)

Last but not least, yesterday I "coded" (with ChatGPT's help) this batch script, which performs FFMPEG's 2-pass audio normalization correctly:

setlocal EnableDelayedExpansion

rem Check if at least one file was dragged onto the script icon
if "%~1"=="" (
    echo Error: No input file dragged.
    pause
    exit /b 1
)

rem Get the full path of the first dragged file
set "input_file=%~1"

rem Check if the dragged file exists
if not exist "%input_file%" (
    echo Error: Specified input file does not exist.
    pause
    exit /b 1
)

rem Extract the file name without extension and the output directory
for %%I in ("%input_file%") do (
    set "input_name=%%~nI"
    set "output_dir=%%~dpI"
)

rem Show available audio streams in the input file
echo List of audio streams in file %input_file%:
ffmpeg -i "%input_file%" 2>&1 | findstr "Stream"

rem Ask the user to select the audio stream for measurement and processing
set /p "selected_stream=Enter the audio stream number to process (e.g., 0 for the first stream): "

echo FFMPEG is analyzing audio stream %selected_stream% of file %input_file%...

rem Execute FFmpeg to analyze the input file and save measured values in a text file
ffmpeg -drc_scale 0 -i "%input_file%" -map 0:a:%selected_stream% -af loudnorm=I=-23:LRA=7:tp=-2:print_format=summary -f null - 2>&1 | findstr /R /C:"Input Integrated:" /C:"Input LRA:" /C:"Input True Peak:" /C:"Input Threshold:" > "%output_dir%%input_name%.msr"

echo Analysis completed.

rem Read the measured values from the .msr file and store them in variables
set "I="
set "tp="
set "LRA="
set "thresh="

for /f "tokens=1,* delims=:" %%a in ('type "%output_dir%%input_name%.msr"') do (
    set "measurement=%%a"
    set "value=%%b"
    
    rem Trim leading and trailing spaces from measurement and value variables
    for /f "tokens=* delims=" %%x in ("!measurement!") do set "measurement=%%x"
    for /f "tokens=* delims=" %%y in ("!value!") do set "value=%%y"
    
    rem Extract the numeric value and measurement unit
    for /f "tokens=1,*" %%v in ("!value!") do (
        set "numeric_value=%%v"
        set "unit=%%w"
    )

    rem Assign the measured values to corresponding variables
    if "!measurement!"=="Input Integrated" set "I=!numeric_value!"
    if "!measurement!"=="Input True Peak" set "tp=!numeric_value!"
    if "!measurement!"=="Input LRA" set "LRA=!numeric_value!"
    if "!measurement!"=="Input Threshold" set "thresh=!numeric_value!"
)

rem Display the acquired measured values from the .msr file
echo.
echo Measured values:
echo Input Integrated: !I! LUFS
echo Input True Peak: !tp! dBTP
echo Input LRA: !LRA! LU
echo Input Threshold: !thresh! LUFS
echo.

rem Ask for user confirmation before proceeding with audio normalization
set /p "proceed=Press Enter to proceed with EBU/R128 audio normalization using FFMPEG or CTRL+C to cancel..."

rem Execute FFmpeg with the measured values read from the .msr file
ffmpeg -drc_scale 0 -i "%input_file%" -map 0:a:%selected_stream% -af loudnorm=I=-23:LRA=7:tp=-2:measured_I=!I!:measured_LRA=!LRA!:measured_tp=!tp!:measured_thresh=!thresh! -acodec pcm_f32le -y "%output_dir%%input_name%.wav"

rem Display a success message if FFmpeg execution was successful
if %ERRORLEVEL% equ 0 (
    echo Normalization completed successfully.
) else (
    echo An error occurred during conversion.
)

rem Remove the temporary .msr file
del "%output_dir%%input_name%.msr"

echo.
echo The original file (%input_file%) has not been altered.
echo The audio has been extracted, converted, and normalized as: %output_dir%%input_name%.wav
pause
exit /b 0

(note 1: the -drc_scale 0 parameter is for better - aka full-range - AC3 decoding only, so FFMPEG automatically skips it for other audio codecs;
note 2: when a lossy-to-lossy transcode and/or volume change is performed, I always prefer to "transit" through a 32-bit floating point audio representation in order to minimize the losses as much as possible)

Let me know, in the end, if you need me to modify it to replicate - video coding aside - the flowchart above.

Hope that helps/inspires !

EDIT: interesting - and simple - page about Audio normalization with FFmpeg

@indiscipline

indiscipline commented Apr 26, 2024

This one is quite simple, yes, and should do the job, but it does not check for the various edge cases in terms of option thresholds etc.

Not quite sure what you mean by option thresholds? I'd be happy to look into adding additional features to my program.

Essentially it's [ffmpeg-loudnorm-helper] similar to the ffmpeg-supplied one: https://github.com/FFmpeg/FFmpeg/blob/master/tools/normalize.py

Oh, the ffmpeg script has been recently updated (after more than 10 years); I haven't seen the new version yet. It's still not as user-friendly as ffmpeg-loudnorm-helper and has most options hardcoded. It also performs the conversion itself, which goes against the separation-of-concerns principle and prevents integration into dynamic workflows. The script is more of a usage example than a tool.


Regarding the script above: there's no point in creating a temporary wav file with the sole intention of converting it further to a target format, unless some manual processing is required. The measurements should just be integrated as additional filter processing for the encoding pass.

Regarding the flowchart, I suppose the correct approach would be to process audio in parallel. Moreover, it would be convenient for a proper user-facing tool to perform the loudness measurement in the background as soon as the option is selected, as it's not exactly a fast process. The results can be cached as they won't change unless the input file changes.

ffmpeg's loudnorm filter is rather capricious and falls back to dynamic normalization quickly, which is often undesirable, as it changes the dynamics of the audio and may introduce unwanted loudness fluctuations around drastic changes in loudness in the source material. On the other hand, it usually performs well enough, and having a standard tool with more or less predictable behaviour is better than learning the nuances and corner cases of a novel reimplementation. Since FastFlix seems to be built around ffmpeg, it's only natural to stick to its normalization capabilities.

@slhck

slhck commented Apr 27, 2024

This one is quite simple, yes, and should do the job, but it does not check for the various edge cases in terms of option thresholds etc.

Not quite sure what you mean by option thresholds? I'd be happy to look into adding additional features to my program.

ffmpeg's loudnorm filter is rather capricious and falls back to dynamic normalization quickly, which is often undesirable, as it changes the dynamics of the audio and may introduce unwanted loudness fluctuations around drastic changes in loudness in the source material.

This is what I was referring to. Basically trying to detect when it would fall back to dynamic mode based on the measured variables and user-set thresholds, and at least warning the user that dynamic mode will be used. It's definitely not required and complicates the code, but some users have asked for it.
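
A related, cruder option: the second pass can be run with print_format=json, and (assuming a reasonably recent ffmpeg) the loudnorm stats include a normalization_type field reporting whether "linear" or "dynamic" was actually used, so a tool could at least surface that to the user after the fact, e.g.:

ffmpeg -i input.mkv -af loudnorm=I=-23:TP=-2:LRA=7:measured_I=-27.2:measured_TP=-14.4:measured_LRA=0.1:measured_thresh=-37.7:linear=true:print_format=json -f null -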

@indiscipline

Basically trying to detect when it would fall back to dynamic mode based on the measured variables and user-set thresholds, and at least warning the user that dynamic mode will be used.

In this case you weren't completely correct, as ffmpeg-loudnorm-helper does check that, although the check is as basic as it gets. It just computes the available headroom (the delta between the current and requested TP levels) and sees if it's enough for the requested linear Integrated Loudness change. It does warn the user in case there's no headroom, but I haven't done any rigorous testing to determine whether it catches all the cases of ffmpeg switching to Dynamic Normalization. The way to make it solid is to just copy the way ffmpeg itself triggers the switch.
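
As a worked example of that check (numbers invented): with a measured integrated loudness of -30 LUFS and a target of -16 LUFS, the linear pass needs +14 dB of gain; if the measured true peak is -12 dBTP and the target true peak is -1.5 dBTP, only 10.5 dB of headroom is available, so a purely linear gain can't satisfy both constraints and loudnorm would fall back to dynamic normalization.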

@MarcoRavich

MarcoRavich commented Apr 29, 2024

Hi everybody, just found LASTAR, a (GPL-licensed) standalone audio batch-processing tool that embeds:

  • an automatic RMS normalizer,
  • an Automatic Gain Controller (compressor)
  • a 10 bands automatic equalizer
  • a 3 bands semi-parametric equalizer
  • a file splitter
  • a noise gate
  • preset management
  • a preview function.

Code: https://sourceforge.net/p/lastar/code/HEAD/tree/
Website: http://www.arthelion.com/index.php/en/windows-en/lastar

Since I'm not a developer, I honestly don't know if/how it can effectively help, but - as a multimedia content creator/editor/publisher - I find its functions quite interesting (i.e. I would like to see them implemented in encoder GUIs like FastFlix).

Hope that inspires.
