FidelityFX Combined Adaptive Compute Ambient Occlusion (CACAO) 1.4

Combined Adaptive Compute Ambient Occlusion (or CACAO for short) is a highly optimised adaptation of the Intel(R) ASSAO screen space ambient occlusion implementation [ASSAO-16].

CACAO provides 5 quality levels for SSAO generation (FFX_CACAO_QUALITY_LOWEST, FFX_CACAO_QUALITY_LOW, FFX_CACAO_QUALITY_MEDIUM, FFX_CACAO_QUALITY_HIGH, FFX_CACAO_QUALITY_HIGHEST), the last of which uses an adaptive approach.

Shading language requirements

HLSL GLSL CS_6_0

Note that the GLSL compiler must also support GL_EXT_samplerless_texture_functions and GL_GOOGLE_include_directive for #include handling used throughout the GLSL shader system.

Integration guidelines

Two matrices (projection, normalsToView) are required for CACAO to operate. The depth buffer is required as input, with normals being an optional input or otherwise computed from the depth buffer. The output is a one channel texture of ambient occlusion (AO) values.

A constant buffer needs to be filled with relevant values. Many values should be left as they are in the provided implementation. Others need to be set when integrating the effect, for example because of different resolutions, different camera matrices, or altered settings. Such values are shown with a Y in the Modify column. Values shown with an N in the Modify column will normally be left as they are in the provided implementation.

| Modify | Element name | Type | Description |
|--------|--------------|------|-------------|
| Y | DepthUnpackConsts | float2 | Multiply and add values for clip to view depth conversion. |
| Y | CameraTanHalfFOV | float2 | $\tan({fov \over 2})$ for the x and y dimensions. |
| Y | NDCToViewMul | float2 | Multiplication value for normalized device coordinates (NDC) to view conversion. |
| Y | NDCToViewAdd | float2 | Addition value for NDC to view conversion. |
| Y | DepthBufferUVToViewMul | float2 | Multiplication value for the depth buffer's UV to view conversion. |
| Y | DepthBufferUVToViewAdd | float2 | Addition value for the depth buffer's UV to view conversion. |
| Y | EffectRadius | float | The radius in world space of the occlusion sphere. A larger radius will make further objects contribute to the ambient occlusion of a point. |
| Y | EffectShadowStrength | float | The linear multiplier for shadows. Higher values intensify the shadow. |
| Y | EffectShadowPow | float | The exponent for shadow values. Larger values create darker shadows. |
| Y | EffectShadowClamp | float | Clamps the shadow values to be within a certain range. |
| Y | EffectFadeOutMul | float | Multiplication value for effect fade out. $EffectFadeOutMul = {-1 \over fadeOutTo - fadeOutFrom}$. |
| Y | EffectFadeOutAdd | float | Addition value for effect fade out. $EffectFadeOutAdd = {fadeOutFrom \over (fadeOutTo - fadeOutFrom)} + 1$. |
| Y | EffectHorizonAngleThreshold | float | Minimum angle necessary between geometry and a point to create occlusion. Adjusting this value helps reduce self-shadowing. |
| N | EffectSamplingRadiusNearLimitRec | float | Default: $EffectRadius * 1.2$. See implementation for details. |
| N | DepthPrecisionOffsetMod | float | Default: $0.9992$. Offset used to prevent artifacts due to imprecision. |
| Y | NegRecEffectRadius | float | Set to $-1 \over EffectRadius$. |
| N | LoadCounterAvgDiv | float | Set to $9 \over importanceMapWidth * importanceMapHeight * 255.0$. |
| Y | AdaptiveSampleCountLimit | float | Limits the total number of samples taken at adaptive quality levels. |
| Y | InvSharpness | float | Set to $1 \over sharpness$. The sharpness controls how much blur should bleed over edges. |
| Y | BlurNumPasses | int | Default is $4$. On the lowest quality level the default is $2$. |
| Y | BilateralSigmaSquared | float | Only affects downsampled SSAO. Higher values create a larger blur. |
| Y | BilateralSimilarityDistanceSigma | float | Only affects downsampled SSAO. Lower values create sharper edges. |
| N | PatternRotScaleMatrices | float4[4][5] | Used for the sampling pattern. See implementation for details. |
| Y | NormalsUnpackMul | float | Multiplication value to unpack normals. Set to $1$ if normals are already in $[-1, 1]$ range. |
| Y | NormalsUnpackAdd | float | Addition value to unpack normals. Set to $0$ if normals are already in $[-1, 1]$ range. |
| Y | DetailAOStrength | float | Adds in more detailed shadows based on edges. These are less temporally stable. |
| Y | SSAOBufferDimensions | float2 | Dimensions of the SSAO buffer. |
| Y | SSAOBufferInverseDimensions | float2 | $1 \over SSAOBufferDimensions$. |
| Y | DepthBufferDimensions | float2 | Dimensions of the depth buffer. |
| Y | DepthBufferInverseDimensions | float2 | $1 \over DepthBufferDimensions$. |
| Y | DepthBufferOffset | int2 | Default is $(0, 0)$. |
| N | PerPassFullResUVOffset | float4[4] | See implementation. |
| Y | InputOutputBufferDimensions | float2 | Dimensions of the output AO buffer. |
| Y | InputOutputBufferInverseDimensions | float2 | $1 \over InputOutputBufferDimensions$. |
| Y | ImportanceMapDimensions | float2 | Dimensions of the importance map. |
| Y | ImportanceMapInverseDimensions | float2 | $1 \over ImportanceMapDimensions$. |
| Y | DeinterleavedDepthBufferDimensions | float2 | Dimensions of the deinterleaved depth buffer. |
| Y | DeinterleavedDepthBufferInverseDimensions | float2 | $1 \over DeinterleavedDepthBufferDimensions$. |
| Y | DeinterleavedDepthBufferOffset | float2 | Default is $0$. |
| Y | DeinterleavedDepthBufferNormalisedOffset | float2 | Default is $0$. |
| Y | NormalsWorldToViewspaceMatrix | mat4 | Normal matrix. |
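Several of the constants above are simple functions of user-facing settings. The following is a minimal sketch of how they could be derived, using only the formulas given in the table. The `AoSettings` and `DerivedConstants` structs and the `fadeFactor` helper are hypothetical names introduced for illustration and are not part of the SDK; the linear use of the two fade constants in `fadeFactor` is an assumption implied by their formulas rather than something the table states.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical user-facing settings (illustrative names, not SDK types).
struct AoSettings {
    float radius;       // EffectRadius
    float fadeOutFrom;  // distance at which the effect starts to fade
    float fadeOutTo;    // distance at which the effect has fully faded
};

// Subset of the constant buffer that follows directly from the settings.
struct DerivedConstants {
    float effectFadeOutMul;
    float effectFadeOutAdd;
    float negRecEffectRadius;
    float loadCounterAvgDiv;
};

DerivedConstants deriveConstants(const AoSettings& s,
                                 int importanceMapWidth,
                                 int importanceMapHeight) {
    assert(s.radius > 0.0f && s.fadeOutTo > s.fadeOutFrom);
    assert(importanceMapWidth > 0 && importanceMapHeight > 0);
    const float fadeRange = s.fadeOutTo - s.fadeOutFrom;
    DerivedConstants c{};
    c.effectFadeOutMul = -1.0f / fadeRange;
    c.effectFadeOutAdd = s.fadeOutFrom / fadeRange + 1.0f;
    c.negRecEffectRadius = -1.0f / s.radius;
    c.loadCounterAvgDiv =
        9.0f / (float(importanceMapWidth) * float(importanceMapHeight) * 255.0f);
    return c;
}

// Assumed usage of the fade constants: a linear ramp that evaluates to
// 1 at fadeOutFrom and 0 at fadeOutTo.
float fadeFactor(const DerivedConstants& c, float viewDepth) {
    return viewDepth * c.effectFadeOutMul + c.effectFadeOutAdd;
}
```

With `fadeOutFrom = 50` and `fadeOutTo = 300`, the ramp evaluates to 1 at a depth of 50 and to 0 at a depth of 300, which is a quick way to sanity-check the two fade constants after changing the settings.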

The technique

Algorithm structure

The FidelityFX CACAO algorithm comprises several passes, which are configured differently depending on the variant of the algorithm that is being used.

[Figure: pass structure of the FidelityFX CACAO algorithm]

The table below summarizes which passes of the FidelityFX CACAO algorithm are present in the different configurations one might choose to operate the algorithm with. Depending on the desired performance level, the level of quality may be adjusted. By adjusting the quality level, some passes which constitute the effect will be omitted.

In the table, a tick in the box denotes that the pass is present while a cross means that the pass is omitted. In all configurations, FidelityFX CACAO integrations should execute the passes in the order shown in the diagram above.

In addition to configuring the quality level, FidelityFX CACAO has another option which allows the algorithm to run at a scaled-down resolution. If this option is selected, an additional bilateral upsample will be performed as a final step in the algorithm. This is also illustrated in the rows of the table below.

| Quality mode | Native | Prepare | Generate SSAO | Create importance map | Generate adaptive SSAO | Edge aware blur | Apply | Bilateral upsample |
|--------------|--------|---------|---------------|-----------------------|------------------------|-----------------|-------|--------------------|
| FFX_CACAO_QUALITY_LOWEST | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | ✗ |
| FFX_CACAO_QUALITY_LOW | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | ✗ |
| FFX_CACAO_QUALITY_MEDIUM | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | ✗ |
| FFX_CACAO_QUALITY_HIGH | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | ✗ |
| FFX_CACAO_QUALITY_HIGHEST | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| FFX_CACAO_QUALITY_LOWEST | ✗ | ✓ | ✓ | ✗ | ✗ | ✓ | ✗ | ✓ |
| FFX_CACAO_QUALITY_LOW | ✗ | ✓ | ✓ | ✗ | ✗ | ✓ | ✗ | ✓ |
| FFX_CACAO_QUALITY_MEDIUM | ✗ | ✓ | ✓ | ✗ | ✗ | ✓ | ✗ | ✓ |
| FFX_CACAO_QUALITY_HIGH | ✗ | ✓ | ✓ | ✗ | ✗ | ✓ | ✗ | ✓ |
| FFX_CACAO_QUALITY_HIGHEST | ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ |
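As a rough illustration of how the pass list varies with configuration, the sketch below builds the sequence of passes for a given quality mode and resolution option. It is derived from the per-pass descriptions in this document (adaptive passes only at the highest quality level, Apply at native resolution, bilateral upsample otherwise) and is not taken from the SDK; the `Quality` enum and `passesFor` function are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of the five quality modes.
enum class Quality { Lowest, Low, Medium, High, Highest };

// Returns the passes in execution order for a configuration.
std::vector<std::string> passesFor(Quality quality, bool nativeResolution) {
    std::vector<std::string> passes = {"Prepare"};
    if (quality == Quality::Highest) {
        // Adaptive path: base pass, importance map, then extra samples.
        passes.push_back("Generate SSAO (base pass)");
        passes.push_back("Create importance map");
        passes.push_back("Generate adaptive SSAO");
    } else {
        passes.push_back("Generate SSAO");
    }
    // The blur may be configured to run zero times, in which case it is skipped.
    passes.push_back("Edge aware blur");
    passes.push_back(nativeResolution ? "Apply" : "Bilateral upsample");
    return passes;
}
```

For example, `passesFor(Quality::Highest, false)` yields six entries ending in the bilateral upsample, while `passesFor(Quality::Low, true)` yields four entries ending in Apply.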

Prepare stage

The prepare stage transforms rendering data - such as depth and normal buffers - provided in the conventional formats into a more optimized data layout for consumption by the remaining passes.

For all quality settings, this means generating a de-interleaved version of the depth buffer and normal buffers. Depending on the quality level selected, FidelityFX CACAO may also generate a mipmap chain for the de-interleaved depth buffers. This is done using FidelityFX SPD [SPD-19].

If the FidelityFX CACAO algorithm is operating in the FFX_CACAO_QUALITY_LOWEST quality mode, it generates just two buffers rather than four (each still at half resolution in each dimension), effectively discarding 50% of the input data from further consideration. Moreover, when operating at a downscaled resolution, the prepare pass generates lower resolution de-interleaved buffers (quarter resolution in each dimension, instead of half resolution in each dimension).

Please note: While this stage of the algorithm is implemented as two separate dispatches, they do not share any data. Therefore no pipeline barriers are required between the two dispatches that form the prepare pass.

The following tables list the compute shader entry points for the prepare depth and prepare normals dispatches. Select the entry point for each dispatch that matches your resolution, quality mode, and whether the application provides a normal buffer.

Depth preparation entry points

| Depth preparation entry point | Resolution | Quality mode |
|-------------------------------|------------|--------------|
| FFX_CACAO_PrepareNativeDepthsAndMips | Native | FFX_CACAO_QUALITY_MEDIUM or above |
| FFX_CACAO_PrepareDownsampledDepthsAndMips | Downsampled | FFX_CACAO_QUALITY_MEDIUM or above |
| FFX_CACAO_PrepareNativeDepths | Native | FFX_CACAO_QUALITY_LOW |
| FFX_CACAO_PrepareDownsampledDepths | Downsampled | FFX_CACAO_QUALITY_LOW |
| FFX_CACAO_PrepareNativeDepthsHalf | Native | FFX_CACAO_QUALITY_LOWEST |
| FFX_CACAO_PrepareDownsampledDepthsHalf | Downsampled | FFX_CACAO_QUALITY_LOWEST |

Normal preparation entry points

| Normal preparation entry point | Resolution | Application normals provided |
|--------------------------------|------------|------------------------------|
| FFX_CACAO_PrepareNativeNormalsFromInputNormals | Native | ✓ |
| FFX_CACAO_PrepareDownsampledNormalsFromInputNormals | Downsampled | ✓ |
| FFX_CACAO_PrepareNativeNormals | Native | ✗ |
| FFX_CACAO_PrepareDownsampledNormals | Downsampled | ✗ |
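The selection rules in the two tables above can be expressed as a small lookup. This is only a sketch: the entry point names are those listed in the tables, while the `Quality` enum and the two helper functions are hypothetical and not part of the SDK.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the five quality modes.
enum class Quality { Lowest, Low, Medium, High, Highest };

// Picks the depth preparation entry point for a resolution and quality mode.
std::string depthEntryPoint(Quality quality, bool nativeResolution) {
    const std::string resolution = nativeResolution ? "Native" : "Downsampled";
    std::string suffix;
    if (quality == Quality::Lowest) {
        suffix = "DepthsHalf";     // only half of the de-interleaved layers
    } else if (quality == Quality::Low) {
        suffix = "Depths";         // all layers, no MIP chain
    } else {
        suffix = "DepthsAndMips";  // MEDIUM or above: layers plus MIP chain
    }
    return "FFX_CACAO_Prepare" + resolution + suffix;
}

// Picks the normal preparation entry point.
std::string normalEntryPoint(bool nativeResolution, bool applicationProvidesNormals) {
    const std::string resolution = nativeResolution ? "Native" : "Downsampled";
    const std::string suffix =
        applicationProvidesNormals ? "NormalsFromInputNormals" : "Normals";
    return "FFX_CACAO_Prepare" + resolution + suffix;
}
```

For instance, a downsampled FFX_CACAO_QUALITY_LOWEST configuration without application normals maps to `FFX_CACAO_PrepareDownsampledDepthsHalf` and `FFX_CACAO_PrepareDownsampledNormals`.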

Resource inputs

The following table describes the inputs to the prepare process.

| Name | Type | Notes |
|------|------|-------|
| Application's depth buffer | Depth buffer | A depth buffer generated during the rendering of the scene. FidelityFX CACAO can support both a traditional Z buffer, as well as reverse Z. |
| [Optional] Application's normal buffer | Normal buffer | An optional buffer containing normals which have been generated during the rendering of the scene. If you choose not to provide this buffer, FidelityFX CACAO will generate a normal buffer from the depth buffer that has been provided. It achieves this by calculating an implied normal from the partial derivatives of a neighborhood of pixels in the depth buffer. The format of the normal buffer can be modified by changing FFX_CACAO_Prepare_LoadNormal during the integration process. |

Resource outputs

The following table describes the outputs which are computed by the prepare process.

| Name | Type | Notes |
|------|------|-------|
| De-interleaved depth buffer | R16_SFLOAT texture | The de-interleaved version of the application's depth buffer. |
| De-interleaved depth MIP chain | R16_SFLOAT texture | A MIP chain containing a filtered set of de-interleaved depth buffers. NOTE: This is only generated at FFX_CACAO_QUALITY_MEDIUM quality or higher. |
| De-interleaved normal buffer | R8G8B8A8_SNORM texture | The de-interleaved normal buffer. When no normal buffer is passed as an input, it is generated using the partial derivatives of the depth buffer. |

Description

The process of de-interleaving is identical for both the depth and normal buffers, and is shown in the diagram below. Each group of 2x2 pixels is considered and separated into four separate textures, each containing a quarter of the pixels of the original input (half resolution in each dimension). This layout makes more efficient use of the GPU's cache hierarchy.

[Figure: de-interleaving of 2x2 pixel groups into four separate textures]

In the diagram above, each square present in the image to the left represents a single pixel. You can see that each set of 2x2 pixels contains four unique colors.

Turning now to the right hand side of the diagram, we can see that pixels of each color are collected into their own textures, effectively creating four very similar downsampled textures from the original.

If FFX_CACAO_QUALITY_LOWEST is used, then 50% of the input pixels are discarded during the preparation pass. This is done by discarding the top right and bottom left pixels in each 2x2 grid. As one might expect, this does translate into a noticeable degradation in the resulting quality of the AO, but delivers a substantial improvement in the level of performance.
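The de-interleaving described above can be sketched on the CPU as follows. This is an illustrative reference, not the shader used by FidelityFX CACAO: the `Image` struct and function names are hypothetical, and the layer ordering (index `dy * 2 + dx` within each 2x2 group) is an assumption made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-channel image stored in row-major order.
struct Image {
    int width;
    int height;
    std::vector<float> texels;
};

// Splits an image into four half-resolution layers, one per position
// inside each 2x2 pixel group. Layer index is dy * 2 + dx (assumed order).
std::array<Image, 4> deinterleave(const Image& src) {
    assert(src.width % 2 == 0 && src.height % 2 == 0);
    const int w = src.width / 2;
    const int h = src.height / 2;
    std::array<Image, 4> layers{};
    for (Image& layer : layers) {
        layer.width = w;
        layer.height = h;
        layer.texels.resize(static_cast<std::size_t>(w) * h);
    }
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            for (int dy = 0; dy < 2; ++dy) {
                for (int dx = 0; dx < 2; ++dx) {
                    const std::size_t srcIndex =
                        static_cast<std::size_t>(2 * y + dy) * src.width + (2 * x + dx);
                    const std::size_t dstIndex = static_cast<std::size_t>(y) * w + x;
                    layers[dy * 2 + dx].texels[dstIndex] = src.texels[srcIndex];
                }
            }
        }
    }
    return layers;
}

// Lowest quality: drop the top-right and bottom-left layers, keeping the
// top-left (index 0) and bottom-right (index 3) ones.
std::array<Image, 2> lowestQualityLayers(const std::array<Image, 4>& layers) {
    std::array<Image, 2> kept{};
    kept[0] = layers[0];
    kept[1] = layers[3];
    return kept;
}
```

Each output layer holds spatially adjacent samples of one sub-pixel position, so a pass that reads one layer touches a contiguous, smaller texture.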

Generate SSAO (non-adaptive)

The generate SSAO stage calculates obscurance values and detects edges, which are used in the subsequent edge aware blurring pass. Obscurance values encode the probability that a pixel is obscured by neighboring geometry (as reconstructed from the depth and normal buffers passed to FidelityFX CACAO) and are stored in the red channel of the output texture of the generate SSAO pass. The edge values are encoded with 2 bits per cardinal direction (north, east, south, and west). Each edge value is determined by the strength of the depth discontinuity between the current pixel and its neighbor in that cardinal direction.
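To make the 2-bits-per-direction encoding concrete, the sketch below quantises four edge values to four levels each and packs them into a single byte. It is an illustration of the general idea only: the bit order, the rounding, and the function names are assumptions, and the actual shader may lay the bits out differently.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Packs four edge values in [0, 1] into one byte, 2 bits per direction.
// Direction i occupies bits 2*i and 2*i + 1 (assumed layout).
std::uint8_t packEdges(const std::array<float, 4>& edges) {
    std::uint8_t packed = 0;
    for (int i = 0; i < 4; ++i) {
        const float clamped = std::fmin(std::fmax(edges[i], 0.0f), 1.0f);
        // Quantise to one of four levels: 0, 1, 2 or 3.
        const unsigned level = static_cast<unsigned>(std::lround(clamped * 3.0f));
        packed = static_cast<std::uint8_t>(packed | (level << (2 * i)));
    }
    return packed;
}

// Recovers the four quantised edge values from a packed byte.
std::array<float, 4> unpackEdges(std::uint8_t packed) {
    std::array<float, 4> edges{};
    for (int i = 0; i < 4; ++i) {
        edges[i] = static_cast<float>((packed >> (2 * i)) & 0x3u) / 3.0f;
    }
    return edges;
}
```

A byte is exactly what one 8-bit channel of the intermediate target can hold, which is why four directions at 2 bits each fit alongside the obscurance value in a two-channel texture.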

Resource inputs

| Name | Type | Notes |
|------|------|-------|
| De-interleaved depth MIP chain | R16_SFLOAT texture | The de-interleaved depth buffer generated during the prepare pass. If you are using FFX_CACAO_QUALITY_MEDIUM quality or higher, then you should provide the de-interleaved depth buffer complete with a MIP chain. See prepare pass for more details about the MIP chain generation. |
| De-interleaved normal buffer | R8G8B8A8_SNORM texture | The normal buffer generated by the prepare pass. |

Resource outputs

| Name | Type | Notes |
|------|------|-------|
| Intermediate target | R8G8_UNORM texture | An intermediate render target with obscurance values in the red channel, and edge values in the green channel. |

Description

For each pixel, the depth and normal values are sampled in a rotationally symmetric pattern around the pixel (see the diagram below). At higher quality levels, FidelityFX CACAO will sample depth values from multiple MIP levels. The sampling pattern is scaled depending on the depth of the pixel. The sampling pattern is rotated for neighboring pixels. For each pixel that is sampled, FidelityFX CACAO calculates an obscurance value. The final obscurance value for each pixel is a weighted average of all obscurance values from the samples.

[Figure: rotationally symmetric sampling pattern around a pixel]

The calculated obscurance value for a pixel with position p and normal n from a sample at position q is as follows.

[Equation: obscurance contributed to the pixel at p with normal n by the sample at q]

Each obscurance term is the cosine of the angle between the hit direction and the normal, multiplied by a falloff term that attenuates the contribution with the square of the distance between the pixel and the sample.
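The per-sample obscurance term can be sketched as below. This is a hedged reconstruction from the description above, not the shader source: the exact falloff shape (a clamped $1 - d^2 / r^2$) and the subtraction of a horizon angle threshold are assumptions modelled on common ASSAO-style implementations, and `Vec3`, `dot3` and `sampleObscurance` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 {
    float x, y, z;
};

static float dot3(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Obscurance contributed to the pixel at p (unit normal n) by the sample at q.
// effectRadius and horizonAngleThreshold correspond to EffectRadius and
// EffectHorizonAngleThreshold; how they enter the formula is assumed.
float sampleObscurance(const Vec3& p, const Vec3& n, const Vec3& q,
                       float effectRadius, float horizonAngleThreshold) {
    assert(effectRadius > 0.0f);
    const Vec3 hit{q.x - p.x, q.y - p.y, q.z - p.z};
    const float distSq = dot3(hit, hit);
    if (distSq == 0.0f) {
        return 0.0f;  // sample coincides with the pixel
    }
    // Cosine of the angle between the hit direction and the normal.
    const float cosAngle = dot3(n, hit) / std::sqrt(distSq);
    // Attenuation that reaches zero at the effect radius (assumed shape).
    const float falloff =
        std::max(0.0f, 1.0f - distSq / (effectRadius * effectRadius));
    return std::max(0.0f, cosAngle - horizonAngleThreshold) * falloff;
}
```

Under these assumptions, a sample directly above the surface at half the effect radius contributes 0.75, and any sample at or beyond the radius, or below the surface, contributes nothing.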

Generate adaptive SSAO, part 1

At adaptive quality levels, the initial generate SSAO pass serves a slightly different purpose.

While the base pass calculates SSAO in the same way as the non-adaptive pass, it will exit early after writing untransformed obscurance values, as well as skipping the edge detection calculations. The adaptive SSAO generation takes additional inputs (the importance map, load counter, and output from the base pass), and then performs a variable number of additional samples after the base pass based on the computed importance for the location given by the importance map.

Resource inputs

| Name | Type | Notes |
|------|------|-------|
| De-interleaved depth mipmap chain | R16_SFLOAT texture | The de-interleaved depth buffer generated during the prepare pass. If you are using FFX_CACAO_QUALITY_MEDIUM quality or higher, then you should provide the de-interleaved depth buffer complete with a mipmap chain. See prepare pass for more details about the mipmap chain generation. |
| De-interleaved normal buffer | R8G8B8A8_SNORM texture | The de-interleaved normal buffer from the prepare pass. When no normal buffer is passed as an input, it is generated using the partial derivatives of the depth buffer. |

Resource outputs

| Name | Type | Notes |
|------|------|-------|
| Intermediate target | R8G8_UNORM | An intermediate render target where the red channel contains the obscurance values. |

Description

Same as the generate SSAO (non-adaptive) pass, but early exits after writing untransformed obscurance values and skipping the edge detection calculations.

Importance map generation

At the adaptive quality level, after the SSAO base pass has been run, an importance map is generated to determine where the additional samples should be concentrated in the final effect.

Resource inputs

| Name | Type | Notes |
|------|------|-------|
| Base pass SSAO | R8G8_UNORM | The intermediate texture from the SSAO base pass containing obscurance values. |

Resource outputs

| Name | Type | Notes |
|------|------|-------|
| Importance map | R8_UNORM | Each importance value in the importance map corresponds to an 8x8 square of SSAO values, and the importance is set to the difference between the minimum and maximum values in that square. The importance map is then blurred to avoid sharp transitions from important to unimportant areas. |
| Load counter | R32_UINT | Counter containing total importance sum. |

Description

For each 8x8 square of the base pass SSAO obscurance values, the difference between the minimum and maximum values is computed. The result is then blurred to create smoother transitions from areas of high importance to areas of low importance.
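The min/max step can be sketched on the CPU as follows. This is a minimal illustration of the description above rather than the shader itself: the function name and the flat row-major layout are assumptions, and the subsequent blur is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Builds an unblurred importance map from base pass obscurance values.
// Each output texel covers one 8x8 tile and stores max - min of that tile.
std::vector<float> buildImportanceMap(const std::vector<float>& ssao,
                                      int width, int height) {
    constexpr int kTile = 8;
    assert(width > 0 && height > 0);
    assert(width % kTile == 0 && height % kTile == 0);
    assert(ssao.size() == static_cast<std::size_t>(width) * height);

    const int tilesX = width / kTile;
    const int tilesY = height / kTile;
    std::vector<float> importance(static_cast<std::size_t>(tilesX) * tilesY);

    for (int ty = 0; ty < tilesY; ++ty) {
        for (int tx = 0; tx < tilesX; ++tx) {
            // Start from the tile's first texel, then scan the whole tile.
            float lo = ssao[static_cast<std::size_t>(ty * kTile) * width + tx * kTile];
            float hi = lo;
            for (int y = 0; y < kTile; ++y) {
                for (int x = 0; x < kTile; ++x) {
                    const std::size_t index =
                        static_cast<std::size_t>(ty * kTile + y) * width + (tx * kTile + x);
                    lo = std::min(lo, ssao[index]);
                    hi = std::max(hi, ssao[index]);
                }
            }
            importance[static_cast<std::size_t>(ty) * tilesX + tx] = hi - lo;
        }
    }
    return importance;
}
```

Flat regions produce an importance of zero, so they receive few or no extra samples, while tiles containing both lit and occluded texels score highly.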

Generate adaptive SSAO, part 2

Resource inputs

| Name | Type | Notes |
|------|------|-------|
| De-interleaved depth buffer | R16_FLOAT | The de-interleaved depth buffer generated from the input depth buffer in the prepare pass. |
| De-interleaved normal buffer | R8G8B8A8_SNORM | The de-interleaved normal buffer generated in the prepare pass, either from the input normal buffer or from the depth buffer. |
| Base pass SSAO | R8G8_UNORM | The intermediate texture from the SSAO base pass containing obscurance values. |
| Importance map | R8_UNORM | The blurred importance map. |
| Load counter | R32_UINT | Counter used to calculate the average total importance. |

Resource outputs

| Name | Type | Notes |
|------|------|-------|
| SSAO buffer | R8G8_UNORM | The output SSAO buffer containing the transformed obscurance values as well as edge values. |

Description

For each pixel, extra samples of the depth and normal values are taken. This is done by sampling depths in a rotationally symmetric pattern around the pixel, effectively continuing from where the base pass left off. The number of extra samples taken is based on the importance value stored in the importance map. For each pixel, CACAO computes an obscurance value per sample and combines these with the previously stored untransformed obscurance values from the base pass SSAO. The final obscurance value for each pixel is the weighted average of all the obscurance values from the base pass and this pass combined.
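The way the two passes combine can be illustrated with a running weighted sum. This sketch only shows the arithmetic of a weighted average spanning both passes; the `ObscuranceSum` struct and `combineObscurance` function are hypothetical, and how the shader actually stores and weights the base pass values is not specified here.

```cpp
#include <cassert>

// Accumulated obscurance for one pixel: sum of (value * weight) and sum of weights.
struct ObscuranceSum {
    float weightedSum;
    float weightSum;
};

// Weighted average over the base pass samples and the extra adaptive samples.
float combineObscurance(const ObscuranceSum& basePass, const ObscuranceSum& extraPass) {
    const float totalWeight = basePass.weightSum + extraPass.weightSum;
    if (totalWeight <= 0.0f) {
        return 0.0f;  // no valid samples: treat the pixel as unoccluded
    }
    return (basePass.weightedSum + extraPass.weightedSum) / totalWeight;
}
```

Because the sums are merged before dividing, a pixel that takes no extra samples simply ends up with its base pass average.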

[Figure: rotationally symmetric sampling pattern around a pixel]

The calculated obscurance value for a pixel with position p and normal n from a sample at position q is as follows.

[Equation: obscurance contributed to the pixel at p with normal n by the sample at q]

Each obscurance term is the cosine of the angle between the hit direction and the normal, multiplied by a falloff term that attenuates the contribution with the square of the distance between the pixel and the sample.

Edge-aware blur

Resource inputs

| Name | Type | Notes |
|------|------|-------|
| Generated SSAO texture w/ edges | R8G8_UNORM | The non-blurred SSAO texture containing obscurance values and edges. |

Resource outputs

| Name | Type | Notes |
|------|------|-------|
| Blurred SSAO texture w/ edges | R8G8_UNORM | The output SSAO buffer containing blurred obscurance values. |

Description

The edge sensitive blur is applied after SSAO generation to help remove noise created by the random sampling. The blur has a 3x3 kernel, where each pixel is weighted by its edge value. The blur may be run for between 0 and 8 passes to effectively create a wider kernel.
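The idea of weighting neighbours by edge values can be sketched for a single pixel as below. This is a deliberately simplified illustration, reduced to the four cardinal neighbours rather than the full 3x3 kernel described above, and it assumes that an edge value of 1 means "no edge" and 0 means "hard edge"; the function name and conventions are not taken from the SDK.

```cpp
#include <array>
#include <cassert>

// Blurs one pixel with its four cardinal neighbours. Each neighbour is
// weighted by the edge value in its direction, so a hard edge (0) blocks
// that neighbour from bleeding into the centre pixel.
float edgeAwareBlurPixel(float centre,
                         const std::array<float, 4>& neighbours,
                         const std::array<float, 4>& edges) {
    float sum = centre;
    float weight = 1.0f;  // the centre pixel always contributes fully
    for (int i = 0; i < 4; ++i) {
        sum += neighbours[i] * edges[i];
        weight += edges[i];
    }
    return sum / weight;
}
```

Running such a step repeatedly widens the effective kernel, which matches the role of `BlurNumPasses`.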

Application

This is the final stage when running at native (non-downsampled) resolution.

Resource inputs

| Name | Type | Notes |
|------|------|-------|
| De-interleaved SSAO textures | R8G8_UNORM | A texture containing the obscurance and edge values generated by either the edge-aware blur pass or the generate SSAO pass, depending on whether the number of edge-aware blur passes is greater than 0. |

Resource outputs

| Name | Type | Notes |
|------|------|-------|
| Final output | Output AO texture | An output texture containing the final AO values. This is provided to the ffxCacaoContextDispatch function. |

Description

The de-interleaved SSAO textures generated by the previous passes are taken and re-interleaved to output at the correct resolution. Neighbor samples are then taken for a high resolution blur to be applied. The result is written to the output AO texture.

Bilateral upsampling

A bilateral upsampler is used to create the final output when running at downsampled resolution. The upsampler uses a 5x5 kernel of input SSAO values and their corresponding depths and creates a blended output value.

Resource inputs

| Name | Type | Notes |
|------|------|-------|
| De-interleaved SSAO textures | R8G8_UNORM | The texture containing the previously computed AO values. |
| De-interleaved depth | R16_FLOAT | The de-interleaved depth textures from the prepare pass. |
| Input depth | R32_FLOAT | The depth buffer. |

Resource outputs

| Name | Type | Notes |
|------|------|-------|
| Final output | Output AO texture | An output texture containing the final AO values. This is provided to the ffxCacaoContextDispatch function. |

Description

The bilateral upsampler creates a blended output value using a kernel of 5x5 input SSAO and depth values. The upsampler can run either with edge awareness, using the previously generated edges, or without it.

Version history

| Version | Date | Notes |
|---------|------|-------|
| 1.0 | May 2020 | Initial release of FidelityFX CACAO. |
| 1.1 | August 2020 | Added Vulkan version. |
| 1.2 | February 2021 | Minor sample updates. |
| 1.3 | May 2023 | Port to FidelityFX SDK. |

References

See also