diff --git a/Docs/featureguide/autoquant.rst b/Docs/featureguide/autoquant.rst
index 8d12e2538b5..d7b1870af29 100644
--- a/Docs/featureguide/autoquant.rst
+++ b/Docs/featureguide/autoquant.rst
@@ -26,8 +26,9 @@ Workflow
 
 The workflow looks like this:
 
-    .. image:: ../../images/auto_quant_v2_flowchart.png
-
+.. image:: ../images/auto_quant_1.png
+    :height: 450
+
 Before entering the optimization workflow, AutoQuant prepares by:
 
 1. Checking the validity of the model and converting the model into an AIMET quantization-friendly format (`Prepare Model`).
diff --git a/Docs/featureguide/mixed precision/amp.rst b/Docs/featureguide/mixed precision/amp.rst
index 0ca2872b3c0..7fbdfbe3703 100644
--- a/Docs/featureguide/mixed precision/amp.rst
+++ b/Docs/featureguide/mixed precision/amp.rst
@@ -27,7 +27,7 @@ allowable accuracy drop, is passed to the API.
 The function changes the QuantSim Sim model in place with different quantizers having different
 bit-widths. This QuantSim model can be either exported or evaluated to get a quantization accuracy.
 
-.. image:: ../../images/work_flow_amp.png
+.. image:: ../../images/automatic_mixed_precision_1.png
     :width: 900px
 
 Mixed Precision Algorithm
@@ -35,8 +35,8 @@
 
 The algorithm involves 4 phases:
 
-    .. image:: ../../images/stages.png
-        :width: 150px
+.. image:: ../../images/automatic_mixed_precision_2.png
+    :width: 700px
 
 1) Find layer groups
 --------------------
@@ -45,8 +45,8 @@ The algorithm involves 4 phases:
 This helps in reducing search space over which the mixed precision algorithm operates.
 It also ensures that we search only over the valid bit-width settings for parameters and activations.
 
-    .. image:: ../../images/quantizer_groups.png
-        :width: 900px
+.. image:: ../../images/automatic_mixed_precision_3.png
+    :width: 900px
 
 2) Perform sensitivity analysis (Phase 1)
 -----------------------------------------
diff --git a/Docs/images/auto_quant_1.png b/Docs/images/auto_quant_1.png
new file mode 100644
index 00000000000..1601567b627
Binary files /dev/null and b/Docs/images/auto_quant_1.png differ
diff --git a/Docs/images/automatic_mixed_precision_1.png b/Docs/images/automatic_mixed_precision_1.png
new file mode 100644
index 00000000000..58904048459
Binary files /dev/null and b/Docs/images/automatic_mixed_precision_1.png differ
diff --git a/Docs/images/automatic_mixed_precision_2.png b/Docs/images/automatic_mixed_precision_2.png
new file mode 100644
index 00000000000..5533cb541f9
Binary files /dev/null and b/Docs/images/automatic_mixed_precision_2.png differ
diff --git a/Docs/images/automatic_mixed_precision_3.png b/Docs/images/automatic_mixed_precision_3.png
new file mode 100644
index 00000000000..647f92d33f4
Binary files /dev/null and b/Docs/images/automatic_mixed_precision_3.png differ
diff --git a/Docs/images/debugging_guidelines_1.png b/Docs/images/debugging_guidelines_1.png
new file mode 100644
index 00000000000..63b883b76a5
Binary files /dev/null and b/Docs/images/debugging_guidelines_1.png differ
diff --git a/Docs/images/quantization_workflow_1.png b/Docs/images/quantization_workflow_1.png
new file mode 100644
index 00000000000..981ebec313d
Binary files /dev/null and b/Docs/images/quantization_workflow_1.png differ
diff --git a/Docs/images/quantization_workflow_2.png b/Docs/images/quantization_workflow_2.png
new file mode 100644
index 00000000000..d250010c9aa
Binary files /dev/null and b/Docs/images/quantization_workflow_2.png differ
diff --git a/Docs/images/quantization_workflow_3.png b/Docs/images/quantization_workflow_3.png
new file mode 100644
index 00000000000..e3aac23be55
Binary files /dev/null and b/Docs/images/quantization_workflow_3.png differ
diff --git a/Docs/images/quantization_workflow_4.png b/Docs/images/quantization_workflow_4.png
new file mode 100644
index 00000000000..ac8827d02a3
Binary files /dev/null and b/Docs/images/quantization_workflow_4.png differ
diff --git a/Docs/images/quantization_workflow_5.png b/Docs/images/quantization_workflow_5.png
new file mode 100644
index 00000000000..b89c986056e
Binary files /dev/null and b/Docs/images/quantization_workflow_5.png differ
diff --git a/Docs/userguide/debugging_guidelines.rst b/Docs/userguide/debugging_guidelines.rst
index 74f8ceab5b2..8922719f2b6 100644
--- a/Docs/userguide/debugging_guidelines.rst
+++ b/Docs/userguide/debugging_guidelines.rst
@@ -15,11 +15,11 @@ Debugging workflow
 
 The steps are shown as a flow chart in the following figure and are described in more detail below:
 
-.. image:: ../images/quantization_debugging_flow_chart.png
-    :height: 800
-    :width: 700
+.. image:: ../images/debugging_guidelines_1.png
+    :height: 500
 
-1. FP32 confidence check
-------------------------
+
+1. FP32 confidence checks
+-------------------------
 
 First, ensure that the floating-point and quantized model behave similarly in the forward pass,
diff --git a/Docs/userguide/quantization_workflow.rst b/Docs/userguide/quantization_workflow.rst
index 80bfbcef8e2..dbf62cb8454 100644
--- a/Docs/userguide/quantization_workflow.rst
+++ b/Docs/userguide/quantization_workflow.rst
@@ -24,10 +24,10 @@ without requiring actual quantized hardware.
 
 A quantization simulation workflow is illustrated here:
 
-.. image:: ../images/quant_use_case_1.PNG
-
-2. Post-training quantization
------------------------------
+.. image:: ../images/quantization_workflow_1.png
+
+2. Post-training quantization (PTQ)
+-----------------------------------
 
 Post-training quantization (PTQ) techniques make a model more quantization-friendly without requiring model retraining
 or fine-tuning. PTQ is recommended as a go-to tool in a quantization workflow because:
@@ -37,7 +37,7 @@
 The PTQ workflow is illustrated here:
 
-.. image:: ../images/quant_use_case_3.PNG
+.. image:: ../images/quantization_workflow_2.png
 
 3. Quantization-aware training
 ------------------------------
 
@@ -55,7 +55,7 @@ but it can provide better accuracy, especially at lower bit-widths.
 
 A typical QAT workflow is illustrated here:
 
-.. image:: ../images/quant_use_case_2.PNG
+.. image:: ../images/quantization_workflow_3.png
 
 Supported precisions for on-target inference
 ============================================
@@ -108,7 +108,7 @@ lowering the precision.
 The figure below illustrates the recommended quantization workflow and the steps required
 to deploy the quantized model on the target device.
 
-.. figure:: ../images/overall_quantization_workflow.png
+.. figure:: ../images/quantization_workflow_4.png
 
    Recommended quantization workflow
 
@@ -144,7 +144,7 @@ If the off-target quantized accuracy metric is not meeting expectations, you can
 techniques to improve the quantized accuracy for the desired precision. The decision between PTQ
 and QAT should be based on the quantized accuracy and runtime needs.
 
-.. image:: ../images/quantization_workflow.png
+.. image:: ../images/quantization_workflow_5.png
 
 Once the off-target quantized accuracy metric is satisfactory, proceed to
 :ref:`evaluate the on-target metrics` at this precision. If the on-target metrics
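One thing worth noting about hunks like these: each ``.rst`` file resolves its image target relative to its own directory, which is why ``autoquant.rst`` (one level below ``Docs/``) now uses ``../images/...`` while ``amp.rst`` (two levels below) keeps ``../../images/...``. Renames like this are easy to get wrong by one ``../`` level, so a small stdlib script along the following lines (a hypothetical helper, not part of this patch; it assumes the ``Docs/`` layout shown above) can confirm that every ``.. image::`` or ``.. figure::`` target resolves to a real file:

.. code-block:: python

    #!/usr/bin/env python3
    """Check that every image/figure referenced from Docs/**/*.rst exists."""
    import re
    import sys
    from pathlib import Path

    DOCS = Path("Docs")  # assumed repo layout, matching the paths in this diff
    # Matches `.. image:: <target>` and `.. figure:: <target>`, indented or not.
    PATTERN = re.compile(r"^\s*\.\.\s+(?:image|figure)::\s+(\S+)", re.MULTILINE)

    def main() -> int:
        missing = []
        for rst in DOCS.rglob("*.rst"):
            for target in PATTERN.findall(rst.read_text(encoding="utf-8")):
                if target.startswith(("http://", "https://")):
                    continue  # remote images cannot be checked on disk
                # Targets are relative to the referencing .rst file's directory.
                if not (rst.parent / target).resolve().is_file():
                    missing.append(f"{rst}: {target}")
        for entry in missing:
            print(entry)
        return 1 if missing else 0

    if __name__ == "__main__":
        sys.exit(main())

Run from the repository root, it prints any dangling reference and exits non-zero, which makes it easy to wire into CI as a guard against stale paths after image renames.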