This repo contains scripts and a tool to reproduce NNAPI delegate issues with models containing DepthwiseConv2D nodes. Our experiments show that DepthwiseConv2D nodes cause several problems when run with the NNAPI delegate. Here is a summary of our findings:
- Inferring DepthwiseConv2D nodes with the XNNPACK delegate is always faster than with the NNAPI delegate (tested on Snapdragon 855 and Snapdragon 865, Android 12).
- On devices with Snapdragon 888 (tested with Android 12), INT8 models starting with a DepthwiseConv2D node always crash with the NNAPI delegate.
- After a certain node size threshold, the elapsed time inside DepthwiseConv2D nodes increases sharply, to the point that a single node can take about 20 ms to run with the NNAPI delegate. For example, the following model runs in 2.82 ms with the NNAPI delegate on Snapdragon 855 (Android 10):

  However, the next model, which is very similar to the previous one, runs in 19.62 ms on the same device with the same delegate:
- On devices with Snapdragon 865 (tested with Android 12), quantized DepthwiseConv2D nodes with a large kernel size (e.g. 7x7) cause accuracy loss with the NNAPI delegate.
- On devices with Snapdragon 855 (tested with Android 12), quantized DepthwiseConv2D nodes with a stride greater than one always result in accuracy loss.
Note: To quantify the accuracy loss, we compute the Cosine Similarity (CS) between the results generated by the INT8 tflite version with the NNAPI delegate and the expected results from the FP32 tflite version of the same model with the XNNPACK delegate.
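For reference, the metric over two flattened output tensors can be computed as in this minimal NumPy sketch (the function name is ours, not part of the tool):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two flattened output tensors."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```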
- The `model_files` folder contains simple models representing the above-mentioned issues.
- You can also use `generate_dummy_model.py` to build the models and `convert_model.py` to convert them to tflite (see the sketch after this list).
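As a rough illustration only, the sketch below builds a single-node DepthwiseConv2D Keras model and converts it to a full-integer INT8 tflite file. The layer parameters, input shape, and output file name are our own example values, not necessarily what `generate_dummy_model.py` and `convert_model.py` use:

```python
import numpy as np
import tensorflow as tf

# Example shape; the shipped models use inputs such as 52x92x480.
INPUT_SHAPE = (52, 92, 480)

# A dummy model with a single DepthwiseConv2D node (5x5 kernel, stride 2).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=INPUT_SHAPE),
    tf.keras.layers.DepthwiseConv2D(kernel_size=5, strides=2, padding="same"),
])

# Full-integer quantization driven by a random representative dataset.
def representative_dataset():
    for _ in range(10):
        yield [np.random.rand(1, *INPUT_SHAPE).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("int8_depthwiseConv_5x5_stride_2.tflite", "wb") as f:
    f.write(converter.convert())
```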
We have implemented a small tool to feed an input to our sample INT8 tflite models using the NNAPI
delegate and compare the results with what we get from the corresponding FP32 tflite versions using the XNNPACK
delegate.
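The actual tool is a C++ binary that drives the NNAPI delegate on device. As a host-side analogue of its comparison logic (CPU execution only, since NNAPI is Android-only), a sketch could look like the following; the model paths and input shape are example values taken from the samples further below:

```python
import numpy as np
import tensorflow as tf

def run(model_path: str, x: np.ndarray) -> np.ndarray:
    """Run one tflite model on a float input, handling int8 (de)quantization."""
    interp = tf.lite.Interpreter(model_path=model_path)
    interp.allocate_tensors()
    inp = interp.get_input_details()[0]
    out = interp.get_output_details()[0]
    # Quantize the float input if the model expects int8.
    if inp["dtype"] == np.int8:
        scale, zero = inp["quantization"]
        x = (x / scale + zero).round().astype(np.int8)
    interp.set_tensor(inp["index"], x)
    interp.invoke()
    y = interp.get_tensor(out["index"])
    # Dequantize the int8 output back to float for comparison.
    if out["dtype"] == np.int8:
        scale, zero = out["quantization"]
        y = (y.astype(np.float32) - zero) * scale
    return y

# Example input shape matching the 52,92,480 models shipped in model_files.
x = np.random.rand(1, 52, 92, 480).astype(np.float32)
y_int8 = run("model_files/int8_large_depthwiseConv_5x5_stride_2.tflite", x)
y_fp32 = run("model_files/fp32_large_depthwiseConv_5x5_stride_2.tflite", x)

a, b = y_int8.ravel().astype(np.float64), y_fp32.ravel().astype(np.float64)
print("Cosine Similarity:", np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```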
To build and run the tool, you will need:

- Linux host computer
- Connectivity to the target device via adb
- Android NDK, version 22 or later
- CMake 3.18 or later
- Unzip the `tensorflow_lite_cpp_2_9_1_nightly.zip` file inside the `tflite_inference_tool` folder.
- In a terminal, from the `tflite_inference_tool` folder:
$ mkdir build
$ cd build
$ cmake -G "Unix Makefiles" \
    -DCMAKE_SYSTEM_NAME=Android \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_STL=c++_shared \
    -DANDROID_NATIVE_API_LEVEL=27 \
    -DCMAKE_VERBOSE_MAKEFILE=ON \
    -DCMAKE_TOOLCHAIN_FILE=<path-to-ndk>/build/cmake/android.toolchain.cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DTensorFlowLite_ROOT=../tensorflow_lite_cpp_2_9_1_nightly ..
$ make
- Here, you must replace `<path-to-ndk>` with the absolute path of the NDK installed on your computer. If you installed the NDK through Android Studio, it is typically located at `/home/<username>/Android/Sdk/ndk/<version>/` on Linux.
- `tensorflow_lite_cpp_2_9_1_nightly` is the TensorFlow Lite library (nightly version) package.
WARNING: This step will write to the `/data/local/tmp` folder on your device. Please make sure existing files in that folder are backed up as needed.
In a terminal, from the `tflite_inference_tool` folder:
$ adb push ./build/model_test /data/local/tmp
$ adb push ./model_files /data/local/tmp
You can run the tool with different parameters. Below, we list the output of the tool when running on a Snapdragon 855:
Sample 1:
$ adb shell "cd /data/local/tmp && LD_LIBRARY_PATH=. ./model_test --model_a=model_files/int8_large_depthwiseConv_5x5_stride_2.tflite --model_b=model_files/fp32_large_depthwiseConv_5x5_stride_2.tflite --input_shape=52,92,480 --output_shape=26,46,480"
INFO: Created TensorFlow Lite delegate for NNAPI.
INFO: Initialized TensorFlow Lite runtime.
INFO: Replacing 3 node(s) with delegate (TfLiteNnapiDelegate) node, yielding 1 partitions.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
INFO: Replacing 1 node(s) with delegate (TfLiteXNNPackDelegate) node, yielding 1 partitions.
The average elapsed time in nnapi delegate: 19.2918ms
The average elapsed time in xnnpack delegate: 5.97154ms
Cosine Similarity: 0.799906
Sample 2:
$ adb shell "cd /data/local/tmp && LD_LIBRARY_PATH=. ./model_test --model_a=model_files/int8_large_depthwiseConv_5x5_stride_1.tflite --model_b=model_files/fp32_large_depthwiseConv_5x5_stride_1.tflite --input_shape=52,92,480 --output_shape=52,92,480"
INFO: Created TensorFlow Lite delegate for NNAPI.
INFO: Initialized TensorFlow Lite runtime.
INFO: Replacing 3 node(s) with delegate (TfLiteNnapiDelegate) node, yielding 1 partitions.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
INFO: Replacing 1 node(s) with delegate (TfLiteXNNPackDelegate) node, yielding 1 partitions.
The average elapsed time in nnapi delegate: 27.2882ms
The average elapsed time in xnnpack delegate: 9.43883ms
Cosine Similarity: 0.999899
Sample 3:
$ adb shell "cd /data/local/tmp && LD_LIBRARY_PATH=. ./model_test --model_a=model_files/int8_medium_depthwiseConv_5x5_stride_2.tflite --model_b=model_files/fp32_medium_depthwiseConv_5x5_stride_2.tflite --input_shape=32,32,384 --output_shape=16,16,384"
INFO: Created TensorFlow Lite delegate for NNAPI.
INFO: Initialized TensorFlow Lite runtime.
INFO: Replacing 3 node(s) with delegate (TfLiteNnapiDelegate) node, yielding 1 partitions.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
INFO: Replacing 1 node(s) with delegate (TfLiteXNNPackDelegate) node, yielding 1 partitions.
The average elapsed time in nnapi delegate: 2.58259ms
The average elapsed time in xnnpack delegate: 0.922166ms
Cosine Similarity: 0.839082