The implementation of yolov5s on android for the yolov5s export contest

yolov5s_android
?

The implementation of yolov5s on android for the yolov5s export contest.
Download the latest android apk from release and install your device.

Environment

  • Host Ubuntu18.04
  • Docker
    • Tensorflow 2.4.0
    • PyTorch 1.7.0
    • OpenVino 2021.3
  • Android App
    • Android Studio 4.2.1
    • minSdkVersion 28
    • targetSdkVersion 29
    • TfLite 2.4.0
  • Android Device
    • Xiaomi Mi11 (Storage 128GB/ RAM8GB)
    • OS MUI 12.5.8

We use docker container for host evaluation and model conversion.

git clone --recursive https://github.com/lp6m/yolov5s_android
cd yolov5s_android
docker build ./ -f ./docker/Dockerfile  -t yolov5s_android
docker run -it --gpus all -v `pwd`:/workspace yolov5s_anrdoid bash

Files

  • ./app
    • Android application.
    • To build application by yourself, copy ./tflite_model/*.tflite to app/tflite_yolov5_test/app/src/main/assets/, and build on Android Studio.
    • The app can perform inference with various configurations of input size, inference accuracy, and model accuracy.
    • For ‘Open Directory Mode’, save the detected bounding boxes results as a json file in coco format.
    • Realtime deteciton from camera image (precision and input size is fixed to int8/320). Achieved FPS is about 15FPS on Mi11.
    • NOTE Please select image/directory as an absolute path from ‘Device’. The app does not support select image/directory from ‘Recent’ in some devices.
  • ./benchmark
  • ./convert_model
    • Model conversion guide and model quantization script.
  • ./docker
    • Dockerfile for the evaluation and model conversion environment.
  • ./host
    • detect.py : Run detection for image with TfLite model on host environment.
    • evaluate.py: Run evaluation with coco validation dataset and inference results.
  • ./tflite_model
    • Converted TfLite Model.

Performance

Latency

These results are measured on Xiaomi Mi11.
Please refer benchmark/README.md about the detail of benchmark command.
The latency does not contain the pre/post processing time and data transfer time.

float32 model

delegate 640×640 [ms] 320×320 [ms]
None (CPU) 249 61
NNAPI (qti-gpu, fp32) 156 112
NNAPI (qti-gpu, fp16) 92 79

int8 model

We tried to accelerate the inference process by using NNAPI (qti-dsp) and offload calculation to Hexagon DSP, but it doesn’t work for now. Please see here in detail.

delegate 640×640 [ms] 320×320 [ms]
None (CPU) 95 23
NNAPI (qti-default) Not working Not working
NNAPI (qti-dsp) Not working Not working

Accuracy

Please refer host/README.md about the evaluation method.
We set conf_thresh=0.25 and iou_thresh=0.45 for nms parameter.

device, model, delegate 640×640 mAP 320×320 mAP
host GPU (Tflite + PyTorch, fp32) 27.8 26.6
host CPU (Tflite + PyTorch, int8) 26.6 25.5
NNAPI (qti-gpu, fp16) 28.5 26.8
CPU (int8) 27.2 25.8

Model conversion

This project focuses on obtaining a tflite model by model conversion from PyTorch original implementation, rather than doing own implementation in tflite.
We convert models in this way: PyTorch -> ONNX -> OpenVino -> TfLite.
To convert the model from OpenVino to TfLite, we use openvino2tensorflow. Please refer convert_model/README.md about the model conversion.

GitHub

https://github.com/lp6m/yolov5s_android