EDGE AI

TFLite Micro on STM32H7: Quantization, CMSIS-NN, and Sub-5mW Inference

Dec 2025 · 6 min read · Ravi Kumar
INT8 quantization, CMSIS-NN kernels, and deploying an Edge Impulse-trained model on STM32H7 — benchmarked for accuracy and power.

Why TFLite Micro

TensorFlow Lite for Microcontrollers runs inference on devices with as little as 20KB of RAM. On STM32H7 (480MHz Cortex-M7, 1MB of SRAM split across several memory banks) you can run keyword spotting, gesture classification, and anomaly detection within tight power budgets.

INT8 Quantization

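INT8 quantization shrinks model size by roughly 4x (32-bit float weights become 8-bit integers) and typically cuts inference latency 2-3x, with minimal accuracy loss. Set tf.lite.Optimize.DEFAULT and pass a representative dataset (converter.representative_dataset) so the converter can calibrate activation ranges. It then assigns per-tensor scales to activations and per-channel scales to convolution weights. For TFLite Micro, also set converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8] and int8 inference_input_type/inference_output_type, so conversion fails instead of silently leaving float ops in the graph.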
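To make the calibration step concrete, here is a minimal C++ sketch of asymmetric (affine) INT8 quantization, the scheme TFLite uses for activations. A calibration range [min, max] observed on the representative dataset becomes a scale and zero point, and each real value maps to q = round(x / scale) + zero_point, clamped to [-128, 127]. ChooseParams, Quantize, Dequantize, and the example range are hypothetical, not TFLite APIs, and the converter's exact range nudging may differ slightly. Convolution weights use symmetric per-channel scales with zero_point = 0, which this sketch does not cover.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Scale and zero point for asymmetric (affine) INT8 quantization.
struct QuantParams {
  float scale;
  int32_t zero_point;
};

// Hypothetical helper: map a calibration range [min_val, max_val] onto [-128, 127].
// The range is first widened to include 0.0 so that real zero is exactly representable.
QuantParams ChooseParams(float min_val, float max_val) {
  assert(min_val <= max_val);
  min_val = std::min(min_val, 0.0f);
  max_val = std::max(max_val, 0.0f);
  float scale = (max_val - min_val) / 255.0f;
  if (scale == 0.0f) scale = 1.0f;  // all-zero range: any scale works
  int32_t zero_point = static_cast<int32_t>(std::lround(-128.0f - min_val / scale));
  zero_point = std::clamp<int32_t>(zero_point, -128, 127);
  return {scale, zero_point};
}

// q = round(x / scale) + zero_point, saturated to the int8 range.
int8_t Quantize(float x, const QuantParams& p) {
  float q = std::round(x / p.scale) + static_cast<float>(p.zero_point);
  q = std::clamp(q, -128.0f, 127.0f);  // clamp in float so huge inputs cannot overflow
  return static_cast<int8_t>(q);
}

// x is approximately (q - zero_point) * scale.
float Dequantize(int8_t q, const QuantParams& p) {
  return static_cast<float>(static_cast<int32_t>(q) - p.zero_point) * p.scale;
}
```

For a calibration range of [-1.0, 3.0], ChooseParams gives scale = 4/255 and zero_point = -64. So 0.0 quantizes exactly to -64, in-range values round-trip within scale/2, and values outside the range saturate at -128 or 127.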

CMSIS-NN Acceleration

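Arm's CMSIS-NN library provides kernels optimized with the SIMD instructions of the Cortex-M DSP extension. TFLite Micro does not select them at runtime. You choose them at build time (for example with OPTIMIZED_KERNEL_DIR=cmsis_nn in the TFLM Makefile), and they then replace the reference kernels for supported ops such as convolution, depthwise convolution, fully connected, pooling, and softmax, with no change to application code. On STM32H7, a MobileNet-v1 (width multiplier 0.25, 96x96 input) runs in ~18ms with CMSIS-NN versus ~67ms with the reference kernels, a 3.7x speedup.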
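One source of that speedup is lane packing. The DSP extension's SMLAD instruction multiplies two pairs of signed 16-bit lanes and adds both products to a 32-bit accumulator in a single instruction. The host-side C++ sketch below emulates that behaviour to show how an int8 dot product maps onto it. It illustrates the idea and is not CMSIS-NN's code; SmladEmulated, PackInt8Pair, and DotInt8 are hypothetical names. On the MCU, the CMSIS intrinsic __SMLAD issues the real instruction, and CMSIS-NN performs a similar int8-to-16-bit sign extension with __SXTB16.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host emulation of SMLAD: x and y each hold two signed 16-bit lanes.
// Both lane-wise products are added to acc, wrapping modulo 2^32 like the hardware.
int32_t SmladEmulated(uint32_t x, uint32_t y, int32_t acc) {
  int32_t x_lo = static_cast<int16_t>(x & 0xFFFFu);
  int32_t x_hi = static_cast<int16_t>(x >> 16);
  int32_t y_lo = static_cast<int16_t>(y & 0xFFFFu);
  int32_t y_hi = static_cast<int16_t>(y >> 16);
  uint32_t sum = static_cast<uint32_t>(acc) + static_cast<uint32_t>(x_lo * y_lo) +
                 static_cast<uint32_t>(x_hi * y_hi);
  return static_cast<int32_t>(sum);
}

// Pack two int8 values as sign-extended 16-bit lanes (lane 0 = lo, lane 1 = hi).
uint32_t PackInt8Pair(int8_t lo, int8_t hi) {
  return static_cast<uint32_t>(static_cast<uint16_t>(lo)) |
         (static_cast<uint32_t>(static_cast<uint16_t>(hi)) << 16);
}

// int8 dot product: two multiply-accumulates per emulated SMLAD, plus a scalar tail.
int32_t DotInt8(const int8_t* a, const int8_t* b, size_t n) {
  assert(n == 0 || (a != nullptr && b != nullptr));
  int32_t acc = 0;
  size_t i = 0;
  for (; i + 1 < n; i += 2) {
    acc = SmladEmulated(PackInt8Pair(a[i], a[i + 1]), PackInt8Pair(b[i], b[i + 1]), acc);
  }
  if (i < n) acc += static_cast<int32_t>(a[i]) * b[i];  // odd-length tail
  return acc;
}
```

Handling two lanes per instruction halves the number of multiply-accumulate instructions in the inner loop compared with a scalar int8 loop.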

Deployment

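#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
extern const unsigned char g_model[];  // model flatbuffer, e.g. from xxd -i (hypothetical name)
constexpr int kTensorArenaSize = 80 * 1024;  // trim using interpreter.arena_used_bytes()
alignas(16) uint8_t tensor_arena[kTensorArenaSize];  // 16-byte aligned, as in the TFLM examples
bool RunInferenceOnce() {  // hypothetical one-shot helper; firmware keeps the interpreter alive
  tflite::MicroMutableOpResolver<5> resolver;  // example op set: register exactly the ops your model uses
  resolver.AddConv2D(); resolver.AddDepthwiseConv2D(); resolver.AddAveragePool2D(); resolver.AddReshape(); resolver.AddSoftmax();
  tflite::MicroInterpreter interpreter(
      tflite::GetModel(g_model), resolver, tensor_arena, kTensorArenaSize);
  if (interpreter.AllocateTensors() != kTfLiteOk) return false;  // arena too small or op missing
  // Write quantized int8 input to interpreter.input(0)->data.int8 before invoking.
  return interpreter.Invoke() == kTfLiteOk;
}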
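After Invoke() returns kTfLiteOk, a classifier's result sits in the int8 output tensor. Below is a minimal sketch for reading it, assuming a single-output classification model. On the device you would pass interpreter.output(0)->data.int8 along with the tensor's params.scale and params.zero_point. Here the helpers take plain values so they run on a host. ArgMaxInt8 and DequantizeScore are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: index of the highest raw int8 score (first one wins on ties).
size_t ArgMaxInt8(const int8_t* scores, size_t n) {
  assert(scores != nullptr && n > 0);
  size_t best = 0;
  for (size_t i = 1; i < n; ++i) {
    if (scores[i] > scores[best]) best = i;
  }
  return best;
}

// Hypothetical helper: convert one int8 score back to a real value, (q - zero_point) * scale.
float DequantizeScore(int8_t q, float scale, int32_t zero_point) {
  return static_cast<float>(static_cast<int32_t>(q) - zero_point) * scale;
}
```

Because dequantization with a positive scale preserves ordering, the argmax can run on raw int8 values, and only the winning score needs converting. For an int8 softmax output, TFLite fixes scale = 1/256 and zero_point = -128, so a raw 127 dequantizes to 255/256 (about 0.996).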

Power Benchmarks
