Arm Ethos-U NPUs are AI accelerators designed for efficient and scalable machine learning on embedded devices, supporting a range of Arm Cortex and Neoverse processors.
Performance Range: Scales from gigaops per second (GOP/s) up to teraops per second (TOP/s) to cover a wide span of ML performance requirements.
Integration: Compatible with both high-performance Cortex-A and low-power Cortex-M embedded systems.
Architecture: Includes integrated DMA, MAC array, and element-wise engines.
Energy Efficiency: Reduces energy consumption for ML workloads such as automatic speech recognition (ASR) by up to 90% compared with previous Cortex-M generations.
Network Support: Supports neural networks including CNNs and RNNs for audio processing, speech recognition, image classification, and object detection.
Operator Coverage: Executes compute-heavy operators directly on the NPU, including convolution, transformer and LSTM/RNN layers, pooling, activation functions, and primitive element-wise functions. Operators without NPU support fall back to kernels on the tightly coupled Cortex-M (via CMSIS-NN) or Cortex-A (via the Arm Compute Library), as sketched below.
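For illustration, a minimal sketch of how this NPU/CPU split can look in a TensorFlow Lite Micro deployment on a Cortex-M host; the model symbol, arena size, and the particular fallback operators are placeholder assumptions, not a fixed contract:

```cpp
// Minimal sketch, not a complete application: running a Vela-compiled model
// with TensorFlow Lite Micro. Subgraphs the offline compiler mapped to the
// NPU execute through the ETHOSU custom operator; everything else runs on
// the CPU kernels registered alongside it.
#include <cstdint>

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

// Placeholder symbol: the compiled .tflite flatbuffer, e.g. embedded in the
// firmware image with xxd or objcopy.
extern const unsigned char g_model_data[];

namespace {
// Placeholder arena size; tune it to the footprint the compiler reports.
constexpr int kTensorArenaSize = 64 * 1024;
alignas(16) uint8_t tensor_arena[kTensorArenaSize];
}  // namespace

int RunInference() {
  const tflite::Model* model = tflite::GetModel(g_model_data);

  // Register the Ethos-U custom op plus CPU fallbacks for any operators the
  // offline compiler left on the Cortex-M (served by CMSIS-NN when enabled).
  static tflite::MicroMutableOpResolver<4> resolver;
  resolver.AddEthosU();           // executes the NPU command stream
  resolver.AddConv2D();           // CPU fallback examples
  resolver.AddFullyConnected();
  resolver.AddSoftmax();

  tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                       kTensorArenaSize);
  if (interpreter.AllocateTensors() != kTfLiteOk) {
    return -1;
  }
  // ... populate interpreter.input(0)->data.int8 with quantized input ...
  return (interpreter.Invoke() == kTfLiteOk) ? 0 : -1;
}
```

Registering only the operators a model actually needs keeps the resolver's code-size cost proportional to the fallback set.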
Memory Optimization: Model compression reduces the memory footprint by up to 70%. Offline compilation (operator/layer fusion and layer reordering) can further reduce system memory requirements by up to 90%; see the compilation sketch below.
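In the TensorFlow Lite Micro flow, this offline step is performed by Arm's Vela compiler, which fuses and reorders layers and writes out an optimized model containing the NPU command stream. A hedged sketch of an invocation follows; the config file, system-config, and memory-mode names are illustrative values taken from Vela's sample configuration and may differ in your setup:

```sh
# Compile a quantized (int8) model for an Ethos-U55 with 128 MACs.
# vela.ini, the system config, and the memory mode are placeholders here.
vela model_int8.tflite \
    --accelerator-config ethos-u55-128 \
    --config vela.ini \
    --system-config Ethos_U55_High_End_Embedded \
    --memory-mode Shared_Sram \
    --output-dir ./vela_out
```

The optimized output (named model_int8_vela.tflite by default) replaces each NPU-mapped subgraph with a single ETHOSU custom operator, which is what the runtime sketch above dispatches to.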
Toolchain: A single, unified toolchain targets both Arm Cortex processors and Ethos-U NPUs.