Performance Comparison of MobileNetV1 and MobileNetV3-Small for Tool Classification in Memory-Constrained Embedded Systems

Published: 07 May 2026 by MDPI in The 3rd International Electronic Conference on Machines and Applications, session Condition Monitoring and Fault Diagnosis

Abstract: The deployment of computer vision models on resource-constrained hardware, such as the ESP32 microcontroller, requires a critical balance between classification accuracy and memory footprint. This study investigates the performance of the MobileNetV1 and MobileNetV3-Small architectures in scenarios characterized by limited data, aiming to identify the most efficient configuration for mechanical industrial tool classification. A dataset containing 106 images across five distinct classes was developed, using an 80/20 train–test split with an additional 15% of the training data reserved for validation. Both architectures were implemented using transfer learning with frozen ImageNet backbones and custom classification heads. MobileNetV1 was configured with a width multiplier of $\alpha = 0.25$ to aggressively reduce the number of filters, while MobileNetV3-Small employed $\alpha = 1.0$ in "minimalistic" mode to exclude high-latency activation functions and attention modules. Both models were optimized with the Adam optimizer and categorical cross-entropy loss to ensure a controlled experimental comparison. Experimental results demonstrate that both architectures achieved equivalent performance, with a global accuracy of 91% and a weighted F1-score of 0.91. Confusion matrix analysis revealed that errors were primarily confined to visually similar classes, such as Allen keys and screwdrivers. However, a significant disparity emerged regarding model size: MobileNetV1 produced a 1.1 MB binary, whereas MobileNetV3-Small produced a 2.1 MB binary, a nearly 90% increase in storage requirements without any gain in predictive performance. This research concludes that increased architectural complexity does not inherently translate to superior performance in small-data regimes. For memory-constrained devices like the ESP32, the scaled-down MobileNetV1 offers a superior cost–benefit ratio, maintaining high accuracy with a substantially smaller memory footprint. The findings highlight the necessity of prioritizing structural simplicity over architectural novelty when designing deep learning solutions for embedded AI applications.

Keywords: Embedded AI; MobileNet; Transfer Learning; Model Compression
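The setup the abstract describes — a frozen ImageNet MobileNetV1 backbone with $\alpha = 0.25$, a small custom softmax head for the five tool classes, Adam with categorical cross-entropy, and conversion to a compact binary for deployment — can be sketched in Keras roughly as follows. This is a minimal illustration, not the authors' code: the input size, dropout rate, and head layout are assumptions, and `weights=None` is used here so the sketch runs offline (pass `weights="imagenet"` to load the pre-trained backbone as in the study).

```python
import tensorflow as tf

NUM_CLASSES = 5  # five tool classes (e.g. Allen keys, screwdrivers, ...)

# MobileNetV1 backbone with width multiplier alpha=0.25 (keeps 25% of the
# filters in each layer). Pre-trained alpha=0.25 ImageNet weights are
# published for 128x128 inputs, hence the assumed input shape.
backbone = tf.keras.applications.MobileNet(
    input_shape=(128, 128, 3),
    alpha=0.25,
    include_top=False,
    weights=None,  # use weights="imagenet" for the transfer-learning setup
    pooling="avg",
)
backbone.trainable = False  # freeze the backbone; only the head is trained

# Custom classification head (layout assumed for illustration).
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Optimizer and loss as stated in the abstract.
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# Convert to a TensorFlow Lite flatbuffer; the on-device binary sizes the
# abstract compares (1.1 MB vs 2.1 MB) refer to artifacts of this kind,
# though the authors' exact conversion/quantization settings are not given.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_bytes = converter.convert()
```

Whether further post-training quantization was applied before flashing to the ESP32 is not stated in the abstract; size figures from this sketch will therefore not necessarily match the reported ones.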
