Performance Comparison of MobileNetV1 and MobileNetV3-Small for Tool Classification in Memory-Constrained Embedded Systems

Published: 07 May 2026 by MDPI in The 3rd International Electronic Conference on Machines and Applications, session Condition Monitoring and Fault Diagnosis

Abstract: The deployment of computer vision models on resource-constrained hardware, such as the ESP32 microcontroller, requires a critical balance between classification accuracy and memory footprint. This study investigates the performance of the MobileNetV1 and MobileNetV3-Small architectures in scenarios characterized by limited data, aiming to identify the most efficient configuration for mechanical industrial tool classification. A dataset containing 106 images across five distinct classes was developed, using an 80/20 train–test split with an additional 15% of the training data reserved for validation. Both architectures were implemented using transfer learning with frozen ImageNet backbones and custom classification heads. MobileNetV1 was configured with a width multiplier of $\alpha = 0.25$ to aggressively reduce the number of filters, while MobileNetV3-Small employed $\alpha = 1.0$ in "minimalistic" mode to exclude high-latency activation functions and attention modules. Both models were optimized with the Adam optimizer and categorical cross-entropy loss to ensure a controlled experimental comparison. Experimental results demonstrate that both architectures achieved equivalent performance, with a global accuracy of 91% and a weighted F1-score of 0.91. Confusion matrix analysis revealed that errors were primarily confined to visually similar classes, such as Allen keys and screwdrivers. However, a significant disparity emerged regarding model size: MobileNetV1 produced a 1.1 MB binary, whereas MobileNetV3-Small produced a 2.1 MB binary, a nearly 90% increase in storage requirements without any gain in predictive performance. This research concludes that increased architectural complexity does not inherently translate to superior performance in small-data regimes. For memory-constrained devices like the ESP32, the scaled-down MobileNetV1 offers a superior cost–benefit ratio, maintaining high accuracy with a substantially smaller memory footprint. The findings highlight the necessity of prioritizing structural simplicity over architectural novelty when designing deep learning solutions for embedded AI applications.

Keywords: Embedded AI; MobileNet; Transfer Learning; Model Compression
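The setup the abstract describes — a frozen ImageNet MobileNetV1 backbone with $\alpha = 0.25$, a small custom softmax head for the five tool classes, Adam with categorical cross-entropy, and conversion to a compact binary for deployment — can be sketched in Keras roughly as follows. This is a minimal illustration, not the authors' code: the input size, dropout rate, and head layout are assumptions, and `weights=None` is used here so the sketch runs offline (pass `weights="imagenet"` to load the pre-trained backbone as in the study).

```python
import tensorflow as tf

NUM_CLASSES = 5  # five tool classes (e.g. Allen keys, screwdrivers, ...)

# MobileNetV1 backbone with width multiplier alpha=0.25 (keeps 25% of the
# filters in each layer). Pre-trained alpha=0.25 ImageNet weights are
# published for 128x128 inputs, hence the assumed input shape.
backbone = tf.keras.applications.MobileNet(
    input_shape=(128, 128, 3),
    alpha=0.25,
    include_top=False,
    weights=None,  # use weights="imagenet" for the transfer-learning setup
    pooling="avg",
)
backbone.trainable = False  # freeze the backbone; only the head is trained

# Custom classification head (layout assumed for illustration).
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Optimizer and loss as stated in the abstract.
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# Convert to a TensorFlow Lite flatbuffer; the on-device binary sizes the
# abstract compares (1.1 MB vs 2.1 MB) refer to artifacts of this kind,
# though the authors' exact conversion/quantization settings are not given.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_bytes = converter.convert()
```

Whether further post-training quantization was applied before flashing to the ESP32 is not stated in the abstract; size figures from this sketch will therefore not necessarily match the reported ones.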
