Vision Foundation Models (VFMs) offer transformative potential for geospatial AI, but their application in data-constrained regions like Algeria is hindered by massive data requirements and high computational costs. This work introduces a novel framework that integrates VFMs with Few-Shot Learning (FSL) and an advanced ensemble technique, delivering a high-performance, data-efficient solution for mapping cereal crops, which are vital for Algerian food security.
Our objective was to develop a semantic segmentation pipeline for cereal mapping using a very limited, custom-collected dataset. We fine-tuned two architecturally distinct VFMs: the ViT-based Prithvi and the Swin Transformer-based Satlas. We then developed a Mixture of Experts (MoE) system that combines these two fine-tuned "expert" models. A lightweight, trainable gating network learns to weigh the output of each expert dynamically on a per-image basis, allowing the ensemble to exploit the complementary strengths of the two architectures.
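To make the gating idea concrete, the following is a minimal PyTorch sketch of a per-image Mixture of Experts for segmentation. The expert backbones, the band count, and the gate architecture (a small convolutional head with global pooling) are illustrative stand-ins, not the exact components used in this work; only the combination logic, a softmax-weighted fusion of the experts' per-pixel logits, reflects the mechanism described above.

```python
# Minimal MoE sketch: two segmentation "experts" fused by an image-level gate.
# The real experts would be the fine-tuned Prithvi and Satlas models; tiny
# placeholder networks are used here so the example runs standalone.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyExpert(nn.Module):
    """Placeholder for a fine-tuned VFM segmentation expert."""

    def __init__(self, in_ch: int, n_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, n_classes, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # per-pixel class logits, shape (B, C, H, W)


class GatingNetwork(nn.Module):
    """Lightweight trainable gate: predicts one weight per expert per image."""

    def __init__(self, in_ch: int, n_experts: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(8, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.fc(self.features(x).flatten(1))  # (B, n_experts)
        return F.softmax(w, dim=1)                # weights sum to 1 per image


class MixtureOfExperts(nn.Module):
    """Fuses expert logits with image-conditioned gating weights."""

    def __init__(self, experts: list[nn.Module], gate: GatingNetwork):
        super().__init__()
        self.experts = nn.ModuleList(experts)
        self.gate = gate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = self.gate(x)                                 # (B, E)
        logits = torch.stack([e(x) for e in self.experts], 1)  # (B, E, C, H, W)
        weights = weights[:, :, None, None, None]              # broadcast over C, H, W
        return (weights * logits).sum(dim=1)                   # fused logits (B, C, H, W)


if __name__ == "__main__":
    # Toy forward pass: 6-band imagery, binary cereal / non-cereal map.
    x = torch.randn(2, 6, 64, 64)
    moe = MixtureOfExperts(
        experts=[TinyExpert(6, 2), TinyExpert(6, 2)],
        gate=GatingNetwork(6),
    )
    print(moe(x).shape)  # torch.Size([2, 2, 64, 64])
```

Whether the experts stay frozen while the gate is trained, and how the gate is supervised, are implementation choices not specified in this summary.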
The results highlight the strong performance of VFMs in a low-data regime. The fine-tuned Satlas model achieved an Overall Accuracy of 96.93% and a Cereal-class Intersection over Union (IoU) of 94.12%. The MoE system improved on this further, setting a new benchmark with an Overall Accuracy of 97.82% and a Cereal-class IoU of 95.58%, and it converged rapidly during training, underscoring its efficiency.
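For reference, the Cereal-class IoU cited above follows the standard per-class definition (assumed here, as the metric is not restated in this summary): the overlap between the predicted cereal mask $P$ and the ground-truth mask $G$, divided by their union,

\[
\mathrm{IoU}_{\text{cereal}} = \frac{|P \cap G|}{|P \cup G|} = \frac{TP}{TP + FP + FN}.
\]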
This study validates a highly effective framework for precision agriculture, demonstrating that VFM ensembles can overcome data scarcity and deliver state-of-the-art performance. It offers a tangible pathway for nations such as Algeria to apply cutting-edge AI to food security and sustainable resource management.
