Mitigating Label Noise in Remote Sensing: A Pseudo-Labeling Method for Forest Classification with Sentinel-2
1  Department of Computer Science, The National Higher School of Artificial Intelligence (ENSIA), Algiers, 16000, Algeria
2  Department of Scientific and Technological Watch, Algerian Space Agency, Algiers, 16000, Algeria
Academic Editor: Lucia Billeci

Abstract:

The accuracy of large-area forest mapping is often compromised by the label noise present in global land cover products like ESA WorldCover. This study introduces a robust semi-supervised framework designed to mitigate this issue by leveraging a small, trusted set of manually curated clean data to refine a large, noisy dataset.

Our approach employs a modified ResNet-18 architecture in a two-stage training process. First, the model is trained exclusively on the high-quality, manually labeled clean dataset. This initial "teacher" model is then used to generate high-confidence pseudo-labels for the extensive but noisy WorldCover data, effectively filtering and re-labeling uncertain or incorrect regions. In the second stage, the model is fine-tuned on a composite dataset containing both the original clean labels and the newly generated, reliable pseudo-labels. This strategy leverages the accuracy of the clean data to improve the utility of the noisy data, significantly enhancing model robustness and generalization. The methodology was tested using Sentinel-2 and Digital Elevation Model (DEM) data in a case study covering the diverse forest ecosystems of North Africa.
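
The pseudo-labeling step described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy data, the class encoding, and the 0.95 confidence threshold are all assumptions made for illustration, since the abstract does not state how "high-confidence" is defined. The sketch takes class probabilities that a teacher model might produce for noisy samples, keeps only the confident predictions as pseudo-labels, and merges them with the clean set to form the stage-two training data.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, class label)


def select_pseudo_labels(
    teacher_probs: Sequence[Sequence[float]],
    threshold: float = 0.95,  # assumed value, not taken from the paper
) -> List[Tuple[int, int]]:
    """Return (sample index, pseudo-label) for confident teacher predictions."""
    selected = []
    for idx, probs in enumerate(teacher_probs):
        # The pseudo-label is the class with the highest predicted probability.
        best_class = max(range(len(probs)), key=lambda c: probs[c])
        # Samples below the threshold are dropped rather than relabeled.
        if probs[best_class] >= threshold:
            selected.append((idx, best_class))
    return selected


def build_composite_dataset(
    clean: Sequence[Sample],
    noisy_features: Sequence[Sequence[float]],
    teacher_probs: Sequence[Sequence[float]],
    threshold: float = 0.95,
) -> List[Sample]:
    """Combine clean samples with confidently pseudo-labeled noisy samples."""
    pseudo = [
        (noisy_features[idx], label)
        for idx, label in select_pseudo_labels(teacher_probs, threshold)
    ]
    return list(clean) + pseudo


# Toy example: class 0 = non-forest, class 1 = forest (hypothetical encoding).
clean_set = [([0.1, 0.2], 0), ([0.8, 0.9], 1)]
noisy_features = [[0.15, 0.25], [0.5, 0.5], [0.85, 0.8]]
# Hypothetical teacher outputs for the three noisy samples.
teacher_probs = [[0.98, 0.02], [0.60, 0.40], [0.03, 0.97]]

stage_two_set = build_composite_dataset(clean_set, noisy_features, teacher_probs)
print(len(stage_two_set))  # 4: two clean samples plus two confident pseudo-labels
```

In this toy run the second noisy sample is discarded because the teacher's top probability (0.60) falls below the assumed threshold. The original noisy labels are never consulted, so the teacher's prediction replaces them outright. A real pipeline would obtain the probabilities from the trained network's softmax output rather than from hard-coded lists.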

Our semi-supervised methodology achieved a final classification accuracy of 98.50% on a combined validation set. The initial training on clean data converged rapidly, underscoring the value of a high-quality seed dataset. This research offers a practical strategy for improving land cover classification in any region where large, noisy datasets are available alongside limited high-quality ground truth, providing a scalable approach to support global conservation efforts.

Keywords: Semi-Supervised Learning; Label Noise; Forest Mapping; Pseudo-Labeling; Remote Sensing