Mitigating Label Noise in Remote Sensing: A Pseudo-Labeling Method for Forest Classification with Sentinel-2

Lilia Ammar Khodja; Mohammed El Amin Larabi; Meziane Iftene


Lilia Ammar Khodja ¹, Mohammed El Amin Larabi ², Meziane Iftene ²,*

¹ Department of Computer Science, The National Higher School of Artificial Intelligence (ENSIA), Algiers, 16000, Algeria
² Department of Scientific and Technological Watch, Algerian Space Agency, Algiers, 16000, Algeria

Academic Editor: Lucia Billeci

Published: 03 December 2025 by MDPI in The 6th International Electronic Conference on Applied Sciences session Computing and Artificial Intelligence

Abstract:

The accuracy of large-area forest mapping is often compromised by the label noise present in global land cover products like ESA WorldCover. This study introduces a robust semi-supervised framework designed to mitigate this issue by leveraging a small, trusted set of manually curated clean data to refine a large, noisy dataset.

Our approach employs a modified ResNet-18 architecture in a two-stage training process. First, the model is trained exclusively on the high-quality, manually labeled clean dataset. This initial "teacher" model is then used to generate high-confidence pseudo-labels for the extensive but noisy WorldCover data, effectively filtering and re-labeling uncertain or incorrect regions. In the second stage, the model is fine-tuned on a composite dataset containing both the original clean labels and the newly generated, reliable pseudo-labels. This strategy leverages the accuracy of the clean data to improve the utility of the noisy data, significantly enhancing model robustness and generalization. The methodology was tested using Sentinel-2 and Digital Elevation Model (DEM) data in a case study covering the diverse forest ecosystems of North Africa.
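The pseudo-label selection step between the two training stages can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the confidence threshold or the exact selection rule, so the value 0.9, the tile identifiers, the class encoding, and the function names `select_pseudo_labels` and `build_stage_two_set` are all assumptions made for the example. The probability lists stand in for the softmax outputs of the stage-one teacher model.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[str, int]  # (hypothetical tile id, class label)


def select_pseudo_labels(
    tile_ids: Sequence[str],
    teacher_probs: Sequence[Sequence[float]],
    threshold: float = 0.9,  # assumed value; not given in the abstract
) -> List[Sample]:
    """Keep tiles whose top teacher probability meets the threshold.

    The returned label is the teacher's argmax class, which replaces the
    original noisy label for that tile. Low-confidence tiles are dropped.
    """
    if len(tile_ids) != len(teacher_probs):
        raise ValueError("tile_ids and teacher_probs must have equal length")
    selected: List[Sample] = []
    for tile_id, probs in zip(tile_ids, teacher_probs):
        best_class = max(range(len(probs)), key=lambda c: probs[c])
        if probs[best_class] >= threshold:
            selected.append((tile_id, best_class))
    return selected


def build_stage_two_set(
    clean: Sequence[Sample], pseudo: Sequence[Sample]
) -> List[Sample]:
    """Combine clean labels with pseudo-labels; clean labels win on overlap."""
    clean_ids = {tile_id for tile_id, _ in clean}
    return list(clean) + [s for s in pseudo if s[0] not in clean_ids]


# Toy example with two classes: 0 = non-forest, 1 = forest (assumed encoding).
clean_set = [("tile_a", 1), ("tile_b", 0)]
noisy_ids = ["tile_c", "tile_d", "tile_e"]
probs = [[0.03, 0.97], [0.55, 0.45], [0.92, 0.08]]

pseudo_set = select_pseudo_labels(noisy_ids, probs, threshold=0.9)
stage_two = build_stage_two_set(clean_set, pseudo_set)
print(pseudo_set)
print(stage_two)
```

In this toy run, `tile_d` is discarded because the teacher is not confident about either class, while the other two noisy tiles enter the stage-two set with the teacher's label. A stricter threshold trades dataset size for label reliability.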

Our semi-supervised methodology achieved a final classification accuracy of 98.50% on a combined validation set. The initial training on clean data converged rapidly, underscoring the value of a high-quality seed dataset. This research offers a practical strategy for improving land cover classification in any region where large, noisy datasets are available alongside limited high-quality ground truth, providing a scalable approach that can support global conservation efforts.

Keywords: Semi-Supervised Learning; Label Noise; Forest Mapping; Pseudo-Labeling; Remote Sensing
