A first approximation for acid sulfate soil mapping in areas with few soil samples

Virginia Estévez; Stefan Mattbäck; Anton Boman

doi:10.3390/ECRS2023-15831

Abstract:

Acid sulfate soil mapping is the first step toward avoiding the environmental damage caused by acid sulfate soils, which are among the most problematic soils found in nature. These soils are especially hazardous when they are drained for agricultural or forestry land use. Machine learning now makes more objective and precise maps possible. Applying a supervised machine learning technique to acid sulfate soil mapping requires two types of data: soil samples and environmental covariates derived from remote sensing data. One problem in acid sulfate soil mapping is the lack of soil samples in some regions, since collecting and analyzing soil samples is a long process. This prevents the creation of acid sulfate soil occurrence maps for those regions. For a first reconnaissance of such regions, a possible solution is to combine the remote sensing data of the area with soil samples from other areas with similar characteristics to train the model. The question is whether a machine learning model can correctly classify acid sulfate soils in an area where it has not been trained. If so, this first prediction could be used to design an efficient sampling plan for the region. In previous works, Random Forest has shown a high ability to predict acid sulfate soils correctly. In this work, we analyze whether Random Forest can correctly classify the soil samples in an area where it has not been trained. To this end, we consider two regions in southern Finland with a similar soil composition. Remote sensing data are known to play a fundamental role in the detection of acid sulfate soils. In this study, the remote sensing data are LiDAR and aerogeophysical data, both obtained from airborne surveys. 
The raster data for both areas consist of 17 environmental covariates of different types: Quaternary geology, a digital elevation model, terrain layers, and aerogeophysical layers. The digital elevation model is built from LiDAR data, and the terrain layers are derived from the digital elevation model. We show that Random Forest is able to classify the acid sulfate soils of an area where it has not been trained, with a precision above 60%. This is a very good result for a model that has not been trained in the prediction area. Training the model in the same area improves the results by up to 10-13%. Therefore, a model trained in a different region can be used for a first reconnaissance of regions with limited soil samples, as well as for designing the sampling plan in those regions.
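
As an illustration of the metric reported above, the following minimal Python sketch computes precision for a binary acid sulfate soil classification, assuming the term is used in its usual sense (the share of samples predicted as acid sulfate soil that really are acid sulfate soil). The function name `precision` and all label lists are hypothetical and made up for illustration; they are not data or code from this study.

```python
def precision(y_true, y_pred, positive=1):
    """Return TP / (TP + FP) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    if tp + fp == 0:
        raise ValueError("no positive predictions; precision is undefined")
    return tp / (tp + fp)


# Hypothetical labels for ten soil samples in the target region
# (1 = acid sulfate soil, 0 = non-acid sulfate soil). Illustrative only.
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]

# Hypothetical predictions from a model trained in a different region.
y_pred_transfer = [1, 1, 0, 1, 0, 1, 1, 0, 0, 0]

# Hypothetical predictions from a model trained in the target region itself.
y_pred_local = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

precision_transfer = precision(y_true, y_pred_transfer)
precision_local = precision(y_true, y_pred_local)

print(f"precision, trained elsewhere:   {precision_transfer:.2f}")
print(f"precision, trained in the area: {precision_local:.2f}")
```

In practice the two prediction lists would come from a classifier such as Random Forest, fitted once on samples from the other region and once on samples from the target region, and evaluated on the same held-out target samples.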