Mathematical Foundation of Chaotic Random Forest Algorithm (CRFA) for High-Dimensional Geospatial Datasets
* 1 , * 2 , 3
1  Department of Radio Engineering Systems, Faculty of Radio Engineering and Telecommunications; St. Petersburg State Electrotechnical University "LETI"; St. Petersburg; 197376; Russia
2  Department of Photonics; Faculty of Electronics and Nano Electronics; St. Petersburg State Electrotechnical University "LETI"; St. Petersburg; 197376; Russia
3  Faculty of Computer Science and Technology, Bachelor in Computer Science and Artificial Intelligence, St-Petersburg State Electrotechnical University, 197022, St. Petersburg, Russia
Academic Editor: Marjan Mernik

Abstract:

The Random Forest algorithm, originally proposed by Leo Breiman, is a cornerstone of ensemble learning that relies on Bagging (Bootstrap Aggregating) to reduce the variance of decision tree classifiers. On high-dimensional geospatial datasets, such as those derived from multispectral Sentinel satellite imagery, the traditional Random Forest Algorithm (RFA) often struggles with the curse of dimensionality, sampling bias, and sub-optimal convergence, because spatial autocorrelation and feature redundancy are prevalent. We propose the Chaotic Random Forest Algorithm (CRFA), which replaces the stochastic sampling of traditional Bagging with deterministic chaotic dynamics to enhance predictive stability and accuracy in complex spatial domains. First, we give a mathematical formalization of chaotic maps, such as the Logistic and Tent maps, to ensure a more uniform coverage of the feature space. Second, the framework provides a rigorous mathematical demonstration of chaotic feature subspace selection and of hyperparameter optimization via a chaotic search mechanism, which prevents the algorithm from getting trapped in local optima, a common failure mode in high-dimensional geospatial analysis. Finally, we establish mathematical proofs of convergence and ergodicity, demonstrating that the CRFA maintains a superior bias–variance tradeoff compared to standard ensemble methods. Experimental results on Sentinel-2 satellite datasets indicate that the CRFA significantly enhances classification accuracy and computational robustness: by integrating non-linear dynamics into the ensemble architecture, it achieves a 4.5% increase in the Kappa coefficient and a reduction in training variance, providing a more reliable tool for complex land-cover mapping and environmental monitoring.
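
As an illustration of the central idea (drawing bootstrap indices from a deterministic chaotic orbit instead of a pseudo-random generator), the following is a minimal sketch and not the authors' implementation. The function names, the seed `x0`, the burn-in length, the Tent-map parameter `mu = 1.99`, and the arcsine rescaling of the Logistic map are all assumptions made for this example. The rescaling uses the known fact that the Logistic map at r = 4 has an arcsine-shaped invariant density, so u = (2/π)·arcsin(√x) is approximately uniform on [0, 1].

```python
import math


def logistic_map(x0, n, r=4.0):
    """Return n iterates of x_{k+1} = r * x_k * (1 - x_k), starting from x0."""
    xs, x = [], x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        xs.append(x)
    return xs


def tent_map(x0, n, mu=1.99):
    """Return n iterates of the Tent map; mu slightly below 2 (an assumption)
    avoids the floating-point collapse to 0 that occurs at exactly mu = 2."""
    xs, x = [], x0
    for _ in range(n):
        x = mu * x if x < 0.5 else mu * (1.0 - x)
        xs.append(x)
    return xs


def chaotic_bootstrap_indices(n_samples, x0=0.37, burn_in=100):
    """Hypothetical helper: map a Logistic-map orbit to row indices in
    [0, n_samples), as a deterministic stand-in for bootstrap sampling."""
    orbit = logistic_map(x0, burn_in + n_samples)[burn_in:]  # drop transient
    indices = []
    for x in orbit:
        x = min(1.0, max(0.0, x))  # guard against rounding outside [0, 1]
        u = (2.0 / math.pi) * math.asin(math.sqrt(x))  # flatten arcsine density
        indices.append(min(int(u * n_samples), n_samples - 1))
    return indices


if __name__ == "__main__":
    # Same seed, same indices: the "bootstrap" sample is fully reproducible.
    print(chaotic_bootstrap_indices(10))
```

In this sketch each tree of the ensemble would receive a different seed `x0`, so the samples differ between trees while remaining exactly reproducible; how the CRFA actually seeds and combines its maps is not specified in the abstract.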

Keywords: Random Forest Algorithm, Chaotic Maps, High-Dimensional Data, Geospatial Analysis, Machine Learning, Feature Selection
