Extension of the Log-Logistic Distribution for Groundwater Analysis and Potability Prediction Using Machine Learning Models
1  Fundamental and Applied Sciences Department, Universiti Teknologi PETRONAS, Seri Iskandar 32610, Malaysia
2  Department of Statistics, Ahmadu Bello University, Zaria, 810107, Nigeria
3  Kano State Agro-Climatic Resilience in Semi-Arid Landscapes, Kano State Ministry of Water Resources, Kano, Nigeria
4  Department of Mathematics, Aliko Dangote University of Science and Technology, Wudil 713281, Nigeria
Academic Editor: Ali Belarouci

Abstract:

Groundwater quality analysis and potability prediction are essential for public health and sustainable environmental management. This study takes a two-part approach that integrates statistical modeling and machine learning classification to analyze groundwater data. In the modeling phase, we introduce the Inverse Power Log-Logistic (IPLL) distribution. It is designed to capture the characteristics of groundwater quality data, with a focus on pH and sulfate concentrations because of their significant impact on water potability and health. pH matters because it reflects water acidity, which governs the potential for heavy metal dissolution. Excess sulfate is commonly associated with altered taste and with health risks. We derive key structural properties of the IPLL distribution, including its moments and its hazard and survival functions, and estimate its parameters by maximum likelihood. Compared with conventional models, the IPLL distribution offers greater flexibility and a better fit, as measured by AIC, BIC, and RMSE. In the classification phase, we apply four machine learning algorithms to predict groundwater potability: logistic regression, K-Nearest Neighbors (KNN), Support Vector Classifier (SVC), and Random Forest. Performance is evaluated using accuracy, F1-score, and ROC-AUC. Random Forest achieved the highest accuracy (92.3%), F1-score (0.89), and ROC-AUC (0.94). SVC followed closely with 89.7% accuracy, 0.87 F1-score, and 0.92 ROC-AUC. KNN and logistic regression also performed well, with accuracies of 87.5% and 85.2%, respectively. This study offers a comprehensive framework for groundwater analysis that combines advanced statistical modeling with effective machine learning classification.
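The abstract compares the IPLL fit against conventional models using AIC, BIC, and RMSE. It does not state the IPLL density, so the sketch below does not implement it. It only illustrates the comparison step, under stated assumptions. Two conventional baselines are fitted by maximum likelihood with the location fixed at 0: the two-parameter log-logistic (SciPy's `stats.fisk`) and the lognormal (`stats.lognorm`). The sketch then reports AIC = 2k - 2 log L, BIC = k ln n - 2 log L, and an RMSE between the fitted CDF and the empirical CDF. Several choices are assumptions rather than the study's definitions: the synthetic sample (an arbitrary log-logistic draw standing in for a sulfate series), the choice of baselines, the (i - 0.5)/n empirical CDF used for RMSE, and the hypothetical helper name `fit_summary`.

```python
import numpy as np
from scipy import stats


def fit_summary(dist, data, fixed_loc=0.0):
    """Hypothetical helper: MLE fit of a SciPy distribution plus AIC, BIC and CDF RMSE."""
    # Maximum likelihood fit with loc held fixed; the median is only a starting guess for scale.
    params = dist.fit(data, floc=fixed_loc, scale=np.median(data))
    log_lik = np.sum(dist.logpdf(data, *params))
    k = len(params) - 1  # free parameters: shape and scale (loc is fixed)
    n = len(data)
    aic = 2 * k - 2 * log_lik
    bic = k * np.log(n) - 2 * log_lik
    # Assumed RMSE definition: fitted CDF vs. empirical CDF (i - 0.5)/n at the sorted points.
    x_sorted = np.sort(data)
    ecdf = (np.arange(1, n + 1) - 0.5) / n
    rmse = np.sqrt(np.mean((dist.cdf(x_sorted, *params) - ecdf) ** 2))
    return {"params": params, "loglik": log_lik, "AIC": aic, "BIC": bic, "RMSE": rmse}


# Synthetic, positive-valued sample (NOT the study's groundwater data).
sample = stats.fisk.rvs(c=4.0, scale=330.0, size=1000, random_state=42)

results = {
    name: fit_summary(dist, sample)
    for name, dist in [("log-logistic", stats.fisk), ("lognormal", stats.lognorm)]
}
for name, r in results.items():
    print(f"{name:12s} AIC={r['AIC']:.1f} BIC={r['BIC']:.1f} RMSE={r['RMSE']:.4f}")
```

Lower AIC, BIC, and RMSE indicate a better fit. A new candidate such as the IPLL would enter the same loop once its density is coded as a SciPy `rv_continuous` subclass.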
The adaptability of the IPLL distribution to environmental data and the predictive strength of the machine learning models in potability assessment provide valuable insights for public health officials and environmental policymakers. This dual approach could also be applied in other fields that require reliable data modeling and prediction.
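The abstract evaluates the potability classifiers with accuracy, F1-score, and ROC-AUC. The sketch below computes these metrics without external dependencies. It rests on three assumptions: binary labels with 1 = potable, a 0.5 decision threshold on the classifier scores, and ROC-AUC taken as the fraction of (positive, negative) pairs ranked correctly, with ties counted as one half. The labels and scores are toy values, not the study's data or results. In practice, `accuracy_score`, `f1_score`, and `roc_auc_score` from scikit-learn's `sklearn.metrics` compute the same quantities.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (potable) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if ps > ns else 0.5 if ps == ns else 0.0 for ps in pos for ns in neg)
    return wins / (len(pos) * len(neg))


# Toy labels (1 = potable) and classifier scores; illustrative only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.65, 0.4, 0.55, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
# prints: 0.75 0.75 0.9375
```

The pairwise AUC loop costs O(n_pos × n_neg), which suits a small illustration. A sort-based rank formula scales better on large datasets.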

Keywords: inverse power distribution; log-logistic distribution; groundwater; artificial intelligence; machine learning; logistic regression