Groundwater quality analysis and potability prediction are essential for public health and sustainable environmental management. This study adopts a two-part approach, integrating statistical modeling and machine learning classification to analyze groundwater data. In the modeling phase, we introduce the Inverse Power Log-Logistic (IPLL) distribution, designed to capture the characteristics of groundwater quality data. We focus on pH and sulfate concentrations because of their significant impact on water potability and health: pH affects water acidity and the potential for heavy metal dissolution, while excess sulfate is commonly associated with poor taste and health risks. We derive key structural properties of the IPLL distribution, including its moments, hazard function, and survival function, and estimate its parameters by maximum likelihood. Compared with conventional models, the IPLL distribution shows enhanced flexibility and accuracy, as measured by AIC, BIC, and RMSE. In the classification phase, we apply four machine learning algorithms (logistic regression, K-Nearest Neighbors (KNN), Support Vector Classifier (SVC), and Random Forest) to predict groundwater potability. Performance was evaluated using accuracy, F1-score, and ROC-AUC. Random Forest achieved the highest accuracy (92.3%), F1-score (0.89), and ROC-AUC (0.94), with SVC following closely at 89.7% accuracy, 0.87 F1-score, and 0.92 ROC-AUC. KNN and logistic regression also performed well, with accuracies of 87.5% and 85.2%, respectively. This study offers a comprehensive framework for groundwater analysis, combining advanced statistical modeling with effective machine learning classification.
The IPLL distribution’s adaptability to environmental data and the machine learning models' predictive strength in potability assessment provide valuable insights for public health officials and environmental policymakers. This dual approach has broad potential applications in fields that require reliable data modeling and prediction.
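As an illustration of the three evaluation metrics named above, the following minimal sketch computes accuracy, F1-score, and ROC-AUC from first principles. The labels, scores, and the 0.5 decision threshold are hypothetical values chosen for the example; they are not the study's data, and the function names are illustrative rather than taken from the authors' code.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outranks a random
    negative; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = potable, 0 = not potable.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]  # assumed classifier scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.4f}  F1={f1:.4f}  ROC-AUC={auc:.4f}")
```

Accuracy and F1 depend on the chosen threshold, whereas ROC-AUC is computed from the ranking of the scores alone, which is one reason the three are often reported together.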
Extension of the Log-Logistic Distribution for Groundwater Analysis and Potability Prediction Using Machine Learning Models
Published: 02 December 2024 by MDPI in The 5th International Electronic Conference on Applied Sciences, session Applied Physical Science
Keywords: inverse power distribution; log-logistic distribution; groundwater; artificial intelligence; machine learning; logistic regression