Objective: This paper develops a machine learning-based classification model that predicts the risk level of maternal disease during pregnancy. Risk is classified as low or high based on clinical and physiological features. Such a model can provide early warnings and help reduce complications that commonly occur during pregnancy.
Material/method: Fifteen classification algorithms were implemented and evaluated: Logistic Regression, Linear SVM (L1), RBF SVM, Decision Tree, Random Forest, XGBoost, AdaBoost, Bagging, KNN, Gaussian Naïve Bayes, Bernoulli Naïve Bayes, Ridge Classifier, Linear Discriminant Analysis, LightGBM, and Extra Trees. Deep learning models such as Artificial Neural Networks (ANNs) and Convolutional Neural Networks (CNNs) were not part of this evaluation; they are planned as future enhancements to capture nonlinear patterns. The dataset is Mendeley's Maternal Health Risk Assessment Dataset, which contains approximately 1,187 patient records. The target variable is Risk Level, categorized as Low or High. The key features are Age, Systolic and Diastolic Blood Pressure, Blood Sugar (BS), Body Temperature, BMI, Heart Rate, Previous Complications, Preexisting Diabetes, Gestational Diabetes, and Mental Disease Indicators.
Results: Among the fifteen classification algorithms tested, XGBoost achieved the highest accuracy (99%), along with strong precision and recall for high-risk cases. Feature importance analysis identified Preexisting Diabetes, Blood Sugar, BMI, Heart Rate, and Mental Disease Indicators as the most influential predictors.
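
As an illustration of how accuracy, precision, and recall for the high-risk class relate to one another, the following minimal sketch computes them from true and predicted labels. The function name `binary_metrics` and the ten example labels are hypothetical and chosen only for illustration; they are not taken from the study's data or code.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "High"
) -> Dict[str, float]:
    """Accuracy, plus precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives

    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when no positives are predicted/present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels (not from the dataset): one high-risk case is missed.
y_true = ["High", "High", "High", "High", "Low", "Low", "Low", "Low", "Low", "Low"]
y_pred = ["High", "High", "High", "Low", "Low", "Low", "Low", "Low", "Low", "Low"]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case accuracy is 0.9 and precision is 1.0, yet recall for the high-risk class is only 0.75 because one high-risk patient is predicted as low risk. This is why recall on high-risk cases is worth reporting alongside overall accuracy in a screening setting.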
