A Comparative Study of Logistic Regression, Random Forest, and Gradient Boosting for Motor Insurance Lapse Prediction
1 Secondary School, Regents School Bali, Denpasar 80237, Indonesia
Academic Editor: Annamaria Olivieri

Published: 01 July 2026 by MDPI in The 1st International Online Conference on Risks, session Insurance
Abstract:

This study examines the application of machine learning techniques to predicting policyholder renewal behavior in motor vehicle insurance. Accurately identifying customers who are likely to lapse is crucial for pricing strategy and customer retention in actuarial practice. Using a dataset of motor insurance policies, three classification models (Logistic Regression, Random Forest, and LightGBM, a gradient boosting implementation) were developed and compared. Exploratory analysis revealed a moderate class imbalance, with approximately 79.6% renewals and 20.4% lapses. Feature engineering was used to construct variables such as age and driving experience. The models were evaluated using multiple performance metrics: accuracy, precision, recall, specificity, F1-score, and the area under the receiver operating characteristic (ROC) curve (AUC). Renewal is treated as the positive class, so recall measures correctly identified renewals and specificity measures correctly identified lapses. Logistic Regression achieved the highest accuracy (79.4%) and recall (99.5%) but extremely low specificity (2.1%): it classified almost every policy as a renewal and therefore identified very few lapses, and its accuracy was no better than the roughly 79.6% that a trivial classifier always predicting renewal would obtain. Random Forest gave a somewhat more balanced performance, with an AUC of 0.663 and higher specificity (7.1%), although lapse detection remained limited. LightGBM achieved the best overall discrimination, with the highest AUC (0.683) and a much more balanced trade-off between recall (63.0%) and specificity (63.3%), despite lower overall accuracy. These findings suggest that while traditional models such as Logistic Regression may perform well on aggregate metrics such as accuracy, those metrics can be misleading on imbalanced insurance datasets. Ensemble methods, particularly gradient boosting, appear better able to capture complex patterns and to balance errors across renewals and lapses. The study highlights the importance of evaluation metrics beyond accuracy and demonstrates the practical relevance of machine learning methods in actuarial modeling and policyholder retention analysis.
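The evaluation metrics named in the abstract can be made concrete with a short sketch. The Python below is illustrative only and is not the authors' code: it assumes renewal is coded as 1 (the positive class) and lapse as 0, which is consistent with the reported recall and specificity figures, and the helper names `confusion_metrics` and `roc_auc` are hypothetical. AUC is computed directly from its probabilistic definition (the chance that a randomly chosen renewal receives a higher score than a randomly chosen lapse, with ties counted as one half); on real data a library routine such as scikit-learn's `roc_auc_score` would normally be used instead. The usage lines apply a trivial model that predicts renewal for every policy to a hypothetical portfolio of 1,000 policies with roughly the study's 79.6%/20.4% split; these numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float       # share of renewals (label 1) predicted as renewals
    specificity: float  # share of lapses (label 0) predicted as lapses
    f1: float


def _ratio(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def confusion_metrics(y_true: list[int], y_pred: list[int]) -> Metrics:
    """Threshold-based metrics, assuming 1 = renewal (positive), 0 = lapse."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # lapse predicted as renewal
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # renewal predicted as lapse
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return Metrics(
        accuracy=_ratio(tp + tn, len(pairs)),
        precision=precision,
        recall=recall,
        specificity=_ratio(tn, tn + fp),
        f1=_ratio(2 * precision * recall, precision + recall),
    )


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """AUC as P(score of a random renewal > score of a random lapse); ties count 1/2.

    Pairwise O(n_pos * n_neg) loop: fine for illustration, too slow for large data.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC requires at least one renewal and one lapse")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical portfolio: 796 renewals and 204 lapses (about 79.6% / 20.4%).
y_true = [1] * 796 + [0] * 204

# A trivial "model" that predicts renewal for every policy.
baseline = confusion_metrics(y_true, [1] * len(y_true))
print(baseline.accuracy, baseline.recall, baseline.specificity)  # 0.796 1.0 0.0

# Constant scores cannot rank renewals above lapses, so AUC is 0.5.
print(roc_auc(y_true, [0.5] * len(y_true)))  # 0.5
```

The baseline reaches 79.6% accuracy and perfect recall while detecting no lapses at all, which is why accuracy alone says little on this kind of data and why specificity and AUC carry the comparison.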

Keywords: Motor insurance; Policy lapse prediction; Machine learning; LightGBM; Imbalanced data

 
 