Beyond Traditional airPLS: Improved Baseline Removal in SERS with Parameter-Focused Optimization and Prediction
1 , 2 , 3 , * 2 , * 4
1  School of Electrical and Computer Engineering, College of Engineering, The University of Georgia, Athens, GA, 30602, USA
2  School of Mathematics and Physics, Hebei University of Engineering, Handan, Hebei, 056038, China
3  Department of Epidemiology & Biostatistics, College of Public Health, The University of Georgia, Athens, GA, 30602, USA
4  Department of Physics and Astronomy, The University of Georgia, Athens, GA, 30602, USA
Academic Editor: Wan-liang Lu

Abstract:

Reliable baseline correction is a cornerstone of spectroscopic analysis, underpinning critical tasks such as peak identification and the performance of machine learning classifiers. It is particularly crucial in Surface-Enhanced Raman Spectroscopy (SERS), where subtle spectral features carry vital chemical signatures. Traditional baseline correction techniques often introduce artifacts and require extensive manual parameter tuning. The adaptive iteratively reweighted penalized least squares (airPLS) algorithm, though widely appreciated for its speed, has notable drawbacks: its piecewise linear baseline fails to capture smooth backgrounds, it overestimates baselines by linking adjacent peak "feet," and it yields large mean absolute errors (MAE) in high-intensity regions. To address these limitations, we developed a machine learning approach that predicts optimal airPLS parameters tailored to any input spectrum, eliminating the need for prior baseline knowledge. We fixed the smoothness parameter at 2 and systematically adjusted the penalizing and tolerance parameters. Using three peak types with four baseline profiles, we generated 6,000 simulated spectra with known true baselines. An iterative grid search was used to identify the optimal parameter set for each spectrum, reducing the average MAE by 96% compared to default airPLS. For practical deployment, we trained a machine learning model that combines principal component analysis with a random forest, predicting parameters directly from spectra while retaining 90% of the MAE reduction. We then expanded the training dataset to 12,000 spectra, incorporating diverse peak characteristics guided by the statistical distributions of the optimal parameters, to improve adaptability to real-world spectral variability. Importantly, we demonstrated that for both synthetic and experimental noisy spectra, our model successfully predicts parameters and baselines after simple denoising.
Future work will focus on identifying optimal denoising strategies to further enhance results. By automating baseline correction, our approach enhances analytical precision with applications spanning virus detection, environmental monitoring, and beyond, making SERS more reliable and accessible.
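
As an illustration of the parameter optimization step described above, the following is a minimal sketch of an airPLS-style baseline fit combined with a single-pass grid search over the penalizing parameter and the tolerance, scored by MAE against a known baseline. It is not the authors' implementation. The function names (`airpls`, `grid_search`), the synthetic spectrum, the grid values, and the reference setting (penalty 100, tolerance 0.001) are assumptions made for illustration; mapping the "smoothness parameter" to a difference order of 2 is also an assumption. The abstract describes an iterative grid search, whereas this sketch evaluates one fixed grid.

```python
import numpy as np


def airpls(y, lam=100.0, tol=1e-3, order=2, max_iter=15):
    """airPLS-style baseline estimate (dense sketch, suited to short spectra)."""
    n = y.size
    # Difference matrix of the given order; lam * D^T D penalizes roughness.
    D = np.diff(np.eye(n), n=order, axis=0)
    P = lam * (D.T @ D)
    w = np.ones(n)
    z = y.copy()
    for i in range(1, max_iter + 1):
        # Weighted penalized least squares: (W + lam * D^T D) z = W y
        z = np.linalg.solve(np.diag(w) + P, w * y)
        d = y - z
        neg = d < 0
        dssn = np.abs(d[neg]).sum()
        # Stop when the signal lying below the baseline is negligible.
        if dssn < tol * np.abs(y).sum() or i == max_iter:
            break
        w[~neg] = 0.0  # points above the baseline are treated as peaks
        w[neg] = np.exp(i * np.abs(d[neg]) / dssn)
        w[0] = np.exp(i * d[neg].max() / dssn)
        w[-1] = w[0]
    return z


def grid_search(y, true_baseline, lams, tols):
    """Return (mae, lam, tol) with the lowest MAE against the known baseline."""
    best = None
    for lam in lams:
        for tol in tols:
            mae = np.mean(np.abs(airpls(y, lam, tol) - true_baseline))
            if best is None or mae < best[0]:
                best = (mae, lam, tol)
    return best


# Hypothetical simulated spectrum: smooth baseline plus three Gaussian peaks.
x = np.linspace(0.0, 1.0, 200)
true_baseline = 20.0 + 30.0 * x + 15.0 * np.sin(2.0 * np.pi * x)
peaks = sum(
    a * np.exp(-0.5 * ((x - c) / s) ** 2)
    for a, c, s in [(50.0, 0.25, 0.01), (80.0, 0.55, 0.015), (40.0, 0.80, 0.01)]
)
y = true_baseline + peaks

lams = [1e1, 1e2, 1e3, 1e4, 1e5]   # assumed grid for the penalizing parameter
tols = [1e-1, 1e-2, 1e-3, 1e-4]    # assumed grid for the tolerance

default_mae = np.mean(np.abs(airpls(y, 100.0, 1e-3) - true_baseline))
best_mae, best_lam, best_tol = grid_search(y, true_baseline, lams, tols)
print(f"reference MAE={default_mae:.3f}; best MAE={best_mae:.3f} "
      f"at lam={best_lam:g}, tol={best_tol:g}")
```

Because the reference setting is included in the grid, the best grid result can never be worse than the reference. In a workflow like the one described, the per-spectrum optimal parameters found this way would serve as training targets for a model that predicts them directly from the spectrum.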

Keywords: Baseline correction; Surface-enhanced Raman spectroscopy (SERS); adaptive iteratively reweighted penalized least squares (airPLS); optimization; machine learning

 
 