The complexity of the Raman spectra of biomedical samples reflects the intricate molecular composition and intermolecular interactions of these diverse systems. Biomolecules such as proteins, lipids, nucleic acids, and carbohydrates each contribute numerous Raman-active vibrational modes, many of which overlap. Moreover, water, a major component of biological samples, introduces broad spectral features, such as its bending band near 1640 cm⁻¹ that overlaps the protein amide I band, making other biomolecular signals harder to discern.
Unraveling the complexity of biological Raman spectra is essential for bioscience and bioengineering research, because it provides insight into cellular processes, disease states, and drug interactions. Effective analysis of such data requires robust software with sophisticated preprocessing algorithms that improve the signal-to-noise ratio and reveal hidden spectral information. Such software can also incorporate machine learning methods, for example automated clustering, to help identify biomolecules and their conformational changes in diverse biological specimens.
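As a minimal illustration of the signal-to-noise gain that preprocessing can deliver, the following sketch smooths a synthetic noisy Raman band with SciPy's Savitzky-Golay filter (`scipy.signal.savgol_filter`). It is not drawn from any specific package; the band shape, noise level, window length, polynomial order, and the simple `snr` helper (peak height divided by the standard deviation of the residual against the noise-free band) are all illustrative assumptions.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic spectrum (illustrative assumption): one Lorentzian band at 1004 cm^-1
# (e.g. the phenylalanine ring-breathing position) plus Gaussian noise.
rng = np.random.default_rng(0)
shift = np.linspace(900.0, 1100.0, 801)   # Raman shift axis, cm^-1 (0.25 cm^-1 step)
gamma = 5.0                                # half width at half maximum, cm^-1
clean = 100.0 / (1.0 + ((shift - 1004.0) / gamma) ** 2)
noisy = clean + rng.normal(0.0, 5.0, shift.size)

# Savitzky-Golay smoothing: least-squares cubic fit in a sliding 15-point window.
smoothed = savgol_filter(noisy, window_length=15, polyorder=3)


def snr(signal, reference):
    """Hypothetical metric: reference peak height / std of the residual noise."""
    return reference.max() / np.std(signal - reference)


print(f"SNR before smoothing: {snr(noisy, clean):.1f}")
print(f"SNR after smoothing:  {snr(smoothed, clean):.1f}")
```

Widening the window suppresses more noise but eventually flattens and broadens the band, so the window length is usually chosen relative to the narrowest band of interest.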
We present PyFasma, a Python 3 package built on widely used scientific Python libraries, including NumPy, pandas, SciPy, seaborn, and scikit-learn. PyFasma aims to provide Raman spectroscopists with user-friendly, interactive tools for analyzing complex biomedical Raman data.
The package provides a range of preprocessing methods, including spike removal, cropping, smoothing, and baseline correction, along with batch deconvolution of Raman bands. PyFasma also supports Principal Component Analysis (PCA), Partial Least Squares Regression (PLSR), and two-class Partial Least Squares Discriminant Analysis (PLS-DA). Its functionality and seamless integration with Jupyter notebooks enable scientists to perform in-depth, reproducible analyses of complex biospectroscopic data.