Next-generation sequencing (NGS) has profoundly transformed genomics by enabling the large-scale detection of molecular alterations, particularly in the somatic genome. Research on complex diseases such as lung cancer has shifted significantly, as NGS provides an efficient way to unravel the genetic fingerprint of this extensively studied disease. This advancement has opened new pathways for understanding the molecular underpinnings of lung cancer, facilitating more targeted approaches in diagnosis, treatment, and research. However, NGS data are high-dimensional and complex, which poses a significant challenge for data analysis and classification tasks. In this paper, we investigate feature engineering as a means of improving the classification accuracy of lung cancer from NGS data. Dimensionality reduction, feature selection, and transformation techniques all share the goal of improving the predictive power of machine learning models. In this work, Principal Component Analysis (PCA), a dimensionality reduction method, is used to reduce the feature space. Transformation techniques such as normalization and scaling are applied to prepare the data for better model performance. The efficacy of these techniques is evaluated through a comprehensive comparison of various machine learning classifiers, including Support Vector Machines (SVMs). The results demonstrate that effective feature engineering enhances the classification accuracy and robustness of lung cancer prediction models, providing valuable insights for the development of precision medicine approaches in oncology.
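The workflow described in the abstract (scaling, PCA, then an SVM classifier) can be illustrated with a minimal sketch using scikit-learn. This is not the authors' implementation: the data below are synthetic stand-ins for an NGS-derived feature matrix, and the sample count, feature count, number of principal components, kernel, and `C` value are all assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an NGS-derived feature matrix (NOT real data):
# 120 samples x 500 features, with a class signal planted in 30 features.
rng = np.random.default_rng(0)
n_samples, n_features = 120, 500
y = rng.integers(0, 2, size=n_samples)
X = rng.normal(size=(n_samples, n_features))
X[:, :30] += 2.0 * y[:, None]

# Scaling -> PCA -> SVM. Keeping all steps in one pipeline means the scaler
# and PCA are re-fit on the training portion of each fold only, so no
# information from the held-out fold leaks into the transformation.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=20, random_state=0),  # assumed component count
    SVC(kernel="rbf", C=1.0),              # assumed kernel and C
)

# 5-fold cross-validated accuracy (stratified by default for classifiers).
scores = cross_val_score(model, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.3f}")
```

Swapping `SVC` for another estimator in the same pipeline would give a like-for-like comparison across classifiers, since every model then sees identically preprocessed folds.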
Feature Engineering for Lung Cancer Classification Using Next-Generation Sequencing Data
Published: 04 December 2024 by MDPI in The 5th International Electronic Conference on Applied Sciences, session Computing and Artificial Intelligence
Keywords: Next-generation sequencing; lung cancer; feature engineering; machine learning; dimensionality reduction; classification.