Autism Spectrum Disorder (ASD) affects a significant portion of the population: according to the 2023 School Census, approximately 636,000 students diagnosed with autism are enrolled in schools in Brazil, a 48% increase compared to previous years. However, the joint analysis of ASD-related data is difficult because the sources are heterogeneous, including medical records, monitoring devices, clinical evaluations, and questionnaires. This project addresses that difficulty by developing an automated pipeline to collect, clean, transform, and integrate heterogeneous ASD data. Built with tools such as Pandas, Amazon S3, Google BigQuery, Tableau, Scikit-learn, and TensorFlow, the pipeline ensured data quality and standardized data formats and terminology. Apache Airflow managed the automation, ensuring continuous and efficient execution of the process. The integrated data enabled advanced analyses, such as identifying behavioral patterns, correlating clinical and monitoring data, and performing sentiment analysis on questionnaire responses. The findings provided valuable insights into ASD, surpassing the state of the art by offering more accurate predictive models and clear visualizations that support decision-making by healthcare professionals. The project produced a robust infrastructure that improves the quality and usability of available ASD data, contributing to the development of more effective interventions and targeted public policies.
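The abstract does not publish the pipeline's code, so the following is only a minimal sketch of what the clean, standardize, and integrate steps could look like in Pandas. Every table, column name, value, and the terminology map are hypothetical and chosen purely for illustration; the storage, warehouse, and Airflow scheduling layers are omitted.

```python
import pandas as pd

# Hypothetical inputs: all column names and values are invented for illustration.
clinical = pd.DataFrame({
    "PatientID": [1, 2, 3],
    "Diagnosis": ["ASD", "autism spectrum disorder", "TEA"],
})
monitoring = pd.DataFrame({
    "patient_id": [1, 2, 2, 4],
    "sleep_hours": [7.5, 6.0, None, 8.0],
})

# Assumed terminology map; a real project would rely on a curated vocabulary.
DIAGNOSIS_TERMS = {
    "asd": "ASD",
    "autism spectrum disorder": "ASD",
    "tea": "ASD",  # Portuguese acronym: Transtorno do Espectro Autista
}


def standardize_clinical(df):
    """Align column names and map diagnosis labels to one shared term."""
    out = df.rename(columns={"PatientID": "patient_id", "Diagnosis": "diagnosis"})
    out["diagnosis"] = out["diagnosis"].str.strip().str.lower().map(DIAGNOSIS_TERMS)
    return out


def clean_monitoring(df):
    """Drop missing readings, then average the remaining ones per patient."""
    out = df.dropna(subset=["sleep_hours"])
    return out.groupby("patient_id", as_index=False)["sleep_hours"].mean()


def integrate(clinical_df, monitoring_df):
    """Left-join monitoring summaries onto the standardized clinical records."""
    return standardize_clinical(clinical_df).merge(
        clean_monitoring(monitoring_df), on="patient_id", how="left"
    )


integrated = integrate(clinical, monitoring)
print(integrated)
```

The left join keeps every clinical record even when no monitoring data exists for that patient, which is one plausible design choice; in an orchestrated setup each function could become its own scheduled task.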
Integration and Standardization of Heterogeneous Autism Data
Published: 04 December 2024 by MDPI in The 5th International Electronic Conference on Applied Sciences, session Computing and Artificial Intelligence
Abstract:
Keywords: Autism Spectrum Disorder; data integration; data standardization; data pipeline; healthcare data