As AI and transformer architectures advance, Vision-Language Models (VLMs) are poised to have a significant impact on medical diagnostics, particularly in the interpretation of image-based data.
X-rays play a key role in radiological diagnosis, helping clinicians detect a wide range of conditions.
This research aims to train a Vision-Language Model, VisualBERT, to diagnose diseases from both textual patient information and X-ray images.
In recent years, researchers worldwide have applied various AI models to the detection of different health conditions; however, because Vision-Language Models are relatively new, they have mostly been used in general-purpose tools such as chatbots.
These models, however, also hold promise for automating disease diagnosis from X-rays.
The proposed methodology involves merging three separate datasets to build a comprehensive dataset covering many diseases detectable in X-ray images. By bringing together diverse datasets, the model learns to recognize a wider range of conditions, which improves its diagnostic performance and practical usefulness.
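As a rough illustration of this merging step, the sketch below combines several per-dataset label files into a single table. The file paths and the column names (image_path, text, label) are hypothetical, since the abstract does not name the source datasets or describe their schemas.

```python
import pandas as pd

# Hypothetical label files, one per source dataset; each is assumed to already
# expose an image path, the accompanying report text, and a disease label.
SOURCES = [
    "data/dataset_a/labels.csv",
    "data/dataset_b/labels.csv",
    "data/dataset_c/labels.csv",
]

frames = [pd.read_csv(path)[["image_path", "text", "label"]] for path in SOURCES]
combined = pd.concat(frames, ignore_index=True).drop_duplicates(subset=["image_path"])
combined.to_csv("data/combined_xray_dataset.csv", index=False)
print(f"{len(combined)} samples across {combined['label'].nunique()} disease classes")
```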
This approach demonstrates how VLMs can improve medical diagnostics, giving clinicians an intelligent tool to detect diseases faster and more consistently.
VisualBERT is trained on the combined dataset to diagnose diseases using both visual and textual data.
Accuracy is used as the evaluation metric; after fine-tuning the model on the combined dataset, we obtained an accuracy score of 60%.
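A minimal sketch of how such fine-tuning and accuracy measurement might look is given below. It assumes the Hugging Face VisualBERT checkpoint uclanlp/visualbert-vqa-coco-pre, precomputed region features from an external visual backbone, and a simple linear classification head; the checkpoint, hyperparameters, and number of disease classes are illustrative assumptions, not the paper's exact configuration.

```python
import torch
from torch import nn
from transformers import BertTokenizer, VisualBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
backbone = VisualBertModel.from_pretrained("uclanlp/visualbert-vqa-coco-pre")
num_classes = 14  # hypothetical number of disease labels in the combined dataset
classifier = nn.Linear(backbone.config.hidden_size, num_classes)

optimizer = torch.optim.AdamW(
    list(backbone.parameters()) + list(classifier.parameters()), lr=2e-5
)
loss_fn = nn.CrossEntropyLoss()

def step(texts, visual_embeds, labels, train=True):
    """One training/evaluation step.

    visual_embeds: (batch, regions, 2048) image features from an external
    visual backbone (e.g., a region detector), assumed to be precomputed.
    """
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    visual_mask = torch.ones(visual_embeds.shape[:-1], dtype=torch.long)
    out = backbone(**enc, visual_embeds=visual_embeds, visual_attention_mask=visual_mask)
    logits = classifier(out.pooler_output)
    if train:
        loss = loss_fn(logits, labels)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    preds = logits.argmax(dim=-1)
    return (preds == labels).float().mean().item()  # batch accuracy
```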
To test the model, a front-end application was built with Streamlit so that end users can interact with it easily.
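A minimal sketch of such a Streamlit front end is shown below; the predict_disease() helper is a hypothetical stand-in for the fine-tuned VisualBERT pipeline, whose exact interface is not described in the abstract.

```python
import streamlit as st
from PIL import Image

def predict_disease(image, text):
    # Hypothetical placeholder: in the real app this would run the fine-tuned
    # VisualBERT pipeline (image features + tokenized patient text) and return
    # its top prediction with a confidence score.
    return "No finding", 0.0

st.title("X-ray Diagnosis with VisualBERT")
uploaded = st.file_uploader("Upload a chest X-ray", type=["png", "jpg", "jpeg"])
patient_info = st.text_area("Patient information (symptoms, history, etc.)")

if uploaded is not None and st.button("Diagnose"):
    image = Image.open(uploaded).convert("RGB")
    st.image(image, caption="Uploaded X-ray")
    label, confidence = predict_disease(image, patient_info)
    st.success(f"Predicted condition: {label} (confidence {confidence:.0%})")
```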
Harnessing Vision-Language Models for Improved X-ray Interpretation and Diagnosis
Published: 02 December 2024 by MDPI in The 5th International Electronic Conference on Applied Sciences, session Computing and Artificial Intelligence
Keywords: Vision Language Model; Medical Imaging; Computer Vision; Natural Language Processing