Evaluating Agentic AI for Autonomous Hypothesis Generation in Clinical Datasets
* 1 , 2
1  School of Medicine, The University of Jordan, Queen Rania Street, Amman 11942, Jordan
2  Jordan Center for Disease Control (JCDC), Amman 11183, Jordan
Academic Editor: Guo-Min Li

Abstract:

Background:

Generative agentic AI systems are increasingly used to accelerate clinical research workflows, but their reliability for hypothesis discovery in patient datasets has not been adequately evaluated. In this study, we evaluated whether an autonomous AI research agent could independently examine a de-identified clinical dataset, identify a significant association, select an appropriate statistical test, and produce a scientifically sound hypothesis.

Methods:

We evaluated ChatGPT in agent mode using structured prompt engineering and guardrails designed to reduce hallucination and to improve the agent's ability to detect patterns across patient records, identify similarities between cases, and uncover potential associations. As a benchmark task, we used the publicly available Brain-Mets-Lung-MRI-Path-Segs dataset from The Cancer Imaging Archive, which contains clinical notes and pathology-related information. We assessed whether the agent could independently identify a clinically relevant association and generate a statistically significant hypothesis.

Results:

After guardrail-based prompt refinement, the agent produced more grounded and auditable analyses. As an example of successful autonomous hypothesis generation, the workflow identified an association between the dataset field labeled PD-L1 status and dominant lesion size, which was statistically significant in the initial exploratory analysis. We manually reviewed the agent-generated finding and confirmed its statistical significance. This hypothesis was not prespecified by the investigators; it emerged from the AI system’s autonomous detection of patterns within the dataset. To the best of our knowledge, this association has not been previously reported in the literature. We therefore present it as an AI-derived, hypothesis-generating result that illustrates the potential of agentic AI to support pattern discovery and early-stage clinical hypothesis formation.
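The abstract does not state which statistical test the agent selected. As a minimal sketch of how an association between a binary marker status and a continuous lesion size could be checked independently, the following uses a two-sided permutation test on the difference in group means. The function name, the group labels, and all lesion-size values are hypothetical and are not taken from the dataset.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Illustrative only: this is not necessarily the test used in the study.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical dominant lesion sizes in mm (made-up numbers, NOT study data).
pdl1_positive = [31, 27, 35, 40, 29, 33, 38]
pdl1_negative = [18, 22, 25, 20, 27, 19, 24]

p_value = permutation_test(pdl1_positive, pdl1_negative)
print(f"estimated two-sided p-value: {p_value:.4f}")
```

A permutation test is used here only because it makes no distributional assumptions and needs nothing beyond the standard library; a rank-based or parametric test could be equally reasonable depending on the data.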

Conclusion:

Prompt-constrained agentic AI can support autonomous pattern detection, hypothesis generation, and statistical testing in de-identified patient datasets.

Keywords: agentic AI; clinical research; hypothesis generation; pattern detection; statistical testing