Evaluating Agentic AI for Autonomous Hypothesis Generation in Clinical Datasets

Abdullah Hamdan; Ola Hamdan

Abdullah Muhammad Saied Assad Hamdan *¹, Ola Hamdan

¹ School of Medicine, The University of Jordan, Queen Rania Street, Amman 11942, Jordan
² Jordan Center for Disease Control (JCDC), Amman 11183, Jordan

Academic Editor: Guo-Min Li

Published: 05 June 2026 by MDPI in The 5th International Electronic Conference on Cancers session Novel Methods and Technologies for Research and Treatment

Abstract:

Background:

Generative agentic AI systems are increasingly used to accelerate clinical research workflows, but their reliability for hypothesis discovery in patient datasets remains insufficiently evaluated. We evaluated whether an autonomous AI research agent could independently examine a de-identified clinical dataset, identify a significant association, select an appropriate statistical test, and produce a scientifically sound hypothesis.

Methods:

We evaluated ChatGPT in agent mode using structured prompt engineering and guardrails designed to reduce hallucination and to improve its ability to detect patterns across patient records, identify similarities between cases, and uncover potential associations. As a benchmark task, we used the publicly available Brain Mets Lung MRI Path Segs dataset from The Cancer Imaging Archive, which contains clinical notes and pathology-related information. We assessed whether the agent could independently identify a clinically relevant association and generate a hypothesis supported by a statistically significant result.
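The abstract does not describe the guardrails in detail. As a purely illustrative sketch of one kind of check that could make agent output auditable, the Python below validates a structured finding against the dataset's actual columns and an allow-list of tests. The `Finding` record, the `audit_finding` function, and all column and test names are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical structured record an agent could be asked to return."""
    group_column: str   # categorical field used to split patients
    value_column: str   # numeric field compared between groups
    test_name: str      # statistical test the agent says it used
    p_value: float      # p-value the agent reports


def audit_finding(finding, dataset_columns, allowed_tests):
    """Return a list of guardrail violations; an empty list means no violation."""
    problems = []
    # Reject references to columns that do not exist in the dataset,
    # a simple guard against hallucinated variables.
    for col in (finding.group_column, finding.value_column):
        if col not in dataset_columns:
            problems.append(f"unknown column: {col}")
    # Only accept tests from a list the investigators approved in advance.
    if finding.test_name not in allowed_tests:
        problems.append(f"test not on allow-list: {finding.test_name}")
    # A p-value must lie in [0, 1].
    if not 0.0 <= finding.p_value <= 1.0:
        problems.append(f"p-value out of range: {finding.p_value}")
    return problems


# Hypothetical column and test names, for illustration only.
columns = {"pd_l1_status", "dominant_lesion_size"}
tests = {"mann_whitney_u", "permutation"}
reported = Finding("pd_l1_status", "dominant_lesion_size", "permutation", 0.03)
print(audit_finding(reported, columns, tests))
```

A check of this kind only confirms that a finding is well-formed and refers to real fields; it does not confirm that the reported statistic is correct, which still requires recomputation by a reviewer.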

Results:

After guardrail-based prompt refinement, the agent produced more grounded and auditable analyses. As an example of successful autonomous hypothesis generation, the workflow identified an association between the dataset field labeled PD-L1 status and dominant lesion size, which was statistically significant in the initial exploratory analysis. We manually reviewed the agent-generated finding and confirmed its statistical significance. The hypothesis was not prespecified by the investigators; it emerged from the AI system's autonomous detection of patterns within the dataset. To the best of our knowledge, this association has not been previously published. We therefore present it as an AI-derived, hypothesis-generating result that illustrates the potential of agentic AI to support pattern discovery and early-stage clinical hypothesis formation.
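The abstract does not name the statistical test used. As one possible way a reviewer could independently re-check an association between a two-level categorical field and a numeric measurement, the sketch below runs a two-sided permutation test on the difference in group means using only the Python standard library. The function name `permutation_test` and the lesion-size values are hypothetical; the numbers are synthetic and are not drawn from the dataset.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the check is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Synthetic lesion sizes (arbitrary units) for two hypothetical groups.
group_one = [22, 25, 27, 30, 31, 35]
group_two = [12, 14, 15, 18, 19, 21]
p = permutation_test(group_one, group_two, n_permutations=5000)
print(f"permutation p-value: {p:.4f}")
```

A permutation test is used here because it makes no distributional assumption about lesion size. Because the finding came from exploratory analysis, a single significant p-value would still need correction for multiple comparisons and confirmation in independent data.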

Conclusion:

Prompt-constrained agentic AI can support autonomous pattern detection, hypothesis generation, and statistical testing in de-identified patient datasets.

Keywords: agentic AI; clinical research; hypothesis generation; pattern detection; statistical testing
