AI systems and their responses to patient inquiries regarding common cancer symptoms

Reuben Thaker; Asutosh Gor

¹ Carolina Blood and Cancer Care Associates, 583 Healthcare Drive, Rock Hill, SC 29732, USA

Academic Editor: Guo-Min Li

Published: 05 June 2026 by MDPI in The 5th International Electronic Conference on Cancers session Novel Methods and Technologies for Research and Treatment

Abstract
Patients and physicians are increasingly using artificial intelligence (AI) (1-2). We assessed AI responses to patient inquiries regarding possible cancer symptoms. Physicians ranked (1-4) and graded (pass/fail) the responses to four hypothetical questions to assess whether AI systems differ in quality, and we examined whether physician specialty affects these assessments. AI response quality differed across systems, regardless of physician specialty (3-5). Our study indicates that AI refinement is needed, though some responses were deemed acceptable as initial replies to patient inquiries.

Methods
We evaluated four AI models [A1/Gemini 2.0, A2/ChatGPT turbo, A3/Claude 3.7, A4/ChatGPT3.5] responding to inquiries about possible cancer symptoms (6-9). We examined whether the systems differed in guiding patients to seek care, and whether physician specialties (2 oncologists and 2 internists) differed in their assessments. Physicians ranked (1-4) and graded (pass/fail) the responses to hypothetical patient inquiries:

"What should I do?"

Q1: "There was blood on the toilet paper in the bathroom."

Q2: "I have a lump in my breast."

Q3: "A mole changed color and bleeds sometimes."

Q4: "I chew tobacco and a sore in my mouth won't heal."

Results
Friedman test: χ² = 11.77 (p = 0.0008). AI rankings differed overall, but no pairwise comparison reached significance (Wilcoxon signed-rank test).
Chi-square test: χ² = 18.04 (p = 0.00043). AI grades differed overall; in pairwise comparisons (Fisher's exact test), A1 (p = 0.0049) and A2 (p = 0.036) each differed from A4.
Specialties did not differ in their assessments (Mann–Whitney U test for ranks; Fisher's exact test for grades). Physician comments indicated that "safety" and referral to medical professionals were important for grading.
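
For readers unfamiliar with the two omnibus tests named above, the following is a minimal sketch of how each statistic is computed. It is not the authors' analysis code: the function names and all numbers are hypothetical, chosen only for illustration, and the Friedman formula shown assumes no tied ranks. Each statistic would be compared against a chi-square distribution to obtain a p-value, which is omitted here because the Python standard library does not provide that distribution.

```python
from typing import Sequence


def friedman_statistic(ranks: Sequence[Sequence[float]]) -> float:
    """Friedman chi-square for n raters (rows) ranking k systems (columns).

    Assumes each row is a complete ranking 1..k with no ties.
    """
    n = len(ranks)
    k = len(ranks[0])
    # Sum of the ranks each system received across all raters.
    rank_sums = [sum(row[j] for row in ranks) for j in range(k)]
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def chi_square_statistic(table: Sequence[Sequence[int]]) -> float:
    """Pearson chi-square for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of system and grade.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# HYPOTHETICAL data, not taken from the study.
# Each row: one rater's ranks (1 = best) for systems A1..A4.
hypothetical_ranks = [
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [2, 1, 3, 4],
    [1, 2, 4, 3],
]
# Each row: [pass count, fail count] for systems A1..A4.
hypothetical_grades = [
    [15, 1],
    [14, 2],
    [13, 3],
    [6, 10],
]

print(f"Friedman chi-square:  {friedman_statistic(hypothetical_ranks):.2f}")
print(f"Pearson chi-square:   {chi_square_statistic(hypothetical_grades):.2f}")
```

With four systems, both statistics have 3 degrees of freedom (k - 1 for the Friedman test; (4 - 1) x (2 - 1) for a 4 x 2 pass/fail table). In practice, a statistics library such as SciPy (`scipy.stats.friedmanchisquare`, `scipy.stats.chi2_contingency`) would typically be used, since it also handles ties and returns p-values.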

Discussion
Physicians may not advocate AI usage, but patients do use it. We found variable quality in AI responses, which may lack context sensitivity or produce "hallucinations" that generate misinformation (1-2). Question 4, which included a risk factor, scored lowest, suggesting that context and complexity affect AI performance. Future research may assess real-world patient interactions (3). Our results suggest that Gemini and Claude do not differ significantly from ChatGPT4, which prior studies suggest was superior (10-11). AI programs were all upgraded as of mid-2025.

Keywords: artificial intelligence, cancer disparities, health disparities, language learning models, cancer informatics
