Introduction: Real-time vision–language models (VLMs) can exhibit “cognitive-like” failure patterns, including temporally unstable judgments, persistence of incorrect hypotheses, and overconfident confabulations under uncertainty. Conventional single-image benchmarks do not isolate these time-dependent behaviors. We present CMC (Confabulation Mitigation & Calibration), a lightweight reliability wrapper, and evaluate it within a shared cognitive testing platform that probes interpretable perceptual and metacognitive functions in both humans and AI.
Methods: The platform targets three functions relevant to hallucination-like errors: visual perception and orientation discrimination, temporal stability of belief across successive observations, and confidence calibration under time constraints. We used a Tumbling-E orientation task with randomized staircase difficulty to stress perceptual decision-making while recording response time, timeouts/abstentions, and step-to-step consistency. CMC combines selective re-analysis triggered by change detection, a risk score that integrates temporal-instability signals with uncertainty features, and a confirmation stage that routes high-risk outputs to one of three calibrated actions: verify, hedge, or abstain. Human trials used a 3-second response budget; AI trials used a longer budget to distinguish perceptual failure from system latency.
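To make the pipeline concrete, the sketch below shows one way this kind of per-step risk scoring and routing could be wired together. It is a minimal illustration under stated assumptions, not CMC's actual implementation: the names (StepObservation, risk_score, route), the 0.5/0.5 weighting, the 0.3 change-detection trigger, and the verify/hedge thresholds are all placeholders introduced for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepObservation:
    """One step of a Tumbling-E trial (field names are illustrative assumptions)."""
    label: str                      # current orientation guess, e.g. "up"/"down"/"left"/"right"
    confidence: float               # model-reported confidence in [0, 1]
    frame_change: float             # change-detection score vs. the previous frame, in [0, 1]
    prev_label: Optional[str] = None


def risk_score(obs: StepObservation, flip_rate: float) -> float:
    """Combine temporal-instability and uncertainty signals into a single risk value.

    flip_rate is the fraction of recent steps whose label changed (temporal instability).
    The equal weighting below is a placeholder assumption, not a fitted parameter.
    """
    flipped = obs.prev_label is not None and obs.label != obs.prev_label
    instability = min(flip_rate + (0.5 if flipped else 0.0), 1.0)
    uncertainty = 1.0 - obs.confidence
    return 0.5 * instability + 0.5 * uncertainty


def needs_reanalysis(obs: StepObservation, change_thresh: float = 0.3) -> bool:
    """Selective re-analysis trigger: only re-run perception when change detection fires."""
    return obs.frame_change > change_thresh


def route(obs: StepObservation, flip_rate: float, time_left_s: float,
          verify_thresh: float = 0.7, hedge_thresh: float = 0.4) -> str:
    """Confirmation stage: map risk and remaining time budget to an action."""
    r = risk_score(obs, flip_rate)
    if r >= hedge_thresh and time_left_s < 0.5:
        return "abstain"   # no budget left to verify; decline rather than confabulate
    if r >= verify_thresh:
        return "verify"    # escalate: re-check the evidence before committing
    if r >= hedge_thresh:
        return "hedge"     # answer, but with an explicit low-confidence qualifier
    return "answer"        # low risk: commit to the current label


if __name__ == "__main__":
    obs = StepObservation(label="left", confidence=0.40, frame_change=0.1, prev_label="up")
    print(route(obs, flip_rate=0.4, time_left_s=1.2))   # prints "verify"
```

The design point the sketch illustrates is that the abstain branch consults the remaining response budget, so the wrapper declines rather than emitting an overconfident answer when a timeout is likely.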
Results: In 40 Tumbling-E trials, baseline accuracy was 50% (20/40) without CMC and improved to 75% (30/40) with CMC v1. CMC further reduced temporally unstable behaviors by escalating high-risk steps to verification and suppressing overconfident outputs when evidence was weak or timeouts were likely.
Conclusions: When hallucination-like failures are evaluated as a cognitive testing problem rather than a static captioning task, CMC improves both accuracy and cognitively faithful reliability, supporting rigorous, neuroscience-aligned research and safer deployment of real-time VLM systems.
