Comparing GPT and Human Affective Evaluations of Social Images

Published: 27 March 2026 by MDPI in The 1st International Online Conference on Behavioral Sciences, session Cognition

Abstract: Understanding how multimodal large language models interpret emotion is essential for evaluating their psychological plausibility. This study investigated how GPT encodes affect in complex social contexts by comparing its emotion ratings with human normative data. Using a database of 274 images depicting diverse interpersonal interactions, we examined model–human correspondence across both dimensional (valence, arousal) and categorical (angry, disgusted, fearful, happy, neutral, sad) measures. GPT-4o was instructed to generate continuous ratings on the same scales used in human assessments. At the dimensional level, GPT systematically assigned higher valence and arousal scores than humans, producing an affective landscape that appeared more positive and more activated overall. The model also exhibited greater variability within semantic categories, suggesting a looser or less constrained mapping of image features to affective dimensions. At the categorical level, confusion-matrix analyses revealed close alignment with human labels for highly prototypical expressions, particularly happiness, anger, and sadness. In contrast, the model frequently misclassified images associated with disgust, fear, and neutrality, indicating difficulty distinguishing among emotions that share overlapping contextual or semantic cues. Together, these findings suggest that GPT relies on a coarse, semantically organized evaluative axis when interpreting emotions in social scenes—an axis that only partially captures the structure of human affective representations.

Keywords: GPT; valence; arousal; social images
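The categorical comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' analysis code; the toy labels, category list, and helper function below are illustrative assumptions. It shows how human and model labels over the six categories would be tallied into a confusion matrix (rows = human label, columns = model label) and how per-image agreement would be summarized:

```python
from collections import Counter

# The six categorical labels used in the study.
CATEGORIES = ["angry", "disgusted", "fearful", "happy", "neutral", "sad"]

def confusion_matrix(human, model, categories=CATEGORIES):
    """Count co-occurrences of (human label, model label) pairs.

    Returns a nested list: rows indexed by human label, columns by model label.
    """
    counts = Counter(zip(human, model))
    return [[counts[(h, m)] for m in categories] for h in categories]

# Toy labels (invented for illustration, not the study's data): the model
# agrees on prototypical items but confuses "disgusted" with "angry" and
# "fearful" with "neutral", mirroring the error pattern the abstract reports.
human = ["happy", "happy", "angry", "disgusted", "fearful", "neutral"]
model = ["happy", "happy", "angry", "angry", "neutral", "neutral"]

cm = confusion_matrix(human, model)
# Overall agreement = sum of the diagonal / number of images.
accuracy = sum(cm[i][i] for i in range(len(CATEGORIES))) / len(human)
```

Off-diagonal cells localize the disagreements: here `cm[1][0]` (human "disgusted", model "angry") and `cm[2][4]` (human "fearful", model "neutral") each hold one miscount, while the diagonal captures the agreed-upon prototypical items.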
