
Abstract

Background - Due to their human-like nature, large language models (LLMs) often output expressions of uncertainty such as "I'm sure that [...]" or "It could be [...]". However, few studies have explored how these expressions impact human users.

Our Work - To address this gap, we conducted a between-condition study (N = 156). Using the popular word-guessing game Codenames, we simulated how LLMs assist humans in decision-making and examined the effects of different levels of verbalized uncertainty on user trust, satisfaction, and performance.

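As a purely illustrative sketch (not the prompts or code used in the study), the snippet below shows one way a Codenames-style hint could be wrapped in different levels of verbalized uncertainty. The condition labels, phrasings, and the verbalize_hint function are all hypothetical assumptions for illustration.

```python
# Hypothetical sketch: wrapping an AI-generated Codenames hint in a verbal
# uncertainty cue. The phrasings and condition names are illustrative
# assumptions, not the materials used in the study.

UNCERTAINTY_PREFIXES = {
    "low": "I'm sure that",        # confident phrasing (low uncertainty)
    "medium": "It's likely that",  # hedged phrasing (medium uncertainty)
    "high": "It could be that",    # tentative phrasing (high uncertainty)
}

def verbalize_hint(clue: str, targets: list[str], level: str) -> str:
    """Return a hint sentence prefixed with the chosen uncertainty cue."""
    prefix = UNCERTAINTY_PREFIXES[level]
    words = ", ".join(targets)
    return f"{prefix} the clue '{clue}' points to: {words}."

if __name__ == "__main__":
    # Print the same hint under each uncertainty condition.
    for level in UNCERTAINTY_PREFIXES:
        print(level, "->", verbalize_hint("ocean", ["wave", "ship"], level))
```
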
Screenshot of Codenames

Findings - Our results showed that, regardless of accuracy, medium uncertainty (expressed in plain text) consistently outperformed both high and low uncertainty across all metrics. Additionally, our qualitative findings revealed that users notice differences in uncertainty across varying levels of accuracy, nuances that quantitative metrics do not fully capture. This study offers important implications for the future design of LLMs, recommending balanced expressions of uncertainty and emphasizing accuracy first, especially for difficult tasks or when AI capability is low.

Authors: Zhengtao Xu, Tianqi Song, Yi-Chieh Lee


My Contribution: