Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy (short paper)
2510.04950v1
cs.CL, cs.AI, cs.LG, cs.NE, stat.ME
2025-10-08
Авторы:
Om Dobariya, Akhil Kumar
Abstract
The wording of natural language prompts has been shown to influence the
performance of large language models (LLMs), yet the role of politeness and
tone remains underexplored. In this study, we investigate how varying levels of
prompt politeness affect model accuracy on multiple-choice questions. We
created a dataset of 50 base questions spanning mathematics, science, and
history, each rewritten into five tone variants: Very Polite, Polite, Neutral,
Rude, and Very Rude, yielding 250 unique prompts. Using ChatGPT 4o, we
evaluated responses across these conditions and applied paired sample t-tests
to assess statistical significance. Contrary to expectations, impolite prompts
consistently outperformed polite ones, with accuracy ranging from 80.8% for
Very Polite prompts to 84.8% for Very Rude prompts. These findings differ from
earlier studies that associated rudeness with poorer outcomes, suggesting that
newer LLMs may respond differently to tonal variation. Our results highlight
the importance of studying pragmatic aspects of prompting and raise broader
questions about the social dimensions of human-AI interaction.