What Questions Should Robots Be Able to Answer? A Dataset of User Questions for Explainable Robotics
2510.16435v1
cs.RO, cs.CL, cs.HC, I.2.9; H.5.2; H.5.0; I.2.8; I.2.7; J.4
2025-10-22
Авторы:
Lennart Wachowiak, Andrew Coles, Gerard Canal, Oya Celiktutan
Abstract
With the growing use of large language models and conversational interfaces
in human-robot interaction, robots' ability to answer user questions is more
important than ever. We therefore introduce a dataset of 1,893 user questions
for household robots, collected from 100 participants and organized into 12
categories and 70 subcategories. Most work in explainable robotics focuses on
why-questions. In contrast, our dataset provides a wide variety of questions,
from questions about simple execution details to questions about how the robot
would act in hypothetical scenarios -- thus giving roboticists valuable
insights into what questions their robot needs to be able to answer. To collect
the dataset, we created 15 video stimuli and 7 text stimuli, depicting robots
performing varied household tasks. We then asked participants on Prolific what
questions they would want to ask the robot in each portrayed situation. In the
final dataset, the most frequent categories are questions about task execution
details (22.5%), the robot's capabilities (12.7%), and performance assessments
(11.3%). Although questions about how robots would handle potentially difficult
scenarios and ensure correct behavior are less frequent, users rank them as the
most important for robots to be able to answer. Moreover, we find that users
who identify as novices in robotics ask different questions than more
experienced users. Novices are more likely to inquire about simple facts, such
as what the robot did or the current state of the environment. As robots enter
environments shared with humans and language becomes central to giving
instructions and interaction, this dataset provides a valuable foundation for
(i) identifying the information robots need to log and expose to conversational
interfaces, (ii) benchmarking question-answering modules, and (iii) designing
explanation strategies that align with user expectations.