Who is a Better Matchmaker? Human vs. Algorithmic Judge Assignment in a High-Stakes Startup Competition
2510.12692v1
cs.HC, cs.AI, cs.CL, cs.CY, cs.LG
2025-10-16
Авторы:
Sarina Xi, Orelia Pi, Miaomiao Zhang, Becca Xiong, Jacqueline Ng Lane, Nihar B. Shah
Abstract
There is growing interest in applying artificial intelligence (AI) to
automate and support complex decision-making tasks. However, it remains unclear
how algorithms compare to human judgment in contexts requiring semantic
understanding and domain expertise. We examine this in the context of the judge
assignment problem, matching submissions to suitably qualified judges.
Specifically, we tackled this problem at the Harvard President's Innovation
Challenge, the university's premier venture competition awarding over \$500,000
to student and alumni startups. This represents a real-world environment where
high-quality judge assignment is essential. We developed an AI-based
judge-assignment algorithm, Hybrid Lexical-Semantic Similarity Ensemble (HLSE),
and deployed it at the competition. We then evaluated its performance against
human expert assignments using blinded match-quality scores from judges on
$309$ judge-venture pairs. Using a Mann-Whitney U statistic based test, we
found no statistically significant difference in assignment quality between the
two approaches ($AUC=0.48, p=0.40$); on average, algorithmic matches are rated
$3.90$ and manual matches $3.94$ on a 5-point scale, where 5 indicates an
excellent match. Furthermore, manual assignments that previously required a
full week could be automated in several hours by the algorithm during
deployment. These results demonstrate that HLSE achieves human-expert-level
matching quality while offering greater scalability and efficiency,
underscoring the potential of AI-driven solutions to support and enhance human
decision-making for judge assignment in high-stakes settings.