Machine learning methods fail to provide cohesive atheoretical construction of personality traits from semantic embeddings
2510.09739v1
cs.LG, cs.AI, cs.CL, cs.HC
2025-10-15
Авторы:
Ayoub Bouguettaya, Elizabeth M. Stuart
Abstract
The lexical hypothesis posits that personality traits are encoded in language
and is foundational to models like the Big Five. We created a bottom-up
personality model from a classic adjective list using machine learning and
compared its descriptive utility against the Big Five by analyzing one million
Reddit comments. The Big Five, particularly Agreeableness, Conscientiousness,
and Neuroticism, provided a far more powerful and interpretable description of
these online communities. In contrast, our machine-learning clusters provided
no meaningful distinctions, failed to recover the Extraversion trait, and
lacked the psychometric coherence of the Big Five. These results affirm the
robustness of the Big Five and suggest personality's semantic structure is
context-dependent. Our findings show that while machine learning can help check
the ecological validity of established psychological theories, it may not be
able to replace them.