AWARE, Beyond Sentence Boundaries: A Contextual Transformer Framework for Identifying Cultural Capital in STEM Narratives
2510.04983v2
cs.CL, cs.AI, cs.CY, cs.LG
2025-10-08
Авторы:
Khalid Mehtab Khan, Anagha Kulkarni
Abstract
Identifying cultural capital (CC) themes in student reflections can offer
valuable insights that help foster equitable learning environments in
classrooms. However, themes such as aspirational goals or family support are
often woven into narratives, rather than appearing as direct keywords. This
makes them difficult to detect for standard NLP models that process sentences
in isolation. The core challenge stems from a lack of awareness, as standard
models are pre-trained on general corpora, leaving them blind to the
domain-specific language and narrative context inherent to the data. To address
this, we introduce AWARE, a framework that systematically attempts to improve a
transformer model's awareness for this nuanced task. AWARE has three core
components: 1) Domain Awareness, adapting the model's vocabulary to the
linguistic style of student reflections; 2) Context Awareness, generating
sentence embeddings that are aware of the full essay context; and 3) Class
Overlap Awareness, employing a multi-label strategy to recognize the
coexistence of themes in a single sentence. Our results show that by making the
model explicitly aware of the properties of the input, AWARE outperforms a
strong baseline by 2.1 percentage points in Macro-F1 and shows considerable
improvements across all themes. This work provides a robust and generalizable
methodology for any text classification task in which meaning depends on the
context of the narrative.