A Multi-level Analysis of Factors Associated with Student Performance: A Machine Learning Approach to the SAEB Microdata
2510.22266v1
cs.LG, cs.AI, cs.CY
2025-10-29
Authors:
Rodrigo Tertulino, Ricardo Almeida
Abstract
Identifying the factors that influence student performance in basic education
is a central challenge for formulating effective public policies in Brazil.
This study introduces a multi-level machine learning approach to classify the
proficiency of 9th-grade and high school students using microdata from the
System of Assessment of Basic Education (SAEB). Our model uniquely integrates
four data sources: student socioeconomic characteristics, teacher professional
profiles, school indicators, and director management profiles. A comparative
analysis of four ensemble algorithms confirmed the superiority of a Random
Forest model, which achieved 90.2% accuracy and an Area Under the Curve (AUC)
of 96.7%. To move beyond prediction, we applied Explainable AI (XAI) using
SHAP, which revealed that the school's average socioeconomic level is the
strongest predictor, demonstrating that systemic factors have a greater impact
than individual characteristics in isolation. The primary conclusion is that
academic performance is a systemic phenomenon deeply tied to the school's
ecosystem. This study provides a data-driven, interpretable tool to inform
policies aimed at promoting educational equity by addressing disparities
between schools.
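
The abstract's central finding concerns a school-level feature (the school's average socioeconomic level) that sits alongside student-level records. As a minimal sketch of how such a multi-level feature might be built, the snippet below averages a student socioeconomic score per school and attaches it to each student. The field names (`student_id`, `school_id`, `ses`, `school_mean_ses`), the helper `add_school_mean_ses`, and the sample values are hypothetical and chosen for illustration; they are not SAEB variable names or data, and this is not the authors' pipeline.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical student records; field names and values are illustrative only.
students = [
    {"student_id": 1, "school_id": "A", "ses": 4.0},
    {"student_id": 2, "school_id": "A", "ses": 6.0},
    {"student_id": 3, "school_id": "B", "ses": 2.0},
    {"student_id": 4, "school_id": "B", "ses": 3.0},
    {"student_id": 5, "school_id": "B", "ses": 4.0},
]


def add_school_mean_ses(records):
    """Return copies of the records with the school-level mean SES attached."""
    # Group student-level SES scores by school.
    by_school = defaultdict(list)
    for record in records:
        by_school[record["school_id"]].append(record["ses"])

    # Aggregate to one value per school.
    school_mean = {school: mean(scores) for school, scores in by_school.items()}

    # Attach the school-level value to each student without mutating the input.
    return [
        {**record, "school_mean_ses": school_mean[record["school_id"]]}
        for record in records
    ]


enriched = add_school_mean_ses(students)
```

Each enriched record then carries both an individual feature (`ses`) and a contextual one (`school_mean_ses`), which is the kind of combined input a classifier such as a Random Forest could be trained on.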