Automatic essay scoring: leveraging Jaccard coefficient and Cosine similaritywith n-gram variation in vector space model approach
2510.15311v1
cs.CL, cs.CY, cs.SE
2025-10-21
Авторы:
Andharini Dwi Cahyani, Moh. Wildan Fathoni, Fika Hastarita Rachman, Ari Basuki, Salman Amin, Bain Khusnul Khotimah
Abstract
Automated essay scoring (AES) is a vital area of research aiming to provide
efficient and accurate assessment tools for evaluating written content. This
study investigates the effectiveness of two popular similarity metrics, Jaccard
coefficient, and Cosine similarity, within the context of vector space
models(VSM)employing unigram, bigram, and trigram representations. The data
used in this research was obtained from the formative essay of the citizenship
education subject in a junior high school. Each essay undergoes preprocessing
to extract features using n-gram models, followed by vectorization to transform
text data into numerical representations. Then, similarity scores are computed
between essays using both Jaccard coefficient and Cosine similarity. The
performance of the system is evaluated by analyzing the root mean square error
(RMSE), which measures the difference between the scores given by human graders
and those generated by the system. The result shows that the Cosine similarity
outperformed the Jaccard coefficient. In terms of n-gram, unigrams have lower
RMSE compared to bigrams and trigrams.
Ссылки и действия
Дополнительные ресурсы: