Does Model Size Matter? A Comparison of Small and Large Language Models for Requirements Classification
2510.21443v1
cs.SE, cs.AI, cs.CL
2025-10-28
Авторы:
Mohammad Amin Zadenoori, Vincenzo De Martino, Jacek Dabrowski, Xavier Franch, Alessio Ferrari
Abstract
[Context and motivation] Large language models (LLMs) show notable results in
natural language processing (NLP) tasks for requirements engineering (RE).
However, their use is compromised by high computational cost, data sharing
risks, and dependence on external services. In contrast, small language models
(SLMs) offer a lightweight, locally deployable alternative. [Question/problem]
It remains unclear how well SLMs perform compared to LLMs in RE tasks in terms
of accuracy. [Results] Our preliminary study compares eight models, including
three LLMs and five SLMs, on requirements classification tasks using the
PROMISE, PROMISE Reclass, and SecReq datasets. Our results show that although
LLMs achieve an average F1 score of 2% higher than SLMs, this difference is not
statistically significant. SLMs almost reach LLMs performance across all
datasets and even outperform them in recall on the PROMISE Reclass dataset,
despite being up to 300 times smaller. We also found that dataset
characteristics play a more significant role in performance than model size.
[Contribution] Our study contributes with evidence that SLMs are a valid
alternative to LLMs for requirements classification, offering advantages in
privacy, cost, and local deployability.
Ссылки и действия
Дополнительные ресурсы: