Trustworthiness Calibration Framework for Phishing Email Detection Using Large Language Models
2511.04728v1
cs.CR, cs.AI
2025-11-11
Authors:
Daniyal Ganiuly, Assel Smaiyl
Abstract
Phishing emails continue to pose a persistent challenge to online
communication, exploiting human trust and evading automated filters through
realistic language and adaptive tactics. While large language models (LLMs)
such as GPT-4 and LLaMA-3-8B achieve strong accuracy in text classification,
their deployment in security systems requires assessing reliability beyond
benchmark performance. To address this need, this study introduces the
Trustworthiness Calibration Framework (TCF), a reproducible methodology for
evaluating phishing detectors across three dimensions: calibration,
consistency, and robustness. These components are integrated into a bounded
index, the Trustworthiness Calibration Index (TCI), and complemented by the
Cross-Dataset Stability (CDS) metric, which quantifies the stability of
trustworthiness across datasets. Experiments conducted on five corpora
(SecureMail 2025, Phishing Validation 2024, CSDMC2010, Enron-Spam, and Nazario)
using DeBERTa-v3-base, LLaMA-3-8B, and GPT-4 demonstrate that GPT-4 achieves
the strongest overall trust profile, followed by LLaMA-3-8B and
DeBERTa-v3-base. Statistical analysis confirms that reliability varies
independently of raw accuracy, underscoring the importance of trust-aware
evaluation for real-world deployment. The proposed framework establishes a
transparent and reproducible foundation for assessing model dependability in
LLM-based phishing detection.
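
The abstract names calibration, consistency, and robustness as the three components of a bounded index but does not give the formulas. As a rough illustration only, the sketch below computes the standard Expected Calibration Error (ECE) for a detector's confidence scores and then folds three scores in [0, 1] into a single bounded value. The function `toy_trust_index`, its equal weights, and the example numbers are hypothetical and are not the paper's definition of TCI.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[int],
    n_bins: int = 10,
) -> float:
    """Standard ECE: weighted mean of |accuracy - mean confidence| per bin.

    confidences: predicted probability of the chosen label, each in [0, 1].
    correct:     1 if the prediction was right, 0 otherwise.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")

    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    count = [0] * n_bins

    for c, ok in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hit_sum[idx] += ok
        count[idx] += 1

    total = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue
        gap = abs(hit_sum[b] / count[b] - conf_sum[b] / count[b])
        ece += gap * count[b] / total
    return ece


def toy_trust_index(
    calibration: float,
    consistency: float,
    robustness: float,
    weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3),
) -> float:
    """Hypothetical bounded index: weighted mean of three scores in [0, 1].

    This is an assumption for illustration, NOT the paper's TCI formula.
    """
    scores = (calibration, consistency, robustness)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("each component score must lie in [0, 1]")
    if abs(sum(weights) - 1.0) > 1e-9 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * s for w, s in zip(weights, scores))


if __name__ == "__main__":
    # Made-up confidences and outcomes for four emails.
    ece = expected_calibration_error([0.95, 0.95, 0.55, 0.55], [1, 1, 1, 0])
    # Turn ECE into a "higher is better" calibration score.
    index = toy_trust_index(1.0 - ece, consistency=0.9, robustness=0.8)
    print(f"ECE = {ece:.3f}, toy index = {index:.3f}")
```

Because every component lies in [0, 1] and the weights sum to 1, the combined value also stays in [0, 1], which is one simple way to obtain a bounded index. The consistency and robustness inputs are passed in as placeholders here, since the abstract does not say how they are measured.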