FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain

2510.15232v1 cs.LG, cs.CL 2025-10-21

Авторы:

Tiansheng Hu, Tongyan Hu, Liuyang Bai, Yilun Zhao, Arman Cohan, Chen Zhao

Abstract

Recent LLMs have demonstrated promising ability in solving finance related problems. However, applying LLMs in real-world finance application remains challenging due to its high risk and high stakes property. This paper introduces FinTrust, a comprehensive benchmark specifically designed for evaluating the trustworthiness of LLMs in finance applications. Our benchmark focuses on a wide range of alignment issues based on practical context and features fine-grained tasks for each dimension of trustworthiness evaluation. We assess eleven LLMs on FinTrust and find that proprietary models like o4-mini outperforms in most tasks such as safety while open-source models like DeepSeek-V3 have advantage in specific areas like industry-level fairness. For challenging task like fiduciary alignment and disclosure, all LLMs fall short, showing a significant gap in legal awareness. We believe that FinTrust can be a valuable benchmark for LLMs' trustworthiness evaluation in finance domain.

Ссылки и действия

Читать на arXiv Скачать PDF

Дополнительные ресурсы:

FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain

Авторы:

Abstract

Ссылки и действия

Связанные статьи

Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space

Towards Active Synthetic Data Generation for Finetuning Language Models

AlignSAE: Concept-Aligned Sparse Autoencoders

Measuring What LLMs Think They Do: SHAP Faithfulness and Deployability on Financ...

BanglaSentNet: An Explainable Hybrid Deep Learning Framework for Multi-Aspect Se...

Навигация