Layout-Aware Parsing Meets Efficient LLMs: A Unified, Scalable Framework for Resume Information Extraction and Evaluation
2510.09722v1
cs.CL, cs.AI, cs.CV
2025-10-15
Authors:
Fanwei Zhu, Jinke Yu, Zulong Chen, Ying Zhou, Junhao Ji, Zhibo Yang, Yuxue Zhang, Haoyuan Hu, Zhenghao Liu
Abstract
Automated resume information extraction is critical for scaling talent
acquisition, yet its real-world deployment faces three major challenges: the
extreme heterogeneity of resume layouts and content, the high cost and latency
of large language models (LLMs), and the lack of standardized datasets and
evaluation tools. In this work, we present a layout-aware and
efficiency-optimized framework for automated extraction and evaluation that
addresses all three challenges. Our system combines a fine-tuned layout parser
to normalize diverse document formats, an inference-efficient LLM extractor
based on parallel prompting and instruction tuning, and a robust two-stage
automated evaluation framework supported by new benchmark datasets. Extensive
experiments show that our framework significantly outperforms strong baselines
in both accuracy and efficiency. In particular, we demonstrate that a
fine-tuned compact 0.6B LLM achieves top-tier accuracy while significantly
reducing inference latency and computational cost. The system is fully deployed
in Alibaba's intelligent HR platform, supporting real-time applications across
its business units.