How Well Do LLMs Imitate Human Writing Style?
2509.24930v1
cs.CL, cs.CY, I.2.7
2025-10-01
Авторы:
Rebira Jemama, Rajesh Kumar
Abstract
Large language models (LLMs) can generate fluent text, but their ability to
replicate the distinctive style of a specific human author remains unclear. We
present a fast, training-free framework for authorship verification and style
imitation analysis. The method integrates TF-IDF character n-grams with
transformer embeddings and classifies text pairs through empirical distance
distributions, eliminating the need for supervised training or threshold
tuning. It achieves 97.5\% accuracy on academic essays and 94.5\% in
cross-domain evaluation, while reducing training time by 91.8\% and memory
usage by 59\% relative to parameter-based baselines. Using this framework, we
evaluate five LLMs from three separate families (Llama, Qwen, Mixtral) across
four prompting strategies - zero-shot, one-shot, few-shot, and text completion.
Results show that the prompting strategy has a more substantial influence on
style fidelity than model size: few-shot prompting yields up to 23.5x higher
style-matching accuracy than zero-shot, and completion prompting reaches 99.9\%
agreement with the original author's style. Crucially, high-fidelity imitation
does not imply human-like unpredictability - human essays average a perplexity
of 29.5, whereas matched LLM outputs average only 15.2. These findings
demonstrate that stylistic fidelity and statistical detectability are
separable, establishing a reproducible basis for future work in authorship
modeling, detection, and identity-conditioned generation.
Ссылки и действия
Дополнительные ресурсы: