Diversity Augmentation of Dynamic User Preference Data for Boosting Personalized Text Summarizers
2510.10082v1
cs.CL, cs.LG
2025-10-15
Авторы:
Parthiv Chatterjee, Shivam Sonawane, Amey Hengle, Aditya Tanna, Sourish Dasgupta, Tanmoy Chakraborty
Abstract
Document summarization enables efficient extraction of user-relevant content
but is inherently shaped by individual subjectivity, making it challenging to
identify subjective salient information in multifaceted documents. This
complexity underscores the necessity for personalized summarization. However,
training models for personalized summarization has so far been challenging,
particularly because diverse training data containing both user preference
history (i.e., click-skip trajectory) and expected (gold-reference) summaries
are scarce. The MS/CAS PENS dataset is a valuable resource but includes only
preference history without target summaries, preventing end-to-end supervised
learning, and its limited topic-transition diversity further restricts
generalization. To address this, we propose $\mathrm{PerAugy}$, a novel
cross-trajectory shuffling and summary-content perturbation based data
augmentation technique that significantly boosts the accuracy of four
state-of-the-art baseline (SOTA) user-encoders commonly used in personalized
summarization frameworks (best result: $\text{0.132}$$\uparrow$ w.r.t AUC). We
select two such SOTA summarizer frameworks as baselines and observe that when
augmented with their corresponding improved user-encoders, they consistently
show an increase in personalization (avg. boost: $\text{61.2\%}\uparrow$ w.r.t.
PSE-SU4 metric). As a post-hoc analysis of the role of induced diversity in the
augmented dataset by \peraugy, we introduce three dataset diversity metrics --
$\mathrm{TP}$, $\mathrm{RTC}$, and \degreed\ to quantify the induced diversity.
We find that $\mathrm{TP}$ and $\mathrm{DegreeD}$ strongly correlate with
user-encoder performance on the PerAugy-generated dataset across all accuracy
metrics, indicating that increased dataset diversity is a key factor driving
performance gains.
Ссылки и действия
Дополнительные ресурсы: