Adaptive Sample Sharing for Linear Regression
2510.16986v1
stat.ML, cs.LG, stat.OT
2025-10-22
Авторы:
Hamza Cherkaoui, Hélène Halconruy, Yohan Petetin
Abstract
In many business settings, task-specific labeled data are scarce or costly to
obtain, which limits supervised learning on a specific task. To address this
challenge, we study sample sharing in the case of ridge regression: leveraging
an auxiliary data set while explicitly protecting against negative transfer. We
introduce a principled, data-driven rule that decides how many samples from an
auxiliary dataset to add to the target training set. The rule is based on an
estimate of the transfer gain i.e. the marginal reduction in the predictive
error. Building on this estimator, we derive finite-sample guaranties: under
standard conditions, the procedure borrows when it improves parameter
estimation and abstains otherwise. In the Gaussian feature setting, we analyze
which data set properties ensure that borrowing samples reduces the predictive
error. We validate the approach in synthetic and real datasets, observing
consistent gains over strong baselines and single-task training while avoiding
negative transfer.