Parametrising the Inhomogeneity Inducing Capacity of a Training Set, and its Impact on Supervised Learning
2510.18332v1
stat.ML, cs.LG, 62H20, 60G10, 68T05, 68T27, 60J20
2025-10-23
Authors:
Gargi Roy, Dalia Chakrabarty
Abstract
We introduce a parametrisation of the property of an available
training dataset that necessitates an inhomogeneous correlation
structure for the function learnt as a model of the
relationship between the pair of variables whose observations
comprise the training data. We refer to this parametrisation
as the ``inhomogeneity parameter'' of the training set.
This parameter is easy to compute for small-to-large
datasets, and we demonstrate such computation on multiple
publicly-available datasets, while also demonstrating that
conventional ``non-stationarity'' of data does not imply a non-zero
inhomogeneity parameter of the dataset. We prove that, within the
probabilistic Gaussian Process-based learning approach, a training
set with a non-zero inhomogeneity parameter makes it imperative
that the process invoked to model the sought function be
non-stationary. Following the learning of a real-world multivariate
function with such a process, the quality and reliability of predictions
at test inputs are demonstrated to be affected by the inhomogeneity
parameter of the training data.
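
For readers unfamiliar with the distinction the abstract relies on, the following is a minimal sketch of what separates a stationary covariance kernel from a non-stationary one. It does not compute the inhomogeneity parameter, whose definition is not given in the abstract. The squared-exponential and Gibbs kernels are standard textbook choices rather than the ones used by the authors, and the function `length_scale_fn` is a hypothetical, purely illustrative input-dependent length scale.

```python
import numpy as np


def squared_exponential(x1, x2, length_scale=1.0):
    """Stationary kernel: depends on the inputs only through x1 - x2."""
    return np.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def length_scale_fn(x):
    """Hypothetical input-dependent length scale (illustrative assumption)."""
    return 0.5 + np.abs(x)


def gibbs_kernel(x1, x2):
    """Non-stationary 1-D Gibbs kernel built from length_scale_fn."""
    l1, l2 = length_scale_fn(x1), length_scale_fn(x2)
    s = l1 ** 2 + l2 ** 2
    return np.sqrt(2.0 * l1 * l2 / s) * np.exp(-((x1 - x2) ** 2) / s)


# Translate a pair of inputs by the same shift and compare covariances.
shift = 3.0
stationary_gap = abs(
    squared_exponential(0.0, 1.0) - squared_exponential(0.0 + shift, 1.0 + shift)
)
nonstationary_gap = abs(
    gibbs_kernel(0.0, 1.0) - gibbs_kernel(0.0 + shift, 1.0 + shift)
)

print("change under translation, stationary kernel:    ", stationary_gap)
print("change under translation, non-stationary kernel:", nonstationary_gap)
```

Under a common translation of both inputs, the stationary kernel returns the same covariance, while the Gibbs kernel does not, because its correlation structure varies across the input space. This is the sense in which a process with such a kernel is non-stationary.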