Evaluating classification performance across operating contexts: A comparison of decision curve analysis and cost curves
2509.24608v1
cs.LG, stat.ML
2025-10-01
Authors:
Louise AC Millard, Peter A Flach
Abstract
Classification models typically predict a score and use a decision threshold
to produce a classification. Appropriate model evaluation should carefully
consider the context in which a model will be used, including the relative
value of correct classifications of positive versus negative examples, which
affects the threshold that should be used. Decision curve analysis (DCA) and
cost curves are model evaluation approaches that assess the expected utility
and expected loss of prediction models, respectively, across decision
thresholds. We compared DCA and cost curves to determine how they are related,
and their strengths and limitations. We demonstrate that decision curves are
closely related to a specific type of cost curve called a Brier curve. Both
curves are derived by assuming that model scores are calibrated and by setting the
classification threshold using the relative value of correct positive and
negative classifications, and the x-axes of both curves are equivalent. Net
benefit (used for DCA) and Brier loss (used for Brier curves) will always
choose the same model as optimal at any given threshold. Across thresholds,
differences in Brier loss are comparable, whereas differences in net benefit
cannot be compared. Brier curves are more generally applicable (when a wider
range of thresholds is plausible), and the area under the Brier curve is the
Brier score. We demonstrate that reference lines common in each space can be
included in either and suggest the upper envelope decision curve as a useful
comparison for DCA, showing the possible gain in net benefit that could be
achieved through recalibration alone.
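The relationship between net benefit and Brier loss can be checked numerically. The sketch below is a minimal illustration under assumed conventions, not the authors' code. An example is classified positive when its score is at least the threshold t. Net benefit is the standard DCA quantity TP/n - (FP/n)·t/(1 - t). The Brier-curve loss at cost proportion c uses threshold c, with a false positive costing 2c and a false negative costing 2(1 - c). Under these conventions, substituting FN = n·π1 - TP gives loss(t) = 2(1 - t)(π1 - NB(t)), where π1 is the proportion of positives. So at a fixed threshold, higher net benefit always means lower loss, while the scaling factor 2(1 - t) changes from one threshold to the next. Averaging the loss over c in [0, 1] recovers the Brier score. The function names, the synthetic data and the second model `s_b` are hypothetical.

```python
import numpy as np


def _split_sorted(y, s):
    """Sorted scores of the negative (y == 0) and positive (y == 1) examples."""
    return np.sort(s[y == 0]), np.sort(s[y == 1])


def net_benefit(y, s, t):
    """DCA net benefit at threshold probabilities t (array, 0 < t < 1).

    An example is classified positive ("treat") when its score is >= t.
    """
    neg, pos = _split_sorted(y, s)
    tp = len(pos) - np.searchsorted(pos, t, side="left")  # positives with s >= t
    fp = len(neg) - np.searchsorted(neg, t, side="left")  # negatives with s >= t
    return (tp - fp * t / (1 - t)) / len(y)


def brier_curve(y, s, c):
    """Cost-weighted loss at cost proportions c, with the threshold set to c.

    Assumed convention: a false positive costs 2c and a false negative
    costs 2(1 - c), so the loss is 2 * (c * FP + (1 - c) * FN) / n.
    """
    neg, pos = _split_sorted(y, s)
    fp = len(neg) - np.searchsorted(neg, c, side="left")  # negatives with s >= c
    fn = np.searchsorted(pos, c, side="left")             # positives with s < c
    return 2 * (c * fp + (1 - c) * fn) / len(y)


# Synthetic, calibrated scores: y ~ Bernoulli(s).
rng = np.random.default_rng(0)
s_a = rng.uniform(size=500)
y = (rng.uniform(size=500) < s_a).astype(int)
s_b = s_a ** 2  # hypothetical second model: a miscalibrated transform of s_a

# 1) Area under the Brier curve vs. Brier score (midpoint rule over c in [0, 1]).
m = 100_000
c_grid = (np.arange(m) + 0.5) / m
area = brier_curve(y, s_a, c_grid).mean()
brier_score = np.mean((s_a - y) ** 2)

# 2) At each threshold t: loss = 2 (1 - t) (pi_1 - NB), so ranking models by NB
#    and by loss agree (higher NB <=> lower loss).
t = np.linspace(0.01, 0.99, 99)
pi_1 = y.mean()
nb_a, nb_b = net_benefit(y, s_a, t), net_benefit(y, s_b, t)
loss_a, loss_b = brier_curve(y, s_a, t), brier_curve(y, s_b, t)
identity_holds = np.allclose(loss_a, 2 * (1 - t) * (pi_1 - nb_a))
clear = np.abs(nb_a - nb_b) > 1e-9  # ignore exact ties between the two models
same_choice = np.all(np.sign(nb_a - nb_b)[clear] == -np.sign(loss_a - loss_b)[clear])

print(f"area under Brier curve = {area:.5f}, Brier score = {brier_score:.5f}")
print("identity holds:", identity_holds, "| same model chosen:", same_choice)
```

Running the script should report an area under the Brier curve that agrees with the Brier score to within about 1e-4 (the midpoint rule over 100,000 cost values is the only source of discrepancy) and `True` for both checks. Sorting the scores once per class and using `np.searchsorted` keeps the evaluation over a fine grid of thresholds vectorised.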