DL101 Neural Network Outputs and Loss Functions
2511.05131v1
cs.LG, cs.AI, cs.NE
2025-11-11
Authors:
Fernando Berzal
Abstract
The loss function used to train a neural network is strongly connected to its
output layer from a statistical point of view. This technical report analyzes
common activation functions for a neural network output layer, such as linear,
sigmoid, ReLU, and softmax, detailing their mathematical properties and their
appropriate use cases. There is a strong statistical justification for
selecting a suitable loss function when training a deep learning model.
This report connects common loss functions such as Mean Squared Error (MSE),
Mean Absolute Error (MAE), and various Cross-Entropy losses to the statistical
principle of Maximum Likelihood Estimation (MLE). Choosing a specific loss
function is equivalent to assuming a specific probability distribution for the
model output, highlighting the link between these functions and the Generalized
Linear Models (GLMs) that underlie network output layers. Additional scenarios
of practical interest are also considered, such as alternative output
encodings, constrained outputs, and distributions with heavy tails.