Comparing Contrastive and Triplet Loss in Audio-Visual Embedding: Intra-Class Variance and Greediness Analysis
2510.02161v1
cs.MM, cs.AI, cs.LG
2025-10-04
Authors:
Donghuo Zeng
Abstract
Contrastive loss and triplet loss are widely used objectives in deep metric
learning, yet their effects on representation quality remain insufficiently
understood. We present a theoretical and empirical comparison of these losses,
focusing on intra- and inter-class variance and optimization behavior (e.g.,
greedy updates). Through task-specific experiments with consistent settings on
synthetic data and real datasets (MNIST, CIFAR-10), we show that triplet loss
preserves greater variance within and across classes, supporting finer-grained
distinctions in the learned representations. In contrast, contrastive loss
tends to compact intra-class embeddings, which may obscure subtle semantic
differences. To better understand their optimization dynamics, we examine the
loss-decay rate, active ratio, and gradient norm, and find that contrastive loss
drives many small updates early on, while triplet loss produces fewer but
stronger updates that sustain learning on hard examples. Finally, across both
classification and retrieval tasks on MNIST, CIFAR-10, CUB-200, and CARS196
datasets, our results consistently show that triplet loss yields superior
performance, which suggests using triplet loss for detail retention and
hard-sample focus, and contrastive loss for smoother, broad-based embedding
refinement.
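
As a rough illustration of the two objectives compared in the abstract, the sketch below computes per-sample contrastive and triplet losses with NumPy on a tiny hand-made batch. It is not the author's implementation. The loss forms are the commonly used margin-based ones, the margin of 1.0 and the toy embeddings are arbitrary, and `active_ratio` is an assumed definition (the fraction of samples with nonzero loss), since the abstract does not define the term.

```python
import numpy as np

def contrastive_loss(x1, x2, same, margin=1.0):
    """Per-pair contrastive loss: pull similar pairs together,
    push dissimilar pairs apart until they are at least `margin` away."""
    d = np.linalg.norm(x1 - x2, axis=1)
    pos_term = same * d ** 2
    neg_term = (1 - same) * np.maximum(0.0, margin - d) ** 2
    return pos_term + neg_term

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Per-triplet hinge loss on the gap between anchor-positive
    and anchor-negative distances."""
    d_ap = np.linalg.norm(anchor - positive, axis=1)
    d_an = np.linalg.norm(anchor - negative, axis=1)
    return np.maximum(0.0, d_ap - d_an + margin)

def active_ratio(per_sample_loss, eps=1e-12):
    """Assumed definition: fraction of samples whose loss is nonzero,
    i.e. samples that still contribute a gradient."""
    return float(np.mean(per_sample_loss > eps))

# Toy 2-D embeddings (hypothetical values chosen for illustration only).
anchor = np.array([[0.0, 0.0], [0.0, 0.0]])
positive = np.array([[0.5, 0.0], [0.1, 0.0]])
negative = np.array([[1.0, 0.0], [3.0, 0.0]])

# Triplet loss: the second triplet already satisfies the margin, so it is inactive.
trip = triplet_loss(anchor, positive, negative)

# Contrastive loss on the same data, split into positive and negative pairs.
ones = np.ones(len(anchor))
zeros = np.zeros(len(anchor))
con_pos = contrastive_loss(anchor, positive, ones)
con_neg = contrastive_loss(anchor, negative, zeros)

trip_active = active_ratio(trip)
con_pos_active = active_ratio(con_pos)
con_neg_active = active_ratio(con_neg)
```

In this toy batch every positive pair stays active under contrastive loss, because any nonzero anchor-positive distance is penalized, whereas the triplet whose negative is already far enough contributes nothing. This matches the behavior the abstract describes: contrastive loss keeps compacting intra-class embeddings, while triplet loss concentrates updates on triplets that still violate the margin.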