Beat Detection as Object Detection
2510.14391v1
cs.SD, cs.AI, cs.LG
2025-10-18
Authors:
Jaehoon Ahn, Moon-Ryul Jung
Abstract
Recent beat and downbeat tracking models (e.g., RNNs, TCNs, Transformers)
output frame-level activations. We propose reframing this task as object
detection, where beats and downbeats are modeled as temporal "objects."
Adapting the FCOS detector from computer vision to 1D audio, we replace its
original backbone with WaveBeat's temporal feature extractor and add a Feature
Pyramid Network to capture multi-scale temporal patterns. The model predicts
overlapping beat/downbeat intervals with confidence scores, and non-maximum
suppression (NMS) then selects the final predictions. This NMS step plays a role
similar to that of the dynamic Bayesian networks (DBNs) used in traditional
trackers, but is simpler and less heuristic. Evaluated on standard music
datasets, our approach achieves
competitive results, showing that object detection techniques can effectively
model musical beats with minimal adaptation.
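The decode-then-suppress pipeline described in the abstract can be illustrated with a short sketch. This is not the authors' implementation, and it rests on several assumptions. First, each output location at time t predicts FCOS-style distances (l, r) to the start and end of a beat interval, together with a confidence score. Second, NMS is plain greedy suppression using temporal IoU. The names `decode_fcos_1d` and `nms_1d` and the default thresholds (0.05 for scores, 0.5 for IoU) are hypothetical. The abstract also does not say how a kept interval maps to a reported beat time, so the sketch simply returns the kept intervals.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Interval:
    start: float  # seconds (assumed unit)
    end: float
    score: float


def decode_fcos_1d(times: Sequence[float], left: Sequence[float],
                   right: Sequence[float], scores: Sequence[float],
                   score_thresh: float = 0.05) -> List[Interval]:
    """Hypothetical 1D FCOS decoding: location t with distances (l, r)
    becomes the interval [t - l, t + r]; low-confidence locations are dropped."""
    return [Interval(t - l, t + r, s)
            for t, l, r, s in zip(times, left, right, scores)
            if s >= score_thresh]


def temporal_iou(a: Interval, b: Interval) -> float:
    """Intersection-over-union of two 1D segments (0.0 if the union is empty)."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def nms_1d(intervals: List[Interval], iou_thresh: float = 0.5) -> List[Interval]:
    """Greedy NMS: visit candidates by descending score, keep a candidate only
    if its IoU with every already-kept interval is <= iou_thresh."""
    kept: List[Interval] = []
    for cand in sorted(intervals, key=lambda iv: iv.score, reverse=True):
        if all(temporal_iou(cand, k) <= iou_thresh for k in kept):
            kept.append(cand)
    # Return survivors in temporal order.
    return sorted(kept, key=lambda iv: iv.start)


if __name__ == "__main__":
    # Toy predictions: three locations vote for roughly the same beat near 1.0 s.
    times = [1.0, 1.25, 1.5, 1.75, 3.0, 4.0]
    left = [0.0, 0.25, 0.5, 0.25, 0.0, 0.0]
    right = [0.5, 0.5, 0.0, 0.25, 0.5, 0.5]
    scores = [0.9, 0.8, 0.7, 0.6, 0.95, 0.01]
    kept = nms_1d(decode_fcos_1d(times, left, right, scores))
    print([(iv.start, iv.end) for iv in kept])
    # → [(1.0, 1.5), (1.5, 2.0), (3.0, 3.5)]
```

In the toy run, the duplicate detections of the beat near 1.0 s collapse to the highest-scoring one. The adjacent interval [1.5, 2.0] survives because it only touches [1.0, 1.5] (IoU 0). Suppression here is a single overlap test with no tempo or transition model, which shows why the abstract calls this step simpler than DBN decoding.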