Breakdance Video classification in the age of Generative AI
2510.20287v1
cs.CV, cs.AI, cs.LG
2025-10-25
Authors:
Sauptik Dhar, Naveen Ramakrishnan, Michelle Munson
Abstract
Large vision-language models have recently been applied widely across sports
use cases. Most of this work targets a limited subset of popular sports, such as
soccer, cricket, and basketball, and focuses on generative tasks like visual
question answering and highlight generation. This work analyzes the
applicability of modern video foundation models (both encoder and decoder) to a
niche but hugely popular dance sport: breakdance. Our results show that video
encoder models continue to outperform state-of-the-art video language models on
prediction tasks. We provide insights on how to choose the encoder model and a
thorough analysis of the workings of a finetuned decoder model for breakdance
video classification.