Frugal Federated Learning for Violence Detection: A Comparison of LoRA-Tuned VLMs and Personalized CNNs
2510.17651v1
cs.CV, cs.AI, cs.LG
2025-10-22
Авторы:
Sébastien Thuau, Siba Haidar, Ayush Bajracharya, Rachid Chelouah
Abstract
We examine frugal federated learning approaches to violence detection by
comparing two complementary strategies: (i) zero-shot and federated fine-tuning
of vision-language models (VLMs), and (ii) personalized training of a compact
3D convolutional neural network (CNN3D). Using LLaVA-7B and a 65.8M parameter
CNN3D as representative cases, we evaluate accuracy, calibration, and energy
usage under realistic non-IID settings. Both approaches exceed 90% accuracy.
CNN3D slightly outperforms Low-Rank Adaptation(LoRA)-tuned VLMs in ROC AUC and
log loss, while using less energy. VLMs remain favorable for contextual
reasoning and multimodal inference. We quantify energy and CO$_2$ emissions
across training and inference, and analyze sustainability trade-offs for
deployment. To our knowledge, this is the first comparative study of LoRA-tuned
vision-language models and personalized CNNs for federated violence detection,
with an emphasis on energy efficiency and environmental metrics. These findings
support a hybrid model: lightweight CNNs for routine classification, with
selective VLM activation for complex or descriptive scenarios. The resulting
framework offers a reproducible baseline for responsible, resource-aware AI in
video surveillance, with extensions toward real-time, multimodal, and
lifecycle-aware systems.
Ссылки и действия
Дополнительные ресурсы: