From Videos to Indexed Knowledge Graphs -- Framework to Marry Methods for Multimodal Content Analysis and Understanding
2510.01513v1
cs.CV, cs.AI, cs.CL, cs.IR
2025-10-04
Авторы:
Basem Rizk, Joel Walsh, Mark Core, Benjamin Nye
Abstract
Analysis of multi-modal content can be tricky, computationally expensive, and
require a significant amount of engineering efforts. Lots of work with
pre-trained models on static data is out there, yet fusing these opensource
models and methods with complex data such as videos is relatively challenging.
In this paper, we present a framework that enables efficiently prototyping
pipelines for multi-modal content analysis. We craft a candidate recipe for a
pipeline, marrying a set of pre-trained models, to convert videos into a
temporal semi-structured data format. We translate this structure further to a
frame-level indexed knowledge graph representation that is query-able and
supports continual learning, enabling the dynamic incorporation of new
domain-specific knowledge through an interactive medium.