S2Doc -- Spatial-Semantic Document Format
2511.01113v1
cs.DL, cs.CL, H.3.7; I.7.5; I.7.2
2025-11-06
Authors:
Sebastian Kempf, Frank Puppe
Abstract
Documents are a common way to store and share information, and tables are an
important part of many documents. However, there is no common understanding of
how to model documents, and tables in particular. Because of this lack of
standardization, most scientific approaches define their own way of modeling
documents and tables, leading to a variety of data structures and formats that
are not directly compatible. Furthermore, most data
models focus on either the spatial or the semantic structure of a document,
neglecting the other aspect. To address this, we developed S2Doc, a flexible
data structure for modeling documents and tables that combines both spatial and
semantic information in a single format. It is designed to be easily extensible
to new tasks and supports most modeling approaches for documents and tables,
including multi-page documents. To the best of our knowledge, it is the first
approach of its kind to combine all these aspects in a single format.