HiTeC: Hierarchical Contrastive Learning on Text-Attributed Hypergraph with Semantic-Aware Augmentation

2508.03104v1 cs.LG, cs.AI 2025-08-09

Authors:

Mengting Pan, Fan Li, Xiaoyang Wang, Wenjie Zhang, Xuemin Lin

Summary

Recent advances in contrastive learning (CL) on graphs and hypergraphs enable effective training without labels. However, applying existing methods to text-attributed hypergraphs (TAHGs), whose nodes carry rich textual information, has significant limitations. Text encoders that ignore the hypergraph topology yield suboptimal representations. In addition, random data augmentations introduce noise, and contrast restricted to the node and hyperedge levels fails to capture long-range dependencies. HiTeC is therefore proposed: a two-stage framework with semantic-aware augmentation. The first stage pre-trains the text encoder with a structure-aware contrastive objective; the second introduces semantic-aware augmentation strategies for generating informative views. A new multi-scale contrastive loss captures long-range dependencies. Experiments confirm the effectiveness of HiTeC for CL on TAHGs.

Abstract

Contrastive learning (CL) has become a dominant paradigm for self-supervised hypergraph learning, enabling effective training without costly labels. However, node entities in real-world hypergraphs are often associated with rich textual information, which is overlooked in prior works. Directly applying existing CL-based methods to such text-attributed hypergraphs (TAHGs) leads to three key limitations: (1) The common use of graph-agnostic text encoders overlooks the correlations between textual content and hypergraph topology, resulting in suboptimal representations. (2) Their reliance on random data augmentations introduces noise and weakens the contrastive objective. (3) The primary focus on node- and hyperedge-level contrastive signals limits the ability to capture long-range dependencies, which is essential for expressive representation learning. Although HyperBERT pioneers CL on TAHGs, its co-training paradigm suffers from poor scalability. To fill the research gap, we introduce HiTeC, a two-stage hierarchical contrastive learning framework with semantic-aware augmentation for scalable and effective self-supervised learning on TAHGs. In the first stage, we pre-train the text encoder with a structure-aware contrastive objective to overcome the graph-agnostic nature of conventional methods. In the second stage, we introduce two semantic-aware augmentation strategies, including prompt-enhanced text augmentation and semantic-aware hyperedge drop, to facilitate informative view generation. Furthermore, we propose a multi-scale contrastive loss that extends existing objectives with an $s$-walk-based subgraph-level contrast to better capture long-range dependencies. By decoupling text encoder pretraining from hypergraph contrastive learning, this two-stage design enhances scalability without compromising representation quality. Extensive experiments confirm the effectiveness of HiTeC.
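The abstract mentions an $s$-walk-based subgraph-level contrast but does not define the $s$-walk. As a minimal sketch, assuming the usual definition from the hypergraph literature (a sequence of hyperedges in which consecutive hyperedges share at least $s$ nodes), the following Python collects the hyperedges reachable from a starting hyperedge within a bounded number of steps. The function names `s_adjacency` and `s_walk_neighborhood` and the toy hypergraph are hypothetical and are not taken from the paper; how HiTeC actually samples subgraphs may differ.

```python
from collections import deque
from itertools import combinations


def s_adjacency(hyperedges, s):
    """Map each hyperedge index to the hyperedges sharing at least s nodes with it."""
    adj = {i: set() for i in range(len(hyperedges))}
    for i, j in combinations(range(len(hyperedges)), 2):
        if len(hyperedges[i] & hyperedges[j]) >= s:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def s_walk_neighborhood(hyperedges, start, s, max_hops):
    """Return {hyperedge index: hop distance} for hyperedges reachable from
    `start` by an s-walk of at most `max_hops` steps (breadth-first search)."""
    adj = s_adjacency(hyperedges, s)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        e = queue.popleft()
        if dist[e] == max_hops:
            continue  # do not expand beyond the hop budget
        for nxt in adj[e]:
            if nxt not in dist:
                dist[nxt] = dist[e] + 1
                queue.append(nxt)
    return dist


# Toy hypergraph (assumed data): each hyperedge is a set of node ids.
hyperedges = [{0, 1, 2}, {1, 2, 3}, {3, 4}, {2, 3, 4, 5}]
print(s_walk_neighborhood(hyperedges, start=0, s=2, max_hops=2))
```

Raising `s` keeps only strongly overlapping hyperedges in the walk, while raising `max_hops` widens the neighborhood; a subgraph-level contrast could, under these assumptions, treat such a neighborhood as the context of its starting hyperedge.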
