LLM-based Contrastive Self-Supervised AMR Learning with Masked Graph Autoencoders for Fake News Detection

2508.18819v1 cs.CL, cs.SI 2025-08-28
Authors:

Shubham Gupta, Shraban Kumar Chatterjee, Suman Kundu

Summary

#### Context

Modern society faces a significant challenge from the proliferation of misinformation, which complicates decision-making and undermines trust in digital media. Current approaches to fake news detection often struggle to capture long-range dependencies, complex semantic relations, and the social dynamics that influence how news spreads. They also typically require extensive labeled datasets, which makes deployment costly and resource-intensive. To address these challenges, this study proposes a novel self-supervised framework for fake news detection that integrates complex semantic relations, captured with Abstract Meaning Representation (AMR), and news propagation dynamics. The approach aims to overcome the shortcomings of existing methods by drawing on large language models (LLMs) and graph-based techniques.

#### Method

The proposed framework combines natural language processing (NLP) techniques with graph-based learning. The key components are:

1. **Abstract Meaning Representation (AMR):** AMR captures complex semantic relations within news texts. By transforming text into graph structures, it lets the model identify and analyze semantic components more effectively.
2. **Masked graph autoencoders (MGAE):** This technique learns propagation features from social context graphs. By masking certain nodes and reconstructing the graph, the model captures hidden relations and dynamics within the social context.
3. **LLM-based graph contrastive loss (LGCL):** This component uses negative anchor points generated by a large language model (LLM) to improve feature separability. It works in a zero-shot manner, so the model can distinguish between fake and real news without explicit labels.
4. **Multi-view graph learning:** By combining semantic and propagation-based features, the framework can better model the underlying structure and dynamics of news dissemination.

#### Results

The framework was evaluated in a range of experiments on benchmark datasets. The model was trained and tested using a combination of labeled and unlabeled data, demonstrating that it performs well in self-supervised settings. Compared with state-of-the-art methods, the proposed approach achieved superior accuracy and generalizability, even with limited labeled data. In particular, the LGCL loss function and the MGAE component significantly improved the model's ability to capture complex semantic relations and social context.

#### Significance

The proposed framework applies to digital media, social network analysis, and misinformation detection. By incorporating large language models and graph-based learning techniques, it provides a robust and efficient way to identify fake news. Because it operates in a self-supervised manner, it reduces the need for extensive labeled data, making it more accessible and cost-effective. The integration of social context features also improves the model's grasp of how news spreads, which is crucial for detecting misinformation.

#### Conclusions

The study demonstrates the effectiveness of the proposed self-supervised framework for fake news detection. It outperforms state-of-the-art methods, even with limited labeled data. Together, AMR, MGAE, and LGCL provide a comprehensive approach to capturing complex semantic relations and social dynamics. Future work will focus on extending the approach to other forms of misinformation and on improving the model's handling of multilingual and cross-domain settings.
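To make the mask-then-reconstruct idea behind the masked graph autoencoder component more concrete, here is a minimal sketch in plain Python. It is not the authors' architecture: the ring graph, the feature values, the parameter-free mean-aggregation encoder, and the per-dimension `scale` decoder are all assumptions chosen for illustration, and a real model would use learned graph neural network layers on a social context graph.

```python
import random


def mask_nodes(features, mask_ratio, rng):
    """Replace the features of a random subset of nodes with zeros (a stand-in mask token)."""
    n = len(features)
    k = max(1, round(mask_ratio * n))
    masked = set(rng.sample(range(n), k))
    corrupted = [[0.0] * len(f) if i in masked else list(f)
                 for i, f in enumerate(features)]
    return corrupted, masked


def encode(adjacency, features):
    """Toy encoder (assumption): average each node's features with its neighbours'."""
    hidden = []
    for node, neighbours in enumerate(adjacency):
        group = [node] + list(neighbours)
        dim = len(features[node])
        hidden.append([sum(features[j][d] for j in group) / len(group)
                       for d in range(dim)])
    return hidden


def decode(hidden, scale):
    """Toy decoder (assumption): one learnable scale per feature dimension."""
    return [[s * h for s, h in zip(scale, row)] for row in hidden]


def masked_mse(original, reconstructed, masked):
    """Mean squared reconstruction error, computed on masked nodes only."""
    errors = [(o - r) ** 2
              for i in masked
              for o, r in zip(original[i], reconstructed[i])]
    return sum(errors) / len(errors)


def train_decoder(original, hidden, masked, steps=200, lr=0.1):
    """Fit the decoder scales by gradient descent on the masked-node error."""
    dim = len(original[0])
    scale = [1.0] * dim
    count = len(masked) * dim
    for _ in range(steps):
        for d in range(dim):
            grad = sum(2.0 * hidden[i][d] * (scale[d] * hidden[i][d] - original[i][d])
                       for i in masked) / count
            scale[d] -= lr * grad
    return scale


# Hypothetical 6-node ring graph with similar 2-D features on every node.
adjacency = [[1, 5], [0, 2], [1, 3], [2, 4], [3, 5], [4, 0]]
features = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1], [1.0, 2.0], [1.2, 1.8], [0.8, 2.2]]

rng = random.Random(0)
corrupted, masked = mask_nodes(features, mask_ratio=0.3, rng=rng)
hidden = encode(adjacency, corrupted)
loss_before = masked_mse(features, decode(hidden, [1.0, 1.0]), masked)
scale = train_decoder(features, hidden, masked)
loss_after = masked_mse(features, decode(hidden, scale), masked)
print(f"masked nodes: {sorted(masked)}")
print(f"reconstruction error before/after training: {loss_before:.4f} / {loss_after:.4f}")
```

The key design point is that the loss is computed only on the masked nodes, so the model has to infer their features from the surrounding graph rather than copy its input.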

Abstract

The proliferation of misinformation in the digital age has led to significant societal challenges. Existing approaches often struggle with capturing long-range dependencies, complex semantic relations, and the social dynamics influencing news dissemination. Furthermore, these methods require extensive labelled datasets, making their deployment resource-intensive. In this study, we propose a novel self-supervised misinformation detection framework that integrates both complex semantic relations using Abstract Meaning Representation (AMR) and news propagation dynamics. We introduce an LLM-based graph contrastive loss (LGCL) that utilizes negative anchor points generated by a Large Language Model (LLM) to enhance feature separability in a zero-shot manner. To incorporate social context, we employ a multi-view graph masked autoencoder, which learns news propagation features from the social context graph. By combining these semantic and propagation-based features, our approach effectively differentiates between fake and real news in a self-supervised manner. Extensive experiments demonstrate that our self-supervised framework achieves superior performance compared to other state-of-the-art methodologies, even with limited labelled datasets, while improving generalizability.
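The abstract does not give the exact form of the LGCL, so the following is only a sketch of how a contrastive loss with externally supplied negative anchors might look. It swaps in a standard InfoNCE-style objective with cosine similarity. The function names, the `temperature` value, and the toy 2-D embeddings are assumptions, and in the paper's setting the negatives would come from embedding LLM-generated text rather than from hand-written vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss (assumed form): pull the positive in, push negatives away."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Hypothetical embeddings: one news item, one augmented view of it,
# and two alternative sets of negative anchors.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
easy_negatives = [[-1.0, 0.0]]   # far from the anchor
hard_negatives = [[1.0, 0.05]]   # almost identical to the anchor

loss_easy = contrastive_loss(anchor, positive, easy_negatives)
loss_hard = contrastive_loss(anchor, positive, hard_negatives)
print(f"loss with easy negatives: {loss_easy:.4f}")
print(f"loss with hard negatives: {loss_hard:.4f}")
```

Under this assumed form, negatives that sit close to the anchor produce a larger loss, which is what drives the embeddings apart during training.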
