Advancing Hate Speech Detection with Transformers: Insights from the MetaHate

2508.04913v1 cs.LG, cs.CL 2025-08-09

Authors:

Santosh Chapagain, Shah Muhammad Hamdi, Soukaina Filali Boubrahimi

Summary

Hate speech is one of the most harmful and widespread forms of discourse on social media platforms, including X (formerly Twitter), Facebook, and Instagram, and it has serious consequences for individuals and society. The authors of "Advancing Hate Speech Detection with Transformers: Insights from the MetaHate" propose using transformer models to detect it automatically. They studied MetaHate, a meta-collection of 36 datasets (1.2 million samples), and compared models such as BERT, RoBERTa, GPT-2, and ELECTRA. The fine-tuned ELECTRA model achieved the best result, with an F1 score of 0.8980. Error analysis showed that the models struggle with sarcasm, coded language, and label noise. The results confirm the strength of transformers for hate speech detection and highlight the need to improve the models' understanding of context.

Abstract

Hate speech is a widespread and harmful form of online discourse, encompassing slurs and defamatory posts that can have serious social, psychological, and sometimes physical impacts on targeted individuals and communities. As social media platforms such as X (formerly Twitter), Facebook, Instagram, Reddit, and others continue to facilitate widespread communication, they also become breeding grounds for hate speech, which has increasingly been linked to real-world hate crimes. Addressing this issue requires the development of robust automated methods to detect hate speech in diverse social media environments. Deep learning approaches, such as vanilla recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and convolutional neural networks (CNNs), have achieved good results, but are often limited by issues such as difficulty capturing long-term dependencies and inefficient parallelization. This study presents a comprehensive exploration of transformer-based models for hate speech detection using the MetaHate dataset, a meta-collection of 36 datasets with 1.2 million social media samples. We evaluate multiple state-of-the-art transformer models, including BERT, RoBERTa, GPT-2, and ELECTRA, with fine-tuned ELECTRA achieving the highest performance (F1 score: 0.8980). We also analyze classification errors, revealing challenges with sarcasm, coded language, and label noise.
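
For readers unfamiliar with the reported metric, the sketch below shows how an F1 score can be computed for a binary hate / non-hate classifier. It is an illustration only: the abstract does not say which F1 variant (binary, macro, or weighted) the authors report, so the binary form here is an assumption, and the function name `f1_score_binary` and the toy labels are hypothetical, not taken from MetaHate.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Binary F1 for the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive predictions: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical, not from MetaHate): 1 = hate, 0 = non-hate.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
score = f1_score_binary(y_true, y_pred)
print(f"F1 = {score:.4f}")
```

On this toy data there are three true positives, one false positive, and one false negative, so precision and recall are both 0.75 and the F1 score is 0.75. Because F1 ignores true negatives, it is often preferred over accuracy when the classes are imbalanced.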
