Exploring and Mitigating Gender Bias in Encoder-Based Transformer Models
2511.00519v1
cs.CL, I.2.7; I.7.1; K.4.1
2025-11-06
Авторы:
Ariyan Hossain, Khondokar Mohammad Ahanaf Hannan, Rakinul Haque, Nowreen Tarannum Rafa, Humayra Musarrat, Shoaib Ahmed Dipu, Farig Yousuf Sadeque
Abstract
Gender bias in language models has gained increasing attention in the field
of natural language processing. Encoder-based transformer models, which have
achieved state-of-the-art performance in various language tasks, have been
shown to exhibit strong gender biases inherited from their training data. This
paper investigates gender bias in contextualized word embeddings, a crucial
component of transformer-based models. We focus on prominent architectures such
as BERT, ALBERT, RoBERTa, and DistilBERT to examine their vulnerability to
gender bias. To quantify the degree of bias, we introduce a novel metric,
MALoR, which assesses bias based on model probabilities for filling masked
tokens. We further propose a mitigation approach involving continued
pre-training on a gender-balanced dataset generated via Counterfactual Data
Augmentation. Our experiments reveal significant reductions in gender bias
scores across different pronoun pairs. For instance, in BERT-base, bias scores
for "he-she" dropped from 1.27 to 0.08, and "his-her" from 2.51 to 0.36
following our mitigation approach. We also observed similar improvements across
other models, with "male-female" bias decreasing from 1.82 to 0.10 in
BERT-large. Our approach effectively reduces gender bias without compromising
model performance on downstream tasks.