Semi-Supervised Federated Multi-Label Feature Selection with Fuzzy Information Measures

2511.17796v1 cs.LG, stat.ML 2025-11-26
Авторы:

Afsaneh Mahanipour, Hana Khamfroush

Abstract

Multi-label feature selection (FS) reduces the dimensionality of multi-label data by removing irrelevant, noisy, and redundant features, thereby boosting the performance of multi-label learning models. However, existing methods typically require centralized data, which makes them unsuitable for distributed and federated environments where each device/client holds its own local dataset. Additionally, federated methods often assume that clients have labeled data, which is unrealistic in cases where clients lack the expertise or resources to label task-specific data. To address these challenges, we propose a Semi-Supervised Federated Multi-Label Feature Selection method, called SSFMLFS, where clients hold only unlabeled data, while the server has limited labeled data. SSFMLFS adapts fuzzy information theory to a federated setting, where clients compute fuzzy similarity matrices and transmit them to the server, which then calculates feature redundancy and feature-label relevancy degrees. A feature graph is constructed by modeling features as vertices, assigning relevancy and redundancy degrees as vertex weights and edge weights, respectively. PageRank is then applied to rank the features by importance. Extensive experiments on five real-world datasets from various domains, including biology, images, music, and text, demonstrate that SSFMLFS outperforms other federated and centralized supervised and semi-supervised approaches in terms of three different evaluation metrics in non-IID data distribution setting.

Ссылки и действия