An Approach to Variable Clustering: K-means in Transposed Data and its Relationship with Principal Component Analysis

2512.00979v1 stat.ML, cs.AI, cs.LG 2025-12-02
Authors:

Victor Saquicela, Kenneth Palacio-Baus, Mario Chifla

Abstract

Principal Component Analysis (PCA) and K-means are fundamental techniques in multivariate analysis. Although they are frequently applied independently or sequentially to cluster observations, the relationship between them has received little attention, especially when K-means is used to cluster variables rather than observations. This study addresses that gap by proposing a method that analyzes the relationship between the principal components of PCA and the clusters of variables obtained by applying K-means to transposed data. Our approach applies PCA to the original data and K-means to the transposed data set, in which the original variables become the observations. The contribution of each variable cluster to each principal component is then quantified using measures based on variable loadings. The result is a tool for exploring how variables cluster and how those clusters contribute to the principal dimensions of variation identified by PCA.
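
The workflow described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes scikit-learn, standardizes the variables before both steps, and uses the share of squared loadings as the contribution measure. The abstract says only that the measures are "based on variable loadings", so that choice, along with the function name `cluster_contributions` and its parameters, is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def cluster_contributions(X, n_clusters=2, n_components=2, seed=0):
    """Cluster variables with K-means on transposed data and relate them to PCA.

    X has shape (n_observations, n_variables). Returns the cluster label of
    each variable and an (n_clusters, n_components) array of contributions.
    """
    # Assumption: put all variables on a common scale before both steps.
    Z = StandardScaler().fit_transform(X)

    # PCA on the original orientation: rows are observations.
    pca = PCA(n_components=n_components).fit(Z)
    loadings = pca.components_  # shape (n_components, n_variables), unit-norm rows

    # K-means on the transposed data: each variable is now one "observation".
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(Z.T)

    # Assumed measure: share of squared loadings held by each cluster.
    # Each row of `loadings` has unit norm, so the shares for one component
    # add up to 1 across clusters.
    squared = loadings ** 2
    contrib = np.stack(
        [squared[:, labels == k].sum(axis=1) for k in range(n_clusters)]
    )
    return labels, contrib
```

Calling `cluster_contributions(X, n_clusters=3, n_components=2)` on a data matrix returns one label per variable and a table whose entry `[k, j]` is the fraction of component `j`'s squared loadings that belongs to cluster `k`. Other loading-based measures, such as mean absolute loading per cluster, could be swapped in at the marked step.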
