Mixed membership estimation for categorical data with weighted responses

arXiv:2310.10989v2 Announce Type: replace-cross Abstract: The Grade of Membership (GoM) model, which allows subjects to belong to multiple latent classes, is a powerful tool for inferring latent classes in categorical data. However, its application is limited to categorical data with nonnegative integer responses, as it assumes that the response matrix is generated from Bernoulli or Binomial distributions, making it inappropriate for datasets with continuous or negative weighted responses. To address this, this paper proposes a novel model named the Weighted Grade of Membership (WGoM) model. Our WGoM is more general than GoM because it relaxes GoM's distribution constraint by allowing the response matrix to be generated from distributions like Bernoulli, Binomial, Normal, and Uniform as long as the expected response matrix has a block structure related to subjects' mixed memberships under the distribution. We show that WGoM can describe any response matrix with finite distinct elements. We then propose an algorithm to estimate the latent mixed memberships and other WGoM parameters. We derive the error bounds of the estimated parameters and show that the algorithm is statistically consistent. We also propose an efficient method for determining the number of latent classes $K$ for categorical data with weighted responses by maximizing fuzzy weighted modularity. The performance of our methods is validated through both synthetic and real-world datasets. The results demonstrate the accuracy and efficiency of our algorithm for estimating latent mixed memberships, as well as the high accuracy of our method for estimating $K$, indicating their high potential for practical applications.

2025-09-01 04:00 GMT · 1 day ago arxiv.org

arXiv:2310.10989v2 Announce Type: replace-cross Abstract: The Grade of Membership (GoM) model, which allows subjects to belong to multiple latent classes, is a powerful tool for inferring latent classes in categorical data. However, its application is limited to categorical data with nonnegative integer responses, as it assumes that the response matrix is generated from Bernoulli or Binomial distributions, making it inappropriate for datasets with continuous or negative weighted responses. To address this, this paper proposes a novel model named the Weighted Grade of Membership (WGoM) model. Our WGoM is more general than GoM because it relaxes GoM's distribution constraint by allowing the response matrix to be generated from distributions like Bernoulli, Binomial, Normal, and Uniform as long as the expected response matrix has a block structure related to subjects' mixed memberships under the distribution. We show that WGoM can describe any response matrix with finite distinct elements. We then propose an algorithm to estimate the latent mixed memberships and other WGoM parameters. We derive the error bounds of the estimated parameters and show that the algorithm is statistically consistent. We also propose an efficient method for determining the number of latent classes $K$ for categorical data with weighted responses by maximizing fuzzy weighted modularity. The performance of our methods is validated through both synthetic and real-world datasets. The results demonstrate the accuracy and efficiency of our algorithm for estimating latent mixed memberships, as well as the high accuracy of our method for estimating $K$, indicating their high potential for practical applications.

Original: https://arxiv.org/abs/2310.10989