Unsupervised learning of categorical data with competing models

IEEE Trans Neural Netw Learn Syst. 2012 Nov;23(11):1726-37. doi: 10.1109/TNNLS.2012.2213266.

Abstract

This paper considers the unsupervised learning of high-dimensional binary feature vectors representing categorical information. A cognitively inspired framework, referred to as modeling fields theory (MFT), is utilized as the basic methodology. A new MFT-based algorithm, referred to as accelerated maximum a posteriori (MAP), is proposed. Accelerated MAP allows simultaneous learning and selection of the number of models. The key feature of accelerated MAP is a steady increase of the regularization penalty resulting in competition among models. The differences between this approach and other mixture learning and model selection methodologies are described. The operation of this algorithm and its parameter selection are discussed. Numerical experiments aimed at finding performance limits are conducted. The performance with real-world data is tested by applying the algorithm to a text categorization problem and to the clustering of Congressional voting data.
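
The abstract does not give the update equations, so the following is not the paper's accelerated MAP algorithm. It is only a minimal sketch of the general idea of competition under a steadily growing penalty, assuming a Bernoulli mixture fitted by EM in which a penalty is subtracted from each component's effective count before the mixing weights are renormalized. The function name fit_bernoulli_mixture, the parameters k_init, n_iter, and penalty_step, and the linear penalty schedule are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_bernoulli_mixture(X, k_init=6, n_iter=60, penalty_step=0.5, seed=0):
    """EM for a Bernoulli mixture with a penalty that grows every iteration.

    Illustrative assumption, not the published method: components whose
    effective count falls below the current penalty are removed, so the
    number of surviving components is selected during learning.
    X is an (n, d) array of 0/1 values stored as floats.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    probs = rng.uniform(0.25, 0.75, size=(k_init, d))  # per-feature P(x=1)
    weights = np.full(k_init, 1.0 / k_init)            # mixing weights
    penalty = 0.0

    for _ in range(n_iter):
        # E-step: responsibilities from log joint probabilities.
        log_p = X @ np.log(probs).T + (1.0 - X) @ np.log(1.0 - probs).T
        log_p += np.log(weights)
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        counts = resp.sum(axis=0)

        # M-step for the Bernoulli parameters, clipped away from 0 and 1.
        probs = (resp.T @ X) / (counts[:, None] + 1e-12)
        probs = np.clip(probs, 1e-3, 1.0 - 1e-3)

        # Steadily increase the penalty and shrink the effective counts.
        penalty += penalty_step
        shrunk = np.maximum(counts - penalty, 0.0)
        if shrunk.sum() == 0.0:
            # Guard: never remove every component; keep the strongest one.
            shrunk[np.argmax(counts)] = 1.0

        # Drop components that lost the competition and renormalize.
        keep = shrunk > 0.0
        probs = probs[keep]
        weights = shrunk[keep] / shrunk[keep].sum()

    return weights, probs
```

In this sketch the final penalty is n_iter * penalty_step, so a component survives only if its effective count stays above that value. How the penalty schedule should actually be chosen is one of the parameter-selection questions the paper itself addresses.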

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.