Fast Estimation of Approximate Matrix Ranks Using Spectral Densities

Neural Comput. 2017 May;29(5):1317-1351. doi: 10.1162/NECO_a_00951. Epub 2017 Mar 23.

Abstract

Many machine learning and data-related applications require the knowledge of approximate ranks of large data matrices at hand. This letter presents two computationally inexpensive techniques to estimate the approximate ranks of such matrices. These techniques exploit approximate spectral densities, popular in physics, which are probability density distributions that measure the likelihood of finding eigenvalues of the matrix at a given point on the real line. Integrating the spectral density over an interval gives the eigenvalue count of the matrix in that interval. Therefore, the rank can be approximated by integrating the spectral density over a carefully selected interval. Two different approaches are discussed to estimate the approximate rank, one based on Chebyshev polynomials and the other based on the Lanczos algorithm. In order to obtain the appropriate interval, it is necessary to locate a gap between the eigenvalues that correspond to noise and the relevant eigenvalues that contribute to the matrix rank. A method for locating this gap and selecting the interval of integration is proposed based on the plot of the spectral density. Numerical experiments illustrate the performance of these techniques on matrices from typical applications.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, Non-U.S. Gov't