A Riemannian distance approach for constructing principal curves

Int J Neural Syst. 2010 Jun;20(3):209-18. doi: 10.1142/S0129065710002371.

Abstract

The determination of principal curves relies on the arc-length as a global index to describe the middle of the data distribution. With a non-constant data distribution, however, curves that are constructed by the approach introduced in reference may not reflect the middle of the data distribution, as demonstrated in this article. This is particularly so for curve segments that have a large curvature and a high data density. To overcome this problem, the paper revisits the projection of the samples onto the curve by incorporating Riemannian distances. This analysis suggests estimating the density value of each sample relative to its neighbors and utilizing this value to compute the projection index for the curve. The use of density values, in turn, allows penalizing distances between samples along with the arc-length. In a similar fashion to conventional principal curve algorithms, for example those proposed by Hastie and Stuetzle and by Tibshirani, the incorporation of Riemannian distances gives rise to an iterative algorithm that includes a projection step and a self-consistency step. Application studies on simulated and experimental data sets show that the proposed modification has the potential to outperform existing algorithms in areas of high curvature under a non-constant data distribution.
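The idea of replacing plain arc-length with a density-aware index can be illustrated with a minimal sketch. It is not the algorithm from the paper: the k-nearest-neighbor density estimate, the choice to scale each segment length by the average density of its endpoints, and the names `knn_density` and `density_weighted_index` are all assumptions made for illustration, and the abstract does not state how density enters the metric.

```python
import numpy as np

def knn_density(X, k=5):
    """Relative density of each sample (assumed estimator):
    inverse of the mean distance to its k nearest neighbors."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the zero distance to the sample itself.
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    dens = 1.0 / (knn.mean(axis=1) + 1e-12)
    return dens / dens.mean()  # normalise so the average density is 1

def density_weighted_index(X_ordered, dens_ordered):
    """Cumulative density-weighted length along samples that are already
    ordered along the curve (assumed weighting, not the paper's metric)."""
    seg = np.linalg.norm(np.diff(X_ordered, axis=0), axis=1)
    # Weight each segment by the average density of its two endpoints.
    w = 0.5 * (dens_ordered[:-1] + dens_ordered[1:])
    return np.concatenate([[0.0], np.cumsum(w * seg)])

# Hypothetical data: a half circle sampled more densely near its apex.
rng = np.random.default_rng(0)
t = np.sort(np.concatenate([rng.uniform(0.0, np.pi, 40),
                            rng.normal(np.pi / 2, 0.1, 60)]))
X = np.column_stack([np.cos(t), np.sin(t)])

dens = knn_density(X, k=5)
arc_index = density_weighted_index(X, np.ones(len(X)))  # plain arc-length
weighted = density_weighted_index(X, dens)              # density-aware index
```

With unit weights the index reduces to the ordinary cumulative arc-length, so comparing `arc_index` and `weighted` shows where a density-aware index stretches or compresses the parameterization relative to the conventional one.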

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Artificial Intelligence*
  • Mathematical Concepts
  • Models, Statistical*
  • Models, Theoretical*
  • Neural Networks, Computer*