Keyword extraction by nonextensivity measure

Phys Rev E Stat Nonlin Soft Matter Phys. 2011 May;83(5 Pt 2):056106. doi: 10.1103/PhysRevE.83.056106. Epub 2011 May 10.

Abstract

An important feature of human-written texts is that relevant word types show long-range correlations in their spatial distribution, whereas irrelevant word types occur at random. We use nonextensive statistical mechanics to classify the correlation between word occurrences for the purpose of ranking words. In particular, we treat the nonextensivity parameter as an alternative metric of spatial correlation in the text and rank the words by this measure. Finally, we compare different methods for keyword extraction.
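
To make the idea of ranking words by spatial correlation concrete, the sketch below scores each word by how unevenly its occurrences are spaced. This is not the nonextensivity parameter described in the abstract, whose computation is not given here. It substitutes a simpler, well-known clustering proxy: the standard deviation of the gaps between successive occurrences, divided by their mean. The function names `clustering_scores` and `rank_words`, the tokenizer, and the `min_count` threshold are all illustrative assumptions.

```python
import re
from collections import defaultdict
from statistics import mean, pstdev


def clustering_scores(text, min_count=3):
    """Map each word to sigma/mean of the gaps between its occurrences.

    Stand-in measure (an assumption, not the paper's nonextensivity
    parameter): evenly spaced words score near 0, while words whose
    occurrences are bunched together score higher.
    """
    positions = defaultdict(list)
    # Crude tokenizer: lowercase runs of letters and apostrophes.
    for index, token in enumerate(re.findall(r"[a-z']+", text.lower())):
        positions[token].append(index)

    scores = {}
    for word, where in positions.items():
        if len(where) < min_count:
            continue  # too few occurrences to estimate gap statistics
        gaps = [b - a for a, b in zip(where, where[1:])]
        scores[word] = pstdev(gaps) / mean(gaps)
    return scores


def rank_words(text, min_count=3):
    """Return words ordered from most to least clustered."""
    scores = clustering_scores(text, min_count)
    return sorted(scores, key=scores.get, reverse=True)
```

Calling `rank_words(text)` on a long document returns candidate keywords first. Very rare words are skipped because their gap statistics are unreliable, and the `min_count` cutoff of 3 is an arbitrary choice for this sketch.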

Publication types

  • Research Support, Non-U.S. Gov't