Handling High-Dimension (High-Feature) MicroRNA Data

Methods Mol Biol. 2017:1617:179-186. doi: 10.1007/978-1-4939-7046-9_13.

Abstract

MicroRNA sequence and microarray data are often described by high-dimensional data, that is, by a large number of feature variables. As a consequence, the curse of dimensionality often becomes a problem. High-dimensional variables are difficult to process and hard to interpret. In addition, because the sample size is usually limited, the more variables there are, the more statistical error is introduced during data processing. To reduce the number of variables, a degenerated k-mer method was suggested. To enhance statistical robustness, the gapped k-mer method was introduced. The last part of this chapter also describes some traditional supervised and unsupervised mathematical methods that are used to reduce the dimensionality of the data.
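As a rough illustration of the two k-mer ideas named in the abstract, the following Python sketch counts degenerated and gapped k-mer features for a single sequence. The abstract does not give the chapter's exact definitions, so everything here is an assumption: the purine/pyrimidine reduced alphabet, the "-" gap symbol, and the function names are hypothetical choices made for illustration only.

```python
from collections import Counter
from itertools import combinations

# Assumed reduced alphabet: purines (A, G) -> R, pyrimidines (C, U, T) -> Y.
# The chapter may use a different degeneracy scheme.
DEGENERATE = {"A": "R", "G": "R", "C": "Y", "U": "Y", "T": "Y"}


def kmer_counts(seq, k):
    """Count every contiguous k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def degenerate_kmer_counts(seq, k):
    """Count k-mers after mapping each base to the reduced alphabet."""
    reduced = "".join(DEGENERATE[base] for base in seq.upper())
    return kmer_counts(reduced, k)


def gapped_kmer_counts(seq, l, k):
    """Count gapped k-mers: in each window of length l, keep k positions
    and mask the remaining l - k positions with '-' (assumed gap symbol)."""
    counts = Counter()
    for i in range(len(seq) - l + 1):
        window = seq[i:i + l]
        for keep in combinations(range(l), k):
            kept = set(keep)
            feature = "".join(
                base if j in kept else "-" for j, base in enumerate(window)
            )
            counts[feature] += 1
    return counts


if __name__ == "__main__":
    seq = "ACGU"  # toy sequence, not real miRNA data
    print(degenerate_kmer_counts(seq, 2))
    print(gapped_kmer_counts(seq, 3, 2))
```

Under these assumptions, a two-letter alphabet shrinks the feature space for k-mers of length k from 4^k to 2^k, while the gapped variant lets each window contribute to several features, so each feature is counted from more observations.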

Keywords: Degenerated k-mer; Dimension decreasing; Gapped k-mer; High-dimension; miRNA.

MeSH terms

  • Algorithms
  • Animals
  • Gene Expression Profiling / methods*
  • Humans
  • MicroRNAs / genetics*
  • Models, Statistical
  • Oligonucleotide Array Sequence Analysis / methods*
  • Sequence Analysis, RNA / methods*

Substances

  • MicroRNAs