Data Mining Methods for Omics and Knowledge of Crude Medicinal Plants toward Big Data Biology

Farit M Afendi; Naoaki Ono; Yukiko Nakamura; Kensuke Nakamura; Latifah K Darusman; Nelson Kibinge; Aki Hirai Morita; Ken Tanaka; Hisayuki Horai; Md Altaf-Ul-Amin; Shigehiko Kanaya

doi:10.5936/csbj.201301010

Comput Struct Biotechnol J. 2013 Mar 23;4:e201301010. doi: 10.5936/csbj.201301010. eCollection 2013.

Affiliations

¹ Graduate School of Information Science, Nara Institute of Science and Technology, Nara 630-0101, Ikoma, Japan; Department of Statistics, Bogor Agricultural University, Jln. Meranti, Kampus IPB Darmaga, Bogor 16680, Indonesia.
² Graduate School of Information Science, Nara Institute of Science and Technology, Nara 630-0101, Ikoma, Japan.
³ Maebashi Institute of Technology, 450-1 Kamisadori, Maebashi-shi, Gunma, 371-0816 Japan.
⁴ Biopharmaca Research Center, Bogor Agricultural University, Kampus IPB Taman Kencana, Jln. Taman Kencana No. 3 Bogor 16151, Indonesia.
⁵ Department of Medicinal Resources, Institute of Natural Medicine, University of Toyama, 2630 Toyama, 930-0194, Japan.
⁶ Department of Electronic and Computer Engineering, Ibaraki National College of Technology, 866 Nakane, Hitachinaka, Ibaraki 312-8508, Japan.

Abstract

Molecular biological data have increased rapidly with recent progress in the omics fields (e.g., genomics, transcriptomics, proteomics and metabolomics), which necessitates the development of databases and methods for efficient storage, retrieval, integration and analysis of massive data. The present study reviews the use of the KNApSAcK Family DB in metabolomics and related areas, discusses several statistical methods for handling multivariate data, and shows their application to Indonesian blended herbal medicines (Jamu) as a case study. Exploration using a biplot reveals that many plants are rarely utilized, while some plants are highly utilized for specific efficacies. Furthermore, the ingredients of Jamu formulas are modeled using Partial Least Squares Discriminant Analysis (PLS-DA) in order to predict their efficacy. The plants used in each Jamu medicine serve as the predictors, whereas the efficacy of each Jamu provides the responses. This model achieves 71.6% correct classification in predicting efficacy. A permutation test is then used to determine which plants serve as main ingredients in Jamu formulas by evaluating the significance of the PLS-DA coefficients. Next, to explain the role of these main-ingredient plants, information on the pharmacological activity of the plants is added to the predictor block. An N-PLS-DA model, the multiway version of PLS-DA, is then used to handle the resulting three-dimensional predictor array. The N-PLS-DA model reveals that the effects of some pharmacological activities are specific to certain efficacies, whereas other activities act across many efficacies. The mathematical modeling introduced in the present study can be utilized in global analyses of big data aimed at revealing the underlying biology.
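The idea of testing PLS-DA coefficients by permutation can be illustrated with a minimal sketch. It is not the study's implementation: it assumes a single PLS component, a two-class efficacy label coded 0/1, and a tiny invented plant-usage matrix. The function names, the toy data, and the number of permutations are all hypothetical choices made for illustration.

```python
import random


def pls1_coefficients(X, y):
    """Regression coefficients of a one-component PLS fit (illustrative)."""
    n, p = len(X), len(X[0])
    # Center each predictor column and the response.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector w is proportional to X'y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0.0:
        return [0.0] * p
    w = [v / norm for v in w]
    # Scores t = Xw, then regress y on t to get q.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    # With a single component the coefficients reduce to w * q.
    return [wj * q for wj in w]


def permutation_pvalues(X, y, n_perm=500, seed=0):
    """Two-sided permutation p-value for each coefficient."""
    rng = random.Random(seed)
    observed = pls1_coefficients(X, y)
    exceed = [0] * len(observed)
    y_perm = list(y)
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break the plant-efficacy link
        permuted = pls1_coefficients(X, y_perm)
        for j, b in enumerate(permuted):
            if abs(b) >= abs(observed[j]):
                exceed[j] += 1
    # Add-one correction keeps every p-value strictly positive.
    pvals = [(1 + c) / (1 + n_perm) for c in exceed]
    return observed, pvals


# Toy data (invented): 12 formulas x 3 plants, 1 = plant is an ingredient.
# Plant 0 is used exactly in the formulas with efficacy label 1.
y = [1] * 6 + [0] * 6
plant_a = list(y)
plant_b = [1, 0] * 6
plant_c = [1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1]
X = [list(row) for row in zip(plant_a, plant_b, plant_c)]

coefs, pvals = permutation_pvalues(X, y)
for j, (b, pv) in enumerate(zip(coefs, pvals)):
    print(f"plant {j}: coefficient={b:.3f}, permutation p={pv:.3f}")
```

A plant whose coefficient is rarely matched under shuffled efficacy labels is a candidate main ingredient. A model with several components and more than two efficacy classes would need a full PLS implementation in place of the one-component shortcut used here.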

Publication types

Review