A Survey of Biological Data in a Big Data Perspective

Big Data. 2022 Aug;10(4):279-297. doi: 10.1089/big.2020.0383. Epub 2022 Apr 7.

Abstract

The amount of available data is growing continuously, a phenomenon that has given rise to the concept of big data. The key technologies related to big data are cloud computing (infrastructure) and Not Only SQL (NoSQL; data storage). For data analysis, machine learning algorithms such as decision trees, support vector machines, artificial neural networks, and clustering techniques have shown promising results. In a biological context, big data has many applications because of the large number of biological databases available. Some limitations of biological big data stem from inherent features of the data, such as high complexity and heterogeneity, since biological systems yield information ranging from the atomic level to interactions among organisms and with their environment. These characteristics make most bioinformatics applications difficult to build, configure, and maintain. Although the rise of big data is relatively recent, it has already contributed to a better understanding of the underlying mechanisms of life. The main goal of this article is to provide a concise and reliable survey of the application of big data-related technologies in biology. To that end, fundamental concepts of information technology, including storage resources, analysis, and data sharing, are described along with their relation to biological data.

Keywords: artificial intelligence approaches; big data; biological analysis; data mining; information reuse.

Publication types

  • Review

MeSH terms

  • Big Data*
  • Cloud Computing
  • Data Mining* / methods
  • Machine Learning
  • Neural Networks, Computer