Big data analytics in Cloud computing: an overview

J Cloud Comput (Heidelb). 2022;11(1):24. doi: 10.1186/s13677-022-00301-w. Epub 2022 Aug 6.

Abstract

Big Data and Cloud Computing as two mainstream technologies, are at the center of concern in the IT field. Every day a huge amount of data is produced from different sources. This data is so big in size that traditional processing tools are unable to deal with them. Besides being big, this data moves fast and has a lot of variety. Big Data is a concept that deals with storing, processing and analyzing large amounts of data. Cloud computing on the other hand is about offering the infrastructure to enable such processes in a cost-effective and efficient manner. Many sectors, including among others businesses (small or large), healthcare, education, etc. are trying to leverage the power of Big Data. In healthcare, for example, Big Data is being used to reduce costs of treatment, predict outbreaks of pandemics, prevent diseases etc. This paper, presents an overview of Big Data Analytics as a crucial process in many fields and sectors. We start by a brief introduction to the concept of Big Data, the amount of data that is generated on a daily bases, features and characteristics of Big Data. We then delve into Big Data Analytics were we discuss issues such as analytics cycle, analytics benefits and the movement from ETL to ELT paradigm as a result of Big Data analytics in Cloud. As a case study we analyze Google's BigQuery which is a fully-managed, serverless data warehouse that enables scalable analysis over petabytes of data. As a Platform as a Service (PaaS) supports querying using ANSI SQL. We use the tool to perform different experiments such as average read, average compute, average write, on different sizes of datasets.

Keywords: Analytics; Big data; BigQuery; Cloud computing.