Synchrotron Big Data Science

Small. 2018 Nov;14(46):e1802291. doi: 10.1002/smll.201802291. Epub 2018 Sep 17.

Abstract

The rapid development of synchrotrons has massively increased the speed at which experiments can be performed, while new techniques have increased the amount of raw data collected during each experiment. While this has created enormous new opportunities, it has also created tremendous challenges for national facilities and users. With the huge increase in data volume, manual analysis of the data is no longer possible. As a result, only a fraction of the data collected during synchrotron beam-time, which is costly in both time and money, is analyzed and used to deliver new science. Additionally, the lack of an appropriate data analysis environment limits the realization of experiments that generate a large amount of data in a very short period of time. The current lack of automated data analysis pipelines prevents the fine-tuning of beam-time experiments, further reducing their potential usage. These effects, collectively known as the "data deluge," affect synchrotrons in several different ways, including fast data collection, available local storage, data management systems, and curation of the data. This review highlights the Big Data strategies currently adopted at synchrotrons, documenting this novel and promising hybridization between science and technology, which promises a dramatic increase in the number of scientific discoveries.

Keywords: big data; computation; large facility; machine learning; synchrotron.

Publication types

  • Review
  • Research Support, Non-U.S. Gov't