Neural network training method for materials science based on multi-source databases

Sci Rep. 2022 Sep 12;12(1):15326. doi: 10.1038/s41598-022-19426-8.

Abstract

The fourth paradigm of science has achieved great success in materials discovery, and it highlights the sharing and interoperability of data. However, most materials data are scattered among various research institutions, and transmitting large datasets consumes significant bandwidth and time. Meanwhile, some data owners prefer to protect their data and retain the initiative in collaborations. This dilemma has gradually led to the "data island" problem, especially in materials science. To address the problem and make full use of materials data, we propose a new strategy for neural network training based on multi-source databases. Throughout the training process, only model parameters are exchanged, and there is no external access or connection to the local databases. We demonstrate its validity by training a model that relates material structure to the corresponding formation energy, based on two and four local databases, respectively. The results show that the accuracy of the model trained by this method is almost the same as that of a model trained on a single database combining all the local ones. Moreover, different communication frequencies between the client and server are studied to improve training efficiency, and an optimal frequency is recommended.
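
The parameter-only exchange described in the abstract can be illustrated with a minimal sketch. The abstract does not state the aggregation rule, the network architecture, or the data format, so everything below is an assumption made for illustration: a two-parameter linear model stands in for the neural network, synthetic (x, y) pairs stand in for the local materials databases, and the server combines client parameters by a size-weighted average (the rule commonly known as federated averaging). The names local_update, federated_train, and make_clients are hypothetical, and local_steps is used here as a stand-in for the communication frequency between client and server.

```python
import random


def local_update(params, data, lr=0.05, steps=5):
    """Run a few gradient-descent steps on one client's private data.

    Only the updated parameters are returned; the data never leave the client.
    """
    w, b = params
    n = len(data)
    for _ in range(steps):
        # Gradients of the mean squared error of the model y = w * x + b.
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_train(clients, rounds=200, local_steps=5, lr=0.05):
    """Server loop (assumed rule): broadcast parameters, then average the replies.

    A larger local_steps means fewer client-server exchanges per gradient step.
    """
    params = (0.0, 0.0)
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        updates = [local_update(params, d, lr, local_steps) for d in clients]
        # Size-weighted average of each parameter across clients.
        params = tuple(
            sum(len(d) * u[i] for d, u in zip(clients, updates)) / total
            for i in range(2)
        )
    return params


def make_clients(n_clients=2, n_per_client=200, seed=0):
    """Synthetic stand-in for local databases: y = 1.5 * x - 0.5 plus small noise."""
    rng = random.Random(seed)
    clients = []
    for _ in range(n_clients):
        data = []
        for _ in range(n_per_client):
            x = rng.gauss(0.0, 1.0)
            y = 1.5 * x - 0.5 + rng.gauss(0.0, 0.01)
            data.append((x, y))
        clients.append(data)
    return clients


clients = make_clients()
w_fed, b_fed = federated_train(clients)

# Baseline for comparison: the same model trained on one pooled dataset.
pooled = [pair for d in clients for pair in d]
w_pool, b_pool = local_update((0.0, 0.0), pooled, steps=1000)
```

In this toy setting the averaged parameters and the pooled-data parameters agree closely, which mirrors the kind of comparison the abstract reports, although it says nothing about the accuracy figures in the paper itself.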

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Factual
  • Humans
  • Materials Science*
  • Neural Networks, Computer*