Extracting the Synthetic Route of Pd-Based Catalysts in Methanol Steam Reforming from the Scientific Literature

J Chem Inf Model. 2023 Oct 23;63(20):6249-6260. doi: 10.1021/acs.jcim.3c01442. Epub 2023 Oct 8.

Abstract

Structured material synthesis routes are crucial both for chemists performing experiments and for modern applications such as machine-learning-based material design. With the exponential growth of the chemical literature in recent years, manually extracting these routes from published papers has become time-consuming and labor-intensive. This study develops an automated method for extracting Pd-based catalyst synthesis routes from the chemical literature. First, a paragraph classification model based on regular expressions identifies paragraphs that contain material synthesis processes, and the identified paragraphs are then verified using machine learning techniques. Second, natural language processing techniques automatically parse the material synthesis routes from the identified paragraphs, generate regularized flowcharts, and output structured data. Lastly, the structured synthesis-route data are used to train machine learning models that predict the performance of the materials. The extracted material entities include the product, preparation method, precursor, support, loading, synthesis operation, and operation condition. The method avoids extensive manual data annotation and improves the efficiency of acquiring information from the scientific literature. The accuracy for the 11 material entities exceeds 80%, and the accuracy for the method, support, precursor, drying time, and reduction time exceeds 90%.
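The first two steps described in the abstract (flagging synthesis paragraphs with regular expressions, then pulling out operation conditions such as drying and reduction time) can be illustrated with a minimal Python sketch. The abstract does not give the actual expressions or thresholds, so the cue words, the `min_hits` threshold, and the function names `is_synthesis_paragraph` and `extract_durations` are all assumptions made for illustration, not the authors' implementation.

```python
import re

# Hypothetical cue words for synthesis paragraphs; the paper's real
# patterns are not stated in the abstract.
SYNTHESIS_CUES = re.compile(
    r"\b(prepared|synthesi[sz]ed|impregnat\w*|co-?precipitat\w*"
    r"|calcined|dried|reduced)\b",
    re.IGNORECASE,
)

# Hypothetical pattern: an operation verb followed, within the same
# clause, by "for <number> <time unit>".
DURATION = re.compile(
    r"\b(?P<op>dried|calcined|reduced)\b[^.;]*?"
    r"\bfor\s+(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>h|hours?|min|minutes?)\b",
    re.IGNORECASE,
)


def is_synthesis_paragraph(text, min_hits=2):
    """Flag a paragraph if it contains at least `min_hits` distinct cues."""
    hits = {m.group(0).lower() for m in SYNTHESIS_CUES.finditer(text)}
    return len(hits) >= min_hits


def extract_durations(text):
    """Return (operation, value, unit) tuples found in the paragraph."""
    return [
        (m.group("op").lower(), float(m.group("value")), m.group("unit").lower())
        for m in DURATION.finditer(text)
    ]


# Invented example paragraph, not taken from any cited paper.
paragraph = (
    "The Pd/ZnO catalyst was prepared by incipient wetness impregnation. "
    "The sample was dried at 110 °C for 12 h, calcined at 400 °C for 3 h, "
    "and reduced in H2 at 300 °C for 2 h."
)

if is_synthesis_paragraph(paragraph):
    for operation, value, unit in extract_durations(paragraph):
        print(operation, value, unit)
```

A rule-based filter of this kind needs no annotated training data, which matches the abstract's stated aim of avoiding extensive manual annotation. The sketch omits the machine learning verification step, and the simple duration pattern would miss cases such as decimal temperatures that place a period between the verb and the duration.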

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Machine Learning
  • Methanol*
  • Natural Language Processing
  • Steam*

Substances

  • Steam
  • Methanol