MatGD: Materials Graph Digitizer

ACS Appl Mater Interfaces. 2024 Jan 10;16(1):723-730. doi: 10.1021/acsami.3c14781. Epub 2023 Dec 26.

Abstract

We developed Materials Graph Digitizer (MatGD), a tool for digitizing data lines in scientific graphs. The algorithm behind the tool consists of four steps: (1) identifying graphs within subfigures, (2) separating axes and data sections, (3) discerning the data lines by eliminating irrelevant graph objects and matching them with the legend, and (4) extracting and saving the data. From 62,534 papers in the areas of batteries, catalysis, and metal-organic frameworks (MOFs), 501,045 figures were mined. The tool detected legend markers and text with over 99% accuracy. Moreover, its data line separation performance reached 66%, much higher than that of other existing figure-mining tools. We believe that this tool will be integral to collecting both past and future data from publications, and that these data can be used to train machine learning models that enhance materials property prediction and the discovery of new materials.
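
To make step (4) concrete, the following is a minimal sketch of how pixels belonging to one data line might be converted into data coordinates. It is not the MatGD implementation: the function names (make_axis_map, extract_line), the assumption of linear axes calibrated from two known tick positions, and the exact-color matching used to isolate the line are all illustrative assumptions.

```python
import numpy as np

def make_axis_map(pixel_a, value_a, pixel_b, value_b):
    """Return a function mapping a pixel coordinate to a data value.

    Assumes a linear axis calibrated by two known ticks (hypothetical setup).
    """
    scale = (value_b - value_a) / (pixel_b - pixel_a)
    return lambda pixel: value_a + (pixel - pixel_a) * scale

def extract_line(image, color, x_map, y_map):
    """Collect pixels that exactly match `color` and convert them to data units.

    Pixels sharing a column are averaged, so a thick line yields one point per x.
    """
    mask = np.all(image == np.asarray(color), axis=-1)
    rows, cols = np.nonzero(mask)
    by_column = {}
    for r, c in zip(rows, cols):
        by_column.setdefault(int(c), []).append(int(r))
    return [(x_map(c), y_map(float(np.mean(by_column[c]))))
            for c in sorted(by_column)]

# Synthetic 11x11 "plot area": white background with a red diagonal line.
image = np.full((11, 11, 3), 255, dtype=np.uint8)
for c in range(11):
    image[10 - c, c] = (255, 0, 0)

# Calibration (assumed): column 0 is x = 0.0 and column 10 is x = 1.0;
# row 10 (bottom) is y = 0.0 and row 0 (top) is y = 100.0.
x_map = make_axis_map(0, 0.0, 10, 1.0)
y_map = make_axis_map(10, 0.0, 0, 100.0)

curve = extract_line(image, (255, 0, 0), x_map, y_map)
```

In a real figure, the color match would need a tolerance for anti-aliasing, and logarithmic axes would need a different mapping; both are omitted here to keep the sketch short.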

Keywords: battery; catalyst; data mining; figure mining; machine learning; metal-organic frameworks (MOFs).