Transcription between human-readable synthetic descriptions and machine-executable instructions: an application of the latest pre-training technology

Chem Sci. 2023 Aug 24;14(35):9360-9373. doi: 10.1039/d3sc02483k. eCollection 2023 Sep 13.

Abstract

AI has been widely applied in scientific settings, for example in robots that perform chemical synthetic actions and free researchers from monotonous experimental procedures. However, a gap remains between human-readable natural language descriptions and machine-executable instructions: the former appear in numerous chemical articles, whereas the latter are currently compiled manually by experts. We apply the latest pre-trained model technology to achieve automatic transcription between descriptions and instructions. We design a concise and comprehensive instruction schema and construct an open-source, human-annotated dataset of 3950 description-instruction pairs, with 9.2 operations per instruction on average. We further propose knowledgeable pre-trained transcription models enhanced by multi-grained chemical knowledge. We also explore the performance of recent popular models and products that show great capability in automatic writing (e.g., ChatGPT). Experiments show that our system improves researchers' instruction compilation efficiency by at least 42% and can generate fluent academic paragraphs of synthetic descriptions from given instructions, demonstrating the great potential of pre-trained models for improving human productivity.
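
To make the idea of a description-instruction pair concrete, the following is a minimal sketch in Python. It is not the schema from the paper, which the abstract does not spell out: the class names (`Operation`, `Pair`), the field names, the action labels ("stir", "filter", and so on), and the two example sentences are all hypothetical and chosen only for illustration. The sketch shows how a free-text description might be stored next to an ordered list of operations, and how a statistic such as "operations per instruction on average" could be computed over a dataset.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Operation:
    """One machine-executable step (hypothetical fields, not the paper's schema)."""
    action: str
    arguments: dict = field(default_factory=dict)


@dataclass
class Pair:
    """A description-instruction pair: free text plus an ordered list of operations."""
    description: str
    instruction: list


def average_operations(pairs):
    """Return the mean number of operations per instruction."""
    return mean(len(pair.instruction) for pair in pairs)


# Two invented toy pairs; the real dataset is human-annotated and far larger.
pairs = [
    Pair(
        description="The mixture was stirred at room temperature for 2 h and then filtered.",
        instruction=[
            Operation("stir", {"temperature": "room temperature", "duration": "2 h"}),
            Operation("filter"),
        ],
    ),
    Pair(
        description="The solid was washed with water, dried, and weighed.",
        instruction=[
            Operation("wash", {"solvent": "water"}),
            Operation("dry"),
            Operation("weigh"),
        ],
    ),
]

print(average_operations(pairs))
```

In this toy layout, transcription in one direction would map `description` to `instruction`, and in the other direction would generate a `description` from an `instruction`; the models that perform either mapping are outside the scope of the sketch.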