A data-driven sequencer that unveils latent "codons" in synthetic copolymers

Chem Sci. 2023 Mar 20;14(21):5619-5626. doi: 10.1039/d2sc06974a. eCollection 2023 May 31.

Abstract

The recent emergence of sequence engineering in synthetic copolymers is driving innovation in polymer materials, where short sequences, hereinafter called "codons" by analogy with nucleotide triads, play key roles in expressing functions. However, codon compositions cannot be determined experimentally owing to the lack of efficient sequencing methods, which hinders the integration of experiment and theory. Herein, we propose a polymer sequencer based on mass spectrometry of pyrolyzed oligomeric fragments. Despite random fragmentation along the copolymer main chains, the characteristic fragment patterns of the codons are identified and quantified via unsupervised learning on a spectral dataset of random copolymers. Codon complexity increases with codon length and with the number of monomer components. Our data-driven approach accommodates this increasing complexity by expanding the dataset; the codon compositions of binary triads, binary pentads, and ternary triads are quantifiable with small datasets (N < 100). The sequencer allows copolymers to be described by their codon compositions/distributions, facilitating sequence engineering toward innovative polymer materials.
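
To make the growth in codon complexity concrete, the following minimal sketch enumerates the possible codons for the three cases named in the abstract. It is an illustration only, not the authors' method: the function name `enumerate_codons`, the monomer labels A, B, C, and the `directional` flag are assumptions introduced here. The abstract does not state whether a codon and its reverse (for example AAB and BAA) are counted as distinct, so the sketch reports both counts.

```python
from itertools import product


def enumerate_codons(monomers, length, directional=True):
    """Return the sorted list of distinct codons of the given length.

    monomers    -- string of single-letter monomer labels (hypothetical labels)
    length      -- codon length (3 for triads, 5 for pentads)
    directional -- if False, a sequence and its reverse count as one codon
                   (an assumption; the abstract does not specify this)
    """
    seen = set()
    for combo in product(monomers, repeat=length):
        seq = "".join(combo)
        if not directional:
            # Pick one canonical representative of the pair {seq, reversed seq}.
            seq = min(seq, seq[::-1])
        seen.add(seq)
    return sorted(seen)


# The three cases named in the abstract.
cases = {
    "binary triads": ("AB", 3),
    "binary pentads": ("AB", 5),
    "ternary triads": ("ABC", 3),
}

for name, (monomers, length) in cases.items():
    n_dir = len(enumerate_codons(monomers, length, directional=True))
    n_rev = len(enumerate_codons(monomers, length, directional=False))
    print(f"{name}: {n_dir} directional, {n_rev} reversal-equivalent")
```

For k monomer types and codon length n, the directional count is k^n (8, 32, and 27 for the three cases), and the reversal-equivalent count is (k^n + k^ceil(n/2))/2 (6, 20, and 18). Either way, the number of codon compositions to resolve grows quickly with both length and monomer number, which is the increase in complexity that the dataset expansion is meant to accommodate.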