Unsupervised learning of sequence-specific aggregation behavior for a model copolymer

Soft Matter. 2021 Sep 7;17(33):7697-7707. doi: 10.1039/d1sm01012c. Epub 2021 Aug 5.

Abstract

We apply a recently developed unsupervised machine learning scheme for local environments [Reinhart, Comput. Mater. Sci., 2021, 196, 110511] to characterize large-scale, disordered aggregates formed by sequence-defined macromolecules. This method provides new insight into the structure of these disordered, dilute aggregates, which has proven difficult to understand using collective variables manually derived from expert knowledge [Statt et al., J. Chem. Phys., 2020, 152, 075101]. In contrast to such conventional order parameters, we are able to classify the global aggregate structure directly from descriptions of the local environments. The resulting characterization provides a deeper understanding of the range of possible self-assembled structures and their relationships to each other. We also provide a detailed analysis of the effects of finite system size, stochasticity, and kinetics on these aggregates, based on the learned collective variables. Interestingly, we find that the spatiotemporal evolution of systems in the learned latent space is smooth and continuous, even though the latent space was derived from only a single snapshot from each of about 1000 monomer sequences. These results demonstrate the insight that can be gained by applying unsupervised machine learning to soft matter systems, especially when suitable order parameters are not known.
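
The idea of classifying global structure from local-environment descriptions can be illustrated with a minimal sketch. It is not the method of the paper or of the cited scheme: the per-particle features are synthetic random numbers, the global descriptor is a simple concatenated histogram, and plain PCA stands in for whatever dimensionality reduction the authors use. The names `global_descriptor` and `pca_embed` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def global_descriptor(local_features, bins=8, value_range=(0.0, 1.0)):
    """Summarize one snapshot's per-particle features as concatenated histograms.

    local_features has shape (n_particles, n_features). Each feature column is
    binned and normalized, so the result does not depend on particle count.
    This histogram summary is an assumption made for the sketch.
    """
    blocks = []
    for column in local_features.T:
        counts, _ = np.histogram(column, bins=bins, range=value_range)
        blocks.append(counts / max(counts.sum(), 1))
    return np.concatenate(blocks)


def pca_embed(X, n_components=2):
    """Project the rows of X onto their leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


# Synthetic stand-in data: two families of snapshots whose local features are
# drawn from different distributions, one snapshot per hypothetical sequence.
rng = np.random.default_rng(0)
snapshots = [rng.beta(2, 5, size=(500, 3)) for _ in range(10)]
snapshots += [rng.beta(5, 2, size=(500, 3)) for _ in range(10)]

X = np.array([global_descriptor(s) for s in snapshots])  # one row per snapshot
Z = pca_embed(X)  # low-dimensional coordinates playing the role of a latent space
```

Here each row of `Z` is a point for one snapshot, and snapshots with similar distributions of local environments land close together. A trajectory could be embedded in the same way by applying `global_descriptor` to each frame and projecting with the components fitted on the single-snapshot data.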