DNA-encoded libraries (DELs) have become an essential hit-finding tool in early drug discovery. Recent advances in synthetic methods for DELs have expanded the available chemical space, but precisely how each type of chemistry affects the DNA has not been studied. Available assays for quantifying the damage are limited to write efficiency, which measures the ability to ligate DNA onto a working encoded library strand, and qPCR, which measures the amplifiability of the DNA. These measures report on signal quantity and overall integrity, but not on specific damage to the encoded information. Herein, we use next-generation sequencing (NGS) to measure the quality of the read signal and thereby quantify the fidelity of the retrieved information. Among the reactions commonly used in DELs, we identify the copper-catalyzed azide-alkyne cycloaddition (CuAAC) as the worst offender in terms of DNA damage, causing an increase in G → T transversions. Furthermore, we show that the analysis provides useful information even in fully elaborated DELs: vestiges of the synthetic history, both chemical and biochemical, are written into the mutational spectra of NGS datasets.
Keywords: DNA damage; DNA encoded chemical libraries (DECLs); DNA encoded libraries (DELs); Next Generation Sequencing (NGS); on-DNA chemistry; quantitative PCR (qPCR).
Copyright © 2021 The Author(s). Published by Elsevier Ltd. All rights reserved.