Elucidation of genomic organizations of transgenic soybean plants through de novo genome assembly with short paired-end reads

Mol Breed. 2020 Dec 31;41(1):1. doi: 10.1007/s11032-020-01191-z. eCollection 2021 Jan.

Abstract

Elucidating the genomic organization of transgene insertion sites is essential for genetic studies of transgenic plants. Herein, we establish an analysis pipeline that identifies transgene insertion sites, as well as the presence of vector backbones, through de novo genome assembly of high-throughput sequencing data from two transgenic soybean lines, AtYUCCA6-#5 and 35S-UGT72E3/2-#7. High-throughput sequencing data at approximately 28× and 29× genome coverage for the two lines were assembled de novo. Databases generated from the assembled sequences were then searched, using the transgenic vector sequences as queries, for contigs containing putative insertion sites of transgene fragments together with their flanking sequences (integration sites). The predicted integration site sequences, which map to three annotated genes that might regulate plant development or confer disease resistance, were confirmed by local alignment against the soybean reference genome and by PCR amplification. As a result, we revealed the precise transgene-flanking sequences and the sequence rearrangements at the insertion sites in both transgenic lines, as well as the aberrant insertion of a transgene fragment. Compared with experimental or enrichment-based technologies, our approach is straightforward and time-efficient, providing an alternative method for identifying insertion sites in transgenic plants.
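The contig search step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the alignment tool or any thresholds, so the `Hit` record, the `junction_candidates` function, and the `min_flank` cutoff are all hypothetical. The sketch assumes that vector-versus-contig alignments have already been computed and reduced to contig coordinates, and it keeps only contigs in which the vector-matching region leaves unmatched sequence on at least one side, since such contigs may span a transgene-genome junction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One vector-to-contig alignment (hypothetical record layout)."""
    contig: str      # contig identifier from the de novo assembly
    contig_len: int  # total contig length in bp
    start: int       # 1-based inclusive coordinate of the match on the contig
    end: int         # 1-based inclusive coordinate (may be < start on the minus strand)


def junction_candidates(hits, min_flank=50):
    """Return {contig: (left_flank_bp, right_flank_bp)} for contigs whose
    vector-matching region leaves at least `min_flank` bp of unmatched
    sequence on one side. `min_flank` is an assumed, illustrative cutoff."""
    spans = {}
    for h in hits:
        lo, hi = min(h.start, h.end), max(h.start, h.end)
        if h.contig in spans:
            cur_lo, cur_hi, length = spans[h.contig]
            # Merge all hits on a contig into one outer vector-matching span.
            spans[h.contig] = (min(cur_lo, lo), max(cur_hi, hi), length)
        else:
            spans[h.contig] = (lo, hi, h.contig_len)

    candidates = {}
    for contig, (lo, hi, length) in spans.items():
        left = lo - 1        # bases before the vector-matching span
        right = length - hi  # bases after the vector-matching span
        if left >= min_flank or right >= min_flank:
            candidates[contig] = (left, right)
    return candidates
```

Contigs that consist entirely of vector sequence are dropped by this filter, while contigs with a sufficiently long flank remain. In a workflow like the one described, the flanks of the remaining contigs would then be aligned against the soybean reference genome and checked by PCR.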

Keywords: De novo assembly; Integration site; Transgene; Transgenic plant.