Identification, comprehensive characterization, and comparative genomics of the HERV-K(HML8) integrations in the human genome

Virus Res. 2023 Jan 2:323:198976. doi: 10.1016/j.virusres.2022.198976. Epub 2022 Oct 26.

Abstract

Around 8% of the human genome is composed of Human Endogenous Retroviruses (HERVs), ancient viral sequences inherited through the primate germ line following its infection by now-extinct retroviruses. Given the still underexplored physiological and pathological roles of HERVs, it is fundamental to expand our knowledge of the genomic composition of the different groups, to lay a reliable foundation for functional studies. Among HERVs, the best characterized elements belong to the beta-like superfamily HERV-K, comprising 10 groups (HML1-10), with HML2 being the most recent and most studied one. Among HMLs, the HML8 group is the only one still lacking a comprehensive genomic description. In the present work, we investigated the distribution of HML8 sequences in the human genome (GRCh38/hg38), identifying 23 novel proviruses and characterizing a total of 78 HML8 proviruses in terms of genome structure, phylogeny, and integration pattern. HML8 elements were significantly enriched in human chromosomes 8 and X (p<0.005), while chromosomes 17 and 20 showed fewer integrations than expected (p<0.025 and p<0.005, respectively). Phylogenetic analyses classified HML8 members into three clusters, corresponding to the three LTR types MER11A, MER11B and MER11C. Besides the different LTR types, common signatures in the internal structure suggested the potential existence of three different ancestral HML8 variants. Accordingly, time of integration estimation coupled with comparative genomics revealed that these three clusters integrated into primate genomes at different times, with MER11C elements being significantly younger than MER11A- and MER11B-associated proviruses (p<0.005 and p<0.05, respectively). Approximately 30% of the HML8 elements were found co-localized within human genes, sometimes in exonic portions and with the same orientation, and deserve further study for their possible effects on gene expression.
Overall, we provide the first detailed picture of the distribution and diversity of the HML8 group across the genome, creating the backbone for the specific analysis of its transcriptional activity in healthy and diseased conditions.
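
The chromosomal enrichment reported above compares observed integrations with the number expected by chance. The abstract does not state which statistical test was used, so the following is only a minimal sketch of one common approach: assume the expected count is proportional to chromosome length and apply an exact two-sided binomial test. The function names, the toy chromosome length, and the per-chromosome count are hypothetical and are not taken from the study; only the total of 78 proviruses comes from the text.

```python
from math import comb


def expected_count(total_integrations, chrom_length, genome_length):
    """Expected integrations on a chromosome if insertion were uniform by length."""
    return total_integrations * chrom_length / genome_length


def binomial_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k, given n trials with success probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= threshold))


# Illustrative values only (NOT data from the paper, apart from the total of 78):
total = 78                     # total HML8 proviruses reported in the abstract
chrom_length = 150_000_000     # hypothetical chromosome length in bp
genome_length = 3_000_000_000  # hypothetical genome length in bp
observed = 10                  # hypothetical observed count on that chromosome

p_chrom = chrom_length / genome_length
print("expected:", expected_count(total, chrom_length, genome_length))
print("two-sided p:", binomial_two_sided_p(observed, total, p_chrom))
```

A small p-value with an observed count above the expectation would indicate enrichment, and one below the expectation would indicate depletion. Other null models, for example ones that correct for gene density or mappable sequence, would give different expected counts.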

Keywords: Endogenous retroviruses; HERV; HERV-K; HML8; Retrotransposons.