Comparative genomic analysis of collagen gene diversity

3 Biotech. 2019 Mar;9(3):83. doi: 10.1007/s13205-019-1616-9. Epub 2019 Feb 14.

Abstract

The collagen family, whose proteins comprise 30% of the total protein mass in mammals, is the major component of the extracellular matrix. To understand the complexity of the collagen gene family, detailed sequence, phylogenetic and synteny analyses of 44 collagen genes were performed. According to the sequence analysis, fibril-associated collagens with interrupted triple helices (FACITs) were identified as the most recently evolved, vertebrate-specific collagens, while fibril-forming collagens and collagens VI, VII, XXVI and XXVIII were the most ancient, originating at the time of choanoflagellates. Network-forming collagens were entirely conserved from arthropods to Homo sapiens, except for one gene loss event. Of note, bird-specific dispensability of the COL1A1, COL3A1, COL5A3 and COL11A2 genes was observed among the fibril-forming collagens. According to the phylogenetic analysis, gene duplications in the collagen family occurred at variable time points during invertebrate-to-vertebrate evolution. However, the majority of gene duplications in FACITs and network-forming collagens occurred at the fish time point, suggesting large-scale duplications at the root of the vertebrate lineage. Lastly, synteny analysis identified 12 conserved blocks containing 27 collagen genes in vertebrate species. Interestingly, dysregulation of seven of these conserved blocks, namely block1 (COL11A1), block3 (COL3A1, COL5A2), block5 (COL6A5, COL6A6), block7 (COL1A2), block9 (COL4A1, COL4A2), block11 (COL6A1, COL6A2, COL18A1) and block12 (COL4A5, COL4A6), has also been reported in different diseases, including cancer. The current study revealed many critical insights into the sequence, structural and functional diversity of the collagen gene family. In the future, this information may help establish the clinical and pathological relevance of these conserved collagen blocks in different diseases.

Keywords: Collagen; Deletion; Domain; Duplication; Phylogenetics; Synteny.