Complete Chloroplast Genome of Pinus massoniana (Pinaceae): Gene Rearrangements, Loss of ndh Genes, and Short Inverted Repeats Contraction, Expansion

Molecules. 2017 Sep 11;22(9):1528. doi: 10.3390/molecules22091528.

Abstract

The chloroplast genome (CPG) of Pinus massoniana (genus Pinus, Pinaceae), a primary source of turpentine, was sequenced and analyzed with respect to gene rearrangements, loss of ndh genes, and the contraction and expansion of short inverted repeats (IRs). The P. massoniana CPG has a typical quadripartite structure comprising a large single-copy (LSC) region (65,563 bp), a small single-copy (SSC) region (53,230 bp), and two IRs (IRa and IRb, 485 bp). A total of 108 unique genes were identified: 73 protein-coding genes, 31 tRNA genes, and 4 rRNA genes. Most of the 81 simple sequence repeats (SSRs) identified in the CPG were A/T mononucleotide motifs located in non-coding regions. Comparisons with related species revealed a 21,556-bp inversion in the LSC region. The P. massoniana CPG lacks intact copies of all 11 ndh genes: four have been lost completely, five remain only as truncated pseudogenes, and the other two are pseudogenes because of short insertions or deletions. Instead of large IRs, a pair of short IRs was found, and their size varies among pine species as a result of short insertions or deletions and non-synchronized variation between "IRa" and "IRb". Phylogenetic analyses based on whole CPG sequences of 16 conifers indicated that whole CPG sequences could serve as a powerful tool for phylogenetic analysis.
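The abstract reports that most SSRs in the CPG are A/T mononucleotide motifs. The following is a minimal sketch of how mononucleotide runs might be scanned and grouped into A/T versus C/G classes. It is not the authors' pipeline: the function names are hypothetical, and the minimum run length of 10 is an assumed threshold chosen only for illustration. Dedicated SSR-search tools normally apply separate thresholds for each motif length (mono-, di-, tri-nucleotide, and so on).

```python
from collections import Counter

# Assumed threshold for illustration only; not taken from the study.
MIN_MONO_REPEATS = 10


def find_mononucleotide_ssrs(seq: str, min_len: int = MIN_MONO_REPEATS):
    """Return (start, base, length) for each single-base run of at least min_len.

    start is a 0-based index into seq; runs of ambiguous bases (e.g. N) are skipped.
    """
    seq = seq.upper()
    ssrs = []
    i = 0
    while i < len(seq):
        j = i
        # Extend the run while the same base keeps repeating.
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        run_len = j - i
        if run_len >= min_len and seq[i] in "ACGT":
            ssrs.append((i, seq[i], run_len))
        i = j  # jump to the first base after this run
    return ssrs


def motif_type_counts(ssrs):
    """Group mononucleotide SSRs into the complementary classes A/T and C/G."""
    return Counter("A/T" if base in "AT" else "C/G" for _, base, _ in ssrs)


# Toy sequence for illustration only; it is not real P. massoniana data.
toy = "GC" + "A" * 12 + "GCGC" + "T" * 10 + "CA" + "G" * 11 + "AT" + "C" * 5
hits = find_mononucleotide_ssrs(toy)
print(hits)                     # [(2, 'A', 12), (18, 'T', 10), (30, 'G', 11)]
print(motif_type_counts(hits))  # Counter({'A/T': 2, 'C/G': 1})
```

A and T runs are pooled into one class (and likewise C and G) because a poly-A run on one strand is a poly-T run on the complementary strand, so the two are reported together as "A/T" motifs.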

Keywords: comparative genomics; conifer species; genome annotation; phylogenetic analysis; structural inversion.

MeSH terms

  • Chromosome Mapping
  • Gene Deletion*
  • Gene Ontology
  • Gene Rearrangement*
  • Genes, Plant*
  • Genome, Chloroplast*
  • Inverted Repeat Sequences
  • Isoenzymes / deficiency
  • Isoenzymes / genetics
  • Molecular Sequence Annotation
  • NADH Dehydrogenase / deficiency
  • NADH Dehydrogenase / genetics*
  • Phylogeny
  • Pinus / classification
  • Pinus / genetics*
  • Whole Genome Sequencing

Substances

  • Isoenzymes
  • NADH Dehydrogenase