Principal Component Analysis of Alternative Splicing Profiles Revealed by Long-Read ONT Sequencing in Human Liver Tissue and Hepatocyte-Derived HepG2 and Huh7 Cell Lines

Int J Mol Sci. 2023 Oct 24;24(21):15502. doi: 10.3390/ijms242115502.

Abstract

Long-read RNA sequencing developed by Oxford Nanopore Technologies provides direct quantification of transcript isoforms. This makes the number of transcript isoforms per gene an intrinsically suitable metric for alternative splicing (AS) profiling with this type of RNA sequencing. Using this simple metric, with principal component analysis (PCA) as a tool to visualize the high-dimensional transcriptomic data, we were able to group biospecimens of normal human liver tissue and hepatocyte-derived malignant HepG2 and Huh7 cells into clear clusters in 2D space. In the transcriptome-wide analysis, the clustering was observed regardless of whether all genes were included in the analysis or only those expressed in all biospecimens tested. However, when the analysis was restricted to pharmacogenes, a set of genes involved in drug metabolism, the clustering worsened dramatically in the latter case. Based on the PCA data, the subsets of genes contributing most to the grouping of biospecimens into clusters were selected and subjected to gene ontology analysis, which allowed us to determine the top 20 biological processes, among which translation and processes related to its regulation dominate. The suggested metric can be a useful addition to existing metrics for describing AS profiles, especially in transcriptome studies with long-read sequencing.
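
As an illustration only, the isoforms-per-gene metric and the PCA step described in the abstract might be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the sample labels, gene names, transcript IDs, and the helper functions isoform_counts and pca_2d are hypothetical, the input records are made-up toy data, "expressed" is approximated as "at least one detected isoform", and PCA is computed with a plain NumPy SVD on column-centered counts, since the abstract does not state the software or scaling used.

```python
import numpy as np


def isoform_counts(records, samples, genes):
    """Count distinct detected transcript isoforms per gene in each sample.

    records: iterable of (sample, gene, transcript_id) tuples.
    Returns an integer matrix of shape (n_samples, n_genes).
    """
    s_idx = {s: i for i, s in enumerate(samples)}
    g_idx = {g: j for j, g in enumerate(genes)}
    counts = np.zeros((len(samples), len(genes)), dtype=int)
    for sample, gene, _tx in set(records):  # set() drops duplicate rows
        counts[s_idx[sample], g_idx[gene]] += 1
    return counts


def pca_2d(X):
    """Project the rows of X onto the first two principal components."""
    Xc = X - X.mean(axis=0)                 # center each gene (column)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :2] * S[:2]               # sample coordinates in 2D
    loadings = Vt[:2].T                     # per-gene weights on PC1, PC2
    explained = (S**2 / np.sum(S**2))[:2]   # variance fraction per PC
    return scores, loadings, explained


# Toy data (entirely made up): one row per detected isoform.
samples = ["liver_1", "liver_2", "HepG2_1", "Huh7_1"]
genes = ["geneA", "geneB", "geneC"]
records = [
    ("liver_1", "geneA", "tx1"), ("liver_1", "geneA", "tx2"),
    ("liver_1", "geneA", "tx3"), ("liver_1", "geneB", "tx1"),
    ("liver_1", "geneC", "tx1"),
    ("liver_2", "geneA", "tx1"), ("liver_2", "geneA", "tx2"),
    ("liver_2", "geneA", "tx3"), ("liver_2", "geneB", "tx1"),
    ("liver_2", "geneC", "tx1"), ("liver_2", "geneC", "tx2"),
    ("HepG2_1", "geneA", "tx1"), ("HepG2_1", "geneB", "tx1"),
    ("HepG2_1", "geneB", "tx2"), ("HepG2_1", "geneB", "tx3"),
    ("Huh7_1", "geneA", "tx1"), ("Huh7_1", "geneB", "tx1"),
    ("Huh7_1", "geneB", "tx2"), ("Huh7_1", "geneC", "tx1"),
]

counts = isoform_counts(records, samples, genes)

# Variant 1: all genes included.
scores_all, loadings_all, explained_all = pca_2d(counts)

# Variant 2: only genes with at least one isoform in every sample
# (an assumed stand-in for "expressed in all biospecimens").
in_all = counts.min(axis=0) > 0
scores_common, _, _ = pca_2d(counts[:, in_all])

# Rank genes by absolute PC1 loading as one simple proxy for
# "contributing most" to the grouping (an assumption of this sketch).
order = np.argsort(-np.abs(loadings_all[:, 0]))
top_genes = [genes[j] for j in order]
```

Plotting the two columns of scores_all (or scores_common) against each other gives the kind of 2D view the abstract refers to. Ranking genes by absolute loading is only one possible selection rule; the abstract does not specify how the contributing subsets were chosen.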

Keywords: Huh7 and HepG2 cell lines; alternative splicing; human liver tissue; nanopore sequencing; pharmacogenes; transcriptome.

MeSH terms

  • Alternative Splicing*
  • Cell Line
  • Gene Expression Profiling / methods
  • Hepatocytes
  • High-Throughput Nucleotide Sequencing* / methods
  • Humans
  • Liver
  • Principal Component Analysis
  • Protein Isoforms / genetics
  • Sequence Analysis, RNA / methods
  • Transcriptome

Substances

  • Protein Isoforms