Teaching genomics to life science undergraduates using cloud computing platforms with open datasets

Biochem Mol Biol Educ. 2022 Sep;50(5):446-449. doi: 10.1002/bmb.21646. Epub 2022 Aug 16.

Abstract

The final year of a biochemistry degree is usually a time to experience research. However, laboratory-based research projects were not possible during COVID-19. Instead, we used open datasets to provide computational research projects in metagenomics to biochemistry undergraduates (80 students with limited computing experience). We aimed to give the students the chance to explore any dataset, rather than a small number of artificial datasets (~60 published datasets were used). To achieve this, we used Google Colaboratory (Colab), a virtual computing environment. Colab served as the framework for retrieving raw sequencing data, analyzing it with QIIME2, and generating visualizations. Setting up the environment requires no prior experience, all students have the same drive structure, and notebooks can be shared for synchronous sessions. We also used the platform to combine multiple datasets and perform a meta-analysis, and to let students analyze large datasets with thousands of subjects and factors. Projects that required more computational resources were integrated with Google Cloud Compute. In future, all research projects can include some element of reanalyzing public data, giving students data science experience. Colab is also an excellent environment in which to develop data skills in multiple languages (e.g., Perl, Python, Julia).
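Combining several published datasets for a meta-analysis generally requires merging per-study feature (taxon count) tables and normalizing for differences in sequencing depth. The sketch below illustrates that idea in plain Python under stated assumptions. It is not the authors' code or a QIIME2 API. The function names `merge_feature_tables` and `relative_abundance`, the study names, and the toy counts are all hypothetical. In the projects described, the tables would come from QIIME2 output rather than hand-written dictionaries.

```python
def merge_feature_tables(tables):
    """Combine per-study feature tables into one sample-by-taxon table.

    Hypothetical helper. `tables` maps study name -> {sample ID: {taxon: count}}.
    Sample IDs are prefixed with the study name so IDs reused across studies
    do not collide; taxa absent from a sample are filled with 0.
    """
    merged = {}
    taxa = set()
    for study, table in tables.items():
        for sample, counts in table.items():
            merged[f"{study}:{sample}"] = dict(counts)  # copy, leave input untouched
            taxa.update(counts)
    for counts in merged.values():
        for taxon in taxa:
            counts.setdefault(taxon, 0)
    return merged, sorted(taxa)


def relative_abundance(merged):
    """Convert raw counts to per-sample proportions so that samples
    sequenced to different depths can be compared across studies."""
    result = {}
    for sample, counts in merged.items():
        total = sum(counts.values())
        result[sample] = {t: (c / total if total else 0.0) for t, c in counts.items()}
    return result


# Toy example with two hypothetical studies that both use the sample ID "S1".
study_a = {"S1": {"Bacteroides": 30, "Prevotella": 10}}
study_b = {"S1": {"Bacteroides": 5, "Akkermansia": 15}}
merged, taxa = merge_feature_tables({"studyA": study_a, "studyB": study_b})
rel = relative_abundance(merged)
print(taxa)                             # ['Akkermansia', 'Bacteroides', 'Prevotella']
print(rel["studyB:S1"]["Bacteroides"])  # 0.25
```

Prefixing sample IDs with the study name is one simple way to keep samples distinct when independent studies reuse identifiers. Converting to proportions is the most basic depth normalization. A real meta-analysis would likely also need to account for differences between studies in primers, sequenced region and taxonomy database.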

Keywords: Bioinformatics; Google Colab; Microbiome; QIIME2.

Publication types

  • Meta-Analysis

MeSH terms

  • COVID-19* / epidemiology
  • Cloud Computing*
  • Genomics
  • Humans
  • Software
  • Students