GDPF: a data resource for the distribution of prokaryotic protein families across the global biosphere

Nucleic Acids Res. 2024 Jan 5;52(D1):D724-D731. doi: 10.1093/nar/gkad869.

Abstract

Microorganisms encode most of the functions of life on Earth. However, conventional research has primarily focused on specific environments such as humans, soil and oceans, leaving the distribution of functional families throughout the global biosphere poorly understood. Here, we present the database of the global distribution of prokaryotic protein families (GDPF, http://bioinfo.qd.sdu.edu.cn/GDPF/), a data resource on the distribution of functional families across the global biosphere. GDPF provides global distribution information for 36 334 protein families, 19 734 superfamilies and 12 089 KEGG (Kyoto Encyclopedia of Genes and Genomes) orthologs from multiple source databases, covering typical environments such as soil, oceans, animals, plants and sediments. Users can browse, search and download the distribution data of each entry in 10 000 global microbial communities, and can compare how the distributions of multiple entries differ across environments. The GDPF data resource helps to uncover the geographical distribution patterns, key influencing factors and macroecological principles of microbial functions at a global level, thereby promoting research in Earth ecology and human health.

MeSH terms

  • Animals
  • Ecology*
  • Humans
  • Multigene Family
  • Prokaryotic Cells*
  • Proteins* / genetics
  • Soil

Substances

  • Soil
  • Proteins