Introduction of a human- and keyboard-friendly N-glycan nomenclature

Beilstein J Org Chem. 2024 Mar 15:20:607-620. doi: 10.3762/bjoc.20.53. eCollection 2024.

Abstract

In the beginning was the word. But there were no words for N-glycans, or at least no simple ones. Besides chemical formulas, the IUPAC code can be regarded as the best, most reliable and yet immediately comprehensible annotation of oligosaccharide structures of any type from any source. When it comes to N-glycans, however, the venerable IUPAC code has been widely supplanted by highly simplified terms. These terms count the number of antennae or of certain components such as galactoses, sialic acids and fucoses, and they leave only limited room for exact structural description. Highly illustrative, and fortunately now standardized, cartoon depictions have gained much ground in recent years. By their very nature, however, cartoons can be neither written nor spoken. The underlying machine codes (e.g., GlycoCT, WURCS) are definitely not intended for direct use in human communication. One may therefore feel the need for a simple yet intelligible and precise system for the alphanumeric description of the hundreds to thousands of N-glycan structures. Here, we present a system that describes N-glycans by defining their terminal elements. To minimize redundancy and term length, the elements common to all N-glycans are taken for granted. A preset reading order facilitates the definition of positional isomers. Combining this system with elements of the condensed IUPAC code allows even rather complex structural elements to be described. Thus, this "proglycan" coding could be the missing link between drawn structures and software-oriented representations of N-glycan structures. In addition, it may greatly facilitate keyboard-based mining for glycan substructures in glycan repositories.

Keywords: N-glycans; nomenclature; structural features.