Inference of germline polymorphisms in immunoglobulin genes from B cell receptor repertoires is complicated by somatic hypermutations, sequencing/PCR errors, and by varying length of reference alleles. The light chain inference is particularly challenging owing to large gene duplications and absence of D genes. We analyzed the light chain cDNA sequences from naïve B cell receptor repertoires from 100 individuals. We optimized light chain allele inference by tweaking parameters of the TIgGER functions, extending the germline reference sequences, and establishing mismatch frequency patterns at polymorphic positions to filter out false-positive candidates. We identified 48 previously unreported variants of light chain variable genes. We selected 14 variants for validation and successfully validated 11 by Sanger sequencing. Clustering of light chain 5'UTR, L-PART1, and L-PART2 revealed partial intron retention in 11 kappa and 9 lambda V alleles. Our results provide insight into germline variation in human light chain immunoglobulin loci.
Keywords: Genetics; Genomics; Immunology; Molecular genetics.
© 2021 The Authors.