Detection and Correction of Sample Misidentifications in a Biobank Using the MassARRAY System and Genomic Information

Biopreserv Biobank. 2023 Dec 11. doi: 10.1089/bio.2022.0211. Online ahead of print.

Abstract

As the number of samples held by biobanks grows, one of the most pressing tasks is keeping each specimen correctly linked to its associated information. Genomic information is useful for verifying the identity of these specimens. The Tohoku Medical Megabank Organization runs one of the largest biobanks in Japan. Here, we introduce a management system that includes a newly developed probe set for the MassARRAY system, for use during the production of proliferating T cells (T cells) and lymphoblastoid cell lines (LCLs). We selected single nucleotide variants that could be detected by next-generation sequencing and that offered high resolution, with minor allele frequencies of ∼0.5. When we checked the probe set against 96 samples from 48 people, the results did not contradict our genome sequence information. When we applied the set to our 3035 LCLs and 2256 T cells, 98.93% of the results were consistent with the corresponding genomic information. We surveyed the handling records of the 1.07% of samples that showed inconsistencies and found that most resulted from human error (ID swapping between samples) during manual operations. After we improved a few error-prone protocols, the error rate dropped to 0.47% for LCLs and 0% for T cells. Overall, the system that we developed is highly accurate and easy and fast to operate, and it provides a good opportunity to improve validation procedures and facilitate high-quality banking, especially in cases involving genomic information.
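The identity check described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes genotypes are encoded as alternate-allele counts (0, 1, 2), that variants are biallelic and in Hardy-Weinberg equilibrium, and that failed calls are stored as `None`. All function names, donor IDs, and variant IDs are hypothetical. The first function shows why variants with a minor allele frequency near 0.5 discriminate well between individuals; the other two compare a panel genotype with reference genotypes and suggest the most likely true donor when a label is wrong.

```python
from typing import Dict, Optional

# Alternate-allele count (0, 1, 2), or None for a failed call (assumed encoding).
Genotype = Optional[int]


def genotype_match_probability(maf: float) -> float:
    """Chance that two unrelated people share a genotype at one biallelic
    variant, assuming Hardy-Weinberg proportions."""
    p, q = maf, 1.0 - maf
    genotype_freqs = (q * q, 2 * p * q, p * p)
    return sum(f * f for f in genotype_freqs)


def concordance(panel: Dict[str, Genotype],
                reference: Dict[str, Genotype]) -> Optional[float]:
    """Fraction of variants called in both profiles whose genotypes agree."""
    shared = [v for v in panel
              if panel[v] is not None and reference.get(v) is not None]
    if not shared:
        return None
    return sum(panel[v] == reference[v] for v in shared) / len(shared)


def best_match(panel: Dict[str, Genotype],
               references: Dict[str, Dict[str, Genotype]]) -> Optional[str]:
    """Reference donor with the highest concordance, or None if none is comparable."""
    scores = {donor: concordance(panel, ref) for donor, ref in references.items()}
    scored = {donor: c for donor, c in scores.items() if c is not None}
    return max(scored, key=scored.get) if scored else None


# Hypothetical data: a cell line labeled donor_A that actually matches donor_B.
references = {
    "donor_A": {"snv1": 0, "snv2": 1, "snv3": 2, "snv4": 1},
    "donor_B": {"snv1": 2, "snv2": 1, "snv3": 0, "snv4": 0},
}
cell_line_labeled_A = {"snv1": 2, "snv2": 1, "snv3": 0, "snv4": None}

print(genotype_match_probability(0.5))
print(concordance(cell_line_labeled_A, references["donor_A"]))
print(best_match(cell_line_labeled_A, references))
```

Under these assumptions, a variant with a minor allele frequency of 0.5 gives a per-variant match probability of 0.375, the minimum for a biallelic site, so the chance that two unrelated people match across n independent variants is 0.375 raised to the power n. The abstract does not state the panel size, so n is left open here.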

Keywords: LCLs; MassARRAY; T cells; TMM; biobanking.