ChromosomeNet: A massive dataset enabling benchmarking and building baselines of clinical chromosome classification

Comput Biol Chem. 2022 Oct:100:107731. doi: 10.1016/j.compbiolchem.2022.107731. Epub 2022 Jul 16.

Abstract

Chromosome karyotyping analysis is a vital cytogenetic technique for diagnosing genetic and congenital malformations and for analyzing gestational and implantation failures, among other uses. Chromosome classification, an essential stage of karyotype analysis, is highly time-consuming, tedious, and error-prone, and it requires a large amount of manual work by experienced cytogenetics experts. Many deep learning-based methods have been proposed to address chromosome classification. However, two challenges remain. First, most existing methods were developed on different private datasets, which makes them difficult to compare on a common basis. Second, because most existing methods are published without the details needed to reproduce them, they are difficult to apply widely in clinical chromosome classification. To address these challenges, this work builds and publishes a massive clinical dataset that enables benchmarking and the building of chromosome classification baselines suitable for different scenarios. The dataset consists of 126,453 privacy-preserving G-band chromosome instances from 2763 karyotypes of 408 individuals. To the best of our knowledge, this is the first work to collect, annotate, and release a publicly available clinical chromosome classification dataset, and the dataset contains more than 120,000 instances. The experimental results show that the proposed dataset improves the performance of existing chromosome classification models to varying degrees, with the largest accuracy improvement being 5.39 percentage points. Moreover, the best baseline achieves state-of-the-art classification performance with 99.33% accuracy. The clinical dataset and state-of-the-art baselines are available at https://github.com/CloudDataLab/BenchmarkForChromosomeClassification.
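As a rough sanity check on the dataset scale reported in the abstract, the three counts can be turned into per-karyotype and per-individual averages. The sketch below is illustrative only: the function name `dataset_scale_summary` and its field names are hypothetical and are not part of the published benchmark code. The only inputs are the figures stated in the abstract.

```python
# Minimal sketch: derive average ratios from the counts stated in the abstract.
# The function and key names are hypothetical, chosen only for illustration.

def dataset_scale_summary(instances: int, karyotypes: int, individuals: int) -> dict:
    """Return simple averages describing the dataset's scale."""
    return {
        "instances_per_karyotype": instances / karyotypes,
        "karyotypes_per_individual": karyotypes / individuals,
        "instances_per_individual": instances / individuals,
    }


# Counts as reported in the abstract.
summary = dataset_scale_summary(instances=126_453, karyotypes=2763, individuals=408)

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The average of roughly 45.8 instances per karyotype sits just under 46, the chromosome count of a typical human cell. That is consistent with most karyotypes being annotated in full, although the abstract does not say why the average falls slightly short.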

Keywords: Artificial intelligence; Benchmark and baselines; Biomedical image processing; Chromosome classification; Chromosome karyotyping analysis; Clinical dataset; Deep learning.

MeSH terms

  • Algorithms*
  • Benchmarking*
  • Chromosomes / genetics
  • Humans