Deep learning for embryo evaluation using time-lapse: a systematic review of diagnostic test accuracy

Am J Obstet Gynecol. 2023 Nov;229(5):490-501. doi: 10.1016/j.ajog.2023.04.027. Epub 2023 Apr 26.

Abstract

Objective: This study aimed to investigate the accuracy of convolutional neural network models in the assessment of embryos using time-lapse monitoring.

Data sources: A systematic search was conducted in PubMed and Web of Science databases from January 2016 to December 2022. The search strategy used keywords and MeSH (Medical Subject Headings) terms.

Study eligibility criteria: Studies were included if they reported the accuracy of convolutional neural network models for embryo evaluation using time-lapse monitoring. The review was registered with PROSPERO (International Prospective Register of Systematic Reviews; identification number CRD42021275916).

Methods: Two review authors independently screened results using the Covidence systematic review software. Full-text articles were reviewed when studies met the inclusion criteria or when eligibility was uncertain. Disagreements were resolved by a third reviewer. Risk of bias and applicability were evaluated using the QUADAS-2 tool and the modified Joanna Briggs Institute (JBI) checklist.

Results: Following a systematic search of the literature, 22 studies were identified as eligible for inclusion. All studies were retrospective. A total of 522,516 images of 222,998 embryos were analyzed. Three main outcomes were evaluated: successful in vitro fertilization, blastocyst stage classification, and blastocyst quality. Most studies reported >80% accuracy, and in some studies the models outperformed embryologists. Ten studies had a high risk of bias, mostly because of patient bias.

Conclusion: The application of artificial intelligence in time-lapse monitoring has the potential to provide more efficient, accurate, and objective embryo evaluation. Models that examined blastocyst stage classification showed the best predictions. Models that predicted live birth had a low risk of bias, used the largest databases, and had external validation, which heightens their relevance to clinical application. Our systematic review is limited by the high heterogeneity among the included studies. Researchers should share databases and standardize reporting.

Keywords: artificial intelligence; convolutional neural network; deep learning; embryo assessment; time-lapse monitor.

Publication types

  • Review
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence*
  • Deep Learning*
  • Diagnostic Tests, Routine
  • Female
  • Humans
  • Pregnancy
  • Pregnancy Rate
  • Retrospective Studies
  • Systematic Reviews as Topic
  • Time-Lapse Imaging / methods