HiLDA: a statistical approach to investigate differences in mutational signatures

PeerJ. 2019 Aug 28;7:e7557. doi: 10.7717/peerj.7557. eCollection 2019.

Abstract

We propose a hierarchical latent Dirichlet allocation model (HiLDA) for characterizing somatic mutation data in cancer. The method allows us to infer mutational patterns and their relative frequencies in a set of tumor mutational catalogs and to compare the estimated frequencies between tumor sets. We apply our method to two datasets: one containing somatic mutations in colon cancer classified by time of occurrence (before or after tumor initiation), and a second containing somatic mutations in esophageal cancer classified by sex, age, smoking status, and tumor site. In colon cancer, the relative frequencies of mutational patterns were significantly associated with the time of occurrence of mutations. In esophageal cancer, the relative frequencies were significantly associated with the tumor site. Our novel method provides higher statistical power for detecting differences in mutational signatures.
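
To make the hierarchical structure concrete, the following is a toy generative sketch in Python, not the authors' implementation. It assumes that each tumor set has its own Dirichlet parameter vector, that each tumor draws its signature frequencies from its set's Dirichlet, and that each mutation is then drawn from a mixture of signatures. The two signatures, the four mutation types, the parameter values, and all function names are hypothetical and chosen only for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_group(alpha, n_tumors, n_mutations, signatures, rng):
    """Simulate mutational catalogs for one tumor set (toy model).

    alpha      -- group-level Dirichlet parameters (hypothetical values)
    signatures -- K probability vectors over mutation types
    Returns (exposures, catalogs): per-tumor signature frequencies and
    per-tumor counts of each mutation type.
    """
    n_types = len(signatures[0])
    exposures, catalogs = [], []
    for _ in range(n_tumors):
        # Tumor-level relative frequencies of the K signatures.
        theta = sample_dirichlet(alpha, rng)
        counts = [0] * n_types
        for _ in range(n_mutations):
            # Pick the signature that generated this mutation,
            # then pick the mutation type from that signature.
            k = rng.choices(range(len(signatures)), weights=theta)[0]
            m = rng.choices(range(n_types), weights=signatures[k])[0]
            counts[m] += 1
        exposures.append(theta)
        catalogs.append(counts)
    return exposures, catalogs


def mean_exposure(exposures):
    """Average signature frequencies across the tumors of one set."""
    n = len(exposures)
    return [sum(e[k] for e in exposures) / n for k in range(len(exposures[0]))]


rng = random.Random(0)

# Two toy signatures over four toy mutation types (assumed, not real data).
signatures = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]

# Two tumor sets whose group-level parameters differ (assumed values).
exp_a, cat_a = simulate_group([8.0, 2.0], 20, 200, signatures, rng)
exp_b, cat_b = simulate_group([2.0, 8.0], 20, 200, signatures, rng)

mean_a = mean_exposure(exp_a)
mean_b = mean_exposure(exp_b)
print("mean signature frequencies, set A:", mean_a)
print("mean signature frequencies, set B:", mean_b)
```

In this sketch the difference between tumor sets lives entirely in the group-level Dirichlet parameters. Testing whether those parameters differ between sets is the kind of comparison the abstract describes; the inference step itself is not shown here.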

Keywords: Colorectal cancer; Deconvolution; Latent Dirichlet allocation; Mutational signatures; Somatic mutation.