PheWAS analysis on large-scale biobank data with PheTK

Tam C Tran; David J Schlueter; Chenjie Zeng; Huan Mo; Robert J Carroll; Joshua C Denny

doi:10.1101/2024.02.12.24302720

medRxiv [Preprint]. 2024 Feb 13:2024.02.12.24302720. doi: 10.1101/2024.02.12.24302720.

Abstract

Summary: With the rapid growth of genetic data linked to electronic health record (EHR) data in huge cohorts, large-scale phenome-wide association studies (PheWAS) have become powerful discovery tools in biomedical research. PheWAS is an analysis method that studies phenotype associations using longitudinal EHR data. Previous PheWAS packages were developed mostly when biobanks were smaller and earlier PheWAS approaches were in use. PheTK was designed to simplify analysis and to handle biobank-scale data efficiently. PheTK uses multithreading and supports a full PheWAS workflow, including extraction of data from OMOP databases and Hail matrix tables as well as PheWAS analysis for both phecode version 1.2 and phecodeX. In benchmarking, PheTK took 64% less time than the R PheWAS package to complete the same workflow. PheTK can be run locally or on cloud platforms such as the All of Us Researcher Workbench (All of Us) or the UK Biobank (UKB) Research Analysis Platform (RAP).
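The per-phecode scan at the heart of a PheWAS can be sketched as follows. This is a minimal illustration, not PheTK's API. The function and class names are hypothetical, and the sketch makes several simplifying assumptions:

- The exposure is binary (for example, variant carrier status).
- Anyone with a given phecode counts as a case and everyone else as a control, with no exclusion ranges.
- Each phecode is tested with an unadjusted odds ratio and a Wald test, in place of the covariate-adjusted regression models usually fitted in PheWAS.

The thread pool only illustrates fanning independent tests out across workers. In pure Python, the GIL limits the actual speed-up.

```python
# Minimal multithreaded PheWAS scan: an illustrative sketch, NOT PheTK's API.
# Assumptions (hypothetical): binary exposure per person; having a phecode makes
# a person a case, everyone else is a control; no covariates, no exclusion ranges.
import math
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import partial


@dataclass
class PhecodeResult:
    phecode: str
    n_cases: int
    odds_ratio: float
    p_value: float


def scan_phecode(phecode, exposure, phecodes_by_person, min_cases=2):
    """Test one phecode with a 2x2 table and a Wald test on the log odds ratio."""
    a = b = c = d = 0  # a/b: exposed cases/controls, c/d: unexposed cases/controls
    for person, exposed in exposure.items():
        is_case = phecode in phecodes_by_person.get(person, set())
        if exposed and is_case:
            a += 1
        elif exposed:
            b += 1
        elif is_case:
            c += 1
        else:
            d += 1
    n_cases = a + c
    if n_cases < min_cases:
        return None  # too few cases to test this phecode
    # Haldane-Anscombe correction: add 0.5 to each cell so empty cells are allowed.
    a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(odds_ratio)
    z = math.log(odds_ratio) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return PhecodeResult(phecode, n_cases, odds_ratio, p_value)


def run_phewas(exposure, phecodes_by_person, max_workers=4):
    """Scan every observed phecode using worker threads; return results sorted by p-value."""
    all_codes = sorted({code for codes in phecodes_by_person.values() for code in codes})
    task = partial(scan_phecode, exposure=exposure, phecodes_by_person=phecodes_by_person)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(task, all_codes))
    return sorted((r for r in results if r is not None), key=lambda r: r.p_value)


if __name__ == "__main__":
    # Toy cohort: exposure status per person and the phecodes recorded for each person.
    exposure = {"p1": True, "p2": True, "p3": False, "p4": False, "p5": True, "p6": False}
    phecodes = {"p1": {"250.2"}, "p2": {"250.2", "401.1"}, "p3": {"401.1"}, "p5": {"250.2"}}
    n_tests = len({code for codes in phecodes.values() for code in codes})
    for r in run_phewas(exposure, phecodes):
        # Bonferroni threshold across all observed phecodes (an illustrative choice).
        flag = "significant" if r.p_value < 0.05 / n_tests else ""
        print(r, flag)
```

Running the file prints one result per tested phecode, ordered by p-value. Because each phecode is tested independently, the scan parallelizes naturally across workers. With many phecodes, a multiple-testing correction such as the Bonferroni threshold in the demo is needed before calling any association significant.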

Availability and implementation: The PheTK package is freely available on the Python Package Index (PyPI) and on GitHub under the GNU General Public License (GPL-3) at https://github.com/nhgritctran/PheTK. It is implemented in Python and is platform-independent. The demonstration workspace for All of Us will be made available in the future as a featured workspace.

Contact: PheTK@mail.nih.gov.

Publication types

Preprint