Natural Product Discovery Using Planes of Principal Component Analysis in R (PoPCAR)

Shaurya Chanana; Chris S Thomas; Doug R Braun; Yanpeng Hou; Thomas P Wyche; Tim S Bugni

doi:10.3390/metabo7030034

Metabolites. 2017 Jul 13;7(3):34. doi: 10.3390/metabo7030034.

Authors

Shaurya Chanana¹, Chris S Thomas², Doug R Braun³, Yanpeng Hou⁴, Thomas P Wyche⁵ ⁶, Tim S Bugni⁷

Affiliations

¹ Pharmaceutical Sciences Division, School of Pharmacy, University of Wisconsin, Madison, WI 53705, USA. schanana@wisc.edu.
² Pharmaceutical Sciences Division, School of Pharmacy, University of Wisconsin, Madison, WI 53705, USA. csthomas4@wisc.edu.
³ Pharmaceutical Sciences Division, School of Pharmacy, University of Wisconsin, Madison, WI 53705, USA. drbraun1@wisc.edu.
⁴ Pharmaceutical Sciences Division, School of Pharmacy, University of Wisconsin, Madison, WI 53705, USA. yanpenghou@gmail.com.
⁵ Pharmaceutical Sciences Division, School of Pharmacy, University of Wisconsin, Madison, WI 53705, USA. thomas.wyche@merck.com.
⁶ Exploratory Science Center, Merck & Co., 320 Bent St., Cambridge, MA 02141, USA. thomas.wyche@merck.com.
⁷ Pharmaceutical Sciences Division, School of Pharmacy, University of Wisconsin, Madison, WI 53705, USA. tim.bugni@wisc.edu.

Abstract

Rediscovery of known natural products hinders the discovery of new, unique scaffolds. Efforts have mostly focused on streamlining the determination of what compounds are known vs. unknown (dereplication), but an alternative strategy is to focus on what is different. Utilizing statistics and assuming that common actinobacterial metabolites are likely known, focus can be shifted away from dereplication and towards discovery. LC-MS-based principal component analysis (PCA) provides a perfect tool to distinguish unique vs. common metabolites, but the variability inherent within natural products leads to datasets that do not fit ideal standards. To simplify the analysis of PCA models, we developed a script that identifies only those masses or molecules that are unique to each strain within a group, thereby greatly reducing the number of data points to be inspected manually. Since the script is written in R, it facilitates integration with other metabolomics workflows and supports automated mass matching to databases such as Antibase.
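The filtering idea described above (keep only the masses unique to one strain, then match them against a database) can be illustrated with a minimal sketch. This is not the PoPCAR script: it is written in Python rather than R, it skips the PCA step entirely, and it works on a simple intensity table. The names `unique_features` and `match_masses`, the strain labels, the intensity threshold, the ppm tolerance, and all numeric values are hypothetical. The sketch also assumes that m/z values have already been aligned across strains and that database masses can be compared directly, with no adduct handling.

```python
# Hypothetical aligned bucket table: strain -> {m/z: intensity}. Values are invented.
table = {
    "strain_A": {301.1410: 5e5, 455.2903: 2e4, 512.3001: 8e5},
    "strain_B": {301.1410: 4e5, 678.4120: 9e5},
    "strain_C": {301.1410: 6e5, 455.2903: 3e2, 733.2500: 7e5},
}

# Hypothetical reference masses standing in for a database export.
database = {"compound_X": 678.4125, "compound_Y": 512.3100}


def unique_features(table, threshold):
    """Return {strain: sorted m/z values at or above threshold in that strain only}."""
    detected_in = {}
    for strain, peaks in table.items():
        for mz, intensity in peaks.items():
            if intensity >= threshold:
                detected_in.setdefault(mz, []).append(strain)

    result = {strain: [] for strain in table}
    for mz, strains in detected_in.items():
        if len(strains) == 1:  # seen in exactly one strain
            result[strains[0]].append(mz)
    for strain in result:
        result[strain].sort()
    return result


def match_masses(mzs, database, ppm=5.0):
    """Return {m/z: [database names whose mass lies within the ppm tolerance]}."""
    hits = {}
    for mz in mzs:
        hits[mz] = [
            name
            for name, ref in database.items()
            if abs(mz - ref) / ref * 1e6 <= ppm
        ]
    return hits


unique = unique_features(table, threshold=1e3)
matches = {strain: match_masses(mzs, database) for strain, mzs in unique.items()}
```

Here the mass shared by all three strains is discarded, and a mass that falls below the threshold in one strain still counts as unique to the other. Only the strain-specific masses that remain are compared against the reference list, which is the reduction in manual inspection the abstract refers to.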

Keywords: actinobacteria; marine actinomycetes; mass spectrometry; metabolomics; principal component analysis.

Grants and funding

R01 GM104192/GM/NIGMS NIH HHS/United States