Role of bacterial peptidase F inferred by statistical analysis and further experimental validation

HFSP J. 2008 Feb;2(1):29-41. doi: 10.2976/1.2820377. Epub 2008 Jan 7.

Abstract

Despite the quantity of high-throughput data now available, the precise role of many proteins has not been elucidated. Available methods for classifying proteins and reconstructing metabolic networks are efficient at finding global categories, but they do not answer the biologist's specific, targeted questions. Following Yamanishi et al. [Yamanishi, Y, Vert, JP, Nakaya, A, and Kanehisa, M (2003). "Extraction of correlated clusters from multiple genomic data by generalized kernel canonical correlation analysis." Bioinformatics 19, Suppl. 1, i323-i330], we used kernel canonical correlation analysis (KCCA) to predict the role of the bacterial peptidase PepF. To determine relationships between proteins, we integrated five existing data types available for Lactococcus lactis: protein metabolic networks, microarray data, phylogenetic profiles, distances between proteins, and incomplete two-dimensional-gel data (for which we propose a completion strategy). The predicted relationships then guided our laboratory work, which confirmed most of the predictions. PepF had previously been characterized as a zinc-dependent endopeptidase [Nardi, M, Renault, P, and Monnet, V (1997). "Duplication of the pepF gene and shuffling of DNA fragments on the lactose plasmid of Lactococcus lactis." J. Bacteriol. 179, 4164-4171; Monnet, V, Nardi, M, Chopin, MC, and Gripon, JC (1994). "Biochemical and genetic characterization of PepF, an oligoendopeptidase from Lactococcus lactis." J. Biol. Chem. 269, 32070-32076]. By analyzing a PepF mutant, we confirmed its participation in protein secretion, which the KCCA had predicted through a strong relationship between signal peptidase I and PepF. The global nature of our approach made it possible to discover pleiotropic roles of the protein that had remained unknown using classical approaches.
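To make the core idea of KCCA concrete, the following is a minimal illustrative sketch in Python, not the authors' implementation. It assumes only two data views (the study integrates five with a generalized form of the method), a Gaussian kernel, a ridge-style regularization parameter `kappa`, and synthetic data standing in for real measurements. All function names, parameter values, and the toy "expression" and "profile" matrices are hypothetical.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    """Gaussian kernel matrix between the rows of X (one row per protein)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def center_kernel(K):
    """Center a kernel matrix in feature space: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kcca(K1, K2, kappa=0.1, n_components=2):
    """Regularized two-view kernel CCA (illustrative formulation).

    Solves K1 K2 b = rho (K1 + kappa I)^2 a and the symmetric equation
    for b via an SVD of T = (K1 + kappa I)^-1 K1 K2 (K2 + kappa I)^-1.
    Returns the regularized correlations and the per-protein scores.
    """
    n = K1.shape[0]
    K1, K2 = center_kernel(K1), center_kernel(K2)
    R1 = K1 + kappa * np.eye(n)
    R2 = K2 + kappa * np.eye(n)
    T = np.linalg.solve(R1, K1 @ K2) @ np.linalg.inv(R2)
    U, s, Vt = np.linalg.svd(T)
    a = np.linalg.solve(R1, U[:, :n_components])   # dual weights, view 1
    b = np.linalg.solve(R2, Vt[:n_components].T)   # dual weights, view 2
    return s[:n_components], K1 @ a, K2 @ b

# Synthetic stand-in data: 40 hypothetical proteins sharing one latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 1))
expression = latent @ rng.normal(size=(1, 5)) + 0.3 * rng.normal(size=(40, 5))
profiles = latent @ rng.normal(size=(1, 8)) + 0.3 * rng.normal(size=(40, 8))

rho, scores1, scores2 = kcca(rbf_kernel(expression), rbf_kernel(profiles))
print("regularized canonical correlations:", rho)
```

In this sketch, proteins that receive similar scores on a leading component in both views would be candidates for a functional relationship. Any such reading of real data would depend on the kernels and regularization chosen, and on experimental confirmation of the kind the abstract describes.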