Machine learning and data mining: strategies for hypothesis generation

M A Oquendo; E Baca-Garcia; A Artés-Rodríguez; F Perez-Cruz; H C Galfalvy; H Blasco-Fontecilla; D Madigan; N Duan

doi:10.1038/mp.2011.173

Mol Psychiatry. 2012 Oct;17(10):956-9. doi: 10.1038/mp.2011.173. Epub 2012 Jan 10.

Authors

M A Oquendo¹, E Baca-Garcia, A Artés-Rodríguez, F Perez-Cruz, H C Galfalvy, H Blasco-Fontecilla, D Madigan, N Duan

Affiliation

¹ Department of Psychiatry, New York State Psychiatric Institute and Columbia University, New York, NY 10032, USA. mao4@columbia.edu

PMID: 22230882

Abstract

Strategies for generating knowledge in medicine have included observation of associations in clinical or research settings and, more recently, development of pathophysiological models based on molecular biology. Although critically important, these strategies limit hypothesis generation to an incremental pace. Machine learning and data mining are alternative approaches to identifying new vistas to pursue, as is already evident in the literature. In concert with these analytic strategies, novel approaches to data collection can enhance the hypothesis pipeline as well. In data farming, data are obtained in an 'organic' way, in the sense that they are entered by patients themselves and are available for harvesting. In contrast, in evidence farming (EF), it is the provider who enters medical data about individual patients. EF differs from regular electronic medical record systems because frontline providers can use it to learn from their own past experience. Beyond the possibility of generating large databases with farming approaches, data-mining and machine-learning strategies are likely to further harness the power of large data sets, whether collected by farming or by more standard techniques. Exploiting large databases to develop new hypotheses regarding neurobiological and genetic underpinnings of psychiatric illness is useful in itself, but also affords the opportunity to identify novel mechanisms to be targeted in drug discovery and development.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Artificial Intelligence*
Data Mining*
Humans
Mental Disorders / diagnosis*
Mental Disorders / therapy*
Models, Biological*