Machine learning and data mining: strategies for hypothesis generation

Mol Psychiatry. 2012 Oct;17(10):956-9. doi: 10.1038/mp.2011.173. Epub 2012 Jan 10.

Abstract

Strategies for generating knowledge in medicine have included observation of associations in clinical or research settings and, more recently, development of pathophysiological models based on molecular biology. Although critically important, these strategies limit hypothesis generation to an incremental pace. Machine learning and data mining are alternative approaches to identifying new vistas to pursue, as is already evident in the literature. In concert with these analytic strategies, novel approaches to data collection can enhance the hypothesis pipeline as well. In data farming, data are obtained in an 'organic' way, in the sense that they are entered by patients themselves and are available for harvesting. In contrast, in evidence farming (EF), it is the provider who enters medical data about individual patients. EF differs from regular electronic medical record systems because frontline providers can use it to learn from their own past experience. In addition to generating large databases with farming approaches, we can likely further harness the power of large data sets, collected using either farming or more standard techniques, by implementing data-mining and machine-learning strategies. Exploiting large databases to develop new hypotheses regarding neurobiological and genetic underpinnings of psychiatric illness is useful in itself, but also affords the opportunity to identify novel mechanisms to be targeted in drug discovery and development.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence*
  • Data Mining*
  • Humans
  • Mental Disorders / diagnosis*
  • Mental Disorders / therapy*
  • Models, Biological*