AutoWeka: toward an automated data mining software for QSAR and QSPR studies

Methods Mol Biol. 2015:1260:119-47. doi: 10.1007/978-1-4939-2239-0_8.

Abstract

In biology and chemistry, a key goal is to discover novel compounds with potent biological activity or desirable chemical properties. This can be achieved through a trial-and-error process driven by chemical intuition or through data-driven predictive modeling. The latter rests on the concept of the quantitative structure-activity relationship (QSAR) and the quantitative structure-property relationship (QSPR), which model the biological activity and the chemical properties of compounds, respectively. Data mining is a powerful technology underlying QSAR/QSPR because it extracts knowledge from large volumes of high-dimensional data via multivariate analysis. Although extremely useful, the technicalities of data mining may overwhelm potential users, especially those in the life sciences. Herein, we aim to lower the barriers to accessing and using data mining software for QSAR/QSPR studies. AutoWeka is an automated data mining software tool powered by the widely used machine learning package Weka. The software provides a user-friendly graphical interface along with an automated parameter search capability. It employs two robust and popular machine learning methods: artificial neural networks and support vector machines. This chapter describes the practical usage of AutoWeka and relevant tools in the development of predictive QSAR/QSPR models.

Availability: The software is freely available at http://www.mt.mahidol.ac.th/autoweka.
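
The abstract mentions an automated parameter search but does not describe how it works. As a rough illustration of the general idea only, the sketch below runs an exhaustive grid search scored by k-fold cross-validation in plain Python. It is not AutoWeka's implementation and does not call Weka. The learner is a deliberately simple stand-in (one-descriptor ridge regression) rather than a neural network or support vector machine, and every function name, the `penalty` parameter, and the toy data are assumptions made up for this example.

```python
from itertools import product
from statistics import mean


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k interleaved (train, test) index pairs."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        yield train, test


def fit_ridge_1d(xs, ys, penalty):
    """Stand-in learner (assumption): one-descriptor ridge regression through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)


def cv_error(xs, ys, penalty, k=5):
    """Mean squared prediction error averaged over k cross-validation folds."""
    fold_errors = []
    for train, test in k_fold_indices(len(xs), k):
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], penalty)
        fold_errors.append(mean((ys[i] - w * xs[i]) ** 2 for i in test))
    return mean(fold_errors)


def grid_search(xs, ys, grid, k=5):
    """Score every parameter combination in the grid and return the best one."""
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((cv_error(xs, ys, k=k, **params), params))
    # The combination with the lowest cross-validated error wins.
    return min(results, key=lambda item: item[0])


# Made-up toy data: one descriptor x and an activity y that is roughly 2 * x.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 8.1, 9.0, 9.9]

penalty_grid = {"penalty": [0.0, 0.1, 1.0, 10.0, 100.0]}
best_error, best_params = grid_search(xs, ys, penalty_grid)
print(best_params, round(best_error, 4))
```

For a learner with more than one tunable parameter, further keys added to the grid dictionary are combined exhaustively by `product`, provided the scoring function accepts them as keyword arguments. In this sketch only `penalty` is supported.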

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Mining / methods*
  • Drug Discovery
  • Humans
  • Models, Molecular
  • Neural Networks, Computer*
  • Pharmaceutical Preparations / chemistry*
  • Quantitative Structure-Activity Relationship*
  • Software

Substances

  • Pharmaceutical Preparations