GA(M)E-QSAR: A novel, fully automatic Genetic-Algorithm-(Meta)-Ensembles approach for binary classification in ligand-based drug design
This publication appears in: Journal of Chemical Information and Modeling
Authors: P. Yunierkis, V. Cosmin Lazar, J. Taminau, M. Froeyen, M. Ángel Cabrera-Pérez and A. Nowé
Volume: 52
Issue: 9
Pages: 2366-2386
Publication Date: Aug. 2012
Abstract: Computer-aided drug design has become an important component of the drug discovery process. Despite
the advances in this field, no single modeling approach can be successfully applied to
solve the whole range of problems faced during QSAR modeling. Feature selection and ensemble
modeling are active areas of research in ligand-based drug design. Here we introduce the GA(M)E-QSAR
algorithm that combines the search and optimization capabilities of Genetic Algorithms with the
simplicity of the Adaboost ensemble-based classification algorithm to solve binary classification
problems. We also explore the usefulness of Meta-Ensembles trained with Adaboost and Voting
schemes to further improve the accuracy, generalization and robustness of the optimal Adaboost Single
Ensemble derived from the Genetic Algorithm optimization. We evaluated the performance of our
algorithm using five data sets from the literature and found that it yields classification results
similar to or better than those reported for these data sets, with a higher enrichment of active
compounds relative to the whole actives subset when only the most active chemicals are considered.
More importantly, we compared our methodology with state-of-the-art feature selection and classification
approaches and found that it can provide highly accurate, robust and generalizable models. In the case of
the Adaboost Ensembles derived from the Genetic Algorithm search, the final models are quite simple
since they consist of a weighted sum of the output of single feature classifiers. Furthermore, the
Adaboost scores can be used as a ranking criterion to prioritize chemicals for synthesis and biological
evaluation after virtual screening experiments.
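
To make the last two points of the abstract concrete, the following is a minimal sketch of an Adaboost ensemble built from single-feature threshold classifiers, whose weighted-sum score is then used to rank compounds. It is not the authors' implementation: the Genetic Algorithm search is replaced here by an exhaustive search over features and thresholds, and the function names (`fit_stump`, `adaboost`, `score`), the toy descriptor values and the +1/-1 label encoding (active/inactive) are all assumptions made for illustration.

```python
import math


def stump_predict(row, j, thr, pol):
    """Single-feature classifier: vote `pol` if feature j >= thr, else -pol."""
    return pol if row[j] >= thr else -pol


def fit_stump(X, y, w):
    """Exhaustively pick the single-feature stump with the lowest weighted error.

    Assumption: this exhaustive search stands in for the Genetic Algorithm
    feature search described in the abstract.
    """
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best


def adaboost(X, y, rounds=5):
    """Train a discrete Adaboost ensemble of single-feature stumps."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        # Clamp the error so the log below stays finite on separable data.
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, pol))
        # Up-weight misclassified compounds, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, thr, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def score(model, row):
    """Adaboost score: weighted sum of the single-feature classifier outputs."""
    return sum(alpha * stump_predict(row, j, thr, pol)
               for alpha, j, thr, pol in model)


# Hypothetical toy data: two descriptors per compound, +1 = active, -1 = inactive.
X = [[0.1, 1.0], [0.2, 0.9], [0.8, 0.2], [0.9, 0.1], [0.7, 0.8], [0.3, 0.3]]
y = [-1, -1, 1, 1, 1, -1]

model = adaboost(X, y, rounds=5)

# Rank compounds by score; the top of the list would be prioritized.
ranked = sorted(range(len(X)), key=lambda i: score(model, X[i]), reverse=True)
```

The sign of `score` gives the predicted class, while its magnitude orders the compounds, which is the sense in which the score could serve as a ranking criterion after virtual screening.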