Rational Use of Heterogeneous Data in QSAR Modeling of Cyclooxygenase/Lipoxygenase Inhibitors.

Paradigms and Technologies
Methods Development
Informatics

Abstract

Numerous studies published in recent years report QSAR models of acceptable quality built on heterogeneous data. In many cases, the training sets were assembled from compounds tested in different biological assays, contradicting the view that QSAR modelling should be based on data measured by a single protocol. We attempted to develop approaches that help determine how heterogeneous data should be used to build QSAR models from different sets of compounds tested by different experimental methods against the same target for the same end-point. To this end, more than one hundred QSAR models were created with the GUSAR software, using IC50 values of ligands of cyclooxygenase-1 and -2 (COX-1/2) and seed lipoxygenase (LOX) obtained from the ChEMBL database. The models were tested on an external set of 26 new thiazolidinone derivatives, which were experimentally tested for COX-1/2 and LOX inhibition. The derivatives' IC50 values ranged from 89 to 26 µM for LOX, from 200 to 0.018 µM for COX-1, and from 210 to 1 µM for COX-2. The study showed that model accuracy depends on the distribution of IC50 values of low-activity compounds in the training sets. In most cases, QSAR models built on combined training sets outperformed models based on data from a single publication. We also introduce a new method, called "scaling", for combining quantitative data from different experimental studies on the basis of data for reference compounds.
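The abstract does not spell out how "scaling" is computed, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that each source publication reports an IC50 for a shared reference compound and that values are combined in log units (pIC50 = -log10 of IC50 in mol/L). Each source is then shifted so that its reference compound lands on a common anchor value. The function names `pic50_from_um` and `scale_datasets`, the choice of the median as the default anchor, and the study IDs and numbers in the usage example are all hypothetical.

```python
import math
from statistics import median


def pic50_from_um(ic50_um: float) -> float:
    """Convert an IC50 given in micromolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_um * 1e-6)


def scale_datasets(datasets, anchor_ref_pic50=None):
    """Hypothetical reference-compound scaling across assays.

    datasets maps a source ID to (reference_ic50_um, {compound_id: ic50_um}).
    Every source is shifted in log units so that its reference compound ends
    up at anchor_ref_pic50 (by default, the median reference pIC50 over all
    sources). Returns {(source_id, compound_id): scaled_pic50}.
    """
    ref_pic50 = {src: pic50_from_um(ref) for src, (ref, _) in datasets.items()}
    if anchor_ref_pic50 is None:
        anchor_ref_pic50 = median(ref_pic50.values())
    scaled = {}
    for src, (_, values) in datasets.items():
        # Assumed: a constant log-unit offset per source (equivalent to
        # multiplying that source's IC50 values by a constant factor).
        shift = anchor_ref_pic50 - ref_pic50[src]
        for cid, ic50 in values.items():
            scaled[(src, cid)] = pic50_from_um(ic50) + shift
    return scaled


# Hypothetical example: the reference compound gives 1 µM in study_A and
# 4 µM in study_B, so study_B's assay reads about 4-fold weaker.
datasets = {
    "study_A": (1.0, {"c1": 10.0, "c2": 0.5}),
    "study_B": (4.0, {"c3": 40.0}),
}
scaled = scale_datasets(datasets)
for key, value in sorted(scaled.items()):
    print(key, round(value, 3))
```

In this toy case, c1 and c3 are each ten times weaker than the reference compound in their own assay, so they receive the same scaled pIC50 even though their raw IC50 values differ four-fold. A constant log shift is the simplest assumption; real assays could differ in ways a single reference compound cannot capture.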

Authors

Lagunin, Alexey A; Geronikaki, Athina; Eleftheriou, Phaedra; Pogodin, Pavel; Zakharov, Alexey
