Article

Hepatotoxicity Modeling Using Counter-Propagation Artificial Neural Networks: Handling an Imbalanced Classification Problem

1 National Institute of Chemistry, Hajdrihova 19, 1001 Ljubljana, Slovenia
2 Biotechnical Faculty, University of Ljubljana, Jamnikarjeva 101, 1000 Ljubljana, Slovenia
* Author to whom correspondence should be addressed.
Molecules 2020, 25(3), 481; https://doi.org/10.3390/molecules25030481
Submission received: 31 December 2019 / Revised: 20 January 2020 / Accepted: 21 January 2020 / Published: 23 January 2020
(This article belongs to the Special Issue Integrated QSAR)

Abstract

Drug-induced liver injury is a major concern in the drug development process. Expensive and time-consuming in vitro and in vivo studies do not reflect the complexity of the phenomenon. Complementary to wet-lab methods are in silico approaches, which offer a cost-efficient way to predict toxicity. The aim of our study was to explore the capabilities of counter-propagation artificial neural networks (CPANNs) for the classification of an imbalanced dataset related to idiosyncratic drug-induced liver injury and to develop a model for the prediction of the hepatotoxic potential of drugs. Genetic algorithm optimization of CPANN models was used to build models for the classification of drugs into hepatotoxic and non-hepatotoxic classes using molecular descriptors. For the classification of an imbalanced dataset, we modified the classical CPANN training algorithm by integrating random subsampling into the training procedure of CPANN to improve its classification ability. According to the number of models accepted by internal validation and according to the prediction statistics on the external set, we concluded that using an imbalanced set with balanced subsampling in each learning epoch is a better approach than using a fixed balanced set in the case of the counter-propagation artificial neural network learning methodology.

1. Introduction

The liver is the primary site of xenobiotic metabolism and, as such, is prone to suffering toxic effects. Moreover, it is well known that drug-induced liver injury (DILI) poses one of the main challenges in the drug development process. Aside from lack of efficacy, toxicity is the main reason for the termination of drug development [1], and hepatotoxicity is one of the biggest threats in this process; it is also the main reason for the withdrawal of drugs from the market [2,3]. Therefore, it is important to address the problem in the early stages of drug development. Preclinical stages include obligatory animal testing. However, a study of 3290 approved drugs and formulations [4] suggests that the absence of hepatotoxic effects in animals does not generally predict safety in humans; the study also implies only medium predictive power of a positive result in animals in the case of liver disorders. Apart from moderate predictivity, other reasons for reducing animal testing are ethical concerns and the implementation of the reduction, refinement, and replacement (“3R”) strategy in research [5]. Human primary hepatocytes (hPH) are considered the gold standard for studying hepatotoxicity in vitro, but their major limitation is that they rapidly de-differentiate, and liver-specific functions, such as albumin production and cytochrome P450 expression, decline quickly over the first 24–48 h of culture [6]. Immortalized primary hepatocytes, which suffer from hindered biotransformation capabilities, are therefore frequently used. Alternatives for in vitro toxicity testing include induced pluripotent stem cells [7] (especially in high-throughput screening), while more complex models, such as organoids or liver-on-a-chip systems, are being developed [8,9]. In vitro methods evaluating endpoints related to DILI can achieve a sensitivity of up to 70% [10]. However, while they substitute animal testing, in vitro methods are not truly equivalent to the in vivo system and may also be time-consuming and expensive. In this regard, in silico methods offer a time- and cost-efficient way to address the issue of toxicity early in the drug development process [11]. In recent years, efforts have been made to predict hepatotoxicity using various techniques of quantitative structure–activity relationship (QSAR) modeling. QSAR studies of DILI are based mainly on small datasets. One such study was conducted by Cruz-Monteagudo et al. [12]. They achieved good results using 3D molecular descriptors and a training dataset with 33 compounds in the hepatotoxic class and 41 compounds in the non-hepatotoxic class. Based on a structurally and pharmacologically diverse training dataset, they developed classification models using linear discriminant analysis (LDA), artificial neural networks with radial basis function (RBF) architecture, and the OneR classification algorithm. In the external validation experiment, they achieved a sensitivity of 100% and specificity of 67% using the LDA model, a sensitivity of 67% and specificity of 67% using the RBF model, and a sensitivity of 67% and specificity of 100% using the OneR classifier. A study with a larger number of compounds was conducted by Ekins et al. [13]. In that study, they used extended connectivity fingerprint counts along with a few additional interpretable 2D descriptors for a Bayesian model. Sensitivity and specificity on the external set of 237 compounds were 56% and 67%, respectively.
On a smaller set of 28 structurally similar compounds with different DILI potential, the performance was similar. Fourches et al. [14] developed QSAR models with support vector machines (SVMs) for hepatotoxicity, using a dataset of 531 compounds (248 hepatotoxic, 283 non-hepatotoxic) acquired by text mining, and achieved accuracies from 55.7% to 72.6% in external cross-validation. As discussed by Kotsampasakou et al. [15], data curation is of high importance, and text mining is error-prone and can lower the quality of a dataset. More accurate predictions can be made using ensemble modeling, as was shown in the study of Liew et al. [10]. Their ensemble model, constructed by stacking random forest, k-nearest neighbor, and naive Bayes base models with a naive Bayes combiner, was based on 2D descriptors. While the average sensitivity and specificity of the 617 base classifiers on the external set of 120 compounds were 62.4% and 61.8%, respectively, the ensemble model had 84.5% sensitivity and 65.1% specificity for the 101 objects of the external set that were inside the applicability domain. It is worth noting, however, that the ensemble model could not separate non-toxic compounds from structurally very similar toxic compounds, a problem also encountered in the study of Ekins et al. [13]. The study also suggested that weakly hepatotoxic compounds have a strong influence on the decision boundary and that their removal greatly affects prediction metrics. In a recent study by Wang et al. [16], which used 450 molecules, an ensemble model built mainly on different molecular fingerprints using multiple machine learning algorithms achieved an accuracy of 81.67%, sensitivity of 64.55%, specificity of 96.15%, and an area under the receiver operating characteristic curve (AUC) of 80.35% on the external validation set. A recursive random forest approach with an adjusted decision threshold, applied to a set containing 122 DILI-positive and 932 DILI-negative compounds, was used in the study by Zhu et al. [17] to achieve good sensitivity and specificity (both around 80%) for models each using a different type of structure representation (CDK, MACCS, and Mold2 descriptors). In that way, the authors substantially reduced the number of meaningful descriptors, ending up with 8, 26, and 20 descriptors, respectively. In that study, ensemble modeling also significantly improved performance.
In our study, we employ counter-propagation artificial neural networks (CPANNs) in an attempt to capture the nonlinear relationship between the molecular structure and hepatotoxicity of drugs. CPANNs combine supervised and unsupervised learning strategies and have been successfully applied to different QSAR problems [18,19,20]. Due to the learning strategies used, they are especially suitable for classification problems. To our knowledge, the method has not been used for modeling hepatotoxicity before. The majority of the drugs in the dataset used do not exhibit hepatotoxicity. To deal with the imbalanced dataset, we apply two approaches. The first approach is manual balancing of the training dataset prior to model building with CPANNs. In the second approach, we modify the classical training algorithm of CPANNs by integrating random subsampling into the training procedure to obtain a balanced representation of the two toxicity classes during the training of a CPANN. Molecular descriptors calculated from 2D representations of compounds are used in optimizations of CPANN models by a genetic algorithm. With 148 models passing the internal validation criteria, we construct two consensus models, one using 24 models and the other using 124 models. The first consensus model, based on a manually constructed balanced training dataset, yields a sensitivity and specificity of 0.5 and 0.79, respectively. The second consensus model, obtained using models developed by the modified training algorithm, has a sensitivity and specificity of 0.65 and 0.85, respectively. A simple metric is constructed to identify descriptors important for the prediction of hepatotoxicity.

2. Results and Discussion

2.1. Dataset

After data curation, our dataset contained 524 compounds, gathered from five literature sources. As can be seen in Figure 1, no compounds were selected from just one source, with the exception of the LiverTox database.
The different approaches used to deal with dataset imbalance influenced the selection of compounds into the training set, test set (TE1), internal validation set (TE2), and external validation set. The underlying reason was the difference in the size of the training sets for the two approaches. In the first approach, a balanced number of hepatotoxic and non-hepatotoxic compounds was used in the training set, which consisted of 216 compounds. For the second approach, where the modified CPANN training algorithm was applied, an imbalanced training set of 404 compounds, with 26.7% of the compounds in the hepatotoxic class, was used. Thus, the test set, internal validation set, and external validation set used in the first approach contained a larger number of compounds from the non-hepatotoxic class. The class distribution of compounds among the sets is given in Table 1. The selection of compounds for the validation set was made based on the distribution of compounds on the Kohonen top-map, shown in Figure 2 and Figure 3 for the first and second approach, respectively. The red color in Figure 2 and Figure 3 indicates compounds selected for the validation set. Principal component analysis (PCA) was performed to verify whether it could separate compounds belonging to the hepatotoxic and non-hepatotoxic classes. It was observed that the first few principal components could not separate the two classes of compounds. More details are given in the Supplementary Materials (Supplementary file: Supplementary1.pdf).
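For readers who wish to reproduce this kind of check, the snippet below is a minimal sketch of a PCA class-separation overview, assuming the z-scored descriptors are held in a NumPy array X with binary labels y (1 = hepatotoxic); the variable names and plotting details are illustrative and not part of the original workflow.

```python
# Minimal sketch of the PCA class-separation check described above.
# Assumes z-scored descriptors in a NumPy array X (n_compounds x n_descriptors)
# and binary labels y (1 = hepatotoxic, 0 = non-hepatotoxic); names are illustrative.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def pca_class_overview(X, y, n_components=2):
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(X)
    for label, marker, name in [(1, "o", "hepatotoxic"), (0, "x", "non-hepatotoxic")]:
        mask = np.asarray(y) == label
        plt.scatter(scores[mask, 0], scores[mask, 1], marker=marker, alpha=0.6, label=name)
    plt.xlabel(f"PC1 ({pca.explained_variance_ratio_[0]:.1%} explained variance)")
    plt.ylabel(f"PC2 ({pca.explained_variance_ratio_[1]:.1%} explained variance)")
    plt.legend()
    plt.show()
    return scores
```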

2.2. Models

Optimizations of CPANNs were made using the manually balanced training set in the first approach and using the modified CPANN training algorithm in the second approach. The optimizations for the first approach resulted in 24 models that passed our criteria, i.e., with sensitivity and specificity on training, test set TE1, and test set TE2 of at least 0.7, where test set TE2 was used as the internal validation set. The second approach resulted in 124 models passing the same criteria. The results for individual models are given in Table 2, Table 3 and Table 4, where each table presents the results for one of the three optimization criteria used. The averages of sensitivity and specificity calculated based on the results of individual models for the first approach were 0.55 and 0.71, respectively, while for the second approach, the averages were 0.63 and 0.67.
Using the models developed by the first and second approach, consensus predictions were made for the external validation set compounds. For the first approach, where the manually balanced training set was used, the obtained sensitivity and specificity of the consensus predictions were 0.5 and 0.79, respectively. In the second approach, where models were developed by the modified CPANN training algorithm, consensus predictions for the external validation set resulted in a sensitivity of 0.65 and specificity of 0.85. Compared with the averages calculated from the individual models, the consensus predictions based on the models from the second approach showed slightly increased sensitivity and significantly improved specificity. The consensus predictions were made both with and without considering the applicability domain of the models, according to the method described by Minovski et al. [18]; in both cases, the consensus predictions were the same for all objects.
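As an illustration only, the snippet below sketches one way to form such consensus predictions; the exact aggregation rule used in the study is not spelled out above, so a simple majority vote over the accepted models is assumed here.

```python
# Illustrative consensus prediction over a set of accepted CPANN models.
# A simple majority vote is assumed; this is not necessarily the aggregation
# rule used in the original study.
import numpy as np

def consensus_predict(model_predictions):
    """model_predictions: array of shape (n_models, n_compounds) with 0/1 class labels."""
    votes = np.asarray(model_predictions)
    # A compound is predicted hepatotoxic when more than half of the models predict class 1.
    return (votes.mean(axis=0) > 0.5).astype(int)
```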

2.3. Descriptors

Since the composition of the datasets and descriptors differed between the two approaches used for handling the imbalanced dataset, the importance of the selected descriptors was also assessed separately. In the first approach, where 24 models were selected, 143 unique descriptors were selected at least once, while in the second approach, 218 unique descriptors were used at least once. To assess the importance of the selected descriptors, Equation (1) was used:
Id = (1/nm)∙Σ(s∙a/n), (1)
In Equation (1), the sum runs over all nm models; nm is the number of models in which a descriptor could potentially be selected, s is 1 if the descriptor is selected in a model and 0 if it is not, n is the number of all descriptors selected in that model, and a is the number of all descriptors that could be selected by the model (either 50, 98, or 181).
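As a worked illustration of Equation (1), the short function below computes Id for one descriptor; the per-model data structure (a set of selected descriptors and the descriptor pool offered to the model) is hypothetical and only serves to make the calculation explicit.

```python
# Sketch of the descriptor-importance metric in Equation (1).
# `models` is assumed to be a list of dicts, one per accepted model, e.g.
#   {"selected": {"H%", "JGI6", ...}, "pool": {...}}
# where "selected" holds the descriptors chosen by that model and "pool" holds
# the descriptors it could choose from (50, 98, or 181 of them).
def descriptor_importance(descriptor, models):
    # Only models whose descriptor pool offered this descriptor are counted (n_m).
    offered = [m for m in models if descriptor in m["pool"]]
    n_m = len(offered)
    total = 0.0
    for model in offered:
        s = 1 if descriptor in model["selected"] else 0
        n = len(model["selected"])      # number of descriptors selected by this model
        a = len(model["pool"])          # number of descriptors the model could choose from
        total += s * a / n
    return total / n_m if n_m else 0.0
```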
For both approaches, we removed descriptors that appeared fewer than four times in the accepted models, to lower the chance that a descriptor was selected merely at random. Descriptors reaching an Id of at least 2 are given in Table 5 and Table 6 for the first and second approach, respectively. Focusing on the second modeling approach, which we assumed to be superior, two of the most important descriptors were JGI descriptors (Table 6), which represent the total charge transfer at topological distances 6 and 4. Interestingly, the descriptor H%, which represents the percentage of H atoms in a molecule, ranked highly in both tables. Another interpretable descriptor scoring high in the second approach was the Uc descriptor, which is correlated with the number of unsaturated bonds in a molecule.
It was difficult to draw conclusions about the role of the listed descriptors in hepatotoxicity. However, we plan to shed light on the effect of the relevant descriptors in the future. To do so, we plan to improve the counter-propagation neural network methodology to better discriminate compounds found in activity cliffs, which are frequent in the dataset used. Hopefully, future work will lead to a better understanding of the mechanisms of drug-induced liver injury.

3. Materials and Methods

3.1. Dataset

The dataset used in the study was compiled using data from five literature sources related to drug-induced liver injury (DILI) [12,21,22,23,24]. The names of the compounds in the literature sources were used to extract SMILES from PubChem [25]. From the collected data, we removed chemical entities such as proteins, larger peptides, inorganic compounds, compounds containing elements that were very uncommon in the dataset, and mixtures of compounds. We also removed compounds that are found in drugs whose only route of administration is topical according to DrugBank [26]. Compounds were, where applicable, protonated or deprotonated in Pipeline Pilot [27] to achieve canonical SMILES representations of the molecules. Finally, duplicates with the same active pharmaceutical ingredient were removed, while their toxicity class information was retained. Three sources [21,23,24] tackled the DILI problem with more than two classes. We assigned compounds to two classes, DILI-positive and DILI-negative, according to the following rules. From source [21], most-DILI-concern and less-DILI-concern compounds were considered DILI-positive, and no-DILI-concern compounds were considered DILI-negative. The toxicity classes “severe DILI” and “non-severe DILI” from source [23] were considered DILI-positive, while the non-DILI class was considered DILI-negative. Compounds found in [24] were considered DILI-positive if they belonged to class A or B and DILI-negative if they belonged to class E. Classes C, D, and E* from that source were not assigned to either the DILI-positive or the DILI-negative class, since they represent compounds with a lower likelihood of being toxic or nontoxic; in a way, such compounds represent unreliable data. Next, we checked whether there were any conflicts in toxicity classes in the data from different sources. For example, if a compound was present in two of the five sources and classified as DILI-positive in both, we assigned the DILI-positive class to it. Similarly, if a compound was present in three sources and classified as DILI-negative in two of them and as D in LiverTox, the compound was not considered for further use (possible conflict of classes). From source [22], we extracted only compounds that had been manually checked for DILI (named the calibration set in the source). In this way, we obtained the 524 compounds used in our study.
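The mapping of the per-source annotations to the two classes can be summarized in a few lines of code. The sketch below is only an illustration of the rules listed above: the source names, label strings, and data structures are hypothetical, and the conflict handling is simplified (in the study, an otherwise consistent label could also be discarded when it clashed with an ambiguous LiverTox class, as in the DILI-negative/class D example).

```python
# Simplified sketch of harmonizing per-source DILI annotations into two classes.
# Source names, label strings, and the tuple-based input are illustrative only.
POSITIVE, NEGATIVE, UNASSIGNED = "DILI-positive", "DILI-negative", None

def map_label(source, label):
    if source == "DILIrank":        # ref. [21]
        return {"most-DILI-concern": POSITIVE, "less-DILI-concern": POSITIVE,
                "no-DILI-concern": NEGATIVE}.get(label, UNASSIGNED)
    if source == "Zhang2016":       # ref. [23]
        return {"severe DILI": POSITIVE, "non-severe DILI": POSITIVE,
                "non-DILI": NEGATIVE}.get(label, UNASSIGNED)
    if source == "LiverTox":        # ref. [24]; classes C, D, and E* stay unassigned
        return {"A": POSITIVE, "B": POSITIVE, "E": NEGATIVE}.get(label, UNASSIGNED)
    return UNASSIGNED

def harmonize(annotations):
    """annotations: list of (source, label) pairs for one compound."""
    labels = {map_label(src, lab) for src, lab in annotations} - {UNASSIGNED}
    # Keep the compound only if all sources that give a usable label agree;
    # the study's additional veto by ambiguous LiverTox classes is omitted here.
    return labels.pop() if len(labels) == 1 else None
```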

3.2. Selection of Descriptors and Datasets

The descriptors used to represent the compounds were 0-, 1-, and 2-dimensional descriptors calculated using the Dragon [28] software. The selection of descriptors for modeling was made in two steps. The first step involved the removal of correlated and constant descriptors, and in the second step, the descriptors were reduced using Kohonen mapping. In the paper by Topliss and Edwards [29], where multiple regression models were studied, a limit of 0.8 for the pairwise correlation coefficient was suggested. However, we did not want to remove too many descriptors in the first step, so we used a higher limit: descriptors with a pairwise correlation coefficient below 0.95 were kept. Descriptors with more than 70% constant values in the entire set were discarded. In that way, we obtained 580 descriptors encoding a dataset of 524 compounds. The descriptors were normalized using z-scores. The compounds were mapped onto a 2-dimensional space using a Kohonen artificial neural network to select the subsets used for the modeling. First, the objects for the external validation set were selected from the Kohonen top-map. We tried to maximize the covered chemical space while also attempting to have the validation set objects reflect the density of all objects on the map, addressing important aspects of diversity and closeness of sets [30]. Since we used two different approaches for dealing with the imbalanced dataset, each requiring a training set of a different size, we also used validation sets of different sizes. After the exclusion of the external validation set, the descriptors were again checked for constant values. Then, the number of descriptors was reduced. Kohonen maps of different sizes (5 × 5, 7 × 7, 10 × 10) were used to map the objects of the transposed data matrix. Kohonen top-maps were obtained, from which the closest and farthest objects (descriptors), considering the Euclidean distance to the neuron, were selected from each neuron. This resulted in sets containing 50, 98, and 181 descriptors used for modeling. The remaining compounds (excluding the validation set) were again mapped on the Kohonen top-map, this time using the 50, 98, and 181 descriptors. The training set and two test sets (denoted TE1 and TE2) were selected from each top-map. Again, we tried to cover the chemical space for each of the three sets, taking the density of objects on the map into account to some degree. However, if a neuron was excited by up to two objects, they were preferentially placed in the training set. Descriptor values for all sets were normalized with z-scores, using the mean and standard deviation of the training set. Test set TE1 was used for model optimization; test set TE2 was used to estimate the performance of the models and to select models, which were then externally validated using the external validation set.
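The first reduction step (constant-value and correlation filtering followed by z-scoring) can be sketched as follows. This is a minimal illustration assuming the Dragon descriptors are loaded into a pandas DataFrame X; it does not cover the subsequent Kohonen-map-based reduction, and the exact order and implementation of the filters in the study may differ.

```python
# Sketch of the first descriptor-reduction step: drop descriptors whose most frequent
# value covers more than 70% of compounds, drop one descriptor from every pair with a
# pairwise correlation of 0.95 or higher, and z-score the remainder.
import numpy as np
import pandas as pd

def prefilter_descriptors(X: pd.DataFrame, max_constant_fraction=0.7, max_corr=0.95):
    # Remove near-constant descriptors.
    keep = [c for c in X.columns
            if X[c].value_counts(normalize=True).iloc[0] <= max_constant_fraction]
    X = X[keep]
    # Remove highly correlated descriptors (one descriptor of each correlated pair is kept).
    corr = X.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    drop = [c for c in upper.columns if (upper[c] >= max_corr).any()]
    X = X.drop(columns=drop)
    # Z-score normalization; in the study, the training-set mean and standard deviation are used.
    return (X - X.mean()) / X.std(ddof=0)
```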
Information about the compounds and descriptors used are available in Supplementary Files “distribution_of_compounds_into_sets.xlsx” and “compound_descriptors.xlsx”. The workflow of the modeling approach used is schematically presented in Figure 4.

3.3. Models

When developing the classification models, we used three different sets of descriptors, as explained in Section 3.2. Due to the imbalanced nature of our dataset, which contains more non-toxic compounds than toxic ones, we also considered training with a balanced training set and with an imbalanced training set. In that way, three different combinations of training and test sets were obtained for each of the two validation sets. The balanced training set contained 108 objects from each class, while the imbalanced training set was made up of 108 DILI-positive and 296 DILI-negative compounds. The imbalanced training set was further used in two ways. The first way was to use all objects in the training set in each epoch of training. The second way, addressing the imbalance between the 108 DILI-positive and 296 DILI-negative compounds, was to use a balanced subset of 216 compounds in each epoch of training, consisting of all 108 DILI-positive compounds and 108 randomly selected DILI-negative compounds. Training was performed using the counter-propagation artificial neural network. The first way of dealing with the imbalanced training set (i.e., not dealing with it) yielded no satisfactory results, so further in the text, the first approach refers to the fixed balanced training set and the second approach refers to the imbalanced training set used with balanced subsampling in each epoch of training.

3.4. Optimization of Models Using Genetic Algorithm

We applied a genetic algorithm for the optimization of models. Optimizations were performed on test set TE1 or on both the training set and test set TE1 together, depending on the selected optimization criterion. We performed 7020 optimizations in total, each of them having a unique combination of an optimization criterion, a parameter describing the effect of the number of selected descriptors on the cost function, the number of neurons in the network, and the parameters used for the genetic algorithm, such as the number of chromosomes for crossover. We selected three optimization criteria to optimize the models. The selected optimization criteria (OC1, OC2, OC3) are written in Equations (2)–(4). The optimization criteria were based on the Matthews correlation coefficient (MCC) of test set TE1, the minimum of the sensitivity and specificity of test set TE1, and the product of the MCC of the training set and the MCC of test set TE1. The effect of the number of selected descriptors (Nselected) on the optimization criterion was taken into account using Equation (5), where p is a parameter defined prior to each optimization and Ndescriptors is the total number of descriptors used in the training set. The Matthews correlation coefficient, sensitivity, and specificity were calculated according to Equations (6)–(8), respectively. In Equations (6)–(8), TP, TN, FP, and FN indicate the number of true positive, true negative, false positive, and false negative predictions, respectively.
From all optimizations, we selected models where the sensitivity and specificity for all three sets (training, test TE1, and test TE2) were at least 0.7 and all three MCC values were at least 0.5. Additionally, if any of the best three chromosomes during an optimization had average sensitivity and specificity of at least 0.7 for all three sets over the last 20 generations of the optimization, all three models were also selected. Afterwards, we trained each selected model 100 times using a random order of training objects. Models with an average sensitivity and average specificity of 0.7 or higher for all three sets were finally selected as acceptable models.
OC1 = min(sensitivity(test set TE1), specificity(test set TE1))∙f, (2)
OC2 = MCC(test set TE1)∙f, (3)
OC3 = MCC(training set)∙MCC(test set TE1)∙f, (4)
f = 1 − p∙(Nselected − 1)/Ndescriptors, (5)
MCC = (TP∙TN − FP∙FN)/((TP + FP)∙(TP + FN)∙(TN + FP)∙(TN + FN))^(1/2), (6)
sensitivity = TP/(TP + FN), (7)
specificity = TN/(TN + FP). (8)
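The sketch below transcribes Equations (2)–(8) into simple Python helper functions; it is a direct illustration of the formulas, not the code used in the study.

```python
# Helper functions transcribing Equations (2)-(8); illustrative only.
import math

def mcc(tp, tn, fp, fn):
    # Equation (6): Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def sensitivity(tp, fn):
    return tp / (tp + fn)                                # Equation (7)

def specificity(tn, fp):
    return tn / (tn + fp)                                # Equation (8)

def descriptor_penalty(n_selected, n_descriptors, p):
    return 1.0 - p * (n_selected - 1) / n_descriptors    # Equation (5)

def oc1(sens_te1, spec_te1, f):
    return min(sens_te1, spec_te1) * f                   # Equation (2)

def oc2(mcc_te1, f):
    return mcc_te1 * f                                   # Equation (3)

def oc3(mcc_train, mcc_te1, f):
    return mcc_train * mcc_te1 * f                       # Equation (4)
```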

3.5. Kohonen and Counter-Propagation Artificial Neural Networks

Classification models were developed using counter-propagation artificial neural networks (CPANNs). Detailed descriptions of CPANNs can be found in the literature [19,20]. Here, a brief explanation of the training algorithm is given with a description of modifications made to the algorithm that was used when the number of objects in the two target classes of the training dataset was considerably different.
Counter-propagation artificial neural network models can be described as a 3D matrix of weights divided into two layers. A single column of weights in each layer represents one neuron. The upper layer of neurons in CPANNs is known as the Kohonen layer, and the layer beneath is the output layer, also known as the Grossberg layer. The weights in the Kohonen layer correspond to independent variables of objects (e.g., molecular descriptors), and the weights in the output layer correspond to dependent variables (e.g., target toxicity classes) of the input objects.
During the training, competitive learning is used in the Kohonen layer. The independent variables of an input object are compared to all the neurons in the Kohonen layer and the neuron that is the most similar to the object is selected as the central neuron, often called the “winning neuron”. The similarity between an object and a neuron is determined by calculating the Euclidean distance between the neuron weights and independent variables of the object. The neuron with the shortest Euclidean distance is selected as the central neuron. After the central neuron is determined, the corrections of the weights can be made. The corrections depend on a neighborhood function, learning rate, and differences between the current neuron weights and the corresponding variable of the input object. The neighborhood function defines the amount of correction made at a given topological distance from the central neuron. In our study, a triangular neighborhood function was used, where the correction was maximal on the central neuron and decreased with topological distance from the central neuron. The neighborhood function also decreased during the training so that at the end of the training only the weights corresponding to the central neuron were corrected. Learning rate, η(t), linearly decreased during the training according to Equation 9, with the largest value (vmax) in the first iteration (t = 1) of the training and the smallest value (vmin) at the end of the training (t = tmax):
η(t) = vmin + (vmax − vmin)∙(tmax − t)/(tmax − 1). (9)
The weights are corrected so that the weights in the Kohonen layer become more similar to the independent variables, and the weights in the output layer become more similar to the target values of the input object. The weights are updated according to Equation (10), where the second term in the sum represents the correction made at a given iteration t:
newwi,j = oldwi,j + η(t)∙a(c,j,t)∙(xi − oldwi,j). (10)
In Equation (10), newwi,j and oldwi,j represent the new and previous values of the weight on neuron j, corresponding to variable i of the input object; xi represents the actual value of the variable i of the input object, and a(c,j,t) is the value of neighborhood function for neuron j and central neuron c.
The training lasts for a predefined number of epochs, where one epoch means that each object from the training set is used exactly once for the training. Here, we describe the modification we made to the training procedure for the case when the number of objects in one class is considerably larger than in the other, i.e., when an imbalanced dataset is used for training. Normally, all input objects are used in one epoch of training. Due to the imbalanced data, we performed a random subsampling of the input objects without repetition prior to each epoch of training, so that the subsample contained a balanced number of objects from each class. One epoch in such a case is therefore equal to the number of training iterations in which each object from the subsample is used exactly once. In practice, this means that in each epoch the neural network is trained with a number of objects equal to the subsample size and that a different subsample is used in each epoch. In this way, we still used all data from the training set; however, objects from the majority class were used less frequently during the training than those from the minority class.
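To make the procedure concrete, the following is a minimal sketch of a CPANN with the per-epoch balanced subsampling described above. It is a simplified illustration, not the authors' implementation: the map geometry, weight initialization, neighborhood-radius schedule, and the default learning-rate limits (v_max, v_min) are assumptions, and the class and variable names are illustrative.

```python
# Minimal sketch of a counter-propagation ANN with per-epoch balanced subsampling.
import numpy as np

class CPANN:
    def __init__(self, n_rows, n_cols, n_inputs, n_outputs, epochs=100,
                 v_max=0.5, v_min=0.01, seed=0):
        self.rng = np.random.default_rng(seed)
        self.shape = (n_rows, n_cols)
        self.kohonen = self.rng.random((n_rows, n_cols, n_inputs))   # Kohonen (input) layer
        self.output = self.rng.random((n_rows, n_cols, n_outputs))   # Grossberg (output) layer
        self.epochs, self.v_max, self.v_min = epochs, v_max, v_min

    def _winner(self, x):
        # Central ("winning") neuron: smallest Euclidean distance to the input object.
        d = np.linalg.norm(self.kohonen - x, axis=2)
        return np.unravel_index(np.argmin(d), self.shape)

    def _neighborhood(self, center, radius):
        # Triangular neighborhood: 1 at the central neuron, decreasing linearly to 0 at `radius`.
        rows, cols = np.indices(self.shape)
        dist = np.maximum(np.abs(rows - center[0]), np.abs(cols - center[1]))
        return np.clip(1.0 - dist / max(radius, 1e-9), 0.0, 1.0)

    def fit(self, X, y_onehot, classes):
        X, y_onehot, classes = np.asarray(X), np.asarray(y_onehot), np.asarray(classes)
        idx_pos = np.where(classes == 1)[0]          # minority (hepatotoxic) objects
        idx_neg = np.where(classes == 0)[0]          # majority (non-hepatotoxic) objects
        n_min = min(len(idx_pos), len(idx_neg))
        t, t_max = 1, self.epochs * 2 * n_min        # total number of training iterations
        max_radius = max(self.shape)
        for _ in range(self.epochs):
            # Balanced subsample drawn without repetition before each epoch:
            # all minority-class objects plus an equal number of majority-class objects.
            subsample = np.concatenate([
                self.rng.choice(idx_pos, n_min, replace=False),
                self.rng.choice(idx_neg, n_min, replace=False)])
            self.rng.shuffle(subsample)
            for i in subsample:
                eta = self.v_min + (self.v_max - self.v_min) * (t_max - t) / (t_max - 1)  # Eq. (9)
                radius = max_radius * (t_max - t) / t_max   # shrinks toward the central neuron only
                c = self._winner(X[i])
                a = self._neighborhood(c, radius)[..., None]
                self.kohonen += eta * a * (X[i] - self.kohonen)       # Eq. (10), Kohonen layer
                self.output += eta * a * (y_onehot[i] - self.output)  # same update, output layer
                t += 1

    def predict(self, X):
        return np.array([np.argmax(self.output[self._winner(x)]) for x in X])
```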

3.6. Genetic Algorithm

A genetic algorithm was used for the optimization of the counter-propagation artificial neural network models. The genetic algorithm was employed for the selection of the descriptors used in a model and for the optimization of the learning rate parameters. A description of genetic algorithms is given in the literature [31]. Genetic algorithm optimization starts with the creation of an initial random population of chromosomes. Chromosomes may define the features to be used in a model and other parameters that may be involved in the development of the model. Using the information encoded in the chromosomes, models are built for each chromosome and ranked according to a preselected optimization criterion. A new population of chromosomes is then created by mating chromosomes with good optimization criterion values. Mutation operators are applied to the new population of chromosomes to introduce random genetic alterations. The created population of chromosomes is again used for the creation of new models, and all the steps repeat until a predefined number of generations is reached or no further improvement in the optimization criterion is observed.
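The loop below is a generic sketch of such a genetic algorithm for binary descriptor selection; the chromosome encoding, operator settings, and stopping rule are illustrative rather than the study's exact setup, and the genes encoding the learning-rate parameters are omitted for brevity.

```python
# Generic sketch of a genetic algorithm for binary descriptor selection (illustrative).
import numpy as np

def genetic_algorithm(fitness, n_genes, pop_size=30, generations=100,
                      crossover_rate=0.8, mutation_rate=0.01, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.random((pop_size, n_genes)) < 0.1       # binary chromosomes: True = descriptor used
    for _ in range(generations):
        scores = np.array([fitness(chrom) for chrom in pop])
        pop = pop[np.argsort(scores)[::-1]]           # rank by the optimization criterion
        new_pop = [pop[0].copy()]                     # elitism: keep the best chromosome
        while len(new_pop) < pop_size:
            # Mate two of the better-ranked chromosomes (single-point crossover).
            p1, p2 = pop[rng.integers(0, pop_size // 2, size=2)]
            child = p1.copy()
            if rng.random() < crossover_rate:
                cut = rng.integers(1, n_genes)
                child[cut:] = p2[cut:]
            # Mutation: randomly flip individual genes.
            flip = rng.random(n_genes) < mutation_rate
            child[flip] = ~child[flip]
            new_pop.append(child)
        pop = np.array(new_pop)
    scores = np.array([fitness(chrom) for chrom in pop])
    return pop[np.argmax(scores)]
```

In the context of this study, the fitness of a chromosome would correspond to one of the optimization criteria OC1–OC3, computed from a CPANN trained with the descriptors that the chromosome switches on.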

4. Conclusions

Drug-induced liver injury (DILI) is one of the major concerns in the drug development process. Using in silico approaches, one may efficiently evaluate the possible hepatotoxic effects of drugs. Dataset imbalance and activity cliffs present major obstacles to developing useful QSAR models, and we encountered both difficulties when modeling DILI using counter-propagation artificial neural networks (CPANNs). The modification of the classical training algorithm presented in this work, which integrates random subsampling into the CPANN training procedure, was intended to solve the problem of dataset imbalance. A number of optimizations of CPANNs were performed using a genetic algorithm to find acceptable models. Based on the comparison of the results, we conclude that using an imbalanced set with balanced subsampling in each learning epoch is a better approach than using a fixed balanced set in the case of the counter-propagation artificial neural network learning methodology. The models obtained using the modified CPANN training algorithm were applied for consensus predictions and showed a sensitivity of 0.65 and specificity of 0.85 for the compounds in the external validation set.

Supplementary Materials

The following are available online: Supplementary1.pdf (results of the principal component analysis), compound_descriptors.xlsx, and distribution_of_compounds_into_sets.xlsx.

Author Contributions

Conceptualization, B.B. and V.D.; methodology, B.B.; formal analysis, B.B.; investigation, B.B.; data curation, B.B.; writing—original draft preparation, B.B. and V.D.; visualization, B.B.; supervision, V.D. All authors have read and agreed to the published version of the manuscript.

Funding

The work was funded by Javna Agencija za Raziskovalno Dejavnost Republike Slovenije (ARRS; Slovenian Research Agency) through PhD study grant for young researchers (B.B.) and through research program P1-0017 (V.D.).

Acknowledgments

The authors thank Prof. Dr. Marjana Novič for discussions regarding QSAR modeling approaches.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Waring, M.J.; Arrowsmith, J.; Leach, A.R.; Leeson, P.D.; Mandrell, S.; Owen, R.M.; Pairaudeau, G.; Pennie, W.D.; Pickett, S.D.; Wang, J.; et al. An analysis of the attrition of drug candidates from four major pharmaceutical companies. Nat. Rev. Drug Discov. 2015, 14, 475–486.
2. Watkins, P.B. Drug Safety Sciences and the Bottleneck in Drug Development. Clin. Pharmacol. Ther. 2011, 89, 788–790.
3. Onakpoya, I.J.; Heneghan, C.J.; Aronson, J.K. Post-marketing withdrawal of 462 medicinal products because of adverse drug reactions: A systematic review of the world literature. BMC Med. 2016, 14, 10.
4. Clark, M.; Steger-Hartmann, T. A big data approach to the concordance of the toxicity of pharmaceuticals in animals and humans. Regul. Toxicol. Pharmacol. 2018, 96, 94–105.
5. Fenwick, N.; Griffin, G.; Gauthier, C. The welfare of animals used in science: How the “Three Rs” ethic guides improvements. Can. Vet. J. 2009, 50, 523–530.
6. Soldatow, V.Y.; Lecluyse, E.L.; Griffith, L.G.; Rusyn, I. In vitro models for liver toxicity testing. Toxicol. Res. (Camb.) 2013, 2, 23–39.
7. Grimm, F.A.; Iwata, Y.; Sirenko, O.; Bittner, M.; Rusyn, I. High-content assay multiplexing for toxicity screening in induced pluripotent stem cell-derived cardiomyocytes and hepatocytes. Assay Drug Dev. Technol. 2015, 13, 529–546.
8. Pettinato, G.; Lehoux, S.; Ramanathan, R.; Salem, M.M.; He, L.X.; Muse, O.; Flaumenhaft, R.; Thompson, M.T.; Rouse, E.A.; Cummings, R.D.; et al. Generation of fully functional hepatocyte-like organoids from human induced pluripotent stem cells mixed with endothelial cells. Sci. Rep. 2019, 9, 1–21.
9. Vernetti, L.A.; Senutovitch, N.; Boltz, R.; DeBiasio, R.; Ying Shun, T.; Gough, A.; Taylor, D.L. A human liver microphysiology platform for investigating physiology, drug safety, and disease models. Exp. Biol. Med. 2016, 241, 101–114.
10. Liew, C.Y.; Lim, Y.C.; Yap, C.W. Mixed learning algorithms and features ensemble in hepatotoxicity prediction. J. Comput. Aided Mol. Des. 2011, 25, 855–871.
11. Przybylak, K.R.; Cronin, M.T.D. In silico models for drug-induced liver injury - Current status. Expert Opin. Drug Metab. Toxicol. 2012, 8, 201–217.
12. Cruz-Monteagudo, M.; Cordeiro, M.N.D.S.; Borges, F. Computational chemistry approach for the early detection of drug-induced idiosyncratic liver toxicity. J. Comput. Chem. 2008, 29, 533–549.
13. Ekins, S.; Williams, A.J.; Xu, J.J. A predictive ligand-based Bayesian model for human drug-induced liver injury. Drug Metab. Dispos. 2010, 38, 2302–2308.
14. Fourches, D.; Barnes, J.C.; Day, N.C.; Bradley, P.; Reed, J.Z.; Tropsha, A. Cheminformatics analysis of assertions mined from literature that describe drug-induced liver injury in different species. Chem. Res. Toxicol. 2010, 23, 171–183.
15. Kotsampasakou, E.; Montanari, F.; Ecker, G.F. Predicting drug-induced liver injury: The importance of data curation. Toxicology 2017, 389, 139–145.
16. Wang, Y.; Xiao, Q.; Chen, P.; Wang, B. In silico prediction of drug-induced liver injury based on ensemble classifier method. Int. J. Mol. Sci. 2019, 20, 4106.
17. Zhu, X.-W.; Li, S.-J. In Silico Prediction of Drug-Induced Liver Injury Based on Adverse Drug Reaction Reports. Toxicol. Sci. 2017, 158, 391–400.
18. Minovski, N.; Župerl, Š.; Drgan, V.; Novič, M. Assessment of applicability domain for multivariate counter-propagation artificial neural network predictive models by minimum Euclidean distance space analysis: A case study. Anal. Chim. Acta 2013, 759, 28–42.
19. Zupan, J.; Novič, M.; Gasteiger, J. Neural networks with counter-propagation learning strategy used for modelling. Chemom. Intell. Lab. Syst. 1995, 27, 175–187.
20. Zupan, J.; Novič, M.; Ruisánchez, I. Kohonen and counterpropagation artificial neural networks in analytical chemistry. Chemom. Intell. Lab. Syst. 1997, 38, 1–23.
21. Chen, M.; Suzuki, A.; Thakkar, S.; Yu, K.; Hu, C.; Tong, W. DILIrank: The largest reference drug list ranked by the risk for developing drug-induced liver injury in humans. Drug Discov. Today 2016, 21, 648–653.
22. Zhu, X.; Kruhlak, N.L. Construction and analysis of a human hepatotoxicity database suitable for QSAR modeling using post-market safety data. Toxicology 2014, 321, 62–72.
23. Zhang, J.; Doshi, U.; Suzuki, A.; Chang, C.W.; Borlak, J.; Li, A.P.; Tong, W. Evaluation of multiple mechanism-based toxicity endpoints in primary cultured human hepatocytes for the identification of drugs with clinical hepatotoxicity: Results from 152 marketed drugs with known liver injury profiles. Chem. Biol. Interact. 2016, 255, 3–11.
24. Hoofnagle, J.H.; Serrano, J.; Knoben, J.E.; Navarro, V.J. LiverTox: A website on drug-induced liver injury. Hepatology 2013, 57, 873–874.
25. Kim, S.; Chen, J.; Cheng, T.; Gindulyte, A.; He, J.; He, S.; Li, Q.; Shoemaker, B.A.; Thiessen, P.A.; Yu, B.; et al. PubChem 2019 update: Improved access to chemical data. Nucleic Acids Res. 2019, 47, D1102–D1109.
26. Wishart, D.S.; Feunang, Y.D.; Guo, A.C.; Lo, E.J.; Marcu, A.; Grant, J.R.; Sajed, T.; Johnson, D.; Li, C.; Sayeeda, Z.; et al. DrugBank 5.0: A major update to the DrugBank database for 2018. Nucleic Acids Res. 2018, 46, D1074–D1082.
27. BIOVIA Pipeline Pilot (Release 2014); Dassault Systèmes: San Diego, CA, USA, 2016.
28. Mauri, A.; Consonni, V.; Pavan, M.; Todeschini, R. DRAGON software: An easy approach to molecular descriptor calculations. MATCH Commun. Math. Comput. Chem. 2006, 56, 237–248.
29. Topliss, J.G.; Edwards, R.P. Chance Factors in Studies of Quantitative Structure-Activity Relationships. J. Med. Chem. 1979, 22, 1238–1244.
30. Golbraikh, A.; Tropsha, A. Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection. J. Comput. Aided Mol. Des. 2002, 16, 357–369.
31. Leardi, R. Nature-Inspired Methods in Chemometrics: Genetic Algorithms and Artificial Neural Networks; Elsevier: Amsterdam, The Netherlands, 2003; ISBN 9780444513502.
Figure 1. Venn diagram of selected compounds. Compounds in our dataset that we classified either as DILI-positive or DILI-negative were extracted from different literature sources. Only LiverTox contained compounds that were not present in any of the other sources.
Figure 2. Selection of validation set compounds for the first approach using the Kohonen top-map.
Figure 3. Selection of validation set compounds for the second approach using the Kohonen top-map.
Figure 4. Modeling workflow. A total of 1259 Dragon descriptors were calculated for our dataset. The number of descriptors was reduced based on pairwise correlation and the frequency of the most common value. For the first modeling approach, where selection of the training set resulted in balanced classes, 102 validation objects were removed. For the second approach, where selection of the training set resulted in imbalanced classes, 40 objects were removed. For both approaches, the number of descriptors was further reduced: 50, 98, and 181 descriptors were selected. Based on the selected descriptors, objects for the training set (TR), test set 1 (TE1), and test set 2 (TE2) were selected. We applied the genetic algorithm for optimization of the models. Models passing the criteria were selected for internal validation with TE2. Lastly, models passing internal validation were externally validated with the corresponding validation set.
Table 1. Distribution of compounds into classes for the datasets used.
Ninit | Set | TR (Hepat./Non-H.) | TE1 (Hepat./Non-H.) | TE2 (Hepat./Non-H.) | VA (Hepat./Non-H.)
182 | a | 108/296 | 20/20 | 20/20 | 20/20
182 | b | 108/108 | 20/83 | 20/83 | 20/82
98 | a | 108/296 | 20/20 | 20/20 | 20/20
98 | b | 108/108 | 20/83 | 20/83 | 20/82
50 | a | 108/296 | 20/20 | 20/20 | 20/20
50 | b | 108/108 | 20/83 | 20/83 | 20/82
TR—training set; TE1—the first test set; TE2—the second test set; VA—external validation set; Ninit—number of initial descriptors in the dataset; Hepat.—number of compounds in the hepatotoxic class; Non-H.—number of compounds in the non-hepatotoxic class. a—Imbalanced training set used during optimization. b—Manually balanced training set used during optimization.
Table 2. Average values of sensitivity and specificity obtained for models optimized using optimization criterion 1. Average values and standard deviations are given for 100 models built using selected descriptors and different permutations of objects in the training set.
OC1
Nin. | model | TR Sens. | TR Spec. | TE1 Sens. | TE1 Spec. | TE2 Sens. | TE2 Spec. | VA Sens. | VA Spec.
1821 a0.81 ± 0.020.87 ± 0.020.82 ± 0.070.84 ± 0.030.74 ± 0.070.70 ± 0.040.56 ± 0.060.68 ± 0.04
2 b0.92 ± 0.020.79 ± 0.020.90 ± 0.060.92 ± 0.050.76 ± 0.080.78 ± 0.090.64 ± 0.080.72 ± 0.08
3 b0.91 ± 0.020.79 ± 0.020.86 ± 0.060.90 ± 0.060.72 ± 0.090.78 ± 0.090.63 ± 0.080.73 ± 0.09
4 b0.92 ± 0.020.78 ± 0.020.88 ± 0.070.90 ± 0.070.71 ± 0.070.77 ± 0.080.65 ± 0.070.75 ± 0.08
5 b0.86 ± 0.030.76 ± 0.030.85 ± 0.070.86 ± 0.070.72 ± 0.080.71 ± 0.080.57 ± 0.080.78 ± 0.08
6 b0.92 ± 0.020.81 ± 0.020.91 ± 0.060.92 ± 0.060.72 ± 0.070.74 ± 0.090.70 ± 0.070.66 ± 0.07
7 b0.88 ± 0.030.77 ± 0.030.77 ± 0.080.83 ± 0.080.71 ± 0.070.71 ± 0.070.73 ± 0.070.77 ± 0.09
8 b0.88 ± 0.020.77 ± 0.030.90 ± 0.040.88 ± 0.060.77 ± 0.080.70 ± 0.080.64 ± 0.070.68 ± 0.09
981 b0.89 ± 0.020.76 ± 0.020.91 ± 0.060.90 ± 0.050.72 ± 0.080.70 ± 0.090.74 ± 0.080.48 ± 0.10
2 b0.89 ± 0.020.77 ± 0.020.87 ± 0.050.92 ± 0.040.77 ± 0.050.70 ± 0.060.60 ± 0.060.57 ± 0.08
3 b0.89 ± 0.020.77 ± 0.020.86 ± 0.050.91 ± 0.040.75 ± 0.060.71 ± 0.070.60 ± 0.060.58 ± 0.09
4 b0.90 ± 0.020.77 ± 0.020.86 ± 0.050.91 ± 0.060.76 ± 0.070.71 ± 0.060.68 ± 0.070.63 ± 0.08
5 b0.94 ± 0.020.82 ± 0.020.84 ± 0.070.92 ± 0.050.70 ± 0.060.71 ± 0.070.69 ± 0.060.65 ± 0.07
6 b0.85 ± 0.020.71 ± 0.020.88 ± 0.070.83 ± 0.070.73 ± 0.070.73 ± 0.070.69 ± 0.060.49 ± 0.09
7 b0.95 ± 0.020.80 ± 0.020.88 ± 0.050.89 ± 0.070.75 ± 0.070.77 ± 0.080.54 ± 0.080.68 ± 0.07
8 b0.96 ± 0.020.79 ± 0.020.87 ± 0.060.88 ± 0.060.75 ± 0.080.74 ± 0.080.53 ± 0.070.67 ± 0.07
9 b0.95 ± 0.020.81 ± 0.020.87 ± 0.050.90 ± 0.060.76 ± 0.070.79 ± 0.070.53 ± 0.080.70 ± 0.07
10 b0.78 ± 0.030.73 ± 0.030.81 ± 0.080.80 ± 0.060.71 ± 0.070.75 ± 0.060.63 ± 0.060.62 ± 0.07
11 b0.87 ± 0.030.71 ± 0.030.80 ± 0.080.84 ± 0.070.72 ± 0.090.76 ± 0.080.64 ± 0.080.58 ± 0.09
12 b0.93 ± 0.020.80 ± 0.020.93 ± 0.070.93 ± 0.050.71 ± 0.060.77 ± 0.080.66 ± 0.070.66 ± 0.08
13 b0.82 ± 0.030.75 ± 0.020.84 ± 0.060.86 ± 0.080.72 ± 0.080.74 ± 0.070.72 ± 0.080.64 ± 0.06
14 b0.85 ± 0.020.76 ± 0.020.86 ± 0.070.90 ± 0.070.70 ± 0.080.72 ± 0.080.70 ± 0.070.64 ± 0.07
15 b0.79 ± 0.030.73 ± 0.020.90 ± 0.050.92 ± 0.060.77 ± 0.050.72 ± 0.070.64 ± 0.050.77 ± 0.07
16 b0.79 ± 0.030.72 ± 0.020.89 ± 0.070.93 ± 0.050.80 ± 0.060.71 ± 0.060.64 ± 0.060.78 ± 0.07
501 a0.77 ± 0.020.71 ± 0.020.88 ± 0.040.83 ± 0.040.77 ± 0.040.73 ± 0.040.64 ± 0.030.68 ± 0.04
2 a0.77 ± 0.020.72 ± 0.020.86 ± 0.040.84 ± 0.020.76 ± 0.040.75 ± 0.030.63 ± 0.040.71 ± 0.04
3 a0.77 ± 0.020.71 ± 0.020.89 ± 0.040.82 ± 0.040.77 ± 0.030.73 ± 0.050.64 ± 0.030.68 ± 0.04
4 a0.81 ± 0.020.79 ± 0.020.85 ± 0.080.83 ± 0.030.72 ± 0.060.71 ± 0.030.59 ± 0.070.73 ± 0.04
5 a0.86 ± 0.030.86 ± 0.020.84 ± 0.060.81 ± 0.030.73 ± 0.080.71 ± 0.040.58 ± 0.080.70 ± 0.04
6 b0.84 ± 0.030.72 ± 0.030.80 ± 0.070.84 ± 0.060.70 ± 0.070.71 ± 0.070.61 ± 0.080.64 ± 0.09
OC1—optimization criterion 1; TR—Training set; TE1—The first test set; TE2—The second test set; VA—External validation set. Nin.—Number of initial descriptors in the dataset; model—Model number; Sens.—Sensitivity; Spec.—Specificity. a Imbalanced training set used during optimization. b Manually balanced training set used during optimization.
Table 3. The average values of sensitivity and specificity obtained for models optimized using optimization criterion 2. Average values and standard deviations are given for 100 models built using selected descriptors and different permutations of objects in training set.
OC2
Nin. | model | TR Sens. | TR Spec. | TE1 Sens. | TE1 Spec. | TE2 Sens. | TE2 Spec. | VA Sens. | VA Spec.
1821 a0.92 ± 0.020.92 ± 0.020.93 ± 0.050.85 ± 0.030.76 ± 0.070.72 ± 0.040.39 ± 0.090.69 ± 0.04
2 a0.93 ± 0.020.92 ± 0.020.94 ± 0.050.85 ± 0.040.76 ± 0.070.72 ± 0.040.39 ± 0.080.67 ± 0.04
3 a0.90 ± 0.030.90 ± 0.030.83 ± 0.070.86 ± 0.040.72 ± 0.080.71 ± 0.040.50 ± 0.080.76 ± 0.04
4 a0.89 ± 0.030.89 ± 0.030.84 ± 0.070.85 ± 0.040.74 ± 0.070.73 ± 0.040.50 ± 0.100.76 ± 0.04
5 a0.77 ± 0.020.79 ± 0.030.82 ± 0.060.86 ± 0.040.70 ± 0.090.74 ± 0.040.61 ± 0.080.77 ± 0.04
6 a0.80 ± 0.020.84 ± 0.020.91 ± 0.050.90 ± 0.040.73 ± 0.080.72 ± 0.040.58 ± 0.060.73 ± 0.03
7 a0.81 ± 0.030.84 ± 0.020.93 ± 0.040.88 ± 0.040.71 ± 0.060.74 ± 0.040.56 ± 0.070.72 ± 0.03
8 a0.82 ± 0.030.86 ± 0.020.84 ± 0.080.79 ± 0.040.73 ± 0.080.72 ± 0.040.57 ± 0.080.73 ± 0.04
9 a0.82 ± 0.030.87 ± 0.020.86 ± 0.080.78 ± 0.040.74 ± 0.090.71 ± 0.040.59 ± 0.090.70 ± 0.04
10 b0.92 ± 0.020.79 ± 0.020.85 ± 0.060.95 ± 0.050.72 ± 0.080.72 ± 0.080.65 ± 0.070.76 ± 0.08
11 b0.92 ± 0.020.79 ± 0.020.86 ± 0.060.94 ± 0.050.71 ± 0.070.76 ± 0.070.68 ± 0.060.69 ± 0.10
12 b0.92 ± 0.020.79 ± 0.020.85 ± 0.060.95 ± 0.050.72 ± 0.070.76 ± 0.070.67 ± 0.070.68 ± 0.07
13 b0.91 ± 0.020.78 ± 0.020.83 ± 0.080.91 ± 0.060.76 ± 0.070.75 ± 0.080.63 ± 0.070.69 ± 0.08
14 b0.92 ± 0.020.77 ± 0.020.84 ± 0.080.89 ± 0.060.73 ± 0.070.76 ± 0.090.62 ± 0.070.71 ± 0.08
15 b0.95 ± 0.020.85 ± 0.020.91 ± 0.050.95 ± 0.050.70 ± 0.090.74 ± 0.080.67 ± 0.060.81 ± 0.07
16 b0.95 ± 0.020.84 ± 0.020.92 ± 0.040.95 ± 0.050.72 ± 0.070.73 ± 0.070.69 ± 0.050.81 ± 0.07
17 b0.95 ± 0.020.84 ± 0.020.92 ± 0.040.95 ± 0.050.73 ± 0.090.75 ± 0.080.68 ± 0.060.80 ± 0.07
18 b0.88 ± 0.020.75 ± 0.020.85 ± 0.070.92 ± 0.060.72 ± 0.070.72 ± 0.070.61 ± 0.100.66 ± 0.08
19 b0.90 ± 0.020.78 ± 0.020.84 ± 0.070.82 ± 0.070.74 ± 0.080.71 ± 0.080.59 ± 0.080.63 ± 0.09
20 b0.90 ± 0.020.77 ± 0.020.80 ± 0.090.81 ± 0.090.75 ± 0.060.72 ± 0.090.71 ± 0.090.64 ± 0.09
21 b0.91 ± 0.020.79 ± 0.020.79 ± 0.080.81 ± 0.070.73 ± 0.050.72 ± 0.070.60 ± 0.060.60 ± 0.08
981 b0.82 ± 0.030.75 ± 0.020.86 ± 0.080.82 ± 0.080.73 ± 0.070.77 ± 0.080.70 ± 0.070.64 ± 0.09
2 b0.91 ± 0.020.78 ± 0.020.78 ± 0.080.91 ± 0.060.71 ± 0.080.71 ± 0.080.61 ± 0.080.74 ± 0.08
3 b0.89 ± 0.020.76 ± 0.020.86 ± 0.080.91 ± 0.060.70 ± 0.070.74 ± 0.060.64 ± 0.070.59 ± 0.08
4 b0.89 ± 0.030.73 ± 0.020.89 ± 0.070.92 ± 0.050.73 ± 0.070.72 ± 0.080.67 ± 0.080.56 ± 0.09
5 b0.85 ± 0.020.78 ± 0.020.84 ± 0.060.93 ± 0.060.77 ± 0.060.79 ± 0.100.63 ± 0.080.74 ± 0.06
6 b0.88 ± 0.030.77 ± 0.030.78 ± 0.080.81 ± 0.090.75 ± 0.070.79 ± 0.070.70 ± 0.090.71 ± 0.06
7 b0.87 ± 0.020.78 ± 0.020.78 ± 0.070.90 ± 0.080.76 ± 0.060.84 ± 0.070.62 ± 0.070.73 ± 0.07
8 b0.93 ± 0.020.76 ± 0.020.86 ± 0.070.93 ± 0.050.81 ± 0.070.72 ± 0.080.68 ± 0.060.66 ± 0.06
9 b0.91 ± 0.020.76 ± 0.020.83 ± 0.080.91 ± 0.060.79 ± 0.060.75 ± 0.070.60 ± 0.060.68 ± 0.08
10 b0.91 ± 0.020.76 ± 0.020.80 ± 0.080.90 ± 0.060.78 ± 0.070.75 ± 0.070.60 ± 0.080.68 ± 0.07
11 b0.91 ± 0.020.75 ± 0.020.81 ± 0.080.91 ± 0.060.77 ± 0.070.75 ± 0.080.60 ± 0.080.67 ± 0.09
12 b0.82 ± 0.030.70 ± 0.030.84 ± 0.080.85 ± 0.070.75 ± 0.080.76 ± 0.080.68 ± 0.090.59 ± 0.07
13 b0.86 ± 0.030.74 ± 0.030.83 ± 0.080.88 ± 0.060.70 ± 0.070.71 ± 0.070.55 ± 0.080.71 ± 0.08
14 b0.91 ± 0.020.78 ± 0.020.80 ± 0.080.90 ± 0.050.70 ± 0.080.82 ± 0.060.67 ± 0.070.55 ± 0.08
15 b0.91 ± 0.020.77 ± 0.020.81 ± 0.080.89 ± 0.060.71 ± 0.060.81 ± 0.060.68 ± 0.080.57 ± 0.08
16 b0.86 ± 0.020.78 ± 0.020.88 ± 0.070.93 ± 0.060.73 ± 0.080.75 ± 0.060.64 ± 0.060.73 ± 0.07
17 b0.85 ± 0.030.79 ± 0.020.85 ± 0.080.92 ± 0.060.72 ± 0.080.79 ± 0.050.64 ± 0.060.74 ± 0.07
18 b0.81 ± 0.020.79 ± 0.020.87 ± 0.060.92 ± 0.050.76 ± 0.060.81 ± 0.060.65 ± 0.060.75 ± 0.07
19 b0.91 ± 0.020.77 ± 0.020.89 ± 0.070.88 ± 0.070.73 ± 0.090.71 ± 0.090.61 ± 0.080.51 ± 0.09
20 b0.87 ± 0.020.73 ± 0.020.78 ± 0.080.83 ± 0.060.73 ± 0.080.73 ± 0.070.64 ± 0.070.56 ± 0.08
21 b0.90 ± 0.020.72 ± 0.020.81 ± 0.070.88 ± 0.060.71 ± 0.070.72 ± 0.080.53 ± 0.080.63 ± 0.09
501 a0.76 ± 0.030.74 ± 0.030.85 ± 0.070.78 ± 0.040.71 ± 0.080.72 ± 0.050.61 ± 0.080.70 ± 0.04
2 a0.77 ± 0.030.74 ± 0.040.86 ± 0.070.78 ± 0.050.72 ± 0.070.71 ± 0.040.61 ± 0.070.69 ± 0.04
3 a0.77 ± 0.040.74 ± 0.040.85 ± 0.080.78 ± 0.040.71 ± 0.080.71 ± 0.050.60 ± 0.070.69 ± 0.04
4 b0.97 ± 0.010.83 ± 0.020.88 ± 0.060.85 ± 0.080.71 ± 0.080.72 ± 0.080.55 ± 0.070.79 ± 0.08
5 b0.96 ± 0.020.83 ± 0.020.87 ± 0.070.85 ± 0.060.71 ± 0.080.70 ± 0.080.59 ± 0.070.75 ± 0.08
OC2—optimization criterion 2; TR—Training set; TE1—The first test set; TE2—The second test set; VA—External validation set. Nin.—Number of initial descriptors in the dataset; model—Model number; Sens.—Sensitivity; Spec.—Specificity. a Imbalanced training set used during optimization. b Manually balanced training set used during optimization.
Table 4. The average values of sensitivity and specificity obtained for models optimized using optimization criterion 3. Average values and standard deviations are given for 100 models built using selected descriptors and different permutations of objects in training set.
OC3
Nin. | model | TR Sens. | TR Spec. | TE1 Sens. | TE1 Spec. | TE2 Sens. | TE2 Spec. | VA Sens. | VA Spec.
1821 a0.91 ± 0.030.91 ± 0.030.88 ± 0.060.81 ± 0.040.71 ± 0.080.70 ± 0.040.47 ± 0.080.74 ± 0.04
2 a0.90 ± 0.030.90 ± 0.030.83 ± 0.060.86 ± 0.040.70 ± 0.090.71 ± 0.040.52 ± 0.090.74 ± 0.04
3 a0.86 ± 0.030.85 ± 0.030.86 ± 0.060.83 ± 0.050.77 ± 0.080.70 ± 0.040.50 ± 0.100.73 ± 0.04
4 b0.94 ± 0.020.82 ± 0.020.93 ± 0.060.89 ± 0.060.74 ± 0.070.72 ± 0.080.63 ± 0.080.65 ± 0.07
5 b0.94 ± 0.020.81 ± 0.020.94 ± 0.050.91 ± 0.060.73 ± 0.080.71 ± 0.080.65 ± 0.070.64 ± 0.07
6 b0.93 ± 0.020.84 ± 0.020.90 ± 0.040.96 ± 0.050.72 ± 0.070.71 ± 0.090.69 ± 0.060.66 ± 0.09
7 b0.93 ± 0.020.83 ± 0.020.91 ± 0.040.96 ± 0.050.73 ± 0.070.71 ± 0.080.66 ± 0.070.66 ± 0.08
8 b0.93 ± 0.020.82 ± 0.020.90 ± 0.050.92 ± 0.060.73 ± 0.080.71 ± 0.070.61 ± 0.060.74 ± 0.07
9 b0.93 ± 0.020.82 ± 0.020.89 ± 0.050.92 ± 0.060.76 ± 0.070.73 ± 0.070.63 ± 0.070.75 ± 0.07
10 b0.89 ± 0.030.77 ± 0.030.84 ± 0.060.90 ± 0.070.75 ± 0.080.72 ± 0.090.73 ± 0.070.64 ± 0.07
11 b0.88 ± 0.020.78 ± 0.020.89 ± 0.070.91 ± 0.060.70 ± 0.080.72 ± 0.090.64 ± 0.080.68 ± 0.08
12 b0.86 ± 0.020.81 ± 0.030.88 ± 0.080.96 ± 0.060.75 ± 0.070.73 ± 0.080.65 ± 0.070.72 ± 0.07
13 b0.87 ± 0.020.81 ± 0.020.86 ± 0.070.95 ± 0.050.73 ± 0.070.72 ± 0.090.68 ± 0.060.72 ± 0.08
14 b0.87 ± 0.020.80 ± 0.020.87 ± 0.070.95 ± 0.050.75 ± 0.080.72 ± 0.080.67 ± 0.070.73 ± 0.07
15 b0.92 ± 0.020.83 ± 0.020.84 ± 0.060.92 ± 0.060.72 ± 0.070.71 ± 0.080.63 ± 0.060.73 ± 0.06
16 b0.93 ± 0.020.83 ± 0.020.85 ± 0.070.91 ± 0.060.72 ± 0.070.71 ± 0.090.64 ± 0.060.78 ± 0.08
17 b0.94 ± 0.020.81 ± 0.020.93 ± 0.060.91 ± 0.060.75 ± 0.070.77 ± 0.080.66 ± 0.060.72 ± 0.08
18 b0.94 ± 0.020.81 ± 0.020.92 ± 0.060.91 ± 0.070.74 ± 0.080.72 ± 0.080.65 ± 0.050.72 ± 0.08
19 b0.96 ± 0.010.86 ± 0.020.92 ± 0.060.95 ± 0.050.70 ± 0.070.70 ± 0.060.58 ± 0.060.72 ± 0.06
981 a0.86 ± 0.030.89 ± 0.020.91 ± 0.060.76 ± 0.040.71 ± 0.080.71 ± 0.050.52 ± 0.100.65 ± 0.04
2 b0.95 ± 0.020.82 ± 0.020.79 ± 0.070.93 ± 0.060.70 ± 0.060.77 ± 0.060.57 ± 0.070.68 ± 0.06
3 b0.91 ± 0.020.78 ± 0.020.82 ± 0.070.88 ± 0.060.70 ± 0.080.71 ± 0.070.68 ± 0.080.61 ± 0.08
4 b0.91 ± 0.020.79 ± 0.020.80 ± 0.070.90 ± 0.050.73 ± 0.070.71 ± 0.060.69 ± 0.070.64 ± 0.07
5 b0.92 ± 0.020.80 ± 0.020.85 ± 0.070.95 ± 0.050.71 ± 0.090.74 ± 0.070.63 ± 0.070.65 ± 0.07
6 b0.87 ± 0.020.77 ± 0.020.78 ± 0.080.87 ± 0.070.73 ± 0.060.72 ± 0.070.63 ± 0.070.63 ± 0.07
7 b0.86 ± 0.020.75 ± 0.020.79 ± 0.070.87 ± 0.070.73 ± 0.070.71 ± 0.070.62 ± 0.070.63 ± 0.08
8 b0.87 ± 0.020.76 ± 0.020.79 ± 0.080.85 ± 0.080.73 ± 0.060.73 ± 0.080.60 ± 0.080.61 ± 0.09
9 b0.86 ± 0.030.75 ± 0.030.73 ± 0.090.92 ± 0.060.72 ± 0.070.72 ± 0.080.67 ± 0.090.64 ± 0.08
10 b0.87 ± 0.020.77 ± 0.020.74 ± 0.090.93 ± 0.060.75 ± 0.060.75 ± 0.060.70 ± 0.080.58 ± 0.09
11 b0.92 ± 0.020.80 ± 0.020.84 ± 0.070.91 ± 0.060.76 ± 0.080.80 ± 0.060.71 ± 0.070.68 ± 0.07
12 b0.92 ± 0.020.80 ± 0.020.84 ± 0.060.90 ± 0.060.71 ± 0.070.80 ± 0.080.66 ± 0.080.68 ± 0.08
13 b0.92 ± 0.020.79 ± 0.020.86 ± 0.070.93 ± 0.050.73 ± 0.080.80 ± 0.080.68 ± 0.070.68 ± 0.07
14 b0.86 ± 0.020.79 ± 0.030.74 ± 0.060.92 ± 0.060.71 ± 0.080.78 ± 0.080.58 ± 0.070.73 ± 0.09
15 b0.86 ± 0.030.79 ± 0.030.72 ± 0.070.91 ± 0.050.72 ± 0.080.78 ± 0.080.53 ± 0.070.68 ± 0.09
16 b0.92 ± 0.020.79 ± 0.020.88 ± 0.060.90 ± 0.050.75 ± 0.050.72 ± 0.070.64 ± 0.060.64 ± 0.08
17 b0.93 ± 0.020.80 ± 0.020.73 ± 0.090.96 ± 0.040.70 ± 0.070.71 ± 0.080.64 ± 0.070.70 ± 0.08
18 b0.93 ± 0.020.80 ± 0.020.79 ± 0.080.93 ± 0.050.72 ± 0.070.77 ± 0.080.63 ± 0.070.68 ± 0.08
19 b0.92 ± 0.020.81 ± 0.020.74 ± 0.080.87 ± 0.060.73 ± 0.080.79 ± 0.060.67 ± 0.090.65 ± 0.08
20 b0.91 ± 0.020.81 ± 0.020.83 ± 0.070.91 ± 0.050.74 ± 0.070.76 ± 0.070.56 ± 0.080.68 ± 0.08
21 b0.91 ± 0.020.81 ± 0.020.83 ± 0.080.92 ± 0.060.74 ± 0.070.75 ± 0.080.57 ± 0.090.66 ± 0.08
22 b0.92 ± 0.020.81 ± 0.020.85 ± 0.070.95 ± 0.050.76 ± 0.070.81 ± 0.070.60 ± 0.080.68 ± 0.09
23 b0.91 ± 0.020.80 ± 0.020.77 ± 0.080.87 ± 0.070.72 ± 0.080.70 ± 0.070.63 ± 0.070.65 ± 0.08
24 b0.91 ± 0.020.80 ± 0.030.78 ± 0.090.89 ± 0.050.72 ± 0.080.73 ± 0.070.59 ± 0.070.64 ± 0.07
25 b0.91 ± 0.020.81 ± 0.020.84 ± 0.080.96 ± 0.040.71 ± 0.070.74 ± 0.070.58 ± 0.070.69 ± 0.09
26 b0.94 ± 0.020.84 ± 0.020.78 ± 0.080.92 ± 0.060.83 ± 0.060.72 ± 0.080.62 ± 0.080.66 ± 0.07
27 b0.94 ± 0.020.84 ± 0.020.81 ± 0.090.93 ± 0.060.83 ± 0.060.74 ± 0.070.59 ± 0.080.66 ± 0.06
28 b0.94 ± 0.020.84 ± 0.020.81 ± 0.070.92 ± 0.060.83 ± 0.050.72 ± 0.090.61 ± 0.080.63 ± 0.06
29 b0.96 ± 0.020.85 ± 0.020.83 ± 0.080.95 ± 0.050.70 ± 0.090.74 ± 0.080.63 ± 0.100.63 ± 0.09
30 b0.96 ± 0.010.85 ± 0.020.83 ± 0.070.92 ± 0.060.72 ± 0.070.70 ± 0.070.54 ± 0.070.66 ± 0.08
31 b0.97 ± 0.020.86 ± 0.020.82 ± 0.070.94 ± 0.040.72 ± 0.070.72 ± 0.070.54 ± 0.070.67 ± 0.07
32 b0.96 ± 0.020.85 ± 0.020.85 ± 0.070.95 ± 0.040.71 ± 0.080.71 ± 0.060.55 ± 0.070.62 ± 0.07
33 b0.88 ± 0.030.77 ± 0.030.82 ± 0.080.91 ± 0.070.78 ± 0.070.73 ± 0.080.67 ± 0.080.68 ± 0.08
34 b0.88 ± 0.030.77 ± 0.030.80 ± 0.080.91 ± 0.060.77 ± 0.070.73 ± 0.070.66 ± 0.080.70 ± 0.08
35 b0.88 ± 0.030.77 ± 0.030.80 ± 0.070.91 ± 0.060.75 ± 0.080.72 ± 0.060.66 ± 0.090.68 ± 0.08
36 b0.96 ± 0.020.83 ± 0.020.74 ± 0.070.93 ± 0.060.79 ± 0.060.71 ± 0.060.65 ± 0.070.66 ± 0.07
37 b0.95 ± 0.020.82 ± 0.020.73 ± 0.070.94 ± 0.050.79 ± 0.060.71 ± 0.060.65 ± 0.060.65 ± 0.07
38 b0.96 ± 0.020.82 ± 0.020.76 ± 0.070.90 ± 0.060.78 ± 0.060.70 ± 0.070.65 ± 0.080.65 ± 0.06
39 b0.86 ± 0.030.71 ± 0.030.85 ± 0.080.83 ± 0.060.70 ± 0.090.72 ± 0.080.68 ± 0.070.63 ± 0.09
40 b0.87 ± 0.020.71 ± 0.030.82 ± 0.080.82 ± 0.060.74 ± 0.080.71 ± 0.080.66 ± 0.060.61 ± 0.08
41 b0.83 ± 0.020.77 ± 0.020.80 ± 0.080.90 ± 0.060.71 ± 0.070.78 ± 0.060.54 ± 0.070.59 ± 0.08
42 b0.82 ± 0.020.76 ± 0.030.80 ± 0.070.90 ± 0.060.76 ± 0.080.73 ± 0.070.53 ± 0.070.54 ± 0.06
43 b0.83 ± 0.030.74 ± 0.020.81 ± 0.080.91 ± 0.050.76 ± 0.080.73 ± 0.060.56 ± 0.070.54 ± 0.07
44 b0.95 ± 0.020.79 ± 0.020.71 ± 0.090.90 ± 0.040.74 ± 0.070.71 ± 0.080.64 ± 0.070.69 ± 0.07
45 b0.94 ± 0.020.81 ± 0.020.84 ± 0.060.90 ± 0.060.71 ± 0.070.79 ± 0.060.76 ± 0.070.65 ± 0.07
501 a0.92 ± 0.020.92 ± 0.020.82 ± 0.050.79 ± 0.040.72 ± 0.070.70 ± 0.030.53 ± 0.060.69 ± 0.04
2 a0.91 ± 0.020.93 ± 0.020.83 ± 0.070.83 ± 0.040.72 ± 0.070.70 ± 0.040.60 ± 0.070.66 ± 0.04
3 b0.92 ± 0.020.78 ± 0.020.82 ± 0.070.83 ± 0.070.71 ± 0.090.71 ± 0.060.61 ± 0.070.77 ± 0.08
4 b0.87 ± 0.030.76 ± 0.020.84 ± 0.090.85 ± 0.070.73 ± 0.080.71 ± 0.070.66 ± 0.070.66 ± 0.08
5 b0.87 ± 0.030.75 ± 0.020.85 ± 0.070.83 ± 0.090.72 ± 0.080.72 ± 0.080.66 ± 0.070.66 ± 0.07
6 b0.95 ± 0.020.83 ± 0.020.80 ± 0.080.93 ± 0.060.72 ± 0.070.73 ± 0.070.55 ± 0.080.62 ± 0.07
7 b0.96 ± 0.020.83 ± 0.020.81 ± 0.070.93 ± 0.060.71 ± 0.080.74 ± 0.070.58 ± 0.070.63 ± 0.07
OC3—optimization criterion 3; TR—Training set; TE1—The first test set; TE2—The second test set; VA—External validation set. Nin.—Number of initial descriptors in the dataset; model—Model number; Sens.—Sensitivity; Spec.—Specificity. a Imbalanced training set used during optimization. b Manually balanced training set used during optimization.
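For clarity, the sensitivity and specificity values reported in Table 4 follow the standard confusion-matrix definitions, with the hepatotoxic class treated as the positive class. The snippet below is a minimal illustrative sketch (not the authors' code) showing how the two metrics are computed from true and predicted class labels, assuming 1 denotes the hepatotoxic class and 0 the non-hepatotoxic class.

def sensitivity_specificity(y_true, y_pred):
    # Confusion-matrix counts; 1 = hepatotoxic (positive), 0 = non-hepatotoxic (negative)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # fraction of hepatotoxic compounds classified as hepatotoxic
    specificity = tn / (tn + fp)  # fraction of non-hepatotoxic compounds classified as non-hepatotoxic
    return sensitivity, specificity

# Hypothetical example: 3 of 4 hepatotoxic and 3 of 4 non-hepatotoxic compounds classified correctly
print(sensitivity_specificity([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1]))  # (0.75, 0.75)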
Table 5. Most important descriptors for models using the first modeling approach.
Descriptor | Description | Id
J_D/Dt | Balaban-like index from distance/detour matrix | 8.945987
GATS5v | Geary autocorrelation of lag 5 weighted by van der Waals volume | 8.042931
H% | Percentage of H atoms | 7.579833
SpMin1_Bh(s) | Smallest eigenvalue n. 1 of Burden matrix weighted by I-state | 6.506972
CATS2D_02_AA | CATS2D Acceptor-Acceptor at lag 02 | 5.406386
IC2 | Information content index (neighborhood symmetry of 2-order) | 5.3672
GATS1v | Geary autocorrelation of lag 1 weighted by van der Waals volume | 4.810587
GATS2v | Geary autocorrelation of lag 2 weighted by van der Waals volume | 4.727913
BAC | Balaban centric index | 4.58365
SpPosA_X | Normalized spectral positive sum from chi matrix | 4.303807
P_VSA_LogP_6 | P_VSA-like on LogP, bin 6 | 3.877732
C-006 | CH2RX | 3.640303
P_VSA_e_3 | P_VSA-like on Sanderson electronegativity, bin 3 | 3.475949
P_VSA_MR_2 | P_VSA-like on Molar Refractivity, bin 2 | 3.236547
MATS8m | Moran autocorrelation of lag 8 weighted by mass | 3.138591
nCsp3 | Number of sp3 hybridized carbon atoms | 2.675997
PDI | Packing density index | 2.585321
P_VSA_m_4 | P_VSA-like on mass, bin 4 | 2.511289
SpAD_EA(dm) | Spectral absolute deviation from edge adjacency mat. weighted by dipole moment | 2.35969
CATS2D_04_AA | CATS2D Acceptor-Acceptor at lag 04 | 2.332174
X5Av | Average valence connectivity index of order 5 | 2.196837
X5A | Average connectivity index of order 5 | 2.100552
Table 6. Most important descriptors for models using the second modeling approach.
Descriptor | Description | Id
JGI6 | Mean topological charge index of order 6 | 3.502671
JGI4 | Mean topological charge index of order 4 | 3.398279
SdssC | Sum of dssC E-states | 3.372717
H% | Percentage of H atoms | 3.287295
Uc | Unsaturation count | 3.071672
P_VSA_LogP_6 | P_VSA-like on LogP, bin 6 | 2.985918
H-052 | H attached to C0(sp3) with 1X attached to next C | 2.805648
MAXDN | Maximal electrotopological negative variation | 2.620215
Chi1_EA(dm) | Connectivity-like index of order 1 from edge adjacency mat. weighted by dipole moment | 2.576782
SpMax_B(m) | Leading eigenvalue from Burden matrix weighted by mass | 2.540987
GATS5m | Geary autocorrelation of lag 5 weighted by mass | 2.480224
SpAD_EA(dm) | Spectral absolute deviation from edge adjacency mat. weighted by dipole moment | 2.416726
GATS1i | Geary autocorrelation of lag 1 weighted by ionization potential | 2.365205
SsssN | Sum of sssN E-states | 2.352358
SpMAD_EA(bo) | Spectral mean absolute deviation from edge adjacency mat. weighted by bond order | 2.344266
ChiA_B(s) | Average Randic-like index from Burden matrix weighted by I-state | 2.338868
NssO | Number of atoms of type ssO | 2.223777
VE2sign_A | Average coefficient of the last eigenvector from adjacency matrix | 2.169277
MATS2p | Moran autocorrelation of lag 2 weighted by polarizability | 2.169277
MATS1p | Moran autocorrelation of lag 1 weighted by polarizability | 2.103962
SpMin1_Bh(v) | Smallest eigenvalue n. 1 of Burden matrix weighted by van der Waals volume | 2.089443
ChiA_B(v) | Average Randic-like index from Burden matrix weighted by van der Waals volume | 2.043667
Rbrid | Ring bridge count | 2.040469
nCsp3 | Number of sp3 hybridized carbon atoms | 2.038696
C-040 | R-C(=X)-X / R-C#X / X=C=X | 2.022