Article

DoubleSG-DTA: Deep Learning for Drug Discovery: Case Study on the Non-Small Cell Lung Cancer with EGFRT790M Mutation

Department of Pharmacology, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
* Authors to whom correspondence should be addressed.
Pharmaceutics 2023, 15(2), 675; https://doi.org/10.3390/pharmaceutics15020675
Submission received: 4 January 2023 / Revised: 5 February 2023 / Accepted: 14 February 2023 / Published: 16 February 2023
(This article belongs to the Special Issue Computational Intelligence (CI) Tools in Drug Discovery and Design)

Abstract
Drug-targeted therapies are promising approaches to treating tumors, and research on receptor–ligand interactions for discovering high-affinity targeted drugs has been accelerating drug development. This study presents a mechanism-driven, deep learning-based computational model, termed DoubleSG-DTA, that jointly learns drug sequences, protein sequences, and drug graphs to predict drug–target affinities (DTAs). We deployed lightweight graph isomorphism networks to aggregate drug graph representations and discriminate between molecular structures, and stacked multilayer squeeze-and-excitation networks to selectively enhance spatial features of drug and protein sequences. Moreover, cross-multi-head attentions were constructed to further model non-covalent molecular docking behavior. Cross-validation experiments on various datasets indicated that DoubleSG-DTA consistently outperformed all previously reported works. To showcase its value, we applied DoubleSG-DTA to generate promising hit compounds against Non-Small Cell Lung Cancer harboring the $EGFR^{T790M}$ mutation from natural products, and the results were consistent with reported laboratory studies. We then investigated the interpretability of the graph-based "black box" model and highlighted the active structures that contributed the most. DoubleSG-DTA thus provides a powerful and interpretable framework for extrapolating potential chemicals to modulate the systemic response to disease.

1. Introduction

Clinically acquired resistance is an insurmountable dilemma for small-molecule kinase inhibitors in cancer treatment [1]. Moreover, locating small-molecule ligands with high affinity and good properties for target proteins in a broad chemical space has been a primary challenge in drug research and development (R&D) [2]. To date, kinase drugs approved by the U.S. Food and Drug Administration (FDA) that overcome clinical resistance driven by protein kinase "gatekeeper" mutations remain as scarce as a desert oasis. Lung cancer is the leading cause of cancer-related deaths worldwide, with non-small cell lung cancer (NSCLC) being the most common type. Secondary epidermal growth factor receptor (EGFR) mutations at threonine 790 (T790M) lead to acquired resistance, which severely affects patient prognosis. Therefore, strategies or drugs to overcome resistance are urgently needed to prolong the survival of patients with NSCLC.
Laborious wet labs and high-throughput screening techniques are so time-consuming and challenging that they are unsuitable for screening candidate drugs from a broad range of compound groups in pre-drug R&D. With improvements in machine learning theory and an abundance of available pharmacological data, machine learning provides sufficient power for the development of precision medicine and artificially intelligent drug design (AIDD). Many encouraging scientific achievements have convincingly demonstrated the potential of these approaches. For instance, the knowledge graph (KG) enables detection of the drivers of tumor resistance and adverse drug reactions in a wider multi-omics space [3,4], and reinforcement learning (RL) has been found to be particularly effective in the de novo design and multi-objective optimization of drug molecules [5,6,7]. Deep learning, a powerful data-driven branch of machine learning, offers significant advantages in revealing implicit relationships among drugs, diseases, and genes that are not easily detected, owing to its strong generalization and representation-extraction capability. Several in silico methods that explore potential drug–target associations have been developed to narrow the search toward more workable drugs and thereby advance drug R&D.
Some studies have treated DTA prediction as a binary classification task, using binary labels (1/0) to indicate whether a drug and a target bind [8,9,10], while others treat it as a regression task and use floating-point numbers to quantify DTAs [11,12,13].
The random forest (RF) algorithm broke with previous methods that relied on multi-parameter scoring functions to infer DTA [14], and has proven convincing for extrapolating drug–target relationships in larger chemical spaces. KronRLS [15] and SimBoost [12] were regression-based machine-learning approaches that evaluated similarities between drugs and proteins to determine DTA. Various excellent deep-learning works have since been presented. DeepDTA [8] and AttentionDTA [16] leveraged convolutional neural networks (CNNs) to obtain the hidden relationships of atomic and amino acid sequences. DeepCDA incorporated a long short-term memory (LSTM) network to alleviate gradient vanishing and gradient explosion [17]. MATT-DTI deployed relation-aware self-attention with position embedding to reinforce relative positional associations among atoms [13]. Transformer-based works have come to the fore in various natural language processing (NLP) tasks; DMIL-PPDTA utilized the transformer encoder to enrich word embeddings of drug and protein sequences, aiming to learn hidden associations from the raw data [18]. DeepAtom [19] extrapolated node-level interaction information relevant to binding from voxelized protein–compound complex structures. Nevertheless, such models rely on known 3D drug–target complexes, and the computational burden of complex 3D convolutional networks extracting features from massive complexes is expensive. GraphDTA [11] and MGraphDTA [20] represented compounds as topological graphs and evaluated several Graph Neural Network (GNN) variants, including the Graph Convolutional Network (GCN) [21], Graph Isomorphism Network (GIN) [22], and Graph Attention Network (GAT) [23], with the aim of replacing CNNs, and achieved excellent performance. Additionally, DGraphDTA encoded both drugs and proteins as graphs for inferring DTA with GNNs [24]. These graph-based methods not only avoid the drawbacks of scarce complex samples and high computational cost, but also compensate for the inadequacy of SMILES (Simplified Molecular Input Line Entry System) [25] strings for drug representation, since a molecular graph is closer to the natural description of a compound.
Although these methods produce excellent prediction results, they are difficult to generalize to real-world problems. Firstly, the molecular similarity principle [26] states that molecules with similar structures usually show similar biological activities and physicochemical properties, while structurally dissimilar molecules differ significantly; a model must therefore discriminate between molecular structures over a wide chemical space. Moreover, most existing methods model the complicated mapping between compounds and proteins by simple concatenation, which deviates from the non-covalent interaction between receptor and ligand. More importantly, these approaches have limited interpretability as a result of the "black-box" property of graph neural networks. Considering that the false positives generated by a binary classification task directly impair the robustness of the model, we here treat DTA prediction as a regression problem. We propose a three-channel DoubleSG-DTA framework based on GINs and multiple attention mechanisms to address the aforementioned problems, which significantly outperforms other regression-based state-of-the-art (SOTA) methods on various benchmark datasets. We then visualize the gradients of atomic contributions in the graph representations and compare them with molecular docking poses to further extend the interpretability of the graph-based model.
This paper presents the main contributions as follows:
  • DoubleSG-DTA combines graph isomorphism networks and squeeze-and-excitation networks to extract multimodal representations of drugs in parallel, enhancing the model's ability to discriminate between compound structures while selectively suppressing redundant information that would disturb model decisions;
  • Cross-multi-head attention mechanisms are designed to model the reality-based non-covalent molecular docking behavior of drug substructures and subsequences with target proteins, respectively;
  • DoubleSG-DTA is applied to screen promising hit compounds against NSCLC harboring the $EGFR^{T790M}$ mutation from natural products, with results consistent with reported laboratory studies.

2. Double Sequence and Graph to Predict Drug–Target Affinity (DoubleSG-DTA)

This work developed the DoubleSG-DTA model with three-channel multimodal representation, four-channel interaction, and one-channel output for DTA prediction, deploying multilayer GINs and multiple attention blocks, as shown in Figure 1. First, drug graphs and SMILES strings are fed into the drug representation learning models: multilayer GINs [22] and squeeze-and-excitation networks (SENets) [27] are jointly used as feature extractors for drugs. The protein representation learning model captures the dominant features of over-redundant protein sequences and relies heavily on stacked SENets. To further encode drug–target mutual interaction information, we designed cross-multi-head attention to model the reality-based non-covalent molecular docking behavior of drug substructures and subsequences with target proteins, respectively. Ultimately, we decoupled the attention coefficients into Multilayer Perceptrons (MLPs) to predict DTA. This section presents the building blocks of our framework in order.

2.1. Word Embedding and Graph Encoding

Initially, we utilized high-dimensional word embeddings to uniquely encode drug and protein sequences. To this end, we built label/integer dictionaries for drug SMILES and protein FASTA sequences, consisting of 64 and 22 key-value pairs, respectively. For example, the SMILES of propylene glycol, "CC(O)CO", and the $EGFR^{T790M}$ [28] protein subsequence "NWCVQIA" are encoded as [22, 22, 4, 33, 3, 22, 33] and [14, 21, 2, 22, 15, 8, 1] according to the SMILES dictionary {'C': 22, 'N': 34, 'O': 33, '(': 4, ')': 3} and the protein dictionary {'A': 1, 'N': 14, 'C': 2, 'Q': 15, 'I': 8, 'V': 22, 'W': 21}. We then map each integer vector into word embeddings $D_e \in \mathbb{R}^{l_d \times l_e}$ and $P_e \in \mathbb{R}^{l_p \times l_e}$ by embedding layers, where $l_d$ and $l_p$ denote the lengths of the SMILES and protein FASTA sequences and $l_e$ represents the embedding dimension.
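To make the encoding concrete, the sketch below (our own illustration, not the authors' released code) encodes the two example sequences with the dictionary fragments above and maps them through PyTorch embedding layers; the vocabulary sizes with a reserved padding index and the embedding dimension of 128 (Table 2) are assumptions about the implementation.

```python
import torch
import torch.nn as nn

# Dictionary fragments from the example above; the full model uses
# 64 SMILES and 22 amino-acid key-value pairs.
smiles_dict = {'C': 22, 'N': 34, 'O': 33, '(': 4, ')': 3}
protein_dict = {'A': 1, 'N': 14, 'C': 2, 'Q': 15, 'I': 8, 'V': 22, 'W': 21}

def encode(seq, vocab):
    """Map each character of a sequence to its dictionary integer."""
    return torch.tensor([vocab[ch] for ch in seq])

drug_ids = encode("CC(O)CO", smiles_dict)   # tensor([22, 22, 4, 33, 3, 22, 33])
prot_ids = encode("NWCVQIA", protein_dict)  # tensor([14, 21, 2, 22, 15, 8, 1])

# Embedding layers: vocabulary sizes 64 + 1 and 22 + 1 (index 0 assumed
# reserved for padding), embedding dimension l_e = 128 as in Table 2.
drug_emb = nn.Embedding(65, 128, padding_idx=0)
prot_emb = nn.Embedding(23, 128, padding_idx=0)

D_e = drug_emb(drug_ids)   # shape (l_d, l_e)
P_e = prot_emb(prot_ids)   # shape (l_p, l_e)
```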
We convert SMILES strings to their corresponding molecular graphs $G = (V, E)$ and extract atom features with RDKit [29], where $E$ and $V$ are the sets of edges (bonds) and nodes (atoms), respectively. Each atom node in a drug is represented by a multi-dimensional vector of 10 molecular descriptors (atom symbol, atomic number, hybridization, number of adjacent atoms, chirality, formal charge, aromaticity, number of bonded hydrogens, and explicit and implicit valence).
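A minimal RDKit sketch of this graph construction follows; the descriptor list mirrors the one above, though the released code's exact featurization (e.g., one-hot encodings over allowed value sets) may differ.

```python
from rdkit import Chem

def atom_features(atom):
    """Collect the 10 molecular descriptors listed above for one atom."""
    return [
        atom.GetSymbol(),              # atom symbol
        atom.GetAtomicNum(),           # atomic number
        str(atom.GetHybridization()),  # hybridization
        atom.GetDegree(),              # number of adjacent atoms
        str(atom.GetChiralTag()),      # chirality
        atom.GetFormalCharge(),        # formal charge
        atom.GetIsAromatic(),          # aromaticity
        atom.GetTotalNumHs(),          # number of bonded hydrogens
        atom.GetExplicitValence(),     # explicit valence
        atom.GetImplicitValence(),     # implicit valence
    ]

mol = Chem.MolFromSmiles("CC(O)CO")
V = [atom_features(a) for a in mol.GetAtoms()]                          # node set
E = [(b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in mol.GetBonds()]  # edge set
```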

2.2. Drug and Protein Sequence Representation Learning Model

CNNs construct text features by fusing spatial correlations between features, benefiting from the convolutional kernel's local receptive field but likewise being limited by it. In computer vision, the squeeze-and-excitation (SE) block with channel attention has been integrated into existing architectures to adaptively rescale channel-wise feature weights by explicitly modeling non-mutually-exclusive relationships between channels [27]. Research has confirmed that SENets achieve superior performance for image classification at a slight increase in computational cost [27]. Accordingly, we stacked multilayer SENets designed to selectively enhance effective statistics and suppress noise that would disturb model decisions. Given $U \in \mathbb{R}^{H \times W \times C}$ as the feature matrix output by the convolution layer, where $U = [u_1, u_2, \ldots, u_C]$, we route it to the SE block.
The SE module makes use of squeeze, excitation, and reweighting operators. The squeeze operator transforms the dimensions of the feature matrix $U$ and obtains channel-wise statistics $z \in \mathbb{R}^C$ by applying a global average pooling operation:

$$z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j).$$
The excitation module leverages two learnable fully connected layers with a gating mechanism to learn inter-channel non-linear interactions and filter non-dominant features:

$$s = F_{ex}(z, W) = \sigma\left(W_2\, \delta(W_1 z)\right),$$

where $\delta$ is the Rectified Linear Unit (ReLU) activation function, $\sigma$ is the sigmoid function, and $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$ are the two learnable weight matrices. The reduction ratio was set to $r = 16$ to balance performance and complexity [27].
The reweighted representation $x_c$ is computed by applying channel-wise multiplication between the channel attention weight $s_c$ and the feature map $u_c$:

$$x_c = F_{sc}(u_c, s_c) = s_c \cdot u_c,$$

where $X = [x_1, x_2, \ldots, x_C]$ and $x_c \in \mathbb{R}^{H \times W}$.
The word embeddings $D_e$ and $P_e$ are fed directly into the convolutional layers, then delivered to the SE block, followed by a global max pooling operation to compute the desired feature information. Hence, the drug and protein sequence representations can be expressed as:

$$D_{SENet} = gmp\left(SE\left(CNN(D_e)\right)\right), \qquad P_{SENet} = gmp\left(SE\left(CNN(P_e)\right)\right).$$
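Below is a minimal PyTorch sketch of this sequence branch, implementing the squeeze, excitation, and reweighting operators with reduction ratio r = 16; the channel count, kernel size, and tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SEBlock1d(nn.Module):
    def __init__(self, channels, r=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool1d(1)  # F_sq: global average pooling
        self.excite = nn.Sequential(            # F_ex: two FC layers with gating
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
            nn.Sigmoid(),
        )

    def forward(self, u):                # u: (batch, C, L) convolution output
        z = self.squeeze(u).squeeze(-1)  # channel statistics z, shape (batch, C)
        s = self.excite(z)               # channel attention weights s
        return u * s.unsqueeze(-1)       # F_sc: channel-wise reweighting

# Usage in the sequence branch: conv -> SE -> global max pooling.
conv = nn.Conv1d(128, 32, kernel_size=7, padding=3)
se = SEBlock1d(32)
D_e = torch.randn(8, 128, 100)              # (batch, l_e, l_d), channel-first embeddings
D_senet = se(conv(D_e)).max(dim=-1).values  # gmp(SE(CNN(D_e)))
```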

2.3. Drug Graph Representation Learning Model

Drug molecules are non-Euclidean chemical structures consisting of entities (atoms) and relations (bonds), with rich semantic information and complex spatial structure. Capturing this structure is essential for accurately discriminating between drug molecules and precisely predicting the binding affinities of different compounds with proteins; nevertheless, it is beyond the reach of traditional GNNs.
Meanwhile, we take into account that drugs with similar substructures may react pharmacologically with target proteins having the same or similar binding pockets. Graph isomorphism networks [22], which are injective, follow a flexible message-passing scheme that enables atoms to recursively update semantic information by aggregating near and far neighboring atomic features. A sufficient number of iterations equips the GIN with a provably maximal ability to "read out" drug graph representations and distinguish drug molecules.
GIN updates atom feature vectors via MLPs, ensuring that it remains injective after $K$ iterations of aggregation. The graph representation is obtained by summing all atom feature vectors in the drug. Formally, the kernel function of GIN updates the atom feature vector $D_v^{(k)}$, and the drug graph representation $D_{GIN}$ is:

$$D_v^{(k)} = MLP^{(k)}\left(\left(1 + \varepsilon^{(k)}\right) \cdot D_v^{(k-1)} + \sum_{i \in N(v)} D_i^{(k-1)}\right), \qquad D_{GIN} = CONCAT\left(READOUT\left(\left\{D_v^{(k)} \mid v \in G\right\}\right)\right),$$

where $N(v)$ is the set of nodes adjacent to atom $v$, the $READOUT$ function is a graph-level pooling function, and $\varepsilon$ is a learnable parameter.
The successful construction of deep GINs depends heavily on the ReLU activation function and batch normalization; batch normalization effectively alleviates the vanishing-gradient and over-smoothing problems.

$$GIN^{(l+1)}(G) = BNLayer\left(GIN^{(l)}(G)\right), \qquad D_{GIN} = Dropout\left(\delta\left(GIN^{(n)}(G), W\right)\right),$$

where $BNLayer$ denotes node-level batch normalization.
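A hedged PyTorch Geometric sketch of this stack is given below: the learnable ε (train_eps=True), node-level batch normalization, ReLU, sum readout, and dropout follow the description above, while the input feature dimension, hidden width, and two-layer MLP shape are our assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GINConv, global_add_pool

class DrugGIN(nn.Module):
    def __init__(self, in_dim=78, hidden=128, num_layers=5, dropout=0.2):
        super().__init__()
        self.convs, self.bns = nn.ModuleList(), nn.ModuleList()
        for i in range(num_layers):
            mlp = nn.Sequential(
                nn.Linear(in_dim if i == 0 else hidden, hidden),
                nn.ReLU(),
                nn.Linear(hidden, hidden),
            )
            self.convs.append(GINConv(mlp, train_eps=True))  # learnable epsilon
            self.bns.append(nn.BatchNorm1d(hidden))          # node-level batch norm
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, edge_index, batch):
        for conv, bn in zip(self.convs, self.bns):
            x = F.relu(bn(conv(x, edge_index)))
        # READOUT: sum all atom vectors of each drug into a graph representation
        return self.dropout(global_add_pool(x, batch))
```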

2.4. Drug Molecule and Target Protein Interaction Model

Drug molecules binding to target proteins is essentially a recognition relationship similar to the "lock and key" model. Inspired by previous attention-based methods [13,17,30], we constructed two cross-multi-head attention modules to model the non-covalent molecular docking behavior between compounds and proteins, instead of simply concatenating drug and protein representations, which inherently introduces more intrusive information. Concretely, we observe the associations among molecular substructures, subsequences, and residues from multiple independent perspectives. The cross-multi-head attention blocks take the drug and protein sequence feature matrices $D_{SENet} \in \mathbb{R}^{l_d \times l_c}$ and $P_{SENet} \in \mathbb{R}^{l_p \times l_c}$ from the SENets, and the drug graph-level representation $D_{GIN} \in \mathbb{R}^{l_d \times l_g}$ from the GIN, as inputs.
First, we construct learnable linear transition layers so that each head can fully learn from the high-dimensional features. We then combine $D_{SENet}$ and $D_{GIN}$ with $P_{SENet}$ via the cross-multi-head attention mechanism:

$$Q_s = \delta(D_{SENet} W_{senet} + b_{senet}), \quad Q_g = \delta(D_{GIN} W_{gin} + b_{gin}), \quad K = \delta(P_{SENet} W_{senet} + b_{senet}), \quad V = \delta(P_{SENet} W_{senet} + b_{senet}),$$

where $W_{senet} \in \mathbb{R}^{l_c \times l_a}$, $W_{gin} \in \mathbb{R}^{l_g \times l_a}$, and $b_{senet}$, $b_{gin}$ are the learnable weights and bias terms, respectively. $Q$, $K$, and $V$ represent the query, key, and value vectors. An individual scaled dot-product attention module maps $Q$ against the $K$-$V$ pairs to a similarity matrix. Multi-head attention jointly attends to different representation subspaces at distinct positions by concatenating $h$ individual attention units [31].
We obtain one cross-multi-head attention weight, $A_{DP1}$, as follows:

$$Attention(Q_s, K, V) = Softmax\left(\frac{Q_s K^T}{\sqrt{l_c / h}}\right) V,$$

$$head_i = Attention(Q_s W_i^Q, K W_i^K, V W_i^V), \qquad A_{DP1} = Concat[head_1, \ldots, head_h]\, W^O,$$

where $W_i^Q$, $W_i^K$, $W_i^V$, and $W^O$ are parameter matrices of learnable linear projections. The other cross-multi-head attention coefficient, $A_{DP2}$, is computed as:

$$Attention(Q_g, K, V) = Softmax\left(\frac{Q_g K^T}{\sqrt{l_g / h}}\right) V,$$

$$head_j = Attention(Q_g W_j^Q, K W_j^K, V W_j^V), \qquad A_{DP2} = Concat[head_1, \ldots, head_h]\, W^O.$$
Afterward, we decouple the attention weight $A_{DP}$ into a drug attention weight $\alpha_d$ and a protein attention weight $\alpha_p$ by applying row-wise and column-wise sum operations, and update the drug representation $\alpha_D$ and protein representation $\alpha_P$:

$$A_{DP} = Concat[A_{DP1}, A_{DP2}],$$

$$\alpha_D = Concat[\alpha_d \odot D_{SENet},\ \alpha_d \odot D_{GIN}], \qquad \alpha_P = \alpha_p \odot P_{SENet},$$

where $\odot$ is the element-wise product. The drug–target interaction weight $I_{dp}$ can be interpreted as modeling the significant semantic correlations between target proteins and compound features:

$$I_{dp} = gap\left(Concat\left(gmp(\alpha_D),\ gmp(\alpha_P)\right)\right),$$

where $gap$ is the global average pooling operation.
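The following sketch illustrates one cross-attention branch, with drug features as queries and protein features as keys and values; it leans on PyTorch's built-in multi-head attention rather than the authors' implementation, so the projection layout and the head-averaged weight matrix are assumptions.

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    def __init__(self, d_drug, d_prot, d_model=128, heads=8):
        super().__init__()
        self.q_proj = nn.Linear(d_drug, d_model)  # drug features -> queries
        self.k_proj = nn.Linear(d_prot, d_model)  # protein features -> keys
        self.v_proj = nn.Linear(d_prot, d_model)  # protein features -> values
        self.attn = nn.MultiheadAttention(d_model, heads, batch_first=True)

    def forward(self, drug, prot):
        Q = torch.relu(self.q_proj(drug))  # (batch, l_d, d_model)
        K = torch.relu(self.k_proj(prot))  # (batch, l_p, d_model)
        V = torch.relu(self.v_proj(prot))
        out, weights = self.attn(Q, K, V)  # scaled dot-product attention
        return out, weights                # weights: (batch, l_d, l_p)

# The (l_d x l_p) weight matrix can be decoupled into drug and protein
# attention by row-wise and column-wise sums, as in the equations above.
attn = CrossAttention(d_drug=128, d_prot=96)
A, W = attn(torch.randn(4, 100, 128), torch.randn(4, 1000, 96))
alpha_d, alpha_p = W.sum(dim=2), W.sum(dim=1)  # (batch, l_d), (batch, l_p)
```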

2.5. Drug and Target Protein Binding Affinity Prediction

Finally, the interaction information $I_{dp}$ is fed directly into MLPs to map it to a drug–target affinity score. Here, the MLP consists of four layers, each followed by a ReLU activation and a dropout layer to mitigate over-fitting:

$$DTA = MLP(I_{dp}).$$
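A minimal sketch of this prediction head: the hidden widths follow the "Hidden size in MLPs" row of Table 2, while the 256-dimensional input is an illustrative assumption.

```python
import torch.nn as nn

mlp_head = nn.Sequential(
    nn.Linear(256, 1024), nn.ReLU(), nn.Dropout(0.2),   # input width assumed
    nn.Linear(1024, 1024), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(1024, 512), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(512, 1),                                  # scalar drug-target affinity
)
```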

3. Materials and Methods

3.1. Benchmark Datasets

This research assessed DoubleSG-DTA on three benchmark datasets: Davis [32], KIBA [33], and BindingDB [34]. The statistics of the three datasets and the split strategy are listed in Table 1.
The Davis dataset is highly biased and discrete, so we converted its $K_d$ values into log space [8]:

$$pK_d = -\log_{10}\left(\frac{K_d}{1 \times 10^9}\right).$$

The KIBA dataset comprises KIBA scores for about 118 K protein–compound interactions, where the scores were derived from different bioactivity measures such as $K_i$, $K_d$, or $IC_{50}$. The BindingDB dataset collects publicly accessible binding affinities between small-molecule drugs and target proteins.
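Applying the transformation is a one-liner; the snippet below assumes $K_d$ is expressed in nM, as in the Davis data, under which the commonly used 10,000 nM "no measurable interaction" placeholder maps to $pK_d = 5$.

```python
import numpy as np

def kd_to_pkd(kd_nm):
    """Convert a K_d value in nM to pK_d log space."""
    return -np.log10(kd_nm / 1e9)

kd_to_pkd(10_000.0)  # Davis placeholder of 10,000 nM -> 5.0
```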

3.2. Evaluation Metrics

To ensure consistency and a fair comparison, we applied the Concordance Index (CI, ↑), Mean Squared Error (MSE, ↓), and Regression toward the mean ($r_m^2$ index, ↑) as performance metrics, following previous studies [8,11,13], to assess the model.
MSE: The MSE metric measures the difference between the ground truths and the predicted values; minimizing the MSE was the main training objective.
CI: The CI metric measures the probability of concordance between the ground truths and the predicted values. CI values range between 0.50 and 1.0: values below 0.70 indicate less convincing predictions, 0.70 to 0.90 moderate prediction accuracy, and above 0.90 reliable predictions.
$r_m^2$: The $r_m^2$ metric is widely adopted to evaluate the external predictive performance of regression-based models; an acceptable model has an $r_m^2$ value greater than 0.5.
$$MSE = \frac{1}{N} \sum_{i=1}^{N} \left(DTA_i - Label_i\right)^2,$$

where $DTA_i$ and $Label_i$ denote the predicted value and the ground truth, respectively.
$$CI = \frac{1}{Z} \sum_{\delta_i > \delta_j} \zeta\left(DTA_{max} - DTA_{min}\right),$$

where $DTA_{max}$ and $DTA_{min}$ represent the predicted values for the higher affinity $\delta_i$ and the lower affinity $\delta_j$, $\zeta(x)$ is the step function [15] with $\zeta(x) = 1$ for $x > 0$, $0.5$ for $x = 0$, and $0$ for $x < 0$, and $Z$ is a normalization constant.
$$r_m^2 = r^2 \times \left(1 - \sqrt{r^2 - r_0^2}\right),$$

where $r^2$ and $r_0^2$ denote the squared correlation coefficients between the ground truths and the predicted values with and without intercept, respectively.
Additionally, the Pearson correlation coefficient was employed to measure the linear correlation between the ground truths and the predicted values:

$$Pearson(DTA, Label) = \frac{Cov(DTA, Label)}{\sigma_{DTA}\, \sigma_{Label}},$$

where $Cov$ denotes the covariance and $\sigma$ the standard deviation.
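Sketch implementations of these metrics are shown below; the pairwise CI follows the step-function definition above, and the $r_0^2$ term uses the common through-origin regression form, which we assume matches the authors' evaluation scripts.

```python
import numpy as np
from scipy import stats

def mse(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)

def concordance_index(y_pred, y_true):
    # Count concordant pairs among all pairs with distinct ground truths.
    total, concordant = 0.0, 0.0
    for i in range(len(y_true)):
        for j in range(len(y_true)):
            if y_true[i] > y_true[j]:  # delta_i > delta_j
                total += 1
                diff = y_pred[i] - y_pred[j]
                concordant += 1.0 if diff > 0 else (0.5 if diff == 0 else 0.0)
    return concordant / total          # Z normalizes over valid pairs

def rm2(y_pred, y_true):
    r2 = stats.pearsonr(y_true, y_pred)[0] ** 2
    # r0^2: squared correlation with the regression line forced through the origin
    # (assumed form; abs() guards against tiny negative differences).
    k = np.sum(y_true * y_pred) / np.sum(y_pred ** 2)
    r02 = 1 - np.sum((y_true - k * y_pred) ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)
    return r2 * (1 - np.sqrt(np.abs(r2 - r02)))
```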

3.3. Hyperparameter Settings

Experiments were conducted on an NVIDIA RTX A5000 GPU. We adopted five-fold cross-validation to evaluate the quality of previously reported works and the DoubleSG-DTA model. Table 2 gives the hyperparameter settings used in the experiments.

3.4. Baselines

In this part, we conducted experiments applying MSE (↓), CI (↑), and $r_m^2$ (↑) to assess the DoubleSG-DTA method against previous studies on the three benchmark datasets above, including DeepDTA [8], GraphDTA [11], MATT-DTI [13], AttentionDTA [16], DeepCDA [17], and DMIL-PPDTA [18]. In addition, we benchmarked our work against proteochemometrics methods [35], including the support vector machine (SVM), feedforward neural network (FNN), SimBoost [12], Random Forest (RF) [14], and KronRLS [15].

4. Results and Discussion

4.1. Comparison against Baselines in Regression Tasks

Table 3, Table 4 and Table 5 summarize the quantitative results of DoubleSG-DTA and previously studied models on the benchmark datasets. DoubleSG-DTA achieved significantly superior performance to other regression-based methods across the datasets.
On the Davis dataset, the MSE of DoubleSG-DTA was 0.219, 0.004 lower than the best sequence-based model, DMIL-PPDTA [18], and its CI and $r_m^2$ were 0.902 and 0.725, respectively 0.009 and 0.040 higher than the FNN model [20] among sequence-based models. Compared with the best graph-based model, GraphDTA [11], the CI value increased by 0.009 and the MSE decreased by 4.37%.
On the KIBA dataset, the MSE and $r_m^2$ of DoubleSG-DTA were 0.138 and 0.787, respectively 6.12% lower and 0.003 higher than the best sequence-based model, DMIL-PPDTA [18], and its CI of 0.896 was 0.007 higher than that of MATT-DTI [13]. Compared with the best graph-based model, GraphDTA [11], the CI value increased by 0.005 and the MSE decreased by 0.001.
On the BindingDB dataset, the MSE of DoubleSG-DTA was 0.533, 11.61% lower than the best sequence-based model, AttentionDTA [16], and its CI and $r_m^2$ were 0.862 and 0.726, respectively 0.010 and 0.039 higher. Compared with the best graph-based model, GraphDTA [11], the CI and $r_m^2$ increased by 0.005 and 0.023, respectively, and the MSE decreased by 4.31%.
Figure 2 shows that the predicted values and ground truths display approximately overlapping distribution trends on the KIBA, Davis, and BindingDB datasets. In addition, the Pearson correlation enabled an unbiased assessment of DoubleSG-DTA, which is optimized for MSE: our model achieved Pearson correlations of 0.852, 0.894, and 0.867 on the Davis, KIBA, and BindingDB datasets, respectively.
These results indicate that the powerful graph isomorphism networks, coupled with the lightweight squeeze-and-excitation networks, enable DoubleSG-DTA to perform exceptionally well with the support of cross-multi-head attention.

4.2. Ablation Study 1: The Effect of Graph Isomorphism Network Layers on Model Performance

Extracting drug representations relies heavily on the graph computational capability of the GIN. We conducted an ablation experiment to investigate the contribution of graph isomorphism network depth to prediction performance. As Figure 3 shows, DoubleSG-DTA outperforms all other settings when the number of GIN layers $L \in \{4, 5\}$; beyond this, the CI and $r_m^2$ metrics tend to decrease as the number of GIN layers increases, and the MSE, the main training objective of DoubleSG-DTA, rises sharply. GIN performs a weighted average of a node's own features and near and far neighboring node features to update the node's representation, with the aim of capturing graph representations and discriminating between graph structures. However, increasing the number of layers indefinitely causes the feature vectors of nodes within the same cluster to converge toward similarity, which may lead to node-wise over-smoothing and impair model decision-making [36]. Therefore, an appropriate GIN depth facilitates obtaining drug graph representations, whereas stacking too many GIN layers may cause over-smoothing and vanishing-gradient problems.

4.3. Ablation Study 2: The Effect of the SE Block on Model Performance

This work forgoes the CNNs used in previous studies [8,13,16,17] as the feature extractor and instead builds multilayer squeeze-and-excitation networks to construct textual features of drug and amino acid sequences; we compared this design with a CNN-based method. As shown in Table 6, although embedding the multilayer SE modules with channel attention into DoubleSG-DTA increased the parameter count and model complexity, there was no significant increase in training time on the three benchmark datasets. The controlled experiments demonstrate that the model with SENet blocks (DoubleSG-DTA + SENet) achieves considerable improvements at only slightly higher computational cost than the model without them (DoubleSG-DTA + CNN). Overall, our findings suggest that SENets significantly reduce the model's error rate, benefiting from inter-channel attention.

4.4. Ablation Study 3: Interaction Learning with Cross-Multi-Head Attention Mechanism

Ultimately, this study investigated the impact of the cross-multi-head attention mechanism, which models the reality-based molecular docking behavior of drug molecules and target proteins, by comparing it against simple concatenation of the two representations. As shown in Table 7, the MSE of DoubleSG-DTA with cross-multi-head attention decreased by 9.50%, 10.39%, and 3.79% relative to the concatenation variant on the Davis, KIBA, and BindingDB datasets, respectively, while the $r_m^2$ increased by 0.012, 0.014, and 0.024. Overall, the complete DoubleSG-DTA model with the cross-multi-head attention mechanism yielded considerable improvements.

5. Case Study on NSCLC with the $EGFR^{T790M}$ Mutation

According to 2021 cancer statistics [37], lung cancer mortality has risen to around 46% of total cancer mortality, and NSCLC accounts for approximately 85% of lung malignancies. NSCLC is frequently accompanied by epidermal growth factor receptor (EGFR) mutations [38], which pose great challenges for treatment. In recent years, the remarkable achievements of small-molecule EGFR tyrosine kinase inhibitors (EGFR-TKIs) in targeted therapy have brought hope to NSCLC patients. First-generation EGFR-TKIs (gefitinib and erlotinib) and the second-generation EGFR-TKI afatinib significantly improved the prognosis of advanced NSCLC patients compared with platinum-based chemotherapy. Unfortunately, the majority of patients develop the $EGFR^{T790M}$ mutation, resulting in severe resistance [39]. Despite the high selectivity of the third-generation EGFR-TKI osimertinib against NSCLC harboring the $EGFR^{T790M}$ mutation, patients inevitably develop secondary resistance [40].
Natural products remain a precious source of templates with structural complexity and numerous pharmacophores in drug R&D, and are especially effective against cancer. For instance, paclitaxel [41] and vincristine [42] have been widely used in the clinical treatment of tumors. In this section, we screened natural products for targeted inhibitors of NSCLC with the $EGFR^{T790M}$ mutation, seeking high affinity and good drug properties. We hope our results may provide clues for medical scientists developing highly selective natural drugs.
To this end, we acquired the FASTA sequence of the mutant protein $EGFR^{T790M}$ (PDB ID: 2JIT [28]) from the Protein Data Bank [43] and collected 2645 natural compounds from Selleck Chemicals (https://www.selleck.cn/, accessed on 4 January 2023), filtered for good human oral bioavailability (OB > 40%) and drug-likeness (DL > 0.18) [44,45]. Table 8 lists the top 10 natural products predicted by DoubleSG-DTA to have the highest affinity for the $EGFR^{T790M}$ mutant protein.
We then carried out a comprehensive literature survey on the top 10 natural products. According to [46], gossypol not only significantly increased sensitivity to EGFR-TKIs in H1975 cells carrying $EGFR^{L858R/T790M}$, but also inhibited cell proliferation and induced apoptosis. Gö6976, a staurosporine derivative, was experimentally confirmed to exhibit significant binding affinity (at 500 nM) for $EGFR^{T790M}$ mutants while showing a markedly lower affinity for wild-type EGFR [47]. Other results indicate that shikonin has selective cytotoxic effects on gefitinib-resistant NSCLC cell lines carrying the $EGFR^{T790M}$ mutation while remaining relatively safe for normal lung cells [48]. Gossypol acetic acid significantly sensitizes lung cancer cells carrying the $EGFR^{L858R/T790M}$ mutation to gefitinib and overcomes EGFR-TKI resistance [49,50]. On the basis of these reports, such natural products may be promising strategies to combat resistance in NSCLC harboring the $EGFR^{T790M}$ mutation.
Table 8. Docking information of the top 10 natural products with the highest affinity.

| Natural Products | MF | MW (g/mol) | H-Bonds | Binding Energy (kJ/mol) |
|---|---|---|---|---|
| Gossypol [46] | C30H30O8 | 518.60 | 4 | −12.636 |
| Gossypol acetic acid [50] | C32H34O10 | 578.60 | 3 | −14.644 |
| Staurosporine [47] | C28H26N4O3 | 466.50 | 3 | −18.744 |
| Emodin | C15H10O5 | 270.24 | 4 | −13.933 |
| Physcion | C16H12O5 | 284.26 | 3 | −16.862 |
| Aurantio-obtusin | C17H14O7 | 330.29 | 4 | −17.531 |
| Shikonin [48] | C16H16O5 | 288.29 | 3 | −13.180 |
| Rhein | C15H8O6 | 284.22 | 6 | −16.192 |
| Obtusifolin | C16H12O5 | 284.26 | 3 | −15.104 |
| Chrysophanol | C15H10O4 | 254.24 | 5 | −16.025 |

6. Molecular Docking and Biological Interpretation

To further validate these new interactions, computational docking was performed with AutoDock [51]. As shown in Figure 4, we employed the efficient and reliable Lamarckian genetic algorithm in AutoDock to perform an adaptive global–local search for the lowest-energy ligand–receptor docked conformation, and predicted the binding free energy via an empirical binding free-energy force field [52]. The ligand–receptor binding energy comprises electrostatic interactions, hydrogen bonding, van der Waals forces, hydrophobic interactions, and so forth, and structural stability is negatively correlated with the binding energy value; a docking conformation is generally considered acceptable when its binding energy is below −5.0208 kJ/mol. Through such interactions, drug ligands bind stably to target proteins, exerting biological activities such as anti-inflammatory and anti-tumor effects and modulating the physiological and pharmacological functions of the protein. As shown in Figure 4 and Table 8, the docking indicates that the top 10 natural compounds can be stably docked to the $EGFR^{T790M}$ protein, forming multiple hydrogen bonds.
Graph neural networks have long been criticized for poor interpretability and are commonly regarded as "black boxes". In this work, inspired by Grad-AAM [20] and Grad-CAM [53], which employ gradient-weighted class activation mapping, we visualize the regions of the graph structure that contribute most to the prediction as heatmaps, enhancing the interpretability of deep learning models that process graph data.
Since the last GIN layer of DoubleSG-DTA incorporates the richest high-level semantic information, the drug graph representations are visualized as heatmaps depicting the atoms and functional groups that contribute most prominently to the predicted DTA. We denote the feature map of the last graph convolution layer as $F$. To obtain the probability map $P$ over the atomic nodes of a given drug molecule, we compute the gradient of the predicted affinity $DTA$ with respect to the $c$-th channel of the feature map $F$ at atomic node $v$. The channel weight $W_c$ is calculated as follows.
$$W_c = \frac{1}{|V|} \sum_{v \in V} \frac{\partial\, DTA}{\partial F_v^c}.$$
Next, a weighted combination over the channels of the feature map $F$ is performed, followed by the ReLU activation function:

$$P = \delta\left(\sum_c W_c F^c\right).$$
Finally, the gradient weights were scaled to the range of 0 to 1 using min–max normalization to obtain a probability map P of the weighted distribution of the drug molecules, which was further rendered into a heatmap.
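A hedged PyTorch sketch of this attribution procedure follows; the hook target name (model.last_gin), the model call signature, and the single-graph batch are illustrative assumptions.

```python
import torch

def atom_heatmap(model, graph, protein):
    """Gradient-weighted atom attribution for one drug graph (sketch)."""
    feats = {}
    # Capture the last graph-convolution feature map F via a forward hook;
    # `model.last_gin` is an assumed attribute name.
    handle = model.last_gin.register_forward_hook(
        lambda module, inputs, output: feats.update(F=output))
    dta = model(graph, protein).squeeze()  # scalar predicted affinity
    handle.remove()

    F = feats["F"]                          # (num_atoms, C)
    grads = torch.autograd.grad(dta, F)[0]  # dDTA / dF, same shape as F
    w_c = grads.mean(dim=0)                 # channel weights W_c
    p = torch.relu((F * w_c).sum(dim=1))    # ReLU-weighted channel sum
    p = (p - p.min()) / (p.max() - p.min() + 1e-8)  # min-max normalize to [0, 1]
    return p                                # per-atom probability map P
```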
As shown in Figure 4, the active structures in the heatmaps overlap with the molecular docking sites by more than 77.14%; the overlap rate is computed as shown below. Figure 4 illustrates that describing drug molecules as graphs and learning their topological structures with a GIN of appropriate depth can accurately identify the active structures of drug molecules.

$$overlap\ rate = \frac{1}{N} \sum_{i=1}^{N} \frac{P_{drug}}{P_{protein}},$$

where $N$ denotes the number of drugs, $P_{protein}$ stands for the number of molecular docking sites, and $P_{drug}$ is the number of atoms and functional groups that contribute the most and coincide with the molecular docking sites.
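The overlap rate itself is a direct average of per-drug ratios; a tiny sketch with hypothetical counts:

```python
# Per-drug overlap rate: top-contributing atoms/groups coinciding with
# docking sites (P_drug) over the number of docking sites (P_protein),
# averaged over N drugs. Counts below are hypothetical.
def overlap_rate(p_drug, p_protein):
    assert len(p_drug) == len(p_protein)
    return sum(d / p for d, p in zip(p_drug, p_protein)) / len(p_drug)

overlap_rate([3, 4], [4, 5])  # two example drugs -> 0.775
```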

7. Conclusions

This investigation presented an interpretable deep learning-based computational model to predict the affinity of drug–target pairs and thereby aid drug discovery. The experimental results indicated that simple yet powerful graph isomorphism networks, coupled with lightweight squeeze-and-excitation networks, enable DoubleSG-DTA to perform exceptionally well with the support of cross-multi-head attention, compared with all previously reported works. Extensive experiments revealed that (i) the most appropriate number of graph isomorphism network layers for extracting drug graph representations and discriminating between molecular structures is 4 or 5; (ii) the SE block with its soft attention mechanism selectively emphasizes informative features by expanding the perceptual field, significantly boosting the model's decision making; and (iii) fully modeling the interaction between compounds and proteins further improves performance in predicting drug–target binding affinity. Ultimately, the well-established DoubleSG-DTA was applied to screen promising high-affinity compounds against Non-Small Cell Lung Cancer with the $EGFR^{T790M}$ mutation from natural products, providing clues for medical scientists. In addition, drug graph representations were visualized as heatmaps in which the active structures that contributed the most covered almost all molecular docking sites, which may provide biological interpretation and entry points for later molecular optimization. Overall, DoubleSG-DTA may be an effective in silico drug discovery tool for medical challenges and urgent public health emergencies.

Author Contributions

All the authors contributed to varying degrees to ensure the quality of this work. Y.Q., conceptualization, methodology, investigation, visualization, writing-original draft; W.N., methodology, visualization, formal analysis; X.X., writing-review and editing; Y.Q., writing-review and editing; L.T., conceptualization, supervision, funding acquisition; Q.W., conceptualization, validation, project administration, funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (grant No. 81473234, Guangzhou, China), the Guangdong Basic and Applied Basic Research Foundations (grant No. 2019A1515012215, Guangzhou, China), and the Joint Fund of the National Natural Science Foundation of China (grant No. U1303221, Guangzhou, China).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The source code is available at https://github.com/YongtaoQian/DoubleSG-DTA (accessed on 4 January 2023).

Acknowledgments

We gratefully acknowledge the editors and reviewers for reviewing the paper.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
DoubleSG-DTA: Double Sequence and Graph to predict Drug–Target Affinity
DTA: drug–target affinity
EGFR: epidermal growth factor receptor
EGFR-TKIs: EGFR tyrosine kinase inhibitors
NSCLC: Non-Small Cell Lung Cancer
T790M: threonine 790 mutation
R&D: Research and Development
SMILES: Simplified Molecular Input Line Entry System
GIN: Graph Isomorphism Network
SENet: Squeeze-and-Excitation Network
MLP: Multilayer Perceptron
ReLU: Rectified Linear Unit activation function
gap: global average pooling
gmp: global max pooling
RF: Random Forest
SVM: Support Vector Machine
FNN: Feedforward Neural Network
CI: Concordance Index
MSE: Mean Squared Error
$r_m^2$: Regression toward the mean
MF: Molecular Formula
MW: Molecular Weight (g/mol)
H-Bonds: Hydrogen Bonds

References

  1. Zhou, Y.; Xiang, S.; Yang, F.; Lu, X. Targeting Gatekeeper Mutations for Kinase Drug Discovery. J. Med. Chem. 2022, 65, 15540–15558. [Google Scholar] [CrossRef] [PubMed]
  2. Chan, H.S.; Shan, H.; Dahoun, T.; Vogel, H.; Yuan, S. Advancing drug discovery via artificial intelligence. Trends Pharmacol. Sci. 2019, 40, 592–604. [Google Scholar] [CrossRef]
  3. Gogleva, A.; Polychronopoulos, D.; Pfeifer, M.; Poroshin, V.; Ughetto, M.; Martin, M.J.; Thorpe, H.; Bornot, A.; Smith, P.D.; Sidders, B.; et al. Knowledge graph-based recommendation framework identifies drivers of resistance in EGFR mutant non-small cell lung cancer. Nat. Commun. 2022, 13, 1667. [Google Scholar] [CrossRef] [PubMed]
  4. Wang, M.; Ma, X.; Si, J.; Tang, H.; Wang, H.; Li, T.; Ouyang, W.; Gong, L.; Tang, Y.; He, X.; et al. Adverse drug reaction discovery using a tumor-biomarker knowledge graph. Front. Genet. 2021, 11, 625659. [Google Scholar] [CrossRef] [PubMed]
  5. Popova, M.; Isayev, O.; Tropsha, A. Deep reinforcement learning for de novo drug design. Sci. Adv. 2018, 4, eaap7885. [Google Scholar] [CrossRef] [Green Version]
  6. Li, Y.; Pei, J.; Lai, L. Structure-based de novo drug design using 3D deep generative models. Chem. Sci. 2021, 12, 13664–13675. [Google Scholar] [CrossRef] [PubMed]
  7. Chen, Z.; Min, M.R.; Parthasarathy, S.; Ning, X. A deep generative model for molecule optimization via one fragment modification. Nat. Mach. Intell. 2021, 3, 1040–1049. [Google Scholar] [CrossRef]
  8. Öztürk, H.; Özgür, A.; Ozkirimli, E. DeepDTA: Deep drug–target binding affinity prediction. Bioinformatics 2018, 34, i821–i829. [Google Scholar] [CrossRef] [Green Version]
  9. Gao, K.Y.; Fokoue, A.; Luo, H.; Iyengar, A.; Dey, S.; Zhang, P. Interpretable Drug Target Prediction Using Deep Neural Representation. In Proceedings of the IJCAI, Stockholm, Sweden, 13–19 July 2018; Volume 2018, pp. 3371–3377. [Google Scholar]
  10. Wang, L.; You, Z.H.; Chen, X.; Xia, S.X.; Liu, F.; Yan, X.; Zhou, Y.; Song, K.J. A computational-based method for predicting drug–target interactions by using stacked autoencoder deep neural network. J. Comput. Biol. 2018, 25, 361–373. [Google Scholar] [CrossRef] [PubMed]
  11. Nguyen, T.; Le, H.; Quinn, T.P.; Nguyen, T.; Le, T.D.; Venkatesh, S. GraphDTA: Predicting drug–target binding affinity with graph neural networks. Bioinformatics 2021, 37, 1140–1147. [Google Scholar] [CrossRef]
  12. He, T.; Heidemeyer, M.; Ban, F.; Cherkasov, A.; Ester, M. SimBoost: A read-across approach for predicting drug–target binding affinities using gradient boosting machines. J. Cheminform. 2017, 9, 24. [Google Scholar] [CrossRef] [PubMed]
  13. Zeng, Y.; Chen, X.; Luo, Y.; Li, X.; Peng, D. Deep drug–target binding affinity prediction with multiple attention blocks. Briefings Bioinform. 2021, 22, bbab117. [Google Scholar] [CrossRef]
  14. Li, H.; Leung, K.S.; Wong, M.H.; Ballester, P.J. Low-quality structural and interaction data improves binding affinity prediction via random forest. Molecules 2015, 20, 10947–10962. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  15. Pahikkala, T.; Airola, A.; Pietilä, S.; Shakyawar, S.; Szwajda, A.; Tang, J.; Aittokallio, T. Toward more realistic drug–target interaction predictions. Briefings Bioinform. 2015, 16, 325–337. [Google Scholar] [CrossRef]
  16. Zhao, Q.; Duan, G.; Yang, M.; Cheng, Z.; Li, Y.; Wang, J. AttentionDTA: Drug–target binding affinity prediction by sequence-based deep learning with attention mechanism. IEEE/ACM Trans. Comput. Biol. Bioinform. 2022; Online ahead of print. [Google Scholar] [CrossRef]
  17. Abbasi, K.; Razzaghi, P.; Poso, A.; Amanlou, M.; Ghasemi, J.B.; Masoudi-Nejad, A. DeepCDA: Deep cross-domain compound–protein affinity prediction through LSTM and convolutional neural networks. Bioinformatics 2020, 36, 4633–4642. [Google Scholar] [CrossRef] [PubMed]
  18. Wang, C.; Chen, Y.; Zhao, L.; Wang, J.; Wen, N. Modeling DTA by Combining Multiple-Instance Learning with a Private-Public Mechanism. Int. J. Mol. Sci. 2022, 23, 11136. [Google Scholar] [CrossRef] [PubMed]
  19. Rezaei, M.A.; Li, Y.; Wu, D.; Li, X.; Li, C. Deep learning in drug design: Protein-ligand binding affinity prediction. IEEE/ACM Trans. Comput. Biol. Bioinform. 2020, 19, 407–417. [Google Scholar] [CrossRef]
  20. Yang, Z.; Zhong, W.; Zhao, L.; Chen, C.Y.C. MGraphDTA: Deep multiscale graph neural network for explainable drug–target binding affinity prediction. Chem. Sci. 2022, 13, 816–833. [Google Scholar] [CrossRef]
  21. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
  22. Xu, K.; Hu, W.; Leskovec, J.; Jegelka, S. How powerful are graph neural networks? arXiv 2018, arXiv:1810.00826. [Google Scholar]
  23. Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. Statistics 2017, 1050, 20. [Google Scholar]
  24. Jiang, M.; Li, Z.; Zhang, S.; Wang, S.; Wang, X.; Yuan, Q.; Wei, Z. Drug–target affinity prediction using graph neural network and contact maps. RSC Adv. 2020, 10, 20701–20712. [Google Scholar] [CrossRef] [PubMed]
  25. Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31–36. [Google Scholar] [CrossRef]
  26. Hendrickson, J.B. Concepts and applications of molecular similarity. Science 1991, 252, 1189–1190. [Google Scholar] [CrossRef]
  27. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  28. Yun, C.H.; Mengwasser, K.E.; Toms, A.V.; Woo, M.S.; Greulich, H.; Wong, K.K.; Meyerson, M.; Eck, M.J. The T790M mutation in EGFR kinase causes drug resistance by increasing the affinity for ATP. Proc. Natl. Acad. Sci. USA 2008, 105, 2070–2075. [Google Scholar] [CrossRef] [Green Version]
  29. Landrum, G. RDKit: Open-Source Cheminformatics. 2006. Available online: http://rdkit.org/ (accessed on 4 January 2023).
  30. Zhao, Q.; Zhao, H.; Zheng, K.; Wang, J. HyperAttentionDTI: Improving drug–protein interaction prediction by sequence-based deep learning with attention mechanism. Bioinformatics 2022, 38, 655–662. [Google Scholar] [CrossRef]
  31. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
  32. Davis, M.I.; Hunt, J.P.; Herrgard, S.; Ciceri, P.; Wodicka, L.M.; Pallares, G.; Hocker, M.; Treiber, D.K.; Zarrinkar, P.P. Comprehensive analysis of kinase inhibitor selectivity. Nat. Biotechnol. 2011, 29, 1046–1051. [Google Scholar] [CrossRef]
  33. Tang, J.; Szwajda, A.; Shakyawar, S.; Xu, T.; Hintsanen, P.; Wennerberg, K.; Aittokallio, T. Making sense of large-scale kinase inhibitor bioactivity data sets: A comparative and integrative analysis. J. Chem. Inf. Model. 2014, 54, 735–743. [Google Scholar] [CrossRef]
  34. Liu, T.; Lin, Y.; Wen, X.; Jorissen, R.N.; Gilson, M.K. BindingDB: A web-accessible database of experimentally determined protein–ligand binding affinities. Nucleic Acids Res. 2007, 35, D198–D201. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  35. Bongers, B.J.; IJzerman, A.P.; Van Westen, G.J. Proteochemometrics–recent developments in bioactivity and selectivity modeling. Drug Discov. Today Technol. 2019, 32, 89–98. [Google Scholar] [CrossRef] [PubMed]
  36. Zhao, L.; Akoglu, L. Pairnorm: Tackling oversmoothing in gnns. arXiv 2019, arXiv:1909.12223. [Google Scholar]
  37. Siegel, R.L.; Miller, K.D.; Fuchs, H.E.; Jemal, A. Cancer statistics, 2021. CA Cancer J. Clin. 2021, 71, 7–33. [Google Scholar] [CrossRef] [PubMed]
  38. Remon, J.; Hendriks, L.E.; Cardona, A.F.; Besse, B. EGFR exon 20 insertions in advanced non-small cell lung cancer: A new history begins. Cancer Treat. Rev. 2020, 90, 102105. [Google Scholar] [CrossRef]
  39. Leonetti, A.; Sharma, S.; Minari, R.; Perego, P.; Giovannetti, E.; Tiseo, M. Resistance mechanisms to osimertinib in EGFR-mutated non-small cell lung cancer. Br. J. Cancer 2019, 121, 725–737. [Google Scholar] [CrossRef]
  40. Soria, J.C.; Ohe, Y.; Vansteenkiste, J.; Reungwetwattana, T.; Chewaskulyong, B.; Lee, K.H.; Dechaphunkul, A.; Imamura, F.; Nogami, N.; Kurata, T.; et al. Osimertinib in untreated EGFR-mutated advanced non–small-cell lung cancer. N. Engl. J. Med. 2018, 378, 113–125. [Google Scholar] [CrossRef]
  41. Scribano, C.M.; Wan, J.; Esbona, K.; Tucker, J.B.; Lasek, A.; Zhou, A.S.; Zasadil, L.M.; Molini, R.; Fitzgerald, J.; Lager, A.M.; et al. Chromosomal instability sensitizes patient breast tumors to multipolar divisions induced by paclitaxel. Sci. Transl. Med. 2021, 13, eabd4811. [Google Scholar] [CrossRef]
  42. Said, R.; Tsimberidou, A.M. Pharmacokinetic evaluation of vincristine for the treatment of lymphoid malignancies. Expert Opin. Drug Metab. Toxicol. 2014, 10, 483–494. [Google Scholar] [CrossRef]
  43. Berman, H.M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T.N.; Weissig, H.; Shindyalov, I.N.; Bourne, P.E. The protein data bank. Nucleic Acids Res. 2000, 28, 235–242. [Google Scholar] [CrossRef] [Green Version]
  44. Liu, H.; Wang, J.; Zhou, W.; Wang, Y.; Yang, L. Systems approaches and polypharmacology for drug discovery from herbal medicines: An example using licorice. J. Ethnopharmacol. 2013, 146, 773–793. [Google Scholar] [CrossRef]
  45. Xu, X.; Zhang, W.; Huang, C.; Li, Y.; Yu, H.; Wang, Y.; Duan, J.; Ling, Y. A novel chemometric method for the prediction of human oral bioavailability. Int. J. Mol. Sci. 2012, 13, 6964–6982. [Google Scholar] [CrossRef] [PubMed]
  46. Xu, J.; Zhu, G.Y.; Cao, D.; Pan, H.; Li, Y.W. Gossypol overcomes EGFR-TKIs resistance in non-small cell lung cancer cells by targeting YAP/TAZ and EGFRL858R/T790M. Biomed. Pharmacother. 2019, 115, 108860. [Google Scholar] [CrossRef]
  47. Lee, H.J.; Schaefer, G.; Heffron, T.P.; Shao, L.; Ye, X.; Sideris, S.; Malek, S.; Chan, E.; Merchant, M.; La, H.; et al. Noncovalent wild-type-sparing inhibitors of EGFR T790M. Cancer Discov. 2013, 3, 168–181. [Google Scholar] [CrossRef] [Green Version]
  48. Li, X.; Fan, X.X.; Jiang, Z.B.; Loo, W.T.; Yao, X.J.; Leung, E.L.H.; Chow, L.W.; Liu, L. Shikonin inhibits gefitinib-resistant non-small cell lung cancer by inhibiting TrxR and activating the EGFR proteasomal degradation pathway. Pharmacol. Res. 2017, 115, 45–55. [Google Scholar] [CrossRef]
  49. Renner, O.; Mayer, M.; Leischner, C.; Burkard, M.; Berger, A.; Lauer, U.M.; Venturelli, S.; Bischoff, S.C. Systematic Review of Gossypol/AT-101 in Cancer Clinical Trials. Pharmaceuticals 2022, 15, 144. [Google Scholar] [CrossRef] [PubMed]
  50. Zhao, R.; Zhou, S.; Xia, B.; Zhang, C.y.; Hai, P.; Zhe, H.; Wang, Y.y. AT-101 enhances gefitinib sensitivity in non-small cell lung cancer with EGFR T790M mutations. BMC Cancer 2016, 16, 491. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  51. Forli, S.; Huey, R.; Pique, M.E.; Sanner, M.F.; Goodsell, D.S.; Olson, A.J. Computational protein–ligand docking and virtual drug screening with the AutoDock suite. Nat. Protoc. 2016, 11, 905–919. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  52. Laederach, A.; Reilly, P.J. Specific empirical free energy function for automated docking of carbohydrates to proteins. J. Comput. Chem. 2003, 24, 1748–1757. [Google Scholar] [CrossRef] [PubMed]
  53. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
Figure 1. Architecture of the presented DoubleSG-DTA model.
Figure 2. Correlation distribution between ground truths and predicted values on the benchmark datasets: (a) scatter and (b) kernel density estimate plots.
Figure 3. Impact of the number of graph isomorphism network layers on the performance of DoubleSG-DTA.
Figure 4. The blue boxes show heatmaps of atomic contributions; the red boxes show molecular docking poses of the top 10 natural drugs with the $EGFR^{T790M}$ mutant protein.
Table 1. The detailed statistics of the Davis, KIBA, and BindingDB datasets.

| Dataset | No. Proteins | No. Drugs | No. Interactions | Train Data | Validation Data | Test Data |
|---|---|---|---|---|---|---|
| Davis | 442 | 68 | 30,056 | 20,037 | 5009 | 5010 |
| KIBA | 229 | 2111 | 118,254 | 78,836 | 19,709 | 19,709 |
| BindingDB | 1620 | 18,044 | 56,525 | 37,684 | 9421 | 9420 |
Table 2. The hyperparameters of DoubleSG-DTA.

| Hyperparameters | Davis Dataset | KIBA Dataset | BindingDB Dataset |
|---|---|---|---|
| Embedding size | 128 | 128 | 128 |
| SENet layers | 3 | 3 | 3 |
| GIN layers | [3, 4, 5, 6, 7] | [3, 4, 5, 6, 7] | [3, 4, 5, 6, 7] |
| Number of filters in SENets | [16, 32, 48] | [32, 64, 96] | [32, 64, 96] |
| Hidden size in MLPs | [1024, 1024, 512] | [1024, 1024, 512] | [1024, 1024, 512] |
| Number of attention heads | 8 | 8 | 8 |
| Epochs | 600 | 600 | 600 |
| Learning rate | 0.0001 | 0.0001 | 0.0001 |
| Batch size | 512 | 1024 | 1024 |
| Dropout rate | 0.2 | 0.2 | 0.2 |
| Optimizer | Adam | Adam | Adam |
| Activation function | ReLU | ReLU | ReLU |
| Loss function | MSE loss | MSE loss | MSE loss |
Table 3. Comparison of previous studies and DoubleSG-DTA on the Davis dataset.

| Dataset | Methods | Protein | Compounds | Interaction | CI (std) ↑ | MSE ↓ | $r_m^2$ (std) ↑ |
|---|---|---|---|---|---|---|---|
| Davis | Random Forest [14] | PSC | ECFP | — | 0.854 (0.002) | 0.359 | 0.549 (0.005) |
| | SVM [20] | PSC | ECFP | — | 0.857 (0.001) | 0.383 | 0.513 (0.003) |
| | FNN [20] | PSC | ECFP | — | 0.893 (0.003) | 0.244 | 0.685 (0.015) |
| | KronRLS [15] | Smith-Waterman | Pubchem Sim | — | 0.871 (0.001) | 0.379 | 0.407 (0.005) |
| | SimBoost [12] | Smith-Waterman | Pubchem Sim | — | 0.872 (0.001) | 0.282 | 0.644 (0.006) |
| | DeepDTA [8] | CNN | CNN | Concatenation & FCN | 0.878 (0.004) | 0.261 | 0.630 (0.017) |
| | DeepCDA [17] | CNN & LSTM 1 | CNN & LSTM | Two-sided Attention & FCN | 0.891 (0.003) | 0.248 | 0.649 (0.009) |
| | MATT-DTI [13] | CNN | CNN & Relation-aware Self-Attention | Multi-head Attention & FCN | 0.891 (0.002) | 0.227 | 0.683 (0.017) |
| | AttentionDTA [16] | CNN | CNN | Multi-head Attention & FCN | 0.887 (0.005) | 0.245 | 0.657 (0.024) |
| | DMIL-PPDTA [18] | Transformer | Transformer | Multi-head Attention & FCN | 0.880 (0.007) | 0.223 | 0.642 (0.017) |
| | GraphDTA [11] | CNN | GIN | Concatenation & FCN | 0.893 (—) | 0.229 | — |
| | GraphDTA [11] | CNN | GAT | Concatenation & FCN | 0.892 (—) | 0.232 | — |
| | GraphDTA [11] | CNN | GCN | Concatenation & FCN | 0.890 (—) | 0.254 | — |
| | GraphDTA [11] | CNN | GAT & GCN | Concatenation & FCN | 0.881 (—) | 0.245 | — |
| | DoubleSG-DTA | CNN | GIN+CNN 2 | Concatenation & FCN | 0.886 (0.003) | 0.250 | 0.688 (0.031) |
| | DoubleSG-DTA | SENet | GIN+SENet | Cross-Multi-head Attention & FCN | **0.902 (0.008)** | **0.219** | **0.725 (0.008)** |

1 & stands for concatenating learning. 2 + stands for parallel learning. Bold text indicates the best result.
Table 4. Comparison of previous studies and DoubleSG-DTA on the KIBA dataset.

| Dataset | Methods | Protein | Compounds | Interaction | CI (std) ↑ | MSE ↓ | $r_m^2$ (std) ↑ |
|---|---|---|---|---|---|---|---|
| KIBA | Random Forest [14] | PSC | ECFP | — | 0.837 (0.000) | 0.245 | 0.581 (0.000) |
| | SVM [20] | PSC | ECFP | — | 0.799 (0.001) | 0.308 | 0.513 (0.004) |
| | FNN [20] | PSC | ECFP | — | 0.818 (0.005) | 0.216 | 0.659 (0.015) |
| | KronRLS [15] | Smith-Waterman | Pubchem Sim | — | 0.782 (0.001) | 0.411 | 0.342 (0.001) |
| | SimBoost [12] | Smith-Waterman | Pubchem Sim | — | 0.836 (0.001) | 0.222 | 0.629 (0.007) |
| | DeepDTA [8] | CNN | CNN | Concatenation & FCN | 0.863 (0.002) | 0.194 | 0.673 (0.009) |
| | DeepCDA [17] | CNN & LSTM | CNN & LSTM | Two-sided Attention & FCN | 0.889 (0.002) | 0.176 | 0.682 (0.008) |
| | MATT-DTI [13] | CNN | CNN & Relation-aware Self-Attention | Multi-head Attention & FCN | 0.889 (0.001) | 0.150 | 0.756 (0.011) |
| | AttentionDTA [16] | CNN | CNN | Multi-head Attention & FCN | 0.882 (0.004) | 0.162 | 0.735 (0.003) |
| | DMIL-PPDTA [18] | Transformer | Transformer | Multi-head Attention & FCN | 0.881 (0.003) | 0.147 | 0.784 (0.006) |
| | GraphDTA [11] | CNN | GIN | Concatenation & FCN | 0.882 (—) | 0.147 | — |
| | GraphDTA [11] | CNN | GAT | Concatenation & FCN | 0.866 (—) | 0.179 | — |
| | GraphDTA [11] | CNN | GCN | Concatenation & FCN | 0.889 (—) | 0.139 | — |
| | GraphDTA [11] | CNN | GAT & GCN | Concatenation & FCN | 0.891 (—) | 0.139 | — |
| | DoubleSG-DTA | CNN | GIN+CNN | Concatenation & FCN | 0.856 (0.002) | 0.164 | 0.721 (0.009) |
| | DoubleSG-DTA | SENet | GIN+SENet | Cross-Multi-head Attention & FCN | **0.896 (0.010)** | **0.138** | **0.787 (0.005)** |

Bold text indicates the best result.
Table 5. Comparison of previous studies and DoubleSG-DTA on the BindingDB dataset.

| Dataset | Methods | Protein | Compounds | Interaction | CI (std) ↑ | MSE ↓ | $r_m^2$ (std) ↑ |
|---|---|---|---|---|---|---|---|
| BindingDB | KronRLS [15] | Smith-Waterman | Pubchem Sim | — | 0.815 (0.003) | 0.939 | — |
| | DeepDTA [8] | CNN | CNN | Concatenation & FCN | 0.826 (0.001) | 0.703 | 0.669 (0.004) |
| | DeepCDA [17] | CNN & LSTM | CNN & LSTM | Two-sided Attention & FCN | 0.822 (0.001) | 0.844 | 0.631 (0.002) |
| | AttentionDTA [16] | CNN | CNN | Multi-head Attention & FCN | 0.852 (0.003) | 0.603 | 0.687 (0.013) |
| | GraphDTA [11] | CNN | GIN | Concatenation & FCN | 0.857 (—) | 0.557 | 0.703 (—) |
| | GraphDTA [11] | CNN | GAT | Concatenation & FCN | 0.817 (—) | 0.929 | 0.555 (—) |
| | GraphDTA [11] | CNN | GCN | Concatenation & FCN | 0.850 (—) | 0.638 | 0.647 (—) |
| | GraphDTA [11] | CNN | GAT & GCN | Concatenation & FCN | 0.855 (—) | 0.593 | 0.682 (—) |
| | DoubleSG-DTA | CNN | GIN+CNN | Concatenation & FCN | 0.853 (0.001) | 0.624 | 0.642 (0.008) |
| | DoubleSG-DTA | SENet | GIN+SENet | Cross-Multi-head Attention & FCN | **0.862 (0.002)** | **0.533** | **0.726 (0.009)** |

Bold text indicates the best result.
Table 6. Investigating the contributions of SENet on the Davis, KIBA, and BindingDB datasets.

| Dataset | Methods | Protein | Compounds | Interaction | CI (std) ↑ | MSE ↓ | $r_m^2$ (std) ↑ | Time 1 (std) |
|---|---|---|---|---|---|---|---|---|
| Davis | DoubleSG-DTA | CNN | GIN+CNN | Cross-Multi-head Attention & FCN | 0.897 (0.008) | 0.229 | 0.713 (0.077) | 4.102 (0.061) |
| | DoubleSG-DTA | SENet | GIN+SENet | Cross-Multi-head Attention & FCN | 0.902 (0.008) | 0.219 | 0.725 (0.008) | 4.139 (0.066) |
| KIBA | DoubleSG-DTA | CNN | GIN+CNN | Cross-Multi-head Attention & FCN | 0.887 (0.014) | 0.147 | 0.760 (0.048) | 19.619 (0.357) |
| | DoubleSG-DTA | SENet | GIN+SENet | Cross-Multi-head Attention & FCN | 0.896 (0.010) | 0.138 | 0.787 (0.005) | 20.023 (0.109) |
| BindingDB | DoubleSG-DTA | CNN | GIN+CNN | Cross-Multi-head Attention & FCN | 0.854 (0.001) | 0.614 | 0.646 (0.009) | 13.787 (0.203) |
| | DoubleSG-DTA | SENet | GIN+SENet | Cross-Multi-head Attention & FCN | 0.862 (0.002) | 0.533 | 0.726 (0.009) | 14.276 (0.165) |

1 Time (s) denotes the time that our proposed DoubleSG-DTA model took to train one epoch.
Table 7. Investigating the contributions of the cross-multi-head attention mechanism on the Davis, KIBA, and BindingDB datasets.

| Dataset | Methods | Protein | Compounds | Interaction | CI (std) ↑ | MSE ↓ | $r_m^2$ (std) ↑ | Pearson ↑ |
|---|---|---|---|---|---|---|---|---|
| Davis | DoubleSG-DTA | SENet | GIN+SENet | Concatenation & FCN | 0.892 (0.007) | 0.242 | 0.713 (0.026) | 0.845 |
| | DoubleSG-DTA | SENet | GIN+SENet | Cross-Multi-head Attention & FCN | 0.902 (0.008) | 0.219 | 0.725 (0.008) | 0.852 |
| KIBA | DoubleSG-DTA | SENet | GIN+SENet | Concatenation & FCN | 0.878 (0.018) | 0.154 | 0.773 (0.063) | 0.880 |
| | DoubleSG-DTA | SENet | GIN+SENet | Cross-Multi-head Attention & FCN | 0.896 (0.010) | 0.138 | 0.787 (0.005) | 0.894 |
| BindingDB | DoubleSG-DTA | SENet | GIN+SENet | Concatenation & FCN | 0.859 (0.002) | 0.554 | 0.702 (0.009) | 0.862 |
| | DoubleSG-DTA | SENet | GIN+SENet | Cross-Multi-head Attention & FCN | 0.862 (0.002) | 0.533 | 0.726 (0.009) | 0.867 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
