Article

DDI-SSL: Drug–Drug Interaction Prediction Based on Substructure Signature Learning

1 School of Information Engineering, Suqian University, Suqian 223800, China
2 Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China
Appl. Sci. 2023, 13(19), 10750; https://doi.org/10.3390/app131910750
Submission received: 4 July 2023 / Revised: 17 September 2023 / Accepted: 20 September 2023 / Published: 27 September 2023

Abstract:
Drugs are entities composed of different chemical substructures (functional groups). In existing methods that predict drug–drug interactions (DDIs) based on substructures, each node is perceived as the epicenter of a sub-pattern, and adjacent nodes eventually become centers of similar substructures, resulting in redundancy. Furthermore, the significant differences in structure and properties among compounds can lead to unrelated pairings, making it difficult to integrate information. This heterogeneity negatively affects the prediction results. In response to these challenges, we propose a drug–drug interaction prediction method based on substructure signature learning (DDI-SSL). This method extracts useful information from local subgraphs surrounding drugs and effectively utilizes substructures to assist in predicting drug side effects. Additionally, a deep clustering algorithm is used to aggregate similar substructures, allowing any individual subgraph to be reconstructed from this set of global signatures. Furthermore, we developed a layer-independent collaborative attention mechanism to model the mutual influence between drugs, generating signal strength scores for each class of drugs to mitigate noise caused by heterogeneity. Finally, we evaluated DDI-SSL on a comprehensive dataset and demonstrated improved performance in DDI prediction compared to state-of-the-art methods.

1. Introduction

Drug–drug interactions (DDIs) can elicit numerous detrimental consequences within an organism. When multiple conflicting pharmaceuticals are consumed concurrently, a DDI may ensue, posing potential peril to the organism. DDIs are, therefore, dangerous and can even be life-threatening when multiple drugs are used. Evaluating this danger has spurred research to ascertain whether certain drugs can safely be taken together. However, many diseases have complex pathological processes, and combination therapy with drugs has become an effective approach to treating diseases and relieving suffering.
In particular, when treating patients with multiple diseases [1,2], it is common that multiple drugs are combined, yet their interactions can be unpredictable, potentially causing life-threatening side effects. Drug combination therapies merge multiple medications that are usually administered individually. Given that these drugs can influence various proteins, they may boost efficacy by addressing overlapping biological processes. For example, the drug combination of venetoclax and idarubicin has recently been shown to have excellent anti-leukemia efficacy in treating acute myeloid leukemia [3]. These two drugs act in an interactive manner, targeting complementary mechanisms simultaneously to improve cure rates [3]. While drug combination therapy proves potent in addressing numerous ailments, the key repercussion of utilizing such mixtures in patients is an elevated likelihood of adverse reactions stemming from drug-to-drug interactions. Manually pinpointing side effects from various drugs proves challenging, given the impracticality of examining every conceivable drug pairing, and such adverse outcomes are seldom detected in limited clinical studies [4].
Furthermore, drug combination therapy is considered an increasingly serious problem, affecting 15% of the US population in health care [5]. In the US, over $177 billion is spent annually to treat side effects caused by multidrug combinations. According to statistics, 30% of adverse reactions in current reports are related to drug–drug interactions (DDIs), and adverse reactions also stand as a primary cause for the removal of drugs from the marketplace [6], which can lead to a significant increase in morbidity and mortality [1,7]. Therefore, DDIs have become a focus of clinical research (including DDI extraction [8], DDI prediction tasks, etc.). In clinical trials, the thorough evaluation of potential DDIs is often constrained by manpower, and the surging volume of biomedical data becomes unmanageable regardless of its accessibility [9], which leaves the adverse interactions among many drugs on the market unknown.
In medicine, the identification of DDIs is typically performed through extensive clinical trials in a drug research environment. However, testing involves many combinations of drugs, making the whole process very expensive. Using computational methods to identify DDIs can serve as an inexpensive and fast alternative, predicting the risk of potential DDIs by extracting knowledge from known DDIs. Drug chemistry knowledge indicates that a drug comprises entities made up of distinct functional units (chemical sub-patterns). These units dictate the drug’s pharmacokinetic (how it is metabolized) and pharmacodynamic (its impact on the body) characteristics, as well as eventual interactions. In prevailing techniques leveraging substructure detection for DDI identification, each node is perceived as a substructure’s core, leading to a substructure count equal to the number of nodes. Adjacent nodes ultimately become centers of similar substructures, leading to redundancy: a set of very similar substructures is overused, which can have many negative effects.
In recent decades, many machine-learning- and deep-learning-based methods have made progress in identifying potential DDI risks between two-drug combinations. However, most of these methods are limited in characterizing drug molecules. To address this issue, we used clustering or pooling algorithms to aggregate similar substructures and retain only one representative substructure. Consequently, we introduce an innovative approach for identifying DDIs based on substructure signature learning (DDI-SSL), which operates directly on the interaction representation between the original subgraphs of drugs to achieve more topologically preserved and physically interpretable feature extraction. Thus, substructure markers can be effectively used to aid in predicting drug–drug interactions. DDI-SSL offers these technical advancements:
(1)
Enhancing the model’s clarity. By employing an attention mechanism, we derive signal intensity ratings for subgraphs. Each substructure is labeled with related drug attributes, influenced by adjustable weights through this attention process, resulting in malleable dimensions and atypically contoured substructures. Additionally, because the receptive field of message passing is variable, the model has high scalability. Both the freedom of the model and the interpretability of the model’s predictions are improved.
(2)
Improving the model’s generalization ability. Compared to the global molecular structure, DDI-SSL extracts useful information from local subgraphs centered on drug atoms, proposing an interaction modeling network to accurately mine the corresponding modality of drug composition individuals, thereby achieving a modular representation of complex topological structures. The subgraph formula may project local structural details of the attribute graph into the subgraph signature set, reducing noise by anchoring relevant information. Meanwhile, the final representation of drugs is no longer a single vector but an expression of the connections between substructures, effectively improving the model’s generalization ability.
(3)
Integrating a cooperative attention mechanism. The forecast of DDI between two pharmaceuticals hinges on the interaction score matrix learned among their substructures. However, the structural and property differences between molecules themselves can cause many irrelevant pairings, leading to the production of many unmergeable pieces of information. A collaborative attention mechanism can be used to model the mutual influence between drugs. The addition of collaborative attention effectively avoids this type of noise. This method provides an explanation for driving the DDI process and can provide domain experts with more detailed information, including which substructures may cause DDIs.

2. Related Work

Drugs are entities consisting of various functional clusters or chemical segments, which shape their pharmacokinetic (how the organism processes them) and pharmacodynamic (how they affect the organism) properties, as well as the final interaction. Therefore, the processing of drugs depends on their similarity in chemical structure or other features, including their individual side effects, targets, etc. Many existing methods [10,11,12,13,14,15,16] involve graph similarity calculations. For example, Zhang et al. [17] established a set of prediction methods based on the similarity of 14 drugs. Later, ref. [16] proposed a matrix completion method for DDI prediction, where drug similarity is used as the regularization part to preserve the manifold of drugs in a lower-dimensional space. A limitation of this presumption is that analogous drugs (or typical chemical entities) might not always share identical biological behaviors [7]. Even with certain resemblant attributes, these might be inconsequential to the specific predictive endeavor at hand.
In recent times, diverse deep learning approaches have emerged for drug–drug interaction (DDI) prediction [6,18,19,20,21,22,23,24,25], such as the multimodal deep autoencoder proposed by Zhang et al. [23] and the graph autoencoder applied on a DDI network by Feng et al. [24]. Despite the proven effectiveness of deep learning in other tasks, its potential in DDI prediction, especially in extracting features from the raw drug representation (i.e., chemical structure), has not been fully explored. Previous methods have relied solely on global structure when processing drug information, which can include irrelevant substructures and negatively affect predictions [7,26].
Two recent methods, MR-GNN [19] and GoGNN [27], have addressed these issues. Both tap into deep learning’s robust feature extraction strength [28,29,30,31] by immediately handling the unprocessed molecular graph depiction with graph neural networks (GNNs). Implicitly, they recognize the crucial influence of substructures in forecasting DDI. MR-GNN uses LSTM to learn a comprehensive feature representation for different drugs [19], while GoGNN represents drugs as concatenations of different substructures [27]. In contrast, SSI-DDI and other previous methods that use substructures to predict DDIs assume that every node symbolizes a substructure’s core, leading to overlapping and interference due to the vast disparities among drug substructures.
However, these previous methods only model the similarities between drugs and do not reveal which substructures interact with each other. Conversely, our advanced approach views substructures as standalone units and dynamically learns varying dimensions and configurations of substructures from pharmaceutical molecular diagrams, utilizing substructure markers to discern the interplays within drug attribute dimensions. We then use this information to enrich the interaction between substructures and improve the embedding and model prediction. Additionally, our method characterizes DDIs using the overall probability score of drug interactions, demonstrating how to use the correlation and interaction between each substructure during DDI prediction to improve interpretability. Furthermore, we employ a cooperative attention framework to assist prediction, enhancing both specialized and general users’ comprehension of the predictive outcomes.
In addition, existing multimodal methods [21,32,33] use drug features that include some chemical structures, externally introduced knowledge graphs, semantic information, etc., to assist the model in prediction by aggregating this high-order heterogeneous information. However, these methods contain considerable noise in the externally introduced heterogeneous information and ignore the complementarity between modalities. In contrast, DDI-SSL avoids using a large amount of heterogeneous information but instead explores the local substructures of drugs in depth. The subgraph formula allows the local structural details of the attribute graph to be projected onto the subgraph signature set, thereby reducing noise by anchoring relevant information and achieving a modular representation of complex topological structures.
One important family of methods enhances the transferability and generalization ability of molecular interaction prediction by introducing self-supervised contrast [34]. An alternative approach strengthens the model’s resilience by constructing multi-view, multi-scale features [35,36,37], amplifying its capacity for broader generalization. Both kinds of methods draw on contrastive learning to maximize the mutual information between local and global contexts, including cross-level and scale-level modules, and fuse internal and external features at different granularity levels. Here, the global representation is the representation of drug interaction graphs, and the local representation is the representation of individual nodes. Although such methods construct the mutual information of global and local structures, they only apply to some scenarios, and the model needs to build an extensive collection of both positive and negative sample pairs. DDI-SSL instead projects the local structure of the attribute graph onto the subgraph signature set to explore local structural information. Finally, a collaborative attention mechanism is used to model the mutual influence between drugs, and the addition of collaborative attention effectively avoids the inherent heterogeneity in the structure and properties of molecules.
Another type of method learns truly common substructures from molecular structures [38,39,40] by suppressing misleading substructures, reducing unnecessary entity noise, or balancing the data distribution in a graph network. SumGNN introduces a knowledge graph [39], which contains rich structural information but may also contain considerable noise. The algorithm generates inference paths by extracting local subgraphs to summarize subgraphs, uses a transformer architecture to assign a learned weight to each edge in the subgraph via a self-attention module, and then prunes the edges in the entire subgraph according to a weight threshold. Although this type of method also uses substructures, on the one hand, the chosen graph structure is a knowledge graph, which requires considerable manpower. On the other hand, by extracting useful information from the local subgraphs around drugs, accurately mining the modality corresponding to the composition of drug components, and achieving a modular representation of complex topological structures, DDI-SSL can project the local structural details of multiple attribute graphs onto the subgraph signature set and reduce noise by anchoring relevant information.

3. Methods

3.1. Problem Definition

This segment offers an explicit delineation of the issue and delineates the components of the DDI-SSL system sequentially, including input representation and computational procedures. The comprehensive structure is depicted in Figure 1.
Given a set of drugs $\mathcal{G}$, a set of interaction types $L = \{I_i\}_{i=1}^{M}$, and a DDI dataset $D = \{(G_x, G_y, r)_i\}_{i=1}^{N}$, where $G_x, G_y \in \mathcal{G}$ and $I_i$ denotes the interaction type, the objective is to learn a model $f: \mathcal{G} \times L \times \mathcal{G} \rightarrow [0, 1]$ capable of ascertaining the likelihood that a given pair of drugs results in a specific interaction type $I_i$.

3.2. Input

The model takes DDI tuples $(G_x, G_y, r)$ as input, where both drugs $G_x$ and $G_y$ are represented as undirected graphs $G = (V, E)$, with $V = \{v_i\}_{i=1}^{n}$ the set of nodes and $E = \{(v_s, v_t)\}_{i=1}^{m}$ the set of edges; $I_r \in L$ denotes the interaction type, and each interaction type is represented by a matrix $M_r \in \mathbb{R}^{D \times D}$. Initially, $G = (V, E)$ simply represents the chemical structure of the drug molecule, where $v_i$ represents an atom with a feature vector $h_i \in \mathbb{R}^{F}$ (where $F$ is the number of features) and $(v_s, v_t)$ represents a chemical bond between atoms $v_s$ and $v_t$.

3.3. Substructure Graph Convolution Operator

Duvenaud et al. [41] and Kearnes et al. [42] have shown that learning features from chemical entity molecules provides more information than manually extracted molecule representations (e.g., molecular fingerprints). Conventional techniques include graph neural networks (GNNs)—deep learning methodologies tailored for data with a graph-oriented structure. Predominantly, GNNs comprise layers of graph convolutional operators. In these layers, node attributes are refreshed by amalgamating features from adjacent nodes [43]. In this work, graph attention network (GAT) layers [29] are used as the convolutional operator for node feature updating. Attention mechanisms allow the model to handle inputs of variable sizes and focus on the parts of the features that are highly correlated. A linear transformation is applied to all node features in the graph:
$$\hat{h}_s^{(l+1)} = W^{(l+1)} h_s^{(l)}, \quad s = 1, \dots, n, \qquad (1)$$
where $W^{(l+1)} \in \mathbb{R}^{F' \times F}$ is a learnable transformation matrix from the $l$-th GAT layer to the $(l+1)$-th GAT layer. It determines the subspaces of node features and the interactions between subspaces. We use attention mechanisms [44] to determine the importance of each (neighboring) node $j$ in the neighborhood $N(i)$ of node $i$ and represent it as $\alpha_{ij} \in \mathbb{R}$. This importance indicates that not every node $j \in N(i)$ holds the same significance when updating the feature vector of node $i$. Consequently, each node is endowed with a mutable importance weight $\alpha_{ij}$. The self-importance of node $i$, denoted $\alpha_{ii}$, is also factored in to preserve intrinsic information. The importance of node $j$ relative to node $i$, represented by $\alpha_{ij}$, is calculated as follows:
$$\alpha_{ij} = \frac{\exp\!\big(\mathrm{LeakyReLU}(e_{ij})\big)}{\sum_{s \in N_i \cup \{i\}} \exp\!\big(\mathrm{LeakyReLU}(e_{is})\big)} \qquad (2)$$
In this equation, $\mathrm{LeakyReLU}$ is an activation function (used to mitigate the zero-gradient problem that may occur with the ReLU activation function), and $e_{ij}$ is an attention coefficient computed with a learnable weight vector $a \in \mathbb{R}^{2F'}$. We concatenate $\hat{h}_i^{(l+1)}$ and $\hat{h}_j^{(l+1)}$ from Equation (1) to obtain the following equation:
$$e_{ij} = a^{T}\big(\hat{h}_i^{(l+1)} \,\|\, \hat{h}_j^{(l+1)}\big) \qquad (3)$$
Here, $T$ denotes transposition and $\|$ denotes concatenation, so $e_{ij}$ can also be computed as a dot product. The framework updates node $i$ by aggregating all nodes in $N_i \cup \{i\}$, each scaled by its importance weight $\alpha_{ij}$ from Equation (2). This produces a preactivated feature vector:
$$z_i^{(l+1)} = \sum_{j \in N_i \cup \{i\}} \alpha_{ij}\, \hat{h}_j^{(l+1)} \qquad (4)$$
In order to feed this feature vector into the subsequent GAT layer, an activation function is engaged to harness intricate nonlinear cues, yielding the activated feature vector for node i. This is demonstrated as follows:
$$h_i^{(l+1)} = \sigma\big(z_i^{(l+1)}\big) \qquad (5)$$
Here, $\sigma$ is the activation function. We use the ELU [45] activation function, as in [29]; empirically, it can even outperform the widely used ReLU activation function.
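The update in Equations (1)–(5) can be sketched in a few lines of NumPy. This is an illustrative single-head implementation under simplifying assumptions (dense adjacency, no dropout, no learnable slope), not the authors' code; the toy graph and all array shapes are invented for the example:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(H, adj, W, a):
    """One attention head, following Eqs. (1)-(5).
    H: (n, F) node features; adj: (n, n) 0/1 adjacency;
    W: (F', F) shared transform; a: (2F',) attention vector."""
    n, F_out = H.shape[0], W.shape[0]
    Hp = H @ W.T                                               # Eq. (1)
    # Eq. (3): e_ij = a^T [h_i || h_j] splits into two dot products
    e = (Hp @ a[:F_out])[:, None] + (Hp @ a[F_out:])[None, :]
    mask = adj + np.eye(n)                                     # N_i ∪ {i}
    e = np.where(mask > 0, leaky_relu(e), -np.inf)
    e = e - e.max(axis=1, keepdims=True)                       # stability
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)                  # Eq. (2)
    Z = alpha @ Hp                                             # Eq. (4)
    return np.where(Z > 0, Z, np.exp(Z) - 1), alpha            # Eq. (5), ELU

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)   # 4-node path graph
H = rng.normal(size=(4, 6))
W = rng.normal(size=(5, 6))
a = rng.normal(size=10)
H_next, alpha = gat_layer(H, adj, W, a)
```

Note how masking non-neighbors to $-\infty$ before the softmax yields exactly zero attention outside $N_i \cup \{i\}$, matching the normalization in Equation (2).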

3.3.1. Multihead Attention

DDI-SSL also utilizes a multihead attention mechanism, where each attention head operates on a specific functional subspace; by “cross-pollinating” information between heads, this proves useful for exploring complex real-world data. We employ $C$ attention heads; for each node $s$ and head $c$, the representation $[\hat{h}_s^{(c)}]^{(l+1)} = [W^{(c)}]^{(l+1)} h_s^{(l)}$ is obtained via Equation (1) through parallel linear transformations. Following Equations (2)–(4), the representation $[z_i^{(c)}]^{(l+1)}$ generated by the $c$-th head for node $i$ is concatenated to form the activated representation $h_i^{(l+1)}$, where $\|$ denotes concatenation:
$$h_i^{(l+1)} = \sigma\Big(\big\Vert_{c=1}^{C}\, [z_i^{(c)}]^{(l+1)}\Big) \qquad (6)$$

3.3.2. Normalization

Following the output of every GAT layer, we incorporate a LayerNorm layer [46] and apply the $\sigma$ activation function. Since LayerNorm produced the best results, we opt to utilize it, and Equation (5) becomes:
$$h_i^{(l+1)} = \sigma\big(\mathrm{LayerNorm}(z^{(l+1)})_i\big) \qquad (7)$$
Equation (6) can be modified in the same way. At the same time, initial normalization should be applied to the input data, which is then used as the normalized, preprocessed input to the model. Initially, the feature of node $i$ is related only to the feature of its associated atom, but under the graph convolution operation, the feature of node $i$ begins to be combined with the features of connected nodes. Hence, node $i$ no longer represents a single atom but embodies a substructure that encompasses itself along with its adjacent nodes.
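Equations (6) and (7) amount to concatenating the per-head pre-activations, normalizing, and activating. A minimal NumPy sketch (a simplified LayerNorm without its learnable gain and bias, not the authors' implementation):

```python
import numpy as np

def layer_norm(z, eps=1e-5):
    """LayerNorm over the feature dimension (gain/bias omitted)."""
    mu = z.mean(axis=-1, keepdims=True)
    var = z.var(axis=-1, keepdims=True)
    return (z - mu) / np.sqrt(var + eps)

def multihead_update(Z_heads):
    """Eqs. (6)-(7): concatenate the C per-head pre-activations,
    apply LayerNorm, then the ELU activation."""
    z = np.concatenate(Z_heads, axis=-1)        # || over C heads
    z = layer_norm(z)                           # Eq. (7)
    return np.where(z > 0, z, np.exp(z) - 1)    # ELU

rng = np.random.default_rng(1)
heads = [rng.normal(size=(4, 5)) for _ in range(3)]  # C = 3 heads, 4 nodes
H_next = multihead_update(heads)
```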

3.4. Substructure Extraction

Through our research, we found that the convolution operation collects information from different substructures, and each substructure updates the feature vector $h_i^{(l+1)}$ (as shown in Equation (5)) centered at node $i$. Meanwhile, its preactivation vector $z_i^{(l+1)}$ contains substructure information composed of nodes $s \in N_i \cup \{i\}$. As we move from one layer to the next, the range of the local neighborhood expands. In addition, some nodes that are not part of the substructure are also considered in the attention of Equation (2), enabling the learning of correlations between different substructures.
In the GAT layer, all substructure information (each represented by a single node’s preactivation vector $z_i^{(l)}$) is aggregated (as shown in Equation (4)), and each substructure is weighted by a learnable parameter $\beta_i \in \mathbb{R}$, which can be understood as its importance. We use $z_i^{(l)}$ instead of $h_i^{(l)}$ here because we only need the current layer’s output rather than the output modified by the nonlinear layer. Therefore, all substructure information of a drug $G_x \in \mathcal{G}$ identified at the $l$-th layer can be represented by the following equation:
$$g_x^{(l)} = \sum_{i=1}^{n} \beta_i\, z_i^{(l)}. \qquad (8)$$
SAGPooling [47] is deployed to discern the importance $\beta_i$ of every node within the graph. Given a graph characterized by feature matrix $X$ and adjacency matrix $A$, SAGPooling computes the importance vector $\beta$, whose elements are the coefficients $\beta_i$:
$$\beta = \mathrm{softmax}\big(\mathrm{SAGPooling}(A, X)\big) \qquad (9)$$
SAGPooling takes into account both the contextual and the topographical data of nodes to gauge their significance within the overarching graph.
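The readout of Equations (8) and (9) can be illustrated as follows. Note that the actual model uses SAGPooling's GNN-based scorer; here it is replaced with a hypothetical one-layer graph-convolution scorer for brevity, so this is a sketch of the idea, not the paper's implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def substructure_readout(X, A, w):
    """Eqs. (8)-(9): score every node, softmax the scores into
    importances beta, and form the weighted sum g = sum_i beta_i z_i.
    The scorer (A_hat @ X @ w) is a stand-in for SAGPooling's GNN."""
    A_hat = A + np.eye(A.shape[0])     # add self-loops
    beta = softmax(A_hat @ X @ w)      # Eq. (9), simplified
    g = beta @ X                       # Eq. (8)
    return g, beta

rng = np.random.default_rng(4)
X = rng.normal(size=(4, 3))            # per-node preactivations z_i
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
w = rng.normal(size=3)                 # hypothetical scorer weights
g, beta = substructure_readout(X, A, w)
```

The resulting `g` has a fixed dimension regardless of the number of nodes, which is what makes the per-layer substructure summaries comparable across drugs.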

3.5. Substructure Signature Learning

The purpose of learning substructure signatures is to map a large number of substructure instances onto $K$ representative subgraph patterns so that any subgraph can be reconstructed from this global “topological dictionary”. This approach not only preserves the similarity between subgraphs but also provides a unified modular platform for representing different attribute graphs.
This is also the basis for modeling individual interactions (the projection pooling mechanism), as it effectively preserves the identity of subgraph individuals and avoids the information loss caused by traditional methods. We use clustering or dictionary encoding to segment $n$ attribute graphs into $N = \sum_{i=1}^{n} k_i$ subgraph instances and learn $K$ signature vectors, subject to two criteria: (1) high coverage—subgraph signatures should faithfully reflect the distribution of subgraphs; (2) high discriminability—the graph representation formed from subgraph signatures should clearly distinguish samples of different categories. The former is achieved through dictionary reconstruction, while the latter is ensured by an end-to-end learning framework. The subgraph signature matrix $U$ directly represents the subgraph signatures. In addition, clustering can also be implemented through nonnegative dictionaries. Let $U = [\mu_1, \mu_2, \dots, \mu_K] \in \mathbb{R}^{c \times K}$ be the subgraph signature matrix and require that all subgraph instances within the attribute graph can be reconstructed from the column vectors of $U$, i.e., minimizing the following reconstruction error:
$$\min \sum_{l=1}^{n} \sum_{i=1}^{k_l} \Big\Vert \sum_{j=1}^{K} \mu_j\, \alpha_{lij} - g_{l,[:,i]} \Big\Vert^2, \quad \text{s.t. } \alpha_{lij} \ge 0 \qquad (10)$$
where $\alpha_{lij}$ is the reconstruction coefficient of the $i$-th subgraph of attribute graph $G_l$ with respect to the subgraph signature $\mu_j$. In addition, subgraph signatures can also be calculated through deep clustering. We use a clustering indicator matrix $W_i \in \mathbb{R}^{n_i \times K}$ to cluster the subgraphs of each attribute graph $G_i$, where the $(j,k)$-th element represents the probability that the $j$-th substructure of $G_i$ belongs to the $k$-th subgraph signature; the form of this probability can be given by a Gaussian or Student’s t-distribution function, for example:
$$W_i(j,k) = \frac{\big(1 + \Vert g_i(j,:) - \mu_k \Vert^2 / \alpha\big)^{-\frac{\alpha+1}{2}}}{\sum_{k'} \big(1 + \Vert g_i(j,:) - \mu_{k'} \Vert^2 / \alpha\big)^{-\frac{\alpha+1}{2}}}, \qquad (11)$$
The above can be learned by minimizing a Kullback–Leibler divergence. The objective function is:
$$\min_{U, H_i} \sum_i \mathrm{KL}\big(W_i, \widetilde{W}_i\big) \quad \text{s.t. } \widetilde{W}_i(j,k) = \frac{W_i^2(j,k) / \sum_l W_i(l,k)}{\sum_{k'} \big[ W_i^2(j,k') / \sum_l W_i(l,k') \big]}. \qquad (12)$$
Here, $\widetilde{W}_i$ is the sparse (self-sharpening) version of $W_i$. Minimizing the KL divergence above ensures that each subgraph instance is assigned to only a small number of subgraph signatures, thus forming the dictionary reconstruction code for the coefficients. The choice of the number of subgraph signatures $K$ is controlled by the complexity of the topology: for attribute graphs with many nodes, $K$ should be chosen sufficiently large, a quantity also known as the “structural resolution”.
The pooling operator of the topology-preserving graph pooling algorithm projects the local structure of each graph onto common structural signatures $\mu_1, \mu_2, \dots, \mu_K$, and the resulting graph representation dimension depends only on the number of structural signatures $K$ rather than on the number of nodes in the attribute graph. Therefore, fixed-dimensional graph representations can be obtained for attribute graphs of different sizes while approximately maintaining the identity of individual parts (substructures) and their interconnections. This operator realizes a probability distribution of substructures in the graph representation of the attribute graph: the density of substructure signatures in attribute graph $G_i$ can be calculated as $p_i = W_i \cdot \mathbf{1}_{n_i \times 1}$, and each row of this density can be used as a representation of each drug atom after learning.
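The soft assignment of Equation (11) and the sharpened target of Equation (12) follow the familiar deep-embedded-clustering recipe and can be sketched as below. The KL direction and the density computation are written as one plausible reading of the equations above, not as the authors' exact implementation:

```python
import numpy as np

def soft_assign(G, mu, alpha=1.0):
    """Eq. (11): Student-t soft assignment of n substructure vectors
    G (n, c) to K signatures mu (K, c)."""
    d2 = ((G[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)  # (n, K)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(W):
    """Eq. (12): sharpened target W~, squaring and renormalizing W."""
    t = W ** 2 / W.sum(axis=0, keepdims=True)
    return t / t.sum(axis=1, keepdims=True)

def kl(W, Wt):
    """KL divergence between assignment and target; the paper writes
    KL(W_i, W~_i), and this direction is one plausible reading."""
    return float((Wt * np.log(Wt / W)).sum())

rng = np.random.default_rng(2)
G = rng.normal(size=(8, 4))        # 8 substructure vectors
mu = rng.normal(size=(3, 4))       # K = 3 signatures
W = soft_assign(G, mu)
Wt = target_distribution(W)
p = W.sum(axis=0) / W.shape[0]     # normalized signature density over the graph
```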

3.6. Substructure Interaction Correlation with Collaborative Attention

Once we have obtained the substructure data $p_x^{(l)}$ and $p_y^{(l)}$ for the input drugs $G_x$ and $G_y$, respectively, via the subgraph signature learning layers (segmental chemical substructure extraction), the significance of every mutual interaction between the substructures of $G_x$ and $G_y$, denoted $\gamma_{ij}$, is computed using a collaborative attention mechanism [29]:
$$\gamma_{ij} = b^{T} \tanh\big(W_x\, p_x^{(i)} + W_y\, p_y^{(j)}\big), \quad i, j = 1, \dots, L, \qquad (13)$$
In Equation (13), $b$ is a learnable weight vector, while $W_x$ and $W_y$ are learnable weight matrices. Distinct weight matrices are employed to avoid inflated scores when comparing similar substructures. In contrast to Equations (2) and (9), there is no activation function applied in Equation (13). Additionally, negative scores are expected to occur for some noninteracting drugs; therefore, Equation (14) is used to generate lower DDI probabilities. Finally, an activation function, such as tanh, is used to obtain the prediction.
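Equation (13) is a standard additive co-attention score and can be sketched directly; the shapes below are illustrative, and this is a sketch rather than the authors' code:

```python
import numpy as np

def co_attention(Px, Py, Wx, Wy, b):
    """Eq. (13): gamma_ij = b^T tanh(W_x p_x(i) + W_y p_y(j)).
    Px, Py: (L, c) substructure representations of the two drugs;
    Wx, Wy: (d, c) weight matrices; b: (d,) weight vector."""
    Sx = Px @ Wx.T                                         # (L, d)
    Sy = Py @ Wy.T                                         # (L, d)
    # Broadcast every (i, j) pair, apply tanh, project onto b
    return np.tanh(Sx[:, None, :] + Sy[None, :, :]) @ b    # (L, L)

rng = np.random.default_rng(3)
Px, Py = rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
Wx, Wy = rng.normal(size=(6, 4)), rng.normal(size=(6, 4))
b = rng.normal(size=6)
gamma = co_attention(Px, Py, Wx, Wy, b)
```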

3.7. Prediction and Loss Function

The following equation provides the probability of predicting the interaction between a pair of drugs G x and G y as r:
$$P(G_x, G_y, r) = \sigma\Big(\sum_{i,j} \gamma_{ij}\, p_x^{(i)T} M_r\, p_y^{(j)}\Big) \qquad (14)$$
Here, $\sigma$ symbolizes the sigmoid function, while $M_r$ denotes the learned matrix signifying the interaction type $I_r$. Equation (14) evaluates the likelihood of a drug pair $(G_x, G_y)$ inducing interaction $r$. This probability is driven by the interplay among the substructures of the two drugs. Every pairwise association possesses a corresponding weight coefficient, culminating in the conclusive DDI prediction outcome.
Gradient descent with a cross-entropy loss facilitates end-to-end training of the model. Every DDI specified in the dataset is deemed a positive instance. For any positive instance $(G_x, G_y, r)_i$, negative instances are crafted by substituting $G_x$ or $G_y$ according to the methodology introduced in [47]. The loss function $L$ for the comprehensive DDI dataset is deduced in the subsequent manner:
$$L = -\frac{1}{N} \sum_{i=1}^{N} \big( \log(p_i) + \log(1 - p_i^{-}) \big) \qquad (15)$$
Here, $p_i$ and $p_i^{-}$ are both calculated using Equation (14): $p_i$ is the probability computed for the positive sample, while $p_i^{-}$ is the probability computed for the associated negative sample.
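Equations (14) and (15) can be sketched together. The leading minus sign and the $(1 - p^{-})$ term follow the standard negative-sampling cross-entropy convention, which is our reading of the garble-prone original; all shapes and data are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ddi_probability(Px, Py, gamma, Mr):
    """Eq. (14): P(G_x, G_y, r) = sigmoid(sum_ij gamma_ij p_x(i)^T M_r p_y(j))."""
    scores = Px @ Mr @ Py.T               # (L, L) bilinear substructure scores
    return sigmoid(float((gamma * scores).sum()))

def ddi_loss(p_pos, p_neg):
    """Eq. (15): cross-entropy over positive probabilities p_i and
    negative-sample probabilities p_i^-."""
    p_pos, p_neg = np.asarray(p_pos), np.asarray(p_neg)
    return float(-(np.log(p_pos) + np.log(1.0 - p_neg)).mean())

rng = np.random.default_rng(5)
Px, Py = rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
gamma = rng.normal(size=(5, 5))   # co-attention scores from Eq. (13)
Mr = rng.normal(size=(4, 4))      # learned matrix for interaction type r
p = ddi_probability(Px, Py, gamma, Mr)
```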

3.8. The Overall Algorithm of DDI-SSL

The DDI-SSL algorithm is shown in Algorithm 1. Given input drugs $G_x$ and $G_y$ and an interaction type $r$, the nodes yield feature matrices $H_x$ and $H_y$. We use GAT to perform weighted calculations of the features to obtain new representations (Line 4) and at the same time use deep clustering algorithms to train substructure signatures (Lines 6–9). Lines 10–13 calculate the collaborative attention. Finally, the probability of a drug interaction is generated (Lines 15–17).
Algorithm 1: DDI-SSL Algorithm

4. Experimental Setup and Results Analysis

4.1. Datasets

In experiments pertaining to drug–drug interactions, we employ four datasets: DrugBank, TwoSides, DrugComb, and DrugCombDB. Details regarding these four datasets can be found in Table 1, where the management context $|C|$ refers to how a dataset is organized and maintained. Furthermore, the number of labels $|Y|$ represents the number of different interaction types, such as drug interactions categorized into classes like strong, weak, harmful, or benign interactions. Understanding the label count helps in assessing the diversity and complexity of a dataset and determines its suitability for different types of analyses and models.
DrugBank [6]: This compilation encompasses 1706 drugs and 383,496 DDIs, exhibiting 86 interaction types, each described by a template sentence delineating a distinct DDI variant. The training set comprises 80% of the overall data. Drugs correlate with their respective SMILES string portrayals [48]; transitioning to graph format employs the freely available RDKit cheminformatics library. Each drug is represented as a graph in which bonds symbolize edges and atoms denote nodes. Each atom feature vector spans 55 dimensions, encapsulating the following: (1) element symbol (44, one-hot depiction), (2) atom degree (count of neighboring atoms), (3) implicit valence, (4) tally of unpaired electrons, (5) hybridization type (5, one-hot depiction), (6) aromaticity status of the atom, and (7) accumulated hydrogen count on said atom. Every DDI duo is categorized as a positive instance, from which a negative instance is derived. A single interaction type is associated with each DDI tuple in this compilation.
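The atom featurization described above can be sketched as follows. In practice the attributes come from RDKit `Atom` objects; here they are plain arguments, and the element list is truncated for brevity, so the total dimension of this sketch deliberately differs from the paper's 55:

```python
def one_hot(value, choices):
    """One-hot encode; unknown values map to the last slot."""
    vec = [0] * len(choices)
    vec[choices.index(value) if value in choices else len(choices) - 1] = 1
    return vec

# Truncated lists for illustration; the paper one-hot encodes 44 symbols.
SYMBOLS = ["C", "N", "O", "S", "F", "Cl", "Br", "P", "I"]
HYBRIDIZATIONS = ["SP", "SP2", "SP3", "SP3D", "SP3D2"]

def atom_features(symbol, degree, implicit_valence, radical_electrons,
                  hybridization, is_aromatic, num_hs):
    """Concatenate the seven attribute groups described above.
    In a real pipeline these values would come from RDKit Atom objects."""
    return (one_hot(symbol, SYMBOLS)
            + [degree, implicit_valence, radical_electrons]
            + one_hot(hybridization, HYBRIDIZATIONS)
            + [int(is_aromatic), num_hs])

feat = atom_features("C", 3, 0, 0, "SP2", True, 1)
```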
TwoSides [18]: Obtained by filtering the original TwoSides adverse drug event data [1] as in [18], this dataset contains 644 drugs and 499,582 DDI pairs. In contrast to DrugBank, the interactions here are phenotypic rather than transcriptional; that is, they are adverse effects such as headaches and sore throats, as described in [18].
DrugComb [49]: DrugComb provides free access to standardized drug combination screening results. Through computational tools available on its web servers, it collects and curates high-throughput drug combination screening data covering 4146 drugs tested on 288 cancer cell lines from 10 tissue types, for a total of 659,333 combinations.
DrugCombDB [50]: DrugCombDB covers 191,391 two-drug combinations, 2956 unique drugs, and 112 cancer cell line samples.

4.2. Baselines

We compared our model against state-of-the-art methods that (1) operate on molecular graphs, (2) incorporate drug substructure information, or (3) have been evaluated on DDI prediction.
DeepDDI [6] incorporates structural similarity profiles in the representation learning process and uses the Jaccard coefficient to predict DDI.
DeepSynergy [51] uses chemical and genetic information as input and is applied to predict drug synergy.
MHCADDI [25] uses a collaborative attention mechanism to integrate drug combination information.
MR-GNN [19] captures structural features of each drug at multiple resolutions, one per graph convolutional layer; these representations are then fed into a recurrent neural network that jointly learns drug-pair representations for prediction.
CASTER [20] uses an end-to-end dictionary learning framework to encode drug chemical structures.
SSI-DDI [38] treats each node’s hidden representation as a substructure and computes the interactions among them to predict the DDI.
EPGCN-DS [52] achieves drug structure permutation invariance using a graph convolutional network and DeepSets.
DeepDrug [53] learns graph representations of drugs and proteins using graph convolutional networks, with residual connections to aid training.
GCN-BMP [54] stacks L graph convolutional layers to learn a representation of each node in the graph and adds an attention-driven graph pooling layer.
DeepDDS [55] identifies synergistic anti-cancer drug combinations with a deep learning framework based on graph convolutional networks or attention mechanisms.
MatchMaker [56] uses drug chemical structure information within a deep learning framework to predict drug synergy.
We reproduced these methods using PyTorch and made minor modifications to some of them to achieve better performance under fair comparison.

4.3. Experimental Settings

As shown in Figure 1, the model learns drug-pair representations with shared GAT weights, i.e., the same layers are applied to both drugs. The encoder consists of four GAT layers, each with C = 2 attention heads; each head computes a 32-dimensional intermediate feature, so the output of each GAT layer is a 64-dimensional hidden feature vector. Each GAT layer is followed by a LayerNorm layer and an ELU activation, with LayerNorm applied directly to the layer inputs. Each interaction type r ∈ L is represented by a learnable matrix M_r ∈ R^(64×64). The model is trained with the Adam optimizer [57] on batches of 1024 DDI samples, with weight decay set to 5 × 10^-4. The learning rate decays exponentially over time as 0.01 × 0.96^t, where t is the epoch number; the number of epochs is set to 200.
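The exponential learning-rate schedule above can be reproduced directly; a small sketch in pure Python, with the constants taken from the settings in this subsection:

```python
# Exponential learning-rate decay: lr(t) = 0.01 * 0.96**t, t = epoch index.
BASE_LR, DECAY, EPOCHS = 0.01, 0.96, 200

def lr_at(t: int) -> float:
    return BASE_LR * DECAY ** t

schedule = [lr_at(t) for t in range(EPOCHS)]
print(f"epoch 0:   {schedule[0]:.6f}")    # 0.010000
print(f"epoch 50:  {schedule[50]:.6f}")   # 0.001299
print(f"epoch 199: {schedule[-1]:.2e}")   # 2.96e-06
```

By the final epoch the step size has shrunk by roughly three orders of magnitude, which is why the 200-epoch budget is enough for the training curves to flatten.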
DDI pairs are randomly split into 80% for training and 20% for evaluation, preserving the ratio of interaction types. This split is repeated three times, yielding three random partitions of each dataset. The hyperparameters for our method and for every baseline model are listed in Table 2. Each model is trained and evaluated on these three splits, and the mean and standard deviation of the results over the three runs are reported in Table 3.
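The stratified 80/20 split described here (consistent interaction-type ratios, repeated with three random seeds) can be sketched as follows; the toy pairs, labels, and seed values are illustrative assumptions:

```python
import random
from collections import defaultdict

def stratified_split(pairs, labels, test_frac=0.2, seed=0):
    """Split DDI pairs 80/20 while preserving per-interaction-type ratios."""
    by_label = defaultdict(list)
    for pair, lab in zip(pairs, labels):
        by_label[lab].append(pair)
    rng = random.Random(seed)
    train, test = [], []
    for lab, group in by_label.items():
        rng.shuffle(group)
        cut = int(len(group) * (1 - test_frac))
        train += [(p, lab) for p in group[:cut]]
        test += [(p, lab) for p in group[cut:]]
    return train, test

# Three random splits with distinct seeds, as in the protocol above.
pairs = [(f"d{i}", f"d{i+1}") for i in range(100)]
labels = [i % 4 for i in range(100)]        # 4 toy interaction types
splits = [stratified_split(pairs, labels, seed=s) for s in range(3)]
print([len(tr) for tr, te in splits])       # [80, 80, 80]
```

Because the split is done per label group, every interaction type keeps the same 80/20 proportion in each of the three partitions.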

4.4. Experimental Implementation

Our model is implemented in PyTorch; the experiments were run on an NVIDIA RTX 3090 GPU under Ubuntu.

4.5. Experimental Results

In this experiment, DDI-SSL is evaluated with the following metrics: ACC, AUC-ROC, and F1-score. For the multiclass models (MR-GNN and DeepDDI), the micro-averaged AUC is reported. Every method is run on the same dataset splits. For MR-GNN, we reuse the original code released by its authors; DeepDDI and MHCADDI are reimplemented with the settings given in their respective papers. The scores of each method on each metric are shown in Table 3, with DDI-SSL in the bottom row. DDI-SSL performs well on all datasets, including DrugBank. Although DeepSynergy achieves an equally high AUC on DrugBank, that metric is of questionable reliability given the dataset’s skewed distribution: multiclass classifiers can attain high AUC scores even with very low true positive rates.
In the DrugBank dataset, each DDI pair is treated as a positive sample and used to generate a negative sample, but each DDI tuple has only one interaction type. As a result, AUC-ROC may be insensitive to performance on the minority class, because it relies only on the true positive and false positive rates. When negative samples greatly outnumber positive ones, even a model that performs poorly on the positive class can achieve a relatively low false positive rate, thereby overestimating its performance. DDI-SSL also attains a very high AUC, indicating that it not only distinguishes interacting from noninteracting drug pairs but also does so with high accuracy.
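The point about AUC-ROC under class imbalance can be made concrete with a tiny synthetic example (the numbers are illustrative only):

```python
def auc_roc(y_true, scores):
    """Rank-based AUC: probability a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 5 positives among 1000 samples; the model ranks all positives above all
# negatives, but gives them low absolute scores.
y = [1] * 5 + [0] * 995
s = [0.40, 0.41, 0.42, 0.43, 0.44] + [0.1] * 950 + [0.3] * 45

print(round(auc_roc(y, s), 3))   # 1.0 -- a perfect AUC ...
tp = sum(1 for yi, si in zip(y, s) if yi == 1 and si >= 0.5)
print(tp)                        # 0  -- ... yet zero true positives at threshold 0.5
```

This is exactly the failure mode described above: the ranking metric is perfect while the classifier, at any reasonable decision threshold, misses every positive.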
Additionally, on the TwoSides and DrugComb datasets, DDI-SSL surpasses the alternative techniques in both F1-score and AUC. The accuracy (ACC) of each approach on the evaluation set is shown in Figure 2. On DrugBank, DDI-SSL improves over DeepDDI, DeepSynergy, and the other baselines and converges more smoothly, ultimately approaching 100% accuracy. On TwoSides, accuracy improves and convergence is faster. On DrugComb, performance is more oscillatory; since multiple methods exhibit oscillating ACC curves on this dataset, the dataset itself appears to contain significant distributional differences. Finally, DDI-SSL maintains good precision on DrugCombDB, indicating that the model remains consistent on such datasets.

4.5.1. The Effect of Collaborative Attention

To investigate the interplay between substructure interaction and collaborative attention, we examined the effect of removing the co-attention layer. Figure 3a shows the AUC, ACC, and F1-score obtained with and without the co-attention layer on the DrugBank validation set. Co-attention clearly improves performance: with the same number of attention heads and markers, adding co-attention raises all three metrics, and removing it causes a decline. When matching the substructures of two different molecules, differences in structure and properties between the molecules produce many irrelevant pairings whose information cannot be merged; co-attention effectively suppresses this noise.
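The noise-suppression behavior described here can be illustrated with a minimal NumPy sketch of cross-drug attention over substructure embeddings; the embedding sizes, the scaled dot-product form, and the toy data are assumptions, not the paper's exact layer:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention_scores(Sx, Sy):
    """Signal-strength scores between the substructure sets of two drugs.

    Rows of Sx/Sy are substructure embeddings. Low-similarity (irrelevant)
    pairings receive near-zero attention weight, suppressing their noise.
    """
    sim = Sx @ Sy.T / np.sqrt(Sx.shape[1])   # scaled dot-product similarity
    return softmax(sim, axis=1)

rng = np.random.default_rng(1)
Sx = rng.normal(size=(3, 8))                    # drug x: 3 substructures
Sy = np.vstack([Sx[0],                          # drug y shares one substructure
                0.1 * rng.normal(size=(4, 8))]) # plus four weak, unrelated ones

w = co_attention_scores(Sx, Sy)
print(int(w[0].argmax()))   # 0: the shared substructure absorbs row 0's attention
```

The unrelated substructures end up with attention weights near zero, so their features contribute little to the merged representation, which is the mechanism the ablation in Figure 3a isolates.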

4.5.2. The Effect of Multihead Attention

We further assessed the impact of the number of attention heads. As illustrated in Figure 3b, the model performs best with two attention heads. Multiple heads explore different feature subspaces, which can improve the generalization of graph neural networks, and their computations can be parallelized, an efficiency advantage. However, as the number of heads increases further, the predictive results do not improve but instead decline steadily, suggesting that interference between heads saturates the useful information.

4.5.3. The Effect of Number of Substructure Markers

In our experiments, we varied the number of substructure markers, denoted K, setting it to 100, 200, 500, and 1000. As shown in Figure 3c, the model performed best with K = 200; as the number of substructures increased beyond this point, performance gradually decreased rather than improving. A K that is too small underfits, while one that is too large overfits, either of which can reduce precision. In general, the best performance is achieved at an intermediate value of K.
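The role of the K global signatures can be illustrated by substituting ordinary k-means for the deep clustering module (a deliberate simplification; the toy embeddings, dimensions, and motif count are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

def kmeans(X, K, iters=50, seed=0):
    """Plain k-means: learn K signature vectors from substructure embeddings."""
    r = np.random.default_rng(seed)
    C = X[r.choice(len(X), K, replace=False)]
    for _ in range(iters):
        assign = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        for k in range(K):
            if (assign == k).any():
                C[k] = X[assign == k].mean(0)
    return C, assign

def reconstruction_error(X, C, assign):
    """Mean error when each substructure is replaced by its signature."""
    return float(((X - C[assign]) ** 2).sum(-1).mean())

# 300 substructure embeddings drawn around 4 true motifs.
motifs = 3.0 * rng.normal(size=(4, 16))
X = motifs[rng.integers(0, 4, 300)] + 0.1 * rng.normal(size=(300, 16))

errs = {K: reconstruction_error(X, *kmeans(X, K)) for K in (1, 2, 4, 8)}
# Reconstruction error drops as K approaches the true motif count.
print(f"K=1: {errs[1]:.1f}  K=4: {errs[4]:.1f}")
```

With too few signatures, distinct substructures are forced onto the same signature (underfitting); a very large K instead memorizes individual subgraphs, which matches the overfitting trend seen in Figure 3c.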
In summary, DDI-SSL improves on the strongest baseline by roughly 5% in accuracy (ACC), by 2–5% in F1-score, and by approximately 2% in AUC.

5. Conclusions and Discussion

We have introduced DDI-SSL, an approach for detecting drug–drug interactions (DDIs) by learning substructure signatures, which is effective in aiding the prediction of adverse drug reactions.
First, we argued that previous methods relying on global structures can yield inaccurate results when handling drug information. Using a self-attention mechanism, we derive strength scores for subgraphs, and each substructure is annotated with relevant drug attributes scaled by learnable weights, allowing outputs of variable size and shape. Because the receptive field of message passing is variable, our method is highly scalable. Moreover, we extract information from local drug subgraphs to identify the modules that compose each drug. The subgraph formulation projects the local structural details of attributed graphs onto a set of subgraph signatures, which reduces noise by anchoring the relevant information and thereby yields a modular representation of complex topologies. Finally, we model the mutual influence between drugs with a collaborative attention mechanism, which effectively suppresses noise arising from differences in molecular structures and properties. Thorough experiments on DDI datasets validate the improvements brought by DDI-SSL and quantify the contribution of each component.
Future research will leverage message passing and other methods to capture the topological relationships and semantic information of larger-scale protein atoms and bonds by representing nodes and edges in graph networks. In addition to drug–drug interactions, research in the field of graph neural networks has also made progress in learning protein structure and predicting temporal dynamics. However, challenges remain in protein dynamics prediction tasks, such as the need to dynamically capture complex structural spatiotemporal changes and use long-range correlations at different time scales. To better understand the physiological mechanisms behind protein dynamics, a new interaction modeling graph neural network framework that captures long-range dynamic spatiotemporal correlations is urgently needed to provide prediction and interpretation for protein dynamics research in target interactions.
It is important to acknowledge that while the DDI-SSL method shows promise in enhancing drug–drug interaction predictions, there are several limitations to consider. First, the performance of DDI-SSL may be influenced by the quality and size of the available data, and the method’s effectiveness could vary depending on the specific dataset used. Additionally, DDI-SSL’s success may be constrained by the completeness and accuracy of the underlying drug interaction data, which can vary across different sources.
Real-World Impact: Accurate predictions of drug–drug interactions hold immense potential for improving various aspects of healthcare. In particular, such advancements can significantly enhance patient safety, streamline drug development processes, and reduce healthcare costs.
Patient Safety: Accurate predictions enable healthcare providers to proactively identify potential interactions between prescribed medications, thereby minimizing the risk of adverse effects and harmful drug interactions. This has a direct impact on patient safety, ensuring that treatments are both effective and safe.
Drug Development Efficiency: Drug development is a costly and time-consuming process. Accurate predictions of drug interactions allow pharmaceutical companies to identify potential issues early in the development pipeline, saving time and resources. This can expedite the release of safe and effective medications to the market.
Healthcare Cost Reduction: Avoiding drug–drug interactions can lead to a reduction in hospitalizations, emergency room visits, and other healthcare expenses associated with adverse drug events. This, in turn, can contribute to overall healthcare cost reduction.
Specific Use Cases: While DDI-SSL can be applied broadly to predict drug interactions, its utility extends to both organic molecules and metal-containing drugs such as cisplatin. This versatility makes it a valuable tool for researchers and healthcare professionals working with a wide range of therapeutic agents.

Funding

This research was funded by the Guangxi Key Laboratory of Trusted Software (no.: KX202037), the Project of Guangxi Science and Technology (no.: GuiKeAD 20297054), and the Guangxi Natural Science Foundation Project (no.: 2020GXNSFBA297108).

Data Availability Statement

The data that support the findings of this study are openly available in PNAS at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5939113, reference number [6]; Bioinformatics at https://ui.adsabs.harvard.edu/abs/2018arXiv180200543Z/abstract, reference number [18]; Nucleic Acids Res. at https://drugcomb.org/, reference number [49]; Nucleic Acids Res. at https://pubmed.ncbi.nlm.nih.gov/31665429/, reference number [50].

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Tatonetti, N.P.; Ye, P.P.; Daneshjou, R.; Altman, R.B. Data-driven prediction of drug effects and interactions. Sci. Transl. Med. 2012, 4, 125ra31. [Google Scholar] [CrossRef] [PubMed]
  2. Han, K.; Jeng, E.E.; Hess, G.T.; Morgens, D.W.; Li, A.; Bassik, M.C. Synergistic drug combinations for cancer identified in a CRISPR screen for pairwise genetic interactions. Nat. Biotechnol. 2017, 35, 463–474. [Google Scholar] [CrossRef] [PubMed]
  3. Pan, R.; Ruvolo, V.; Mu, H.; Leverson, J.D.; Nichols, G.; Reed, J.C.; Konopleva, M.; Andreeff, M. Synthetic lethality of combined Bcl-2 inhibition and p53 activation in AML: Mechanisms and superior antileukemic efficacy. Cancer Cell 2017, 32, 748–760. [Google Scholar] [CrossRef]
  4. Bansal, M.; Yang, J.; Karan, C.; Menden, M.P.; Costello, J.C.; Tang, H.; Xiao, G.; Li, Y.; Allen, J.; Zhong, R.; et al. A community computational challenge to predict the activity of pairs of compounds. Nat. Biotechnol. 2014, 32, 1213–1222. [Google Scholar] [CrossRef] [PubMed]
  5. Ernst, F.R.; Grizzle, A.J. Drug-related morbidity and mortality: Updating the cost-of-illness model. J. Am. Pharm. Assoc. 2001, 41, 192–199. [Google Scholar] [CrossRef] [PubMed]
  6. Ryu, J.Y.; Kim, H.U.; Lee, S.Y. Deep learning improves prediction of drug–drug and drug–food interactions. Proc. Natl. Acad. Sci. USA 2018, 115, E4304–E4311. [Google Scholar] [CrossRef]
  7. Silverman, R.B.; Holladay, M.W. The Organic Chemistry of Drug Design and Drug Action, 3rd ed.; Academic Press: Cambridge, MA, USA, 2014. [Google Scholar]
  8. Zhang, T.; Leng, J.; Liu, Y. Deep learning for drug-drug interaction extraction from the literature: A review. Briefings Bioinform. 2020, 21, 1609–1627. [Google Scholar] [CrossRef]
  9. Whitebread, S.; Hamon, J.; Bojanic, D.; Urban, L. Keynote review: In vitro safety pharmacology profiling: An essential tool for successful drug development. Drug Discov. Today 2005, 10, 1421–1433. [Google Scholar] [CrossRef]
  10. Yu, H.; Mao, K.T.; Shi, J.Y.; Huang, H.; Chen, Z.; Dong, K.; Yiu, S.M. Predicting and understanding comprehensive drug-drug interactions via semi-nonnegative matrix factorization. BMC Syst. Biol. 2018, 12, 101–110. [Google Scholar] [CrossRef]
  11. Gottlieb, A.; Stein, G.Y.; Oron, Y.; Ruppin, E.; Sharan, R. INDI: A computational framework for inferring drug interactions and their associated recommendations. Mol. Syst. Biol. 2012, 8, 592. [Google Scholar] [CrossRef]
  12. Huang, H.; Zhang, P.; Qu, X.A.; Sanseau, P.; Yang, L. Systematic prediction of drug combinations based on clinical side-effects. Sci. Rep. 2014, 4, 7160. [Google Scholar] [CrossRef] [PubMed]
  13. Li, X.; Xu, Y.; Cui, H.; Huang, T.; Wang, D.; Lian, B.; Li, W.; Qin, G.; Chen, L.; Xie, L. Prediction of synergistic anti-cancer drug combinations based on drug target network and drug induced gene expression profiles. Artif. Intell. Med. 2017, 83, 35–43. [Google Scholar] [CrossRef] [PubMed]
  14. Kastrin, A.; Ferk, P.; Leskošek, B. Predicting potential drug-drug interactions on topological and semantic similarity features using statistical learning. PloS ONE 2018, 13, e0196865. [Google Scholar] [CrossRef]
  15. Ferdousi, R.; Safdari, R.; Omidi, Y. Computational prediction of drug-drug interactions based on drugs functional similarities. J. Biomed. Inform. 2017, 70, 54–64. [Google Scholar] [CrossRef]
  16. Zhang, W.; Chen, Y.; Li, D.; Yue, X. Manifold regularized matrix factorization for drug-drug interaction prediction. J. Biomed. Inform. 2018, 88, 90–97. [Google Scholar] [CrossRef] [PubMed]
  17. Zhang, W.; Chen, Y.; Liu, F.; Luo, F.; Tian, G.; Li, X. Predicting potential drug-drug interactions by integrating chemical, biological, phenotypic and network data. BMC Bioinform. 2017, 18, 18. [Google Scholar] [CrossRef]
  18. Zitnik, M.; Agrawal, M.; Leskovec, J. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics 2018, 34, i457–i466. [Google Scholar] [CrossRef]
  19. Xu, N.; Wang, P.; Chen, L.; Tao, J.; Zhao, J. Mr-gnn: Multi-resolution and dual graph neural network for predicting structured entity interactions. arXiv 2019, arXiv:1905.09558. [Google Scholar]
  20. Huang, K.; Xiao, C.; Hoang, T.N.; Glass, L.; Sun, J. CASTER: Predicting Drug Interactions with Chemical Substructure Representation. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), New York, NY, USA, 7–12 February 2020; pp. 702–709. [Google Scholar]
  21. Deng, Y.; Xu, X.; Qiu, Y.; Xia, J.; Zhang, W.; Liu, S. A multimodal deep learning framework for predicting drug–drug interaction events. Bioinformatics 2020, 36, 4316–4322. [Google Scholar] [CrossRef]
  22. Ma, T.; Xiao, C.; Zhou, J.; Wang, F. Drug Similarity Integration Through Attentive Multi-view Graph Auto-Encoders. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), Stockholm, Sweden, 9–19 July 2018; pp. 3477–3483. [Google Scholar]
  23. Zhang, Y.; Qiu, Y.; Cui, Y.; Liu, S.; Zhang, W. Predicting drug-drug interactions using multi-modal deep auto-encoders based network embedding and positive-unlabeled learning. Methods 2020, 179, 37–46. [Google Scholar] [CrossRef]
  24. Feng, Y.H.; Zhang, S.W.; Shi, J.Y. DPDDI: A deep predictor for drug-drug interactions. BMC Bioinform. 2020, 21, 419. [Google Scholar] [CrossRef]
  25. Deac, A.; Huang, Y.H.; Veličković, P.; Liò, P.; Tang, J. Drug-drug adverse effect prediction with graph co-attention. arXiv 2019, arXiv:1905.00534. [Google Scholar]
  26. Jia, J.; Zhu, F.; Ma, X.; Cao, Z.W.; Li, Y.X.; Chen, Y.Z. Mechanisms of drug combinations: Interaction and network perspectives. Nat. Rev. Drug Discov. 2009, 8, 111–128. [Google Scholar] [CrossRef]
  27. Wang, H.; Lian, D.; Zhang, Y.; Qin, L.; Lin, X. Gognn: Graph of graphs neural network for predicting structured entity interactions. arXiv 2020, arXiv:2005.05537. [Google Scholar]
  28. Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. In Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, 5–10 December 2016; pp. 3837–3845. [Google Scholar]
  29. Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. In Proceedings of the Sixth International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  30. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the 5th International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
  31. Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Neural Message Passing for Quantum Chemistry. Proc. Mach. Learn. Res. 2017, 70, 1263–1272. [Google Scholar]
  32. Lin, X.; Quan, Z.; Wang, Z.; Ma, T.; Zeng, X. KGNN: Knowledge Graph Neural Network for Drug-Drug Interaction Prediction. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, Yokohama, Japan, 11–17 July 2020; pp. 2739–2745. [Google Scholar]
  33. Lyu, T.; Gao, J.; Tian, L.; Li, Z.; Zhang, P.; Zhang, J. MDNN: A Multimodal Deep Neural Network for Predicting Drug-Drug Interaction Events. In Proceedings of the 30th International Joint Conference on Artificial Intelligence, Virtual, 19–27 August 2021; pp. 3536–3542. [Google Scholar]
  34. Zhao, C.; Liu, S.; Huang, F.; Liu, S.; Zhang, W. CSGNN: Contrastive Self-Supervised Graph Neural Network for Molecular Interaction Prediction. In Proceedings of the 30th International Joint Conference on Artificial Intelligence, Virtual, 19–27 August 2021; pp. 3756–3763. [Google Scholar]
  35. Wang, Y.; Min, Y.; Chen, X.; Wu, J. Multi-view Graph Contrastive Representation Learning for Drug-Drug Interaction Prediction. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; pp. 2921–2933. [Google Scholar]
  36. Fu, H.; Huang, F.; Liu, X.; Qiu, Y.; Zhang, W. MVGCN: Data integration through multi-view graph convolutional network for predicting links in biomedical bipartite networks. Bioinformatics 2022, 38, 426–434. [Google Scholar] [CrossRef] [PubMed]
  37. Chen, Y.; Ma, T.; Yang, X.; Wang, J.; Song, B.; Zeng, X. MUFFIN: Multi-scale feature fusion for drug–drug interaction prediction. Bioinformatics 2021, 37, 2651–2658. [Google Scholar] [CrossRef] [PubMed]
  38. Nyamabo, A.K.; Yu, H.; Shi, J.Y. SSI–DDI: Substructure–substructure interactions for drug–drug interaction prediction. Briefings Bioinform. 2021, 22, bbab133. [Google Scholar] [CrossRef]
  39. Yu, Y.; Huang, K.; Zhang, C.; Glass, L.M.; Sun, J.; Xiao, C. SumGNN: Multi-typed drug interaction prediction via efficient knowledge graph summarization. Bioinformatics 2021, 37, 2988–2995. [Google Scholar] [CrossRef]
  40. Lv, G.; Hu, Z.; Bi, Y.; Zhang, S. Learning Unknown from Correlations: Graph Neural Network for Inter-novel-protein Interaction Prediction. In Proceedings of the 30th International Joint Conference on Artificial Intelligence, Virtual, 19–27 August 2021; pp. 3677–3683. [Google Scholar]
  41. Huang, K.; Xiao, C.; Glass, L.M.; Sun, J. MolTrans: Molecular interaction transformer for drug–target interaction prediction. Bioinformatics 2021, 37, 830–836. [Google Scholar] [CrossRef]
  42. Duvenaud, D.; Maclaurin, D.; Aguilera-Iparraguirre, J.; Gómez-Bombarelli, R.; Hirzel, T.; Aspuru-Guzik, A.; Adams, R.P. Convolutional Networks on Graphs for Learning Molecular Fingerprints. In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS 2015), Montreal, QC, Canada, 7–12 December 2015; pp. 2224–2232. [Google Scholar]
  43. Kearnes, S.; McCloskey, K.; Berndl, M.; Pande, V.; Riley, P. Molecular graph convolutions: Moving beyond fingerprints. J. Comput. Aided Mol. Des. 2016, 30, 595–608. [Google Scholar] [CrossRef] [PubMed]
  44. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  45. Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 17–19 June 2013; Volume 30, p. 3. [Google Scholar]
  46. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All you Need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
  47. Lee, J.; Lee, I.; Kang, J. Self-Attention Graph Pooling. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, Long Beach, CA, USA, 9–15 June 2019; Volume 97, pp. 3734–3743. [Google Scholar]
  48. Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge Graph Embedding by Translating on Hyperplanes. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI 2014), Québec City, QC, Canada, 27–31 July 2014; pp. 1112–1119. [Google Scholar]
  49. Zagidullin, B.; Aldahdooh, J.; Zheng, S.; Wang, W.; Wang, Y.; Saad, J.; Malyutina, A.; Jafari, M.; Tanoli, Z.; Pessia, A.; et al. DrugComb: An integrative cancer drug combination data portal. Nucleic Acids Res. 2019, 47, W43–W51. [Google Scholar] [CrossRef] [PubMed]
  50. Liu, H.; Zhang, W.; Zou, B.; Wang, J.; Deng, Y.; Deng, L. DrugCombDB: A comprehensive database of drug combinations toward the discovery of combinatorial therapy. Nucleic Acids Res. 2020, 48, D871–D881. [Google Scholar] [PubMed]
  51. Preuer, K.; Lewis, R.P.; Hochreiter, S.; Bender, A.; Bulusu, K.C.; Klambauer, G. DeepSynergy: Predicting anti-cancer drug synergy with Deep Learning. Bioinformatics 2018, 34, 1538–1546. [Google Scholar] [CrossRef]
  52. Sun, M.; Wang, F.; Elemento, O.; Zhou, J. Structure-Based Drug-Drug Interaction Detection via Expressive Graph Convolutional Networks and Deep Sets (Student Abstract). In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 13927–13928. [Google Scholar]
  53. Yin, Q.; Cao, X.; Fan, R.; Liu, Q.; Jiang, R.; Zeng, W. DeepDrug: A general graph-based deep learning framework for drug-drug interactions and drug-target interactions prediction. bioRxiv 2020. [Google Scholar] [CrossRef]
  54. Chen, X.; Liu, X.; Wu, J. GCN-BMP: Investigating graph representation learning for DDI prediction task. Methods 2020, 179, 47–54. [Google Scholar] [CrossRef]
  55. Wang, J.; Liu, X.; Shen, S.; Deng, L.; Liu, H. DeepDDS: Deep graph neural network with attention mechanism to predict synergistic drug combinations. Briefings Bioinform. 2022, 23, bbab390. [Google Scholar] [CrossRef]
  56. Kuru, H.I.; Tastan, O.; Cicek, A.E. MatchMaker: A deep learning framework for drug synergy prediction. IEEE/ACM Trans. Comput. Biol. Bioinform. 2021, 19, 2334–2344. [Google Scholar] [CrossRef]
  57. Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31–36. [Google Scholar] [CrossRef]
Figure 1. DDI-SSL model framework.
Figure 2. Overall experimental accuracy.
Figure 3. Influence of different hyperparameters on the results.
Table 1. Number of drugs in the drug dataset (|D|), management context (|C|), and number of labels (|Y|).

Dataset      Task           |D|    |C|    |Y|
TwoSides     Polypharmacy    644     10   499,582
DrugBank     Interaction    1706     86   383,496
DrugComb     Synergy        4146    288   659,333
DrugCombDB   Synergy        2596    112   191,391
Table 2. Hyperparameters used by each baseline model.

Method       Encoder       Hyperparameters
DeepDDI      Feedforward   Hidden layer channels
DeepSynergy  Feedforward   Drug encoder channels; context encoder channels; hidden layer channels
MHCADDI      GAT           Atom encoder channels; edge encoder channels; hidden layer channels; readout layer channels
MR-GNN       GCN           Drug encoder channels; drug encoder layers; hidden layer channels
CASTER       Feedforward   Drug encoder channels; hidden layer channels; regularization coefficient; magnification factor
SSI-DDI      GAT           Drug encoder channels; attention heads
EPGCN-DS     GCN           Drug encoder channels; hidden layer channels
DeepDrug     GCN           Drug encoder channels; hidden layer channels
GCN-BMP      GCN           Drug encoder channels; hidden layer channels
DeepDDS      GCN or GAT    Context encoder channels; hidden layer channels
MatchMaker   Feedforward   Drug encoder channels; hidden layer channels
Table 3. Evaluation results of DDI experiments.

Model        DrugBank F1    DrugBank AUC   TwoSides F1    TwoSides AUC   DrugComb F1    DrugComb AUC   DrugCombDB F1  DrugCombDB AUC
DeepDDI      0.715 ± 0.003  0.880 ± 0.002  0.848 ± 0.009  0.929 ± 0.001  0.715 ± 0.003  0.669 ± 0.001  0.715 ± 0.002  0.714 ± 0.003
DeepSynergy  0.725 ± 0.002  0.991 ± 0.001  0.887 ± 0.001  0.937 ± 0.001  0.725 ± 0.002  0.702 ± 0.003  0.724 ± 0.001  0.704 ± 0.002
MHCADDI      0.721 ± 0.002  0.734 ± 0.003  0.798 ± 0.003  0.912 ± 0.002  0.714 ± 0.003  0.645 ± 0.002  0.725 ± 0.003  0.719 ± 0.003
MR-GNN       0.455 ± 0.002  0.877 ± 0.002  0.821 ± 0.002  0.937 ± 0.002  0.455 ± 0.002  0.724 ± 0.003  0.629 ± 0.003  0.702 ± 0.003
CASTER       0.689 ± 0.003  0.765 ± 0.003  0.769 ± 0.001  0.845 ± 0.003  0.624 ± 0.002  0.645 ± 0.003  0.703 ± 0.003  0.698 ± 0.003
SSI-DDI      0.711 ± 0.002  0.745 ± 0.002  0.707 ± 0.003  0.823 ± 0.002  0.711 ± 0.002  0.627 ± 0.001  0.714 ± 0.003  0.723 ± 0.003
EPGCN-DS     0.697 ± 0.001  0.761 ± 0.002  0.717 ± 0.003  0.855 ± 0.003  0.697 ± 0.001  0.629 ± 0.002  0.712 ± 0.003  0.709 ± 0.002
DeepDrug     0.703 ± 0.002  0.861 ± 0.003  0.805 ± 0.002  0.923 ± 0.004  0.724 ± 0.001  0.643 ± 0.001  0.678 ± 0.003  0.715 ± 0.001
GCN-BMP      0.662 ± 0.002  0.669 ± 0.002  0.621 ± 0.001  0.709 ± 0.003  0.707 ± 0.002  0.594 ± 0.001  0.708 ± 0.003  0.613 ± 0.002
DeepDDS      0.729 ± 0.002  0.963 ± 0.001  0.902 ± 0.002  0.915 ± 0.002  0.702 ± 0.003  0.663 ± 0.004  0.706 ± 0.002  0.701 ± 0.001
MatchMaker   0.725 ± 0.001  0.987 ± 0.001  0.874 ± 0.004  0.912 ± 0.002  0.712 ± 0.002  0.662 ± 0.002  0.714 ± 0.001  0.678 ± 0.002
DDI-SSL      0.731 ± 0.002  0.991 ± 0.002  0.905 ± 0.002  0.939 ± 0.001  0.727 ± 0.001  0.732 ± 0.003  0.734 ± 0.003  0.725 ± 0.003

Note: The best result in each column is shown in bold in the published version; “±” denotes the error range of the results.

Share and Cite

MDPI and ACS Style

Liang, Y. DDI-SSL: Drug–Drug Interaction Prediction Based on Substructure Signature Learning. Appl. Sci. 2023, 13, 10750. https://doi.org/10.3390/app131910750

