Learning Multi-Types of Neighbor Node Attributes and Semantics by Heterogeneous Graph Transformer and Multi-View Attention for Drug-Related Side-Effect Prediction

Xuan, Ping; Li, Peiru; Cui, Hui; Wang, Meng; Nakaguchi, Toshiya; Zhang, Tiangang

doi:10.3390/molecules28186544


by Ping Xuan ¹,², Peiru Li ¹, Hui Cui ³, Meng Wang ¹, Toshiya Nakaguchi ⁴ and Tiangang Zhang ¹,⁵,*

¹ School of Computer Science and Technology, Heilongjiang University, Harbin 130407, China
² Department of Computer Science, School of Engineering, Shantou University, Shantou 515000, China
³ Department of Computer Science and Information Technology, La Trobe University, Melbourne 3086, Australia
⁴ Center for Frontier Medical Engineering, Chiba University, Chiba 263-8522, Japan
⁵ School of Mathematical Science, Heilongjiang University, Harbin 130407, China
* Author to whom correspondence should be addressed.

Molecules 2023, 28(18), 6544; https://doi.org/10.3390/molecules28186544

Submission received: 26 July 2023 / Revised: 1 September 2023 / Accepted: 6 September 2023 / Published: 9 September 2023

(This article belongs to the Special Issue Advances in Computational Chemistry for Drug Design, Discovery and Screening)


Abstract

Since side-effects are one of the primary reasons that drugs fail in clinical trials, predicting them can help reduce drug development costs. We proposed a method based on a heterogeneous graph transformer and capsule networks for drug-side-effect association prediction (TCSD). The method encodes and integrates attributes from multiple types of neighbor nodes, connection semantics, and multi-view pairwise information. In each drug-side-effect heterogeneous graph, a target node has two types of neighbor nodes: drug nodes and side-effect nodes. We proposed a new heterogeneous graph transformer-based context representation learning module, which encodes the specific topology and the contextual relations among multiple kinds of nodes. There are similarity and association connections between the target node and its various types of neighbor nodes, and these connections imply semantic diversity. Therefore, we designed a new strategy to measure the importance of a neighboring node to the target node and to incorporate the different semantics of the connections between the target node and its multi-type neighbors. Furthermore, we designed attention mechanisms at the neighbor node type level and at the graph level to obtain more informative neighbor node features and multi-graph features, respectively. Finally, a pairwise multi-view feature learning module based on capsule networks was built to learn the pairwise attributes from the heterogeneous graphs. Our prediction model was evaluated using a public dataset, and the cross-validation results showed that it outperformed several state-of-the-art methods. Ablation experiments demonstrated the effectiveness of the heterogeneous graph transformer-based context encoding, the position-enhanced pairwise attribute learning, and the neighborhood node category-level attention. Case studies on five drugs further showed TCSD's ability to retrieve potential drug-related side-effect candidates, and TCSD inferred the candidate side-effects for 708 drugs.


1. Introduction

Drug side-effects are effects that are unrelated to a drug's therapeutic purpose and that occur in the body when the drug is administered at therapeutic doses; they include adverse reactions that may cause the drug to fail in clinical trials [1,2,3]. Therefore, precise and efficient identification of drug-related side-effect candidates can help lower drug development costs and enhance drug safety [4,5]. Computational methods have demonstrated their ability to aid drug discovery [6] and design [7], an area known as computer-aided drug design (CADD). They can also screen for reliable drug-related side-effect candidates [8,9,10].

Current drug-side-effect association prediction methods fall into three categories. The first category estimates the likelihood of drug and side-effect associations based on drug-associated proteins. New indications and adverse reactions are usually caused by unexpected chemical–protein interactions at off-target sites. Therefore, information on the proteins targeted by a drug is used to predict drug-related side-effects. Compound–protein interaction (CPI) sets [11,12] and drug–protein interactions (DPI) can also be used to infer drug-related side-effect candidates [13]. However, this class of methods is limited because structural information is available for only a small fraction of the drug-associated proteins [14].

A second class of predictive models uses machine learning to screen candidates for drug-related side-effects. To combine data on drugs, proteins, and side-effects, five machine learning techniques were used: logistic regression, naive Bayes, k-nearest neighbors, random forest, and support vector machine [15]. Other approaches infer potential drug-side-effect associations based on multi-label learning [16], on multiple kernel learning and least squares [17], on random forests [18], on a random walk and skip-gram algorithm [19], on feature-derived graph-regularized matrix factorization for predicting drug side-effects (FGRMF) [20], on triple matrix factorization based on kernel target alignment [21], and on non-negative matrix factorization [22]. Mohsen et al. [23] constructed a framework based on a deep neural network (DNN) for inferring the candidates. However, such models are shallow prediction models that have difficulty fully extracting the complicated and nonlinear associations between drugs and side-effects.

The third category establishes prediction models based on deep learning to further enhance prediction performance by extracting deep and representative features of the drug and side-effect nodes. Training a deep learning model usually takes several hours or tens of hours. On the other hand, once the model is trained, inferring the association likelihood for a drug and side-effect pair usually takes no more than a second. Recent models make full use of the diverse data related to drug and side-effect nodes for drug-side-effect association prediction, including the similarity and association information of drugs and side-effects as well as the association information of drugs and diseases. Several approaches integrate multi-source data on drugs and side-effects, including graph attention networks [24], a similarity-based deep learning approach for determining the frequencies of drug side-effects (SDPred) using a multi-layer perceptron [25], and graph convolutional autoencoders combined with convolutional neural networks [26]. Recently, hybrid graph neural network models incorporating graph-embedding and node-embedding modules have been used to model drug-side-effect associations and to provide candidate predictions [27]. Although deep models have improved drug-side-effect association prediction, the above models cannot adequately fuse the features of the edges between the source and target nodes and do not integrate the rich positional information in the feature embedding of the node pairs. Our model aggregates the information from multiple types of neighbor nodes and encodes the semantic information of the various connections. Moreover, an attribute learning module is built to learn the pairwise attributes from a multiple-capsule perspective.

In this study, we propose a novel prediction model TCSD for integrating the various neighbor attributes, the diverse connection semantics, and the pairwise attributes. TCSD’s main contributions are listed as follows:

(1): First, two heterogeneous graphs composed of drug and side-effect nodes are constructed by utilizing two types of drug similarities to complement the encoding of the specific topology structure and node attributes of each heterogeneous graph. A target node in each graph has drug neighbor nodes and side-effect neighbor nodes, and there are contextual relationships among the attributes of the target node and the attributes of its diverse neighbor nodes. Most previous approaches have focused only on aggregating the information of a single type of neighbor node. A module based on a graph transformer is established to learn category-sensitive attributes for each category of neighbor nodes.
(2): Previous approaches did not fully utilize the diverse information of multiple types of connections among the drug and side-effect nodes. In order to improve the node feature-learning capacity in each heterogeneous graph, we design a strategy to integrate the similarity semantic connections between drugs (side-effects) and the association semantic connections between drugs and side-effects.
(3): Third, we design two attention mechanisms for the effective fusion of learned information. To adaptively fuse the encoded contextual features from the drug neighbor nodes and the side-effect nodes for each target node, we design the attention at the neighbor category level. Since two heterogeneous graphs make different contributions to drug-related side-effect prediction, we design an attention from the graph perspective to discriminate their contributions.
(4): Finally, we propose a capsule network-based strategy to learn the attributes of a pair of drug and side-effect nodes. The created multiple capsules and the dynamic routing mechanism enhance position information learning in the pairwise attribute embedding. Previous approaches did not integrate the information of the positions in the pairwise embedding. A comprehensive comparison with six state-of-the-art methods and case studies on five drugs showed TCSD’s superior performance and its ability in discovering potential association candidates.

2. Materials and Methods

The new prediction model TCSD is presented in Figure 1. It integrates the multi-modality similarities of drugs and side-effects, neighbor context encoding, and pairwise feature representation to predict potential drug-related side-effects. First, two drug–side-effect heterogeneous graphs were created based on the associations between drugs and side-effects as well as the multi-modality similarities (Figure 1a). Afterwards, to learn the neighbor context encoding of the target node, we built a transformer-based context encoding (CET) module using a neighbor node category-level and a graph-level attention mechanism (Figure 1b); its detailed structure is shown in Figure 2. In parallel, a capsule network-based pairwise multi-view feature (MVF) learning module (Figure 1c) was used to learn the feature map of a drug–side-effect node pair.

2.1. Dataset

Public databases [28,29] and papers [26,30] addressing drug-side-effect associations, side-effect similarities, drug chemical substructure similarities, and drug functional similarities were used to gather data on drugs and side-effects. Initially, 80,164 pairs of drug and side-effect associations were retrieved from the SIDER databank [28]. We obtained the chemical substructural similarities from the comparative toxicogenomics database [29], which includes the chemical substructures of 708 drugs. The disease-based drug similarities were obtained from a previous study [31]. These associations and similarities included 708 drugs, 4192 side-effects, and 5603 diseases.

2.2. Multi-Source Data Matrix Representation and Construction of Heterogeneous Graphs

2.2.1. Matrix Representation of Drug-Side-Effect Associations

We created an association matrix $A = (A_{i,j}) \in \mathbb{R}^{N_{r} \times N_{s}}$ according to the known associations of the drug-side-effect node pairs. This matrix records the relationships between $N_{r}$ drugs and $N_{s}$ side-effects. The drugs are represented by the rows of $A$ and the side-effects are represented by the columns. If a drug $r_{i}$ and a side-effect $s_{j}$ are known to be associated, then $A_{i,j} = 1$. If not, then $A_{i,j} = 0$.

2.2.2. Matrix Representation of Multi-Modality Similarities of Drugs

When two drugs $r_{i}$ and $r_{j}$ are associated with a greater number of similar diseases, the functional similarity of the two drugs is usually greater. We therefore computed the functional similarity $D_{i,j}^{dis}$ between a pair of drug nodes $r_{i}$ and $r_{j}$ based on the diseases they are connected with, in accordance with the work of Wang et al. [31]. Similarly, a greater similarity in the chemical substructures of $r_{i}$ and $r_{j}$ indicates a greater similarity between the drugs themselves. Based on this biological premise, $D_{i,j}^{che}$ was calculated following Luo et al. [30], using the cosine similarity to reflect the similarity of the drug chemical substructures. Using the drug-related multi-source data, we obtained a multimodal similarity matrix $D^{\rho}$ for the drugs, defined as

$$D^{\rho} = \begin{cases} D^{dis} = (D_{i,j}^{dis}) \in \mathbb{R}^{N_{r} \times N_{r}} \\ D^{che} = (D_{i,j}^{che}) \in \mathbb{R}^{N_{r} \times N_{r}} \end{cases} \qquad (1)$$

where $\rho = che$ or $dis$. $D_{i,j}^{\rho}$ denotes the $\rho$-th similarity of $r_{i}$ and $r_{j}$. In addition, $D_{i,j}^{\rho} \in [0, 1]$. The value of $D_{i,j}^{\rho}$ increases with the degree of resemblance between $r_{i}$ and $r_{j}$.
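As a small illustration of a cosine-based drug similarity, the sketch below computes a similarity matrix from substructure vectors. The binary fingerprint matrix `F` (one row per drug, one column per substructure) and the function name are assumptions made for this example; the exact input representation used by Luo et al. [30] is not specified here.

```python
import numpy as np

def cosine_similarity_matrix(F):
    """Pairwise cosine similarity between the rows of F.

    F is a hypothetical (N_r, n_sub) non-negative fingerprint matrix.
    Rows with zero norm are left as all-zero similarities.
    """
    F = np.asarray(F, dtype=float)
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # avoid division by zero
    Fn = F / norms                   # unit-length rows
    return Fn @ Fn.T                 # entries lie in [0, 1] for non-negative F

# Toy fingerprints for three drugs and three substructures (illustrative values).
F = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 0, 1]])
D_che = cosine_similarity_matrix(F)
print(np.round(D_che, 3))
```

For non-negative fingerprints the result is symmetric with values in $[0, 1]$, which matches the range stated for $D_{i,j}^{\rho}$.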

2.2.3. Matrix Representation of Side-Effect Similarity

A greater number of similar drugs being associated with side-effects $s_{i}$ and $s_{j}$ indicates a greater similarity between $s_{i}$ and $s_{j}$. We calculated the similarity matrix $S = (S_{i,j}) \in \mathbb{R}^{N_{s} \times N_{s}}$ of all side-effects based on the approach adopted by Wang et al. [26]. With a value between 0 and 1, $S_{i,j}$ indicates how similar side-effect $s_{i}$ and side-effect $s_{j}$ are to one another. The larger the value, the higher the similarity between $s_{i}$ and $s_{j}$.

2.2.4. Construction of Drug-Side-Effect Heterogeneous Graphs and Attribute Extraction

$D^{che}$ and $D^{dis}$ represent the drug similarities according to the chemical substructures of two drugs and the diseases that they are associated with, respectively. We created two drug-side-effect heterogeneous graphs $G^{\rho}$ relying on $D^{che}$ and $D^{dis}$, respectively, where $\rho = che$ or $dis$. The set of nodes $V = \{V^{r} \cup V^{s}\}$ in each heterogeneous graph comprises the set of drug nodes $V^{r}$ and the set of side-effect nodes $V^{s}$; an edge $e_{i,j}^{\rho} \in E^{\rho}$ with a weight $w_{i,j}^{\rho} \in W^{\rho}$ links a pair of nodes $v_{i}$, $v_{j}$. In general, several types of connecting edges can exist between drugs and side-effects, including a drug–side-effect association edge $e_{rs}$, a drug–drug similarity edge $e_{rr}$, and a side-effect–side-effect similarity edge $e_{ss}$. $W^{\rho}$ contains the association matrix $A$ and the similarity matrices $S$ and $D^{\rho}$. The adjacency matrix of the $\rho$-th heterogeneous graph is expressed as $I^{\rho}$,

$$I^{\rho} = \begin{bmatrix} D^{\rho} & A \\ A^{T} & S \end{bmatrix} \in \mathbb{R}^{N_{total} \times N_{total}}, \qquad (2)$$

where the total number of nodes is $N_{total} = N_{r} + N_{s}$ and $A^{T}$ denotes the transpose of the matrix $A$. The $i$-th row of the matrix $I^{\rho}$ contains the associations and similarities of the node $v_{i}$ with all of the drugs and side-effects, which are taken as the node attributes of $v_{i}$. The attribute vector $x_{i}^{\rho}$ of the drug $r_{i}$ is defined as

$$x_{i}^{\rho} = [D_{i,}^{\rho} \,\|\, A_{i,}] \in \mathbb{R}^{N_{total}}, \qquad (3)$$

where $\rho = che$ or $dis$, and $\|$ denotes end-to-end concatenation. $A_{i,}$ is the $i$-th row of the matrix $A$, which records the association of each side-effect with $r_{i}$. $D_{i,}^{che}$ ($D_{i,}^{dis}$) is the $i$-th row of the matrix $D^{che}$ ($D^{dis}$), containing the chemical substructural (functional) similarities of $r_{i}$ with all drugs.

Similarly, the attribute vector of the side-effect $s_{j}$ is represented as $y_{j}$,

$$y_{j} = [A_{,j} \,\|\, S_{,j}] \in \mathbb{R}^{N_{total}}, \qquad (4)$$

where $A_{,j}$ ($S_{,j}$) contains the associations (similarities) of $s_{j}$ with all drugs (side-effects). The feature embedding matrix $Z^{\rho}$ of the node pair $r_{i}$ and $s_{j}$ is defined as

$$Z^{\rho} = \begin{bmatrix} x_{i}^{\rho} \\ y_{j} \end{bmatrix} = \begin{bmatrix} D_{i,}^{\rho} & A_{i,} \\ A_{,j} & S_{,j} \end{bmatrix} \in \mathbb{R}^{2 \times N_{total}}, \qquad (5)$$

where $2 \times N_{total}$ is the dimension of $Z^{\rho}$.
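The block-matrix construction in Equations (2)–(5) can be sketched in a few lines of NumPy. This is only an illustrative sketch on random toy data; the function names `build_hetero_adjacency` and `pair_embedding` are hypothetical, and the toy similarity matrices are made symmetric by construction.

```python
import numpy as np

def build_hetero_adjacency(D, A, S):
    """Assemble I = [[D, A], [A^T, S]] as in Equation (2)."""
    top = np.hstack([D, A])
    bottom = np.hstack([A.T, S])
    return np.vstack([top, bottom])

def pair_embedding(D, A, S, i, j):
    """Stack x_i = [D_i, || A_i,] on y_j = [A_,j || S_,j] (Equations (3)-(5))."""
    x_i = np.concatenate([D[i, :], A[i, :]])   # drug attribute vector
    y_j = np.concatenate([A[:, j], S[:, j]])   # side-effect attribute vector
    return np.vstack([x_i, y_j])

# Toy data: 3 drugs and 2 side-effects (values are illustrative only).
rng = np.random.default_rng(0)
N_r, N_s = 3, 2
A = np.array([[1, 0], [0, 1], [1, 1]], dtype=float)
D = rng.random((N_r, N_r)); D = (D + D.T) / 2; np.fill_diagonal(D, 1.0)
S = rng.random((N_s, N_s)); S = (S + S.T) / 2; np.fill_diagonal(S, 1.0)

I_rho = build_hetero_adjacency(D, A, S)
Z = pair_embedding(D, A, S, i=0, j=1)
print(I_rho.shape, Z.shape)  # (5, 5) (2, 5)
```

With symmetric similarity matrices, the two rows of `Z` coincide with row $i$ and row $N_{r} + j$ of the adjacency matrix, which is why the rows of $I^{\rho}$ can serve directly as node attributes.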

2.3. Context Representation Learning Based on Transformer with Attention

The attributes of a target node are contextually linked to the attributes of the neighbors of each category in its neighborhood. In order to learn the context representations of the nodes, we designed the CET module based on a graph-level attention mechanism to aggregate information from the neighbor nodes of the target node. As each heterogeneous graph has a unique topology, we used a separate graph transformer (GT) module (Figure 2) for $G^{che}$ and $G^{dis}$. The semantic information of the similarity or association connection edges between the neighbor node and the target node was used to learn the corresponding neighborhood context representation. The module comprised $l_{e}$ encoding layers; we use layer $l$ to illustrate how the context is learned. The learning processes of the CET module for drug nodes and side-effect nodes were similar, so we describe the process for drug $r_{i}$ as an example.

2.3.1. Neighborhood Node Set Extraction

Based on the similarity between the drug $r_{i}$ and all drugs, we obtained the top $N_{t}$ most similar neighbors of $r_{i}$. If $N_{t} = 4$, let $r_{i}$, $r_{a}$, $r_{b}$, and $r_{c}$ be the four top neighbor nodes, and let their attribute vectors be $x_{i}^{\rho}$, $x_{a}^{\rho}$, $x_{b}^{\rho}$, and $x_{c}^{\rho}$, respectively. The set of attribute vectors of the drug neighbor nodes of $r_{i}$ is denoted as $S_{r_{i},r}$,

$$S_{r_{i},r} = \{x_{i}^{\rho}, x_{a}^{\rho}, x_{b}^{\rho}, x_{c}^{\rho}\}. \qquad (6)$$

Similarly, we can obtain all of the $N_{k}$ side-effect neighbor nodes associated with $r_{i}$. When $N_{k} = 3$, the $N_{k}$ side-effect neighbors of $r_{i}$ are $s_{a}$, $s_{b}$, and $s_{c}$, with $y_{a}$, $y_{b}$, and $y_{c}$ being their attribute vectors, respectively. Thus, the set of attribute vectors of the side-effect neighbor nodes of $r_{i}$ is represented as $S_{r_{i},s}$,

$$S_{r_{i},s} = \{y_{a}, y_{b}, y_{c}\}. \qquad (7)$$
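The neighbor selection described above can be sketched as follows. This is a minimal illustration under two assumptions that the text implies but does not spell out: the target drug is always kept as the first member of its own drug neighborhood, and the side-effect neighbors are exactly the side-effects with $A_{i,j} = 1$. The function names are hypothetical.

```python
import numpy as np

def drug_neighbors(D, i, n_t):
    """Return drug i followed by its n_t - 1 most similar other drugs."""
    order = np.argsort(-D[i], kind="stable")        # most similar first
    others = [int(k) for k in order if k != i][: n_t - 1]
    return [i] + others

def side_effect_neighbors(A, i):
    """Return the indices of all side-effects associated with drug i."""
    return [int(j) for j in np.flatnonzero(A[i] == 1)]

# Toy similarity and association matrices (illustrative values).
D = np.array([[1.0, 0.2, 0.9, 0.4, 0.7],
              [0.2, 1.0, 0.3, 0.8, 0.1],
              [0.9, 0.3, 1.0, 0.5, 0.6],
              [0.4, 0.8, 0.5, 1.0, 0.2],
              [0.7, 0.1, 0.6, 0.2, 1.0]])
A = np.array([[1, 0, 1, 1],
              [0, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])

print(drug_neighbors(D, i=0, n_t=4))      # [0, 2, 4, 3]
print(side_effect_neighbors(A, i=0))      # [0, 2, 3]
```

The attribute vector sets $S_{r_{i},r}$ and $S_{r_{i},s}$ are then obtained by looking up the attribute vectors of the returned indices.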

2.3.2. Node Attribute Conversion

$S_{r_{i},r} = \{x_{i}^{\rho}, x_{m}^{\rho}, m = a, b, c\}$ is the set of attribute vectors of the drug-type neighbor nodes of $r_{i}$. Inspired by the Transformer, we mapped the attribute vector $x_{i}^{\rho}$ of $r_{i}$ to a query vector space and $S_{r_{i},r}$ to a key vector space and a value vector space. To reduce the bias in the contextual semantic learning process, we established a multi-head attention mechanism. In the $t$-th attention head, because each drug-type neighbor contributes differently to $r_{i}$, we employed a neighbor node-level attention mechanism to calculate the attention weights of $r_{i}$ for each neighbor. The output query vectors of encoding layer 1 and encoding layer $l$, $q_{t}^{\rho,1}(r_{i}) \in \mathbb{R}^{n}$ and $q_{t}^{\rho,l}(r_{i}) \in \mathbb{R}^{n}$, are calculated as follows,

$$q_{t}^{\rho,1}(r_{i}) = W_{t,Q}^{1} \cdot x_{i}^{\rho} \qquad (8)$$

$$q_{t}^{\rho,l}(r_{i}) = W_{t,Q}^{l} \cdot c^{\rho,l-1}(r_{i}), \quad l = 2, \dots, l_{e} \qquad (9)$$

where $W_{t,Q}^{1} \in \mathbb{R}^{n \times N_{total}}$ and $W_{t,Q}^{l} \in \mathbb{R}^{n \times N_{total}}$ are the weight matrices of layer 1 and layer $l$, respectively. $c^{\rho,l-1}(r_{i})$, also written $c_{i}^{\rho,l-1}$, is the vector of the encoded information of $r_{i}$ obtained in layer $l-1$; $l_{e}$ is the number of encoding layers. We calculate the key matrix $K_{t}^{\rho,l} \in \mathbb{R}^{4 \times n}$ and the value matrix $V_{t}^{\rho,l} \in \mathbb{R}^{4 \times n}$ for $r_{i}$ as follows:

$$K_{t}^{\rho,l} = W_{t,K}^{l} \, [c_{i}^{\rho,l-1} \,\|\, c_{m}^{\rho,l-1}]^{T}, \quad l = 1, 2, \dots, l_{e} \qquad (10)$$

$$V_{t}^{\rho,l} = W_{t,V}^{l} \, [c_{i}^{\rho,l-1} \,\|\, c_{m}^{\rho,l-1}]^{T}, \quad l = 1, 2, \dots, l_{e} \qquad (11)$$

where $W_{t,K}^{l}$ and $W_{t,V}^{l}$ are the weight matrices and $\|$ represents the concatenation of vectors. $c_{i}^{\rho,l-1}$ and $c_{m}^{\rho,l-1}$ are the results of the layer $l-1$ encoding of $r_{i}$ and its neighbors, respectively, and $c_{i}^{\rho,0}$ and $c_{m}^{\rho,0}$ are their attribute vectors $x_{i}^{\rho}$ and $x_{m}^{\rho}$, respectively.

2.3.3. Contextual Encoding of Nodes of the Same Type

All of the drug-type neighbor nodes of drug $r_{i}$ form the set $\{r_{i}, r_{m}, m = a, b, c\}$, and a contextual connection exists between the node properties of $r_{i}$ and the properties of these neighbor nodes. Therefore, we must gather information about the neighbors of $r_{i}$ to update the attribute vector of $r_{i}$. We calculate the attention score of $r_{v}$ to $r_{i}$ as $\alpha_{t}^{\rho,l}(r_{i}, r_{v})$,

$$\alpha_{t}^{\rho,l}(r_{i}, r_{v}) = K_{t}^{\rho,l} \, W_{t,D}^{l} \cdot q_{t}^{\rho,l}(r_{i})^{T}, \qquad (12)$$

where $v = i, a, b$, or $c$. $W_{t,D}^{l} \in \mathbb{R}^{n \times n}$ is a weight matrix specific to the drug-type neighbor nodes of $r_{i}$ for fusing the corresponding semantic information for each connection (similarity connection or association connection). Then, for the neighborhood nodes $r_{i}$, $r_{a}$, $r_{b}$, and $r_{c}$ of $r_{i}$ and the obtained scores $\alpha_{t}^{\rho,l}(r_{i}, r_{i})$, $\alpha_{t}^{\rho,l}(r_{i}, r_{a})$, $\alpha_{t}^{\rho,l}(r_{i}, r_{b})$, and $\alpha_{t}^{\rho,l}(r_{i}, r_{c})$, the normalized attention weight $\gamma_{t,v}^{\rho,l}$ is obtained as

$$\gamma_{t,v}^{\rho,l} = \frac{\exp(\alpha_{t}^{\rho,l}(r_{i}, r_{v}))}{\sum_{j \in \{i,a,b,c\}} \exp(\alpha_{t}^{\rho,l}(r_{i}, r_{j}))}, \qquad (13)$$

where $\exp$ is the exponential function. The drug-type neighbor encoding information $y_{t,e_{rr}}^{\rho,l}(r_{i})$ of $r_{i}$ can be represented as

$$y_{t,e_{rr}}^{\rho,l}(r_{i}) = \sum_{v \in \{i,a,b,c\}} \gamma_{t,v}^{\rho,l} \, V_{t}^{\rho,l}(r_{v}), \qquad (14)$$

where $y_{t,e_{rr}}^{\rho,l}(r_{i}) \in \mathbb{R}^{n}$. Finally, the context encoding $y_{e_{rr}}^{\rho,l}(r_{i}) \in \mathbb{R}^{nT}$ at the drug neighbor node level of $r_{i}$ is defined as

$$c^{\rho,l-1}(r_{i}) = y_{e_{rr}}^{\rho,l}(r_{i}) = \big\|_{t=1}^{T} \, y_{t,e_{rr}}^{\rho,l}(r_{i}), \qquad (15)$$

where $\|$ denotes the end-to-end concatenation of the encoding vectors of the $T$ attention heads. Similarly, for the set $\{s_{a}, s_{b}, s_{c}\}$ of the side-effect neighbor nodes of $r_{i}$, we can obtain the context encoding $y_{e_{rs}}^{\rho,l}(r_{i})$ specific to that class of neighbor nodes.
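The per-head computation in Equations (8)–(15) can be sketched for a single encoding layer as below. This is a minimal NumPy sketch, not the authors' implementation: weights are random, the target node is assumed to occupy row 0 of the neighbor matrix, and the function name and argument layout are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def neighbor_attention_layer(x_target, X_neigh, W_Q, W_K, W_V, W_D):
    """One encoding layer over same-type neighbors.

    x_target : (d,)      attribute (or previous-layer) vector of the target
    X_neigh  : (m, d)    vectors of the m neighbors (target included)
    W_Q, W_K, W_V : (T, n, d) per-head projection matrices
    W_D      : (T, n, n) per-head edge-semantics matrix
    Returns the concatenated head outputs (T*n,) and the weights (T, m).
    """
    heads, weights = [], []
    for W_q, W_k, W_v, W_d in zip(W_Q, W_K, W_V, W_D):
        q = W_q @ x_target            # query, Eq. (8)/(9)
        K = X_neigh @ W_k.T           # keys, one row per neighbor, Eq. (10)
        V = X_neigh @ W_v.T           # values, Eq. (11)
        scores = K @ W_d @ q          # attention scores, Eq. (12)
        gamma = softmax(scores)       # normalized weights, Eq. (13)
        heads.append(gamma @ V)       # weighted sum of values, Eq. (14)
        weights.append(gamma)
    return np.concatenate(heads), np.stack(weights)   # Eq. (15)

rng = np.random.default_rng(0)
d, n, T, m = 6, 4, 2, 4                      # toy sizes
X = rng.random((m, d))                       # row 0 plays the role of r_i
params = [rng.standard_normal((T, n, d)) for _ in range(3)]
W_D = rng.standard_normal((T, n, n))
y_rr, gamma = neighbor_attention_layer(X[0], X, *params, W_D)
print(y_rr.shape, gamma.shape)  # (8,) (2, 4)
```

The same function can be reused for the side-effect neighbors with a separate set of weights, which yields the second context encoding $y_{e_{rs}}^{\rho,l}(r_{i})$.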

2.3.4. Neighborhood Node Category-Level and Graph-Level Attention Mechanisms

Since the drug node $r_{i}$ has two types of neighbor nodes, drugs and side-effects, we learn the context encodings $y_{e_{rr}}^{\rho,l}(r_{i})$ and $y_{e_{rs}}^{\rho,l}(r_{i})$ of $r_{i}$, respectively. As $y_{e_{rr}}^{\rho,l}(r_{i})$ and $y_{e_{rs}}^{\rho,l}(r_{i})$ differ in their contributions to the final contextual representation of $r_{i}$, we propose a neighborhood node category-level attention mechanism. The attention score is obtained as

$$s_{u,nei}^{\rho,l} = h_{nei}^{\rho,l} \tanh\left(W_{u,nei}^{\rho,l} \, y_{e_{ru}}^{\rho,l}(r_{i}) + b_{nei}^{\rho,l}\right), \qquad (16)$$

where $u \in \{r, s\}$, $W_{u,nei}^{\rho,l}$ is the weight matrix for the neighbor nodes of type $u$, and $h_{nei}^{\rho,l}$ and $b_{nei}^{\rho,l}$ are the weight and bias vectors, respectively. The normalized attention score $\beta_{r_{i},u}^{\rho,l}$ is calculated as

$$\beta_{r_{i},u}^{\rho,l} = \frac{\exp(s_{u,nei}^{\rho,l})}{\sum_{j \in \{r,s\}} \exp(s_{j,nei}^{\rho,l})}. \qquad (17)$$

The contextual encoding of $r_{i}$, as enhanced by the attention mechanism, is obtained as $Z_{con}^{\rho,l}(r_{i})$,

$$Z_{con}^{\rho,l}(r_{i}) = \sum_{u \in \{r,s\}} \beta_{r_{i},u}^{\rho,l} \, y_{e_{ru}}^{\rho,l}(r_{i}), \qquad (18)$$

where $Z_{con}^{\rho,l}(r_{i}) \in \mathbb{R}^{nT}$. The encoding result $Z_{con}^{\rho,l_{e}}(r_{i}) \in \mathbb{R}^{n_{fin}}$ obtained by the $l_{e}$-th GT layer contains contextual information regarding the two types of neighbor nodes of $r_{i}$ in the heterogeneous graph $G^{\rho}$ with the discriminative semantics of the connected edges; it is renamed as $Z^{\rho}(r_{i})$. $x_{i}^{\rho}$ contains more detailed information, whereas $Z^{\rho}(r_{i})$ is a learned, representative encoding of the neighborhood context. Therefore, we added the information from $x_{i}^{\rho}$ to $Z^{\rho}(r_{i})$. Given the original attribute vector $x_{i}^{\rho}$ of $r_{i}$, we first applied a linear projection $S\text{-}Linear^{\rho}$ to map it to the attribute space of $Z^{\rho}(r_{i})$. Then, we added it to $Z^{\rho}(r_{i})$ to obtain a complemented neighbor context encoding $Z_{add}^{\rho}(r_{i})$,

$$Z_{add}^{\rho}(r_{i}) = S\text{-}Linear^{\rho}(\sigma(x_{i}^{\rho})) + Z^{\rho}(r_{i}), \qquad (19)$$

where $\sigma$ is the $relu$ activation function [32].
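Equations (16)–(19) amount to a two-way softmax over the neighbor categories followed by a residual-style addition. The sketch below illustrates this with random toy weights; the function names and the shapes chosen for the weight matrices are assumptions for the example, and the category-indexed sum follows Equation (16), where each category $u$ has its own encoding.

```python
import numpy as np

def category_attention(y_rr, y_rs, W_r, W_s, h, b):
    """Fuse the drug-neighbor and side-effect-neighbor encodings of a node."""
    s_r = h @ np.tanh(W_r @ y_rr + b)         # score of the drug category, Eq. (16)
    s_s = h @ np.tanh(W_s @ y_rs + b)         # score of the side-effect category
    scores = np.array([s_r, s_s])
    e = np.exp(scores - scores.max())
    beta = e / e.sum()                        # normalized scores, Eq. (17)
    z_con = beta[0] * y_rr + beta[1] * y_rs   # weighted fusion, Eq. (18)
    return z_con, beta

def add_original_attributes(x, z, W_lin, b_lin):
    """Project relu(x) linearly into the space of z and add it, Eq. (19)."""
    return W_lin @ np.maximum(x, 0.0) + b_lin + z

rng = np.random.default_rng(0)
d_ctx, d_hid, d_attr = 8, 5, 6                # toy sizes
y_rr, y_rs = rng.random(d_ctx), rng.random(d_ctx)
W_r, W_s = rng.standard_normal((2, d_hid, d_ctx))
h, b = rng.standard_normal(d_hid), rng.standard_normal(d_hid)

z_con, beta = category_attention(y_rr, y_rs, W_r, W_s, h, b)
x = rng.random(d_attr)
z_add = add_original_attributes(x, z_con, rng.standard_normal((d_ctx, d_attr)), np.zeros(d_ctx))
print(beta.sum(), z_add.shape)
```

Because the two weights sum to one, the fused encoding is a convex combination of the two category-specific encodings, and the final addition keeps the detailed original attributes available to later layers.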

The heterogeneous graphs $G^{che}$ and $G^{dis}$ were learned by the CET module to obtain the contextual encodings of $r_{i}$ and $s_{j}$, represented as $Z_{add}^{\rho}(r_{i})$ and $Z_{add}^{\rho}(s_{j})$ ($\rho = che$ or $dis$), respectively. $Z_{add}^{che}(r_{i})$ ($Z_{add}^{dis}(r_{i})$) and $Z_{add}^{che}(s_{j})$ ($Z_{add}^{dis}(s_{j})$) were stacked vertically to form $Z_{add}^{che}(r_{i} - s_{j}) \in \mathbb{R}^{2 \times n_{fin}}$ ($Z_{add}^{dis}(r_{i} - s_{j})$). $Z_{add}^{che}(r_{i} - s_{j})$ and $Z_{add}^{dis}(r_{i} - s_{j})$ were fused by a 1 × 1 convolution to form a contextual representation $Z_{fin}(r_{i} - s_{j}) \in \mathbb{R}^{2 \times n_{fin}}$ of the node pair. $Z_{fin}(r_{i})$ and $Z_{fin}(s_{j})$ were concatenated end to end to form a feature map $Z_{i,j} \in \mathbb{R}^{2 n_{fin}}$ of the $r_{i} - s_{j}$ node pair. $y_{CET}$ denotes the probability distribution of whether $r_{i}$ and $s_{j}$ are associated,

$$y_{CET} = softmax(W_{f} Z_{i,j} + b_{f}), \qquad (20)$$

where $W_{f}$ is the weight matrix and $b_{f}$ is the bias vector. $y_{CET} = (y_{CET}^{0}, y_{CET}^{1})$, where $y_{CET}^{0}$ is the probability that the drug $r_{i}$ and side-effect $s_{j}$ are not associated and $y_{CET}^{1}$ is the probability that they are associated.

2.4. Local Information Enrichment Strategy for Drug-Side-Effect Node Pair Feature Representation Learning Based on Capsule Networks

Given $Z^{\rho} \in \mathbb{R}^{2 \times N_{total}}$, which contains information regarding the similarity and association of $r_{i}$ and $s_{j}$ with all drugs and side-effects and contains $2 \times N_{total}$ elements, we built the capsule network-based MVF module to deeply integrate the characteristics of multiple elements at the same position from multiple views. These characteristics formed a capsule, and all newly created capsules passed through a routing mechanism to further evaluate the association scores of node pairs. The MVF module contained two convolutional layers and two capsule layers. The detailed architecture is given in Figure 3.

2.4.1. Establishment of Primary Capsule Embedding Based on Convolution Operation

The feature-embedding matrices of a node pair $r_{i}$ and $s_{j}$ in the heterogeneous graphs $G^{che}$ and $G^{dis}$ are $Z^{che}$ and $Z^{dis}$, respectively. $Z^{che}$ and $Z^{dis}$ were stacked vertically to form the node pair feature-embedding matrix $Z \in \mathbb{R}^{2 \times 2 \times N_{total}}$ of $r_{i}$ and $s_{j}$. $Z$ was fed to the convolution module to form the embedding of the primary capsule network. The convolution module contained one single-group convolutional layer and one multi-group convolutional layer. In the first convolutional layer, we applied one round of zero-padding to $Z$ to create a new matrix $\hat{Z}$ for learning the edge information. $l_{f}$ and $w_{f}$ were the length and width of the filter, respectively. If the number of filters was $n_{f}$, the filter $W_{conv1} \in \mathbb{R}^{l_{f} \times w_{f} \times n_{f}}$ was applied to the matrix $\hat{Z}$ and the feature map $Z_{conv1} \in \mathbb{R}^{n_{f} \times (4 - w_{f} + 1) \times (2 + N_{total} - l_{f} + 1)}$ was obtained as

$$Z_{conv1,k}(i,j) = f\left(W_{conv1}(k,:,:) * \hat{Z}_{k,i,j} + b_{conv1}(k)\right), \quad i \in [1, 4 - w_{f} + 1],\ j \in [1, 2 + N_{total} - l_{f} + 1],\ k \in [1, n_{f}] \qquad (21)$$

where $f$ is the $relu$ activation function [32] and $b_{conv1}$ is the bias vector. $Z_{conv1,k}(i,j)$ is the element in the $i$-th row and $j$-th column of the $k$-th feature map $Z_{conv1,k}$. $\hat{Z}(i,j)$ is the element of the matrix $\hat{Z}$ in row $i$ and column $j$. When the $k$-th filter slides to position $\hat{Z}(i,j)$, the region inside the filter is $\hat{Z}_{k,i,j}$, which is given by

$$\hat{Z}_{k,i,j} = \hat{Z}(i : i + w_{f},\ j : j + l_{f}), \quad \hat{Z}_{k,i,j} \in \mathbb{R}^{w_{f} \times l_{f}}. \qquad (22)$$

We built a $w$-group convolution in the second layer. Each group of convolutions can be considered as a view of the feature map, so the attributes of the node pairs can be learned from multiple views. The filter in each group of convolutions was $W_{conv2} \in \mathbb{R}^{2 \times 2}$, and $Z_{conv1}$ was fed to the second convolutional layer to form $Z_{conv2}^{w} \in \mathbb{R}^{w \times 2 \times N_{total}}$.

2.4.2. Creation of the Primary Capsule Layer

We encapsulated the values $Z_{conv2}^{1}(p), Z_{conv2}^{2}(p), \dots, Z_{conv2}^{w}(p)$ at the $p$-th position $(p = 1, 2, \dots, 2 \times N_{total})$ of the $w$ feature maps $Z_{conv2}^{1}$, $Z_{conv2}^{2}$, …, $Z_{conv2}^{w}$ into a capsule $u_{p} \in \mathbb{R}^{w}$. This capsule contained information regarding multiple views of the local area when the filter was slid to the $p$-th position of the feature map $Z_{conv1}$. The primary capsule layer contained $2 \times N_{total}$ capsules, each a $w$-dimensional vector.
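Forming the primary capsules is essentially a reshape: the values found at the same position in the $w$ feature maps are gathered into one vector. The sketch below shows this on a toy array; the row-major ordering of positions and the function name are assumptions made for illustration.

```python
import numpy as np

def primary_capsules(z_conv2):
    """Turn w feature maps of shape (w, rows, cols) into rows*cols capsules.

    Capsule p collects the w values found at flattened position p
    (row-major order) across the w views.
    """
    w = z_conv2.shape[0]
    return z_conv2.reshape(w, -1).T          # shape (rows*cols, w)

w, n_total = 3, 5                            # toy sizes
z_conv2 = np.arange(w * 2 * n_total, dtype=float).reshape(w, 2, n_total)
caps = primary_capsules(z_conv2)
print(caps.shape)   # (10, 3)
print(caps[0])      # the values 0, 10 and 20, one from each view
```

Each row of `caps` plays the role of one capsule $u_{p}$, so the array has $2 \times N_{total}$ rows of dimension $w$.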

2.4.3. Design of Capsule Layer Routing Mechanism

We used primary and digital capsule layers to build the MVF module. The digital capsule layer consisted of

n_{q n}

n_{q d}

-dimensional prediction capsules

v_{q} (q = 1, 2, \dots, n_{q n})

; all of these capsules received input from all of the primary capsules

u_{p} (p = 1, 2, \dots, 2 * N_{t o t a l})

of the previous layer. We implemented the delivery of location information from the primary capsule layer to the digital capsule layer by means of weights determined by the routing mechanism. First,

u_{p}

was used to determine the correlation between the two layers by multiplying by the weight matrix

W_{p q}

to obtain the vector as

{\hat{u}}_{q | p} \in R^{n_{p d}}

,

{\hat{u}}_{q | p} = W_{p q} u_{p} .

(23)

{\hat{u}}_{q | p}

was fed into the prediction capsule

v_{q}

based on the coupling coefficients

c_{p q}

as determined by the dynamic routing process, which were proportional to the weights of the features. We performed a dynamic routing process

n_{d r}

times to compute

c_{p q}

. We first initialized the weight

b_{p q} = 0

between capsule p and capsule q. Next, the coupling coefficient

c_{p q}

was obtained by normalizing the weights

b_{p q}

with

S o f t m a x

and the output vector

o_{q}

was generated by weighted summation;

c_{p q}

and

o_{q}

are represented as,

c_{p q} = \frac{exp (b_{p q})}{\sum_{k \in \{1, 2, \dots, n_{p n}\}} exp (b_{p k})}

(24)

o_{q} = \sum_{p} c_{p q} {\hat{u}}_{q | p}

(25)

The modulus lengths of $o_{q1}$ and $o_{q n_{pn}}$ were used as the uncorrelated and correlated scores between $r_{i}$ and $s_{j}$, respectively. $o_{q}$ was passed through a nonlinear compression function to produce an output capsule $v_{q}$ as,

$$v_{q} = \frac{\| o_{q} \|^{2}}{1 + \| o_{q} \|^{2}} \cdot \frac{o_{q}}{\| o_{q} \|}, \tag{26}$$

where the modulus length of $v_{q}$ is between 0 and 1. The update rules for $b_{pq}$ are as follows:

$$b_{pq} \leftarrow b_{pq} + {\hat{u}}_{q|p} ⊙ v_{q}, \tag{27}$$

where ⊙ denotes the dot product operation of two vectors. One round of the routing mechanism is complete once $b_{pq}$ has been updated. After $n_{dr}$ updates, the coupling coefficients $c_{pq}$ are finally determined and the final prediction capsules $v_{q}^{fin}$ are formed. The modulus length of each vector is passed through the Softmax layer to obtain the associated probability distribution $y_{MVF}^{q}$ as,

$$y_{MVF}^{q} = \frac{\exp (\| v_{q} \|)}{\sum_{k \in \{1, 2, \dots, n_{pn}\}} \exp (\| v_{k} \|)} . \tag{28}$$

The prediction scores were evaluated based on the modulus length, and the scores $y_{MVF} = [y_{MVF}^{1}, y_{MVF}^{n_{pn}}]$ form a probability distribution consisting of the probability that the drug-side-effect node pair is not associated and the probability that it is associated.
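The routing steps in Equations (23)–(28) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions rather than the TCSD implementation: the function names, the nested-list layout of `W`, the toy sizes, and the small `eps` guard against a zero-length vector are all hypothetical, and the Softmax in Equation (24) is taken over the prediction capsules for each primary capsule.

```python
import math
import random

def squash(o, eps=1e-9):
    """Equation (26): keep the direction of o and map its length into [0, 1)."""
    sq = sum(x * x for x in o)                         # ||o||^2
    scale = sq / (1.0 + sq) / (math.sqrt(sq) + eps)    # eps is an assumed guard for o = 0
    return [scale * x for x in o]

def softmax(xs):
    m = max(xs)                                        # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_routing(u, W, n_dr=3):
    """u: list of P primary capsules; W[p][q]: matrix mapping u[p] to prediction capsule q."""
    P, Q = len(u), len(W[0])
    # Equation (23): u_hat[p][q] = W[p][q] u[p]
    u_hat = [[[sum(w * x for w, x in zip(row, u[p])) for row in W[p][q]]
              for q in range(Q)] for p in range(P)]
    dim = len(u_hat[0][0])
    b = [[0.0] * Q for _ in range(P)]                  # b_pq initialised to 0
    for _ in range(n_dr):
        c = [softmax(b[p]) for p in range(P)]          # Equation (24)
        v = []
        for q in range(Q):
            # Equation (25): weighted sum of the prediction vectors
            o_q = [sum(c[p][q] * u_hat[p][q][d] for p in range(P)) for d in range(dim)]
            v.append(squash(o_q))                      # Equation (26)
        for p in range(P):                             # Equation (27): agreement update
            for q in range(Q):
                b[p][q] += sum(a * z for a, z in zip(u_hat[p][q], v[q]))
    lengths = [math.sqrt(sum(x * x for x in v_q)) for v_q in v]
    return v, softmax(lengths)                         # Equation (28)

# Toy sizes for illustration only (far smaller than the layers described in the paper).
random.seed(0)
P, Q, d_in, d_out = 6, 2, 8, 4
u = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(P)]
W = [[[[random.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_out)]
      for _ in range(Q)] for _ in range(P)]
v, y_mvf = dynamic_routing(u, W)
```

Here `y_mvf` holds two probabilities that sum to one, corresponding to the "not associated" and "associated" outcomes for a single drug-side-effect pair.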

2.5. Final Integration and Optimization

The cross-entropy between the true label z and the predicted association probability $y_{CET}$ was defined as the loss function when the prediction is based on the node neighbor context encoding, as follows,

$$LOSS_{CET} = - \sum_{i = 1}^{N_{train}} \sum_{j = 1}^{c} z_{i} \log (y_{CET}, j), \tag{29}$$

where $N_{train}$ is the number of training samples. The predicted results are classified as relevant and irrelevant $(c = 2)$. The true label $z_{i} = 1$ ($z_{i} = 0$) indicates that the drug and the side-effect in the i-th training sample are associated (not associated). In the MVF module, the cross-entropy-based loss $LOSS_{MVF}$ is defined as,

$$LOSS_{MVF} = - \sum_{i = 1}^{N_{train}} \sum_{j = 1}^{c} z_{i} \log (y_{MVF}, j) . \tag{30}$$
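As a hedged sketch of the two-class cross-entropy used in Equations (29) and (30), the snippet below assumes that each prediction is a pair of probabilities `[p_not_associated, p_associated]` and that the label is expanded to a one-hot vector over the $c = 2$ classes. The function name and the `eps` clamp are illustrative assumptions.

```python
import math

def cross_entropy(labels, probs, eps=1e-12):
    """labels: list of 0/1 values; probs: list of [p_not_associated, p_associated]."""
    loss = 0.0
    for z, y in zip(labels, probs):
        one_hot = [1.0 - z, float(z)]                  # assumed one-hot form of the label, c = 2
        # Only the term of the true class contributes; eps avoids log(0).
        loss -= sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, y))
    return loss

labels = [1, 0, 1]
probs = [[0.1, 0.9], [0.8, 0.2], [0.4, 0.6]]
loss = cross_entropy(labels, probs)    # -(ln 0.9 + ln 0.8 + ln 0.6)
```

The same function can serve both modules, since only the source of the predicted probabilities differs between the CET and MVF branches.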

We used the Adam algorithm [33] to optimize the loss functions $LOSS_{CET}$ and $LOSS_{MVF}$. Finally, a weighted sum of $y_{CET}$ and $y_{MVF}$ was calculated to obtain the final predicted association score y,

$$y = \gamma \times y_{CET} + (1 - \gamma) \times y_{MVF}, \tag{31}$$

where $\gamma \in (0, 1)$ is a hyperparameter that balances the contributions of the two predictions.
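A small worked instance of Equation (31) may help to show how the weighting behaves. The function name and the example scores are hypothetical; the default `gamma = 0.3` is the value reported in the parameter settings.

```python
def fuse(y_cet, y_mvf, gamma=0.3):
    """Equation (31): weighted sum of the CET and MVF association scores."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in the open interval (0, 1)")
    return gamma * y_cet + (1.0 - gamma) * y_mvf

# Hypothetical scores for one drug-side-effect pair.
score = fuse(0.80, 0.60)    # 0.3 * 0.80 + 0.7 * 0.60
```

With this setting the MVF score carries more than twice the weight of the CET score in the final prediction.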

3. Experimental Evaluations and Discussion

3.1. Parameter Settings and Evaluation Metrics

TCSD was implemented in the PyTorch framework using a graphics processing unit (Nvidia GeForce RTX 2080 Ti). For the CET module, the number of neighbor nodes per class was $N_{t} = N_{k} = 10$, the number of encoding layers was $l_{e} = 2$, and the number of heads for the multi-head attention was 8. The output feature dimensionalities of the two encoding layers were 2400 and 2000. In the MVF module, the first convolutional layer included 64 filters, while the second layer had $w = 8$ groups of convolutions and 512 filters, and the size of all the filter kernels was set to $2 \times 2$. The numbers of capsules in the primary and digital capsule layers were 4900 and 2, respectively. The dimensionality of each digital capsule was set to 32 and the number of routing iterations was $n_{dr} = 3$. The parameter $\gamma$ at final fusion was set to 0.3.
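For reference, the settings above can be gathered into a single configuration object. The class and field names below are hypothetical and only the values come from the text; this is a sketch of one way to keep them together, not the configuration format used by the authors.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TCSDConfig:
    # CET module
    neighbors_per_type: int = 10            # N_t = N_k
    encoder_layers: int = 2                 # l_e
    attention_heads: int = 8
    encoder_out_dims: tuple = (2400, 2000)  # output sizes of the two encoding layers
    # MVF module
    conv1_filters: int = 64
    conv2_groups: int = 8                   # w
    conv2_filters: int = 512
    kernel_size: tuple = (2, 2)
    primary_capsules: int = 4900
    digital_capsules: int = 2
    digital_capsule_dim: int = 32
    routing_iterations: int = 3             # n_dr
    # Final fusion
    gamma: float = 0.3

cfg = TCSDConfig()
```

Freezing the dataclass keeps the reported values from being changed by accident during an experiment run.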

Each prediction model’s effectiveness was evaluated using five-fold cross-validation. The positive samples were the known drug-side-effect associations and the negative samples were the unobserved associations. In total, there were 80,164 known associations between drugs and side-effects and 2,887,772 unobserved associations. All positive samples were divided at random into five equal parts; in each fold, four parts were used to train the prediction model and the remaining part was used for testing. For training, negative samples equal in number to the positive training samples were chosen at random, and the remaining negative samples were used for testing.
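The sampling protocol can be sketched as follows. The sketch assumes one reading of the description, namely that the training negatives are drawn at random to match the number of training positives and all remaining unobserved pairs go to the test set; the function name, the seed, and the toy pairs are hypothetical.

```python
import random

def five_fold_splits(positives, negatives, n_folds=5, seed=0):
    """Yield (train_pos, train_neg, test_pos, test_neg) for each fold."""
    rng = random.Random(seed)
    pos = list(positives)
    rng.shuffle(pos)
    folds = [pos[i::n_folds] for i in range(n_folds)]   # five near-equal parts
    for k in range(n_folds):
        test_pos = folds[k]
        train_pos = [p for i, fold in enumerate(folds) if i != k for p in fold]
        neg = list(negatives)
        rng.shuffle(neg)
        train_neg = neg[:len(train_pos)]                # balanced with the training positives
        test_neg = neg[len(train_pos):]                 # assumed: all remaining unobserved pairs
        yield train_pos, train_neg, test_pos, test_neg

# Toy data: 20 known pairs and 100 unobserved pairs.
positives = [("d%d" % i, "s%d" % i) for i in range(20)]
negatives = [("d%d" % i, "t%d" % j) for i in range(20) for j in range(5)]
splits = list(five_fold_splits(positives, negatives))
```

Each positive pair appears in the test set of exactly one fold, so every known association is scored once as a held-out sample.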

The evaluation metrics include the area under the receiver operating characteristic (ROC) curve (AUC) [33,34], the area under the precision-recall (PR) curve (AUPR) [35], and the recall rate of the top k candidates. The ratio of known associations to unobserved associations was approximately 1:36, so a significant class imbalance existed between them. Under such imbalance the AUPR is more informative than the AUC, and it was therefore also used to evaluate the predictive performance. We also determined the recall rates of the top $k \in \{30, 60, \dots, 240\}$ candidates as another measure of the model performance because biologists typically select drug-side-effect pairs from among these candidates and perform further relevant experiments.
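The top-k recall can be illustrated with a short function. It is a minimal sketch with hypothetical names and toy data: for one drug, the candidates are ranked by predicted score and the fraction of true associations that fall within the first k positions is returned.

```python
def recall_at_k(scores, labels, k):
    """Fraction of the positive samples that appear among the k highest-scored candidates."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0                       # no positives to recover for this drug
    hits = sum(labels[i] for i in ranked[:k])
    return hits / total_pos

# Toy example: five candidate side-effects, three of them truly associated.
scores = [0.9, 0.1, 0.8, 0.4, 0.7]
labels = [1, 0, 0, 1, 1]
r2 = recall_at_k(scores, labels, 2)      # one of three positives in the top 2
r3 = recall_at_k(scores, labels, 3)      # two of three positives in the top 3
r4 = recall_at_k(scores, labels, 4)      # all three positives in the top 4
```

Recall can only grow as k increases, which is why the curves for the different methods are compared at the same k.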

3.2. Ablation Experiment

We conducted a series of ablation experiments to evaluate the contributions of the CET module, the MVF module, and the neighborhood node category-level attention mechanism (NCA) (Table 1). First, we removed the attention mechanism that fuses the neighbor context encodings of the multiple types of neighbor nodes for the target node, and instead performed vector summation to obtain the context representation of the target node. Next, we trained each of the two modules (CET and MVF) separately to obtain the contextual representation and the pairwise attributes. The attribute vectors of a pair of drug and side-effect nodes were concatenated and then went through a fully connected network to obtain the association score. The complete model with the CET module, MVF module, and NCA obtained the highest AUC = 0.977 and AUPR = 0.351. In the absence of the CET module, the prediction performance decreased by 1.4% in the AUC and 14.2% in the AUPR compared to TCSD. In the absence of the rich local features obtained by the MVF module, the AUC decreased by 0.6% and the AUPR decreased by 9.7% relative to TCSD. Without the NCA, the AUC decreased by 0.1% and the AUPR by 5.3% (Table 1). The contribution of the contextual encoding to the prediction performance was the largest; the main reason is that the Transformer-based encoding strategy can propagate the node attributes between the drug and side-effect nodes, thereby learning the contextual information between nodes. The MVF module made the second-largest contribution: it learns the feature representation of the node pair and enriches the local information of the node pairs in the process of building capsules, so that the routing mechanism can better learn the importance of the capsules.
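The reported decreases are absolute differences between the rows of Table 1, expressed in percentage points. The short calculation below reproduces them from the table values; the dictionary keys are descriptive labels chosen here for illustration.

```python
# (average AUC, average AUPR) taken from Table 1
table1 = {
    "full model": (0.977, 0.351),
    "without CET": (0.963, 0.209),
    "without MVF": (0.971, 0.254),
    "without NCA": (0.976, 0.298),
}

full_auc, full_aupr = table1["full model"]
drops = {
    name: (round((full_auc - auc) * 100, 1), round((full_aupr - aupr) * 100, 1))
    for name, (auc, aupr) in table1.items()
    if name != "full model"
}
# drops maps each ablation to (AUC drop, AUPR drop) in percentage points
```

The AUPR differences are an order of magnitude larger than the AUC differences, which is consistent with the AUPR being the more sensitive metric on this imbalanced dataset.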

3.3. Comparison with Other Methods

We compared our model (TCSD) with seven state-of-the-art approaches for predicting drug-side-effect associations: GCRS [26], idse-HE [27], SDPred [25], Galeano’s method [21], random walk-signed heterogeneous information network (RW-SHIN) [19], Ding’s method [17], and feature-derived graph regularized matrix factorization (FGRMF) [20]. For a fair comparison, the hyperparameters of each model were set to the values suggested in the corresponding study. The training and testing times of TCSD and the compared methods are listed in Supplementary Table S2.

For each drug, we calculated the corresponding AUC and AUPR in each fold and then took the average over the five folds as the final prediction result. The average values of the AUC and AUPR over the 708 drugs were taken as the prediction performance of the entire method. As shown in Figure 4, TCSD obtained the highest AUC of 0.977, which was 0.9% and 2.0% higher than idse-HE and GCRS, respectively, 3.1% and 3.2% higher than SDPred and Ding’s method, respectively, 5.8% higher than FGRMF, 6.5% higher than Galeano’s method, and 8.5% higher than RW-SHIN, the worst-performing method. TCSD also obtained the best mean AUPR over all drugs, 0.351, which was 7.9%, 12.5%, 16.0%, 17.2%, 22.0%, and 25.2% higher than the values from the above methods, respectively.
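The per-drug averaging can be sketched as below. The AUC is computed here from its pairwise definition (the probability that a positive sample is scored above a negative one, with ties counted as one half), which is an assumption about the computation rather than the authors' code; the function names and toy values are hypothetical.

```python
def auc(scores, labels):
    """AUC for one drug from the pairwise definition; returns None if a class is missing."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_auc_over_drugs(score_rows, label_rows):
    """Average the per-drug AUCs, skipping drugs for which the AUC is undefined."""
    values = [auc(s, l) for s, l in zip(score_rows, label_rows)]
    values = [v for v in values if v is not None]
    return sum(values) / len(values)

# Two toy drugs: the first is ranked perfectly, the second only partly.
score_rows = [[0.9, 0.2, 0.6, 0.1], [0.6, 0.8, 0.5]]
label_rows = [[1, 0, 1, 0], [1, 0, 0]]
mean_auc = mean_auc_over_drugs(score_rows, label_rows)   # (1.0 + 0.5) / 2
```

The pairwise form is quadratic in the number of candidates, so a rank-based implementation would be preferable for the full set of side-effects.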

The idse-HE method did not perform as well as our method; a possible reason is that it ignores the semantic information of the various connections in the heterogeneous graph. Our approach and GCRS both achieved good performance, primarily because we built multiple heterogeneous graphs and an independent learning module for each of them. This suggests that separately learning the topological information specific to each heterogeneous graph is necessary for improving the prediction accuracy. SDPred, which is based on a multi-layer perceptron, and Ding’s method, which is based on multiple kernel learning with centered kernel alignment, both scored lower than GCRS. One possible reason is that neither method considers the topological structure of the drug-side-effect heterogeneous graphs. In addition, FGRMF and Galeano’s method had similar AUC and AUPR values, with somewhat worse performance than Ding’s method, the fourth-best of the compared methods. One possible reason is that both are shallow prediction models built on matrix factorization, which cannot capture the complex connections between drugs and side-effects. The performance of RW-SHIN was inferior to that of the other methods because it only builds a network of drug nodes without considering the topological information between side-effect nodes.

We used the paired Wilcoxon test on the 708 per-drug AUCs (AUPRs) to compare TCSD with each of the other methods. With a p-value threshold of 0.05, the results demonstrated that TCSD significantly outperformed the other seven approaches (Table 2).
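To make the paired comparison concrete, the sketch below implements a two-sided Wilcoxon signed-rank test with the normal approximation, which is reasonable for a sample as large as 708 pairs. It is a simplified illustration with hypothetical names and toy data: it applies neither a tie correction to the variance nor a continuity correction, and in practice a library routine such as `scipy.stats.wilcoxon` would normally be used instead.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Return (W+, two-sided p-value) for paired samples x and y (normal approximation)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]    # zero differences are dropped
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:                                       # assign average ranks to tied |differences|
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2.0))       # two-sided tail of the standard normal
    return w_plus, p_value

# Toy per-drug AUCs: method A is better than method B for every one of 30 drugs.
auc_a = [0.5 + 0.01 * i for i in range(30)]
auc_b = [a - 0.02 for a in auc_a]
w_plus, p_value = wilcoxon_signed_rank(auc_a, auc_b)
```

Because every difference favors method A in this toy case, W+ equals the sum of all ranks and the p-value falls well below the 0.05 threshold.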

For the top k candidate side-effects of the drugs, a higher recall indicates that more real drug-side-effect associations are included among these candidates. Our TCSD model consistently outperformed the other methods at different k thresholds and ranked 50.3% of the positive cases in the top 30 candidates, 65.4% in the top 60, 73.0% in the top 90, and 78.1% in the top 120. GCRS had higher recall rates than idse-HE for the top 30 and 60 candidates: the former ranked 47.0% and 59.6% of the positive samples, while the latter ranked 42.1% and 58.1%, respectively. Idse-HE achieved slightly higher recall rates than GCRS for the top 90, 120, and 240 candidates. Idse-HE ranked 67.1% and 73.9% for the top 90 and 120 candidates, while GCRS ranked 66.8% and 71.9% (Figure 5). The AUC value of GCRS was very close to that of SDPred, but all of the recall rates of GCRS were higher than those of SDPred. When k was increased from 30 to 120, SDPred ranked 41.8%, 54.9%, 62.3%, and 67.4%, respectively. Ding’s method was not as good as SDPred, with corresponding recall rates of 35.5%, 48.2%, 56.3%, and 62.2%. The recall rates of FGRMF (32.8%, 45.2%, 52.5%, 58.1%) were slightly higher than those of Galeano’s method (32.3%, 43.6%, 51.7%, 56.8%). The lowest recall rates were obtained by RW-SHIN: 23.7%, 34.3%, 41.3%, and 47.2%, respectively.

3.4. Case Studies on Five Drugs

According to the World Mental Health Report published in 2022, nearly one billion people worldwide suffered from mental disorders. Therefore, to further demonstrate TCSD’s ability to predict drug-side-effect associations, we analyzed five psychotropic drugs: Amitriptyline, Olanzapine, Clozapine, Aripiprazole, and Asenapine. First, we used the model to obtain the association scores between each drug and its candidate side-effects and ranked the candidates accordingly. Then, the top 15 potential side-effects for each drug were compiled and analyzed. The results are listed in Tables 3–7.

MetaADEDB is a comprehensive repository of clinically reported adverse drug events (ADEs) containing 744,709 associations between 8498 drugs and 13,193 ADEs [38]. Rxlist is a searchable database of more than 5000 drugs compiled from physician articles and authoritative websites, covering U.S. Food and Drug Administration (FDA)-related side-effects, drug safety issues, and other prescribing information [39]. Drug Central collects information on the structure, pharmacological effects, and indications of active drug ingredients approved by the FDA and other regulatory agencies, as well as on ADEs [40]. SIDER is a database of marketed drugs and their recorded adverse reactions, covering 139,756 associations between 1430 drugs and 5868 side-effects [28].

As shown in Table 3, for Amitriptyline, 12 candidates are supported by Drug Central, 14 are included in MetaADEDB, and Rxlist and SIDER each contain 14 candidates. Table 4 lists the candidates for Olanzapine, of which 12, 12, 15, and 15 are recorded in Drug Central, MetaADEDB, Rxlist, and SIDER, respectively. In addition, constipation and vomiting in patients who had taken Olanzapine were confirmed by the literature [36]; we labeled these two candidates with “Literature” in Table 4. As shown in Table 5 and Table 6, Clozapine and Aripiprazole each have 13 candidates in Drug Central. MetaADEDB contains 12 candidates for Clozapine and 15 for Aripiprazole, while Rxlist contains 12 candidates and SIDER includes 13 candidates for each drug. In addition, dizziness and blurred vision were reported to appear with high probability after Clozapine had been used for over 3 months [37]; the side-effect “Blurred vision” was labeled with “Literature” in Table 5. Similarly, as shown in Table 7, Asenapine has 2, 7, 12, and 10 candidates in the four databases, respectively. Thus, TCSD has the ability to identify potential drug-related side-effect candidates. It can screen reliable candidates for biologists to undertake subsequent wet-lab experiments to determine the actual associations.
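The database counts quoted for each drug can be recomputed from the evidence column of the corresponding table. The snippet below does this for the Amitriptyline entries of Table 3; the variable names are illustrative and the evidence strings are copied from the table.

```python
from collections import Counter

# Evidence column of Table 3 (top 15 candidates for Amitriptyline), in rank order.
amitriptyline_evidence = {
    "Edema": "Drugcentral, MetaADEDB, SIDER",
    "Nausea": "MetaADEDB, Rxlist, SIDER",
    "Vomiting": "Drugcentral, MetaADEDB, Rxlist, SIDER",
    "Rash": "Drugcentral, MetaADEDB, Rxlist, SIDER",
    "Dizziness": "Drugcentral, MetaADEDB, Rxlist, SIDER",
    "Blurred vision": "Drugcentral, MetaADEDB, Rxlist",
    "Anorexia": "MetaADEDB, Rxlist, SIDER",
    "Headache": "Drugcentral, MetaADEDB, Rxlist, SIDER",
    "Diarrhea": "Drugcentral, MetaADEDB, Rxlist, SIDER",
    "Hypotension": "Drugcentral, MetaADEDB, Rxlist, SIDER",
    "Confusion": "Drugcentral, Rxlist, SIDER",
    "Leukopenia": "Drugcentral, MetaADEDB, Rxlist, SIDER",
    "Constipation": "Drugcentral, MetaADEDB, Rxlist, SIDER",
    "Paresthesia": "Drugcentral, MetaADEDB, Rxlist, SIDER",
    "Syncope": "MetaADEDB, Rxlist, SIDER",
}

# Count how many of the 15 candidates each database supports.
support = Counter(
    db.strip()
    for evidence in amitriptyline_evidence.values()
    for db in evidence.split(",")
)
```

The resulting counts are 12 for Drug Central and 14 each for MetaADEDB, Rxlist, and SIDER, matching the figures stated for Table 3.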

3.5. Predicting Novel Drug-Related Side-Effects

After verifying the predictive performance of the TCSD model, we used it to predict candidate side-effects for 708 drugs, which included drugs belonging to the antitumor, digestive, psychiatric, and nutritional categories. Biologists usually select the top-ranked candidate side-effects for biological experiments to determine the actual drug-related side-effects. We list the top 30 candidate side-effects for each of the 708 drugs in Supplementary Table S1.

4. Conclusions

We presented a model (TCSD) that deeply integrates the similarity and association connections with diverse semantics within multiple heterogeneous graphs for inferring potential drug-side-effect association candidates. The two constructed drug-side-effect heterogeneous graphs were beneficial for formulating their specific neighbor context encodings based on a graph-sensitive transformer. The graph-sensitive transformer also integrated the discriminative semantics from the different types of connections between a target node and its multiple kinds of neighbor nodes. A multi-layer capsule network-based module was established to capture the multi-view attribute information for each drug-side-effect node pair. Two attention mechanisms were designed so that the more important neighbor categories and heterogeneous graphs receive higher weights. The cross-validation results demonstrated TCSD’s improved prediction performance, including greater AUC and AUPR, and higher recall rates for the top-ranked candidates than the seven compared methods. In addition, the case studies on Amitriptyline, Olanzapine, Clozapine, Aripiprazole, and Asenapine showed TCSD’s ability to retrieve potential candidate drug-related side-effects. TCSD was also used to infer the candidate side-effects for 708 drugs.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/molecules28186544/s1, Table S1: The top 30 candidate side-effects for each of 708 drugs; Table S2: The training and testing time of TCSD and the compared methods.

Author Contributions

P.X.: designed the method and participated in manuscript writing; P.L.: designed the experiments and edited the manuscript; H.C.: participated in method design and manuscript writing; M.W.: participated in method design; T.N.: participated in experiment design; T.Z.: participated in method design and manuscript writing. All authors have read and agreed to the published version of the manuscript.

Funding

The work was supported by the Natural Science Foundation of China (61972135, 62172143, 62372282); STU Scientific Research Initiation Grant (NTF22032); and the Natural Science Foundation of Heilongjiang Province (LH2023F044).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Cakir, A.; Tuncer, M.; Taymaz-Nikerel, H.; Ulucan, O. Side-effect prediction based on drug-induced gene expression profiles and random forest with iterative feature selection. Pharmacogenom. J. 2021, 21, 673–681.
2. Zhang, F.; Sun, B.; Diao, X.; Zhao, W.; Shu, T. Prediction of adverse drug reactions based on knowledge graph embedding. BMC Med. Informatics Decis. Mak. 2021, 21, 38.
3. Sachdev, K.; Gupta, M.K. A comprehensive review of feature based methods for drug target interaction prediction. J. Biomed. Inform. 2019, 93, 103159.
4. Jiang, H.; Qiu, Y.; Hou, W.; Cheng, X.; Yim, M.; Ching, W. Drug Side-Effect Profiles Prediction: From Empirical to Structural Risk Minimization. IEEE/ACM Trans. Comput. Biol. Bioinform. 2020, 17, 402–410.
5. Li, J.; Zheng, S.; Chen, B.; Butte, A.J.; Swamidass, S.J.; Lu, Z. A survey of current trends in computational drug repositioning. Briefings Bioinform. 2016, 17, 2–12.
6. dos Santos Nascimento, I.J.; da Silva Rodrigues, É.E.; da Silva, M.F.; de Araújo-Júnior, J.X.; de Moura, R.O. Advances in Computational Methods to Discover New NS2B-NS3 Inhibitors Useful Against Dengue and Zika Viruses. Curr. Top. Med. Chem. 2022, 22, 2435–2462.
7. Nascimento, I.J.d.S.; de Aquino, T.M.; da Silva-Júnior, E.F. The New Era of Drug Discovery: The Power of Computer-aided Drug Design (CADD). Lett. Drug Des. Discov. 2022, 19, 951–955.
8. Seo, S.; Lee, T.; Kim, M.h.; Yoon, Y. Prediction of side-effects Using Comprehensive Similarity Measures. BioMed Res. Int. 2020, 2020, 1357630.
9. Zheng, Y.; Peng, H.; Ghosh, S.; Lan, C.; Li, J. Inverse similarity and reliable negative samples for drug side-effect prediction. BMC Bioinform. 2019, 17, 554.
10. Lee, W.P.; Huang, J.Y.; Chang, H.H.; Lee, K.T.; Lai, C.T. Predicting Drug side-effects Using Data Analytics and the Integration of Multiple Data Sources. IEEE Access 2017, 5, 20449–20462.
11. Yang, L.; Chen, J.; He, L. Harvesting Candidate Genes Responsible for Serious Adverse Drug Reactions from a Chemical-Protein Interactome. PLoS Comput. Biol. 2009, 5, e1000441.
12. Luo, H.; Chen, J.; Shi, L.; Mikailov, M.; Zhu, H.; Wang, K.; He, L.; Yang, L. DRAR-CPI: A server for identifying drug repositioning potential and adverse drug reactions via the chemical-protein interactome. Nucleic Acids Res. 2011, 39, W492–W498.
13. Bongini, P.; Scarselli, F.; Bianchini, M.; Dimitri, G.M.; Pancino, N.; Lio, P. Modular Multi-Source Prediction of Drug Side-Effects With DruGNN. IEEE-ACM Trans. Comput. Biol. Bioinform. 2023, 20, 1211–1220.
14. Mizutani, S.; Pauwels, E.; Stoven, V.; Goto, S.; Yamanishi, Y. Relating drug-protein interaction network with drug side-effects. Bioinformatics 2012, 28, I522–I528.
15. Liu, M.; Wu, Y.; Chen, Y.; Sun, J.; Zhao, Z.; Chen, X.w.; Matheny, M.E.; Xu, H. Large-scale prediction of adverse drug reactions using chemical, biological, and phenotypic properties of drugs. J. Am. Med. Inform. Assoc. 2012, 19, E28–E35.
16. Zhang, W.; Liu, F.; Luo, L.; Zhang, J. Predicting drug side-effects by multi-label learning and ensemble learning. BMC Bioinform. 2015, 16, 365.
17. Ding, Y.; Tang, J.; Guo, F. Identification of drug-side-effect association via multiple information integration with centered kernel alignment. Neurocomputing 2019, 325, 211–224.
18. Xian, Z.; Lei, C.; Jing, L. A similarity-based method for prediction of drug side-effects with heterogeneous information. Math. Biosci. 2018, 306, 136–144.
19. Hu, B.; Wang, H.; Yu, Z. Drug Side-Effect Prediction Via Random Walk on the Signed Heterogeneous Drug Network. Molecules 2019, 24, 3668.
20. Zhang, W.; Liu, X.; Chen, Y.; Wu, W.; Wang, W.; Li, X. Feature-derived graph regularized matrix factorization for predicting drug side-effects. Neurocomputing 2018, 287, 154–162.
21. Galeano, D.; Li, S.; Gerstein, M.; Paccanaro, A. Predicting the frequencies of drug side-effects. Nat. Commun. 2020, 11, 4575.
22. Guo, X.; Zhou, W.; Yu, Y.; Ding, Y.; Tang, J.; Guo, F. A Novel Triple Matrix Factorization Method for Detecting Drug-Side Effect Association Based on Kernel Target Alignment. Biomed Res. Int. 2020, 2020, 4675395.
23. Mohsen, A.; Tripathi, L.P.; Mizuguchi, K. Deep Learning Prediction of Adverse Drug Reactions in Drug Discovery Using Open TG–GATEs and FAERS Databases. Front. Drug Discov. 2021, 1, 768792.
24. Zhao, H.; Zheng, K.; Li, Y.; Wang, J. A novel graph attention model for predicting frequencies of drug-side effects from multi-view data. Briefings Bioinform. 2021, 22, bbab239.
25. Zhao, H.; Wang, S.; Zheng, K.; Zhao, Q.; Zhu, F.; Wang, J. A similarity-based deep learning approach for determining the frequencies of drug side-effects. Briefings Bioinform. 2022, 23, bbab449.
26. Xuan, P.; Wang, M.; Liu, Y.; Wang, D.; Zhang, T.; Nakaguchi, T. Integrating specific and common topologies of heterogeneous graphs and pairwise attributes for drug-related side-effect prediction. Briefings Bioinform. 2022, 23, bbac126.
27. Yu, L.; Cheng, M.; Qiu, W.; Xiao, X.; Lin, W. idse-HE: Hybrid embedding graph neural network for drug side-effects prediction. J. Biomed. Inform. 2022, 131, 104098.
28. Kuhn, M.; Letunic, I.; Jensen, L.J.; Bork, P. The SIDER database of drugs and side-effects. Nucleic Acids Res. 2016, 44, D1075–D1079.
29. Davis, A.P.; Grondin, C.J.; Johnson, R.J.; Sciaky, D.; Wiegers, J.; Wiegers, T.C.; Mattingly, C.J. Comparative Toxicogenomics Database (CTD): Update 2021. Nucleic Acids Res. 2021, 49, D1138–D1143.
30. Luo, Y.; Zhao, X.; Zhou, J.; Yang, J.; Zhang, Y.; Kuang, W.; Peng, J.; Chen, L.; Zeng, J. A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nat. Commun. 2017, 8, 573.
31. Wang, D.; Wang, J.; Lu, M.; Song, F.; Cui, Q. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics 2010, 26, 1644–1650.
32. Nair, V.; Hinton, G.E. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the Conference on Machine Learning 2010, Haifa, Israel, 21–24 June 2010; pp. 807–814.
33. Hajian-Tilaki, K. Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation. Casp. J. Intern. Med. 2013, 4, 627–635.
34. Ling, C.X.; Huang, J.; Zhang, H. AUC: A better measure than accuracy in comparing learning algorithms. In Proceedings of the Conference of the Canadian Society for Computational Studies of Intelligence 2003, Halifax, NS, Canada, 11–13 June 2003; pp. 329–341.
35. Saito, T.; Rehmsmeier, M. The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets. PLoS ONE 2015, 10, e0118432.
36. Kanagali, S.N.; Patil, B.M.; Khanal, P.; Unger, B.S. Cyperus rotundus L. reverses the olanzapine-induced weight gain and metabolic changes-outcomes from network and experimental pharmacology. Comput. Biol. Med. 2022, 141, 105035.
37. Iqbal, E.; Govind, R.; Romero, A.; Dzahini, O.; Broadbent, M.; Stewart, R.; Smith, T.; Kim, C.H.; Werbeloff, N.; MacCabe, J.H.; et al. The side-effect profile of Clozapine in real world data of three large mental health hospitals. PLoS ONE 2020, 15, e0243437.
38. Yu, Z.; Wu, Z.; Li, W.; Liu, G.; Tang, Y. MetaADEDB 2.0: A comprehensive database on adverse drug events. Bioinformatics 2021, 37, 2221–2222.
39. Steigerwalt, K. Online Drug Information Resources. Choice 2015, 52, 1601–1611.
40. Avram, S.; Bologa, C.G.; Holmes, J.; Bocci, G.; Wilson, T.B.; Nguyen, D.T.; Curpan, R.; Halip, L.; Bora, A.; Yang, J.J.; et al. DrugCentral 2021 supports drug discovery and repositioning. Nucleic Acids Res. 2021, 49, D1160–D1169.

Figure 1. Framework of the proposed TCSD prediction model. (a) Establish two drug-side-effect graphs according to two types of drug similarities and demonstrate their attribute matrices; (b) learn the context representations of the drug and side-effect nodes based on a graph transformer and two attentions; (c) construct the capsule network to learn the multi-view pairwise attributes.

Figure 2. Illustration of learning the context representation based on graph transformer for a drug node.

Figure 3. Explanation of learning pairwise multi-view features of drug-side-effect node pair with capsule networks.

Figure 4. ROC curves and PR curves of our method and the compared methods for drug-side-effect association prediction.

Figure 5. Recall rates of all the prediction methods at various top k values.

Table 1. Performance demonstration of the ablation experiments.

CET	MVF	NCA	Average AUC	Average AUPR
✕	✓	✕	0.963	0.209
✓	✕	✓	0.971	0.254
✓	✓	✕	0.976	0.298
✓	✓	✓	0.977	0.351

Table 2. Results of the Wilcoxon test by comparing TCSD and the other seven methods.

	GCRS	idse-HE	SDPred	Ding’s Method	FGRMF	Galeano’s Method	RW-SHIN
p-value of AUC	8.4303 × 10$^{-4}$	2.6327 × 10$^{-4}$	4.7184 × 10$^{-6}$	3.4493 × 10$^{-11}$	1.8906 × 10$^{-34}$	4.9532 × 10$^{-41}$	2.5631 × 10$^{-79}$
p-value of AUPR	2.6205 × 10$^{-5}$	1.3362 × 10$^{-5}$	5.3927 × 10$^{-6}$	4.6451 × 10$^{-14}$	2.2247 × 10$^{-26}$	3.7876 × 10$^{-37}$	4.8253 × 10$^{-54}$

Table 3. Top 15 candidate side-effects related to Amitriptyline.

Drug	Rank	Side-Effect	Evidence	Rank	Side-Effect	Evidence
	1	Edema	Drugcentral, MetaADEDB, SIDER	9	Diarrhea	Drugcentral, MetaADEDB, Rxlist, SIDER
	2	Nausea	MetaADEDB, Rxlist, SIDER	10	Hypotension	Drugcentral, MetaADEDB, Rxlist, SIDER
	3	Vomiting	Drugcentral, MetaADEDB, Rxlist, SIDER	11	Confusion	Drugcentral, Rxlist, SIDER
Amitriptyline	4	Rash	Drugcentral, MetaADEDB, Rxlist, SIDER	12	Leukopenia	Drugcentral, MetaADEDB, Rxlist, SIDER
	5	Dizziness	Drugcentral, MetaADEDB, Rxlist, SIDER	13	Constipation	Drugcentral, MetaADEDB, Rxlist, SIDER
	6	Blurred vision	Drugcentral, MetaADEDB, Rxlist	14	Paresthesia	Drugcentral, MetaADEDB, Rxlist, SIDER
	7	Anorexia	MetaADEDB, Rxlist, SIDER	15	Syncope	MetaADEDB, Rxlist, SIDER
	8	Headache	Drugcentral, MetaADEDB, Rxlist, SIDER

Table 4. Top 15 candidate side-effects related to Olanzapine.

Drug	Rank	Side-Effect	Evidence	Rank	Side-Effect	Evidence
	1	Edema	Drugcentral, MetaADEDB, Rxlist, SIDER	9	Paresthesia	Drugcentral, MetaADEDB, Rxlist, SIDER
	2	Vomiting	MetaADEDB, Rxlist, SIDER, Literature [36]	10	Dizziness	Drugcentral, MetaADEDB, Rxlist, SIDER
	3	Headache	Drugcentral, MetaADEDB, Rxlist, SIDER	11	Back pain	Drugcentral, MetaADEDB, Rxlist, SIDER
Olanzapine	4	Nausea	Drugcentral, MetaADEDB, Rxlist, SIDER	12	Pruritus	Drugcentral, MetaADEDB, Rxlist, SIDER
	5	Rash	Drugcentral, MetaADEDB, Rxlist, SIDER	13	Dry mouth	Rxlist, SIDER
	6	Confusion	Drugcentral, Rxlist, SIDER	14	Cough	Drugcentral, MetaADEDB, Rxlist, SIDER
	7	Diarrhea	Drugcentral, Rxlist, SIDER	15	Arthralgia	Drugcentral, MetaADEDB, Rxlist, SIDER
	8	Constipation	MetaADEDB, Rxlist, SIDER, Literature [36]

Table 5. Top 15 candidate side-effects related to Clozapine.

Drug	Rank	Side-Effect	Evidence	Rank	Side-Effect	Evidence
	1	Edema	Drugcentral, MetaADEDB, Rxlist, SIDER	9	Vomiting	Drugcentral, MetaADEDB, Rxlist, SIDER
	2	Nausea	Drugcentral, MetaADEDB, Rxlist, SIDER	10	Rash	Drugcentral, MetaADEDB, Rxlist, SIDER
	3	Pruritus	Drugcentral, MetaADEDB, SIDER	11	Blurred vision	Rxlist, Literature [37]
Clozapine	4	Diarrhea	Drugcentral, MetaADEDB, Rxlist, SIDER	12	Headache	Drugcentral, MetaADEDB, Rxlist, SIDER
	5	Anemia	Drugcentral, SIDER	13	Thrombocytopenia	Drugcentral, MetaADEDB, Rxlist, SIDER
	6	Paresthesia	Drugcentral, Rxlist, SIDER	14	Nervousness	Drugcentral, MetaADEDB
	7	Pain	Drugcentral, MetaADEDB, Rxlist, SIDER	15	Dizziness	Drugcentral, MetaADEDB, Rxlist, SIDER
	8	Anorexia	MetaADEDB, Rxlist, SIDER

Table 6. Top 15 candidate side-effects related to Aripiprazole.

Drug	Rank	Side-Effect	Evidence	Rank	Side-Effect	Evidence
	1	Edema	Drugcentral, MetaADEDB, Rxlist, SIDER	9	Tachycardia	Drugcentral, MetaADEDB, Rxlist, SIDER
	2	Headache	Drugcentral, MetaADEDB, Rxlist, SIDER	10	Blurred vision	Drugcentral, MetaADEDB, Rxlist
	3	Rash	Drugcentral, MetaADEDB, Rxlist, SIDER	11	Dyspepsia	Drugcentral, MetaADEDB, Rxlist, SIDER
Aripiprazole	4	Dizziness	MetaADEDB, Rxlist, SIDER	12	Chest pain	Drugcentral, MetaADEDB, Rxlist, SIDER
	5	Nervousness	Drugcentral, MetaADEDB, SIDER	13	Hemorrhage	MetaADEDB
	6	Infection	Drugcentral, MetaADEDB, Rxlist, SIDER	14	Hypersensitivity	Drugcentral, MetaADEDB, Rxlist, SIDER
	7	Constipation	Drugcentral, MetaADEDB, Rxlist, SIDER	15	Fatigue	Drugcentral, MetaADEDB, Rxlist, SIDER
	8	Back pain	Drugcentral, MetaADEDB, SIDER

Table 7. Top 15 candidate side-effects related to Asenapine.

Drug	Rank	Side-Effect	Evidence	Rank	Side-Effect	Evidence
	1	Edema	MetaADEDB, Rxlist, SIDER	9	Dyspnea	Rxlist, SIDER
	2	Vomiting	Rxlist, SIDER	10	Constipation	MetaADEDB, Rxlist, SIDER
	3	Headache	MetaADEDB, Rxlist, SIDER	11	Confusion	Rxlist
Asenapine	4	Pain	MetaADEDB, Rxlist, SIDER	12	Blurred vision	unconfirmed
	5	Nausea	MetaADEDB, Rxlist, SIDER	13	Fatigue	Drugcentral, MetaADEDB, Rxlist, SIDER
	6	Dizziness	MetaADEDB, Rxlist, SIDER	14	Anorexia	unconfirmed
	7	Rash	Rxlist, SIDER	15	Pruritus	unconfirmed
	8	Diarrhea	Drugcentral, Rxlist

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Xuan, P.; Li, P.; Cui, H.; Wang, M.; Nakaguchi, T.; Zhang, T. Learning Multi-Types of Neighbor Node Attributes and Semantics by Heterogeneous Graph Transformer and Multi-View Attention for Drug-Related Side-Effect Prediction. Molecules 2023, 28, 6544. https://doi.org/10.3390/molecules28186544

