Article

GATCDA: Predicting circRNA-Disease Associations Based on Graph Attention Network

1 School of Computer Science, Shaanxi Normal University, Xi’an 710119, China
2 Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
* Authors to whom correspondence should be addressed.
Cancers 2021, 13(11), 2595; https://doi.org/10.3390/cancers13112595
Submission received: 28 March 2021 / Revised: 19 May 2021 / Accepted: 22 May 2021 / Published: 25 May 2021
(This article belongs to the Special Issue Circular RNAs: New Insights into the Molecular Biology of Cancer)

Simple Summary

CircRNAs (circular RNAs), a novel class of non-coding RNAs, play a regulatory role in cellular processes. A growing number of biological experiments have shown that circRNAs can serve as biomarkers and therapeutic targets for some cancers. Because biological experiments are costly in both time and money, computational methods have become an attractive way to predict associations between circRNAs and diseases. In this study, a graph attention network was applied for the first time to predict circRNA-disease associations using multiple similarity measures. CircRNA–miRNA interactions and disease–mRNA interactions were used to construct features. The proposed computational method improves prediction performance.

Abstract

CircRNAs (circular RNAs) are a class of non-coding RNA molecules with a closed circular structure. CircRNAs are closely related to the occurrence and development of diseases. Due to the time-consuming nature of biological experiments, computational methods have become a better way to predict the interactions between circRNAs and diseases. In this study, we developed a novel computational method called GATCDA utilizing a graph attention network (GAT) to predict circRNA–disease associations with disease symptom similarity, network similarity, and information entropy similarity for both circRNAs and diseases. GAT learns representations for nodes on a graph by an attention mechanism, which assigns different weights to different nodes in a neighborhood. Considering that the circRNA–miRNA–mRNA axis plays an important role in the generation and development of diseases, circRNA–miRNA interactions and disease–mRNA interactions were adopted to construct features, in which mRNAs were related to 88% of miRNAs. As demonstrated by five-fold cross-validation, GATCDA yielded an AUC value of 0.9011. In addition, case studies showed that GATCDA can predict unknown circRNA–disease associations. In conclusion, GATCDA is a useful method for exploring associations between circRNAs and diseases.

1. Introduction

CircRNAs (circular RNAs) are a class of non-coding RNA molecules with a closed circular structure, lacking both a 5′-end cap and a 3′-end poly(A) tail. They are mainly located in the cytoplasm or stored in exosomes, and are not affected by RNA exonucleases [1]. Although circRNAs are classified as non-coding RNAs, some circRNAs can encode polypeptides. Currently, the well-recognized biological functions of circRNAs are as follows [2]: acting as miRNA sponges, binding regulatory proteins, regulating gene transcription, and coding. CircRNAs are stably expressed and not easily degraded, and have been shown to exist widely in a variety of eukaryotes [1]. Most circRNAs are formed by exon loops, and some circRNAs are lariat structures formed by intron loops. Because circRNAs contain a number of miRNA response elements (MREs), they can form the catalytic core of the RNA-induced silencing complex (RISC) with AGO proteins, which eventually leads to the degradation of circRNAs [3]. According to their sources, circRNAs can be roughly divided into four categories [4]: full-exon circRNAs, exon-intron circRNAs (EIcircRNAs), intron-composed lariat circRNAs, and circRNAs produced by cyclization of viral RNA genomes (tRNA, rRNA, snRNA, etc.). Twenty years ago, scientists found circRNAs in plant viroids, yeast mitochondria, and hepatitis B viruses (HBV), and regarded them as byproducts of abnormal splicing with no regulatory function [5]. In 2013, Hansen et al. proposed and confirmed for the first time that circRNAs can act as miRNA sponges [6], opening a new field for circRNA research. With the rapid development of RNA sequencing technology and bioinformatics analysis, 14,807 candidate circRNAs have been identified in the human tissue transcriptome, and many exons have been found to form circRNAs by nonlinear reverse splicing or gene rearrangement in cells of other species.
In recent years, many studies [7,8,9] have shown that circRNAs are closely related to the occurrence and development of diseases, and have pointed to their potential as diagnostic markers. For example, after brain/spinal cord injury, circRNAs can activate several biological, molecular, and cellular activities. Therefore, interventions centered on the regulation of circRNAs may be promising for traumatic brain injury and spinal cord injury [10]. Chen et al. demonstrated that circRNA circCTNNA1 promoted colorectal cancer progression by sponging miR-149-5p and regulating FOXM1 expression [11]. Wang et al. found that circCNST promoted the tumorigenesis of osteosarcoma cells by sponging miR-421 and targeting SLC25A3, providing a potential biomarker for patients with osteosarcoma [12]. Wu et al. discovered that circ_0009582, circ_0037120, and circ_0140117 may serve as potential biomarkers for predicting the occurrence of hepatocellular carcinoma in patients with HBV infection [13]. Li et al. revealed that circRNA circITGA7 may play a regulatory role in thyroid cancer and may be a potential marker for thyroid cancer diagnosis or progression [14].
At present, the number of known interactions between circRNAs and diseases obtained through biological experiments is increasing. Several databases [15,16,17,18] now collect circRNA-disease interactions verified by biological experiments. Because biological experiments are costly in both time and money, predicting associations between circRNAs and diseases with computational methods built on these databases has become a hot topic in recent years. The current computational methods can be divided into five categories [19]. The first category is network propagation methods. For example, the computational model BRWSP applies a biased random walk to search paths on a multiple heterogeneous network to discover circRNA-disease associations [20]. A method called KATZHCDA uses KATZ measures for human circRNA-disease association prediction [21]. The second category is recommendation system methods. For example, Lei et al. proposed a computational method named ICFCDA based on a collaborative filtering recommendation system, handling the “cold start” problem to predict potential circRNA-disease associations [22]. The third category is matrix completion methods. For example, a computational method called iCircDA-MF was developed based on matrix factorization by Wei et al. [23]. Zhang et al. [24] utilized metapath2vec++ and matrix factorization to discover circRNA-disease associations. The fourth category is classical machine learning methods. For example, based on a gradient boosting decision tree, a model named GBDTCDA was proposed by Lei et al. in 2019 [25]. A computational model called RWLR uses logistic regression to predict circRNA-disease associations [26]. The fifth category is deep learning methods. For example, Wang’s method [27] applies a convolutional neural network to discover unknown circRNA-disease associations. In 2020, GCNCDA was proposed based on a graph convolutional network [28].
In this paper, we propose a novel computational model named GATCDA to predict circRNA-disease associations with graph attention network (GAT). First, we construct a circRNA-disease association network, a circRNA-miRNA association network and a disease-mRNA association network. Second, we calculate disease symptom similarity, network similarity and information entropy similarity for both circRNAs and diseases. Third, these similarities are integrated to create the features of circRNAs and diseases. Fourth, the circRNA-disease association network and the features of circRNAs and diseases are fed into GAT, and the output is the prediction score of associations between circRNAs and diseases.

2. Materials and Methods

The whole flowchart of GATCDA is shown in Figure 1.

2.1. Dataset Curation

2.1.1. CircRNA-Disease Association

The circRNA-disease associations were downloaded from CircR2Disease [15], CircAtlas 2.0 [16], Circ2Disease [17], and CircRNADisease [18], which contain 739, 927, 273, and 354 circRNA-disease associations, respectively. After integration, we obtained 768 circRNA-disease associations involving 624 circRNAs and 102 diseases.

2.1.2. CircRNA-MiRNA Association and Disease-mRNA Association

An initial circRNA-miRNA association dataset was downloaded from the starBase v2.0 [29] including 130,000+ circRNA-miRNA interactions, with 276 miRNA entries and 7018 circRNA entries. We selected 82 circRNAs common to circRNA-disease interactions and circRNA-miRNA interactions. There were only 142 miRNAs related to these 82 circRNAs in the initial circRNA-miRNA interactions. Finally, we constructed a new circRNA-miRNA association network including 509 circRNA-miRNA associations among 142 miRNAs and 82 circRNAs.
An initial disease-mRNA association dataset was downloaded from DisGeNET [30] including 60,000+ disease-mRNA interactions. We selected 37 diseases common to circRNA-disease interactions and disease-mRNA interactions. There were 820 mRNAs related to these 37 diseases in the initial disease-mRNA interactions. Finally, we constructed a new disease-mRNA association network including 1239 disease-mRNA associations among 37 diseases and 820 mRNAs.
CircRNAs act as miRNA sponges in cells and increase the expression level of target genes. The circRNA-miRNA-mRNA axis plays an important regulatory role in diseases [2,31,32]. In the new circRNA-miRNA associations, we identified all the miRNAs related to diseases from the circRNA-disease associations or to mRNAs from the disease-mRNA associations. Furthermore, the mRNAs in the new disease-mRNA associations are related to 125 of the 142 miRNAs mentioned above (about 88%).

2.1.3. Construction of the Interaction Network

For convenience, we formulate circRNA-disease associations as a binary matrix $Y \in \mathbb{R}^{624 \times 102}$. If there exists an experimentally verified interaction between circRNA $c_i$ and disease $d_j$, then $Y(i, j) = 1$; otherwise, $Y(i, j) = 0$. At the same time, a circRNA-miRNA interaction matrix and a disease-mRNA interaction matrix are constructed in the same way based on circRNA-miRNA associations and disease-mRNA associations, respectively.
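As an illustration of the matrix construction described above, the following minimal Python sketch builds a binary association matrix from a list of known pairs. The function name `build_association_matrix` and the toy identifiers are hypothetical and are not taken from the paper's data or code.

```python
import numpy as np

def build_association_matrix(pairs, row_ids, col_ids):
    """Return a binary matrix Y with Y[i, j] = 1 for every known (row, col) pair."""
    row_index = {name: i for i, name in enumerate(row_ids)}
    col_index = {name: j for j, name in enumerate(col_ids)}
    Y = np.zeros((len(row_ids), len(col_ids)), dtype=np.int8)
    for r, c in pairs:
        Y[row_index[r], col_index[c]] = 1
    return Y

# Toy identifiers for illustration only (not from the curated dataset).
circrnas = ["circA", "circB", "circC"]
diseases = ["disease1", "disease2"]
known = [("circA", "disease1"), ("circC", "disease2"), ("circA", "disease2")]
Y = build_association_matrix(known, circrnas, diseases)
```

The same helper could be reused for the circRNA-miRNA and disease-mRNA matrices by passing the corresponding identifier lists.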

2.2. Similarity Calculation

2.2.1. Network Similarity

Zhou et al. [33] demonstrated the usefulness of network similarity. For a given miRNA $mi_k$, we denote the set of its interacting circRNAs by $C(mi_k)$. For a given mRNA $m_k$, we denote the set of its interacting diseases by $D(m_k)$. The network contribution of miRNA $mi_k$ in the circRNA-miRNA interaction network can be calculated as follows
$$nc(mi_k) = \ln |C(mi_k)| \Big/ \sum_{k=1}^{y} |C(mi_k)|, \tag{1}$$
where $nc(mi_k)$ is the network contribution of miRNA $mi_k$ in the circRNA-miRNA interaction network, and $y$ is the number of miRNAs. The network contribution of mRNA $m_k$ in the disease-mRNA interaction network can be calculated as follows
$$nc(m_k) = \ln |D(m_k)| \Big/ \sum_{k=1}^{z} |D(m_k)|, \tag{2}$$
where $nc(m_k)$ is the network contribution of mRNA $m_k$ in the disease-mRNA interaction network, and $z$ is the number of mRNAs. We also denote the set of miRNAs that interact with a given circRNA $c_u$ by $Mi(c_u)$, and the set of mRNAs that interact with a given disease $d_u$ by $M(d_u)$. The network similarity between circRNA $c_u$ and circRNA $c_v$ can be defined as
$$CNS(c_u, c_v) = \sum_{mi_k \in Mi(c_u) \cap Mi(c_v)} nc(mi_k), \tag{3}$$
where $CNS(c_u, c_v)$ is the network similarity between circRNA $c_u$ and circRNA $c_v$. Similarly, given two diseases, $d_u$ and $d_v$, the network similarity between disease $d_u$ and disease $d_v$ can be defined as
$$DNS(d_u, d_v) = \sum_{m_k \in M(d_u) \cap M(d_v)} nc(m_k), \tag{4}$$
where $DNS(d_u, d_v)$ is the network similarity between disease $d_u$ and disease $d_v$.
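The summation over shared intermediate nodes can be sketched in a few lines of Python. This sketch assumes the network contributions have already been computed and are passed in as a dictionary; the function name `network_similarity`, the toy neighbourhoods, and the contribution values are all hypothetical.

```python
def network_similarity(neighbors_u, neighbors_v, contribution):
    """Sum the contributions of the intermediate nodes shared by two entities."""
    shared = set(neighbors_u) & set(neighbors_v)
    return sum(contribution[m] for m in shared)

# Hypothetical toy data: miRNA neighbourhoods and precomputed contributions.
mirnas_of = {
    "circA": {"miR-1", "miR-2"},
    "circB": {"miR-2", "miR-3"},
    "circC": {"miR-4"},
}
nc = {"miR-1": 0.5, "miR-2": 0.25, "miR-3": 0.75, "miR-4": 1.0}

cns_ab = network_similarity(mirnas_of["circA"], mirnas_of["circB"], nc)  # shares miR-2
cns_ac = network_similarity(mirnas_of["circA"], mirnas_of["circC"], nc)  # shares nothing
```

The disease network similarity follows the same pattern, with mRNA neighbourhoods and mRNA contributions in place of the miRNA ones.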

2.2.2. Information Entropy Similarity

Information entropy is also used to measure topological similarities of circRNAs and diseases. For a given circRNA $c_u$, we denote the set of its interacting diseases by $Tm_{c_u}$. For a given circRNA $c_v$, we denote the set of its interacting diseases by $Tm_{c_v}$. Next, the information entropy of $Tm_{c_u}$ can be calculated as
$$H(Tm_{c_u}) = -\sum_{i=1}^{n_d} p(Tm_{c_u}(i)) \log_2\big(p(Tm_{c_u}(i))\big), \qquad p(Tm_{c_u}(i)) = \frac{n(Tm_{c_u}(i))}{N_{cd}}, \tag{5}$$
where $n_d$ is the number of diseases related to circRNA $c_u$, $N_{cd}$ is the total number of known circRNA-disease interactions, $n(Tm_{c_u}(i))$ is the number of interactions between the $i$th disease in the related disease set of circRNA $c_u$ and all circRNAs, and $p(Tm_{c_u}(i))$ is the proportion of the known circRNA-disease interactions that involve the $i$th disease in the related disease set of circRNA $c_u$. The information entropy similarity between circRNA $c_u$ and circRNA $c_v$ can be calculated as
$$CES(c_u, c_v) = \frac{2 H(Tm_{c_u} \cap Tm_{c_v})}{H(Tm_{c_u}) + H(Tm_{c_v})}, \tag{6}$$
where $H(Tm_{c_u} \cap Tm_{c_v})$ is the information entropy of the intersection of $Tm_{c_u}$ and $Tm_{c_v}$, and $CES(c_u, c_v)$ is the information entropy similarity of circRNA $c_u$ and circRNA $c_v$.
Similarly, the information entropy similarity between disease $d_u$ and disease $d_v$ can be calculated as follows
$$DES(d_u, d_v) = \frac{2 H(Tn_{d_u} \cap Tn_{d_v})}{H(Tn_{d_u}) + H(Tn_{d_v})}, \tag{7}$$
where $Tn_{d_u}$ is the set of circRNAs interacting with disease $d_u$, $Tn_{d_v}$ is the set of circRNAs interacting with disease $d_v$, $H(Tn_{d_u} \cap Tn_{d_v})$ is the information entropy of the intersection of $Tn_{d_u}$ and $Tn_{d_v}$, and $DES(d_u, d_v)$ is the information entropy similarity of disease $d_u$ and disease $d_v$.
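The entropy-based similarity can be illustrated with a small Python sketch. It assumes the standard entropy form with a leading minus sign, and it returns 0 when both entropies are zero, which is a convention chosen here to avoid division by zero rather than something stated in the text. The function names and the toy disease degrees are hypothetical.

```python
import math

def entropy(item_set, degree, n_total):
    """H = -sum(p * log2(p)) with p = degree[item] / n_total over the items in the set."""
    h = 0.0
    for item in item_set:
        p = degree[item] / n_total
        if p > 0:
            h -= p * math.log2(p)
    return h

def entropy_similarity(set_u, set_v, degree, n_total):
    """2 * H(intersection) / (H(set_u) + H(set_v)); 0.0 if the denominator is zero (assumed)."""
    denom = entropy(set_u, degree, n_total) + entropy(set_v, degree, n_total)
    if denom == 0:
        return 0.0
    return 2.0 * entropy(set_u & set_v, degree, n_total) / denom

# Hypothetical toy data: number of known associations per disease, 8 in total.
disease_degree = {"d1": 2, "d2": 4, "d3": 2}
n_cd = 8
diseases_of_u = {"d1", "d2"}
diseases_of_v = {"d2", "d3"}

ces_uv = entropy_similarity(diseases_of_u, diseases_of_v, disease_degree, n_cd)
ces_uu = entropy_similarity(diseases_of_u, diseases_of_u, disease_degree, n_cd)
```

In this toy case each of the two sets has entropy 1.0 and their intersection has entropy 0.5, so the similarity of the pair is 0.5 while the self-similarity is 1.0.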

2.2.3. Disease Symptom Similarity

According to the co-occurrence of diseases and symptom terms recorded in the PubMed bibliography, and the work of Zhou et al. [34], the disease similarity can be measured and a symptom-based human disease network can be constructed. Here, the symptom-based disease similarity matrix DSS was obtained from the symptom profiles of diseases.

2.2.4. Integration of Similarities

The integrated circRNA similarities and integrated disease similarities are regarded as circRNA features and disease features, respectively. The integrated circRNA similarities can be calculated as follows:
$$ICS(c_u, c_v) = \begin{cases} \beta \times CNS(c_u, c_v) + (1 - \beta) \times CES(c_u, c_v), & \text{if } CNS(c_u, c_v) \text{ is defined} \\ CES(c_u, c_v), & \text{otherwise} \end{cases} \tag{8}$$
where $CNS(c_u, c_v)$ is the circRNA network similarity between circRNA $c_u$ and circRNA $c_v$, $CES(c_u, c_v)$ is the circRNA information entropy similarity between circRNA $c_u$ and circRNA $c_v$, and $ICS(c_u, c_v)$ is the integrated circRNA similarity between circRNA $c_u$ and circRNA $c_v$. $\beta$ is an adjusting parameter. The dimensions of the ICS matrix are 624 × 624.
The integrated disease similarities can be calculated as follows
$$IDS(d_u, d_v) = \begin{cases} \alpha \times \big(DNS(d_u, d_v) + DSS(d_u, d_v)\big) + (1 - \alpha) \times DES(d_u, d_v), & \text{if } DNS(d_u, d_v) \text{ and } DSS(d_u, d_v) \text{ are defined} \\ DES(d_u, d_v), & \text{otherwise} \end{cases} \tag{9}$$
where $DNS(d_u, d_v)$ is the disease network similarity between disease $d_u$ and disease $d_v$, $DES(d_u, d_v)$ is the disease information entropy similarity between disease $d_u$ and disease $d_v$, $DSS(d_u, d_v)$ is the disease symptom similarity between disease $d_u$ and disease $d_v$, and $IDS(d_u, d_v)$ is the integrated disease similarity between disease $d_u$ and disease $d_v$. $\alpha$ is an adjusting parameter. The dimensions of the IDS matrix are 102 × 102.
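A possible way to combine the similarity matrices is sketched below in Python. It rests on one assumed reading of the integration formulas: the weighted blend is used for pairs in which both entities have network similarity available, and the information entropy similarity alone is used for all other pairs. The function name `integrate`, the boolean availability vector, and the toy matrices are hypothetical.

```python
import numpy as np

def integrate(network_sim, entropy_sim, has_network, weight):
    """Blend two similarity matrices where network similarity is available.

    Pairs in which either entity lacks network information fall back to the
    entropy similarity alone (an assumed reading of the two-branch formula).
    """
    has = np.asarray(has_network, dtype=bool)
    mask = has[:, None] & has[None, :]
    blended = weight * network_sim + (1.0 - weight) * entropy_sim
    return np.where(mask, blended, entropy_sim)

# Toy example: the third circRNA has no miRNA interactions.
CNS = np.array([[1.0, 0.4, 0.0],
                [0.4, 1.0, 0.0],
                [0.0, 0.0, 0.0]])
CES = np.array([[1.0, 0.2, 0.6],
                [0.2, 1.0, 0.8],
                [0.6, 0.8, 1.0]])
ICS = integrate(CNS, CES, has_network=[True, True, False], weight=0.1)
```

For diseases, the same helper could be called with the sum of the network and symptom similarity matrices as `network_sim` and with α as `weight`.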

2.3. Graph Attention Network

GAT [35] combines a weighted sum of the adjacent node features with the attention mechanism. The weight of the adjacent node features is completely dependent on the node features and independent of graph structure. GAT aims to construct a hidden self-attention layer and to learn representations for nodes on a graph by assigning different weights to different nodes in a neighborhood.
The input of the graph attention layer is
$$f = \{f_1, f_2, \ldots, f_N\}, \quad f_i \in \mathbb{R}^{F}, \tag{10}$$
where $N$ is the number of nodes (all circRNAs and all diseases), $F$ is the length of the features, and the matrix $f \in \mathbb{R}^{N \times F}$ denotes the features of all nodes.
The output of the graph attention layer is
$$f' = \{f'_1, f'_2, \ldots, f'_N\}, \quad f'_i \in \mathbb{R}^{F'}, \tag{11}$$
where $F'$ denotes the dimension of the new features and the matrix $f' \in \mathbb{R}^{N \times F'}$ denotes the new features of all nodes.
The first step is to learn the importance of the neighbors for a given node. GAT implements the self-attention mechanism for every node. The attention coefficient $e_{ij}$ for an association pair between circRNA $c_i$ and disease $d_j$ is formulated as follows
$$e_{ij}(c_i, d_j) = att(W f_i, W f_j), \tag{12}$$
where $att$ denotes a single-layer feed-forward neural network that transforms input features into high-level features for circRNAs and diseases, and $W \in \mathbb{R}^{F' \times F}$ is a weight matrix.
To make the attention coefficient comparable across different nodes, GAT further normalizes the attention coefficient $e_{ij}$ as follows
$$\theta_{ij} = softmax(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{t \in N_i} \exp(e_{it})}, \tag{13}$$
where $N_i$ is the set of neighbor nodes of circRNA $c_i$, and $\theta_{ij}$ is the normalized attention coefficient indicating the importance of disease $d_j$ for circRNA $c_i$ in the process of information propagation.
By combining Formulas (12) and (13), the complete attention mechanism can be obtained as follows
$$\theta_{ij} = \frac{\exp\big(LeakyReLU(a^{T} [W f_i \,\|\, W f_j])\big)}{\sum_{t \in N_i} \exp\big(LeakyReLU(a^{T} [W f_i \,\|\, W f_t])\big)}, \tag{14}$$
where LeakyReLU is a nonlinear activation function that assigns a non-zero slope to negative inputs, $T$ denotes transposition, $\|$ is the concatenation operation, and $a \in \mathbb{R}^{2F'}$ is the weight vector of the graph attention layer.
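The attention computation for a single node can be sketched with NumPy as follows. This is an illustrative sketch, not the authors' implementation: the negative slope of 0.2 for LeakyReLU, the random toy features, and the function names are assumptions introduced here.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    """LeakyReLU with an assumed negative slope of 0.2."""
    return np.where(x > 0, x, slope * x)

def attention_coefficients(f, W, a, i, neighbors):
    """Softmax-normalized attention of node i over the given neighbor indices."""
    h = f @ W.T  # projected features, shape (N, F')
    scores = np.array(
        [leaky_relu(a @ np.concatenate([h[i], h[t]])) for t in neighbors]
    )
    scores = scores - scores.max()  # stabilizes exp without changing the softmax
    weights = np.exp(scores)
    return weights / weights.sum()

# Random toy inputs: 4 nodes, F = 3 input features, F' = 2 output features.
rng = np.random.default_rng(0)
f = rng.normal(size=(4, 3))
W = rng.normal(size=(2, 3))
a = rng.normal(size=4)  # length 2 * F'
theta = attention_coefficients(f, W, a, i=0, neighbors=[1, 2, 3])
```

The returned vector holds one weight per neighbor, and the weights sum to one by construction.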
The second step is to fuse the representations of the neighbors for a given node according to their attention coefficients. The embedding of a given node can be fused by the projected node features of neighbors with different weights as follows
$$f'_i = \sigma\Big(\sum_{t \in N_i} \theta_{it} W f_t\Big), \tag{15}$$
where $\sigma$ is a nonlinear activation function.
GAT applies a multi-head attention mechanism to increase the stability of the learning process of self-attention. Multi-head attention is the combination of multiple self-attention structures. Each head learns the features in different representation spaces, and the focus of attention learned by multiple heads may be slightly different, which increases the capacity of the model. Specifically, $K$ independent attention mechanisms are integrated to achieve embedding as follows:
$$f'_i = \sigma\Big(\frac{1}{K} \sum_{k=1}^{K} \sum_{t \in N_i} \theta_{it}^{k} W^{k} f_t\Big), \tag{16}$$
where $K$ is the number of attention mechanisms and $W^{k}$ is the weight matrix for the $k$th attention mechanism.
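The multi-head averaging step can be sketched as below. The text does not name the activation σ, so a sigmoid is used here purely as an assumption; the function names and the toy weights are likewise hypothetical.

```python
import numpy as np

def sigmoid(x):
    """Assumed choice for the nonlinearity sigma."""
    return 1.0 / (1.0 + np.exp(-x))

def multi_head_embedding(f, neighbors, thetas, Ws):
    """Average K attention-weighted neighbor sums, then apply the nonlinearity.

    thetas[k][n] is the attention weight of head k for neighbors[n];
    Ws[k] is the projection matrix of head k.
    """
    K = len(Ws)
    total = np.zeros(Ws[0].shape[0])
    for theta, W in zip(thetas, Ws):
        for weight, t in zip(theta, neighbors):
            total += weight * (W @ f[t])
    return sigmoid(total / K)

# Toy example with two heads and two neighbors of node 0.
f = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
neighbors = [1, 2]
Ws = [np.eye(2), 2.0 * np.eye(2)]
thetas = [np.array([0.5, 0.5]), np.array([1.0, 0.0])]
embedding = multi_head_embedding(f, neighbors, thetas, Ws)
```

In the toy case the first head contributes [0.5, 1.0] and the second [0.0, 2.0], so the averaged pre-activation is [0.25, 1.5].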
Ultimately, the probability score matrix $S$ can be calculated as follows
$$S = U \times V^{T}, \tag{17}$$
where $U \in \mathbb{R}^{n_c \times F'}$ is the final representation matrix of the circRNAs, in which $n_c$ is the number of circRNAs; and $V \in \mathbb{R}^{n_d \times F'}$ is the final representation matrix of diseases, in which $n_d$ is the number of diseases. The dimension of the probability score matrix $S$ is $n_c \times n_d$.
The detailed procedures of using GAT to predict the associations between circRNAs and diseases are shown in Figure 2. As shown in Figure 2, the circRNA-disease association network is fed into a GAT in which the final node representation is obtained through feature propagation and attention fusion. Finally, the prediction score is calculated according to node representation. In the case of disease d3 and circRNA c2, the dark blue row represents d3, the dark yellow row represents c2, and the red grid represents the predicted score of the association between d3 and c2.
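The final scoring step, a product of the circRNA and disease representation matrices, can be sketched as follows, together with a ranking helper of the kind needed to list the highest-scoring circRNAs for one disease. The helper names and the toy embeddings are hypothetical.

```python
import numpy as np

def score_matrix(U, V):
    """Score every circRNA-disease pair as the inner product of their embeddings."""
    return U @ V.T

def top_k_circrnas(S, disease_index, k):
    """Indices of the k highest-scoring circRNAs for one disease (ties keep input order)."""
    order = np.argsort(-S[:, disease_index], kind="stable")
    return order[:k].tolist()

# Toy embeddings: 3 circRNAs and 2 diseases in a 2-dimensional space.
U = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
V = np.array([[1.0, 0.0], [0.5, 0.5]])
S = score_matrix(U, V)
ranked = top_k_circrnas(S, disease_index=1, k=2)
```

Here the scores for the second disease are 0.5, 0.5, and 1.0, so the third circRNA ranks first, followed by the first one.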

3. Results

3.1. Performance Evaluation

The five-fold cross-validation (5CV) technique was used to evaluate the prediction performance of our model. The 5CV technique randomly divides the positive samples into five equal parts; each part in turn serves as the testing set while the remaining samples serve as the training set. Next, the predicted scores are sorted in descending order. We drew the receiver operating characteristic (ROC) curve by plotting the true positive rate (TPR) versus the false positive rate (FPR) at different score thresholds. TPR is the fraction of positive cases that are correctly identified, and FPR is the fraction of negative cases that are incorrectly identified as positive. Generally, the area under the ROC curve (AUC) is calculated and employed to evaluate the prediction performance. Specifically, the closer the AUC value is to one, the better the prediction performance. As a result, in 5CV, GATCDA achieved an AUC of 0.9011. In addition, GATCDA yielded an accuracy of 0.8710, with a precision of 0.9013.
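The evaluation protocol can be sketched in Python as follows. The sketch only shows how known associations might be split into five folds and how an AUC can be computed from labels and scores through the rank-sum identity; how unlabeled pairs are sampled as negatives is not specified in the text and is left out. All function names and the toy matrix are hypothetical.

```python
import numpy as np

def auc_from_scores(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) identity; tied scores share average ranks."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores))
    i = 0
    while i < len(scores):
        j = i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average 1-based rank of the tie group
        i = j + 1
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def five_fold_positive_splits(Y, n_folds=5, seed=0):
    """Yield (training matrix, held-out pairs); each fold hides one part of the known pairs."""
    rng = np.random.default_rng(seed)
    positives = np.argwhere(Y == 1)
    rng.shuffle(positives)  # shuffles the rows in place
    for fold in np.array_split(positives, n_folds):
        train = Y.copy()
        train[fold[:, 0], fold[:, 1]] = 0
        yield train, fold

# Toy association matrix with 10 known pairs.
Y_toy = (np.arange(20).reshape(4, 5) % 2 == 0).astype(int)
splits = list(five_fold_positive_splits(Y_toy))
toy_auc = auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In the toy AUC example, three of the four positive-negative pairs are ordered correctly, which gives 0.75.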

3.2. Adjustment of Parameters

The GATCDA model involves two parameters, α and β, which adjust the influence of each similarity measure when calculating the integrated similarities. We let α and β both range between 0.1 and 0.9. As a result, GATCDA with α = 0.1 and β = 0.1 achieved the highest AUC of 0.9011 in 5CV, as shown in Figure 3.
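A parameter sweep of this kind can be sketched as a plain grid search. The step size of 0.1 is an assumption, since the text only gives the range, and `toy_evaluate` is a hypothetical stand-in for "run 5CV and return the AUC"; its values are not results from the paper.

```python
import itertools

def grid_search(evaluate, values):
    """Return (alpha, beta, score) for the best-scoring pair over a square grid."""
    best = None
    for alpha, beta in itertools.product(values, repeat=2):
        score = evaluate(alpha, beta)
        if best is None or score > best[2]:
            best = (alpha, beta, score)
    return best

# Assumed grid: 0.1, 0.2, ..., 0.9.
grid = [round(0.1 * step, 1) for step in range(1, 10)]

def toy_evaluate(alpha, beta):
    """Hypothetical scoring surface used only to exercise the search."""
    return 1.0 - (alpha - 0.3) ** 2 - (beta - 0.7) ** 2

best_alpha, best_beta, best_score = grid_search(toy_evaluate, grid)
```

In practice `evaluate` would rebuild the integrated similarities for each (α, β) pair, retrain the model, and return the cross-validated AUC.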

3.3. Compared with Other Methods

To analyze the performance of GATCDA in predicting circRNA-disease associations, we compared GATCDA with four other methods: DWNN-RLS [36], KATZHCDA [21], bi-random walks (BiRWR) [37], and DeepWalk [38]. DWNN-RLS uses the regularized least squares of the Kronecker product kernel to predict circRNA-disease associations. KATZHCDA uses KATZ measures for human circRNA-disease association prediction. BiRWR predicts circRNA-disease associations by walking in a circRNA subnetwork and a disease subnetwork. DeepWalk learns latent representations of nodes in a graph structure. The ROC curve and AUC value of each method using 5CV are shown in Figure 4. The precision-recall curve and the area under the precision-recall curve (AUPR) value of each method with 5CV are shown in Figure 5. The comparison shows that extracting circRNA and disease features with GAT achieves better prediction performance than DeepWalk. In addition, as a deep learning method, GAT also shows better prediction performance than the two link-based prediction methods (KATZHCDA and BiRWR).

3.4. Case Study

To further evaluate the prediction performance of GATCDA, we also carried out case studies on three common diseases, i.e., bladder cancer, diabetes retinopathy, and rheumatoid arthritis.
Bladder cancer is the most frequent cancer affecting the urinary tract [39], and has a high rate of recurrence [40]. Diabetes retinopathy is a common chronic metabolic disorder, increasing with an ageing population and the growing number of cases of diabetes [41]. Rheumatoid arthritis is the most common chronic inflammatory arthritis, which can lead to cartilage and bone damage and disability [42]. There is increasing evidence that circRNAs can be used as effective biomarkers for the diagnosis of bladder cancer, diabetes retinopathy, and rheumatoid arthritis. Therefore, we selected bladder cancer, diabetes retinopathy, and rheumatoid arthritis to verify the predictive ability of GATCDA.
In this work, all known associations between the investigated disease and circRNAs were assumed to be unknown. Through the calculation of GATCDA, the circRNAs with the top 10 scores were selected among all the predicted associations between the investigated disease and circRNAs. Then, through searching the related literature, some circRNAs were confirmed to be related to the investigated disease.
The results of the case studies of the three diseases (bladder cancer, diabetes retinopathy, and rheumatoid arthritis) are shown in Table 1. For bladder cancer, 8 of the top 10 candidates with the highest prediction scores are confirmed by the relevant literature. Notably, the seventh circRNA (hsa_circ_0075828) predicted by GATCDA is related to bladder cancer. For diabetes retinopathy and rheumatoid arthritis, 7 of the top 10 candidates with the highest prediction scores are confirmed by the relevant literature. For example, Li et al. found that hsa_circ_0001859 regulates ATF2 expression by functioning as a miR-204/211 sponge in human rheumatoid arthritis [43]. Zhang et al. revealed that hsa_circ_0005015 acts as a miR-519d-3p sponge to inhibit miR-519d-3p activity, leading to increased MMP-2, XIAP, and STAT3 expression in diabetes retinopathy [44].
To verify the prediction performance of GATCDA, we compared it with other models in case studies of the same three diseases, as shown in Table 2. According to Table 2, 8, 7, and 7 of the top 10 circRNAs predicted by GATCDA were verified to be associated with bladder cancer, diabetes retinopathy, and rheumatoid arthritis, respectively, which are the highest counts among the competing methods. Therefore, GATCDA also outperforms the competing prediction models in terms of the hit rate in case studies.

4. Discussion

With the rapid development of RNA sequencing technology and bioinformatics analysis, various studies have shown that circRNAs are closely related to the occurrence and development of disease, and that circRNAs act as potential biomarkers for patients with certain cancers. Therefore, discovering associations between circRNAs and diseases is significant for disease diagnosis and treatment. Nevertheless, biological experiments are very costly in terms of time and money. Predicting associations between circRNAs and diseases using computational methods has therefore become a hot topic in recent years. The lack of data on the interactions between circRNAs and diseases limits the predictive power of most computational methods.
In this study, we proposed a new computational model named GATCDA to identify underlying circRNA-disease associations. We performed 5CV experiments to assess the predictive performance of GATCDA. Our method yielded an AUC value of 0.9011 and an AUPR value of 0.896, which are higher than those of DWNN-RLS, KATZHCDA, BiRWR, and DeepWalk. In addition, most of the predicted top 10 circRNA-disease interactions in the case studies of three diseases (8, 7, and 7 of 10 for bladder cancer, diabetes retinopathy, and rheumatoid arthritis, respectively) have been confirmed in the relevant literature, which suggests that GATCDA can be an effective tool for predicting circRNA-disease associations.
The accurate predictive performance of GATCDA is attributed to the following factors. First, in order to identify more interactions between circRNAs and diseases, circRNA-disease interactions were integrated from four databases, i.e., CircR2Disease, CircAtlas 2.0, Circ2Disease, and CircRNADisease. Therefore, the number of positive samples input to the GAT algorithm is higher. Second, as the circRNA-miRNA-mRNA axis plays an important role in the generation and development of diseases, the circRNA-miRNA interactions and disease-mRNA interactions were adopted to construct features, in which mRNAs are related to 88% of miRNAs. CircRNAs have several distinct modes of action. From the functional perspective of circRNAs as miRNA sponges, an interaction network can be constructed for circRNA network similarity calculations. Other functions of circRNAs are difficult to quantify. Third, more similarities are involved in GATCDA, i.e., disease symptom similarity, disease network similarity, disease information entropy similarity, circRNA network similarity, and circRNA information entropy similarity, which are integrated effectively. Fourth, GAT learns representations for nodes on a graph using an attention mechanism, so it can assign different weights to different nodes in a neighborhood.
GATCDA also has limitations. Compared with other non-coding RNAs, the known interactions between circRNAs and diseases are still scarce. Therefore, the circRNA-disease association matrix is sparse, which has an impact on prediction performance. In the future, we will collect more data on the associations between circRNAs and diseases.

5. Conclusions

In this study, we proposed a new computational model named GATCDA to identify underlying circRNA-disease associations. Specifically, GAT was used to predict circRNA-disease associations based on multiple similarities of circRNAs and diseases. This work has two highlights. First, as the circRNA-miRNA-mRNA axis plays an important role in the generation and development of diseases, circRNA-miRNA interactions and disease-mRNA interactions are adopted to construct features, in which mRNAs are related to 88% of miRNAs. Second, GAT is used to predict the interactions between circRNAs and diseases. GAT can assign different learning weights to different neighbors, so the correlation between vertex features can be better integrated into the model. In terms of predictive performance, GATCDA achieves an AUC of 0.9011 in 5CV, and in the case studies of three diseases, at least 7 of the top 10 predicted circRNAs for each disease were confirmed by the literature. In summary, GATCDA is a powerful tool for predicting circRNA-disease associations.

Author Contributions

Conceptualization, C.B., X.-J.L. and F.-X.W.; methodology, C.B. and X.-J.L.; data curation, C.B.; writing and drafting of the manuscript, C.B., X.-J.L. and F.-X.W.; critical revision of the manuscript for important intellectual content, X.-J.L. and F.-X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant numbers 61972451 and 61902230; and the Fundamental Research Funds for the Central Universities of China, grant number GK201901010.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding authors.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Li, Y.; Zheng, Q.; Bao, C.; Li, S.; Guo, W.; Zhao, J.; Chen, D.; Gu, J.; He, X.; Huang, S. Circular RNA is enriched and stable in exosomes: A promising biomarker for cancer diagnosis. Cell Res. 2015, 25, 981–984. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  2. Rong, D.; Sun, H.; Li, Z.; Liu, S.; Dong, C.; Fu, K.; Tang, W.; Cao, H. An emerging function of circRNA-miRNAs-mRNA axis in human diseases. Oncotarget 2017, 8, 73271–73281. [Google Scholar] [CrossRef] [Green Version]
  3. Chen, Y.; Wei, S.; Wang, X.; Zhu, X.; Han, S. Progress in research on the role of circular RNAs in lung cancer. World J. Surg. Oncol. 2018, 16, 215. [Google Scholar] [CrossRef]
  4. Zhang, L.; Hou, C.; Chen, C.; Guo, Y.; Yuan, W.; Yin, D.; Liu, J.; Sun, Z. The role of N6-methyladenosine (m6A) modification in the regulation of circRNAs. Mol. Cancer 2020, 19, 105. [Google Scholar] [CrossRef]
  5. Patop, I.L.; Wüst, S.; Kadener, S. Past, present, and future of circRNAs. EMBO J. 2019, 38, e100836. [Google Scholar] [CrossRef]
  6. Hansen, T.B.; Jensen, T.I.; Clausen, B.H.; Bramsen, J.B.; Finsen, B.; Damgaard, C.K.; Kjems, J. Natural RNA circles function as efficient microRNA sponges. Nat. Cell Biol. 2013, 495, 384–388. [Google Scholar] [CrossRef] [PubMed]
  7. Han, B.; Chao, J.; Yao, H. Circular RNA and its mechanisms in disease: From the bench to the clinic. Pharmacol. Ther. 2018, 187, 31–44.
  8. Zhu, L.-P.; He, Y.-J.; Hou, J.-C.; Chen, X.; Zhou, S.-Y.; Yang, S.-J.; Li, J.; Zhang, H.-D.; Hu, J.-H.; Zhong, S.-L.; et al. The role of circRNAs in cancers. Biosci. Rep. 2017, 37.
  9. Zhang, Z.; Yang, T.; Xiao, J. Circular RNAs: Promising Biomarkers for Human Diseases. EBioMedicine 2018, 34, 267–274.
  10. Yuan, J.; Botchway, B.O.A.; Zhang, Y.; Wang, X.; Liu, X. Role of Circular Ribonucleic Acids in the Treatment of Traumatic Brain and Spinal Cord Injury. Mol. Neurobiol. 2020, 57, 4296–4304.
  11. Chen, P.; Yao, Y.; Yang, N.; Gong, L.; Kong, Y.; Wu, A. Circular RNA circCTNNA1 promotes colorectal cancer progression by sponging miR-149-5p and regulating FOXM1 expression. Cell Death Dis. 2020, 11, 557.
  12. Wang, J.-H.; Wu, X.-J.; Duan, Y.-Z.; Li, F. Circular RNA_CNST Promotes the Tumorigenesis of Osteosarcoma Cells by Sponging miR-421. Cell Transplant. 2020, 29.
  13. Wu, C.; Deng, L.; Zhuo, H.; Chen, X.; Tan, Z.; Han, S.; Tang, J.; Qian, X.; Yao, A. Circulating circRNA predicting the occurrence of hepatocellular carcinoma in patients with HBV infection. J. Cell. Mol. Med. 2020, 24, 10216–10222.
  14. Li, S.; Yang, J.; Liu, X.; Guo, R.; Zhang, R. circITGA7 Functions as an Oncogene by Sponging miR-198 and Upregulating FGFR1 Expression in Thyroid Cancer. BioMed Res. Int. 2020, 2020, 8084028.
  15. Fan, C.; Lei, X.; Fang, Z.; Jiang, Q.; Wu, F.-X. CircR2Disease: A manually curated database for experimentally supported circular RNAs associated with various diseases. Database 2018, 2018, bay044.
  16. Ji, P.; Wu, W.; Chen, S.; Zheng, Y.; Zhou, L.; Zhang, J.; Cheng, H.; Yan, J.; Zhang, S.; Yang, P.; et al. Expanded Expression Landscape and Prioritization of Circular RNAs in Mammals. Cell Rep. 2019, 26, 3444–3460.e5.
  17. Yao, D.; Zhang, L.; Zheng, M.; Sun, X.; Lu, Y.; Liu, P. Circ2Disease: A manually curated database of experimentally validated circRNAs in human disease. Sci. Rep. 2018, 8, 11018.
  18. Zhao, Z.; Wang, K.; Wu, F.; Wang, W.; Zhang, K.; Hu, H.; Liu, Y.; Jiang, T. circRNA disease: A manually curated database of experimentally supported circRNA-disease associations. Cell Death Dis. 2018, 9, 475.
  19. Lei, X.; Mudiyanselage, T.B.; Zhang, Y.; Bian, C.; Lan, W.; Yu, N.; Pan, Y. A comprehensive survey on computational methods of non-coding RNA and disease association prediction. Brief. Bioinform. 2020.
  20. Lei, X.; Zhang, W. BRWSP: Predicting circRNA-Disease Associations Based on Biased Random Walk to Search Paths on a Multiple Heterogeneous Network. Complexity 2019, 2019, 5938035.
  21. Fan, C.; Lei, X.; Wu, F.-X. Prediction of CircRNA-Disease Associations Using KATZ Model Based on Heterogeneous Networks. Int. J. Biol. Sci. 2018, 14, 1950–1959.
  22. Lei, X.; Fang, Z.; Guo, L. Predicting circRNA-Disease Associations Based on Improved Collaboration Filtering Recommendation System with Multiple Data. Front. Genet. 2019, 10, 897.
  23. Wei, H.; Liu, B. iCircDA-MF: Identification of circRNA-disease associations based on matrix factorization. Brief. Bioinform. 2019, 21, 1356–1367.
  24. Zhang, Y.; Lei, X.; Fang, Z.; Pan, Y. CircRNA-disease associations prediction based on metapath2vec++ and matrix factorization. Big Data Min. Anal. 2020, 3, 280–291.
  25. Lei, X.; Fang, Z. GBDTCDA: Predicting circRNA-disease Associations Based on Gradient Boosting Decision Tree with Multiple Biological Data Fusion. Int. J. Biol. Sci. 2019, 15, 2911–2924.
  26. Ding, Y.; Chen, B.; Lei, X.; Liao, B.; Wu, F.-X. Predicting novel CircRNA-disease associations based on random walk and logistic regression model. Comput. Biol. Chem. 2020, 87, 107287.
  27. Wang, L.; You, Z.-H.; Huang, Y.-A.; Huang, D.-S.; Chan, K.C.C. An efficient approach based on multi-sources information to predict circRNA-disease associations using deep convolutional neural network. Bioinformatics 2019, 36, 4038–4046.
  28. Wang, L.; You, Z.-H.; Li, Y.-M.; Zheng, K.; Huang, Y.-A. GCNCDA: A new method for predicting circRNA-disease associations based on Graph Convolutional Network Algorithm. PLoS Comput. Biol. 2020, 16, e1007568.
  29. Li, J.-H.; Liu, S.; Zhou, H.; Qu, L.-H.; Yang, J.-H. starBase v2.0: Decoding miRNA-ceRNA, miRNA-ncRNA and protein–RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res. 2014, 42, D92–D97.
  30. Piñero, J.; Ramírez-Anguita, J.M.; Saüch-Pitarch, J.; Ronzano, F.; Centeno, E.; Sanz, F.; Furlong, L.I. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res. 2019, 48, D845–D855.
  31. Zhang, Z.; Yue, L.; Wang, Y.; Jiang, Y.; Xiang, L.; Cheng, Y.; Ju, D.; Chen, Y. A circRNA-miRNA-mRNA network plays a role in the protective effect of diosgenin on alveolar bone loss in ovariectomized rats. BMC Complement. Med. Ther. 2020, 20, 220.
  32. Su, Q.; Lv, X. Revealing new landscape of cardiovascular disease through circular RNA-miRNA-mRNA axis. Genomics 2020, 112, 1680–1685.
  33. Zhou, Y.K.; Shen, Z.A.; Yu, H.; Luo, T.; Gao, Y.; Du, P.F. Predicting lncRNA-Protein Interactions with miRNAs as Mediators in a Heterogeneous Network Model. Front. Genet. 2019, 10, 1341.
  34. Zhou, X.Z.; Menche, J.R.; Barabási, A.-L.; Sharma, A. Human symptoms-disease network. Nat. Commun. 2014, 5, 4212.
  35. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. arXiv 2017, arXiv:1710.10903.
  36. Yan, C.; Wang, J.; Wu, F.-X. DWNN-RLS: Regularized least squares method for predicting circRNA-disease associations. BMC Bioinform. 2018, 19, 520.
  37. Lei, X.; Tie, J. Prediction of disease-related metabolites using bi-random walks. PLoS ONE 2019, 14, e0225380.
  38. Chen, H.; Perozzi, B.; Al-Rfou, R.; Skiena, S. A Tutorial on Network Embeddings. arXiv 2018, arXiv:1808.02590.
  39. Hindy, J.-R.; Souaid, T.; Kourie, H.R.; Kattan, J. Targeted therapies in urothelial bladder cancer: A disappointing past preceding a bright future? Future Oncol. 2019, 15, 1505–1524.
  40. Kwan, M.L.; Garren, B.; Nielsen, M.E.; Tang, L. Lifestyle and nutritional modifiable factors in the prevention and treatment of bladder cancer. Urol. Oncol. Semin. Orig. Investig. 2019, 37, 380–386.
  41. Silpa-Archa, S.; Ruamviboonsuk, P. Diabetic Retinopathy: Current Treatment and Thailand Perspective. J. Med. Assoc. Thail. Chotmaihet Thangphaet 2017, 100 (Suppl. S1), S136–S147.
  42. Smolen, J.S.; Aletaha, D.; McInnes, I.B. Rheumatoid arthritis. Lancet 2016, 388, 2023–2038.
  43. Li, B.; Li, N.; Zhang, L.; Li, K.; Xie, Y.; Xue, M.; Zheng, Z. Hsa_circ_0001859 Regulates ATF2 Expression by Functioning as an MiR-204/211 Sponge in Human Rheumatoid Arthritis. J. Immunol. Res. 2018, 2018, 9412387.
  44. Zhang, S.-J.; Chen, X.; Li, C.-P.; Li, X.-M.; Liu, C.; Liu, B.-H.; Shan, K.; Jiang, Q.; Zhao, C.; Yan, B. Identification and Characterization of Circular RNAs as a New Class of Putative Biomarkers in Diabetes Retinopathy. Investig. Ophthalmol. Vis. Sci. 2017, 58, 6500–6509.
  45. Zhang, L.; Xia, H.B.; Zhao, C.Y.; Shi, L.; Ren, X.L. Cyclic RNA hsa_circ_0091017 inhibits proliferation, migration and invasiveness of bladder cancer cells by binding to microRNA-589-5p. Eur. Rev. Med. Pharmacol. Sci. 2020, 24, 86–96.
  46. Cai, D.; Liu, Z.; Kong, G. Molecular and Bioinformatics Analyses Identify 7 Circular RNAs Involved in Regulation of Oncogenic Transformation and Cell Proliferation in Human Bladder Cancer. Med. Sci. Monit. 2018, 24, 1654–1661.
  47. Yang, C.; Yuan, W.; Yang, X.; Li, P.; Wang, J.; Han, J.; Tao, J.; Li, P.; Yang, H.; Lv, Q.; et al. Circular RNA circ-ITCH inhibits bladder cancer progression by sponging miR-17/miR-224 and regulating p21, PTEN expression. Mol. Cancer 2018, 17, 19.
  48. Zhong, Z.; Lv, M.; Chen, J. Screening differential circular RNA expression profiles reveals the regulatory role of circTCF25-miR-103a-3p/miR-107-CDK6 pathway in bladder carcinoma. Sci. Rep. 2016, 6, 30919.
  49. Zhuang, C.; Huang, X.; Yu, J.; Gui, Y. Circular RNA hsa_circ_0075828 Promotes Bladder Cancer Cell Proliferation through Activation of CREB1. BMB Rep. 2020, 53, 82–87.
  50. Zhong, Z.; Huang, M.; Lv, M.; He, Y.; Duan, C.; Zhang, L.; Chen, J. Circular RNA MYLK as a competing endogenous RNA promotes bladder cancer progression through modulating VEGFA/VEGFR2 signaling pathway. Cancer Lett. 2017, 403, 305–317.
  51. Liu, C.; Yao, M.-D.; Li, C.-P.; Shan, K.; Yang, H.; Wang, J.-J.; Liu, B.; Li, X.-M.; Yao, J.; Jiang, Q.; et al. Silencing of Circular RNA-ZNF609 Ameliorates Vascular Endothelial Dysfunction. Theranostics 2017, 7, 2863–2877.
  52. Zheng, F.; Yu, X.; Huang, J.; Dai, Y. Circular RNA expression profiles of peripheral blood mononuclear cells in rheumatoid arthritis patients, based on microarray chip technology. Mol. Med. Rep. 2017, 16, 8029–8036.
  53. Zhong, S.; Ouyang, Q.; Zhu, D.; Huang, Q.; Zhao, J.; Fan, M.; Cai, Y.; Yang, M. Hsa_circ_0088036 promotes the proliferation and migration of fibroblast-like synoviocytes by sponging miR-140-3p and upregulating SIRT 1 expression in rheumatoid arthritis. Mol. Immunol. 2020, 125, 131–139.
Figure 1. The flowchart of the computational method GATCDA.
Figure 2. The detailed procedures of using GAT to predict the associations between circRNAs and diseases.
Figure 3. Heatmap of AUC results of adjustment parameters.
Figure 4. The ROC curves and AUCs of five methods using 5CV.
Figure 5. Comparison of five methods in PR curves and AUPRs (5CV).
Table 1. Candidate circRNAs identified by GATCDA for bladder cancer, diabetes retinopathy and rheumatoid arthritis.

| Disease | Rank | CircRNA | Source |
|---|---|---|---|
| Bladder cancer | 1 | hsa_circ_0091017 | [45] |
| | 2 | hsa_circ_0002495 | [46] |
| | 3 | hsa_circ_0071410 | - ¹ |
| | 4 | hsa_circ_0001141 | [47] |
| | 5 | hsa_circ_0007915 | - |
| | 6 | hsa_circ_0041103 | [48] |
| | 7 | hsa_circ_0075828 | [49] |
| | 8 | hsa_circ_0061265 | [48] |
| | 9 | hsa_circ_0002768 | [50] |
| | 10 | hsa_circ_0082582 | [48] |
| Diabetes retinopathy | 1 | hsa_circ_0098964 | - |
| | 2 | hsa_circ_0057093 | [44] |
| | 3 | hsa_circ_0051172 | - |
| | 4 | hsa_circ_0087215 | [44] |
| | 5 | hsa_circ_0081162 | [44] |
| | 6 | hsa_circ_0066922 | [44] |
| | 7 | hsa_circ_0026388 | [44] |
| | 8 | hsa_circ_0005525 | - |
| | 9 | hsa_circ_0000615 | [51] |
| | 10 | hsa_circ_0005015 | [44] |
| Rheumatoid arthritis | 1 | hsa_circ_0083964 | [52] |
| | 2 | hsa_circ_0064996 | [52] |
| | 3 | hsa_circ_0004712 | [52] |
| | 4 | hsa_circ_0061893 | - |
| | 5 | hsa_circ_0052012 | [52] |
| | 6 | hsa_circ_0032683 | [52] |
| | 7 | hsa_circ_0001859 | [43] |
| | 8 | hsa_circ_0088036 | [53] |
| | 9 | hsa_circ_0003028 | - |
| | 10 | hsa_circ_0010090 | - |

¹ "-" indicates that no supporting source was found.
Table 2. The number of circRNAs confirmed by evidence among the top 10 potential disease-related circRNAs predicted by GATCDA and by other models in the case studies of three diseases: bladder cancer, diabetes retinopathy, and rheumatoid arthritis.

| Model | Bladder Cancer | Diabetes Retinopathy | Rheumatoid Arthritis |
|---|---|---|---|
| GATCDA | 8 | 7 | 7 |
| DWNN-RLS | 7 | 5 | 4 |
| KATZHCDA | 5 | 4 | 4 |
| BiRWR | 5 | 3 | 4 |
| DeepWalk | 3 | 1 | 2 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Bian, C.; Lei, X.-J.; Wu, F.-X. GATCDA: Predicting circRNA-Disease Associations Based on Graph Attention Network. Cancers 2021, 13, 2595. https://doi.org/10.3390/cancers13112595
