1. Introduction
Non-coding RNAs (ncRNAs) play special roles in the development, differentiation, and aging of cells. Numerous studies have shown that ncRNAs are widely involved in human pathological processes. They can act as biomarkers and provide new targets for the treatment of diseases such as cancer [1]. Non-coding RNAs such as microRNAs (miRNAs), circular RNAs (circRNAs), and long ncRNAs (lncRNAs) have attracted great interest from researchers. miRNAs are short regulatory biomolecules involved in the post-transcriptional regulation of gene expression [2]. Compared with linear miRNAs, circRNAs [3] are more stable and may function as transporters or scaffolds [4]. They exert essential biological functions by acting as microRNA or protein inhibitors (“sponges”), regulating protein function, or being translated themselves [5]. lncRNAs can regulate cooperating proteins [6]. piRNA (Piwi-interacting RNA) has been relatively poorly studied compared with those three. piRNAs form piRNA/PIWI complexes with PIWI proteins to affect gene expression, and they mainly function to suppress the activity of transposons [7,8]. There are also synergies among RNAs. For example, an lncRNA can act as a molecular sponge for a miRNA to regulate the expression of its target gene [9,10,11,12].
According to a statistical cancer report released by the American Cancer Society [13], it is estimated that there will be approximately 4950 new cancer cases and 1600 deaths due to cancer every day in the United States. Unfortunately, the development of drug resistance greatly increases the probability of recurrence and significantly reduces the cure rate. Drug resistance has become a major obstacle to clinical treatment.
With the development of sequencing technology, it has been reported that cancer resistance to treatment is related to mutations of the cell’s genome [14,15]. The instability of the genome may change the phenotype of the tumor and lead to drug resistance. Studies have shown that some ncRNAs, such as miRNAs, can act as rheostats to regulate protein output [16]. The abnormal expression of ncRNAs is not only associated with several diseases but may also promote drug resistance in cancer cells [17,18]. For example, a circRNA acting as a miRNA sponge enhances the response of HCC (hepatocellular carcinoma) cells to chemotherapy with cisplatin [19]. An lncRNA enhanced drug resistance in AML (acute myeloid leukemia) cells by inhibiting miR-186 [20]. Overexpression of miRNA-194 can make HCC cells more sensitive to sorafenib [21]. Increasing evidence suggests that drug resistance is affected by ncRNAs. Exploring their interactions will provide new insights for improving therapeutic effects.
The relationship between ncRNAs and drug resistance has been gradually uncovered, and some databases already provide relevant data. The ncDR database [22] provides 135 compounds and 1050 ncRNAs. Additional information on compounds and ncRNAs, such as ncRNA genomic contexts, has also been added. NoncoRNA [23] covers the basic calculation of ncRNA, drugs, diseases, etc., and includes experimental detection techniques, drug response, and other information. However, existing knowledge is minimal compared with the unknown associations. Discovering possible relationships between ncRNAs and drugs helps in understanding the related drug resistance mechanisms and in accelerating drug development. Traditional biological experiments are difficult to carry out at this scale because they are hard to control and time-consuming. Computational methods can accelerate this process, but very little work has been done in this area.
In recent years, association prediction methods have developed rapidly in bioinformatics. GCMDR [24] established a three-layer latent factor model to predict miRNA-disease associations, introducing features such as miRNA expression profile and drug PubChem substructure fingerprints into the model. Zhu et al. [25] utilized the matrix completion method. SDLDA [26] introduced singular value decomposition, and ILNCRNADIS-FB [27] calculated three-dimensional feature blocks to capture characteristics. In a different way, SAEMDA [28] extracts features through semantic similarity. For the prediction of circRNA-disease associations, the AE-RF algorithm [29] also integrates many information sources to obtain deep features. DMFCDA [30] constructed a circRNA-disease matrix with explicit and implicit feedback to capture non-linear features. Deng et al. [31] constructed a heterogeneous information network (HIN) containing multiple subnetworks. A great deal of research has also focused on microbe-disease association prediction. KATZHMDA [32] introduced the Gaussian kernel to perform a complete and easy reconstruction of the microbe-disease relationship. ABHMDA [33] is a strong classifier built on existing models to achieve better self-adaptability. Liu et al. [34] used matrix decomposition based on neural networks to obtain nonlinear latent features and infer disease-related microbes. NTSHMDA [35] reduced the prediction error by assigning different weights to random walks.
Although the above methods have achieved good results, some problems and shortcomings still hinder more comprehensive mining of potential features. The lack of relevant biological data and information leads to noise in the calculated features, which reduces prediction accuracy. Existing association prediction methods depend heavily on the similarities already available in databases. When the number of ncRNAs and diseases increases, the existing computational models struggle to draw conclusions efficiently, so they are not suitable for large-scale data sets. Therefore, these methods are not applicable when predicting the relationship between multiple ncRNAs and drug resistance. Although more and more ncRNA-drug resistance associations have been determined and existing databases provide relevant data, the existing knowledge is still very limited compared with the unknown potential associations.
Here we propose an efficient approach based on a linear residual graph convolutional network, LRGCPND, which employs only validated interactions between ncRNA and drug resistance. Initially, LRGCPND constructs a bipartite graph from the association network of ncRNA and drug resistance, where the edges represent the hidden interaction factors between the two types of nodes. Unconnected node pairs may still have associations that are not obvious to identify. LRGCPND then efficiently aggregates the intrinsic characteristics of neighbor nodes in the former layer and performs the linear transition. After the specified number of iterations, it fuses the embeddings of the previous convolutional layers through residual learning to explore the interactions between ncRNA and drug resistance. LRGCPND achieves the best performance compared with the other advanced computational methods. Case studies of two anti-cancer drugs demonstrate the practical capability of LRGCPND. The flow chart of LRGCPND is shown in Figure 1.
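The propagation scheme described above (neighbor aggregation, a linear transition without a nonlinearity, and fusion of the per-layer embeddings) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the symmetric normalization with self-loops, the inner-product scoring, and the names `normalized_adjacency`, `propagate`, and `score` are assumptions made for illustration, and the weights here are untrained.

```python
import numpy as np

def normalized_adjacency(R):
    """Symmetrically normalized adjacency of the bipartite graph.

    R is an (m x n) 0/1 association matrix (ncRNAs x drugs).
    Self-loops and D^-1/2 A D^-1/2 scaling are assumptions of this sketch.
    """
    m, n = R.shape
    A = np.zeros((m + n, m + n))
    A[:m, m:] = R          # ncRNA -> drug edges
    A[m:, :m] = R.T        # drug -> ncRNA edges
    A += np.eye(m + n)     # self-loops keep every degree positive
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def propagate(R, E0, weights):
    """Aggregate neighbors, apply a linear map per layer, then fuse all layers."""
    S = normalized_adjacency(R)
    layers = [E0]
    for W in weights:
        # Aggregation (S @ E) followed by a purely linear transition (@ W).
        layers.append(S @ layers[-1] @ W)
    # Fuse the embeddings of every layer by concatenation.
    return np.concatenate(layers, axis=1)

def score(Z, m, i, j):
    """Inner-product score for ncRNA i and drug j (a hypothetical choice)."""
    return float(Z[i] @ Z[m + j])
```

With `m` ncRNAs, `n` drugs, embedding size `d`, and `L` weight matrices of shape `(d, d)`, `propagate` returns an `(m + n) x (L + 1)d` matrix whose first `d` columns are the initial embeddings.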
2. Results and Discussion
2.1. Experimental Setup
To objectively and systematically evaluate the ability of LRGCPND and facilitate comparison with other methods, we perform k-fold cross-validation (k-fold CV) on the collected dataset. All verified associations are randomly divided into k parts. In turn, each part is taken as the positive samples and combined with an equal number of unlabeled samples, used as negative samples, to form the testing set. Meanwhile, the equivalent operation is performed on the remaining k − 1 parts to obtain the training set. This process ends after k iterations.
Even if there may be latent associations in the selected negative samples, since they account for a tiny proportion in the entire unverified sample set, the influence is negligible.
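The splitting procedure can be sketched in a few lines of Python. The function name `kfold_splits`, the integer node indices, and the choice to keep training and testing negatives disjoint are assumptions of this sketch rather than details stated in the text.

```python
import random

def kfold_splits(positives, n_rna, n_drug, k=5, seed=0):
    """Yield (train, test) lists of ((rna, drug), label) pairs for k-fold CV."""
    rng = random.Random(seed)
    pos = list(positives)
    rng.shuffle(pos)
    pos_set = set(pos)
    # Every unverified pair is a candidate negative sample.
    unlabeled = [(i, j) for i in range(n_rna) for j in range(n_drug)
                 if (i, j) not in pos_set]
    folds = [pos[f::k] for f in range(k)]
    for f in range(k):
        test_pos = folds[f]
        train_pos = [p for g in range(k) if g != f for p in folds[g]]
        # Sample as many negatives as positives, without replacement, so the
        # training and testing negatives never overlap (an assumption here).
        neg = rng.sample(unlabeled, len(test_pos) + len(train_pos))
        test_neg, train_neg = neg[:len(test_pos)], neg[len(test_pos):]
        test = [(p, 1) for p in test_pos] + [(q, 0) for q in test_neg]
        train = [(p, 1) for p in train_pos] + [(q, 0) for q in train_neg]
        yield train, test
```

The sketch requires at least as many unlabeled pairs as verified ones, which holds whenever the association matrix is sparse.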
2.2. Evaluation Criteria
To evaluate the models intuitively and comprehensively, we measure their performance with widely adopted metrics, including AUC, AUPR, Accuracy (Acc.), Precision (P.), Recall (R.), and F1 scores, which are defined by the following formulas:
\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
\[ \mathrm{Precision} = \frac{TP}{TP + FP} \]
\[ \mathrm{Recall} = \frac{TP}{TP + FN} \]
\[ F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \]
TP and FN represent the number of correct and incorrect classifications in the related ncRNA-drug resistance pairs. In contrast, TN and FP represent the number of correct and incorrect classifications in the unrelated pairs. By adjusting the threshold, we can plot the receiver operating characteristic (ROC) curve and precision-recall (PR) curve and then calculate the area under the curves to get AUC and AUPR, respectively.
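As a worked illustration, the threshold-based metrics follow directly from the four confusion-matrix counts, written here with the conventional names `tp`, `fn`, `tn`, and `fp`. AUC can be computed without drawing the curve, as the fraction of (positive, negative) pairs in which the positive sample scores higher. The function names below are hypothetical, and the pairwise AUC is a simple quadratic-time sketch, not an optimized routine.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, Precision, Recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

For example, with 8 true positives, 2 false negatives, 7 true negatives, and 3 false positives, Accuracy is 15/20 = 0.75, Recall is 8/10 = 0.8, and F1 is 16/21.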
2.3. Performance Evaluation for LRGCPND
To evaluate the identification ability of our model, we performed five-fold and ten-fold CV on the dataset specified above. Table 1 lists the specific results in five-fold CV, and Figure 2 displays the ROC curves. In five-fold CV, the average values of AUC, AUPR, and Accuracy reach 0.8987, 0.9094, and 0.8342, respectively. Because ten-fold CV trains on a larger share of the data in each round, the model is trained more thoroughly, and the AUC increases to 0.9052. These experimental results show that LRGCPND can accurately and effectively identify potential ncRNAs related to drug resistance.
2.4. Effects of Parameters
For LRGCPND, two crucial parameters influence the prediction capability: the depth of propagation and the dimension of embedding. First, we explored the impact of the layer depth while keeping the other parameters constant. We performed five-fold CV with the layer depth ranging from 1 to 5. Table 2 lists the detailed data, and Figure 3 shows the trends of the different metrics. Our model achieves the best performance when the layer depth is equal to 4.
Second, the embedding dimension also plays a critical role. We set the embedding dimension to 8, 16, 32, 64, and 128 in turn and conducted five-fold CV to measure its impact on the prediction ability of our model. Table 3 shows the detailed statistics, and Figure 4 shows the trends of the different metrics. As the embedding dimension increases from 8, the performance first improves, because a larger embedding dimension enhances the expressivity of LRGCPND to a certain extent. The performance reaches its optimum at a dimension of 32. Beyond that, further increases in the embedding dimension degrade the performance.
In other experiments, we employ the optimal values obtained above as the default of model parameters.
2.5. Comparison with Other Approaches
Since inferring ncRNA-drug resistance interactions is a relatively new area, no researchers have proposed relevant solutions yet. Nonetheless, other association prediction methods in bioinformatics still provide meaningful reference points for the performance of our model. To further assess the effectiveness of LRGCPND, we compared it with seven advanced approaches from the fields of lncRNA-disease, circRNA-disease, and microbe-disease association prediction.
For the sake of rigor, we point out that AE-RF [29] and ABHMDA [33] employ other similarity-based features besides the Gaussian interaction profile (GIP) kernel similarity. Considering the scarcity of relevant biological resources and for convenience, we only calculated the GIP similarity for them in the experiments. Furthermore, because the adjacency matrix available for training differs in each fold, the topology information of the interaction network needs to be re-extracted. We therefore re-calculated the GIP similarity matrices in each cross-validation round for the similarity-based methods AE-RF, KATZHMDA [32], NTSHMDA [35], and ABHMDA. As plotted in Figure 5, LRGCPND leads the other methods with an average AUC of 0.8987, which is 5.84% higher than that of the second-best method, DMFMDA [34].
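For reference, the GIP kernel similarity used for the similarity-based baselines can be sketched as follows. The sketch uses the commonly cited form K(i, j) = exp(−γ‖p_i − p_j‖²), with γ set to a bandwidth divided by the mean squared norm of the interaction profiles. The function name `gip_kernel`, the parameter `gamma_prime`, and its default of 1.0 are assumptions here, not settings reported in the text.

```python
import numpy as np

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`.

    Each row is the 0/1 interaction profile of one entity, for example one
    row of the ncRNA x drug adjacency matrix restricted to training edges.
    """
    P = np.asarray(profiles, dtype=float)
    sq_norms = (P ** 2).sum(axis=1)
    mean_sq = sq_norms.mean()
    if mean_sq == 0:
        raise ValueError("all interaction profiles are empty")
    # Bandwidth normalized by the average squared profile norm.
    gamma = gamma_prime / mean_sq
    # Pairwise squared Euclidean distances via the expansion of ||a - b||^2.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (P @ P.T)
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))
```

To avoid leaking test associations, `profiles` would be rebuilt from the training adjacency matrix in every cross-validation round; passing the transposed matrix gives the kernel for the other node type.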
As the metrics listed in Table 4 show, our model yields the best identification ability, except that its Recall is slightly lower than that of ABHMDA. Its AUPR, Accuracy, and F1 values reach 0.9094, 0.8342, and 0.8335, respectively. We also drew a radar chart to compare the capabilities of the models intuitively and comprehensively across the various metrics, as shown in Figure 6. All six evaluation metrics are plotted on a scale from 0.4 to 1.0. The farther a point lies from the center of the circle, the higher the value. The chart makes it apparent that LRGCPND has advantages over the other methods.
These experimental results sufficiently demonstrate that our model is reliable and promising in inferring candidate ncRNA-drug resistance pairs.
2.6. Case Studies
The discovery of unknown associations between ncRNA and drug resistance matters greatly for practical application. Thus, we selected two drugs, Cisplatin and Paclitaxel, and conducted case studies. Specifically, for a particular drug, we first removed the ncRNAs with known associations. Then, the remaining ncRNAs were sorted in descending order of the values predicted by LRGCPND. Lastly, we screened the top 15 ncRNAs and searched for supporting evidence in published literature.
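The screening step amounts to a filter followed by a sort. A minimal sketch, assuming the predictions are held in a dictionary keyed by (ncRNA, drug) pairs; the function name `top_candidates` and this data layout are hypothetical:

```python
def top_candidates(scores, known, drug, top_n=15):
    """Rank the ncRNAs not yet associated with `drug` by predicted score.

    scores: dict mapping (ncrna, drug) -> predicted value
    known:  set of (ncrna, drug) pairs with verified associations
    """
    candidates = [(rna, s) for (rna, d), s in scores.items()
                  if d == drug and (rna, d) not in known]
    # Highest predicted value first.
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates[:top_n]
```

The returned list is what would then be checked against the literature, one candidate at a time.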
Cisplatin is a common chemotherapeutic drug used to treat numerous cancers, including lung cancer, head and neck cancer, and ovarian cancer. Resistance frequently causes reduced efficacy of Cisplatin in chemotherapy [36]. Paclitaxel is another widely applied taxane medication. Chemoresistance to Paclitaxel makes its clinical application problematic [37].
Table 5 and Table 6 summarize the top 15 candidate ncRNAs of Cisplatin and Paclitaxel, respectively. Of these candidates, 10 for Cisplatin and 7 for Paclitaxel are confirmed by existing evidence, indicating that our method is capable of predicting novel resistance-associated ncRNAs for drugs. It is noteworthy that the other, as yet unproven, associations may also exist and deserve further experimental investigation.
4. Conclusions
Drug resistance poses major challenges to clinical treatment. Numerous studies have indicated that ncRNA plays a pivotal role in the mechanisms of drug resistance. Accurately identifying ncRNA-drug resistance association pairs is conducive to drug development and promotes clinical treatment. In this work, we propose LRGCPND, a graph convolutional network computational framework for mining the latent associations between ncRNA and drug resistance through linear transition and residual prediction. To the best of our knowledge, this is the first computational prediction approach in this field. We represent the relationship between ncRNA and drug resistance in a bipartite graph and exploit limited information to learn complex latent factors for edge prediction. LRGCPND first captures the neighborhood representations by aggregation. Then, it performs feature transformation through linear operations. Finally, the embedding vectors of convolutional layers are concatenated through residual blocks to achieve prediction.
Experimental results and case studies corroborate the effectiveness of our model, to which several aspects may contribute. We utilize graph convolution to perform relatively more adequate representation learning on the original association data with inadequate information. Residual blocks enable the model to attain higher-layer potential characteristics, and linear feature propagation keeps the model lightweight and flexible to extend to datasets on a large scale. In conclusion, our model is promising and facilitates further research in predicting novel associated ncRNAs for drug resistance. Our study helps build a systematic map of ncRNA and drug resistance, provides more insights into drug resistance, and aids in identifying effective therapeutic combinations.
As with many computational prediction methods, LRGCPND has its limitations. First, LRGCPND only utilizes ncRNA-drug resistance association data, so the quality and coverage of the association data affect its performance. Second, LRGCPND makes predictions over ncRNAs of several subtypes. Although this provides insights from a broader perspective, differences between subtypes may introduce bias. In the future, we will incorporate an attention mechanism and integrate multiple heterogeneous data sources to further improve the performance.