Next Article in Journal
Genetic Characteristic and RNA-Seq Analysis in Transparent Mutant of Carp–Goldfish Nucleocytoplasmic Hybrid
Next Article in Special Issue
Forming Big Datasets through Latent Class Concatenation of Imperfectly Matched Databases Features
Previous Article in Journal
A Comparative Transcriptome Analysis Reveals Physiological Maturation Properties of Mycelia in Pleurotus tuoliensis
Previous Article in Special Issue
A Super-Clustering Approach for Fully Automated Single Particle Picking in Cryo-EM
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Identifying Interaction Clusters for MiRNA and MRNA Pairs in TCGA Network

1
Department of Mathematics and Computer Science, Indiana State University, Terre Haute, IN 47809, USA
2
Department of Biology, Indiana State University, Terre Haute, IN 47809, USA
3
Carmel High School, 520 E. Main St. Carmel, IN 46032, USA
4
Seven Lakes High School, 9251 S Fry Rd, Katy, TX 77494, USA
5
Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
6
Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109, USA
*
Author to whom correspondence should be addressed.
Genes 2019, 10(9), 702; https://doi.org/10.3390/genes10090702
Submission received: 14 August 2019 / Accepted: 6 September 2019 / Published: 11 September 2019

Abstract

:
Existing methods often fail to recognize the conversions for the biological roles of the pairs of genes and microRNAs (miRNAs) between the tumor and normal samples. We have developed a novel cluster scoring method to identify messenger RNA (mRNA) and miRNA interaction pairs and clusters while considering tumor and normal samples jointly. Our method has identified 54 significant clusters for 15 cancer types selected from The Cancer Genome Atlas project. We also determined the shared clusters across tumor types and/or subtypes. In addition, we compared gene and miRNA overlap between lists identified in our liver hepatocellular carcinoma (LIHC) study and regulatory relationships reported from human and rat nonalcoholic fatty liver disease studies (NAFLD). Finally, we analyzed biological functions for the single significant cluster in LIHC and uncovered a significantly enriched pathway (phospholipase D signaling pathway) with six genes represented in the cluster, symbols: DGKQ, LPAR2, PDGFRB, PIK3R3, PTGFR and RAPGEF3.

1. Introduction

Transcription from gene to messenger RNA (mRNA) and translation from mRNA to protein are two essential stages for cells to perform biological functions. Non-coding RNAs (e.g., microRNA (miRNA), long non-coding RNA (lncRNA)) are often involved in many cellular processes, but mainly in post-transcriptional regulation. MiRNAs, which are short (17–22 nt) highly processed oligonucleotides, play their regulatory roles through either degradation or inhibition of protein translation of the targeting mRNAs [1].
The messenger RNA (mRNA), an outcome of gene transcription, is essential to performing biochemical functions in the cell. Different regulatory RNAs (e.g., miRNA and lncRNA) are important driving factors for the stable and successful translation of mRNA in a cell. Therefore, regulatory RNAs play a vital role in mRNA activation and suppression. Among these regulatory RNAs, the miRNA class is intensively studied at both the sequence and functional level. These MiRNAs are involved in post-transcriptional regulation of the target mRNA by two known mechanisms; the degradation of target mRNA and suppression of protein translation [1]. Thus, regulation of the miRNA and mRNA network is complex. A single miRNA can target many mRNAs, while many miRNAs are able to cooperatively target a single mRNA, in both degradation and inhibition contexts. This allows for fine-tuned gene expression regulation [2]. Understanding of these mechanisms has advanced significantly with the advent of high-throughput microarray-based technologies such as expression profiling.
Clusters in an mRNA–miRNA interaction network are often interaction complexes and/or parts of pathways. If interaction pair cluster(s) are significantly rendered in both tumor and normal samples, they could be important in the context of biological processes and/or cancers. Therefore, identifying such significant interaction clusters will identify genes and miRNAs functionally associated to various cancer molecular subtypes, with diagnostic and therapeutic implications. An undirected graph can be used to represent gene and miRNA relationships in an interaction network. Specifically, interactions between genes and miRNAs are rendered as a bipartite graph with genes or miRNAs as vertices and their interactions as edges (Figure 1). Thus, each vertex representing a gene or miRNA is connected to at least one of each other node since one gene or miRNA has at least one interaction with its interaction partner. Clusters in a network are formed by sets of vertices and edges with interconnections.
Clustering techniques are widely used to provide a reasonable understanding about gene function, regulation, and cellular process. A group of similar objects makes an exclusive class of objects (similar genes, same cluster), whereas dissimilar objects are distributed into different clusters. The analysis of mRNA–miRNA interaction is complex. Most existing graph-based clustering algorithms (also called graph partition algorithms or community detection algorithms) consider the topology of only a single instance (e.g., gene or miRNA) and treat all of nodes equivalently in the graph. Moreover, the complexity of the change of the correlation coefficients and expression values of the mRNA–miRNA pairs between tumor and normal samples is still not resolved, and this hinders the potential clinical applications. There is an urgent need to develop innovative methodologies and tools to accurately cluster mRNA–miRNA interaction pairs into functional miRNA–mRNA regulatory modules while analyzing tumor and normal samples jointly.
Previous studies on clustering such data were mainly based on microarray gene expression data [3,4] and can just analyze only samples within one category (e.g., tumor samples alone). In this study, a novel concurrent simultaneous clustering (or co-biclustering) method for identifying gene and miRNA interaction clusters in a network was designed and implemented. This novel method considers tumor and normal samples jointly in the same network.

2. Materials and Methods

2.1. Significant mRNA–miRNA Pairs Selection for Input Data

The Cancer Genome Atlas (TCGA) (http://cancergenome.nih.gov) datasets were employed to generate the miRNA and mRNA expression files for evaluation. The TCGA data level 3 was used for the expression results. The University of North Carolina at Chapel Hill (UNC) and the Baylor College Human Genome Sequencing (BCGSC) data were considered for RNA-Seq and miRNA-Seq data respectively. Out of 33 cancer types (Supplementary file 1), we obtained RNA-Seq and miRNA-Seq data for 15 cancer types (eight of these cancer types were previously characterized [5], and seven additional cancer types are analyzed using the novel methods in this report) used for downstream analysis, after excluding cancer types which do not have corresponding tumor or normal samples. A computer C program was developed to calculate the Pearson correlation coefficient (CC). The targets prediction outcomes were testified using Targetprofiler [6], TargetScan [7] and miRanda [8]. We employed the same approach used in our previous study to claim target relationships if there is the match between pre-miRNA from TCGA datasets and the same or closely related mature miRNA from target prediction databases. The same target prediction criterion was applied if the prediction was supported by at least one of three databases mentioned above [5]. To filter out miRNA and mRNA pairs with insignificant CC, an R Script was written to compute the statistical significance (p-values and Q-values or false discovery rates (FDR)) for each calculated CC and select statistically significant pairs of miRNA and mRNA.
To check the expression change (up-regulation, down-regulation, and no change) of transcripts in cancer(s), we calculated average expression or fold change (FC) values for miRNAs and mRNAs in tumor and normal samples. The significant (FDR < 0.1) miRNA and mRNA pairs with inverse correlations between tumor and normal samples and with FC values greater than 1.5 were selected as input for the clustering algorithm to process.

2.2. Cluster Identification and Scoring Algorithm

We modified the Louvain algorithm [9] employed by NetworkX (https://networkx.github.io/) by considering the correlation coefficient values for both tumor and normal samples simultaneously to detect “communities” or clusters upon processing significant pairs selected for each of the 15 cancers in TCGA project. We used the following customized scoring algorithm to calculate the “score” for each detected cluster. Our assumption is that clusters consisting of gene and miRNA pairs having the most fold changes in their expression and with the highest correlations could be associated with cancers. Specifically, the scoring algorithm will (1) Calculate expression and CC values for genes and miRNAs in both tumor and normal samples; (2) Calculate and normalize deviation scores for features (node and edge) in clusters; (3) Compute the total scores for each classified cluster in the best partition; (4) Calculate the statistical significance for classified clusters and report top scored and significant clusters in the best partition.
s c o r e = i = 1 m | ln ( m R N A _ F C i )   1 + ln ( m R N A _ F C i 2 )   | + i = 1 n | ln ( m i R N A _ F C i )   1 + ln ( m i R N A _ F C i 2 ) | + i = 1 q ( | T _ C C i | + | N _ C C i | )
where mRNA_FCi is the fold change of gene expression, miRNA_FCi is the fold change of miRNA expression, T_CCi is the CC value in tumor samples, and N_CCi is the CC value in normal samples. m is the number of genes, n is the number of miRNAs, and q is the total number of edges in the bicluster.
There were four variables: mRNA fold change between normal and tumor samples (mRNA_FC), miRNA fold change between normal and tumor samples (miRNA_FC), mRNA and miRNA correlation coefficient in tumor sample (T_CC), and mRNA and miRNA correlation coefficient in normal sample (N_CC) which were used for score calculation. We added both correlation coefficient values (absolute values for negative correlation coefficient) into the formula to enhance the cluster score. We took absolute values of each variable, then we normalized mRNA_FC and miRNA_FC to make sure their values are between 0 to 1. In addition, we only selected the mRNA–miRNA pairs with large fold changes (cutoff value 1.5) for their miRNA and mRNA expression changed in the opposite direction ((FC > 1.5 for mRNA and FC < 1.5 for miRNA) OR (FC > 1.5 for miRNA and FC < 1.5 for mRNA)) to run the clustering algorithm. A high score was evidence that the observed cluster is associated with biologically-driven co-expression.

2.3. Statistical Analysis

The significance of a detected cluster was calculated using a permutation test approach. Specifically, the p-value Pk for a cluster Ck is defined as the probability of observing a cluster with a score of at least Sk, if the graph is generated “randomly” (i.e., when we do not expect to see any cluster in it). We used the permutation test to assess the statistical significance of highly scored clusters by sampling enough randomized graphs which will be generated by shuffling (or permuting) the nodes (mRNAs and miRNAs) without changing the topology of the graph. We then ran our partition and scoring algorithm on each of the randomized graphs. The p-value for a cluster detected by our algorithm in the original graph is estimated as the proportion of randomized graphs that has a cluster whose score is equal to or larger than the one detected in the original graph (Figure 2). For example, if we generated 1,000,000 “random” graphs, and among them there are 1000 “random” graphs from which we got a cluster with a score of at least Sk, then the p-value for a cluster with a score of Sk in the original (i.e., “non-random”) graph is Pk = 1001/1,000,001 = 0.001. Benjamini-Hochberg procedure was used to calculate false discovery rate (FDR) for adjusting the p values. For a low p-value, we reject the null hypothesis that no biological clusters are reflected in the observed data.

2.4. Determination of Shared Clusters Across Tumor Types and/or Subtypes

It was also our goal to identify clusters with common genes and miRNAs across several cancer types since they could be associated with several cancer diseases. To compare two graphs or clusters it was necessary to identify corresponding genes and miRNAs across two different clusters. A list of correspondence between the genes and miRNAs and their interactions can be regarded as a set of edges that connect the vertices across two different clusters.
Figure 3 shows the workflow of our proposed graph comparison method. Specifically, the steps of graph comparison algorithm will be implemented as follows:
  • Identify corresponding vertices (genes and miRNAs) and edges (their interactions) that connect their vertices in clusters between two different cancers (A and B);
  • Construct matrices to store vertices and edges;
  • Calculate the shortest “distance” as the number of edges between any two vertices for each cluster in cancer A and B, respectively;
  • Determine whether two clusters match based on their matching percentage (defined as the ratio of the number of corresponding vertex pairs with equal distance out of total matched vertex pairs).
The strategy has been expanded to compare clusters between multi-cancers. Specifically, we constructed a matrix with rows represented as connections in the common cluster across cancers and columns as cancer types. The cell in the matrix was filled with “distance” values. The number of common values in one row over the total number values of that row was computed as the row (r) percentage across multiple cancers; the total (t) percentage of the matrix was the ratio of sums of the numerator of all r percentages meeting the user-defined cutoff criterion over the total number of rows in the matrix.

2.5. Check the Overlap Between miRNAs Reported in LIHC and Differentially Expressed miRNAs from Studies of Human and Rat with Nonalcoholic Fatty Liver Disease

Nonalcoholic fatty liver disease (NAFLD) can lead to liver inflammation resulting in fibrosis, cirrhosis and finally in hepatocellular carcinoma [10,11]. Across a range of expression studies, pathways implicated in this progression include circadian rhythms [12], oncogenes and toll-like receptors [13], immune activation more generally [14], and reorganization of the extracellular matrix [15]. This range of associations shows that the molecular etiology of this progression is still a matter of controversy; but it is credible that molecular processes associated with early stages in this disease progression contribute directly to the manifestation of the associated cancer. Therefore, we took the significant clusters reported in TCGA LIHC datasets and searched for published miRNA lists from human associated with NAFLD [16] and Type 2 diabetes (T2D) [17], and other liver diseases [18].
In addition, we also checked the conservation of miRNA reported in the significant cluster of LIHC. In a study about the liver of Wistar rats [19], authors identified lists of most abundant miRNAs differentially expressed in NAFLD and normal rat liver and miRNAs with the largest F-change between NAFLD and normal liver. We used the above list to make the comparison.

3. Results

3.1. Inversely Correlated miRNA and mRNA Pairs with Opposite Fold Change

We ran customized correlation calculation and database prediction scripts to generate the filtered miRNA and mRNA co-expression data for the 15 selected cancer types. At an FDR of 0.1, we found 92,751 inversely-correlated miRNA and mRNA pairs. Of these, 45,882 pairs also showed opposing fold-change between tumor and control samples (Table 1 and Supplementary file 2).

3.2. Cluster Detection Results

We initially generated the miRNA–mRNA pairs using the approach described in our previous paper [5]. Specifically, we selected the significant miRNA–mRNA pairs with their expression correlation in tumor and normal samples that were inverse and the fold of change of expression of both mRNA and miRNA were great than 1.5 for cluster identification. Upon running the community detection Louvain algorithm [9] with our defined cluster score, the detected “communities” or clusters upon processing significant pairs selected in each of the 15 cancers in TCGA project are shown in Table 2.
For LIHC, the algorithm detected the largest number (114) of clusters but only one significant (q-value < 0.1) cluster: LIHC_57. LUAD and BRCA had the largest cluster sizes (908 for LUAD and 628 for BRCA). COAD contained the smallest number (20) of clusters. LUAD also has the highest number (9) of significant clusters represented the significant ratio 43% (9/21) of initial clusters (Table 2).
The COAD clusters had the lowest score distribution and the scores for clusters in LUAD were shown to be the highest. This pattern was consistent with the distribution trend of the numbers of their clusters. The distribution of “maximum” scores for detected clusters during permutation tests in 15 cancers is shown in Figure 4. The detailed information including cluster sizes and cluster scores for identified clusters is shown in Supplementary file 3.

3.3. Cross-Cancer Comparison Results

Under the cutoff criteria of the total percentage 0.5 and row percentage 0.6, we have identified 393 clusters which have their pairs available in at least one of the other cancers for comparison. From LUAD_12 cluster had the largest number of matchable clusters or neighbors (152) with at least one common pair identified. LIHC_57 showed the highest similarity percentage vs having high total common percentage among the cluster pool with matchable clusters 2 or less (Figure 5). In LIHC_57, LUAD_12 was identified to be one of similar clusters with it. Specifically, PCSK1N and hsa-mir-378 pair is reported in common between LIHC_57 and LUAD_12. GPR143 and hsa-mir-378 pair was also in common between LIHC_57, KIRC_33, and KIRP_49, which indicate these cancer clusters could have common driving transcription regulation patterns due to the fact that they share common miRNA–mRNA pairs. The detailed information across multiple cancers is reported in Supplementary files 4–18. A list of compared clusters along with three metrices (number_of_neighbors, similarity_percentage, common_pair_percentage) is reported in Supplementary file 19.

3.4. Investigation of miRNAs and Their Targets Overlap Between Lists in LIHC and the Ones Reported from the Study in Human with Nonalcoholic Fatty Liver Disease

Given that LIHC has the largest number (114) of clusters, we are interested in checking if any miRNAs/genes reported in LIHC are also prevalent in other liver associated diseases such as, NAFLD. We compared our miRNAs (224) in LIHC with the list of hepatic miRNAs (44) upregulated in human with NAFLD [16], and total 11 out of 41 (3 of them (2 Epstein–Barr viruses: ebv-miRBART18-3p and ebv-miRBART17-3p and 1 Herpes simplex virus: hsv2-miR-H20 were excluded in the analysis) miRNAs or 32 target pairs were identified to be present in LIHC clusters (Table 3). In checking the miRNA list (hsa-mir-17, hsa-mir-20a, hsa-mir-20b, and hsa-mir-122) upregulated in type 2 diabetes mellitus (T2DM) patients with NAFLD complication reported in another study [17], all these miRNAs have been identified in our LIHC cluster list. In addition, a well-known miRNA (hsa-mir-22) involved in hepatocellular carcinoma a cell migration and invasion [18] in liver disease has been identified in our LIHC cluster list.

3.5. Investigation of miRNA Overlap Between Differentially Expressed miRNA List in Rat with Nonalcoholic Fatty Liver Disease and the Ones in LIHC

We also checked the results for miRNA–mRNA in LIHC against the miRNAs (21) identified in the study [19] to see if there are any miRNAs associated with liver cancer which are also associated with NAFLD in rat.
Upon checking the gene/miRNA list reported in a study for rat with NAFLD, out of the 10 most abundant miRNAs differentially expressed in NAFLD and normal rat liver provided in the above mentioned study [19], six miRNAs (mir-122, let-7c, let-7b, mir-192, mir-29a, mir-21) were also identified to be present in LIHC clusters; out of the 10 miRNAs with the largest F-change between NAFLD and normal liver, four miRNAs (mir-132, mir-99a, mir-200c, mir-145) were also reported to be present in LIHC clusters. A list of overlapping miRNAs and their targeted gene pairs is reported in Table 4.

3.6. Cluster Functional Analysis for LIHC

We performed functional annotation and pathway analysis for 56 genes reported in cluster LIHC_17 using ClusterProfiler [20]. From the annotation result, four genes (PDGFRB, PIK3IP1, PIK3R3, WDR91) tagged with the Gene Ontology (GO) term biological process “regulation of phosphatidylinositol 3-kinase activity” and “lipid kinase activity” (q-value < 0.01) and “regulation of phospholipid metabolic process” (q-value = 0.02) were significantly enriched. In addition, three genes (ACVRL1, COL3A1, PDGFRB) have been reported to be significantly enriched (q-value = 0.04) with GO term “aorta morphogenesis”. In searching for pathway enrichment for this gene list, a significant (q-value < 0.002) pathway “phospholipase D signaling pathway” (KEGG ID: hsa04072) has been identified to be enriched for six genes (DGKQ, LPAR2, PDGFRB, PIK3R3, PTGFR, RAPGEF3). The pathways downstream of phospholipase D (PLD) are involved in oncogenic transformation. The research showed that membrane-associated phospholipase D can be activated by the small Guanosine-5’-triphosphate (GTP)-binding protein RhoA in rat liver [21]. PDGFRB or platelet-derived growth factor receptor B is a protein-coding gene and essential for normal development of the cardiovascular system. In a recent study [22], it has been reported to have the function of stimulating invasion and liver metastasis formation of mesenchymal-like colorectal tumor cells in mice. The gene PIK3R3 has been reported to play an important role in colorectal cancer metastasis [23]. Blocking PIK3R3 can prevent colorectal cancer liver metastasis in animal model [24]. A list of miRNAs targeting six genes involved in Phospholipase D signaling pathway is reported in Table 5.

4. Discussion

Existing tools cannot reveal the biological roles (e.g., cancer association) of pairs of genes and miRNAs due to their lack of consideration of the “inverse/altered regulation” between tumor and normal samples concurrently. Our proposed method considers the topology of genes in the network and takes as input inversely regulated target pairs containing their target prediction relationship predicted by several target database prediction algorithms to identify significant target pairs and elucidate cancer and disease associated signatures of clusters.
As a direct clinical application on our novel method, we have conducted the functional annotation for the significant clusters identified in BRCA datasets. Our analysis has confirmed that breast cancer related GO terms (cell cycle and chromosome) are enriched in some of the identified significant clusters. These clusters often contain more previously reported breast cancer risk genes than other identified significant clusters not enriched with breast cancer GO terms.
As biomarkers (of cancer, of cancer progression, of cancer resistance to chemotherapy, etc.), clusters of correlated expression levels will generically be more robust than any individual marker [25]. In conventional blood biochemical biomarkers, well-studied examples where ratios of related markers outperform individual markers include such liver disease scores as APRI [26] and FIB-4 [27], each of which includes the ratio of aspartate aminotransferase (AST) to platelets in the blood. As the field matures, the methods developed here will support similar and more robust ratio biomarkers as a natural extension of coding-gene differential expression [28].
There are previous studies of miRNA and mRNA regulatory networks in cancer. Those studies generally use expression profiles of miRNA and mRNAs and different clustering algorithms and/or statistical analyses to identify the potential miRNA–mRNA modules or clusters. Such studies usually also perform miRNA–mRNA pair selection based on miRNA–mRNA interaction experimental databases or prediction algorithms, functional enrichment analyses of the genes or proteins, disease association, and other analyses in order to relate the miRNA and mRNAs in modules to the cancer types/subtypes of interest or survival probability. Specifically, in a study of colorectal cancer, the rough hypercuboid based supervised clustering algorithm (RH-SAC) was used to generate clusters of functionally similar miRNAs or mRNAs whose coherent expression can further efficiently classify the samples [29]. In a study of multiple myelomas, through integrative analysis of GO biological processes, miRNA–mRNA targeting relationship, and matched miRNA and mRNA expression data, the ping-pong algorithm and multiobjective genetic algorithm were integrated to detect subtype-specific miRNA–mRNA regulatory modules [30]. In a study of glioblastomas, mRNA, miRNA, and protein expression profiles were integrated to identify regulatory networks. Cancer-related miRNAs were ranked based on the amount of identified correlated genes, including both protein and mRNA datasets. Then modules containing mRNAs, proteins, and miRNAs, in which the three molecular types are highly correlated, were clustered by the SAMBA bi-clustering algorithm, a Bayesian network model, and an extended step in which proteins are included into mRNA sample modules prior to the miRNAs’ inclusion [31]. Compared to other studies, our research is unique and novel in terms of selection of the significant miRNA–mRNA pairs with their expression correlation in tumor and normal samples that were inverse and the fold of change of expression of both mRNA and miRNA were great than 1.5. We used the modified Louvain algorithm to detect “communities” or clusters cluster from the bipartite graph of miRNA and mRNA vertices based on their correlation coefficient values for both tumor and normal samples simultaneously. We scored the detected clusters to see if they are significant. A graph comparison algorithm in our study was expanded to compare clusters across cancer types.

5. Conclusions

We believe our study is the first attempt to employ the idea of “inverse/altered regulation” and integrate multiple cancer specific databases with mRNA–miRNA interaction. Next, we plan to develop a powerful and user-friendly mRNA–miRNA functional annotation tool for visualizing cluster interactions in both tumor and normal samples of various cancer types. This effort will not only give a detailed presentation of miRNA associated with various types of human cancers, but will also provide a comprehensive analysis of gene ontology and mRNA-cancer associations.
By providing comprehensive and accessible tools to confront this growing class of cancer sequencing big data, our results provided a list of candidate cancer-associated genes and miRNAs with their biological functions and could shift current research and/or clinical practice paradigms. Our proposed method is applicable across a range of diseases and cancers and has uniquely distinctive advantages over existing tools. This will likewise contribute to new bioinformatics methodologies for identifying cancer driver genes in personal genomes in which clinicians seek to develop better treatment strategies.

Supplementary Materials

The following are available online at https://www.mdpi.com/2073-4425/10/9/702/s1, Supplementary file 1: The sequencing data information for TCGA 33 cancer types, Supplementary file 2: Selected anti-correlated miRNA-mRNA pairs passing cutoff criterion (q < 0.1) of statistical significance, Supplementary file 3: Clustering results for 15 selected cancers, Supplementary file 4: BLCA cluster similarity result, Supplementary file 5: BRCA cluster similarity result, Supplementary file 6: COAD cluster similarity result, Supplementary file 7: ESCA cluster similarity result, Supplementary file 8: HNSC cluster similarity result, Supplementary file 9: KICH cluster similarity result, Supplementary file 10: KIRC cluster similarity result, Supplementary file 11: KIRP cluster similarity result, Supplementary file 12: LIHC cluster similarity result, Supplementary file 13: LUAD cluster similarity result, Supplementary file 14: LUSC cluster similarity result, Supplementary file 15: PRAD cluster similarity result, Supplementary file 16: STAD cluster similarity result, Supplementary file 17: THCA cluster similarity result, Supplementary file 18: UCEC cluster similarity result, Supplementary file 19: Cluster comparison with three metrices (number_of_neighbors, similarity_percentage, common_pair_percentage).

Author Contributions

Wrote codes, generated the results, and plotted figures, participated in the revision, X.D.; did the cluster functional analysis, drew diagrams and graphs, participated in the revision, L.D.; participated in revising figures and formatting the manuscript, H.L.; participated in the revision, Z.X.; wrote R code for significance calculation of miRNA–mRNA pairs and offered statistical advice on cluster significance computation, participated in the revision, H.J.; participated in the review and revision of the manuscript, S.H.; designed and guided the project, compared LIHC and NAFLD datasets, summarized results, and wrote and revised the manuscript, Y.B.

Funding

This research was supported by funds from ISU to YB to pay student salaries. The authors would like to thank Department of Internal Medicine and Health Information Technology and Services at the University of Michigan for their support.

Acknowledgments

The authors thank Rena Wen-Rui Li, Jenny Bai, Zheyun Feng, Ali Salman, Hongtao Li and other members in Bai Lab for research assistance and reviewers for useful discussions and suggestions.

Conflicts of Interest

The authors declare that they have no competing interests.

Abbreviations

TCGAThe Cancer Genome Atlas
LIHCLiver hepatocellular carcinoma
KIRCKidney renal clear cell carcinoma
KIRPKidney renal papillary cell carcinoma
LUADLung adenocarcinoma
BLCABladder urothelial carcinoma
BRCABreast invasive carcinoma
COADColon adenocarcinoma
ESCAEsophageal carcinoma
HNSCHead and neck squamous cell carcinoma
KICHKidney chromophobe
LUSCLung squamous cell carcinoma
PRADProstate adenocarcinoma
STADStomach adenocarcinoma
THCAThyroid carcinoma
UCECUterine corpus endometrial carcinoma

References

  1. Peter, M.E. Targeting of mrnas by multiple mirnas: The next step. Oncogene 2010, 29, 2161–2164. [Google Scholar] [CrossRef] [PubMed]
  2. Nunez, Y.O.; Truitt, J.M.; Gorini, G.; Ponomareva, O.N.; Blednov, Y.A.; Harris, R.A.; Mayfield, R.D. Positively correlated mirna-mrna regulatory networks in mouse frontal cortex during early stages of alcohol dependence. BMC Genom. 2013, 14, 725. [Google Scholar] [CrossRef] [PubMed]
  3. Liu, J.; Li, Z.; Hu, X.; Chen, Y. Biclustering of microarray data with mospo based on crowding distance. BMC Bioinform. 2009, 10, S9. [Google Scholar] [CrossRef] [PubMed]
  4. Tanay, A.; Sharan, R.; Shamir, R. Discovering statistically significant biclusters in gene expression data. Bioinformatics 2002, 18, S136–S144. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  5. Bai, Y.; Ding, L.; Baker, S.; Bai, J.M.; Rath, E.; Jiang, F.; Wu, J.; Jiang, H.; Stuart, G. Dissecting the biological relationship between TCGA miRNA and mRNA sequencing data using MMiRNA-Viewer. BMC Bioinform. 2016, 17, 336. [Google Scholar] [CrossRef] [PubMed]
  6. Oulas, A.; Karathanasis, N.; Louloupi, A.; Iliopoulos, I.; Kalantidis, K.; Poirazi, P. A new microrna target prediction tool identifies a novel interaction of a putative miRNA with CCND2. RNA Biol. 2012, 9, 1196–1207. [Google Scholar] [CrossRef]
  7. Agarwal, V.; Bell, G.W.; Nam, J.; Bartel, D.P. Predicting effective microrna target sites in mammalian mrnas. eLife 2015, 4, e05005. [Google Scholar] [CrossRef] [PubMed]
  8. John, B.; Enright, A.J.; Aravin, A.; Tuschl, T.; Sander, C.; Marks, D.S. Human microrna targets. PLoS Biol. 2004, 2, e363. [Google Scholar] [CrossRef]
  9. Vincent, D.B.; Jean-Loup, G.; Renaud, L.; Etienne, L. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, 2008, P10008. [Google Scholar] [Green Version]
  10. Cohen, J.C.; Horton, J.D.; Hobbs, H.H. Human fatty liver disease: Old questions and new insights. Science 2011, 332, 1519–1523. [Google Scholar] [CrossRef]
  11. Chalasani, N.; Younossi, Z.; Lavine, J.E.; Diehl, A.M.; Brunt, E.M.; Cusi, K.; Charlton, M.; Sanyal, A.J. The diagnosis and management of non-alcoholic fatty liver disease: Practice guideline by the american association for the study of liver diseases, american college of gastroenterology, and the american gastroenterological association. Hepatology 2012, 55, 2005–2023. [Google Scholar] [CrossRef] [PubMed]
  12. Naik, A.; Kosir, R.; Rozman, D. Genomic aspects of nafld pathogenesis. Genomics 2013, 102, 84–95. [Google Scholar] [CrossRef] [PubMed]
  13. Liu, X.-L.; Ming, Y.-N.; Zhang, J.-Y.; Chen, X.-Y.; Zeng, M.-D.; Mao, Y.-M. Gene-metabolite network analysis in different nonalcoholic fatty liver disease phenotypes. Exp. Mol. Med. 2017, 49, e283. [Google Scholar] [CrossRef] [PubMed]
  14. Ryaboshapkina, M.; Hammar, M. Human hepatic gene expression signature of non-alcoholic fatty liver disease progression, a meta-analysis. Sci. Rep. 2017, 7, 12361. [Google Scholar] [CrossRef] [PubMed]
  15. Lou, Y.; Tian, G.Y.; Song, Y.; Liu, Y.L.; Chen, Y.D.; Shi, J.P.; Yang, J. Characterization of transcriptional modules related to fibrosing-nafld progression. Sci. Rep. 2017, 7, 4748. [Google Scholar] [CrossRef] [PubMed]
  16. Soronen, J.; Yki-Jarvinen, H.; Zhou, Y.; Sadevirta, S.; Sarin, A.P.; Leivonen, M.; Sevastianova, K.; Perttila, J.; Laurila, P.P.; Sigruener, A.; et al. Novel hepatic micrornas upregulated in human nonalcoholic fatty liver disease. Physiol. Rep. 2016, 4, e12661. [Google Scholar] [CrossRef] [PubMed]
  17. Ye, D.; Zhang, T.; Lou, G.; Xu, W.; Dong, F.; Chen, G.; Liu, Y. Plasma mir-17, mir-20a, mir-20b and mir-122 as potential biomarkers for diagnosis of nafld in type 2 diabetes mellitus patients. Life Sci. 2018, 208, 201–207. [Google Scholar] [CrossRef] [PubMed]
  18. Luo, L.-J.; Zhang, L.-P.; Duan, C.-Y.; Wang, B.; He, N.-N.; Abulimiti, P.; Lin, Y. The inhibition role of mir-22 in hepatocellular carcinoma cell migration and invasion via targeting cd147. Cancer Cell Int. 2017, 17, 17. [Google Scholar] [CrossRef] [PubMed]
  19. Zhang, D.; Wang, Y.; Ji, Z.; Wang, Z. Identification and differential expression of micrornas associated with fat deposition in the liver of wistar rats with nonalcoholic fatty liver disease. Gene 2016, 585, 1–8. [Google Scholar] [CrossRef] [PubMed]
  20. Yu, G.; Wang, L.; Han, Y.; He, Q. Clusterprofiler: An r package for comparing biological themes among gene clusters. OMICS J. Integr. Biol. 2012, 16, 284–287. [Google Scholar] [CrossRef]
  21. Malcolm, K.C.; Ross, A.H.; Qiu, R.G.; Symons, M.; Exton, J.H. Activation of rat liver phospholipase d by the small gtp-binding protein rhoa. J. Biol. Chem. 1994, 269, 25951–25954. [Google Scholar]
  22. Steller, E.J.; Raats, D.A.; Koster, J.; Rutten, B.; Govaert, K.M.; Emmink, B.L.; Snoeren, N.; van Hooff, S.R.; Holstege, F.C.; Maas, C.; et al. Pdgfrb promotes liver metastasis formation of mesenchymal-like colorectal tumor cells. Neoplasia 2013, 15, 204–217. [Google Scholar] [CrossRef]
  23. Wang, G.; Yang, X.; Li, C.; Cao, X.; Luo, X.; Hu, J. Pik3r3 induces epithelial-to-mesenchymal transition and promotes metastasis in colorectal cancer. Mol. Cancer Ther. 2014, 13, 1837–1847. [Google Scholar] [CrossRef]
  24. Hu, J.; Xia, X.; Cheng, A.; Wang, G.; Luo, X.; Reed, M.F.; Fojo, T.; Oetting, A.; Gong, J.; Yen, P.M. A peptide inhibitor derived from p55pik phosphatidylinositol 3-kinase regulatory subunit: A novel cancer therapy. Mol. Cancer Ther. 2008, 7, 3719. [Google Scholar] [CrossRef]
  25. Wu, X.; Chen, L.; Wang, X. Network biomarkers, interaction networks and dynamical network biomarkers in respiratory diseases. Clin. Transl. Med. 2014, 3, 16. [Google Scholar] [CrossRef]
  26. Lin, Z.H.; Xin, Y.N.; Dong, Q.J.; Wang, Q.; Jiang, X.J.; Zhan, S.H.; Sun, Y.; Xuan, S.Y. Performance of the aspartate aminotransferase-to-platelet ratio index for the staging of hepatitis c-related fibrosis: An updated meta-analysis. Hepatology 2011, 53, 726–736. [Google Scholar] [CrossRef]
  27. Sterling, R.K.; Lissen, E.; Clumeck, N.; Sola, R.; Correa, M.C.; Montaner, J.; Sulkowski, M.S.; Torriani, F.J.; Dieterich, D.T.; Thomas, D.L.; et al. Development of a simple noninvasive index to predict significant fibrosis in patients with hiv/hcv coinfection. Hepatology 2006, 43, 1317–1325. [Google Scholar] [CrossRef]
  28. Yang, B.; Li, M.; Tang, W.; Liu, W.; Zhang, S.; Chen, L.; Xia, J. Dynamic network biomarker indicates pulmonary metastasis at the tipping point of hepatocellular carcinoma. Nat. Commun. 2018, 9, 678. [Google Scholar] [CrossRef]
  29. Paul, S.; Lakatos, P.; Hartmann, A.; Schneider-Stock, R.; Vera, J. Identification of mirna-mrna modules in colorectal cancer using rough hypercuboid based supervised clustering. Sci. Rep. 2017, 7, 42809. [Google Scholar] [CrossRef]
  30. Zhang, Y.; Liu, W.; Xu, Y.; Li, C.; Wang, Y.; Yang, H.; Zhang, C.; Su, F.; Li, Y.; Li, X. Identification of subtype specific mirna-mrna functional regulatory modules in matched mirna-mrna expression data: Multiple myeloma as a case. Biomed. Res. Int. 2015, 2015, 15. [Google Scholar] [CrossRef]
  31. Seo, J.; Jin, D.; Choi, C.H.; Lee, H. Integration of microrna, mrna, and protein expression data for the identification of cancer-related micrornas. PLoS ONE 2017, 12, e0168412. [Google Scholar] [CrossRef]
Figure 1. Bipartite diagram for messenger RNA (mRNA) (blue, left) and microRNA (miRNA) (red, right) interaction pairs, with a cluster identified by a purple ellipse.
Figure 1. Bipartite diagram for messenger RNA (mRNA) (blue, left) and microRNA (miRNA) (red, right) interaction pairs, with a cluster identified by a purple ellipse.
Genes 10 00702 g001
Figure 2. Workflow for statistical significance test for gene and miRNA interaction clusters: (1) miRNA and mRNA pairs with target relationship; (2) cluster identification; (3) permutation test.
Figure 2. Workflow for statistical significance test for gene and miRNA interaction clusters: (1) miRNA and mRNA pairs with target relationship; (2) cluster identification; (3) permutation test.
Genes 10 00702 g002
Figure 3. A schematic representation of the graph comparison algorithm to detect correlated clusters or local similarities in two graphs. The count was calculated as the number of connections between two nodes.
Figure 3. A schematic representation of the graph comparison algorithm to detect correlated clusters or local similarities in two graphs. The count was calculated as the number of connections between two nodes.
Genes 10 00702 g003
Figure 4. The distribution of cluster scores with max score values when running Louvain algorithm 10,000 times for 15 cancers.
Figure 4. The distribution of cluster scores with max score values when running Louvain algorithm 10,000 times for 15 cancers.
Genes 10 00702 g004
Figure 5. Cluster comparison results of 15 cancers for three different categories in the number of common mRNA–miRNA pairs across the comparison.
Figure 5. Cluster comparison results of 15 cancers for three different categories in the number of common mRNA–miRNA pairs across the comparison.
Genes 10 00702 g005
Table 1. The statistics for number of miRNA–mRNA pairs in 15 selected cancers from The Cancer Genome Atlas (TCGA).
Table 1. The statistics for number of miRNA–mRNA pairs in 15 selected cancers from The Cancer Genome Atlas (TCGA).
Cancer TypesNumber of miRNA–mRNA Pairs with Inverse CorrelationsNumber of miRNA–mRNA Pairs with Inverse Correlations and Opposite Fold Change Between Tumor and Normal Samples
BLCA998578
BRCA20,66110,101
COAD8255
ESCA344155
HNSC30661601
KICH1039442
KIRC10,7496189
KIRP61433190
LIHC1426659
LUAD26,38012,874
LUSC265171
PRAD69723801
STAD12,8925108
THCA1326744
UCEC408214
Total92,75145,882
Notes: The italic rows are cancer types included for the analysis in this study in addition to cancer types selected for the analysis in our previous study (Bai et al., 2016).
Table 2. The detected “communities” or “clusters” of significant pairs selected for each of 15 selected cancers from TCGA.
Table 2. The detected “communities” or “clusters” of significant pairs selected for each of 15 selected cancers from TCGA.
Cancer TypesTotal Number of Detected ClustersNumber of Detected Significant Clusters (FDR < 0.1)
BLCA282
BRCA338
COAD200
ESCA420
HNSC964
KICH641
KIRC518
KIRP624
LIHC1141
LUAD219
LUSC392
PRAD523
STAD398
THCA574
UCEC700
Total78854
Table 3. The list of matched miRNAs upregulated in human with nonalcoholic fatty liver disease (NAFLD) and their targeted genes.
Table 3. The list of matched miRNAs upregulated in human with nonalcoholic fatty liver disease (NAFLD) and their targeted genes.
GenemiRNA
ITPKBhsa-mir-106b
CD69hsa-mir-106b
EPHA4hsa-mir-106b
APOBEC3Hhsa-mir-106b
CYP2U1hsa-mir-106b
ZNFX1hsa-mir-106b
CNTNAP1hsa-mir-505
EFCAB1hsa-mir-505
BTG1hsa-mir-505
HPRT1hsa-mir-505
PAMhsa-mir-505
IRF2BP2hsa-mir-505
FSThsa-mir-505
CLDN23hsa-mir-505
SIN3Ahsa-mir-20b
XPR1hsa-mir-2355
C7orf49hsa-mir-2355
ZDHHC23hsa-mir-2355
VANGL1hsa-mir-2355
SSX2IPhsa-mir-584
DYNLT3hsa-mir-584
ESR1hsa-mir-584
ARL15hsa-mir-877
MESThsa-mir-181d
TBCChsa-mir-374b
GUCY1A2hsa-mir-551b
SCO1hsa-mir-200b
CASC4hsa-mir-200b
FAM169Ahsa-mir-200b
UGGT1hsa-let-7b
PLEKHA6hsa-let-7b
ATP6V1C1hsa-let-7b
Table 4. A list of identified miRNAs and their targeted gene pairs.
Table 4. A list of identified miRNAs and their targeted gene pairs.
GenemiRNA
DTNAmir-122
SMYD2mir-122
IGF2mir-122
KYNUmir-122
DBNDD1mir-122
SYNCRIPlet-7c
KIF5Blet-7c
MGAT4Alet-7c
PDLIM2let-7c
LDHDlet-7c
PLCB1let-7c
BDH1let-7c
STXBP4let-7c
UGGT1let-7b
PLEKHA6let-7b
ATP6V1C1let-7b
CBX7mir-192
ZC3H10mir-192
RAB2Amir-192
TRIM66mir-192
MYO1Emir-192
ING5mir-192
SYAP1mir-192
P2RX4mir-29a
ZNF286Bmir-29a
CNDP2mir-29a
GPR146mir-29a
BMFmir-29a
SSTR2mir-29a
NLNmir-29a
AMICA1mir-29a
SYNMmir-29a
PRPF3mir-29a
CHST10mir-29a
ZNF160mir-29a
NDNmir-29a
MTMR2mir-29a
ZNF431mir-29a
NAP1L1mir-29a
ATP6V0E2mir-29a
ATPAF1mir-29a
MORF4L1mir-29a
PRR3mir-29a
CPT2mir-29a
DNAJA3mir-29a
RIT1mir-29a
UCP3mir-29a
ZNF35mir-21
WDR72mir-21
KIAA1804mir-21
LAMP2mir-21
PFN2mir-21
NFASCmir-21
FABP4mir-21
C7mir-21
STK3mir-21
RASGRF1mir-132
STK3mir-132
PFN2mir-132
MESTmir-132
NCALDmir-132
C9orf156mir-132
LAMP2mir-99a
RCBTB1mir-99a
KPTNmir-99a
RPS20mir-99a
ZDHHC18mir-99a
ABCB4mir-200c
PGAM1mir-200c
SCO1mir-200c
IGFBP2mir-145
PRPF38Amir-145
CDK5RAP3mir-145
RBMXmir-145
MGLLmir-145
Table 5. A list of miRNAs targeted six genes involved in phospholipase D signaling pathway.
Table 5. A list of miRNAs targeted six genes involved in phospholipase D signaling pathway.
GenemiRNA
DGKQmir-140
LPAR2mir-140
PDGFRBmir-186
PIK3R3mir-151
PIK3R3mir-148b
PIK3R3mir-589
PTGFRmir-107
RAPGEF3mir-454
RAPGEF3mir-93
RAPGEF3mir-25
RAPGEF3mir-186
RAPGEF3mir-942

Share and Cite

MDPI and ACS Style

Dai, X.; Ding, L.; Liu, H.; Xu, Z.; Jiang, H.; Handelman, S.K.; Bai, Y. Identifying Interaction Clusters for MiRNA and MRNA Pairs in TCGA Network. Genes 2019, 10, 702. https://doi.org/10.3390/genes10090702

AMA Style

Dai X, Ding L, Liu H, Xu Z, Jiang H, Handelman SK, Bai Y. Identifying Interaction Clusters for MiRNA and MRNA Pairs in TCGA Network. Genes. 2019; 10(9):702. https://doi.org/10.3390/genes10090702

Chicago/Turabian Style

Dai, Xinqing, Lizhong Ding, Hannah Liu, Zesheng Xu, Hui Jiang, Samuel K Handelman, and Yongsheng Bai. 2019. "Identifying Interaction Clusters for MiRNA and MRNA Pairs in TCGA Network" Genes 10, no. 9: 702. https://doi.org/10.3390/genes10090702

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop