Article

Artificial Intelligence Analysis of Gene Expression Predicted the Overall Survival of Mantle Cell Lymphoma and a Large Pan-Cancer Series

1 Department of Pathology, Faculty of Medicine, Tokai University School of Medicine, 143 Shimokasuya, Isehara 259-1193, Japan
2 Department of Clinical Sciences, College of Medicine, University of Sharjah, Sharjah P.O. Box 27272, United Arab Emirates
3 Division of Surgery and Interventional Science, University College London, Gower Street, London WC1E 6BT, UK
* Author to whom correspondence should be addressed.
Healthcare 2022, 10(1), 155; https://doi.org/10.3390/healthcare10010155
Submission received: 29 October 2021 / Revised: 10 January 2022 / Accepted: 12 January 2022 / Published: 14 January 2022

Abstract

Mantle cell lymphoma (MCL) is a subtype of mature B-cell non-Hodgkin lymphoma characterized by a poor prognosis. First, we analyzed a series of 123 cases (GSE93291). An algorithm using multilayer perceptron artificial neural networks, radial basis function networks, gene set enrichment analysis (GSEA), and conventional statistics correlated 20,862 genes with 28 MCL prognostic genes for dimensionality reduction, in order to predict the patients’ overall survival and highlight new markers. As a result, 58 genes predicted survival with high accuracy (area under the curve = 0.9). Further reduction identified 10 genes: KIF18A, YBX3, PEMT, GCNA, and POGLUT3, which were associated with poor survival, and SELENOP, AMOTL2, IGFBP7, KCTD12, and ADGRG2, which were associated with favorable survival. Correlation with the proliferation index (Ki67) was also made. Interestingly, these genes, which were related to the cell cycle, apoptosis, and metabolism, also predicted the survival of diffuse large B-cell lymphoma (GSE10846, n = 414) and of a pan-cancer series from The Cancer Genome Atlas (TCGA, n = 7289) that included the most relevant cancers (lung, breast, colorectal, prostate, stomach, liver, etc.). Secondly, survival was predicted using 10 oncology panels (transcriptome, cancer progression and pathways, metabolic pathways, immuno-oncology, and host response), and TYMS was highlighted. Finally, among the machine learning techniques, the C5 tree and the Bayesian network had the highest prediction accuracy, and correlation with the LLMPP MCL35 proliferation assay and RGS1 was made. In conclusion, artificial intelligence analysis predicted the overall survival of MCL with high accuracy and highlighted genes that predicted the survival of a large pan-cancer series.

1. Introduction

Mantle cell lymphoma (MCL) is a hematological neoplasm derived from B-lymphocytes and a subtype of non-Hodgkin lymphoma (NHL) [1]. MCL represents around 7% of adult NHL and has an incidence of four to eight cases per million people per year [2,3,4,5,6]. MCL predominantly affects white men, with a median age at diagnosis of 65 years. Its frequency increases with age [7], and its incidence is rising in Western and developed countries [7].
MCL is a B-cell lymphoma of small, irregular cells (centrocytes) [8]. The immunophenotype of the classic variant is characterized by the expression of B-cell markers (CD19, CD20), CD5, SOX11, and cyclin D1, the latter due to the characteristic translocation t(11;14)(q13;q32) between the CCND1 and IGH loci [9,10,11]. MCL expresses high levels of IgM and IgD, with lambda light chain restriction in 80% of cases [8,12]. At diagnosis, most patients present with advanced disease and lymphadenopathy. Primary extranodal disease is found in 20% of cases, and the gastrointestinal tract, where the disease takes the form of lymphomatous polyposis, is a characteristic site [13,14,15].
MCL has traditionally been considered a very aggressive and incurable lymphoma. It is associated with a median survival of 3–5 years, and most patients are not cured even with the newer therapeutic modalities [1,8,16]. The “leukemic” variant, which is SOX11-negative, is clinically indolent [17]. Several studies have focused on identifying prognostic markers for patients with a higher probability of aggressive disease [18,19,20,21,22,23,24,25,26,27]. Among them, the International Prognostic Index (IPI), the MCL International Prognostic Index (MIPI), and the proliferation index (Ki67) are extensively used [18,22]. The pathobiology of MCL comprises several pathways, mechanisms, and target genes that contribute not only to the pathogenesis but also to the aggressiveness and clinical evolution of the disease. The major oncogenic driver is the CCND1 gene of the cell cycle pathway. Other relevant genes are involved in the cell cycle (CCND2, CCND3, MYC), response to DNA damage (ATM, TP53), chromatin modification (WHSC1, MLL2, MEF2B), apoptosis (BCL2, BIRC3, TLR2), and NOTCH signaling (NOTCH1 and NOTCH2), as well as in the NF-κB and PI3K/AKT signaling pathways, among others [8,28,29,30,31].
Neural networks are a favored analytical method for numerous predictive data mining applications because of their power, adaptability, and ease of use. Predictive neural networks are especially valuable in applications where the underlying process is complex [32,33,34,35,36,37,38,39,40,41,42,43], such as biological systems [44]. Both the multilayer perceptron (MLP) and the radial basis function (RBF) network have a feedforward architecture, because the connections in the network flow forward from the input layer (predictors) to the output layer (responses). The hidden layer contains unobservable nodes, or units, and the value of each hidden unit is some function of the predictors. Both are supervised learning networks that perform prediction and classification. The choice between them depends on the type of data and the level of complexity to be uncovered: the MLP can discover more complex relationships, whereas the RBF is faster [32,33]. We have recently shown that neural networks can predict the prognosis of diffuse large B-cell lymphoma (DLBCL) and follicular lymphoma (FL) [35,37,45], and can also predict the different subtypes of non-Hodgkin lymphomas with high accuracy [46]. In this research, we focused on MCL and improved the workflow algorithm to handle this type of lymphoma more efficiently: the neural networks not only predicted the overall survival outcome and identified the most relevant genes, but the results were also modulated by the inclusion of known prognostic genes and immune oncology pathways.
The main aim of this work was to use artificial neural network (ANN) analyses and other machine learning techniques to analyze the gene expression of MCL and identify relevant prognostic markers. The principal conclusion was that ANN analysis provided a novel technique that not only confirmed known prognostic markers but also highlighted new potential pathological mechanisms.

2. Materials and Methods

2.1. Hardware

All the analyses were performed on a desktop workstation with an AMD Ryzen 7 3700X 8-core processor at 2.59 GHz, 16.0 GB of RAM, and an Nvidia GeForce GTX 1650 GPU (Turing architecture, 4 GB).

2.2. Software

Several software packages were used for data processing, preanalysis, full analysis, and validation, including EditPad Lite, Microsoft Excel, R, RStudio, IBM SPSS Statistics and Modeler, GSEA, and JMP.
The details of the software were as follows:

2.3. Predictive Genes and Artificial Neural Network Analysis

2.3.1. Gene Expression Series of Mantle Cell Lymphoma

The gene expression data of the MCL series GSE93291 were downloaded from the Gene Expression Omnibus (GEO) database [50], which is hosted by the National Center for Biotechnology Information (NCBI) [page URL: https://www.ncbi.nlm.nih.gov/ (accessed on 29 August 2021)]. The series was last updated on 25 March 2019 (contact name: Professor Louis M. Staudt, National Cancer Institute, Lymphoid Malignancies Branch laboratory, Bethesda, MD 20892, USA).
The study involved retrospective gene expression profiling of samples from patients with MCL, confirmed by expert pathology consensus review. This series was created by the Lymphoma/Leukemia Molecular Profiling Project (LLMPP) [50]. The biopsies, with tumor content ≥ 60%, were obtained from untreated patients with no history of previous lymphoma, who subsequently received a broad range of treatment regimens. The set included 80 biopsies described in Rosenwald et al. [51], along with additional biopsies gathered from the clinical sites of the LLMPP. The Rosenwald et al. cases were classified based on established morphologic and immunophenotypic criteria, with overexpression of cyclin D1 (CCND1) mRNA (in most cases, immunohistochemistry also demonstrated overexpression of cyclin D1 at the protein level); they had a male/female ratio of 3.8, a median age of 62 years (range 38 to 93), multiagent treatment, and a median survival of 2.8 years [51]. The patients were treated with multiagent chemotherapy (R-CHOP or R-CHOP-like), except for six who received no treatment; no treatment information was available for two patients.
The gene expression array used in this series was the HG-U133 Plus 2 platform (GPL570, Affymetrix, Santa Clara, CA, USA). The GeneChip™ Human Genome U133 Plus 2.0 Array (#900466, ThermoFisher Scientific, Affymetrix Japan K.K., Tokyo, Japan) is the first and most comprehensive whole human genome array. It provides complete coverage of the Human Genome U133 Set, plus 6500 additional genes, for the analysis of over 47,000 transcripts. The design and performance of the chip can be accessed at the following webpage: https://www.thermofisher.com/order/catalog/product/900466 (accessed on 29 December 2021).
Total RNA had been extracted from frozen MCL specimens of 123 patients using the FastTrack kit from Invitrogen (Thermo Fisher Scientific Corp., Waltham, MA 02451, USA), and biotinylated cRNA had been prepared from 1 µg of mRNA according to the standard Affymetrix protocol (Expression Analysis Technical Manual, 2001, Affymetrix). Following the Affymetrix hybridization protocol, 15 µg of fragmented cRNA were hybridized for 16 h at 45 °C on Affymetrix arrays. Arrays were washed and stained in the Affymetrix Fluidics Station 400 and scanned with the Affymetrix 3000 scanner according to the Affymetrix scanning protocol. The data had been analyzed with Microarray Suite version 5.0 (MAS 5.0) using the Affymetrix default analysis settings and global scaling as the normalization method, with the trimmed mean target intensity of each array arbitrarily set to 500. The data were normalized and log2 transformed. The original series matrix files [50] provided by the LLMPP were used for the artificial neural network analysis. The gene expression values were collapsed to gene symbols using the maximum probe value, with the GSEA software and the gene cluster text file (*.gct) [52,53].
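The collapse-to-symbols step can be illustrated with a short Python sketch. It assumes that probe-level values are already log2-transformed and held in plain dictionaries, and it keeps, for each sample, the maximum value over all probe sets that map to the same gene symbol, which is our reading of the GSEA "max probe" collapsing mode. The function name and probe IDs are hypothetical.

```python
def collapse_max_probe(probe_values, probe_to_symbol):
    """Collapse probe-level expression to one row per gene symbol.

    probe_values: {probe_id: [value per sample]} (log2 scale)
    probe_to_symbol: {probe_id: gene symbol}; unmapped probes are skipped
    Returns {symbol: [value per sample]}, keeping for each sample the
    maximum value across all probes that map to that symbol.
    """
    collapsed = {}
    for probe, values in probe_values.items():
        symbol = probe_to_symbol.get(probe)
        if not symbol:
            continue  # probe set without a gene symbol annotation
        if symbol in collapsed:
            # element-wise (per-sample) maximum with the probes seen so far
            collapsed[symbol] = [max(a, b) for a, b in zip(collapsed[symbol], values)]
        else:
            collapsed[symbol] = list(values)
    return collapsed


# Hypothetical probes: two map to CCND1, one to MKI67, one is unannotated.
example = collapse_max_probe(
    {"probe_1": [8.0, 9.5], "probe_2": [8.7, 9.1], "probe_3": [6.0, 7.0], "probe_4": [1.0, 1.0]},
    {"probe_1": "CCND1", "probe_2": "CCND1", "probe_3": "MKI67"},
)
print(example)  # {'CCND1': [8.7, 9.5], 'MKI67': [6.0, 7.0]}
```

Taking the maximum per sample means a gene's value can come from different probe sets in different samples; other collapsing rules (for example, the mean of probes) would give different values.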

2.3.2. Identification of Prognostic Genes for Overall Survival

Eighty-six prognostic and pathogenic genes specific for mantle cell lymphoma (MCL) were selected from previous publications [1,8,17,22,28,29,30,31,50].
Among these 86 genes, 28 genes with prognostic value for overall survival in the GSE93291 series were selected. Selection required a significant p value in the Kaplan–Meier analysis with the log-rank test, after an adequate cutoff for stratification into low vs. high expression groups had been found (Table 1).
The cutoffs were found using SPSS software on the dataset of gene expression values collapsed to gene symbols (i.e., each gene had only one expression value). The visual binning function created new variables by grouping contiguous values into a limited number of distinct categories. The cutpoints were created using equal percentiles, with three cutpoints and a width of 25%. After visual inspection of the overall survival plots with the Kaplan–Meier and log-rank test, the most adequate cutoff value was identified. Cox regression was then used to calculate the hazard risk (contrast: indicator; reference category: first). Based on the p values (Table 2), the most relevant predictors for overall survival were MKI67 (p = 6.6 × 10⁻⁹, hazard risk (HR) = 4.4), CDK4 (p = 3.2 × 10⁻⁸, HR = 4.0), CHEK1 (p = 0.2 × 10⁻⁵, HR = 3.0), CCND1 (p = 0.4 × 10⁻⁵, HR = 3.1), and CDKN2C (p = 0.8 × 10⁻⁵, HR = 2.8). These genes belonged to the cell cycle and apoptosis pathways.
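As a sketch of the cutpoint step, the Python code below computes three equal-percentile cutpoints (the 25th, 50th, and 75th percentiles) and dichotomizes expression values at one chosen cutpoint. It relies on `statistics.quantiles` from the Python standard library, whose default interpolation may not match SPSS visual binning exactly; the function names are illustrative, and, as in the text, the choice of cutpoint after inspecting the Kaplan–Meier plots is left to the analyst.

```python
import statistics


def quartile_cutpoints(values):
    """Three equal-percentile cutpoints (25th, 50th, 75th percentiles).

    statistics.quantiles uses the 'exclusive' interpolation method by
    default, which may differ slightly from SPSS visual binning.
    """
    return statistics.quantiles(values, n=4)


def dichotomize(values, cutpoint):
    """Label each case 'high' (> cutpoint) or 'low' (<= cutpoint)."""
    return ["high" if v > cutpoint else "low" for v in values]


expression = [1, 2, 3, 4, 5, 6, 7, 8]  # toy expression values
cuts = quartile_cutpoints(expression)
print(cuts)  # [2.25, 4.5, 6.75]
print(dichotomize(expression, cuts[2]))  # six 'low' followed by two 'high'
```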

2.3.3. Description of the Basic Neural Network Architecture

The multilayer perceptron (MLP) analysis was performed as previously described [35,36,37,45,56,57]. The architectures are shown in Figure 1, Figure 2 and Figure 3, and the analysis outline in Figure 4. The MLP procedure produces a predictive model for one or more dependent (target) variables based on the values of the predictor variables. The MLP has a feedforward architecture: the input layer contains the predictors (our gene expression data), the hidden layer contains unobservable nodes or units, and the output layer contains the target variables. The target variables were the overall survival outcome (dead vs. alive) and the expression of each prognostic and pathogenic gene as a categorical variable (high vs. low expression). Figure 5, on the top right side, shows the basic neural network architecture. Of note, the basic architecture of the radial basis function (RBF) network is similar to that of the MLP, but it has only one hidden layer. This research used a simple type of artificial neural network, but one solid enough to serve as a “basic analysis unit” within a more complex analysis algorithm, as shown in Figure 5. A thorough description is given in our recent publications on the artificial intelligence analysis of gene expression data of diffuse large B-cell lymphoma (DLBCL) and non-Hodgkin lymphomas [46,58].

2.3.4. Parameters of the Neural Network

A thorough description of the artificial neural network procedure is given in our recent publication [58]. The predictors (covariates) were the 20,862 genes of the array. The covariates were rescaled by default to improve network training. All rescaling was performed based on the training data, even when a testing or holdout sample was defined. The rescaling method was standardization (subtract the mean and divide by the standard deviation, (x − mean)/s). Other available rescaling methods were normalized ((x − min)/(max − min)), adjusted normalized ([2 × (x − min)/(max − min)] − 1), or none. The cases were randomly assigned to the training, testing, and holdout sets according to relative numbers of cases of 70%, 30%, and 0%, respectively. To avoid bias, each individual neural network used its own random assignment of the samples to the training and testing sets.
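The following minimal sketch, assuming plain Python lists, shows the three rescaling formulas with their parameters estimated from the training cases only, plus a random 70%/30% split into training and testing sets. It is not the SPSS implementation: the function names are hypothetical, and using the sample standard deviation for s is an assumption.

```python
import random
import statistics


def split_train_test(n_cases, train_fraction=0.7, seed=None):
    """Randomly assign case indices to training and testing sets."""
    rng = random.Random(seed)  # fix the seed to replicate a split
    indices = list(range(n_cases))
    rng.shuffle(indices)
    n_train = round(n_cases * train_fraction)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


def fit_rescaler(train_values, method="standardized"):
    """Return a rescaling function whose parameters come from training data only."""
    if method == "none":
        return lambda x: x
    if method == "standardized":
        mean = statistics.mean(train_values)
        s = statistics.stdev(train_values)  # sample SD (an assumption)
        return lambda x: (x - mean) / s
    lo, hi = min(train_values), max(train_values)
    if method == "normalized":
        return lambda x: (x - lo) / (hi - lo)
    if method == "adjusted_normalized":
        return lambda x: 2 * (x - lo) / (hi - lo) - 1
    raise ValueError(f"unknown rescaling method: {method}")


train_idx, test_idx = split_train_test(123, seed=1)
print(len(train_idx), len(test_idx))  # 86 37
```

The fitted function is then applied unchanged to the testing cases, so no information from the testing set leaks into the rescaling parameters.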
The “best” architecture for the analysis was searched for and selected [58,59]. The architecture can be selected automatically (with a minimum of 1 and a maximum of 50 units in the hidden layer) or customized. A custom architecture gives control over the hidden and output layers and is most useful when the desired architecture is known in advance or when the results of the automatic architecture selection need to be tweaked.
In a custom architecture, the number of hidden layers could be one or two, and the number of units in each hidden layer could be computed automatically or set manually. The activation function of the hidden layers was either the hyperbolic tangent, γ(c) = tanh(c) = (eᶜ − e⁻ᶜ)/(eᶜ + e⁻ᶜ), or the sigmoid, γ(c) = 1/(1 + e⁻ᶜ).
The activation function of the output layer was the identity, γ(c) = c; the softmax, γ(cₖ) = exp(cₖ)/Σⱼ exp(cⱼ); the hyperbolic tangent; or the sigmoid. Of note, the activation function chosen for the output layer determined which rescaling methods were available. The rescaling of scale-dependent variables was standardized ((x − mean)/s), normalized ((x − min)/(max − min)), adjusted normalized ([2 × (x − min)/(max − min)] − 1), or none.
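To make the formulas above concrete, this sketch implements the hidden-layer and output-layer activation functions and a forward pass through one hidden layer (tanh) followed by a softmax output, which yields pseudo-probabilities for the output categories (e.g., dead/alive). The weights in the example are arbitrary toy values, not trained estimates, and the function names are hypothetical.

```python
import math


def tanh(c):
    return math.tanh(c)  # (e^c - e^-c) / (e^c + e^-c)


def sigmoid(c):
    return 1.0 / (1.0 + math.exp(-c))


def identity(c):
    return c


def softmax(cs):
    m = max(cs)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(c - m) for c in cs]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer (tanh) followed by a softmax output layer."""
    hidden = [tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w_out, b_out)]
    return softmax(logits)  # pseudo-probabilities, e.g. [P(dead), P(alive)]


# Toy weights: 2 inputs, 2 hidden units, 2 output categories.
probs = mlp_forward([0.5, -1.0],
                    [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],
                    [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0])
```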
Three types of training were available: batch, online, and mini-batch. The available optimization algorithms were scaled conjugate gradient and gradient descent. The training options were the following: initial lambda (0.0000005), initial sigma (0.00005), interval center (0), and interval offset (±0.5).
The output included the network structure and network performance.
Several outputs displayed the network performance: model summary, classification results, receiver operating characteristic (ROC) curve, cumulative gains chart, lift chart, predicted-by-observed chart, and the independent variable importance analysis. The ROC analysis displayed a curve for each categorical dependent variable and category, together with the area under each curve [35,36,37,45,46,56,57]. The predictors were ranked according to their normalized importance for predicting the target (dependent) variable and for determining the neural network. The importance was computed by a sensitivity analysis based on the combined training and testing samples, or only on the training sample if there was no testing sample [32,33,60].
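The area under the ROC curve can be computed without drawing the curve: it equals the probability that a randomly chosen positive case (e.g., dead) receives a higher predicted pseudo-probability than a randomly chosen negative case, with ties counted as one half (the Mann–Whitney formulation). The sketch below assumes 0/1 labels and is meant to clarify the metric, not to reproduce the SPSS output; `roc_auc` is a hypothetical name.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney formulation.

    scores: predicted pseudo-probability of the positive category (e.g. dead)
    labels: 1 for the positive category, 0 otherwise
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both categories must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive case ranked above negative case
            elif p == n:
                wins += 0.5  # ties count as one half
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which values such as 0.85 or 0.9 in this study are read.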
The predicted value or category and the predicted pseudo-probability for each dependent variable were saved. The synaptic weight estimates were exported to an XML file.
To replicate the results exactly, the same initialization value for the random number generator, the same data order, and the same variable order must be used, in addition to the same procedure settings.
The setup of a radial basis function (RBF) network is similar to that of the MLP. In an RBF network, the activation function of the hidden layer was either the normalized or the ordinary radial basis function. Figure 1 and Figure 2 show the general architectures of the MLP and RBF networks [32,33,60], and Figure 3 shows the sensitivity analysis [32,33,60].
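A minimal sketch of an RBF hidden layer, assuming Gaussian basis functions with given centers and widths: the ordinary version returns the Gaussian activation of each hidden unit, and the normalized version rescales the activations so that they sum to 1. The linear (identity) output unit is an assumption of this sketch, how SPSS estimates the centers and widths is not reproduced, and the function names are illustrative.

```python
import math


def rbf_hidden(x, centers, widths, normalized=True):
    """Activations of an RBF hidden layer with Gaussian basis functions.

    Ordinary: phi_j = exp(-||x - c_j||^2 / (2 * sigma_j^2)).
    Normalized: phi_j divided by the sum over all units, so activations sum to 1.
    """
    acts = []
    for center, sigma in zip(centers, widths):
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        acts.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    if normalized:
        total = sum(acts)  # assumes at least one activation has not underflowed to 0
        acts = [a / total for a in acts]
    return acts


def rbf_output(hidden, weights, bias):
    """Output unit: weighted sum of hidden activations (identity activation assumed)."""
    return sum(w * h for w, h in zip(weights, hidden)) + bias


h = rbf_hidden([0.0, 0.0], centers=[[0.0, 0.0], [3.0, 4.0]], widths=[1.0, 1.0])
```

Because each hidden unit responds only near its center, the RBF layer has a single, local structure, which is one reason it trains faster than a multilayer perceptron.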
Figure 1. General architecture for multilayer perceptron (MLP) networks. A neural network is a set of non-linear data modeling tools consisting of an input layer, one or two hidden layers, and an output layer. The multilayer perceptron procedure is a feedforward architecture. In comparison to the RBF, the MLP can find more complex relationships, but it is slower to compute. The MLP network is a function of one or more predictors (also called inputs or independent variables) that minimizes the prediction error of one or more target variables (also called outputs) [32,33,60].
Figure 2. General architecture for radial basis function (RBF) networks. A radial basis function (RBF) network is a feed-forward, supervised learning network with only one hidden layer, called the radial basis function layer [32,33,60].
Figure 3. Sensitivity analysis (independent variable importance analysis). The sensitivity analysis computes the importance of each predictor in determining the neural network [32,33,60].
Figure 4. Summary of the analysis methodology. The analysis comprised two methods, one based on the analysis of 20,862 genes and a second based on 10 immuno-oncology panels. This research used artificial neural networks and several machine learning techniques to identify genes associated with the overall survival of the patients. Correlation with known MCL pathogenic genes and the LLMPP MCL35 proliferation assay was also made.
Figure 5. Artificial neural network analysis for the prediction of the overall survival of mantle cell lymphoma (Method 1). Starting from 20,862 genes, several neural networks that correlated the genes with the overall survival outcome and with several mantle cell lymphoma pathogenic genes reduced the list to a final set of 10 genes. These 10 genes correlated with the survival of the patients and also with the proliferation index, as expressed by the MKI67 gene. MLP, multilayer perceptron; RBF, radial basis function; OS, overall survival; DA, dead/alive; GSEA, gene set enrichment analysis; AUC, area under the curve.

2.4. Gene Set Enrichment Analysis (GSEA)

GSEA is a method that determines whether an a priori defined set of genes shows statistically concordant differences between two “biological” states (e.g., phenotypes) [48,49]. Three types of files were necessary to run the application: (1) the gene cluster text file (*.gct) with the GSE93291 gene expression dataset; (2) the phenotype data as a categorical class (e.g., dead/alive) file (*.cls); and (3) the gene set database as a gene matrix file (*.gmx). The GSEA parameters were the following [37]: number of permutations (1000); collapse to gene symbols; permutation type (phenotype); chip platform (GPL570, HG-U133 Plus 2); enrichment statistic (weighted); metric for ranking genes (signal2noise); gene list sorting mode (real); gene list ordering mode (descending); max size (500); and min size (15).
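The signal2noise ranking metric is (μA − μB)/(σA + σB), where μ and σ are the mean and standard deviation of a gene's expression in each phenotype class. The sketch below ranks genes in descending order of this metric, matching the descending ordering mode listed above. It uses the sample standard deviation and omits the lower bound on σ that the GSEA documentation describes, so values may differ slightly from GSEA output; the gene names and labels are hypothetical.

```python
import statistics


def signal_to_noise(group_a, group_b):
    """GSEA-style signal2noise: (mean_A - mean_B) / (sd_A + sd_B).

    Uses the sample standard deviation; the lower bound on sd applied by
    the GSEA software is not reproduced here.
    """
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    sd_a, sd_b = statistics.stdev(group_a), statistics.stdev(group_b)
    return (mean_a - mean_b) / (sd_a + sd_b)


def rank_genes(expr, labels, class_a="dead", class_b="alive"):
    """Rank genes by signal2noise, highest (most class_a-associated) first."""
    scores = {}
    for gene, values in expr.items():
        a = [v for v, lab in zip(values, labels) if lab == class_a]
        b = [v for v, lab in zip(values, labels) if lab == class_b]
        scores[gene] = signal_to_noise(a, b)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical genes and phenotype labels.
ranked = rank_genes({"G1": [5.0, 6.0, 1.0, 2.0], "G2": [1.0, 1.5, 4.0, 5.0]},
                    ["dead", "dead", "alive", "alive"])
```

Genes at the top of the ranked list are more highly expressed in the first class (here, dead), and genes at the bottom in the second class.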

2.5. Summary of the Research Analysis Algorithm

The algorithms for the analysis of the gene expression data of MCL are shown in Figure 5, Figure 6, Figure 7 and Figure 8.

2.5.1. Algorithm Based on the Input of 20,862 Genes (Method 1)

First, all the genes of the array were used as predictors (input layer) of the target variables (output layer): the overall survival outcome (dead/alive) and the 28 genes with prognostic value in MCL (high/low expression). For each target variable, both a multilayer perceptron and a radial basis function network were run (Figure 5). In the output of each individual neural network, all the genes of the array were ranked according to their normalized importance for predicting the target variable, and the genes with a normalized importance above 70% were selected. In addition, the normalized importance was averaged across all the neural networks, the genes were ranked according to this averaged normalized importance, and the top 1% of genes were selected. As a result, the initial set of 20,862 genes was reduced to a smaller set (n = 1394).
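The selection rule can be sketched as follows. This is our reading of the text, not the authors' code: each network's raw importances are normalized so that the most important predictor equals 100% (as SPSS reports normalized importance), genes above 70% in any network are kept, and the union is taken with the top 1% of genes ranked by importance averaged across networks. The sketch assumes that every network reports an importance for the same set of genes; the function names are hypothetical.

```python
def normalized_importance(raw):
    """Scale importances so the most important predictor equals 100%."""
    top = max(raw.values())
    return {gene: 100.0 * value / top for gene, value in raw.items()}


def select_genes(networks, threshold=70.0, top_fraction=0.01):
    """Union of (1) genes above `threshold` % in any network and
    (2) the top `top_fraction` of genes by importance averaged over networks.

    networks: list of {gene: raw importance}, one dict per neural network,
    all covering the same genes (an assumption of this sketch).
    """
    norm = [normalized_importance(net) for net in networks]
    selected = {g for net in norm for g, v in net.items() if v > threshold}
    genes = list(norm[0])
    avg = {g: sum(net[g] for net in norm) / len(norm) for g in genes}
    k = max(1, round(len(genes) * top_fraction))  # e.g. 1% of 20,862 genes -> 209
    ranked = sorted(genes, key=lambda g: avg[g], reverse=True)
    selected.update(ranked[:k])
    return selected


nets = [{"A": 10.0, "B": 8.0, "C": 1.0, "D": 0.5},
        {"A": 2.0, "B": 3.0, "C": 10.0, "D": 1.0}]
print(sorted(select_genes(nets, top_fraction=0.25)))  # ['A', 'B', 'C']
```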
Next, an MLP was run using the 1394 genes as predictors (input layer) of the overall survival outcome (dead/alive, output layer); this analysis was repeated 20 times, and the four MLPs with the highest areas under the curve were selected. The normalized importance of each of the 1394 genes was averaged across these four results, and the genes were ranked from higher to lower values. Then, using multiple MLP analyses, the minimum number of genes (starting from the one with the highest normalized importance) that provided the highest area under the curve was found (n = 58) (Figure 6).
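The two steps of this paragraph, averaging importance over the best runs and searching for the smallest gene set with the highest AUC, can be sketched as below. The `evaluate_auc` callback is hypothetical and stands in for training and testing one MLP on a given gene list; in practice each call is a full network run, so scanning every prefix length is expensive, and the text does not state whether every length was tried.

```python
def average_top_runs(runs, n_top=4):
    """Rank genes by normalized importance averaged over the n_top best runs.

    runs: list of (auc, {gene: normalized importance}), one per repeated MLP.
    """
    best = sorted(runs, key=lambda run: run[0], reverse=True)[:n_top]
    genes = best[0][1].keys()
    avg = {g: sum(imp[g] for _, imp in best) / len(best) for g in genes}
    return sorted(avg, key=avg.get, reverse=True)


def smallest_best_prefix(ranked_genes, evaluate_auc, max_k=None):
    """Smallest k whose top-k genes give the highest AUC; returns (k, auc)."""
    max_k = max_k or len(ranked_genes)
    best_k, best_auc = None, float("-inf")
    for k in range(1, max_k + 1):
        auc = evaluate_auc(ranked_genes[:k])
        if auc > best_auc:  # strict '>' keeps the smallest k on ties
            best_k, best_auc = k, auc
    return best_k, best_auc


# Toy AUC lookup by gene-list length, standing in for real MLP runs.
toy_aucs = {1: 0.70, 2: 0.85, 3: 0.90, 4: 0.90, 5: 0.88}
print(smallest_best_prefix(list("ABCDE"), lambda genes: toy_aucs[len(genes)]))  # (3, 0.9)
```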
Finally, a Cox regression for overall survival (backward conditional) reduced the list to 19 genes. These 19 genes underwent additional analyses, including Kaplan–Meier with log-rank test for overall survival using cutoffs (Figure 7), analysis of other types of cancer (“pan-cancer analysis”) (Figure 9 and Figure 10), other machine learning techniques (Figure 11, Figure 12 and Figure 13), and immunohistochemistry for RGS1 (Figure 14).

2.5.2. Algorithm Based on the Input of 10 Immune Oncology Panels (Method 2)

In contrast to the first algorithm, which used all the genes of the array (n = 20,862), this second algorithm used 9 different immune oncology panels as input data (7817 genes in total) (Figure 8). Nine individual MLP analyses for the prediction of the overall survival outcome (dead/alive) were performed, and the genes with a normalized importance above 70% in each panel were pooled (n = 125). A GSEA confirmed the association of these genes with the dead or alive overall survival outcome (phenotype). Next, an additional MLP analysis confirmed the prediction of the overall survival outcome and ranked the 125 genes according to their normalized importance. The top genes were then tested with conventional overall survival analysis.

2.6. Conventional Statistical Analyses

Overall survival analyses were performed with conventional statistics. Overall survival was calculated from the time of diagnosis to the last follow-up, and recorded as alive or dead (event), following the criteria of Cheson B. D. [61,62]. Groups were compared using Kaplan–Meier analysis and the log-rank test; the Breslow and Tarone–Ware tests were also used. Cox regression (enter or backward conditional method) was used to calculate the hazard risks and their 95% confidence intervals. A p value less than 0.05 was considered statistically significant.
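For readers who want to check the survival comparisons, the sketch below implements the Kaplan–Meier product-limit estimator and the two-group log-rank test from their textbook definitions in plain Python. It is illustrative only, not the software used in the study (SPSS), and it assumes events coded 1 = dead and 0 = censored, with groups coded 0/1.

```python
import math


def kaplan_meier(times, events):
    """Product-limit estimate S(t) at each distinct time with at least one death.

    events: 1 = dead (event), 0 = censored (alive at last follow-up).
    Cases censored at time t are treated as still at risk at t.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def logrank_test(times, events, groups):
    """Two-group log-rank test (groups coded 0/1); returns (chi-square, p), 1 df."""
    observed1 = expected1 = variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(e for ti, e in zip(times, events) if ti == t)
        d1 = sum(e for ti, e, g in zip(times, events, groups) if ti == t and g == 1)
        observed1 += d1
        expected1 += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero (no events or a single group)")
    chi2 = (observed1 - expected1) ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
```

The Breslow and Tarone–Ware tests are weighted versions of the same statistic, in which each event time contributes with weight n (the number at risk) or √n, respectively, so they give more emphasis to early differences between the curves.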
In the neural network analyses, poor prognosis/survival corresponds to the cases whose overall survival outcome was dead. In the Kaplan–Meier overall survival analyses, poor prognosis corresponds to the group with the lower cumulative survival proportion in the plot.

2.7. Immunohistochemistry

Immunohistochemistry was performed on a Leica BOND-MAX automated stainer, following the manufacturer’s instructions and as previously described [53,59,63,64,65]. The RGS1 primary antibody (rabbit polyclonal) was purchased from Thermofisher [63]. The slides were digitized with a Hamamatsu NanoZoomer S360 scanner and visualized using the NDP.view2 software.

3. Results

3.1. Highlights

  • Using 20,862 genes as a start point (input layers) (Method 1), several neural network analyses correlated with the overall survival outcome and with known pathogenic genes of MCL (output layers), and a final set of 19 genes with predictive value was highlighted (Figure 5);
  • This type of analysis was repeated focusing on 10 immune, cancer, and immuno-oncology panels (Method 2), and 15 genes were highlighted (Figure 8);
  • Other machine learning techniques were used to predict the overall survival (Figure 11 and Figure 12);
  • The highlighted genes also predicted the overall survival of a pan-cancer series (Figure 9, Figure 10 and Figure A1);
  • The combination of the genes of Method 1 (19 genes) and Method 2 (15 genes) with the LLMPP MCL35 assay (17 genes), analyzed using several machine learning and neural network techniques, predicted the overall survival outcome (dead vs. alive) with high accuracy.

3.2. Prediction of Overall Survival Based on the 20,862 Genes of the Array (Method 1)

Dimensionality reduction refers to techniques for reducing the number of input variables in training data. Fewer input dimensions often mean correspondingly fewer parameters or a simpler architecture in the machine learning model, referred to as degrees of freedom [66]. An input layer of 20,862 genes predicted the overall survival of mantle cell lymphoma (MCL) using an analysis algorithm (Figure 5). The output variables (targets) were the overall survival outcome as a dichotomous variable (dead/alive) and the 28 genes (high/low expression) whose prognostic relevance for overall survival was confirmed in the same series (Table 2). Table A1 and Table A2 show the complete details of the artificial neural networks. The multilayer perceptron (MLP) technique performed better than the radial basis function (RBF): for MLP vs. RBF, the area under the curve, the percentage of incorrect predictions (testing set), and the overall percentage of correct classification (testing set) were 0.85 ± 0.05 vs. 0.77 ± 0.09 (p = 0.000053), 15.3% ± 5.9 vs. 26.5% ± 10.2 (p = 0.000005), and 84.7% ± 5.9 vs. 73.5% ± 10.2 (p = 0.000005), respectively. CCND1 was the best predicted gene: in the MLP analysis, CCND1 had a percentage of incorrect predictions in the testing set of 2.8%, the lowest value among all genes (Table A1).
From the initial 20,862 genes, the list was reduced to 1394 genes, and additional multilayer perceptron analyses led to a set of 58 genes (Figure 6). The network performance of the MLP with the input of 58 genes was “good”, with an area under the curve (AUC) of 0.9. The genes were ranked based on their normalized importance for prediction, and GSEA confirmed that most of these genes were associated with the death survival outcome (Figure 6); the most relevant were KIF18A, FANCG, GCNA, YBX3, ZCCHC4, and DMTF1.
Based on the 58 genes, a subsequent multivariate Cox regression analysis (backward conditional) highlighted a set of 19 genes (Table A3), and a final set of 10 genes was found after applying cutoffs and Kaplan–Meier analysis for overall survival (Table 2). KIF18A, YBX3, PEMT, GCNA, and POGLUT3 were associated with unfavorable overall survival, and SELENOP, AMOTL2, IGFBP7, KCTD12, and ADGRG2 with favorable survival (Figure 6). Finally, the 10 genes were correlated with the cell proliferation marker MKI67, one of the most relevant genes in the pathogenesis of MCL (Table 3). Cases with low MKI67 were associated with high KCTD12, ADGRG2, SELENOP, and IGFBP7, whereas high MKI67 was associated with high YBX3. Table A4 shows a multivariate Cox regression analysis for overall survival including MKI67 and the 10 genes.
Therefore, the dimensionality reduction of Method 1 went from the 20,862 initial genes to 1394, 58, and 19 genes, and finally to the 10 most relevant prognostic genes for the overall survival of MCL patients.

3.3. Prediction of Overall Survival Based on the Immuno-Oncology Panels (Method 2)

The prediction of the overall survival outcome was also performed using another strategy, based on nine different immune oncology pathways, multilayer perceptron neural networks, GSEA, and Kaplan–Meier analyses (Figure 8).
The characteristics and performance parameters of the neural networks are shown in Table A5. The most predictive panels (pathways) were the autoimmune (AUC = 0.98), the pan cancer human IO360 (AUC = 0.94), human inflammation (AUC = 0.89), pan cancer (AUC = 0.89), and metabolic (AUC = 0.87) panels. Interestingly, some pathways had greater predictive power for the dead than for the alive outcome.
After selecting the genes with a normalized importance above 70% and merging them, a final set of 125 genes was identified. A GSEA of these 125 genes showed a sinusoidal-like pattern, with some genes associated with poor (dead) and others with favorable (alive) overall survival. The genes were ranked according to their normalized importance for prediction using a multilayer perceptron analysis, and the top 15 genes were CD8B, CEACAM6, FABP5, CFB, IL6ST, AHR, BST2, ROBO4, AR, ID1, PIK3CD, ITGAX, TYMS, CSF1, and PCK2 (normalized importance >0.68). Among them, TYMS was highlighted; this gene alone predicted the overall survival of the patients (hazard ratio (HR) = 3.2, 95% CI 2.0–5.0, p = 8.9 × 10⁻⁷). Of note, high TYMS expression also correlated with high MKI67 expression (Fisher’s exact test, p = 0.001).
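The TYMS–MKI67 association is a 2×2 comparison (high/low TYMS versus high/low MKI67). A minimal two-sided Fisher's exact test is sketched below; it sums the probabilities of all tables with the same margins that are no more likely than the observed one. The counts in the usage line are hypothetical, not the study's cross-tabulation.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # given the fixed row and column totals.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    observed = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # A small relative tolerance keeps floating-point ties in the sum.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-7))


# Hypothetical counts: rows = TYMS high/low, columns = MKI67 high/low.
print(fisher_exact_two_sided(8, 2, 1, 5))  # about 0.035
```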
In a multivariate Cox regression survival analysis that included these top 15 genes as quantitative variables (backward conditional method), the genes that remained significant at the last step (step 11) were TYMS (p < 0.001, HR = 2.6), AR (p = 0.012, HR = 1.5), and CSF1 (p = 0.049, HR = 0.6).
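The hazard ratios, confidence intervals, and Wald statistics reported here and in Tables A3 and A4 follow from each Cox coefficient B and its standard error SE: HR = exp(B), the 95% CI is exp(B ± 1.96·SE), and Wald = (B/SE)². A minimal sketch of these standard relations is shown below with a hypothetical coefficient. Because B and SE are rounded in the tables, values recomputed from the printed numbers will differ slightly from the printed HRs.

```python
import math


def cox_summary(b, se, z=1.959964):
    """Hazard ratio, 95% CI, Wald statistic and p value for one Cox coefficient."""
    hr = math.exp(b)                      # HR = exp(B)
    lower = math.exp(b - z * se)          # CI computed on the log-hazard scale
    upper = math.exp(b + z * se)
    wald = (b / se) ** 2                  # Wald chi-square, 1 degree of freedom
    p = math.erfc(math.sqrt(wald / 2))    # upper-tail p value of chi-square(1)
    return hr, lower, upper, wald, p


# Hypothetical coefficient: B = 1.0, SE = 0.5
print(cox_summary(1.0, 0.5))
```

An HR above 1 (positive B) marks an unfavorable gene and an HR below 1 (negative B) a favorable one, which is how CSF1 (HR = 0.6) differs from TYMS and AR.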

3.4. Prediction of Overall Survival of a Pan-Cancer Series

The predictive value of the set of 19 genes, derived from the neural network analysis and dimensionality reduction of the initial 20,862 genes (Figure 5, Method 1), was tested in a pan-cancer series of 7289 cases from The Cancer Genome Atlas (TCGA) database and in the GSE10846 dataset of diffuse large B-cell lymphoma (DLBCL). Using a risk-score formula [36,46], the high-risk and low-risk groups showed different overall survival, confirming the pathological role of these genes in cancer (Figure 9 and Figure 10, Table A6, Figure A1). Overall, high-risk versus low-risk cases had a Cox regression hazard ratio of 3.3 (95% CI 2.9–3.6), p < 0.0001.
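The exact risk-score formula is given in the cited references [36,46] and is not reproduced here. A common form, assumed in the sketch below, is a linear score: the sum over genes of a weight (for example, a Cox regression coefficient) multiplied by that gene's expression, with cases above a cut-off labeled high-risk. The gene names, weights, and cut-off are hypothetical.

```python
def risk_scores(expression, coefficients):
    """Linear risk score per case: sum of coefficient * expression.

    expression   -- {case_id: {gene: expression value}}
    coefficients -- {gene: weight}, e.g. Cox regression B values (assumed)
    """
    return {
        case: sum(coefficients[g] * values[g] for g in coefficients)
        for case, values in expression.items()
    }


def stratify(scores, cutoff):
    """Label each case high-risk (score > cutoff) or low-risk."""
    return {case: "high" if s > cutoff else "low" for case, s in scores.items()}


# Hypothetical two-gene example: G1 unfavorable (B > 0), G2 favorable (B < 0).
expression = {"p1": {"G1": 2.0, "G2": 1.0}, "p2": {"G1": 0.5, "G2": 3.0}}
coefficients = {"G1": 1.0, "G2": -0.5}
scores = risk_scores(expression, coefficients)
print(scores, stratify(scores, cutoff=0.0))
```

The resulting high-risk and low-risk groups can then be compared with Kaplan–Meier curves and Cox regression, as in Table A6.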

3.5. Prediction of Overall Survival Outcome Using Other Machine Learning Techniques

The predictive value of the set of 19 genes (Method 1), used as quantitative variables, for the overall survival outcome was also modeled using other machine learning techniques, including logistic regression, Bayesian network, discriminant analysis, KNN algorithm, LSVM, tree-AS, C5, CHAID, QUEST, random trees, and C&R tree. Among them, the highest overall prediction accuracy was achieved by the C5 tree (95%, 9 genes used) and the Bayesian network (85%, 19 genes) (Figure 11 and Figure 12).
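The accuracies quoted for each model are percentages of correctly classified cases. A minimal sketch of the per-class and overall percent correct, as laid out in the classification columns of Tables A1, A2, and A5, is shown below; the observed and predicted labels are hypothetical. In Table A1, the incorrect predictions (%) column equals 100 minus the overall percent correct (e.g., 21.4 and 78.6 for the dead/alive training sample).

```python
def classification_summary(observed, predicted):
    """Percent correct for each observed class and overall."""
    summary = {}
    for cls in sorted(set(observed)):
        idx = [i for i, y in enumerate(observed) if y == cls]
        hits = sum(predicted[i] == cls for i in idx)
        summary[cls] = 100.0 * hits / len(idx)
    correct = sum(o == p for o, p in zip(observed, predicted))
    summary["overall"] = 100.0 * correct / len(observed)
    return summary


# Hypothetical outcomes: 0 = alive, 1 = dead.
print(classification_summary([0, 0, 1, 1, 1], [0, 1, 1, 1, 0]))
```

Reporting the per-class values alongside the overall accuracy matters here because a model can reach a high overall accuracy while performing poorly on the less frequent outcome.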

3.6. Combination of Method 1, Method 2, and the LLMPP MCL35 Prognostic Gene Signature

Machine learning and neural network models were built using the highlighted genes of Method 1 (19 genes) and Method 2 (15 genes) together with the previously identified MCL prognostic genes of the LLMPP MCL35 signature [50,67,68,69]. All available artificial intelligence methods were tested, and high overall prediction accuracy was found for logistic regression (100%), Bayesian network (92%), discriminant analysis (86%), CHAID (85%), C&R tree (85%), and SVM (81%) (Table 4, Figure 13).

3.7. Immunohistochemical Analysis of RGS1

RGS1 was identified as an MCL prognostic gene: it was among the 19 genes retained at the last step of the first analysis algorithm (Figure 5) and the Cox regression (backward conditional method). Its prognostic association was tested by immunohistochemistry in a series of 11 MCL cases from Tokai University. Among the candidate genes, RGS1 was selected because a reliable primary antibody for immunohistochemistry was available, and we previously showed that high RGS1 protein expression correlated with poor prognosis in diffuse large B-cell lymphoma [63]. The clinicopathological characteristics of this series were as follows: age (median, 72 years; range, 41–82); male sex (9/11, 82%); lymph node or tonsil biopsy (10/11, 91%); CD3-negative (100%); CD5-positive (10/11, 91%); CD20-, CD10-, cyclin D1 (CCND1)-, and BCL2-positive (100%); BCL6-positive (3/11, 27%); MUM-1 (IRF4)-positive (9/10, 90%); and proliferation index (Ki67), 10–50%.
RGS1 protein expression was classified as low or high and correlated with the overall survival of the patients (p = 0.048) (Figure 10). However, no correlation was found between RGS1 and the other clinicopathological characteristics.
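The text does not state which test produced p = 0.048; a log-rank test is the usual way to compare two Kaplan–Meier curves and is assumed in this sketch. At each death time it compares the observed and expected deaths in one group (e.g., high RGS1) and refers the standardized sum to a chi-square distribution with 1 degree of freedom. The data are hypothetical; with only 11 cases, such a test has limited power.

```python
import math


def log_rank_test(times, events, groups):
    """Two-group log-rank test.

    times  -- follow-up times
    events -- 1 = death, 0 = censored
    groups -- 1 = e.g. high RGS1 expression, 0 = low expression
    Returns (chi-square statistic, p value with 1 degree of freedom).
    """
    death_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e, variance = 0.0, 0.0
    for t in death_times:
        n = sum(1 for tt in times if tt >= t)                        # at risk, all
        n1 = sum(1 for tt, g in zip(times, groups) if tt >= t and g == 1)
        d = sum(1 for tt, e in zip(times, events) if tt == t and e)  # deaths at t
        d1 = sum(1 for tt, e, g in zip(times, events, groups)
                 if tt == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n           # observed minus expected, group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no information to compare the two groups")
    chi2 = o_minus_e ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical data: the high-expression group (1) dies earlier.
print(log_rank_test([1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 1, 1], [1, 1, 1, 0, 0, 0]))
```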

4. Discussion

Mantle cell lymphoma is a hematological neoplasm that belongs to the group of non-Hodgkin lymphomas (NHL) and is derived from mature B-lymphocytes [16].
In most cases, the postulated cell of origin is a naïve pregerminal center B-cell of the mantle zone [1,9,16,17,46], because of the absence of somatic mutations in the variable region of the immunoglobulin heavy chain genes (IgVH). IgVH somatic mutational status is a marker of the transition of a B-lymphocyte through a follicular germinal center [70]. However, somatic hypermutation is found in 20–30% of cases, which suggests a postgerminal (marginal zone) origin [71], and these cases are associated with a better prognosis [72]. Because of the aggressive clinical behavior of mantle cell lymphoma, it is critical to find prognostic markers that identify the patients who should receive more aggressive therapy.
Mantle cell lymphoma is characterized by increased cell division and replication, a decreased response to DNA damage, and enhanced cell survival (impaired apoptosis) [16]. Some of these pathways and genes correlate with prognosis. For instance, TP53 and NOTCH1 mutations, SOX11 overexpression, and a high proliferation index (Ki67 staining) are associated with a poor prognosis.
This research identified new prognostic markers using gene expression data. Dimensionality reduction refers to techniques for reducing the number of input variables in training data. Fewer input dimensions often mean correspondingly fewer parameters, or a simpler structure, in the machine learning model, referred to as degrees of freedom [66]. A neural network analysis correlated the 20,862 genes of the array with the overall survival outcome (dead/alive) and ranked the genes according to their normalized importance for prediction. The analysis was further enriched by including 28 prognostic genes that were identified from the literature and later confirmed to have prognostic relevance in this series (Table 1). Therefore, the input data of the neural network were robust and led to the identification of potentially relevant new prognostic markers. In addition, a second type of neural network analysis was performed using several immuno-oncology pathways, which provided a more supervised training and analysis. The correlation of some of the highlighted genes with the expression of MKI67, a proliferation marker known to be critical in mantle cell lymphoma pathogenesis, suggests that the identified new markers are also potentially relevant.
The highlighted genes influence apoptosis, angiogenesis, cell proliferation, and metabolic processes, and they contribute to hematological neoplasia or cancer (Table 5). Therefore, these genes would also be expected to affect the progression of the pan-cancer series.
It is important to point out that background information (e.g., patient age, sex, and comorbidities) could also be incorporated into the artificial neural network analyses, and doing so would have a large impact on the results. In this research, the target was the prediction of the overall survival of the patients based on gene expression data, as a proof of concept. In future analyses, background information will be incorporated into the MCL analysis, similarly to what we have recently done in diffuse large B-cell lymphoma (DLBCL) [35].
In addition to neural networks, other machine learning techniques were tested, and the C5 tree and Bayesian network had the best accuracy for predicting the overall survival outcome. Of note, the types of analyses used do not necessarily represent direct cause and effect, but rather probabilistic relationships and conditional independencies between the markers.
Recent advances in machine learning have led to many artificial intelligence (AI) applications, which promise to produce autonomous systems. However, the effectiveness of these systems is limited by the machines’ current inability to explain their decisions and actions to human users [87]. Therefore, explainable AI (XAI) will be essential for users to understand, trust, and effectively manage AI machine partners [87]. In this research, the artificial neural networks highlighted the most relevant genes according to their normalized importance for predicting the overall survival of the patients. To make the results more explainable, we applied several additional machine learning techniques and conventional statistics. In future work, explanations of the algorithms will be developed. Of note, in medicine, AI technologies can be clinically validated even when their function cannot be understood by their operators [88].
Future research directions will be the validation of the methodology and highlighted genes in other series of mantle cell lymphoma and non-Hodgkin lymphomas.

5. Conclusions

This research combined artificial neural networks, machine learning, and conventional statistics to model the overall survival of mantle cell lymphoma and highlight pathogenic genes. Artificial intelligence is a promising field for understanding hematological neoplasia and other types of cancer.

Author Contributions

Conceptualization, J.C.; methodology, J.C.; validation, R.H.; formal analysis, J.C.; writing—original draft preparation, J.C.; writing—review and editing, J.C.; supervision, N.N.; funding acquisition, J.C. All authors have read and agreed to the published version of the manuscript.

Funding

Joaquim Carreras was funded by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) and the Japan Society for the Promotion of Science, grants KAKEN 15K19061 and 18K15100, and Tokai University School of Medicine, research incentive assistant plan 2021-B04. Rifat Hamoudi was funded by Al-Jalila Foundation (grant number AJF2018090), and University of Sharjah (grant number 1901090258).

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Institutional Review Board and the Ethics Committee of Tokai University, School of Medicine (protocol code IRB14R-080 and IRB20-156).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study, according to a protocol approved by the National Cancer Institute institutional review board.

Data Availability Statement

The gene expression data (GEO data sets) were obtained from the publicly available database of the NCBI resources webpage, located at https://www.ncbi.nlm.nih.gov/gds (accessed on 15 August 2021).

Acknowledgments

I would like to thank all the researchers and colleagues who contributed to the generation of the GSE93291 and GSE10846 datasets and to The Cancer Genome Atlas (TCGA) program.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Multilayer Perceptron Neural Network Analysis of Mantle Cell Lymphoma (Method 1).
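Gene | Num. genes (top 70%) | Training n | Training % | Testing n | Testing % | Input units | Hidden: num. | Hidden: units | Output: num. | Output: units | Training cross-entropy error | Training incorrect predictions (%) | Training time (min:s) | Testing cross-entropy error | Testing incorrect predictions (%) | Training % correct, observed 0 | Training % correct, observed 1 | Training % correct, overall | Testing % correct, observed 0 | Testing % correct, observed 1 | Testing % correct, overall | AUC
Dead/Alive | 80 | 84 | 68.3 | 39 | 31.7 | 20863 | 1 | 6 | 1 | 2 | 38.2 | 21.4 | 01:04.9 | 10.4 | 12.8 | 67.6 | 86 | 78.6 | 88.9 | 86.7 | 87.2 | 0.90
SYNE1 | 6 | 90 | 73.2 | 33 | 26.8 | 20862 | 1 | 12 | 1 | 2 | 38.5 | 18.9 | 01:05.8 | 8.8 | 9.1 | 59.3 | 90.5 | 81.1 | 66.7 | 96.3 | 90.9 | 0.86
DAZAP1 | 80 | 87 | 70.7 | 36 | 29.3 | 20862 | 1 | 11 | 1 | 2 | 32.0 | 14.9 | 01:06.3 | 6.4 | 5.6 | 64 | 93.5 | 85.1 | 83.3 | 96.7 | 94.4 | 0.92
MYCN | 154 | 85 | 69.1 | 38 | 30.9 | 20862 | 1 | 8 | 1 | 2 | 37.5 | 27.1 | 01:01.5 | 14.4 | 13.2 | 36.4 | 85.7 | 72.9 | 66.7 | 93.1 | 86.8 | 0.82
CXCL12 | 56 | 87 | 70.7 | 36 | 29.3 | 20862 | 1 | 8 | 1 | 2 | 40.5 | 19.5 | 00:57.4 | 10.1 | 8.3 | 44 | 95.2 | 80.5 | 83.3 | 93.3 | 91.7 | 0.83
NOTCH2 | 20 | 84 | 68.3 | 39 | 31.7 | 20862 | 1 | 9 | 1 | 2 | 29.9 | 20.2 | 00:58.2 | 11.8 | 17.9 | 92.3 | 36.8 | 79.8 | 93.1 | 50 | 82.1 | 0.90
CDK4 | 47 | 87 | 70.7 | 36 | 29.3 | 20862 | 1 | 11 | 1 | 2 | 30.4 | 13.8 | 00:51.2 | 13.8 | 22.2 | 91.3 | 66.7 | 86.2 | 100 | 27.3 | 77.8 | 0.89
BMI1 | 25 | 93 | 75.6 | 30 | 24.4 | 20862 | 1 | 8 | 1 | 2 | 53.0 | 26.9 | 00:56.3 | 13.2 | 16.7 | 71.7 | 74.5 | 73.1 | 93.8 | 71.4 | 83.3 | 0.81
ING1 | 94 | 76 | 61.8 | 47 | 38.2 | 20862 | 1 | 10 | 1 | 2 | 36.3 | 17.1 | 00:52.7 | 22.7 | 27.7 | 50 | 93.1 | 82.9 | 30.8 | 88.2 | 72.3 | 0.76
NSD2 | 38 | 91 | 74 | 32 | 26 | 20862 | 1 | 9 | 1 | 2 | 43.0 | 20.9 | 01:04.7 | 15.1 | 15.6 | 82.4 | 75 | 79.1 | 91.7 | 80 | 84.4 | 0.86
PTK2 | 6 | 93 | 75.6 | 30 | 24.4 | 20862 | 1 | 13 | 1 | 2 | 40.2 | 16.1 | 01:07.3 | 7.9 | 10 | 97.1 | 43.5 | 83.9 | 91.3 | 85.7 | 90 | 0.85
PIK3CA | 4 | 76 | 61.8 | 47 | 38.2 | 20862 | 1 | 10 | 1 | 2 | 26.4 | 13.2 | 00:52.4 | 17.7 | 12.8 | 94.8 | 61.1 | 86.8 | 94.3 | 66.7 | 87.2 | 0.88
CHEK1 | 86 | 91 | 74 | 32 | 26 | 20862 | 1 | 9 | 1 | 2 | 45.3 | 27.5 | 00:58.7 | 12.9 | 18.8 | 68.8 | 76.7 | 72.5 | 92.9 | 72.2 | 81.3 | 0.85
CHEK2 | 8 | 90 | 73.2 | 33 | 26.8 | 20862 | 1 | 10 | 1 | 2 | 39.8 | 18.9 | 01:07.6 | 13.0 | 15.2 | 77.3 | 84.8 | 81.1 | 83.3 | 86.7 | 84.8 | 0.88
PIK3CD | 50 | 82 | 66.7 | 41 | 33.3 | 20862 | 1 | 10 | 1 | 2 | 17.6 | 11.0 | 01:08.1 | 14.6 | 14.6 | 90.9 | 86.8 | 89 | 90.9 | 78.9 | 85.4 | 0.96
XIAP | 22 | 85 | 69.1 | 38 | 30.9 | 20862 | 1 | 12 | 1 | 2 | 40.2 | 18.8 | 00:49.9 | 17.7 | 23.7 | 83.7 | 78.6 | 81.2 | 85.7 | 64.7 | 76.3 | 0.87
PAX5 | 23 | 88 | 71.5 | 35 | 28.5 | 20862 | 1 | 7 | 1 | 2 | 45.3 | 27.3 | 00:55.2 | 13.0 | 8.6 | 20 | 93.7 | 72.7 | 50 | 100 | 91.4 | 0.75
BCL2L11 | 12 | 71 | 57.7 | 52 | 42.3 | 20862 | 1 | 5 | 1 | 2 | 29.9 | 19.7 | 00:50.1 | 24.2 | 23.1 | 92.6 | 41.2 | 80.3 | 94.9 | 23.1 | 76.9 | 0.82
BORCS8_MEF2B | 12 | 85 | 69.1 | 38 | 30.9 | 20862 | 1 | 11 | 1 | 2 | 39.2 | 21.2 | 00:53.3 | 11.6 | 10.5 | 40.9 | 92.1 | 78.8 | 55.6 | 100 | 89.5 | 0.83
PTEN | 86 | 84 | 68.3 | 39 | 31.7 | 20862 | 1 | 10 | 1 | 2 | 36.0 | 20.2 | 00:57.0 | 12.2 | 7.7 | 92.1 | 42.9 | 79.8 | 93.3 | 88.9 | 92.3 | 0.85
MYC | 10 | 84 | 68.3 | 39 | 31.7 | 20862 | 1 | 9 | 1 | 2 | 28.9 | 16.7 | 00:56.2 | 14.2 | 20.5 | 87.7 | 68.4 | 83.3 | 96.4 | 36.4 | 79.5 | 0.90
CCND1 | 23 | 87 | 70.7 | 36 | 29.3 | 20862 | 1 | 8 | 1 | 2 | 38.3 | 23.0 | 01:03.5 | 6.7 | 2.8 | 92.3 | 31.8 | 77 | 96.4 | 100 | 97.2 | 0.89
MKI67 | 2 | 93 | 75.6 | 30 | 24.4 | 20862 | 1 | 10 | 1 | 2 | 40.2 | 20.4 | 01:04.6 | 11.7 | 16.7 | 78 | 81.4 | 79.6 | 85.7 | 81.3 | 83.3 | 0.89
CCND2 | 46 | 76 | 61.8 | 47 | 38.2 | 20862 | 1 | 9 | 1 | 2 | 32.4 | 21.1 | 00:54.9 | 17.7 | 14.9 | 90.7 | 50 | 78.9 | 92.3 | 50 | 85.1 | 0.84
CDKN2A | 112 | 91 | 74 | 32 | 26 | 20862 | 1 | 14 | 1 | 2 | 22.0 | 9.9 | 00:53.6 | 11.3 | 21.9 | 94.4 | 73.7 | 90.1 | 91.3 | 44.4 | 78.1 | 0.93
CDKN2C | 6 | 90 | 73.2 | 33 | 26.8 | 20862 | 1 | 8 | 1 | 2 | 46.7 | 26.7 | 00:58.1 | 13.5 | 15.2 | 67.4 | 78.7 | 73.3 | 89.5 | 78.6 | 84.8 | 0.85
TERT | 205 | 82 | 66.7 | 41 | 33.3 | 20862 | 1 | 9 | 1 | 2 | 34.6 | 20.7 | 01:00.8 | 14.9 | 19.5 | 93.7 | 31.6 | 79.3 | 93.3 | 45.5 | 80.5 | 0.85
NOTCH1 | 15 | 85 | 69.1 | 38 | 30.9 | 20862 | 1 | 11 | 1 | 2 | 32.4 | 17.6 | 00:49.1 | 16.3 | 21.1 | 88.2 | 58.8 | 82.4 | 88.5 | 58.3 | 78.9 | 0.85
RB1 | 47 | 88 | 71.5 | 35 | 28.5 | 20862 | 1 | 12 | 1 | 2 | 48.9 | 27.3 | 00:56.3 | 14.3 | 17.1 | 65.1 | 80 | 72.7 | 78.9 | 87.5 | 82.9 | 0.83
Combined | 189 | 91 | 74 | 32 | 26 | 20835 | 1 | 8 | 29 | 58 | 1348.9 | 25.7 | 01:22.2 | 525.3 | 29.4 | - | - | 74.3 | - | - | 70.6 | -
Average | - | 85.9 | 70.1 | 37.1 | 30.2 | 20861 | 1 | 9.6 | - | - | 80.4 | 20.1 | - | 30.6 | 15.8 | 75.0 | 70.8 | 79.9 | 84.2 | 73.5 | 84.2 | 0.9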
Input layer: standardized rescaling method for covariates. Hidden layer: hyperbolic tangent activation function. Output layer: softmax activation function, cross-entropy error function. Model summary, training, one consecutive step(s) with no decrease in error (error computations are based on the testing sample) as stopping rule.
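The footnote describes standardized inputs, a hyperbolic tangent hidden layer, and a softmax output layer with a cross-entropy error. A minimal forward-pass sketch under those settings is shown below; the weights are hypothetical, training and the stopping rule are not shown, and the use of the sample standard deviation for rescaling is an assumption.

```python
import math


def standardize(column):
    """Standardized rescaling of one covariate: (x - mean) / SD (sample SD assumed)."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (len(column) - 1))
    return [(x - mean) / sd for x in column]


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One tanh hidden layer followed by a softmax output layer."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w_out, b_out)]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Cross-entropy error of one case whose true class index is `target`."""
    return -math.log(probs[target])


# Hypothetical example: 2 standardized inputs, 3 hidden units, 2 classes (alive/dead).
x = [0.5, -1.2]
w_hidden = [[0.1, -0.4], [0.3, 0.2], [-0.5, 0.1]]
b_hidden = [0.0, 0.1, -0.1]
w_out = [[0.7, -0.2, 0.4], [-0.6, 0.3, -0.1]]
b_out = [0.0, 0.0]
probs = forward(x, w_hidden, b_hidden, w_out, b_out)
print(probs, cross_entropy(probs, target=1))
```

The cross-entropy errors in Table A1 are totals over all training or testing cases, so they scale with sample size and with the number of output variables (hence the much larger values in the combined model).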
Table A2. Radial Basis Function Neural Network Analysis of Mantle Cell Lymphoma (Method 1).
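Gene | Num. genes (top 70%) | Training n | Training % | Testing n | Testing % | Input units | Hidden: num. | Hidden: units | Output: num. | Output: units | Training sum of squares error | Training incorrect predictions (%) | Training time (min:s) | Testing sum of squares error | Testing incorrect predictions (%) | Training % correct, observed 0 | Training % correct, observed 1 | Training % correct, overall | Testing % correct, observed 0 | Testing % correct, observed 1 | Testing % correct, overall | AUC
Dead/Alive | 37 | 92 | 74.8 | 31 | 25.2 | 20863 | 1 | 8 | 1 | 2 | 16.9 | 27.2 | 04:13.3 | 6.7 | 38.7 | 45.5 | 88.1 | 72.8 | 10.0 | 85.7 | 61.3 | 0.73
SYNE1 | 18 | 85 | 69.1 | 38 | 30.9 | 20862 | 1 | 8 | 1 | 2 | 10.4 | 17.6 | 02:46.3 | 7.4 | 23.7 | 40.9 | 96.8 | 82.4 | 27.3 | 96.3 | 76.3 | 0.79
DAZAP1 | 28 | 80 | 65 | 43 | 35 | 20862 | 1 | 6 | 1 | 2 | 8.2 | 16.3 | 02:24.1 | 3.1 | 9.3 | 81.8 | 84.5 | 83.8 | 100.0 | 88.2 | 90.7 | 0.93
MYCN | 48 | 82 | 66.7 | 41 | 33.3 | 20862 | 1 | 6 | 1 | 2 | 11.1 | 20.7 | 02:32.2 | 7.4 | 31.7 | 30.0 | 95.2 | 79.3 | 9.1 | 90.0 | 68.3 | 0.78
CXCL12 | 50 | 82 | 66.7 | 41 | 33.3 | 20862 | 1 | 5 | 1 | 2 | 12.7 | 22.0 | 02:39.9 | 8.2 | 26.8 | 10.0 | 100.0 | 78.0 | 0.0 | 100.0 | 73.2 | 0.74
NOTCH2 | 29 | 92 | 74.8 | 31 | 25.2 | 20862 | 1 | 10 | 1 | 2 | 11.7 | 15.2 | 03:18.6 | 4.9 | 25.8 | 98.6 | 35.0 | 84.8 | 100.0 | 11.1 | 74.2 | 0.80
CDK4 | 16 | 82 | 66.7 | 41 | 33.3 | 20862 | 1 | 10 | 1 | 2 | 11.4 | 20.7 | 02:21.8 | 4.9 | 17.1 | 98.3 | 27.3 | 79.3 | 100.0 | 0.0 | 82.9 | 0.83
BMI1 | 41 | 90 | 73.2 | 33 | 26.8 | 20862 | 1 | 5 | 1 | 2 | 20.0 | 34.4 | 03:21.6 | 7.4 | 39.4 | 77.6 | 51.2 | 65.6 | 100.0 | 35.0 | 60.6 | 0.70
ING1 | 40 | 79 | 64.2 | 44 | 35.8 | 20862 | 1 | 4 | 1 | 2 | 14.8 | 26.6 | 02:14.7 | 7.6 | 22.7 | 0.0 | 100.0 | 73.4 | 0.0 | 100.0 | 77.3 | 0.60
NSD2 | 39 | 92 | 74.8 | 31 | 25.2 | 20862 | 1 | 10 | 1 | 2 | 13.6 | 20.7 | 03:11.6 | 4.1 | 9.7 | 85.7 | 72.1 | 79.3 | 85.7 | 94.1 | 90.3 | 0.88
PTK2 | 19 | 90 | 73.2 | 33 | 26.8 | 20862 | 1 | 3 | 1 | 2 | 16.2 | 24.4 | 03:15.7 | 5.8 | 24.2 | 100.0 | 0.0 | 75.6 | 100.0 | 0.0 | 75.8 | 0.64
PIK3CA | 46 | 79 | 64.2 | 44 | 35.8 | 20862 | 1 | 8 | 1 | 2 | 12.5 | 24.1 | 02:23.1 | 7.7 | 25.0 | 93.3 | 21.1 | 75.9 | 100.0 | 0.0 | 75.0 | 0.74
CHEK1 | 51 | 92 | 74.8 | 31 | 25.2 | 20862 | 1 | 8 | 1 | 2 | 16.4 | 26.1 | 03:12.5 | 7.0 | 41.9 | 78.6 | 70.0 | 73.9 | 50.0 | 72.7 | 58.1 | 0.80
CHEK2 | 80 | 88 | 71.5 | 35 | 28.5 | 20862 | 1 | 9 | 1 | 2 | 13.5 | 25.0 | 02:57.1 | 5.9 | 22.9 | 59.1 | 90.9 | 75.0 | 66.7 | 88.2 | 77.1 | 0.86
PIK3CD | 47 | 79 | 64.2 | 44 | 35.8 | 20862 | 1 | 3 | 1 | 2 | 12.1 | 20.3 | 02:15.3 | 8.0 | 27.3 | 66.7 | 90.7 | 79.7 | 63.3 | 92.9 | 72.9 | 0.83
XIAP | 89 | 79 | 64.2 | 44 | 35.8 | 20862 | 1 | 8 | 1 | 2 | 10.7 | 17.7 | 02:20.4 | 11.0 | 43.2 | 88.4 | 75.0 | 82.3 | 66.7 | 47.8 | 56.8 | 0.80
PAX5 | 81 | 89 | 72.4 | 34 | 27.6 | 20862 | 1 | 9 | 1 | 2 | 14.5 | 24.7 | 02:55.3 | 6.0 | 26.5 | 13.0 | 97.0 | 75.3 | 0.0 | 96.2 | 73.5 | 0.71
BCL2L11 | 28 | 88 | 71.5 | 35 | 28.5 | 20862 | 1 | 8 | 1 | 2 | 10.9 | 14.8 | 02:51.2 | 4.1 | 14.3 | 100.0 | 43.5 | 85.2 | 96.4 | 42.9 | 85.7 | 0.86
BORCS8_MEF2B | 41 | 86 | 69.9 | 37 | 30.1 | 20862 | 1 | 3 | 1 | 2 | 13.8 | 23.3 | 02:45.9 | 5.8 | 18.9 | 19.0 | 95.4 | 76.7 | 30.0 | 100.0 | 81.1 | 0.76
PTEN | 23 | 92 | 74.8 | 31 | 25.2 | 20862 | 1 | 7 | 1 | 2 | 11.1 | 16.3 | 03:14.2 | 3.5 | 12.9 | 95.4 | 55.6 | 83.7 | 92.9 | 33.3 | 87.1 | 0.84
MYC | 18 | 92 | 74.8 | 31 | 25.2 | 20862 | 1 | 9 | 1 | 2 | 9.8 | 16.3 | 03:31.2 | 4.1 | 25.8 | 91.8 | 52.6 | 83.7 | 95.0 | 36.4 | 74.2 | 0.90
CCND1 | 42 | 82 | 66.7 | 41 | 33.3 | 20862 | 1 | 10 | 1 | 2 | 11.2 | 19.5 | 02:29.4 | 6.0 | 26.8 | 88.3 | 59.1 | 80.5 | 87.9 | 12.5 | 73.2 | 0.81
MKI67 | 37 | 90 | 73.2 | 33 | 26.8 | 20862 | 1 | 10 | 1 | 2 | 12.6 | 21.1 | 03:00.8 | 5.0 | 21.2 | 88.0 | 67.5 | 78.9 | 78.6 | 78.9 | 78.8 | 0.89
CCND2 | 40 | 79 | 64.2 | 44 | 35.8 | 20862 | 1 | 4 | 1 | 2 | 12.3 | 24.1 | 02:14.5 | 7.6 | 25.0 | 100.0 | 0.0 | 75.9 | 100.0 | 0.0 | 75.0 | 0.74
CDKN2A | 56 | 92 | 74.8 | 31 | 25.2 | 20862 | 1 | 6 | 1 | 2 | 14.1 | 20.7 | 03:02.7 | 5.0 | 25.8 | 97.2 | 15.0 | 79.3 | 100.0 | 0.0 | 74.2 | 0.73
CDKN2C | 34 | 88 | 71.5 | 35 | 28.5 | 20862 | 1 | 9 | 1 | 2 | 17.6 | 21.6 | 02:50.9 | 8.9 | 34.3 | 86.8 | 72.0 | 78.4 | 58.3 | 81.8 | 65.7 | 0.78
TERT | 58 | 79 | 64.2 | 44 | 35.8 | 20862 | 1 | 10 | 1 | 2 | 10.3 | 17.7 | 02:17.2 | 10.0 | 27.3 | 93.7 | 37.5 | 82.3 | 100.0 | 14.3 | 72.7 | 0.71
NOTCH1 | 71 | 79 | 64.2 | 44 | 35.8 | 20862 | 1 | 3 | 1 | 2 | 12.4 | 22.8 | 02:14.6 | 7.3 | 25.0 | 100.0 | 0.0 | 77.2 | 100.0 | 0.0 | 75.0 | 0.74
RB1 | 87 | 89 | 72.4 | 34 | 27.6 | 20862 | 1 | 2 | 1 | 2 | 22.2 | 47.2 | 02:55.3 | 8.7 | 55.9 | 100.0 | 0.0 | 52.8 | 100.0 | 0.0 | 44.1 | 0.49
Combined | 87 | 93 | 75.6 | 30 | 24.4 | 20835 | 1 | 14 | 29 | 58 | 366.4 | 20.4 | 09:53.4 | 147.2 | 23.7 | - | - | 79.6 | - | - | 76.3 | -
Average | - | 86.0 | 69.9 | 37.0 | 30.1 | 20861 | 1 | 7.2 | - | - | 25.0 | 22.3 | - | 11.2 | 26.4 | 73.4 | 58.4 | 77.7 | 69.6 | 51.7 | 73.6 | 0.77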
Input layer: standardized rescaling method for covariates. Hidden layer: softmax activation function. Output layer: identity activation function, sum of squares error function. Model summary, testing, sum of squares error (the number of hidden units is determined by the testing data criterion: the “best” number of hidden units is the one that yields the smallest error in the testing data).
Table A3. Multivariate Cox regression analysis for predicting overall survival outcome (Method 1).
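No. | Gene | B | SE | Wald | df | p value | Hazard ratio (HR) | 95.0% CI for HR, lower | 95.0% CI for HR, upper
1 | KIF18A | 2.7 | 0.3 | 58.3 | 1 | <0.001 | 14.2 | 7.2 | 28.1
2 | YBX3 | 0.8 | 0.2 | 19.0 | 1 | <0.001 | 2.2 | 1.6 | 3.2
3 | GCNA | 0.9 | 0.2 | 14.6 | 1 | <0.001 | 2.5 | 1.6 | 4.1
4 | POGLUT3 | 1.2 | 0.3 | 13.4 | 1 | <0.001 | 3.2 | 1.7 | 6.0
5 | AMOTL2 | 0.9 | 0.3 | 10.1 | 1 | 0.001 | 2.5 | 1.4 | 4.3
6 | RAB13 | 1.2 | 0.4 | 9.8 | 1 | 0.002 | 3.3 | 1.6 | 7.0
7 | ZCCHC4 | 1.1 | 0.3 | 9.5 | 1 | 0.002 | 2.9 | 1.5 | 5.7
8 | PEMT | 0.6 | 0.2 | 8.4 | 1 | 0.004 | 1.9 | 1.2 | 2.8
9 | RRAS | 0.8 | 0.4 | 4.7 | 1 | 0.029 | 2.2 | 1.1 | 4.4
10 | PALLD | 0.6 | 0.3 | 3.9 | 1 | 0.048 | 1.8 | 1.0 | 3.1
11 | ADAMDEC1 | 0.7 | 0.4 | 3.5 | 1 | 0.063 | 1.9 | 1.0 | 3.9
12 | ADGRG2 | 0.4 | 0.2 | 2.8 | 1 | 0.094 | 1.5 | 0.9 | 2.3
13 | IGFBP7 | −1.5 | 0.3 | 20.3 | 1 | <0.001 | 0.2 | 0.1 | 0.4
14 | TMEM176B | −1.6 | 0.4 | 18.9 | 1 | <0.001 | 0.2 | 0.1 | 0.4
15 | SELENOP | −1.0 | 0.2 | 15.6 | 1 | <0.001 | 0.4 | 0.2 | 0.6
16 | RPGRIP1L | −0.5 | 0.1 | 10.5 | 1 | 0.001 | 0.6 | 0.5 | 0.8
17 | TAMM41 | −0.8 | 0.3 | 7.5 | 1 | 0.006 | 0.4 | 0.3 | 0.8
18 | KCTD12 | −1.2 | 0.5 | 7.5 | 1 | 0.006 | 0.3 | 0.1 | 0.7
19 | RGS1 | −0.4 | 0.2 | 4.5 | 1 | 0.034 | 0.7 | 0.5 | 1.0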
Cox regression, backward conditional.
Table A4. Multivariate Cox regression overall survival analysis between MKI67 and the 10 highlighted genes (Method 1).
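Gene | B | SE | Wald | df | p value | HR | 95.0% CI for HR, lower | 95.0% CI for HR, upper
MKI67 | 1.3 | 0.3 | 20.5 | 1 | <0.001 | 3.8 | 2.1 | 6.8
YBX3 | 0.9 | 0.3 | 11.3 | 1 | 0.001 | 2.6 | 1.5 | 4.4
SELENOP | −0.5 | 0.3 | 3.0 | 1 | 0.085 | 0.6 | 0.3 | 1.1
POGLUT3 | 0.6 | 0.2 | 6.9 | 1 | 0.009 | 1.9 | 1.2 | 3.1
ADGRG2 | −0.7 | 0.3 | 4.5 | 1 | 0.035 | 0.5 | 0.2 | 0.9
GCNA | 0.8 | 0.3 | 5.3 | 1 | 0.021 | 2.2 | 1.1 | 4.2
KIF18A | 1.5 | 0.3 | 26.6 | 1 | <0.001 | 4.3 | 2.5 | 7.6
PEMT | 0.8 | 0.3 | 6.6 | 1 | 0.010 | 2.1 | 1.2 | 3.8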
Multivariate Cox regression analysis, backward conditional method. HR, hazard ratio. Note: Only 8 genes are shown because, in the backward conditional method, variables that do not meet the retention criterion are eliminated.
Table A5. Multilayer perceptron analysis of the immuno-oncology pathways (Method 2).
Pathway | Num. genes (top 70%) | Training n | Training % | Testing n | Testing % | Input units | Hidden: num. | Hidden: units | Output: num. | Output: units | Training cross-entropy error | Training incorrect predictions (%) | Training time (min:s) | Testing cross-entropy error | Testing incorrect predictions (%) | Training % correct, observed alive | Training % correct, observed dead | Training % correct, overall | Testing % correct, observed alive | Testing % correct, observed dead | Testing % correct, overall | AUC
Cancer Transcriptome | 13 | 84 | 68.3 | 39 | 31.7 | 1785 | 1 | 6 | 1 | 2 | 41.1 | 27.4 | 00:03.9 | 17.6 | 23.1 | 58.8 | 82.0 | 72.6 | 55.6 | 83.3 | 76.9 | 0.84
Pan Cancer Human IO360 | 15 | 84 | 68.3 | 39 | 31.7 | 727 | 1 | 8 | 1 | 2 | 22.5 | 13.1 | 00:01.4 | 14.7 | 15.4 | 82.4 | 90.0 | 86.9 | 88.9 | 83.3 | 84.6 | 0.94
Pan Cancer Immune Profiling | 1 | 84 | 68.3 | 39 | 31.7 | 707 | 1 | 5 | 1 | 2 | 44.9 | 26.2 | 00:01.5 | 15.0 | 12.8 | 64.7 | 80.0 | 73.8 | 88.9 | 86.7 | 87.2 | 0.82
Pan Cancer Progression | 18 | 84 | 68.3 | 39 | 31.7 | 715 | 1 | 11 | 1 | 2 | 51.2 | 32.1 | 00:01.7 | 18.7 | 12.8 | 29.4 | 94.0 | 67.9 | 66.7 | 93.3 | 87.2 | 0.74
Pan Cancer Pathways | 6 | 84 | 68.3 | 39 | 31.7 | 712 | 1 | 8 | 1 | 2 | 36.9 | 21.4 | 00:01.8 | 16.8 | 15.4 | 67.6 | 86.0 | 78.6 | 77.8 | 86.7 | 84.6 | 0.89
Metabolic Pathways | 27 | 84 | 68.3 | 39 | 31.7 | 737 | 1 | 14 | 1 | 2 | 39.8 | 22.6 | 00:01.6 | 13.7 | 17.9 | 55.9 | 92.0 | 77.4 | 66.7 | 86.7 | 82.1 | 0.87
Immune Exhaustion | 12 | 84 | 68.3 | 39 | 31.7 | 720 | 1 | 10 | 1 | 2 | 47.2 | 31.0 | 00:01.6 | 18.2 | 17.9 | 50.0 | 82.0 | 69.0 | 66.7 | 86.7 | 82.1 | 0.79
Human Inflammation | 23 | 84 | 68.3 | 39 | 31.7 | 247 | 1 | 9 | 1 | 2 | 33.7 | 17.9 | 00:00.6 | 16.6 | 23.1 | 73.5 | 88.0 | 82.1 | 55.6 | 83.3 | 76.9 | 0.89
Host Response | 8 | 84 | 68.3 | 39 | 31.7 | 747 | 1 | 9 | 1 | 2 | 41.1 | 21.4 | 00:01.6 | 18.1 | 20.5 | 67.6 | 86.0 | 78.6 | 66.7 | 83.3 | 79.5 | 0.83
Autoimmune | 13 | 84 | 68.3 | 39 | 31.7 | 719 | 1 | 10 | 1 | 2 | 11.9 | 6.0 | 00:01.5 | 12.5 | 10.3 | 88.2 | 98.0 | 94.0 | 88.9 | 90.0 | 89.7 | 0.98
Organ Transplantation | 12 | 84 | 68.3 | 39 | 31.7 | 728 | 1 | 11 | 1 | 2 | 41.5 | 21.4 | 00:01.6 | 15.7 | 10.3 | 64.7 | 88.0 | 78.6 | 88.9 | 90.0 | 89.7 | 0.85
Input layer: standardized rescaling method for covariates. Hidden layer: hyperbolic tangent activation function. Output layer: softmax activation function, cross-entropy error function. Model summary, training, one consecutive step(s) with no decrease in error (error computations are based on the testing sample) as stopping rule.
Table A6. Overall survival of the pan cancer series using the risk-scores.
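Subtype | Overall (n) | Low-risk (n) | High-risk (n) | K–M log-rank p value | Cox p value | Cox HR | 95% CI for HR, lower | 95% CI for HR, upper
Breast | 962 | 821 | 141 | 4.0 × 10⁻¹⁷ | 6.5 × 10⁻¹⁵ | 4.0 | 2.8 | 5.6
Lung | 475 | 426 | 49 | 1.0 × 10⁻¹⁰ | 1.1 × 10⁻⁹ | 3.3 | 2.3 | 4.9
Prostate | 497 | 446 | 51 | 1.5 × 10⁻⁴ | 2.0 × 10⁻³ | 9.2 | 2.3 | 37.2
Colorectal | 466 | 415 | 51 | 1.4 × 10⁻⁵ | 3.3 × 10⁻⁵ | 2.9 | 1.7 | 4.8
Cervix | 191 | 169 | 22 | 3.4 × 10⁻¹⁰ | 8.9 × 10⁻⁸ | 7.7 | 3.6 | 16.2
Stomach | 440 | 293 | 147 | 2.6 × 10⁻⁴ | 3.1 × 10⁻⁴ | 1.8 | 1.3 | 2.4
Skin (melanoma) | 335 | 177 | 158 | 3.2 × 10⁻¹⁰ | 1.3 × 10⁻⁹ | 2.6 | 1.9 | 3.5
Bladder | 389 | 207 | 182 | 9.2 × 10⁻¹³ | 9.7 × 10⁻¹² | 3.0 | 2.2 | 4.1
Ovary | 247 | 217 | 30 | 0.6 × 10⁻⁵ | 1.5 × 10⁻⁵ | 2.9 | 1.8 | 4.6
DLBCL | 414 | 289 | 125 | 3.3 × 10⁻¹⁶ | 1.5 × 10⁻¹⁴ | 3.3 | 2.5 | 4.5
Kidney | 792 | 470 | 322 | 5.9 × 10⁻¹⁷ | 2.5 × 10⁻¹⁵ | 3.2 | 2.4 | 4.3
Uterus (endometrium) | 247 | 214 | 33 | 5.5 × 10⁻¹¹ | 2.4 × 10⁻⁸ | 7.4 | 3.7 | 15.0
Leukemia (AML) | 149 | 115 | 34 | 1.9 × 10⁻¹⁴ | 7.0 × 10⁻¹² | 5.5 | 3.4 | 9.0
Pancreas | 176 | 109 | 67 | 0.4 × 10⁻⁵ | 9.0 × 10⁻⁶ | 2.6 | 1.7 | 3.9
Thyroid | 489 | 434 | 55 | 9.9 × 10⁻¹² | 6.4 × 10⁻⁷ | 17.4 | 5.6 | 53.5
Liver | 361 | 197 | 164 | 6.7 × 10⁻¹⁰ | 4.0 × 10⁻⁹ | 3.0 | 2.1 | 4.3
CNS (GBM) | 659 | 209 | 450 | 2.6 × 10⁻¹⁷ | 8.9 × 10⁻¹⁵ | 4.5 | 3.1 | 6.6
Overall | 7289 | 5208 | 2081 | 2.8 × 10⁻¹⁷⁸ | 2.5 × 10⁻¹⁵⁹ | 3.3 | 2.9 | 3.6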
K–M, Kaplan–Meier; HR, hazard ratio; DLBCL, diffuse large B-cell lymphoma; AML, acute myeloid leukemia; CNS, central nervous system; GBM, glioblastoma multiforme. This analysis is univariate.
Figure A1. Differential gene expression of the set of 19 genes per cancer subtype. Based on a risk-score formula and the gene expression of 19 genes, the overall survival for each risk-group could be calculated. The contribution in the prognosis for each gene is shown on the right. This Figure is complementary to Figure 9.

References

1. Swerdlow, S.H.; Campo, E.; Pileri, S.A.; Harris, N.L.; Stein, H.; Siebert, R.; Advani, R.; Ghielmini, M.; Salles, G.A.; Zelenetz, A.D.; et al. The 2016 revision of the World Health Organization classification of lymphoid neoplasms. Blood 2016, 127, 2375–2390.
2. Armitage, J.O. A clinical evaluation of the International Lymphoma Study Group classification of non-Hodgkin’s lymphoma. The Non-Hodgkin’s Lymphoma Classification Project. Blood 1997, 89, 3909–3918.
3. Armitage, J.O.; Weisenburger, D.D. New approach to classifying non-Hodgkin’s lymphomas: Clinical features of the major histologic subtypes. Non-Hodgkin’s Lymphoma Classification Project. J. Clin. Oncol. 1998, 16, 2780–2795.
4. Sant, M.; Allemani, C.; Tereanu, C.; De Angelis, R.; Capocaccia, R.; Visser, O.; Marcos-Gragera, R.; Maynadie, M.; Simonetti, A.; Lutz, J.M.; et al. Incidence of hematologic malignancies in Europe by morphologic subtype: Results of the HAEMACARE project. Blood 2010, 116, 3724–3734.
5. Shivdasani, R.A.; Hess, J.L.; Skarin, A.T.; Pinkus, G.S. Intermediate lymphocytic lymphoma: Clinical and pathologic features of a recently characterized subtype of non-Hodgkin’s lymphoma. J. Clin. Oncol. 1993, 11, 802–811.
6. Smith, A.; Howell, D.; Patmore, R.; Jack, A.; Roman, E. Incidence of haematological malignancy by sub-type: A report from the Haematological Malignancy Research Network. Br. J. Cancer 2011, 105, 1684–1692.
7. Zhou, Y.; Wang, H.; Fang, W.; Romaguer, J.E.; Zhang, Y.; Delasalle, K.B.; Kwak, L.; Yi, Q.; Du, X.L.; Wang, M. Incidence trends of mantle cell lymphoma in the United States between 1992 and 2004. Cancer 2008, 113, 791–798.
8. Freedman, A.S.; Aster, J.C. Clinical manifestations, pathologic features, and diagnosis of mantle cell lymphoma. In UpToDate; Wolters Kluwer: Waltham, MA, USA, 2021.
9. Campo, E.; Raffeld, M.; Jaffe, E.S. Mantle-cell lymphoma. Semin. Hematol. 1999, 36, 115–127.
10. Tsujimoto, Y.; Yunis, J.; Onorato-Showe, L.; Erikson, J.; Nowell, P.C.; Croce, C.M. Molecular cloning of the chromosomal breakpoint of B-cell lymphomas and leukemias with the t(11;14) chromosome translocation. Science 1984, 224, 1403–1406.
11. De Wolf-Peeters, C.; Pittaluga, S. Mantle-cell lymphoma. Ann. Oncol. 1994, 5 (Suppl. 1), 35–37.
12. Bertoni, F.; Zucca, E.; Genini, D.; Cazzaniga, G.; Roggero, E.; Ghielmini, M.; Cavalli, F.; Biondi, A. Immunoglobulin light chain kappa deletion rearrangement as a marker of clonality in mantle cell lymphoma. Leuk. Lymphoma 1999, 36, 147–150.
13. Argatoff, L.H.; Connors, J.M.; Klasa, R.J.; Horsman, D.E.; Gascoyne, R.D. Mantle cell lymphoma: A clinicopathologic study of 80 cases. Blood 1997, 89, 2067–2078.
14. Romaguera, J.E.; Medeiros, L.J.; Hagemeister, F.B.; Fayad, L.E.; Rodriguez, M.A.; Pro, B.; Younes, A.; McLaughlin, P.; Goy, A.; Sarris, A.H.; et al. Frequency of gastrointestinal involvement and its clinical significance in mantle cell lymphoma. Cancer 2003, 97, 586–591.
15. Ferrer, A.; Salaverria, I.; Bosch, F.; Villamor, N.; Rozman, M.; Bea, S.; Gine, E.; Lopez-Guillermo, A.; Campo, E.; Montserrat, E. Leukemic involvement is a common feature in mantle cell lymphoma. Cancer 2007, 109, 2473–2480.
16. Brown, J.R.; Freedman, A.S.; Aster, J.C.; Lister, A.; Rosmarin, A. Pathobiology of mantle cell lymphoma. In UpToDate; Wolters Kluwer: Waltham, MA, USA, 2020.
17. Beekman, R.; Amador, V.; Campo, E. SOX11, a key oncogenic factor in mantle cell lymphoma. Curr. Opin. Hematol. 2018, 25, 299–306.
18. Hoster, E.; Dreyling, M.; Klapper, W.; Gisselbrecht, C.; van Hoof, A.; Kluin-Nelemans, H.C.; Pfreundschuh, M.; Reiser, M.; Metzner, B.; Einsele, H.; et al. A new prognostic index (MIPI) for patients with advanced-stage mantle cell lymphoma. Blood 2008, 111, 558–565.
19. Moller, M.B.; Pedersen, N.T.; Christensen, B.E. Mantle cell lymphoma: Prognostic capacity of the Follicular Lymphoma International Prognostic Index. Br. J. Haematol. 2006, 133, 43–49.
20. Meusers, P.; Engelhard, M.; Bartels, H.; Binder, T.; Fulle, H.H.; Gorg, K.; Gunzer, U.; Havemann, K.; Kayser, W.; Konig, E.; et al. Multicentre randomized therapeutic trial for advanced centrocytic lymphoma: Anthracycline does not improve the prognosis. Hematol. Oncol. 1989, 7, 365–380.
21. Berger, F.; Felman, P.; Sonet, A.; Salles, G.; Bastion, Y.; Bryon, P.A.; Coiffier, B. Nonfollicular small B-cell lymphomas: A heterogeneous group of patients with distinct clinical features and outcome. Blood 1994, 83, 2829–2835.
22. Hartmann, E.; Fernandez, V.; Moreno, V.; Valls, J.; Hernandez, L.; Bosch, F.; Abrisqueta, P.; Klapper, W.; Dreyling, M.; Hoster, E.; et al. Five-gene model to predict survival in mantle-cell lymphoma using frozen or formalin-fixed, paraffin-embedded tissue. J. Clin. Oncol. 2008, 26, 4966–4972.
23. Tiemann, M.; Schrader, C.; Klapper, W.; Dreyling, M.H.; Campo, E.; Norton, A.; Berger, F.; Kluin, P.; Ott, G.; Pileri, S.; et al. Histopathology, cell proliferation indices and clinical outcome in 304 patients with mantle cell lymphoma (MCL): A clinicopathological study from the European MCL Network. Br. J. Haematol. 2005, 131, 29–38.
24. Raty, R.; Franssila, K.; Jansson, S.E.; Joensuu, H.; Wartiovaara-Kautto, U.; Elonen, E. Predictive factors for blastoid transformation in the common variant of mantle cell lymphoma. Eur. J. Cancer 2003, 39, 321–329.
25. Andersen, N.S.; Jensen, M.K.; de Nully Brown, P.; Geisler, C.H. A Danish population-based analysis of 105 mantle cell lymphoma patients: Incidences, clinical features, response, survival and prognostic factors. Eur. J. Cancer 2002, 38, 401–408.
26. Matutes, E.; Parry-Jones, N.; Brito-Babapulle, V.; Wotherspoon, A.; Morilla, R.; Atkinson, S.; Elnenaei, M.O.; Jain, P.; Giustolisi, G.M.; A’Hern, R.P.; et al. The leukemic presentation of mantle-cell lymphoma: Disease features and prognostic factors in 58 patients. Leuk. Lymphoma 2004, 45, 2007–2015.
27. Fisher, R.I.; Dahlberg, S.; Nathwani, B.N.; Banks, P.M.; Miller, T.P.; Grogan, T.M. A clinical analysis of two indolent lymphoma entities: Mantle cell lymphoma and marginal zone lymphoma (including the mucosa-associated lymphoid tissue and monocytoid B-cell subcategories): A Southwest Oncology Group study. Blood 1995, 85, 1075–1082.
28. Jain, P.; Wang, M. Mantle cell lymphoma: 2019 update on the diagnosis, pathogenesis, prognostication, and management. Am. J. Hematol. 2019, 94, 710–725.
29. Nadeu, F.; Martin-Garcia, D.; Clot, G.; Diaz-Navarro, A.; Duran-Ferrer, M.; Navarro, A.; Vilarrasa-Blasi, R.; Kulis, M.; Royo, R.; Gutierrez-Abril, J.; et al. Genomic and epigenomic insights into the origin, pathogenesis, and clinical behavior of mantle cell lymphoma subtypes. Blood 2020, 136, 1419–1432.
30. Navarro, A.; Bea, S.; Jares, P.; Campo, E. Molecular Pathogenesis of Mantle Cell Lymphoma. Hematol. Oncol. Clin. N. Am. 2020, 34, 795–807.
31. Roue, G.; Sola, B. Management of Drug Resistance in Mantle Cell Lymphoma. Cancers 2020, 12, 1565.
32. IBM. IBM SPSS Neural Networks 26; IBM: Armonk, NY, USA, 2019.
33. IBM. IBM SPSS Neural Networks; New tools for building predictive models; YTD03119-GBEN-01; IBM: Somers, NY, USA, 2012.
34. Banihabib, M.E.; Bandari, R.; Valipour, M. Improving Daily Peak Flow Forecasts Using Hybrid Fourier-Series Autoregressive Integrated Moving Average and Recurrent Artificial Neural Network Models. AI 2020, 1, 263–275.
35. Carreras, J.; Kikuti, Y.Y.; Miyaoka, M.; Hiraiwa, S.; Tomita, S.; Ikoma, H.; Kondo, Y.; Ito, A.; Nakamura, N.; Hamoudi, R. A Combination of Multilayer Perceptron, Radial Basis Function Artificial Neural Networks and Machine Learning Image Segmentation for the Dimension Reduction and the Prognosis Assessment of Diffuse Large B-Cell Lymphoma. AI 2021, 2, 106–134.
36. Carreras, J.; Kikuti, Y.Y.; Miyaoka, M.; Hiraiwa, S.; Tomita, S.; Ikoma, H.; Kondo, Y.; Ito, A.; Shiraiwa, S.; Hamoudi, R.; et al. A Single Gene Expression Set Derived from Artificial Intelligence Predicted the Prognosis of Several Lymphoma Subtypes; and High Immunohistochemical Expression of TNFAIP8 Associated with Poor Prognosis in Diffuse Large B-Cell Lymphoma. AI 2020, 1, 342–360.
37. Carreras, J.; Kikuti, Y.Y.; Miyaoka, M.; Hiraiwa, S.; Tomita, S.; Ikoma, H.; Kondo, Y.; Ito, A.; Nakamura, N.; Hamoudi, R. Artificial Intelligence Analysis of the Gene Expression of Follicular Lymphoma Predicted the Overall Survival and Correlated with the Immune Microenvironment Response Signatures. Mach. Learn. Knowl. Extr. 2020, 2, 647–671.
38. Lin, H.; Zheng, W.; Peng, X. Orientation-Encoding CNN for Point Cloud Classification and Segmentation. Mach. Learn. Knowl. Extr. 2021, 3, 601–614.
39. Mayr, F.; Yovine, S.; Visca, R. Property Checking with Interpretable Error Characterization for Recurrent Neural Networks. Mach. Learn. Knowl. Extr. 2021, 3, 205–227.
40. Pickens, A.; Sengupta, S. Benchmarking Studies Aimed at Clustering and Classification Tasks Using K-Means, Fuzzy C-Means and Evolutionary Neural Networks. Mach. Learn. Knowl. Extr. 2021, 3, 695–719.
  41. Shah, S.A.A.; Manzoor, M.A.; Bais, A. Canopy Height Estimation at Landsat Resolution Using Convolutional Neural Networks. Mach. Learn. Knowl. Extr. 2020, 2, 23–36. [Google Scholar] [CrossRef] [Green Version]
  42. Silva Araújo, V.J.; Guimarães, A.J.; de Campos Souza, P.V.; Rezende, T.S.; Araújo, V.S. Using Resistin, Glucose, Age and BMI and Pruning Fuzzy Neural Network for the Construction of Expert Systems in the Prediction of Breast Cancer. Mach. Learn. Knowl. Extr. 2019, 1, 466–482. [Google Scholar] [CrossRef] [Green Version]
  43. Škrlj, B.; Kralj, J.; Lavrač, N.; Pollak, S. Towards Robust Text Classification with Semantics-Aware Recurrent Neural Architecture. Mach. Learn. Knowl. Extr. 2019, 1, 575–589. [Google Scholar] [CrossRef] [Green Version]
  44. Knapič, S.; Malhi, A.; Saluja, R.; Främling, K. Explainable Artificial Intelligence for Human Decision Support System in the Medical Domain. Mach. Learn. Knowl. Extr. 2021, 3, 740–770. [Google Scholar] [CrossRef]
  45. Carreras, J.; Hamoudi, R.; Nakamura, N. Artificial Intelligence Analysis of Gene Expression Data Predicted the Prognosis of Patients with Diffuse Large B-Cell Lymphoma. Tokai J. Exp. Clin. Med. 2020, 45, 37–48. [Google Scholar]
  46. Carreras, J.; Hamoudi, R. Artificial Neural Network Analysis of Gene Expression Data Predicted Non-Hodgkin Lymphoma Subtypes with High Accuracy. Mach. Learn. Knowl. Extr. 2021, 3, 720–739. [Google Scholar] [CrossRef]
  47. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2013.
  48. Subramanian, A.; Tamayo, P.; Mootha, V.K.; Mukherjee, S.; Ebert, B.L.; Gillette, M.A.; Paulovich, A.; Pomeroy, S.L.; Golub, T.R.; Lander, E.S.; et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 2005, 102, 15545–15550.
  49. Mootha, V.K.; Lindgren, C.M.; Eriksson, K.F.; Subramanian, A.; Sihag, S.; Lehar, J.; Puigserver, P.; Carlsson, E.; Ridderstrale, M.; Laurila, E.; et al. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 2003, 34, 267–273.
  50. Scott, D.W.; Abrisqueta, P.; Wright, G.W.; Slack, G.W.; Mottok, A.; Villa, D.; Jares, P.; Rauert-Wunderlich, H.; Royo, C.; Clot, G.; et al. New Molecular Assay for the Proliferation Signature in Mantle Cell Lymphoma Applicable to Formalin-Fixed Paraffin-Embedded Biopsies. J. Clin. Oncol. 2017, 35, 1668–1677.
  51. Rosenwald, A.; Wright, G.; Wiestner, A.; Chan, W.C.; Connors, J.M.; Campo, E.; Gascoyne, R.D.; Grogan, T.M.; Muller-Hermelink, H.K.; Smeland, E.B.; et al. The proliferation gene expression signature is a quantitative integrator of oncogenic events that predicts survival in mantle cell lymphoma. Cancer Cell 2003, 3, 185–197.
  52. Carreras, J.; Lopez-Guillermo, A.; Kikuti, Y.Y.; Itoh, J.; Masashi, M.; Ikoma, H.; Tomita, S.; Hiraiwa, S.; Hamoudi, R.; Rosenwald, A.; et al. High TNFRSF14 and low BTLA are associated with poor prognosis in Follicular Lymphoma and in Diffuse Large B-cell Lymphoma transformation. J. Clin. Exp. Hematop. 2019, 59, 1–16.
  53. Tsuda, S.; Carreras, J.; Kikuti, Y.Y.; Nakae, H.; Dekiden-Monma, M.; Imai, J.; Tsuruya, K.; Nakamura, J.; Tsukune, Y.; Uchida, T.; et al. Prediction of steroid demand in the treatment of patients with ulcerative colitis by immunohistochemical analysis of the mucosal microenvironment and immune checkpoint: Role of macrophages and regulatory markers in disease severity. Pathol. Int. 2019, 69, 260–271.
  54. The UniProt Consortium. UniProt: The universal protein knowledgebase in 2021. Nucleic Acids Res. 2021, 49, D480–D489.
  55. Safran, M.; Dalah, I.; Alexander, J.; Rosen, N.; Iny Stein, T.; Shmoish, M.; Nativ, N.; Bahir, I.; Doniger, T.; Krug, H.; et al. GeneCards Version 3: The human gene integrator. Database 2010, 2010, baq020.
  56. Carreras, J.; Kikuti, Y.Y.; Miyaoka, M.; Roncador, G.; Garcia, J.F.; Hiraiwa, S.; Tomita, S.; Ikoma, H.; Kondo, Y.; Ito, A.; et al. Integrative Statistics, Machine Learning and Artificial Intelligence Neural Network Analysis Correlated CSF1R with the Prognosis of Diffuse Large B-Cell Lymphoma. Hemato 2021, 2, 182–206.
  57. Carreras, J.; Kikuti, Y.Y.; Roncador, G.; Miyaoka, M.; Hiraiwa, S.; Tomita, S.; Ikoma, H.; Kondo, Y.; Ito, A.; Shiraiwa, S.; et al. High Expression of Caspase-8 Associated with Improved Survival in Diffuse Large B-Cell Lymphoma: Machine Learning and Artificial Neural Networks Analyses. BioMedInformatics 2021, 1, 18–46.
  58. Carreras, J.; Hiraiwa, S.; Kikuti, Y.Y.; Miyaoka, M.; Tomita, S.; Ikoma, H.; Ito, A.; Kondo, Y.; Roncador, G.; Garcia, J.F.; et al. Artificial Neural Networks Predicted the Overall Survival and Molecular Subtypes of Diffuse Large B-Cell Lymphoma Using a Pancancer Immune-Oncology Panel. Cancers 2021, 13, 6384.
  59. Carreras, J.; Kikuti, Y.Y.; Hiraiwa, S.; Miyaoka, M.; Tomita, S.; Ikoma, H.; Ito, A.; Kondo, Y.; Itoh, J.; Roncador, G.; et al. High PTX3 expression is associated with a poor prognosis in diffuse large B-cell lymphoma. Cancer Sci. 2021, 113, 334–348.
  60. IBM Corporation. IBM SPSS Statistics Algorithms; IBM Corporation: Armonk, NY, USA, 2017; pp. 685–686.
  61. Cheson, B.D.; Horning, S.J.; Coiffier, B.; Shipp, M.A.; Fisher, R.I.; Connors, J.M.; Lister, T.A.; Vose, J.; Grillo-Lopez, A.; Hagenbeek, A.; et al. Report of an international workshop to standardize response criteria for non-Hodgkin’s lymphomas. NCI Sponsored International Working Group. J. Clin. Oncol. 1999, 17, 1244.
  62. Cheson, B.D.; Pfistner, B.; Juweid, M.E.; Gascoyne, R.D.; Specht, L.; Horning, S.J.; Coiffier, B.; Fisher, R.I.; Hagenbeek, A.; Zucca, E.; et al. Revised response criteria for malignant lymphoma. J. Clin. Oncol. 2007, 25, 579–586.
  63. Carreras, J.; Kikuti, Y.Y.; Bea, S.; Miyaoka, M.; Hiraiwa, S.; Ikoma, H.; Nagao, R.; Tomita, S.; Martin-Garcia, D.; Salaverria, I.; et al. Clinicopathological characteristics and genomic profile of primary sinonasal tract diffuse large B cell lymphoma (DLBCL) reveals gain at 1q31 and RGS1 encoding protein; high RGS1 immunohistochemical expression associates with poor overall survival in DLBCL not otherwise specified (NOS). Histopathology 2017, 70, 595–621.
  64. Carreras, J.; Yukie Kikuti, Y.; Miyaoka, M.; Hiraiwa, S.; Tomita, S.; Ikoma, H.; Kondo, Y.; Shiraiwa, S.; Ando, K.; Sato, S.; et al. Genomic Profile and Pathologic Features of Diffuse Large B-Cell Lymphoma Subtype of Methotrexate-associated Lymphoproliferative Disorder in Rheumatoid Arthritis Patients. Am. J. Surg. Pathol. 2018, 42, 936–950.
  65. Fujisawa, M.; Matsushima, M.; Carreras, J.; Hirabayashi, K.; Kikuti, Y.Y.; Ueda, T.; Kaneko, M.; Fujimoto, R.; Sano, M.; Teramura, E.; et al. Whole-genome copy number and immunohistochemical analyses on surgically resected intracholecystic papillary neoplasms. Pathol. Int. 2021, 71, 823–830.
  66. Brownlee, J. Machine Learning Mastery. Available online: https://machinelearningmastery.com/dimensionality-reduction-for-machine-learning/ (accessed on 15 October 2021).
  67. Holte, H.; Beiske, K.; Boyle, M.; Troen, G.; Blaker, Y.N.; Myklebust, J.; Kvaloy, S.; Rosenwald, A.; Lingjaerde, O.C.; Rimsza, L.M.; et al. The MCL35 gene expression proliferation assay predicts high-risk MCL patients in a Norwegian cohort of younger patients given intensive first line therapy. Br. J. Haematol. 2018, 183, 225–234.
  68. Ramsower, C.A.; Maguire, A.; Robetorye, R.S.; Feldman, A.L.; Syrbu, S.I.; Rosenthal, A.C.; Rimsza, L.M. Clinical laboratory validation of the MCL35 assay for molecular risk stratification of mantle cell lymphoma. J. Hematop. 2020, 13, 231–238.
  69. Rauert-Wunderlich, H.; Mottok, A.; Scott, D.W.; Rimsza, L.M.; Ott, G.; Klapper, W.; Unterhalt, M.; Kluin-Nelemans, H.C.; Hermine, O.; Hartmann, S.; et al. Validation of the MCL35 gene expression proliferation assay in randomized trials of the European Mantle Cell Lymphoma Network. Br. J. Haematol. 2019, 184, 616–624.
  70. Walsh, S.H.; Thorselius, M.; Johnson, A.; Soderberg, O.; Jerkeman, M.; Bjorck, E.; Eriksson, I.; Thunberg, U.; Landgren, O.; Ehinger, M.; et al. Mutated VH genes and preferential VH3-21 use define new subsets of mantle cell lymphoma. Blood 2003, 101, 4047–4054.
  71. Camacho, F.I.; Algara, P.; Rodriguez, A.; Ruiz-Ballesteros, E.; Mollejo, M.; Martinez, N.; Martinez-Climent, J.A.; Gonzalez, M.; Mateo, M.; Caleo, A.; et al. Molecular heterogeneity in MCL defined by the use of specific VH genes and the frequency of somatic mutations. Blood 2003, 101, 4042–4046.
  72. Lai, R.; Lefresne, S.V.; Franko, B.; Hui, D.; Mirza, I.; Mansoor, A.; Amin, H.M.; Ma, Y. Immunoglobulin VH somatic hypermutation in mantle cell lymphoma: Mutated genotype correlates with better clinical outcome. Mod. Pathol. 2006, 19, 1498–1505.
  73. Sabnis, R.W. Novel KIF18A Inhibitors for Treating Cancer. ACS Med. Chem. Lett. 2020, 11, 2368–2369.
  74. Wong, J.J.; Lau, K.A.; Pinello, N.; Rasko, J.E. Epigenetic modifications of splicing factor genes in myelodysplastic syndromes and acute myeloid leukemia. Cancer Sci. 2014, 105, 1457–1463.
  75. Li, D.; Bi, F.F.; Chen, N.N.; Cao, J.M.; Sun, W.P.; Zhou, Y.M.; Cao, C.; Li, C.Y.; Yang, Q. Epigenetic repression of phosphatidylethanolamine N-methyltransferase (PEMT) in BRCA1-mutated breast cancer. Oncotarget 2014, 5, 1315–1325.
  76. Dokshin, G.A.; Davis, G.M.; Sawle, A.D.; Eldridge, M.D.; Nicholls, P.K.; Gourley, T.E.; Romer, K.A.; Molesworth, L.W.; Tatnell, H.R.; Ozturk, A.R.; et al. GCNA Interacts with Spartan and Topoisomerase II to Regulate Genome Stability. Dev. Cell 2020, 52, 53–68.
  77. Bjornsti, M.A.; Kaufmann, S.H. Topoisomerases and cancer chemotherapy: Recent advances and unanswered questions. F1000Research 2019, 8, 1704.
  78. Tsai, Y.L.; Chang, H.H.; Chen, Y.C.; Chang, Y.C.; Chen, Y.; Tsai, W.C. Molecular Mechanisms of KDELC2 on Glioblastoma Tumorigenesis and Temozolomide Resistance. Biomedicines 2020, 8, 339.
  79. Donadio, J.L.S.; Liu, L.; Freeman, V.L.; Ekoue, D.N.; Diamond, A.M.; Bermano, G. Interaction of NKX3.1 and SELENOP genotype with prostate cancer recurrence. Prostate 2019, 79, 462–467.
  80. Cui, R.; Jiang, N.; Zhang, M.; Du, S.; Ou, H.; Ge, R.; Ma, D.; Zhang, J. AMOTL2 inhibits JUN Thr239 dephosphorylation by binding PPP2R2A to suppress the proliferation in non-small cell lung cancer cells. Biochim. Biophys. Acta Mol. Cell Res. 2021, 1868, 118858.
  81. Guo, Z.; Wang, X.; Yang, Y.; Chen, W.; Zhang, K.; Teng, B.; Huang, C.; Zhao, Q.; Qiu, Z. Hypoxic Tumor-Derived Exosomal Long Noncoding RNA UCA1 Promotes Angiogenesis via miR-96-5p/AMOTL2 in Pancreatic Cancer. Mol. Ther. Nucleic Acids 2020, 22, 179–195.
  82. Silveira, V.S.; Scrideli, C.A.; Moreno, D.A.; Yunes, J.A.; Queiroz, R.G.; Toledo, S.C.; Lee, M.L.; Petrilli, A.S.; Brandalise, S.R.; Tone, L.G. Gene expression pattern contributing to prognostic factors in childhood acute lymphoblastic leukemia. Leuk. Lymphoma 2013, 54, 310–314.
  83. Ye, R.Y.; Kuang, X.Y.; Zeng, H.J.; Shao, N.; Lin, Y.; Wang, S.M. KCTD12 promotes G1/S transition of breast cancer cell through activating the AKT/FOXO1 signaling. J. Clin. Lab. Anal. 2020, 34, e23315.
  84. Ahn, J.I.; Yoo, J.Y.; Kim, T.H.; Kim, Y.I.; Broaddus, R.R.; Ahn, J.Y.; Lim, J.M.; Jeong, J.W. G-protein coupled receptor 64 (GPR64) acts as a tumor suppressor in endometrial cancer. BMC Cancer 2019, 19, 810.
  85. Zhou, J.Y.; Shi, R.; Yu, H.L.; Zeng, Y.; Zheng, W.L.; Ma, W.L. Association between polymorphic sites in thymidylate synthase gene and risk of non-Hodgkin lymphoma: A systematic review and pooled analysis. Leuk. Lymphoma 2012, 53, 1953–1960.
  86. Fu, Z.; Jiao, Y.; Li, Y.; Ji, B.; Jia, B.; Liu, B. TYMS presents a novel biomarker for diagnosis and prognosis in patients with pancreatic cancer. Medicine 2019, 98, e18487.
  87. Turek, M. Explainable Artificial Intelligence (XAI). Available online: https://www.darpa.mil/program/explainable-artificial-intelligence (accessed on 10 January 2022).
  88. McCoy, L.G.; Brenna, C.T.A.; Chen, S.S.; Vold, K.; Das, S. Believing in black boxes: Machine learning for healthcare does not need explainability to be evidence-based. J. Clin. Epidemiol. 2021; in press.
Figure 6. Multilayer perceptron analysis using the selected 58 genes (Method 1 continuation). As shown in Figure 4, the neural networks reduced the initial input of 20,862 genes to 58 predictive genes. A neural network then predicted the overall survival outcome (dead/alive) from these 58 genes. Network performance is displayed with several outputs: model summary; classification results; receiver operating characteristic (ROC) curve; cumulative gains chart; lift chart; predicted-by-observed chart; and independent variable importance analysis. The ROC analysis displays one curve for each category of the categorical dependent variable, together with the area under each curve [34,35,36,44,45,55,56]. The genes were ranked by their normalized importance for predicting the overall survival outcome as a dichotomous variable (dead vs. alive). GSEA confirmed the association toward a dead outcome. The characteristics of the network were as follows. Case processing: training n = 93 (76%); testing n = 30 (24%). Input layer: units n = 58; rescaling = standardized. Hidden layer: number = 1; units = 2; activation function = hyperbolic tangent. Output layer: dependent variables = 1 (overall survival outcome, dead/alive); units = 2; activation function = softmax; error function = cross-entropy. Model summary: training, cross-entropy error = 30.8, 14% incorrect predictions; testing, cross-entropy error = 14.5, 23% incorrect predictions. Classification: training, 86% overall correct (93.8% alive, 82% dead); testing, 77% overall correct (82% alive, 74% dead). Area under the curve = 0.9. The top 10 most relevant genes were RAB13, ZFYVE19, FANCG, KIF18A, RPGRIP1L, YBX3, ZCCHC4, NCLN, OLFM1, and PDZRN3. A complete description of the multilayer perceptron is given in our recent publication (Carreras J. et al. Artificial Neural Networks Predicted the Overall Survival and Molecular Subtypes of Diffuse Large B-Cell Lymphoma Using a Pan-cancer Immune-Oncology Panel. Cancers 2021, 13, 6384; https://doi.org/10.3390/cancers13246384) [58].
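The network configuration listed for Figure 6 can be illustrated with a minimal NumPy sketch. It makes several assumptions. The data are synthetic random numbers standing in for 58 genes in 123 cases. The 93/30 split is random. Inputs are z-score standardized. There is one hidden layer of 2 tanh units and a 2-unit softmax output trained on cross-entropy. The learning rate, epoch count, and initialization are hypothetical choices. This sketch uses plain full-batch gradient descent, not the IBM SPSS Neural Networks training procedure used in the study, and its accuracy says nothing about the reported results. The AUC is computed in its rank (Mann-Whitney) form, which equals the area under the ROC curve.

```python
import numpy as np

def train_mlp(X, y, hidden=2, lr=0.1, epochs=2000, seed=0):
    """One hidden tanh layer, 2-unit softmax output, mean cross-entropy loss,
    full-batch gradient descent (hypothetical hyperparameters)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, 2)); b2 = np.zeros(2)
    Y = np.eye(2)[y]                           # one-hot targets (column 1 = "dead")
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)               # hidden layer, hyperbolic tangent
        Z = H @ W2 + b2
        Z -= Z.max(axis=1, keepdims=True)      # numerical stability for softmax
        P = np.exp(Z); P /= P.sum(axis=1, keepdims=True)
        dZ = (P - Y) / n                       # gradient of mean cross-entropy w.r.t. logits
        dW2 = H.T @ dZ; db2 = dZ.sum(axis=0)
        dH = (dZ @ W2.T) * (1 - H ** 2)        # backpropagate through tanh
        dW1 = X.T @ dH; db1 = dH.sum(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2
    return W1, b1, W2, b2

def predict_proba(params, X):
    W1, b1, W2, b2 = params
    Z = np.tanh(X @ W1 + b1) @ W2 + b2
    Z -= Z.max(axis=1, keepdims=True)
    P = np.exp(Z)
    return P / P.sum(axis=1, keepdims=True)

def auc(scores, y):
    """Probability that a random positive case outscores a random negative one (ties count 1/2)."""
    pos, neg = scores[y == 1], scores[y == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

# Synthetic stand-in data: 123 cases x 58 genes (NOT the GSE93291 values).
rng = np.random.default_rng(42)
X = rng.normal(size=(123, 58))
y = (X[:, :5].sum(axis=1) + rng.normal(size=123) > 0).astype(int)

# Random training/testing partition with the caption's sizes (93 vs. 30 cases).
idx = rng.permutation(123)
train, test = idx[:93], idx[93:]

# "Rescaling = standardized": z-scores computed from the training cases only.
mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
Xs = (X - mu) / sd

params = train_mlp(Xs[train], y[train])
p_dead = predict_proba(params, Xs[test])[:, 1]
acc = ((p_dead > 0.5).astype(int) == y[test]).mean()
print(f"testing accuracy = {acc:.2f}, AUC = {auc(p_dead, y[test]):.2f}")
```

With two output classes, a 2-unit softmax is mathematically equivalent to a single logistic output. The two-unit form is kept here only to mirror the caption.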
Figure 7. Overall survival analysis (Method 1 continuation). Following the neural network analysis and dimensionality reduction (Figures 4 and 5), a final set of 10 genes associated with overall survival was highlighted. These genes correlated not only with the clinical outcome but also with the proliferation index, as expressed by MKI67. Of note, Ki67 is a marker routinely used for prognostic prediction in mantle cell lymphoma, and it is the most relevant marker of the LLMPP MCL35 proliferation assay.
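The caption reports that the 10 genes correlated with the proliferation index (MKI67) but does not state which correlation coefficient was used. As an illustration only, the dependency-free sketch below computes a Spearman rank correlation, assumed here to be a reasonable choice for expression data. It takes Pearson's r on average ranks, so ties are handled. The expression values are hypothetical and are not data from the study.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1                 # mean position of the tie group
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical expression values for six cases (illustrative only).
mki67 = [2.1, 3.5, 4.0, 5.2, 6.8, 7.1]
gene_a = [1.0, 1.4, 1.3, 2.2, 2.9, 3.0]   # rises with proliferation
gene_b = [5.0, 4.1, 4.4, 3.0, 2.2, 1.9]   # falls with proliferation
print(round(spearman(gene_a, mki67), 3), round(spearman(gene_b, mki67), 3))  # 0.943 -0.943
```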
Figure 8. Artificial neural network analysis for predicting the overall survival of mantle cell lymphoma using several oncology panels (Method 2). Overall survival was predicted using 10 oncology panels (transcriptome, cancer progression and pathways, metabolic pathways, immuno-oncology, and host response). After several multilayer perceptron analyses, a set of 125 genes predicted the overall survival outcome (dead/alive) with high accuracy. Among the most relevant genes, TYMS was highlighted. The GSEA enrichment plot had a sinusoidal-like shape, with some genes enriched toward the dead survival outcome and others toward the alive outcome.
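The shape of a GSEA plot comes from the running-sum statistic of Subramanian et al. [48]. Walking down a list of genes ranked by their correlation with the phenotype, the sum steps up at each gene-set member, with a step weighted by |r|^p, and steps down by a constant at each non-member. The enrichment score (ES) is the maximum deviation from zero. The sketch below is a minimal illustration with hypothetical genes and correlations. It assumes that positive correlation means association with the "dead" outcome. It omits the permutation-based significance testing and normalization of the real GSEA software.

```python
def enrichment_score(ranked_genes, correlations, gene_set, p=1.0):
    """Weighted running-sum enrichment score.

    ranked_genes: genes sorted from most positively to most negatively correlated
                  with the phenotype (assumed here: positive = 'dead').
    correlations: matching correlation values r_j.
    Returns (ES, running-sum curve).
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(in_set)
    norm_hit = sum(abs(r) ** p for r, hit in zip(correlations, in_set) if hit)
    running, curve = 0.0, []
    for r, hit in zip(correlations, in_set):
        if hit:
            running += abs(r) ** p / norm_hit    # step up at a gene-set member
        else:
            running -= 1.0 / n_miss              # step down at a non-member
        curve.append(running)
    es = max(curve, key=abs)                     # largest deviation from zero
    return es, curve

genes = ["G1", "G2", "G3", "G4", "G5", "G6"]     # hypothetical ranked list
r = [0.9, 0.7, 0.3, -0.2, -0.5, -0.8]
es_top, curve_top = enrichment_score(genes, r, {"G1", "G2"})      # ES close to +1
es_bottom, curve_bottom = enrichment_score(genes, r, {"G5", "G6"})  # ES close to -1
print(round(es_top, 3), round(es_bottom, 3))
```

A gene set whose members fall at both extremes of the ranking produces a curve that first rises and then dips, which resembles the sinusoidal-like shape described in the caption.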
Figure 9. Overall survival in a pan-cancer series. The multilayer perceptron using the 20,862 genes identified a final set of 19 genes with prognostic value in mantle cell lymphoma. Starting from the gene expression of this set of 19 genes and using a risk-score formula [36,46], we confirmed that these genes also contributed to the overall survival of diffuse large B-cell lymphoma (DLBCL). Additionally, these genes could also predict the overall survival of a pan-cancer series of 7289 cases from The Cancer Genome Atlas (TCGA) program, which included the most frequent human cancers. Of note, the weight and direction of the association with overall survival differed among the neoplasia subtypes. Risk scores were calculated by multiplying each gene's beta coefficient from the multivariate Cox regression analysis for overall survival by the corresponding gene expression value, as previously described [58].
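A risk score built from Cox coefficients usually takes the form RS_i = Σ_j β_j · x_ij, summing the beta-by-expression products over the genes in the signature. Each β_j is a log hazard ratio, so genes with β > 0 push the score toward higher risk. The sketch below assumes this summed form. It also assumes a median split into high- and low-risk groups, which is a common convention but is not stated in the caption. The gene names, betas, and expression values are hypothetical.

```python
import statistics

def risk_scores(expression, betas):
    """Per-case risk score: sum over genes of (Cox beta x expression value)."""
    return [sum(betas[g] * case[g] for g in betas) for case in expression]

# Hypothetical Cox betas (log hazard ratios) and expression values.
betas = {"GENE_A": 0.8, "GENE_B": -0.5, "GENE_C": 0.3}
cases = [
    {"GENE_A": 1.2, "GENE_B": 0.4, "GENE_C": 2.0},
    {"GENE_A": 0.1, "GENE_B": 1.5, "GENE_C": 0.5},
    {"GENE_A": 2.0, "GENE_B": 0.2, "GENE_C": 1.0},
    {"GENE_A": 0.5, "GENE_B": 1.0, "GENE_C": 0.0},
]

scores = risk_scores(cases, betas)
cutoff = statistics.median(scores)           # assumed cut-off: median risk score
groups = ["high" if s > cutoff else "low" for s in scores]
print([round(s, 2) for s in scores], groups)
```

Because the betas are fixed and the score is recomputed from each cohort's expression values, the same formula can be applied to an independent series such as DLBCL or TCGA, with survival then compared between the resulting risk groups.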
Figure 10. Overall survival in a pan-cancer series.
Figure 11. Bayesian network. A Bayesian network successfully modeled the overall survival outcome (dead/alive) using the 19 genes previously identified in the neural network analysis (Figure 5, Method 1). A Bayesian network builds a probability model by combining observed and recorded evidence with “common-sense” real-world knowledge to establish the likelihood of occurrences, using seemingly unlinked attributes. The modeling node used here focuses on Tree Augmented Naïve Bayes (TAN) and Markov Blanket networks, which are primarily used for classification. This graphical model shows the variables (nodes) and the probabilistic, or conditional, independencies between them. The links of the network (arcs) may represent causal relationships, but they do not necessarily represent direct cause and effect. This Bayesian network is used to calculate the probability of a patient being alive or dead, given the gene expression of the 19 genes, provided that the probabilistic independencies between gene expression and the overall survival outcome displayed on the graph hold true. Bayesian networks are very robust to missing data.
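The core calculation, the probability of an outcome given gene expression, is Bayes' rule. The sketch below shows the simplest Bayesian-network classifier, naive Bayes, in which every gene depends only on the outcome. TAN relaxes this by letting each gene also have one other gene as a parent, which this sketch does not implement. It assumes Gaussian class-conditional densities, and the priors, means, and standard deviations are hypothetical. It also shows why missing values are easy to handle in this family of models: an unobserved gene is skipped, which marginalizes it out exactly under the naive Bayes structure.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Normal density, used as the class-conditional model for one gene."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def posterior(expr, priors, params):
    """P(outcome | observed expression) via Bayes' rule, assuming genes are
    conditionally independent given the outcome (naive Bayes, not TAN).
    Genes absent from `expr` are skipped, i.e., marginalized out."""
    joint = {}
    for outcome, prior in priors.items():
        likelihood = prior
        for gene, x in expr.items():
            mean, sd = params[outcome][gene]
            likelihood *= gaussian_pdf(x, mean, sd)
        joint[outcome] = likelihood
    total = sum(joint.values())
    return {outcome: value / total for outcome, value in joint.items()}

# Hypothetical priors and per-outcome (mean, SD) for two genes; not fitted values.
priors = {"alive": 0.6, "dead": 0.4}
params = {
    "alive": {"GENE_A": (5.0, 1.0), "GENE_B": (8.0, 1.0)},
    "dead":  {"GENE_A": (7.0, 1.0), "GENE_B": (6.0, 1.0)},
}
full = posterior({"GENE_A": 7.0, "GENE_B": 6.0}, priors, params)
partial = posterior({"GENE_A": 7.0}, priors, params)   # GENE_B missing
print(round(full["dead"], 3), round(partial["dead"], 3))
```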
Figure 12. C5.0 decision tree model. A decision tree successfully modeled the overall survival outcome (dead/alive) using the 19 genes previously identified in the neural network analysis (Figure 5, Method 1). The C5.0 algorithm builds either a decision tree or a rule set. A C5.0 model works by splitting the sample based on the field that provides the maximum information gain. Each subsample defined by the first split is then split again, usually based on a different field, and the process repeats until the subsamples cannot be split any further. Finally, the lowest-level splits are reexamined, and those that do not contribute significantly to the value of the model are removed (pruned). In this model, the target field (variable) must be categorical (i.e., nominal or ordinal, such as the overall survival outcome as dead vs. alive). The input fields (predictors) can be of any type (in our analysis, the 19 genes were entered as quantitative gene expression). C5.0 models are quite robust to problems such as missing data and large numbers of input fields. The C5.0 tree shows that the gene expression of only 9 genes is sufficient to predict the overall survival outcome (dead or alive) with high accuracy.
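The split criterion described for Figure 12 can be shown for a single continuous predictor. The sketch scans every cut point between sorted expression values. For each cut it computes the information gain, defined as the parent entropy minus the size-weighted entropy of the two children, and it keeps the best cut. This is plain information gain as the caption describes it. Quinlan's C4.5, the predecessor of C5.0, normalizes the gain into a gain ratio and adds pruning, neither of which is reproduced here. The expression values are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Best binary cut 'value <= t' on one continuous predictor by information gain.
    Returns (threshold, gain); threshold is None if no cut improves purity."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best_t, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut possible between identical expression values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# Hypothetical expression of one gene in six cases, with their outcomes.
expr = [2.0, 3.1, 3.5, 6.0, 6.4, 7.2]
outcome = ["alive", "alive", "alive", "dead", "dead", "dead"]
print(best_threshold(expr, outcome))   # (4.75, 1.0)
```

A full tree applies this search over all genes at each node, splits on the winner, and recurses into each child until the nodes are pure or too small.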
Figure 13. Addition of the MCL35 proliferation signature in a Bayesian network. A Bayesian network was modeled using the genes highlighted by both Method 1 (19 genes) and Method 2 (15 genes), together with the previously identified prognostic genes of MCL from the LLMPP, the MCL35 signature. Some of the most relevant genes and their interrelationships (arrows) are highlighted: poor-prognosis genes in red and favorable-prognosis genes in green.
Figure 14. Overall survival according to the immunohistochemical expression of RGS1.
Table 1. Prognostic and pathogenic genes of mantle cell lymphoma.
Genes (n = 86)
ADAMDEC1, ADGRG2, AKT1, AKT3, AMOTL2, ARID2, ATM, BCL2, BCL2L11, BCL6, BCOR, BIRC3, BMI1, BORCS8_MEF2B, BTK, CARD11, CASP8, CCND1, CCND2, CCND3, CD5, CD79A, CDK4, CDKN1B, CDKN2A, CDKN2C, CFLAR, CHEK1, CHEK2, CUL4A, CXCL12, CXCR4, DAZAP1, GCNA, HNRNPH1, IGFBP7, ING1, KCTD12, KIF18A, KMT2C, KMT2D, LYN, MDM2, MIR17HG, MKI67, MTOR, MYC, MYCN, NFKB1, NFKBIE, NOTCH1, NOTCH2, NSD2, PALLD, PAX5, PDGFA, PEMT, PIK3CA, PIK3CD, POGLUT3, PTEN, PTK2, RAB13, RB1, RGS1, RPGRIP1L, RRAS, SAMHD1, SELENOP, SMARCA2, SMARCA4, SMARCB1, SOX11, SYK, SYNE1, TAMM41, TERT, TET2, TMEM176B, TNFAIP3, TP53, TRAF2, UBR5, XIAP, YBX3, and ZCCHC4
Eighty-six genes with predictive and pathogenic roles in MCL were selected from the literature. These genes were then tested for association with overall survival in the GSE93291 series, and only the significant ones were chosen for the neural network analysis.
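The survival screening step can be sketched with values copied from the Table 2 rows that report an exact p value. The 0.05 threshold is an assumption, since the text says only "significant". The check that HR ≈ exp(beta) assumes the betas come from Cox proportional-hazards regression. The close agreement in the table is consistent with that reading, and the small gaps reflect rounding of the reported beta values.

```python
import math

# (beta, p value, reported HR), copied from Table 2 rows with an exact p value.
table2 = {
    "BMI1": (-0.5, 0.042, 0.6),
    "CCND2": (-0.7, 0.018, 0.5),
    "CXCL12": (-0.6, 0.014, 0.5),
    "DAZAP1": (0.8, 0.016, 2.3),
    "NOTCH2": (0.6, 0.020, 1.8),
    "PAX5": (-0.7, 0.010, 0.5),
    "PIK3CA": (0.5, 0.042, 1.7),
    "PIK3CD": (0.5, 0.025, 1.7),
    "PTEN": (-0.8, 0.012, 0.5),
    "PTK2": (0.5, 0.035, 1.7),
}

ALPHA = 0.05  # assumed significance threshold

selected = sorted(g for g, (beta, p, hr) in table2.items() if p < ALPHA)
for gene in selected:
    beta, p, hr = table2[gene]
    # In a Cox model the hazard ratio per unit of expression is exp(beta):
    # beta > 0 (HR > 1) means poorer survival, beta < 0 (HR < 1) better survival.
    prognosis = "unfavorable" if beta > 0 else "favorable"
    print(f"{gene}: exp(beta) = {math.exp(beta):.2f}, reported HR = {hr}, {prognosis}")
```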
Table 2. Pathogenic genes of mantle cell lymphoma (GSE93291 series) (Method 1).
| Gene | Keyword | Function | Correlation with Overall Survival: Beta | p Value | HR |
|---|---|---|---|---|---|
| BCL2L11 | Apoptosis | B-cell apoptotic process | 1.0 | <0.01 | 2.7 |
| BMI1 | Regulation of gene expression | Component of the Polycomb group (PcG) multiprotein PRC1-like complex, negative regulation of gene expression, epigenetic | −0.5 | 0.042 | 0.6 |
| BORCS8_MEF2B | Lysosomes | BORC complex, role in lysosome movement and localization at the cell periphery | −1.0 | <0.01 | 0.4 |
| CCND1 | Cell cycle | Positive regulation of G1/S transition of the mitotic cell cycle | 1.1 | <0.01 | 3.1 |
| CCND2 | Cell cycle, apoptosis | Positive regulation of G1/S transition of the mitotic cell cycle, negative regulation of apoptosis | −0.7 | 0.018 | 0.5 |
| CDK4 | Cell cycle, apoptosis | Negative regulation of G1/S transition of the mitotic cell cycle, positive regulation of apoptotic process | 1.4 | <0.01 | 4.0 |
| CDKN2A | Cell cycle, NF-kB, apoptosis | Negative regulation of G1/S transition of the mitotic cell cycle, negative regulation of NF-kB, positive regulation of apoptotic process | 1.0 | <0.01 | 2.7 |
| CDKN2C | Cell cycle | Negative regulation of G1/S transition of the mitotic cell cycle | 1.0 | <0.01 | 2.8 |
| CHEK1 | Cell cycle, DNA repair, apoptosis | Positive regulation of cell cycle, DNA damage checkpoint and repair, apoptosis | 1.1 | <0.01 | 3.0 |
| CHEK2 | Cell cycle, DNA repair, apoptosis | Positive regulation of cell cycle, DNA damage checkpoint and repair, apoptosis | 0.8 | <0.01 | 2.1 |
| CXCL12 | Chemotaxis, apoptosis | Cell chemotaxis, defense response, negative regulation of apoptotic process, DNA damage | −0.6 | 0.014 | 0.5 |
| DAZAP1 | Cell differentiation and proliferation | Cell differentiation, cell proliferation, positive regulation of mRNA splicing | 0.8 | 0.016 | 2.3 |
| ING1 | Cell cycle | Negative regulation of cell growth, cooperates with TP53 | −1.1 | <0.01 | 0.3 |
| MKI67 | Cell proliferation | rRNA transcription | 1.5 | <0.01 | 4.4 |
| MYC | Cell proliferation | Transcription factor that binds DNA and activates transcription of growth-related genes (positive regulation of gene expression), negative regulation of apoptotic process | 0.9 | <0.01 | 2.5 |
| MYCN | Gene expression | Regulation of gene expression, DNA binding | −0.5 | 0.052 | 0.6 |
| NOTCH1 | Multiple negative regulations | Affects the implementation of differentiation, proliferation, angiogenesis, and apoptotic programs; multiple negative regulations | −0.8 | <0.01 | 0.5 |
| NOTCH2 | Multiple regulations | Affects the implementation of differentiation, proliferation, and apoptotic programs | 0.6 | 0.020 | 1.8 |
| NSD2 | B-cell development | Histone methyltransferase, B-cell development (B1) and B2 activation, humoral immune response, isotype class switch recombination, germinal center formation | 1.0 | <0.01 | 2.7 |
| PAX5 | B-cell development | Commitment of lymphoid progenitors to the B-lymphocyte lineage, promotes development to the mature B-cell stage | −0.7 | 0.010 | 0.5 |
| PIK3CA | ERBB2 signaling, apoptosis | Cell migration, ERBB2 signaling pathway, negative regulation of apoptosis | 0.5 | 0.042 | 1.7 |
| PIK3CD | B-cell development and function | Mediates immune responses; contributes to B-cell development, proliferation, migration, and function; required for B-cell receptor (BCR) signaling | 0.5 | 0.025 | 1.7 |
| PTEN | Cell cycle, tumor suppressor gene | Negative regulation of G1/S transition of the mitotic cell cycle | −0.8 | 0.012 | 0.5 |
| PTK2 | Multiple regulations | Regulation of cell migration, adhesion, cell cycle progression, cell proliferation, apoptosis, MAPK/ERK1 pathway, MDM2 and TP53 recruitment | 0.5 | 0.035 | 1.7 |
| RB1 | Cell cycle, tumor suppressor gene | Tumor suppressor that is a key regulator of the G1/S transition of the cell cycle | −0.5 | 0.043 | 0.6 |
| SYNE1 | Cytoskeleton | Cytoskeleton-nuclear membrane anchor activity, maintenance of subcellular spatial organization | −0.6 | <0.01 | 0.5 |
| TERT | Telomerase, multiple functions | Telomerase, negative regulation of apoptosis, positive regulation of G1/S transition of the mitotic cell cycle, negative regulation of gene expression | 0.7 | <0.01 | 2.0 |
| XIAP | Multiple functions, regulation of caspases and apoptosis | Multifunctional protein that regulates caspases and apoptosis and also modulates inflammatory signaling and immunity, copper homeostasis, mitogenic kinase signaling, cell proliferation, and cell invasion and metastasis | −0.8 | <0.01 | 0.5 |
From an initial set of 86 genes with a known pathogenic role in MCL, a final set of 28 genes was selected because of their predictive value for overall survival in the GSE93291 series, assessed with Kaplan–Meier analysis and the log-rank test. p, p value; HR, hazard ratio. The gene information is based on UniProt [54] and GeneCards [55].
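The beta and HR columns of Table 2 agree with the proportional-hazards relation HR = exp(beta), once the one-decimal rounding of both columns is taken into account (e.g., CDK4: exp(1.4) ≈ 4.06, reported 4.0). The table note does not state which model produced beta, so this relation is an assumption here; the sketch below only checks that the printed values are mutually consistent for a few rows copied from Table 2. `hr_consistent` is a hypothetical helper.

```python
import math
from typing import List, Tuple

# (gene, beta, reported HR) copied from a subset of Table 2 rows
TABLE2_ROWS: List[Tuple[str, float, float]] = [
    ("BCL2L11", 1.0, 2.7),
    ("CDK4", 1.4, 4.0),
    ("MKI67", 1.5, 4.4),
    ("CHEK2", 0.8, 2.1),
    ("BORCS8_MEF2B", -1.0, 0.4),
    ("ING1", -1.1, 0.3),
    ("XIAP", -0.8, 0.5),
]


def hr_consistent(beta: float, hr: float, half_step: float = 0.05) -> bool:
    """True if HR = exp(beta) holds once one-decimal rounding of both values is allowed.

    A beta printed as 1.4 may lie anywhere in [1.35, 1.45], and an HR printed as 4.0
    anywhere in [3.95, 4.05]; the two intervals only need to overlap.
    """
    lo, hi = math.exp(beta - half_step), math.exp(beta + half_step)
    return hi >= hr - half_step and lo <= hr + half_step


for gene, beta, hr in TABLE2_ROWS:
    print(f"{gene}: exp({beta}) = {math.exp(beta):.2f}, "
          f"reported HR = {hr}, consistent = {hr_consistent(beta, hr)}")
```

Positive beta values (HR > 1) mark genes whose high expression is associated with poorer survival, and negative values (HR < 1) mark the opposite.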
Table 3. Kaplan–Meier analysis for prediction of overall survival outcome (Method 1).
| No. | Gene | Cut-Off | Log-Rank p Value | Breslow p Value | Hazard Ratio | Correlation with High MKI67, Odds Ratio (OR) | OR p Value |
|---|---|---|---|---|---|---|---|
| 1 | KIF18A | 8.71 | <0.001 | <0.001 | 3.5 (2.1–5.8) | 1.3 (0.6–3.0) | 0.499 |
| 2 | YBX3 | 11.83 | 0.001 | 0.002 | 2.3 (1.4–3.8) | 2.3 (0.9–5.3) | 0.056 |
| 3 | PEMT | 8.75 | 0.015 | 0.016 | 1.9 (1.1–3.1) | 1.1 (0.5–2.5) | 0.798 |
| 4 | GCNA | 7.66 | 0.037 | 0.137 | 1.8 (1.0–3.3) | 2.1 (0.9–4.9) | 0.077 |
| 5 | POGLUT3 | 8.81 | 0.034 | 0.014 | 1.6 (1.0–2.5) | 0.9 (0.4–1.7) | 0.649 |
| 6 | SELENOP | 12.81 | 0.028 | 0.048 | 0.6 (0.4–0.9) | 0.2 (0.1–0.5) | 0.001 |
| 7 | AMOTL2 | 8.99 | 0.039 | 0.029 | 0.5 (0.3–0.9) | 0.5 (0.2–1.1) | 0.068 |
| 8 | IGFBP7 | 13.37 | 0.019 | 0.042 | 0.5 (0.3–0.9) | 0.2 (0.1–0.4) | <0.001 |
| 9 | KCTD12 | 12.02 | 0.022 | 0.042 | 0.5 (0.3–0.9) | 0.2 (0.1–0.5) | 0.01 |
| 10 | ADGRG2 | 9.95 | <0.001 | <0.001 | 0.3 (0.2–0.6) | 0.2 (0.1–0.5) | 0.001 |
This is a univariate analysis.
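The "Correlation with High MKI67" columns of Table 3 report an odds ratio with an interval and a p value. How they were computed is not described in this section; one common approach, assumed here, dichotomizes both the gene (at its cut-off) and MKI67 into high and low groups, then computes the odds ratio of the resulting 2x2 table with a Woolf (log-scale) 95% interval and a Wald p value. The function name and the counts below are invented for illustration.

```python
import math
from typing import Tuple


def odds_ratio_woolf(a: int, b: int, c: int, d: int,
                     z: float = 1.96) -> Tuple[float, float, float, float]:
    """Odds ratio, Woolf confidence interval, and Wald p value for a 2x2 table.

    Assumed (hypothetical) layout:
                     MKI67 high   MKI67 low
        gene high        a            b
        gene low         c            d
    Returns (OR, lower bound, upper bound, two-sided p value).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive; zero cells need a continuity correction")
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)
    # Standard error of log(OR) (Woolf method)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    # Two-sided Wald test on the log scale: p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(log_or / se) / math.sqrt(2))
    return odds_ratio, lower, upper, p_value


# Invented counts, for illustration only
or_, lo, hi, p = odds_ratio_woolf(a=20, b=10, c=10, d=20)
print(f"OR = {or_:.1f} ({lo:.1f}-{hi:.1f}), p = {p:.3f}")
```

An OR below 1 (as for SELENOP, IGFBP7, KCTD12, and ADGRG2) would mean that high expression of the gene is less often accompanied by high MKI67.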
Table 4. Machine learning and neural network analysis of the combined Methods 1 and 2 with the MCL35 signature.
| Model | Overall Accuracy for Predicting the Overall Survival (%) | No. of Genes Used in the Final Model | Gene Names |
|---|---|---|---|
| Logistic regression | 100 | 50 | All 50 |
| Bayesian network | 92 | 50 | All 50 |
| Discriminant | 86 | 50 | All 50 |
| CHAID | 85 | 6 | E2F2, GCNA, FMNL3, POGLUT3, SELENOP, and ZDHHC21 |
| C&R tree | 85 | 21 | ADGRG2, CDC20, CEACAM6, ESPL1, FABP5, FAM83D, FMNL3, GCNA, GLIPR1, ID1, ITGAX, KIF2C, MKI67, RGS1, ROBO4, RPGRIP1L, RRAS, SELENOP, TAMM41, ZDHHC21, and ZWINT |
| SVM | 81 | 50 | All 50 |
| KNN algorithm | 78 | 50 | All 50 |
| Neural network | 76 | 50 | All 50 |
| C5 | 76 | 3 | ESPL1, RPGRIP1L, and ZWINT |
| Quest | 65 | 50 | All 50 |
In this analysis, several methods were tested, including C5, logistic regression, Bayesian network, discriminant analysis, KNN algorithm, LSVM, random trees, SVM, Tree-AS, CHAID, Quest, C&R tree, and neural networks. Among them, logistic regression and the Bayesian network had the best overall accuracy for predicting overall survival (dead vs. alive). The analysis used a custom field assignment for the genes. The target variable was overall survival as a dichotomous (binary) variable (dead vs. alive). The inputs (predictive genes) were the most relevant genes (n = 50) previously identified in Method 1 (n = 19), Method 2 (n = 15), and the MCL35 signature (n = 17), as follows: ADAMDEC1, ADGRG2, AHR, AMOTL2, AR, ATL1, BST2, CCNB2, CD8B, CDC20, CDKN3, CEACAM6, CFB, CSF1, E2F2, ESPL1, FABP5, FAM83D, FMNL3, FOXM1, GCNA, GLIPR1, ID1, IGFBP7, IL6ST, ITGAX, KCTD12, KIF18A, KIF2C, MKI67, NCAPG, PALLD, PCK2, PEMT, PIK3CD, POGLUT3, RAB13, RGS1, ROBO4, RPGRIP1L, RRAS, SELENOP, TAMM41, TMEM176B, TOP2A, TYMS, YBX3, ZCCHC4, ZDHHC21, and ZWINT. A total of 13 models were selected and ranked according to their overall accuracy for predicting overall survival. During modeling, every possible combination of options was tested, and the best models were saved. Of note, not all genes were necessary for, or contributed to, the final models, and only the best combinations were retained (e.g., 50 genes in the Bayesian network but only 6 in the CHAID tree).
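Ranking models by overall accuracy, as in Table 4, amounts to computing the percentage of patients whose predicted status (dead vs. alive) matches the observed status and sorting the models by that value. The sketch below assumes each model's predictions are already available; it does not reproduce the authors' modeling software, and this section does not say whether accuracy was measured on training or held-out data. The model names and labels are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def overall_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Percentage of patients whose predicted outcome (dead/alive) matches the observed one."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)


def rank_models(y_true: Sequence[str],
                predictions: Dict[str, Sequence[str]]) -> List[Tuple[str, float]]:
    """Sort models from highest to lowest overall accuracy (ties keep input order)."""
    scored = [(name, overall_accuracy(y_true, pred)) for name, pred in predictions.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical observed outcomes and model predictions
observed = ["dead", "alive", "dead", "alive"]
predictions = {
    "model_a": ["dead", "alive", "alive", "alive"],
    "model_b": ["dead", "alive", "dead", "alive"],
}
print(rank_models(observed, predictions))  # [('model_b', 100.0), ('model_a', 75.0)]
```

Because Python's `sorted` is stable even with `reverse=True`, models with equal accuracy keep the order in which they were supplied.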
Table 5. Function and association of the highlighted genes in neoplasia.
| Gene | Function | Role in Cancer |
|---|---|---|
| KIF18A | Microtubule motor activity, role in mitosis | Overexpressed in various types of cancer; inhibitors are available [73] |
| YBX3 | Translation repression, negative regulation of intrinsic apoptosis signaling | Related to myelodysplastic syndromes and acute myeloid leukemia [74] |
| PEMT | Negative regulation of cell proliferation, positive regulation of lipoprotein metabolic process | Critical role in breast cancer progression [75] |
| GCNA | Acidic repeat-containing protein, expressed in germ cells (testis) | Regulates genome stability [76,77] |
| POGLUT3 | Protein glucosyltransferase that specifically targets extracellular EGF repeats of proteins (NOTCH1 and NOTCH3) | Related to glioblastoma multiforme tumorigenesis [78] |
| SELENOP | Transport of selenium, response to oxidative stress | Prostate cancer recurrence [79] |
| AMOTL2 | Actin cytoskeleton organization, angiogenesis, cell migration, Wnt signaling pathway | Angiogenesis in pancreatic cancer and proliferation in lung cancer [80,81] |
| IGFBP7 | Cell adhesion, metabolic process (retinoic acid, cortisol), regulation of cell growth | Prognosis of acute lymphoblastic leukemia [82] |
| KCTD12 | Auxiliary subunit of GABA-B receptors | Proliferation in breast cancer [83] |
| ADGRG2 | G protein-coupled receptor signaling pathway | Tumor suppressor in endometrial cancer [84] |
| TYMS | Regulation of mitotic cell cycle (G1/S transition) | Association with non-Hodgkin lymphomas, prognosis of pancreatic cancer [85,86] |
The gene information is based on UniProt [54] and GeneCards [55]. TYMS was highlighted in Method 2; the remaining genes were highlighted in Method 1.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

MDPI and ACS Style

Carreras, J.; Nakamura, N.; Hamoudi, R. Artificial Intelligence Analysis of Gene Expression Predicted the Overall Survival of Mantle Cell Lymphoma and a Large Pan-Cancer Series. Healthcare 2022, 10, 155. https://doi.org/10.3390/healthcare10010155

AMA Style

Carreras J, Nakamura N, Hamoudi R. Artificial Intelligence Analysis of Gene Expression Predicted the Overall Survival of Mantle Cell Lymphoma and a Large Pan-Cancer Series. Healthcare. 2022; 10(1):155. https://doi.org/10.3390/healthcare10010155

Chicago/Turabian Style

Carreras, Joaquim, Naoya Nakamura, and Rifat Hamoudi. 2022. "Artificial Intelligence Analysis of Gene Expression Predicted the Overall Survival of Mantle Cell Lymphoma and a Large Pan-Cancer Series" Healthcare 10, no. 1: 155. https://doi.org/10.3390/healthcare10010155

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.