Research

13 pages, 1443 KiB

Open Access Article

The Use of the Random Number Generator and Artificial Intelligence Analysis for Dimensionality Reduction of Follicular Lymphoma Transcriptomic Data

by Joaquim Carreras, Yara Yukie Kikuti, Masashi Miyaoka, Shinichiro Hiraiwa, Sakura Tomita, Haruka Ikoma, Yusuke Kondo, Atsushi Ito, Rifat Hamoudi and Naoya Nakamura

BioMedInformatics 2022, 2(2), 268-280; https://doi.org/10.3390/biomedinformatics2020017 - 27 Apr 2022

Cited by 9 | Viewed by 2946

Abstract

Follicular lymphoma (FL) is one of the most frequent subtypes of non-Hodgkin lymphoma. This research predicted the prognosis of 184 untreated follicular lymphoma patients (LLMPP GSE16131 series) using gene expression data and artificial intelligence (AI) neural networks. A new strategy based on random number generation was used to create 120 different and independent multilayer perceptron (MLP) solutions, and 22,215 gene probes were ranked according to their averaged normalized importance for predicting overall survival. After dimensionality reduction, the final neural network architecture included (1) newly identified predictor genes related to cell adhesion and migration, cell signaling, and metabolism (EPB41L4B, MOCOS, SPIN2A, BTD, SRGAP3, CTNS, PRB1, L1CAM, and CEP57); (2) the international prognostic index (IPI); and (3) other relevant immuno-oncology, immune microenvironment, and checkpoint markers (CD163, CSF1R, FOXP3, PDCD1, TNFRSF14 (HVEM), and IL10). The performance of this neural network was good, with an area under the curve (AUC) of 0.89. A comparison with other machine learning techniques (C5 tree, logistic regression, Bayesian network, discriminant analysis, KNN algorithms, LSVM, random trees, SVM, tree-AS, XGBoost linear, XGBoost tree, CHAID, Quest, C&R tree, random forest, and neural network) was also made. In conclusion, a neural network predicted the overall survival of follicular lymphoma patients with high accuracy.

(This article belongs to the Special Issue Current Trends and Developments in Bioinformatics and Statistical Research from a Biomedical Aspect)
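
The ranking step described in the abstract above (averaging a normalized importance over many independently seeded solutions) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not say how importance was normalized, so dividing by each run's maximum is an assumption, and `toy_run`, the probe names, and the choice to keep the top half are all hypothetical stand-ins for trained networks and real probes.

```python
import random


def normalized_importance(importances):
    """Scale one run's importances so the top predictor equals 1.0 (assumed convention)."""
    top = max(importances.values())
    return {name: value / top for name, value in importances.items()}


def rank_by_average(runs):
    """Average each predictor's normalized importance across runs, highest first."""
    totals = {}
    for run in runs:
        for name, value in normalized_importance(run).items():
            totals[name] = totals.get(name, 0.0) + value
    averaged = {name: total / len(runs) for name, total in totals.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


def toy_run(seed, probes):
    """Hypothetical stand-in for one trained network: seeded pseudo-random importances."""
    rng = random.Random(seed)
    return {probe: rng.uniform(0.01, 1.0) for probe in probes}


# Hypothetical probe names; 120 seeds mirror the 120 solutions in the abstract.
probes = ["probe_a", "probe_b", "probe_c", "probe_d"]
runs = [toy_run(seed, probes) for seed in range(120)]
ranking = rank_by_average(runs)

# Dimensionality reduction in this sketch: keep only the top-ranked half.
top_half = [name for name, _ in ranking[: len(ranking) // 2]]
```

Fixing the seed of each run keeps the set of solutions reproducible, which is the main reason for routing the randomness through `random.Random(seed)` rather than the global generator.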

15 pages, 4165 KiB

Open Access Article

BDP1 Expression Correlates with Clinical Outcomes in Activated B-Cell Diffuse Large B-Cell Lymphoma

by Stephanie Cabarcas-Petroski and Laura Schramm

BioMedInformatics 2022, 2(1), 169-183; https://doi.org/10.3390/biomedinformatics2010011 - 12 Feb 2022

Cited by 3 | Viewed by 2754

Abstract

The RNA polymerase III–specific TFIIIB complex is targeted by oncogenes and tumor suppressors, specifically the TFIIIB subunits BRF1, BRF2, and TBP. Currently, it is unclear if the TFIIIB subunit BDP1 is universally deregulated in human cancers. We performed a meta-analysis of patient data in the Oncomine database to analyze BDP1 alterations in human cancers. Herein, we report a possible role for BDP1 in non-Hodgkin’s lymphoma (NHL) for the first time. To the best of our knowledge, this is the first study to report a statistically significant decrease in BDP1 expression in patients with anaplastic lymphoma kinase–positive (ALK+) anaplastic large-cell lymphoma (ALCL) (p = 1.67 × 10⁻⁶) and Burkitt’s lymphoma (BL) (p = 1.54 × 10⁻¹¹). Analysis of the BDP1 promoter identified putative binding sites for MYC, BCL6, E2F4, and KLF4 transcription factors, which were previously demonstrated to be deregulated in lymphomas. MYC and BDP1 expression were inversely correlated in ALK+ ALCL, suggesting a possible mechanism for the significant and specific decrease in BDP1 expression. In activated B-cell (ABC) diffuse large B-cell lymphoma (DLBCL), decreased BDP1 expression correlated with clinical outcomes, including recurrence at 1 year (p = 0.021) and 3 years (p = 0.005). Mortality at 1 year (p = 0.030) and 3 years (p = 0.012) also correlated with decreased BDP1 expression in ABC DLBCL. Together, these data suggest that BDP1 alterations may be of clinical significance in specific NHL subtypes and warrant further investigation.

18 pages, 2225 KiB

Open Access Article

Prediction of Recovery from Traumatic Brain Injury with EEG Power Spectrum in Combination of Independent Component Analysis and RUSBoost Model

by Nor Safira Elaina Mohd Noor, Haidi Ibrahim, Muhammad Hanif Che Lah and Jafri Malin Abdullah

BioMedInformatics 2022, 2(1), 106-123; https://doi.org/10.3390/biomedinformatics2010007 - 06 Jan 2022

Cited by 4 | Viewed by 3198

Abstract

Computational electroencephalography (EEG) has recently garnered significant attention as a way to examine whether quantitative EEG (qEEG) features can serve as new predictors of recovery in moderate traumatic brain injury (TBI). However, the brain’s recorded electrical activity is always contaminated with artifacts, which impede the subsequent processing steps. As a result, it is crucial to devise a strategy for meticulously flagging and extracting clean EEG data to retrieve high-quality discriminative features for successful model development. This work proposed the use of the multiple artifact rejection algorithm (MARA), an independent component analysis (ICA)-based algorithm, to eliminate artifacts automatically, and explored its effects on the predictive performance of the random undersampling boosting (RUSBoost) model. Continuous EEG was acquired using 64 electrodes from 27 moderate TBI patients at four weeks to one year post-accident. MARA incorporates an artifact removal stage based on ICA prior to RUSBoost, SVM, DT, and k-NN classification. With absolute power spectral density (PSD) features, the area under the curve (AUC) of RUSBoost was higher than that of SVM, DT, and k-NN in the δ, α, and θ bands ($\mathrm{AUC}_{\delta} = 0.75$, $\mathrm{AUC}_{\alpha} = 0.73$, and $\mathrm{AUC}_{\theta} = 0.71$). MARA gave the RUSBoost prediction model good generalization performance.
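
For readers comparing the AUC values reported above, the following minimal Python sketch shows what the statistic measures: the fraction of (positive, negative) sample pairs in which the classifier scores the positive sample higher, with ties counted as half. It is a generic illustration, not code from the study; the function name `auc` and the toy scores and labels are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores for four patients (1 = recovered, 0 = not recovered).
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
toy_auc = auc(toy_scores, toy_labels)  # 3 of 4 pairs are ordered correctly -> 0.75
```

The pairwise form is quadratic in the number of samples, which is acceptable for a cohort of this size; a rank-based formulation would be the usual choice for large datasets.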

24 pages, 10380 KiB

Open Access Article

Curcumin Analogues as a Potential Drug against Antibiotic Resistant Protein, β-Lactamases and L, D-Transpeptidases Involved in Toxin Secretion in Salmonella typhi: A Computational Approach

by Tanzina Akter, Mahim Chakma, Afsana Yeasmin Tanzina, Meheadi Hasan Rumi, Mst. Sharmin Sultana Shimu, Md. Abu Saleh, Shafi Mahmud, Saad Ahmed Sami and Talha Bin Emran

BioMedInformatics 2022, 2(1), 77-100; https://doi.org/10.3390/biomedinformatics2010005 - 27 Dec 2021

Cited by 4 | Viewed by 3196

Abstract

Typhoid fever, caused by the bacterium Salmonella typhi, has become resistant to treatment through multidrug-resistant S. typhi strains. One of the reasons behind β-lactam antibiotic resistance is β-lactamase. L, D-Transpeptidases contribute to typhoid fever because they are involved in the toxin release that causes the disease in humans. A molecular modeling study of these targeted proteins was carried out using various methods, such as homology modeling, active site prediction, and prediction of disease-causing regions, and the potential inhibitory activities of curcumin analogs against these proteins were analyzed as a way to overcome antibiotic resistance. Five potent drug candidate compounds were identified as natural ligands that can inhibit those enzymes, compared with the controls in our research. The binding affinity of both Go-Y032 and NSC-43319 against β-lactamase was −7.8 kcal/mol in AutoDock, whereas in SwissDock the binding energies were −8.15 and −8.04 kcal/mol, respectively. Against L, D-Transpeptidases, Cyclovalone and NSC-43319 had an equal energy of −7.60 kcal/mol in AutoDock, and energies of −7.90 and −8.01 kcal/mol in SwissDock. After the proteins were identified, their primary and secondary structures and gene-producing area were determined, and homology modeling was carried out. The screened drug candidates were further evaluated for ADMET, and pharmacological properties along with positive drug-likeness properties were observed for these ligand molecules. However, further in vitro and in vivo experiments are required to validate these in silico data to develop novel therapeutics against antibiotic resistance.

15 pages, 2944 KiB

Open Access Article

Projection of High-Dimensional Genome-Wide Expression on SOM Transcriptome Landscapes

by Maria Nikoghosyan, Henry Loeffler-Wirth, Suren Davidavyan, Hans Binder and Arsen Arakelyan

BioMedInformatics 2022, 2(1), 62-76; https://doi.org/10.3390/biomedinformatics2010004 - 27 Dec 2021

Cited by 1 | Viewed by 3194

Abstract

Self-organizing map (SOM) portrayal has proven to be a powerful approach for the analysis of transcriptomic, genomic, epigenetic, single-cell, and pathway-level data as well as for “multi-omic” integrative analyses. However, the SOM method has a major disadvantage: it requires retraining on the entire dataset whenever a new sample is added, which can be resource- and time-demanding. Retraining also shifts the gene landscape, thus complicating the interpretation and comparison of results. To overcome this issue, we have developed two transfer learning approaches that extend the SOM space with new samples while preserving its intrinsic structure. The extension SOM (exSOM) approach is based on adding secondary data to the existing SOM space by “meta-gene adaptation”, while supervised SOM portrayal (supSOM) adds a support vector machine regression model on top of the original SOM algorithm to “predict” the portrait of a new sample. Both methods have been shown to accurately combine existing and new data. With simulated data, exSOM is more accurate than supSOM, while supSOM requires significantly less computing time than exSOM. Analysis of real datasets demonstrated the validity of the projection methods with independent datasets mapped onto the existing SOM space. Moreover, both methods cope well with projecting samples that have new characteristics not present in the training datasets.

19 pages, 1270 KiB

Open Access Article

Analysis of Single-Cell RNA-Sequencing Data: A Step-by-Step Guide

by Aanchal Malhotra, Samarendra Das and Shesh N. Rai

BioMedInformatics 2022, 2(1), 43-61; https://doi.org/10.3390/biomedinformatics2010003 - 26 Dec 2021

Cited by 2 | Viewed by 9727

Abstract

Single-cell RNA-sequencing (scRNA-seq) technology provides an excellent platform for measuring the expression profiles of genes in heterogeneous cell populations. Multiple tools for the analysis of scRNA-seq data have been developed over the years. These tools require complicated commands and steps to analyze the underlying data, which are not easy for genome researchers and experimental biologists to follow. Therefore, we describe a step-by-step workflow for processing and analyzing scRNA-seq unique molecular identifier (UMI) data from human lung adenocarcinoma cell lines. We demonstrate the basic analyses, including quality check, mapping, and quantification of transcript abundance, through a suitable real data example to obtain UMI count data. Further, we performed basic statistical analyses, such as zero-inflation, differential expression, and clustering analyses, on the obtained count data. We studied the effects of the excess zero-inflation present in scRNA-seq data on the downstream analyses. Our findings indicate that the zero-inflation associated with UMI data had little or no role in clustering, while it had a significant effect on identifying differentially expressed genes. We also provide insight into the comparative analysis of differential expression analysis tools based on zero-inflated negative binomial and negative binomial models on scRNA-seq data. The sensitivity analysis reinforced our finding that the negative binomial model-based tool did not provide an accurate and efficient way to analyze the scRNA-seq data. This study provides a set of guidelines that help users handle and analyze real scRNA-seq data more easily.
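
The difference between the two count models compared in the abstract above can be made concrete with a short Python sketch of their probability mass functions. It uses the common mean/dispersion parameterization of the negative binomial; the function names and the parameter values (`mean`, `size`, `pi`) are illustrative assumptions and are not taken from the paper or from any specific tool.

```python
import math


def nb_pmf(k, mean, size):
    """Negative binomial probability of count k (mean > 0, dispersion size > 0)."""
    log_coeff = math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
    log_prob = size * math.log(size / (size + mean)) + k * math.log(mean / (size + mean))
    return math.exp(log_coeff + log_prob)


def zinb_pmf(k, mean, size, pi):
    """Zero-inflated negative binomial: extra zeros occur with probability pi."""
    base = (1.0 - pi) * nb_pmf(k, mean, size)
    return pi + base if k == 0 else base


# Hypothetical parameters: average count 2, dispersion 1.5, 30% structural zeros.
p_zero_nb = nb_pmf(0, mean=2.0, size=1.5)
p_zero_zinb = zinb_pmf(0, mean=2.0, size=1.5, pi=0.3)
```

With the same mean and dispersion, the zero-inflated model always assigns more probability to a zero count than the plain negative binomial, which is the property that matters when a gene has more zeros than the negative binomial alone would predict.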

10 pages, 1721 KiB

Open Access Article

Gibbs Free Energy, a Thermodynamic Measure of Protein–Protein Interactions, Correlates with Neurologic Disability

by Michael Keegan, Hava T. Siegelmann, Edward A. Rietman, Giannoula Lakka Klement and Jack A. Tuszynski

BioMedInformatics 2021, 1(3), 201-210; https://doi.org/10.3390/biomedinformatics1030013 - 14 Dec 2021

Cited by 1 | Viewed by 2840

Abstract

Modern network science has been used to reveal new and often fundamental aspects of brain network organization in physiological as well as pathological conditions. As a consequence, these discoveries, which relate to network hierarchy, hubs, and network interactions, have begun to change the paradigms of neurodegenerative disorders. In this paper, we explore the use of thermodynamics for protein–protein network interactions in Alzheimer’s disease (AD), Parkinson’s disease (PD), multiple sclerosis (MS), traumatic brain injury, and epilepsy. To assess the validity of using network interactions in neurological diseases, we investigated the relationship between network thermodynamics and molecular systems biology for these neurological disorders. To uncover whether there was a correlation between network organization and biological outcomes, we used publicly available RNA transcription data from individual patients with these neurological conditions, and correlated these molecular profiles with their respective individual disability scores. We found a linear correlation (Pearson correlation of −0.828) between disease disability (a clinically validated measurement of a person’s functional status) and Gibbs free energy (a thermodynamic measure of protein–protein interactions). In other words, we found an inverse relationship between disease disability and thermodynamic energy. Because a larger degree of disability correlated with a larger negative drop in Gibbs free energy in a linear disability-dependent fashion, it could be presumed that the progression of neuropathology, such as that seen in Alzheimer’s disease, could potentially be prevented by therapeutically correcting the changes in Gibbs free energy.
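
The Pearson correlation quoted above is the covariance of the two variables divided by the product of their standard deviations. The minimal Python sketch below shows the calculation on made-up numbers; the `disability` and `gibbs_energy` values are hypothetical and chosen to be perfectly inverse, so they do not reproduce the reported −0.828.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: higher disability paired with more negative Gibbs free energy.
disability = [1.0, 2.0, 3.0, 4.0]
gibbs_energy = [-10.0, -20.0, -30.0, -40.0]
r = pearson(disability, gibbs_energy)  # perfectly inverse toy data, so r is -1
```

A negative coefficient means the two quantities move in opposite directions, which is how the abstract's "inverse relationship" should be read.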

16 pages, 723 KiB

Open Access Article

A Stochastic Multivariate Irregularly Sampled Time Series Imputation Method for Electronic Health Records

by Muhammad Adib Uz Zaman and Dongping Du

BioMedInformatics 2021, 1(3), 166-181; https://doi.org/10.3390/biomedinformatics1030011 - 16 Nov 2021

Cited by 2 | Viewed by 3624

Abstract

Electronic health records (EHRs) can be very difficult to analyze since they usually contain many missing values. To build an efficient predictive model, a complete dataset is necessary. An EHR usually contains high-dimensional longitudinal time series data. Most commonly used imputation methods do not consider the importance of the temporal information embedded in EHR data. In addition, most time-dependent neural networks, such as recurrent neural networks (RNNs), inherently assume equal time steps, which in many cases is not appropriate. This study presents a method using the gated recurrent unit (GRU), neural ordinary differential equations (ODEs), and Bayesian estimation to incorporate the temporal information and impute sporadically observed time series measurements in high-dimensional EHR data.

28 pages, 735 KiB

Open Access Article

Analyzing Large Microbiome Datasets Using Machine Learning and Big Data

by Thomas Krause, Jyotsna Talreja Wassan, Paul Mc Kevitt, Haiying Wang, Huiru Zheng and Matthias Hemmje

BioMedInformatics 2021, 1(3), 138-165; https://doi.org/10.3390/biomedinformatics1030010 - 08 Nov 2021

Cited by 9 | Viewed by 4665

Abstract

Metagenomics promises to provide valuable new insights into the role of microbiomes in eukaryotic hosts such as humans. Due to the decreasing costs of sequencing, public and private repositories for human metagenomic datasets are growing fast. Metagenomic datasets can contain terabytes of raw data, which is a challenge for data processing but also an opportunity for advanced machine learning methods like deep learning that require large datasets. However, in contrast to classical machine learning algorithms, the use of deep learning in metagenomics is still an exception. Regardless of the algorithms used, they are usually not applied to raw data but require several preprocessing steps. Performing this preprocessing and the actual analysis in an automated, reproducible, and scalable way is another challenge. This and other challenges can be addressed by adjusting known big data methods and architectures to the needs of microbiome analysis and DNA sequence processing. A conceptual architecture for the use of machine learning and big data on metagenomic datasets was recently presented and initially validated by analyzing the rumen microbiome. The same architecture can be used for clinical purposes, as discussed in this paper.

Review

25 pages, 4380 KiB

Open Access Review

Medical Decision Making for Cardiac MRI with CFD “Detection of Severe Stenosis Using a 5D Model of the Descending Aorta”

by Houneida Sakly, Mourad Said and Moncef Tagina

BioMedInformatics 2022, 2(1), 18-42; https://doi.org/10.3390/biomedinformatics2010002 - 24 Dec 2021

Viewed by 2465

Abstract

The aim of this study is to develop a reliable 5D (x, y, z, time, flow dimension) model for medical decision making. Sophisticated techniques for the assessment of serious stenosis were developed using time-dependent instantaneous pressure gradients through the aorta (flow rate, Reynolds number, velocity, etc.). A 74 cardiac MRI scan and 3057 scans were performed on a 10-year-old patient with congenital valve and valvular aortic stenosis on sensitive MRI and coarctation (operated and then dilated) in the sense of Shone syndrome. The occlusion rate was estimated to be 80.5%. The stenosis area was approximately 15 mm long and 10 mm high. The fluid solver (NS) exhibited a significant shear stress of −3.735 × 10⁻⁵ Pa within the first 10 iterations. There was a significant drop in the flux mass of −0.0050 kg/s, as well as high blood turbulence in vortex field lines and low geometry Reynolds cells. The fifth dimension was used for negative velocity prediction (−81.4 cm/s). The findings of the 5D aortic simulation are convincing based on the evaluation of its physical and biomedical features.
