Current Trends and Developments in Bioinformatics and Statistical Research from a Biomedical Aspect

A special issue of BioMedInformatics (ISSN 2673-7426).

Deadline for manuscript submissions: closed (15 July 2022) | Viewed by 40394

Special Issue Editor

Dr. Qian Du
School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE, USA
Interests: omics and bioinformatics; statistical causal inference; artificial intelligence; immunology; radiomics; oncology

Special Issue Information

Dear Colleagues,

In the past decade, advances in next-generation sequencing and related high-throughput technologies have produced large volumes of omics data of many types, including genomics, transcriptomics, radiomics, metabolomics, and epigenomics. More systematic collection and storage of patients' health and disease information has likewise accumulated large amounts of data. Diseases and patients have never been characterized in such detail. For example, radiomics can extract disease information from medical images that is not visible to the physician's eye.

The information explosion in biomedicine makes it possible to tailor treatment to each patient instead of treating every patient as the average case, which is the core value of precision medicine. At the same time, it has brought many challenges in mining, processing, integrating, and modelling large-scale data collected from patients at different levels.

To help ensure that biomedical data are handled appropriately, and to give physicians and researchers a rule of thumb to follow, BioMedInformatics introduces this Special Issue. We encourage contributions on the development and application of bioinformatics and statistical methods in the context of biomedicine. Original studies, as well as insightful reviews, are welcome in this Special Issue.

Dr. Qian Du
Guest Editor


Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. BioMedInformatics is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1000 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • genome and sequence analysis
  • machine learning and artificial intelligence in bioinformatics
  • clinical informatics
  • statistical genetics
  • computational bio-modeling
  • computational pharmacology

Published Papers (10 papers)

Research

13 pages, 1443 KiB  
Article
The Use of the Random Number Generator and Artificial Intelligence Analysis for Dimensionality Reduction of Follicular Lymphoma Transcriptomic Data
by Joaquim Carreras, Yara Yukie Kikuti, Masashi Miyaoka, Shinichiro Hiraiwa, Sakura Tomita, Haruka Ikoma, Yusuke Kondo, Atsushi Ito, Rifat Hamoudi and Naoya Nakamura
BioMedInformatics 2022, 2(2), 268-280; https://doi.org/10.3390/biomedinformatics2020017 - 27 Apr 2022
Cited by 9 | Viewed by 2946
Abstract
Follicular lymphoma (FL) is one of the most frequent subtypes of non-Hodgkin lymphomas. This research predicted the prognosis of 184 untreated follicular lymphoma patients (LLMPP GSE16131 series), using gene expression data and artificial intelligence (AI) neural networks. A new strategy based on random number generation was used to create 120 different and independent multilayer perceptron (MLP) solutions, and 22,215 gene probes were ranked according to their averaged normalized importance for predicting the overall survival. After dimensionality reduction, the final neural network architecture included (1) newly identified predictor genes related to cell adhesion and migration, cell signaling, and metabolism (EPB41L4B, MOCOS, SPIN2A, BTD, SRGAP3, CTNS, PRB1, L1CAM, and CEP57); (2) the international prognostic index (IPI); and (3) other relevant immuno-oncology, immune microenvironment, and checkpoint markers (CD163, CSF1R, FOXP3, PDCD1, TNFRSF14 (HVEM), and IL10). The performance of this neural network was good, with an area under the curve (AUC) of 0.89. A comparison with other machine learning techniques (C5 tree, logistic regression, Bayesian network, discriminant analysis, KNN algorithms, LSVM, random trees, SVM, tree-AS, XGBoost linear, XGBoost tree, CHAID, Quest, C&R tree, random forest, and neural network) was also made. In conclusion, the overall survival of follicular lymphoma was predicted with a neural network with high accuracy.

15 pages, 4165 KiB  
Article
BDP1 Expression Correlates with Clinical Outcomes in Activated B-Cell Diffuse Large B-Cell Lymphoma
by Stephanie Cabarcas-Petroski and Laura Schramm
BioMedInformatics 2022, 2(1), 169-183; https://doi.org/10.3390/biomedinformatics2010011 - 12 Feb 2022
Cited by 3 | Viewed by 2754
Abstract
The RNA polymerase III–specific TFIIIB complex is targeted by oncogenes and tumor suppressors, specifically the TFIIIB subunits BRF1, BRF2, and TBP. Currently, it is unclear if the TFIIIB subunit BDP1 is universally deregulated in human cancers. We performed a meta-analysis of patient data in the Oncomine database to analyze BDP1 alterations in human cancers. Herein, we report a possible role for BDP1 in non-Hodgkin’s lymphoma (NHL) for the first time. To the best of our knowledge, this is the first study to report a statistically significant decrease in BDP1 expression in patients with anaplastic lymphoma kinase–positive (ALK+) anaplastic large-cell lymphoma (ALCL) (p = 1.67 × 10⁻⁶) and Burkitt’s lymphoma (BL) (p = 1.54 × 10⁻¹¹). Analysis of the BDP1 promoter identified putative binding sites for MYC, BCL6, E2F4, and KLF4 transcription factors, which were previously demonstrated to be deregulated in lymphomas. MYC and BDP1 expression were inversely correlated in ALK+ ALCL, suggesting a possible mechanism for the significant and specific decrease in BDP1 expression. In activated B-cell (ABC) diffuse large B-cell lymphoma (DLBCL), decreased BDP1 expression correlated with clinical outcomes, including recurrence at 1 year (p = 0.021) and 3 years (p = 0.005). Mortality at 1 (p = 0.030) and 3 (p = 0.012) years correlated with decreased BDP1 expression in ABC DLBCL. Together, these data suggest that BDP1 alterations may be of clinical significance in specific NHL subtypes and warrant further investigation.

18 pages, 2225 KiB  
Article
Prediction of Recovery from Traumatic Brain Injury with EEG Power Spectrum in Combination of Independent Component Analysis and RUSBoost Model
by Nor Safira Elaina Mohd Noor, Haidi Ibrahim, Muhammad Hanif Che Lah and Jafri Malin Abdullah
BioMedInformatics 2022, 2(1), 106-123; https://doi.org/10.3390/biomedinformatics2010007 - 06 Jan 2022
Cited by 4 | Viewed by 3198
Abstract
The computational electroencephalogram (EEG) has recently garnered significant attention in examining whether quantitative EEG (qEEG) features can be used as new predictors of recovery in moderate traumatic brain injury (TBI). However, the brain’s recorded electrical activity is always contaminated with artifacts, which in turn impede the subsequent processing steps. As a result, it is crucial to devise a strategy for meticulously flagging and extracting clean EEG data to retrieve high-quality discriminative features for successful model development. This work proposed the use of multiple artifact rejection algorithms (MARA), an independent component analysis (ICA)-based algorithm, to eliminate artifacts automatically, and explored its effect on the predictive performance of the random undersampling boosting (RUSBoost) model. Continuous EEG recordings were acquired using 64 electrodes from 27 moderate TBI patients at four weeks to one year post-accident. The MARA incorporates an artifact removal stage based on ICA prior to RUSBoost, SVM, DT, and k-NN classification. For absolute power spectral density (PSD) features, RUSBoost achieved a higher area under the curve (AUC) than SVM, DT, and k-NN in the delta (AUCδ = 0.75), alpha (AUCα = 0.73), and theta (AUCθ = 0.71) bands. The MARA provided good generalization performance for the RUSBoost prediction model.

24 pages, 10380 KiB  
Article
Curcumin Analogues as a Potential Drug against Antibiotic Resistant Protein, β-Lactamases and L, D-Transpeptidases Involved in Toxin Secretion in Salmonella typhi: A Computational Approach
by Tanzina Akter, Mahim Chakma, Afsana Yeasmin Tanzina, Meheadi Hasan Rumi, Mst. Sharmin Sultana Shimu, Md. Abu Saleh, Shafi Mahmud, Saad Ahmed Sami and Talha Bin Emran
BioMedInformatics 2022, 2(1), 77-100; https://doi.org/10.3390/biomedinformatics2010005 - 27 Dec 2021
Cited by 4 | Viewed by 3196
Abstract
Typhoid fever is caused by the bacterium Salmonella typhi, and multidrug-resistant S. typhi strains have emerged. One of the reasons behind β-lactam antibiotic resistance is β-lactamase. L, D-Transpeptidases are implicated in typhoid fever because they are involved in the toxin release that results in the disease in humans. A molecular modeling study of these targeted proteins was carried out by various methods, such as homology modeling, active site prediction, prediction of disease-causing regions, and analysis of the potential inhibitory activities of curcumin analogs targeting these proteins to overcome the antibiotic resistance. Five potent drug candidate compounds were identified as natural ligands that can inhibit those enzymes compared to controls in our research. The binding affinity of both Go-Y032 and NSC-43319 against β-lactamase was −7.8 kcal/mol in AutoDock, whereas in SwissDock the binding energies were −8.15 and −8.04 kcal/mol, respectively. On the other hand, Cyclovalone and NSC-43319 had an equal energy of −7.60 kcal/mol in AutoDock against L, D-Transpeptidases, and −7.90 and −8.01 kcal/mol, respectively, in SwissDock. After the identification of proteins, the determination of primary and secondary structures, as well as the gene producing area and homology modeling, was accomplished. The screened drug candidates were further evaluated in ADMET, and pharmacological properties along with positive drug-likeness properties were observed for these ligand molecules. However, further in vitro and in vivo experiments are required to validate these in silico data to develop novel therapeutics against antibiotic resistance.

15 pages, 2944 KiB  
Article
Projection of High-Dimensional Genome-Wide Expression on SOM Transcriptome Landscapes
by Maria Nikoghosyan, Henry Loeffler-Wirth, Suren Davidavyan, Hans Binder and Arsen Arakelyan
BioMedInformatics 2022, 2(1), 62-76; https://doi.org/10.3390/biomedinformatics2010004 - 27 Dec 2021
Cited by 1 | Viewed by 3194
Abstract
Self-organizing map (SOM) portrayal has proven to be a powerful approach for the analysis of transcriptomic, genomic, epigenetic, single-cell, and pathway-level data as well as for “multi-omic” integrative analyses. However, the SOM method has a major disadvantage: it requires retraining on the entire dataset once a new sample is added, which can be resource- and time-demanding. Retraining also shifts the gene landscape, thus complicating the interpretation and comparison of results. To overcome this issue, we have developed two transfer learning approaches that allow for extending SOM space with new samples while preserving its intrinsic structure. The extension SOM (exSOM) approach is based on adding secondary data to the existing SOM space by “meta-gene adaptation”, while supervised SOM portrayal (supSOM) adds a support vector machine regression model on top of the original SOM algorithm to “predict” the portrait of a new sample. Both methods have been shown to accurately combine existing and new data. With simulated data, exSOM outperforms supSOM in accuracy, while supSOM significantly reduces the computing time and outperforms exSOM in this respect. Analysis of real datasets demonstrated the validity of the projection methods with independent datasets mapped on existing SOM space. Moreover, both methods handle well the projection of samples with new characteristics that were not present in the training datasets.

19 pages, 1270 KiB  
Article
Analysis of Single-Cell RNA-Sequencing Data: A Step-by-Step Guide
by Aanchal Malhotra, Samarendra Das and Shesh N. Rai
BioMedInformatics 2022, 2(1), 43-61; https://doi.org/10.3390/biomedinformatics2010003 - 26 Dec 2021
Cited by 2 | Viewed by 9727
Abstract
Single-cell RNA-sequencing (scRNA-seq) technology provides an excellent platform for measuring the expression profiles of genes in heterogeneous cell populations. Multiple tools for the analysis of scRNA-seq data have been developed over the years. The tools require complicated commands and steps to analyze the underlying data, which are not easy for genome researchers and experimental biologists to follow. Therefore, we describe a step-by-step workflow for processing and analyzing the scRNA-seq unique molecular identifier (UMI) data from Human Lung Adenocarcinoma cell lines. We demonstrate the basic analyses, including quality check, mapping, and quantification of transcript abundance, through a suitable real-data example to obtain UMI count data. Further, we performed basic statistical analyses, such as zero-inflation, differential expression, and clustering analyses, on the obtained count data. We studied the effects of excess zero-inflation present in scRNA-seq data on the downstream analyses. Our findings indicate that the zero-inflation associated with UMI data had no or minimal role in clustering, while it had a significant effect on identifying differentially expressed genes. We also provide insight into the comparative analysis of differential expression analysis tools based on zero-inflated negative binomial and negative binomial models on scRNA-seq data. The sensitivity analysis reinforced our findings in that the negative binomial model-based tool did not provide an accurate and efficient way to analyze the scRNA-seq data. This study provides a set of guidelines for users to handle and analyze real scRNA-seq data more easily.

10 pages, 1721 KiB  
Article
Gibbs Free Energy, a Thermodynamic Measure of Protein–Protein Interactions, Correlates with Neurologic Disability
by Michael Keegan, Hava T. Siegelmann, Edward A. Rietman, Giannoula Lakka Klement and Jack A. Tuszynski
BioMedInformatics 2021, 1(3), 201-210; https://doi.org/10.3390/biomedinformatics1030013 - 14 Dec 2021
Cited by 1 | Viewed by 2840
Abstract
Modern network science has been used to reveal new and often fundamental aspects of brain network organization in physiological as well as pathological conditions. As a consequence, these discoveries, which relate to network hierarchy, hubs and network interactions, have begun to change the paradigms of neurodegenerative disorders. In this paper, we explore the use of thermodynamics for protein–protein network interactions in Alzheimer’s disease (AD), Parkinson’s disease (PD), multiple sclerosis (MS), traumatic brain injury and epilepsy. To assess the validity of using network interactions in neurological diseases, we investigated the relationship between network thermodynamics and molecular systems biology for these neurological disorders. In order to uncover whether there was a correlation between network organization and biological outcomes, we used publicly available RNA transcription data from individual patients with these neurological conditions, and correlated these molecular profiles with their respective individual disability scores. We found a linear correlation (Pearson correlation of −0.828) between disease disability (a clinically validated measurement of a person’s functional status) and Gibbs free energy (a thermodynamic measure of protein–protein interactions). In other words, we found an inverse relationship between disease disability and thermodynamic energy. Because a larger degree of disability correlated with a larger negative drop in Gibbs free energy in a linear disability-dependent fashion, it could be presumed that the progression of neuropathology such as is seen in Alzheimer’s disease could potentially be prevented by therapeutically correcting the changes in Gibbs free energy.

16 pages, 723 KiB  
Article
A Stochastic Multivariate Irregularly Sampled Time Series Imputation Method for Electronic Health Records
by Muhammad Adib Uz Zaman and Dongping Du
BioMedInformatics 2021, 1(3), 166-181; https://doi.org/10.3390/biomedinformatics1030011 - 16 Nov 2021
Cited by 2 | Viewed by 3624
Abstract
Electronic health records (EHRs) can be very difficult to analyze since they usually contain many missing values. To build an efficient predictive model, a complete dataset is necessary. An EHR usually contains high-dimensional longitudinal time series data. Most commonly used imputation methods do not consider the importance of temporal information embedded in EHR data. Besides, most time-dependent neural networks such as recurrent neural networks (RNNs) inherently consider the time steps to be equal, which in many cases, is not appropriate. This study presents a method using the gated recurrent unit (GRU), neural ordinary differential equations (ODEs), and Bayesian estimation to incorporate the temporal information and impute sporadically observed time series measurements in high-dimensional EHR data.

28 pages, 735 KiB  
Article
Analyzing Large Microbiome Datasets Using Machine Learning and Big Data
by Thomas Krause, Jyotsna Talreja Wassan, Paul Mc Kevitt, Haiying Wang, Huiru Zheng and Matthias Hemmje
BioMedInformatics 2021, 1(3), 138-165; https://doi.org/10.3390/biomedinformatics1030010 - 08 Nov 2021
Cited by 9 | Viewed by 4665
Abstract
Metagenomics promises to provide new valuable insights into the role of microbiomes in eukaryotic hosts such as humans. Due to the decreasing costs for sequencing, public and private repositories for human metagenomic datasets are growing fast. Metagenomic datasets can contain terabytes of raw data, which is a challenge for data processing but also an opportunity for advanced machine learning methods like deep learning that require large datasets. However, in contrast to classical machine learning algorithms, the use of deep learning in metagenomics is still an exception. Regardless of the algorithms used, they are usually not applied to raw data but require several preprocessing steps. Performing this preprocessing and the actual analysis in an automated, reproducible, and scalable way is another challenge. This and other challenges can be addressed by adjusting known big data methods and architectures to the needs of microbiome analysis and DNA sequence processing. A conceptual architecture for the use of machine learning and big data on metagenomic data sets was recently presented and initially validated to analyze the rumen microbiome. The same architecture can be used for clinical purposes as is discussed in this paper.

Review

25 pages, 4380 KiB  
Review
Medical Decision Making for Cardiac MRI with CFD “Detection of Severe Stenosis Using a 5D Model of the Descending Aorta”
by Houneida Sakly, Mourad Said and Moncef Tagina
BioMedInformatics 2022, 2(1), 18-42; https://doi.org/10.3390/biomedinformatics2010002 - 24 Dec 2021
Viewed by 2465
Abstract
The aim of this study is to develop a reliable 5D (x, y, z, time, flow dimension) model for medical decision making. Sophisticated techniques for the assessment of serious stenosis were developed using time-dependent instantaneous pressure gradients through the aorta (flow rate, Reynolds number, velocity, etc.). A 74 cardiac MRI scan and 3057 scans were performed on a 10-year-old patient with congenital valve and valvular aortic stenosis on sensitive MRI and coarctation (operated and then dilated) in the sense of Shone syndrome. The occlusion rate was estimated to be 80.5%. The stenosis area was approximately 15 mm long and 10 mm high. The fluid solver (NS) exhibited a significant shear stress of −3.735 × 10⁻⁵ Pa within the first 10 iterations. There was a significant drop in the mass flux of −0.0050 kg/s, as well as high blood turbulence in vortex field lines and low geometry Reynolds cells. The fifth dimension was used for negative velocity prediction (−81.4 cm/s). The findings of the 5D aortic simulation are convincing based on the evaluation of its physical and biomedical features.