Machine Learning in Bioinformatics: Latest Advances and Prospects

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Applied Biosciences and Bioengineering".

Deadline for manuscript submissions: closed (20 March 2024) | Viewed by 8808

Special Issue Editor

Dr. Je-Keun Rhee
Department of Bioinformatics & Life Science, Soongsil University, Seoul 06978, Korea
Interests: bioinformatics; machine learning; genome informatics; cancer genomics

Special Issue Information

Dear Colleagues,

Machine learning can offer new perspectives in many fields of science and engineering. In particular, with the development of high-throughput technologies, the amount of biological data has increased exponentially. Machine learning techniques have been applied to these large and complex datasets to solve a variety of biological problems. They have had a significant impact in many areas of bioinformatics, including the analysis of genomic sequences, the study of gene regulation, the prediction of molecular interactions, protein structure prediction, systematic modeling of cell systems, drug discovery, text mining, and biomedical image analysis.

This Special Issue aims to cover recent advances in machine learning techniques and their applications to bioinformatics. It will feature original research papers that present technically sound and creative machine learning methods for a variety of biological challenges. It also invites review articles that present current challenges and outlooks in the biomedical sciences and highlight the importance of machine learning methods.

Dr. Je-Keun Rhee
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • deep neural networks
  • feature selection
  • next-generation sequencing
  • gene expression
  • biomarker discovery
  • multi-omics data integration
  • reconstruction of biological networks
  • prediction of clinical outcomes
  • clinical decision support systems
  • prediction of drug response
  • drug discovery
  • protein-structure prediction
  • text mining
  • biomedical image analysis
  • healthcare

Published Papers (6 papers)

Research

15 pages, 912 KiB  
Article
Use of Ensemble Learning to Improve Performance of Known Convolutional Neural Networks for Mammography Classification
by Mayra C. Berrones-Reyes, M. Angélica Salazar-Aguilar and Cristian Castillo-Olea
Appl. Sci. 2023, 13(17), 9639; https://doi.org/10.3390/app13179639 - 25 Aug 2023
Viewed by 1144
Abstract
Convolutional neural networks and deep learning models represent the gold standard in medical image classification. Their innovative architectures have led to notable breakthroughs in image classification and feature extraction performance. However, these advancements often remain underutilized in the medical imaging field because the labeled data needed to take full advantage of them are scarce. While many methodologies perform very well on benchmark data sets like DDSM or Minimias, their efficacy decreases drastically when applied to real-world data sets. This study aims to develop a tool to streamline mammogram classification that maintains high reliability across different data sources. We use images from the DDSM data set and a proprietary data set, YERAL, which comprises 943 mammograms from Mexican patients. We evaluate the performance of ensemble learning algorithms combined with prevalent deep learning models such as AlexNet, VGG-16, and Inception. The computational results demonstrate the effectiveness of the proposed methodology, with models achieving 82% accuracy without overtaxing our hardware capabilities, and they also highlight the efficiency of ensemble algorithms in enhancing accuracy across all test cases.
(This article belongs to the Special Issue Machine Learning in Bioinformatics: Latest Advances and Prospects)
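The abstract above does not say which ensemble algorithms were combined with the networks, so the following is only a minimal sketch of one common option, soft and majority voting over per-model outputs. The function names and the three example probability vectors are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def soft_vote(prob_lists):
    """Average per-class probabilities from several models.

    prob_lists: one probability vector per model, all of the same length.
    Returns (predicted_class_index, averaged_probabilities).
    """
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=avg.__getitem__)
    return best, avg


def majority_vote(labels):
    """Return the label predicted by the largest number of models."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical outputs of three networks for one image
# (class 0 = benign, class 1 = malignant; the labels are assumptions).
example_probs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
label, averaged = soft_vote(example_probs)
```

Soft voting uses the confidence of each model, while majority voting only uses the final labels; which works better depends on how well calibrated the individual networks are.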

16 pages, 3217 KiB  
Article
A Low-Complexity Deep Learning Model for Predicting Targeted Sequencing Depth from Probe Sequence
by Yibo Feng, Quan Guo, Weigang Chen and Changcai Han
Appl. Sci. 2023, 13(12), 6996; https://doi.org/10.3390/app13126996 - 09 Jun 2023
Viewed by 1148
Abstract
Targeted sequencing has been widely utilized for genomic molecular diagnostics and the emerging DNA data storage paradigm. However, the probe sequences used to enrich regions of interest have different hybridization kinetic properties, resulting in poor sequencing uniformity and limiting the large-scale application of the technology. Here, a low-complexity deep learning model is proposed for the prediction of sequencing depth from probe sequences. To capture the representation of probe and target sequences, we utilized a sequence-encoding model that incorporates k-mer and word embedding techniques, providing a streamlined alternative to the intricate computations involved in biochemical feature analysis. We employed bidirectional long short-term memory (Bi-LSTM) to effectively capture both long-range and short-range interactions within the representation. Furthermore, the attention mechanism was adopted to identify pivotal regions in the sequences that significantly influence sequencing depth. Model accuracy was evaluated as the proportion of predictions for which the ratio of predicted to actual sequencing depth fell within the interval from 1/3 to 3. The prediction accuracy was 94.3% in the human single-nucleotide polymorphism (SNP) panel and 99.7% in the synthetic DNA information storage sequence (SynDNA) panel. Our model substantially reduced data processing time (from 334 min to 4 min of CPU time in the SNP panel) and the number of model parameters (from 300 k to 70 k) compared with the baseline model.
(This article belongs to the Special Issue Machine Learning in Bioinformatics: Latest Advances and Prospects)
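Two steps named in the abstract above can be illustrated in a few lines: splitting a probe sequence into overlapping k-mers, and scoring predictions by whether the predicted-to-actual depth ratio lies between 1/3 and 3. This is a sketch under assumptions: the function names, the default k = 3, and the example values are hypothetical, and the paper's embedding, Bi-LSTM, and attention layers are not reproduced here.

```python
def kmers(seq, k=3):
    """Split a sequence into overlapping k-mers (stride 1)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def ratio_accuracy(predicted, actual, low=1 / 3, high=3.0):
    """Fraction of probes whose predicted/actual depth ratio is in [low, high].

    Probes with a non-positive actual depth are counted as misses,
    which is an assumption of this sketch.
    """
    hits = sum(
        1
        for p, a in zip(predicted, actual)
        if a > 0 and low <= p / a <= high
    )
    return hits / len(actual)


# Hypothetical depths for four probes: ratios are 1, 0.25, 1 and 4.
example_accuracy = ratio_accuracy([100, 10, 50, 400], [100, 40, 50, 100])
```

In a full pipeline each k-mer would be mapped to an integer index and passed to an embedding layer; the tokenizer above only covers the first of those steps.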

14 pages, 1891 KiB  
Article
Feature Analysis of Predictors Affecting the Nidus Obliteration of Linear Accelerator-Based Radiosurgery for Arteriovenous Malformations Using Explainable Predictive Modeling
by Kwang Hyeon Kim and Moon-Jun Sohn
Appl. Sci. 2023, 13(7), 4267; https://doi.org/10.3390/app13074267 - 28 Mar 2023
Viewed by 1016
Abstract
This study aimed to evaluate prognostic factors associated with nidus obliteration following stereotactic radiosurgery (SRS) for cerebral arteriovenous malformations (AVMs). From January 2001 to January 2018, 119 patients who underwent SRS for AVM were studied to analyze major prognostic factors (age, prescription dose (Gy), volume (mm³), nidus size (cm), and Spetzler–Martin (SM) grade) for nidus obliteration. A random forest and a tree explainer were used to construct a predictive model of nidus obliteration. According to the predictive model, the prognostic factors affecting nidus obliteration, from most to least important, were age, nidus size, volume, total prescription dose, and SM grade. In a specific case with a nidus size of 1.5 cm, a total dose of 23 Gy, and an SM grade of 2, the model gave a high obliteration score of 0.75, and the actual time to obliteration was 6 months; the mean AUC was 0.90 in K-fold cross-validation. The predictive model identified the main contributing factors associated with the prognosis of nidus obliteration after linear accelerator-based SRS for cerebral AVM. The results, including the prognostic factors, are potentially useful for outcome prediction for patients and treatments.
(This article belongs to the Special Issue Machine Learning in Bioinformatics: Latest Advances and Prospects)
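The mean AUC over K folds reported above can be sketched as follows. The study used a random forest; to keep the example self-contained, a toy one-feature scorer stands in for it, so `fit_toy_scorer`, the fold assignment, and the function names are all assumptions of this sketch.

```python
def auc(labels, scores):
    """Area under the ROC curve as the probability that a positive case
    scores higher than a negative one (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_toy_scorer(X_train, y_train):
    """Stand-in for a real model: score by the first feature, oriented so
    that the class with the higher training mean gets higher scores."""
    pos = [x[0] for x, y in zip(X_train, y_train) if y == 1]
    neg = [x[0] for x, y in zip(X_train, y_train) if y == 0]
    sign = 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0
    return lambda x: sign * x[0]


def kfold_mean_auc(X, y, fit, k=5):
    """Train on k-1 folds, compute AUC on the held-out fold, and average.

    Folds are interleaved (sample i goes to fold i % k); every fold must
    contain both classes, otherwise the AUC is undefined.
    """
    n = len(y)
    aucs = []
    for fold in range(k):
        test_idx = list(range(fold, n, k))
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        score = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        aucs.append(auc([y[i] for i in test_idx],
                        [score(X[i]) for i in test_idx]))
    return sum(aucs) / k
```

With only 119 patients, as in the study, stratified folds would be a safer choice than the simple interleaving used here, because they guarantee that both outcomes appear in every fold.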

11 pages, 1685 KiB  
Article
An Ensemble Feature Selection Approach for Analysis and Modeling of Transcriptome Data in Alzheimer’s Disease
by Petros Paplomatas, Marios G. Krokidis, Panagiotis Vlamos and Aristidis G. Vrahatis
Appl. Sci. 2023, 13(4), 2353; https://doi.org/10.3390/app13042353 - 11 Feb 2023
Cited by 7 | Viewed by 1747
Abstract
Data-driven analysis and characterization of molecular phenotypes is an efficient way to decipher complex disease mechanisms. Using emerging next-generation sequencing technologies, important disease-relevant outcomes are extracted, offering the potential for precision diagnosis and therapeutics in progressive disorders. Single-cell RNA sequencing (scRNA-seq) allows the inherent heterogeneity between individual cellular environments to be exploited and provides one of the most promising platforms for quantifying cell-to-cell gene expression variability. However, the high-dimensional nature of scRNA-seq data poses a significant challenge for downstream analysis, particularly in identifying genes that are dominant across cell populations. Feature selection is a crucial step in scRNA-seq data analysis, reducing the dimensionality of data and facilitating the identification of genes most relevant to the biological question. Herein, we address the need for an ensemble feature selection methodology for scRNA-seq data, specifically in the context of Alzheimer’s disease (AD). We combined various feature selection strategies to obtain the most dominant differentially expressed genes (DEGs) in an AD scRNA-seq dataset, providing a promising approach to identify potential transcriptome biomarkers through scRNA-seq data analysis, which can be applied to other diseases. We anticipate that feature selection techniques, such as our ensemble methodology, will dominate analysis options for transcriptome data, especially as datasets increase in volume and complexity, leading to more accurate classification and the generation of differentially significant features.
(This article belongs to the Special Issue Machine Learning in Bioinformatics: Latest Advances and Prospects)
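The abstract above says that several feature selection strategies were combined but not how. One simple way to combine them, shown here purely as an assumed sketch, is to rank the genes under each method and keep those with the best mean rank. The function names, gene names, and scores are hypothetical.

```python
def rank_features(scores):
    """Map each feature name to its rank (1 = highest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def ensemble_select(score_tables, top_n):
    """Select the top_n features by mean rank across several methods.

    score_tables: list of dicts, one per selection method, each mapping
    the same feature names to a score where higher means more relevant.
    Ties in mean rank are broken alphabetically.
    """
    ranks = [rank_features(table) for table in score_tables]
    names = list(score_tables[0])
    mean_rank = {g: sum(r[g] for r in ranks) / len(ranks) for g in names}
    return sorted(names, key=lambda g: (mean_rank[g], g))[:top_n]


# Hypothetical scores from three selection methods for four genes.
variance = {"GENE_A": 2.1, "GENE_B": 0.3, "GENE_C": 1.5, "GENE_D": 0.9}
fold_change = {"GENE_A": 1.2, "GENE_B": 0.1, "GENE_C": 2.4, "GENE_D": 0.2}
importance = {"GENE_A": 0.5, "GENE_B": 0.05, "GENE_C": 0.3, "GENE_D": 0.15}
selected = ensemble_select([variance, fold_change, importance], top_n=2)
```

Working with ranks rather than raw scores avoids having to put the methods on a common scale, at the cost of discarding how large the differences between genes are.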

21 pages, 7471 KiB  
Article
Pseudomonas sp., Strain L5B5: A Genomic and Transcriptomic Insight into an Airborne Mine Bacterium
by Jose Luis Gonzalez-Pimentel, Irene Dominguez-Moñino, Valme Jurado, Ana Teresa Caldeira and Cesareo Saiz-Jimenez
Appl. Sci. 2022, 12(21), 10854; https://doi.org/10.3390/app122110854 - 26 Oct 2022
Cited by 1 | Viewed by 1464
Abstract
Mines, like other subterranean environments, have ecological conditions that allow microorganisms to thrive. Prokaryotes and fungi are common inhabitants of mines, developing a metabolism suitable for growing in such inhospitable environments. The mine of Lousal, Portugal, is an interesting site for the study of microorganisms present in its galleries. Aerobiological studies resulted in the isolation of a Pseudomonas sp., strain L5B5, closely related to the opportunistic fish pathogen P. piscis MC042ᵀ, and to the soil bacteria P. protegens CHA0ᵀ, P. protegens Cab57, and P. protegens Pf-5. Strain L5B5 was able to inhibit the growth of the pathogenic bacteria Bacillus cereus, Staphylococcus aureus, and Acinetobacter baumannii, as well as the cave fungi Aspergillus versicolor, Penicillium chrysogenum, Cladosporium cladosporioides, Fusarium solani, and Ochroconis lascauxensis. In silico analyses based on de novo hybrid genome assembly and RNA-Seq, carried out under seven conditions defined by culture and growth phase, resulted in the prediction and detection of genetic mechanisms involved in secondary metabolite production, including a possible new gene cluster transcribed under the tested conditions, as well as possible virulence factors and antimicrobial resistance mechanisms.
(This article belongs to the Special Issue Machine Learning in Bioinformatics: Latest Advances and Prospects)

Other

25 pages, 2258 KiB  
Systematic Review
PREFMoDeL: A Systematic Review and Proposed Taxonomy of Biomolecular Features for Deep Learning
by Jacob L. North and Victor L. Hsu
Appl. Sci. 2023, 13(7), 4356; https://doi.org/10.3390/app13074356 - 29 Mar 2023
Viewed by 1466
Abstract
Of fundamental importance in biochemical and biomedical research is understanding a molecule’s biological properties: its structure, its function(s), and its activity(ies). To this end, computational methods in Artificial Intelligence, in particular Deep Learning (DL), have been applied to further biomolecular understanding, from analysis and prediction of protein–protein and protein–ligand interactions to drug discovery and design. While choosing the most appropriate DL architecture is vitally important to accurately model the task at hand, equally important is choosing the features used as input to represent molecular properties in these DL models. Through hypothesis testing, bioinformaticians have created thousands of engineered features for biomolecules such as proteins and their ligands. Herein we present an organizational taxonomy for biomolecular features extracted from 808 articles from across the scientific literature. This objective view of biomolecular features can reduce various forms of experimental and/or investigator bias and additionally facilitate feature selection in biomolecular analysis and design tasks. The resulting dataset contains 1360 non-deduplicated features, and a sample of these features was classified by their properties, clustered, and used to suggest new features. The complete feature dataset (the Public Repository of Engineered Features for Molecular Deep Learning, PREFMoDeL) is released for collaborative sourcing on the web.
(This article belongs to the Special Issue Machine Learning in Bioinformatics: Latest Advances and Prospects)