Applications of Machine Learning in Genetics and Genomics

A special issue of Biomolecules (ISSN 2218-273X). This special issue belongs to the section "Bioinformatics and Systems Biology".

Deadline for manuscript submissions: closed (15 August 2023) | Viewed by 7767

Special Issue Editor


Dr. Denis Bauer
Guest Editor
1. Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, North Ryde 3169, Australia
2. Department of Biomedical Sciences, Faculty of Medicine and Health Science, Macquarie University, Macquarie Park 2109, Australia
3. Applied BioSciences, Faculty of Science and Engineering, Macquarie University, Macquarie Park 2109, Australia
Interests: machine learning; big genomic data; artificial intelligence; bioinformatics

Special Issue Information

Dear Colleagues,

We are seeking contributions to a Special Issue of Biomolecules on Machine Learning in Genetics and Genomics. We encourage the submission of both algorithmic advances in machine learning for genome analytics and insights into genetics and genomics derived using machine learning methods. However, we discourage the application of off-the-shelf machine learning solutions to small or widely available datasets.

Dr. Denis Bauer
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Biomolecules is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • random forest
  • support vector machines
  • deep learning
  • neural networks
  • artificial intelligence
  • disease gene analytics
  • molecular processes
  • genomic function

Published Papers (4 papers)

Research

18 pages, 5862 KiB  
Article
Prediction of Parkinson’s Disease Using Machine Learning Methods
by Jiayu Zhang, Wenchao Zhou, Hongmei Yu, Tong Wang, Xiaqiong Wang, Long Liu and Yalu Wen
Biomolecules 2023, 13(12), 1761; https://doi.org/10.3390/biom13121761 - 08 Dec 2023
Viewed by 1853
Abstract
The detection of Parkinson’s disease (PD) in its early stages is of great importance for its treatment and management, but consensus is lacking on what information is necessary and what models should be used to best predict PD risk. In our study, we first grouped PD-associated factors based on their cost and accessibility, and then gradually incorporated them into risk predictions, which were built using eight commonly used machine learning models to allow for comprehensive assessment. Finally, the Shapley Additive Explanations (SHAP) method was used to investigate the contributions of each factor. We found that models built with demographic variables, hospital admission examinations, clinical assessment, and polygenic risk score achieved the best prediction performance, and the inclusion of invasive biomarkers could not further enhance its accuracy. Among the eight machine learning models considered, penalized logistic regression and XGBoost were the most accurate algorithms for assessing PD risk, with penalized logistic regression achieving an area under the curve of 0.94 and a Brier score of 0.08. Olfactory function and polygenic risk scores were the most important predictors for PD risk. Our research has offered a practical framework for PD risk assessment, where necessary information and efficient machine learning tools were highlighted.
(This article belongs to the Special Issue Applications of Machine Learning in Genetics and Genomics)
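
The two metrics quoted in the abstract above, area under the ROC curve (AUC) and Brier score, can be illustrated with a small sketch. The function names and the toy labels and probabilities below are assumptions made for illustration; they are not taken from the paper, and the study itself presumably relied on standard library implementations rather than code like this.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical predictions for six individuals (1 = case, 0 = control).
labels = [1, 0, 1, 0, 1, 0]
probs = [0.9, 0.2, 0.7, 0.4, 0.3, 0.1]

print(round(roc_auc(labels, probs), 3))      # 0.889
print(round(brier_score(labels, probs), 3))  # 0.133
```

A higher AUC means better ranking of cases above controls, while a lower Brier score means better-calibrated probabilities, so the two numbers capture different aspects of model quality.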

14 pages, 2656 KiB  
Article
Hybrid Multitask Learning Reveals Sequence Features Driving Specificity in the CRISPR/Cas9 System
by Dhvani Sandip Vora, Shashank Yadav and Durai Sundar
Biomolecules 2023, 13(4), 641; https://doi.org/10.3390/biom13040641 - 03 Apr 2023
Cited by 3 | Viewed by 1535
Abstract
CRISPR/Cas9 technology is capable of precisely editing genomes and is at the heart of various scientific and medical advances in recent times. The advances in biomedical research are hindered because of the inadvertent burden on the genome when genome editors are employed, namely the off-target effects. Although experimental screens to detect off-targets have helped in understanding the activity of Cas9, that knowledge remains incomplete as the rules do not extrapolate well to new target sequences. Off-target prediction tools developed recently have increasingly relied on machine learning and deep learning techniques to reliably understand the complete threat of likely off-targets because the rules that drive Cas9 activity are not fully understood. In this study, we present a count-based as well as a deep-learning-based approach to derive sequence features that are important in deciding on Cas9 activity at a sequence. There are two major challenges in off-target determination: the identification of a likely site of Cas9 activity and the prediction of the extent of Cas9 activity at that site. The hybrid multitask CNN–biLSTM model developed, named CRISP–RCNN, simultaneously predicts off-targets and the extent of activity on off-targets. Using integrated gradients and weighting kernels to approximate feature importance, we analyzed nucleotide and position preference as well as mismatch tolerance.

19 pages, 4774 KiB  
Article
Integrative Analysis and Experimental Validation of Competing Endogenous RNAs in Obstructive Sleep Apnea
by Niannian Li, Yaxin Zhu, Feng Liu, Xiaoman Zhang, Yuenan Liu, Xiaoting Wang, Zhenfei Gao, Jian Guan and Shankai Yin
Biomolecules 2023, 13(4), 639; https://doi.org/10.3390/biom13040639 - 01 Apr 2023
Viewed by 1961
Abstract
Background: Obstructive sleep apnea (OSA) is highly prevalent yet underdiagnosed. This study aimed to develop a predictive signature, as well as investigate competing endogenous RNAs (ceRNAs) and their potential functions in OSA. Methods: The GSE135917, GSE38792, and GSE75097 datasets were collected from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database. Weighted gene correlation network analysis (WGCNA) and differential expression analysis were used to identify OSA-specific mRNAs. Machine learning methods were applied to establish a prediction signature for OSA. Furthermore, several online tools were used to establish the lncRNA-mediated ceRNAs in OSA. The hub ceRNAs were screened using cytoHubba and validated by real-time quantitative reverse transcription-polymerase chain reaction (qRT-PCR). Correlations between ceRNAs and the immune microenvironment of OSA were also investigated. Results: Two gene co-expression modules closely related to OSA and 30 OSA-specific mRNAs were obtained. They were significantly enriched in the antigen presentation and lipoprotein metabolic process categories. A signature that consisted of five mRNAs was established, which showed a good diagnostic performance in both independent datasets. A total of twelve lncRNA-mediated ceRNA regulatory pathways in OSA were proposed and validated, including three mRNAs, five miRNAs, and three lncRNAs. Of note, we found that upregulation of lncRNAs in ceRNAs could lead to activation of the nuclear factor kappa B (NF-κB) pathway. In addition, mRNAs in the ceRNAs were closely correlated to the increased infiltration level of effector memory of CD4 T cells and CD56bright natural killer cells in OSA. Conclusions: Our research opens new possibilities for the diagnosis of OSA. The newly discovered lncRNA-mediated ceRNA networks and their links to inflammation and immunity may provide potential directions for future studies.

33 pages, 2576 KiB  
Article
Gene Self-Expressive Networks as a Generalization-Aware Tool to Model Gene Regulatory Networks
by Sergio Peignier and Federica Calevro
Biomolecules 2023, 13(3), 526; https://doi.org/10.3390/biom13030526 - 13 Mar 2023
Cited by 2 | Viewed by 1680
Abstract
Self-expressiveness is a mathematical property that aims at characterizing the relationship between instances in a dataset. This property has been applied widely and successfully in computer-vision tasks, time-series analysis, and to infer underlying network structures in domains including protein signaling interactions and social network activity. Nevertheless, despite its potential, self-expressiveness has not been explicitly used to infer gene networks. In this article, we present Generalizable Gene Self-Expressive Networks, a new, interpretable, and generalization-aware formalism to model gene networks, and we propose two methods, GXN•EN and GXN•OMP, based respectively on ElasticNet and OMP (Orthogonal Matching Pursuit), to infer and assess Generalizable Gene Self-Expressive Networks. We evaluate these methods on four microarray datasets from the DREAM5 benchmark, using both internal and external metrics. The results obtained by both methods are comparable to those obtained by state-of-the-art tools, while the methods are fast to train and exhibit high levels of sparsity, which makes them easier to interpret. Moreover, we applied these methods to three complex datasets containing RNA-seq information from different mammalian tissues/cell-types. Lastly, we applied our methodology to compare a normal vs. a disease condition (Alzheimer), which allowed us to detect differential expression of genes’ sub-networks between these two biological conditions. Globally, the gene networks obtained exhibit a sparse and modular structure, with inner communities of genes presenting statistically significant over/under-expression in specific cell types, as well as significant enrichment for some anatomical GO terms, suggesting that such communities may also drive important functional roles.
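
The idea of self-expressiveness described in the abstract above, in which each gene's expression profile is written as a sparse combination of the other genes, can be sketched roughly as follows. This is not the paper's GXN•EN or GXN•OMP method: it uses plain matching pursuit, a simpler greedy relative of OMP, with no regularization and no generalization assessment. The function name, the gene names, and the toy expression values are all hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def self_expressive_weights(expression, target, n_steps=2):
    """Approximate gene `target` as a sparse combination of the other genes.

    `expression` maps a gene name to its expression values (one per sample).
    Plain matching pursuit: at each step, pick the other gene most aligned
    with the current residual and subtract its projection.
    Returns (weights, residual).
    """
    residual = list(expression[target])
    candidates = {g: v for g, v in expression.items() if g != target}
    weights = {}
    for _ in range(n_steps):
        best_gene, best_coef, best_score = None, 0.0, 0.0
        for gene, values in candidates.items():
            norm_sq = dot(values, values)
            if norm_sq == 0.0:
                continue  # a constant-zero profile cannot explain anything
            coef = dot(residual, values) / norm_sq
            score = abs(coef) * math.sqrt(norm_sq)  # |<residual, v>| / ||v||
            if score > best_score:
                best_gene, best_coef, best_score = gene, coef, score
        if best_gene is None or best_score < 1e-12:
            break  # nothing left to explain
        weights[best_gene] = weights.get(best_gene, 0.0) + best_coef
        residual = [r - best_coef * v
                    for r, v in zip(residual, candidates[best_gene])]
    return weights, residual


# Toy data (hypothetical): geneC is exactly 2 * geneA + 1 * geneB.
expression = {
    "geneA": [1.0, 0.0, 1.0, 0.0],
    "geneB": [0.0, 1.0, 0.0, 1.0],
    "geneC": [2.0, 1.0, 2.0, 1.0],
    "geneD": [1.0, -1.0, -1.0, 1.0],
}
weights, residual = self_expressive_weights(expression, "geneC")
print(weights)  # {'geneA': 2.0, 'geneB': 1.0}
```

Repeating this for every gene yields a weighted, directed graph in which the nonzero weights are the edges. The toy profiles are orthogonal, which is why the greedy procedure recovers the weights exactly; on real expression data, correlated genes make the orthogonal or regularized variants named in the abstract the more appropriate choice.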
