Machine Learning Applications in Biology

A special issue of Biology (ISSN 2079-7737). This special issue belongs to the section "Bioinformatics".

Deadline for manuscript submissions: 31 May 2024 | Viewed by 16468

Special Issue Editor


Dr. Georgios K. Georgakilas
Guest Editor
1. Laboratory of Hygiene and Epidemiology, Department of Clinical and Laboratory Research, Faculty of Medicine, University of Thessaly, 41222 Larisa, Greece
2. Laboratory of Genetics, Department of Biology, University of Patras, 26500 Patras, Greece
Interests: machine learning; long noncoding RNAs; microRNAs; genomics; epigenomics; T cell development

Special Issue Information

Dear Colleagues,

We are pleased to invite you to contribute to the Special Issue “Machine Learning Applications in Biology” in the Bioinformatics section of the journal Biology.

Gene regulatory networks (GRNs) are a fundamental mechanism for maintaining cellular homeostasis, while their inherent plasticity enables dynamic processes such as cell differentiation and adaptation to environmental stimuli. Abrupt changes in GRNs, which can be attributed to environmental factors or genetic variation, often lead to pathological conditions such as cancer and autoimmune disorders.

GRN is an umbrella term that refers to the complex set of interactions between genomic and epigenomic elements that drive the fine-tuning of gene expression. GRNs typically consist of elements such as protein-coding and noncoding RNAs (e.g., long noncoding RNAs and microRNAs), transcriptional (e.g., DNA binding proteins) and post-transcriptional (e.g., RNA binding proteins and RNA modification enzymes) regulators, chromatin remodeling factors, DNA methylation enzymes, and virtually any type of molecule that is implicated in mechanisms affecting gene expression.

Machine learning (ML) has been an indispensable tool in the hands of researchers studying any of the aforementioned elements. From building ML-based computational methods for tasks such as predicting transcription factor binding sites, gene-harboring genomic loci, microRNA:gene interactions, and RNA binding protein recognition sites, or segmenting the genome by histone modifications, to modeling complex relationships between the environment and genetic variation or integrating multipurpose experimental data, ML has been instrumental in shedding light on the darkest corners of biology research.

This Special Issue aims to be a venue for disseminating state-of-the-art, high-quality research on ML applications in any of the aforementioned fields, since we believe that these fields represent the quintessence of biology research and fit the aim and scope of this journal. In this Special Issue, original research articles and reviews are welcome. Research areas may include (but are not limited to) the following:

  • Genomics.
  • Epigenomics.
  • Interplay between genetic variation and the environment.
  • Gene regulatory networks.
  • Integration of multipurpose next generation sequencing data (bulk or single cell) in the context of the aforementioned thematic areas.
  • Epidemiology.

We look forward to receiving your contributions.

Dr. Georgios K. Georgakilas
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Biology is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • genomics
  • epigenomics
  • genetic variation and the environment
  • gene regulatory networks
  • integration of multipurpose NGS data

Published Papers (8 papers)

Research

12 pages, 2286 KiB  
Article
Single-Cell Measurements and Modeling and Computation of Decision-Making Errors in a Molecular Signaling System with Two Output Molecules
by Ali Emadi, Tomasz Lipniacki, Andre Levchenko and Ali Abdi
Biology 2023, 12(12), 1461; https://doi.org/10.3390/biology12121461 - 23 Nov 2023
Viewed by 1044
Abstract
A cell constantly receives signals and takes different fates accordingly. Given the uncertainty rendered by signal transduction noise, a cell may incorrectly perceive these signals. It may mistakenly behave as if there is a signal, although there is none, or may miss the presence of a signal that actually exists. In this paper, we consider a signaling system with two outputs, and introduce and develop methods to model and compute key cell decision-making parameters based on the two outputs and in response to the input signal. In the considered system, the tumor necrosis factor (TNF) regulates the two transcription factors, the nuclear factor κB (NFκB) and the activating transcription factor-2 (ATF-2). These two system outputs are involved in important physiological functions such as cell death and survival, viral replication, and pathological conditions, such as autoimmune diseases and different types of cancer. Using the introduced methods, we compute and show what the decision thresholds are, based on the single-cell measured concentration levels of NFκB and ATF-2. We also define and compute the decision error probabilities, i.e., false alarm and miss probabilities, based on the concentration levels of the two outputs. By considering the joint response of the two outputs of the signaling system, one can learn more about complex cellular decision-making processes, the corresponding decision error rates, and their possible involvement in the development of some pathological conditions.
(This article belongs to the Special Issue Machine Learning Applications in Biology)
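
To make the terms "false alarm" and "miss" in the abstract above concrete, the following is a minimal sketch and not the authors' method: it assumes a single output (the paper considers two jointly), Gaussian response distributions, and a fixed threshold. The distribution parameters and the function name `decision_errors` are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def decision_errors(no_signal, signal, threshold):
    """Return (false_alarm, miss) for the rule: declare 'signal' when x > threshold."""
    # False alarm: the response exceeds the threshold although no signal is present.
    false_alarm = 1.0 - no_signal.cdf(threshold)
    # Miss: the response stays at or below the threshold although a signal is present.
    miss = signal.cdf(threshold)
    return false_alarm, miss


# Hypothetical response distributions (arbitrary concentration units).
no_signal = NormalDist(mu=1.0, sigma=0.5)
signal = NormalDist(mu=3.0, sigma=0.5)

# With equal variances, the midpoint threshold makes both error types equally likely.
threshold = 2.0
p_fa, p_miss = decision_errors(no_signal, signal, threshold)
print(f"false alarm = {p_fa:.4f}, miss = {p_miss:.4f}")
```

Moving the threshold trades one error type against the other: raising it lowers the false alarm probability and raises the miss probability.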

19 pages, 2118 KiB  
Article
Transfer Learning Allows Accurate RBP Target Site Prediction with Limited Sample Sizes
by Ondřej Vaculík, Eliška Chalupová, Katarína Grešová, Tomáš Majtner and Panagiotis Alexiou
Biology 2023, 12(10), 1276; https://doi.org/10.3390/biology12101276 - 25 Sep 2023
Viewed by 1262
Abstract
RNA-binding proteins are vital regulators in numerous biological processes. Their dysfunction can result in diverse diseases, such as cancer or neurodegenerative disorders, making the prediction of their binding sites of high importance. Deep learning (DL) has brought about a revolution in various biological domains, including the field of protein–RNA interactions. Nonetheless, several challenges persist, such as the limited availability of experimentally validated binding sites to train well-performing DL models for the majority of proteins. Here, we present a novel training approach based on transfer learning (TL) to address the issue of limited data. Employing a sophisticated and interpretable architecture, we compare the performance of our method trained using two distinct approaches: training from scratch (SCR) and utilizing TL. Additionally, we benchmark our results against the current state-of-the-art methods. Furthermore, we tackle the challenges associated with selecting appropriate input features and determining optimal interval sizes. Our results show that TL enhances model performance, particularly in datasets with minimal training data, where satisfactory results can be achieved with just a few hundred RNA binding sites. Moreover, we demonstrate that integrating both sequence and evolutionary conservation information leads to superior performance. Additionally, we showcase how incorporating an attention layer into the model facilitates the interpretation of predictions within a biologically relevant context.
(This article belongs to the Special Issue Machine Learning Applications in Biology)

31 pages, 3113 KiB  
Article
Literature-Based Discovery to Elucidate the Biological Links between Resistant Hypertension and COVID-19
by David Kartchner, Kevin McCoy, Janhvi Dubey, Dongyu Zhang, Kevin Zheng, Rushda Umrani, James J. Kim and Cassie S. Mitchell
Biology 2023, 12(9), 1269; https://doi.org/10.3390/biology12091269 - 21 Sep 2023
Cited by 1 | Viewed by 1870
Abstract
Multiple studies have reported new or exacerbated persistent or resistant hypertension in patients previously infected with COVID-19. We used literature-based discovery to identify and prioritize multi-scalar explanatory biology that relates resistant hypertension to COVID-19. Cross-domain text mining of 33+ million PubMed articles within a comprehensive knowledge graph was performed using SemNet 2.0. Unsupervised rank aggregation determined which concepts were most relevant utilizing the normalized HeteSim score. A series of simulations identified concepts directly related to COVID-19 and resistant hypertension or connected via one of three renin–angiotensin–aldosterone system hub nodes (mineralocorticoid receptor, epithelial sodium channel, angiotensin I receptor). The top-ranking concepts relating COVID-19 to resistant hypertension included: cGMP-dependent protein kinase II, MAP3K1, haspin, ral guanine nucleotide exchange factor, N-(3-Oxododecanoyl)-L-homoserine lactone, aspartic endopeptidases, metabotropic glutamate receptors, choline-phosphate cytidylyltransferase, protein tyrosine phosphatase, tat genes, MAP3K10, uridine kinase, dicer enzyme, CMD1B, USP17L2, FLNA, exportin 5, somatotropin releasing hormone, beta-melanocyte stimulating hormone, pegylated leptin, beta-lipoprotein, corticotropin, growth hormone-releasing peptide 2, pro-opiomelanocortin, alpha-melanocyte stimulating hormone, prolactin, thyroid hormone, poly-beta-hydroxybutyrate depolymerase, CR 1392, BCR-ABL fusion gene, high density lipoprotein sphingomyelin, pregnancy-associated murine protein 1, recQ4 helicase, immunoglobulin heavy chain variable domain, aglycotransferrin, host cell factor C1, ATP6V0D1, imipramine demethylase, TRIM40, H3C2 gene, COL1A1+COL1A2 gene, QARS gene, VPS54, TPM2, MPST, EXOSC2, ribosomal protein S10, TAP-144, gonadotropins, human gonadotropin releasing hormone 1, beta-lipotropin, octreotide, salmon calcitonin, des-n-octanoyl ghrelin, liraglutide, gastrins. Concepts were mapped to six physiological themes: altered endocrine function, 23.1%; inflammation or cytokine storm, 21.3%; lipid metabolism and atherosclerosis, 17.6%; sympathetic input to blood pressure regulation, 16.7%; altered entry of COVID-19 virus, 14.8%; and unknown, 6.5%.
(This article belongs to the Special Issue Machine Learning Applications in Biology)

13 pages, 2415 KiB  
Article
Utilization of Computer Classification Methods for Exposure Prediction and Gene Selection in Daphnia magna Toxicogenomics
by Berkay Paylar, Martin Längkvist, Jana Jass and Per-Erik Olsson
Biology 2023, 12(5), 692; https://doi.org/10.3390/biology12050692 - 09 May 2023
Viewed by 1347
Abstract
Zinc (Zn) is an essential element that influences many cellular functions. Depending on bioavailability, Zn can cause both deficiency and toxicity. Zn bioavailability is influenced by water hardness. Therefore, water quality analysis for health-risk assessment should consider both Zn concentration and water hardness. However, exposure media selected for traditional toxicology tests are set to defined hardness levels and do not represent the diverse water chemistry compositions observed in nature. Moreover, these tests commonly use whole organism endpoints, such as survival and reproduction, which require high numbers of test animals and are labor intensive. Gene expression stands out as a promising alternative to provide insight into molecular events that can be used for risk assessment. In this work, we apply machine learning techniques to classify the Zn concentrations and water hardness from Daphnia magna gene expression by using quantitative PCR. A method for gene ranking was explored using techniques from game theory, namely, Shapley values. The results show that standard machine learning classifiers can classify both Zn concentration and water hardness simultaneously, and that Shapley values are a versatile and useful alternative for gene ranking that can provide insight about the importance of individual genes.
(This article belongs to the Special Issue Machine Learning Applications in Biology)
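
As an illustration of how Shapley values can rank features such as genes, the sketch below computes exact Shapley values for a toy two-gene "game". It is not the authors' pipeline: the gene names and the accuracy table are hypothetical, and exact enumeration over all orderings is only feasible for a handful of features (practical tools rely on approximations).

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal gain from adding p to the players that came before it.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


# Hypothetical classifier accuracy for each subset of genes used as input.
ACCURACY = {
    frozenset(): 0.50,
    frozenset({"geneA"}): 0.70,
    frozenset({"geneB"}): 0.60,
    frozenset({"geneA", "geneB"}): 0.90,
}

phi = shapley_values(["geneA", "geneB"], ACCURACY.__getitem__)
ranking = sorted(phi, key=phi.get, reverse=True)
print(phi, ranking)
```

The values sum to the gain of the full gene set over the empty set (0.90 - 0.50 in this toy table), which is the property that makes them usable as an additive importance score.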

19 pages, 4599 KiB  
Article
SigPrimedNet: A Signaling-Informed Neural Network for scRNA-seq Annotation of Known and Unknown Cell Types
by Pelin Gundogdu, Inmaculada Alamo, Isabel A. Nepomuceno-Chamorro, Joaquin Dopazo and Carlos Loucera
Biology 2023, 12(4), 579; https://doi.org/10.3390/biology12040579 - 10 Apr 2023
Cited by 3 | Viewed by 2505
Abstract
Single-cell RNA sequencing is increasing our understanding of the behavior of complex tissues or organs, by providing unprecedented details on the complex cell type landscape at the level of individual cells. Cell type definition and functional annotation are key steps to understanding the molecular processes behind the underlying cellular communication machinery. However, the exponential growth of scRNA-seq data has made the task of manually annotating cells unfeasible, due not only to an unparalleled resolution of the technology but also to an ever-increasing heterogeneity of the data. Many supervised and unsupervised methods have been proposed to automatically annotate cells. Supervised approaches for cell-type annotation outperform unsupervised methods except when new (unknown) cell types are present. Here, we introduce SigPrimedNet, an artificial neural network approach that leverages (i) efficient training by means of a sparsity-inducing signaling circuits-informed layer, (ii) feature representation learning through supervised training, and (iii) unknown cell-type identification by fitting an anomaly detection method on the learned representation. We show that SigPrimedNet can efficiently annotate known cell types while keeping a low false-positive rate for unseen cells across a set of publicly available datasets. In addition, the learned representation acts as a proxy for signaling circuit activity measurements, which provide useful estimations of the cell functionalities.
(This article belongs to the Special Issue Machine Learning Applications in Biology)

20 pages, 3619 KiB  
Article
Finding a Husband: Using Explainable AI to Define Male Mosquito Flight Differences
by Yasser M. Qureshi, Vitaly Voloshin, Luca Facchinelli, Philip J. McCall, Olga Chervova, Cathy E. Towers, James A. Covington and David P. Towers
Biology 2023, 12(4), 496; https://doi.org/10.3390/biology12040496 - 24 Mar 2023
Cited by 1 | Viewed by 2132
Abstract
Mosquito-borne diseases account for around one million deaths annually. There is a constant need for novel intervention mechanisms to mitigate transmission, especially as current insecticidal methods become less effective with the rise of insecticide resistance among mosquito populations. Previously, we used a near infra-red tracking system to describe the behaviour of mosquitoes at a human-occupied bed net, work that eventually led to an entirely novel bed net design. Advancing that approach, here we report on the use of trajectory analysis of mosquito flight, using machine learning methods. This largely unexplored application has significant potential for providing useful insights into the behaviour of mosquitoes and other insects. In this work, a novel methodology applies anomaly detection to distinguish male mosquito tracks from females and couples. The proposed pipeline uses new feature engineering techniques and splits each track into segments such that detailed flight behaviour differences influence the classifier rather than the experimental constraints such as the field of view of the tracking system. Each segment is individually classified and the outcomes are combined to classify whole tracks. By interpreting the model using SHAP values, the features of flight that contribute to the differences between sexes are found and are explained by expert opinion. This methodology was tested using 3D tracks generated from mosquito mating swarms in the field and obtained a balanced accuracy of 64.5% and an ROC AUC score of 68.4%. Such a system can be used in a wide variety of trajectory domains to detect and analyse the behaviours of different classes, e.g., sex, strain, and species. The results of this study can support genetic mosquito control interventions for which mating represents a key event for their success.
(This article belongs to the Special Issue Machine Learning Applications in Biology)
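
Two ideas from the abstract above, combining per-segment outcomes into a track label and reporting balanced accuracy, can be sketched as follows. This is an illustrative assumption rather than the authors' code: the abstract does not state how segment outcomes are combined, so a simple majority vote is used here, and the labels and data are hypothetical.

```python
from collections import Counter


def classify_track(segment_labels):
    """Combine per-segment predictions by majority vote (an assumed rule)."""
    return Counter(segment_labels).most_common(1)[0][0]


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    recalls = []
    for cls in sorted(set(y_true)):
        predictions_for_cls = [p for t, p in zip(y_true, y_pred) if t == cls]
        recalls.append(predictions_for_cls.count(cls) / len(predictions_for_cls))
    return sum(recalls) / len(recalls)


# One hypothetical track split into three classified segments.
track_label = classify_track(["male", "other", "male"])

# Hypothetical imbalanced evaluation set: 2 male tracks, 8 other tracks.
y_true = ["male", "male"] + ["other"] * 8
y_pred = ["male", "other"] + ["other"] * 8

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal = balanced_accuracy(y_true, y_pred)
print(track_label, plain, bal)
```

On imbalanced data, plain accuracy is dominated by the majority class, whereas balanced accuracy averages the recall of each class, which is why the two numbers differ in this toy example.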

16 pages, 2841 KiB  
Article
Using Attribution Sequence Alignment to Interpret Deep Learning Models for miRNA Binding Site Prediction
by Katarína Grešová, Ondřej Vaculík and Panagiotis Alexiou
Biology 2023, 12(3), 369; https://doi.org/10.3390/biology12030369 - 26 Feb 2023
Cited by 1 | Viewed by 1692
Abstract
MicroRNAs (miRNAs) are small non-coding RNAs that play a central role in the post-transcriptional regulation of biological processes. miRNAs regulate transcripts through direct binding involving the Argonaute protein family. The exact rules of binding are not known, and several in silico miRNA target prediction methods have been developed to date. Deep learning has recently revolutionized miRNA target prediction. However, the higher predictive power comes with a decreased ability to interpret increasingly complex models. Here, we present a novel interpretation technique, called attribution sequence alignment, for miRNA target site prediction models that can interpret such deep learning models on a two-dimensional representation of miRNA and putative target sequence. Our method produces a human-readable visual representation of miRNA:target interactions and can be used as a proxy for the further interpretation of biological concepts learned by the neural network. We demonstrate applications of this method in the clustering of experimental data into binding classes, as well as using the method to narrow down predicted miRNA binding sites on long transcript sequences. Importantly, the presented method works with any neural network model trained on a two-dimensional representation of interactions and can be easily extended to further domains such as protein–protein interactions.
(This article belongs to the Special Issue Machine Learning Applications in Biology)

Review

24 pages, 761 KiB  
Review
Small RNA Targets: Advances in Prediction Tools and High-Throughput Profiling
by Katarína Grešová, Panagiotis Alexiou and Ilektra-Chara Giassa
Biology 2022, 11(12), 1798; https://doi.org/10.3390/biology11121798 - 11 Dec 2022
Cited by 3 | Viewed by 3054
Abstract
MicroRNAs (miRNAs) are an abundant class of small non-coding RNAs that regulate gene expression at the post-transcriptional level. They are suggested to be involved in most biological processes of the cell primarily by targeting messenger RNAs (mRNAs) for cleavage or translational repression. Their binding to their target sites is mediated by the Argonaute (AGO) family of proteins. Thus, miRNA target prediction is pivotal for research and clinical applications. Moreover, transfer-RNA-derived fragments (tRFs) and other types of small RNAs have been found to be potent regulators of AGO-mediated gene expression. Their role in mRNA regulation is still to be fully elucidated, and advancements in the computational prediction of their targets are in their infancy. To shed light on these complex RNA–RNA interactions, the availability of good-quality high-throughput data and reliable computational methods is of utmost importance. Even though the arsenal of computational approaches in the field has been enriched in the last decade, there is still a degree of discrepancy between the results they yield. This review offers an overview of the relevant advancements in the field of bioinformatics and machine learning and summarizes the key strategies utilized for small RNA target prediction. Furthermore, we report the recent development of high-throughput sequencing technologies, and explore the role of non-miRNA AGO driver sequences.
(This article belongs to the Special Issue Machine Learning Applications in Biology)