Bioinformatics Applications Based On Machine Learning


A special issue of Processes (ISSN 2227-9717). This special issue belongs to the section "Biological Processes and Systems".

Deadline for manuscript submissions: closed (15 December 2020) | Viewed by 48807

Printed Edition Available!
A printed edition of this Special Issue is available here.

Special Issue Editors

Dr. Pablo Chamoso

Guest Editor

BISITE Research Group, University of Salamanca, Calle Espejo sn, 24.2, 37007 Salamanca, Spain
Interests: smart cities; machine learning; IoT

Dr. Sara Rodriguez

Guest Editor

BISITE Research Group, University of Salamanca, 37008 Salamanca, Spain
Interests: artificial intelligence; distributed computing; machine learning; bioinformatics

Prof. Dr. Mohd Saberi Mohamad

Guest Editor

Institute for Artificial Intelligence and Big Data, Universiti Malaysia Kelantan, Kota Bharu 16100, Malaysia
Interests: artificial intelligence and intelligent systems; bioinformatics and computational biology

Dr. Alfonso González-Briones

Guest Editor

Department of Computer Science and Automation Control, University of Salamanca, 37007 Salamanca, Spain
Interests: smart cities; IoT; Industry 4.0; machine learning; artificial intelligence; natural language processing; computational technologies; sentiment analysis

Special Issue Information

Dear Colleagues,

Bioinformatics has long been one of the most active lines of research in the scientific community. Interest has grown further as increased computing capacity makes it possible to process large volumes of data and analyze them with techniques such as machine learning.

These advances are giving rise to new applications in bioinformatics, whose results generally improve on those of earlier applications that do not use such computational techniques.

In this Special Issue, we seek research and case studies that demonstrate the application of machine learning to support applied scientific research in any area of bioinformatics. Topics include (but are not limited to) the following, as applied to bioinformatics:

- New machine learning algorithms
- Distributed machine learning systems
- New applications in bioinformatics
- Health-care applications
- Bioimaging
- Next-generation sequencing
- Data and software integration
- Visualization of biological systems and networks
- High-throughput data analysis (transcriptomics, proteomics, etc.)
- Comparison and alignment methods

Dr. Pablo Chamoso
Dr. Sara Rodríguez González
Prof. Dr. Mohd Saberi Mohamad
Dr. Alfonso González Briones
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Processes is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

bioinformatics applications
machine learning
artificial intelligence

Published Papers (11 papers)


Research


14 pages, 1333 KiB

Open Access | Feature Paper | Editor’s Choice | Article

Population-Based Parameter Identification for Dynamical Models of Biological Networks with an Application to Saccharomyces cerevisiae

by Ewelina Weglarz-Tomczak, Jakub M. Tomczak, Agoston E. Eiben and Stanley Brul

Processes 2021, 9(1), 98; https://doi.org/10.3390/pr9010098 - 05 Jan 2021

Cited by 3 | Viewed by 2991

Abstract

One of the central elements in systems biology is the interaction between mathematical modeling and measured quantities. Typically, biological phenomena are represented as dynamical systems, and they are further analyzed and comprehended by identifying model parameters using experimental data. However, not all model parameters can be found by fitting the model to the experimental data with gradient-based optimization methods, due to the non-differentiable character of the problem. Here, we present POPI4SB, a Python-based framework for population-based parameter identification of dynamic models in systems biology. The code is built on top of PySCeS, which provides an engine to run dynamic simulations. The idea behind the methodology is to provide a set of derivative-free optimization methods that utilize a population of candidate solutions to find a better solution iteratively. Additionally, we propose two surrogate-assisted population-based methods that speed up convergence, namely, a combination of a k-nearest-neighbor regressor with the Reversible Differential Evolution and the Evolution of Distribution Algorithm. We present the optimization framework on the example of the well-studied glycolytic pathway in Saccharomyces cerevisiae.

(This article belongs to the Special Issue Bioinformatics Applications Based On Machine Learning)


23 pages, 7074 KiB

Open Access | Article

A Genetic Programming Strategy to Induce Logical Rules for Clinical Data Analysis

by José A. Castellanos-Garzón, Yeray Mezquita Martín, José Luis Jaimes Sánchez, Santiago Manuel López García and Ernesto Costa

Processes 2020, 8(12), 1565; https://doi.org/10.3390/pr8121565 - 27 Nov 2020

Viewed by 2064

Abstract

This paper proposes a machine learning approach based on genetic programming to build classifiers through logical rule induction. In this context, we define and test a set of mutation operators across different clinical datasets to improve the performance of the proposal for each dataset. The use of genetic programming for rule induction has generated interesting results in machine learning problems. Hence, genetic programming represents a flexible and powerful evolutionary technique for the automatic generation of classifiers. Since logical rules disclose knowledge from the analyzed data, we use such knowledge to interpret the results and filter the most important features from clinical data as a process of knowledge discovery. The ultimate goal of this proposal is to provide the experts in the data domain with prior knowledge (as a guide) about the structure of the data and the rules found for each class, especially to track dichotomies and inequality. The results reached by our proposal on the involved datasets have been very promising when used in classification tasks and compared with other methods.


12 pages, 1297 KiB

Open Access | Feature Paper | Article

A Hybrid of Particle Swarm Optimization and Harmony Search to Estimate Kinetic Parameters in Arabidopsis thaliana

by Mohamad Saufie Rosle, Mohd Saberi Mohamad, Yee Wen Choon, Zuwairie Ibrahim, Alfonso González-Briones, Pablo Chamoso and Juan Manuel Corchado

Processes 2020, 8(8), 921; https://doi.org/10.3390/pr8080921 - 02 Aug 2020

Cited by 5 | Viewed by 2202

Abstract

Recently, modelling and simulation have been applied to better understand biological systems. Therefore, the development of precise computational models of a biological system is essential. Such a model is a mathematical expression derived from a series of parameters of the system. The measurement of parameter values through experimentation is often expensive and time-consuming. However, if a simulation is used, the manipulation of computational parameters is easy, and thus the behaviour of a biological system model can be altered for a better understanding. The complexity and nonlinearity of a biological system make parameter estimation the most challenging task in modelling. Therefore, this paper proposes a hybrid of Particle Swarm Optimization (PSO) and Harmony Search (HS), also known as PSOHS, designed to determine the kinetic parameter values of essential amino acids, mainly aspartate metabolism, in Arabidopsis thaliana. Three performance measurements are used in this paper to evaluate the proposed PSOHS: the standard deviation, nonlinear least squared error, and computational time. The proposed algorithm outperformed the other two methods, namely Simulated Annealing and the downhill simplex method, and proved that PSOHS is a more suitable algorithm for estimating kinetic parameter values.


15 pages, 2206 KiB

Open Access | Article

MPPIF-Net: Identification of Plasmodium Falciparum Parasite Mitochondrial Proteins Using Deep Features with Multilayer Bi-directional LSTM

by Samee Ullah Khan and Ran Baik

Processes 2020, 8(6), 725; https://doi.org/10.3390/pr8060725 - 22 Jun 2020

Cited by 30 | Viewed by 3286

Abstract

Mitochondrial proteins of Plasmodium falciparum (MPPF) are an important target for anti-malarial drugs, but their identification through manual experimentation is costly, and in turn, the production of related drugs by pharmaceutical institutions takes a prolonged time. Therefore, it is highly desirable for pharmaceutical companies to develop a computationally automated and reliable approach to identify proteins precisely, resulting in appropriate drug production in a timely manner. In this direction, several computationally intelligent techniques have been developed to extract local features from biological sequences using machine learning methods followed by various classifiers to discriminate the nature of proteins. Unfortunately, these techniques demonstrate poor performance when capturing contextual features from sequence patterns, yielding non-representative classifiers. In this paper, we propose a sequence-based framework to extract deep and representative features that are trustworthy for Plasmodium mitochondrial protein identification. The backbone of the proposed framework is MPPF identification-net (MPPIF-Net), which is based on a convolutional neural network (CNN) with multilayer bi-directional long short-term memory (MBD-LSTM). MPPIF-Net takes protein sequences as input and passes them through various convolution and pooling layers to optimally extract learned features. We pass these features into our sequence learning mechanism, MBD-LSTM, which is specifically trained to classify them into their relevant classes. Our proposed model is experimentally evaluated on the newly prepared dataset PF2095 and two existing benchmark datasets, i.e., PF175 and MPD, using the holdout method. The proposed method achieved 97.6%, 97.1%, and 99.5% testing accuracy on the PF2095, PF175, and MPD datasets, respectively, outperforming the state-of-the-art approaches.


19 pages, 2802 KiB

Open Access | Article

Measuring Performance Metrics of Machine Learning Algorithms for Detecting and Classifying Transposable Elements

by Simon Orozco-Arias, Johan S. Piña, Reinel Tabares-Soto, Luis F. Castillo-Ossa, Romain Guyot and Gustavo Isaza

Processes 2020, 8(6), 638; https://doi.org/10.3390/pr8060638 - 27 May 2020

Cited by 23 | Viewed by 6294

Abstract

Because of the promising results obtained by machine learning (ML) approaches in several fields, the use of ML to solve problems in bioinformatics is becoming increasingly common. In genomics, a current issue is to detect and classify transposable elements (TEs) because of the tedious tasks involved in bioinformatics methods. Thus, ML was recently evaluated for TE datasets, demonstrating better results than bioinformatics applications. A crucial step for ML approaches is the selection of metrics that measure the realistic performance of algorithms. Each metric has specific characteristics and measures properties that may be different from the predicted results. Although the most commonly used way to compare measures is by using empirical analysis, a non-result-based methodology has been proposed, called measure invariance properties. These properties are calculated on the basis of whether a given measure changes its value under certain modifications in the confusion matrix, giving comparative parameters independent of the datasets. Measure invariance properties make metrics more or less informative, particularly on unbalanced, monomodal, or multimodal negative class datasets and for real or simulated datasets. Although several studies applied ML to detect and classify TEs, there are no works evaluating performance metrics in TE tasks. Here, we analyzed 26 different metrics utilized in binary, multiclass, and hierarchical classifications, through bibliographic sources, and their invariance properties. Then, we corroborated our findings utilizing freely available TE datasets and commonly used ML algorithms. Based on our analysis, the most suitable metrics for TE tasks must be stable, even using highly unbalanced datasets, multimodal negative class, and training datasets with errors or outliers. Based on these parameters, we conclude that the F1-score and the area under the precision-recall curve are the most informative metrics since they are calculated based on other metrics, providing insight into the development of an ML application.


24 pages, 800 KiB

Open Access | Feature Paper | Article

An Adjective Selection Personality Assessment Method Using Gradient Boosting Machine Learning

by Bruno Fernandes, Alfonso González-Briones, Paulo Novais, Miguel Calafate, Cesar Analide and José Neves

Processes 2020, 8(5), 618; https://doi.org/10.3390/pr8050618 - 21 May 2020

Cited by 6 | Viewed by 5793

Abstract

Goldberg’s 100 Unipolar Markers remains one of the most popular ways to measure personality traits, in particular, the Big Five. An important reduction was later performed by Saucier, using a subset of 40 markers. Both assessments are performed by presenting a set of markers, or adjectives, to the subject, who is asked to quantify each marker using a 9-point rating scale. Consequently, the goal of this study is to conduct experiments and propose a shorter alternative where the subject is only required to identify which adjectives describe them the most. Hence, a web platform was developed for data collection, requesting subjects to rate each adjective and select those describing them the most. Based on a Gradient Boosting approach, two distinct Machine Learning architectures were conceived, tuned and evaluated. The first makes use of regressors to provide an exact score of the Big Five while the second uses classifiers to provide a binned output. As input, both receive the one-hot encoded selection of adjectives. Both architectures performed well. The first is able to quantify the Big Five with an approximate error of 5 units of measure, while the second shows a micro-averaged F1-score of 83%. Since all adjectives are used to compute all traits, models are able to harness inter-trait relationships, making it possible to further reduce the set of adjectives by removing those that have smaller importance.


14 pages, 660 KiB

Open Access | Feature Paper | Editor’s Choice | Article

Bioinspired Hybrid Model to Predict the Hydrogen Inlet Fuel Cell Flow Change of an Energy Storage System

by Héctor Alaiz-Moretón, Esteban Jove, José-Luis Casteleiro-Roca, Héctor Quintián, Hilario López García, José Alberto Benítez-Andrades, Paulo Novais and Jose Luis Calvo-Rolle

Processes 2019, 7(11), 825; https://doi.org/10.3390/pr7110825 - 07 Nov 2019

Cited by 9 | Viewed by 3151

Abstract

The present research work deals with the prediction of the hydrogen consumption of a fuel cell in an energy storage system. Because this kind of system has very nonlinear behaviour, traditional techniques based on parametric models, and other more sophisticated techniques such as soft computing methods, seem not to be accurate enough to generate good models of the system under study. For that reason, a hybrid intelligent system, based on clustering and regression techniques, has been developed and implemented to predict the variation in hydrogen flow consumption necessary to satisfy the variation in the power demanded from the fuel cell. In this research, a hybrid intelligent model was created and validated over a dataset from a fuel cell energy storage system. The results obtained validate the proposal, achieving better performance than other well-known classical regression methods and allowing us to predict the hydrogen consumption with a Mean Absolute Error (MAE) of 3.73 on the validation dataset.


23 pages, 4111 KiB

Open Access | Feature Paper | Article

Ear Detection and Localization with Convolutional Neural Networks in Natural Images and Videos

by William Raveane, Pedro Luis Galdámez and María Angélica González Arrieta

Processes 2019, 7(7), 457; https://doi.org/10.3390/pr7070457 - 17 Jul 2019

Cited by 18 | Viewed by 7252

Abstract

Precisely detecting and locating an ear within an image is the first step to tackle in an ear-based biometric recognition system, a challenge which increases in difficulty when working with variable photographic conditions. This is in part due to the irregular shapes of human ears, but also because of variable lighting conditions and the ever-changing profile shape of an ear’s projection when photographed. An ear detection system involving multiple convolutional neural networks and a detection grouping algorithm is proposed to identify the presence and location of an ear in a given input image. The proposed method matches the performance of other methods when analyzed against clean and purpose-shot photographs, reaching an accuracy of upwards of 98%, but clearly outperforms them with a rate of over 86% when the system is subjected to non-cooperative natural images where the subject appears in challenging orientations and photographic conditions.


18 pages, 3496 KiB

Open Access | Article

An Accurate Clinical Implication Assessment for Diabetes Mellitus Prevalence Based on a Study from Nigeria

by Muhammad Noman Sohail, Ren Jiadong, Musa Uba Muhammad, Sohaib Tahir Chauhdary, Jehangir Arshad and Antony John Verghese

Processes 2019, 7(5), 289; https://doi.org/10.3390/pr7050289 - 15 May 2019

Cited by 10 | Viewed by 4013

Abstract

The rate of diabetes is increasing across the planet. Therefore, the diagnosis of pre-diabetes and diabetes is important in populations with extreme diabetes risk. In this study, a machine learning technique was implemented over a data mining platform by employing Rule classifiers (PART and Decision table) to measure the accuracy and logistic regression on the classification results for forecasting the prevalence in diabetes mellitus patients suffering simultaneously from other chronic disease symptoms. The real-life data was collected in Nigeria between December 2017 and February 2019 by applying ten non-intrusive and easily available clinical variables. The results disclosed that the Rule classifiers achieved a mean accuracy of 98.75%. The error rate, precision, recall, F-measure, and Matthews correlation coefficient (MCC) were 0.02%, 0.98%, 0.98%, 0.98%, and 0.97%, respectively. The forecast decision, achieved by employing a set of 23 decision rules (DR), indicates that age, gender, glucose level, and body mass are fundamental reasons for diabetes, followed by work stress, diet, family diabetes history, physical exercise, and cardiovascular stroke history. The study validated that the proposed set of DR is practical for quick screening of diabetes mellitus patients at the initial stage without intrusive medical tests and was found to be effective in the initial diagnosis of diabetes.


11 pages, 2919 KiB

Open Access | Article

A Machine Learning-based Pipeline for the Classification of CTX-M in Metagenomics Samples

by Diego Ceballos, Diana López-Álvarez, Gustavo Isaza, Reinel Tabares-Soto, Simón Orozco-Arias and Carlos D. Ferrin

Processes 2019, 7(4), 235; https://doi.org/10.3390/pr7040235 - 24 Apr 2019

Cited by 5 | Viewed by 4908

Abstract

Bacterial infections are a major global concern, since they can lead to public health problems. To address this issue, bioinformatics contributes extensively to the analysis and interpretation of in silico data by enabling the genetic characterization of different individuals/strains, such as in bacteria. However, the growing volume of metagenomic data requires new infrastructure, technologies, and methodologies that support the analysis and prediction of this information from a clinical point of view, as intended in this work. On the other hand, distributed computational environments allow the management of these large volumes of data, due to significant advances in processing architectures, such as multicore CPU (Central Processing Unit) and GPGPU (General-Purpose Graphics Processing Unit). For this purpose, we developed a bioinformatics workflow based on metagenomic data filtered with the Duk tool. Data formatting was done through Emboss software and a prototype of a workflow. A pipeline was also designed and implemented in bash script based on machine learning. Further, the Python 3 programming language was used to normalize the training data of the artificial neural network, which was implemented in the TensorFlow framework, and its behavior was visualized in TensorBoard. Finally, the values from the initial bioinformatics process and the data generated during the parameterization and optimization of the Artificial Neural Network are presented and validated based on the optimal result for the identification of the CTX-M gene group.


Review


18 pages, 527 KiB

Open Access | Feature Paper | Review

A Review of Computational Methods for Clustering Genes with Similar Biological Functions

by Hui Wen Nies, Zalmiyah Zakaria, Mohd Saberi Mohamad, Weng Howe Chan, Nazar Zaki, Richard O. Sinnott, Suhaimi Napis, Pablo Chamoso, Sigeru Omatu and Juan Manuel Corchado

Processes 2019, 7(9), 550; https://doi.org/10.3390/pr7090550 - 21 Aug 2019

Cited by 12 | Viewed by 5091

Abstract

Clustering techniques can group genes based on similarity in biological functions. However, the drawback of using clustering techniques is the inability to identify an optimal number of potential clusters beforehand. Several existing optimization techniques can address the issue. In addition, clustering validation can predict the possible number of potential clusters and hence increase the chances of identifying biologically informative genes. This paper reviews and provides examples of existing methods for clustering genes, optimization of the objective function, and clustering validation. Clustering techniques can be categorized into partitioning, hierarchical, grid-based, and density-based techniques. We also highlight the advantages and the disadvantages of each category. To optimize the objective function, here we introduce the swarm intelligence technique and compare the performances of other methods. Moreover, we discuss the differences in measurement between internal and external criteria for validating cluster quality. We also investigate the performance of several clustering techniques by applying them to a leukemia dataset. The results show that grid-based clustering techniques provide better classification accuracy; however, partitioning clustering techniques are superior in identifying prognostic markers of leukemia. Therefore, this review suggests combining clustering techniques such as CLIQUE and k-means to yield high-quality gene clusters.
