Selected Papers from the 9th International Work-Conference on Bioinformatics and Biomedical Engineering (IWBBIO 2022)

A special issue of Genes (ISSN 2073-4425). This special issue belongs to the section "Technologies and Resources for Genetics".

Deadline for manuscript submissions: closed (15 October 2022) | Viewed by 19691

Special Issue Editors

Guest Editor
Department of Computer Engineering, Automatics and Robotics (ICAR), Information and Communications Technology Centre (CITIC-UGR), University of Granada, 18010 Granada, Spain
Interests: machine learning algorithms; data mining; bioinformatics; computational biology

Guest Editor
Systems Biology Group, Dip. Automatica e Informatica, Politecnico di Torino, Corso Duca degli Abruzzi, 24, 10129 Torino, Italy
Interests: systems biology

Guest Editor
School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Oxford Road, Manchester M13 9PT, UK
Interests: Pathways and biological systems modelling

Guest Editor
Department of Biological Research on the Red Blood Cells, INTS, INSERM UMR_S 1134, Université de Paris, Université de la Réunion, 75739 Paris, France
Interests: structural bioinformatics; bioinformatics; next-generation sequencing; drug design; deep learning

Guest Editor
Department of Applied Mathematics, University of Granada, 18071 Granada, Spain
Interests: deep learning; statistical analysis in big data; machine learning algorithms; data mining; bioinformatics; computational biology

Special Issue Information

Dear colleagues,

The 9th International Work-Conference on Bioinformatics and Biomedical Engineering (IWBBIO 2022) will be held in Gran Canaria, Spain, on 27–30 June 2022. It will serve as a forum to discuss the latest ideas and realizations in the foundations, theory, models and applications of interdisciplinary and multidisciplinary research, encompassing the disciplines of computer science, mathematics, statistics, biology, bioinformatics, and biomedicine (https://iwbbio.ugr.es).

The current Special Issue solicits high-quality original research papers on any aspect of Bioinformatics, Biomedicine and Biomedical Engineering.

Contributions presenting new computational techniques and methods in machine learning; data mining; data integration; genomics and evolution; next-generation sequencing data; protein and RNA structure; protein function and proteomics; medical informatics and translational bioinformatics; computational systems biology; modelling and simulation; and their application in the life science domain, biomedicine and biomedical engineering are especially encouraged.

Dr. Francisco Ortuño
Prof. Alfredo Benso
Dr. Jean-Marc Schwartz
Prof. Dr. Alexandre G. de Brevern
Prof. Dr. Ignacio Rojas
Prof. Dr. Olga Valenzuela
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Genes is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (10 papers)

Editorial

3 pages, 169 KiB  
Editorial
Special Issue: New Advances in Bioinformatics and Biomedical Engineering Using Machine Learning Techniques, IWBBIO-2022
by Olga Valenzuela, Francisco Ortuño, Alfredo Benso, Jean-Marc Schwartz, Alexandre G. de Brevern and Ignacio Rojas
Genes 2023, 14(8), 1574; https://doi.org/10.3390/genes14081574 - 01 Aug 2023
Viewed by 832
Abstract
Bioinformatics is revolutionizing Biomedicine in the way we treat and diagnose pathologies related to biological manifestations resulting from variations or mutations of our DNA [...]

Research

14 pages, 1100 KiB  
Article
Papillary Thyroid Carcinoma: A thorough Bioinformatic Analysis of Gene Expression and Clinical Data
by Iván Petrini, Rocío L. Cecchini, Marilina Mascaró, Ignacio Ponzoni and Jessica A. Carballido
Genes 2023, 14(6), 1250; https://doi.org/10.3390/genes14061250 - 11 Jun 2023
Cited by 2 | Viewed by 2033
Abstract
The likelihood of being diagnosed with thyroid cancer has increased in recent years; it is the fastest-expanding cancer in the United States and it has tripled in the last three decades. In particular, Papillary Thyroid Carcinoma (PTC) is the most common type of cancer affecting the thyroid. It is a slow-growing cancer and, thus, it can usually be cured. However, given the worrying increase in the diagnosis of this type of cancer, the discovery of new genetic markers for accurate treatment and prognosis is crucial. In the present study, the aim is to identify putative genes that may be specifically relevant in PTC through bioinformatic analysis of several public gene expression datasets and clinical information. Two datasets from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) dataset were studied. Statistics and machine learning methods were sequentially employed to retrieve a final small cluster of genes of interest: PTGFR, ZMAT3, GABRB2, and DPP6. Kaplan–Meier plots were employed to assess the expression levels with regard to overall survival and relapse-free survival. Furthermore, a manual bibliographic search for each gene was carried out, and a Protein–Protein Interaction (PPI) network was built to verify existing associations among them, followed by a new enrichment analysis. The results revealed that all the genes are highly relevant in the context of thyroid cancer and, of particular interest, PTGFR and DPP6 have not yet been associated with the disease, thus making them worthy of further investigation as to their relationship to PTC.
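The abstract above mentions Kaplan–Meier plots for overall and relapse-free survival. As a hedged illustration only, the product-limit estimate behind such plots can be sketched in a few lines. The function name `kaplan_meier` and the toy cohort are hypothetical and are not taken from the paper, which does not state which software it used.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each subject
    events:    1 if the event (e.g. death or relapse) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # subjects leaving the risk set at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # events and censored subjects both leave
    return curve


# Hypothetical toy cohort: follow-up in months, 1 = event observed, 0 = censored.
durations = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(durations, events)
```

In a gene-level analysis, one such curve would typically be computed per expression group (for instance high versus low expression of a gene) and the curves compared; that grouping step is an assumption here, not a detail given in the abstract.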

11 pages, 398 KiB  
Article
Cell Type Annotation Model Selection: General-Purpose vs. Pattern-Aware Feature Gene Selection in Single-Cell RNA-Seq Data
by Akram Vasighizaker, Yash Trivedi and Luis Rueda
Genes 2023, 14(3), 596; https://doi.org/10.3390/genes14030596 - 26 Feb 2023
Cited by 1 | Viewed by 1592
Abstract
With the advances in high-throughput sequencing technology, an increasing amount of research on revealing heterogeneity among cells has been performed. Differences between individual cells’ functionality are determined based on the differences in the gene expression profiles. Although the observations indicate a great performance of clustering methods, manual annotation of the clusters of cells remains a challenge that has yet to be addressed in a faster and more scalable way. On the other hand, owing to the shortage of labelled datasets, just a few supervised techniques have been used in cell type identification, and they obtained more robust results compared to clustering methods. A recent study showed that a complementary step of feature selection helped support vector machine (SVM) to outperform other classifiers in different scenarios. In this article, we compare and evaluate the performance of two state-of-the-art supervised methods, XGBoost and SVM, with information gain as a feature selection method. The results of the experiments on three standard scRNA-seq datasets indicate that XGBoost automatically annotates cell types in a simpler and more scalable framework. Additionally, it sheds light on the potential use of boosting tree approaches combined with deep neural networks to capture underlying information of single-cell RNA-Seq data more effectively. It can be used to identify marker genes and in other applications in biological studies.
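The abstract above names information gain as the feature selection criterion. The following is a minimal sketch of that criterion, assuming each gene has already been discretized (here binarized as expressed or not expressed); the abstract does not say how expression values were discretized, and the gene names, labels, and helper functions are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """H(labels) minus H(labels | feature) for a discrete feature."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: four cells, two genes binarized as expressed (1) or not (0).
cell_types = ["T", "T", "B", "B"]
genes = {
    "gene_a": [1, 1, 0, 0],  # separates the two cell types perfectly
    "gene_b": [1, 0, 1, 0],  # carries no information about the cell type
}

# Rank genes from most to least informative.
ranking = sorted(
    genes, key=lambda g: information_gain(genes[g], cell_types), reverse=True
)
```

The top-ranked genes would then be passed to a classifier such as XGBoost or an SVM; how many genes to keep is a tuning choice that this sketch does not address.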

14 pages, 949 KiB  
Article
Study on Sperm-Cell Detection Using YOLOv5 Architecture with Labaled Dataset
by Michal Dobrovolny, Jakub Benes, Jaroslav Langer, Ondrej Krejcar and Ali Selamat
Genes 2023, 14(2), 451; https://doi.org/10.3390/genes14020451 - 09 Feb 2023
Cited by 3 | Viewed by 3032
Abstract
Infertility has recently emerged as a severe medical problem. The essential elements in male infertility are sperm morphology, sperm motility, and sperm density. In order to analyze sperm motility, density, and morphology, laboratory experts perform a semen analysis. However, it is easy to err when using a subjective interpretation based on laboratory observation. In this work, a computer-aided sperm count estimation approach is suggested to lessen the impact of expert subjectivity in semen analysis. Object detection techniques concentrating on sperm motility estimate the number of active sperm in the semen. This study provides an overview of other techniques with which ours can be compared. The VISEM dataset from the Association for Computing Machinery was used to test the proposed strategy. We created a labelled dataset to show that our network can detect sperm in images. The best result without extensive tuning is an mAP of 72.15.

14 pages, 2165 KiB  
Article
An Iterative Unsupervised Method for Gene Expression Differentiation
by Olga Georgieva
Genes 2023, 14(2), 412; https://doi.org/10.3390/genes14020412 - 04 Feb 2023
Cited by 1 | Viewed by 1151
Abstract
For several decades, understanding gene activity and its role in organisms’ lives has been the focus of intensive research by scientists in different areas. A part of these investigations is the analysis of gene expression data for selecting differentially expressed genes. Methods that identify the genes of interest have been proposed based on statistical data analysis. The problem is that there is no good agreement among them, as different results are produced by distinct methods. By taking advantage of unsupervised data analysis, an iterative clustering procedure that finds differentially expressed genes shows promising results. In the present paper, a comparative study of the clustering methods applied for gene expression analysis is presented to explicate the choice of the clustering algorithm implemented in the method. An investigation of different distance measures is provided to reveal those that increase the efficiency of the method in finding the real data structure. Further, the method is improved by incorporating an additional aggregation measure based on the standard deviation of the expression levels. Its usage increases the gene distinction, as a new set of differentially expressed genes is found. The method is summarized in a detailed procedure. The significance of the method is demonstrated by an analysis of two mouse strain data sets. The differentially expressed genes defined by the proposed method are compared with those selected by the well-known statistical methods applied to the same data set.

16 pages, 1521 KiB  
Article
Omics Data Preprocessing for Machine Learning: A Case Study in Childhood Obesity
by Álvaro Torres-Martos, Mireia Bustos-Aibar, Alberto Ramírez-Mena, Sofía Cámara-Sánchez, Augusto Anguita-Ruiz, Rafael Alcalá, Concepción M. Aguilera and Jesús Alcalá-Fdez
Genes 2023, 14(2), 248; https://doi.org/10.3390/genes14020248 - 18 Jan 2023
Cited by 5 | Viewed by 3228
Abstract
The use of machine learning techniques for the construction of predictive models of disease outcomes (based on omics and other types of molecular data) has gained enormous relevance in the last few years in the biomedical field. Nonetheless, the virtuosity of omics studies and machine learning tools is subject to the proper application of algorithms as well as the appropriate preprocessing and management of input omics and molecular data. Currently, many of the available approaches that use machine learning on omics data for predictive purposes make mistakes in several of the following key steps: experimental design, feature selection, data preprocessing, and algorithm selection. For this reason, we propose the current work as a guideline on how to confront the main challenges inherent to multi-omics human data. As such, a series of best practices and recommendations are also presented for each of the steps defined. In particular, the main particularities of each omics data layer, the most suitable preprocessing approaches for each source, and a compilation of best practices and tips for the study of disease development prediction using machine learning are described. Using examples of real data, we show how to address the key problems mentioned in multi-omics research (e.g., biological heterogeneity, technical noise, high dimensionality, presence of missing values, and class imbalance). Finally, we define the proposals for model improvement based on the results found, which serve as the bases for future work.

22 pages, 7618 KiB  
Article
Novel Ground-Up 3D Multicellular Simulators for Synthetic Biology CAD Integrating Stochastic Gillespie Simulations Benchmarked with Topologically Variable SBML Models
by Richard Oliver Matzko, Laurentiu Mierla and Savas Konur
Genes 2023, 14(1), 154; https://doi.org/10.3390/genes14010154 - 06 Jan 2023
Cited by 3 | Viewed by 1796
Abstract
The elevation of Synthetic Biology from single cells to multicellular simulations would be a significant scale-up. The spatiotemporal behavior of cellular populations has the potential to be prototyped in silico for computer-assisted design through ergonomic interfaces. Such a platform would have great practical potential across medicine, industry, research, education and accessible archiving in bioinformatics. Existing Synthetic Biology CAD systems are considered limited regarding population-level behavior, and this work explored the in silico challenges posed from biological and computational perspectives. Retaining the connection to Synthetic Biology CAD, an extension of the Infobiotics Workbench Suite was considered, with potential for the integration of genetic regulatory models and/or chemical reaction networks through Next Generation Stochastic Simulator (NGSS) Gillespie algorithms. These were executed using SBML models generated by in-house SBML-Constructor over numerous topologies and benchmarked in association with multicellular simulation layers. Regarding multicellularity, two ground-up multicellular solutions were developed, including the use of Unreal Engine 4 contrasted with CPU multithreading and Blender visualization, resulting in a comparison of real-time versus batch-processed simulations. In conclusion, high-performance computing and client–server architectures could be considered for future work, along with the inclusion of numerous biologically and physically informed features, whilst still pursuing ergonomic solutions.
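The abstract above refers to Gillespie algorithms for stochastic simulation of reaction networks. As a rough illustration of the general idea, the sketch below applies Gillespie's direct method to a single-species birth-death process. It is not the NGSS implementation, which the abstract does not describe; the model, the rate values, and the function name are all hypothetical.

```python
import random


def gillespie_birth_death(x0, birth_rate, death_rate, t_end, seed=0):
    """Simulate 0 -> X (constant birth) and X -> 0 (per-molecule death).

    Returns the trajectory as a list of (time, molecule_count) pairs.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    trajectory = [(t, x)]
    while True:
        # Propensity of each reaction in the current state.
        propensities = [birth_rate, death_rate * x]
        total = sum(propensities)
        if total == 0.0:
            break  # no reaction can fire any more
        # The waiting time to the next reaction is exponentially distributed.
        t += rng.expovariate(total)
        if t >= t_end:
            break
        # Choose which reaction fires, proportionally to its propensity.
        if rng.random() * total < propensities[0]:
            x += 1
        else:
            x -= 1
        trajectory.append((t, x))
    return trajectory


# Hypothetical parameters; the long-run mean count is birth_rate / death_rate.
trajectory = gillespie_birth_death(x0=10, birth_rate=5.0, death_rate=0.5, t_end=10.0)
```

In a multicellular setting, one such simulation would run per cell (or per compartment), which is where the cost of real-time versus batch processing discussed in the abstract becomes relevant.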

26 pages, 2995 KiB  
Article
GAGAM v1.2: An Improvement on Peak Labeling and Genomic Annotated Gene Activity Matrix Construction
by Lorenzo Martini, Roberta Bardini, Alessandro Savino and Stefano Di Carlo
Genes 2023, 14(1), 115; https://doi.org/10.3390/genes14010115 - 30 Dec 2022
Cited by 5 | Viewed by 1759
Abstract
Single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) is rapidly becoming a powerful technology for assessing the epigenetic landscape of thousands of cells. However, the sparsity of the resulting data poses significant challenges to their interpretability and informativeness. Different computational methods are available, proposing ways to generate significant features from accessibility data and process them to obtain meaningful results. Foremost among them is peak calling, which interprets the raw scATAC-seq data, generating peaks as features. However, scATAC-seq data are not trivially comparable with single-cell RNA sequencing (scRNA-seq) data, an increasingly pressing challenge given the need to integrate multimodal experiments. For this reason, this study aims to improve the concept of the Gene Activity Matrix (GAM), which links the accessibility data to the genes, by proposing an improved version of the Genomic-Annotated Gene Activity Matrix (GAGAM) concept. Specifically, this paper presents GAGAM v1.2, a new and better version of GAGAM v1.0. GAGAM aims to label the peaks and link them to the genes through functional annotation of the whole genome. Using genes as features in scATAC-seq datasets makes different datasets comparable and allows linking gene accessibility and expression. This link is crucial for understanding gene regulation and fundamental for the increasing impact of multi-omics data. Results confirm that our method performs better than the previous GAMs and show a preliminary comparison with scRNA-seq data.

15 pages, 3089 KiB  
Article
A Framework for Comparison and Assessment of Synthetic RNA-Seq Data
by Felitsiya Shakola, Dean Palejev and Ivan Ivanov
Genes 2022, 13(12), 2362; https://doi.org/10.3390/genes13122362 - 14 Dec 2022
Cited by 2 | Viewed by 1701
Abstract
Methods for generating synthetic bulk and single-cell RNA-seq data are ever growing in number and have multiple and diverse applications. They are often aimed at benchmarking bioinformatics algorithms for purposes such as sample classification, differential expression analysis, correlation and network studies and the optimization of data integration and normalization techniques. Here, we propose a general framework to compare synthetically generated RNA-seq data and select a data-generating tool that is suitable for a set of specific study goals. As there are multiple methods for synthetic RNA-seq data generation, researchers can use the proposed framework to make an informed choice of an RNA-seq data simulation algorithm and software that are best suited for their specific scientific questions of interest.

9 pages, 350 KiB  
Article
Investigating the Diversity of Tuberculosis Spoligotypes with Dimensionality Reduction and Graph Theory
by Gaetan Senelle, Christophe Guyeux, Guislaine Refrégier and Christophe Sola
Genes 2022, 13(12), 2328; https://doi.org/10.3390/genes13122328 - 10 Dec 2022
Cited by 1 | Viewed by 1047
Abstract
The spoligotype is a graphical description of the CRISPR locus present in Mycobacterium tuberculosis, which has the particularity of having only 68 possible spacers. This spoligotype, which can be easily obtained either in vitro or in silico, provides summary information on lineage or even antibiotic resistance (when known to be associated with a particular cluster) at a lower cost. The objective of this article is to show that this representation is richer than it seems, and that it has been under-exploited until now. We first recall an original way to represent these spoligotypes as points in the plane, which makes it possible to highlight possible sub-lineages, particularities in the animal strains, etc. This graphical representation shows clusters and a skeleton in the form of a graph, which led us to see these spoligotypes as vertices of an unconnected directed graph. In this paper, we therefore propose to exploit in detail the description of the variety of spoligotypes using a graph, and we show to what extent such a description can be informative.