Topical Collection "Feature Papers in Bioinformatics"

A topical collection in Genes (ISSN 2073-4425). This collection belongs to the section "Bioinformatics".

Viewed by 14897

Editor

Prof. Dr. Stefano Lonardi
Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA
Interests: computational molecular biology; bioinformatics; genomics; epigenetics; data mining

Topical Collection Information

Dear Colleagues,

This Topical Collection, “Feature Papers in Bioinformatics”, aims to collect high-quality research articles, review articles, and communications on advances in bioinformatics. Since the aim of this collection is to illustrate, through selected works, frontier research in the field, we encourage Editorial Board Members of the Section “Bioinformatics” to contribute feature papers reflecting the latest progress in their research field or to invite relevant senior experts and colleagues to contribute. We aim to present our Section as an attractive open-access publishing platform for bioinformatics. Topics include but are not limited to:

  • Molecular sequence analysis
  • Sequencing and genotyping technologies
  • Regulation and epigenomics
  • Transcriptomics, including single-cell
  • Metagenomics
  • Population and statistical genetics
  • Evolutionary, compressive, and comparative genomics
  • Structure and function of non-coding RNAs
  • Computational proteomics and proteogenomics
  • Protein structure and function
  • Biological networks
  • Computational systems biology
  • Privacy of biomedical data
  • Bioimaging

Prof. Dr. Stefano Lonardi
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the collection website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Genes is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • sequence analysis
  • sequencing technologies
  • genotyping technologies
  • gene regulation
  • epigenomics
  • epigenetics
  • transcriptomics
  • single-cell
  • metagenomics
  • population genetics
  • statistical genetics
  • comparative genomics
  • non-coding RNAs
  • proteomics
  • proteogenomics
  • systems biology
  • privacy of biomedical data
  • bioimaging

Published Papers (12 papers)

2023

Article
PMIDigest: Interactive Review of Large Collections of PubMed Entries to Distill Relevant Information
Genes 2023, 14(4), 942; https://doi.org/10.3390/genes14040942 - 19 Apr 2023
Cited by 1 | Viewed by 870
Abstract
Scientific knowledge is being accumulated in the biomedical literature at an unprecedented pace. The most widely used database with biomedicine-related article abstracts, PubMed, currently contains more than 36 million entries. Users performing searches in this database for a subject of interest face thousands of entries (articles) that are difficult to process manually. In this work, we present an interactive tool for automatically digesting large sets of PubMed articles: PMIDigest (PubMed IDs digester). The system allows for classification/sorting of articles according to different criteria, including the type of article and different citation-related figures. It also calculates the distribution of MeSH (medical subject headings) terms for categories of interest, providing a picture of the themes addressed in the set. These MeSH terms are highlighted in the article abstracts in different colors depending on the category. An interactive representation of the interarticle citation network is also presented in order to easily locate article “clusters” related to particular subjects, as well as their corresponding “hub” articles. In addition to PubMed articles, the system can also process a set of Scopus or Web of Science entries. In summary, with this system, the user can have a “bird’s eye view” of a large set of articles and their main thematic tendencies and obtain additional information not evident in a plain list of abstracts.

Review
Computational Biology Helps Understand How Polyploid Giant Cancer Cells Drive Tumor Success
Genes 2023, 14(4), 801; https://doi.org/10.3390/genes14040801 - 26 Mar 2023
Cited by 1 | Viewed by 1886
Abstract
Precision and organization govern the cell cycle, ensuring normal proliferation. However, some cells may undergo abnormal cell divisions (neosis) or variations of mitotic cycles (endopolyploidy). Consequently, the formation of polyploid giant cancer cells (PGCCs), critical for tumor survival, resistance, and immortalization, can occur. Newly formed cells end up accessing numerous multicellular and unicellular programs that enable metastasis, drug resistance, tumor recurrence, and self-renewal or diverse clone formation. An integrative literature review was carried out, searching articles in several sites, including: PUBMED, NCBI-PMC, and Google Academic, published in English, indexed in referenced databases and without a publication time filter, but prioritizing articles from the last 3 years, to answer the following questions: (i) “What is the current knowledge about polyploidy in tumors?”; (ii) “What are the applications of computational studies for the understanding of cancer polyploidy?”; and (iii) “How do PGCCs contribute to tumorigenesis?”

Article
Understanding Drug Resistance of Wild-Type and L38HL Insertion Mutant of HIV-1 C Protease to Saquinavir
Genes 2023, 14(2), 533; https://doi.org/10.3390/genes14020533 - 20 Feb 2023
Viewed by 1093
Abstract
Acquired immunodeficiency syndrome (AIDS) is one of the most challenging infectious diseases to treat on a global scale. Understanding the mechanisms underlying the development of drug resistance is necessary for novel therapeutics. HIV subtype C is known to harbor mutations at critical positions of HIV aspartic protease compared to HIV subtype B, which affects the binding affinity. Recently, a novel double-insertion mutation at codon 38 (L38HL) was characterized in HIV subtype C protease, whose effects on the interaction with protease inhibitors are hitherto unknown. In this study, the potential of L38HL double-insertion in HIV subtype C protease to induce a drug resistance phenotype towards the protease inhibitor, Saquinavir (SQV), was probed using various computational techniques, such as molecular dynamics simulations, binding free energy calculations, local conformational changes and principal component analysis. The results indicate that the L38HL mutation exhibits an increase in flexibility at the hinge and flap regions with a decrease in the binding affinity of SQV in comparison with wild-type HIV protease C. Further, we observed a wide opening at the binding site in the L38HL variant due to an alteration in flap dynamics, leading to a decrease in interactions with the binding site of the mutant protease. It is supported by an altered direction of motion of flap residues in the L38HL variant compared with the wild-type. These results provide deep insights into understanding the potential drug resistance phenotype in infected individuals.

Technical Note
DraculR: A Web-Based Application for In Silico Haemolysis Detection in High-Throughput microRNA Sequencing Data
Genes 2023, 14(2), 448; https://doi.org/10.3390/genes14020448 - 09 Feb 2023
Viewed by 784
Abstract
The search for novel microRNA (miRNA) biomarkers in plasma is hampered by haemolysis, the lysis and subsequent release of red blood cell contents, including miRNAs, into surrounding fluid. The biomarker potential of miRNAs comes in part from their multicompartment origin and the long-lived nature of miRNA transcripts in plasma, giving researchers a functional window for tissues that are otherwise difficult or disadvantageous to sample. The inclusion of red-blood-cell-derived miRNA transcripts in downstream analysis introduces a source of error that is difficult to identify post hoc and may lead to spurious results. Where access to a physical specimen is not possible, our tool will provide an in silico approach to haemolysis prediction. We present DraculR, an interactive Shiny/R application that enables a user to upload miRNA expression data from a short-read sequencing of human plasma as a raw read counts table and interactively calculate a metric that indicates the degree of haemolysis contamination. The code, DraculR web tool and its tutorial are freely available as detailed herein.

Review
Networks as Biomarkers: Uses and Purposes
Genes 2023, 14(2), 429; https://doi.org/10.3390/genes14020429 - 08 Feb 2023
Cited by 1 | Viewed by 979
Abstract
Network-based approaches are often used to analyze gene expression data or protein–protein interactions but are not usually applied to study the relationships between different biomarkers. Given the clinical need for more comprehensive and integrative biomarkers that can help to identify personalized therapies, the integration of biomarkers of different natures is an emerging trend in the literature. Network analysis can be used to analyze the relationships between different features of a disease; nodes can be disease-related phenotypes, gene expression, mutational events, protein quantification, imaging-derived features and more. Since different biomarkers can exert causal effects between them, describing such interrelationships can be used to better understand the underlying mechanisms of complex diseases. Networks as biomarkers are not yet commonly used, despite being proven to lead to interesting results. Here, we discuss in which ways they have been used to provide novel insights into disease susceptibility, disease development and severity.

Article
An Efficient Feature Selection Algorithm for Gene Families Using NMF and ReliefF
Genes 2023, 14(2), 421; https://doi.org/10.3390/genes14020421 - 06 Feb 2023
Cited by 2 | Viewed by 835
Abstract
Gene families, which are parts of a genome’s information storage hierarchy, play a significant role in the development and diversity of multicellular organisms. Several studies have focused on the characteristics of gene families, such as function, homology, or phenotype. However, statistical and correlation analyses on the distribution of gene family members in the genome have yet to be conducted. Here, a novel framework incorporating gene family analysis and genome selection based on NMF-ReliefF is reported. Specifically, the proposed method starts by obtaining gene families from the TreeFam database and determining the number of gene families within the feature matrix. Then, NMF-ReliefF, a new feature selection algorithm that overcomes the inefficiencies of traditional methods, is used to select features from the gene feature matrix. Finally, a support vector machine is utilized to classify the acquired features. The results show that the framework achieved an accuracy of 89.1% and an AUC of 0.919 on the insect genome test set. We also employed four microarray gene data sets to evaluate the performance of the NMF-ReliefF algorithm. The outcomes show that the proposed method may strike a delicate balance between robustness and discrimination. Additionally, the proposed method’s categorization is superior to state-of-the-art feature selection approaches.

Review
Translational Bioinformatics Applied to the Study of Complex Diseases
Genes 2023, 14(2), 419; https://doi.org/10.3390/genes14020419 - 06 Feb 2023
Viewed by 1648
Abstract
Translational Bioinformatics (TBI) is defined as the union of translational medicine and bioinformatics. It emerges as a major advance in science and technology by covering everything, from the most basic database discoveries, to the development of algorithms for molecular and cellular analysis, as well as their clinical applications. This technology makes it possible to access the knowledge of scientific evidence and apply it to clinical practice. This manuscript aims to highlight the role of TBI in the study of complex diseases, as well as its application to the understanding and treatment of cancer. An integrative literature review was carried out, obtaining articles through several websites, among them: PUBMED, Science Direct, NCBI-PMC, Scientific Electronic Library Online (SciELO), and Google Academic, published in English, Spanish, and Portuguese, indexed in the referred databases and answering the following guiding question: “How does TBI provide a scientific understanding of complex diseases?” An additional effort is aimed at the dissemination, inclusion, and perpetuation of TBI knowledge from the academic environment to society, helping the study, understanding, and elucidation of complex disease mechanics and their treatment.

Article
Assessing Outlier Probabilities in Transcriptomics Data When Evaluating a Classifier
Genes 2023, 14(2), 387; https://doi.org/10.3390/genes14020387 - 01 Feb 2023
Viewed by 1128
Abstract
Outliers in the training or test set used to fit and evaluate a classifier on transcriptomics data can considerably change the estimated performance of the model. Hence, an accuracy that is either too weak or too optimistic is then reported, and the estimated model performance cannot be reproduced on independent data. It is then also doubtful whether a classifier qualifies for clinical usage. We estimate classifier performances in simulated gene expression data with artificial outliers and in two real-world datasets. As a new approach, we use two outlier detection methods within a bootstrap procedure to estimate the outlier probability for each sample and evaluate classifiers before and after outlier removal by means of cross-validation. We found that the removal of outliers changed the classification performance notably. For the most part, removing outliers improved the classification results. Taking into account the fact that there are various, sometimes unclear reasons for a sample to be an outlier, we strongly advocate always reporting the performance of a transcriptomics classifier with and without outliers in training and test data. This provides a more diverse picture of a classifier’s performance and prevents reporting models that later turn out not to be applicable for clinical diagnoses.
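
The idea of a bootstrap-based outlier probability in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name its two detection methods, so a single MAD-based robust z-score is swapped in as a stand-in detector, and the function names, the threshold of 3.5, and the toy expression values are all assumptions made for illustration.

```python
import random
import statistics


def robust_flags(values, reference, threshold=3.5):
    """Flag each value whose MAD-based robust z-score, computed from
    `reference`, exceeds `threshold` (stand-in detector, an assumption)."""
    med = statistics.median(reference)
    mad = statistics.median(abs(x - med) for x in reference)
    if mad == 0:
        # Degenerate resample: no spread to judge against, flag nothing.
        return [False] * len(values)
    return [abs(0.6745 * (x - med) / mad) > threshold for x in values]


def bootstrap_outlier_probability(values, n_boot=500, threshold=3.5, seed=0):
    """Fraction of bootstrap resamples in which each sample is flagged."""
    rng = random.Random(seed)
    n = len(values)
    counts = [0] * n
    for _ in range(n_boot):
        # Resample with replacement, then score every original sample
        # against the statistics of this resample.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        for i, flagged in enumerate(robust_flags(values, resample, threshold)):
            counts[i] += flagged
    return [c / n_boot for c in counts]


# Hypothetical one-gene expression values; index 6 is an artificial outlier.
expression = [5.1, 4.9, 5.3, 5.0, 4.8, 5.2, 9.7, 5.1]
probs = bootstrap_outlier_probability(expression)
```

Samples with a high probability could then be removed before re-running cross-validation, so that performance is reported both with and without them, as the abstract recommends.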

Article
Reconstruction of Single-Cell Trajectories Using Stochastic Tree Search
Genes 2023, 14(2), 318; https://doi.org/10.3390/genes14020318 - 26 Jan 2023
Viewed by 789
Abstract
The recent advancement in single-cell RNA sequencing technologies enables the understanding of dynamic cellular processes at the single-cell level. Using trajectory inference methods, pseudotimes can be estimated based on reconstructed single-cell trajectories which can be further used to gain biological knowledge. Existing methods for modeling cell trajectories, such as minimal spanning tree or k-nearest neighbor graph, often lead to locally optimal solutions. In this paper, we propose a penalized likelihood-based framework and introduce a stochastic tree search (STS) algorithm aiming at the global solution in a large and non-convex tree space. Both simulated and real data experiments show that our approach is more accurate and robust than other existing methods in terms of cell ordering and pseudotime estimation.
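
For orientation, the minimal-spanning-tree baseline that the abstract above contrasts with can be sketched as follows. This is only an illustration of that baseline, not the STS algorithm or the penalized likelihood framework; the two-dimensional coordinates, the choice of root cell, and the function names are hypothetical.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.
    Returns adjacency lists of (neighbour, edge_length) pairs."""
    n = len(points)
    adjacency = {i: [] for i in range(n)}
    # best[j] = (distance to the current tree, tree node giving that distance)
    best = {j: (math.dist(points[0], points[j]), 0) for j in range(1, n)}
    while best:
        j = min(best, key=lambda k: best[k][0])
        length, parent = best.pop(j)
        adjacency[parent].append((j, length))
        adjacency[j].append((parent, length))
        # Update the cheapest connection for every cell still outside the tree.
        for k in best:
            d = math.dist(points[j], points[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return adjacency


def pseudotime_from_root(adjacency, root):
    """Pseudotime of a cell = path length along the tree from the root cell."""
    times = {root: 0.0}
    stack = [root]
    while stack:
        node = stack.pop()
        for neighbour, length in adjacency[node]:
            if neighbour not in times:
                times[neighbour] = times[node] + length
                stack.append(neighbour)
    return [times[i] for i in range(len(adjacency))]


# Hypothetical cells in a 2-D embedding; cell 0 is assumed to be the root.
cells = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0), (3.0, 0.2), (2.1, 1.0)]
tree = minimum_spanning_tree(cells)
pseudotime = pseudotime_from_root(tree, root=0)
```

Because the tree is built greedily from pairwise distances alone, a few noisy cells can change its shape, which is one way to read the abstract's remark about locally optimal solutions.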

Article
Identification of TRPC6 as a Novel Diagnostic Biomarker of PM-Induced Chronic Obstructive Pulmonary Disease Using Machine Learning Models
Genes 2023, 14(2), 284; https://doi.org/10.3390/genes14020284 - 21 Jan 2023
Cited by 2 | Viewed by 1202
Abstract
Chronic obstructive pulmonary disease (COPD) was the third most prevalent cause of mortality worldwide in 2010; it results from a progressive and fatal deterioration of lung function because of cigarette smoking and particulate matter (PM). Therefore, it is important to identify molecular biomarkers that can diagnose the COPD phenotype to plan therapeutic efficacy. To identify potential novel biomarkers of COPD, we first obtained COPD and the normal lung tissue gene expression dataset GSE151052 from the NCBI Gene Expression Omnibus (GEO). A total of 250 differentially expressed genes (DEGs) were investigated and analyzed using GEO2R, gene ontology (GO) functional annotation, and Kyoto Encyclopedia of Genes and Genomes (KEGG) identification. The GEO2R analysis revealed that TRPC6 was the sixth most highly expressed gene in patients with COPD. The GO analysis indicated that the upregulated DEGs were mainly concentrated in the plasma membrane, transcription, and DNA binding. The KEGG pathway analysis indicated that the upregulated DEGs were mainly involved in pathways related to cancer and axon guidance. TRPC6, one of the most abundant genes among the top 10 differentially expressed total RNAs (fold change ≥ 1.5) between the COPD and normal groups, was selected as a novel COPD biomarker based on the results of the GEO dataset and analysis using machine learning models. The upregulation of TRPC6 was verified in PM-stimulated RAW264.7 cells, which mimicked COPD conditions, compared to untreated RAW264.7 cells by a quantitative reverse transcription polymerase chain reaction. In conclusion, our study suggests that TRPC6 can be regarded as a potential novel biomarker for COPD pathogenesis.

Article
Client Applications and Server-Side Docker for Management of RNASeq and/or VariantSeq Workflows and Pipelines of the GPRO Suite
Genes 2023, 14(2), 267; https://doi.org/10.3390/genes14020267 - 19 Jan 2023
Viewed by 1545
Abstract
The GPRO suite is an in-progress bioinformatic project for -omics data analysis. As part of the continued growth of this project, we introduce a client- and server-side solution for comparative transcriptomics and analysis of variants. The client-side consists of two Java applications called “RNASeq” and “VariantSeq” to manage pipelines and workflows based on the most common command line interface tools for RNA-seq and Variant-seq analysis, respectively. As such, “RNASeq” and “VariantSeq” are coupled with a Linux server infrastructure (named GPRO Server-Side) that hosts all dependencies of each application (scripts, databases, and command line interface software). Implementation of the Server-Side requires a Linux operating system, PHP, SQL, Python, bash scripting, and third-party software. The GPRO Server-Side can be installed, via a Docker container, in the user’s PC under any operating system or on remote servers, as a cloud solution. “RNASeq” and “VariantSeq” are both available as desktop (RCP compilation) and web (RAP compilation) applications. Each application has two execution modes: a step-by-step mode enables each step of the workflow to be executed independently, and a pipeline mode allows all steps to be run sequentially. “RNASeq” and “VariantSeq” also feature an experimental, online support system called GENIE that consists of a virtual (chatbot) assistant and a pipeline jobs panel coupled with an expert system. The chatbot can troubleshoot issues with the usage of each tool, the pipeline jobs panel provides information about the status of each computational job executed in the GPRO Server-Side, while the expert system provides the user with a potential recommendation to identify or fix failed analyses.
Our solution is a ready-to-use, topic-specific platform that combines the user-friendliness, robustness, and security of desktop software with the efficiency of cloud/web applications to manage pipelines and workflows based on command line interface software.

2022

Article
Genome-Wide Identification and Analysis of the MADS-Box Gene Family in Almond Reveal Its Expression Features in Different Flowering Periods
Genes 2022, 13(10), 1764; https://doi.org/10.3390/genes13101764 - 29 Sep 2022
Viewed by 1272
Abstract
The MADS-box gene family is an important family of transcription factors involved in multiple processes, such as plant growth and development, stress, and in particular, flowering time and floral organ development. Almonds are the best-selling nuts in the international fruit trade, accounting for more than 50% of the world’s dried fruit trade, and one of the main economic fruit trees in Kashgar, Xinjiang. In addition, almonds contain a variety of nutrients, such as protein and dietary fiber, which can supplement nutrients for people. They also have the functions of nourishing the yin and kidneys, improving eyesight, and strengthening the brain, and they can be applied to various diseases. However, there is no report on the MADS-box gene family in almond (Prunus dulcis). In this study, a total of 67 PdMADS genes distributed across 8 chromosomes were identified from the genome of almond ‘Wanfeng’. The PdMADS members were divided into five subgroups—Mα, Mβ, Mγ, Mδ, and MIKC—and the members in each subgroup had conserved motif types and exon and intron numbers. The number of exons of PdMADS members ranged from 1 to 20, and the number of introns ranged from 0 to 19. The number of exons and introns of different subfamily members varied greatly. The results of gene duplication analysis showed that the PdMADS members had 16 pairs of segmental duplications and 9 pairs of tandem duplications, so we further explored the relationship between the MADS-box gene members in almond and those in Arabidopsis thaliana, Oryza sativa, Malus domestica, and Prunus persica based on colinear genes and evolutionary selection pressure. The results of the cis-acting elements showed that the PdMADS members were extensively involved in a variety of processes, such as almond growth and development, hormone regulation, and stress response. 
In addition, the expression patterns of PdMADS members across six floral transcriptome samples from two almond cultivars, ‘Wanfeng’ and ‘Nonpareil’, had significant expression differences. Subsequently, the fluorescence quantitative expression levels of the 15 PdMADS genes were highly similar to the transcriptome expression patterns, and the gene expression levels increased in the samples at different flowering stages, indicating that the two almond cultivars expressed different PdMADS genes during the flowering process. It is worth noting that the difference in flowering time between ‘Wanfeng’ and ‘Nonpareil’ may be caused by the different expression activities of PdMADS47 and PdMADS16 during the dormancy period, resulting in different processes of vernalization. We identified a total of 13,515 target genes in the genome based on the MIKC DNA-binding sites. The GO and KEGG enrichment results showed that these target genes play important roles in protein function and multiple pathways. In summary, we conducted bioinformatics and expression pattern studies on the PdMADS gene family and investigated six flowering samples from two almond cultivars, the early-flowering ‘Wanfeng’ and late-flowering ‘Nonpareil’, for quantitative expression level identification. These findings lay a foundation for future in-depth studies on the mechanism of PdMADS gene regulation during flowering in different almond cultivars.