Computational Methods for the Analysis of Genomic Data and Biological Processes (II)

A special issue of Genes (ISSN 2073-4425). This special issue belongs to the section "Technologies and Resources for Genetics".

Deadline for manuscript submissions: closed (20 March 2022) | Viewed by 23847

Special Issue Information

Dear Colleagues,

In recent decades, new technologies have made remarkable progress in helping to explain complex biological systems. Rapid advances in genomic profiling techniques, such as microarrays or high-performance sequencing, have presented new opportunities and challenges in the fields of computational biology and bioinformatics. Such genetic sequencing techniques allow large amounts of data to be produced, whose analysis and cross-integration could be used to provide a complete view of a multitude of organisms.

As a result, it is necessary to develop new techniques and algorithms which can be used to analyze these data with reliability and efficiency.

The aim of this Special Issue is to bring together the latest advances in the field of computational methods for the analysis of gene expression data and, in particular, the modeling of biological processes. We welcome you to participate in this exciting second edition; the first edition, from 2020, is available here: https://www.mdpi.com/journal/genes/special_issues/comput_genetics

We encourage researchers to share their original works in the field of computational analysis of gene expression data. Topics of primary interest include, but are not limited to, the following:

  1. Computational methods or machine learning approaches for modeling biological processes;
  2. Discovering genome–disease or genome–phenotype associations;
  3. Gene–gene interactions and gene–environment interactions for disease association analysis;
  4. New computational methods for gene expression data analysis;
  5. Machine learning approaches for modeling gene regulatory networks;
  6. Identification of expression patterns;
  7. Reviews of computational methods for gene expression data analysis.

Prof. Dr. Federico Divina
Prof. Dr. Francisco A. Gómez Vela
Prof. Dr. Miguel García-Torres
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Genes is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Computational biology
  • Bioinformatics
  • Genomics
  • Gene expression
  • Gene regulation
  • Biomarker discovery
  • Gene network
  • Biomedical data analysis

Published Papers (8 papers)

Research

14 pages, 4795 KiB  
Article
In Silico Analysis Identified Putative Pathogenic Missense nsSNPs in Human SLITRK1 Gene
by Muhammad Zeeshan Ali, Arshad Farid, Safeer Ahmad, Muhammad Muzammal, Mohammed Al Mohaini, Abdulkhaliq J. Alsalman, Maitham A. Al Hawaj, Yousef N. Alhashem, Abdulmonem A. Alsaleh, Eman M. Almusalami, Mahpara Maryam and Muzammil Ahmad Khan
Genes 2022, 13(4), 672; https://doi.org/10.3390/genes13040672 - 11 Apr 2022
Cited by 5 | Viewed by 2803
Abstract
Human DNA contains several variations, which can affect the structure and normal functioning of a protein. These variations could be single nucleotide polymorphisms (SNPs) or insertion-deletions (InDels). SNPs, as opposed to InDels, are more commonly present in DNA and may cause genetic disorders. In the current study, several bioinformatic tools were used to prioritize the pathogenic variants in the SLITRK1 gene. Out of all of the variants, 16 were commonly predicted to be pathogenic by these tools. All the variants had very low frequency, i.e., <0.0001 in the global population. The secondary structure of all filtered variants was predicted, but no structural change was observed at the site of variation in any variant. Protein stability analysis of these variants was then performed, which determined a decrease in protein stability for 10 of the variants. Amino acid conservation analysis revealed that all the amino acids were highly conserved, indicating their structural and functional importance. The 3D structures of wild-type SLITRK1 and all of its variants were predicted using I-TASSER, and the effect of each variation on the 3D structure of the protein was assessed using the Missense3D tool, which indicated probable structural loss in three variants, i.e., Asn529Lys, Leu496Pro and Leu94Phe. The wild-type SLITRK1 protein and these three variants were independently docked with their close interactor protein PTPRD, and remarkable differences were observed between the docking sites of the normal and variant proteins, which will ultimately affect the functional activity of the SLITRK1 protein. Previous studies have shown that mutations in SLITRK1 are involved in Tourette syndrome. The present study may assist molecular geneticists in interpreting variant pathogenicity in research as well as diagnostic settings.

18 pages, 4634 KiB  
Article
Comparative Study of Classification Algorithms for Various DNA Microarray Data
by Jingeun Kim, Yourim Yoon, Hye-Jin Park and Yong-Hyuk Kim
Genes 2022, 13(3), 494; https://doi.org/10.3390/genes13030494 - 11 Mar 2022
Cited by 7 | Viewed by 2806
Abstract
Microarrays are applications of electrical engineering and technology in biology that allow simultaneous measurement of expression of numerous genes, and they can be used to analyze specific diseases. This study undertakes classification analyses of various microarrays to compare the performances of classification algorithms over different data traits. The datasets were classified into test and control groups using five machine learning methods, namely MultiLayer Perceptron (MLP), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and k-Nearest Neighbors (KNN), and the resulting accuracies were compared. k-fold cross-validation was used to evaluate performance, and the results were analyzed by comparing the performances of the five machine learning methods. Through the experiments, it was observed that the two tree-based methods, DT and RF, showed similar trends in results, and the remaining three methods, MLP, SVM, and KNN, showed similar trends. DT and RF generally showed worse performance than the other methods except for one dataset. This suggests that, for the effective classification of microarray data, selecting a classification algorithm that is suitable for the data traits is crucial to ensure optimum performance.
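The k-fold cross-validation mentioned in this abstract can be sketched roughly as follows. This is a minimal, standard-library illustration and not the authors' code: the function names `k_fold_indices` and `cross_validate`, the shuffling seed, and the `fit_and_score` callback are all hypothetical choices made for the example.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, k, fit_and_score, seed=0):
    """Average fit_and_score(train_idx, test_idx) over the k folds.

    fit_and_score is a hypothetical callback: it should train a
    classifier on train_idx and return its accuracy on test_idx.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(fit_and_score(train_idx, test_idx))
    return sum(scores) / k
```

To compare several classifiers in this way, one would call `cross_validate` once per method with the same `seed`, so that every method is scored on identical folds.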

13 pages, 10687 KiB  
Article
BRCA Variations Risk Assessment in Breast Cancers Using Different Artificial Intelligence Models
by Niyazi Senturk, Gulten Tuncel, Berkcan Dogan, Lamiya Aliyeva, Mehmet Sait Dundar, Sebnem Ozemri Sag, Gamze Mocan, Sehime Gulsun Temel, Munis Dundar and Mahmut Cerkez Ergoren
Genes 2021, 12(11), 1774; https://doi.org/10.3390/genes12111774 - 09 Nov 2021
Cited by 3 | Viewed by 2345
Abstract
Artificial intelligence provides modelling on machines by simulating the human brain using learning and decision-making abilities. Early diagnosis is highly effective in reducing mortality in cancer. This study aimed to combine cancer-associated risk factors, including genetic variations, and to design an artificial intelligence system for risk assessment. Data from a total of 268 breast cancer patients were analysed for 16 different risk factors, including genetic variant classifications. In total, data from breast cancer patients associated with BRCA1 (61), BRCA2 (128), and both BRCA1 and BRCA2 (11) were used to train the system, with Mamdani’s Fuzzy Inference Method and the Feed-Forward Neural Network Method implemented in MATLAB. Sixteen different tests were performed on twelve different subjects who had not been introduced to the system before. The rates for the neural network were 99.9% for training success, 99.6% for validation success and 99.7% for test success. Although the neural network’s overall success was slightly higher than the fuzzy logic accuracy, the results from the developed systems were similar (99.9% and 95.5%, respectively). The developed models make predictions from a wider perspective, using more risk factors, including genetic variation data, compared with similar studies in the literature. Overall, these artificial intelligence models present promising results for BRCA variations’ risk assessment in breast cancers as well as a unique tool for personalized medicine software.

26 pages, 5882 KiB  
Article
DNN-m6A: A Cross-Species Method for Identifying RNA N6-methyladenosine Sites Based on Deep Neural Network with Multi-Information Fusion
by Lu Zhang, Xinyi Qin, Min Liu, Ziwei Xu and Guangzhong Liu
Genes 2021, 12(3), 354; https://doi.org/10.3390/genes12030354 - 28 Feb 2021
Cited by 18 | Viewed by 2400
Abstract
As a prevalent post-transcriptional modification of RNA, N6-methyladenosine (m6A) plays a crucial role in various biological processes. To better reveal its regulatory mechanism and provide new insights for drug design, the accurate genome-wide identification of m6A sites is vital. As the traditional experimental methods are time-consuming and cost-prohibitive, it is necessary to design a more efficient computational method to detect the m6A sites. In this study, we propose a novel cross-species computational method, DNN-m6A, based on the deep neural network (DNN) to identify m6A sites in multiple tissues of human, mouse and rat. Firstly, binary encoding (BE), tri-nucleotide composition (TNC), enhanced nucleic acid composition (ENAC), K-spaced nucleotide pair frequencies (KSNPFs), nucleotide chemical property (NCP), pseudo dinucleotide composition (PseDNC), position-specific nucleotide propensity (PSNP) and position-specific dinucleotide propensity (PSDP) are employed to extract RNA sequence features, which are subsequently fused to construct the initial feature vector set. Secondly, we use elastic net to eliminate redundant features while building the optimal feature subset. Finally, the hyper-parameters of the DNN are tuned with Bayesian hyper-parameter optimization based on the selected feature subset. The five-fold cross-validation tests on the training datasets show that the proposed DNN-m6A method outperformed the state-of-the-art method for predicting m6A sites, with an accuracy (ACC) of 73.58–83.38% and an area under the curve (AUC) of 81.39–91.04%. Furthermore, on the independent datasets, the method achieved an ACC of 72.95–83.04% and an AUC of 80.79–91.09%, which shows the excellent generalization ability of our proposed method.

12 pages, 1810 KiB  
Article
Clustering Single-Cell RNA-Seq Data with Regularized Gaussian Graphical Model
by Zhenqiu Liu
Genes 2021, 12(2), 311; https://doi.org/10.3390/genes12020311 - 22 Feb 2021
Cited by 11 | Viewed by 3021
Abstract
Single-cell RNA-seq (scRNA-seq) is a powerful tool to measure the expression patterns of individual cells and discover heterogeneity and functional diversity among cell populations. Due to variability, it is challenging to analyze such data efficiently. Many clustering methods have been developed using at least one free parameter. Different choices for free parameters may lead to substantially different visualizations and clusters. Tuning free parameters is also time consuming. Thus, there is a need for a simple, robust, and efficient clustering method. In this paper, we propose a new regularized Gaussian graphical clustering (RGGC) method for scRNA-seq data. RGGC is based on high-order (partial) correlations and subspace learning, and is robust over a wide range of the regularization parameter λ. Therefore, we can simply set λ=2 or λ=log(p) for AIC (Akaike information criterion) or BIC (Bayesian information criterion) without cross-validation. Cell subpopulations are discovered by the Louvain community detection algorithm, which determines the number of clusters automatically. There is no free parameter to be tuned with RGGC. When evaluated with simulated and benchmark scRNA-seq data sets against widely used methods, RGGC is computationally efficient and one of the top performers. It can detect inter-sample cell heterogeneity when applied to glioblastoma scRNA-seq data.

10 pages, 405 KiB  
Article
4mCPred-CNN—Prediction of DNA N4-Methylcytosine in the Mouse Genome Using a Convolutional Neural Network
by Zeeshan Abbas, Hilal Tayara and Kil To Chong
Genes 2021, 12(2), 296; https://doi.org/10.3390/genes12020296 - 20 Feb 2021
Cited by 19 | Viewed by 2639
Abstract
Among DNA modifications, N4-methylcytosine (4mC) is one of the most significant ones, and it is linked to the development of cell proliferation and gene expression. To understand its different biological functions, the accurate detection of 4mC sites is required. Although we have several techniques for the prediction of 4mC sites in different genomes based on both machine learning (ML) and convolutional neural networks (CNNs), there is no CNN-based tool for the identification of 4mC sites in the mouse genome. In this article, a CNN-based model named 4mCPred-CNN was developed to classify 4mC locations in the mouse genome. Until now, we had only two ML-based models for this purpose; they utilized several feature encoding schemes, and thus still left considerable room to improve the prediction accuracy. Utilizing only a single feature encoding scheme, one-hot encoding, we outperformed both of the previous ML-based techniques. In a ten-fold validation test, the proposed model, 4mCPred-CNN, achieved an accuracy of 85.71% and a Matthews correlation coefficient (MCC) of 0.717. On an independent dataset, the achieved accuracy was 87.50% with an MCC value of 0.750. The attained results show that the proposed model can be of great use for researchers in the fields of biology and bioinformatics.
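For readers unfamiliar with the two metrics reported in this abstract, the sketch below computes accuracy and the Matthews correlation coefficient from the four cells of a binary confusion matrix, using the standard definition MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function names are illustrative, and returning 0.0 when the denominator is zero is a common convention assumed here, not something stated in the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (no better than
    chance) to +1 (perfect prediction).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: an empty row or column gives MCC = 0.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy for site-prediction tasks where the two classes may be imbalanced.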

11 pages, 1718 KiB  
Article
pcPromoter-CNN: A CNN-Based Prediction and Classification of Promoters
by Muhammad Shujaat, Abdul Wahab, Hilal Tayara and Kil To Chong
Genes 2020, 11(12), 1529; https://doi.org/10.3390/genes11121529 - 21 Dec 2020
Cited by 34 | Viewed by 4343
Abstract
A promoter is a small region within the DNA structure that has an important role in initiating transcription of a specific gene in the genome. Different types of promoters are recognized by their different functions. Due to the importance of promoter functions, computational tools for the prediction and classification of a promoter are highly desired. Promoters resemble each other; therefore, their precise classification is an important challenge. In this study, we propose a convolutional neural network (CNN)-based tool, the pcPromoter-CNN, for application in the prediction of promoters and their classification into subclasses σ70, σ54, σ38, σ32, σ28 and σ24. This CNN-based tool uses a one-hot encoding scheme for promoter classification. The tool's architecture was trained and tested on a benchmark dataset. To evaluate its classification performance, we used four evaluation metrics. The model exhibited notable improvement over that of existing state-of-the-art tools.
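The one-hot encoding scheme referred to in this abstract can be illustrated with a short sketch. It is not taken from pcPromoter-CNN: the A, C, G, T column order and the all-zero vector for ambiguous bases such as N are assumptions made for this example.

```python
# Assumed column order: A, C, G, T (the paper's order may differ).
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def one_hot_encode(sequence):
    """Encode a DNA string as a list of 4-element one-hot tuples.

    Bases outside A/C/G/T (for example N) map to an all-zero vector,
    which is an assumption of this sketch.
    """
    return [ONE_HOT.get(base, (0, 0, 0, 0)) for base in sequence.upper()]
```

A sequence of length L therefore becomes an L × 4 binary matrix, which is the kind of input a one-dimensional convolutional layer can scan for sequence motifs.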

16 pages, 2922 KiB  
Article
Hybrid of Restricted and Penalized Maximum Likelihood Method for Efficient Genome-Wide Association Study
by Wenlong Ren, Zhikai Liang, Shu He and Jing Xiao
Genes 2020, 11(11), 1286; https://doi.org/10.3390/genes11111286 - 29 Oct 2020
Cited by 1 | Viewed by 2410
Abstract
In genome-wide association studies, linear mixed models (LMMs) have been widely used to explore the molecular mechanism of complex traits. However, typical association approaches suffer from several important drawbacks: estimation of variance components in LMMs with large numbers of individuals is computationally slow, and the single-locus model handles complex confounding unsatisfactorily and causes loss of statistical power. To address these issues, we propose an efficient two-stage method based on a hybrid of restricted and penalized maximum likelihood, named HRePML. Firstly, we performed restricted maximum likelihood (REML) on a single-locus LMM to remove unrelated markers, where spectral decomposition of the covariance matrix was used to rapidly estimate variance components. Secondly, we carried out penalized maximum likelihood (PML) on a multi-locus LMM for markers with reasonably large effects. To validate the effectiveness of HRePML, we conducted a series of simulation studies and real data analyses. As a result, our method always had the highest average statistical power compared with multi-locus mixed-model (MLMM), fixed and random model circulating probability unification (FarmCPU), and genome-wide efficient mixed model association (GEMMA). More importantly, HRePML can provide more accurate estimation of marker effects. HRePML also identifies 41 previously reported genes associated with development traits in Arabidopsis, which is more than were detected by the other methods.