Genes

Research

14 pages, 4795 KiB

Open Access Article

In Silico Analysis Identified Putative Pathogenic Missense nsSNPs in Human SLITRK1 Gene

by Muhammad Zeeshan Ali, Arshad Farid, Safeer Ahmad, Muhammad Muzammal, Mohammed Al Mohaini, Abdulkhaliq J. Alsalman, Maitham A. Al Hawaj, Yousef N. Alhashem, Abdulmonem A. Alsaleh, Eman M. Almusalami, Mahpara Maryam and Muzammil Ahmad Khan

Genes 2022, 13(4), 672; https://doi.org/10.3390/genes13040672 - 11 Apr 2022

Cited by 5 | Viewed by 2803

Abstract

Human DNA contains several variations, which can affect the structure and normal functioning of a protein. These variations could be single nucleotide polymorphisms (SNPs) or insertion-deletions (InDels). SNPs are more common in DNA than InDels and may cause genetic disorders. In the current study, several bioinformatic tools were used to prioritize the pathogenic variants in the SLITRK1 gene. Of all the variants, 16 were predicted to be pathogenic by every tool. All of these variants had a very low frequency, i.e., <0.0001 in the global population. The secondary structure of all filtered variants was predicted, but no structural change was observed at the site of variation in any variant. Protein stability analysis of these variants was then performed and indicated decreased protein stability for 10 of the variants. Amino acid conservation analysis revealed that all the affected amino acids were highly conserved, indicating their structural and functional importance. The 3D structure of wildtype SLITRK1 and all of its variants was predicted using I-TASSER, and the effect of each variation on the 3D structure was assessed using the Missense3D tool, which indicated probable structural damage in three variants, i.e., Asn529Lys, Leu496Pro and Leu94Phe. The wildtype SLITRK1 protein and these three variants were independently docked with their close interactor protein PTPRD, and remarkable differences were observed between the docking sites of the wildtype and the variants, which will ultimately affect the functional activity of the SLITRK1 protein. Previous studies have shown that mutations in SLITRK1 are involved in Tourette syndrome. The present study may assist molecular geneticists in interpreting variant pathogenicity in research as well as diagnostic settings.

(This article belongs to the Special Issue Computational Methods for the Analysis of Genomic Data and Biological Processes (II))

18 pages, 4634 KiB

Open Access Article

Comparative Study of Classification Algorithms for Various DNA Microarray Data

by Jingeun Kim, Yourim Yoon, Hye-Jin Park and Yong-Hyuk Kim

Genes 2022, 13(3), 494; https://doi.org/10.3390/genes13030494 - 11 Mar 2022

Cited by 7 | Viewed by 2806

Abstract

Microarrays are applications of electrical engineering and technology in biology that allow simultaneous measurement of the expression of numerous genes, and they can be used to analyze specific diseases. This study undertakes classification analyses of various microarrays to compare the performance of classification algorithms over different data traits. The datasets were classified into test and control groups using five machine learning methods, namely MultiLayer Perceptron (MLP), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and k-Nearest Neighbors (KNN), and the resulting accuracies were compared. k-fold cross-validation was used to evaluate performance. In the experiments, the two tree-based methods, DT and RF, showed similar trends in their results, and the remaining three methods, MLP, SVM, and KNN, showed similar trends. DT and RF generally performed worse than the other methods, except on one dataset. This suggests that, for the effective classification of microarray data, selecting a classification algorithm that suits the data traits is crucial to ensure optimum performance.
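
The k-fold cross-validation protocol mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it uses only the Python standard library, a small hand-written k-nearest-neighbours classifier, and made-up toy data. The function names (`k_fold_indices`, `knn_predict`, `cross_val_accuracy`) and the choice of 3 neighbours are assumptions introduced here.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def knn_predict(train_x, train_y, query, n_neighbors=3):
    """Majority vote among the closest training points (squared Euclidean distance)."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = Counter(train_y[i] for i in order[:n_neighbors])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(x, y, k=5, seed=0):
    """Hold out each fold once, train on the rest, and average the accuracies."""
    scores = []
    for held_out in k_fold_indices(len(x), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(x)) if i not in held]
        train_x = [x[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_x, train_y, x[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


# Made-up toy "expression" profiles: two well-separated groups of 10 samples each.
toy_x = [[0.1 * i, 0.0] for i in range(10)] + [[5.0 + 0.1 * i, 5.0] for i in range(10)]
toy_y = ["control"] * 10 + ["test"] * 10

print(cross_val_accuracy(toy_x, toy_y, k=5))
```

To compare several classifiers in the same way, each one would be evaluated on the same folds (same `seed`), so that differences in mean accuracy reflect the algorithm rather than the split.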

(This article belongs to the Special Issue Computational Methods for the Analysis of Genomic Data and Biological Processes (II))

13 pages, 10687 KiB

Open Access Article

BRCA Variations Risk Assessment in Breast Cancers Using Different Artificial Intelligence Models

by Niyazi Senturk, Gulten Tuncel, Berkcan Dogan, Lamiya Aliyeva, Mehmet Sait Dundar, Sebnem Ozemri Sag, Gamze Mocan, Sehime Gulsun Temel, Munis Dundar and Mahmut Cerkez Ergoren

Genes 2021, 12(11), 1774; https://doi.org/10.3390/genes12111774 - 09 Nov 2021

Cited by 3 | Viewed by 2345

Abstract

Artificial intelligence provides modelling on machines by simulating the human brain using learning and decision-making abilities. Early diagnosis is highly effective in reducing mortality in cancer. This study aimed to combine cancer-associated risk factors, including genetic variations, and to design an artificial intelligence system for risk assessment. Data from a total of 268 breast cancer patients were analysed for 16 different risk factors, including genetic variant classifications. In total, data from 61 BRCA1-associated, 128 BRCA2-associated and 11 both BRCA1- and BRCA2-associated breast cancer patients were used to train the system, using Mamdani's Fuzzy Inference Method and the Feed-Forward Neural Network Method as the models, implemented in MATLAB. Sixteen different tests were performed on twelve different subjects who had not been introduced to the system before. The rates for the neural network were 99.9% for training success, 99.6% for validation success and 99.7% for test success. Although the neural network's overall success was slightly higher than the fuzzy logic accuracy, the results from the developed systems were similar (99.9% and 95.5%, respectively). The developed models make predictions from a wider perspective, using more risk factors, including genetic variation data, than similar studies in the literature. Overall, these artificial intelligence models present promising results for the risk assessment of BRCA variations in breast cancers, as well as a unique tool for personalized medicine software.

(This article belongs to the Special Issue Computational Methods for the Analysis of Genomic Data and Biological Processes (II))

26 pages, 5882 KiB

Open Access Article

DNN-m6A: A Cross-Species Method for Identifying RNA N6-methyladenosine Sites Based on Deep Neural Network with Multi-Information Fusion

by Lu Zhang, Xinyi Qin, Min Liu, Ziwei Xu and Guangzhong Liu

Genes 2021, 12(3), 354; https://doi.org/10.3390/genes12030354 - 28 Feb 2021

Cited by 18 | Viewed by 2400

Abstract

As a prevalent post-transcriptional modification of RNA, N6-methyladenosine (m6A) plays a crucial role in various biological processes. To better reveal its regulatory mechanism and provide new insights for drug design, the accurate genome-wide identification of m6A sites is vital. As the traditional experimental methods are time-consuming and cost-prohibitive, it is necessary to design a more efficient computational method to detect m6A sites. In this study, we propose a novel cross-species computational method, DNN-m6A, based on a deep neural network (DNN), to identify m6A sites in multiple tissues of human, mouse and rat. Firstly, binary encoding (BE), tri-nucleotide composition (TNC), enhanced nucleic acid composition (ENAC), K-spaced nucleotide pair frequencies (KSNPFs), nucleotide chemical property (NCP), pseudo dinucleotide composition (PseDNC), position-specific nucleotide propensity (PSNP) and position-specific dinucleotide propensity (PSDP) are employed to extract RNA sequence features, which are subsequently fused to construct the initial feature vector set. Secondly, we use elastic net to eliminate redundant features while building the optimal feature subset. Finally, the hyper-parameters of the DNN are tuned with Bayesian hyper-parameter optimization based on the selected feature subset. The five-fold cross-validation test on the training datasets shows that the proposed DNN-m6A method outperformed the state-of-the-art method for predicting m6A sites, with an accuracy (ACC) of 73.58–83.38% and an area under the curve (AUC) of 81.39–91.04%. Furthermore, on the independent datasets, the method achieved an ACC of 72.95–83.04% and an AUC of 80.79–91.09%, which shows the excellent generalization ability of our proposed method.
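
Two of the encodings listed in this abstract, binary encoding (BE) and tri-nucleotide composition (TNC), and their fusion by concatenation can be sketched as below. This is an illustrative sketch only: the nucleotide order `ACGU`, the function names, and the example sequence are assumptions made here, and the abstract does not specify how DNN-m6A implements these features. Sequences are assumed to contain only A, C, G and U.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed nucleotide order; any fixed order works


def binary_encoding(seq):
    """Map each nucleotide to a 4-dimensional indicator vector and concatenate."""
    vec = []
    for base in seq:
        vec.extend(1.0 if base == ref else 0.0 for ref in ALPHABET)
    return vec


def trinucleotide_composition(seq):
    """Relative frequency of each of the 64 possible 3-mers in the sequence."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=3)]
    counts = {kmer: 0 for kmer in kmers}
    total = len(seq) - 2  # number of overlapping 3-mers
    for i in range(total):
        counts[seq[i:i + 3]] += 1
    return [counts[kmer] / total for kmer in kmers]


def fused_features(seq):
    """Fuse the two encodings by simple concatenation into one feature vector."""
    return binary_encoding(seq) + trinucleotide_composition(seq)


example = "GGACU"  # hypothetical 5-nt window, for illustration only
features = fused_features(example)
print(len(features))  # 5 * 4 + 64 = 84
```

A fused vector of this kind grows quickly as more encodings are added, which is why a feature selection step such as the elastic net mentioned in the abstract is applied before training.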

(This article belongs to the Special Issue Computational Methods for the Analysis of Genomic Data and Biological Processes (II))

12 pages, 1810 KiB

Open Access Article

Clustering Single-Cell RNA-Seq Data with Regularized Gaussian Graphical Model

by Zhenqiu Liu

Genes 2021, 12(2), 311; https://doi.org/10.3390/genes12020311 - 22 Feb 2021

Cited by 11 | Viewed by 3021

Abstract

Single-cell RNA-seq (scRNA-seq) is a powerful tool to measure the expression patterns of individual cells and discover heterogeneity and functional diversity among cell populations. Due to variability, it is challenging to analyze such data efficiently. Many clustering methods have been developed using at least one free parameter. Different choices for free parameters may lead to substantially different visualizations and clusters. Tuning free parameters is also time-consuming. Thus, there is a need for a simple, robust, and efficient clustering method. In this paper, we propose a new regularized Gaussian graphical clustering (RGGC) method for scRNA-seq data. RGGC is based on high-order (partial) correlations and subspace learning, and is robust over a wide range of the regularization parameter λ. Therefore, we can simply set λ = 2 or λ = log(p) for AIC (Akaike information criterion) or BIC (Bayesian information criterion), respectively, without cross-validation. Cell subpopulations are discovered by the Louvain community detection algorithm, which determines the number of clusters automatically. There is no free parameter to be tuned with RGGC. When evaluated with simulated and benchmark scRNA-seq data sets against widely used methods, RGGC is computationally efficient and one of the top performers. It can detect inter-sample cell heterogeneity when applied to glioblastoma scRNA-seq data.

(This article belongs to the Special Issue Computational Methods for the Analysis of Genomic Data and Biological Processes (II))

10 pages, 405 KiB

Open Access Article

4mCPred-CNN—Prediction of DNA N4-Methylcytosine in the Mouse Genome Using a Convolutional Neural Network

by Zeeshan Abbas, Hilal Tayara and Kil To Chong

Genes 2021, 12(2), 296; https://doi.org/10.3390/genes12020296 - 20 Feb 2021

Cited by 19 | Viewed by 2639

Abstract

Among DNA modifications, N4-methylcytosine (4mC) is one of the most significant, and it is linked to the development of cell proliferation and gene expression. To understand its different biological functions, the accurate detection of 4mC sites is required. Although several techniques exist for the prediction of 4mC sites in different genomes, based on both machine learning (ML) and convolutional neural networks (CNNs), there is no CNN-based tool for the identification of 4mC sites in the mouse genome. In this article, a CNN-based model named 4mCPred-CNN was developed to classify 4mC locations in the mouse genome. Until now, only two ML-based models existed for this purpose; they utilized several feature encoding schemes and still left considerable room to improve prediction accuracy. Utilizing only a single feature encoding scheme, one-hot encoding, we outperformed both of the previous ML-based techniques. In a ten-fold validation test, the proposed model, 4mCPred-CNN, achieved an accuracy of 85.71% and a Matthews correlation coefficient (MCC) of 0.717. On an independent dataset, the achieved accuracy was 87.50% with an MCC value of 0.750. These results indicate that the proposed model can be of great use for researchers in the fields of biology and bioinformatics.
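
The two metrics reported in this abstract, accuracy and the Matthews correlation coefficient (MCC), are both computed from a binary confusion matrix. The sketch below uses the standard definitions; the confusion-matrix counts are hypothetical and are not taken from the paper. They are chosen only to show that, on a balanced test set, an accuracy of 87.5% and an MCC of 0.75 can arise together.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical balanced confusion matrix (40 positives, 40 negatives); not the paper's data.
tp, tn, fp, fn = 35, 35, 5, 5

print(accuracy(tp, tn, fp, fn))           # 0.875
print(matthews_corrcoef(tp, tn, fp, fn))  # 0.75
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the positive and negative classes are imbalanced.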

(This article belongs to the Special Issue Computational Methods for the Analysis of Genomic Data and Biological Processes (II))

11 pages, 1718 KiB

Open Access Article

pcPromoter-CNN: A CNN-Based Prediction and Classification of Promoters

by Muhammad Shujaat, Abdul Wahab, Hilal Tayara and Kil To Chong

Genes 2020, 11(12), 1529; https://doi.org/10.3390/genes11121529 - 21 Dec 2020

Cited by 34 | Viewed by 4343

Abstract

A promoter is a small region within the DNA structure that has an important role in initiating transcription of a specific gene in the genome. Different types of promoters are recognized by their different functions. Due to the importance of promoter functions, computational tools for the prediction and classification of promoters are highly desired. Promoters resemble each other; therefore, their precise classification is an important challenge. In this study, we propose a convolutional neural network (CNN)-based tool, pcPromoter-CNN, for the prediction of promoters and their classification into the subclasses σ70, σ54, σ38, σ32, σ28 and σ24. This CNN-based tool uses a one-hot encoding scheme for promoter classification. The tool's architecture was trained and tested on a benchmark dataset. To evaluate its classification performance, we used four evaluation metrics. The model exhibited notable improvement over existing state-of-the-art tools.

(This article belongs to the Special Issue Computational Methods for the Analysis of Genomic Data and Biological Processes (II))

16 pages, 2922 KiB

Open Access Article

Hybrid of Restricted and Penalized Maximum Likelihood Method for Efficient Genome-Wide Association Study

by Wenlong Ren, Zhikai Liang, Shu He and Jing Xiao

Genes 2020, 11(11), 1286; https://doi.org/10.3390/genes11111286 - 29 Oct 2020

Cited by 1 | Viewed by 2410

Abstract

In genome-wide association studies, linear mixed models (LMMs) have been widely used to explore the molecular mechanisms of complex traits. However, typical association approaches suffer from several important drawbacks: estimating variance components in LMMs with a large number of individuals is computationally slow, and single-locus models handle complex confounding poorly and lose statistical power. To address these issues, we propose an efficient two-stage method based on a hybrid of restricted and penalized maximum likelihood, named HRePML. Firstly, we performed restricted maximum likelihood (REML) on a single-locus LMM to remove unrelated markers, where spectral decomposition of the covariance matrix was used to estimate variance components quickly. Secondly, we carried out penalized maximum likelihood (PML) on a multi-locus LMM for markers with reasonably large effects. To validate the effectiveness of HRePML, we conducted a series of simulation studies and real data analyses. As a result, our method always had the highest average statistical power compared with the multi-locus mixed-model (MLMM), fixed and random model circulating probability unification (FarmCPU), and genome-wide efficient mixed model association (GEMMA). More importantly, HRePML can provide more accurate estimates of marker effects. HRePML also identified 41 previously reported genes associated with development traits in Arabidopsis, which is more than were detected by the other methods.

(This article belongs to the Special Issue Computational Methods for the Analysis of Genomic Data and Biological Processes (II))
