Genes

Editorial

6 pages, 202 KiB

Open Access Editorial

Innovating Computational Biology and Intelligent Medicine: ICIBM 2019 Special Issue

by Yan Guo, Xia Ning, Ewy Mathé, Kai Wang, Lang Li, Chi Zhang and Zhongming Zhao

Genes 2020, 11(4), 437; https://doi.org/10.3390/genes11040437 - 17 Apr 2020

Cited by 1 | Viewed by 1840

Abstract

The International Association for Intelligent Biology and Medicine (IAIBM) is a nonprofit organization that promotes intelligent biology and medical science. It hosts an annual International Conference on Intelligent Biology and Medicine (ICIBM), which was established in 2012. The ICIBM 2019 was held from 9 to 11 June 2019 in Columbus, Ohio, USA. Out of the 105 original research manuscripts submitted to the conference, 18 were selected for publication in a Special Issue in Genes. The selected manuscripts cover a wide range of current topics in biomedical research, including cancer informatics, transcriptomics, computational algorithms, visualization and tools, deep learning, and microbiome research. In this editorial, we briefly introduce each of the manuscripts and discuss their contribution to the advance of science and technology.

(This article belongs to the Special Issue Selected Papers from the International Conference on Intelligent Biology and Medicine (ICIBM 2019))

Research

14 pages, 13297 KiB

Open Access Article

Network-Based Single-Cell RNA-Seq Data Imputation Enhances Cell Type Identification

by Maryam Zand and Jianhua Ruan

Genes 2020, 11(4), 377; https://doi.org/10.3390/genes11040377 - 31 Mar 2020

Cited by 8 | Viewed by 4042

Abstract

Single-cell RNA sequencing is a powerful technology for obtaining transcriptomes at single-cell resolution. However, it suffers from dropout events (i.e., excess zero counts) since only a small fraction of transcripts is sequenced in each cell during the sequencing process. This inherent sparsity of expression profiles hinders further characterization at the cell and gene level, such as cell type identification and downstream analysis. To alleviate this dropout issue, we introduce a network-based method, netImpute, which leverages the hidden information in gene co-expression networks to recover real signals. netImpute employs Random Walk with Restart (RWR) to adjust the gene expression level in a given cell by borrowing information from its neighbors in a gene co-expression network. Performance evaluation and comparison with existing tools on simulated data and seven real datasets show that netImpute substantially enhances clustering accuracy and data visualization clarity, thanks to its effective treatment of dropouts. While the idea of netImpute is general and can be applied with other types of networks, such as a cell co-expression network or a protein–protein interaction (PPI) network, evaluation results show that the gene co-expression network is consistently more beneficial, presumably because a PPI network usually lacks cell type context, while a cell co-expression network can cause information loss for rare cell types. Evaluation results on several biological datasets show that netImpute recovers missing transcripts in scRNA-seq data and enhances the identification and visualization of heterogeneous cell types more effectively than existing methods.
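
The Random Walk with Restart idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the netImpute implementation: the function names, the restart probability of 0.5, the column normalization, and the self-loop for isolated genes are all assumptions made for this toy example.

```python
def column_normalize(adj):
    """Return a column-stochastic copy of a square weight matrix.

    Assumption: isolated genes (all-zero columns) get a self-loop,
    so their expression value is left unchanged by the walk.
    """
    n = len(adj)
    w = [[0.0] * n for _ in range(n)]
    for j in range(n):
        total = sum(adj[i][j] for i in range(n))
        if total == 0:
            w[j][j] = 1.0
        else:
            for i in range(n):
                w[i][j] = adj[i][j] / total
    return w


def rwr_smooth(expr, adj, restart=0.5, tol=1e-9, max_iter=1000):
    """Smooth one cell's expression vector over a gene-gene network.

    Iterates p <- (1 - restart) * W p + restart * expr until the
    largest change between iterations falls below tol.
    """
    w = column_normalize(adj)
    n = len(expr)
    p = list(expr)
    for _ in range(max_iter):
        nxt = [
            (1.0 - restart) * sum(w[i][j] * p[j] for j in range(n))
            + restart * expr[i]
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy data (hypothetical): genes 0 and 1 are co-expressed, gene 2 is isolated.
adjacency = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
cell = [4.0, 0.0, 1.0]  # gene 1 looks like a dropout
smoothed = rwr_smooth(cell, adjacency)
```

In this toy case the zero count of gene 1 is raised by borrowing signal from its neighbor gene 0, while the isolated gene 2 keeps its value. A higher restart probability keeps the result closer to the observed counts.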

(This article belongs to the Special Issue Selected Papers from the International Conference on Intelligent Biology and Medicine (ICIBM 2019))

15 pages, 1220 KiB

Open Access Article

Computational Cancer Cell Models to Guide Precision Breast Cancer Medicine

by Lijun Cheng, Abhishek Majumdar, Daniel Stover, Shaofeng Wu, Yaoqin Lu and Lang Li

Genes 2020, 11(3), 263; https://doi.org/10.3390/genes11030263 - 28 Feb 2020

Cited by 9 | Viewed by 2589

Abstract

Background: Large-scale screening of drug sensitivity on cancer cell models can mimic in vivo cellular behavior, providing a wider scope for biological research on cancer. Since the therapeutic effect of a single drug or drug combination depends on the individual patient’s genome characteristics and cancer cells integration reaction, the identification of an effective agent in an in vitro model by using a large number of cancer cell models is a promising approach for the development of targeted treatments. Precision cancer medicine aims to select the most appropriate treatment or treatments for an individual patient. However, it still lacks the tools to bridge the gap between conventional in vitro cancer cell models and clinical patient response to inhibitors. Methods: An optimal two-layer decision system model is developed to identify the cancer cells that most closely resemble an individual tumor for optimum therapeutic interventions in precision cancer medicine. Accordingly, an optimal grid parameters selection is designed to seek the highest accordance for treatment selection to the patient’s preference for drug response and in vitro cancer cell drug screening. The optimal two-layer decision system model overcomes the challenge of heterogeneous data comparison between the tumor and the cancer cells, as well as between the continual variation of drug responses in vitro and the discrete ones in clinical practice. We simulated the model accuracy using 681 cancer cells’ mRNA and associated 481 drug screenings and validated our results on 315 breast cancer patients’ drug selection across seven drugs (docetaxel, doxorubicin, fluorouracil, paclitaxel, tamoxifen, cyclophosphamide, lapatinib). Results: Compared with the real response of a drug in clinical patients, the novel model obtained an overall average accordance over 90.8% across the seven drugs. At the same time, the optimal cancer cells and the associated optimal therapeutic efficacy of cancer drugs are recommended. The novel optimal two-layer decision system model was used on 1097 patients with breast cancer in guiding precision medicine for a recommendation of their optimal cancer cells (30 cancer cells) and associated efficacy of certain cancer drugs. Our model can detect the most similar cancer cells for each individual patient. Conclusion: A successful clinical translation model (optimal two-layer decision system model) was developed to bridge in vitro basic science to clinical practice in a therapeutic intervention application for the first time. The novel tool serves two purposes: it can help basic science to seek optimal cancer cell models for an individual tumor, while prioritizing clinical drugs’ recommendations in practice. Tool associated platform website: We extended the breast cancer research to 32 more types of cancers across 45 therapy predictions.

(This article belongs to the Special Issue Selected Papers from the International Conference on Intelligent Biology and Medicine (ICIBM 2019))

13 pages, 2681 KiB

Open Access Article

CNV Detection from Circulating Tumor DNA in Late Stage Non-Small Cell Lung Cancer Patients

by Hao Peng, Lan Lu, Zisong Zhou, Jian Liu, Dadong Zhang, Kejun Nan, Xiaochen Zhao, Fugen Li, Lei Tian, Hua Dong and Yu Yao

Genes 2019, 10(11), 926; https://doi.org/10.3390/genes10110926 - 14 Nov 2019

Cited by 33 | Viewed by 5543

Abstract

While methods for detecting SNVs and indels in circulating tumor DNA (ctDNA) with hybridization capture-based next-generation sequencing (NGS) are available, copy number variation (CNV) detection is more challenging. Here, we present a method enabling CNV detection from a 150-gene panel using a very low amount of ctDNA. First, a read depth-based CNV estimation method without a paired blood sample was developed, and cfDNA sequencing data from healthy people were used to build a panel of normal (PoN) model. Then, in silico and in vitro simulations were performed to define the limit of detection (LOD) for EGFR, ERBB2, and MET. Compared to the WES results of the 48 samples, the concordance rate for EGFR, ERBB2, and MET CNVs was 78%, 89.6%, and 92.4%, respectively. In another cohort profiled with the 150-gene panel from 5980 lung cancer ctDNA samples, we detected amplification of the three genes at population frequencies comparable to those of other cohorts. One lung adenocarcinoma patient with MET amplification detected by our method achieved a partial response to crizotinib. These findings show that our ctDNA CNV detection pipeline can detect CNVs with high specificity and concordance, which enables CNV calling in a non-invasive way for cancer patients when tissues are not available.
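
A read depth-based estimate against a panel of normals can be sketched roughly as follows. This is only an illustration of the general idea, not the authors' pipeline: the median scaling, the function names, and the toy depths (including the placeholder genes GENE_A and GENE_B) are assumptions, and all depths are assumed to be positive.

```python
import math
from statistics import median


def normalize_depth(depths):
    """Scale per-gene read depths by the sample median.

    This makes samples sequenced to different depths comparable.
    Assumes every depth is positive.
    """
    m = median(depths.values())
    return {gene: d / m for gene, d in depths.items()}


def log2_ratios(sample, normals):
    """Log2 ratio of a sample's normalized depth to the PoN baseline.

    The baseline for each gene is the median normalized depth across
    the healthy samples in the panel of normals.
    """
    s = normalize_depth(sample)
    panel = [normalize_depth(n) for n in normals]
    ratios = {}
    for gene, value in s.items():
        baseline = median(n[gene] for n in panel)
        ratios[gene] = math.log2(value / baseline)
    return ratios


genes = ["EGFR", "ERBB2", "MET", "GENE_A", "GENE_B"]
# Hypothetical healthy samples with flat coverage at different total depths.
normals = [{g: depth for g in genes} for depth in (100, 200, 50)]
# Hypothetical tumor sample with four-fold excess coverage of MET.
sample = {"EGFR": 100, "ERBB2": 100, "MET": 400, "GENE_A": 100, "GENE_B": 100}
ratios = log2_ratios(sample, normals)
```

A log2 ratio near 0 suggests a normal copy number and a clearly positive value suggests amplification. The threshold for calling an amplification would depend on the limit of detection, which the abstract says was defined by simulation.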

(This article belongs to the Special Issue Selected Papers from the International Conference on Intelligent Biology and Medicine (ICIBM 2019))

17 pages, 5143 KiB

Open Access Article

DNA Methylation Markers for Pan-Cancer Prediction by Deep Learning

by Biao Liu, Yulu Liu, Xingxin Pan, Mengyao Li, Shuang Yang and Shuai Cheng Li

Genes 2019, 10(10), 778; https://doi.org/10.3390/genes10100778 - 04 Oct 2019

Cited by 53 | Viewed by 6131

Abstract

For cancer diagnosis, many DNA methylation markers have been identified. However, few studies have tried to identify DNA methylation markers to diagnose diverse cancer types simultaneously, i.e., pan-cancers. In this study, we tried to identify DNA methylation markers to differentiate cancer samples from the respective normal samples in pan-cancers. We collected whole genome methylation data of 27 cancer types containing 10,140 cancer samples and 3386 normal samples, and divided all samples into five data sets, including one training data set, one validation data set and three test data sets. We applied machine learning to identify DNA methylation markers, and specifically, we constructed diagnostic prediction models by deep learning. We identified two categories of markers: 12 CpG markers and 13 promoter markers. Three of the 12 CpG markers and four of the 13 promoter markers are located in cancer-related genes. With the CpG markers, our model achieved an average sensitivity and specificity on the test data sets of 92.8% and 90.1%, respectively. For promoter markers, the average sensitivity and specificity on the test data sets were 89.8% and 81.1%, respectively. Furthermore, in cell-free DNA methylation data of 163 prostate cancer samples, the CpG markers achieved a sensitivity of 100%, and the promoter markers achieved 92%. For both marker types, the specificity in normal whole blood was 100%. To conclude, we identified methylation markers to diagnose pan-cancers, which might be applied to liquid biopsy of cancers.
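
For readers less familiar with the two metrics reported in this abstract, the short sketch below shows how sensitivity and specificity are computed from binary labels. The function name and the toy labels are hypothetical and unrelated to the study's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Label 1 means cancer and label 0 means normal. Sensitivity is the
    fraction of cancer samples called cancer; specificity is the
    fraction of normal samples called normal.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels: four cancer samples and two normal samples.
truth = [1, 1, 1, 1, 0, 0]
calls = [1, 1, 1, 0, 0, 1]
sens, spec = sensitivity_specificity(truth, calls)
```

Here three of four cancer samples and one of two normal samples are classified correctly, so the sketch yields a sensitivity of 0.75 and a specificity of 0.5.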

(This article belongs to the Special Issue Selected Papers from the International Conference on Intelligent Biology and Medicine (ICIBM 2019))

8 pages, 3178 KiB

Open Access Article

A Portal to Visualize Transcriptome Profiles in Mouse Models of Neurological Disorders

by Rami Al-Ouran, Ying-Wooi Wan, Carl Grant Mangleburg, Tom V. Lee, Katherine Allison, Joshua M. Shulman and Zhandong Liu

Genes 2019, 10(10), 759; https://doi.org/10.3390/genes10100759 - 26 Sep 2019

Cited by 7 | Viewed by 3085

Abstract

Target nomination for drug development has been a major challenge in the path to finding a cure for several neurological disorders. Comprehensive transcriptome profiles have revealed brain gene expression changes associated with many neurological disorders, and the functional validation of these changes is a critical next step. Model organisms are a proven approach for the elucidation of disease mechanisms, including screening of gene candidates as therapeutic targets. Frequently, multiple models exist for a given disease, creating a challenge to select the optimal model for validation and functional follow-up. To help in nominating the best mouse models for studying neurological diseases, we developed a web portal to visualize mouse transcriptomic data related to neurological disorders. Users can examine gene expression changes across mouse model studies to help select the optimal mouse model for further investigation. The portal provides access to mouse studies related to Alzheimer’s disease (AD), Parkinson’s disease (PD), Huntington’s disease (HD), amyotrophic lateral sclerosis (ALS), and spinocerebellar ataxia (SCA), as well as models related to aging.

(This article belongs to the Special Issue Selected Papers from the International Conference on Intelligent Biology and Medicine (ICIBM 2019))

19 pages, 2793 KiB

Open Access Article

Identification of Alternatively-Activated Pathways between Primary Breast Cancer and Liver Metastatic Cancer Using Microarray Data

by Limei Wang, Jin Li, Enze Liu, Garrett Kinnebrew, Xiaoli Zhang, Daniel Stover, Yang Huo, Zhi Zeng, Wanli Jiang, Lijun Cheng, Weixing Feng and Lang Li

Genes 2019, 10(10), 753; https://doi.org/10.3390/genes10100753 - 25 Sep 2019

Cited by 10 | Viewed by 3117

Abstract

Alternatively-activated pathways have been observed in biological experiments in cancer studies, but the concept had not been fully explored in computational cancer systems biology. Therefore, an alternatively-activated pathway identification method was proposed and applied to primary breast cancer and breast cancer liver metastasis research using microarray data. Interestingly, the results show that cytokine-cytokine receptor interaction and calcium signaling were significantly enriched under both conditions. TGF beta signaling was found to be the hub in network topology analysis. In total, three types of alternatively-activated pathways were recognized. In the cytokine-cytokine receptor interaction pathway, four active alteration patterns in gene pairs were noticed. Thirteen cytokine-cytokine receptor pairs with inverse activity changes of both genes were verified by the literature. The second type was that some sub-pathways were active under only one condition. For the third type, nodes were significantly active in both conditions, but with different active genes. In the calcium signaling and TGF beta signaling pathways, nodes E2F5 and E2F4 were significantly active in primary breast cancer and metastasis, respectively. Overall, our study demonstrated, for the first time, the use of microarray data to identify alternatively-activated pathways in breast cancer liver metastasis. The results showed that the proposed method was valid and effective, which could be helpful for future research in understanding the mechanism of breast cancer metastasis.

(This article belongs to the Special Issue Selected Papers from the International Conference on Intelligent Biology and Medicine (ICIBM 2019))

13 pages, 1854 KiB

Open Access Article

Forming Big Datasets through Latent Class Concatenation of Imperfectly Matched Databases Features

by Christopher W. Bartlett, Brett G. Klamer, Steven Buyske, Stephen A. Petrill and William C. Ray

Genes 2019, 10(9), 727; https://doi.org/10.3390/genes10090727 - 19 Sep 2019

Cited by 2 | Viewed by 2569

Abstract

Informatics researchers often need to combine data from many different sources to increase statistical power and study subtle or complicated effects. Perfect overlap of measurements across academic studies is rare since virtually every dataset is collected for a unique purpose and without coordination across parties not-at-hand (i.e., informatics researchers in the future). Thus, incomplete concordance of measurements across datasets poses a major challenge for researchers seeking to combine public databases. In any given field, some measurements are fairly standard, but every organization collecting data makes unique decisions on instruments, protocols, and methods of processing the data. This typically prevents literal concatenation of the raw data since constituent cohorts do not have the same measurements (i.e., columns of data). When measurements across datasets are similar prima facie, there is a desire to combine the data to increase power, but mixing non-identical measurements could greatly reduce the sensitivity of the downstream analysis. Here, we discuss a statistical method that is applicable when certain patterns of missing data are found; namely, it is possible to combine datasets that measure the same underlying constructs (or latent traits) when there is only partial overlap of measurements across the constituent datasets. Our method, ROSETTA, empirically derives a set of common latent trait metrics for each related measurement domain using a novel variation of factor analysis to ensure equivalence across the constituent datasets. The advantage of combining datasets this way is the simplicity, statistical power, and modeling flexibility of a single joint analysis of all the data. Three simulation studies show the performance of ROSETTA on datasets with only partially overlapping measurements (i.e., systematically missing information), benchmarked to a condition of perfectly overlapped data (i.e., full information). The first study examined a range of correlations, while the second study was modeled after the observed correlations in a well-characterized clinical, behavioral cohort. Both studies consistently show significant correlations >0.94, often >0.96, indicating the robustness of the method and validating the general approach. The third study varied within- and between-domain correlations and compared ROSETTA to multiple imputation and meta-analysis as two commonly used methods that ostensibly solve the same data integration problem. We provide one alternative to meta-analysis and multiple imputation by developing a method that statistically equates similar but distinct manifest metrics into a set of empirically derived metrics that can be used for analysis across all datasets.

(This article belongs to the Special Issue Selected Papers from the International Conference on Intelligent Biology and Medicine (ICIBM 2019))

17 pages, 1894 KiB

Open Access Article

Identifying Interaction Clusters for MiRNA and MRNA Pairs in TCGA Network

by Xinqing Dai, Lizhong Ding, Hannah Liu, Zesheng Xu, Hui Jiang, Samuel K Handelman and Yongsheng Bai

Genes 2019, 10(9), 702; https://doi.org/10.3390/genes10090702 - 11 Sep 2019

Cited by 12 | Viewed by 3697

Abstract

Existing methods often fail to recognize changes in the biological roles of gene and microRNA (miRNA) pairs between tumor and normal samples. We have developed a novel cluster scoring method to identify messenger RNA (mRNA) and miRNA interaction pairs and clusters while considering tumor and normal samples jointly. Our method has identified 54 significant clusters for 15 cancer types selected from The Cancer Genome Atlas project. We also determined the shared clusters across tumor types and/or subtypes. In addition, we compared gene and miRNA overlap between lists identified in our liver hepatocellular carcinoma (LIHC) study and regulatory relationships reported from human and rat nonalcoholic fatty liver disease (NAFLD) studies. Finally, we analyzed biological functions for the single significant cluster in LIHC and uncovered a significantly enriched pathway (phospholipase D signaling pathway) with six genes represented in the cluster, symbols: DGKQ, LPAR2, PDGFRB, PIK3R3, PTGFR and RAPGEF3.

(This article belongs to the Special Issue Selected Papers from the International Conference on Intelligent Biology and Medicine (ICIBM 2019))

30 pages, 32768 KiB

Open Access Article

A Super-Clustering Approach for Fully Automated Single Particle Picking in Cryo-EM

by Adil Al-Azzawi, Anes Ouadou, John J. Tanner and Jianlin Cheng

Genes 2019, 10(9), 666; https://doi.org/10.3390/genes10090666 - 30 Aug 2019

Cited by 9 | Viewed by 3738

Abstract

Structure determination of proteins and macromolecular complexes by single-particle cryo-electron microscopy (cryo-EM) is poised to revolutionize structural biology. An early challenging step in the cryo-EM pipeline is the detection and selection of particles from two-dimensional micrographs (particle picking). Most existing particle-picking methods require human intervention to deal with complex (irregular) particle shapes and extremely low signal-to-noise ratio (SNR) in cryo-EM images. Here, we design a fully automated super-clustering approach for single particle picking (SuperCryoEMPicker) in cryo-EM micrographs, which focuses on identifying, detecting, and picking particles with complex and irregular shapes in micrographs with extremely low SNR. Our method first applies advanced image processing procedures to improve the quality of the cryo-EM images. Binary mask images highlighting protein particles are then generated from each individual cryo-EM image using the super-clustering (SP) method, which improves upon base clustering methods (i.e., k-means, fuzzy c-means (FCM), and intensity-based cluster (IBC) algorithm) via a super-pixel algorithm. SuperCryoEMPicker is tested and evaluated on micrographs of β-galactosidase and 80S ribosomes, which are examples of cryo-EM data exhibiting complex and irregular particle shapes. The results show that the super-particle clustering method provides a more robust detection of particles than the base clustering methods, such as k-means, FCM, and IBC. SuperCryoEMPicker automatically and effectively identifies very complex particles from cryo-EM images of extremely low SNR. As a fully automated particle detection method, it has the potential to relieve researchers from laborious, manual particle-labeling work and therefore is a useful tool for cryo-EM protein structure determination.
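
To make the notion of a clustering-derived binary mask concrete, here is a minimal sketch of the simplest base method named in this abstract, k-means with two clusters applied to pixel intensities. It is not SuperCryoEMPicker and involves no super-pixel step. The function names, the toy image, and the assumption that particles are the brighter cluster are all illustrative.

```python
def kmeans_two_clusters(values, max_iter=50):
    """One-dimensional k-means with k = 2 on pixel intensities.

    Centers start at the minimum and maximum intensity and are
    updated until they stop moving. Returns (low_center, high_center).
    """
    lo, hi = min(values), max(values)
    for _ in range(max_iter):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high) if high else hi
        if new_lo == lo and new_hi == hi:
            break
        lo, hi = new_lo, new_hi
    return lo, hi


def binary_mask(image):
    """Mark pixels closer to the brighter cluster center with 1.

    Assumption: particles appear brighter than the background.
    """
    flat = [v for row in image for v in row]
    lo, hi = kmeans_two_clusters(flat)
    return [[1 if abs(v - hi) < abs(v - lo) else 0 for v in row] for row in image]


# Hypothetical 3x3 intensity patch with a bright blob in the upper right.
patch = [
    [0.10, 0.20, 0.90],
    [0.15, 0.85, 0.95],
    [0.10, 0.12, 0.20],
]
mask = binary_mask(patch)
```

Clustering raw intensities pixel by pixel is sensitive to noise, which is one reason real micrographs with low signal-to-noise ratio call for preprocessing and more robust grouping, as the abstract describes.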

(This article belongs to the Special Issue Selected Papers from the International Conference on Intelligent Biology and Medicine (ICIBM 2019))

17 pages, 2236 KiB

Open Access Article

Gene Co-Expression Networks Restructured Gene Fusion in Rhabdomyosarcoma Cancers

by Bryan R. Helm, Xiaohui Zhan, Pankita H. Pandya, Mary E. Murray, Karen E. Pollok, Jamie L. Renbarger, Michael J. Ferguson, Zhi Han, Dong Ni, Jie Zhang and Kun Huang

Genes 2019, 10(9), 665; https://doi.org/10.3390/genes10090665 - 30 Aug 2019

Cited by 5 | Viewed by 2882

Abstract

Rhabdomyosarcoma is subclassified by the presence or absence of a recurrent chromosome translocation that fuses the FOXO1 and PAX3 or PAX7 genes. The fusion protein (FOXO1-PAX3/7) retains both binding domains and becomes a novel and potent transcriptional regulator in rhabdomyosarcoma subtypes. Many studies have characterized and integrated genomic, transcriptomic, and epigenomic differences among rhabdomyosarcoma subtypes that contain the FOXO1-PAX3/7 gene fusion and those that do not; however, few studies have investigated how gene co-expression networks are altered by FOXO1-PAX3/7. Although transcriptional data offer insight into one level of functional regulation, gene co-expression networks have the potential to identify biological interactions and pathways that underpin oncogenesis and tumorigenicity. Thus, we examined gene co-expression networks for rhabdomyosarcoma that were FOXO1-PAX3 positive, FOXO1-PAX7 positive, or fusion negative. Gene co-expression networks were mined using local maximum Quasi-Clique Merger (lmQCM) and analyzed for co-expression differences among rhabdomyosarcoma subtypes. This analysis observed 41 co-expression modules that were shared between fusion negative and positive samples, of which 17/41 showed significant up- or down-regulation with respect to fusion status. Fusion positive and negative rhabdomyosarcoma showed differing modularity of co-expression networks, with fusion negative (n = 109) having significantly more individual modules than fusion positive (n = 53). Subsequent analysis of gene co-expression networks for PAX3 and PAX7 type fusions observed that 17/53 were differentially expressed between the two subtypes. Gene list enrichment analysis found that gene ontology terms were poorly matched with biological processes and molecular function for most co-expression modules identified in this study; however, co-expressed modules were frequently localized to cytobands on chromosomes 8 and 11. Overall, we observed substantial restructuring of co-expression networks relative to fusion status and fusion type in rhabdomyosarcoma and identified previously overlooked genes and pathways that may be targeted in this pernicious disease.

(This article belongs to the Special Issue Selected Papers from the International Conference on Intelligent Biology and Medicine (ICIBM 2019))

16 pages, 3412 KiB

Open Access Article

Sparse Convolutional Denoising Autoencoders for Genotype Imputation

by Junjie Chen and Xinghua Shi

Genes 2019, 10(9), 652; https://doi.org/10.3390/genes10090652 - 28 Aug 2019

Cited by 21 | Viewed by 5264

Abstract

Genotype imputation, where missing genotypes can be computationally imputed, is an essential tool in genomic analysis ranging from genome-wide associations to phenotype prediction. Traditional genotype imputation methods are typically based on haplotype-clustering algorithms, hidden Markov models (HMMs), and statistical inference. Deep learning-based methods have been recently reported to suitably address the missing data problems in various fields. To explore the performance of deep learning for genotype imputation, in this study, we propose a deep model called a sparse convolutional denoising autoencoder (SCDA) to impute missing genotypes. We constructed the SCDA model using a convolutional layer that can extract various correlation or linkage patterns in the genotype data and applied a sparse weight matrix resulting from L₁ regularization to handle high dimensional data. We comprehensively evaluated the performance of the SCDA model in different scenarios for genotype imputation on the yeast and human genotype data, respectively. Our results showed that SCDA has strong robustness and significantly outperforms popular reference-free imputation methods. This study thus points to another novel application of deep learning models for missing data imputation in genomic studies.

(This article belongs to the Special Issue Selected Papers from the International Conference on Intelligent Biology and Medicine (ICIBM 2019))

13 pages, 1378 KiB

Open Access Article

Tumor-Infiltrating Leukocyte Composition and Prognostic Power in Hepatitis B- and Hepatitis C-Related Hepatocellular Carcinomas

by Yi-Wen Hsiao, Lu-Ting Chiu, Ching-Hsuan Chen, Wei-Liang Shih and Tzu-Pin Lu

Genes 2019, 10(8), 630; https://doi.org/10.3390/genes10080630 - 20 Aug 2019

Cited by 27 | Viewed by 3938

Abstract

Background: Tumor-infiltrating leukocytes (TILs) are immune cells surrounding tumor cells, and several studies have shown that TILs are potential survival predictors in different cancers. However, few studies have dissected the differences between hepatitis B- and hepatitis C-related hepatocellular carcinoma (HBV−HCC and HCV−HCC). Therefore, we aimed to determine whether the abundance and composition of TILs are potential predictors for survival outcomes in HCC and which TILs are the most significant predictors. Methods: Two bioinformatics algorithms, ESTIMATE and CIBERSORT, were used to analyze the gene expression profiles from six datasets, from which the abundance of corresponding TILs was inferred. The ESTIMATE algorithm examined the overall abundance of TILs, whereas the CIBERSORT algorithm reported the relative abundance of 22 different TILs. Both HBV−HCC and HCV−HCC were analyzed. Results: The results indicated that the total abundance of TILs was higher in non-tumor tissue regardless of the HCC type. However, the specific TILs associated with overall survival (OS) and recurrence-free survival (RFS) varied between subtypes. For example, in HBV−HCC, plasma cells (hazard ratio [HR] = 1.05; 95% CI 1.00–1.10; p = 0.034) and activated dendritic cells (HR = 1.08; 95% CI 1.01–1.17; p = 0.03) were significantly associated with OS, whereas in HCV−HCC, monocytes (HR = 1.21) were significantly associated with OS. Furthermore, for RFS, CD8+ T cells (HR = 0.98) and M0 macrophages (HR = 1.02) were potential biomarkers in HBV−HCC, whereas neutrophils (HR = 1.01) were an independent predictor in HCV−HCC. Lastly, in both HBV−HCC and HCV−HCC, CD8+ T cells (HR = 0.97) and activated dendritic cells (HR = 1.09) had a significant association with OS, while γδ T cells (HR = 1.04), monocytes (HR = 1.05), M0 macrophages (HR = 1.04), M1 macrophages (HR = 1.02), and activated dendritic cells (HR = 1.15) were highly associated with RFS. Conclusions: These findings demonstrated that TILs are potential survival predictors in HCC and that the TIL types associated with survival differ according to the virus type. Therefore, further investigations are warranted to elucidate the role of TILs in HCC, which may improve immunotherapy outcomes.
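As a reading aid for the hazard ratios quoted in the abstract, the following minimal sketch shows how an HR relates to a Cox-type log-hazard coefficient and how a Wald z-score and two-sided p-value can be approximated from a reported 95% CI. The function names are hypothetical, and the normal approximation is an assumption, since the abstract does not describe the exact survival model. Because the published values are rounded to two decimals, a p-value recovered this way will only roughly match the reported one.

```python
import math

def percent_change_in_hazard(hr):
    """Percent change in hazard per unit increase of the predictor."""
    return (hr - 1.0) * 100.0

def cox_summary_from_hr(hr, ci_low, ci_high, z_crit=1.959964):
    """Approximate coefficient, standard error, Wald z and two-sided p
    from a hazard ratio and its 95% CI (assumed symmetric on the log scale)."""
    beta = math.log(hr)  # HR = exp(beta)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    # two-sided p-value under the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p

# Example call with the activated dendritic cell values quoted for HBV-HCC OS.
beta, se, z, p = cox_summary_from_hr(1.08, 1.01, 1.17)
print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p~{p:.3f}")
```

Read this way, an HR of 1.05 corresponds to roughly a 5% higher hazard per unit increase in the inferred cell fraction, while an HR below 1 (such as 0.98 for CD8+ T cells) indicates a lower hazard.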

15 pages, 1887 KiB

Open Access Article

The Molecular Evolution of Circadian Clock Genes in Spotted Gar (Lepisosteus oculatus)

by Yi Sun, Chao Liu, Moli Huang, Jian Huang, Changhong Liu, Jiguang Zhang, John H. Postlethwait and Han Wang

Genes 2019, 10(8), 622; https://doi.org/10.3390/genes10080622 - 17 Aug 2019

Cited by 9 | Viewed by 4640

Abstract

Circadian rhythms are biological rhythms with a period of approximately 24 h. While canonical circadian clock genes and their regulatory mechanisms appear highly conserved, the evolution of clock gene families is still unclear due to several rounds of whole-genome duplication in vertebrates. The spotted gar (Lepisosteus oculatus), as a non-teleost ray-finned fish, represents a fish lineage that diverged before the teleost genome duplication (TGD), providing an outgroup for exploring the evolutionary mechanisms of circadian clocks after whole-genome duplication. In this study, we interrogated the spotted gar draft genome sequences and found that spotted gar contains 26 circadian clock genes from 11 families. Phylogenetic analysis showed that 9 of these 11 spotted gar circadian clock gene families have the same number of genes as humans, while the members of the nfil3 and cry families differ between spotted gar and humans. Using phylogenetic and syntenic analyses, we found that nfil3-1 is conserved in vertebrates, while nfil3-2 and nfil3-3 are maintained in spotted gar, teleost fish, amphibians, and reptiles, but not in mammals. Following the two-round vertebrate genome duplication (VGD), spotted gar retained cry1a, cry1b, and cry2, and cry3 is retained in spotted gar, teleost fish, turtles, and birds, but not in mammals. We hypothesize that duplication of core clock genes, such as nfil3 and cry, likely facilitated diversification of circadian regulatory mechanisms in teleost fish. We also found that the transcription factor binding element (Ahr::Arnt) is retained in only one of the per1 or per2 duplicated paralogs derived from the TGD in teleost fish, suggesting possible cases of subfunctionalization. Together, these findings help decipher the repertoire of the spotted gar’s circadian system and shed light on how vertebrate circadian clock systems have evolved.

22 pages, 647 KiB

Open Access Article

Multi-Objective Optimized Fuzzy Clustering for Detecting Cell Clusters from Single-Cell Expression Profiles

by Saurav Mallik and Zhongming Zhao

Genes 2019, 10(8), 611; https://doi.org/10.3390/genes10080611 - 13 Aug 2019

Cited by 12 | Viewed by 4960

Abstract

Rapid advances in single-cell RNA sequencing (scRNA-seq) allow measurement of gene expression at single-cell resolution in complex diseases or tissues. While many methods have been developed to detect cell clusters from scRNA-seq data, this task remains a major challenge. We proposed a multi-objective optimization-based fuzzy clustering approach for detecting cell clusters from scRNA-seq data. First, we conducted initial filtering and SCnorm normalization. We considered various case studies by selecting different cluster numbers (c_l = 2 to a user-defined number), and applied the fuzzy c-means clustering algorithm to each case individually. For each case, we evaluated the scores of four cluster validity index measures: Partition Entropy (PE), Partition Coefficient (PC), Modified Partition Coefficient (MPC), and Fuzzy Silhouette Index (FSI). Next, we set the first measure as a minimization objective (↓) and the remaining three as maximization objectives (↑), and then applied a multi-objective decision-making technique, TOPSIS, to identify the optimal solution. The case study with the highest TOPSIS score was selected as the final optimal clustering. Finally, we obtained differentially expressed genes (DEGs) using Limma by comparing the expression of the samples in each resultant cluster with that in the remaining clusters. We applied our approach to a scRNA-seq dataset for the rare intestinal cell type in mice [GEO ID: GSE62270, 23,630 features (genes) and 288 cells]. The optimal cluster result (TOPSIS optimal score = 0.858) comprised two clusters, one with 115 cells and the other with 91 cells. The evaluated scores of the four cluster validity indices, FSI, PE, PC, and MPC, for the optimized fuzzy clustering were 0.482, 0.578, 0.607, and 0.215, respectively. The Limma analysis identified 1240 DEGs (cluster 1 vs. cluster 2). The top ten gene markers were Rps21, Slc5a1, Crip1, Rpl15, Rpl3, Rpl27a, Khk, Rps3a1, Aldob and Rps17. In this list, Khk (encoding ketohexokinase) is a novel marker for the rare intestinal cell type. In summary, this method is useful for detecting cell clusters from scRNA-seq data.
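The selection step described in the abstract (score each candidate cluster number on several validity indices, then rank the candidates with TOPSIS) can be sketched as follows. This is a minimal illustration using standard textbook definitions of PC, PE, and MPC, and an equal-weight TOPSIS with vector normalization. The function names, the equal weighting, and the toy score table are assumptions, not the authors' implementation, and FSI is omitted because it also needs the cell-to-cell distances.

```python
import numpy as np

def partition_coefficient(u):
    """PC = (1/N) * sum of squared memberships; u has shape (clusters, cells)."""
    return float(np.sum(u ** 2) / u.shape[1])

def partition_entropy(u, eps=1e-12):
    """PE = -(1/N) * sum(u * log(u)); lower values mean crisper clusters."""
    return float(-np.sum(u * np.log(u + eps)) / u.shape[1])

def modified_partition_coefficient(u):
    """MPC = 1 - c/(c-1) * (1 - PC), which rescales PC to the range [0, 1]."""
    c = u.shape[0]
    return 1.0 - c / (c - 1.0) * (1.0 - partition_coefficient(u))

def topsis(scores, maximize):
    """Closeness of each alternative (row) to the ideal point, equal weights.

    Assumes the rows are not all identical; otherwise both distances are zero.
    """
    scores = np.asarray(scores, dtype=float)
    norm = scores / np.linalg.norm(scores, axis=0)  # vector normalization
    best = np.where(maximize, norm.max(axis=0), norm.min(axis=0))
    worst = np.where(maximize, norm.min(axis=0), norm.max(axis=0))
    d_best = np.linalg.norm(norm - best, axis=1)
    d_worst = np.linalg.norm(norm - worst, axis=1)
    return d_worst / (d_best + d_worst)

# Hypothetical score table: one row per candidate cluster number,
# columns ordered as PE (minimize), PC, MPC, FSI (maximize).
toy_scores = [
    [0.58, 0.61, 0.22, 0.48],
    [0.95, 0.45, 0.18, 0.30],
    [1.20, 0.38, 0.17, 0.25],
]
closeness = topsis(toy_scores, maximize=[False, True, True, True])
print("best candidate row:", int(np.argmax(closeness)))
```

As a quick consistency check on the definitions, the reported PC of 0.607 with two clusters gives MPC = 1 − 2 × (1 − 0.607) = 0.214, which agrees with the reported 0.215 up to rounding.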

10 pages, 1263 KiB

Open Access Article

Network as a Biomarker: A Novel Network-Based Sparse Bayesian Machine for Pathway-Driven Drug Response Prediction

by Qi Liu, Louis J. Muglia and Lei Frank Huang

Genes 2019, 10(8), 602; https://doi.org/10.3390/genes10080602 - 09 Aug 2019

Cited by 12 | Viewed by 3501

Abstract

With the advances in different biological networks including gene regulation, gene co-expression, protein–protein interaction networks, and advanced approaches for network reconstruction, analysis, and interpretation, it is possible to discover reliable and accurate molecular network-based biomarkers for monitoring cancer treatment. Such efforts will also pave the way toward the realization of biomarker-driven personalized medicine against cancer. Previously, we reconstructed disease-specific driver signaling networks using multi-omics profiles and cancer signaling pathway data. In this study, we developed a network-based sparse Bayesian machine (NBSBM) approach, using previously derived disease-specific driver signaling networks to predict cancer cell responses to drugs. NBSBM made use of the information encoded in a disease-specific (differentially expressed) network to improve its prediction performance in problems with a reduced amount of training data and a very high-dimensional feature space. Sparsity in NBSBM is favored by a spike and slab prior distribution, which is combined with a Markov random field prior that encodes the network of feature dependencies. Gene features that are connected in the network are assumed to be either both relevant or both irrelevant to drug responses. We compared the proposed method with network-based support vector machine (NBSVM) approaches and found that the NBSBM approach could achieve much better accuracy than the two NBSVM methods. The gene modules selected from the disease-specific driver networks for predicting drug sensitivity might be directly involved in drug sensitivity or resistance. This work provides a disease-specific network-based drug sensitivity prediction approach and can uncover the potential mechanisms of action of drugs by selecting the most predictive sub-networks from the disease-specific network.
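For readers unfamiliar with the prior structure named in the abstract, a generic form of a spike-and-slab prior coupled to a Markov random field can be written as below. This is a textbook-style sketch rather than the authors' exact parameterization: the symbols $w_j$ (feature weight), $z_j$ (binary relevance indicator), $\sigma^2$, $\alpha$, $\beta$, and the edge set $E$ of the gene network are all notational assumptions.

```latex
% Spike-and-slab prior: a weight is Gaussian (slab) if its feature is
% relevant (z_j = 1) and exactly zero (spike) otherwise.
p(\mathbf{w} \mid \mathbf{z})
  = \prod_{j=1}^{d} \Big[ z_j \, \mathcal{N}(w_j \mid 0, \sigma^2)
  + (1 - z_j) \, \delta(w_j) \Big]

% Markov random field prior over the indicators: with \beta > 0, two genes
% joined by an edge are favored to share the same relevance state.
p(\mathbf{z}) \propto
  \exp\!\Big( \alpha \sum_{j=1}^{d} z_j
  + \beta \sum_{(j,k) \in E} \mathbb{1}[z_j = z_k] \Big)
```

Under this form, the first term controls overall sparsity through $\alpha$, and the second term carries the network information by rewarding connected genes that are selected or discarded together.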

16 pages, 29249 KiB

Open Access Article

Long Non-Coding RNA Expression Levels Modulate Cell-Type-Specific Splicing Patterns by Altering Their Interaction Landscape with RNA-Binding Proteins

by Felipe Wendt Porto, Swapna Vidhur Daulatabad and Sarath Chandra Janga

Genes 2019, 10(8), 593; https://doi.org/10.3390/genes10080593 - 06 Aug 2019

Cited by 14 | Viewed by 5111

Abstract

Recent developments in our understanding of the interactions between long non-coding RNAs (lncRNAs) and cellular components have improved treatment approaches for various human diseases including cancer, vascular diseases, and neurological diseases. Although investigation of specific lncRNAs revealed their role in the metabolism of cellular RNA, our understanding of their contribution to post-transcriptional regulation is relatively limited. In this study, we explore the role of lncRNAs in modulating alternative splicing and their impact on downstream protein–RNA interaction networks. Analysis of alternative splicing events across 39 lncRNA knockdown and wildtype RNA-sequencing datasets from three human cell lines, HeLa (cervical cancer), K562 (myeloid leukemia), and U87 (glioblastoma), resulted in the high-confidence (false discovery rate (FDR) < 0.01) identification of 11,630 skipped exon events and 5895 retained intron events, implicating 759 genes as impacted at the post-transcriptional level due to the loss of lncRNAs. We observed that a majority of the alternatively spliced genes in a lncRNA knockdown were specific to the cell type. In tandem, the functions annotated to the genes affected by alternative splicing across each lncRNA knockdown also displayed cell-type specificity. To understand the mechanism behind this cell-type-specific alternative splicing pattern, we analyzed RNA-binding protein (RBP)–RNA interaction profiles across the spliced regions in order to observe cell-type-specific RBP binding preferences at alternative splice events. Despite limited RBP binding data across cell lines, alternatively spliced events detected in lncRNA perturbation experiments were associated with RBPs binding in proximal intron–exon junctions in a cell-type-specific manner. The cellular functions affected by alternative splicing were also affected in a cell-type-specific manner. Based on the RBP binding profiles in HeLa and K562 cells, we hypothesize that several lncRNAs are likely to exhibit a sponge effect in disease contexts, resulting in the functional disruption of RBPs and their downstream functions. We propose that such lncRNA sponges can extensively rewire post-transcriptional gene regulatory networks by altering the protein–RNA interaction landscape in a cell-type-specific manner.
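The abstract reports splicing events filtered at a false discovery rate below 0.01 but does not say which procedure or tool produced the FDR values. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg adjustment. The function name and the example p-values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)                      # indices of p-values, ascending
    scaled = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, working down from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted

# Hypothetical per-event p-values; keep events with adjusted value < 0.01.
event_pvalues = [0.0001, 0.02, 0.0004, 0.3, 0.002]
fdr = benjamini_hochberg(event_pvalues)
significant = [i for i, q in enumerate(fdr) if q < 0.01]
print(significant)
```

A threshold of 0.01 on the adjusted values means that, in expectation, no more than about 1% of the retained events are false positives under the procedure's assumptions.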

10 pages, 1949 KiB

Open Access Article

Kinetic Modeling of DUSP Regulation in Herceptin-Resistant HER2-Positive Breast Cancer

by Petronela Buiga, Ari Elson, Lydia Tabernero and Jean-Marc Schwartz

Genes 2019, 10(8), 568; https://doi.org/10.3390/genes10080568 - 26 Jul 2019

Cited by 2 | Viewed by 2820

Abstract

Background: HER2 (human epidermal growth factor receptor 2)-positive breast cancer is an aggressive type of breast cancer characterized by the overexpression of the receptor-type protein tyrosine kinase HER2 or amplification of the HER2 gene. It is commonly treated with the drug trastuzumab (Herceptin), but resistance to its action frequently develops and limits its therapeutic benefit. Dual-specificity phosphatases (DUSPs) were previously highlighted as central regulators of HER2 signaling; therefore, understanding their role is crucial to designing new strategies to improve the efficacy of Herceptin treatment. We investigated whether inhibiting certain DUSPs re-sensitized Herceptin-resistant breast cancer cells to the drug. We built a series of kinetic models incorporating the key players of HER2 signaling pathways and simulating a range of inhibition intensities. The simulation results were compared with experiments on live tumor cells in culture, and showed good agreement with the experimental analyses. In particular, we observed that Herceptin-resistant DUSP16-silenced breast cancer cells became more responsive to the drug when treated for 72 h with Herceptin, showing a decrease in resistance, in agreement with the model predictions. Overall, we showed that the kinetic modeling of signaling pathways is able to generate predictions that assist experimental research in the identification of potential targets for cancer treatment.
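To give a sense of what simulating a range of inhibition intensities can look like in a kinetic model, here is a deliberately tiny toy example. It is not the authors' model: it tracks a single hypothetical substrate that is phosphorylated at rate k_phos and dephosphorylated by a phosphatase at rate k_dephos, with the phosphatase activity scaled by (1 − inhibition). All names and rate constants are assumptions chosen for illustration.

```python
def simulate_phospho_fraction(inhibition, k_phos=0.5, k_dephos=1.0,
                              t_end=50.0, dt=0.01):
    """Euler-integrate dp/dt = k_phos*(1 - p) - k_dephos*(1 - inhibition)*p.

    p is the phosphorylated fraction of the toy substrate (0 to 1), and
    inhibition is the fraction of phosphatase activity removed (0 to 1).
    """
    p = 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        dp = k_phos * (1.0 - p) - k_dephos * (1.0 - inhibition) * p
        p += dt * dp
    return p

# Sweep over inhibition intensities, from no inhibition to full silencing.
for inhibition in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(inhibition, round(simulate_phospho_fraction(inhibition), 3))
```

In this toy system the long-run phosphorylated fraction is k_phos / (k_phos + k_dephos × (1 − inhibition)), so the sweep shows how a readout shifts as phosphatase activity is progressively removed. A realistic pathway model would couple many such equations and fit the rate constants to experimental data.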

13 pages, 3350 KiB

Open Access Article

Changes in the Microbial Community Diversity of Oil Exploitation

by Jingjing Liu, Jing Wu, Jiawei Lin, Jian Zhao, Tianyi Xu, Qichang Yang, Jing Zhao, Zhongming Zhao and Xiaofeng Song

Genes 2019, 10(8), 556; https://doi.org/10.3390/genes10080556 - 24 Jul 2019

Cited by 13 | Viewed by 2658

Abstract

To systematically evaluate the ecological changes of an active offshore petroleum production system, the variation of microbial communities at several sites (virgin field, wellhead, storage tank) of an oil production facility in east China was investigated by sequencing the V3 to V4 regions of 16S ribosomal ribonucleic acid (rRNA) of microorganisms. In general, a decrease of microbial community richness and diversity in petroleum mining was observed, as measured by operational taxonomic unit (OTU) numbers, α (Chao1 and Shannon indices), and β (principal coordinate analysis) diversity. Microbial community structure was strongly affected by environmental factors at the phylum and genus levels. At the phylum level, virgin field and wellhead were dominated by Proteobacteria, while the storage tank had a higher presence of Firmicutes (29.3–66.9%). Specifically, the wellhead displayed a lower presence of Proteobacteria (48.6–53.4%) and a higher presence of Firmicutes (24.4–29.6%) than the virgin field. At the genus level, the predominant genera were Ochrobactrum and Acinetobacter in the virgin field, Lactococcus and Pseudomonas in the wellhead, and Prauseria and Bacillus in the storage tank. Our study revealed that the microbial community structure was strongly affected by the surrounding environmental factors, such as temperature, oxygen content, salinity, and pH, which could be altered by oil production. The various microbiomes were observed to produce surfactants, transform biohazards, and degrade hydrocarbons. Altering the microbiome growth conditions through appropriate human intervention and taking advantage of natural microbial resources can further enhance oil recovery technology.
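The two α-diversity measures named in the abstract can be computed from a vector of per-OTU read counts as sketched below. This is a minimal illustration assuming the natural-log Shannon index and the bias-corrected form of Chao1. The abstract does not state which software or variant was used, and the function names and example counts are hypothetical.

```python
import math

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1 - 1) / (2*(F2 + 1)).

    F1 and F2 are the numbers of OTUs seen exactly once and exactly twice.
    """
    observed = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return observed + f1 * (f1 - 1) / (2.0 * (f2 + 1))

# Hypothetical OTU count vectors for two samples.
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]
print(shannon_index(even_sample), shannon_index(skewed_sample))
print(chao1(even_sample), chao1(skewed_sample))
```

Shannon is highest when reads are spread evenly across OTUs, whereas Chao1 adds an estimate of unseen OTUs based on how many were observed only once or twice, so the two indices capture evenness-weighted diversity and richness respectively.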
