Statistical Methods for Genetic Epidemiology

A special issue of Genes (ISSN 2073-4425). This special issue belongs to the section "Molecular Genetics and Genomics".

Deadline for manuscript submissions: 5 June 2024 | Viewed by 10725

Special Issue Editors


Guest Editor
Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
Interests: statistical genetics; population genetics; bioinformatics; psychiatric disorder; cancer epidemiology; DNA methylation; molecular phylogenetics
Division of Biostatistics and Bioinformatics and Maryland Psychiatric Research Center, School of Medicine, University of Maryland, Baltimore, MD, USA
Interests: biostatistics; imaging genetics; neuropsychiatric disorder; network analysis
Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, VIC 3010, Australia
Interests: cancer epidemiology; genetic epidemiology; epigenetic epidemiology; cancer risk modeling; twin and family research

Guest Editor
Department of Statistics and Data Science, University of Central Florida, Orlando, FL 32816, USA
Interests: big data; machine learning; regularized low-rank matrix models; genomics modeling and analysis; Bayesian ultra-high dimensional variable selection and clustering; spatiotemporal models

Special Issue Information

Dear Colleagues,

Genetic epidemiology, an important area of public health research, has rapidly evolved in the last two decades. This field of study seeks to understand the contribution of genetic factors to health and disease in families and populations and the interplay between genes and environmental factors. Recent advances in high-throughput genomic profiling techniques have brought a sharp increase in “omics” data (genomics, proteomics, transcriptomics, epigenomics, metabolomics, metagenomics, single-cell, etc.) and have accelerated the development of the knowledge and methodologies used to gain a better understanding of the multifactorial causes, distribution, and prediction of inherited diseases in populations.

This Special Issue aims to highlight the latest advances in statistical methods in genetic epidemiology. We encourage researchers to share their original research on developing novel statistical, bioinformatical, and computational approaches or applying advanced statistical techniques to complex traits or diseases. Review papers addressing current advances in this field are also welcome. Topics of primary interest include, but are not limited to:

  • Family and twin studies;
  • Genome-wide association studies;
  • Population genetics;
  • Heritability and genetic correlation;
  • Polygenic risk score;
  • Gene–environment interaction;
  • Multi-omics study;
  • Imaging genetics;
  • Expression quantitative trait loci (eQTLs);
  • Mendelian randomization;
  • Epigenetic epidemiology;
  • Single-cell epidemiology.

Dr. Chenglong Yu
Dr. Shuo Chen
Dr. Shuai Li
Dr. Hsin-Hsiung Huang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Genes is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • family and twin study
  • GWAS
  • heritability
  • polygenic risk score
  • multi-omics
  • imaging genetics
  • eQTL
  • gene–environment interaction
  • epigenetics
  • Mendelian randomization

Published Papers (6 papers)


Research


13 pages, 1424 KiB  
Article
Multiple Sclerosis Heritability Estimation on Sardinian Ascertained Extended Families Using Bayesian Liability Threshold Model
by Andrea Nova, Teresa Fazia, Valeria Saddi, Marialuisa Piras and Luisa Bernardinelli
Genes 2023, 14(8), 1579; https://doi.org/10.3390/genes14081579 - 02 Aug 2023
Viewed by 1102
Abstract
Heritability studies represent an important tool to investigate the main sources of variability for complex diseases, whose etiology involves both genetic and environmental factors. In this paper, we aimed to estimate multiple sclerosis (MS) narrow-sense heritability (h²), on a liability scale, using extended families ascertained from affected probands sampled in the Sardinian province of Nuoro, Italy. We also investigated the sources of MS liability variability among shared environment effects, sex, and categorized year of birth (<1946, ≥1946). The latter can be considered a proxy for different early environmental exposures. To this end, we implemented a Bayesian liability threshold model to obtain posterior distributions for the parameters of interest, adjusting for ascertainment bias. Our analysis highlighted categorized year of birth as the main explanatory factor, explaining ~70% of MS liability variability (median value = 0.69, 95% CI: 0.64, 0.73), while h² was close to 0% (median value = 0.03, 95% CI: 0.00, 0.09). By performing a year of birth-stratified analysis, we found a high h² only in individuals born in or after 1946 (median value = 0.82, 95% CI: 0.68, 0.93), meaning that genetic variability acquired a high explanatory role only when focusing on this subpopulation. Overall, the results highlighted early environmental exposures in the Sardinian population as a meaningful factor involved in MS that should be further investigated.
(This article belongs to the Special Issue Statistical Methods for Genetic Epidemiology)

17 pages, 942 KiB  
Article
Exploring the Lifetime Effect of Children on Wellbeing Using Two-Sample Mendelian Randomisation
by Benjamin Woolf, Hannah M. Sallis and Marcus R. Munafò
Genes 2023, 14(3), 716; https://doi.org/10.3390/genes14030716 - 14 Mar 2023
Cited by 1 | Viewed by 1753
Abstract
Background: Observational research implies a negative effect of having children on wellbeing. Objectives: To provide Mendelian randomisation evidence of the effect of having children on parental wellbeing. Design: Two-sample Mendelian randomisation. Setting: Non-clinical European ancestry participants. Participants: We used the UK Biobank (460,654 male and female European ancestry participants) as a source of genotype-exposure associations, and the Social Science Genetic Association Consortium (SSGAC) (298,420 male and female European ancestry participants) and the Within-Family Consortia (effective sample of 22,656 male and female European ancestry participants) as sources of genotype-outcome associations. Interventions: The lifetime effect of an increase in the genetic liability to having children. Primary and secondary outcome measures: The primary analysis was an inverse variance weighted analysis of subjective wellbeing measured in the 2016 SSGAC Genome Wide Association Study (GWAS). Secondary outcomes included pleiotropy robust estimators applied in the SSGAC and an analysis using the Within-Family Consortia GWAS. Results: We did not find strong evidence of a negative (standard deviation) change in wellbeing (β = 0.153 [95% CI: −0.210 to 0.516] per child parented). Secondary outcomes were generally slightly deflated (e.g., −0.049 [95% CI: −0.533 to 0.435] for the Within-Family Consortia and 0.090 [95% CI: −0.167 to 0.347] for weighted median), implying the presence of some residual confounding and pleiotropy. Conclusions: Contrary to the existing literature, our results are not compatible with a measurable negative effect of number of children on the average wellbeing of a parent over their life course. However, we were unable to explore non-linearities, interactions, or time-varying effects.
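As a rough illustration of the kind of primary analysis named in the abstract above, the following is a minimal sketch of a fixed-effect inverse variance weighted estimate computed from per-variant summary statistics. The function name, argument names, and toy numbers are hypothetical and are not taken from the paper; the sketch ignores uncertainty in the genotype-exposure associations and is not the authors' pipeline.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse variance weighted (IVW) estimate.

    Each variant j contributes a ratio estimate beta_outcome[j] / beta_exposure[j],
    weighted by beta_exposure[j]**2 / se_outcome[j]**2 (a first-order weight that
    treats the genotype-exposure association as known without error).
    Returns (estimate, standard_error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")

    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]

    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return estimate, standard_error


# Toy, made-up summary statistics for two variants (illustration only).
toy_estimate, toy_se = ivw_estimate(
    beta_exposure=[0.1, 0.2],
    beta_outcome=[0.05, 0.10],
    se_outcome=[0.01, 0.02],
)
print(toy_estimate, toy_se)
```

A 95% confidence interval under a normal approximation would then be the estimate plus or minus 1.96 times the standard error. Pleiotropy-robust alternatives such as the weighted median, which the abstract also mentions, require a different estimator than the one sketched here.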

16 pages, 555 KiB  
Article
Variable Selection for Sparse Data with Applications to Vaginal Microbiome and Gene Expression Data
by Niloufar Dousti Mousavi, Jie Yang and Hani Aldirawi
Genes 2023, 14(2), 403; https://doi.org/10.3390/genes14020403 - 03 Feb 2023
Cited by 3 | Viewed by 1404
Abstract
Sparse data with a high proportion of zeros arise in various disciplines. Modeling sparse high-dimensional data is a challenging and growing research area. In this paper, we provide statistical methods and tools for analyzing sparse data in a fairly general and complex context. We use two real scientific applications as illustrations: a longitudinal vaginal microbiome dataset and a high-dimensional gene expression dataset. We recommend zero-inflated model selections and significance tests to identify the time intervals when the pregnant and non-pregnant groups of women are significantly different in terms of Lactobacillus species. We apply the same techniques to select the best 50 genes out of 2426 in the sparse gene expression data. The classification based on our selected genes achieves 100% prediction accuracy. Furthermore, the first four principal components based on the selected genes can explain as much as 83% of the model variability.

11 pages, 298 KiB  
Article
Generating Minimal Models of H1N1 NS1 Gene Sequences Using Alignment-Based and Alignment-Free Algorithms
by Meng Fang, Jiawei Xu, Nan Sun and Stephen S.-T. Yau
Genes 2023, 14(1), 186; https://doi.org/10.3390/genes14010186 - 10 Jan 2023
Viewed by 1023
Abstract
For virus classification and tracing, one idea is to generate minimal models from the gene sequences of each virus group for comparative analysis within and between classes, as well as classification and tracing of new sequences. The starting point of defining a minimal model for a group of gene sequences is to find their longest common sequence (LCS), but this is a non-deterministic polynomial-time hard (NP-hard) problem. Therefore, we applied some heuristic approaches to finding the LCS, as well as some of the newer methods of treating gene sequences, including multiple sequence alignment (MSA) and k-mer natural vector (NV) encoding. To evaluate our algorithms, a five-fold cross-validation classification scheme on a dataset of H1N1 virus non-structural protein 1 (NS1) gene was analyzed. The results indicate that the MSA-based algorithm has the best performance measured by classification accuracy, while the NV-based algorithm exhibits advantages in the time complexity of generating minimal models.
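To make the difficulty described in the abstract above concrete, here is a minimal sketch that assumes "LCS" refers to the classical longest common subsequence problem. The exact dynamic program handles two sequences; folding it across a group is only one simple heuristic, which yields a subsequence common to all inputs but not necessarily the longest one. All function names and the toy sequences are hypothetical, and this is not the heuristic used in the paper.

```python
def lcs_pair(a, b):
    """Exact longest common subsequence of two strings via dynamic programming."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])

    # Walk back through the table to recover one optimal subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


def greedy_common_subsequence(sequences):
    """Heuristic: fold the pairwise LCS over the group, one sequence at a time.

    The result is a subsequence of every input, but it may be shorter than
    the true longest common subsequence of the whole group.
    """
    common = sequences[0]
    for seq in sequences[1:]:
        common = lcs_pair(common, seq)
    return common


def is_subsequence(sub, seq):
    """Return True if `sub` can be read off `seq` left to right."""
    remaining = iter(seq)
    return all(ch in remaining for ch in sub)


# Toy, made-up nucleotide strings (illustration only).
group = ["ACGTAC", "AGTTC", "AGTC"]
candidate = greedy_common_subsequence(group)
print(candidate, all(is_subsequence(candidate, s) for s in group))
```

The pairwise step costs time proportional to the product of the two sequence lengths, which is why exact extensions to many sequences become impractical and heuristics or alignment-free encodings are attractive.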

15 pages, 350 KiB  
Article
Clustering Gene Expressions Using the Table Invitation Prior
by Charles W. Harrison, Qing He and Hsin-Hsiung Huang
Genes 2022, 13(11), 2036; https://doi.org/10.3390/genes13112036 - 04 Nov 2022
Cited by 2 | Viewed by 1439
Abstract
A prior for Bayesian nonparametric clustering called the Table Invitation Prior (TIP) is used to cluster gene expression data. TIP uses information concerning the pairwise distances between subjects (e.g., gene expression samples) and automatically estimates the number of clusters. TIP’s hyperparameters are estimated using a univariate multiple change point detection algorithm with respect to the subject distances, and thus TIP does not require an analyst’s intervention for estimating hyperparameters. A Gibbs sampling algorithm is provided, and TIP is used in conjunction with a Normal-Inverse-Wishart likelihood to cluster 801 gene expression samples, each of which belongs to one of five different types of cancer.

Review


32 pages, 810 KiB  
Review
Computational Prediction of Protein Intrinsically Disordered Region Related Interactions and Functions
by Bingqing Han, Chongjiao Ren, Wenda Wang, Jiashan Li and Xinqi Gong
Genes 2023, 14(2), 432; https://doi.org/10.3390/genes14020432 - 08 Feb 2023
Cited by 2 | Viewed by 2932
Abstract
Intrinsically Disordered Proteins (IDPs) and Regions (IDRs) exist widely. Although they lack well-defined structures, they participate in many important biological processes. In addition, they are widely related to human diseases and have become potential targets in drug discovery. However, there is a large gap between the experimental annotations related to IDPs/IDRs and their actual number. In recent decades, computational methods related to IDPs/IDRs have been developed vigorously, covering different tasks: predicting IDPs/IDRs, the binding modes of IDPs/IDRs, the binding sites of IDPs/IDRs, and the molecular functions of IDPs/IDRs. In view of the correlation between these predictors, we have reviewed these prediction methods uniformly for the first time, summarized their computational methods and predictive performance, and discussed some problems and perspectives.
