Next Article in Journal
Eradication of PRRS from Hungarian Pig Herds between 2014 and 2022
Previous Article in Journal
Mass Balance Studies of Robenidine Hydrochloride in the Body of Channel Catfish (Ictalurus punctatus)
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Preselecting Variants from Large-Scale Genome-Wide Association Study Meta-Analyses Increases the Genomic Prediction Accuracy of Growth and Carcass Traits in Large White Pigs

1
National Engineering Research Centre for Swine Breeding Industry, Provincial Key Laboratory of Agricultural Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510640, China
2
Guangxi State Farms Yongxin Animal Husbandry Group Co., Ltd., Nanning 530022, China
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Animals 2023, 13(24), 3746; https://doi.org/10.3390/ani13243746
Submission received: 11 October 2023 / Revised: 30 November 2023 / Accepted: 30 November 2023 / Published: 5 December 2023
(This article belongs to the Section Animal Genetics and Genomics)

Abstract

:

Simple Summary

Preselected potential causal variants from genome-wide association studies (GWASs) for genomic prediction is an appealing strategy for improving the accuracy of genomic prediction. Here, we performed genomic prediction using preselected variants identified with a large GWAS meta-analysis on three growth and carcass traits in Large White pigs. Compared to using the SNP chip data, the accuracies of genomic predictions based on different SNP preselection strategies were on average improved by 0.3 to 2% using the genomic best linear unbiased prediction given genetic architecture (BLUP|GA) model. Our results provide new opportunities for future applications of genomic prediction in livestock and poultry by using large amounts of genotype data.

Abstract

Preselected variants associated with the trait of interest from genome-wide association studies (GWASs) are available to improve genomic prediction in pigs. The objectives of this study were to use preselected variants from a large GWAS meta-analysis to assess the impact of single-nucleotide polymorphism (SNP) preselection strategies on genome prediction of growth and carcass traits in pigs. We genotyped 1018 Large White pigs using medium (50k) SNP arrays and then imputed SNPs to sequence level by utilizing a reference panel of 1602 whole-genome sequencing samples. We tested the effects of different proportions of selected top SNPs across different SNP preselection strategies on genomic prediction. Finally, we compared the prediction accuracies by employing genomic best linear unbiased prediction (GBLUP), genomic feature BLUP and three weighted GBLUP models. SNP preselection strategies showed an average improvement in accuracy ranging from 0.3 to 2% in comparison to the SNP chip data. The accuracy of genomic prediction exhibited a pattern of initial increase followed by decrease, or continuous decrease across various SNP preselection strategies, as the proportion of selected top SNPs increased. The highest level of prediction accuracy was observed when utilizing 1 or 5% of top SNPs. Compared with the GBLUP model, the utilization of estimated marker effects from a GWAS meta-analysis as SNP weights in the BLUP|GA model improved the accuracy of genomic prediction in different SNP preselection strategies. The new SNP preselection strategies gained from this study bring opportunities for genomic prediction in limited-size populations in pigs.

1. Introduction

Pork is one of the most important sources of animal protein for humans [1]. At present, growth and carcass traits such as average daily gain at 100 kg (ADG100), average back fat thickness at 100 kg (BFT100), and days to 100 kg (DAYS100) are recognized as the most important breeding objectives in pig breeding programs due to their considerable influence on pork production and quality [2,3,4]. The genetic improvement of these traits represents a compelling avenue for augmenting pork production and quality. With advances in genotyping technologies, genomic selection (GS) or genomic prediction, proposed by Meuwissen et al. [5], has been introduced and applied to address the limitations of traditional breeding methods in livestock populations, particularly in terms of improving prediction accuracy of traits and decreasing generation intervals [6].
Presently, GS has been widely applied in modern pig breeding. Since its early implementations, genomic prediction has typically been performed by using SNP chip data that capture the effects of the causal variants (often unknown) through LD, which can modestly increase the accuracy of genomic predictions for economic traits of pigs [7,8,9]. In contrast, genomic prediction using whole-genome-sequence (WGS) data is expected to enhance the accuracy of genomic predictions. This is achieved by incorporating causal variants or SNPs in strong LD with causal variants that influence the target traits. Simulation studies have substantiated that the utilization of WGS data, as opposed to SNP chip data, engenders a noteworthy enhancement in predictive accuracy. Specifically, the improvements observed are in the range of 2.5 to 5.9% within populations [10,11] and a considerably more substantial range of 22 to 31% for multi-breed populations [11]. Another simulation study showed that the accuracy of genomic prediction increased by up to 30% when WGS data captured causal variants with a low minor-allele frequency [12]. However, in practical applications, the use of WGS data for genomic prediction in pig populations has not shown substantial improvements in accuracy compared to traditional SNP chip data, particularly when analyzed within specific breeds [13,14]. Other studies have found a slight increase in prediction accuracy (e.g., from 0.3 to 3.5%) for some economic traits within or across pig populations, depending on the different prediction methods used [15,16]. Nevertheless, the suboptimal outcomes observed in genomic prediction when employing WGS data can be ascribed to the presence of a considerable surplus of redundant SNPs. These neighboring SNPs are probably in strong LD with causative variants or with other SNPs located in specific genomic blocks [17,18].
One of the most successful strategies to improve the accuracy of genomic predictions is the utilization of preselected variants associated with the trait of interest from WGS data. In recent years, preselected potential causal variants identified through genome-wide association studies (GWASs) have already been applied in genomic prediction to improve the prediction accuracy for a variety of traits in livestock and poultry [19,20], including pigs [21,22]. In certain instances, this strategy has resulted in improved accuracy of genomic prediction for growth and carcass traits in pigs, with improvements ranging from 0.9 to 46% for multi-breed populations [21,22,23]. However, it should be noted that this strategy did not result in improved prediction accuracy in all cases [21,24,25]. Fine mapping of causal variants was still challenging, and the advantages for genomic predictions were limited.
A simulation study indicated that the detection of causal variants explained only 20% of the total genetic variance when using a sample size of 7000 for a GWAS in an effective population size of 20 [23]. Recently, Ros-Freixedes et al. [21] used preselected SNPs based on GWAS results for genomic prediction, studying a population of nearly 100,000 pigs from the largest lines with imputed WGS data. Compared to the use of SNP chips, they found marginal or no gains (from −3.4 to 9.8%) in prediction accuracy for eight common complex traits in pigs. In these data, the preselected SNP panels consisted of standard chip data augmented with causal variants from GWAS results or the top 40,000 variants with the lowest p-values from GWAS results. Subsequently, Jang et al. [26] also indicated marginal or no gains (from 0.1 to 4.9%) in prediction accuracy for eight common complex traits in pigs when using single-step genomic best linear unbiased prediction (ssGBLUP) with BayesR SNP variances as weights on the preselected SNP panels described above. However, these preselected SNPs may contain a number of redundant SNP or may have a small effect on the trait of interest, which could restrict the available information for genomic prediction.
Several studies have demonstrated the importance of using estimated marker effects from existing GWAS results, either within the dataset or from public sources. These marker effects can be used to construct weighted genomic relationship matrices (G) in the prediction model, resulting in improved accuracy of genomic predictions compared to traditional methods such as genomic best linear unbiased prediction (GBLUP), ridge regression BLUP (rr-BLUP), and BayesB [27,28]. However, another two studies reported that using estimated marker effects obtained from GWAS results as weighting factors to construct a weighted G matrix for genomic prediction resulted in lower accuracy compared to the Bayesian model [28,29]. A previous study reported that using the top SNPs selected according to their estimated marker effects obtained from the rr-BLUP model for genomic prediction results in superior accuracy of genomic prediction compared to WGS data [30]. This is mainly due to capturing the major genes that affect the trait, thereby improving the prediction ability of the G matrix. Therefore, using the top SNPs selected according to their estimated marker effects obtained from the rr-BLUP model for genomic prediction is expected to improve the accuracy of genomic prediction for growth and carcass traits in pigs.
To investigate the effect of leveraging preselected SNPs on genomic prediction, we used preselected variants associated with growth and carcass traits based on a large-scale GWAS meta-analysis containing over 40,000 pigs from nine populations of Large White pigs (http://pigbiobank.farmgtex.org, accessed on 1 April 2023). The objectives of this study were to use preselected variants from a large-scale GWAS meta-analysis: (1) to assess the impact of different SNP preselection strategies for genome prediction compared to using all variants from the SNP chip data; (2) to evaluate the impact of different proportions of top selected SNPs for genome prediction within different SNP preselection strategies; and (3) to identify the most efficient prediction models for a variety of traits within different SNP preselection strategies.

2. Materials and Methods

2.1. Animals and Phenotypes

In this study, a total of 1026 Large White pigs (194 boars and 832 sows) were used, originating from the breeding nucleus farm in Guangxi (Guangxi, Nanning, China), spanning from July 2017 to June 2021. The average age of the pigs was 186 days. In addition, all pigs were raised with the same diet and under identical conditions, and they were fed according to large-scale breeding conditions. Phenotypic observations included three growth and carcass traits: ADG100, BFT100, and DAYS100. The descriptive statistics for the growth and carcass traits analyzed in this work are listed in Table 1, along with the frequency distribution of the phenotypic data in Figure S1. The ADG100 trait was defined as the animal’s body weight measured at the off test divided by the animal’s age. The DAYS100 trait was collected when the pigs were taken off test and then adjusted to days at 100 kg body weight. The BFT100 trait was measured between the 10th and 11th rib of Large White pigs using an Aloka-SSD-500 B ultrasound device (Corometrics Medical Systems, USA) at the off test and then adjusted to the average back fat thickness at 100 kg body weight. We calculated the correction of ADG100, DAYS100, and BFT100 according to the formula described by the Swine Genetic Prediction Project (http://www.breeding.cn, accessed on 12 December 2022) [31]. Phenotypes were then corrected for removing the identified fixed effects by using the lm () function in base R. The formula is as follows:
Y i j k m n = μ + Y e a r i + S e a s o n j + S e x k + F a r m m + e i j k m n  
where Y i j k m n represents the measured trait value of the m th individual; μ is the general mean; Y e a r i represents the fixed effect of the year of the start of testing (i = 2017–2021); S e a s o n j represents the fixed effect of the season of the start of testing (j = 1~2; 1: May–October, and 2: January–April and November–December); S e x k represents the fixed effect of sex (k =1~2; 1: boar, and 2: sow); F a r m m represents the fixed effect of farm at the start of testing (m = 11); e i j k m n represents the random residual effect.

2.2. Genotyping and Imputation

In our study, 1018 Large White pigs were genotyped by using the Geneseek GGP50k (Neogen Corporation, Lansing, MI, USA), comprising 45,073 SNPs. We filtered out variants with an SNP call rate < 0.05, MAF < 0.05, and the Hardy–Weinberg equilibrium < 1 × 10−5. Finally, a total of 38,678 SNPs in the study population were kept for further study. We then imputed the filtered SNPs on autosomes to sequence level with Beagle (version 5.1) [32] using 1602 WGS samples from the PigGTEx PGRP (http://pigbiobank.farmgtex.org, accessed on 1 April 2023) as a reference panel. To evaluate the accuracy of genotype imputation from the SNP chip, we used a 20-fold cross-validation method that was orderly masked for 5% of the SNP genotypes of all animals, and all the SNP chips were then imputed using Beagle (version 5.1). Finally, the imputation accuracy was evaluated as 0.952 based on Pearson’s correlation between the imputed and true genotypes. After imputation, we filtered out variants with an SNP call rate < 0.05, minor-allele frequency < 0.05, the Hardy–Weinberg equilibrium < 1 × 10−5, and LD > 0.8 (50 kb) by using PLINK software (version 1.9) [33]. Finally, a total of 863504 SNPs in the study population were retained for further study.

2.3. Variant Selection Based on Genome-Wide Association Meta-Analysis

To investigate the impact of preselected variants on genomic prediction, we obtained the summary statistics of large GWAS meta-analyses from the PigBiobank, which is part of the Pig Genotype-Tissue Expression (PigGTEx) project. The summary statistics were downloaded from the website http://pigbiobank.farmgtex.org (accessed on 1 April 2023). These datasets included samples from nine populations, totaling 28,967 (ADG100), 45,863 (BFT100), and 37,618 (DAYS100) samples. For each phenotype, we finally obtained the GWAS meta-analysis information of 16,860,419~27,288,578 SNPs.
To test whether preselected variants from the GWAS summary statistics data could provide better prediction accuracy than all variants from the SNP chip data or WGS data, we conducted genomic prediction using all variants from the SNP chip data and the WGS data. The SNP chip data (also defined as ‘Chip’) were adopted as the benchmark set for prediction accuracy. We assessed different SNP preselection strategies for preselected variants based on GWAS meta-analysis results:
(1) Selected top SNPs based on p-value ranking: We first ranked the SNPs based on the p-value of the GWAS meta-analysis results, then matched SNPs from the SNP chip data or WGS data in the GS study to the GWAS meta-analysis results according to the SNP’s physical location. We therefore ranked the SNPs from the GS population and defined them as P_top_Chip and P_top_WGS strategies, respectively. For the P_top_Chip strategy, we preselected the top 1%, 5%, 10%, 20%, 40%, 60%, 80%, and 100% variants from the SNP chip data with the lowest p-value based on the GWAS meta-analysis summary statistics in the GS population. For the P_top_WGS strategy, we preselected the top 0.1%, 0.3%, 0.5%, 1%, 5%, 10%, 20%, 40%, 60%, 80%, and 100% variants from the WGS data with the lowest p-value based on the summary statistics of GWAS meta-analysis in the GS population. The number of selected top SNPs from the SNP chip data and the WGS data were presented in Table 2. In addition, we preselected the most significant SNPs (p < 0.01) from the GWAS results (i.e., 5% of the SNPs) to compare the prediction accuracy of all markers from the SNP chip data and other alternative strategies, as already proposed by de los Campos et al. [34].
(2) Selected top SNPs based on estimated marker effects of the trait: Initially, we conducted the rr-BLUP analysis using LDAK v5.0 [35] to estimate marker effects using either the SNP chip data or WGS data from the GS population. Subsequently, we matched SNPs from the SNP chip data or WGS data in the GS study to the GWAS meta-analysis results according to the SNP’s physical location. Then, we obtained the corresponding effect for these SNPs based on the rr-BLUP analysis results from the SNP chip data or WGS data in the GS population. Finally, we ranked the SNPs based on the absolute value of these estimated marker effects obtained from the SNP chip data or WGS data in the GS population and defined them as G_top_Chip and G_top_WGS strategies, respectively. For the G_top_Chip strategy, we preselected the top 1%, 5%, 10%, 20%, 40%, 60%, 80%, and 100% variants from the SNP chip data with the highest estimated effect size in the GS population. For the G_top_WGS strategy, we preselected the top 0.1%, 0.3%, 0.5%, 1%, 5%, 10%, 20%, 40%, 60%, 80%, and 100% variants from the WGS data with the highest estimated effect size in the GS population.

2.4. Genomic Prediction Models

In this study, we implemented five models to perform genomic prediction.

2.4.1. GBLUP

We used GBLUP as the benchmark model to evaluate other models presented in this study. The model includes a single random genetic effect, and is as follows:
Y = μ + Z g + e
where Y is a vector of corrected phenotypes for ADG100, BFT100, or DAYS100; μ is the general mean; Z is a matrix of variant genotypes; g ~ N ( 0 , G σ g 2 ) represents the additive genetic effects, where G and σ g 2 denote the genomic relationship matrix [36] and additive genetic variance, respectively; e ~ N ( 0 , I σ e 2 ) represents the random residuals, where I and σ e 2 denote the identity matrix and residual variance, respectively. The G matrix was constructed using all SNPs according to:
G = Z Z 2 i = 1 m p i 1 p i
where Z is a matrix of centered genotypes, with elements 0 2 p i , 1 2 p i , 1 2 p i denoting 11 homozygous, 12 or 21 heterozygous, and 22 homozygous, respectively; and p i is the minor-allele frequency of the ith SNP.

2.4.2. GFBLUP

The genomic feature BLUP (GFBLUP) model is an extension of the linear mixed model used in general GBLUP, which includes two components [37]. The prediction model is as follows:
Y = μ + Z f + Z g + e
where f ~ N ( 0 , G f σ f 2 ) represents additive genetic effects captured by a proportion of top SNPs selected, where G f and σ f 2 denote the genomic relationship matrix and additive genetic variance captured by the top SNPs, respectively; g ~ N ( 0 , G σ g 2 ) represents additive genetic effects captured by all genetic variants, where G and σ g 2 denote the genomic relationship matrix [36] and additive genetic variance, respectively.

2.4.3. TABLUP

The trait-specific relationship matrix BLUP (TABLUP) model is an improvement of the RRBLUP and GBLUP models; it emphasizes trait-specific SNPs to explain more of the genetic variance in the trait [30]. The statistical model can be expressed as follows:
Y = μ + Z S a + e
where S a ~ N ( 0 , S a σ s a 2 ) represents additive genetic effects captured by all SNPs, and σ s a 2 is the additive genetic variance captured by all SNPs. The S a matrix is defined as:
S a = M D M 2 i = 1 m p i 1 p i
where M represents the genotype matrix of all SNPs; D is a diagonal matrix, and its diagonal elements diag (D) contain the weights as the estimated marker effects obtained from the GWAS meta-analysis results for the selected SNPs in M . Of the remaining SNPs, we rescaled diag (D) so that the mean of diag (D) was 1 to facilitate the interpretation of the model results.

2.4.4. BLUP|GA

The BLUP|GA (BLUP approach conditional on the genetic architecture) model is an improvement of the TABLUP model [30]. This is achieved by constructing a trait-specific covariance matrix (T), which is a weighted sum of the S f matrix and the G matrix, and was calculated as follows:
T = ω S f + 1 ω G
where ω is the overall weight for the S f matrix, which was set to 0.1, 0.3, 0.5, 0.7, and 0.9; the S f and G matrices are genomic relationship matrixes, which are calculated by the selected SNPs and all SNPs, respectively. The S f matrix is defined as:
S f = M D M 2 i = 1 m p i 1 p i
where M represents the genotype matrix of selected top SNPs; D is a diagonal matrix, and its diagonal elements, diag (D), contain the weights as the estimated marker effect obtained from the GWAS meta-analysis results for the selected SNPs in M . To ensure the analogy between S f and G, we rescaled diag (D) so that the mean of diag (D) was 1 to facilitate interpretation of the model results.

2.4.5. GTBLUP

The genomic trait-specific BLUP (GTBLUP) model utilizes estimated marker effects of selected SNPs to construct the weighted G, including two components similar to the GFBLUP model. The statistical model for GTBLUP can be expressed as:
Y = μ + Z 1 g + Z 2 s f + e
where all variables are defined as described above.

2.4.6. Evaluation of the Accuracy of GEBV

In this study, we estimated the accuracy of genomic prediction using the R-regress [38]. We then used a 5×tenfold cross-validation with replicate 50 times, resulting in 50 averaged accuracies of genomic prediction. The accuracy of genomic prediction was calculated as Pearson’s correlation coefficient of the corrected phenotype and GEBV, and was as follows:
r a c c u r a c y = C o v X , Y v a r X v a r Y
where X and Y represent the corrected phenotype and GEBV of all individuals, respectively; C o v X , Y represents the covariance of X and Y; v a r X represents the variance of X; and v a r Y represents the variance of Y.

3. Results

3.1. The Impact of Alternative Strategy for Preselecting Variants on Genome Prediction

To investigate the impact of preselected variants on genome prediction, we implemented four strategies to preselect variants from a large GWAS meta-analysis. We preselected the most significant SNPs (p < 0.01) from a GWAS meta-analysis (i.e., 5% of the SNPs) and used the BLUP|GA model with the best weight parameter ( ω = 0.1) for genome prediction across different SNP preselection strategies (Table S1, Figures S2 and S3). Figure 1 and Table S2 show the prediction accuracy for the three traits using preselecting variants from the four different SNP preselection strategies and all variants from the SNP chip data and WGS data. Across all three traits, when compared to using all markers from the SNP chip data, we observed that the prediction accuracy increased by 0.3 to 0.7% for P_top5%_Chip, by 0.3 to 1.4% for P_top5%_WGS, by 0.8 to 1.1% for G_top5%_Chip, and by 0.9 to 2% for G_top5%_WGS. We also found that all markers from the WGS data increased by 0.2 to 0.8% or even did not change compared with all markers of the SNP chip data, indicating that further increasing the marker density does not improve prediction accuracy. In addition, we found that the prediction accuracy was slightly higher for preselecting variants from P_top5%_WGS, G_top5%_Chip, and G_top5%_WGS scenarios compared to using all markers of the WGS data, with values ranging from 0.3 to 1.2%.

3.2. The Impact of Different Proportion of Selected Top SNPs on Genome Prediction

To assess the impact of different proportions of selected top SNPs on genome prediction, we categorized them into eight or eleven SNP categories. These categories were determined based on either the p-values obtained from a large GWAS meta-analysis or the estimated marker effects derived from an rr-BLUP analysis conducted in this study population. We used the BLUP|GA model with the best weight parameter ( ω = 0.1) for genome prediction across different SNP preselection strategies (Table S1, Figures S2 and S3). Figure 2 and Table S3 show the prediction accuracy of three traits based on using different proportions of selected top SNPs for four SNP preselection strategies. We found a higher prediction accuracy using selected top SNPs for four strategies in all traits compared with all markers from the chip data, except for ADG100 using 0.1% of the top SNPs for P_top_WGS and 1% of the top SNPs for P_top_Chip. For the P_top_WGS and G_top_WGS strategies, we found that the prediction accuracy in three traits showed a trend of rising and then declining when the proportion of selected top SNPs increased. When using 0.1 to 0.5% of the top SNPs (containing from 854 to 4290 significant variants) for genome prediction, we observed a gradual improvement in prediction accuracy with an increasing number of variants impacting the trait. The highest prediction accuracy was observed when using 1% or 5% of the top SNPs, followed by a gradual decline when the proportion of selected top SNPs increased from 10 to 100%. This decrease in prediction accuracy might be attributed to the increase in noise caused by the inclusion of non-associated variants. For the G_top_Chip strategy, we found that prediction accuracy showed a trend of decline when using 1 to 100% of the top SNPs from the SNP chip data, with the highest prediction accuracy using 1% or 5% of the top SNPs. For the P_top_Chip strategy, we found that prediction accuracy showed a trend of decline (ADG100 and BFT100) or no change (DAYS100).

3.3. The Impact of Different Models of Selected Top SNPs on Genome Prediction

We compared the prediction performance of different prediction models, including the GBLUP, GFBLUP, BLUP|GA, GTBLUP, and TABLUP models. Using the prediction accuracy of GBLUP as the reference, the influences of different prediction models varied under different strategies for three traits based on different proportions of selected top SNPs (Table S4, Figures S4 and S5). For the P_top_Chip and P_top_WGS strategies, we found a trend that the prediction accuracy of the different prediction models increased slightly with the proportion of selected top SNPs. For the G_top_Chip and G_top_WGS strategies, we observed that the prediction accuracy of different prediction models was not robust and varied across traits and proportions of selected top SNPs. In many cases, this resulted in no improvement in prediction accuracy or even a decrease compared to using all markers from the SNP chip data.
To determine the statistical significance of different weighted models, we performed multiple paired t-tests based on the prediction accuracy of different strategies (Figure 3). For the P_top_Chip strategy, all the prediction models had very limited advantages over GBLUP (Figure 3). The prediction accuracy of the BLUP|GA and TABLUP models for ADG100 was slightly higher (0.15 to 0.28%) than GBLUP, but the difference did not reach significance. The prediction accuracy of the TABLUP model for ADG100 and DAYS100 was significantly higher (0.4 to 0.5%) than other prediction models. For the G_top_Chip strategy, the prediction accuracy of the BLUP|GA model for ADG100 was slightly higher (0.6%) than GBLUP, but the difference did not reach significance (Figure 3). The prediction accuracy of the BLUP|GA model for ADG100 and DAYS100 was significantly higher (0.7 to 0.9%) than the TABLUP model. In the context of the P_top_WGS strategy, the prediction accuracy of the BLUP|GA, GFBLUP, and GTBLUP models for BFT100 displayed significant advantages (0.4 to 0.8%) over the GBLUP model, as illustrated in Figure 3. The prediction accuracy of the TABLUP model for ADG100 and DAYS100 was significantly higher (0.1%) than GBLUP. In addition, the prediction accuracy of BLUP|GA for ADG100 and DAYS100 was significantly higher (0.2 to 0.4%) than the GFBLUP and GTBLUP models. For the G_top_WGS strategy, the prediction accuracy of the BLUP|GA model for three traits was significantly higher (0.1 to 1.0%) than the GBLUP and TABLUP models (Figure 3).

4. Discussion

The use of preselected variants based on the genetic architecture of complex traits as a strategy for genomic prediction offers an attractive approach to enhance predictive accuracy [11]. In pursuit of this objective, we compared the accuracy of genomic prediction based on the SNP chip data with that based on sets of preselected variants from a large GWAS meta-analysis. Our results indicated that the use of preselected variants from a large GWAS meta-analysis has shown small and non-robust improvements in the accuracy of genomic prediction compared to using all variants from the SNP chip data in pigs. A previous study [22] based on preselected variants by p-value ranking from GWAS results demonstrated an improvement in the accuracy of genomic prediction for BFT, ADG, and loin muscle area across different pig breeds. The improvement ranged from 4 to 13.2% compared to the SNP chip data, which was slightly higher than our results. Our observations might be attributed to several underlying factors. Firstly, the advantage of using preselected variants from a large GWAS meta-analysis may also be limited by the effective population size of the current selection population and the sizes of training sets. In a previous simulation study, it was discovered that the accuracy of genomic prediction was obviously lower in populations with a small effective population size across different population sizes [39]. Meanwhile, a previous study has shown that various strategies for preselected variants from WGS data based on GWAS results can improve the accuracy of genomic prediction, especially with larger training sets, when compared to using SNP chip data [21]. Secondly, the GWAS meta-analysis summary results are influenced by LD, which could introduce noise and limit the fine mapping of associated regions when using WGS data [21]. For example, in cases where a single-population GWAS fails to identify the causal variant, a meta-analysis may uncover another variant that exhibits greater significance than the causal variant itself [40]. This could be attributed to the presence of a stronger LD between these two variants within the same region. Finally, GWAS meta-analysis with WGS data can be affected by false positives in a more severe way than GWAS meta-analysis with SNP chip data, especially for highly polygenic traits [21]. In our study, the GWAS meta-analysis summary results were obtained using imputed WGS data, which may have introduced false positives due to limitations in sample size and the number of minor alleles contributing to an SNP across participating studies [41]. In addition, according to Zhang et al. [30], the use of preselected variants based on their size of effect estimated from an rr-BLUP model for genomic prediction improved the accuracy of genomic prediction (increased by 1.9 to 40%) compared to using all variants from WGS data. This strategy provides several advantages as it captures the genetic architecture of complex traits through the inclusion of SNPs with unknown effects, particularly in scenarios involving small data sets and/or traits with low heritability.
The predictive ability of preselected variants also depends on the proportion of selected top SNPs from the GWAS meta-analysis summary results. For this purpose, we compared the prediction accuracy using different proportions of selected top SNPs across different SNP preselection strategies. Among these strategies, namely the P_top_WGS and G_top_WGS strategies across all traits, a discernible pattern emerged, characterized by an initial increase in prediction accuracy followed by a subsequent decline. Notably, this trend was most pronounced when using the top 1% or 5% of selected SNPs, aligning with findings by Zhang et al. [30]. This is due to the occurrence of the inclusion of a larger number of trait-related SNPs in the model. However, as the number of available SNPs increased, the number of SNPs that were not correlated with the trait also increased. This, in turn, elevated the noise level for non-associated markers and caused a decline in prediction accuracy, approaching that observed when using all markers from the SNP chip data. In addition, Akbarzadeh et al. [42] considered that the number of SNP markers included in the prediction model is an essential factor for the prediction model.
Genomic prediction models need to be improved to leverage the potential of selected top SNPs from GWAS meta-analysis summary results. In this study, we compared the accuracy of genomic prediction of five models—GBLUP, GFBLUP, TABLUP, GA|BLUP, and GTBLUP models—among different SNP preselection strategies. The results showed that the superiority of BLUP|GA, which used a GBLUP with a weighted G matrix, was the largest when using a weighting factor of estimated marker effects from GWAS meta-analysis. For the P_top_Chip and P_top_GWAS strategies, the BLUP|GA model not only captured the large number of significant SNPs associated with traits from large-scale GWAS meta-analysis results but also efficiently utilized the estimated marker effect for constructing weighted G matrices. Many previous studies have reported that the BLUP|GA model leads to better genomic predictions than GBLUP, where weighting factors were obtained with estimated marker effects from either the GS model or GWAS model in the studied population [30,43]. A previous study demonstrated that a G matrix weighted with the estimated marker effect from the GWAS model may mitigate the adverse effects of imprecise estimated marker effects [29]. In GWAS results, it is common to observe an overestimation of QTL effects in the total estimated SNP effects [29]. This occurs due to the presence of several linked markers, all estimating the effect of the same QTL. Additionally, the total SNP variances often tend to be significantly larger than the total additive genetic variance. However, the issue of overestimation is no longer a problem when the estimated marker effect from a GWAS model is used as a weighting factor [28]. In addition, Ren et al. [28] reported that the estimated marker effect for constructing weighted G matrices from the GWAS method can significantly increase prediction accuracy compared with general GBLUP. Conversely, Su et al. [29] reported that the G matrix weighted with the estimated marker effect using the GWAS model resulted in lower accuracies than the general G matrix. Alternatively, the increased prediction accuracy in our study may be attributed to the accurate marker effect estimates obtained through the GWAS meta-analysis method.
For the G_top_Chip and G_top_WGS strategies, the BLUP|GA model captured the genetic architecture provided by the SNPs of larger effect from the rr-BLUP model. It also efficiently utilized the estimated marker effect from large-scale GWAS meta-analysis summary results to construct weighted G matrices. The findings indicated that, among the G_top_Chip and G_top_WGS strategies, the BLUP|GA model consistently exhibited the highest degree of robustness in terms of genomic prediction accuracy across all considered traits. This result is consistent with the results of similar studies by Zhang et al. [30], where using the genetic architecture provided by SNPs from the rr-BLUP model for genomic prediction resulted in a significant enhancement of the accuracy of predictions compared to GBLUP. This may occur because the genetic architecture part (S matrix) could effectively improve the prediction ability of the G matrix by capturing major genes affecting traits. In addition, the increase in accuracy is also possibly due to the increased similarity between the T matrix and the genetic relationship matrix at unobserved causal loci [30,44].
In our study, the superiority of the TABLUP model for ADG100 and DAYS100 outperformed other predictive models in the G_top_Chip and G_top_WGS strategies. For the G_top_Chip and G_top_WGS strategies, compared with the BLUP|GA model, the TABLUP model only utilized preselected SNPs based on estimated marker effects of rr-BLUP analysis for constructing S a matrices, as well as the estimated marker effect from the large-scale GWAS meta-analysis summary results for weighting factors. The superiority of the TABLUP model over the GBLUP and rrBLUP models has been investigated previously and is nearly equally to BayesB in terms of accuracy of genomic prediction [30]. In these models, the TABLUP model places greater emphasis on variants that explain a larger proportion of the genetic variance in the trait. In addition, the increased predictive accuracy of the TABLUP model may also be attributed to the fact that ADG100 and DAYS100 have complex genetic backgrounds and may be influenced by loci with both large and moderate effects simultaneously. However, the GTBLUP and GFBLUP models performed slightly lower compared to other models in our study, where the GTBLUP model has an advantage over the GFBLUP model in terms of prediction accuracy. This is probably due to the use of different genomic relationship matrices in the models. In contrast to the GFBLUP model, the GTBLUP model utilized the estimated marker effect from the large-scale GWAS meta-analysis summary results for weighting factors in the S f matrix. This is further evidence that utilizing the information provided by large-scale GWAS meta-analysis can increase prediction accuracy.

5. Conclusions

Our results show that the use of preselected variants based on a large-scale GWAS meta-analysis has potential to improve the accuracy of genomic prediction compared to all variants from the SNP chip data in pigs. However, the prediction accuracy using a given set of preselected variants exhibited a marginal improvement compared to all variants from the SNP chip data in pigs. By evaluating the effect of different proportion of selected top SNPs on genome prediction, we found that the accuracy of genomic prediction exhibited a trend of either initially increasing and then decreasing or simply decreasing across different SNP preselection strategies when the proportion of selected top SNPs increased. The BLUP|GA model, which incorporates SNP weights derived from posterior estimated marker effects obtained through a comprehensive GWAS meta-analysis, the potential to enhance predictive accuracy relative to the GBLUP model. The results of this study provide a strategy to maximize the advantage of using preselected variants based on large-scale GWAS meta-analysis, providing valuable insights for future applications of genomic prediction using extensive genotype data in livestock and poultry.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ani13243746/s1, Figure S1: the frequency distribution of the data of growth and carcass traits; Figure S2: the influences of the BLUP|GA model with weight parameter for genomic prediction using different proportion of selected top SNPs from the SNP chip data; Figure S3: the influences of the BLUP|GA model with weight parameter for genomic prediction using the proportion of selected top SNPs from the WGS data; Figure S4: the influences of different prediction models for genomic prediction using the proportion of selected top SNPs from the SNP chip data; Figure S5: the influences of different prediction models for genomic prediction using the proportion of selected top SNPs from the WGS data; Table S1: accuracy of the BLUP|GA model with weight parameter based on different SNP preselection strategies for growth and carcass traits in Large White pigs; Table S2: accuracy of using preselecting variants based on different SNP preselection strategies for growth and carcass traits in Large White pigs; Table S3: accuracy of the proportion of selected top SNPs based on different SNP preselection strategies for growth and carcass traits in Large White pigs; Table S4: accuracy of different prediction models based on different SNP preselection strategies for growth and carcass traits in Large White pigs.

Author Contributions

Conceptualization, Z.Z.; methodology, C.C., D.R., X.C., S.S., X.W. and J.S.; software, C.C.; validation, C.W., C.C. and Z.Z.; formal analysis, C.W. and C.C.; investigation, W.Z. and T.Z.; resources, X.W. and J.S.; data curation, C.C.; writing—original draft preparation, C.W., S.S., X.Y. and J.L; writing—review and editing, Z.Z., C.W., W.Z., D.R., X.C., T.Z., X.W. and J.S.; visualization, C.W., X.W., J.S., X.Y. and J.L.; supervision, X.Y. and J.L.; project administration, Z.Z.; funding acquisition, Z.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 32022078, the Guangxi Science and Technology Program Project, grant number GuikeJB23023003, and supported by the earmarked fund for China Agriculture Research System, grant number CARS-35. We greatly thank the farm staff for data collection and the National Supercomputer Center in Guangzhou for its computing support.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Pig Genotype-Tissue Expression (PigGTEx) raw whole-genome sequence data and the GWAS meta-analyses summary statistics are available at http://piggtex.farmgtex.org/ (accessed on 1 April 2023) and http://pigbiobank.farmgtex.org (accessed on 1 April 2023), respectively. All other relevant data are available in the manuscript, supporting information files, and from the corresponding author upon request.

Acknowledgments

We would like to thank Guangxi State Farms Yongxin Animal Husbandry Group Co., Ltd. (Guangxi, China) for providing all phenotypic data and genotype data.

Conflicts of Interest

Guangxi State Farms Yongxin Animal Husbandry Group Co., Ltd., who was involved in the provision of data for the study, did not interfere with co-authors’ access to all of the study’s data, analysing and interpreting the data, preparing and publishing manuscripts independently. Xibo Wu and Jinglei Si were from the Guangxi State Farms Yongxin Animal Husbandry Group Co., Ltd. All authors declare no conflicts of interest.

References

  1. Liu, S.Q.; Du, M.; Tu, Y.; You, W.J.; Chen, W.T.; Liu, G.L.; Li, J.Y.; Wang, Y.Z.; Lu, Z.Q.; Wang, T.H.; et al. Fermented mixed feed alters growth performance, carcass traits, meat quality and muscle fatty acid and amino acid profiles in finishing pigs. Anim. Nutr. 2023, 12, 87–95. [Google Scholar] [CrossRef] [PubMed]
  2. Ruan, D.; Zhuang, Z.; Ding, R.; Qiu, Y.; Zhou, S.; Wu, J.; Xu, C.; Hong, L.; Huang, S.; Zheng, E.; et al. Weighted single-step GWAS identified candidate genes associated with growth traits in a Duroc pig population. Genes 2021, 12, 117. [Google Scholar] [CrossRef] [PubMed]
  3. Jibrila, I.; Vandenplas, J.; ten Napel, J.; Bergsma, R.; Veerkamp, R.F.; Calus, M.P.L. Impact of genomic preselection on subsequent genetic evaluations with ssGBLUP using real data from pigs. Genet. Sel. Evol. 2022, 54, 48. [Google Scholar] [CrossRef] [PubMed]
  4. Abdollahi-Arpanahi, R.; Lourenco, D.; Misztal, I. A comprehensive study on size and definition of the core group in the proven and young algorithm for single-step GBLUP. Genet. Sel. Evol. 2022, 54, 34. [Google Scholar] [CrossRef]
  5. Meuwissen, T.H.; Hayes, B.J.; Goddard, M.E. Prediction of total genetic value using genome-wide dense marker maps. Genetics 2001, 157, 1819–1829. [Google Scholar] [CrossRef] [PubMed]
  6. VanRaden, P.M.; Van Tassell, C.P.; Wiggans, G.R.; Sonstegard, T.S.; Schnabel, R.D.; Taylor, J.F.; Schenkel, F.S. Invited review: Reliability of genomic predictions for North American Holstein bulls. J. Dairy Sci. 2009, 92, 16–24. [Google Scholar] [CrossRef]
  7. Lillehammer, M.; Meuwissen, T.H.E.; Sonesson, A.K. Genomic selection for maternal traits in pigs. J. Anim. Sci. 2011, 89, 3908–3916. [Google Scholar] [CrossRef]
  8. Cleveland, M.A.; Hickey, J.M.; Forni, S. A Common Dataset for Genomic Analysis of Livestock Populations. G3 2012, 2, 429–435. [Google Scholar] [CrossRef]
  9. Sevillano, C.A.; ten Napel, J.; Guimaraes, S.E.F.; Silva, F.F.; Calus, M.P.L. Effects of alleles in crossbred pigs estimated for genomic prediction depend on their breed-of-origin. BMC Genom. 2018, 19, 740. [Google Scholar] [CrossRef]
  10. Meuwissen, T.; Goddard, M. Accurate prediction of genetic values for complex traits by whole-genome resequencing. Genetics 2010, 185, 623–631. [Google Scholar] [CrossRef]
  11. Iheshiulor, O.O.M.; Woolliams, J.A.; Yu, X.J.; Wellmann, R.; Meuwissen, T.H.E. Within- and across-breed genomic prediction using whole-genome sequence and single nucleotide polymorphism panels. Genet. Sel. Evol. 2016, 48, 15. [Google Scholar] [CrossRef]
  12. Druet, T.; Macleod, I.M.; Hayes, B.J. Toward genomic prediction from whole-genome sequence data: Impact of sequencing design on genotype imputation and accuracy of predictions. Heredity 2014, 112, 39–47. [Google Scholar] [CrossRef] [PubMed]
  13. Zhao, W.; Zhang, Z.Y.; Ma, P.P.; Wang, Z.; Wang, Q.S.; Zhang, Z.; Pan, Y.C. The effect of high-density genotypic data and different methods on joint genomic prediction: A case study in large white pigs. Anim. Genet. 2023, 54, 45–54. [Google Scholar] [CrossRef] [PubMed]
  14. Zhuang, Z.W.; Wu, J.; Qiu, Y.B.; Ruan, D.L.; Ding, R.R.; Xu, C.E.; Zhou, S.P.; Zhang, Y.L.; Liu, Y.Y.; Ma, F.C.; et al. Improving the accuracy of genomic prediction for meat quality traits using whole genome sequence data in pigs. J. Anim. Sci. Biotechnol. 2023, 14, 67. [Google Scholar] [CrossRef] [PubMed]
  15. Zhang, C.Y.; Kemp, R.A.; Stothard, P.; Wang, Z.Q.; Boddicker, N.; Krivushin, K.; Dekkers, J.; Plastow, G. Genomic evaluation of feed efficiency component traits in Duroc pigs using 80K, 650K and whole-genome sequence variants. Genet. Sel. Evol. 2018, 50, 14. [Google Scholar] [CrossRef] [PubMed]
  16. Fang, F.; Li, J.L.; Guo, M.; Mei, Q.S.; Yu, M.; Liu, H.M.; Legarra, A.; Xiang, T. Genomic evaluation and genome-wide association studies for total number of teats in a combined American and Danish Yorkshire pig populations selected in China. J. Anim. Sci. 2022, 100, skac174. [Google Scholar] [CrossRef]
  17. Li, Y.H.; Chang, M.; Schrodi, S.J.; Callis-Duffin, K.P.; Matsunami, N.; Civello, D.; Bui, N.; Catanese, J.J.; Leppert, M.F.; Krueger, G.G.; et al. The 5q31 variants associated with psoriasis and Crohn’s disease are distinct. Hum. Mol. Genet. 2008, 17, 2978–2985. [Google Scholar] [CrossRef]
  18. Li, H.W.; Zhu, B.; Xu, L.; Wang, Z.Z.; Xu, L.; Zhou, P.N.; Gao, H.; Guo, P.; Chen, Y.; Gao, X.; et al. Genomic prediction using LD-based haplotypes inferred from high-density chip and imputed sequence variants in Chinese Simmental beef cattle. Front. Genet. 2021, 12, 665382. [Google Scholar] [CrossRef]
  19. Brondum, R.F.; Su, G.; Janss, L.; Sahana, G.; Guldbrandtsen, B.; Boichard, D.; Lund, M.S. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction. J. Dairy Sci. 2015, 98, 4107–4116. [Google Scholar] [CrossRef]
  20. Veerkamp, R.F.; Bouwman, A.C.; Schrooten, C.; Calus, M.P.L. Genomic prediction using preselected DNA variants from a GWAS with whole-genome sequence data in Holstein-Friesian cattle. Genet. Sel. Evol. 2016, 48, 95. [Google Scholar] [CrossRef]
  21. Ros-Freixedes, R.; Johnsson, M.; Whalen, A.; Chen, C.Y.; Valente, B.D.; Herring, W.O.; Gorjanc, G.; Hickey, J.M. Genomic prediction with whole-genome sequence data in intensely selected pig lines. Genet. Sel. Evol. 2022, 54, 65. [Google Scholar] [CrossRef] [PubMed]
  22. Zhang, R.F.; Zhang, Y.; Liu, T.N.; Jiang, B.; Li, Z.Y.; Qu, Y.P.; Chen, Y.S.; Li, Z.C. Utilizing variants identified with multiple genome-wide association study methods optimizes genomic selection for growth traits in pigs. Animals 2023, 13, 722. [Google Scholar] [CrossRef] [PubMed]
  23. Jang, S.; Tsuruta, S.; Leite, N.G.; Misztal, I.; Lourenco, D. Dimensionality of genomic information and its impact on genome-wide associations and variant selection for genomic prediction: A simulation study. Genet. Sel. Evol. 2023, 55, 49. [Google Scholar] [CrossRef]
  24. Botelho, M.E.; Lopes, M.S.; Mathur, P.K.; Knol, E.F.; Guimaraes, S.E.F.; Marques, D.B.D.; Lopes, P.S.; Silva, F.F.; Veroneze, R. Applying an association weight matrix in weighted genomic prediction of boar taint compounds. J. Anim. Breed. Genet. 2021, 138, 442–453. [Google Scholar] [CrossRef] [PubMed]
  25. Jang, S.; Ros-Freixedes, R.; Hickey, J.M.; Chen, C.Y.; Herring, W.O.; Holl, J.; Misztal, I.; Lourenco, D. Multi-line ssGBLUP evaluation using preselected markers from whole-genome sequence data in pigs. Front. Genet. 2023, 14, 1163626. [Google Scholar] [CrossRef] [PubMed]
  26. Jang, S.; Ros-Freixedes, R.; Hickey, J.M.; Chen, C.Y.; Holl, J.; Herring, W.O.; Misztal, I.; Lourenco, D. Using pre-selected variants from large-scale whole-genome sequence data for single-step genomic predictions in pigs. Genet. Sel. Evol. 2023, 55, 55. [Google Scholar] [CrossRef] [PubMed]
  27. Tiezzi, F.; Maltecca, C. Accounting for trait architecture in genomic predictions of US Holstein cattle using a weighted realized relationship matrix. Genet. Sel. Evol. 2015, 47, 24. [Google Scholar] [CrossRef] [PubMed]
  28. Ren, D.Y.; An, L.X.; Li, B.J.; Qiao, L.Y.; Liu, W.Z. Efficient weighting methods for genomic best linear-unbiased prediction (BLUP) adapted to the genetic architectures of quantitative traits. Heredity 2021, 126, 320–334. [Google Scholar] [CrossRef]
  29. Su, G.; Christensen, O.F.; Janss, L.; Lund, M.S. Comparison of genomic predictions using genomic relationship matrices built with different weighting factors to account for locus-specific variances. J. Dairy Sci. 2014, 97, 6547–6559. [Google Scholar] [CrossRef]
  30. Zhang, Z.; Ober, U.; Erbe, M.; Zhang, H.; Gao, N.; He, J.L.; Li, J.Q.; Simianer, H. Improving the accuracy of whole genome prediction for complex traits using the results of genome wide association studies. PLoS ONE 2014, 9, e93017. [Google Scholar] [CrossRef]
  31. Genetic Evaluation of South China Breeding Pigs. Available online: http://www.breeding.cn (accessed on 22 February 2022).
  32. Browning, B.L.; Browning, S.R. Genotype imputation with millions of reference samples. Am. J. Hum. Genet. 2016, 98, 116–126. [Google Scholar] [CrossRef] [PubMed]
  33. Purcell, S.; Neale, B.; Todd-Brown, K.; Thomas, L.; Ferreira, M.A.; Bender, D.; Maller, J.; Sklar, P.; de Bakker, P.I.; Daly, M.J.; et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007, 81, 559–575. [Google Scholar] [CrossRef] [PubMed]
  34. de los Campos, G.; Vazquez, A.I.; Fernando, R.; Klimentidis, Y.C.; Sorensen, D. Prediction of complex human traits using the genomic best linear unbiased predictor. PLoS Genet. 2013, 9, e1003608. [Google Scholar] [CrossRef] [PubMed]
  35. Speed, D.; Hemani, G.; Johnson, M.R.; Balding, D.J. Improved heritability estimation from genome-wide SNPs. Am. J. Hum. Genet. 2012, 91, 1011–1021. [Google Scholar] [CrossRef] [PubMed]
  36. VanRaden, P.M. Efficient methods to compute genomic predictions. J. Dairy Sci. 2008, 91, 4414–4423. [Google Scholar] [CrossRef] [PubMed]
  37. Sarup, P.; Jensen, J.; Ostersen, T.; Henryon, M.; Sorensen, P. Increased prediction accuracy using a genomic feature model including prior information on quantitative trait locus regions in purebred Danish Duroc pigs. BMC Genet. 2016, 17, 11. [Google Scholar] [CrossRef] [PubMed]
  38. Clifford, D.; McCullagh, P. The Regress Function. The Newsletter of the R Project Volume 6/2, 6 May 2006. Available online: https://cran.r-project.org/web/packages/regress/regress.pdf (accessed on 1 April 2023).
  39. Pocrnic, I.; Lourenco, D.A.L.; Masuda, Y.; Misztal, I. Accuracy of genomic BLUP when considering a genomic relationship matrix based on the number of the largest eigenvalues: A simulation study. Genet. Sel. Evol. 2019, 51, 75. [Google Scholar] [CrossRef] [PubMed]
  40. van den Berg, I.; Xiang, R.D.; Jenko, J.; Pausch, H.; Boussaha, M.; Schrooten, C.; Tribout, T.; Gjuvsland, A.B.; Boichard, D.; Nordbo, O.; et al. Meta-analysis for milk fat and protein percentage using imputed sequence variant genotypes in 94,321 cattle from eight cattle breeds. Genet. Sel. Evol. 2020, 52, 37. [Google Scholar] [CrossRef]
  41. Winkler, T.W.; Day, F.R.; Croteau-Chonka, D.C.; Wood, A.R.; Locke, A.E.; Magi, R.; Ferreira, T.; Fall, T.; Graff, M.; Justice, A.E.; et al. Quality control and conduct of genome-wide association meta-analyses. Nat. Protoc. 2014, 9, 1192–1212. [Google Scholar] [CrossRef]
  42. Akbarzadeh, M.; Dehkordi, S.R.; Roudbar, M.A.; Sargolzaei, M.; Guity, K.; Sedaghati-khayat, B.; Riahi, P.; Azizi, F.; Daneshpour, M.S. GWAS findings improved genomic prediction accuracy of lipid profile traits. Sci. Rep. 2021, 11, 5780. [Google Scholar] [CrossRef]
  43. He, Z.X.; Li, S.; Li, W.; Ding, J.Q.; Zheng, M.Q.; Li, Q.H.; Fahey, A.G.; Wen, J.; Liu, R.R.; Zhao, G.P. Comparison of genomic prediction methods for residual feed intake in broilers. Anim. Genet. 2022, 53, 466–469. [Google Scholar] [CrossRef] [PubMed]
  44. Do, D.N.; Strathe, A.B.; Ostersen, T.; Jensen, J.; Mark, T.; Kadarmideen, H.N. Genome-wide association study reveals genetic architecture of eating behavior in pigs and its implications for humans obesity by comparative mapping. PLoS ONE 2013, 8, e71509. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The impact of different alternative strategies for preselecting variants on genome prediction. “Chip” represents all variants from the SNP chip data, and “WGS” represents all variants from the WGS data. “P_top5%_Chip” and “P_top5%_WGS”represent the top 5% of SNPs based on p-value ranking from a large GWAS meta-analysis in the SNP chip and WGS data, respectively. “G_top5%_Chip” and “G_top5%_WGS” represent the top 5% of SNPs based on estimated marker effects from an rr-BLUP analysis in the SNP chip and WGS data, respectively. The dashed lines indicate the value of Chip as a reference. ADG100: average daily gain at 100 kg, BFT100: average back fat thickness at 100 kg, DAYS100: days to 100 kg.
Figure 1. The impact of different alternative strategies for preselecting variants on genome prediction. “Chip” represents all variants from the SNP chip data, and “WGS” represents all variants from the WGS data. “P_top5%_Chip” and “P_top5%_WGS”represent the top 5% of SNPs based on p-value ranking from a large GWAS meta-analysis in the SNP chip and WGS data, respectively. “G_top5%_Chip” and “G_top5%_WGS” represent the top 5% of SNPs based on estimated marker effects from an rr-BLUP analysis in the SNP chip and WGS data, respectively. The dashed lines indicate the value of Chip as a reference. ADG100: average daily gain at 100 kg, BFT100: average back fat thickness at 100 kg, DAYS100: days to 100 kg.
Animals 13 03746 g001
Figure 2. The impact of different proportions of selected top SNPs on genome prediction “P_top_Chip” and “P_top_WGS” represent top SNPs based on p-value ranking from a large GWAS meta-analysis in the SNP chip and WGS data, respectively. “G_top_Chip” and “G_top_WGS” represent top SNPs based on estimated marker effects from an rr-BLUP analysis in the SNP chip and WGS data, respectively. The dashed lines indicate the value of the SNP chip as a reference. ADG100: average daily gain at 100 kg, BFT100: average back fat thickness at 100 kg, DAYS100: days to 100 kg.
Figure 2. The impact of different proportions of selected top SNPs on genome prediction “P_top_Chip” and “P_top_WGS” represent top SNPs based on p-value ranking from a large GWAS meta-analysis in the SNP chip and WGS data, respectively. “G_top_Chip” and “G_top_WGS” represent top SNPs based on estimated marker effects from an rr-BLUP analysis in the SNP chip and WGS data, respectively. The dashed lines indicate the value of the SNP chip as a reference. ADG100: average daily gain at 100 kg, BFT100: average back fat thickness at 100 kg, DAYS100: days to 100 kg.
Animals 13 03746 g002
Figure 3. The prediction accuracy of different prediction models for growth and carcass traits with different SNP preselection strategies. “P_top_Chip” and “P_top_WGS “represent top SNPs based on p-value ranking from a large GWAS meta-analysis in the SNP chip and WGS data, respectively. “G_top_Chip” and “G_top_WGS” represent top SNPs based on estimated marker effects from an rr-BLUP analysis in the SNP chip and WGS data, respectively. * means p < 0.05, ** means p < 0.01. ADG100: average daily gain at 100 kg, BFT100: average back fat thickness at 100 kg, DAYS100: days to 100 kg.
Figure 3. The prediction accuracy of different prediction models for growth and carcass traits with different SNP preselection strategies. “P_top_Chip” and “P_top_WGS “represent top SNPs based on p-value ranking from a large GWAS meta-analysis in the SNP chip and WGS data, respectively. “G_top_Chip” and “G_top_WGS” represent top SNPs based on estimated marker effects from an rr-BLUP analysis in the SNP chip and WGS data, respectively. * means p < 0.05, ** means p < 0.01. ADG100: average daily gain at 100 kg, BFT100: average back fat thickness at 100 kg, DAYS100: days to 100 kg.
Animals 13 03746 g003
Table 1. Summary statistics of growth and carcass traits in Large White pigs.
Table 1. Summary statistics of growth and carcass traits in Large White pigs.
Traits 1NumberMinimumMaximumAverageSDCV
ADG100/g1026443764589498
BFT100/mm10263.8218.89.5222.25723.702
DAYS100/d1026129.08222.43168.77313.9828.285
1 ADG100, average daily gain at 100 kg; BFT100, average back fat thickness at 100 kg; DAYS100, days to 100 kg; SD, standard deviation; CV, coefficient of variation.
Table 2. Number of selected top SNPs from the SNP chip data and the WGS data in the study population.
Table 2. Number of selected top SNPs from the SNP chip data and the WGS data in the study population.
The Proportion of Selected Top SNPsADG100 1BFT100 1DAYS100 1
ChipWGSChipWGSChipWGS
0.1%-854 -858 -856
0.3%-2562 -2574 -2569
0.5%-4270 -4290 -4281
1%382 8541 382 8580 3838563
5%1912 42,704 1908 42,898 19142,814
10%3824 85,409 381685,797 383385,629
20%7648 170,818 7632171,594 7666171,257
40%15,297 341,636 15,264343,188 15,332342,514
60%22,945 512,453 22,897514,781 22,999513,771
80%30,594 683,271 30,529686,375 30,665685,028
100%38,242 854,089 38,161 857,969 38,332856,285
1. ADG100, average daily gain at 100 kg; BFT100, average back fat thickness at 100 kg; DAYS100, days to 100 kg; “Chip” represents all variants from the SNP chip data, and “WGS” represents all variants from the WGS data.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wei, C.; Chang, C.; Zhang, W.; Ren, D.; Cai, X.; Zhou, T.; Shi, S.; Wu, X.; Si, J.; Yuan, X.; et al. Preselecting Variants from Large-Scale Genome-Wide Association Study Meta-Analyses Increases the Genomic Prediction Accuracy of Growth and Carcass Traits in Large White Pigs. Animals 2023, 13, 3746. https://doi.org/10.3390/ani13243746

AMA Style

Wei C, Chang C, Zhang W, Ren D, Cai X, Zhou T, Shi S, Wu X, Si J, Yuan X, et al. Preselecting Variants from Large-Scale Genome-Wide Association Study Meta-Analyses Increases the Genomic Prediction Accuracy of Growth and Carcass Traits in Large White Pigs. Animals. 2023; 13(24):3746. https://doi.org/10.3390/ani13243746

Chicago/Turabian Style

Wei, Chen, Chengjie Chang, Wenjing Zhang, Duanyang Ren, Xiaodian Cai, Tianru Zhou, Shaolei Shi, Xibo Wu, Jinglei Si, Xiaolong Yuan, and et al. 2023. "Preselecting Variants from Large-Scale Genome-Wide Association Study Meta-Analyses Increases the Genomic Prediction Accuracy of Growth and Carcass Traits in Large White Pigs" Animals 13, no. 24: 3746. https://doi.org/10.3390/ani13243746

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop