Integrating Parental Phenotypic Data Enhances Prediction Accuracy of Hybrids in Wheat Traits

Montesinos-López, Osval A.; Bentley, Alison R.; Saint Pierre, Carolina; Crespo-Herrera, Leonardo; Salinas Ruiz, Josafhat; Valladares-Celis, Patricia Edwigis; Montesinos-López, Abelardo; Crossa, José

doi:10.3390/genes14020395


by Osval A. Montesinos-López ¹, Alison R. Bentley ², Carolina Saint Pierre ², Leonardo Crespo-Herrera ², Josafhat Salinas Ruiz ³, Patricia Edwigis Valladares-Celis ⁴, Abelardo Montesinos-López ⁵,* and José Crossa ²,⁶,*

¹ Facultad de Telemática, Universidad de Colima, Colima 28040, Mexico

² International Maize and Wheat Improvement Center (CIMMYT), Km 45, Mexico City 52640, Mexico

³ Colegio de Postgraduados Campus Córdoba, Km. 348 Carretera Federal Córdoba-Veracruz, Amatlán de los Reyes, Veracruz 94946, Mexico

⁴ Bachillerato 22, Universidad de Colima, Cuauhtémoc, Colima 28510, Mexico

⁵ Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara 44430, Mexico

⁶ Colegio de Postgraduados, Montecillos 56230, Mexico

* Authors to whom correspondence should be addressed.

Genes 2023, 14(2), 395; https://doi.org/10.3390/genes14020395

Submission received: 15 December 2022 / Revised: 27 January 2023 / Accepted: 30 January 2023 / Published: 2 February 2023

(This article belongs to the Section Plant Genetics and Genomics)


Abstract:

Genomic selection (GS) is revolutionizing plant breeding because it allows candidate genotypes to be selected without phenotypic evaluation in the field. However, its practical implementation in hybrid prediction remains challenging, since many factors affect its accuracy. The main objective of this study was to investigate the genomic prediction accuracy of wheat hybrids when covariates built from the phenotypic information of the hybrids' parents are added to the model. Four different types of models (MA, MB, MC, and MD) were studied, each with one covariate (same trait to be predicted) (MA_C, MB_C, MC_C, and MD_C) or several covariates (of the same trait and other correlated traits) (MA_AC, MB_AC, MC_AC, and MD_AC). We found that the four models with parental information outperformed models without parental information in terms of mean square error by at least 14.1% (MA vs. MA_C), 5.5% (MB vs. MB_C), 51.4% (MC vs. MC_C), and 6.4% (MD vs. MD_C) when parental information of the same trait was used, and by at least 13.7% (MA vs. MA_AC), 5.3% (MB vs. MB_AC), 55.1% (MC vs. MC_AC), and 6.0% (MD vs. MD_AC) when parental information of the same trait and other correlated traits was used. Our results also show a large gain in prediction accuracy when covariates were built from the parental phenotypic information, as opposed to marker information. Finally, our results empirically demonstrate that a significant improvement in prediction accuracy was gained by adding parental phenotypic information as covariates; however, this is expensive since, in many breeding programs, the parental phenotypic information is unavailable.

Keywords:

genomic prediction; parental information; prediction accuracy; correlated traits

1. Introduction

The question of how to feed the growing world population is not new, and there is evidence that several thousand years ago, farmers had already begun to genetically and physically modify plants to achieve better yields. In the 21st century, however, food security is not only threatened by a rapidly growing population, but also by climate change, which limits natural resources such as water, fuel, minerals, and arable land and contributes to the elevation of emissions with greenhouse gas potential. Plant breeding is considered by many experts as the starting point for the human food chain, as it significantly contributes to producing high and stable yields with low external inputs of non-renewable resources, low greenhouse-gas emissions, and a low concentration of undesirable substances [1].

Meuwissen [2] proposed improving the efficiency of plant breeding through a predictive methodology called Genomic Selection (GS), which can select the best candidates without phenotypic information. This is possible because a statistical machine-learning model can be trained on a reference population with phenotypic and genotypic information and subsequently used to make predictions for candidate lines (target population) that were only genotyped [3]. This methodology is transforming plant breeding and is being implemented for many crops like wheat, maize, cassava, rice, chickpea, and groundnut, among others [4,5,6,7].

For GS to be successfully implemented, an accurate prediction model must be based on a reference population comprised of individuals with both genotypic and phenotypic data to predict unobserved cultivars with only genotypic information. Extensive research studies have been conducted and novel statistical methods that incorporate pedigree, genomic, and environmental covariates (e.g., weather data) into statistical–genetic prediction models have been developed [8]. Genomic Best Linear Unbiased Predictor (GBLUP) models are widely used in GS, and the extension of GBLUP for incorporating genotype × environment interaction (GE) has improved the accuracy of predicting unobserved cultivars in environments. Jarquin et al. [9] found that the prediction accuracy of models, including the GE term, was substantially higher (17–34%) than models based only on main effects. For a maize data set with an ordinal response variable, Montesinos-López et al. [10] reported that models that included GE achieved gains of 9–14% in prediction accuracy over models that only included main effects. Using wheat data, Cuevas et al. [11] observed that models with the GE term were up to 60–68% better in terms of GP accuracy than the corresponding single-environment models.

Prediction of hybrid performance is fundamentally important in modern hybrid breeding programs, and the best linear unbiased prediction (BLUP) model has been found useful for predicting the performance of unobserved single crosses using the performance of observed single crosses based on the pedigree (i.e., coancestry coefficient) relationship between the inbred lines forming the unobserved and observed single crosses. When studying and assessing hybrid performance, two sources of variation are important: the estimation of the additive effects among lines based on the variance of the general combining ability and the dominance and/or epistatic effects among the lines based on the variance of the specific combining ability of the cross between lines.

In the context of hybrid development, GS can provide significant savings in resources, since only the parental information is required; the genotypes of hybrids can be deduced from the genotypes of their parents rather than sequenced anew, which significantly reduces the cost of hybrid development [12]. However, it is challenging to identify the best hybrids, since many combinations of parents should be evaluated: a process that significantly increases the cost. For this reason, to improve the efficiency of the GS methodology and take advantage of the information available during hybrid development, some authors have proposed the integration of parental information in genomic prediction models [13,14]. The approach for integrating the parental information in the genomic prediction models used by Liang et al. [13] and Jarquin et al. [15] was to increase the training set with the parental phenotypic information; that is, the training set was extended by including the phenotypic and genotypic data for the inbred lines. Conversely, the approach of Xu et al. [14] consisted of integrating the parental information in the prediction models as covariates computed from the parents of each hybrid while maintaining a fixed-size training set. The results of Liang et al. [13] were mixed: in some traits, there was a slight improvement in prediction accuracy, while in others, a decrease in prediction accuracy was observed. They concluded that “the naive incorporation of inbred genotype and trait information into training datasets decreased prediction accuracy for high heterosis traits” [13]. The results of Jarquin et al. [15] also showed no clear advantage in prediction performance by adding the parental phenotypic information into the training set. However, Xu et al. [14] reported an increase in prediction accuracy of 13.6%, 54.5%, 19.9%, and 8.3% for GY, number of tillers per plant, number of grains per panicle, and 1000 grain weight, respectively, showing a consistent gain in terms of prediction accuracy.

We propose using parental phenotypic information as covariates, as in Xu et al. [14], in genomic prediction models to improve the prediction accuracy of the genomic selection methodology. Four types (groups) of models, MA, MB, MC, and MD, were evaluated using a wheat data set from the International Maize and Wheat Improvement Center (CIMMYT). We further propose two ways of adding the parental phenotypic information as covariates to the genomic prediction models: (a) using only the parental information of the trait to be predicted (MA_C, MB_C, MC_C, and MD_C) and (b) using both the parental information of the trait to be predicted and the parental information of other correlated traits (MA_AC, MB_AC, MC_AC, and MD_AC).

2. Materials and Methods

2.1. Phenotypic Data

A total of 1888 hybrids obtained by crossing 667 females and 18 males were evaluated in field experiments for three years at CIMMYT’s Campo Experimental Norman E. Borlaug (CENEB or the Norman E. Borlaug Experiment Station) near Ciudad Obregon, Sonora, Mexico. The number of hybrids evaluated during the winter growing seasons in 2014 to 2015 (Year 1), 2015 to 2016 (Year 2), and 2016 to 2017 (Year 3) were 703, 655, and 1197, respectively, with 225 and 383 common hybrids in each consecutive year. The elite female and male parents were chosen from CIMMYT’s spring bread wheat program based on their performance for the traits of interest, suitability for producing hybrids, and ancestral diversity as measured with a coefficient of parentage [16].

Using a chemical hybridizing agent provided by Syngenta Inc., the hybrids were produced in alternate male and female strip plots measuring 6.4 m. Parents and hybrids were evaluated in α-lattice trials with two replications in two years. To guarantee a uniform plant density, 1000 seeds were sown in 4.8 m yield trial plots. In a high-yield-potential environment, the trials were conducted with four supplementary irrigations using standard agronomic practices. In all trials, males and females were planted together with the hybrids and two checks. Days to flowering (DTF), days to heading (DTH), days to maturity (DTM), grain yield (GY), and plant height (PHT) per plot were recorded for each entry. Phenotypic data were analyzed using a mixed linear model implemented in META-R software [17], where best linear unbiased estimates (BLUEs) were obtained after fitting the model with trial, genotype, replication nested within trials, and sub-blocks nested within trials and replications. BLUEs were obtained for each hybrid and parent and used for further analyses [16]. Three traits were analyzed in this paper: GY, DTF, and DTH.

2.2. Genotypic Data

The 18 male and 667 female parents were genotyped using the Illumina iSelect 90 K Infinium SNP genotyping array in the first year and the Illumina Infinium 15 K wheat SNP array (TraitGenetics GmbH) in the second and third years. A total of 13,005 single-nucleotide polymorphisms (SNPs) remained after combining the three datasets. Markers with <15% missing values were kept; then, after cleaning, the remaining missing markers were imputed using the mean allele frequency of the wheat lines with that specific marker. After imputation, markers with <0.05 minor allele frequency were removed. A total of 10,250 markers were used for further analysis. Although a larger set of hybrids and parents was evaluated in the field, only hybrids derived from SNP-genotyped parents were used for genomic predictions, and different numbers of hybrids were included in each year [16].
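The three marker-cleaning steps described above (missing-value filter, mean imputation, minor-allele-frequency filter) can be sketched as follows. This is only an illustration under stated assumptions: the 0/1/2 allele-dosage coding, the `None` marker for missing calls, the function name `qc_markers`, and the toy data are all hypothetical and are not taken from the original pipeline.

```python
from statistics import mean

def qc_markers(genotypes, max_missing=0.15, min_maf=0.05):
    """Filter and impute a lines-by-markers matrix.

    genotypes: list of rows (one per line); entries are allele dosages
    0/1/2 (an assumed coding) or None for a missing call.
    Returns a list of (marker_index, imputed_column) for retained markers.
    """
    n_lines = len(genotypes)
    kept = []
    for j in range(len(genotypes[0])):
        column = [row[j] for row in genotypes]
        observed = [g for g in column if g is not None]
        # Step 1: keep only markers with < 15% missing values.
        if 1 - len(observed) / n_lines >= max_missing:
            continue
        # Step 2: impute missing calls with the mean of the observed lines.
        col_mean = mean(observed)
        imputed = [col_mean if g is None else g for g in column]
        # Step 3: drop markers with minor allele frequency < 0.05.
        freq = mean(imputed) / 2
        if min(freq, 1 - freq) < min_maf:
            continue
        kept.append((j, imputed))
    return kept

# Hypothetical toy data: 10 lines x 3 markers, written column-wise for clarity.
toy_columns = [
    [0, 2, 0, 2, 0, 2, 0, 2, 0, None],     # 10% missing, polymorphic
    [0, 2, None, 2, 0, None, 0, 2, 0, 2],  # 20% missing
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],        # monomorphic
]
toy = [list(row) for row in zip(*toy_columns)]
kept = qc_markers(toy)
```

In this toy case only the first marker survives: the second fails the missing-value filter and the third fails the allele-frequency filter.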

2.3. Statistical Model

In this study, we evaluated four different types of models: Type A, B, C, and D (MA, MB, MC, and MD), where Types A and C did not include genomics, and Types B and D did. Models could include either one covariate (same trait to be predicted) (MA_C, MB_C, MC_C, and MD_C) or several covariates (of the same trait and other correlated traits) (MA_AC, MB_AC, MC_AC, and MD_AC). As shown below, Type A and B models share the same predictor, with (A) fitted without genomics and (B) with genomics. Likewise, Type C and D models share the same predictor, with (C) fitted without genomics and (D) with genomics.

2.3.1. Model MB_AC

This model is given by

$$Y = Z_{E} \beta_{E} + Z_{M} g_{M} + Z_{F} g_{F} + Z_{H} h + u_{M} + u_{F} + u_{H} + X_{AC} \beta_{AC} + \epsilon \tag{1}$$

where $Y$ is the response vector (i.e., the hybrids’ adjusted phenotypic information); $Z_{E}$ is the design matrix for environments (years); $\beta_{E}$ is the vector of environmental effects, with $\beta_{E} \sim N(0, \sigma_{E}^{2} I)$; $g_{M}$ is the vector of random effects due to the general combining ability (GCA) of markers for paternal lines (males, M); $g_{F}$ is the vector of random effects due to the GCA of markers for maternal lines (females, F); and $h$ is the vector of random effects due to the specific combining ability (SCA) of the crosses (hybrids, H). The incidence matrices $Z_{M}$, $Z_{F}$, and $Z_{H}$ relate $Y$ to $g_{M}$, $g_{F}$, and $h$, with $g_{M} \sim N(0, \sigma_{M}^{2} G_{M})$, $g_{F} \sim N(0, \sigma_{F}^{2} G_{F})$, and $h \sim N(0, \sigma_{H}^{2} H)$, where $\sigma_{M}^{2}$, $\sigma_{F}^{2}$, and $\sigma_{H}^{2}$ are variance components associated with GCA and SCA, and $G_{M}$, $G_{F}$, and $H$ are relationship matrices for paternal lines, maternal lines, and hybrids, respectively. Finally, $\epsilon \sim N(0, \sigma_{\epsilon}^{2} I)$, where $\sigma_{\epsilon}^{2}$ is the variance associated with the residuals.

The relationship matrices $G_{M}$ and $G_{F}$ were computed using markers [18]. Let $X_{m}$, $m \in \{\text{Male}, \text{Female}\}$, be the matrix of markers, and let $W_{m}$ be the matrix of centered and standardized markers. Then, $G_{m} = \frac{W_{m} W_{m}^{T}}{p}$ [16,19,20], where $p$ is the number of markers, and $H = G_{M} \otimes G_{F}$, where $\otimes$ denotes the Kronecker product.

The interaction terms are distributed as $u_{M} \sim N(0, \sigma_{ME}^{2} V_{M})$, $u_{F} \sim N(0, \sigma_{FE}^{2} V_{F})$, and $u_{H} \sim N(0, \sigma_{HE}^{2} V_{H})$, where $\sigma_{ME}^{2}$, $\sigma_{FE}^{2}$, and $\sigma_{HE}^{2}$ are variance components associated with male × environment, female × environment, and hybrid × environment interactions, respectively, and $V_{M}$, $V_{F}$, and $V_{H}$ are the associated variance–covariance matrices. These are given by $V_{M} = Z_{M} G_{M} Z_{M}^{T} \# Z_{E} Z_{E}^{T}$, $V_{F} = Z_{F} G_{F} Z_{F}^{T} \# Z_{E} Z_{E}^{T}$, and $V_{H} = Z_{H} H Z_{H}^{T} \# Z_{E} Z_{E}^{T}$, where $\#$ stands for the Hadamard product.
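To make the construction of the covariance structures concrete, the sketch below builds $G_{m} = W_{m} W_{m}^{T} / p$, the Kronecker product $H = G_{M} \otimes G_{F}$, and an interaction kernel of the form $Z K Z^{T} \# Z_{E} Z_{E}^{T}$ in plain Python. It is a minimal illustration, not the authors' implementation (which used BGLR in R). The function names, the toy marker matrices, the use of the population standard deviation for standardization, and the male-major ordering of crosses in $H$ are assumptions.

```python
from statistics import mean, pstdev

def genomic_relationship(markers):
    """G = W W^T / p, with W the column-centered and standardized markers."""
    n, p = len(markers), len(markers[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in markers]
        mu, sd = mean(col), pstdev(col)  # assumes each marker is polymorphic (sd > 0)
        cols.append([(x - mu) / sd for x in col])
    return [[sum(cols[j][a] * cols[j][b] for j in range(p)) / p
             for b in range(n)] for a in range(n)]

def kronecker(A, B):
    """Kronecker product A (x) B; rows and columns are ordered A-major."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def interaction_kernel(K, unit_index, env):
    """(Z K Z^T) # (Z_E Z_E^T) for records with a unit index and an environment label.

    Entry (a, b) of Z K Z^T is K[unit_index[a]][unit_index[b]], and entry (a, b)
    of Z_E Z_E^T is 1 when both records share an environment and 0 otherwise.
    """
    n = len(unit_index)
    return [[K[unit_index[a]][unit_index[b]] * (1 if env[a] == env[b] else 0)
             for b in range(n)] for a in range(n)]

# Hypothetical toy data: 2 males and 3 females scored at 3 markers.
males = [[0, 2, 0], [2, 0, 2]]
females = [[0, 2, 0], [2, 0, 2], [2, 2, 0]]
G_M = genomic_relationship(males)
G_F = genomic_relationship(females)
H = kronecker(G_M, G_F)  # 6 x 6: one row per possible male-by-female cross

# Three records: cross 0 in two years and cross 4 in the first year.
V_H = interaction_kernel(H, unit_index=[0, 0, 4], env=["Y1", "Y2", "Y1"])
```

With this ordering, cross index `i * 3 + k` corresponds to male `i` and female `k`, so `H[1][5]` equals `G_M[0][1] * G_F[1][2]`. Records observed in different years receive a zero interaction covariance.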

$X_{AC}$ is the matrix that contains the parental covariates of the trait to be predicted and of correlated traits, and $\beta_{AC}$ is the corresponding vector of regression coefficients. We computed two covariates for each trait using the parental phenotypic information. The covariate that captures the additive part is computed as

$$X_{AC,t,a} = \frac{P_{M,t} + P_{F,t}}{2}$$

where $t \in \{\text{GY}, \text{DTF}, \text{DTH}\}$, $a$ denotes additive, $P_{M,t}$ is the phenotypic value of the parental male line for the $t$th trait, and $P_{F,t}$ is the phenotypic value of the parental female line for the $t$th trait, where the male and female are assumed to belong to different heterotic groups. The other covariate captures the dominance part and is computed from the absolute value of the parental difference, $|P_{M,t} - P_{F,t}|$, for the $t$th trait:

$$X_{AC,t,d} = \frac{|P_{M,t} - P_{F,t}|}{2}$$

where $d$ denotes dominance. The matrix $X_{AC}$ contains six columns, since two covariates (one for $a$ and the other for $d$) were computed for each of the three traits under study. When the matrix of covariates $X_{AC}$ is omitted from the predictor, the model is denoted as MB; when $X_{AC}$ contains only the information of the trait to be predicted, the model is named MB_C, and the covariates are denoted as $X_{C}$. To facilitate understanding and comparison, the models just described (MB, MB_C, and MB_AC) will be called Type B models.
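As a small worked example of the covariates, the sketch below builds one row of $X_{AC}$ (six columns) and one row of $X_{C}$ (two columns) for a single hybrid. It assumes that the additive covariate is the mid-parent value and that the dominance covariate is half the absolute parental difference; the function name and the parental values are hypothetical.

```python
def parental_covariates(male_pheno, female_pheno, traits=("GY", "DTF", "DTH")):
    """Return [additive, dominance] covariates for each trait, in trait order.

    male_pheno / female_pheno: dicts mapping trait name to the parent's
    adjusted phenotype (e.g., its BLUE).
    """
    row = []
    for t in traits:
        p_m, p_f = male_pheno[t], female_pheno[t]
        row.append((p_m + p_f) / 2)     # additive part: mid-parent value
        row.append(abs(p_m - p_f) / 2)  # dominance part: half the absolute difference
    return row

# Hypothetical parental values for one cross.
male = {"GY": 6.0, "DTF": 80.0, "DTH": 75.0}
female = {"GY": 7.0, "DTF": 84.0, "DTH": 78.0}

x_ac = parental_covariates(male, female)                 # row of X_AC (all traits)
x_c = parental_covariates(male, female, traits=("GY",))  # row of X_C when predicting GY
```

Stacking such rows over all hybrid records gives the covariate matrix used in the `_AC` models; restricting `traits` to the trait being predicted gives the matrix used in the `_C` models.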

2.3.2. Model MA_AC

The predictor of model MA_AC is identical to the predictor of model MB_AC (Equation (1)), but without marker information; that is, (a) the vector of random effects due to the GCA for paternal lines, (b) the vector of random effects due to the GCA for maternal lines, and (c) the vector of SCA random effects for the hybrids (crosses) are distributed as $g_{M} \sim N(0, \sigma_{M}^{2} I_{M})$, $g_{F} \sim N(0, \sigma_{F}^{2} I_{F})$, and $h \sim N(0, \sigma_{H}^{2} I_{H})$, respectively. The interaction terms are distributed as $u_{M} \sim N(0, \sigma_{ME}^{2} Z_{M} Z_{M}^{T} \# Z_{E} Z_{E}^{T})$, $u_{F} \sim N(0, \sigma_{FE}^{2} Z_{F} Z_{F}^{T} \# Z_{E} Z_{E}^{T})$, and $u_{H} \sim N(0, \sigma_{HE}^{2} Z_{H} Z_{H}^{T} \# Z_{E} Z_{E}^{T})$. Under model MA_AC, when the matrix of covariates $X_{AC}$ is omitted from the predictor, the model is denoted as MA; when $X_{AC}$ contains only the information of the trait to be predicted, the model is named MA_C, and the covariates are denoted as $X_{C}$. To facilitate understanding and comparison, the models just described (MA, MA_C, and MA_AC) will be called Type A models. Note that models ending in _C use parental phenotypic information of the trait to be predicted as covariates, while models ending in _AC additionally use parental phenotypic information of other correlated traits.

2.3.3. Model MD_AC

This model is given by

$$Y = Z_{E} \beta_{E} + Z_{H} h + u_{H} + X_{AC} \beta_{AC} + \epsilon \tag{2}$$

Model MD_AC is equal to model MB_AC but without the main effects of paternal lines (males) and maternal lines (females) and their interactions with environments. Under model MD_AC, when the matrix of covariates $X_{AC}$ is omitted from the predictor, the model is denoted as MD; when $X_{AC}$ contains only the information of the trait to be predicted, the model is termed MD_C, and the covariates are denoted as $X_{C}$. Again, the models just described (MD, MD_C, and MD_AC) will be called Type D models.

2.3.4. Model MC_AC

The predictor of model MC_AC is identical to the predictor of model MD_AC (Equation (2)), but without marker information. For this reason, the vector of SCA random effects for the hybrids (crosses) is distributed as $h \sim N(0, \sigma_{H}^{2} I_{H})$, and the interaction term is distributed as $u_{H} \sim N(0, \sigma_{HE}^{2} Z_{H} Z_{H}^{T} \# Z_{E} Z_{E}^{T})$. Under model MC_AC, when the matrix of covariates $X_{AC}$ is omitted from the predictor, the model is denoted as MC; when $X_{AC}$ contains only the information of the trait to be predicted, the model is named MC_C, and the covariates are denoted as $X_{C}$. By the same logic, these models (MC, MC_C, and MC_AC) will be called Type C models.

The implementation of these models was carried out in the R statistical software using the BGLR library [21].

2.4. Evaluation of Prediction Performance

For each of the twelve models, we implemented seven-fold cross-validation of the type called untested lines in tested environments, which mimics a real breeding strategy [22]. In each round, $7 - 1 = 6$ folds were assigned to the training set and the remaining fold to the testing set, until each of the 7 folds had been used once as the testing set. Next, the average over the seven folds was reported as the prediction performance, using the mean square error (MSE) as the metric. To compare the prediction accuracies between models of the same type (Type A, Type B, Type C, and Type D models), the relative efficiency (RE) in terms of MSE was computed as

$$RE_{MSE} = \frac{MSE_{MX}}{MSE_{MX\_Z}}$$

where $MSE_{MX}$ and $MSE_{MX\_Z}$ denote the MSE of models $MX$ and $MX\_Z$, with $X \in \{A, B, C, D\}$ and $Z \in \{C, AC\}$. To compare the two sets of covariates within a model type, the relative efficiency was computed as

$$RE_{MSE} = \frac{MSE_{MX\_C}}{MSE_{MX\_AC}}$$

In both cases, if $RE_{MSE} > 1$, the best prediction performance in terms of MSE was obtained with the model in the denominator ($MX\_Z$ or $MX\_AC$); if $RE_{MSE} < 1$, the best model was the one in the numerator ($MX$ or $MX\_C$); and when $RE_{MSE} = 1$, both models were equally efficient. The percentage gains reported in the Results correspond to $(RE_{MSE} - 1) \times 100$.
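The evaluation scheme can be sketched as follows: hybrids are partitioned into seven folds, every record of a held-out hybrid goes to the testing set, and models are then compared through the ratio of their MSEs. This is a minimal sketch under assumptions, not the authors' code: holding out all records of a hybrid is one reading of "untested lines in tested environments", and the function names, seed, and hybrid identifiers are hypothetical.

```python
import random

def line_fold_splits(line_ids, k=7, seed=0):
    """Yield (train_idx, test_idx) pairs so that each line is tested exactly once.

    line_ids: one hybrid identifier per record (a hybrid may appear in several years).
    """
    unique = sorted(set(line_ids))
    random.Random(seed).shuffle(unique)
    folds = [unique[i::k] for i in range(k)]
    for fold in folds:
        held_out = set(fold)
        test_idx = [i for i, line in enumerate(line_ids) if line in held_out]
        train_idx = [i for i, line in enumerate(line_ids) if line not in held_out]
        yield train_idx, test_idx

def mse(observed, predicted):
    """Mean square error between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def relative_efficiency(mse_reference, mse_alternative):
    """RE > 1 means the alternative (denominator) model has the lower MSE."""
    return mse_reference / mse_alternative

def percent_gain(re):
    """Percentage gain of the denominator model implied by an RE value."""
    return (re - 1) * 100

# Hypothetical records: ten observations of eight hybrids.
records = ["H1", "H2", "H3", "H1", "H4", "H5", "H6", "H7", "H2", "H8"]
splits = list(line_fold_splits(records))
```

For instance, an RE of 1.141 corresponds to a gain of about 14.1% for the model in the denominator.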

3. Results

The results are presented in three main sections, Appendix A, and the Supplementary Tables. Section 3.1 and Section 3.2 show the results of Type A and B models, while Section 3.3 provides a comparison between the 12 models (MA, MA_C, MA_AC, MB, MB_C, MB_AC, MC, MC_C, MC_AC, MD, MD_C, and MD_AC) implemented across years for the three traits. To keep the main text easy to read, the results of Type C and D models are shown in Appendix A. Note that all results were obtained under the untested lines in tested environments cross-validation strategy. Furthermore, five Supplementary Tables (Tables S1–S5) complement the results displayed in the five figures (Figure 1, Figure 2 and Figure 3 in the main text and Figure A1 and Figure A2 in Appendix A). Supplementary Tables S1–S3 show the prediction performance for each trait in each environment and across environments (Global) for Type A and B models and across years, in terms of mean squared error (MSE). Tables S4 and S5 show the prediction performance for each trait in each environment and across environments (Global) for Type C and D models, also in terms of MSE. In each of the five Supplementary Tables, the header identifies both the model type (A, B, C, or D) and whether one or more covariates were included.

3.1. Type A Models

For the Type A models, Figure 1A shows that the REs from comparing model MA with MA_C for GY in terms of MSE for each year and across years were 1.135 (year 1), 0.962 (year 2), 1.179 (year 3), and 1.141 (Global). This indicates that the MA_C model outperformed the MA model in all years except year 2, by 13.5% (year 1), 17.9% (year 3), and 14.1% (Global). The REs from comparing MA with MA_AC for GY in terms of MSE for each year and across years were 1.136 (year 1), 0.969 (year 2), 1.162 (year 3), and 1.137 (Global) (Figure 1A). This indicates that MA_AC outperformed the MA model in all years except year 2, by 13.6% (year 1), 16.2% (year 3), and 13.7% (Global). Finally, the REs from comparing MA_C with MA_AC for GY in terms of MSE for each year and across years were 1.001 (year 1), 1.007 (year 2), 0.986 (year 3), and 0.996 (Global). This indicates that MA_AC outperformed MA_C only slightly, by 0.1% (year 1) and 0.7% (year 2), and not in year 3 or across years. For more details, see Table S1 (Supplementary Tables).
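Because the three REs are ratios of the same three MSEs, the RE of MA vs. MA_AC should equal the product of the RE of MA vs. MA_C and the RE of MA_C vs. MA_AC, up to rounding. The short check below applies this identity to the GY values quoted above; the variable names and the tolerance are illustrative choices.

```python
# REs for GY, Type A models, as quoted in the text (three decimals).
re_ma_vs_mac = {"year 1": 1.135, "year 2": 0.962, "year 3": 1.179, "Global": 1.141}
re_mac_vs_maac = {"year 1": 1.001, "year 2": 1.007, "year 3": 0.986, "Global": 0.996}
re_ma_vs_maac = {"year 1": 1.136, "year 2": 0.969, "year 3": 1.162, "Global": 1.137}

# MSE_MA / MSE_MA_AC = (MSE_MA / MSE_MA_C) * (MSE_MA_C / MSE_MA_AC)
implied = {key: re_ma_vs_mac[key] * re_mac_vs_maac[key] for key in re_ma_vs_mac}

# Allow for the rounding of the published values to three decimals.
tolerance = 0.0015
consistent = {key: abs(implied[key] - re_ma_vs_maac[key]) < tolerance for key in implied}
```

For GY, all four implied values agree with the reported REs of MA vs. MA_AC within this rounding tolerance.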

In Figure 1B, the RE is given in terms of MSE and compares model MA with model MA_C for trait DTF for each year and across years. As shown, the REs are 1.019 (year 1), 1.126 (year 2), 1.208 (year 3), and 1.169 (Global), indicating that model MA_C outperformed the MA model in all years: 1.9% (year 1), 12.6% (year 2), 20.8% (year 3), and 16.9% (Global). The REs from comparing model MA with MA_AC for trait DTF in terms of MSE for each year and across years were 1.036 (year 1), 1.136 (year 2), 1.197 (year 3), and 1.167 (Global). This indicates that MA_AC outperformed the MA model in all years: 3.6% (year 1), 13.6% (year 2), 19.7% (year 3), and 16.7% (Global). Finally, the REs from comparing model MA_C with MA_AC for DTF in terms of MSE for each year and across years were 1.016 (year 1), 1.008 (year 2), 0.991 (year 3), and 0.998 (Global). This indicates that the MA_AC model performed very similarly to the MA_C model, since it was better only in two years: 1.6% (year 1) and 0.8% (year 2). For more details, see Table S1 (Supplementary Tables).

Figure 1C shows the REs from comparing model MA with MA_C for trait DTH in terms of MSE for each year and across years: 1.026 (year 1), 1.141 (year 2), 1.208 (year 3), and 1.176 (Global). This indicates that model MA_C outperformed the MA model in all years: 2.6% (year 1), 14.1% (year 2), 20.8% (year 3), and 17.6% (Global). The REs from comparing model MA with MA_AC for trait DTH in terms of MSE for each year and across years were 1.176 (year 1), 1.044 (year 2), 1.207 (year 3), and 1.178 (Global). This indicates that model MA_AC outperformed the MA model in all years: 17.6% (year 1), 4.4% (year 2), 20.7% (year 3), and 17.8% (Global). Finally, the REs from comparing MA_C with MA_AC for trait DTH in terms of MSE for each year and across years were 1.018 (year 1), 1.009 (year 2), 0.999 (year 3), and 1.002 (Global). This indicates that MA_AC outperformed the MA_C model by only a small margin: 1.8% (year 1), 0.9% (year 2), and 0.2% (Global). For more details, see Table S1 (Supplementary Tables).

3.2. Type B Models

For the Type B models, Figure 2A shows that the REs from comparing model MB with MB_C for GY in terms of MSE for each year and across years were 1.050 (year 1), 0.958 (year 2), 1.114 (year 3), and 1.055 (Global). This indicates that model MB_C outperformed model MB in all years except year 2, by 5.0% (year 1), 11.4% (year 3), and 5.5% (Global). The REs from comparing MB with MB_AC for GY in terms of MSE for each year and across years were 1.052 (year 1), 0.973 (year 2), 1.101 (year 3), and 1.053 (Global) (Figure 2A). This indicates that the MB_AC model outperformed the MB model in all but the second year, by 5.2% (year 1), 10.1% (year 3), and 5.3% (Global). Finally, the REs from comparing MB_C with MB_AC for GY in terms of MSE for each year and across years were 1.002 (year 1), 1.016 (year 2), 0.988 (year 3), and 0.998 (Global). This indicates that MB_AC outperformed the MB_C model by only a small margin: 0.2% (year 1) and 1.6% (year 2). For more details, see Table S2 (Supplementary Tables).

Figure 2B displays the REs in terms of MSE from comparing model MB with MB_C for trait DTF for each year and across years: 1.002 (year 1), 1.084 (year 2), 1.180 (year 3), and 1.119 (Global). This indicates that model MB_C outperformed model MB in all years, by 0.2% (year 1), 8.4% (year 2), 18.0% (year 3), and 11.9% (Global). The REs from comparing the MB model with the MB_AC model for trait DTF in terms of MSE for each year and across years were 1.042 (year 1), 1.092 (year 2), 1.180 (year 3), and 1.129 (Global). This indicates that the MB_AC model outperformed the MB model in all years, by 4.2% (year 1), 9.2% (year 2), 18.0% (year 3), and 12.9% (Global). Finally, the REs from comparing MB_C with MB_AC for trait DTF in terms of MSE for each year and across years were 1.041 (year 1), 1.008 (year 2), 1.000 (year 3), and 1.009 (Global). This indicates that the MB_AC model outperformed MB_C by 4.1% (year 1), 0.8% (year 2), and 0.9% (Global), while both models performed equally in year 3. For more details, see Table S2 (Supplementary Tables).

Figure 2C shows that the REs from comparing model MB with MB_C for trait DTH in terms of MSE for each year and across years were 1.000 (year 1), 1.089 (year 2), 1.181 (year 3), and 1.120 (Global). This indicates that the MB_C model outperformed the MB model in all years except the first year by 8.9% (year 2), 18.1% (year 3), and 12.0% (Global). The REs from comparing the MB with MB_AC for trait DTH in terms of MSE for each year and across years were 1.047 (year 1), 1.088 (year 2), 1.186 (year 3), and 1.133 (Global). This indicates that the MB_AC model outperformed the MB model in all years by 4.7% (year 1), 8.8% (year 2), 18.6% (year 3), and 13.3% (Global). Finally, the REs from comparing the MB_C with MB_AC for trait DTH in terms of MSE for each year and across years were 1.047 (year 1), 0.999 (year 2), 1.004 (year 3), and 1.011 (Global). This indicates that the MB_AC model outperformed MB_C in most of the years: 4.7% (year 1), 0.4% (year 3), and 1.1% (Global). For more details, see Table S2 (Supplementary Tables).

3.3. Comparison across Years

Figure 3A–C show the differences between the twelve models in terms of prediction performance, using the MSE as the metric, for the three traits evaluated. Figure 3D displays the REs from comparing models without parental phenotypic information with models of the same type that include parental phenotypic information of the same trait, in terms of MSE for trait GY across years, which were 1.141 (MA vs. MA_C), 1.055 (MB vs. MB_C), 1.514 (MC vs. MC_C), and 1.064 (MD vs. MD_C). This indicates that models with parental phenotypic information of the same trait outperformed models without parental information across years (Global) by 14.1% (MA vs. MA_C), 5.5% (MB vs. MB_C), 51.4% (MC vs. MC_C), and 6.4% (MD vs. MD_C). The REs from comparing models without parental phenotypic information with models that include parental information of the same trait plus additional correlated traits, in terms of MSE for trait GY across years, were 1.137 (MA vs. MA_AC), 1.053 (MB vs. MB_AC), 1.551 (MC vs. MC_AC), and 1.060 (MD vs. MD_AC); that is, models with parental phenotypic covariates of the same trait plus additional covariates of correlated traits outperformed models without covariates of parental phenotypic information across years (Global) by 13.7% (MA vs. MA_AC), 5.3% (MB vs. MB_AC), 55.1% (MC vs. MC_AC), and 6.0% (MD vs. MD_AC). For more details, see Table S3 (Supplementary Tables).

In Figure 3E, we can observe that the REs from comparing models without parental phenotypic information with models of the same type that include parental information of the same trait, in terms of MSE for trait DTF across years, were 1.169 (MA vs. MA_C), 1.119 (MB vs. MB_C), 1.964 (MC vs. MC_C), and 1.106 (MD vs. MD_C); that is, models with parental phenotypic information outperformed models without parental information across years (Global) in all cases: 16.9% (MA vs. MA_C), 11.9% (MB vs. MB_C), 96.4% (MC vs. MC_C), and 10.6% (MD vs. MD_C). The REs from comparing models without parental phenotypic information with models that include parental phenotypic information of the same trait plus additional correlated traits, in terms of MSE for trait DTF across years, were 1.167 (MA vs. MA_AC), 1.129 (MB vs. MB_AC), 1.996 (MC vs. MC_AC), and 1.113 (MD vs. MD_AC). This indicates that models with parental phenotypic information of the same trait and additional correlated traits outperformed models without parental information across years (Global) by 16.7% (MA vs. MA_AC), 12.9% (MB vs. MB_AC), 99.6% (MC vs. MC_AC), and 11.3% (MD vs. MD_AC). For more details, see Table S3 (Supplementary Tables).

Figure 3F shows that the REs from comparing models without parental information with models that include parental information of the same trait, in terms of MSE for trait DTH across years, were 1.176 (MA vs. MA_C), 1.120 (MB vs. MB_C), 1.955 (MC vs. MC_C), and 1.112 (MD vs. MD_C). This means that models with parental phenotypic information of the same trait outperformed models without parental phenotypic information across years (Global) by 17.6% (MA vs. MA_C), 12.0% (MB vs. MB_C), 95.5% (MC vs. MC_C), and 11.2% (MD vs. MD_C). Meanwhile, the REs from comparing models without parental phenotypic information with models that include parental phenotypic information of the same trait plus additional correlated traits, in terms of MSE for trait DTH across years, were 1.178 (MA vs. MA_AC), 1.133 (MB vs. MB_AC), 2.005 (MC vs. MC_AC), and 1.120 (MD vs. MD_AC). This indicates that models with parental phenotypic information of the same trait and additional correlated traits outperformed models without parental phenotypic information across years (Global) by 17.8% (MA vs. MA_AC), 13.3% (MB vs. MB_AC), 100.5% (MC vs. MC_AC), and 12.0% (MD vs. MD_AC). For more details, see Table S3 (Supplementary Tables).

4. Discussion

Genomic selection is a very attractive predictive methodology because candidate lines can be selected without phenotypic evaluation. However, its practical implementation is still challenging: unless the factors that affect its accuracy are optimized, its results may carry a high degree of uncertainty. For this reason, it is necessary to continue research on this topic.

For this paper, we studied the incorporation of parental phenotypic information as covariates in genomic prediction models. Two approaches for incorporating the parental phenotypic information were studied. The first approach consisted of adding only the parental information of the trait to be predicted as a covariate, while the second used both the parental information of the trait and the parental phenotypic information of other correlated traits as covariates. To capture the additive and dominance information of the parents for each trait, two covariates were computed: (Pheno_male + Pheno_female)/2 and the absolute value of (Pheno_male − Pheno_female)/2, which capture the additive and dominance parts, respectively. Using these covariates, we evaluated four types of models that contain different main and interaction effects in the predictor. For trait GY, across years, models, and types of covariates, the addition of parental information improved the prediction accuracy in terms of mean square error by 19.68%, while for trait DTH this improvement was 34.98% and for trait DTF it was 34.53%. However, it is important to note that the Type C models (MC, MC_C, and MC_AC) displayed the largest improvement in prediction accuracy when the parental phenotypic information was incorporated. For example, when the parental information was added only for the trait to be predicted, the gain in prediction accuracy was 81.1%, while when the parental phenotypic information of other correlated traits was also incorporated, the gain in prediction performance in terms of mean square error increased to 85.06%.
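
The construction of the two parental covariates can be sketched as follows. This is a minimal illustration under stated assumptions: the dominance-type covariate is taken to be half the absolute difference between the two parents (the exact form is the one defined in Section 2.3), and the function name and the phenotypic values are hypothetical.

```python
# Minimal sketch of the two parental covariates per hybrid.
# Assumption: the dominance-type covariate is |male - female| / 2.
# The function name and all input values are hypothetical.

def parental_covariates(pheno_female, pheno_male):
    """Return (additive, dominance) covariate lists, one value per hybrid."""
    if len(pheno_female) != len(pheno_male):
        raise ValueError("each hybrid needs one female and one male value")
    pairs = list(zip(pheno_female, pheno_male))
    additive = [(m + f) / 2.0 for f, m in pairs]        # mid-parent value
    dominance = [abs(m - f) / 2.0 for f, m in pairs]    # half absolute difference
    return additive, dominance


# Hypothetical grain yield values of the parents of three hybrids.
female_gy = [6.0, 5.0, 7.2]
male_gy = [4.0, 5.0, 6.8]
add_cov, dom_cov = parental_covariates(female_gy, male_gy)
print(add_cov)
print(dom_cov)
```

Under the single-covariate approach (the _C models), only the covariates of the trait to be predicted would enter the predictor; under the _AC approach, the same pair of covariates would be computed for each correlated trait as well.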

Our results coincide with those reported by Xu et al. [14] when working with rice. The authors reported that incorporating parental phenotypic information into conventional genomic prediction models increased prediction accuracy by 13.6%, 54.5%, 19.9%, and 8.3% for GY, number of tillers per plant, number of grains per panicle, and 1000 grain weight, respectively. However, these differ greatly from the findings of Liang et al. [13], who obtained mixed results: in some traits, prediction accuracy improved, while in others it decreased. In the study of Jarquin et al. [15], only a modest increase in prediction accuracy was observed. Some of the differences between our results and those of Liang et al. [13] and Jarquin et al. [15] stem from how the parental phenotypic information was integrated. In our case, as in Xu et al. [14], the parental information was incorporated as covariates in the prediction models, whereas Liang et al. [13] and Jarquin et al. [15] used the parental phenotypic information to enlarge the training set. Another reason for the different results is that we used the mean square error as the metric for evaluating prediction accuracy, while the studies of Liang et al. [13], Jarquin et al. [15], and Xu et al. [14] used Pearson's correlation.
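
The point about metrics can be illustrated with a small example showing that the mean square error and Pearson's correlation can rank two sets of predictions differently. This is a minimal sketch with hypothetical observed and predicted values; it is not taken from the data of any of the cited studies.

```python
# Minimal sketch: MSE and Pearson's correlation can disagree on which
# set of predictions is "better". All values are hypothetical.
from math import sqrt


def mse(obs, pred):
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


observed = [4.0, 5.0, 6.0, 7.0]
pred_shifted = [6.0, 7.0, 8.0, 9.0]  # correct ranking, constant bias of +2
pred_close = [4.5, 4.5, 6.5, 6.5]    # small errors, imperfect ranking

for name, pred in [("shifted", pred_shifted), ("close", pred_close)]:
    print(name, round(pearson(observed, pred), 3), round(mse(observed, pred), 3))
```

Here the shifted predictions have the higher correlation but the larger MSE, because correlation ignores a constant bias while MSE penalizes it. Gains reported under one metric are therefore not directly comparable with gains reported under the other.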

Hybrid breeding is an efficient system to break the yield barriers in many crops. CIMMYT elite lines from the spring bread wheat breeding program are crossed and the resulting F1 hybrids are primarily evaluated in Mexico. However, as CIMMYT lacks distinct male and female breeding pools, the main challenge remains the selection of parents for hybrid production. Although the per se performance of inbreds, plant phenology, and cross-pollinating traits are the initial criteria to select parents for hybrid production and testing, there is a limit to the number of combinations that we can produce and test in the field. Since the beginning of its hybrid wheat program, CIMMYT has been routinely using the coefficient of parentage (COP) as one of the parental selection criteria to maximize genetic diversity. The wheat hybrid study of Basnet et al. [16] demonstrated the potential application of pedigree information and molecular marker data in predicting single-cross hybrid performance. By applying hybrid prediction, we envision selecting potential hybrid parents from a broader germplasm pool and earlier in the breeding cycle, after preliminary and elite yield trial evaluation. However, the study of Basnet et al. [16] did not include any covariate using parental information to enhance hybrid prediction, whereas the present study shows the increase in genomic prediction accuracy obtained by using existing parental phenotypic data. Additional studies are required to analyze environmental information together with parental phenotypic information.

Furthermore, the results of this study show that adding the parental phenotypic information to the genomic prediction model enhances the prediction accuracy of the GS methodology. The way in which the parental phenotypic information is incorporated in the genomic prediction models is key to improving the prediction performance. We also found that only a small gain in prediction performance is reached when, in addition to the parental phenotypic information of the trait to be predicted, we add the parental phenotypic information of other correlated traits. One interpretation of this result is that adding other correlated traits from parents as covariates helps increase the prediction accuracy only when these traits are highly correlated with the trait of interest (to be predicted); if the degree of correlation is low, no further improvement in genomic prediction accuracy can be expected.

Our research contributes to the growing empirical evidence that adding metabolic/biochemical traits, also called endophenotypes, as covariates can help increase the prediction accuracy of genomic prediction models, since these covariates complement the explanatory power of genetic markers, as noted by Melandri et al. [23]. For this reason, Westhues et al. [24] point out that “complementing genomic data with other ‘omics’ predictors can increase the probability of success for predicting the best hybrid combinations using complex agronomic traits.”

5. Conclusions

We proposed the integration of parental phenotypic information as covariates in conventional genomic prediction models. We found empirical evidence that, when parental phenotypic information is considered as a covariate in genomic prediction models, the prediction accuracy improves. Across traits and types of parental covariate information, we observed a gain in terms of mean square error of 16.1% (average of models MA_C and MA_AC vs. model MA), 10.2% (average of models MB_C and MB_AC vs. model MB), 83.1% (average of models MC_C and MC_AC vs. model MC), and 9.9% (average of models MD_C and MD_AC vs. model MD) by adding the parental phenotypic covariate information. However, we did not observe significant differences between the two approaches for adding the parental phenotypic covariate information (only the trait to be predicted versus the trait to be predicted plus correlated traits). Even though these results are not conclusive, they provide empirical evidence that a significant gain in prediction accuracy can be obtained by adding parental phenotypic information as covariates, and as such, we encourage more research with other data sets to corroborate our empirical findings.
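
As a worked illustration of how such an average can be formed, the Type C figure can be approximately recovered from the rounded Global gains listed in Appendix A.1 for the three traits and the two covariate approaches. This is a sketch under the assumption that the average is a simple mean of these six rounded percentages; averages computed from unrounded MSE values may differ slightly.

```latex
\[
\bar{G}_{\mathrm{MC}}
  = \frac{\overbrace{51.4 + 55.1}^{\text{GY}}
        + \overbrace{96.4 + 99.6}^{\text{DTF}}
        + \overbrace{95.5 + 100.5}^{\text{DTH}}}{6}
  = \frac{498.5}{6}
  \approx 83.1\%
\]
```

Within each trait, the first term is the gain of MC_C over MC and the second is the gain of MC_AC over MC.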

Supplementary Materials

Five Supplementary Tables with results can be downloaded at https://www.mdpi.com/article/10.3390/genes14020395/s1. A brief description of each Supplementary Table follows. Table S1. Prediction performance for each trait in each environment and across environments for each trait (global) for type A models (MA, MA_C, and MA_AC). Table S2. Prediction performance for each trait in each environment and across environments for each trait (global) for type B models (MB, MB_C, and MB_AC). Table S3. Prediction performance across environments for each trait (global) for all models: MA, MA_C, MA_AC, MB, MB_C, MB_AC, MC, MC_C, MC_AC, MD, MD_C, and MD_AC. Table S4. Prediction performance for each trait in each environment and across environments for each trait (global) for type C models (MC, MC_C, and MC_AC). Table S5. Prediction performance for each trait in each environment and across environments for each trait (global) for type D models (MD, MD_C, and MD_AC).

Author Contributions

Conceptualization, O.A.M.-L. and A.M.-L.; methodology, O.A.M.-L., A.M.-L. and J.C.; software, J.S.R. and P.E.V.-C.; formal analysis, O.A.M.-L. and A.M.-L.; investigation, A.R.B., C.S.P., L.C.-H., O.A.M.-L. and J.C.; writing (original draft preparation), O.A.M.-L., A.M.-L. and J.C.; writing (review and editing), all authors. All authors have read and agreed to the published version of the manuscript.

Funding

We are thankful for the financial support provided by the Bill & Melinda Gates Foundation [INV-003439, BMGF/FCDO, Accelerating Genetic Gains in Maize and Wheat for Improved Livelihoods (AG2MW)], the USAID projects [USAID Amend. No. 9 MTO 069033, USAID-CIMMYT Wheat/AGGMW, AGG-Maize Supplementary Project, AGG (Stress Tolerant Maize for Africa)], and the CIMMYT CRP (maize and wheat). We acknowledge the support of the Window 1 and 2 funders to the Accelerated Breeding Initiative (ABI). We are also thankful for the financial support provided by the Foundation for Research Levy on Agricultural Products (FFL) and the Agricultural Agreement Research Fund (JA) through the Research Council of Norway for grants 301835 (Sustainable Management of Rust Diseases in Wheat) and 320090 (Phenotyping for Healthier and more Productive Wheat Crops).

Informed Consent Statement

Not applicable.

Data Availability Statement

Phenotypic and genomic data can be downloaded from the link: http://hdl.handle.net/11529/10548129.

Acknowledgments

The authors are thankful for the administrative, technical field, and laboratory support provided by the staff who established the different experiments in the field and in the laboratories at the different institutions that generated the data used in this study.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Models of Type C

Figure A1A shows that the REs of comparing model MC to MC_C for GY trait in terms of MSE for each year and across years were 1.401 (year 1), 1.151 (year 2), 1.510 (year 3), and 1.514 (Global). This indicates that the MC_C model outperformed the MC model in all years by 40.1% (year 1), 15.1% (year 2), 51.0% (year 3), and 51.4% (Global), while the REs comparing the MC model to the MC_AC for GY in terms of MSE for each year and across years were 1.433 (year 1), 1.218 (year 2), 1.511 (year 3), and 1.551 (Global). This indicates that the MC_AC model outperformed the MC model in all the years by 43.3% (year 1), 21.8% (year 2), 51.1% (year 3), and 55.1% (Global). Finally, the REs comparing the MC_C to MC_AC for GY trait in terms of MSE for each year and across years were 1.022 (year 1), 1.059 (year 2), 1.000 (year 3), and 1.024 (Global). This indicates that MC_AC outperformed model MC_C by a small margin: 2.2% (year 1), 5.9% (year 2), and 2.4% (Global). For more details, see Table S4.

Figure A1B shows the REs in terms of MSE from comparing the MC model to the MC_C model for trait DTF for each year and across years. These were 1.488 (year 1), 1.955 (year 2), 1.983 (year 3), and 1.964 (Global). This indicates that model MC_C outperformed model MC in all years by 48.8% (year 1), 95.5% (year 2), 98.3% (year 3), and 96.4% (Global), while the REs from comparing MC to MC_AC for trait DTF in terms of MSE for each year and across years were 1.559 (year 1), 1.973 (year 2), 1.994 (year 3), and 1.996 (Global). This indicates that MC_AC outperformed MC in all years by 55.9% (year 1), 97.3% (year 2), 99.4% (year 3), and 99.6% (Global). Finally, the REs from comparing model MC_C to MC_AC for trait DTF in terms of MSE for each year and across years were 1.047 (year 1), 1.009 (year 2), 1.006 (year 3), and 1.016 (Global). This indicates that MC_AC outperformed MC_C in all years by 4.7% (year 1), 0.9% (year 2), 0.6% (year 3), and 1.6% (Global). For more details, see Table S4.

Figure A1C shows that the REs from comparing the MC model to the MC_C model for trait DTH in terms of MSE for each year and across years were 1.461 (year 1), 1.986 (year 2), 1.989 (year 3), and 1.955 (Global). This indicates that MC_C outperformed the MC model in all years by 46.1% (year 1), 98.6% (year 2), 98.9% (year 3), and 95.5% (Global). The REs from comparing model MC to MC_AC for trait DTH in terms of MSE for each year and across years were 1.554 (year 1), 2.021 (year 2), 2.014 (year 3), and 2.005 (Global). This indicates that MC_AC outperformed the MC model in all years by 55.4% (year 1), 102.1% (year 2), 101.4% (year 3), and 100.5% (Global). Finally, the REs from comparing MC_C to MC_AC for trait DTH in terms of MSE for each year and across years were 1.064 (year 1), 1.017 (year 2), 1.013 (year 3), and 1.026 (Global). This indicates that model MC_AC outperformed MC_C in all years by 6.4% (year 1), 1.7% (year 2), 1.3% (year 3), and 2.6% (Global). For more details, see Table S4.

Figure A1. Prediction performance in terms of mean square error (MSE) for models MC, MC_C, and MC_AC, for each trait in each environment and across environments (Global) under untested lines in tested environments cross-validation strategy. The relative efficiencies (RE_MSE) were computed to compare models MC_C to MC_AC, MC to MC_C, and MC to MC_AC. (A) for GY trait, (B) for DTF trait, and (C) for DTH trait.

Appendix A.2. Models of Type D

Figure A2A shows that the REs from comparing model MD to MD_C for trait GY in terms of MSE for each year and across years were 1.052 (year 1), 0.954 (year 2), 1.129 (year 3), and 1.064 (Global). This indicates that MD_C outperformed model MD in most years: 5.2% (year 1), 12.9% (year 3), and 6.4% (Global), but not in year 2. The REs from comparing model MD to MD_AC for trait GY in terms of MSE for each year and across years were 1.053 (year 1), 0.965 (year 2), 1.113 (year 3), and 1.060 (Global). This indicates that MD_AC outperformed the MD model in most years: 5.3% (year 1), 11.3% (year 3), and 6.0% (Global), but not in year 2. Finally, the REs from comparing models MD_C to MD_AC for GY in terms of MSE for each year and across years were 1.002 (year 1), 1.012 (year 2), 0.986 (year 3), and 0.996 (Global). This indicates that the MD_AC model slightly outperformed model MD_C only in year 1 (0.2%) and year 2 (1.2%). For more details, see Table S5.

Figure A2B shows the REs from comparing model MD to MD_C for trait DTF in terms of MSE for each year and across years: 0.986 (year 1), 1.079 (year 2), 1.169 (year 3), and 1.106 (Global). This indicates that MD_C outperformed the MD model in most years: 7.9% (year 2), 16.9% (year 3), and 10.6% (Global). Meanwhile, the REs from comparing MD to MD_AC for trait DTF in terms of MSE for each year and across years were 1.020 (year 1), 1.087 (year 2), 1.166 (year 3), and 1.113 (Global). This indicates that model MD_AC outperformed model MD in all years: 2.0% (year 1), 8.7% (year 2), 16.6% (year 3), and 11.3% (Global). Finally, the REs from comparing MD_C to MD_AC for trait DTF in terms of MSE for each year and across years were 1.035 (year 1), 1.008 (year 2), 0.998 (year 3), and 1.007 (Global). This indicates that model MD_AC outperformed model MD_C by a small margin in most years: 3.5% (year 1), 0.8% (year 2), and 0.7% (Global). For more details, see Table S5.

Figure A2C shows the REs from comparing model MD to model MD_C for trait DTH in terms of MSE for each year and across years: 0.987 (year 1), 1.083 (year 2), 1.175 (year 3), and 1.112 (Global). This indicates that model MD_C outperformed model MD in most years: 8.3% (year 2), 17.5% (year 3), and 11.2% (Global). Meanwhile, the REs from comparing model MD to MD_AC for trait DTH in terms of MSE for each year and across years were 1.029 (year 1), 1.080 (year 2), 1.174 (year 3), and 1.120 (Global). This indicates that model MD_AC outperformed the MD model in all years, by 2.9% (year 1), 8.0% (year 2), 17.4% (year 3), and 12.0% (Global). Finally, the REs from comparing model MD_C to MD_AC for DTH in terms of MSE for each year and across years were 1.043 (year 1), 0.997 (year 2), 0.999 (year 3), and 1.007 (Global). This indicates that model MD_AC outperformed model MD_C by a small margin in year 1 (4.3%) and Global (0.7%). For more details, see Table S5.

Figure A2. Prediction performance in terms of mean square error (MSE) for models MD, MD_C, and MD_AC for each trait in each environment and across environments (Global) under untested lines in a tested environment cross-validation strategy. The relative efficiencies (RE_MSE) were computed to compare models MD_C to MD_AC, MD to MD_C, and MD to MD_AC. (A) for GY trait, (B) for DTF trait, and (C) for DTH trait.

References

1. Flachowsky, G.; Meyer, U. Challenges for Plant Breeders from the View of Animal Nutrition. Agriculture 2015, 5, 1252–1276.
2. Meuwissen, T.H.E.; Hayes, B.J.; Goddard, M.E. Prediction of total genetic value using genome-wide dense marker map. Genetics 2001, 157, 1819–1829.
3. Desta, Z.A.; Ortiz, R. Genomic selection: Genome-wide prediction in plant improvement. Trends Plant Sci. 2014, 19, 592–601.
4. Roorkiwal, M.; Rathore, A.; Das, R.R.; Singh, M.K.; Jain, A.; Srinivasan, S.; Gaur, P.; Chellapilla, B.; Tripathi, S.; Li, Y.; et al. Genome-enabled prediction models for yield related traits in Chickpea. Front. Plant Sci. 2016, 7, 1666.
5. Crossa, J.; Pérez-Rodríguez, P.; Cuevas, J.; Montesinos-López, O.A.; Jarquín, D.; de Los Campos, G.; Burgueño, J.; González-Camacho, J.M.; Pérez-Elizalde, S.; Beyene, Y.; et al. Genomic Selection in Plant Breeding: Methods, Models, and Perspectives. Trends Plant Sci. 2017, 22, 961–975.
6. Wolfe, M.D.; Del Carpio, D.P.; Alabi, O.; Ezenwaka, L.C.; Ikeogu, U.N.; Kayondo, I.S.; Lozano, R.; Okeke, U.G.; Ozimati, A.A.; Williams, E.; et al. Prospects for Genomic Selection in Cassava Breeding. Plant Genome 2017, 10.
7. Huang, M.; Balimponya, E.G.; Mgonja, E.M.; McHale, L.K.; Luzi-Kihupi, A.; Wang, G.-L.; Sneller, C.H. Use of genomic selection in breeding rice (Oryza sativa L.) for resistance to rice blast (Magnaporthe oryzae). Mol. Breed. 2019, 39, 114.
8. Crossa, J.; Fritsche-Neto, R.; Montesinos-Lopez, O.A.; Costa-Neto, G.; Dreisigacker, S.; Montesinos-Lopez, A.; Bentley, A.R. The Modern Plant Breeding Triangle: Optimizing the Use of Genomics, Phenomics, and Enviromics Data. Front. Plant Sci. 2021, 12, 651480.
9. Jarquín, D.; Crossa, J.; Lacaze, X.; Du Cheyron, P.; Daucourt, J.; Lorgeou, J.; Piraux, F.; Guerreiro, L.; Pérez, P.; de los Campos, G. A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theor. Appl. Genet. 2014, 127, 595–607.
10. Montesinos-López, O.A.; Montesinos-López, A.; Pérez-Rodríguez, P.; de los Campos, G.; Eskridge, K.M.; Crossa, J. Threshold models for genome-enabled prediction of ordinal categorical traits in plant breeding. G3 Genes Genomes Genet. 2015, 5, 291–300.
11. Cuevas, J.; Crossa, J.; Soberanis, V.; Pérez-Elizalde, S.; Pérez-Rodríguez, P.; de los Campos, G.; Montesinos-López, O.A.; Burgueño, J. Genomic prediction of genotype × environment interaction kernel regression models. Plant Genome 2016, 9, 1–20.
12. Xu, S.; Zhu, D.; Zhang, Q. Predicting hybrid performance in rice using genomic best linear unbiased prediction. Proc. Natl. Acad. Sci. USA 2014, 111, 12456–12461.
13. Liang, Z.; Gupta, S.K.; Yeh, C.T.; Zhang, Y.; Ngu, D.W.; Kumar, R.; Patil, H.T.; Mungra, K.D.; Yadav, D.V.; Rathore, A.; et al. Phenotypic Data from Inbred Parents Can Improve Genomic Prediction in Pearl Millet Hybrids. G3 Genes Genomes Genet. 2018, 8, 2513–2522.
14. Xu, Y.; Zhao, Y.; Wang, X.; Ma, Y.; Li, P.; Yang, Z.; Zhang, X.; Xu, C.; Xu, S. Incorporation of parental phenotypic data into multi-omic models improves prediction of yield-related traits in hybrid rice. Plant Biotechnol. J. 2021, 19, 261–272.
15. Jarquin, D.; Howard, R.; Liang, Z.; Gupta, S.K.; Schnable, J.C.; Crossa, J. Enhancing Hybrid Prediction in Pearl Millet Using Genomic and/or Multi-Environment Phenotypic Information of Inbreds. Front. Genet. 2020, 10, 1294.
16. Basnet, B.R.; Crossa, J.; Dreisigacker, S.; Pérez-Rodríguez, P.; Manes, Y.; Singh, R.P.; Rosyara, U.R.; Camarillo-Castillo, F.; Murua, M. Hybrid Wheat Prediction Using Genomic, Pedigree, and Environmental Covariables Interaction Models. Plant Genome 2019, 12, 180051.
17. Alvarado, G.M.; Lopez-Cruz, M.; Vargas, A.; Pacheco, F.; Rodriguez, J.; Burgueño, J.; Crossa, J. META-R: Multi Environment Trial Analysis with R for Windows. Vers. 6.03. hdl:11529/10201, 2015, CIMMYT Research Data & Software Repository Network. Available online: https://excellenceinbreeding.org/toolbox/tools/multi-environment-trail-analysis-r-meta-r (accessed on 1 December 2022).
18. Van Raden, P.M. Efficient methods to compute genomic predictions. J. Dairy Sci. 2008, 91, 4414–4423.
19. Technow, F.; Melchinger, A.E. Genomic prediction of dichotomous traits with Bayesian logistic models. Theor. Appl. Genet. 2013, 125, 1133–1143.
20. Lopez-Cruz, M.; Crossa, J.; Bonnet, D.; Dreisigacker, S.; Poland, J.; Jannink, J.L.; Singh, R.P.; Autrique, E.; de los Campos, G. Increased prediction accuracy in wheat breeding trials using a marker × environment interaction genomic selection model. G3 Genes Genomes Genet. 2015, 5, 569–582.
21. Pérez, P.; de los Campos, G. Genome-Wide Regression and Prediction with the BGLR Statistical Package. Genetics 2014, 198, 483–495.
22. Montesinos-López, O.A.; Montesinos-López, A.; Crossa, J. (Eds.) Multivariate Statistical Machine Learning Methods for Genomic Prediction; Springer International Publishing: Cham, Switzerland, 2022; ISBN 978-3-030-89010-0.
23. Melandri, G.; Liana Monteverde, E.; Riewe, D.; Abdelgawad, H.; Mccouch, S.; Bouwmeester, H. Can biochemical traits bridge the gap between genomics and plant performance? A study in rice under drought. Plant Physiol. 2022, 189, 1139–1152.
24. Westhues, M.; Schrag, T.A.; Heuer, C.; Thaller, G.; Utz, H.F.; Schipprack, W.; Thiemann, A.; Seifert, F.; Ehret, A.; Schlereth, A.; et al. Omics-based hybrid prediction in maize. Theor. Appl. Genet. 2017, 130, 1927–1939.

Figure 1. Prediction performance in terms of mean square error (MSE) for models MA, MA_C, and MA_AC for each trait in each environment and across environments (Global) under untested lines in tested environments cross-validation strategy. The relative efficiencies (RE_MSE) were computed to compare models MA_C vs. MA_AC, MA vs. MA_C, and MA vs. MA_AC. (A) for GY trait, (B) for DTF trait, and (C) for DTH trait.

Figure 2. Prediction performance in terms of mean square error (MSE) for models MB, MB_C, and MB_AC for each trait in each environment and across environments (Global) under untested lines in a tested environment cross-validation strategy. The relative efficiencies (RE_MSE) were computed to compare models MB_C vs. MB_AC, MB vs. MB_C, and MB vs. MB_AC. (A) for GY trait, (B) for DTF trait, and (C) for DTH trait.

Figure 3. Prediction performance across environments (years) for each trait for all models (MA, MA_C, MA_AC, MB, MB_C, MB_AC, MC, MC_C, MC_AC, MD, MD_C, and MD_AC) in terms of mean squared error (MSE) under untested lines in a tested environment cross-validation strategy. The relative efficiencies (RE_MSE) were computed by dividing the MSE of model MX by the MSE of model MX_C or MX_AC, with X taking the values A, B, C, and D. (A–C) correspond to traits GY, DTF, and DTH fitted to the 12 models shown on the right-hand side. (D–F) show the relative efficiencies for traits GY, DTF, and DTH for the model comparisons shown on the right-hand side.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
