Article

Genotype Selection for Grain Yield of Sorghum through Generalized Linear Mixed Model

1 Department of Statistics, College of Science, Bahir Dar University, Bahir Dar P.O. Box 79, Ethiopia
2 Department of Statistics, College of Natural Science, Wollo University, Dessie P.O. Box 1145, Ethiopia
3 School of Mathematics, Statistics & Computer Science, University of KwaZulu-Natal, Durban 4041, South Africa
4 Department of Plant Breeding, College of Agriculture, Woldia University, Woldia P.O. Box 53, Ethiopia
5 School of Agricultural, Earth and Environmental Sciences, College of Agriculture, Engineering & Science, Pietermaritzburg, University of KwaZulu-Natal, Durban 4041, South Africa
* Author to whom correspondence should be addressed.
Agronomy 2023, 13(3), 852; https://doi.org/10.3390/agronomy13030852
Submission received: 22 December 2022 / Revised: 6 March 2023 / Accepted: 10 March 2023 / Published: 14 March 2023
(This article belongs to the Section Farming Sustainability)

Abstract

The classical linear model provides a correct analysis only if all the effects are fixed. For experiments that include both fixed and random effects, the generalized linear mixed model is appropriate for handling non-normally distributed response variables. The aim of this study was to perform genotype selection through a generalized linear mixed model and to identify the impact of treatment and related traits on grain yield. The data were collected using a lattice square design in which the phenotypic traits of sorghum were measured. The results of a principal component analysis (PCA) were used as input variables for the generalized linear mixed model, whose parameters were estimated by maximum likelihood methods. The results showed that grain yield followed a gamma distribution and that treatment had an effect on grain yield. The first principal component was significant for grain yield. The variability of grain yield due to the random effects of replication within treatment, genotype, and the genotype-by-treatment interaction was significant. The best genotypes for the mass production of sorghum were G137, G66 and G156 under stress conditions and G55, G41 and G78 under irrigated conditions. Overall, the generalized linear mixed model for grain yield is recommended for genotype selection in plant breeding.

1. Introduction

Ethiopia is the center of origin for sorghum, and its diverse agroecological zones contribute significantly to the genetic diversity of the crop. Sorghum is a main cereal crop in arid and semi-arid areas and a main source of food for humans and fodder for animals [1,2,3]. Therefore, selecting genotypes that give better grain yield is essential for farmers who produce sorghum for food in semi-arid and arid parts of the world. Many crop researchers have used the classical analysis of variance (ANOVA) model or general linear model (GLM), which provides a correct analysis only if all the effects are fixed [4]. However, many agronomy experiments use complete or incomplete block designs with few replications at multiple sites and/or over several years, and their results are influenced by both fixed and random factors that are not directly relevant to the goal of the experiment but can mask the effect of the treatment. To account for this, statistical models such as linear mixed models or generalized linear mixed models must be used [3,4,5,6].
The generalized linear mixed model, which extends the generalized linear model of [7], is an alternative approach for experiments that have non-normally distributed response variables and include both fixed and random effects. It accommodates a wide range of random effects within the standard framework of linear mixed models while retaining the flexibility associated with classic linear models [5,6,7,8]. For example, a generalized linear mixed model has been used to analyze plant pests and diseases in wheat, grapevine, banana, and sweet potato, where the response variable was an incidence for wheat and grapevine, binary for banana, and a count for sweet potato. Logit link functions were used for the first three crops and a log link function for the count response [9].
For studies of plant characteristics in repeated-measures experiments, linear mixed and generalized linear mixed models are appropriate because the observations are correlated. The linear mixed model is used when the response variable is normally distributed and the generalized linear mixed model when it is not; for example, repeated-measures experiments on plant height and stem thickness have been analyzed using the Poisson, gamma, lognormal, normal and binomial distributions [4,9].
Researchers have conducted ancestral haplotype-based association mapping accounting for stratification, using a linear function of covariates that incorporates polygenic and local genomic effects to describe the expected value of the observed phenotype via a link function [10]. Observations with a non-normal response must be examined using appropriate techniques, of which the generalized linear mixed model is the most important. Furthermore, principal component analysis is commonly used to generate new, uncorrelated variables that serve as independent variables in the generalized linear mixed model [11]. Because the mixed model is more accurate than the general linear model, the generalized linear mixed model is also useful for the genotype selection of sorghum for grain yield.
The aim of this study was to perform the genotype selection through a generalized linear mixed model and identify the impact of treatment and the related traits on the grain yield of sorghum.
The article is organized as follows. First, Section 2 describes the data sources, how the data was collected, and the statistical models used for data analysis. Then, Section 3 explains the main results of the study, while Section 4 discusses and compares the results with other scientific works. Finally, the study’s conclusion and recommendations are presented in Section 5.

2. Methods and Data

2.1. Data and Design Description

The data were collected using an incomplete block design, the lattice square design, with 14 blocks, each containing 14 experimental units (plots) to which the genotypes were assigned, and two replications for each treatment level: a non-irrigated (water-stressed) condition and an irrigated condition with sufficient water [12,13]. The experiment was conducted at the Kobo site of the Sirinka Agricultural Research Center in Ethiopia. Fourteen phenotypic characteristics of sorghum were measured, of which grain yield was taken as the response variable. Because the yield-related traits are correlated, principal component analysis was used to generate uncorrelated input factors, and the treatment and the resulting principal components were treated as fixed effects. Three components were retained, which accounted for 77.46% of the total variance of the original yield-related traits. Replication within treatment, genotype, and the genotype-by-treatment interaction were the random effects of the model [11].

2.2. Generalized Linear Mixed Model

Let y represent the vector of observed grain yields of sorghum, a non-negative measurement that is not normally distributed. The non-normality of the response variable is handled through the generalized linear mixed model (GLMM), which extends the generalized linear model by including random effects in addition to fixed effects [8,14,15,16]. The advantage of this model is that the response may follow any member of the exponential family of probability distributions, with a link function connecting its mean to the linear predictor. Given the vector of random effects γ, the conditional distribution of the response is a member of the exponential family with probability density function
$$f_i(y_i \mid \gamma) = \exp\left\{\frac{y_i \xi_i - b(\xi_i)}{a(\phi)} + c_i(y_i; \phi)\right\}$$
where $b(\cdot)$, $a(\cdot)$, and $c(\cdot)$ are known functions, $\phi$ is the dispersion parameter, and $\xi_i$ is the natural parameter, which is associated with the conditional mean $\mu$ and, through it, with the linear predictor. The GLMM uses a linear function of the covariates to describe the expected value of the observed data and allows the response variable to follow any distribution in the exponential family [8,14,17]. The conditional mean is formulated as
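$$E\left(Y_{ijkl} \mid \rho(\alpha), \gamma, \alpha\gamma\right) = g^{-1}\left(\omega + \beta_1 m_{1l} + \beta_2 m_{2l} + \beta_3 m_{3l} + \alpha_i + \rho_{j(i)} + \gamma_k + (\alpha\gamma)_{ik}\right)$$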
where $g(\cdot)$ is a differentiable monotonic link function with inverse $g^{-1}(\cdot)$. The linear predictor of the GLMM incorporates the fixed and random effects and excludes the error term and the inverse of the link function [15]. The GLMM accommodates non-normally distributed responses, uses a nonlinear link between the mean of the response and the predictors, and allows correlation in the data [18]. The conditional mean is associated with the linear predictor through the link function, $g(\mu) = \eta$ [17], and the linear predictor is given as
$$\eta_{ijkl} = \omega + \beta_1 m_{1l} + \beta_2 m_{2l} + \beta_3 m_{3l} + \alpha_i + \rho_{j(i)} + \gamma_k + (\alpha\gamma)_{ik}$$
where $\eta_{ijkl}$ is the linear predictor of the $l$th observation of the $i$th treatment, $j$th replication and $k$th genotype; $\omega$ is the overall mean; $\beta_1$, $\beta_2$ and $\beta_3$ are the coefficients of the 1st, 2nd and 3rd principal components (with scores $m_{1l}$, $m_{2l}$ and $m_{3l}$), respectively, which are treated as fixed effects in the generalized linear mixed model; $\alpha_i$ is the fixed effect of the $i$th treatment; $\gamma_k$ is the random effect of the $k$th genotype; $\rho_{j(i)}$ is the random effect of the $j$th replication nested within the $i$th treatment; and $(\alpha\gamma)_{ik}$ is the interaction effect of the $i$th treatment and the $k$th genotype. The random effects $\rho_{j(i)}$, $\gamma_k$ and $(\alpha\gamma)_{ik}$ are uncorrelated with each other. The effects of genotype, replication within treatment, and the treatment-by-genotype interaction are normally distributed with mean zero and variances $\sigma_{\gamma}^2$, $\sigma_{\rho(\alpha)}^2$ and $\sigma_{\alpha\gamma}^2$, respectively.
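The structure of the linear predictor and its back-transformation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names are hypothetical, the numerical values are invented to exercise the code and are not estimates from this study, and the inverse square link $g(\mu) = 1/\mu^2$ is used because it is the link selected later in the results.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Effects:
    """Effects entering the linear predictor of one observation (model scale)."""
    omega: float              # overall mean
    betas: Sequence[float]    # coefficients of the three principal components
    alpha: float              # fixed effect of treatment i
    rho: float = 0.0          # random effect of replication j within treatment i
    gamma: float = 0.0        # random effect of genotype k
    alpha_gamma: float = 0.0  # random treatment-by-genotype interaction


def linear_predictor(e: Effects, scores: Sequence[float]) -> float:
    """eta = omega + sum(beta_p * m_p) + alpha + rho + gamma + alpha_gamma."""
    if len(scores) != len(e.betas):
        raise ValueError("need one score per principal component")
    pcs = sum(b * m for b, m in zip(e.betas, scores))
    return e.omega + pcs + e.alpha + e.rho + e.gamma + e.alpha_gamma


def conditional_mean(eta: float) -> float:
    """Inverse of the inverse square link g(mu) = 1 / mu**2 (needs eta > 0)."""
    if eta <= 0:
        raise ValueError("the inverse square link needs a positive linear predictor")
    return eta ** -0.5


# Hypothetical values, chosen only to exercise the functions.
example = Effects(omega=0.20, betas=(-0.02, 0.002, -0.0001), alpha=-0.05, gamma=0.01)
eta = linear_predictor(example, scores=(1.0, 0.5, -0.3))
mu = conditional_mean(eta)
```

Note that the random effects enter on the model scale, so their effect on the data scale depends on the link function.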
The variance of the observations given the random effects [19,20], denoted by $\operatorname{var}\left(y_{ijkl} \mid \rho(\alpha), \gamma, \alpha\gamma\right)$, is given as
$$\operatorname{var}\left(y_{ijkl} \mid \rho(\alpha), \gamma, \alpha\gamma\right) = a_i(\phi)\, v(\mu_{ijkl})$$
where $a_i(\phi)$ is a known function equal to $\phi / w_i$, with $w_i$ being the weight, and $v(\cdot)$ is the variance function. Given the random effects, the observations are conditionally independent with mean $E\left(Y_{ijkl} \mid \rho(\alpha), \gamma, \alpha\gamma\right) = \mu_{ijkl}$, which is related to the linear predictor via the link function provided above, and with variance $a_i(\phi)\, v(\mu_{ijkl})$ [17,20].
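As an illustration of the variance relation, the following Python sketch evaluates $a_i(\phi)\, v(\mu)$ for the two candidate distributions considered later. It relies on the standard variance functions $v(\mu) = \mu^2$ for the gamma distribution and $v(\mu) = \mu^3$ for the inverse Gaussian distribution; the function names and the dispersion value are hypothetical and are not taken from the fitted model.

```python
def gamma_variance(mu: float) -> float:
    """Variance function of the gamma distribution: v(mu) = mu**2."""
    return mu ** 2


def inverse_gaussian_variance(mu: float) -> float:
    """Variance function of the inverse Gaussian distribution: v(mu) = mu**3."""
    return mu ** 3


def conditional_variance(mu, phi, variance_function, weight=1.0):
    """var(y | random effects) = (phi / w) * v(mu)."""
    return (phi / weight) * variance_function(mu)


# Hypothetical dispersion parameter, used only for illustration.
phi = 0.5
var_gamma = conditional_variance(2.0, phi, gamma_variance)
var_inverse_gaussian = conditional_variance(2.0, phi, inverse_gaussian_variance)
```

For means above 1, the inverse Gaussian variance grows faster with the mean than the gamma variance, which is one reason the two distributions can fit skewed yield data differently.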
In vector notation, the conditional mean of the observed response y, with design matrices X and Z for the fixed and random effects, respectively, is given [20] as
$$E(y \mid b) = g^{-1}(X\beta + Zb)$$
where $\beta$ and $b$ are the vectors of fixed and random effects, respectively. The random effects are normally distributed with mean 0 and covariance matrix $G$, which depends on the unknown variance components. The conditional variance of the response is
$$\operatorname{var}(y \mid b) = D^{1/2} R D^{1/2}$$
where $D$ is the diagonal matrix of variance functions evaluated at $\mu_{ijkl}$ and $R$ is the residual covariance matrix.
The parameters are estimated by maximum likelihood methods. However, the marginal likelihood involves an integral that has no closed form because of the random effects and the non-linearity of the model. Thus, the maximum likelihood estimator has no analytic solution, and a numerical method is applied to estimate the parameters of the generalized linear mixed model [7,17,20,21,22].
The pseudo-likelihood method, which is equivalent to penalized quasi-likelihood and is developed under the assumption that the residuals are well approximated by a normal distribution, estimates the parameters of the GLMM by linearization: Taylor expansions are applied iteratively to approximate the initial GLMM with a linear mixed model [8,18,20]. The pseudo-likelihood estimation uses the results of the LMM and GLM as starting values for the GLMM. The predictors of the random effects are the estimated best linear unbiased predictors (EBLUP) in the approximating linear mixed model [18,19,23]. The study used the GLIMMIX procedure of SAS version 9.4 for data analysis.
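The linearization step can be illustrated with a minimal Python sketch. It computes the pseudo-response obtained from a first-order Taylor expansion of the link function around the current mean, $p = \eta + (y - \mu)\, g'(\mu)$, for the inverse square link. The function names are hypothetical, and the sketch shows a single linearization for one observation; it is not the iterative algorithm implemented in the GLIMMIX procedure.

```python
def inverse_square_link(mu: float) -> float:
    """g(mu) = 1 / mu**2."""
    return mu ** -2


def inverse_square_link_derivative(mu: float) -> float:
    """g'(mu) = -2 / mu**3."""
    return -2.0 * mu ** -3


def pseudo_response(y: float, mu: float) -> float:
    """First-order Taylor expansion of g(y) around the current mean mu."""
    eta = inverse_square_link(mu)
    return eta + (y - mu) * inverse_square_link_derivative(mu)


# When the observation equals the current mean, the pseudo-response is eta itself.
p_at_mean = pseudo_response(2.0, 2.0)
# An observation above the mean lowers the pseudo-response under this link.
p_above_mean = pseudo_response(2.5, 2.0)
```

In a pseudo-likelihood fit, a linear mixed model is fitted to such pseudo-responses, the means are updated, and the two steps are repeated until the estimates stabilize.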

3. Results of the Study

3.1. Descriptive Summary

The means and standard deviations of grain yield differ between the treatment levels. Under stress conditions, grain yield averaged 2.48 ± 0.71 t/ha, whereas under irrigated conditions it averaged 3.17 ± 1.15 t/ha. This difference between treatment levels reflects the influence of drought on sorghum production. The overall mean and standard deviation of the log of grain yield were 0.98106 and 0.3305, respectively.
Figure 1 shows the distribution of grain yield, which is skewed (the red curve in panel (a) denotes the kernel density of the data). The kernel density does not overlap with the normal density, which indicates that grain yield is not normally distributed. On the other hand, the distribution of grain yield is approximately gamma (the blue curve in panel (b) denotes the gamma density), a distribution that is well suited to modeling skewed data. Therefore, grain yield is not normally distributed, and another distribution must be assumed to fit the observed data.
Figure 2 compares the distribution of grain yield with the inverse Gaussian distribution, which is another candidate for the distribution of grain yield. Figure 1 and Figure 2 indicate that the gamma and inverse Gaussian distributions both come close to representing the distribution of grain yield. These distributions are checked for their fit to the grain yield data using different link functions.
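To give a rough sense of what a gamma distribution with the reported means and standard deviations looks like, the Python sketch below computes method-of-moments shape and scale parameters from the descriptive summary. This is an illustration only: the method of moments is a swapped-in technique, not the likelihood-based fitting used in this study, and the function names are hypothetical.

```python
import math


def gamma_moments(mean: float, sd: float):
    """Method-of-moments gamma parameters: shape = (mean/sd)**2, scale = sd**2/mean."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale


def gamma_skewness(shape: float) -> float:
    """Skewness of a gamma distribution: 2 / sqrt(shape)."""
    return 2.0 / math.sqrt(shape)


# Means and standard deviations (t/ha) from the descriptive summary.
stress_shape, stress_scale = gamma_moments(2.48, 0.71)
irrigated_shape, irrigated_scale = gamma_moments(3.17, 1.15)
stress_skew = gamma_skewness(stress_shape)
irrigated_skew = gamma_skewness(irrigated_shape)
```

Both implied skewness values are positive, which is consistent with the right-skewed shape described for Figure 1.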

3.2. Results of GLMM

The result of the GLMM depends on the distribution of grain yield, which tends toward a gamma or inverse Gaussian distribution. Results based on the inverse Gaussian distribution are also considered because the GLMM with the gamma distribution did not converge for all link functions. Because the response variable is non-normal, the link function for the model is identified using model diagnostics: for a specified distribution of grain yield, the link function that provides the best model is preferred.
Table 1 shows the model diagnostics for various link functions under the inverse Gaussian and gamma distributions. The models were fitted with different link functions (inverse, inverse square and log) for each distribution of grain yield. For the inverse Gaussian distribution, the link function that gives the smallest pseudo-AIC is the inverse square. For the gamma distribution, the inverse square link also gives the smallest pseudo-AIC. Finally, the two distributions are compared using information criteria, where the distribution with the smallest AIC is preferred. According to the information criteria, the generalized linear mixed model with a gamma distribution for grain yield is better than the model with an inverse Gaussian distribution. Overall, the gamma distribution is the best distribution for the analysis of grain yield via GLMM.
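The three candidate link functions and their inverses can be written down explicitly. The Python sketch below is a minimal illustration with hypothetical names; it only shows how each link maps a mean to the model scale and back, and does not reproduce the model comparison in Table 1.

```python
import math

# Each entry maps a link name to (g, g_inverse).
LINKS = {
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "inverse_square": (lambda mu: mu ** -2, lambda eta: eta ** -0.5),
    "log": (math.log, math.exp),
}


def to_model_scale(name: str, mu: float) -> float:
    """Apply the link function g to a mean on the data scale."""
    g, _ = LINKS[name]
    return g(mu)


def to_data_scale(name: str, eta: float) -> float:
    """Apply the inverse link to a linear predictor on the model scale."""
    _, g_inverse = LINKS[name]
    return g_inverse(eta)


# Round trip for the mean yield under stress (2.48 t/ha).
round_trips = {name: to_data_scale(name, to_model_scale(name, 2.48)) for name in LINKS}
```

Under the inverse and inverse square links a larger linear predictor corresponds to a smaller mean, whereas under the log link the two move in the same direction.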
Table 2 shows the covariance parameter estimates of the random effects with their corresponding standard errors. The results were obtained from the generalized linear mixed model with a gamma distribution for grain yield and an inverse square link function. The estimated variance components for replication within treatment, genotype, and the genotype-by-treatment interaction are 0.000034, 0.0031 and 0.000742, with standard errors of 0.000043, 0.000383 and 0.000107, respectively. The intraclass correlations show that 56.56% of the variability of grain yield is associated with genotype and 13.66% with the genotype-by-treatment interaction.
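The intraclass correlation of a random effect can be sketched as its variance component divided by the total variance. The Python example below uses the three variance components reported above. The residual variance is not stated in the text, so the value used here is a hypothetical assumption, as is the choice to include it in the total; the resulting shares are therefore only roughly comparable with the reported 56.56% and 13.66%.

```python
def intraclass_correlations(components: dict, residual: float) -> dict:
    """Share of the total variance attributed to each random effect."""
    total = sum(components.values()) + residual
    return {name: value / total for name, value in components.items()}


# Variance components reported for Table 2 (model scale).
components = {
    "replication(treatment)": 0.000034,
    "genotype": 0.0031,
    "genotype x treatment": 0.000742,
}

# ASSUMPTION: the residual variance is not given in the text; this value is hypothetical.
ASSUMED_RESIDUAL = 0.0016

icc = intraclass_correlations(components, ASSUMED_RESIDUAL)
```

With this assumed residual, genotype accounts for somewhat more than half of the total variance and the genotype-by-treatment interaction for a little over one eighth.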
Table 3 presents the likelihood-based tests of the covariance parameters. The first test examines whether the G matrix is diagonal, by setting its off-diagonal elements to zero against the alternative that they are non-zero, while the R-side structure is not modified. The test uses the inverse square link function with the gamma distribution for grain yield. The covariance matrix of the random effects is diagonal, as the chi-square statistic is zero with a p-value greater than 0.05, indicating that the random effects are uncorrelated. As a result, genotype, replication nested within treatment, and the genotype-by-treatment interaction are independent of each other. The test of independence compares the generalized linear mixed model against a null model of complete independence, which is a generalized linear model. The G-side covariance parameters cannot be eliminated, and the R-side structure cannot be reduced to a diagonal model, because the null model of complete independence is rejected at the 5% level of significance. Finally, the test of no G-side effects examines whether the covariance matrix of the random effects (G matrix) reduces to zero, that is, whether the model contains significant random effects. The random effects cannot be eliminated from the generalized linear mixed model, as the test rejected the null hypothesis of no G-side effects with a p-value less than 0.05.
Table 4 presents the least squares means of the treatment levels on both the model scale and the data scale. The Mean column gives the data-scale values, obtained by converting the model-scale least squares means back to the original scale using the inverse of the inverse square link function. For example, the average grain yield of sorghum was 2.275 t/ha under rainfed (non-irrigated) conditions, whereas under irrigation it was 2.629 t/ha, which shows the effect of the treatment on grain yield.
Figure 3 shows the difference between the least squares means of the treatment levels: mean grain yield differed significantly between treatments, as the intervals in the least squares means plot did not overlap.
Table 5 presents the fixed effects estimates for the generalized linear mixed model. On the model (inverse square link) scale, the estimated mean is 0.1933 for production under stress conditions and 0.1447 for production under irrigation. The average changes on this scale for a unit change in the first, second and third principal components are −0.01723, 0.00196 and −0.00006, respectively. Thus, the first and third principal components have negative coefficients, whereas the second principal component has a positive coefficient. The treatment effect and the first principal component had significant effects on grain yield, while the second and third principal components were not significant at the 5% significance level. The overall test of fixed effects likewise indicates that the treatment effect and the first principal component were significant, while the second and third principal components were not.
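The relation between the treatment-level estimates quoted for Table 5 and the data-scale means quoted for Table 4 can be checked with a short Python sketch. It assumes that 0.1933 and 0.1447 are values on the model scale of the inverse square link, $\eta = 1/\mu^2$, so that the data-scale mean is $\mu = \eta^{-1/2}$; the function name is hypothetical.

```python
def inverse_square_link_inverse(eta: float) -> float:
    """Back-transform from the model scale: mu = eta ** -0.5."""
    if eta <= 0:
        raise ValueError("the inverse square link needs a positive linear predictor")
    return eta ** -0.5


# Treatment-level estimates quoted for Table 5 (assumed to be on the model scale).
model_scale = {"stress": 0.1933, "irrigated": 0.1447}

# Data-scale means in t/ha, to be compared with Table 4 (2.275 and 2.629).
data_scale = {level: inverse_square_link_inverse(eta) for level, eta in model_scale.items()}
```

The back-transformed values agree with the data-scale means of 2.275 t/ha and 2.629 t/ha to three decimal places, and the smaller model-scale value corresponds to the larger mean yield.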
Table 6 presents the worst- and best-performing genotypes for grain yield, based on the best linear unbiased predictions (BLUP) of the genotype effects. The model results were obtained using the gamma distribution with the inverse square link function. According to the BLUP results, the three worst-performing genotypes are G196 (Genotype 196), G48 (Genotype 48) and G186 (Genotype 186), with estimates of −0.08483, −0.07241 and −0.06368, respectively. The three best-performing genotypes are G41 (Genotype 41), G78 (Genotype 78) and G66 (Genotype 66), with estimates of 0.2424, 0.1772 and 0.1733 t/ha, respectively. The worst- and best-performing genotypes significantly affect the variability of grain yield, and the estimates are precise, as their standard errors are small.
Table 7 shows the performance of the genotypes under irrigation, which helps to identify the genotypes that perform better and worse under irrigation. G137 (Genotype 137), G156 (Genotype 156) and G157 (Genotype 157) are the three genotypes that perform worst under irrigation and are not recommended for mass production under irrigated conditions. Their BLUP estimates are −0.06484, −0.05602 and −0.05325, respectively, and the small standard errors indicate the precision of the estimates. The best-performing genotypes under an irrigated environment are G55 (Genotype 55), G41 (Genotype 41) and G78 (Genotype 78), and these genotypes are recommended for mass production of sorghum under irrigation. Their BLUP estimates are 0.0658, 0.0556 and 0.0521, respectively. The result shows that the three best-performing genotypes are suitable for the irrigated environment, while the three worst-performing genotypes are not.
Table 8 shows the performance of the genotypes under stress environments, ranked using the BLUP estimates of the genotypes in that environment. The three worst-performing genotypes under a stress environment are G55 (Genotype 55), G49 (Genotype 49) and G8 (Genotype 8), which have the lowest predicted grain yield. Their BLUP estimates were −0.04783, −0.03849 and −0.03324, respectively; these genotypes did not resist drought and are not recommended for mass production of sorghum under stress conditions. On the other hand, the three best-performing genotypes under a stress environment are G137 (Genotype 137), G66 (Genotype 66) and G156 (Genotype 156), which have the best capacity to resist the impact of drought, as they performed better under drought stress. Their BLUP estimates are 0.09289, 0.07757 and 0.07572; the estimates are large with high precision, and the random effect is significant for the variability of grain yield.
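The ranking step behind Tables 6 to 8 can be sketched in Python as a sort of the genotype BLUP estimates. The example uses only the six overall estimates quoted for Table 6 and follows the convention of this study that larger BLUP estimates are ranked as better; the function name is hypothetical, and the full set of 196 genotypes is not reproduced here.

```python
def rank_genotypes(blup: dict, n: int = 3):
    """Return the n best and n worst genotypes, each listed from most extreme inward."""
    ordered = sorted(blup, key=blup.get, reverse=True)
    best = ordered[:n]
    worst = ordered[-n:][::-1]
    return best, worst


# Overall BLUP estimates quoted for Table 6.
overall_blup = {
    "G196": -0.08483,
    "G48": -0.07241,
    "G186": -0.06368,
    "G41": 0.2424,
    "G78": 0.1772,
    "G66": 0.1733,
}

best, worst = rank_genotypes(overall_blup)
```

The same function could be applied to the treatment-specific estimates to obtain the rankings under irrigated and stress conditions.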
Figure 4 presents the model diagnostics of the generalized linear mixed model using the conditional residuals (left-hand side of Figure 4) and the marginal residuals (right-hand side of Figure 4). The top-left panel of the conditional residuals shows that the residuals are related to the linear predictor and follow some trend. The marginal residual plots also show some trend, which indicates non-normality of the residuals. The conditional residuals have a smaller spread and are closer to normal than the marginal residuals.

4. Discussion

This study examined genotype selection of sorghum using a generalized linear mixed model, which handles a non-normally distributed response, in this case grain yield, that violates the assumptions of normality, constant variance, and linearity and depends on different fixed and random factors [6,24]. The model demonstrated the relationship between grain yield and the fixed effects, namely the treatment and the mutually orthogonal new variables created by principal component analysis, and the random effects, namely replication within treatment, genotype, and the genotype-by-treatment interaction [25,26,27].
According to the data exploration, the distribution of grain yield was non-normal, and the gamma and inverse Gaussian distributions were identified as candidates for further analysis. This study is in line with the investigation in [12], which applied a general linear model that required the normality assumption for grain yield. Therefore, identifying the distribution of grain yield is essential for modeling the observed data through a generalized linear mixed model. Furthermore, this study agrees with studies of longitudinal outcomes that used generalized linear mixed models [28,29].
The result shows that the intraclass correlations of genotype and the genotype-by-treatment interaction were 56.56% and 13.66%, respectively; these are the shares of the total variability of grain yield associated with these random effects. The tests of the covariance parameters indicated significant dependence and significant G-side effects for grain yield. The study agrees with investigations that explain the association between grain yield and random effects such as genotype, replication within treatment and the genotype-by-treatment interaction [1].
The result showed that, on the model scale, the estimated mean is 0.1933 for production without irrigation and 0.1447 for production with irrigation. The average changes for a unit change in the first, second and third principal components are −0.01723, 0.00196 and −0.00006, respectively. The effects of the treatment and the first principal component of the phenotypic traits on the grain yield of sorghum are highly significant. This study agrees with studies by other investigators that found an impact of climate change on the grain yield of sorghum [1,12,30,31].
The performance of the genotypes was evaluated through empirical restricted maximum likelihood, which helps to rank the genotypes by grain yield. The GLMM results were used to select the best- and worst-performing genotypes of sorghum. The best-performing genotypes were G41 (0.2424 t/ha), G78 (0.1772 t/ha) and G66 (0.1733 t/ha), which are recommended for any environment. On the other hand, the worst-performing genotypes were G196 (−0.08483 t/ha), G48 (−0.07241 t/ha) and G186 (−0.06368 t/ha), which are not effective in any environment. This result also agrees with the genetic diversity study of sorghum (Sorghum bicolor (L.) Moenc) genotypes in Ethiopia [3,25,32].
Under irrigated treatment, identifying the genotypes with the highest grain yield and selecting the best- and worst-performing genotypes are the most important steps in genotype selection. The genotypes that performed best in grain yield were G55 (0.0658 t/ha), G41 (0.0556 t/ha) and G78 (0.0521 t/ha), which are recommended for mass production, whereas G137 (−0.06484 t/ha), G156 (−0.05602 t/ha) and G157 (−0.05325 t/ha) were the worst-performing genotypes under an irrigated environment and are not recommended for such environments [12].
Under stress environments, the best-performing genotypes were G137 (0.1108), G66 (0.1046) and G156 (0.1031), which are recommended for drought environments. On the other hand, G55 (Genotype 55), G49 (Genotype 49) and G8 (Genotype 8) are not recommended for mass production in drought environments [1,12,25].

5. Conclusions

The descriptive summary showed differences in grain yield between the treatment levels (2.48 ± 0.71 t/ha for the stress condition and 3.17 ± 1.15 t/ha for the irrigated condition), and the first principal component was significant for the grain yield of sorghum.
This study explained the effect of fixed and random factors on the grain yield of sorghum through the generalized linear mixed model. The distribution of grain yield, the response variable of the study, was non-normal, so the gamma and inverse Gaussian distributions were considered as candidate distributions for the generalized linear mixed model. The gamma distribution was selected because it gave the smallest AIC, and the associated link function of the model was the inverse square.
The variability of grain yield due to the random effects of replication within treatment, genotype, and the genotype-by-treatment interaction was significant. The best genotypes for the mass production of sorghum were G137, G66 and G156 under stress conditions and G55, G41 and G78 under irrigated conditions.
In conclusion, a generalized linear mixed model is recommended for grain yield when it is non-normally distributed and is important for genotype selection in plant breeding. Future work will focus on modeling grain yield using the generalized additive mixed model to identify the relationship between the continuous covariates and grain yield in addition to the random effects.

Author Contributions

M.T. was engaged in data management, data analysis, and drafting and revising the final manuscript. T.Z. and D.B.B. contributed to the conception, design, interpretation of data, and manuscript review and revisions. S.A.D. and M.L. contributed the data for the study. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

We acknowledge Bahir Dar University and Wollo University for offering admission and a monthly salary to the Ph.D. candidate, respectively.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this article.

References

  1. Amare, K.; Zeleke, H.; Bultosa, G. Variability for yield, yield related traits and association among traits of sorghum (Sorghum bicolor (L.) Moench) varieties in Wollo, Ethiopia. J. Plant Breed. Crop Sci. 2015, 7, 125–133. [Google Scholar]
  2. Enyew, M.; Feyissa, T.; Carlsson, A.S.; Tesfaye, K.; Hammenhag, C.; Geleta, M. Genetic Diversity and Population Structure of Sorghum [Sorghum bicolor (L.) Moench] Accessions as Revealed by Single Nucleotide Polymorphism Markers. Front. Plant Sci. 2021, 12, e12830. [Google Scholar] [CrossRef] [PubMed]
  3. Tesfaye, K. Genetic diversity study of sorghum (Sorghum bicolor (L.) Moenc) genotypes, Ethiopia. Acta Univ. Sapientiae Agric. Environ. 2017, 9, 44–54. [Google Scholar] [CrossRef] [Green Version]
  4. Torres, V.; Medina, Y.; Rodríguez, Y.; Sardiñas, Y.; Herrera, M.; Rodríguez, R. Application of the linear mixed and generalized mixed model as alternatives for analysis in experiments with repeated measures. Cuba. J. Agric. Sci. 2019, 53, 7. [Google Scholar]
  5. Zhao, Y.; Staudenmayer, J.; Coull, B.A.; Wand, M.P. General design Bayesian generalized linear mixed models. Stat. Sci. 2006, 21, 35–51. [Google Scholar] [CrossRef] [Green Version]
  6. Yang, R.-C. Towards understanding and use of mixed-model analysis of agricultural experiments. Can. J. Plant Sci. 2010, 90, 605–627. [Google Scholar] [CrossRef]
  7. Nelder, J.A.; Wedderburn, R.W. Generalized linear models. J. R. Stat. Soc. Ser. A (Gen.) 1972, 135, 370–384. [Google Scholar] [CrossRef]
  8. Dean, C.; Nielsen, J.D. Generalized linear mixed models: A review and some extensions. Lifetime Data Anal. 2007, 13, 497–512. [Google Scholar] [CrossRef]
  9. Michel, L.; Brun, F.; Makowski, D. A framework based on generalised linear mixed models for analysing pest and disease surveys. Crop Prot. 2017, 94, 1–12. [Google Scholar] [CrossRef]
  10. Zhang, Z.; Guillaume, F.; Sartelet, A.; Charlier, C.; Georges, M.; Farnir, F.; Druet, T. Ancestral haplotype-based association mapping with generalized linear mixed models accounting for stratification. Bioinformatics 2012, 28, 2467–2473. [Google Scholar] [CrossRef] [Green Version]
  11. Jain, S.; Patel, P. Principal component and cluster analysis in sorghum (Sorghum bicolor (L.) Moench). Forage Res. J. 2016, 42, 90–95. [Google Scholar]
  12. Derese, S.A.; Shimelis, H.; Mwadzingeni, L.; Laing, M. Agro-morphological characterisation and selection of sorghum landraces. Acta Agric. Scand. Sect. B—Soil Plant Sci. 2018, 68, 585–595. [Google Scholar] [CrossRef]
  13. Bose, R.C. A note on the resolvability of balanced incomplete block designs. Sankhyā Indian J. Stat. 1942, 6, 105–110. [Google Scholar]
  14. Tuerlinckx, F.; Rijmen, F.; Verbeke, G.; De Boeck, P. Statistical inference in generalized linear mixed models: A review. Br. J. Math. Stat. Psychol. 2006, 59, 225–255. [Google Scholar] [CrossRef] [PubMed]
  15. Kachman, S.D. An introduction to generalized linear mixed models. In Proceedings of the Symposium at the Organizational Meeting for a NCR Coordinating Committee on “Implementation Strategies for National Beef Cattle Evaluation”, Athens, Greece; pp. 59–73. Available online: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=4f1c417fb9bac04309f0ba9b0ec44d5ef51e36aa (accessed on 1 December 2022).
  16. Moscatelli, A.; Mezzetti, M.; Lacquaniti, F. Modeling psychophysical data at the population-level: The generalized linear mixed model. J. Vis. 2012, 12, 26. [Google Scholar] [CrossRef] [Green Version]
  17. Bakbergenuly, I.; Kulinskaya, E. Meta-analysis of binary outcomes via generalized linear mixed models: A simulation study. BMC Med. Res. Methodol. 2018, 18, 70. [Google Scholar] [CrossRef]
  18. McCulloch, C.E. An Introduction to Generalized Linear Mixed Models. 1997. Available online: https://ecommons.cornell.edu/bitstream/handle/1813/31937/BU-1340-MA.pdf?sequence=1 (accessed on 1 December 2022).
  19. Capanu, M.; Gönen, M.; Begg, C.B. An assessment of estimation methods for generalized linear mixed models with binary outcomes. Stat. Med. 2013, 32, 4550–4566. [Google Scholar] [CrossRef] [Green Version]
  20. Breslow, N.E.; Clayton, D.G. Approximate inference in generalized linear mixed models. J. Am. Stat. Assoc. 1993, 88, 9–25. [Google Scholar]
  21. Vazquez, A.; Bates, D.; Rosa, G.; Gianola, D.; Weigel, K. an R package for fitting generalized linear mixed models in animal breeding. J. Anim. Sci. 2010, 88, 497–504. [Google Scholar] [CrossRef] [Green Version]
  22. De Resende, M.D.V.; Alves, R.S. Linear, generalized, hierarchical, Bayesian and random regression mixed models in genetics/genomics in plant breeding. Funct. Plant Breed. J. 2020, 2, 121–152. [Google Scholar]
  23. Groll, A.; Tutz, G. Regularization for generalized additive mixed models by likelihood-based boosting. Methods Inf. Med. 2012, 51, 168–177. [Google Scholar] [PubMed] [Green Version]
  24. Chen, J.; Zhang, D.; Davidian, M. A Monte Carlo EM algorithm for generalized linear mixed models with flexible random effects distribution. Biostatistics 2002, 3, 347–360. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  25. Coulibaly, Y.; Mulia, R.; Sanou, J.; Zombré, G.; Bayala, J.; Kalinganire, A.; Van Noordwijk, M. Crop production under different rainfall and management conditions in agroforestry parkland systems in Burkina Faso: Observations and simulation with WaNuLCAS model. Agrofor. Syst. 2014, 88, 13–28. [Google Scholar] [CrossRef]
  26. Gianinetti, A. Basic features of the analysis of germination data with generalized linear mixed models. Data 2020, 5, 6. [Google Scholar] [CrossRef] [Green Version]
  27. Baoua, I.; Amadou, L.; Oumarou, N.; Payne, W.; Roberts, J.; Stefanova, K.; Nansen, C. Estimating effect of augmentative biological control on grain yields from individual pearl millet heads. J. Appl. Entomol. 2014, 138, 281–288. [Google Scholar] [CrossRef] [Green Version]
  28. Gebregziabher, M.; Zhao, Y.; Dismuke, C.; Axon, N.; Hunt, K.; Egede, L. Joint modeling of multiple longitudinal cost outcomes using multivariate generalized linear mixed models. Health Serv. Outcomes Res. Methodol. 2013, 13, 39–57. [Google Scholar] [CrossRef] [PubMed]
  29. Boykin, D.; Camp, M.J.; Johnson, L.; Kramer, M.; Meek, D.; Palmquist, D.; Vinyard, B.; West, M. Generalized linear mixed model estimation using PROC GLIMMIX: Results from simulations when the data and model match, and when the model is misspecified. In Proceedings of the 22nd Conference on Applied Statistics in Agriculture, Manhattan, KS, USA, 25–28 April 2010; pp. 137–156. [Google Scholar]
  30. Prasad, P.V.; Boote, K.J.; Allen, L.H., Jr. Adverse high temperature effects on pollen viability, seed-set, seed yield and harvest index of grain-sorghum [Sorghum bicolor (L.) Moench] are more severe at elevated carbon dioxide due to higher tissue temperatures. Agric. For. Meteorol. 2006, 139, 237–251. [Google Scholar] [CrossRef]
  31. Masasi, B. Evaluating the Impacts of Variable Irrigation Management Strategies on the Performance of Cotton and Grain Sorghum Using Monitoring and Modeling Techniques. Master’s Thesis, Oklahoma State University, Stillwater, OK, USA, 2019. [Google Scholar]
  32. Assefa, A.; Bezabih, A.; Girmay, G.; Alemayehu, T.; Lakew, A. Evaluation of sorghum (Sorghum bicolor (L.) Moench) variety performance in the lowlands area of wag lasta, north eastern Ethiopia. Cogent Food Agric. 2020, 6, 1778603. [Google Scholar] [CrossRef]
Figure 1. Histogram of grain yield relative to kernel density and normal distribution (a) and histogram of grain yield associated with the gamma distribution (b).
Figure 2. Histogram of grain yield relative to the inverse Gaussian distribution. Grain yield is shown on the x-axis, and the percentage of values in each interval is shown on the y-axis.
Figure 3. Least square differences of the treatment.
Figure 4. Conditional and marginal residual plots for the generalized linear mixed model.
Table 1. Summary results to compare and select the best distribution and link functions for grain yield analysis.

| Distribution | Inverse Gaussian | Inverse Gaussian | Inverse Gaussian | Gamma | Gamma | Gamma |
|---|---|---|---|---|---|---|
| Link Function | Inverse | Inverse Square | Log | Inverse | Inverse Square | Log |
| −2 Res Log PL | −3212.88 | −3285.40 | −1873.19 | −3239.19 | −3324.15 | −1899.14 |
| Pseudo-AIC | −3204.88 | −3277.40 | −1865.19 | −3231.19 | −3316.15 | −1891.14 |
| Pseudo-BIC | −3207.34 | −3279.85 | −1867.64 | −3233.64 | −3318.61 | −1893.60 |
| Pseudo-HQIC | −3210.27 | −3282.79 | −1870.57 | −3236.58 | −3321.54 | −1896.53 |
| Generalized χ² | 0.42 | 0.51 | 0.43 | 1.04 | 1.23 | 1.04 |
| Gen. χ²/DF | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
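The comparison in Table 1 can be retraced with a short script. The sketch below only transcribes the Pseudo-AIC and Pseudo-BIC values from the table and picks the smallest one; the dictionary layout and the helper name `best_by` are illustrative assumptions, not part of the original analysis. Pseudo-likelihood criteria are computed from linearized pseudo-data, so a ranking across distributions and links is best read as approximate guidance.

```python
# Fit statistics transcribed from Table 1; smaller values indicate a better fit.
# Keys are (distribution, link function). The layout is an illustrative assumption.
pseudo_aic = {
    ("Inverse Gaussian", "Inverse"): -3204.88,
    ("Inverse Gaussian", "Inverse Square"): -3277.40,
    ("Inverse Gaussian", "Log"): -1865.19,
    ("Gamma", "Inverse"): -3231.19,
    ("Gamma", "Inverse Square"): -3316.15,
    ("Gamma", "Log"): -1891.14,
}
pseudo_bic = {
    ("Inverse Gaussian", "Inverse"): -3207.34,
    ("Inverse Gaussian", "Inverse Square"): -3279.85,
    ("Inverse Gaussian", "Log"): -1867.64,
    ("Gamma", "Inverse"): -3233.64,
    ("Gamma", "Inverse Square"): -3318.61,
    ("Gamma", "Log"): -1893.60,
}


def best_by(criterion):
    """Return the (distribution, link) pair with the smallest criterion value."""
    return min(criterion, key=criterion.get)


best_aic = best_by(pseudo_aic)
best_bic = best_by(pseudo_bic)
print("Smallest Pseudo-AIC:", best_aic, pseudo_aic[best_aic])
print("Smallest Pseudo-BIC:", best_bic, pseudo_bic[best_bic])
```

Both criteria point to the gamma distribution with the inverse square link, which is the combination whose −2 Res Log PL value (−3324.15) reappears in Table 3.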
Table 2. Covariance Parameter Estimates of the random effects of GLMM.

| Covariance Parameter | Estimate | Std. Error |
|---|---|---|
| Replication | 0.000034 | 0.000034 |
| Genotype | 0.003071 | 0.000466 |
| Treatment * Genotype | 0.000742 | 0.000101 |
| Residual | 0.001583 | 0.000149 |

* interaction.
Table 3. Tests of Covariance Parameters Based on the residual Pseudo-Likelihood.

| Label | DF | −2 Res Log Like | ChiSq | Pr > ChiSq | Note |
|---|---|---|---|---|---|
| Diagonal G | 0 | −3324.15 | 0.00 | 1.0000 | DF |
| Independence | 3 | −2879.16 | 444.99 | <0.0001 | MI |
| No G-side effects | 3 | −2879.16 | 444.99 | <0.0001 | MI |
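The ChiSq column in Table 3 is the difference between the −2 residual log pseudo-likelihoods of the reduced and the fitted model. A minimal sketch of that arithmetic follows, assuming the "Diagonal G" row carries the fitted model's value (it matches the gamma, inverse square entry in Table 1). The function name `chi2_sf_df3` is hypothetical, and the plain chi-square tail with 3 degrees of freedom is used here only as a conservative stand-in: the "MI" note suggests the reported p-value comes from a mixture of chi-squares, which would be smaller still.

```python
import math

# Values transcribed from Table 3.
neg2_fitted = -3324.15   # fitted model with the G-side random effects
neg2_reduced = -2879.16  # model without G-side random effects

# Likelihood-ratio-type statistic: reduced minus fitted.
chi_sq = neg2_reduced - neg2_fitted


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom.

    Closed form: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_upper_bound = chi2_sf_df3(chi_sq)
print(f"ChiSq = {chi_sq:.2f}, chi-square(3) tail probability = {p_upper_bound:.3g}")
```

The statistic reproduces the tabulated 444.99, and the tail probability is far below 0.0001, in line with the reported significance of the random effects taken jointly.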
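Table 4. Treatment Least Square Means.

| Treatment | Estimate (model) | St. Err. (model) | DF | t Value | Pr > \|t\| | LL (model) | UL (model) | Mean (data) | St. Err. (data) | LL (data) | UL (data) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Irrigated | 0.1447 | 0.00613 | 2 | 23.62 | 0.0018 | 0.1183 | 0.1710 | 2.6293 | 0.05567 | 2.4182 | 2.9075 |
| Stress | 0.1933 | 0.00615 | 2 | 31.44 | 0.0010 | 0.1668 | 0.2197 | 2.2748 | 0.03618 | 2.1335 | 2.4485 |

(model) = model scale, (data) = data scale, St. Err. = standard error, DF = degrees of freedom, LL = lower limit, UL = upper limit.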
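The link between the model-scale and data-scale columns of Table 4 can be checked by hand. The sketch below assumes the inverse square link selected in Table 1, g(μ) = 1/μ², so that μ = 1/√η, and uses the delta method for the standard error; the function and variable names are illustrative and not taken from the original analysis.

```python
import math

# Model-scale values transcribed from Table 4.
model_scale = {
    "Irrigated": {"eta": 0.1447, "se": 0.00613, "ll": 0.1183, "ul": 0.1710},
    "Stress": {"eta": 0.1933, "se": 0.00615, "ll": 0.1668, "ul": 0.2197},
}


def inverse_link(eta):
    """Invert the inverse square link g(mu) = 1 / mu**2."""
    return 1.0 / math.sqrt(eta)


def delta_method_se(eta, se_eta):
    """Data-scale standard error: |d mu / d eta| * se, with d mu / d eta = -0.5 * eta**-1.5."""
    return 0.5 * eta ** -1.5 * se_eta


data_scale = {}
for treatment, row in model_scale.items():
    data_scale[treatment] = {
        "mean": inverse_link(row["eta"]),
        "se": delta_method_se(row["eta"], row["se"]),
        # The link is decreasing, so the limits swap on back-transformation.
        "ll": inverse_link(row["ul"]),
        "ul": inverse_link(row["ll"]),
    }

for treatment, row in data_scale.items():
    print(treatment, {key: round(value, 4) for key, value in row.items()})
```

Under these assumptions the back-transformed values agree with the data-scale columns of Table 4 up to rounding of the tabulated inputs. Because the link is decreasing, the larger model-scale estimate under stress corresponds to the lower mean grain yield.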
Table 5. The estimates and tests of the fixed effects of the GLMM.

Solutions for Fixed Effects

| Effect | Estimate | Std. Error | t Value | Pr > \|t\| |
|---|---|---|---|---|
| Intercept | 0.1933 | 0.006147 | 31.44 | 0.0010 |
| Treatment (Irrigation) | −0.0486 | 0.006749 | −7.20 | 0.0187 |
| Treatment (Stress) | 0 | | | |
| Prin1 | −0.01723 | 0.001066 | −16.17 | <0.0001 |
| Prin2 | 0.00196 | 0.001148 | 1.71 | 0.0885 |
| Prin3 | −0.00006 | 0.000618 | −0.10 | 0.9213 |

Overall Tests of the Fixed Effects

| Effect | Num DF | Den DF | F Value | Pr > F |
|---|---|---|---|---|
| Treatment | 1 | 2 | 51.86 | 0.0187 |
| Prin1 | 1 | 387 | 261.50 | <0.0001 |
| Prin2 | 1 | 387 | 2.92 | 0.0885 |
| Prin3 | 1 | 387 | 0.01 | 0.9213 |
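The two parts of Table 5 are tied together by simple arithmetic: each t value is the estimate divided by its standard error, and for an effect with one numerator degree of freedom the F value is the square of that t value. The sketch below retraces this from the tabulated estimates; the variable names are illustrative, and small discrepancies against the table are expected because the published estimates are rounded.

```python
# (estimate, standard error) pairs transcribed from Table 5.
fixed_effects = {
    "Intercept": (0.1933, 0.006147),
    "Treatment (Irrigation)": (-0.0486, 0.006749),
    "Prin1": (-0.01723, 0.001066),
    "Prin2": (0.00196, 0.001148),
    "Prin3": (-0.00006, 0.000618),
}

# Wald-type t statistic: estimate divided by its standard error.
t_values = {name: est / se for name, (est, se) in fixed_effects.items()}

# With one numerator degree of freedom, F equals t squared.
f_values = {name: t ** 2 for name, t in t_values.items()}

for name in fixed_effects:
    print(f"{name}: t = {t_values[name]:.2f}, F = {f_values[name]:.2f}")
```

For example, the treatment contrast gives t ≈ −7.20 and F ≈ 51.86, matching both halves of the table, and the same p-value (0.0187) appears in each.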
Table 6. Genotype performance for GLMM.

Worst-Performer Genotypes for GLMM

| Genotype | Estimate | Std.Err.Pred | t Value (Df = 387) | Pr > \|t\| |
|---|---|---|---|---|
| G196 | −0.08483 | 0.01905 | −4.45 | <0.0001 |
| G48 | −0.07241 | 0.0191 | −3.79 | 0.0002 |
| G186 | −0.06368 | 0.01961 | −3.25 | 0.0013 |
| G40 | −0.06071 | 0.01977 | −3.07 | 0.0023 |
| G143 | −0.06001 | 0.01906 | −3.15 | 0.0018 |
| G187 | −0.05132 | 0.01933 | −2.66 | 0.0082 |
| G25 | −0.05027 | 0.01913 | −2.63 | 0.0089 |
| G31 | −0.04928 | 0.01972 | −2.5 | 0.0128 |
| G23 | −0.0479 | 0.01896 | −2.53 | 0.0119 |
| G5 | −0.04771 | 0.01899 | −2.51 | 0.0124 |

Best-Performer Genotypes for GLMM

| Genotype | Estimate | Std.Err.Pred | t Value (Df = 387) | Pr > \|t\| |
|---|---|---|---|---|
| G41 | 0.2424 | 0.02547 | 9.52 | <0.0001 |
| G78 | 0.1772 | 0.02469 | 7.18 | <0.0001 |
| G66 | 0.1733 | 0.02294 | 7.55 | <0.0001 |
| G136 | 0.1639 | 0.02383 | 6.88 | <0.0001 |
| G39 | 0.1588 | 0.0232 | 6.84 | <0.0001 |
| G108 | 0.1544 | 0.02399 | 6.43 | <0.0001 |
| G180 | 0.1472 | 0.02287 | 6.44 | <0.0001 |
| G85 | 0.1244 | 0.02229 | 5.58 | <0.0001 |
| G56 | 0.122 | 0.02277 | 5.36 | <0.0001 |
| G137 | 0.1161 | 0.02236 | 5.19 | <0.0001 |

Std.Err.Pred = standard error of prediction, Df = degrees of freedom.
Table 7. The performance of genotypes under irrigation.

The Worst-Performer Genotypes under Irrigation

| Genotype | Estimate | Std.Err.Pred | t Value (Df = 387) | Pr > \|t\| |
|---|---|---|---|---|
| G137 | −0.06484 | 0.02134 | −3.04 | 0.0025 |
| G156 | −0.05602 | 0.02108 | −2.66 | 0.0082 |
| G157 | −0.05325 | 0.02015 | −2.64 | 0.0085 |
| G107 | −0.0428 | 0.0196 | −2.18 | 0.0296 |
| G155 | −0.03719 | 0.02 | −1.86 | 0.0637 |
| G66 | −0.03569 | 0.02176 | −1.64 | 0.1017 |
| G62 | −0.03505 | 0.01955 | −1.79 | 0.0738 |
| G167 | −0.03412 | 0.01925 | −1.77 | 0.0771 |
| G40 | −0.0325 | 0.01915 | −1.7 | 0.0904 |
| G104 | −0.03238 | 0.01974 | −1.64 | 0.1018 |

The Best-Performer Genotypes under Irrigation

| Genotype | Estimate | Std.Err.Pred | t Value (Df = 387) | Pr > \|t\| |
|---|---|---|---|---|
| G55 | 0.0658 | 0.02136 | 3.08 | 0.0022 |
| G41 | 0.0556 | 0.02309 | 2.41 | 0.0166 |
| G78 | 0.0521 | 0.02262 | 2.3 | 0.0218 |
| G49 | 0.04216 | 0.02044 | 2.06 | 0.0398 |
| G119 | 0.03617 | 0.02127 | 1.7 | 0.0898 |
| G36 | 0.03222 | 0.021 | 1.53 | 0.1259 |
| G2 | 0.03098 | 0.01967 | 1.57 | 0.1161 |
| G108 | 0.03034 | 0.02226 | 1.36 | 0.1737 |
| G82 | 0.02641 | 0.02135 | 1.24 | 0.2168 |
| G185 | 0.02622 | 0.02097 | 1.25 | 0.2119 |
Table 8. The performance of genotypes under stress.

The Worst-Performer Genotypes under Stress

| Genotype | Estimate | Std.Err.Pred | t Value (Df = 387) | Pr > \|t\| |
|---|---|---|---|---|
| G55 | −0.04783 | 0.02113 | −2.26 | 0.0242 |
| G49 | −0.03849 | 0.0203 | −1.9 | 0.0587 |
| G8 | −0.03324 | 0.01854 | −1.79 | 0.0737 |
| G25 | −0.02966 | 0.01841 | −1.61 | 0.108 |
| G196 | −0.02804 | 0.01842 | −1.52 | 0.1287 |
| G26 | −0.02736 | 0.01933 | −1.42 | 0.1577 |
| G7 | −0.02665 | 0.01851 | −1.44 | 0.1507 |
| G23 | −0.02588 | 0.01843 | −1.4 | 0.1611 |
| G16 | −0.02532 | 0.01853 | −1.37 | 0.1726 |
| G3 | −0.02514 | 0.01874 | −1.34 | 0.1806 |

The Best-Performer Genotypes under Stress

| Genotype | Estimate | Std.Err.Pred | t Value (Df = 387) | Pr > \|t\| |
|---|---|---|---|---|
| G137 | 0.09289 | 0.02221 | 4.18 | <0.0001 |
| G66 | 0.07757 | 0.02232 | 3.48 | 0.0006 |
| G156 | 0.07572 | 0.02171 | 3.49 | 0.0005 |
| G157 | 0.05601 | 0.02067 | 2.71 | 0.007 |
| G147 | 0.04183 | 0.02137 | 1.96 | 0.051 |
| G155 | 0.03839 | 0.0204 | 1.88 | 0.0606 |
| G107 | 0.03838 | 0.02002 | 1.92 | 0.056 |
| G104 | 0.03144 | 0.02007 | 1.57 | 0.1181 |
| G136 | 0.03137 | 0.02242 | 1.4 | 0.1625 |
| G138 | 0.03123 | 0.02124 | 1.47 | 0.1422 |