Article

ML-LME: A Plant Growth Situation Analysis Model Using the Hierarchical Effect of Fractal Dimension

1
College of Sciences, Huazhong Agricultural University, Wuhan 430070, China
2
College of Engineering, Huazhong Agricultural University, Wuhan 430070, China
*
Authors to whom correspondence should be addressed.
Current address: Huazhong Agricultural University, Wuhan 430070, China.
These authors contributed equally to this work.
Mathematics 2021, 9(12), 1322; https://doi.org/10.3390/math9121322
Submission received: 7 May 2021 / Revised: 26 May 2021 / Accepted: 2 June 2021 / Published: 8 June 2021
(This article belongs to the Special Issue Fractals, Fractional Calculus and Applied Statistics)

Abstract

Rice is one of the most significant food crops and plays an essential role in agricultural production. Automated supervision of crop growth is the future direction of agriculture and a problem that urgently needs to be solved, since productive cultivation, production and research of crops all depend on increased automation of growth supervision. In this article, for the first time, we propose the concept of rice fractal dimension heterogeneity and define it as follows: rice varieties with different fractal dimension values have different correlations between their traits. To make a comprehensive prediction of rice growth, a Machine Learning and Linear Mixed Effect (ML-LME) model is proposed to model and analyze this heterogeneity; it is based on the existing automatic measurement system RAP and introduces statistical characteristics of fractal dimensions as novel features. Machine learning algorithms are applied to distinguish the rice growth stages with a high degree of accuracy and to uncover the heterogeneity of rice fractal dimensions in a statistically meaningful way. Using the growth stage and fractal dimension heterogeneity information, ML-LME obtains a precise prediction of key rice phenotypic traits through a Linear Mixed Effect model. In this process, the values of the fractal dimension are divided into groups, and the rice samples at different levels are fitted separately to improve the accuracy of the subsequent prediction; this grouping captures the heterogeneity of the fractal dimension. Afterwards, we apply the model to analyze rice pot images. The results show that the ML-LME model, which incorporates the hierarchical effect of the fractal dimension, predicts the growth situation of plants better than the traditional regression model does.
Further comparison confirmed that the proposed model is the first to consider the hierarchical structure of the plant fractal dimension, and that this consideration strengthens both the model's ability to explain variation and its prediction precision.

1. Introduction

The acquisition of crop traits is the most basic aspect of agricultural production and related scientific research. So far, research on rice has mainly focused on the physical and chemical properties of the growing conditions. Jing et al. studied the physicochemical characteristics and bacterial community structure of rice roots under different fertilizers [1], and Mohammad et al. studied the relationship between the growth cycle of rice and the distribution of weeds [2]. Under normal circumstances, crop growth is greatly affected by soil physicochemical properties, light intensity and the proportion of air components, all of which fluctuate with actual conditions. The robustness of a model will undoubtedly be affected if these indicators are used to supervise or predict rice traits, and analysis with a simple linear model may fail to extract the information comprehensively, because such factors are treated as random error terms when a simple linear model is established. The same goes for certain correlations among the rice traits, so the fitting accuracy of linear regression cannot be guaranteed. When the model is applied to the intelligent and automated management of agriculture, these factors impose limitations.
Fractal dimensions can capture more complex and comprehensive properties of an image. In this paper, given the diversity of the fractal dimensions of the sample images and their manifold distribution in space, the metrics obtained by Locally Linear Embedding (LLE) dimensionality reduction are combined with other phenotypic features as discriminative metrics for subsequent machine learning. Research on crop discrimination generally focuses on variety discrimination. For example, Zhai et al. classified crops using an improved non-linear L-ISOMAP dimensionality reduction algorithm [3], and Xue et al. compared the discriminating effects of two artificial neural network algorithms on Chinese herbal medicine, one of which was a neural network classifier combined with the fractal characteristics of Chinese herbal medicine [4].
The linear mixed effect (LME) model has come into widespread use in agricultural production, animal and plant breeding and botany research, mostly for analyzing repeated-measures data, longitudinal data, nested data and clustered data structures. Wang et al. fitted the growth parameters of rice aboveground biomass (AGB) and leaf area index (LAI) using a mixed linear model and multispectral image data from a fixed-wing unmanned aerial vehicle [5]. Li et al. used LME for the non-destructive estimation of above-ground biomass over the entire rice growing season with terrestrial laser scanning data [6]. In this study, we tried to take advantage of RAP [7] and the fractal dimensions of rice images to estimate rice growth parameters. However, when a linear regression model was used to fit rice traits, the empirical regression function showed a tendency toward random slopes and random intercepts, and the residual sequence also exhibited heteroscedasticity (Appendix B). We therefore stratified the fractal dimension and assumed that there was phenotypic heterogeneity among different fractal dimension groups of rice. A Gaussian Mixture Model (GMM) and an LME model were then introduced to analyze this heterogeneity. The model established in this paper updates the heterogeneity model of Verbeke [8], which makes the work more pertinent.
In this study, machine learning discrimination algorithms and an LME model (the ML-LME model) were applied to rice growth stage recognition and rice phenotype prediction, with fractal dimensions introduced as new features. The model linked a variety of automatically measurable rice traits to identify rice samples at three growth stages (tillering stage, booting and jointing transition period, heading and grain filling transition period) and to predict essential rice growth parameters (fresh weight, dry weight, plant height and green leaf area). It turned out that the heterogeneity of rice fractal dimensions at different growth stages can be modelled by a Gaussian Mixture Model, which is conducive to subsequent research on the classification, prediction and clustering of rice species.
More specifically, the rest of this article first explains the data set on which our model is evaluated, and then constructs the ML-LME model in detail as three separate modules: a discriminant module based on machine learning, a GMM-based fractal dimension hierarchical division module, and a rice phenotype fitting module based on LME. After that, the performance of Module 1 and Module 3 on the rice pot data set is illustrated. There, Module 1 identifies the optimal machine learning method for discriminating the rice growth stage, and Module 3 identifies the fractal dimension method that yields the most significant heterogeneity and the best predictions of rice traits. Finally, the model performance and potential contribution are discussed in detail in the Discussion and Conclusions sections.
The overall performance of the model was tested. In identifying the growth stage of rice, the model was compared with traditional machine learning methods. By exploiting the spatial manifold features among the multiple fractal dimensions of the rice data, the classification precision improved by more than 2%. The model was then used to uncover and test the hierarchical effect of the fractal dimension. When fitting the fresh weight and dry weight of plants, the intraclass correlation coefficient was adopted to verify the heterogeneity of these two traits across varieties. By combining the hierarchical effect of the fractal dimension with the mixed linear model, the four rice traits were predicted more accurately than with the simple linear regression model: the improvement in RRMSE ranged from 0.3% to 9.5%, and $R^2$ improved by 0.01 to 0.02.

2. Materials and Methods

2.1. Data Source and Description

The data in this paper are derived from measurements of the traits of pot-grown rice at Huazhong Agricultural University, obtained with Yang’s RAP (Automatic Rice Phenotyping System, 2014). RAP can measure 28 rice traits; those involved in this study are traditional plant height, plant width, plant vertical height, plant height/width, projected area of the rice plant side view, rice structural parameters, relative frequency and projected area/length by width. The abbreviations of all rice phenotypic traits can be found in the Abbreviations section of this article. The fresh weight (g), dry weight (g), plant height (cm), number of tillers and green leaf area (mm$^2$) were measured manually. The data set used in this paper contains the RAP measurements of the phenotypic traits of 521 rice varieties at three different growth stages [7].

2.2. ML-LME Model

As shown in Figure 1, we started with image pretreatment, including background denoising, binarization and grayscale conversion, before modelling. The Box-counting dimension, the Sandbox dimension and the Random Walk fractal dimension were then calculated from the pretreated images. Taking into account the diversity of the fractal dimensions and their manifold distribution in space, LLE was applied to reduce dimensionality. The indicators obtained by dimensionality reduction were then combined with other easy-to-observe rice phenotypic traits to form a new set of machine learning discriminant indicators. The performance of four machine learning algorithms (BP neural network, SVM, KNN, decision tree) was compared in discriminating the rice growth stage across all 521 varieties, and the model with the highest precision was chosen to make predictions (Module 1). With this model, the rice growth stage could be precisely assigned to the tillering stage, the transition stage of booting and jointing, or the transition stage of heading and grain filling. Next, the distribution characteristics of the fractal dimension of rice were explored and fitted with a Gaussian mixture model (Module 2). The experiment showed that prediction accuracy improved when the rice samples were classified according to the distribution of the fractal dimension, which we call the hierarchical effect of the fractal dimension. Setting the hierarchical effect of the fractal dimension as the random effect of the LME and computing the intra-group correlation coefficient showed that rice phenotypic traits differed significantly among fractal dimension groups. All of this indicated that the fractal dimension contains heterogeneity information about rice traits.
Therefore, we classified the fractal dimensions of rice after distinguishing the rice growth stages, and finally established an LME prediction model (Module 3) for each fractal dimension class and the corresponding rice phenotypic traits, which further improved the level of automation of rice supervision.

2.2.1. Module 1: Discriminant Module Based on Machine Learning

The concept of the fractal dimension was first proposed to describe complex and irregular physical characteristics [9]. The fractal dimension can reflect the space occupation and complexity of an image [10]. This paper mainly uses the fractal dimension to describe the rice image characteristics and then uses its statistical characteristics to provide more information for subsequent discrimination and prediction. There are many specific calculation methods for the fractal dimension; this article includes three fractal dimensions: the Box-counting dimension, the SandBox dimension and the Random Walk fractal dimension [9,11,12], see Appendix A for details.
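To make the idea of a box-counting dimension concrete, the following is a minimal sketch in Python, not the implementation used in this study (which was written in MATLAB; see Appendix A). It assumes a binary image stored as a list of rows of 0/1 values, a hypothetical set of box sizes, and at least one foreground pixel. The dimension is estimated as the least-squares slope of log N(s) against log(1/s), where N(s) is the number of s × s boxes containing foreground pixels.

```python
import math

def box_count(image, size):
    """Count the size x size boxes that contain at least one foreground pixel."""
    rows, cols = len(image), len(image[0])
    count = 0
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            if any(image[r][c]
                   for r in range(r0, min(r0 + size, rows))
                   for c in range(c0, min(c0 + size, cols))):
                count += 1
    return count

def box_counting_dimension(image, sizes=(1, 2, 4, 8)):
    """Least-squares slope of log N(s) versus log(1/s); box sizes are an assumption."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(image, s)) for s in sizes]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx

# Two toy images with known dimensions: a filled square (2) and a straight line (1)
filled_square = [[1] * 16 for _ in range(16)]
horizontal_line = [[1 if r == 0 else 0 for _ in range(16)] for r in range(16)]

square_dim = box_counting_dimension(filled_square)
line_dim = box_counting_dimension(horizontal_line)
```

On these toy inputs the estimate recovers the Euclidean dimensions; a plant silhouette would typically fall between 1 and 2.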
Considering the variety of fractal dimensions and traditional rice characteristic traits, we use the LLE algorithm and the Principal Component Analysis (PCA) algorithm to reduce the dimensions of these indicators. Of the two, LLE is a nonlinear dimensionality reduction method that works well for manifold data that overlap in space [13,14].
A three-dimensional scatter plot is used to inspect the manifold characteristics of the distribution of the multiple fractal dimensions of rice, and the LLE algorithm is applied in the experiments. At the same time, traditional PCA is used for comparative analysis.

2.2.2. Module 2: GMM-Based Fractal Dimension Hierarchical Division Module

Bayesian analysis models are widely used to detect heterogeneity [15,16]. Combining an LME model with a Gaussian Mixture Model to analyze heterogeneity was first realized by Geert Verbeke (1996) [8]. By setting the distribution of random effects as a mixture of multiple Gaussian distributions, Verbeke showed that the LME model could model the heterogeneity of populations. In this study, however, the rice populations are not divided in advance. GMM is therefore used to cluster the fractal dimensions so that the hierarchical structure of the rice data can be obtained. The clustering results of the fractal dimensions are set as the random effects of the LME model, which can then be used to evaluate the heterogeneity of rice and improve the efficiency of prediction. In each growth stage of rice, the fractal dimension $x$ of a rice image is assumed to be generated from multiple Gaussian populations with different means and variances, and its prior distribution is:
$$p(x; \alpha, \mu, \Sigma) = \sum_{i=1}^{N} \alpha_i \, \mathcal{N}(x; \mu_i, \sigma_i).$$
Here, $\mu$ collects the means of the Gaussian distributions, $\Sigma$ collects their variances, $\alpha$ collects the mixing proportions, and $\mathcal{N}(\cdot\,; \mu_i, \sigma_i)$ is the Gaussian density function. The EM algorithm is used to fit the rice fractal dimension data. By clustering the fractal dimension, the hierarchical effect of the rice fractal dimension is obtained, and the algorithm learns the parameters $(\alpha, \mu, \Sigma)$ of the prior distribution. For fractal dimension data both within and outside the sample, Bayesian maximum a posteriori estimation is used to infer the level of the fractal dimension. Let $\{x \in I_i\}$ denote the event that the fractal dimension $x$ belongs to the Gaussian population corresponding to the $i$-th level. By Bayes' formula:
$$p(x \in I_i \mid x; \alpha, \mu, \Sigma) = \frac{p(x \mid x \in I_i; \alpha, \mu, \Sigma)\, p(x \in I_i; \alpha, \mu, \Sigma)}{\sum_{j=1}^{N} p(x \in I_j, x; \alpha, \mu, \Sigma)} \propto p(x \mid x \in I_i; \alpha, \mu, \Sigma)\, p(x \in I_i; \alpha, \mu, \Sigma) = \alpha_i \, \mathcal{N}(x; \mu_i, \sigma_i).$$
The cluster of the fractal dimension is then the level $i$ that satisfies:
$$p(x \in I_i \mid x; \alpha, \mu, \Sigma) = \max_{1 \le j \le N} p(x \in I_j \mid x; \alpha, \mu, \Sigma),$$
where $p(x \in I_i \mid x; \alpha, \mu, \Sigma)$ denotes the posterior probability that sample $x$ belongs to cluster $I_i$, which is obtained from the GMM.
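The maximum a posteriori assignment can be illustrated with a short sketch. It assumes that the mixture parameters have already been fitted (for example by EM) and treats $\sigma_i$ as a standard deviation; the two-level parameter values below are hypothetical and are not taken from the rice data.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def posteriors(x, weights, means, stds):
    """Posterior probability of each level given x, by Bayes' formula."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)  # marginal density p(x), the normalizing constant
    return [j / total for j in joint]

def map_level(x, weights, means, stds):
    """Index of the level with the largest posterior probability."""
    post = posteriors(x, weights, means, stds)
    return max(range(len(post)), key=post.__getitem__)

# Hypothetical fitted parameters for a two-level mixture of fractal dimensions
weights = [0.5, 0.5]
means = [1.4, 1.7]
stds = [0.05, 0.05]

low_level = map_level(1.45, weights, means, stds)
high_level = map_level(1.68, weights, means, stds)
post_example = posteriors(1.45, weights, means, stds)
```

Because the normalizing constant is the same for every level, comparing $\alpha_i \, \mathcal{N}(x; \mu_i, \sigma_i)$ directly would give the same assignment; the normalized posteriors are kept here only because they are easier to interpret.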

2.2.3. Module 3: Rice Phenotype Fitting Module Based on LME

The general expression of the mixed linear model is:
$$Y = X\beta + Z\mu + \epsilon,$$
where $Y \in \mathbb{R}^{n \times 1}$ is the vector of observations, $X \in \mathbb{R}^{n \times p}$ is the design matrix of fixed effects, and $\beta \in \mathbb{R}^{p \times 1}$ is the non-random parameter vector of fixed effects. $Z \in \mathbb{R}^{n \times q}$ is the design matrix of random effects, and $\mu \in \mathbb{R}^{q \times 1}$ is the random-effect vector, usually assumed to follow a zero-mean normal distribution, that is, $\mu \sim N(0, \Lambda)$. $\epsilon \in \mathbb{R}^{n \times 1}$ is the residual vector, which satisfies $\epsilon \sim N(0, \Sigma)$. It is assumed that $E(\mu) = E(\epsilon) = 0$, $\Lambda = \sigma^2 G$, $\Sigma = \sigma^2 R$, and that $\mu$ and $\epsilon$ are mutually independent. $G$ and $R$ are both positive definite matrices, and it is usually assumed that $G = G(\lambda)$ and $R = R(\gamma)$. Therefore, the parameter vector of the model to be estimated is $\Theta = (\sigma, \gamma, \lambda)$.
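As a worked illustration of this expression, consider the simplest special case, a random intercept for each group. The index notation ($g$ for the group, $j$ for the observation within it) and the scalar variances $\sigma_\mu^2$ and $\sigma_\epsilon^2$ are assumptions introduced here for illustration. Under the independence of $\mu$ and $\epsilon$, the marginal distribution of $Y$ and the implied covariances are:

```latex
% Random-intercept special case; g indexes the group, j the observation (assumed notation)
y_{gj} = x_{gj}^{\top}\beta + \mu_g + \epsilon_{gj}, \qquad
\mu_g \sim N(0, \sigma_\mu^2), \quad \epsilon_{gj} \sim N(0, \sigma_\epsilon^2)

% Marginal distribution of the general model, using independence of \mu and \epsilon
Y \sim N\!\left(X\beta,\; Z \Lambda Z^{\top} + \Sigma\right)
  = N\!\left(X\beta,\; \sigma^2 \left(Z G Z^{\top} + R\right)\right)

% Implied variances and covariances in the random-intercept case
\operatorname{Var}(y_{gj}) = \sigma_\mu^2 + \sigma_\epsilon^2, \qquad
\operatorname{Cov}(y_{gj}, y_{gk}) = \sigma_\mu^2 \quad (j \neq k), \qquad
\operatorname{Cov}(y_{gj}, y_{hk}) = 0 \quad (g \neq h)
```

Observations that share a group are therefore correlated through the common random effect, while observations from different groups are uncorrelated.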
The essence of the mixed linear model is to further model the residuals of the linear model [17]. Regarding the selection of random effects, the linear model cannot completely extract the residual information of the fractal dimension, and the growth of rice shows a certain heterogeneity across growth stages, which indicates that rice has a hierarchical effect with respect to these two variables. Therefore, the fractal dimension and growth stage are set as random effects. The Pearson correlation coefficient is used to screen the phenotypic traits that are significantly related to the predicted traits, and these are used as fixed effects. In this module, the finally selected indicators are plant width, plant vertical height (the height of the highest point in the natural state, when the rice is not straightened), rice plant texture parameters, projected area of the rice plant side view, relative frequency and rice structural parameters. A logarithmic transformation has been applied to all indicators to eliminate the influence of measurement scale.
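The screening step can be sketched as follows. This is only an illustration under stated assumptions: the trait values are invented, the target is log-transformed along with the indicators, and a fixed cut-off on the absolute correlation (`threshold`, a hypothetical parameter) stands in for the significance test used in the study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def screen_traits(traits, target, threshold=0.5):
    """Keep traits whose log values correlate with the log target above the cut-off.

    The cut-off on |r| is an assumption; the study screens by significance instead.
    """
    log_target = [math.log(v) for v in target]
    selected = []
    for name, values in traits.items():
        log_values = [math.log(v) for v in values]
        if abs(pearson(log_values, log_target)) > threshold:
            selected.append(name)
    return selected

# Invented example values: one informative trait and one unrelated column
fresh_weight = [10.0, 20.0, 40.0, 80.0]
candidate_traits = {
    "PW": [1.0, 2.0, 4.0, 8.0],
    "noise": [5.0, 1.0, 4.0, 2.0],
}

fixed_effects = screen_traits(candidate_traits, fresh_weight)
```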
The construction, data processing and analysis of the mixed linear model are completed in the R language (version: R i386 3.6.0) through the packages lme4 and lmerTest [18,19].
Various models can be obtained from the interaction between the growth stage and the different fractal dimensions, and the optimal model is selected by jointly considering the minimum AIC and BIC criteria [18]. Intra-group correlation coefficient (ICC) analysis is used to evaluate the heterogeneity of rice traits among different groups. In the Linear Mixed Effect model, the ICC is calculated from the variance components:
$$\rho = \frac{\sigma_\mu^2}{\sigma_\mu^2 + \sigma_\epsilon^2},$$
where $\sigma_\mu^2$ is the variance of the random effects in the model and $\sigma_\epsilon^2$ is the variance of the residual. The ICC was first used to quantify and evaluate the reliability of measurements [20,21]. It generally lies between 0 and 1, with thresholds at 0.4 and 0.75: an ICC below 0.4 indicates poor reliability, while an ICC above 0.75 indicates good reliability. In this study, the ICC quantifies the heterogeneity of rice varieties between different fractal dimension levels. When the ICC exceeds 0.75, the rice varieties at different levels can be regarded as heterogeneous, while the rice varieties within the same level are clustered and show correlations among similar traits.
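The calculation and its interpretation are simple enough to sketch directly. The variance components below are hypothetical, and the label "moderate" for values between the two thresholds is an assumption made here, since the text names only the poor and good bands.

```python
def icc(var_random, var_residual):
    """Intra-group correlation: share of total variance due to the random effect."""
    return var_random / (var_random + var_residual)

def interpret_icc(value):
    """Apply the 0.4 and 0.75 thresholds; the middle label is an assumption."""
    if value < 0.4:
        return "poor"
    if value > 0.75:
        return "good"
    return "moderate"

# Hypothetical variance components from a fitted mixed model
rho = icc(var_random=4.0, var_residual=1.0)
label = interpret_icc(rho)
```

With these values the random effect accounts for four fifths of the total variance, which falls in the band read as clear heterogeneity between levels.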

3. Results

3.1. The Classification Result

3.1.1. Comparison of Classification and Discrimination Results Based on Traditional Fractal Dimension

Based on the traditional representation attributes, Module 1 combines four machine learning algorithms (BP neural network, Decision Tree, Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) [22,23,24]) with nine fractal dimension processing methods. The performance of the four machine learning algorithms combined with the different fractal dimension processing methods is detailed in Table 1.
It can be found that, on average, introducing the fractal dimension feature significantly improves the discrimination effect of the classifier. The most obvious improvement comes from the DBCG dimension: after its introduction, the overall classification precision increases by 1.51% and the Kappa index increases by 0.03. In terms of precision, the Sandbox and RFD dimensions yield increases of 1.35% and 1.44%, respectively, and the other fractal dimensions yield increases of close to 1%. The Kappa coefficient generally increases by more than 0.02.
Therefore, the introduction of fractal dimension is of positive significance for the identification of growth stage.

3.1.2. Comparison of the Classification Results of Multiple Fractal Dimensions Using the Dimensionality Reduction Method

Table 2 reports the classification effects of different classifiers on rice growth stages after combining the seven traditional representation attributes with five fractal dimensions, with the dimensions obtained after dimensionality reduction using PCA, and with the dimensions obtained after dimensionality reduction using LLE. Multiple multi-category evaluation indicators are used for measurement.
Table 2 shows that, after applying the LLE dimensionality reduction method, the classification accuracy of KNN and Decision Tree improved the most, by 1.62% and 1.15%, respectively, compared with the traditional model of T+ALL. In terms of classification accuracy, KNN and Decision Tree improved by 2.86% and 3.17%, respectively. At the same time, the BP neural network + LLE model achieves the highest classification accuracy, reaching 93.64%, and its Kappa coefficient, Micro-F1 and Macro-F1 (0.83, 0.95 and 0.95, respectively) are also the highest. Therefore, the BP+LLE model is selected as the first part of the ML-LME model to improve rice growth stage discrimination.
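For readers unfamiliar with the multi-category indicators, the sketch below computes the Kappa coefficient, Micro-F1 and Macro-F1 from a confusion matrix using their standard definitions. The 3 × 3 matrix is invented for illustration (rows are true growth stages, columns are predicted stages) and does not reproduce the values in Table 2.

```python
def cohen_kappa(cm):
    """Cohen's kappa: agreement beyond chance, from a square confusion matrix."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(k)) / n
    expected = sum(sum(cm[i]) * sum(row[i] for row in cm) for i in range(k)) / (n * n)
    return (observed - expected) / (1.0 - expected)

def micro_f1(cm):
    """Micro-F1 pools all classes; for single-label data it equals accuracy."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    tp = sum(cm[i][i] for i in range(k))
    errors = n - tp  # each error is one false positive and one false negative
    return 2 * tp / (2 * tp + 2 * errors)

def macro_f1(cm):
    """Macro-F1 is the unweighted mean of the per-class F1 scores."""
    k = len(cm)
    scores = []
    for i in range(k):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(k)) - tp
        fn = sum(cm[i]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / k

# Invented confusion matrix for three growth stages
confusion = [
    [9, 1, 0],
    [1, 8, 1],
    [0, 1, 9],
]

kappa = cohen_kappa(confusion)
micro = micro_f1(confusion)
macro = macro_f1(confusion)
```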

3.2. The Results of the LME Fitting

The interaction between the growth stage and the fractal dimension hierarchy can produce different random effect models. The optimal model is selected by jointly considering the minimum AIC and BIC criteria [25,26], and the significance of heterogeneity among rice populations is evaluated through the ICC. Table 3 contains the results of model selection. For fresh weight, the optimal division method considers the interaction between the DBCB dimension and the growth stage; its intra-group correlation coefficient is 0.802, so the heterogeneity is significant. For dry weight, the optimal division method also considers the interaction between the DBCB dimension and the growth stage; the intra-group correlation coefficient is 0.881, and the heterogeneity is extremely significant. For plant height, the optimal division method is the SFD dimension, but the intra-group correlation coefficient is less than 0.4, so there is no obvious heterogeneity. For green leaf area, the optimal division considers the interaction between the SFD dimension and the growth stage; the intra-group correlation coefficient is again less than 0.4, and no significant heterogeneity is detected.
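The two criteria have the standard forms AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of estimated parameters, n the number of observations and L the maximized likelihood. A minimal sketch of the comparison follows; the candidate names, log-likelihoods and parameter counts are all hypothetical and are not the values in Table 3.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion; lower is better."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; penalizes parameters more as n grows."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical candidates: name -> (maximized log-likelihood, number of parameters)
candidates = {
    "model_a": (-120.0, 5),
    "model_b": (-118.5, 8),
}
n_obs = 100

aic_scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
bic_scores = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}

best_by_aic = min(aic_scores, key=aic_scores.get)
best_by_bic = min(bic_scores, key=bic_scores.get)
```

When both criteria agree, as in this toy case, the choice is unambiguous; when they disagree, BIC favours the more parsimonious model.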
In terms of prediction, for each trait, the model obtained from the interaction between the SandBox dimension and the growth stage achieves the best results on both AIC and BIC. A possible reason is that the SandBox dimension takes the centroid of the image into account when calculating positions, so more emphasis is placed on the part of the image occupied by the rice plant. This model is therefore used to predict rice traits. As shown in Figure 2, predictions that consider the hierarchies of fractal dimensions improve both the accuracy and the explained variation of the observations compared with those of a simple Linear Regression model. This enhancement demonstrates that the proposed hierarchical approach models more information about the relationships among rice phenotypic traits than classical approaches.
A Shapiro-Wilk test is performed on the residuals of the linear model. The Shapiro-Wilk test checks whether a sample follows a normal distribution [27]. Table 4 shows the test results for the linear regression models predicting the four rice traits involved in this article. If the residuals of the linear model fail the Shapiro-Wilk test at the chosen significance level (p = 0.01), the assumption that the residuals are normally distributed can be rejected. This shows that the data set used in this article does not meet the assumptions required by the linear regression model, and that directly using the linear model cannot fully extract the correlation information between rice traits.
According to the AIC and BIC model selection results, model SS1, which takes the interaction between the rice growth stage and the hierarchy of the SandBox dimension as a random effect, is the optimal Linear Mixed Effect model for prediction. The Shapiro-Wilk test was also performed on the residuals of the mixed linear model. Table 5 shows the test results for the residuals of the Linear Mixed Effect model for the four rice traits predicted in this paper. The residuals of this model pass the normality test for dry weight and green leaf area, while the normality of the residuals for fresh weight and plant height is rejected. This result shows that, on the data set used in this article, the Linear Mixed Effect model is more suitable than the Linear Regression model for fitting dry weight and green leaf area.
Box plots are used to describe the distribution of the residuals of the different models more intuitively, since a box plot illustrates the dispersion (spread) and skewness of the data. Figure 3 shows the box plots of the residuals of the different models when fitting fresh weight, dry weight, plant height and green leaf area. The box plots indicate that, for fresh weight, dry weight and green leaf area, the Linear Mixed Effect model has advantages over the Linear model, mainly reflected in its relatively concentrated residual distribution. The Linear Mixed Effect model thus extracts more information about the correlation of plant traits than the Linear model. Meanwhile, the Linear Mixed Effect model built on the hierarchy of the SandBox dimension tends to have a narrower residual distribution than the other Linear Mixed Effect models, which supports the result of the AIC and BIC model selection. However, when fitting plant height, the Linear Mixed Effect model has no evident advantage.
Detailed prediction results are shown in Table 6. The mixed linear model has significant advantages for fresh weight and dry weight, but no significant advantage over linear regression for plant height and green leaf area. A possible explanation is that these two traits do not differ significantly between categories, so the mixed linear model cannot extract more information. In terms of prediction accuracy, LME also showed good results when its generalization ability was assessed by 10-fold cross-validation. For fresh weight, the RRMSE of the LME is 0.172, while that of the linear model is 0.259. For dry weight, the RRMSE of the LME is 0.169, while that of the linear model is 0.264. For green leaf area, the RRMSE of the LME is 0.135, while that of the linear model is 0.144. For plant height, the RRMSE of the LME is 0.093, while that of the linear model is 0.096. Overall, the LME showed a lower RRMSE for all four traits.
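The two accuracy measures can be sketched as below. The definition of RRMSE as the root mean square error divided by the mean of the observed values is an assumption, since the text does not spell it out, and the plain (unadjusted) $R^2$ is shown. The observed and predicted values are invented.

```python
import math

def rrmse(observed, predicted):
    """Root mean square error divided by the observed mean (assumed definition)."""
    n = len(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    return math.sqrt(mse) / (sum(observed) / n)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Invented held-out values for one trait
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 30.0]

error = rrmse(observed, predicted)
fit = r_squared(observed, predicted)
```

In a 10-fold cross-validation, these functions would be applied to the held-out fold of each split and the results averaged.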

4. Discussion

This research brings together machine learning methods and the hierarchical effect of fractal dimensions to improve the fitting of essential rice phenotypic traits and the precision of rice growth stage discrimination, and in doing so reveals the potential contribution of fractal dimension theory to rice research. It provides a reference for related research on smart agriculture and precision agriculture. Moreover, previous research on rice phenotypes only modeled the traits studied and rarely considered the rice growth stage. In this research, the interaction of the rice growth stage and the hierarchical structure of the fractal dimension was set as the random effect, whereas in previous studies the random effects of mixed linear models were often set directly to physical quantities with practical meaning, such as different individuals, different times and geographical distribution. Before the hierarchical structure is obtained, the locally linear embedding algorithm combined with machine learning models is used to identify the rice growth stage with high precision. Among these models, the BP+LLE model has the highest discriminant precision for the rice growth stage, reaching 93.64%.
At each growth stage, the Gaussian Mixture Model was used to model the distribution of the fractal dimension of rice to obtain the hierarchy of the fractal dimension. The significance of the hierarchical effect of the fractal dimension on fresh weight, dry weight and other traits was verified by the intra-class correlation coefficient of the mixed linear model. For fresh weight and dry weight, the model reveals significant heterogeneity, with optimal intra-class correlation coefficients of 0.802 and 0.811. The results suggest that the correlations between the growth patterns and traits of rice varieties may differ across fractal dimension levels. This reflects the heterogeneity of different rice varieties, which is of reference value for the study of the stratification effect in botany and the heterogeneity of rice varieties. In terms of prediction accuracy, both the modified $R^2$ and the RRMSE of the linear mixed model are better than those of the general linear regression model, which demonstrates its potential in predicting key rice traits. The four rice traits were predicted more accurately than with the simple linear regression model: the improvement in RRMSE ranged from 0.3% to 9.5%, and $R^2$ improved by 0.01 to 0.02.

5. Conclusions

In this study, we propose a rice growth situation analysis method based on fractal dimension theory, the ML-LME model. Our model achieves significant improvement in both rice growth stage discrimination and rice phenotypic trait prediction. After a detailed comparison with past references, it can be confirmed that we are the first to enhance rice trait fitting with the concept of rice fractal dimension heterogeneity. In the rice phenotype fitting module based on LME, the $R^2$ of the four rice traits on the testing sets ranged from 0.90 to 0.97, whereas in the original results of Yang et al. the $R^2$ of the four rice traits ranged from 0.82 to 0.90 [7].
The model proposed in this paper is also applicable to other crop types, which can extend the application range of the hierarchical structure of the fractal dimension in agricultural production. The traits of certain crops are nonlinearly related, whereas the fixed effects in the LME are fitted only with linear functions. In future research, we therefore consider using nonlinear functions as the fixed effects in the mixed-effects model, that is, the Generalized Linear Mixed Model (GLME) [17]. Hajjem et al. proposed the Mixed Effects Random Forest algorithm (MERF), which uses the random forest algorithm as the fixed effect to predict clustered data [28]. Fitting the fixed effects with the random forest algorithm can extend the model in this article to more complex situations, so these algorithms could be used to expand the applicable fields of the model. At the same time, because of the ambiguity in the growth stages of the original rice data, more rigorous and detailed long-term experiments could be carried out in the future to obtain more accurate experimental results.

Author Contributions

Conceptualization, X.M., J.S., L.D. and Y.W.; methodology, X.M.; software, Y.W.; validation, X.M. and Y.W.; formal analysis, X.M. and Y.L.; investigation, Y.W.; resources, X.M.; data curation, Y.W.; writing—original draft preparation, all; writing—review and editing, X.M., J.S. and L.D.; visualization, X.M. and Y.W.; supervision, J.S. and L.D.; project administration, Y.W., J.S. and L.D.; funding acquisition, J.S. and L.D. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by National strategic science and technology development fund (Grant No. G20200017074), National high-end foreign expert introduction program, Master studio of Huazhong Agricultural University, National Natural Science Foundation of China (Grant No. 31701317) and Fundamental Research Funds for the Central Universities (Grant No. 2662020GXPY007). This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Collected data are available from the authors.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
PW: plant width
PH_V: plant vertical height (rice is not straightened; the height of the highest point in the natural state)
PH/PW: plant height/width
G_g: rice plant texture parameters
SA: projected area of rice plant side view
PH: traditional plant height
LD1: structural parameters
F1: relative frequency
RFD: the dimension measured by the random walk method based on the grayscale image
SFD: the box-counting dimension based on the minimum bounding rectangle of the binary image
DBCB: box-counting dimension based on binary image
DBCG: box-counting dimension based on gray-scale image
Sandbox: fractal dimension calculated by sandbox method under binary image

Appendix A. Algorithm for Calculating Fractal Dimension

The fractal dimension is calculated in this research from both the grayscale image and the binary image of each rice plant. The grayscale images are obtained by photographing each rice plant from 12 different angles, and each binary image is converted from the corresponding grayscale image with a MATLAB 2016a program. A total of five fractal dimension calculation methods are selected: the box-counting dimension based on the binary image (DBCB), the box-counting dimension based on the gray-scale image (DBCG), the sandbox dimension based on the gray-scale image (Sandbox), the dimension measured by the random walk method based on the grayscale image (RFD), and the box-counting dimension based on the minimum bounding rectangle of the binary image (SFD).

Appendix A.1. Hausdorff Dimension

The fractal dimension is a measure of the self-similarity and irregularity of a fractal structure, reflecting the effectiveness and complexity with which the structure occupies space. When studying the fractal dimension, the Hausdorff dimension is generally used. The Hausdorff dimension, introduced by the mathematician Felix Hausdorff in 1918 [3], is defined as:
$$\dim_H(X) := \inf\{ d \ge 0 : H^d(X) = 0 \}$$
where $X \subseteq E$, $E$ is a metric space, and $H^d(X)$ is the $d$-dimensional Hausdorff measure of $X$ in the metric space $E$, which is generally defined as:
$$H^d(X) = \lim_{\epsilon \to 0} \inf\left\{ \sum_{i=1}^{\infty} |B_i|^d : B_i \subseteq E,\ |B_i| \le \epsilon \right\}$$
Here the $B_i$ are balls in the metric space $E$ that together cover $X$, and $|B_i|$ is the diameter of $B_i$. Under weak regularity conditions, the Hausdorff dimension can be calculated as the box-counting dimension $D_{BC}$, which is the most common method [6]:
$$D_{BC} = \lim_{\epsilon \to 0} \frac{\log N(\epsilon)}{\log(1/\epsilon)}$$
where $N(\epsilon)$ is the minimum number of cubes of width $\epsilon$ in $E$ required to cover the point set $X$. The definition of the box-counting dimension is simple, and an approximate algorithm for it is easy to implement on a computer. Therefore, the box-counting dimension and its improved variants are used in this paper to calculate the fractal dimension of rice images. The following subsections introduce the box-counting dimension, the box-counting dimension of the sandbox method, the box-counting dimension with the smallest enclosing rectangle, and the calculation of the random walk dimension.

Appendix A.2. Box-Counting Dimension

Calculating the box-counting dimension of an image means approximating the limit that defines the box-counting dimension (Equation (A1)). First, select a sequence of positive real numbers $r_k$ that decreases monotonically towards 0 (usually $r_k = 1/2^k$, $k \ge 0$, that is, a sequence of successive halvings). Cover the original image with a pixel grid that is similar to the original image with similarity ratio $r_k$, and count the number of grid cells $N(r_k)$ that contain part of the fractal structure. Then use the least squares method to fit a line to $\log N(r_k)$ against $\log(1/r_k)$, which gives the linear regression equation:
$$\log N(r) = \alpha \log \frac{1}{r} + \beta$$
Then the slope α of the straight line equation is the estimated value of the box-counting dimension [3].
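The procedure above can be sketched in a few lines of Python. This is only an illustrative sketch, not the MATLAB program used in this study: it assumes a square binary image given as a list of rows whose side length is a power of two, with at least one foreground pixel, and the function names `fit_line` and `box_counting_dimension` are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def box_counting_dimension(image):
    """Estimate the box-counting dimension of a square binary image."""
    size = len(image)
    log_inv_r, log_n = [], []
    box = size // 2
    while box >= 1:
        # Collect the grid cells of width `box` that contain a foreground pixel.
        occupied = set()
        for y, row in enumerate(image):
            for x, pixel in enumerate(row):
                if pixel:
                    occupied.add((x // box, y // box))
        log_inv_r.append(math.log(size / box))   # log(1 / r_k), r_k = box / size
        log_n.append(math.log(len(occupied)))    # log N(r_k)
        box //= 2
    alpha, _beta = fit_line(log_inv_r, log_n)
    return alpha


# A filled square behaves like a 2-D object, a diagonal line like a 1-D object.
filled = [[1] * 8 for _ in range(8)]
diagonal = [[1 if x == y else 0 for x in range(8)] for y in range(8)]
d_filled = box_counting_dimension(filled)
d_diagonal = box_counting_dimension(diagonal)
```

The two toy images serve as a sanity check: the estimated slope should be close to 2 for the filled square and close to 1 for the diagonal line.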

Appendix A.3. Sandbox Dimension

To calculate the fractal dimension with the SandBox method, first select a sequence of coverage areas (boxes or circles) of increasing size $r_k$ and count the number of image pixels $N(r_k)$ inside each area. A linear fit then gives the regression equation of $\log N(r)$ on $\log r$; the coefficient of the $\log r$ term is the estimated fractal dimension [5,7].
The SandBox method allows several ways to position the coverage areas. In this article, the centroid of the original image is used as the center of all coverage areas, which simplifies the calculation and yields a higher correlation between $\log N(r)$ and $\log r$ [4].
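A minimal Python sketch of the centroid-centered SandBox estimate follows. It rests on several assumptions that are not taken from this study: the image is binary and given as a list of rows, the coverage areas are squares with half-width `r`, and the centroid is taken over the foreground pixels. The name `sandbox_dimension` and the chosen radii are hypothetical.

```python
import math


def sandbox_dimension(image, radii):
    """Estimate the sandbox dimension with square windows centered on the centroid."""
    pixels = [(x, y) for y, row in enumerate(image)
              for x, value in enumerate(row) if value]
    cx = sum(x for x, _ in pixels) / len(pixels)
    cy = sum(y for _, y in pixels) / len(pixels)
    log_r, log_n = [], []
    for r in radii:
        # N(r): foreground pixels inside the square of half-width r.
        count = sum(1 for x, y in pixels
                    if abs(x - cx) <= r and abs(y - cy) <= r)
        log_r.append(math.log(r))
        log_n.append(math.log(count))
    # Least-squares slope of log N(r) against log r.
    n = len(log_r)
    mx, my = sum(log_r) / n, sum(log_n) / n
    sxx = sum((a - mx) ** 2 for a in log_r)
    sxy = sum((a - mx) * (b - my) for a, b in zip(log_r, log_n))
    return sxy / sxx


filled = [[1] * 33 for _ in range(33)]
line = [[1 if y == 16 else 0 for _ in range(33)] for y in range(33)]
d_filled = sandbox_dimension(filled, [2, 4, 8, 16])
d_line = sandbox_dimension(line, [2, 4, 8, 16])
```

On such small images the estimates fall somewhat below the ideal values of 2 and 1, because a window of half-width `r` contains `2r + 1` pixels per side rather than `2r`; the bias shrinks as the radii grow.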

Appendix A.4. Random Walk Dimension

The random walk method can be used to calculate the fractal dimension of a grayscale image. The basic idea is that the fractal structure in the grayscale image is generated by a random walk, so it can be modeled by fractional Brownian motion. The calculation of the random walk dimension is based on the following formulas:
$$E(|\Delta I^2|) = c\,(\Delta r)^{6-2D}$$
$$E(|\Delta I^2|) = \frac{\sum_{x=0}^{M-1}\sum_{y=0}^{M-r-1} |I(x,y) - I(x,y+r)| + \sum_{y=0}^{M-1}\sum_{x=0}^{M-r-1} |I(x,y) - I(x+r,y)|}{2M(M-r-1)}$$
Here $D$ is the fractal dimension of the image, $c$ is a fixed constant, $E$ is the expectation, $\Delta I = I(x_2,y_2) - I(x_1,y_1)$ is the difference in gray value between two points, and $\Delta r = \|(x_2,y_2) - (x_1,y_1)\|$ is the distance between the two points. Taking the logarithm of the first formula gives $\log E(|\Delta I^2|) = \log c + (3-D)\log \Delta r^2$. For an image of size $M \times M$, first select the largest pixel scale $s$; then, for each pixel distance $r = 1, 2, \ldots, s$, calculate the expected value of $\Delta I^2$ over all pairs of image points separated by a pixel distance of $r$. The least squares method then gives the regression equation of $\log E(|\Delta I^2|)$ on $\log \Delta r^2$, whose slope is $3 - D$ [8].
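The regression step can be illustrated with a short Python sketch. It makes two assumptions that should be checked against the original implementation: the expectation is estimated as the mean squared gray-level difference of horizontal and vertical pixel pairs at distance `r`, and it is normalized by the actual number of pairs rather than by the denominator written in the formula above. The function name `random_walk_dimension` is hypothetical.

```python
import math


def random_walk_dimension(image, max_scale):
    """Estimate D from the slope of log E(dI^2) against log r^2 (slope = 3 - D)."""
    m = len(image)
    log_r2, log_e = [], []
    for r in range(1, max_scale + 1):
        total, pairs = 0.0, 0
        for a in range(m):
            for b in range(m - r):
                total += (image[a][b] - image[a][b + r]) ** 2   # pairs along a row
                total += (image[b][a] - image[b + r][a]) ** 2   # pairs along a column
                pairs += 2
        log_r2.append(math.log(r * r))
        log_e.append(math.log(total / pairs))
    # Least-squares slope of log E against log r^2.
    n = len(log_r2)
    mx, my = sum(log_r2) / n, sum(log_e) / n
    sxx = sum((x - mx) ** 2 for x in log_r2)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_r2, log_e))
    return 3.0 - sxy / sxx


# A linear gray-level ramp is a smooth surface, so D should be 2.
ramp = [[x + y for x in range(8)] for y in range(8)]
d_ramp = random_walk_dimension(ramp, 4)
```

A constant image has zero gray-level differences, so the logarithm is undefined and the sketch would fail; real images would need a guard for that case.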

Appendix B. The Hierarchical Effect of Fractal Dimension

First, a linear regression model is used to evaluate the potential of the fractal dimension for predicting rice plant traits. For the three growth stages (Late Tillering Stage, Late Booting Stage and Milk Grain Stage), the double logarithmic model
$$\log Y_i = \beta_{ij0} + \beta_{ij1} \log X_j + \epsilon_{ij}$$
is fitted to evaluate how well each fractal dimension $X_j$ predicts each of the four phenotypic traits $Y_i$ (fresh weight (g), dry weight (g), plant height (cm) and green leaf area (mm$^2$)). The four fractal dimensions chosen in this section are the SFD, DBCG, DBCB and SandBox dimensions.
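For one trait and one fractal dimension, fitting the double logarithmic model reduces to ordinary least squares on log-transformed values. The sketch below illustrates this under the assumption that all values are positive; the function name `fit_double_log` and the synthetic data are hypothetical and do not come from the rice data set.

```python
import math


def fit_double_log(x, y):
    """Fit log(y) = b0 + b1 * log(x) by ordinary least squares."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
          / sum((a - mx) ** 2 for a in lx))
    b0 = my - b1 * mx
    return b0, b1


# Synthetic example: trait = 2 * dimension^3, so b0 = log(2) and b1 = 3.
dimension = [1.2, 1.4, 1.6, 1.8]
trait = [2.0 * d ** 3 for d in dimension]
b0, b1 = fit_double_log(dimension, trait)
```

The slope `b1` is the elasticity of the trait with respect to the fractal dimension, which is why the double logarithmic form is convenient for comparing fractal dimensions measured on different scales.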
The double logarithmic fits in Figure A1 show the hierarchical effect of the fractal dimension in the distribution of the sample points. Specifically, for the SFD and DBCB dimensions, as well as for DBCG and DBCB at the Late Tillering Stage, the sample points with a low fractal dimension are sparse, so the empirical regression function is influenced mainly by the sample points with a high dimension and its predictive ability is limited. For the DBCG dimension, the sample points deviate markedly from the empirical regression function in the low fractal dimension range, which suggests considering a random slope and a random intercept in the regression equation. For the SandBox dimension, the residuals in the low-dimension range are clearly larger than those in the high-dimension range, which indicates heteroscedasticity.
Figure A1. Fractal dimension fits the effect of plant fresh weight. (a) Late tillering stage. (b) Late booting stage. (c) Milk grain stage. (d) Late tillering stage. (e) Late booting stage. (f) Milk grain stage. (g) Late tillering stage. (h) Late booting stage. (i) Milk grain stage. (j) Late tillering stage. (k) Late booting stage. (l) Milk grain stage.

Appendix C

Figure A2 presents scatter plots used to explore the manifold characteristics of the fractal dimensions with the LLE algorithm.
Figure A2. Scatter plots to explore the manifold characteristics using the LLE algorithm. The round points represent the tillering stage, the cross-shaped points represent the booting and jointing transition period, and the triangular points represent the heading and grain filling transition period. (a) Fractal dimension scatter plot. (b) LLE dimensionality reduction results.

References

  1. Jing, Y. Effects of different fertilizers on methane emissions and methanogenic community structures in paddy rhizosphere soil. Sci. Total Environ. 2018, 627, 770–781.
  2. Khanghahi, M.Y.; Pirdashti, H.; Mohseni-Moghadam, M.; Roham, R. Vertical Distribution of Nutsedge (Cyperus spp. L.) and Bahiagrass (Paspalum notatum L.) Seed Bank in Rice Growth Cycle. Acta Univ. Agric. Silvic. Mendel. Brun. 2019, 67, 787–795.
  3. Yongguang, Z.; Xiaohang, M. Automatic Crop Classification in Northeastern China by Improved Nonlinear Dimensionality Reduction for Satellite Image Time Series. Remote Sens. 2020, 12, 2726.
  4. Xue, J.; Fuentes, S.; Poblete-Echeverria, C.; Viejo, C.G.; Tongson, E.; Du, H.; Su, B. Automated Chinese medicinal plants classification based on machine learning using leaf morpho-colorimetry, fractal dimension and visible/near infrared spectroscopy. Int. J. Agric. Biol. Eng. 2019, 12, 123–131.
  5. Wang, Y. Estimation of Rice Growth Parameters Based on Linear Mixed-Effect Model Using Multispectral Images from Fixed-Wing Unmanned Aerial Vehicles. Remote Sens. 2019, 11, 1371.
  6. Li, P.; Zhang, X. Estimating aboveground and organ biomass of plant canopies across the entire season of rice growth with terrestrial laser scanning. Int. J. Appl. Earth Obs. 2020, 91, 102132.
  7. Yang, W.; Guo, Z.; Huang, C.; Duan, L.; Chen, G.; Jiang, N.; Fang, W.; Feng, H.; Xie, W.; Lian, X.; et al. Combining high-throughput phenotyping and genome-wide association studies to reveal natural genetic variation in rice. Nat. Commun. 2014, 5, 1–9.
  8. Verbeke, G.; Lesaffre, E. A linear mixed-effects model with heterogeneity in the random-effects population. J. Am. Stat. Assoc. 1996, 91, 217–221.
  9. Mandelbrot, B.B. The Fractal Geometry of Nature; WH Freeman: New York, NY, USA, 1982; pp. 2–13.
  10. Le Méhauté, A. Fractal Geometries Theory and Applications; CRC Press: Boca Raton, FL, USA, 1991; pp. 2–14.
  11. Gagnepain, J.J.; Roques-Carmes, C. Fractal approach to two-dimensional and three-dimensional surface roughness. WEAR 1986, 109, 119–126.
  12. Gazit, Y. Fractal Vasculature and Vascular Network Growth Modeling in Normal and Tumor Tissue. Ph.D. Thesis, Massachusetts Institute of Technology, Whitaker College of Health Sciences and Technology, Cambridge, MA, USA, 1996.
  13. Roweis, S.T.; Saul, L.K. Nonlinear dimensionality reduction by locally linear embedding. Science 2000, 290, 2323–2326.
  14. He, P.; Chang, X.; Xu, X.; Zhang, Z.; Jing, T.; Lou, Y. Discriminative locally linear mapping for medical diagnosis. Multimed. Tools Appl. 2020, 79, 14573–14591.
  15. Rezapour, M.; Ksaibati, K. Accommodating Taste and Scale Heterogeneity for Front-Seat Passenger’ Choice of Seat Belt Usage. Mathematics 2021, 9, 460.
  16. Młynarczyk, D.; Armero, C.; Gómez-Rubio, V.; Puig, P. Bayesian Analysis of Population Health Data. Mathematics 2021, 9, 577.
  17. Gelman, A.; Hill, J. Data Analysis Using Regression and Multilevel/Hierarchical Models (Analytical Methods for Social Research), 1st ed.; Cambridge University Press: Cambridge, UK, 2006; pp. 237–279.
  18. Douglas, B.; Martin, M.; Ben, B.; Steve, W. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 2015, 67, 1–48.
  19. Kuznetsova, A.; Brockhoff, P.B.; Christensen, R.H. lmerTest package: Tests in linear mixed effects models. J. Stat. Softw. 2017, 82, 1–26.
  20. Shrout, P.E.; Fleiss, J.L. Intraclass correlations: Uses in assessing rater reliability. Psychol. Bull. 1979, 86, 420.
  21. McGraw, K.O.; Wong, S.P. Forming inferences about some intraclass correlation coefficients. Psychol. Methods 1996, 1, 30–46.
  22. Mitchell, T.M. Machine Learning; McGraw-Hill Education: New York, NY, USA, 1997; pp. 21–376.
  23. Yong, C.; Shanguang, B. Studies on Learning Algorithms for BP Net. J. Basic Sci. Eng. 1995, 4, 110–115.
  24. Hssina, B.; Merbouha, A.; Ezzikouri, H. A comparative study of decision tree ID3 and C4.5. Int. J. Adv. Comput. Sci. Appl. 2014, 4, 13–19.
  25. Dalei, Y.; Kelvin, Y. Conditional Akaike information criterion for generalized linear mixed models. Comput. Stat. Data Anal. 2012, 56, 629–644.
  26. Burnham, K.P. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, 2nd ed.; Springer: New York, NY, USA, 2002; pp. 98–148.
  27. Shapiro, S.; Wilk, M. An Analysis of Variance Test for Normality (Complete Samples). Biometrika 1965, 52, 591–611.
  28. Ahlem, H.; François, B.; Denis, L. Mixed-effects random forest for clustered data. J. Stat. Comput. Simul. 2014, 84, 1313–1328.
Figure 1. The framework of the ML-LME model.
Figure 2. Observations vs. Estimations using Linear Regression model and LME model. (a) Linear model predict rice fresh weight; (b) LME model predict rice fresh weight; (c) Linear model predict rice dry weight; (d) LME model predict rice dry weight.
Figure 3. Box plot of model residuals fitting rice phenotypic traits. (a) Models fitting fresh weights. (b) Models fitting dry weights. (c) Models fitting plant height. (d) Models fitting green leaf area.
Table 1. Comparison of classification effects based on traditional attributes added with fractal dimension.
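| Attributes | Precision (BP) | Precision (Tree) | Precision (SVM) | Precision (KNN) | Precision (Avg) | Kappa (BP) | Kappa (Tree) | Kappa (SVM) | Kappa (KNN) | Kappa (Avg) |
|---|---|---|---|---|---|---|---|---|---|---|
| T | 91.51% | 90.30% | 90.48% | 90.50% | 90.69% | 0.76 | 0.73 | 0.75 | 0.74 | 0.75 |
| T+RFD | 92.89% | 91.61% | 91.76% | 92.26% | 92.13% | 0.80 | 0.76 | 0.79 | 0.80 | 0.79 |
| T+SFD | 92.47% | 90.52% | 91.93% | 91.71% | 91.66% | 0.79 | 0.74 | 0.77 | 0.78 | 0.77 |
| T+DBCB | 92.95% | 90.26% | 91.87% | 91.22% | 91.57% | 0.80 | 0.72 | 0.78 | 0.76 | 0.77 |
| T+DBCG | 92.91% | 91.48% | 92.71% | 91.71% | 92.20% | 0.80 | 0.76 | 0.79 | 0.77 | 0.78 |
| T+Sandbox | 92.76% | 91.18% | 92.24% | 91.9% | 92.04% | 0.80 | 0.76 | 0.78 | 0.78 | 0.78 |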
Note: T in the table represents the set of traditional attributes, which includes plant width (PW), plant vertical height (PH_V), plant vertical height/width (PH_V/PW), traditional plant height (PH), plant height/width (PH/PW), side view projected area of the rice plant (SA), and projected area/(length by width) (SA/PH_V*PW).
Table 2. Compare the multi-classification effect after dimensionality reduction.
| Classifier | Features | Accuracy | Micro-F1 | Macro-F1 | Kappa | Precision |
|---|---|---|---|---|---|---|
| SVM Model | T+ALL | 86.95% | 0.94 | 0.94 | 0.80 | 92.83% |
| SVM Model | T+LLE | 87.93% | 0.95 | 0.95 | 0.81 | 92.81% |
| SVM Model | T+PCA | 85.61% | 0.94 | 0.94 | 0.78 | 91.86% |
| Tree Model | T+ALL | 81.77% | 0.92 | 0.92 | 0.71 | 89.88% |
| Tree Model | T+LLE | 84.63% | 0.93 | 0.93 | 0.76 | 91.23% |
| Tree Model | T+PCA | 83.12% | 0.92 | 0.92 | 0.74 | 91.51% |
| BP Neural Network | T+ALL | 89.18% | 0.95 | 0.95 | 0.83 | 93.42% |
| BP Neural Network | T+LLE | 89.15% | 0.95 | 0.95 | 0.83 | 93.64% |
| BP Neural Network | T+PCA | 86.30% | 0.94 | 0.94 | 0.79 | 92.59% |
| KNN Model | T+ALL | 85.00% | 0.93 | 0.93 | 0.77 | 90.81% |
| KNN Model | T+LLE | 88.17% | 0.95 | 0.95 | 0.82 | 92.43% |
| KNN Model | T+PCA | 84.89% | 0.93 | 0.93 | 0.77 | 91.24% |
Note: T in the table represents the set of traditional attributes, which includes plant width (PW), plant vertical height (PH_V), plant vertical height/width (PH_V/PW), traditional plant height (PH), plant height/width (PH/PW), side view projected area of the rice plant (SA), and projected area/(length by width) (SA/PH_V*PW). ALL represents the set of five fractal dimensions: DBCB, DBCG, Sandbox, SFD and RFD. LLE represents the two low-dimensional features obtained from the five fractal dimensions with the LLE nonlinear dimensionality reduction method. PCA represents the two low-dimensional features obtained from the five fractal dimensions with the PCA linear dimensionality reduction method.
Table 3. The fitting result of the mixed linear model.
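| Phenotype Traits | Model | AIC | BIC | ICC |
|---|---|---|---|---|
| fresh weight | SR | 1077.8 | 1027.8 | 0.780 |
| fresh weight | SS1 | 1223.5 | 1173.5 | 0.709 |
| fresh weight | SS2 | 1139.1 | 1089.1 | 0.787 |
| fresh weight | SD1 | 1144.2 | 1094.2 | 0.738 |
| fresh weight | SD2 | 1129.0 | 1079.1 | 0.802 |
| dry weight | SR | 623.26 | 573.28 | 0.872 |
| dry weight | SS1 | 738.97 | 689.00 | 0.848 |
| dry weight | SS2 | 674.49 | 624.51 | 0.871 |
| dry weight | SD1 | 682.29 | 632.31 | 0.857 |
| dry weight | SD2 | 643.42 | 593.44 | 0.882 |
| plant height | SR | 1975.7 | 1925.7 | 0.094 |
| plant height | SS1 | 2064.8 | 2014.8 | 0.139 |
| plant height | SS2 | 1993.9 | 1944 | 0.240 |
| plant height | SD1 | 1959.6 | 1909.6 | 0.178 |
| plant height | SD2 | 1975.8 | 1925.8 | 0.239 |
| green leaf area | SR | 772.50 | 722.53 | 0.148 |
| green leaf area | SS1 | 1258.8 | 1208.9 | 0.101 |
| green leaf area | SS2 | 816.06 | 766.08 | 0.250 |
| green leaf area | SD1 | 865.15 | 815.17 | 0.136 |
| green leaf area | SD2 | 843.1 | 793.13 | 0.101 |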
Note: SR: LME model that takes the interaction between the growth stage and the fractal dimension calculated by the random walk method as a random effect; SS1: LME model that takes the interaction between the growth stage and the fractal dimension calculated by the SandBox method as a random effect; SS2: LME model that takes the interaction between the growth stage and the fractal dimension calculated by the SFD method as a random effect; SD1: LME model that takes the interaction between the growth stage and the fractal dimension calculated by the DBCB method as a random effect; SD2: LME model that takes the interaction between the growth stage and the fractal dimension calculated by the DBCG method as a random effect.
Table 4. Linear regression model residual normality test.
| | Fresh Weight | Dry Weight | Plant Height | Green Leaf Area |
|---|---|---|---|---|
| Statistic W | 0.995 | 0.996 | 0.972 | 0.995 |
| p-value | 0.002 * | 0.007 * | $2.258 \times 10^{-13}$ * | 0.003 * |

Note: a p-value marked with * means the statistic is statistically significant at the significance level α = 0.01.
Table 5. Linear Mixed Effect model residual normality test.
| | Fresh Weight | Dry Weight | Plant Height | Green Leaf Area |
|---|---|---|---|---|
| Statistic W | 0.996 | 0.997 | 0.981 | 0.997 |
| p-value | 0.01 * | 0.165 | $3.227 \times 10^{-10}$ * | 0.085 |

Note: a p-value marked with * means the statistic is statistically significant at the significance level α = 0.01.
Table 6. Adjusted $R^2$ of the different models.

| Models | Fresh Weight | Dry Weight | Plant Height | Green Leaf Area |
|---|---|---|---|---|
| Whole stage | 0.95 | 0.92 | 0.89 | 0.94 |
| Stage 1 | 0.92 | 0.92 | 0.72 | 0.92 |
| Stage 2 | 0.86 | 0.74 | 0.69 | 0.89 |
| Stage 3 | 0.90 | 0.84 | 0.87 | 0.83 |
| LME | 0.97 | 0.97 | 0.90 | 0.96 |

Note: Whole stage: Linear Regression model performance over all three stages; Stage 1: Linear Regression model performance on the rice tillering stage only; Stage 2: Linear Regression model performance on the transition stage of booting and jointing only; Stage 3: Linear Regression model performance on the transition stage of heading and grain filling only; LME: Linear Mixed Effect model performance over all three stages.
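For readers who want to reproduce this kind of comparison, the sketch below computes the adjusted $R^2$ and the relative RMSE with their common textbook definitions. These definitions are an assumption here, because the exact formulas used for the tables are given elsewhere in the article; the function names and the toy values are hypothetical.

```python
import math


def adjusted_r2(observed, predicted, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


def rrmse(observed, predicted):
    """Relative RMSE: root mean squared error divided by the observed mean."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / (sum(observed) / n)


# Toy values only; they are not taken from the rice data set.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.1, 1.9, 3.2, 3.8, 5.0]
adj = adjusted_r2(obs, pred, n_predictors=1)
rel = rrmse(obs, pred)
```

Multiplying `rel` by 100 expresses the relative RMSE as a percentage.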
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Ma, X.; Wu, Y.; Shen, J.; Duan, L.; Liu, Y. ML-LME: A Plant Growth Situation Analysis Model Using the Hierarchical Effect of Fractal Dimension. Mathematics 2021, 9, 1322. https://doi.org/10.3390/math9121322
