Article

Artificial Neural Network Based Apple Yield Prediction Using Morphological Characters

1
Division of Sample Surveys, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India
2
Department of Basic Sciences, Dr. YSP University of Horticulture and Forestry, Nauni-Solan 173230, India
*
Author to whom correspondence should be addressed.
Horticulturae 2023, 9(4), 436; https://doi.org/10.3390/horticulturae9040436
Submission received: 15 September 2022 / Revised: 9 October 2022 / Accepted: 14 October 2022 / Published: 28 March 2023
(This article belongs to the Special Issue Advanced Studies in Cultivation and Breeding of Apple)

Abstract

Crop yield is a complex function of a number of interrelated traits, which makes yield prediction a statistically difficult task. A number of studies on yield prediction using morphological characters already exist in the literature. Most of them used statistical techniques such as linear regression and crop yield models, which assume a linear relationship between yield and the morphological traits; in practice, such a linear relationship is seldom observed. With recent advances in machine learning, these techniques provide a viable alternative for dealing with nonlinear relationships in yield prediction. Globally, apples are among the most widely consumed fruits. In this paper, an attempt was made to predict the yield of the apple crop using morphological traits. PCA was used to select the significant variables. These variables were then used as input variables in ANN models with different numbers of hidden layers for predicting crop yield. The predictive performance of the models was evaluated using standard statistical measures. A sensitivity analysis was performed to determine the individual effect of each character on apple yield. The study contributes to a better understanding of the complex relationships between crop yield and morphological traits.

1. Introduction

Apple (Malus domestica) is commercially cultivated in Himachal Pradesh, Jammu and Kashmir, and Uttarakhand. These states collectively account for 99.43% of the total apple production (2,734,000 tonnes) in India [1]. Apple yield is a complex variable that depends, directly or indirectly, on a number of factors, including vegetative, flowering, and fruiting characteristics. Identifying a single variable representative of yield may not be reliable, so researchers are faced with examining many related variables separately [2]. Performing a series of univariate statistical analyses, one for each variable, does not hold much promise, because it overlooks the correlation among the variables and the conclusions may occasionally be misleading. Instead, statistical methods that consider the interdependence and relative importance of the several contributing factors may be more informative. Therefore, morphological characters can be considered together for yield prediction using machine-learning techniques [3,4,5].
Yield prediction is crucial for proper crop management and for planning strategies for the efficient marketing of fruits. The earlier the prediction, the more effectively it can be applied to improve marketing strategies. A wide range of techniques, i.e., statistical tools, crop models, and algorithms, have been developed and applied for the prediction of yield in agriculture. Correlation and multiple regression analysis are the most frequently used techniques for predicting yield and for identifying the important variables that affect crop yield [6,7,8,9]. However, the results have not been particularly encouraging, because polynomial and interaction terms were not taken into account [10]. Moreover, the assumption of linear relationships between crop yield and explanatory variables is rarely met in reality, and when these relationships are not linear, the results may be misleading [10,11,12]. Principal component and factor analyses [7,13,14] can be used to lessen the problems caused by interdependent variables, make complex relationships easier to understand, and decrease the dimensionality of the dataset by selecting the most appropriate subset of variables that significantly affect the response variable [7]. For capturing the nonlinearity and complicated interactions between variables, machine learning approaches such as artificial neural networks (ANNs) are emerging as an alternative to conventional linear models. An ANN is a nonlinear, data-driven method that follows a self-adaptive learning approach [15]. By analyzing a large number of input and output instances, ANNs learn relationships that can then be used for prediction. Developing a model with an ANN does not require any prior knowledge of the relationship between the inputs and outputs. ANNs have also been reported to outperform linear models because they are better able to determine the optimal pattern of variables and give lower error [16].
As a result of these benefits, ANNs are widely used in a variety of fields, including hydrology and agriculture [3,12,13,17,18,19,20].
The majority of modelling research on crop yield prediction assumes a linear relationship between yield and its contributing characters, and is thus centered on linear regression, step-wise regression, path analysis, principal component analysis (PCA), factor analysis, etc. These techniques can decrease the number of variables, but they are not sufficient to capture the highly nonlinear and complicated relationships between yield and other characteristics. Consequently, artificial neural networks (ANNs) are appropriate when the variables under consideration have complex and nonlinear relationships. Correlated inputs, however, confuse the neural network during learning, which is a key factor affecting its performance. In addition, an ANN model with a large number of input variables may generalize poorly [21,22]. Principal component analysis and ANN can be used together to address these problems. The present study was conducted with the aim of developing PCA-based ANN models and evaluating their performance in predicting apple yield using morphological characters as the input variables. PCA was employed for the selection of variables (feature selection), which were then used as input variables in ANN models for the prediction of yield. The results of the study will help to identify and model the complex relationship between apple yield and its related morphological characters.

2. Materials and Methods

2.1. Study Area and Data Description

The study was conducted in a commercial farmer’s apple orchard located at an elevation of 1901 m in Jubbal, Shimla, Himachal Pradesh (31°10′ N, 77°66′ E) during 2014–2015. The region is generally cool throughout the year, with temperatures ranging from 15 to 25 °C during the summer and falling below 0 °C during the winter. A representative sample of trees was selected from the experimental orchard, and four branches from each tree, in four directions as per the prevailing practice, were selected for recording observations on various morphological characters of the variety Royal Delicious, i.e., plant height, canopy spread, plant girth, flower density (FD), flower density index (FDI), flowering intensity (FI), fruit set (FS), crop density (CD), and length diameter ratio (LD ratio). Data on vegetative characteristics were collected during April. Data on flowering and fruiting characteristics were collected during May, July, and August. The yield of apple was recorded by collecting the fruit manually from each tree. Summary statistics of each character are presented in Table 1.
The selection of appropriate input variables is crucial for the development of MLR and ANN models [23,24]. Although many studies have employed simple correlation as an input selection approach [17,25], this method is unable to reveal the direct or indirect effects between variables. Furthermore, it also reduces the probability of having a unique solution [26]. Principal component analysis is a data reduction technique that can be used to select the most important uncorrelated variables as the input variables.
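As a rough illustration of PCA-based input screening, the following Python sketch (the study itself was carried out in R) runs a PCA on the correlation matrix of a synthetic data matrix and keeps the variables with a large absolute loading on either of the first two components. The synthetic data, the function name `pca_select`, and the 0.5 loading threshold are assumptions made purely for illustration; the paper does not report the loading cut-off it used.

```python
import numpy as np

def pca_select(X, names, n_components=2, threshold=0.5):
    """Correlation-matrix PCA; keep variables with a large loading on any retained PC."""
    corr = np.corrcoef(X, rowvar=False)          # equivalent to PCA on standardized variables
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()          # share of total variation per component
    # Loading = eigenvector scaled by sqrt(eigenvalue), i.e. the correlation
    # between an original variable and a principal component.
    loadings = eigvecs[:, :n_components] * np.sqrt(eigvals[:n_components])
    keep = np.any(np.abs(loadings) >= threshold, axis=1)
    selected = [name for name, k in zip(names, keep) if k]
    return explained[:n_components], selected

# Synthetic data (not the study data): two latent factors plus one unrelated variable.
rng = np.random.default_rng(0)
n = 200
size = rng.normal(size=n)     # hypothetical latent "tree size" factor
bloom = rng.normal(size=n)    # hypothetical latent "flowering" factor
X = np.column_stack([
    size + 0.3 * rng.normal(size=n),    # plant height
    size + 0.3 * rng.normal(size=n),    # canopy spread
    bloom + 0.3 * rng.normal(size=n),   # FD
    bloom + 0.3 * rng.normal(size=n),   # FDI
    rng.normal(size=n),                 # unrelated variable
])
names = ["plant_height", "canopy_spread", "FD", "FDI", "unrelated"]
explained, selected = pca_select(X, names)
print("variation explained by PC1 + PC2:", explained.sum())
print("selected inputs:", selected)
```

Screening on loadings rather than on pairwise correlations with yield means that a variable is retained for its contribution to the main axes of variation among the characters, which is the motivation given above for preferring PCA.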

2.2. Development of Artificial Neural Network Model

An artificial neural network (ANN) is a machine learning model based on a data-driven, nonlinear, adaptive learning method [15]. An ANN can capture complex data patterns that are difficult to model with either traditional model-based approaches or knowledge-based expert systems [27]. A typical ANN model consists of three main layers, i.e., an input layer, a hidden layer, and an output layer (Figure 1). These layers contain simple processing units known as neurons or nodes. The nodes are interconnected through weighted connections, which vary according to the specified architecture of the required ANN model [27]. The number of hidden layers and the number of nodes in each depend on the specific problem under study. Several studies [16,28,29] have suggested that trial and error is the most common method for finding the optimum number of hidden layers and nodes. Further details of the ANN model and its application are given in Haykin (2008) [30]. The output of the ANN model can be expressed by the following equation (Equation (1)) [18]:
$$ y_t = \alpha_0 + \sum_{j=1}^{n} \alpha_j \, f\!\left( \sum_{i=1}^{m} \beta_{ij} \, y_{t-i} + \beta_{0j} \right) + \varepsilon_t \tag{1} $$
where $y_t$ is the output of the neural network model (yield per plant), $n$ is the number of hidden nodes, $m$ is the number of input nodes, $f$ is the activation function, $\beta_{ij}$ ($i = 1, 2, \ldots, m$; $j = 0, 1, \ldots, n$) are the weights from the input to the hidden nodes, $\alpha_j$ ($j = 0, 1, \ldots, n$) are the weights from the hidden to the output nodes, $\alpha_0$ and $\beta_{0j}$ are the weights of the arcs leading from the bias terms, and $\varepsilon_t$ is the error term. The activation function is a differentiable function that smooths the weighted sum of the covariates or neurons feeding into a node. In other words, the activation function of a node defines the output of that node given an input or set of inputs.
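To make the structure of Equation (1) concrete, the sketch below evaluates the deterministic part of a single-hidden-layer feed-forward network with a logistic activation in Python. The function names, the random weights, and the input vector are hypothetical and chosen only for illustration; they are not the fitted weights of the study, and the error term is omitted.

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) activation function."""
    return 1.0 / (1.0 + np.exp(-z))

def ann_output(x, beta, beta0, alpha, alpha0):
    """Output of a single-hidden-layer network.

    x: inputs, shape (m,); beta: input-to-hidden weights, shape (m, n);
    beta0: hidden biases, shape (n,); alpha: hidden-to-output weights, shape (n,);
    alpha0: output bias (scalar).
    """
    hidden = logistic(x @ beta + beta0)   # inner sum over the m inputs, then activation
    return alpha0 + hidden @ alpha        # outer sum over the n hidden nodes

# Hypothetical network with m = 7 inputs and n = 5 hidden nodes, random weights.
rng = np.random.default_rng(1)
x = rng.normal(size=7)
y_rand = ann_output(x, rng.normal(size=(7, 5)), rng.normal(size=5),
                    rng.normal(size=5), 0.0)
print("output with random weights:", y_rand)

# Sanity check: with all input weights and hidden biases set to zero, every
# hidden node outputs logistic(0) = 0.5, so the output is alpha0 + 0.5 * sum(alpha).
y_zero = ann_output(np.array([0.5, -1.0, 2.0]), np.zeros((3, 2)), np.zeros(2),
                    np.array([1.0, 3.0]), 0.5)
print("output with zero input weights:", y_zero)
```

A network with more than one hidden layer repeats the inner step, feeding the activations of one layer into the next before the final weighted sum.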
In the present study, a multi-layered feed-forward network architecture with the logistic function as the activation function was used. The Levenberg–Marquardt (LM) learning algorithm was used to adjust the weights in the multi-layered feed-forward networks. To obtain the best topology of the ANN model, different numbers of hidden layers (1–6) and of nodes in each hidden layer (1–20) were tested by trial and error. The epoch (iteration) limit and the mean square error (MSE) threshold for each run on the training and cross-validation datasets were 100 and 0.01, respectively. The convergence of the average MSE during training and cross validation was examined over epochs 0 to 100 to avoid model over-fitting and memorization. The convergence point was found to be at epoch 60, which was used to avoid over-fitting (Figure 2). The dataset was partitioned into two subsets, i.e., a training set (80%) and a testing set (20%), for fitting the ANN and MLR models.
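The data partition and the trial-and-error search space described above can be sketched in Python as follows. The helper name `split_80_20`, the random seed, and the restriction to hidden layers of equal width are assumptions for illustration; the paper does not state how the split was randomized or whether unequal layer widths were tried.

```python
import numpy as np

def split_80_20(X, y, train_frac=0.8, seed=0):
    """Randomly partition the rows of X and y into training and testing subsets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))               # shuffled row indices
    n_train = int(round(train_frac * len(y)))
    train, test = idx[:n_train], idx[n_train:]
    return X[train], X[test], y[train], y[test]

# Toy data: 50 observations, 2 inputs (not the study data).
X = np.arange(100, dtype=float).reshape(50, 2)
y = np.arange(50, dtype=float)
X_train, X_test, y_train, y_test = split_80_20(X, y)
print(len(y_train), "training rows,", len(y_test), "testing rows")

# Candidate topologies: 1-6 hidden layers with 1-20 nodes per layer
# (equal width in every layer is an assumption made here).
candidates = [(nodes,) * layers
              for layers in range(1, 7)
              for nodes in range(1, 21)]
print(len(candidates), "candidate topologies, e.g.", candidates[24])
```

Each candidate would then be trained on the training subset and compared on held-out error, which is the trial-and-error procedure the text describes.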

2.3. Development of Multiple Linear Regression Model

Regression-based models have been used extensively for crop yield prediction in agriculture. Multiple linear regression (MLR) models the relationship between the regressand (dependent variable) and more than one regressor [31,32]. In MLR, the variation in the regressand is accounted for by all of the regressors simultaneously [33]. The MLR model can be written as follows:
$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon \tag{2} $$
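where $y$ denotes the regressand (yield), $x_i$ ($i = 1, \ldots, k$) denote the regressors (morphological characters), $\beta_0, \ldots, \beta_k$ are the regression coefficients, and $\varepsilon$ is the error term, which is assumed to be normally distributed with zero mean and constant variance.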
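A minimal Python sketch of fitting such a model by ordinary least squares is given below. The function names and the synthetic, noise-free data are assumptions for illustration, so the recovered coefficients say nothing about the apple data.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares fit; returns [intercept, slope_1, ..., slope_k]."""
    A = np.column_stack([np.ones(len(y)), X])        # prepend the intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predicted response for each row of X."""
    return coef[0] + X @ coef[1:]

# Synthetic example: y = 1 + 2*x1 - 3*x2 exactly, so the fit should recover (1, 2, -3).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]
coef = fit_mlr(X, y)
print("estimated coefficients:", coef)
```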

2.4. Model Performance Measures

The performance of the fitted models was evaluated using four statistical measures: root mean square error (RMSE), mean absolute deviation (MAD), mean absolute percentage error (MAPE), and coefficient of determination (R²) [34]. These measures were computed as follows:
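$$ \mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{N}}, \qquad \mathrm{MAD} = \frac{\sum_{i=1}^{N} |y_i - \hat{y}_i|}{N}, $$

$$ \mathrm{MAPE} = \frac{\sum_{i=1}^{N} \frac{|y_i - \hat{y}_i|}{y_i}}{N}, \qquad R^2 = \left[ \frac{\sum_{i=1}^{N} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2 \sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2}} \right]^2 \tag{3} $$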
where $y_i$ and $\hat{y}_i$ are the actual and predicted values of the response variable, $\bar{y}$ and $\bar{\hat{y}}$ are their respective means, and $N$ is the number of observations.
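The four measures can be computed in a few lines of Python, as sketched below. The function names and the toy numbers are illustrative assumptions. MAPE is returned as a fraction (multiply by 100 for a percentage), and R² is taken as the squared Pearson correlation between observed and predicted values, which is one reading of the formula above.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mad(y, y_hat):
    """Mean absolute deviation."""
    return float(np.mean(np.abs(y - y_hat)))

def mape(y, y_hat):
    """Mean absolute percentage error, as a fraction; assumes every y is positive."""
    return float(np.mean(np.abs(y - y_hat) / y))

def r_squared(y, y_hat):
    """Squared Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y, y_hat)[0, 1] ** 2)

# Toy observed and predicted values (not the study data).
y = np.array([10.0, 20.0, 30.0, 40.0])
y_hat = np.array([12.0, 18.0, 33.0, 37.0])
print("RMSE:", rmse(y, y_hat))     # sqrt(6.5)
print("MAD:", mad(y, y_hat))       # 2.5
print("MAPE:", mape(y, y_hat))     # 0.11875
print("R2:", r_squared(y, y_hat))
```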

3. Results

3.1. Selection of Input Variables

The input variables form the model structure, have an impact on the weighted coefficients, and influence the results of the models; this is why their selection is a key factor in any modelling method [24,35]. A simple correlation coefficient can be used as an input variable selection method, as it identifies the characters that are strongly correlated with the output variable (Figure 3).
There is a positive and significant correlation between apple yield and plant height, canopy spread, plant girth, FDI, and fruit set. Similar findings were reported by other researchers [19,36,37,38]. Plant height, canopy spread, plant girth, FDI, and fruit set were regarded as the most significant characters based on the correlation analysis, as they have a significant positive association with yield.
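Correlation-based screening of this kind can be sketched in Python as follows. The data are synthetic, the function name `correlation_screen` is hypothetical, and the cut-off |t| > 2 is a rule of thumb standing in for a 5% significance test at moderate sample sizes; none of these is taken from the paper.

```python
import numpy as np

def correlation_screen(X, y, names, t_cut=2.0):
    """Pearson r of each column of X with y, plus a rough significance flag."""
    n = len(y)
    result = {}
    for j, name in enumerate(names):
        r = float(np.corrcoef(X[:, j], y)[0, 1])
        # t statistic for H0: no correlation, with n - 2 degrees of freedom.
        t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
        result[name] = (r, bool(abs(t) > t_cut))
    return result

# Synthetic example: the first variable drives the response, the second does not.
rng = np.random.default_rng(42)
n = 100
X = rng.normal(size=(n, 2))
y = 2.0 * X[:, 0] + rng.normal(size=n)
screen = correlation_screen(X, y, ["canopy_spread", "LD_ratio"])
for name, (r, flagged) in screen.items():
    print(name, "r =", round(r, 3), "flagged:", flagged)
```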
Principal component analysis (PCA) can be a more efficient method of input variable selection than correlation analysis, because a simple correlation coefficient between characters can be influenced by the positive or negative indirect effects of other variables [24,26]. The PCA results (Table 2) show how much of the variation is explained by the morphological characters. Together, the first and second principal components explained 56.53% of the total variation in the variables. According to these principal components, plant height, canopy spread, plant girth, FD, FDI, FI, and CD were identified as the most appropriate input variables. Consequently, a suitable combination of these variables, together with a high-performance modelling method, can result in an effective model for predicting yield.

3.2. ANN Model Development

In the present study, the analysis of the dataset was carried out in RStudio. A multi-layered feed-forward network architecture with different activation functions was used for yield prediction. Seven plant characters, viz. plant height, canopy spread, plant girth, FD, FDI, FI, and CD, were found to be significant using PCA. These plant characters were therefore used as input variables for ANN model fitting. Because of its significant advantages over other optimization algorithms, the Levenberg–Marquardt algorithm was used in all ANN models. The performance measures RMSE, MAD, MAPE, and R² were used to evaluate the models. The performance of different activation functions was also tested. The logistic activation function outperformed the others owing to its ability to capture nonlinear variation in the dataset. Ahmadi et al. [39], Hagan et al. [40], and Mansouri et al. [17] also reported the ability of nonlinear functions to capture nonlinear patterns in a dataset. Networks with different numbers of hidden layers and different numbers of nodes were fitted to obtain the best topology for the neural network model (Table 3). The results indicated that the ANN model with two hidden layers of five nodes each, i.e., the 7-5-5-1 architecture, provided the best result. This ANN model (7-5-5-1) had the lowest RMSE, MAD, and MAPE values and the highest model accuracy in both the training and testing stages. The schematic diagram of the ANN structure (7-5-5-1) is presented in Figure 1.
This topology explained approximately 94% of the variability in the training phase and approximately 86% in the testing phase. The scatter plot of the measured and predicted apple yield in the testing stage is shown in Figure 4a. The results show that the distribution of the predicted apple yield was close to that of the actual apple yield. Furthermore, the boxplot (Figure 4b) showed no outliers in the predicted data, which is an indication of proper model fitting. Balas et al. [41] suggested that PCA, used for variable selection (data pre-processing), reduces the chance of over-fitting. The application of ANN with PCA (PCA-ANN) has been reported to increase forecasting ability compared with a stand-alone ANN model [16].

3.3. MLR Model Development

The multiple linear regression (MLR) model is a commonly used method for crop yield prediction. In the present study, the same variables that were used for the ANN model were used for building the MLR model. The fitted regression model for predicting apple yield was:
$$ \mathrm{Yield} = 0.347 + 0.186\,\text{Plant height} + 0.27\,\text{Canopy spread} + 0.441\,\mathrm{FDI} \tag{4} $$
Equation (4) shows that the predicted apple yield is a linear combination of the significant variables (plant height, canopy spread, and FDI). Each coefficient gives the change in predicted apple yield for a unit change in the corresponding variable. The MLR model had a low R² value (70.69%; Figure 5a). The scatter plot indicated that the MLR model did not fit all of the data points well, with most points deviating from the regression line. Furthermore, the boxplots (Figure 5b) of the measured and predicted apple yield in the testing stage indicate the inefficiency of the MLR model in predicting apple yield.

4. Discussion

4.1. Comparison of Fitted Models

The prediction performance of the MLR and ANN models was evaluated using RMSE, MAD, MAPE, and R². The results are presented in Table 4. The selected ANN model outperformed the MLR model, with an 18.60% increase in R² and reductions of 67.31%, 41.33%, and 21.80% in RMSE, MAD, and MAPE, respectively.
Besides these measures, a graphical comparison (Figure 6) of the actual values with those predicted by the ANN and MLR models helps to illustrate the superiority of the ANN model over the MLR model. The ANN model captured the data pattern more accurately than the MLR model. A possible reason for the poor performance of the MLR model is the nonlinear component of the relationship between the input variables and apple yield. The MLR model did not capture this component, whereas the ANN model took it into account during model building. These results indicate how selecting the proper model improves the prediction of the dependent variable (yield). ANN models have a greater ability than MLR models to capture nonlinear and complex relationships among the variables. Similar results have been reported in many studies [16,20,42].

4.2. Sensitivity Analysis

A sensitivity analysis was performed to determine the individual effect of each input variable on the predicted apple yield. The results (Figure 7) indicate how the performance of the ANN model changes with different combinations of the input variables (plant height, canopy spread, plant girth, FD, FDI, FI, and CD) [2].
The ability of the ANN to predict apple yield decreased significantly when it was run without FI, FDI, or FD. The model without FD had the lowest R² and the highest RMSE (79.47), MAD (79.44), and MAPE (23.07). Therefore, FD can be considered the most influential factor for predicting apple yield. In addition, FDI and FI also had a significant effect on the prediction of apple yield.
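The leave-one-variable-out procedure behind this kind of sensitivity analysis can be sketched in Python as follows. To keep the example short and self-contained, a least-squares linear model is swapped in for the ANN, and the data are synthetic with made-up effect sizes, so the numbers bear no relation to Figure 7; only the refit-and-compare loop is the point.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def fit_predict(X_train, y_train, X_test):
    """Stand-in model: least-squares linear fit (not the ANN used in the study)."""
    A = np.column_stack([np.ones(len(y_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return coef[0] + X_test @ coef[1:]

def leave_one_out_sensitivity(X_train, y_train, X_test, y_test, names):
    """Test RMSE of the full model and of each model refitted without one input."""
    results = {"all": rmse(y_test, fit_predict(X_train, y_train, X_test))}
    for j, name in enumerate(names):
        cols = [k for k in range(X_train.shape[1]) if k != j]   # drop input j
        pred = fit_predict(X_train[:, cols], y_train, X_test[:, cols])
        results["without_" + name] = rmse(y_test, pred)
    return results

# Synthetic data with hypothetical effect sizes: strong, moderate, and none.
rng = np.random.default_rng(3)
X = rng.normal(size=(120, 3))
y = 5.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=120)
names = ["FD", "FDI", "FI"]
results = leave_one_out_sensitivity(X[:96], y[:96], X[96:], y[96:], names)
for key, value in results.items():
    print(key, round(value, 3))
```

The input whose removal raises the test error the most is read as the most influential, which is how FD is singled out in the text.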

5. Conclusions

Predicting the yield of fruit crops from morphological characters is a useful approach. In the present study, primary data on apple yield and on the morphological characters, viz. plant height, canopy spread, plant girth, flower density, flower density index, flowering intensity, fruit set, crop density, and length diameter ratio, were collected from commercial apple-growing farmers in Jubbal block, Himachal Pradesh, India. A simple correlation analysis was run to assess the relationship between the morphological characters and yield, and it showed a positive and significant correlation between apple yield and plant height, canopy spread, plant girth, FDI, and fruit set. A PCA was run to select the significant variables, as the selection of variables is crucial for model building. The first and second principal components explained 56.53% of the total variation in the variables, and plant height, canopy spread, plant girth, FD, FDI, FI, and CD were identified as the most appropriate input variables. Hence, these seven plant characters were used as the input variables for ANN model fitting. A multi-layered feed-forward network architecture with different activation functions was used for model building and yield prediction. The Levenberg–Marquardt algorithm was used in all of the ANN models. Model performance was evaluated using standard statistical measures such as RMSE, MAD, MAPE, and R². The logistic activation function was found to outperform all other activation functions. The selected ANN model (7-5-5-1) had the lowest RMSE, MAD, and MAPE values and the highest model accuracy in both the training and testing stages. Furthermore, the results show a close association between the predicted and actual apple yield.
As MLR models are predominantly used in crop yield prediction, an MLR model was also fitted for comparison. The selected ANN model outperformed the MLR model, with an 18.60% increase in R² and reductions of 67.31%, 41.33%, and 21.80% in RMSE, MAD, and MAPE, respectively. All of the computations were carried out using code written in R, which is available from the authors.

Author Contributions

B. and P.D. were responsible for conceptualization, investigation, formal analysis, data curation, and writing the original draft. G.V. and S.D. contributed resources, data collection, reviewing, and editing. R.B. and T.A. supervised the work and edited the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India.

Data Availability Statement

The datasets analyzed during the current study are available from the corresponding author upon reasonable request.

Acknowledgments

The authors are thankful to ICAR-IASRI for providing facilities for carrying out the present research and to the farmers for their co-operation in data collection.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. FAO. FAOSTAT. Food and Agriculture Organization of the United Nations. 2020. Available online: https://www.fao.org/faostat/en/#home (accessed on 20 May 2022).
  2. Lezzoni, A.; Pritts, M.P. Application of principal component analysis to horticultural research. Hortic. Sci. 1991, 26, 334–338.
  3. Guimarães, B.V.C.; Donato, S.L.R.; Aspiazú, I.; Azevedo, A.M. Yield prediction of ‘Prata Anã’ and ‘BRS Platina’ banana plants by artificial neural. Pesq. Agropec. Trop. Goiânia 2021, 51, 1–11.
  4. Guimarães, B.V.C.; Donato, S.L.R.; Azevedo, A.M.; Aspiazú, I.; Silva Junior, A.A. Prediction of “Gigante” cactus pear yield by morphological characters and artificial neural networks. Rev. Bras. De Eng. Agrícola E Ambient. 2018, 22, 315–319.
  5. Khazaei, J.F.; Shahbazi; Massah, J. Evaluation and modeling of physical and physiological damage to wheat seeds under successive impact loadings: Mathematical and neural networks modeling. Crop Sci. 2008, 48, 1532–1544.
  6. Gutiérrez, P.A.; López-Granados, F.; Peña-Barragán, J.M.; Jurado-Expósito, M.; Hervás-Martínez, C. Logistic regression product-unit neural networks for mapping Ridolfia segetum infestations in sunflower crop using multitemporal remote sensed data. Comput. Electron. Agric. 2008, 64, 293–306.
  7. Huang, Y.; Lan, Y.; Thomson, S.J.; Fang, A.; Hoffmann, W.C.; Lacey, R.E. Development of soft computing and applications in agricultural and biological engineering. Comput. Electron. Agric. 2010, 71, 107–127.
  8. Kravchenko, A.N.; Bullock, D.G. Correlation of corn and soybean grain yield with topography and soil properties. Agron. J. 2000, 92, 75–83.
  9. Park, S.J.; Hwang, C.S.; Vlek, P.L.G. Comparison of adaptive techniques to predict crop yield response under varying soil and land management conditions. Agric. Syst. 2005, 85, 59–81.
  10. Kitchen, N.R.; Drummond, S.T.; Lund, E.D.; Sudduth, K.A.; Buchleiter, G.W. Soil electrical conductivity and topography related to yield for three contrasting soil-crop systems. Agron. J. 2003, 95, 483–495.
  11. Miao, Y.; Mulla, D.J.; Robert, P.C. Identifying important factors influencing corn yield and grain quality variability using artificial neural networks. Precis. Agric. 2006, 7, 117–135.
  12. Schultz, A.; Wieland, R.; Lutze, G. Neural networks in agroecological modeling-stylish application or helpful tool? Comput. Electron. Agric. 2000, 29, 73–97.
  13. Fortin, J.G.; Anctil, F.; Parent, L.É.; Bolinder, M.A. A neural network experiment on the site-specific simulation of potato tuber growth in Eastern Canada. Comput. Electron. Agric. 2010, 73, 126–132.
  14. Jiang, P.; Thelen, K.D. Effect of soil and topographic properties on crop yield in a north-central corn-soybean cropping system. Agron. J. 2004, 96, 252–258.
  15. Das, P. Study on Machine Learning Techniques Based Hybrid Model for Forecasting in Agriculture. Ph.D. Thesis, PG-school IARI, New Delhi, India, 2019.
  16. Abdipour, M.; Younessi-Hmazekhanlu, M.; Ramazani, M.Y.H.; Omidi, A.H. Artificial neural networks and multiple linear regression as potential methods for modeling seed yield of safflower (Carthamus tinctorius L.). Ind. Crops Prod. 2019, 27, 185–194.
  17. Mansouri, A.; Fadavi, A.; Mortazavian, S.M.M. An artificial intelligence approach for modeling volume and fresh weight of callus–A case study of cumin (Cuminum cyminum L.). J. Theor. Biol. 2016, 397, 199–205.
  18. Hydrology. ASCE task committee on application of artificial neural networks in artificial neural networks in hydrology, I: Preliminary concepts. Hydrol. Eng. 2020, 5, 115–123.
  19. Treder, W. Relationship between yield, crop density coefficient and average fruit weight of ‘gala’ apple. J. Fruit Ornam. Plant Res. 2008, 16, 53–63.
  20. Gholipoor, M.; Rohani, A.; Torani, S. Optimization of traits to increasing barley grain yield using an artificial neural network. Int. J. Plant Prod. 2013, 7, 1–17.
  21. Tiwari, M.K.; Chatterjee, C. Uncertainty assessment and ensemble flood forecasting using bootstrap based artificial neural networks (BANNs). J. Hydrol. 2010, 382, 20–33.
  22. Tripathy, M. Power transformer differential protection using neural network principal component analysis and radial basis function neural network. Simul. Model. Pract. Theory 2010, 18, 600–611.
  23. Abdipour, M.; Ramazani, S.H.R.; Younessi-Hmazekhanlu, M.; Niazian, M. Modeling oil content of sesame (Sesamum indicum L.) using artificial neural network and multiple linear regression approaches. J. Am. Oil Chem. Soc. 2018, 95, 283–297.
  24. May, R.; Dandy, G.; Maier, H. Review of input variable selection methods for artificial neural networks. Artificial Neural Networks - Methodological Advances and Biomedical Applications. InTech 2011, 10, 16004.
  25. Elhami, B.; Khanali, M.; Akram, A. Combined application of Artificial Neural Networks and life cycle assessment in lentil farming in Iran. Inform. Process. Agric. 2017, 4, 18–32.
  26. Samarasinghe, S. Neural Networks for Applied Sciences and Engineering: From Fundamentals to Complex Pattern Recognition; CRC Press: Boca Raton, FL, USA, 2016.
  27. Hoskins, J.C.; Himmelblau, D.M. Artificial neural network models of knowledge representation in chemical engineering. Comput. Chem. Eng. 1988, 12, 881–890.
  28. Sheela, K.G.; Deepa, S.N. Review on methods to fix number of hidden neurons in neural networks. Math. Probl. Eng. 2013, 2013, 425740.
  29. Tufail, M.; Ormsbee, L.; Teegavarapu, R. Artificial intelligence-based inductive models for prediction and classification of fecal coliform in surface waters. J. Environ. Eng. 2008, 134, 789–799.
  30. Haykin, S. Neural Networks: A Comprehensive Foundation; Prentice Hall: Ontario, CA, USA, 1999.
  31. Buyukozturk, S. Sosyal Bilimler Icin very Analizi el Kitabi; Pegem Yayincihk: Ankara, Turkey, 2002.
  32. Tabachnick, B.G.; Fidell, S.L. Using Multivariate Statistics; Harper Collins College Publishers: New York, NY, USA, 1996.
  33. Unver, O.; Gamgam, H. Uygulamah Istatistik Yontemleri; Siyasal Kitabevi: Ankara, Turkey, 1999.
  34. Das, P.; Paul, A.K.; Paul, R.K. Non-linear mixed effect models for estimation of growth parameters in Goats. J. Indian Soc. Agric. Stat. 2016, 70, 205–210.
  35. Emamgholizadeh, S.; Parsaeian, M.; Baradaran, M. Seed yield prediction of sesame using artificial neural network. Eur. J. Agron. 2015, 68, 89–96.
  36. Forshey, C.G.; Elfving, D.C. Fruit numbers, fruit size and yield relationship in ‘McIntosh’ apple. J. Am. Soc. Hortic. 1977, 24, 399–402.
  37. Treder, W.; Mike, A. Relationship between yield, crop density and average fruit weight in ‘lobo’ apple trees under various planting systems and irrigation treatments. Horttech 2001, 11, 248–254.
  38. Westwood, M.N.; Roberts, A.N. The relationship between trunk cross-sectional area and weight of apple trees. J. Am. Soc. Hortic. 1970, 95, 28–30.
  39. Ahmadi, S.H.; Sepaskhah, A.R.; Andersen, M.N.; Plauborg, F.; Jensen, C.R.; Hansen, S. Modeling root length density of field grown potatoes under different irrigation strategies and soil textures using artificial neural networks. Field Crops Res. 2014, 162, 99–107.
  40. Hagan, M.T.; Demuth, H.B.; Beale, M. Neural Network Design; PWS Publishing, Co.: Boston, MA, USA, 1997.
  41. Balas, C.F.; Koc, M.L.; Tur, R. Artificial neural network based on principal component analysis, fuzzy systems and fuzzy neural networks for preliminary design of rubble mound breakwaters. Appl. Ocean Res. 2010, 32, 425–433.
  42. Singh, T.N.; Kanchan, R.; Verma, A.K.; Singh, S. An intelligent approach for prediction of triaxial properties using unconfined uniaxial strength. Miner. Eng. 2003, 5, 12–16.
Figure 1. Topology of neural network model for Apple yield prediction.
Figure 2. Convergence of the average MSE value during training and cross validation of ANN model.
Figure 3. Correlation plot of different input variables.
Figure 4. (a) Scatter plot of the measured and predicted yield of apple in the testing stage of ANN; (b) boxplot of measured and predicted apple yield in the testing stage of ANN. Green dots denote the observations; root mean square error (RMSE), mean absolute deviation (MAD), mean absolute percentage error (MAPE), and coefficient of determination (R²) are the performance measures.
Figure 5. (a) Measured and predicted apple yield in testing stage of MLR; (b) boxplot of the measured and predicted apple yield in the testing stage of MLR. Blue dots denote the observations.
Figure 6. Predicted values of the fitted models with actual data points.
Figure 7. Sensitivity analysis of input variables on apple yield in ANN model. A: The best ANN model without CD; B: The best ANN model without FI; C: The best ANN model without FDI; D: The best ANN model without FD; E: The best ANN model without plant girth; F: The best ANN model without canopy spread; G: The best ANN model without plant height; H: The best ANN model (with plant height, canopy spread, plant girth, FD, FDI, FI, and CD as the input).
Table 1. Summary of plant parameters.

| Characters | Range | Mean | Std. Deviation |
|---|---|---|---|
| Plant height (m) | 3.05–11.89 | 7.22 | 2.21 |
| Canopy spread (m) | 1.32–9.48 | 5.57 | 2.03 |
| Plant girth (cm) | 0.15–0.91 | 0.61 | 0.18 |
| Flower density | 1.00–10.82 | 3.54 | 1.99 |
| Flower density index | 0.10–1.08 | 0.35 | 0.18 |
| Flowering intensity | 0.35–0.50 | 0.41 | 0.03 |
| Fruit set | 0.15–0.57 | 0.31 | 0.08 |
| Crop density | 0.30–4.40 | 1.09 | 0.66 |
| Length diameter ratio | 6.84–10.28 | 8.22 | 0.68 |
Table 2. Principal component analysis for input variable selection.

| Characters | PH | CS | PG | FD | FDI | FI | FS | CD | LDR | EV | CV |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PC1 | −0.4223 | −0.4222 | −0.4178 | 0.3909 | 0.2924 | 0.1073 | 0.1183 | 0.4389 | 0.1108 | 3.07 | 34.15 |
| PC2 | 0.3905 | 0.3051 | 0.3672 | 0.3810 | 0.4114 | 0.4333 | −0.0969 | 0.3284 | −0.011 | 2.01 | 56.53 |

PH: plant height, CS: canopy spread, PG: plant girth, FD: flowering density, FDI: flowering density index, FI: flowering intensity, FS: fruit set, CD: crop density, LDR: LD ratio, EV: eigenvalue, CV: cumulative variance.
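
The link between the EV and CV columns of Table 2 can be checked with a short calculation. The sketch below assumes the PCA was run on standardized variables, so that the total variance equals the number of characters (nine); that assumption is not stated in the table itself. The helper name `cumulative_variance_percent` is hypothetical. Because the tabulated eigenvalues are rounded to two decimals, the result only approximates the reported 34.15 and 56.53.

```python
# Eigenvalues of PC1 and PC2 as reported in the EV column of Table 2.
eigenvalues = [3.07, 2.01]

# Assumption: variables were standardized, so total variance = number of
# input characters (PH, CS, PG, FD, FDI, FI, FS, CD, LDR).
n_characters = 9


def cumulative_variance_percent(eigenvalues, total_variance):
    """Return the running share of total variance, in percent."""
    running_sum = 0.0
    shares = []
    for value in eigenvalues:
        running_sum += value
        shares.append(100.0 * running_sum / total_variance)
    return shares


cumulative = cumulative_variance_percent(eigenvalues, n_characters)
for index, share in enumerate(cumulative, start=1):
    print(f"PC1..PC{index}: {share:.2f}% of total variance")
```

Under this assumption the first two components together account for a little over half of the total variance, in line with the CV column.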
Table 3. The performance of ANN models with different hidden layers in the training and testing set.

| Set | Hidden Layer | Best Topology | RMSE | MAD | MAPE | R² | Accuracy (%) | Error Rate |
|---|---|---|---|---|---|---|---|---|
| Training | 1 | 7-3-1 | 36.3360 | 25.7337 | 0.2306 | 0.8121 | 90.36 | 0.2422 |
| Training | 2 | 7-5-5-1 | 24.8300 | 18.2607 | 0.1523 | 0.9430 | 98.72 | 0.0736 |
| Training | 3 | 7-3-3-3-1 | 31.0590 | 22.3937 | 0.2053 | 0.8629 | 93.59 | 0.1769 |
| Training | 4 | 7-3-3-3-3-1 | 27.4964 | 21.2744 | 0.2136 | 0.8924 | 92.23 | 0.1386 |
| Training | 5 | 7-5-5-1-5-5-1 | 24.9840 | 19.8195 | 0.1556 | 0.9116 | 93.10 | 0.1140 |
| Training | 6 | 7-3-3-3-3-5-5-1 | 34.8300 | 17.81 | 0.2426 | 0.9113 | 95.10 | 0.11438 |
| Testing | 1 | 7-3-1 | 63.2026 | 43.9649 | 0.3582 | 0.5622 | 93.01 | 0.2422 |
| Testing | 2 | 7-5-5-1 | 36.6078 | 28.1045 | 0.2151 | 0.8685 | 95.36 | 0.0736 |
| Testing | 3 | 7-3-3-3-1 | 52.2906 | 38.2418 | 0.2974 | 0.7129 | 91.32 | 0.1769 |
| Testing | 4 | 7-3-3-3-3-1 | 43.7711 | 28.2788 | 0.1900 | 0.7935 | 93.49 | 0.1386 |
| Testing | 5 | 7-5-5-1-5-5-1 | 40.0703 | 28.2700 | 0.2111 | 0.8239 | 92.65 | 0.1140 |
| Testing | 6 | 7-3-3-3-3-5-5-1 | 43.2684 | 32.92371 | 0.2360 | 0.8073 | 89.33 | 0.1144 |
Table 4. Prediction performance of the fitted models in the testing stage.

| Model | RMSE | MAD | MAPE | R² |
|---|---|---|---|---|
| ANN | 36.6078 | 28.1045 | 0.2151 | 0.8685 |
| MLR | 61.2501 | 39.7203 | 0.2620 | 0.7069 |
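
The four performance measures reported for the ANN and MLR models can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `regression_metrics` and the sample yields are hypothetical. MAPE is returned as a fraction, which matches the scale of the tabulated values, and R² is computed as 1 − SS_res/SS_tot, one common definition that may differ from the one used in the study.

```python
import math


def regression_metrics(measured, predicted):
    """Compute RMSE, MAD, MAPE (as a fraction) and R² for paired values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(measured)
    errors = [m - p for m, p in zip(measured, predicted)]

    # Root mean square error: penalizes large deviations more strongly.
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Mean absolute deviation between measured and predicted values.
    mad = sum(abs(e) for e in errors) / n

    # Mean absolute percentage error, kept as a fraction (not multiplied by 100).
    mape = sum(abs(e) / abs(m) for e, m in zip(errors, measured)) / n

    # Coefficient of determination as 1 - SS_res / SS_tot (assumed definition).
    mean_measured = sum(measured) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    r2 = 1.0 - ss_res / ss_tot

    return {"RMSE": rmse, "MAD": mad, "MAPE": mape, "R2": r2}


# Hypothetical measured and predicted yields, for illustration only.
measured_yield = [100.0, 200.0, 300.0, 400.0]
predicted_yield = [110.0, 190.0, 310.0, 390.0]

for name, value in regression_metrics(measured_yield, predicted_yield).items():
    print(f"{name}: {value:.4f}")
```

Lower RMSE, MAD and MAPE together with a higher R² indicate a better fit, which is the basis on which the ANN and MLR rows of Table 4 are compared.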
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Bharti; Das, P.; Banerjee, R.; Ahmad, T.; Devi, S.; Verma, G. Artificial Neural Network Based Apple Yield Prediction Using Morphological Characters. Horticulturae 2023, 9, 436. https://doi.org/10.3390/horticulturae9040436
