Proceeding Paper

Performance of Gradient Boosting Learning Algorithm for Crop Stress Identification in Greenhouse Cultivation †

by Angeliki Elvanidi and Nikolaos Katsoulas *
Department of Agriculture Crop Production and Rural Environment, University of Thessaly, Fytokou Str., 38446 Volos, Greece
* Author to whom correspondence should be addressed.
† Presented at the 1st International Electronic Conference on Horticulturae, 16–30 April 2022; Available online: https://iecho2022.sciforum.net/.
Biol. Life Sci. Forum 2022, 16(1), 25; https://doi.org/10.3390/IECHo2022-12508
Published: 15 April 2022
(This article belongs to the Proceedings of The 1st International Electronic Conference on Horticulturae)

Abstract

Greenhouse cultivation is one of the most crucial circular economy systems in agriculture, allowing maximum production with less cultivation area, minimum inputs, and low environmental impact. The data generated in high-tech greenhouse operations are provided by a variety of sensors that enable a better understanding of the operational environment. In this study, a learning algorithm, the gradient boosting machine, was tested on the generated database in order to identify different types of stress in tomato crops. The examined model performed a qualitative classification of the data according to the type of stress (no stress, water stress, or cold stress). For the comparison, a 10-fold cross-validation strategy was selected. The dataset of 10,763 samples was divided into two parts, one for training and validation (80%; 8610) and a second for testing (20%; 2152). The cross-validation process was repeated 50 times. Among the inputs used to build the model, leaf temperature ranked among the highest in feature importance, with a ratio of 0.51. According to the results, the gradient boosting algorithm classified all the cases with high accuracy. In particular, the model correctly identified all 372 samples of the cold stress plants, 1305 out of 1321 samples of the no stress plants, and 431 out of 452 samples of the water stress plants. Overall, the model achieved 98% accuracy in the testing performance and more than 98% accuracy in the validation performance.

1. Introduction

The most important call for a sustainable future in the food sector is to produce more food per hectare without expanding agricultural land in order to accommodate the rapid growth of the world population. To achieve this, there is a need to increase the productivity in greenhouse hydroponic cultivation by redesigning the operation control system [1].
The development of a machine learning (ML) model that combines climate and crop physiology data to detect different types of stress will improve greenhouse operation.
Until now, it has not been feasible to incorporate crop physiology data in an ML model, since most agronomic factors are measured using labor- and time-consuming protocols [2]. Leaf temperature is one of the few indicators that can be measured as a time series to produce the large database required to build a machine learning model. However, leaf or crop temperature is an unstable factor that can vary strongly with the climatic and abiotic conditions and cannot be used on its own to estimate different types of crop stress [3]. Combining leaf temperature with photosynthesis (Ps) could improve the methodology for defining the type of stress occurring in vegetable cultivation.
Recently, the photochemical reflectance index (PRI), which is correlated with rapid changes in the de-epoxidation of the xanthophyll cycle, and the photosynthesis efficiency (Ps) have shown very good results. Both can be measured using soft sensors (i.e., mathematical models using real-time sensor data) to produce a time-series database.
In this research, the methodology for developing a gradient boosting algorithm is presented in order to build an ML model that combines climatic data with leaf temperature and photosynthesis rate. To this end, a multisensory tower placed within the greenhouse was used to record how the physiological status of the tomato plants changed according to their surrounding microclimate. The plants were cultivated under extreme conditions of air temperature and water in the root zone. The resulting database was used to train and test the model.

2. Material and Methods

The measurements were carried out from May to December of 2019 in one of the five compartments of a multi-tunnel greenhouse with a total ground area of 1500 m² (250 m² each compartment). The greenhouse was located at the facilities of the University of Thessaly, Velestino, Volos (latitude 39°22′, longitude 22°44′, and altitude 85 m) in the continental area of eastern Greece.
The tomato plants (Solanum lycopersicum cv. Elpida, Spirou Ltd., Athens, Greece) were cultivated in perlite slabs (ISOCON Perloflor Hydro 1, ISOCON S.A., Athens, Greece). The plants were fertigated with a fresh nutrient solution with set points of electrical conductivity (EC) around 2 dS m⁻¹ and pH 5.8. The nutrient solution supplied to the crop was a standard nutrient solution for tomatoes grown in open hydroponic systems adapted to Mediterranean climatic conditions. The nutrient solution was supplied via a drip system and controlled by a time program irrigation controller (8 irrigation events per day).
In order to record the physiological response of the plants to their surrounding microclimate, tomato plants were subjected to different types of stress. Specifically, the plants were cultivated under (i) a low air temperature around 15 °C (LTS treatment) and (ii) a low water concentration in the root zone with a dose of 30 mL per plant (LWS treatment). Additionally, measurements of (iii) non-stressed plants (NoS treatment) were recorded.
In order to build the database of crop physiology and environment microclimate under the mentioned extreme conditions, a multisensory tower was built, consisting of an air temperature sensor (Thygro SDI-12, Symmetron, Gerakas, Greece), relative humidity sensor (Thygro SDI-12, Symmetron, Gerakas, Greece), solar radiation sensor (SP-SS, Apogee Instruments, Logan, UT, USA), leaf temperature sensor (Thermocouples, type T, Delta Ltd., Pico Rivera, CA, USA), leaf wetness sensor (PS-0061-AD, Netsense, Calenzano, Italy), and PRI sensor (type SRS-PRI, Meter Group, Pullman, WA, USA). The multisensory tower was placed within the greenhouse, parallel to the vertical axis of the tomato’s main stem. The measurements started 10 days after each treatment was applied and lasted for 25 days.
In total, 9 features, namely air temperature (Ta, °C), relative humidity (RH, %), solar radiation (SR, W m⁻²), leaf temperature (TL, °C), leaf wetness in young leaves (Lwup, %), leaf wetness in mature leaves (Lwdn, %), photochemical reflectance index (PRI), photosynthesis rate (Ps, μmol m⁻² s⁻¹), and crop water stress index (CWSI), were supplied to the model in order to predict three outputs (LTS, LWS, and NoS).
In the current research, the CWSI developed by Jackson et al. [4] was calculated. The methodology followed in the current research was described in Baille et al. [5]. The calibration procedure of the remote PRI sensor and the Ps calculation were presented in Elvanidi & Katsoulas [6]. The resulting dataset comprised 10,763 samples.
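As a rough illustration of how a CWSI value can be derived from leaf and air temperature, the sketch below uses the commonly applied temperature-difference form of the index, in which the measured leaf-air difference is scaled between a lower (fully transpiring) and an upper (non-transpiring) baseline. The text does not state which formulation or baseline values were applied, so the function name `cwsi`, its arguments, and the numbers in the example are assumptions made only for illustration.

```python
def cwsi(leaf_temp, air_temp, lower_diff, upper_diff):
    """Temperature-difference form of the crop water stress index.

    lower_diff: leaf-air difference of a fully transpiring crop (assumed baseline).
    upper_diff: leaf-air difference of a non-transpiring crop (assumed baseline).
    """
    if upper_diff <= lower_diff:
        raise ValueError("upper_diff must be greater than lower_diff")
    diff = leaf_temp - air_temp
    index = (diff - lower_diff) / (upper_diff - lower_diff)
    # Clip to the conventional range: 0 (no stress) to 1 (maximum stress).
    return min(1.0, max(0.0, index))


# Hypothetical reading: leaf 1 °C warmer than air, baselines of -2 °C and +4 °C.
example = cwsi(leaf_temp=26.0, air_temp=25.0, lower_diff=-2.0, upper_diff=4.0)
print(example)
```

In practice the two baselines depend on the crop and on the vapour pressure deficit and radiation at the time of measurement, which is why they are passed in as arguments here.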
In order to obtain high performance on the greenhouse data, a series of ML algorithms, such as gradient boosting (GB), multilayer perceptron (MLP), and other artificial neural network algorithms, were examined. Among these, the GB technique performed most adequately on the studied sample, in which the measured parameters were treated as distinct observations and not as time series. GB belongs to the ensemble learning algorithms, which rely on a collective decision from weak prediction models, here decision trees.
In the model, a set of hyperparameters was used (learning rate, number of estimators, max tree depth, max features). The cross-validation process was repeated 50 times. The methodology followed in the current research is described in Friedman [7], Khan et al. [8], and Karamoutsou [9].
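A repeated 10-fold cross-validation of a gradient boosting classifier can be sketched with Scikit-learn as follows. This is only a minimal sketch under stated assumptions: the data are synthetic stand-ins generated with `make_classification`, stratified folds are assumed (the text does not say whether stratification was used), and the number of repeats and trees is reduced so that the example runs in a few seconds.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in for the greenhouse database: 9 features, 3 classes.
X, y = make_classification(
    n_samples=300, n_features=9, n_informative=5, n_redundant=0,
    n_classes=3, random_state=0,
)

# The text reports 10 folds repeated 50 times; n_repeats is lowered here
# (assumption) purely to keep the run time short.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=2, random_state=0)
model = GradientBoostingClassifier(n_estimators=20, random_state=0)

# One accuracy value per fold and repeat: 10 x 2 = 20 scores.
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(len(scores), scores.mean(), scores.std())
```

Setting `n_repeats=50` would reproduce the number of repetitions reported in the text, at a correspondingly higher computational cost.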
The dataset was divided into two parts: one for training and validation (80%; 8610) and a second for testing (20%; 2152). All steps, including learning and classification, were implemented in Python. For machine learning, the Python Scikit-learn library [10] and the Spyder environment were used.
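The 80/20 split and the fitting step could look roughly like the sketch below. The data are again synthetic (the greenhouse database is not public), the stratified split and the fixed `random_state` are assumptions, and the hyperparameter values are the optima reported in the Results section (learning rate 0.5, 70 estimators, max depth 9, max features 7).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 9 features and 3 classes (LTS, LWS, NoS in the paper).
X, y = make_classification(
    n_samples=1000, n_features=9, n_informative=5, n_redundant=0,
    n_classes=3, random_state=0,
)

# 80% for training and validation, 20% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hyperparameter values as reported in the Results section.
clf = GradientBoostingClassifier(
    learning_rate=0.5, n_estimators=70, max_depth=9, max_features=7,
    random_state=0,
)
clf.fit(X_train, y_train)

test_accuracy = clf.score(X_test, y_test)
print(X_train.shape, X_test.shape, test_accuracy)
```

The accuracy printed here describes the synthetic data only and says nothing about the greenhouse results.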
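The statistical criteria were accuracy (1), positive predictive value (PPV or precision) (2), sensitivity (or recall) (3), and F1-score (4), where TP, TN, FP, and FN are the numbers of true positive, true negative, false positive, and false negative cases, P is the number of real positive cases in the data, and N is the number of real negative cases in the data:
$$\mathrm{Accuracy} = \frac{TP + TN}{P + N} \tag{1}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{3}$$
$$F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Sensitivity}}{\mathrm{Precision} + \mathrm{Sensitivity}} \tag{4}$$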
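For a three-class problem, these criteria are usually evaluated per class in a one-vs-rest fashion from the confusion matrix. The following is a minimal sketch of that calculation in plain Python; the function name `per_class_metrics` and the toy matrix are hypothetical and are not taken from the study.

```python
def per_class_metrics(matrix, labels):
    """Compute overall accuracy and one-vs-rest precision, sensitivity, F1.

    matrix[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    report = {"accuracy": correct / total}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(n)) - tp  # predicted k, truly other
        fn = sum(matrix[k]) - tp                       # truly k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0
        report[label] = {"precision": precision, "sensitivity": sensitivity, "f1": f1}
    return report


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
toy = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
report = per_class_metrics(toy, ["A", "B", "C"])
print(report)
```

Averaging the per-class values (for example, weighted by class size) gives single summary figures of the kind reported in a results table.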

3. Results

During the training procedure, the optimum value of each hyperparameter was defined.
The learning rate can take values between 0 and 1, with the most common values being 0.001–0.3. Smaller values made the model robust to the specific characteristics of each individual tree and reduced the possibility of overfitting. However, low values increased the risk of not reaching the optimum with a fixed number of trees. For the development of the current GB-based classifier, the optimum value chosen among the candidate learning rates (0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1) was 0.5.
The optimum number of estimators, which defines the total number of sequential trees, was chosen among the values (10, 20, 30, 40, 50, 60, 70, 80, 90, 100) and was 70.
For the max tree depth, which controls the depth of the individual trees, the optimum value was chosen among the values (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) and was 9.
The optimum max features value, which defines the number of features considered when looking for the best split, was chosen among the values (1, 2, 3, 4, 5, 6, 7, 8) and was 7.
The combination of the optimum hyperparameters yielded the current GB algorithm for detecting the three specific types of stress. According to the training process, the number of features supplied to the model was defined as 9.
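One way to organise such a search is an exhaustive grid search with cross-validation, sketched below. The text does not state whether the hyperparameters were tuned jointly or one at a time, so the use of `GridSearchCV` is an assumption. The full grid holds the candidate values listed above; the actual search in the sketch runs on a much smaller grid, on synthetic data, and with 3 folds, purely to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, ParameterGrid

# Candidate values as listed in the text.
full_grid = {
    "learning_rate": [0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1.0],
    "n_estimators": list(range(10, 101, 10)),
    "max_depth": list(range(1, 11)),
    "max_features": list(range(1, 9)),
}
# Number of combinations an exhaustive search over the full grid would fit.
n_combinations = len(ParameterGrid(full_grid))

# Reduced grid and synthetic data (assumptions) so the sketch runs quickly.
small_grid = {"learning_rate": [0.1, 0.5], "max_depth": [2, 3]}
X, y = make_classification(
    n_samples=300, n_features=9, n_informative=5, n_redundant=0,
    n_classes=3, random_state=0,
)
search = GridSearchCV(
    GradientBoostingClassifier(n_estimators=20, random_state=0),
    small_grid, cv=3, scoring="accuracy",
)
search.fit(X, y)
print(n_combinations, search.best_params_, search.best_score_)
```

Passing `full_grid` and a repeated 10-fold splitter to `GridSearchCV` would cover the whole candidate space, but the number of model fits grows as the product of the grid size and the number of folds and repeats.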
Figure 1 shows the feature importance values obtained from the GB approach as histograms. Out of the 9 features, 2 contribute most to the model's ability to classify the three types of stress, namely (a) TL and (b) Ta. The other features complement the prediction process by further improving the model. Therefore, in the current algorithm, the more variables were used as inputs, the higher the prediction accuracy. To decrease the number of inputs, the testing sample would need to increase, since the greenhouse is considered a nonlinear system in which the relation between the climatic factors and the crop physiology is complex and dynamic, and with limited data the response is difficult to predict.
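Feature importance values of this kind can be read from a fitted Scikit-learn gradient boosting model through its `feature_importances_` attribute. In the sketch below the nine feature names from the Methods section are attached to synthetic columns, so the resulting ranking is arbitrary and has no relation to the values shown in Figure 1; the sample size and number of trees are likewise assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Feature names from the Methods section, attached to SYNTHETIC columns.
feature_names = ["Ta", "RH", "SR", "TL", "Lwup", "Lwdn", "PRI", "Ps", "CWSI"]

X, y = make_classification(
    n_samples=500, n_features=9, n_informative=4, n_redundant=0,
    n_classes=3, shuffle=False, random_state=0,
)
clf = GradientBoostingClassifier(n_estimators=30, random_state=0).fit(X, y)

# Impurity-based importances, normalised by Scikit-learn to sum to 1.
ranking = sorted(
    zip(feature_names, clf.feature_importances_),
    key=lambda item: item[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name:5s} {importance:.3f}")
```

Because these importances are normalised, a value such as 0.51 for one feature means that it accounts for roughly half of the total impurity reduction across all trees.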
Table 1 presents the statistical criteria obtained by the GB algorithm during the training and testing processes. According to the data, the GB algorithm scored highly on the training set, where the accuracy, precision, sensitivity, and F1-score were 100%. The GB model belongs to the family of models that can handle even features with a low predictive power. In addition, the GB model showed a high performance on the test set, with 98% accuracy, 98% precision, 98% sensitivity, and 98% F1. A comparison of the metrics between the training and testing phases shows that overfitting was avoided.
Figure 2 shows the performance distribution for the GB model according to the three types of stress. More specifically, the GB model correctly “understood” all cases presented as LTS; it “confused” 16 NoS cases as LWS, 21 LWS cases as NoS, and only 1 LWS case as LTS.
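The reported counts can be arranged as a confusion matrix to check the overall accuracy by hand. The sketch below uses the per-class totals given in the abstract (372 of 372 LTS, 1305 of 1321 NoS, 431 of 452 LWS) together with the confusions named above. Two points are assumptions of the sketch: the single LWS case reported as LTS is left out, because the 21 LWS cases confused as NoS already account for all misclassified LWS samples, and the counts are used as reported even though they sum to 2145 rather than to the stated testing sample of 2152.

```python
labels = ["LTS", "NoS", "LWS"]

# Rows: true class, columns: predicted class, in the order of `labels`.
confusion = [
    [372, 0, 0],     # true LTS: all identified correctly
    [0, 1305, 16],   # true NoS: 16 confused as LWS
    [0, 21, 431],    # true LWS: 21 confused as NoS
]

total = sum(sum(row) for row in confusion)
accuracy = sum(confusion[i][i] for i in range(3)) / total

# Per-class sensitivity (row-wise) and precision (column-wise).
sensitivity = {labels[i]: confusion[i][i] / sum(confusion[i]) for i in range(3)}
precision = {
    labels[j]: confusion[j][j] / sum(row[j] for row in confusion) for j in range(3)
}

print(total, round(accuracy, 2))
print(sensitivity)
print(precision)
```

Under these assumptions the overall accuracy rounds to the 98% reported for the testing set, with the water stress class having the lowest sensitivity of the three.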

4. Discussion

The gradient boosting algorithm is one of the most powerful algorithms in the field of machine learning. It can be used for predicting not only a continuous target variable (as a regressor) but also a categorical target variable (as a classifier). In the current research, qualitative and quantitative data are involved in the process of building an ML model. Additionally, the GB algorithm can build a highly efficient, accurate, and high-quality ML model in less time. The GB algorithm performs well on small datasets and on unbalanced data, such as those arising in real-time data management [11,12]. Ravi and Baranidharan [13] and Cai et al. [14] report that the GB algorithm is faster than all other machine learning algorithms.
In the current research, the GB algorithm was applied for the first time to classify qualitative and quantitative data under greenhouse conditions, with very good statistical results. The developed model can be applied in other greenhouse systems in the Mediterranean region that cultivate tomato crops in hydroponics.
The next step of the current research is to improve the model developed with the GB algorithm by decreasing the number of inputs and to define more types of stress, such as stress occurring in the plants due to high air temperature and low nutrient performance.

5. Conclusions

The current research presented the development of a gradient boosting algorithm to predict three types of stress under greenhouse conditions. The model was built for tomato crops, and the training and testing of the model were performed on a dataset of 10,763 samples. In the model, nine feature inputs were used to predict three outputs. The developed GB model showed high statistical criteria, with more than 98% accuracy, and produced reliable results on greenhouse data; it can be connected with the operation systems already in use. The future perspective of the current research is to extend the model in order to predict more than three types of stress. Application of the current model in greenhouse cultivation allows more efficient and precise farming with less manpower and with high-quality production, contributing to a further reduction of resource inputs, energy, and environmental footprint.

Author Contributions

Conceptualization, N.K.; methodology, N.K. and A.E.; formal analysis, A.E.; investigation, A.E.; resources, N.K. and A.E.; data curation, A.E.; writing—original draft preparation, A.E.; writing—review and editing, N.K. and A.E.; supervision, N.K.; project administration, N.K.; funding acquisition, N.K. and A.E. All authors have read and agreed to the published version of the manuscript.

Funding

This research is co-financed by Greece and the European Union (European Social Fund-ESF) through the Operational Programme «Human Resources Development, Education and Lifelong Learning» in the context of the project “Reinforcement of Postdoctoral Researchers—2nd Cycle” (MIS-5033021), implemented by the State Scholarships Foundation (ΙΚΥ).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data available upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Elvanidi, A.; Benitez Reascos, C.M.; Gourzoulidou, E.; Kunze, A.; Max, J.F.J.; Katsoulas, N. Implementation of the circular economy concept in greenhouse hydroponics for ultimate use of water and nutrients. Horticulturae 2020, 6, 83.
  2. Katsoulas, N.; Elvanidi, A.; Ferentinos, K.P.; Kacira, M.; Bartzanas, T.; Kittas, C. Crop reflectance monitoring as a tool for water stress detection in greenhouses: A review. Biosyst. Eng. 2016, 151, 374–398.
  3. Katsoulas, N.; Savvas, D.; Tsirogiannis, I.; Merkouris, O.; Kittas, C. Response of an eggplant crop grown under Mediterranean summer conditions to greenhouse fog cooling. Sci. Hortic. 2009, 123, 90–98.
  4. Jackson, R.D.; Idso, S.B.; Reginato, R.J.; Pinter, P.J. Canopy temperature as a crop water stress indicator. Water Resour. Res. 1981, 17, 1133–1138.
  5. Baille, A.; Kittas, C.; Katsoulas, N. Influence of whitening on greenhouse microclimate and crop energy. Agric. For. Meteorol. 2001, 107, 293–306.
  6. Elvanidi, A.; Katsoulas, N. Calibration Methodology of a Remote PRI Sensor for Photosynthesis Rate Assessment in Greenhouses. Biol. Life Sci. Forum 2021, 3, 60.
  7. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
  8. Khan, R.; Mishra, P.; Baranidharan, B. Crop Yield Prediction using Gradient Boosting Regression. Int. J. Innov. Technol. Explor. Eng. 2020, 9, 2293–2297.
  9. Karamoutsou, L. Investigation of the Water Quality Parameters of Lake Kastoria from Time-Series Monitoring Data Using Machine Learning Techniques for Simulation and Prediction. Ph.D. Thesis, University of Thessaly, Volos, Greece, 2020.
  10. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  11. Puligudla, P.; Karthik, K.S.; Kumar, K.V.N.; Thirugnanam, M. Prediction of crop yield using gradient boosting. J. Xi’an Univ. Archit. Technol. 2020, 12, 369–374.
  12. Shyamala, K.; Rajeshwar, I. Enhanced gradient boosting regression tree for crop yield prediction. Int. J. Sci. Technol. Res. 2020, 9, 1651–1654.
  13. Ravi, R.; Baranidharan, B. Crop yield Prediction using XG Boost algorithm. Int. J. Recent Technol. Eng. 2020, 8, 3516–3520.
  14. Cai, W.; Wei, R.; Xu, L.; Ding, X. A method for modelling greenhouse temperature using gradient boost decision tree. Inf. Process. Agric. 2021, 9, 343–354.
Figure 1. Feature importance of the measured factors in the setup of the GB algorithm.
Figure 2. The predicted category of the samples of each treatment according to the type of stress for the GB algorithm in the testing process (testing sample 2152).
Table 1. Statistical criteria resulting from (a) the validation (training sample 8610) and (b) the performance (testing sample 2152) of the GB algorithm.
Performance and Validation
GB Algorithm    Accuracy    Precision    Recall    F1
Training set    100%        100%         100%      100%
Testing set     98%         98%          98%       98%
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
