Open Access Feature Paper Article

Peer-Review Record

Greenhouse Temperature Prediction Based on Time-Series Features and LightGBM

Appl. Sci. 2023, 13(3), 1610; https://doi.org/10.3390/app13031610

by Qiong Cao¹, Yihang Wu¹, Jia Yang²,* and Jing Yin¹

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 27 December 2022 / Revised: 15 January 2023 / Accepted: 16 January 2023 / Published: 27 January 2023

(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

The authors presented six models for predicting the temperature in a greenhouse. Using the mean square error (MSE), mean absolute error (MAE), and coefficient of determination R-square (R2), they assessed how well each model fits the measurements. The inclusion of time-series features in the predictive analysis is noteworthy. However, the limited data set does not allow a reliable assessment of how suitable the selected LightGBM model is for temperature prediction under varied greenhouse operating conditions. The study can only be treated as a contribution to further research.
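For readers unfamiliar with the three metrics named in this report, the following is a minimal sketch of how they are commonly computed. The function names and the sample temperatures are hypothetical and are not taken from the manuscript.

```python
def mse(y_true, y_pred):
    """Mean square error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical indoor temperatures in degrees Celsius (illustration only).
observed = [20.0, 21.0, 22.0, 23.0]
predicted = [20.5, 20.5, 22.5, 22.5]

print(mse(observed, predicted), mae(observed, predicted), r2(observed, predicted))
```

Lower MSE and MAE indicate smaller errors, while R2 closer to 1 indicates that the predictions explain more of the variance in the observations.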

Author Response

We would like to thank you for your effort in reading our manuscript, and we appreciate your positive assessment.

Reviewer 2 Report

On the basis of an analysis of environmental factors that affect the indoor temperature, the authors added difference features, statistical features, and other time-series characteristics of the temperature, humidity, and air pressure using cross features as the input of the nonlinear relationship features and applying the LightGBM model, linear regression, SVM, BP neural network, RBF neural network, and MLPRegressor for predictive analysis. This is interesting research work, but there are some problems that need to be solved.

1. The authors should use more data sets to verify the generalization or robustness of the model.

2. Why is the time series model based on deep learning not used?

3. Some recent time series literature should be discussed in relevant work, for example [1,2].

[1] Refined Nonuniform Embedding for Coupling Detection in Multivariate Time Series

[2] Detecting Causality in Multivariate Time Series via Non-Uniform Embedding
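The difference and statistical time-series features mentioned in this report's summary could take a form like the following. This is only a sketch: the record does not give the manuscript's exact feature definitions, so the function names, the window length, and the sample values are all assumptions.

```python
from statistics import mean


def diff_feature(series):
    """First-order difference x[t] - x[t-1]; the first entry has no predecessor."""
    return [None] + [series[t] - series[t - 1] for t in range(1, len(series))]


def rolling_stats(series, window):
    """Mean, min and max over the trailing `window` samples ending at each t."""
    out = []
    for t in range(len(series)):
        if t + 1 < window:
            out.append(None)  # not enough history yet
        else:
            chunk = series[t - window + 1 : t + 1]
            out.append({"mean": mean(chunk), "min": min(chunk), "max": max(chunk)})
    return out


# Hypothetical outdoor temperatures (illustration only).
outdoor_temp = [10.0, 12.0, 11.0, 15.0]
diffs = diff_feature(outdoor_temp)
stats = rolling_stats(outdoor_temp, window=2)
print(diffs)
print(stats[-1])
```

Both helpers look only at the current and earlier samples, so the derived features do not use information from the future of the series.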

Author Response

Question 1: The authors should use more data sets to verify the generalization or robustness of the model.

Response to question 1:

The data were taken from an online competition, so in the previous version of the manuscript we verified the experimental results directly on the test set. No further data could be obtained from the competition organizer, so we divided the existing training set into k folds for block validation. The results show that the MSE and MAE of each model's predictions are reduced after cross-validation.
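One possible reading of "k folds for block validation" is a k-fold split in which each validation fold is a contiguous block of the time-ordered training set, with no shuffling. The sketch below illustrates that reading only; the function name and the sample sizes are hypothetical, and the response does not state the exact scheme used.

```python
def blocked_kfold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs with contiguous validation blocks."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample each.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, val_idx
        start = stop


folds = list(blocked_kfold_indices(10, 3))
for train_idx, val_idx in folds:
    print("validate on", val_idx, "train on", train_idx)
```

Note that in this scheme the training indices for the earlier folds include samples recorded after the validation block; a forward-chaining split that trains only on earlier samples is a common alternative for time series.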

Question 2: Why is the time series model based on deep learning not used?

Response to question 2:

The data set is small and of low quality, and our many previous experiments showed that deep-learning time-series models predict no better than machine-learning models on such small samples of structured data. Deep learning was therefore not used in this work.

Question 3: Some recent time series literature should be discussed in relevant work, for example [1,2]

[1] Refined Nonuniform Embedding for Coupling Detection in Multivariate Time Series

[2] Detecting Causality in Multivariate Time Series via Non-Uniform Embedding

Response to question 3: Thank you for your suggestions. We have added a discussion of recent time series literature to the Related Work section of the revised version.

Reviewer 3 Report

In this paper, a method of establishing a prediction model of the greenhouse temperature is proposed, based on time-series features and LightGBM framework.

I have the following observations and suggestions for the authors:

1. The abstract seems to have two contradictory sentences:

"Among the models tested, LightGBM performs best, with the mean square error of the prediction results of the model decreasing by 18.61% after adding time-series features", but then, the next sentence

"Comparing with the ... after adding time-series features, the mean square error is 11.70% to 29.12% lower".

If we add the sentence from "Analysis" section (p.16): "In summary, Table 4 shows that the prediction of each model is improved after adding time-series features, and the time-series features make the most obvious improvements to the prediction of LightGBM, with the MSE on the test set decreasing from 0.5868 to 0.4776 (i.e., by 18.61%)."

So which is actually the best result 18.61% or 29.12%, and which algorithm has 29.12% improvement in MSE?

2. The authors should consider moving citation of relevant literature from Introduction to Related work section.

3. The whole text in Related work section does not belong there. Authors should consider moving it to Experiment section. It is not clear at that moment which data containing indoor temperature was used in Figure 1.

4. The article should not be based on and named after a specific framework (LightGBM). Instead, it should rely on the ML algorithm. The algorithm is what should prove its performance, not the framework. If somebody does not have access to the LightGBM framework, does that mean they cannot use the GBDT algorithm to reproduce the results?

5. It is not clear to me what 25%, 50% and 75% are in Table 2.

6. Figure 2 shows the data distribution before and after forward filling for only one feature, "air pressure (indoor)". Why was this particular feature selected for presentation in the figure? Does it have the greatest number of abnormalities?

The forward filling method propagates the last valid observation forward down the column. I do not see how this could get us from graph 1 (before correction) to graph 2 (after correction) in Figure 2. In the original data distribution, the line is almost straight, except for drastic drops in the pressure value somewhere after 29 hours, close to 44 hours, at 132 hours and at 248 hours. Only one of those drops (at 132) can be seen in the data distribution after the correction. How could this be done by applying the forward filling method alone?

8. In Feature engineering section, are there t-1 Temperature(outdoor)_diff features, for each t-1 and t pair? If so, then the correct name for the feature should be Temperature(outdoor)_diff(t-1,t) or similar.

The way the authors generate "statistical" features is not clear to me at all.

9. The Abstract mentions a decrease of 29.12% for some ML model. For which algorithm do we see this decrease?

10. There is no comparison between the presented results and results from existing literature on similar research. It is therefore not possible to conclude how the proposed model performs compared with other models described in the literature.
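The concern in point 6 about forward filling can be illustrated with a minimal sketch. The function and the pressure readings below are hypothetical and do not come from the manuscript; the sketch assumes that missing values are represented as None or NaN.

```python
import math


def forward_fill(values):
    """Replace each missing entry with the most recent valid observation."""
    filled = []
    last_valid = None
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            filled.append(last_valid)  # stays None if nothing valid came before
        else:
            filled.append(v)
            last_valid = v
    return filled


# Hypothetical indoor air-pressure readings with two gaps and one sharp drop.
pressure = [990.0, None, 991.0, 400.0, 990.5, None]
filled = forward_fill(pressure)
print(filled)
```

In this sketch the recorded value 400.0 is left unchanged because it is not missing. Forward filling would remove such a drop only if the reading had first been marked as missing by some other step, which is an assumption here and not something stated in the review record.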

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The authors have solved my problems.

Author Response

We are grateful for your consideration of our paper and for your helpful comments and suggestions.

Best wishes.

Jia Yang (on behalf of all co-authors)

Jan 15, 2023

Reviewer 3 Report

The authors have answered all the questions and clarified parts of the paper that were unclear to me.

However, I believe that the authors should include some parts of their answers in the paper, so that the clarifications become available to readers too.

For example, a shorter version of the answer to question 1 (or question 9) should be included in order to clarify what 29.12% refers to.

Also, the paper should note what the authors responded to question 10, so that it is clear why there is no comparison with other relevant results. For example, the fact that the authors "have checked relevant works published in the past 2 years and have not found any other study using this data set." is very relevant in my opinion and increases the scientific value of the paper.

Author Response

Please see the attachment

Author Response File: Author Response.pdf
