Time Series Feature Extraction Using Transfer Learning Technology for Crop Pest Prediction

Tsai, Ming-Fong; Lan, Chun-Ying; Wang, Neng-Chung; Chen, Lien-Wu

doi:10.3390/agronomy13030792


by Ming-Fong Tsai ¹,*, Chun-Ying Lan ¹, Neng-Chung Wang ² and Lien-Wu Chen ³

¹ Department of Electronic Engineering, National United University, Miaoli 360302, Taiwan

² Department of Computer Science and Information Engineering, National United University, Miaoli 360302, Taiwan

³ Department of Information Engineering and Computer Science, Feng Chia University, Taichung 40724, Taiwan

* Author to whom correspondence should be addressed.

Agronomy 2023, 13(3), 792; https://doi.org/10.3390/agronomy13030792

Submission received: 7 February 2023 / Revised: 6 March 2023 / Accepted: 8 March 2023 / Published: 9 March 2023

(This article belongs to the Special Issue The New Agricultural Revolution: From Traditional Farms to Smart Agriculture—New Technologies in Agriculture 5.0)


Abstract

Following the rapid development of information and communication technology and the explosive growth in the amount of available data, artificial intelligence and machine learning have been used for predictive analysis in many fields. However, the prediction accuracy of these machine learning recognition models depends on the quality of the features selected for training. It is therefore very important to identify features that are meaningful and in line with the target variables as the training conditions for machine learning recognition models. In this paper, we analyse the correlation between features and target variables using the Pearson product-moment correlation coefficient, and integrate transfer learning technology for sequential feature extraction to enhance the prediction accuracy of a machine learning recognition model. The prediction of multiple crop pests and diseases is used as the performance verification target of the proposed method. The performance of our machine learning recognition model is compared with schemes in related work, and our approach is shown to increase the prediction accuracy by between 3% and 15%.

Keywords: time series feature extraction; transfer learning technology; crop pest prediction

1. Introduction

Benefiting from the development of artificial intelligence and big data technologies, which are now being implemented in many different fields, many prior studies have used artificial intelligence and machine learning to construct recognition models to solve the prediction problems that arise in these fields. Due to the slight differences in the data types that are suitable for various machine learning prediction algorithms, the basic requirement is to select the most suitable machine learning prediction algorithm for a specific data type [1,2]. The most important aspect that affects the prediction accuracy of various machine learning recognition models is feature selection, as selecting features that are uncorrelated or not highly correlated with the target variable will affect the predictive performance of the machine learning recognition model, meaning that the feature selection technology used for training of a machine learning recognition model is a very important research issue [3,4,5,6,7,8,9,10,11,12,13,14]. The prediction of crop diseases and insect pests is a key topic in the field of machine learning algorithms for intelligent agriculture [15,16,17]. One previous study used climate conditions, cotton pests and disease datasets, provided by the Crop Pest Decision Support System website, to develop a machine learning recognition model based on LSTM technology to predict the occurrence of cotton diseases and insect pests [18]. Features were selected that were highly correlated with the target variable to construct a machine learning recognition model, and it was proven that the occurrence of cotton diseases and insect pests was related to sequential changes in the climate. 
Another study used the random forest algorithm to rank and select atmospheric circulation parameters and to construct climate combinations, and Bi-LSTM was used to construct a machine learning recognition model of the degree of damage done by cotton pests [19]. In the two studies described above, the machine learning recognition model was updated and selected features were used for its training and prediction. However, when factors such as overall climate change and pest evolution are considered, the features of the data that were originally highly correlated with the target variable may change. The use of features whose correlation has changed for the construction of a machine learning recognition model will lead to a decrease in prediction accuracy. In this paper, we therefore propose to use transfer learning technology for sequential feature extraction to dynamically adjust the highly correlated features selected by the machine learning recognition model for new data. We verify the effectiveness of the proposed method for the application of crop disease and insect pest prediction, and conduct a performance analysis involving the prediction of Bactrocera dorsalis pests on red dates and the machine learning recognition model of cotton pests from the study described above. The remainder of this paper is structured as follows: we review related work in Section 2 and explain time series feature extraction using a transfer learning technology method in Section 3. We introduce our experimental method and analyse its performance in Section 4. Finally, conclusions and prospects for future work are presented in Section 5.

2. Related Work

The primary task for predictive analysis with artificial intelligence based on machine learning is feature selection for the training of a recognition model. Selecting feature variables that are highly correlated with target variables for training of a recognition model can improve the prediction accuracy. This can be achieved by using filter, wrapper or embedded methods to identify features that have a high level of correlation with the target variable; removing uncorrelated or redundant features can make recognition model training simpler and improve the prediction performance [20,21]. One study in the literature used machine learning to predict the occurrence or non-occurrence of crop pests based on climate conditions and cotton pest datasets provided by the Crop Pest Decision Support System website [18]. The authors used the eight climate conditions provided by this website as features for the training of an LSTM machine learning recognition model. The K-means algorithm was used to discretise the feature values for these eight climatic conditions in advance, while the Apriori algorithm was used to verify whether or not these climatic conditions affected the occurrence of cotton pests. The experimental results of the study proved that the occurrence of cotton pests was related to sequential changes in the climate. However, the authors did not remove any features before training the machine learning recognition model; the use of redundant features can increase the complexity of training a recognition model and reduce the prediction performance. In this paper, we use the Pearson product-moment correlation coefficient for feature analysis and selection to obtain features that are highly correlated with the target variable for training and prediction of a machine learning recognition model, in order to reduce the complexity of training and improve the prediction performance. 
Another paper extended the research results from the study described above, which considered atmospheric circulation conditions and cotton pest datasets, by adding four further climatic factor combinations to the eight climatic conditions provided by the website for training of a machine learning recognition model [19]. In addition, the random forest algorithm was applied to the 74 atmospheric circulation conditions to observe the contributions of each feature, and the top 25 atmospheric circulation conditions with high correlation were selected. The 12 climatic conditions described above and 25 atmospheric circulation conditions were used as the training features for a Bi-LSTM machine learning recognition model. However, the authors did not take into account the fact that the data may change due to factors such as overall climate change and sudden changes in the evolution of insect pests, which lead to changes in the original features with strong correlation with the target variable. The use of historical features and data was not sufficient to accurately predict the degree of future crop pest damage when used as input to train the machine learning recognition model. In this paper, we use transfer learning techniques with sequential feature extraction to adjust the highly correlated features and data dynamically, thus adjusting the machine learning recognition model to real-time data.

A study in the related literature [22] used the number of pests (captured daily) and weather station information as a forecast dataset to construct a machine learning recognition model. An observation period of three to five days was selected based on the occurrence of pests to randomly divide the data, while retaining the data to carry out training of the machine learning identification model in a sequential manner. Real data obtained from different locations were used as the test dataset to adjust the machine learning identification model to predict the dates of pest occurrence. The goal was to use the climate temperature and relative humidity data information collected by the sensor to predict the occurrence of pests in different places in real time, and to provide enough warning to allow farmers to choose methods to prevent the occurrence of pests and to control the degree of damage. However, the abovementioned study did not consider the changes in sensing values caused by climate change, which can affect the occurrence of pests. Another study [23] used a multi-data type fusion model to predict disease in rice. The authors used information from agricultural meteorological sensors and camera images as a dataset, and an MLP architecture was applied to integrate the images with the sensor data. A CNN architecture was employed to classify rice diseases. The classification results from the MLP and CNN were input as two connection layers to obtain the final rice disease classification. However, the acquisition of camera images in this study was prone to noise from outdoor light and occlusion of the line of sight, resulting in a decrease in classification accuracy. A method [24] was proposed to calculate the number of insects based on density maps from deep learning, in which the RGB colour space and the DenseNet model were used to identify the leaves and the background. The first method used the colour space and VGG16 to classify the pests affecting the leaves. 
Deep learning was applied to identify the location of insect pests and to construct a number density map for the leaves. However, the acquisition of insect images in this study was prone to problems caused by outdoor illumination and line-of-sight occlusion, resulting in a decline in classification accuracy. Other authors [25] carried out research on the classification and prediction of pests and diseases based on existing machine learning recognition models. As the images are captured in an outdoor environment, the process is often affected by outdoor light or problems with the line of sight. In contrast, this paper focuses on integrating transfer learning technology for sequential feature extraction to enhance the prediction accuracy of machine learning recognition models.

3. Time Series Feature Extraction Using a Transfer Learning Technology Method

3.1. Overview of the Proposed System

We propose a sequential feature extraction method based on the use of transfer learning. A diagram of the system architecture is shown in Figure 1. After applying data processing, such as cleaning and feature aggregation from historical data, the data are passed to the feature selection stage. At the feature selection stage, many methods can be used to compute the feature variables that are highly correlated with the target variable, to obtain information on the strength of the correlation between each feature variable and the target variable. Multiple highly correlated features are selected as candidate features for our machine learning recognition model. Since real-time data will be added to the dataset one by one, the correlation between the features and the target variables will change. Sequential feature extraction by transfer learning technology involves using the features retained by the previous machine learning recognition model as candidate features, and then using these candidate features for the historical and real-time data to select the top N features for training of the machine learning recognition model and prediction. At the same time, since the data types that are suitable for various machine learning prediction algorithms are slightly different, we construct a complex prediction model for these top N features, and use those with the highest prediction accuracy for the machine learning recognition model as our sequential feature extraction method for transfer learning.

3.2. Design of the Recognition Model

In the data processing phase, the data are cleaned to prevent missing values from affecting the subsequent modelling process and reducing the prediction accuracy. At this stage, we pre-check the various feature types in the data and identify abnormal and missing values so that they can be corrected by filling in replacement values. Furthermore, multiple forms of feature data with the same physical meaning are aggregated into a single feature. In this paper, the feature variables that are highly correlated with the target variable are identified by applying the Pearson product-moment correlation coefficient to each pair of feature variable and target variable. This coefficient indicates the degree to which two variables change together; it is directional, and ranges from −1 (negative correlation) to 1 (positive correlation). A larger absolute value of the Pearson product-moment correlation coefficient indicates higher correlation, while a smaller absolute value indicates lower correlation. The proposed approach applies a sequential feature extraction method with transfer learning in which the Pearson product-moment correlation coefficient is used to analyse the correlation strength between each feature condition (X) and the target variable (Y). The value of the correlation coefficient between each feature condition and the target variable can be obtained as shown in Equation (1). Based on this value, we select the top N features for training and prediction, as shown in Equation (2).

r = \frac{\sum_{i = 1}^{n} (X_{i} - \bar{X}) (Y_{i} - \bar{Y})}{\sqrt{\sum_{i = 1}^{n} {(X_{i} - \bar{X})}^{2}} \times \sqrt{\sum_{i = 1}^{n} {(Y_{i} - \bar{Y})}^{2}}}

(1)

topN_{(j = 1, 2, 3, \ldots, n)} = r_{(i = 1, 2, 3, \ldots, n)}

(2)
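Equations (1) and (2) can be illustrated with a minimal Python sketch. It assumes that features are ranked by the absolute value of r, as described in the experiments. The function names, the feature names and the toy values are hypothetical and are not taken from the datasets used in this paper; treating a constant feature as uncorrelated is also an assumption made only to avoid division by zero.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient, as in Equation (1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denominator = math.sqrt(var_x) * math.sqrt(var_y)
    if denominator == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return numerator / denominator

def top_n_features(features, target, n):
    """Rank features by |r| against the target and keep the first n names."""
    scores = {name: pearson_r(values, target) for name, values in features.items()}
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return ranked[:n], scores

# Hypothetical toy data: three climate features and a pest count.
features = {
    "temperature": [20.0, 22.0, 24.0, 26.0, 28.0],
    "humidity": [80.0, 70.0, 75.0, 60.0, 65.0],
    "rainfall": [5.0, 1.0, 4.0, 2.0, 3.0],
}
pest_count = [10.0, 12.0, 15.0, 18.0, 20.0]

selected, scores = top_n_features(features, pest_count, n=2)
print(selected)
```

Ranking on the absolute value means that a strongly negative correlation, such as humidity in this toy data, is retained ahead of a weakly correlated feature.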

When constructing the machine learning recognition model, the dataset that remains after the data processing stage is randomly divided into a training dataset and a test dataset, based on a 6:4 ratio, to build the machine learning recognition model. We apply leave-one-out cross-validation and use the recognition model with the highest prediction accuracy as a sequential feature extraction method for transfer learning. Our sequential feature extraction method with transfer learning takes into consideration the fact that real-time data will be added to the dataset one by one, meaning that the correlations between the features and target variables will change. Transfer learning technology is used for sequential feature extraction and to retain the features of the previous model as the candidate features. We integrate the values of the correlation coefficient for historical and real-time data between each feature condition and target variable to select the top N features for training and prediction. A complex prediction model is constructed based on the above selected top N features, and the k nearest neighbours (KNN), support vector machine (SVM), naive Bayes and random forest algorithms are used for training and prediction of the machine learning recognition model. The approach with the highest prediction accuracy is used as the sequential feature extraction method for transfer learning in the machine learning recognition model, as shown in Figure 2.
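The leave-one-out cross-validation step mentioned above can be sketched as follows. This is an illustration only: a simple one-nearest-neighbour rule stands in for the classifiers named in the text, and the function names and the toy samples are hypothetical.

```python
import math

def nearest_neighbour_predict(train_X, train_y, sample):
    """Stand-in classifier: return the label of the closest training sample."""
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], sample))
    return train_y[best]

def leave_one_out_accuracy(X, y, predict):
    """Hold out each sample once, train on the rest, and average the hits."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)

# Hypothetical toy data: two well-separated groups of samples.
X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
y = [0, 0, 0, 1, 1, 1]

loo_acc = leave_one_out_accuracy(X, y, nearest_neighbour_predict)
print(loo_acc)
```

Because every sample is used once as the test case, the estimate uses all of the data, which is helpful when only a few seasons of observations are available.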

3.3. Examples of Recognition Models

When the historical data have passed through the data processing stages, such as data cleaning and feature aggregation, the data have features A, B, C, D, E, and F. The Pearson product-moment correlation coefficient is used to sort the highly correlated features at the feature selection stage, and the top four features are selected (in this case, features A, C, D, and F) for training and prediction by the machine learning recognition model. If real-time data points are added to the dataset one by one, the same Pearson product-moment correlation coefficient will be used to sort the highly correlated features. We assume that the top four candidate features at this time are features A, C, D, and E. We then use the sequential feature extraction method of transfer learning to integrate these four features from the historical data with those of the real-time data, giving features A, C, D, E and F, as shown in Figure 3. The proposed machine learning recognition model uses the KNN, SVM, naive Bayes and random forest algorithms.
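The integration step in this example can be sketched in Python as follows. The correlation values are hypothetical and were chosen only so that the two rankings reproduce the feature sets named in the example; the function names are likewise assumptions.

```python
def ranked_top_n(correlations, n):
    """Return the n feature names with the largest absolute correlation."""
    ranked = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)
    return ranked[:n]

def transfer_candidates(previous_features, new_features):
    """Keep the features retained by the previous model and add new ones."""
    merged = list(previous_features)
    for name in new_features:
        if name not in merged:
            merged.append(name)
    return merged

# Hypothetical correlation coefficients for the historical data.
historical_r = {"A": 0.90, "B": 0.10, "C": 0.80, "D": 0.70, "E": 0.20, "F": 0.60}
# Hypothetical coefficients after real-time data have been added.
realtime_r = {"A": 0.85, "B": 0.15, "C": 0.75, "D": 0.70, "E": 0.65, "F": 0.30}

historical_top4 = ranked_top_n(historical_r, 4)
realtime_top4 = ranked_top_n(realtime_r, 4)
candidates = sorted(transfer_candidates(historical_top4, realtime_top4))
print(candidates)
```

In this sketch, feature F is carried over from the previous model even though it drops out of the real-time ranking, and feature E enters because its correlation has risen.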

3.4. Performance Assessment

The proposed method applies the Pearson product-moment correlation coefficient to analyse the correlations between features and target variables, and uses transfer learning to extract sequential features to strengthen the prediction accuracy of the machine learning recognition models. The accuracy (ACC), precision (Precision), recall rate (Recall) and F1-score are considered to assess the accuracy and effectiveness of the prediction model. For binary classification model outputs, there are only two types of results: positive (P) and negative (N). Our model has four outcomes for the case predictions: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). The definitions of ACC, Precision, Recall and F1-score are shown in Equations (3)–(6). For multiclass classification tasks, precision, recall and F1-score can be applied to each label independently, and the results can then be combined across labels in the manner specified by the average argument.

ACC = \frac{TP + TN}{P + N}

(3)

Precision = \frac{TP}{TP + FP}

(4)

Recall = \frac{TP}{TP + FN}

(5)

F1\text{-score} = \frac{2 \times Precision \times Recall}{Precision + Recall}

(6)
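As an illustration of Equations (3)–(6), the following Python sketch computes the four metrics from a pair of label lists. The labels are hypothetical and the function names are assumptions; returning 0 when a denominator is zero is a convention chosen for this sketch.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary classification result."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn

def classification_metrics(tp, tn, fp, fn):
    """ACC, Precision, Recall and F1-score as in Equations (3)-(6)."""
    total = tp + tn + fp + fn  # equals P + N
    acc = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return acc, precision, recall, f1

# Hypothetical labels: 1 = pest outbreak, 0 = no outbreak.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
acc, precision, recall, f1 = classification_metrics(*counts)
print(counts, acc, precision, recall, f1)
```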

4. Experiments and Results

4.1. Prediction of Bactrocera Dorsalis Crop Pests

Bactrocera dorsalis is the primary pest affecting red dates during the growth process, and often causes the fruits to rot or drop, thereby reducing the harvest. Our first experiment was therefore carried out to predict the number of Bactrocera dorsalis pests on red dates to verify the performance of the proposed method. We used climate feature datasets from Taiwan’s agricultural meteorological observation and monitoring platform system (https://agr.cwb.gov.tw/NAGR/history/station_day, accessed on 17 August 2022), as shown in Table 1. We applied data cleaning and feature aggregation to the historical data with the aim of improving the learning ability and generalisability of our machine learning recognition model. The average values were calculated for all mean ground temperature feature conditions, and these were combined into a single value representing the mean ground temperature feature. In this experiment, the Pearson product-moment correlation coefficient was used to analyse the climate characteristics provided by the original Taiwan agricultural meteorological observation and monitoring platform system. We then obtained the correlation between these and the activity of Bactrocera dorsalis pests. A heat map is shown in Figure 4.

Based on the absolute values of the Pearson product-moment correlation coefficients, the correlations were sorted and the top six climate characteristics were selected to construct a complex prediction model. In this experiment, the leave-one-out method was used for cross-validation to evaluate the prediction accuracy of the recognition model, in order to avoid overfitting or underfitting of the model. A dataset containing the numbers of oriental fruit flies from 2020 to 2022, as shown in Table 1, provided by the Farmers Association of Gongguan Township, Miaoli County, was used. The prediction targets for the number of oriental fruit fly pests are shown in Table 2. The proposed model takes into consideration the fact that real-time data will be added one by one, thus affecting the correlation of the climate characteristics with the activity of the fruit fly pests. The model uses transfer learning technology for sequential feature extraction to retain the features of the previous machine learning recognition model as candidate features, integrates historical and real-time data, and selects the top six features for training and prediction. In this way, it dynamically adjusts the highly correlated climatic features from the machine learning recognition model to the real-time data. In this experiment, the learning of the top six climate characteristics obtained from the 2020 historical data was transferred to the 2021 real-time data, as shown in Figure 5, and the learning of the top six climate characteristics obtained from the 2021 historical data was transferred to the 2022 real-time data. The top six climate characteristics obtained from this process are shown in Figure 6.

In this experiment, KNN, random forest, SVM and naive Bayes classifiers were constructed to carry out a performance analysis of the selected climate characteristics. The parameters for the KNN algorithm were weights = distance, n_neighbors = 3, algorithm = auto and p = 2. The parameter used for the random forest algorithm was n_estimators = 100. The parameters for the SVM algorithm were kernel = linear and C = 1, while for the naive Bayes algorithm, we used priors = None. In this experiment, transfer learning technology was integrated with sequential feature extraction to strengthen the machine learning recognition model, and the prediction results are shown in Table 3 and Table 4. From these experimental results, it can be seen that the random forest algorithm gave the highest accuracy for the numbers of oriental fruit fly pests. The learning curves for the model are shown in Figure 7 and Figure 8, and it can be seen that the curve gradually converges as the number of samples increases. We therefore used the random forest algorithm in the remainder of the experiment as our machine learning recognition model with the sequential feature extraction method and transfer learning. In addition, the climate conditions were used without feature selection for modelling, and the prediction results are shown in Table 5 and Table 6. A comparison of the results in Table 3, Table 4, Table 5 and Table 6 shows that the use of feature selection was important.
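The parameter names listed above match those of scikit-learn estimators, so the comparison of the four classifiers can be sketched as follows. This is an assumption-laden illustration rather than the experimental code: the data are synthetic, GaussianNB is assumed as the naive Bayes variant because it accepts a priors argument, and the random_state values are added only for reproducibility.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for six selected climate features (not the real dataset).
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=4, n_redundant=0, random_state=0
)

# 6:4 split between training and test data, as described in Section 3.2.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# Hyperparameters as stated in the text.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=3, weights="distance", algorithm="auto", p=2),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="linear", C=1),
    "naive Bayes": GaussianNB(priors=None),  # assumed variant
}

# Fit each classifier and record its accuracy on the held-out data.
accuracies = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
best_model = max(accuracies, key=accuracies.get)
print(accuracies, best_model)
```

The classifier with the highest test accuracy is then the one kept for subsequent prediction, which mirrors the selection rule described in Section 3.2; which classifier wins on the synthetic data carries no meaning for the real datasets.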

4.2. Prediction of Whitefly Crop Pests

The second experiment aimed to predict the number of whitefly pests on cotton crops, and was used to validate the proposed method. The Crop Pest Decision Support System (http://www.icar-crida.res.in:8080/naip/index.jsp, accessed on 30 July 2022) from the literature provided the climatic characteristics and datasets on the activity of whitefly pests in cotton, as shown in Table 7. The Pearson product-moment correlation coefficients of the abovementioned climate characteristics were analysed to obtain the correlations between various climate characteristics and the effects of whitefly pests, and a heat map is shown in Figure 9.

The results were sorted based on the Pearson product-moment correlation coefficient, from high to low, after taking the absolute values. The top four climate characteristics were selected to construct a complex prediction model, involving the transfer learning of the top four climate characteristics obtained from the historical data in the first year to the real-time data of the second year. The top four climate characteristics obtained in this way are shown in Figure 10. This experiment combined transfer learning technology with sequential feature extraction to strengthen the machine learning recognition model. The prediction results are shown in Table 8, and it can be seen that the KNN algorithm had the highest prediction accuracy for the numbers of whitefly pests. The model learning curve is shown in Figure 11. The KNN algorithm was therefore used in the machine learning recognition model with the sequential feature extraction method and transfer learning. Compared with the performance of another scheme in the literature, the method proposed in this paper could improve the prediction accuracy by more than 3%.

4.3. Prediction of Aphid Crop Pests

The third experiment aimed to predict the number of aphid pests on cotton crops, as a validation of the proposed method. We used data from the Crop Pest Decision Support System (http://www.icar-crida.res.in:8080/naip/index.jsp, accessed on 30 July 2022), which provided the climate characteristics and datasets for aphid pests in cotton, as shown in Table 9. In this experiment, the Pearson product-moment correlation coefficients for the abovementioned climate characteristics were analysed to obtain the correlation between the various climate characteristics and the activity of aphid pests. A heat map is shown in Figure 12.

The Pearson product-moment correlation coefficients were sorted from high to low after taking the absolute values, and the top four climate characteristics were selected to construct a complex prediction model in this experiment. Transfer learning was applied to the top four climate characteristics obtained from the historical data from the first year, and these were transferred to the real-time data for the second year. The top four climate characteristics obtained in this way are shown in Figure 13. In this experiment, we integrated transfer learning technology with sequential feature extraction to strengthen the machine learning recognition model. The prediction results are shown in Table 10. From these experimental results, it can be seen that the random forest algorithm had the highest prediction accuracy for the numbers of aphid pests. The model learning curve is shown in Figure 14. We therefore used the random forest algorithm in the machine learning recognition model with our sequential feature extraction method and transfer learning. Compared with the performance of the related work, the method proposed in this paper was able to improve the prediction accuracy by more than 14%.

5. Conclusions

In this study, we used the Pearson product-moment correlation coefficient to analyse the correlation between features and target variables. Transfer learning technology was integrated with time-series feature extraction to enhance the prediction accuracy of our machine learning recognition models. From a comparison of the performance of our machine learning recognition models for the prediction of multiple types of crop pests with a scheme from the literature, we found that the prediction accuracy value was increased by at least 3% and up to 15%.

Author Contributions

Supervision, M.-F.T.; Writing—original draft, M.-F.T., C.-Y.L., N.-C.W. and L.-W.C. All authors have read and agreed to the published version of the manuscript.

Funding

We thank the Industrial Development Bureau, Ministry of Economic Affairs, Chumpower Machinery Corporation, Kung-Kuan Farmer’s Association and National Science and Technology Council of Taiwan for their support of this project under grant number MOST 111-2221-E-239-024.

Institutional Review Board Statement

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflict of interest.

References

Uddin, S.; Khan, A.; Hossain, E.; Moni, M.A. Comparing different supervised machine learning algorithms for disease prediction. BMC Med. Inform. Decis. Mak. 2019, 19, 1–16. [Google Scholar] [CrossRef]
Krittanawong, C.; Virk, H.U.H.; Bangalore, S.; Wang, Z.; Johnson, K.W.; Pinotti, R.; Zhang, H.; Kaplin, S.; Narasimhan, B.; Kitai, T.; et al. Machine learning prediction in cardiovascular diseases: A meta-analysis. Sci. Rep. 2020, 10, 16057. [Google Scholar] [CrossRef] [PubMed]
Chandrashekar, G.; Sahin, F. A survey on feature selection methods. Comput. Electr. Eng. 2014, 40, 16–28. [Google Scholar] [CrossRef]
Khalid, S.; Khalil, T.; Nasreen, S. A survey of feature selection and feature extraction techniques in machine learning. In Proceedings of the Science and Information Conference (SAI), London, UK, 27–29 August 2014; pp. 372–378. [Google Scholar]
Xue, B.; Zhang, M.; Browne, W.N.; Yao, X. A Survey on Evolutionary Computation Approaches to Feature Selection. IEEE Trans. Evol. Comput. 2015, 20, 606–626. [Google Scholar] [CrossRef] [Green Version]
Cai, J.; Luo, J.; Wang, S.; Yang, S. Feature selection in machine learning: A new perspective. Neurocomputing 2018, 300, 70–79. [Google Scholar] [CrossRef]
Salcedo-Sanz, S.; Cornejo-Bueno, L.; Prieto, L.; Paredes, D.; García-Herrera, R. Feature selection in machine learning prediction systems for renewable energy applications. Renew. Sustain. Energy Rev. 2018, 90, 728–741. [Google Scholar] [CrossRef]
Tsai, M.-F.; Chu, Y.-C.; Li, M.-H.; Chen, L.-W. Smart Machinery Monitoring System with Reduced Information Transmission and Fault Prediction Methods Using Industrial Internet of Things. Mathematics 2021, 9, 3. [Google Scholar] [CrossRef]
Masoudi-Sobhanzadeh, Y.; MotieGhader, H.; Masoudi-Nejad, A. FeatureSelect: A software for feature selection based on machine learning approaches. BMC Bioinform. 2019, 20, 1–17. [Google Scholar] [CrossRef]
Christ, M.; Braun, N.; Neuffer, J.; Kempa-Liehr, A.W. Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests (tsfresh—A Python package). Neurocomputing 2018, 307, 72–77. [Google Scholar] [CrossRef]
Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A Comprehensive Survey on Transfer Learning. Proc. IEEE 2021, 109, 43–76. [Google Scholar] [CrossRef]
Shaha, M.; Pawar, M. Transfer learning for image classification. In Proceedings of the IEEE International Conference on Electronics, 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA 2018), Coimbatore, India, 29–31 March 2018; RVS Technical Campus: Coimbatore, India, 2018; pp. 656–660. [Google Scholar]
Tan, C.; Sun, F.; Kong, T.; Zhang, W.; Yang, C.; Liu, C. A survey on deep transfer learning. In Proceedings of the International Conference on Artificial Neural Networks, Rhodes, Greece, 4–7 October 2018; pp. 270–277. Available online: https://link.springer.com/chapter/10.1007/978-3-030-01424-7_27 (accessed on 6 February 2023).
Shen, F.; Chen, C.; Yan, R.; Gao, R.X. Bearing fault diagnosis based on SVD feature extraction and transfer learning classification. In Proceedings of the 2015 Prognostics and System Health Management Conference (PHM), Beijing, China, 21–23 October 2015; pp. 1–6.
Newton, A.C.; Johnson, S.; Gregory, P. Implications of climate change for diseases, crop yields and food security. Euphytica 2011, 179, 3–18.
Sharma, A.; Jain, A.; Gupta, P.; Chowdary, V. Machine Learning Applications for Precision Agriculture: A Comprehensive Review. IEEE Access 2021, 9, 4843–4873.
Xie, C.; Wang, R.; Zhang, J.; Chen, P.; Dong, W.; Li, R.; Chen, T.; Chen, H. Multi-level learning features for automatic classification of field crop pests. Comput. Electron. Agric. 2018, 152, 233–241.
Xiao, Q.; Li, W.; Kai, Y.; Chen, P.; Zhang, J.; Wang, B. Occurrence prediction of pests and diseases in cotton on the basis of weather factors by long short term memory network. BMC Bioinform. 2019, 20, 688.
Chen, P.; Xiao, Q.; Zhang, J.; Xie, C.; Wang, B. Occurrence prediction of cotton pests and diseases by bidirectional long short-term memory networks with climate and atmosphere circulation. Comput. Electron. Agric. 2020, 176, 105612.
Senan, E.M.; Abunadi, I.; Jadhav, M.E.; Fati, S.M. Score and Correlation Coefficient-Based Feature Selection for Predicting Heart Failure Diagnosis by Using Machine Learning Algorithms. Comput. Math. Methods Med. 2021, 2021, 8500314.
Marouf, A.; Hasan, M.; Mahmud, H. Comparative analysis of feature selection algorithms for computational personality prediction from social media. IEEE Trans. Comput. Soc. Syst. 2020, 7, 587–599.
Marković, D.; Vujičić, D.; Tanasković, S.; Đorđević, B.; Ranđić, S.; Stamenković, Z. Prediction of Pest Insect Appearance Using Sensors and Machine Learning. Sensors 2021, 21, 4846.
Patil, R.R.; Kumar, S. Rice-Fusion: A Multimodality Data Fusion Framework for Rice Disease Diagnosis. IEEE Access 2022, 10, 5207–5222.
Bereciartua-Pérez, A.; Gómez, L.; Picón, A.; Navarra-Mestre, R.; Klukas, C.; Eggers, T. Insect counting through deep learning-based density maps estimation. Comput. Electron. Agric. 2022, 197, 106933.
Domingues, T.; Brandão, T.; Ferreira, J.C. Machine Learning for Detection and Prediction of Crop Diseases and Pests: A Comprehensive Survey. Agriculture 2022, 12, 1350.

Figure 1. Overview of the proposed system.

Figure 2. Pseudocode for the proposed method.

Figure 3. Example illustrating the proposed method.

Figure 4. Correlation between climate characteristics and pest activity.

Figure 5. Transfer of historical data from 2020 to real-time data from 2021.

Figure 6. Correlation between climate characteristics and pest activity.

Figure 7. Learning curve for the random forest model for data from 2021.

Figure 8. Learning curve for the random forest model for data from 2022.

Figure 9. Heat map of the correlation coefficients for Experiment 2.

Figure 10. Transfer learning from historical data to real-time data in Experiment 2.

Figure 11. Learning curve for the KNN model of whitefly pests.

Figure 12. Heat map of the correlation coefficients in Experiment 3.

Figure 13. Transfer learning from historical data to real-time data for Experiment 3.

Figure 14. Learning curve for the random forest model of aphid pests.

Table 1. Climate characteristics used in Experiment 1.

Climate Characteristics
1. Mean pressure (hpa)	13. Maximum gust wind direction (winddirmax)
2. Mean sea-level pressure (hpasea)	14. Cumulative rainfall (rainday)
3. Maximum pressure (hpamax)	15. Maximum precipitation (10 min) (rainmax10)
4. Minimum pressure (hpamin)	16. Maximum precipitation (60 min) (rainmax60)
5. Mean temperature (temp)	17. Accumulative irradiation time (sunhr)
6. Maximum temperature (tempmax)	18. Cumulative solar radiation (solar)
7. Minimum temperature (tempmin)	19. Mean ground temperature (0 cm) (templand)
8. Dew point temperature (dew)	20. Mean ground temperature (5 cm) (templand)
9. Mean relative humidity (RH)	21. Mean ground temperature (10 cm) (templand)
10. Mean wind speed (vrl)	22. Mean ground temperature (20 cm) (templand)
11. Mean wind direction (winddir)	23. Mean ground temperature (50 cm) (templand)
12. Gust peak speed (windmax)	24. Mean ground temperature (100 cm) (templand)
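
As an illustration of how the climate characteristics in Table 1 could be screened against pest activity with the Pearson product-moment correlation coefficient, the following minimal sketch ranks feature columns by the absolute value of their correlation with the target. The function names, the threshold of 0.5, and the toy values are assumptions made for this example only; they are not the settings or data used in the experiments.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, target, threshold=0.5):
    """Return (name, r) pairs with |r| >= threshold, strongest first.

    The threshold is a hypothetical cut-off chosen for illustration.
    """
    scored = [(name, pearson_r(values, target)) for name, values in columns.items()]
    kept = [(name, r) for name, r in scored if abs(r) >= threshold]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


# Made-up toy values keyed by Table 1 abbreviations (not data from the paper).
columns = {
    "temp": [20.0, 22.0, 24.0, 26.0, 28.0],
    "RH": [80.0, 70.0, 75.0, 60.0, 65.0],
    "hpa": [1010.0, 1012.0, 1009.0, 1011.0, 1010.0],
}
pest_count = [5, 9, 13, 17, 21]

selected = rank_features(columns, pest_count, threshold=0.5)
for name, r in selected:
    print(f"{name}: r = {r:+.3f}")
```

In this toy data, mean temperature and relative humidity pass the cut-off while mean pressure does not; a negative coefficient is kept because the ranking uses the absolute value.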

Table 2. Classification of the numbers of pests.

Number of Pests	Category
0~10	0
10~20	1
20~30	2
30~50	3
50~100	4
100~200	5
200~500	6
500 or more	7
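
The mapping in Table 2 can be sketched as a lookup over the upper edge of each range. Because adjacent ranges share their boundary values (for example, 10 appears in both 0~10 and 10~20), this sketch assumes that each range includes its lower bound and excludes its upper bound, which agrees with the last row placing a count of 500 in category 7. The function name and that boundary convention are assumptions of this example.

```python
from bisect import bisect_right

# Upper edges of categories 0..6 taken from Table 2.
# Counts of 500 or more fall into category 7.
UPPER_EDGES = [10, 20, 30, 50, 100, 200, 500]


def pest_category(count):
    """Map a pest count to its Table 2 category (assumed half-open ranges)."""
    if count < 0:
        raise ValueError("pest count cannot be negative")
    # bisect_right counts how many upper edges are <= count, which is
    # exactly the category index under the [lower, upper) convention.
    return bisect_right(UPPER_EDGES, count)


counts = [3, 25, 150, 1200]
print([pest_category(c) for c in counts])
```

Under the opposite convention (upper bound inclusive), `bisect_left` would replace `bisect_right`, and a count of 500 would then fall into category 6.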

Table 3. Results from complex forecasting models for sequential feature extraction from data for 2021.

2021	KNN with Proposed Method	Random Forest with Proposed Method	SVM with Proposed Method	Naive Bayes with Proposed Method
ACC	0.9196	0.9700	0.9700	0.9112
Precision	0.9396	0.9805	0.9800	0.9364
Recall	0.9392	0.9803	0.9803	0.9330
F1-score	0.9280	0.9772	0.9700	0.9250

Table 4. Results from complex forecasting models for sequential feature extraction from data for 2022.

2022	KNN with Proposed Method	Random Forest with Proposed Method	SVM with Proposed Method	Naive Bayes with Proposed Method
ACC	1.0000	1.0000	1.0000	1.0000
Precision	1.0000	1.0000	1.0000	1.0000
Recall	1.0000	1.0000	1.0000	1.0000
F1-score	1.0000	1.0000	1.0000	1.0000

Table 5. Results from complex forecasting models for sequential feature extraction from data for 2021, without feature selection.

2021	KNN	Random Forest	SVM	Naive Bayes
ACC	0.7300	0.88	0.881	0.88
Precision	0.6892	0.9565	0.85	0.9555
Recall	0.8380	0.9142	0.9143	0.9142
F1-score	0.6954	0.9204	0.9200	0.9200

Table 6. Results from complex forecasting models for sequential feature extraction from data for 2022, without feature selection.

2022	KNN	Random Forest	SVM	Naive Bayes
ACC	0.8055	0.8050	0.8055	0.8000
Precision	0.7666	0.9247	0.7666	0.9247
Recall	0.7407	0.7410	0.7400	0.7407
F1-score	0.6083	0.7454	0.6080	0.7450
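
The four rows reported in Tables 3 to 6 are standard multi-class classification metrics. A minimal sketch of how they can be computed from true and predicted categories is given below. It assumes macro averaging (the unweighted mean of the per-class values), which is an assumption of this example, since the averaging scheme is not stated in the tables; the function name and the toy labels are likewise illustrative.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1-score."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "ACC": accuracy,
        "Precision": sum(precisions) / n,
        "Recall": sum(recalls) / n,
        "F1-score": sum(f1s) / n,
    }


# Toy labels for illustration only (not results from the paper).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With macro averaging, the F1-score is the mean of the per-class F1 values, so it need not equal the harmonic mean of the reported precision and recall.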

Table 7. Climate characteristics used in Experiment 2.

Climate Characteristics
MaxT (°C)	MinT (°C)	RH1 (%)	RH2 (%)	RF (mm)	WS (kmph)	SSH (h)	EVP (mm)

Table 8. Results for prediction of the numbers of whitefly pests.

Whitefly	KNN with Proposed Method	Random Forest with Proposed Method	SVM with Proposed Method	Naive Bayes with Proposed Method	Related Work [18]
ACC	0.95555	0.94444	0.84444	0.7888	0.9244
F1-score	0.95553	0.94438	0.84413	0.7886	0.9243

Table 9. Climate characteristics used in Experiment 3.

Climate Characteristics
1. MaxT (°C)	7. SSH (h)
2. MinT (°C)	8. EVP (mm)
3. RH1 (%)	9. LTP
4. RH2 (%)	10. TF
5. RF (mm)	11. PTR
6. WS (kmph)	12. THC

Table 10. Results for prediction of the number of aphid pests.

Aphid	KNN with Proposed Method	Random Forest with Proposed Method	SVM with Proposed Method	Naive Bayes with Proposed Method	Related Work [19]
ACC	0.8898	0.9661	0.8983	0.8305	0.8298
F1-score	0.8959	0.9725	0.9048	0.8430	0.8234

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

