Crop Identification by Machine Learning Algorithm and Sentinel-2 Data

Stournaras, Serafeim; Loukatos, Dimitrios; Arvanitis, Konstantinos G.; Kalatzis, Nikolaos

doi:10.3390/IOCAG2022-12261

Open Access Proceeding Paper

Crop Identification by Machine Learning Algorithm and Sentinel-2 Data^†

by Serafeim Stournaras ¹,*, Dimitrios Loukatos ¹, Konstantinos G. Arvanitis ¹ and Nikolaos Kalatzis ²

¹ Department of Natural Resources Management and Agricultural Engineering, Agricultural University of Athens, 75 Iera Odos Str., 118 55 Athens, Greece

² Computer Software Engineering, Technical Project Manager in Neuropublic S.A., Methonis 6 Str., 185 45 Piraeus, Greece

* Author to whom correspondence should be addressed.

^†

Presented at the 1st International Online Conference on Agriculture—Advances in Agricultural Science and Technology, 10–25 February 2022; Available online: https://iocag2022.sciforum.net/.

Chem. Proc. 2022, 10(1), 20; https://doi.org/10.3390/IOCAG2022-12261

Published: 14 February 2022

(This article belongs to the Proceedings of The 1st International Online Conference on Agriculture—Advances in Agricultural Science and Technology)


Abstract:

There is a growing need for remote identification of crop types. This is a serious issue for policymakers and statistical accountants (i.e., agricultural inspectors and government agencies), who must verify the validity of the declared area and type of each cultivated crop. In this work, Sentinel-2 satellite imagery was used to calculate average NDVI values, twice a month, for the period 2017–2020, for cotton, rice and olive trees. A machine learning algorithm was then developed, and the corresponding model was trained on these average NDVI values. The implementation used Python and the KNN machine learning method in the PyCharm environment.

Keywords:

crop identification; NDVI; Sentinel-2; machine learning

1. Introduction

The degradation of arable land and water resources, together with the rapid growth of the world population, intensifies the need to enhance agricultural productivity [1]. This goal is facilitated by accurate mapping and crop-type identification, which support crop growth monitoring, yield prediction, and global food-security decisions [2]. Spatial information on the distribution of arable land, in conjunction with crop type identification, enables valuable and accurate statistical estimations, such as agricultural production forecasts and crop area estimates, thus improving the efficiency of agricultural policy mechanisms. Accurate crop maps, generated by sensors, remote earth observation, or a combination of them, can form the basis for agricultural monitoring and decision-making in remote areas, to support sustainable agricultural land management [3]. Because each crop has its own phenology, and the phenological differences among crops can be described by time series of remotely sensed images [4], satellite image time series have been widely used for annual crop classification [3,5]. Most existing annual crop classifications use the image time series of the entire growing season to identify crops [5,6].

Indeed, remote sensing data has found its way into precision agriculture, with the aim of increasing agricultural efficiency. In addition, remote sensing is a valuable tool for monitoring agricultural expansion. Finally, it provides timely, comprehensive, objective, transparent, accurate and non-discriminatory data, so the resulting information can be used with confidence.

Furthermore, as NDVI time series can describe the phenological differences among crops, Hao et al. [7] proposed an NDVI time series-based method (RBM). This method generates reference NDVI time series that can be used to identify crop types and produce training samples [8]. It is generally accepted that the phenological and growth patterns of a given crop are similar, or even the same, across different regions of the world [9]. On this basis, it is reasonable to hypothesize that a supervised classification model trained in one region can be applied to other regions to identify the crops common to both the training and the target regions. Since major crops, such as olive trees, wheat and rice, are distributed globally, this would make global in-season identification of major crops possible. The key element is to train a classification model that is robust enough to compensate for the differences between the actual crop growth environment and the training region, where plentiful ground samples are openly available, without sacrificing prediction accuracy [10,11].

Benefiting from the rapid growth of Information and Communication Technologies (ICT), which provide well-documented and easy-to-use programming environments [12], this work presents a simple machine learning method for identifying three different crop types. The method exploits the growing season’s NDVI time series extracted from Sentinel-2 (S2) data. These data were acquired over the area of interest, from the beginning of the growing season in 2017 until the end of 2020. They were then further processed, compared against ground-truth data of cover crops, and fed to machine learning techniques in a Python environment.

2. Methods and Materials

Data for the study fields were collected by Sentinel-2 (S2) fortnightly throughout the year. The three crop types selected were cotton, rice and olive trees. Data availability and validity were assured by the company Neuropublic S.A. (http://www.neuropublic.gr, accessed on 30 November 2021). Sentinel-2 is a wide-swath, high-resolution, multi-spectral imaging mission, supporting Copernicus Land Monitoring studies, including the monitoring of vegetation, soil and water cover, as well as observation of inland waterways and coastal areas. The Sentinel-2 Multispectral Instrument (MSI) collects data in 13 spectral bands: four bands at 10 m, six bands at 20 m and three bands at 60 m of spatial resolution [13].

In 2021, S2 data covering the spatial distribution of the training samples selected in this study were utilized for each crop type. The NDVI values were calculated from Band 4 (Red, 0.665 μm) and Band 8 (NIR, 0.842 μm) for the period 2017–2020. We collected S2 data between days 1 and 365 of each year, calculated NDVI for each image from the Red and NIR bands [14], and then generated NDVI time series by selecting the maximum NDVI value within each 15-day window. The 15-day granularity was necessary because cloud-free images cannot be acquired daily. A 15-day image time series describes the phenological differences among crops well while reducing the number of missing values, which makes it a suitable choice for in-season crop classification [15].
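The 15-day maximum-value compositing described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name composite_max_ndvi, the window indexing and the sample observations are all assumptions made for the example.

```python
from collections import defaultdict


def composite_max_ndvi(observations, window_days=15):
    """Return {window_index: max NDVI} for (day_of_year, ndvi) pairs.

    Window 0 covers days 1-15, window 1 covers days 16-30, and so on.
    Windows without any observation are absent from the result (gaps).
    """
    best = defaultdict(lambda: float("-inf"))
    for day_of_year, ndvi in observations:
        window = (day_of_year - 1) // window_days
        best[window] = max(best[window], ndvi)
    return dict(best)


# Hypothetical cloud-free observations: (day of year, NDVI)
obs = [(3, 0.21), (12, 0.25), (20, 0.30), (48, 0.41), (55, 0.39)]
series = composite_max_ndvi(obs)
print(series)
```

In this example no observation falls in days 31 to 45, so that window is missing from the result and would be treated as a gap.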

Where no value could be assigned to a 15-day window, a gap was marked. The gaps in the 15-day composited NDVI time series were first filled via a moving window method, by averaging the two neighboring high-quality values in the time series. The series were then smoothed using Savitzky–Golay (S–G) filters, to further reduce irregular variations [3,16]. NDVI is calculated according to Equation (1).
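As an illustration of the gap-filling step only, the sketch below replaces each missing 15-day value with the mean of its nearest valid neighbors. The function name fill_gaps, the use of None to mark a gap, and the rule of copying a single available neighbor at the series edges are assumptions, not details taken from the paper.

```python
def fill_gaps(series):
    """Fill None entries with the mean of the nearest valid neighbours.

    Neighbours are taken from the original series, so filled values are
    never reused. If only one side has a valid value, it is copied.
    """
    filled = list(series)
    n = len(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        left = next(
            (series[j] for j in range(i - 1, -1, -1) if series[j] is not None),
            None,
        )
        right = next(
            (series[j] for j in range(i + 1, n) if series[j] is not None),
            None,
        )
        neighbours = [v for v in (left, right) if v is not None]
        if neighbours:
            filled[i] = sum(neighbours) / len(neighbours)
    return filled


# Hypothetical 15-day NDVI series with three gaps
ndvi_series = [0.20, None, 0.40, 0.50, None, None, 0.80]
filled = fill_gaps(ndvi_series)
print(filled)
```

The subsequent smoothing is not shown here; one option would be to pass the filled series to scipy.signal.savgol_filter, with a window length and polynomial order chosen for the data.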

NDVI = \frac{\rho(\mathrm{NIR}) - \rho(\mathrm{Red})}{\rho(\mathrm{NIR}) + \rho(\mathrm{Red})} \qquad (1)

where ρ(NIR) and ρ(Red) denote the surface reflectance (SR) of the NIR and Red bands, which are Band 8 and Band 4 of Sentinel-2, respectively. Afterward, the aggregate NDVI time series of all training samples was composed from the separate 15-day NDVI image time series between 2017 and 2020. All values were collected in one MS Excel spreadsheet file for further processing.
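For a single pixel or field average, Equation (1) can be sketched in a few lines. The function name ndvi, the handling of a zero denominator and the reflectance values are assumptions chosen for illustration.

```python
def ndvi(nir, red):
    """NDVI from NIR (Band 8) and Red (Band 4) surface reflectance."""
    denominator = nir + red
    if denominator == 0:
        return float("nan")  # undefined when both reflectances are zero
    return (nir - red) / denominator


# Hypothetical reflectance values for a densely vegetated pixel
value = ndvi(nir=0.45, red=0.05)
print(round(value, 3))
```

Dense green vegetation reflects strongly in the NIR and weakly in the Red band, so its NDVI approaches 1, while bare soil gives values closer to 0.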

Both the training algorithm (Figure 1a) and the identification algorithm (Figure 1b, top) were implemented in Python. Figure 1a depicts the algorithm used for training and testing the model. The preprocessed data were imported into the algorithm from MS Excel spreadsheets generated from the averaged values of the NDVI time series. Some of the selected columns and series form the learning dataset used to train the model, and the rest are used to check the training. We chose the columns showing rises in NDVI values, because these correspond to an increase in plant biomass and help future predictions. The training algorithm uses the ‘pandas’ module for data input and the scikit-learn module with its KNN classifier for machine learning. At the end of Figure 1a, the ‘saving of the model’ step generates a .joblib file that is used by the second algorithm (Figure 1b, top), which provides crop type identification based on user-provided data [17].

The top part of Figure 1b shows the identification algorithm, which predicts the crop type. The identification algorithm takes as input the NDVI values for a specific period and uses the scikit-learn module [17,18] to reload the saved model. These input NDVI values depend on the cultivation season and are related to the plant’s biomass. The algorithm then passes the entered values to the already trained model, which identifies the crop type.
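The two algorithms of Figure 1 can be approximated by the sketch below, which trains a KNN classifier, saves it as a .joblib file, reloads it and classifies new NDVI values. It is not the authors' code: the column names, the NDVI values, the file name crop_knn.joblib and the choice of K = 1 are hypothetical, and the table is built inline rather than read from an MS Excel sheet.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training table: NDVI columns plus a crop label per sample.
# In the paper the values come from an MS Excel sheet instead.
data = pd.DataFrame(
    {
        "ndvi_may": [0.20, 0.25, 0.60, 0.65, 0.45, 0.47],
        "ndvi_jun": [0.40, 0.45, 0.75, 0.80, 0.46, 0.48],
        "ndvi_jul": [0.70, 0.75, 0.85, 0.88, 0.47, 0.46],
        "crop": ["cotton", "cotton", "rice", "rice", "olive", "olive"],
    }
)
features = ["ndvi_may", "ndvi_jun", "ndvi_jul"]

# Training step: fit the KNN classifier and persist it as a .joblib file.
model = KNeighborsClassifier(n_neighbors=1)  # K = 1 is an assumption
model.fit(data[features], data["crop"])
model_path = os.path.join(tempfile.mkdtemp(), "crop_knn.joblib")
joblib.dump(model, model_path)

# Identification step: reload the saved model and classify new NDVI values.
loaded = joblib.load(model_path)
query = pd.DataFrame([[0.22, 0.42, 0.72]], columns=features)
predicted = loaded.predict(query)[0]
print(predicted)
```

Keeping the saved model in a separate file means the identification step can run without access to the training data, which matches the split between the two algorithms in Figure 1.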

The algorithm is trained using the K-Nearest Neighbors (KNN) method (Figure 1b, bottom). KNN is a supervised learning algorithm used for both regression and classification. It predicts the class of a test sample by calculating the distance between that sample and all the training points, and then selecting the K training points closest to it. For classification, the class that is most frequent among these K neighbors, i.e., the class with the highest estimated probability, is assigned to the test sample. For regression, the predicted value is the mean of the K selected training points [18,19].

KNN Algorithm Description:

Select the number K of neighbors. The K value indicates how many nearest neighbors are considered;
Calculate the Euclidean distance from the new data point to each training point;
Take the K nearest neighbors according to the calculated Euclidean distances;
Among these K neighbors, count the number of data points in each category;
Assign the new data point to the category with the largest number of neighbors;
The KNN model is ready.
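The steps listed above can be sketched in plain Python, without scikit-learn, to make the voting logic explicit. The function name knn_predict, the two-value NDVI points and their labels are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Keep the labels of the k closest points and count them per category
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical (June NDVI, July NDVI) training points
train_points = [
    (0.70, 0.75), (0.72, 0.78),  # cotton
    (0.85, 0.88), (0.83, 0.86),  # rice
    (0.46, 0.47), (0.45, 0.46),  # olive
]
train_labels = ["cotton", "cotton", "rice", "rice", "olive", "olive"]

label = knn_predict(train_points, train_labels, query=(0.84, 0.87), k=3)
print(label)
```

Note that KNN has no separate fitting phase: the "model" is simply the stored training points, and all the work happens at prediction time.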

QGIS is a free and open-source desktop geographic information system (GIS) application that supports viewing and editing of geospatial data. In this paper, the data were processed with QGIS Desktop 3.16.2 with GRASS 7.8.4, which allows the user to analyze and edit geospatial data in order to export graphical maps. The exported maps, based on the collected NDVI values, showed the heterogeneity among crop types due to different growing and climate conditions, as well as the heterogeneity among the phenological stages of different crops and of the same crop.

3. Experimentation, Results and Discussion

3.1. Training Process

Three different crops were analyzed: cotton, rice and olive trees. The data used for the experiment were preprocessed by simple averaging, so that an average NDVI value was calculated for each month. Next, a machine learning algorithm was developed and trained on these monthly average NDVI values. The Python programming language and the KNN machine learning module (Figure 2a), in a PyCharm shell, were used to develop the algorithm. The NDVI data covering the period 2017 to 2020 (Figure 2c) were split by the algorithm into eighty (80) percent for training and twenty (20) percent for testing. Figure 2b,d show the NDVI datasets of cotton and olive trees, respectively, for the month of July, using color indicators. For verification purposes, we used QGIS.
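An 80/20 split of this kind could be reproduced along the following lines. The sketch uses synthetic NDVI profiles as a stand-in for the real dataset, and the sample count, the value of K, the stratified split and the random seed are all assumptions.

```python
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the monthly NDVI table: 10 samples per crop,
# three monthly values each. These are not the values used in the paper.
profiles = {
    "cotton": (0.20, 0.40, 0.70),
    "rice": (0.60, 0.75, 0.85),
    "olive": (0.45, 0.46, 0.47),
}
X, y = [], []
for crop, base in profiles.items():
    for i in range(10):
        offset = 0.005 * i  # small deterministic variation between samples
        X.append([value + offset for value in base])
        y.append(crop)

# 80 percent of the samples for training, 20 percent for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = KNeighborsClassifier(n_neighbors=3)  # K = 3 is an assumption
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")
```

With so few and such well-separated samples, a very high test accuracy is easy to reach and says little about how the model would generalize to new fields.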

The trained model was saved in a .joblib file. As anticipated, after two or three runs the algorithm’s accuracy reached 1.00 (Figure 3a), because of the low number of samples in the training NDVI dataset. The fast convergence of the training process is likewise attributed to the limited amount of original data used.

3.2. Identification Process

The algorithm developed for identifying the crops was tested by entering user-defined NDVI series values and comparing them against the trained model (.joblib). Through this process, an entry-level identification of crop types was accomplished based on a few NDVI values. The implementation using the trained model (Figure 3b) asks the user for specific monthly NDVI values for a specific period and cultivation. Finally, the result, i.e., the matching crop type identified, is provided as textual output, as depicted in the bottom part of Figure 3b.

In this case, user-given values are colored in green (Figure 3b). Executing the testing algorithm with slightly different input values returned the same crop type. The results were generated quickly and were satisfactory for this simple machine learning implementation.

3.3. Discussion

In this paper, a method was presented that introduces the potential of machine learning techniques, using Python, for crop identification purposes. This approach is beneficial for both students of agriculture and professionals wanting to become familiar with the techniques of the digital era. The dataset used was quite limited, but ongoing research combining this method with richer NDVI time series is delivering satisfactory results, which will be included in a more mature version of this preliminary work. Apart from the core Python-based machine learning engine, the role of assistive open-source tools for the elaboration and visualization of agriculture-specific geospatial data, such as QGIS, is also highlighted.

4. Conclusions

This paper highlighted the feasibility of implementing a simple K-nearest neighbor (KNN) classification model. KNN is a supervised learning algorithm used for both classification and regression. It predicts the crop type of a test sample by calculating the distance between that sample and all the training points, which correspond to composited NDVI time series. Training NDVI data were collected across Greece, so as to contain the NDVI time series of each crop under different climate and irrigation conditions. The trained classification model was then tested for crop identification using real data. The learning method achieved proper identification results when using the NDVI time series of the entire growing season. The identification of the crop type from slightly different NDVI values was also satisfactory. Training, refinement and testing with more data, as well as better visualization of the results, are important future objectives.

Author Contributions

Conceptualization, S.S., K.G.A., D.L. and N.K.; Methodology, S.S. and K.G.A.; Data curation, S.S. and N.K.; Validation, S.S., K.G.A., D.L. and N.K.; Formal analysis, S.S., K.G.A., D.L. and N.K.; Writing—original draft preparation, S.S., K.G.A., D.L. and N.K.; Writing—review and editing, S.S., K.G.A., D.L. and N.K.; Supervision, K.G.A., D.L. and N.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing is not applicable to this article.

Acknowledgments

The authors are grateful to the company Neuropublic S.A. for providing access to the original crop data that were used in this study.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Bajželj, B.; Richards, K.S.; Allwood, J.M.; Smith, P.; Dennis, J.S.; Curmi, E.; Gilligan, C.A. Importance of food-demand management for climate mitigation. Nat. Clim. Chang. 2014, 4, 924–929.
2. Lobell, D.B. The use of satellite data for crop yield gap analysis. Field Crops Res. 2013, 143, 56–64.
3. Hao, P.; Di, L.; Zhang, C.; Guo, L. Transfer Learning for Crop classification with Cropland Data Layer data (CDL) as training samples. Sci. Total Environ. 2020, 733, 138869.
4. Zhang, J.; Feng, L.; Yao, F. Improved maize cultivated area estimation over a large scale combining MODIS-EVI time series data and crop phenological information. ISPRS J. Photogramm. Remote Sens. 2014, 94, 102–113.
5. Zhong, L.; Hu, L.; Zhou, H. Deep learning based multi-temporal crop classification. Remote Sens. Environ. 2019, 221, 430–443.
6. Löw, F.; Michel, U.; Dech, S.; Conrad, C. Impact of feature selection on the accuracy and spatial uncertainty of per-field crop classification using support vector machines. ISPRS J. Photogramm. Remote Sens. 2013, 85, 102–119.
7. Hao, P.; Wang, L.; Zhan, Y.; Niu, Z. Using moderate-resolution temporal NDVI profiles for high-resolution crop mapping in years of absent ground reference data: A case study of bole and Manas Counties in Xinjiang, China. ISPRS Int. J. Geo-Inf. 2016, 5, 67.
8. Hao, P.; Wang, L.; Zhan, Y.; Wang, C.; Niu, Z.; Wu, M. Crop classification using crop knowledge of the previous year: Case study in Southwest Kansas, USA. Eur. J. Remote Sens. 2016, 49, 1061–1077.
9. Zhong, L.; Gong, P.; Biging, G.S. Efficient corn and soybean mapping with temporal extendability: A multi-year experiment using Landsat imagery. Remote Sens. Environ. 2014, 140, 1–13.
10. Boryan, C.; Yang, Z.W.; Mueller, R.; Craig, M. Monitoring US agriculture: The US Department of Agriculture, National Agricultural Statistics Service, Cropland Data Layer Program. Geocarto Int. 2011, 26, 341–358.
11. Han, W.; Yang, Z.; Di, L.; Zhang, B.; Peng, C. Enhancing agricultural geospatial data dissemination and applications using geospatial web services. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 4539–4547.
12. Arvanitis, K.G.; Symeonaki, E.G. Agriculture 4.0: The Role of Innovative Smart Technologies Towards Sustainable Farm Management. Open Agric. J. 2020, 14, 130–136.
13. Sentinel Online. Available online: https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-2-msi/overview (accessed on 20 December 2021).
14. Huete, A.; Didan, K.; Miura, T.; Rodriguez, E.P.; Gao, X.; Ferreira, L.G. Overview of the radiometric and biophysical performance of the MODIS vegetation indices. Remote Sens. Environ. 2002, 83, 195–213.
15. Hao, P.; Wu, M.; Niu, Z.; Wang, L.; Zhan, Y. Estimation of different data compositions for early-season crop type classification. PeerJ 2018, 6, e4834.
16. Zhang, X.; Liu, L.; Liu, Y.; Jayavelu, S.; Wang, J.; Moon, M.; Henebry, G.M.; Friedl, M.A.; Schaaf, C.B. Generation and evaluation of the VIIRS land surface phenology product. Remote Sens. Environ. 2018, 216, 212–229.
17. Model Persistence. Available online: https://scikit-learn.org/stable/modules/model_persistence.html (accessed on 20 December 2021).
18. Nearest Neighbors. Available online: https://scikit-learn.org/stable/modules/neighbors.html (accessed on 20 December 2021).
19. Patwardhan Sai. Simple Understanding and Implementation of KNN Algorithm. Available online: https://www.analyticsvidhya.com/blog/2021/04/simple-understanding-and-implementation-of-knn-algorithm/ (accessed on 20 December 2021).

Figure 1. (a) The analysis of the training algorithm; (b) top: identification of the crop type by the machine learning algorithm; bottom: illustration of the KNN module (source: https://medium.com/, accessed on 30 November 2021).

Figure 2. (a) Run of the training algorithm; (b) Graphical presentation of the cotton crops’ NDVI dataset for the month of July, using color indicators; (c) The training NDVI dataset; (d) Graphical presentation of the olive tree crops’ NDVI dataset for the month of July, using color indicators.

Figure 3. (a) Testing results of the trained machine learning algorithm; (b) Input of the unknown crop’s NDVI values and the resulting identification.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
