Article

Stochastic Analysis and Neural Network-Based Yield Prediction with Precision Agriculture

1 Department of Mathematics, North Dakota State University, Fargo, ND 58108-6050, USA
2 Department of Agribusiness and Applied Economics, North Dakota State University, Fargo, ND 58108-6050, USA
* Author to whom correspondence should be addressed.
J. Risk Financial Manag. 2021, 14(9), 397; https://doi.org/10.3390/jrfm14090397
Submission received: 4 August 2021 / Revised: 18 August 2021 / Accepted: 20 August 2021 / Published: 25 August 2021
(This article belongs to the Special Issue Agribusiness Financial Risk Management)

Abstract

In this paper, we propose a general mathematical model for analyzing yield data. The data analyzed in this paper come from a characteristic corn field in the upper midwestern United States. We derive expressions for statistical moments from the underlying stochastic model. Consequently, we illustrate how a particular feature variable contributes to the statistical moments (and in effect, the characteristic function) of the target variable (i.e., yield). We also analyze the data with neural network techniques and provide two methods of data analysis. This mathematical model and neural network-based data analysis allow for better understanding of the variability within the data set, which is useful to farm managers attempting to make current and future decisions using the yield data. Lenders and risk management consultants may benefit from the insights of this mathematical model and neural network-based data analysis regarding yield expectations.

1. Introduction

The International Society of Precision Agriculture adopted the following definition of precision agriculture in 2019 (see The International Society of Precision Agriculture (n.d.)): “Precision Agriculture is a management strategy that gathers, processes, and analyzes temporal, spatial, and individual data and combines it with other information to support management decisions according to estimated variability for improved resource use efficiency, productivity, quality, profitability, and sustainability of agricultural production”. Precision agriculture has emerged as a central tool for addressing current challenges in agricultural sustainability and profitability. Various data-science methodologies, such as machine learning, have been implemented alongside this technology. With the use of artificial intelligence and data science, farm managers can obtain precise data on the health and productivity of their crops, thereby enabling informed decision-making.
There is an emerging literature on applications of data science to agriculture. In Lemley et al. (2017), machine learning and deep learning techniques are implemented both to solve precision-agriculture-related problems and to build better, smarter consumer devices and services. Sharma et al. (2021) provide a systematic review of machine learning applications in agriculture, covering the prediction of soil parameters such as organic carbon and moisture content, crop yield prediction, disease and weed detection in crops, and species detection; the review also demonstrates how knowledge-based agriculture can improve the sustainable productivity and quality of the product. In Bauer et al. (2019), the authors present an automated, open-source analytic platform that combines computer vision and modular software engineering in order to measure yield-related phenotypes from ultra-large aerial imagery; the analysis maps lettuce size distribution across the field, based on which associated global positioning system (GPS)-tagged harvest regions are identified to enable growers and farmers to conduct precision agricultural practices. In Chlingaryan et al. (2018), the authors analyze research developments performed within the last 15 years on machine learning based techniques for accurate crop yield prediction and nitrogen status estimation. In Treboux and Genoud (2019), the authors present the performance of machine learning algorithms for object detection in aerial images for high-precision agriculture; the proposed approach improves object detection and obtains an accuracy of 94.27%. In Horng et al. (2020), the authors propose a harvesting system based on Internet of Things technology and smart image recognition; the proposed model is implemented for crop detection by collecting and tagging images, and the trained model achieves a mean average precision of 84%, which is better than other existing models. In Brunelli et al. (2019), a common problem for apple orchards, namely the attack of the codling moth, is studied; it is shown that data-science-based near-sensor neural network algorithms can be implemented to automatically detect the codling moth, and the performance and power consumption of this system are evaluated with the goal of achieving a zero energy balance. Finally, in Addey et al. (2021), the authors examine the implications of risks, uncertainties, and random events for the prediction of crop yields.
Motivated by all these studies, we propose a general mathematical model for analyzing yield data. It is shown that a special case of such a model can be a generalized version of the well-known Barndorff–Nielsen and Shephard model. If the statistical moments of the yield data can be obtained from the stochastic model, then they can play a crucial role in understanding the empirical data set. In fact, this will illustrate how a particular feature variable contributes to the statistical moments (and in effect, the characteristic function) of the target variable (i.e., yield). Consequently, we derive expressions for statistical moments from the underlying stochastic model. This model allows for a better understanding of the variability within the data set, which is useful to farm managers attempting to make current and future decisions using the yield data. In addition, this model may help to identify errors in yield data collection and recording. Finally, lenders and risk management consultants may benefit from this model’s insights regarding yield expectations.
The data analyzed in this paper come from a characteristic corn field in the upper midwestern United States. The data were collected in 2010. Each observation in the data set represents two seconds of harvesting as a combine harvester travels through the field. Each observation is associated with the precise latitude and longitude where it was recorded in the field. Yld Vol(Dry)(bu/ac) is the quantity of corn harvested during the 2-second interval. It is the variable predicted by the model. The mean Yld Vol(Dry)(bu/ac) for the field is 161.06 and the median Yld Vol(Dry)(bu/ac) is 172.07. Yld Mass(Wet)(lb/ac), Yld Mass(Dry)(lb/ac), and Crop Flw(M)(lb/s) are also measures of corn yield during each 2-second interval, but these measures have not been converted to bushels. Furthermore, Yld Mass(Wet)(lb/ac) has not been corrected for moisture, which is represented by the crop’s relative moisture, Moisture(%), and Crop Flw(M)(lb/s) measures corn harvested per second rather than per acre. During each 2-second interval, the combine harvester travels a variable distance, Distance(ft), which is determined by the Speed(mph) at which the combine is traveling during the interval. The combine’s speed and the width of the combine’s header during the 2-second interval, Swth Wdth(ft), determine how many acres could be harvested in one hour at those rates (Prod(ac/h)). We implement neural network algorithms to analyze the dry yield volume based on the other observed and significant feature variables.
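As a rough illustration of how such a data set can be inspected, the sketch below loads a yield-monitor export with pandas and reproduces the kind of summary statistics quoted above. The file name and the exact column spellings are assumptions made for illustration; the actual data are not publicly available.

```python
import pandas as pd

# Hypothetical file name; the yield-monitor export analyzed in the paper is not public.
df = pd.read_csv("corn_yield_2010.csv")

# Target and feature columns as described in the text (spellings assumed).
target = "Yld Vol(Dry)(bu/ac)"
features = ["Swth Wdth(ft)", "Distance(ft)", "Crop Flw(M)(lb/s)",
            "Moisture(%)", "Speed(mph)", "Prod(ac/h)"]

# Summary of the target variable (the paper reports mean 161.06 and median 172.07).
print(df[target].describe())

# Rough consistency check: distance traveled in a 2-second interval implied by speed
# (mph converted to ft/s), to be compared against the recorded Distance(ft).
df["implied_distance_ft"] = df["Speed(mph)"] * 5280 / 3600 * 2
print(df[["Distance(ft)", "implied_distance_ft"]].head())
```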
The organization of the paper is as follows. In Section 2, we propose a general statistical model, and analyze a special case of it—a generalized Barndorff–Nielsen and Shephard model. Theoretical results related to the statistical moments of the target variable are discussed in detail. In Section 3, we provide the description of the data. We analyze the data with neural network techniques and provide two methods of data analysis. Finally, a brief conclusion is provided in Section 4.

2. Mathematical Model and Analysis

2.1. General Framework

For the empirical data, we assume that there is one target variable and $n$ feature variables. We model the target variable $S_t$ by
$$S_t = S_0 e^{X_t}, \quad \text{where} \quad dX_t = b_t\,dt + \sum_{i=1}^{n}\theta_t^{(i)}\left(\sigma_t\,dW_t^{(i)} + dJ_t^{(i)}\right), \tag{1}$$
where $b_t$ is a deterministic function of $t$; $W_t^{(i)}$, $i=1,\dots,n$, are independent Brownian motions; and $J_t^{(i)}$ is a jump process with intensity $\lambda_i$, $i=1,\dots,n$. We assume that $W_t^{(i)}$ and $J_t^{(i)}$, for $i=1,\dots,n$, are independent. The coefficients $\theta_t^{(i)}$ satisfy $\sum_{i=1}^{n}\big(\theta_t^{(i)}\big)^2 = 1$ at every $t$. In addition to that, $\sigma_t$ is assumed to be stochastic, and its dynamics are governed by
$$d\sigma_t^2 = F\!\left(\sigma_t^2,\ \beta_t^{(1)} H_t^{(1)},\ \beta_t^{(2)} H_t^{(2)},\ \dots,\ \beta_t^{(n)} H_t^{(n)}\right), \tag{2}$$
for an appropriate function $F$, where $H_t^{(j)}$, for $j=1,\dots,n$, are jump processes with intensities $\mu_j$, $j=1,\dots,n$. The coefficients $\beta_t^{(j)}$ satisfy $\sum_{j=1}^{n}\big(\beta_t^{(j)}\big)^2 = 1$ at every $t$. For simplicity, for the rest of the paper, we assume $\theta^{(i)} = \beta^{(i)}$, for $i=1,\dots,n$.
For a special case of (1), we assume that the individual dynamics of a feature variable are given by $e^{Y_t^{(i)}}$, where $dY_t^{(i)} = \sigma_t\,dW_t^{(i)} + dJ_t^{(i)}$, $i=1,\dots,n$. From (1), we obtain $dX_t = b_t\,dt + \sum_{i=1}^{n}\theta_t^{(i)}\,dY_t^{(i)}$. Hence, $\theta_t^{(i)}$ represents the “importance factor” of the $i$-th feature variable, for $i=1,\dots,n$. We observe that if $\sum_{i=1}^{n}\big(\theta_t^{(i)}\big)^2 = 1$, then $\sum_{i=1}^{n}\theta_t^{(i)}\,dW_t^{(i)}$ can be represented by $dB_t$, where $B_t$ is a Brownian motion. Consequently, (1) can be written as
$$S_t = S_0 e^{X_t}, \quad \text{where} \quad dX_t = b_t\,dt + \sigma_t\,dB_t + \sum_{i=1}^{n}\theta_t^{(i)}\,dJ_t^{(i)}. \tag{3}$$
The expression (3) provides an alternative interpretation of the coefficients $\theta_t^{(i)}$, $i=1,\dots,n$, which will be computed in the numerical section. They represent the significance, in terms of big fluctuations (or “jumps”), of the $i$-th ingredient feature process $Y_t^{(i)}$.
We write $J_t^{(i)}$ in terms of integrals with respect to Poisson random measures $N^{(i)}(dt,dx)$, for $i=1,\dots,n$. Consequently,
$$J_t^{(i)} = \int_0^t\!\int_{\mathbb{R}} x\,N^{(i)}(ds,dx).$$
Hence, (3) can be written as
$$S_t = S_0 e^{X_t}, \quad \text{where} \quad dX_t = b_t\,dt + \sigma_t\,dB_t + \sum_{i=1}^{n}\theta_t^{(i)}\int_{\mathbb{R}} x\,N^{(i)}(dt,dx). \tag{4}$$
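To make the dynamics in (4) concrete, the following sketch simulates one path of $S_t$ with a simple Euler scheme, taking a constant drift, constant volatility, and compound Poisson jumps for the $J^{(i)}$. All numerical values (drift, volatility, jump intensities, jump sizes, and the weights $\theta^{(i)}$) are illustrative assumptions and are not estimated from the yield data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not estimated from the data).
T, n_steps, n = 1.0, 1000, 3           # horizon, time steps, number of feature processes
dt = T / n_steps
b, sigma = 0.05, 0.2                   # constant drift b_t and volatility sigma_t
lam = np.array([5.0, 3.0, 1.0])        # jump intensities lambda_i
theta = np.array([0.6, 0.6, 0.52915])  # weights with sum of squares approximately 1
S0 = 161.0

X = np.zeros(n_steps + 1)
for k in range(n_steps):
    dB = rng.normal(0.0, np.sqrt(dt))                     # Brownian increment
    n_jumps = rng.poisson(lam * dt)                       # jumps of each J^(i) in [t, t+dt)
    dJ = np.array([rng.exponential(0.02, size=m).sum() for m in n_jumps])
    X[k + 1] = X[k] + b * dt + sigma * dB + theta @ dJ    # Euler step for (4)

S = S0 * np.exp(X)
print("terminal value S_T:", S[-1])
```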

2.2. Special Case: A Generalized Barndorff–Nielsen & Shephard Model

There are some special cases of the proposed model that are studied in the literature in connection to the financial market; for example, the Barndorff–Nielsen and Shephard model (BN-S model). For such a model, the target variable is the stock (see Barndorff-Nielsen and Shephard (2001a, 2001b); Habtemicael and SenGupta (2016); Issaka and SenGupta (2017)) or the commodity price (see Roberts and SenGupta (2020); SenGupta et al. (2019); Shoshi and SenGupta (2021); Wilson et al. (2019)), $S = (S_t)_{t\geq 0}$. On some filtered probability space $(\Omega, \mathcal{G}, (\mathcal{G}_t)_{0\leq t\leq T}, \mathbb{P})$, it is modeled by
$$S_t = S_0\exp(X_t), \tag{5}$$
$$dX_t = (\mu + \beta\sigma_t^2)\,dt + \sigma_t\,dW_t + \rho\,dZ_{\lambda t}, \tag{6}$$
$$d\sigma_t^2 = -\lambda\sigma_t^2\,dt + dZ_{\lambda t}, \quad \sigma_0^2 > 0, \tag{7}$$
where the parameters $\mu, \beta, \rho, \lambda \in \mathbb{R}$ with $\lambda > 0$ and $\rho \leq 0$, and $r$ is the risk-free interest rate, where a stock or commodity (the target variable) is traded up to a fixed horizon date $T$. In the above model, $W_t$ is a Brownian motion, and the process $Z_{\lambda t}$ is a subordinator. Additionally, $W$ and $Z$ are assumed to be independent, and $(\mathcal{G}_t)$ is assumed to be the usual augmentation of the filtration generated by the pair $(W, Z)$. We consider a special case of (4), where $dZ_s^{(i)} = \frac{1}{\rho}\int_0^{\infty} x\,N^{(i)}(ds,dx)$, $i=1,\dots,n$, are subordinators. Making a scaling in the time variable, we define $s = \lambda t$, for $\lambda > 0$. Then we obtain $dZ_{\lambda t}^{(i)} = \frac{1}{\rho}\int_0^{\infty} x\,N^{(i)}(\lambda\,dt,dx)$, $i=1,\dots,n$, as subordinators. Consequently, we consider $S = (S_t)_{t\geq 0}$ on some risk-neutral probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{0\leq t\leq T}, \mathbb{Q})$, given by (4). We consider the case that is more aligned with Nicolato and Venardos (2003). In this case, we assume that the generalized version of (6) takes the form
$$dX_t = \left(B - \tfrac{1}{2}\sigma_t^2\right)dt + \sigma_t\,dW_t + \rho\sum_{i=1}^{n}\theta_t^{(i)}\,dZ_{\lambda t}^{(i)}, \tag{8}$$
where $Z^{(i)}$, $i=1,\dots,n$, are independent subordinators. For the drift term, comparing with (6), we thus have $\mu = B$ and $\beta = -\tfrac{1}{2}$. Additionally, we assume that $(\mathcal{F}_t)$ is the usual augmentation of the filtration generated by $(W, Z^{(i)})$, $i=1,\dots,n$. In this case, (7) will be given by
$$d\sigma_t^2 = -\lambda\sigma_t^2\,dt + \sum_{i=1}^{n}\theta_t^{(i)}\,dZ_{\lambda t}^{(i)}, \quad \sigma_0^2 > 0. \tag{9}$$
The solution of (9) can be explicitly written as
$$\sigma_t^2 = e^{-\lambda t}\sigma_0^2 + \int_0^t e^{-\lambda(t-s)}\sum_{i=1}^{n}\theta_s^{(i)}\,dZ_{\lambda s}^{(i)}.$$
The integrated variance over the time period $[t,T]$ is given by $\sigma_I^2 = \int_t^T\sigma_s^2\,ds$, and a straightforward calculation shows
$$\sigma_I^2 = \epsilon(t,T)\,\sigma_t^2 + \int_t^T\epsilon(s,T)\sum_{i=1}^{n}\theta_s^{(i)}\,dZ_{\lambda s}^{(i)},$$
where
$$\epsilon(s,T) = \frac{1 - \exp(-\lambda(T-s))}{\lambda}, \quad t \leq s \leq T.$$
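A minimal simulation of the variance dynamics (9), using a single compound Poisson subordinator, is sketched below; the mean-reversion rate, jump rate, and jump-size parameters are assumptions made purely for illustration. The integrated variance is then approximated on the same grid, in the spirit of the expression for $\sigma_I^2$ above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters (assumed, not estimated).
lam = 2.0            # mean-reversion rate lambda
a, b = 4.0, 30.0     # subordinator: Poisson rate a, Exp(b) jump sizes
sigma2_0 = 0.04
T, n_steps = 1.0, 2000
dt = T / n_steps

sigma2 = np.empty(n_steps + 1)
sigma2[0] = sigma2_0
for k in range(n_steps):
    m = rng.poisson(a * lam * dt)                    # jumps of Z_{lambda t} over [t, t+dt)
    dZ = rng.exponential(1.0 / b, size=m).sum()
    # Euler step for d(sigma_t^2) = -lambda * sigma_t^2 dt + dZ_{lambda t}, cf. (9).
    sigma2[k + 1] = sigma2[k] - lam * sigma2[k] * dt + dZ

# Integrated variance over [0, T] by the trapezoidal rule.
integrated_var = np.sum(0.5 * (sigma2[1:] + sigma2[:-1]) * dt)
print("sigma_T^2:", sigma2[-1], " integrated variance:", integrated_var)
```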
We derive a general expression for the characteristic function of the conditional distribution of the log-asset price process appearing in the stochastic model given by Equations (5), (8) and (9).
We quote the following result from Nicolato and Venardos (2003), which is known as the “key formula”. This result will play a significant role in the next section.
Lemma 1.
Let $Z$ be a subordinator with cumulant transform $\kappa$, and let $f:\mathbb{R}_+\to\mathbb{C}$ be a complex-valued, left-continuous function such that $\mathrm{Re}(f)\leq 0$. Then
$$\mathbb{E}\left[\exp\left(\int_0^t f(s)\,dZ_{\lambda s}\right)\right] = \exp\left(\lambda\int_0^t\kappa(f(s))\,ds\right).$$
We also note that
$$\int_0^t\sigma_s^2\,ds = \frac{\sigma_0^2}{\lambda}\left(1-e^{-\lambda t}\right) + \frac{1}{\lambda}\sum_{i=1}^{n}\int_0^t\left(1-e^{-\lambda(t-s)}\right)\theta_s^{(i)}\,dZ_{\lambda s}^{(i)} \overset{d}{=} \frac{\sigma_0^2}{\lambda}\left(1-e^{-\lambda t}\right) + \frac{1}{\lambda}\sum_{i=1}^{n}\int_0^{\lambda t}\theta_{t-\frac{s}{\lambda}}^{(i)}\left(1-e^{-s}\right)dZ_s^{(i)}, \tag{13}$$
where “$\overset{d}{=}$” represents equality in distribution.
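As a numerical sanity check on the key formula, the sketch below compares a Monte Carlo estimate of $\mathbb{E}[\exp(\int_0^t f(s)\,dZ_{\lambda s})]$ with $\exp(\lambda\int_0^t\kappa(f(s))\,ds)$ for a compound Poisson subordinator with exponential jump sizes, for which $\kappa(\theta) = a\theta/(b-\theta)$, $\theta < b$. The choice of subordinator, the integrand $f$, and all parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(2)

# Compound Poisson subordinator: Poisson rate a, Exp(b) jump sizes (illustrative choice).
a, b = 3.0, 10.0
lam, t = 2.0, 1.0
f = lambda s: -0.5 * np.exp(-s)          # a non-positive, left-continuous integrand
kappa = lambda th: a * th / (b - th)     # cumulant transform of Z_1

# Right-hand side of the key formula.
rhs = np.exp(lam * quad(lambda s: kappa(f(s)), 0.0, t)[0])

# Monte Carlo estimate of the left-hand side: jumps of s -> Z_{lambda s} on [0, t]
# arrive at rate a*lam, with uniform jump times given the jump count.
n_paths = 100_000
vals = np.empty(n_paths)
for i in range(n_paths):
    m = rng.poisson(a * lam * t)
    s_jumps = rng.uniform(0.0, t, size=m)
    y_jumps = rng.exponential(1.0 / b, size=m)
    vals[i] = np.exp(np.sum(f(s_jumps) * y_jumps))

print("Monte Carlo:", vals.mean(), " key formula:", rhs)
```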
Statistical moments are useful quantities, which, when computable, can give significant insight into the underlying empirical data. Consequently, it is useful to obtain analytical expressions for statistical moments for the underlying model. In connection to the BN-S model, statistical moments for $S_t$ are derived in Ihsan and SenGupta (2018). With this motivation, we derive a couple of results related to certain moments of the target variable, $S_t$. The first result is simple but restrictive. This result is subsequently generalized. For the rest of the paper, we take $N \in \mathbb{N}$, where $\mathbb{N}$ is the set of natural numbers.
Theorem 1.
For a given $\lambda > 0$ and $\rho < 0$, if $N$ satisfies
$$N \leq 1 - 2\lambda\rho, \tag{14}$$
then the $N$-th moment of $S_t$ with respect to the measure $\mathbb{Q}$ is given by
$$\mathbb{E}^{\mathbb{Q}}\left[e^{N X_t}\right] = a(t,N)\prod_{i=1}^{n}\exp\left(\lambda\int_0^t\kappa^{(i)}\!\left(\theta_s^{(i)}\left(N\rho + \frac{N(N-1)}{2\lambda}\left(1-e^{-\lambda(t-s)}\right)\right)\right)ds\right), \tag{15}$$
where
$$a(t,N) = \exp\left(NBt + \frac{N(N-1)}{2\lambda}\left(1-e^{-\lambda t}\right)\sigma_0^2\right), \tag{16}$$
and $\kappa^{(i)}(\cdot)$ is the cumulant transform of $Z_1^{(i)}$, $i=1,\dots,n$, with respect to $\mathbb{Q}$.
Proof. 
Let $\mathcal{G}$ denote the $\sigma$-algebra generated by the background driving Lévy process (BDLP) $Z$ up to time $t$. Conditioning on $\mathcal{G}$ and using the moment generating function of the normal distribution, we observe
$$\mathbb{E}^{\mathbb{Q}}\left[e^{NX_t}\right] = \mathbb{E}^{\mathbb{Q}}\left[e^{N\left(Bt - \frac{1}{2}\int_0^t\sigma_u^2\,du + \int_0^t\sigma_u\,dW_u + \rho\sum_{i=1}^{n}\int_0^t\theta_u^{(i)}\,dZ_{\lambda u}^{(i)}\right)}\right] = e^{NBt}\,\mathbb{E}^{\mathbb{Q}}\left[e^{\frac{N^2-N}{2}\int_0^t\sigma_u^2\,du + N\rho\sum_{i=1}^{n}\int_0^t\theta_u^{(i)}\,dZ_{\lambda u}^{(i)}}\right].$$
Using the first equality in (13) for the integrated variance process, we obtain
$$\mathbb{E}^{\mathbb{Q}}\left[e^{NX_t}\right] = e^{NBt + \frac{N(N-1)}{2\lambda}\left(1-e^{-\lambda t}\right)\sigma_0^2}\,\mathbb{E}^{\mathbb{Q}}\left[e^{\sum_{i=1}^{n}\int_0^t\theta_s^{(i)}\left(N\rho + \frac{N(N-1)}{2\lambda}\left(1-e^{-\lambda(t-s)}\right)\right)dZ_{\lambda s}^{(i)}}\right] = e^{NBt + \frac{N(N-1)}{2\lambda}\left(1-e^{-\lambda t}\right)\sigma_0^2}\prod_{i=1}^{n}\mathbb{E}^{\mathbb{Q}}\left[e^{\int_0^t\theta_s^{(i)}\left(N\rho + \frac{N(N-1)}{2\lambda}\left(1-e^{-\lambda(t-s)}\right)\right)dZ_{\lambda s}^{(i)}}\right],$$
where in the last step we use the independence of $Z^{(i)}$, $i=1,\dots,n$. If $N$ satisfies $N\rho + \frac{N(N-1)}{2\lambda} \leq 0$, that is, $N$ satisfies (14), then for $0 \leq s \leq t$,
$$N\rho + \frac{N(N-1)}{2\lambda}\left(1-e^{-\lambda(t-s)}\right) \leq 0.$$
Consequently, by using Lemma 1, we obtain (15). □
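For a concrete (assumed) specification, one can take compound Poisson BDLPs $Z^{(i)}$ with Poisson rate $a_i$ and Exp($b_i$) jump sizes, so that $\kappa^{(i)}(\theta) = a_i\theta/(b_i - \theta)$ for $\theta < b_i$; the moment formula (15) and (16) then reduces to one-dimensional quadratures, as sketched below. The constant weights $\theta^{(i)}$ and all parameter values are illustrative.

```python
import numpy as np
from scipy.integrate import quad

# Illustrative parameters (not estimated from the yield data).
lam, rho, B, sigma2_0 = 2.0, -0.5, 0.02, 0.04
theta = [0.8, 0.6]        # constant weights with sum of squares 1
a_i = [3.0, 1.5]          # BDLP Poisson rates
b_i = [10.0, 8.0]         # BDLP exponential jump-size parameters
t, N = 1.0, 2             # condition (14): N <= 1 - 2*lam*rho = 3, so N = 2 is admissible

def kappa(i, th):
    # Cumulant transform of Z_1^{(i)} for a compound Poisson subordinator with Exp jumps.
    return a_i[i] * th / (b_i[i] - th)

def moment(t, N):
    a_tN = np.exp(N * B * t + N * (N - 1) / (2 * lam) * (1 - np.exp(-lam * t)) * sigma2_0)
    prod = 1.0
    for i in range(len(theta)):
        integrand = lambda s, i=i: kappa(
            i, theta[i] * (N * rho + N * (N - 1) / (2 * lam) * (1 - np.exp(-lam * (t - s)))))
        prod *= np.exp(lam * quad(integrand, 0.0, t)[0])
    return a_tN * prod

print("E[exp(N X_t)] for N =", N, ":", moment(t, N))
```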
Next, we derive a result without any restriction on $N$. For this paper, we denote the random measure associated with the jumps of a process $A_t$, and the Lévy density of $A_t$, by $J_A(\cdot,\cdot)$ and $\nu_A(\cdot)$, respectively. The compensator for $J_A(dt,dx)$ is $\nu_A(dx)\,dt$, and we define $\tilde{J}_A(dt,dx) = J_A(dt,dx) - \nu_A(dx)\,dt$. For the proof of Theorem 3, we use the following version of the Girsanov theorem. The proof may be found in Øksendal and Sulem (2007).
Theorem 2.
Let $u(t)$ and $\theta(t,z) \leq 1$ be predictable processes such that the process
$$Z(t) := \exp\left(-\int_0^t u(s)\,dW_s - \frac{1}{2}\int_0^t u^2(s)\,ds + \int_0^t\!\int_{\mathbb{R}}\ln(1-\theta(s,z))\,\tilde{J}_Z(ds,dz) + \int_0^t\!\int_{\mathbb{R}}\left[\ln(1-\theta(s,z)) + \theta(s,z)\right]\nu_Z(dz)\,ds\right)$$
exists for $0\leq t\leq T$ and satisfies $\mathbb{E}^{\mathbb{P}_1}[Z(T)] = 1$. Define the probability measure $\mathbb{P}_2$ by $d\mathbb{P}_2 = Z(T)\,d\mathbb{P}_1$. Then $u(t)\,dt + dW_t$ is a Brownian motion, and $\theta(t,z)\,\nu(dz)\,dt + \tilde{N}(dt,dz)$ is a compensated Poisson random measure with respect to $\mathbb{P}_2$.
We now proceed to prove a more general result related to the moments of S t , with respect to the BN-S model. This result does not assume any restriction on N N .
For the short term, we assume $\theta_s^{(i)}$ to be a positive constant, $\theta_s^{(i)} = \theta^{(i)} > 0$, $i=1,\dots,n$.
Theorem 3.
$$\mathbb{E}^{\mathbb{Q}}\left(S_t^N\right) = \tau_2(t,N)\prod_{i=1}^{n}\exp\left(\int_0^{\lambda t}\kappa^{(i)}\!\left(-\theta^{(i)}\frac{N(N-1)}{2\lambda}e^{-s}\right)ds\right), \tag{18}$$
where $\tau_2(t,N)$ is a deterministic function of $t$, and $\kappa^{(i)}(\cdot)$ is the cumulant generating function for $Z_1^{(i)}$, with respect to $\mathbb{Q}$, where $i=1,\dots,n$.
We remark that an explicit form of a particular case of the function $\tau_2(t,N)$ in (18) will be found in Corollary 4.
Proof. 
Let $U$ be a subordinator with Poisson random measure $J_U(\cdot,\cdot)$. We will characterize this process later in the proof. Consider the stochastic differential equation
$$\frac{dM_t}{M_t} = \alpha\,dt + \beta\,dW_t + \int_{\mathbb{R}_+}\gamma(y)\,\tilde{J}_U(\lambda\,dt,dy), \quad M_0 = 1, \tag{19}$$
where $\alpha, \beta$ are constants and $\gamma(y) > -1$. Consequently (see Øksendal and Sulem (2007)), we obtain
$$M_t = \exp\left[\left(\alpha - \tfrac{1}{2}\beta^2\right)t + \beta W_t + \int_0^{\lambda t}\!\int_{0<y<1}\left(\ln(1+\gamma(y)) - \gamma(y)\right)\nu_U(dy)\,ds + \int_0^{\lambda t}\!\int_{\mathbb{R}_+}\ln(1+\gamma(y))\,\tilde{J}_U(ds,dy)\right].$$
Set $\beta = 0$ in (19) to obtain
$$M_t = \exp\left[\alpha' t + \int_0^{\lambda t}\!\int_{\mathbb{R}_+}\left(\ln(1+\gamma(y)) - \gamma(y)\right)\nu_U(dy)\,ds + \int_0^{\lambda t}\!\int_{\mathbb{R}_+}\ln(1+\gamma(y))\,\tilde{J}_U(ds,dy)\right],$$
where $\alpha' = \alpha - \lambda\int_{y\geq 1}\left(\ln(1+\gamma(y)) - \gamma(y)\right)\nu_U(dy)$. We choose $\alpha$ in such a way that
$$\alpha' = 0.$$
Then, by Cont and Tankov (2004) (Proposition 8.23), $M_t$ is a martingale. Consider a new measure $d\mathbb{T}(t) = M_t\,d\mathbb{Q}(t)$. Note that, with respect to $\mathbb{T}$, the Brownian motion $W_t$ still remains the same.
Then
$$\mathbb{E}^{\mathbb{Q}}\left(S_t^N\right) = \mathbb{E}^{\mathbb{T}}\left[\frac{1}{M_t}S_t^N\right] = \mathbb{E}^{\mathbb{T}}\Bigg[\exp\Bigg(-\int_0^{\lambda t}\!\int_{\mathbb{R}_+}\left(\ln(1+\gamma(y)) - \gamma(y)\right)\nu_U(dy)\,ds - \int_0^{\lambda t}\!\int_{\mathbb{R}_+}\ln(1+\gamma(y))\,\tilde{J}_U(ds,dy) + NBt - \frac{N}{2}\int_0^t\sigma_s^2\,ds + N\int_0^t\sigma_s\,dW_s + \rho N\sum_{i=1}^{n}\theta^{(i)}\int_0^{\lambda t}\!\int_{\mathbb{R}_+}y\,J_{Z^{(i)}}(ds,dy)\Bigg)\Bigg] = \tilde{\tau}(t,N)\,\mathbb{E}^{\mathbb{T}}\Bigg[\exp\Bigg(-\int_0^{\lambda t}\!\int_{\mathbb{R}_+}\ln(1+\gamma(y))\,\tilde{J}_U(ds,dy) - \frac{N}{2}\int_0^t\sigma_s^2\,ds + N\int_0^t\sigma_s\,dW_s + \rho N\sum_{i=1}^{n}\theta^{(i)}\int_0^{\lambda t}\!\int_{\mathbb{R}_+}y\,J_{Z^{(i)}}(ds,dy)\Bigg)\Bigg],$$
where $\tilde{\tau}(t,N) = \exp\left(NBt - \lambda t\int_{\mathbb{R}_+}\left(\ln(1+\gamma(y)) - \gamma(y)\right)\nu_U(dy)\right)$ is a deterministic function of $t$. Using $\tilde{J}_U(ds,dy) = J_U(ds,dy) - \nu_U(dy)\,ds$, we obtain
$$\mathbb{E}^{\mathbb{Q}}\left(S_t^N\right) = \mathbb{E}^{\mathbb{T}}\left[\frac{1}{M_t}S_t^N\right] = \tau(t,N)\,\mathbb{E}^{\mathbb{T}}\Bigg[\exp\Bigg(-\int_0^{\lambda t}\!\int_{\mathbb{R}_+}\ln(1+\gamma(y))\,J_U(ds,dy) - \frac{N}{2}\int_0^t\sigma_s^2\,ds + N\int_0^t\sigma_s\,dW_s + \rho N\sum_{i=1}^{n}\theta^{(i)}\int_0^{\lambda t}\!\int_{\mathbb{R}_+}y\,J_{Z^{(i)}}(ds,dy)\Bigg)\Bigg],$$
where
$$\tau(t,N) = \exp\left(NBt - \lambda t\int_{\mathbb{R}_+}\left(\ln(1+\gamma(y)) - \gamma(y)\right)\nu_U(dy) + \lambda t\int_{\mathbb{R}_+}\ln(1+\gamma(y))\,\nu_U(dy)\right) = \exp\left(NBt + \lambda t\int_{\mathbb{R}_+}\gamma(y)\,\nu_U(dy)\right)$$
is a deterministic function of $t$. We choose $\gamma(y) > -1$ such that the following holds:
$$-\int_0^{\lambda t}\!\int_{\mathbb{R}_+}\ln(1+\gamma(y))\,J_U(ds,dy) + \left(\rho N + \frac{N(N-1)}{2\lambda}\right)\sum_{i=1}^{n}\theta^{(i)}\int_0^{\lambda t}\!\int_{\mathbb{R}_+}y\,J_{Z^{(i)}}(ds,dy) = 0.$$
Consequently,
$$\mathbb{E}^{\mathbb{Q}}\left(S_t^N\right) = \tau(t,N)\,\mathbb{E}^{\mathbb{T}}\left[\exp\left(-\frac{N}{2}\int_0^t\sigma_s^2\,ds + N\int_0^t\sigma_s\,dW_s - \frac{N(N-1)}{2\lambda}\sum_{i=1}^{n}\theta^{(i)}\int_0^{\lambda t}\!\int_{\mathbb{R}_+}y\,J_{Z^{(i)}}(ds,dy)\right)\right].$$
As before, let $\mathcal{G}$ denote the $\sigma$-algebra generated by the BDLPs $Z^{(i)}$, $i=1,\dots,n$, up to time $t$. Then we obtain
$$\begin{aligned}
\mathbb{E}^{\mathbb{Q}}\left(S_t^N\right) &= \tau(t,N)\,\mathbb{E}^{\mathbb{T}}\left[e^{-\frac{N}{2}\int_0^t\sigma_s^2\,ds - \frac{N(N-1)}{2\lambda}\sum_{i=1}^{n}\theta^{(i)}\int_0^{\lambda t}\int_{\mathbb{R}_+}y\,J_{Z^{(i)}}(ds,dy)}\,\mathbb{E}^{\mathbb{T}}\left[e^{N\int_0^t\sigma_s\,dW_s}\,\Big|\,\mathcal{G}\right]\right] \\
&= \tau(t,N)\,\mathbb{E}^{\mathbb{T}}\left[e^{-\frac{N}{2}\int_0^t\sigma_s^2\,ds - \frac{N(N-1)}{2\lambda}\sum_{i=1}^{n}\theta^{(i)}\int_0^{\lambda t}\int_{\mathbb{R}_+}y\,J_{Z^{(i)}}(ds,dy)}\,e^{\frac{N^2}{2}\int_0^t\sigma_s^2\,ds}\right] \\
&= \tau(t,N)\,\mathbb{E}^{\mathbb{T}}\left[\exp\left(\frac{N(N-1)}{2}\int_0^t\sigma_s^2\,ds - \frac{N(N-1)}{2\lambda}\sum_{i=1}^{n}\theta^{(i)}\int_0^{\lambda t}\!\int_{\mathbb{R}_+}y\,J_{Z^{(i)}}(ds,dy)\right)\right] \\
&= \tau(t,N)\,\mathbb{E}^{\mathbb{T}}\left[\exp\left(\frac{N(N-1)}{2\lambda}\left(1-e^{-\lambda t}\right)\sigma_0^2 - \frac{N(N-1)}{2\lambda}\sum_{i=1}^{n}\int_0^t\theta^{(i)}e^{-\lambda(t-s)}\,dZ_{\lambda s}^{(i)}\right)\right] \\
&= \tau_1(t,N)\,\mathbb{E}^{\mathbb{T}}\left[\exp\left(-\frac{N(N-1)}{2\lambda}\sum_{i=1}^{n}\int_0^t\!\int_{\mathbb{R}_+}\theta^{(i)}e^{-\lambda(t-s)}\,y\,\tilde{J}_{Z^{(i)}}(\lambda\,ds,dy)\right)\right],
\end{aligned}$$
where
$$\tau_1(t,N) = \tau(t,N)\exp\left(\frac{N(N-1)}{2\lambda}\left(1-e^{-\lambda t}\right)\sigma_0^2 - \frac{N(N-1)}{2}\sum_{i=1}^{n}\theta^{(i)}\int_0^t\!\int_{\mathbb{R}_+}e^{-\lambda(t-s)}\,y\,\nu_{Z^{(i)}}(dy)\,ds\right).$$
With respect to $\mathbb{T}$, using Girsanov’s theorem, we find that the compensated subordinator is given by (see Øksendal and Sulem (2007) (Theorem 1.35))
$$\tilde{J}_{\mathbb{T}}^{(i)}(\lambda\,ds,dy) = -\lambda\gamma(y)\,\nu_{Z^{(i)}}(dy)\,ds + \tilde{J}_{Z^{(i)}}(\lambda\,ds,dy), \quad i=1,\dots,n.$$
Consequently,
$$\begin{aligned}
\mathbb{E}^{\mathbb{Q}}\left(S_t^N\right) &= \tau_2(t,N)\,\mathbb{E}^{\mathbb{T}}\left[\exp\left(-\frac{N(N-1)}{2\lambda}\sum_{i=1}^{n}\int_0^t\!\int_{\mathbb{R}_+}\theta^{(i)}e^{-\lambda(t-s)}\,y\,J_{\mathbb{T}}^{(i)}(\lambda\,ds,dy)\right)\right] \\
&= \tau_2(t,N)\,\mathbb{E}^{\mathbb{T}}\left[\exp\left(-\frac{N(N-1)}{2\lambda}\sum_{i=1}^{n}\int_0^t\theta^{(i)}e^{-\lambda(t-s)}\,dZ_{\lambda s}^{(i)}\right)\right] \\
&= \tau_2(t,N)\,\mathbb{E}^{\mathbb{T}}\left[\exp\left(-\frac{N(N-1)}{2\lambda}\sum_{i=1}^{n}\theta^{(i)}\int_0^{\lambda t}e^{-s}\,dZ_s^{(i)}\right)\right],
\end{aligned}$$
where $Z_t$, in the final steps, is a subordinator with respect to $\mathbb{T}$ (consequently, (13) is used), and
$$\tau_2(t,N) = \tau_1(t,N)\exp\left(-\frac{N(N-1)}{2\lambda}\sum_{i=1}^{n}\theta^{(i)}\int_0^t\!\int_{\mathbb{R}_+}e^{-\lambda(t-s)}\,\lambda\,y\left(\gamma(y)\,\nu_{Z^{(i)}}(dy) - \nu_{\mathbb{T}}^{(i)}(dy)\right)ds\right).$$
Thus, with the application of Lemma 1, we obtain
$$\mathbb{E}^{\mathbb{Q}}\left(S_t^N\right) = \tau_2(t,N)\prod_{i=1}^{n}\exp\left(\int_0^{\lambda t}\kappa^{(i)}\!\left(-\theta^{(i)}\frac{N(N-1)}{2\lambda}e^{-s}\right)ds\right). \qquad \square$$
Next, we present an immediate corollary that follows from Theorems 1 and 3, by setting N = 1 . Note that when N = 1 , since ρ < 0 , the condition (14) is satisfied.
Corollary 4.
$$\tau_2(t,1) = \exp\left(Bt + \lambda\sum_{i=1}^{n}\kappa^{(i)}\!\left(\theta^{(i)}\rho\right)t\right),$$
where $\kappa^{(i)}(\cdot)$ is the cumulant generating function for $Z_1^{(i)}$, with respect to $\mathbb{Q}$, where $i=1,\dots,n$.

3. Data Description and Analysis

The data analyzed in this paper come from a North Dakota corn field in 2010. For the cleaned-up data set, we have six significant feature variables and one target variable. The feature variables are:
(1) Swth Wdth(ft): The width of the header in ft;
(2) Distance(ft): The distance (in ft) travelled between two data points;
(3) Crop Flw(M)(lb/s): The crop flow harvested per second between data points;
(4) Moisture(%): Crop moisture in %;
(5) Speed(mph): The speed of the combine;
(6) Prod(ac/h): Combine productivity per hour.
The target variable is:
  • Yld Vol(Dry)(bu/ac): Yield volume dry.
In other words, for the above feature variables, in (4), for simplicity we take $\theta_t^{(i)}$ to be independent of time, with $\theta_t^{(i)} = 1$, for $i=1,\dots,6$, and $S_t$ represents the “Yld Vol(Dry)(bu/ac)”. Figure 1, Figure 2, Figure 3, Figure 4 and Figure 5 provide various exploratory data analyses.
For Figure 1, from the histogram and the distribution plot for Yield Vol (Dry), respectively, we observe that the distribution of the sample data points of the feature Yld Vol (Dry) is left-skewed. It can also be noted that the maximum count of observations falls within the range 150–200 of Yld Vol (Dry) (bu/ac) in both plots. In Figure 2 and Figure 3, for our exploratory data analysis, we implement box plots to visualize the measures of dispersion of our data set for the variable Yld Vol (Dry) with respect to two variables—Moisture (%) and Distance (ft), respectively. We observe from both plots that, for a certain fixed interval, there is a cluster of outliers underneath the lower whisker of the plots. The inter-quartile range tends to show an increasing pattern either right before or right after the big cluster of outliers. To help visualize the distribution of a single variable and the relationships between two variables, in Figure 4 we use the pairs plot. For our data set, the diagonal of the pairs plot gives the univariate distribution of each variable via a histogram, and the scatter plots show the bivariate relationships among variables. For Figure 5, the correlation matrix heatmap for our data set provides an overview of the relations among different variables (features). From the heatmap color codes and annotated values, we can observe that the correlation coefficient between most features is low.
The goal of the analysis is to predict the target variable (Yld Vol(Dry)(bu/ac)) based on the feature variables. We conduct the analysis in two different approaches.
Method 1: For this case, at first, we observe that for the “Yld Vol(Dry)(bu/ac)” variable, the maximum and minimum are 399.55 and 10.28, respectively. Consequently, we divide the target range into eight equally spaced intervals, with breakpoints $[10, 60, 110, 160, 210, 260, 310, 360, 410]$.
Corresponding to each value of the target variable, we create a list of three categorical data $[a, b, c]$, where $a, b, c \in \{0,1\}$, by the following rule: if the target variable $x$ is such that $10 \leq x < 60$, then $[a,b,c] = [0,0,0]$; if $60 \leq x < 110$, then $[a,b,c] = [0,0,1]$; if $110 \leq x < 160$, then $[a,b,c] = [0,1,0]$; if $160 \leq x < 210$, then $[a,b,c] = [0,1,1]$; if $210 \leq x < 260$, then $[a,b,c] = [1,0,0]$; if $260 \leq x < 310$, then $[a,b,c] = [1,0,1]$; if $310 \leq x < 360$, then $[a,b,c] = [1,1,0]$; and if $360 \leq x < 410$, then $[a,b,c] = [1,1,1]$.
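For instance, the interval binning and the three-bit labels $[a, b, c]$ described above can be produced as follows; the helper function and the column name are illustrative.

```python
import numpy as np
import pandas as pd

def encode_yield_bins(y):
    """Map yield values to the eight intervals [10,60), [60,110), ..., [360,410)
    and return the corresponding three-bit labels (a, b, c)."""
    edges = np.array([10, 60, 110, 160, 210, 260, 310, 360, 410])
    idx = np.clip(np.digitize(y, edges) - 1, 0, 7)      # interval index 0..7
    bits = (idx[:, None] >> np.array([2, 1, 0])) & 1    # binary digits a, b, c
    return pd.DataFrame(bits, columns=["a", "b", "c"])

# Example usage with the target column (name assumed from the text):
# labels = encode_yield_bins(df["Yld Vol(Dry)(bu/ac)"].to_numpy())
```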
After this, we train the neural network model on 80% of the data and test it on the remaining 20%. We create a four-layered deep model with 42 nodes in the first layer, 30 nodes in the second layer, and 20 and 10 nodes in the last two layers, respectively. We use the Rectified Linear Unit (ReLU) activation function for the first input layer, followed by the tanh activation function for the three hidden layers. For the output, we use the softmax activation function. To determine which hyper-parameter combination is most efficient for our model, we first conduct exploratory data analysis (EDA) to visualize the validation loss and accuracy of our model, and we then choose to train the model for 120 epochs with a batch size of 32.
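A sketch of this architecture in Keras is shown below. Only the layer sizes, activations, loss, optimizer, epochs, batch size, and the use of the EarlyStopping callback are taken from the description in this paper; details such as the validation split and the patience value are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

# 42-30-20-10 dense layers: ReLU on the first layer, tanh on the next three,
# softmax output over the three label columns (a, b, c).
model = models.Sequential([
    layers.Input(shape=(6,)),             # six feature variables
    layers.Dense(42, activation="relu"),
    layers.Dense(30, activation="tanh"),
    layers.Dense(20, activation="tanh"),
    layers.Dense(10, activation="tanh"),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)

# X_train and y_train would hold the 80% training split of the features and [a, b, c] labels.
# history = model.fit(X_train, y_train, epochs=120, batch_size=32,
#                     validation_split=0.2, callbacks=[early_stop])
```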
For the testing data, true positive, true negative, false positive, and false negative are denoted as TP, TN, FP, and FN, respectively. The following measurements are standard:
$$\text{precision} = \frac{\text{TP}}{\text{TP} + \text{FP}},$$
$$\text{recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}.$$
The f1-score gives the harmonic mean of precision and recall. The scores corresponding to every class give the accuracy of the classifier in classifying the data points of that particular class compared to all other classes. The support is the number of samples of the true response that lie in that class. Table 1 provides the classification report for Method 1.
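The per-label entries of such a report can be computed directly from the TP/FP/FN counts, mirroring the formulas above (scikit-learn's classification_report produces the same quantities); the helper below is a generic illustration rather than the exact evaluation script used in the study.

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    """Per-label precision, recall, f1, and support for 0/1 arrays of shape
    (n_samples, n_labels), following the definitions given in the text."""
    tp = np.sum((y_true == 1) & (y_pred == 1), axis=0)
    fp = np.sum((y_true == 0) & (y_pred == 1), axis=0)
    fn = np.sum((y_true == 1) & (y_pred == 0), axis=0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
    support = np.sum(y_true == 1, axis=0)
    return precision, recall, f1, support
```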
Here, we used the Keras callback “EarlyStopping” and monitored the validation loss. We use “EarlyStopping” because it terminates training when the chosen performance measure stops improving. As a result, if we look at the model accuracy plot (Figure 6), we observe that learning stops once the epoch number reaches 2. We also see that our model performs well enough with such a small number of epochs and does not crudely over-fit or under-fit the training and testing data. The accuracy at that point is above 97%.
The line plot for the “model loss” (Figure 7) shows that the model is good at minimizing the loss function with fewer epochs.
Method 2: The first part of this method is the same as in Method 1. However, instead of considering a, b, and c separately (as done in Method 1), for this method we consider them simultaneously. In this case, with the same neural network as described in Method 1, we find that for the test data, the model predicts correctly 97.06% of the time. We create learning curves for different learning rates ($lr$) in Figure 8, Figure 9, Figure 10, Figure 11, Figure 12 and Figure 13. When $lr = 1$ or $0.01$, we can observe from the loss curve and accuracy curve that the model performs poorly when fitting the training and testing data sets. The test data are over-fitted, and rather than showing any decay, the loss curve shows an increase as the epoch number increases. When $10^{-6} \leq lr \leq 10^{-5}$, the model performance is much improved. This can be justified, as in this case both the model loss and the model accuracy show a nicely fitted training and testing data set. When $lr < 10^{-6}$, the model performs much more poorly, suggesting that there is probably no learning happening at all. The classification report for Method 1 and the model’s ability to correctly predict the test data set in Method 2 tell us that the model is not perfect and there is room for improvement. In our future work, we plan to improve our procedure and further polish this model to enhance the predictive ability of both methods.
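The learning-rate comparison in Figures 8–13 amounts to re-fitting the same network with different Adam learning rates and recording the training histories used to draw the loss and accuracy curves. A sketch of such a sweep is given below; the grid of rates mirrors the figures, while everything else (data splits, epochs) follows the assumptions of the earlier sketch.

```python
import tensorflow as tf

def build_model():
    # Same 42-30-20-10 architecture as in the earlier sketch.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(6,)),
        tf.keras.layers.Dense(42, activation="relu"),
        tf.keras.layers.Dense(30, activation="tanh"),
        tf.keras.layers.Dense(20, activation="tanh"),
        tf.keras.layers.Dense(10, activation="tanh"),
        tf.keras.layers.Dense(3, activation="softmax"),
    ])

histories = {}
for lr in [1.0, 1e-2, 1e-4, 1e-5, 1e-6, 1e-8]:
    m = build_model()
    m.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
              loss="categorical_crossentropy", metrics=["accuracy"])
    # histories[lr] = m.fit(X_train, y_train, epochs=120, batch_size=32,
    #                       validation_data=(X_test, y_test), verbose=0)
```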
From the analysis, it is clear that (4) can be an appropriate model for the target variable. It is also clear that, for both methods, the neural network model provides a very appropriate prediction of the target variable with the help of the feature variables.

4. Discussion

This paper analyzed yield data from a representative corn field in the upper midwestern United States. Yield data and data on other variables were collected every two seconds as the corn was harvested. These data are modeled to better understand corn yield in this field. In our data analysis, we wanted to answer the question: “Given intervals of fixed length, can a machine-learning-driven neural network model help us predict the interval to which sample points of the feature Yld Vol (Dry)(bu/ac) belong? And if so, how accurately does the model predict the correct interval?” In order to answer these questions, our first task was to create intervals (of fixed size) using the feature Yld Vol (Dry) (bu/ac), where the left end-points were kept as part of the interval. Once these intervals were constructed, we assigned the sample points of this feature Yld Vol (Dry) (bu/ac) to the correct interval. We labeled them with either 000 or 001 or 010 and so on (eight combinations/labels) and created separate columns for these labels in our data set.
After this preparation stage was complete, we trained the model on the data set with six features (Swth Wdth, Distance, Crop Flw, Moisture, Speed and Prod), where the label columns were treated as the prediction targets. To avoid a highly complex deep neural network, we kept the number of hidden layers of our neural network at three. We compiled the model using the Adam optimizer and monitored loss using ‘categorical cross entropy’. When fitting our model, we used the callback ‘EarlyStopping’, which monitored the validation loss. ‘EarlyStopping’ was implemented to speed up model training and to stop model fitting when the model is no longer learning efficiently from the training data points; this also helps avoid overfitting. To verify that the learning does not become inefficient and to avoid overfitting, we computed validation loss and accuracy with different learning rates and analyzed them visually. In the last phase of our analysis, we obtained a classification report for our prediction. The final step of our analysis involved investigating how well our model predicted the correct intervals. When compared to the test data points, the model correctly predicted almost 97% of the sample test points.
In Stelzer and Barndorff-Nielsen (2013), the authors introduce and analyze a multivariate supOU stochastic volatility (SV) model and present an example of long memory in log returns in the supOU SV model, and in Willinger et al. (1999), the authors investigate whether stock price returns exhibit long-range dependence. Their work motivates us to investigate further with our pricing model, in a future project, to observe whether our model exhibits long-range dependence as a stylized fact. At the same time, an in-depth analysis to determine whether our time series data exhibit any long-range dependence is something we wish to incorporate in a future sequel of our current work. Even though we have a model that can efficiently predict the interval to which sample data points of Yld Vol (Dry) (bu/ac) belong, there is still room for improvements and adjustments. Before training our data set, we can apply feature selection along with a random forest classifier to help us decide which features are more important than others. The features selected in this way can later replace the six features that we manually selected for our training purposes. Once we have allowed the machine to select the important features for us, in our modeling phase we can focus more on hyperparameter tuning by experimenting with different epochs, different batch sizes, and an increased number of nodes in the hidden layers.
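One way to carry out the feature-selection step mentioned above is to rank the candidate features by random forest importance before training the network. The sketch below illustrates this with scikit-learn, assuming the data are held in a pandas DataFrame with the column names used in Section 3; the file name is hypothetical and the binning reuses the intervals of Method 1.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("corn_yield_2010.csv")   # hypothetical file name, see Section 3

features = ["Swth Wdth(ft)", "Distance(ft)", "Crop Flw(M)(lb/s)",
            "Moisture(%)", "Speed(mph)", "Prod(ac/h)"]
edges = np.array([10, 60, 110, 160, 210, 260, 310, 360, 410])
y = np.clip(np.digitize(df["Yld Vol(Dry)(bu/ac)"], edges) - 1, 0, 7)   # interval labels 0..7

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(df[features], y)

# Rank features by importance; the top-ranked ones would feed the neural network.
for name, score in sorted(zip(features, rf.feature_importances_),
                          key=lambda p: p[1], reverse=True):
    print(f"{name}: {score:.3f}")
```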

Author Contributions

All the authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by USDA ARS grant number 58-6064-8-023.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Restrictions apply to the availability of these data. The data were obtained from a cooperating farm manager and are available from the authors pending the permission of that individual.

Acknowledgments

The authors would like to thank the anonymous reviewers for their careful reading of the manuscript and for suggesting points to improve the quality of the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Addey, Kwame Asiam, Saleem Shaik, William Nganje, and Indranil SenGupta. 2021. Implications of the Dirichlet processes mixture model on U.S. crop yield predictions in the presence of random shocks. submitted. [Google Scholar]
  2. Barndorff-Nielsen, Ole Eiler, and Neil Shephard. 2001a. Modelling by Lévy processes for financial econometrics. In Lévy Processes: Theory and Applications. Edited by Ole Eiler Barndorff-Nielsen, Thomas Mikosch and Sidney Resnick. Basel: Birkhäuser, pp. 283–318. [Google Scholar]
  3. Barndorff-Nielsen, Ole Eiler, and Neil Shephard. 2001b. Non-Gaussian Ornstein-Uhlenbeck-based models and some of their uses in financial economics. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63: 167–241. [Google Scholar] [CrossRef]
  4. Bauer, Alan, Aaron George Bostrom, Joshua Ball, Christoper Applegate, Tao Cheng, Stephen Laycock, Sergio Moreno Rojas, Jacob Kirwan, and Ji Zhou. 2019. Combining computer vision and deep learning to enable ultra-scale aerial phenotyping and precision agriculture: A case study of lettuce production. Horticulture Research 6: 70. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  5. Brunelli, Davide, Andrea Albanese, Donato d’Acunto, and Matteo Nardello. 2019. Energy Neutral Machine Learning Based IoT Device for Pest Detection in Precision Agriculture. IEEE Internet of Things Magazine 2: 10–13. [Google Scholar] [CrossRef]
  6. Chlingaryan, Anna, Salah Sukkarieh, and Brett Whelan. 2018. Machine learning approaches for crop yield prediction and nitrogen status estimation in precision agriculture: A review. Computers and Electronics in Agriculture 151: 61–69. [Google Scholar] [CrossRef]
  7. Cont, Rama, and Peter Tankov. 2004. Financial Modelling with Jump Processes. CRC Financial Mathematics Series; Boca Raton: Chapman and Hall. [Google Scholar]
  8. Habtemicael, Semere, and Indranil SenGupta. 2016. Pricing variance and volatility swaps for Barndorff–Nielsen and Shephard process driven financial markets. International Journal of Financial Engineering 3: 1650027. [Google Scholar] [CrossRef]
  9. Horng, Gwo-Jiun, Min-Xiang Liu, and Chao-Chun Chen. 2020. The Smart Image Recognition Mechanism for Crop Harvesting System in Intelligent Agriculture. IEEE Sensors Journal 20: 2766–81. [Google Scholar] [CrossRef]
  10. Ihsan, Atif, and Indranil SenGupta. 2018. Moments of the asset price for the Barndorff–Nielsen and Shephard model. Lithuanian Mathematical Journal 58: 408–20. [Google Scholar] [CrossRef]
  11. Issaka, Aziz, and Indranil SenGupta. 2017. Analysis of variance based instruments for Ornstein-Uhlenbeck type models: Swap and price index. Annals of Finance 13: 401–34. [Google Scholar] [CrossRef]
  12. Lemley, Joe, Shabab Bazrafkan, and Peter Corcoran. 2017. Deep Learning for Consumer Devices and Services: Pushing the limits for machine learning, artificial intelligence, and computer vision. IEEE Consumer Electronics Magazine 6: 48–56. [Google Scholar] [CrossRef] [Green Version]
  13. Nicolato, Elisa, and Emmanouil Venardos. 2003. Option Pricing in Stochastic Volatility Models of the Ornstein-Uhlenbeck type. Mathematical Finance 13: 445–66. [Google Scholar] [CrossRef]
  14. Øksendal, Bernt, and Agnès Sulem-Bialobroda. 2007. Applied Stochastic Control of Jump Diffusions. Berlin/Heidelberg: Springer. [Google Scholar]
  15. Roberts, Michael, and Indranil SenGupta. 2020. Sequential hypothesis testing in machine learning, and crude oil price jump size detection. Applied Mathematical Finance 27: 374–95. [Google Scholar] [CrossRef]
  16. SenGupta, Indranil, William Wilson, and William Nganje. 2019. Barndorff–Nielsen and Shephard model: Oil hedging with variance swap and option. Mathematics and Financial Economics 13: 209–26. [Google Scholar] [CrossRef]
  17. Sharma, Abhinav, Arpit Jain, Prateek Gupta, and Vinay Chowdary. 2021. Machine Learning Applications for Precision Agriculture: A Comprehensive Review. IEEE Access 9: 4843–73. [Google Scholar] [CrossRef]
  18. Shoshi, Humayra, and Indranil SenGupta. 2021. Hedging and Machine Learning Driven Crude Oil Data Analysis Using a Refined Barndorff–Nielsen and Shephard Model. International Journal of Financial Engineering 2021: 2150015. [Google Scholar] [CrossRef]
  19. Stelzer, Robert, and Ole Eiler Barndorff-Nielsen. 2013. The multivariate supOU stochastic volatility model. Mathematical Finance 23: 296. [Google Scholar]
  20. The International Society of Precision Agriculture. n.d. Available online: https://www.ispag.org/ (accessed on 20 August 2021).
  21. Treboux, Jérôme, and Dominique Genoud. 2019. High Precision Agriculture: An Application Of Improved Machine-Learning Algorithms. Paper presented at 2019 6th Swiss Conference on Data Science (SDS), Bern, Switzerland, June 14; pp. 103–8. [Google Scholar] [CrossRef]
  22. Willinger, Walter, Murad S. Taqqu, and Vadim Teverovsky. 1999. Stock market prices and long range dependence. Finance and Stochastics 3: 1–13. [Google Scholar] [CrossRef]
  23. Wilson, William, William Nganje, Semere Gebresilasie, and Indranil SenGupta. 2019. Barndorff–Nielsen and Shephard model for hedging energy with quantity risk. High Frequency 2: 202–14. [Google Scholar] [CrossRef]
Figure 1. (a) Histogram for Yield Vol (Dry). (b) Distribution plot for Yield Vol (Dry).
Figure 2. Box plot for Moisture (%) vs. Yld Vol (Dry).
Figure 3. Box plot for Distance (ft) vs. Yld Vol (Dry).
Figure 4. Pair plot with respect to all features.
Figure 5. Correlation matrix heatmap for the data set.
Figure 6. Model accuracy.
Figure 7. Model loss.
Figure 8. $lr = 1$. (a) Train accuracy: 0.911. (b) Test accuracy: 0.908.
Figure 9. $lr = 10^{-2}$. (a) Train accuracy: 0.958. (b) Test accuracy: 0.956.
Figure 10. $lr = 10^{-4}$. (a) Train accuracy: 0.310. (b) Test accuracy: 0.305.
Figure 11. $lr = 1\times 10^{-5}$. (a) Train accuracy: 0.961. (b) Test accuracy: 0.962.
Figure 12. $lr = 1\times 10^{-6}$. (a) Train accuracy: 0.907. (b) Test accuracy: 0.899.
Figure 13. $lr = 1\times 10^{-8}$. (a) Train accuracy: 0.410. (b) Test accuracy: 0.403.
Table 1. Classification report.

Label   Precision   Recall   f1-Score   Support
a       0.53        0.99     0.69       99
b       0.99        1.00     0.99       5115
c       0.98        0.75     0.85       438
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
