Article

Characterization of Rice Yield Based on Biomass and SPAD-Based Leaf Nitrogen for Large Genotype Plots

1
School of Engineering, Pontificia Universidad Javeriana Bogota, Cra. 7 No. 40-62, Bogota 110231, Colombia
2
The OMICAS Alliance, Pontificia Universidad Javeriana, Cali 760031, Colombia
3
The International Center for Tropical Agriculture CIAT, Km 17 Recta Cali–Palmira, Palmira 763537, Colombia
4
Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), AGAP-Pam, Avenue Agropolis, 34398 Montpellier, France
5
Fedearroz, Centro Experimental Las Lagunas, Km 4 Los Cairos, Tolima 730568, Colombia
6
Chemistry and Chemical Engineering Division, California Institute of Technology, Pasadena, CA 91125, USA
*
Author to whom correspondence should be addressed.
Sensors 2023, 23(13), 5917; https://doi.org/10.3390/s23135917
Submission received: 31 May 2023 / Revised: 23 June 2023 / Accepted: 23 June 2023 / Published: 26 June 2023
(This article belongs to the Special Issue Sensors and Artificial Intelligence in Smart Agriculture)

Abstract

The use of Unmanned Aerial Vehicle (UAV) images for biomass and nitrogen estimation offers multiple opportunities for improving rice yields. UAV images provide detailed, high-resolution visual information about vegetation properties, enabling the identification of phenotypic characteristics for selecting the best varieties, improving yield predictions, and supporting ecosystem monitoring and conservation efforts. In this study, an analysis of biomass and nitrogen is conducted on 59 rice plots selected at random from a more extensive trial comprising 400 rice genotypes. A UAV acquires multispectral reflectance channels across a rice field of subplots containing different genotypes. Based on the ground-truth data, yields are characterized for the 59 plots and correlated with the Vegetation Indices (VIs) calculated from the photogrammetric mapping. The VIs are weighted by the segmentation of the plants from the soil and used as a feature matrix to estimate, via machine learning models, the biomass and nitrogen of the selected rice genotypes. The genotype IR 93346 presented the highest yield with a biomass gain of 10,252.78 kg/ha and an average daily biomass gain above 49.92 g/day. The VIs with the highest correlations with the ground-truth variables were NDVI and SAVI for wet biomass, GNDVI and NDVI for dry biomass, GNDVI and SAVI for height, and NDVI and ARVI for nitrogen. The machine learning model that performed best in estimating the variables of the 59 plots was the Gaussian Process Regression (GPR) model with a correlation factor of 0.98 for wet biomass, 0.99 for dry biomass, and 1 for nitrogen. The results presented demonstrate that it is possible to characterize the yields of rice plots containing different genotypes through ground-truth data and VIs.

1. Introduction

Rice is a staple crop in Colombia and plays a significant role in the country’s economy and food security. The average consumption of rice per capita in Colombia was 43.16 kg in 2021. According to the National Rice Growers Federation (FEDEARROZ), Colombia produced approximately 3.326 million metric tons of milled rice in the same year. The country’s rice production is concentrated in Tolima, Huila, Meta, Cauca, and Valle del Cauca. In Colombia, rice is primarily grown in two cropping seasons: a primary season from March to July and a second season from August to December [1]. The phenotypic expression of the rice varies depending on the interaction of the rice genotype with the environmental and growing conditions. FEDEARROZ defines five rice regions in the country with distinct environmental dynamics and high genetic variability in the weedy rice grown, resulting in highly diverse morphological characteristics [2].
Rice yields in Colombia vary depending on the location and production system, with higher yields typically achieved in irrigated systems. However, yields in rain-fed systems can also be high under favorable environmental and management conditions [3]. Various initiatives are in place to improve rice productivity and sustainability, including promoting high-yielding varieties and providing technical assistance and training to farmers [4,5,6]. One topic of interest is plant breeding, where, through the selection of favorable genes and higher-level omics characterization [7], optimal rice varieties are obtained for cultivation. To identify the important genetic characteristics expressed in the environment, phenotyping is performed in two directions: biomass accumulation and nitrogen content [8,9,10].
Biomass is a critical variable in rice crops, and estimating its accumulation in growing cycles enables crop-yield performance to be gauged, determining high-producing varieties [11]. The biomass estimation is usually made by cutting the plant, obtaining its fresh weight, and then dehydrating it to obtain its dry weight. An estimation of this variable using UAV imaging is a common approach that employs multispectral or hyperspectral sensors to capture information about the reflectance properties of vegetation in different parts of the spectrum [12]. This information can be used to calculate vegetation indices (VIs) from which biomass estimates can be derived. In [13,14,15,16], the effectiveness of this approach has been demonstrated, with biomass estimates obtained from UAV images showing strong correlations with on-the-ground measurements in different varieties of rice crops. In addition to providing accurate biomass estimates, UAV images can also be used to create detailed maps of biomass distribution, which can help identify spatial patterns and variability within crop fields [17].
Leaf-blade nitrogen concentration is correlated with the chlorophyll range in plants, and the variability of the pigments is another phenotypic characteristic. Nitrogen is an essential nutrient for plant growth and productivity, and accurate estimations of the nitrogen content in plants can lead to greater precision in fertilizer application, subsequently improving crop yields [18]. Because the chlorophyll range is a measurement of leaf-pigment variability, it can be estimated using UAV images. The detailed high-resolution visual data obtained from UAV images can be used to estimate the chlorophyll range and, therefore, indicate the likely nitrogen distribution within rice crops. A common approach for estimating nitrogen using UAV images is to use spectral information to derive VIs related to nitrogen content [19]. These indices are based on the fact that radiation absorption in the visible and near-infrared spectra is related to nitrogen content in the vegetation. Several studies have demonstrated the potential of this approach, with UAV-based nitrogen content estimates showing strong correlations with on-the-ground measurements across various crops and ecosystems [9,20,21]. In addition to providing accurate estimates of plant nitrogen content, UAV images can also be used to create detailed maps of the nitrogen distribution within rice fields, which can help identify spatial patterns and variability and guide precision fertilization [22,23].
Using UAV images for biomass and nitrogen estimation in rice crops is a promising approach for precision agriculture research [24]. By leveraging the high spatial and spectral resolution of UAV imagery, researchers can derive accurate and detailed estimates of these important variables, providing valuable information for understanding rice-crop dynamics [25]. Developing a model for biomass and nitrogen in rice is an important research topic within the area of food security, as rice is a staple crop that feeds more than half of the world’s population [26]. Accurately estimating biomass and nitrogen content in rice can help optimize crop management practices, increase yields, and reduce the environmental impact [27,28]. Developing non-invasive estimation methods, such as parameter estimation through multispectral imaging, allows for more frequent, automated estimates to be made, resulting in closer crop monitoring throughout the growing cycle and the possibility of more timely responses. A number of machine learning (ML) techniques have been employed to correlate VIs with biomass and nitrogen [29,30,31], including linear and nonlinear multivariate regression, support vector machine (SVM), and neural network (NN) models. However, other techniques, such as decision trees, regression ensembles, and Gaussian process regression, remain relatively unexplored.
This paper explores biomass and nitrogen dynamics in 59 rice plots and correlates these ground-truth measurements with information drawn from multispectral images captured at a height of 20 m. This study aims to (1) observe the dynamics of biomass and nitrogen behavior in the different genotypes sampled, establishing the ones with the highest yields; (2) analyze the behavior of the VIs in each plot in relation to its genotype; and (3) test various ML techniques to correlate the samples taken using traditional methods with the data drawn from the multispectral sensor images.

2. Materials and Methods

The experiment was conducted during the dry season in 2021 at Saldaña, which is a municipality located in the department of Tolima, Colombia. Situated in the Magdalena River Valley, Saldaña has fertile soils and a suitable climate for rice cultivation. Rice is one of the main agricultural crops in the region, and Saldaña is one of the top rice-producing municipalities in Tolima. Low-lying Tolima has a typically tropical climate, with average temperatures between 18 °C and 28 °C. It has two rainy seasons per year, one from March to May and the other from September to November, and two dry seasons, one from December to February and the other from June to August.
Samples were obtained from the FEDEARROZ (National Rice Growers Federation) experimental station Las Lagunas, in the plot located at latitude 3°54′55.29″ north and longitude 74°59′02.75″ west at 304 masl. The plot has 410 crop subplots, with an average crop area of 1.72 m² each, and contains 330 different genotypes. At flowering, 56 genotypes were randomly selected based on their distribution in the experimental field. Of these genotypes, 53 had 1 repetition, whereas the other 3 had 2 repetitions due to their agronomic relevance. The sampling unit for the determination of biomass had an area of 0.2 m² (5 plants per linear meter). Figure 1 shows the GPS points and the distributions of the genotypes in the experiment. The images taken by the UAV are aligned and orthorectified in two orthomosaic maps, the first in the visible light spectrum with red, blue, and green RGB channels, and the second with the red, red-edge, and near-infrared channels. The genotype of interest is labeled with its GPS points and the image extracted to relate it to the biomass and nitrogen measurements in the crop. The genotype image dataset consists of 2475 images in each channel. This set of images was used to evaluate five models for estimating biomass and nitrogen parameters by calculating VIs.

2.1. Experiment Sampling

Sampling was conducted using two methods: destructive (invasive) and non-destructive (non-invasive). The first method consisted of harvesting the entire plant above ground level. The plants were weighed to determine their fresh weight, and the organs were later separated into their different parts (such as stems, leaves, and panicles). The plants were dried in an oven at 65 °C for 72 h and weighed to determine the total dry weight of the plant. The values of both the fresh and dry weights were used to determine the percentage of water content (WC) of the plant for each genotype according to:
$$
\%WC = \frac{\mathrm{Weight}_{\mathrm{fresh}} - \mathrm{Weight}_{\mathrm{dry}}}{\mathrm{Weight}_{\mathrm{fresh}}} \times 100 \tag{1}
$$
This method is considered the most accurate, but it is time-consuming. Owing to the amount of plant material to be processed, 5 plants per plot were selected, and 59 samples were obtained to determine the biomass using this method. Plant height was also measured as an indicator of biomass [32,33].
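As a quick numerical check of the water-content equation above, the following minimal sketch evaluates it for one pair of weights. The helper name `water_content_percent` is hypothetical, and the inputs are the vegetative-stage mean fresh and dry weights reported in the Results section, used purely for illustration (a ratio of means is not the same as the mean of per-plot ratios).

```python
def water_content_percent(fresh_weight_g: float, dry_weight_g: float) -> float:
    """Return the percentage of water content from fresh and dry weights (same units)."""
    if fresh_weight_g <= 0:
        raise ValueError("fresh weight must be positive")
    if not 0 <= dry_weight_g <= fresh_weight_g:
        raise ValueError("dry weight must lie between 0 and the fresh weight")
    return (fresh_weight_g - dry_weight_g) / fresh_weight_g * 100.0


# Illustrative inputs: vegetative-stage mean weights quoted in the Results section.
wc = water_content_percent(982.71, 296.27)
print(f"water content: {wc:.2f} %")
```

With these inputs, roughly 70% of the fresh weight is water, which is in line with the observation that the water fraction is high during the vegetative stage.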
Nitrogen levels were determined using a chlorophyll meter to measure the relative chlorophyll content in the leaves. These meters emit light of a specific wavelength onto the leaf, and the amount of light absorbed by the chlorophyll is measured. The meter reading can then estimate a plant’s nitrogen content [34]. The rice plots were sampled using a SPAD 502 Plus meter (Konica-Minolta), and the ground truth was established using this measure to correlate with the VI estimates.
The values obtained in the field through the first sampling method are displayed in Table 1 and Table 2. The variables include fresh weight, dry weight, water percentage, SPAD measurement, and height of the selected rice crops.
The second method involved instruments and techniques that did not destroy the plants, allowing for repeated measurements and, therefore, monitoring over time. The sampling was carried out by a UAV, which captured multispectral images to estimate the biomass and nitrogen content in the rice crops. The channels of each camera can detect differences in the reflectance of light at different wavelengths, which can be correlated with the biomass and nitrogen content in the plants [35].
Figure 2 illustrates the tools employed for acquiring the multispectral images and the outcome of each channel. The UAV followed the yellow trajectory across each row of rice plots. The rice plots were georeferenced according to the flight altitude and geo-tagged ground-level markers. With each separate spectral sensor, the camera offered a resolution of 1600 × 1300 pixels, translating to a crop-to-image resolution of 2.5 cm/pixel at a flying height of 20 m.
The selected UAV was the Phantom 4 Multispectral, which had six cameras, each with 1/2.9-inch CMOS sensors, including an RGB camera and a multispectral camera array with five cameras covering the blue (B), green (G), red (R), red-edge (RE), and near-infrared (NIR) bands, with wavelengths of 450 nm, 560 nm, 650 nm, 730 nm, and 840 nm respectively. The uncertainty in each channel was ±16 nm. The precision of the multispectral data obtained was maximized by the spectral sunlight sensor on top of the aircraft, which could detect solar irradiance in real time for image compensation. To prevent aberrations that could occur when employing a rolling shutter, the P4 Multispectral used a global shutter.
The acquisition of the multispectral images was carried out on 14 September 2021, from 9:45 a.m. to 12:10 p.m., and the traditional sampling of the other parameters was conducted in parallel. The sampling was carried out on this date because it was 90 days after sowing, which is the average period during which all genotypes are in the reproductive stage, allowing for the collection of samples at an intermediate state in plant growth. The drone sampling was conducted in a single flight, capturing 588 images with the 6 cameras (RGB, R, G, B, RE, NIR) at 98 points within the crop.
The images were orthorectified with the camera parameters derived from the camera and its position to correct any geometric distortions in the image. This process was aimed at removing any potential distortions and perspective effects from the images, as distortions can significantly impact the precision of the orthomosaic process.
The interior orientation parameters (IOPs) describe the technical specifications of the camera, such as the focal length, location of the principal point, and lens distortion coefficients. These parameters were used to correct distortions caused by the camera’s lens and sensor. Equations (2) and (3) illustrate the orthocorrected pixel coordinates [36].
$$
X = (x - c_x) \cdot \frac{H}{h} - (y - c_y) \cdot \tan(\phi) \cdot \frac{H}{h} + X_0 \tag{2}
$$
$$
Y = (y - c_y) \cdot \frac{H}{h \cdot \cos(\phi)} + Y_0 \tag{3}
$$
where $X$ and $Y$ are the orthocorrected pixel coordinates in the image, $x$ and $y$ are the original pixel coordinates in the image, $c_x$ and $c_y$ are the coordinates of the image center, $H$ is the height of the camera above the ground, $h$ is the image height in pixels, $\phi$ is the pitch angle of the camera (in radians), and $X_0$ and $Y_0$ are the coordinates of the image center in the ground coordinate system.

2.2. Feature Extraction

In the feature extraction process, the characteristics of the selected rice crop were extracted from the images captured across all channels. An orthomosaic was then created from the 98 images captured from the five channels, as shown in Figure 3a. The orthomosaic was created using the geo-tagged ground-level markers depicted in Figure 2 to ensure that all channels were aligned when subsampling was performed in the plot. The channel data were combined and registered into two images for the purpose of visualizing the orthomosaic. The first orthomosaic, shown in Figure 3b, contains the red, green, and blue channels. The second orthomosaic, displayed in Figure 3c, includes the red, red-edge, and near-infrared channels. In the figures, each rice plot is subdivided into 40 subplots, each of which contains an average of one plant.
In order to ensure precision in the extraction of features, a segmentation method called GFKuts was applied, which has been used to accurately estimate biomass in other crops [15]. For the feature extraction from the UAV images, two methods were selected: the calculation of VIs and the use of the pixel averages of the multispectral images. A VI is a numerical value used to describe the health, density, or growth of vegetation in a particular area. Eight VIs were selected based on their relationship with biomass, nitrogen, and the spectral bands captured by the sensors used. The calculation of the VIs is presented in Table 3 and is explained in [35,37].
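To make the VI-based feature extraction more concrete, the sketch below computes a few widely used indices (SR, DVI, NDVI, GNDVI, and SAVI) from reflectance arrays and averages them over the plant pixels of a segmentation mask. The formulas are the standard textbook forms and are assumed here; Table 3 remains the authoritative definition of the eight indices used in the study. The function names, the soil factor of 0.5, and the toy reflectance values are all illustrative assumptions, and masking out soil pixels is only one possible reading of how the segmentation weights the VIs.

```python
import numpy as np


def vegetation_indices(red, green, nir, soil_factor=0.5, eps=1e-9):
    """Compute per-pixel SR, DVI, NDVI, GNDVI and SAVI from reflectance arrays."""
    red, green, nir = (np.asarray(b, dtype=float) for b in (red, green, nir))
    return {
        "SR": nir / (red + eps),
        "DVI": nir - red,
        "NDVI": (nir - red) / (nir + red + eps),
        "GNDVI": (nir - green) / (nir + green + eps),
        "SAVI": (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor),
    }


def masked_mean(index_map, plant_mask):
    """Average an index over plant pixels only (mask is True on the canopy)."""
    plant_mask = np.asarray(plant_mask, dtype=bool)
    if not plant_mask.any():
        return float("nan")
    return float(index_map[plant_mask].mean())


# Toy 2x2 reflectance patch; the right column stands in for bare soil.
red = np.array([[0.05, 0.20], [0.06, 0.22]])
green = np.array([[0.08, 0.18], [0.09, 0.19]])
nir = np.array([[0.45, 0.25], [0.50, 0.27]])
mask = np.array([[True, False], [True, False]])

vis = vegetation_indices(red, green, nir)
features = {name: masked_mean(vmap, mask) for name, vmap in vis.items()}
print(features)
```

Each entry of `features` would then become one column of the feature matrix for the corresponding subplot image.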
The GFKuts segmentation algorithm comprises multiple parts, including K-means clustering, GrabCut, and guided filtering. K-means is an unsupervised clustering algorithm that partitions an image into K clusters. It minimizes the within-cluster variance, which can be defined as follows:
$$
\sum_{i=1}^{k} \sum_{x \in S_i} \| x - \mu_i \|^2
$$
where $x$ is a point in the image, $S_i$ is the set of points in cluster $i$, and $\mu_i$ is the mean of the points in cluster $i$.
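The within-cluster variance above can be minimized with Lloyd iterations. The following is a minimal NumPy sketch under stated assumptions: the function name `kmeans`, the random initialization, the fixed iteration count, and the four toy "pixels" are illustrative and do not come from the GFKuts implementation.

```python
import numpy as np


def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd iterations; returns labels, centroids and the objective value."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct data points (indexing returns a copy).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest centroid in squared Euclidean distance.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its members.
        for i in range(k):
            members = points[labels == i]
            if len(members):
                centroids[i] = members.mean(axis=0)
    # Final assignment and within-cluster sum of squares (the objective above).
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    wcss = float(dists[np.arange(len(points)), labels].sum())
    return labels, centroids, wcss


# Four toy pixels with two features each: two dark ones and two bright ones.
pixels = [[0.10, 0.10], [0.12, 0.09], [0.90, 0.95], [0.88, 0.92]]
labels, centroids, wcss = kmeans(pixels, k=2)
print(labels, wcss)
```

On real imagery the rows of `points` would be per-pixel color or reflectance vectors, and the resulting labels would serve as the initial segmentation.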
The GrabCut algorithm can be modeled as an energy-minimization problem. The energy of a labeling $f$ can be defined as the sum of a region term $R(f)$ and a boundary term $B(f)$:
$$
E(f) = R(f) + \lambda B(f)
$$
where $f$ is the label field and $\lambda$ is a parameter that balances the two terms. The region term $R(f)$ is the sum of the negative log-likelihoods of the color model for each pixel. The boundary term $B(f)$ is defined in terms of the edges in the graph.
The output $q_i$ of the guided filter for a guidance image $I$ and an input image $p$ is defined as:
$$
q_i = a_k I_i + b_k
$$
where, for every pixel $i$ in a box window $\omega_k$, $a_k$ and $b_k$ are linear coefficients that solve the following minimization problem:
$$
\min_{a_k, b_k} \sum_{i \in \omega_k} \left( (a_k I_i + b_k - p_i)^2 + \epsilon a_k^2 \right)
$$
where $\epsilon$ is a regularization parameter and $\omega_k$ is the window centered at pixel $k$.
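The minimization problem above is a regularized least-squares fit and has a closed-form solution within each window: the slope is the covariance of the two images divided by the regularized variance of the image that enters the linear model, and the offset follows from the window means. The sketch below evaluates that solution for a single window; the function name and the toy values are illustrative assumptions, and a full filter would additionally average the coefficients over all overlapping windows.

```python
import numpy as np


def guided_window_coefficients(model_window, target_window, eps):
    """Closed-form (a_k, b_k) minimising sum((a*I + b - p)^2 + eps*a^2) in one window."""
    I = np.asarray(model_window, dtype=float).ravel()   # image used in a*I + b
    p = np.asarray(target_window, dtype=float).ravel()  # image being approximated
    mean_I, mean_p = I.mean(), p.mean()
    cov_Ip = (I * p).mean() - mean_I * mean_p
    var_I = (I * I).mean() - mean_I ** 2
    a = cov_Ip / (var_I + eps)
    b = mean_p - a * mean_I
    return float(a), float(b)


# Toy window in which p is an exact linear function of I.
I_win = np.array([0.1, 0.4, 0.7, 0.9])
p_win = 2.0 * I_win + 1.0

a_exact, b_exact = guided_window_coefficients(I_win, p_win, eps=0.0)
a_smooth, b_smooth = guided_window_coefficients(I_win, p_win, eps=1.0)
print(a_exact, b_exact, a_smooth, b_smooth)
```

With no regularization the linear relation is recovered exactly, whereas a large regularization value shrinks the slope toward zero so that the output approaches the window mean, which is the smoothing behavior exploited in the final GFKuts step.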
In the GFKuts algorithm, K-means clustering is first applied to the image to generate an initial segmentation. This segmentation is then refined using GrabCut. Finally, guided filtering is used to smooth the segmentation result. The final output of the GFKuts algorithm is a binary mask that separates the rice canopy from the background.
Four feature matrices were proposed. The first two involved estimations using the genotype ID: the first feature set (FS1) used the aforementioned VIs, and the second feature set (FS2) used the mean pixel value of each multispectral channel and the segmentation. The last two involved estimations using the height: the third feature set (FS3) used the VIs and multispectral channels, and the fourth feature set (FS4) used the mean segmentation value. These feature matrices were designed to evaluate the usefulness of calculating the vegetative indices and the mean of the channels for estimating the variables. Once the four sets of features were obtained, they were labeled with the wet biomass, dry biomass, and SPAD estimation variables.

2.3. Estimation Models

The ground-truth data for training these ML algorithms are specified in Table 1 and Table 2. The data include the fresh weight, dry weight, percentage of water content, and measured SPAD values, which, through their adherence to a linear correlation, were directly correlated with the leaf-blade N concentrations. A number of plants that matched the previously mentioned plots were manually collected for destructive biomass testing. The samples taken from each subplot were weighed at the time of cutting for a fresh weight estimate, and again after drying to produce a dry-weight estimate in order to define the corresponding ground truth.
The collected multispectral image database consisted of 11,800 images, resulting from the 59 rice plots and the 40 subplots in each rice plot, totaling 2360 images in each channel. For the experiments, a cross-validation approach was proposed, using 80% of the data for cross-validation, amounting to a total of 1888 images. The remaining 20% of the data were used for testing and were not involved in the training process at all. From the 1888 images, five folds were randomly assembled, with each fold comprising 70% of the data for training and 30% for validation in order to enhance the robustness of the model. Training was performed for five models, including Gaussian process (GP) regression, tree regression (TR), ensemble regression (ER), support vector machine (SVM) regression, and neural network regression (NNR).
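The data partition described above can be sketched with the standard library alone. This is one possible reading, not the authors' code: the five folds are interpreted here as five independent random 70/30 splits of the cross-validation pool, the split is made at the image level, and the seed and rounding rule are arbitrary assumptions.

```python
import random

n_plots, n_subplots = 59, 40
n_images = n_plots * n_subplots            # images per channel

indices = list(range(n_images))
rng = random.Random(0)                     # fixed seed, illustrative only
rng.shuffle(indices)

n_cv = n_images * 80 // 100                # 80% pool used for cross-validation
cv_pool, test_set = indices[:n_cv], indices[n_cv:]

folds = []
for _ in range(5):
    shuffled = cv_pool[:]
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 70 // 100    # 70% training, remainder validation
    folds.append((shuffled[:n_train], shuffled[n_train:]))

print(n_images, len(cv_pool), len(test_set), [len(t) for t, _ in folds])
```

The held-out test indices never appear in any fold, which mirrors the statement that the test data were not involved in training.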
Tree regression is an ML technique in which a decision tree model is created to predict a continuous target variable. In this approach, the tree is constructed by categorizing the data into subsets based on the value of one of the input features. The goal is to recursively create binary splits that maximize the reduction in the variance of the target variable. The final model is a tree structure, where each leaf node contains a predicted value for the target variable [46]. Tree regression can be prone to overfitting so techniques such as pruning or ensembling multiple trees can be used to improve its performance.
$$
\mathrm{MSE}(t) = \frac{1}{N_t} \sum_{i \in D_t} (y_i - \bar{y}_t)^2
$$
where $\mathrm{MSE}(t)$ is the mean square error at node $t$, $N_t$ is the total number of samples at node $t$, $D_t$ is the dataset at node $t$, $y_i$ is the target value of the $i$-th instance at node $t$, and $\bar{y}_t$ is the average value of the responses at node $t$.
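The node criterion above drives the choice of each binary split: a candidate threshold is scored by the size-weighted MSE of the two child nodes it creates, and the threshold with the lowest score is kept. A minimal sketch for a single feature follows; the helper names and the toy NDVI and biomass values are made up for illustration and are not measurements from the study.

```python
def node_mse(targets):
    """Mean squared deviation of the targets from their node mean."""
    mean = sum(targets) / len(targets)
    return sum((y - mean) ** 2 for y in targets) / len(targets)


def best_split(feature, targets):
    """Return (threshold, weighted child MSE) of the best binary split on one feature."""
    pairs = sorted(zip(feature, targets))
    best = (None, node_mse(targets))       # no split: score is the parent MSE
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                       # no threshold separates equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * node_mse(left) + len(right) * node_mse(right)) / len(pairs)
        if score < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best


# Made-up values: one VI feature and a biomass target for six samples.
ndvi = [0.55, 0.60, 0.62, 0.80, 0.83, 0.85]
biomass_g = [200.0, 210.0, 205.0, 300.0, 310.0, 305.0]

threshold, child_mse = best_split(ndvi, biomass_g)
print(threshold, child_mse, node_mse(biomass_g))
```

A regression tree applies this search recursively to every feature in each node until a stopping rule, such as a minimum leaf size, is reached.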
Ensemble regression involves combining multiple models to make a more accurate prediction. One popular ensemble technique is random forest, which is an extension of the decision tree model. It creates multiple trees on randomly sampled subsets of the data and combines their predictions by averaging. Another popular ensemble technique is gradient boosting, which builds a sequence of decision trees in which each new tree attempts to correct the errors made by the previous trees [47]. Ensemble regression techniques can be more accurate than a single decision tree or linear model.
$$
\hat{y}(x) = \frac{1}{M} \sum_{m=1}^{M} f_m(x)
$$
where $\hat{y}(x)$ represents the ensemble prediction for an input $x$, $M$ is the number of base regression models in the ensemble, and $f_m(x)$ is the prediction of the $m$-th base regression model for input $x$.
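The averaging rule above is straightforward to express in code. In the sketch below, the three base regressors are hypothetical linear functions standing in for trained trees; only the averaging step reflects the formula.

```python
def ensemble_predict(base_models, x):
    """Average the predictions of the M base regressors for a single input x."""
    predictions = [model(x) for model in base_models]
    return sum(predictions) / len(predictions)


# Three hypothetical base regressors mapping an NDVI value to dry biomass (g).
base_models = [
    lambda ndvi: 400.0 * ndvi,
    lambda ndvi: 350.0 * ndvi + 30.0,
    lambda ndvi: 450.0 * ndvi - 20.0,
]

y_hat = ensemble_predict(base_models, 0.8)
print(y_hat)
```

Averaging reduces the variance of the individual models, which is why bagged ensembles such as random forests tend to be more stable than a single tree. Gradient boosting combines its trees additively rather than by a plain average.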
Gaussian process (GP) regression is a probabilistic machine learning technique used for regression problems. It is based on the assumption that any finite set of points in the input space has a joint Gaussian distribution over the corresponding target values. In GP regression, a prior distribution over the space of functions is defined using a covariance function, which determines how correlated the output values are for any two inputs. Given a set of input–output pairs, the posterior distribution over functions can be computed, which can be used to make predictions or calculate uncertainty estimates [48]. A Gaussian process is defined by a mean function $m(x)$ and a covariance function (kernel) $k(x, x')$:
$$
f(x) \sim \mathcal{GP}\left( m(x), k(x, x') \right)
$$
where $f(x)$ represents the function value at input $x$, $\mathcal{GP}$ denotes a Gaussian process, $m(x)$ is the mean function (which is often assumed to be zero for simplicity), and $k(x, x')$ is the covariance function (kernel) that defines the relationship between different input points.
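For a zero-mean prior, the GP posterior at new inputs has a closed form that can be computed with a Cholesky factorization. The sketch below assumes a squared-exponential kernel, which is a common default; the kernel actually chosen by the hyperparameter search in this study is not stated here, so the kernel, its length scale, the noise variance, and the toy data are all assumptions.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance k(x, x') between two sets of row vectors."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return signal_var * np.exp(-0.5 * sq_dist / length_scale ** 2)


def gp_posterior(x_train, y_train, x_test, noise_var=1e-6, **kernel_args):
    """Posterior mean and variance of a zero-mean GP at the test inputs."""
    k_tt = rbf_kernel(x_train, x_train, **kernel_args)
    k_ts = rbf_kernel(x_train, x_test, **kernel_args)
    k_ss = rbf_kernel(x_test, x_test, **kernel_args)
    chol = np.linalg.cholesky(k_tt + noise_var * np.eye(len(x_train)))
    # alpha = (K + noise*I)^-1 y, obtained from two triangular systems.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    v = np.linalg.solve(chol, k_ts)
    mean = k_ts.T @ alpha
    var = np.diag(k_ss) - (v ** 2).sum(axis=0)
    return mean, var


# Toy one-dimensional data: three observations, two query points.
x_train = np.array([[0.0], [0.5], [1.0]])
y_train = np.array([0.0, 1.0, 0.0])
x_test = np.array([[0.5], [3.0]])

mean, var = gp_posterior(x_train, y_train, x_test, length_scale=0.3)
print(mean, var)
```

The first query point coincides with a training input, so its predictive variance is close to the noise level, while the second lies far from the data and its variance reverts to the prior variance. This uncertainty estimate is what distinguishes GP regression from the other four models.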
Support vector machines can also be used for regression problems. In SVM regression, the goal is to find a function that is as flat as possible while keeping its deviations from the actual values within a tolerance $\epsilon$; deviations larger than $\epsilon$ are penalized through slack variables. The model tries to fit a linear function to the data while minimizing the errors or deviations from the target variable [49]. In cases where a linear function is inadequate for accurately fitting the data, a nonlinear kernel can be used to transform the data into a higher-dimensional space, where a linear function can better fit the data.
$$
\begin{aligned}
\text{minimize} \quad & \frac{1}{2} \| w \|^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*) \\
\text{subject to} \quad & y_i - w^T \phi(x_i) - b \le \epsilon + \xi_i \\
& w^T \phi(x_i) + b - y_i \le \epsilon + \xi_i^* \\
& \xi_i, \xi_i^* \ge 0, \quad i = 1, \ldots, N
\end{aligned}
$$
where $w$ is the weight vector, $\phi(x_i)$ is a feature mapping function that maps the input vector $x_i$ to a higher-dimensional space, $b$ is the bias term, $C$ is a regularization parameter that controls the trade-off between maximizing the margin and minimizing the training error, $\xi_i$ and $\xi_i^*$ are slack variables that account for prediction errors outside the $\epsilon$-tube, $y_i$ is the target value for the $i$-th instance, and $N$ is the number of instances in the dataset.
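To illustrate how the slack variables behave, the sketch below evaluates the primal objective for a fixed linear model. It assumes the identity feature map (so the model is simply a weighted sum of the inputs plus the bias), and the weights, data, tolerance, and regularization value are made-up numbers; it evaluates the objective rather than solving the optimization.

```python
def svr_primal_objective(w, b, xs, ys, c, epsilon):
    """Evaluate 0.5*||w||^2 + C*sum(xi + xi*) for a linear model f(x) = w.x + b."""
    def predict(x):
        return sum(wj * xj for wj, xj in zip(w, x)) + b

    slack_total = 0.0
    for x, y in zip(xs, ys):
        residual = y - predict(x)
        xi = max(0.0, residual - epsilon)         # target lies above the tube
        xi_star = max(0.0, -residual - epsilon)   # target lies below the tube
        slack_total += xi + xi_star
    return 0.5 * sum(wj * wj for wj in w) + c * slack_total


# Made-up one-dimensional example with predictions 2, 4 and 6.
xs = [[1.0], [2.0], [3.0]]
ys = [2.05, 4.5, 5.0]
objective = svr_primal_objective(w=[2.0], b=0.0, xs=xs, ys=ys, c=10.0, epsilon=0.1)
print(objective)
```

The first sample falls inside the tube and contributes nothing, whereas the other two contribute slacks of 0.4 and 0.9, which shows why small errors are ignored and only larger deviations are penalized.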
Neural network regression models are a type of machine learning model that are well-suited for prediction tasks involving nonlinear data. They consist of an input layer, one or more hidden layers, and an output layer. Each layer consists of a number of nodes or “neurons”. Each neuron in a layer is connected to every neuron in the previous and subsequent layers [50]. The neurons transform the inputs using a weighted sum and a nonlinear activation function. The weights are learned during training by minimizing a loss function, such as the mean squared error for regression tasks, using an optimization algorithm such as stochastic gradient descent.
$$
y(x, w) = f\left( \sum_{j=1}^{M} w_j^{(2)} \, \sigma\left( \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)} \right) + w_0^{(2)} \right)
$$
where $y(x, w)$ is the output of the neural network for the input vector $x$ and weights $w$; $f$ is the activation function of the output layer; $\sigma$ is the activation function of the hidden layer; $w_{ji}^{(1)}$ and $w_j^{(2)}$ are the weights of the first and second layers, respectively; $D$ is the number of input features; and $M$ is the number of neurons in the hidden layer.
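The forward pass described by the equation above takes only a few lines. In this sketch the hidden activation is assumed to be tanh and the output activation is the identity, a common choice for regression; the layer sizes and weight values are arbitrary illustrative numbers rather than trained parameters.

```python
import numpy as np


def mlp_forward(x, w1, b1, w2, b2, hidden_act=np.tanh, output_act=lambda z: z):
    """Single-hidden-layer network: f(w2 . sigma(W1 x + b1) + b2)."""
    hidden = hidden_act(w1 @ x + b1)       # M hidden activations
    return output_act(w2 @ hidden + b2)    # scalar network output


# D = 3 input features and M = 2 hidden neurons, with arbitrary weights.
x = np.array([1.0, 2.0, 3.0])
w1 = np.array([[0.1, 0.0, 0.0],
               [0.0, 0.1, 0.0]])
b1 = np.zeros(2)                           # first-layer biases w_j0
w2 = np.array([1.0, -1.0])
b2 = 0.5                                   # second-layer bias w_0

y = float(mlp_forward(x, w1, b1, w2, b2))
print(y)
```

Training would adjust `w1`, `b1`, `w2`, and `b2` to minimize the mean squared error between such outputs and the ground-truth values.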
The performance of these machine learning models can be significantly influenced by the choice of hyperparameters. Hyperparameters are parameters whose values are set prior to the commencement of the learning process. To find the optimal hyperparameters, training is performed by varying these initial values to obtain the model with the minimum mean squared error (MSE). The hyperparameter optimization method selected for all the models was Bayesian optimization. This method can improve the search speed using past performances and achieve high accuracy with fewer samples [51]. The number of iterations for each model was defined based on the training time of each model. The models with the longest training times were Gaussian process regression (GPR) and neural network regression (NNR), with 5 and 300 iterations, respectively. Support vector machine regression (SVMR) followed, with 600 iterations. The models with the shortest training times were ensemble regression (ER) and tree regression (TR), with 800 and 900 iterations, respectively.
The evaluation of the estimations was conducted using regression metrics such as the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE). The R² value was used to observe the fit between the estimated curves and the ground truth. The MAE provides a straightforward measure of the average magnitude of the error. One benefit of the MAE is that it is not overly sensitive to large errors, unlike the RMSE, which assigns a higher penalty to extreme values.
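The three metrics can be computed directly from their definitions, as in the minimal sketch below. The function name and the four observation and estimate pairs are made up for illustration; the example contains one deliberately large error to show how the RMSE reacts more strongly than the MAE.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, MAE) for paired observations and estimates."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # assumed non-zero
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


# Made-up dry-weight values (g): three small errors and one large one.
observed = [300.0, 320.0, 280.0, 350.0]
estimated = [305.0, 315.0, 280.0, 330.0]

r2, rmse, mae = regression_metrics(observed, estimated)
print(f"R2={r2:.3f}  RMSE={rmse:.2f}  MAE={mae:.2f}")
```

Here the single 20 g error pushes the RMSE well above the MAE, which is the sensitivity to extreme values noted in the text.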

3. Results

The results were obtained from the selected and sampled genotypes, including the dry biomass, wet biomass, water percentage, plant height, and SPAD measurements. A total of 59 plots and 56 genotypes were sampled. Among these genotypes, three (IRBB, BR28, and Fedearroz) had two samples each in different subplots, with varying sampling values. The remaining 53 genotypes had only one sample per plot. Sampling was carried out during two stages of the crop’s growth cycle: vegetative and harvest.
Figure 4 presents the dry-weight values for the 59 rice plots. The blue bars represent measurements taken during the vegetative stage, and the red bars represent measurements taken at the time of harvest. The dry weight is given in grams per plant, with an average of 296.27 g and a standard deviation of 45.70 g during the vegetative stage, and a mean of 994.20 g and a standard deviation of 181.03 g at the time of harvest. The plots with the highest dry weights during the vegetative stage were 18, 52, 33, and 6, with weights above 360 g. The plots with the highest dry weights at the time of harvest were 50, 6, 36, and 54, with weights above 1170 g.
The fresh-weight values shown in Table 1 and Table 2 are given in grams per plant, with a mean of 982.71 g and a standard deviation of 199.91 g during the vegetative stage, and a mean of 1244.44 g and a standard deviation of 2439.60 g at the time of harvest. The plots with the highest fresh weights during the vegetative stage were 34, 6, 5, and 54, all with weights above 1300 g. The plots with the highest fresh weights at the time of harvest were 50, 6, 36, and 56, with weights of 1500 g. Both stages had similar total water weights but, as a proportion of the plant, the percentage of water was much higher during the vegetative stage. At the time of harvest, when the crop was fully grown, the percentage of water was much lower, with a higher proportion of biomass.
Figure 5 presents the SPAD values for the 59 rice plots. The blue bars represent measurements taken during the vegetative stage, and the red bars represent measurements taken at the time of harvest. The SPAD values had a mean of 37.97 and a standard deviation of 2.94 during the vegetative stage, and a mean of 37.36 and a standard deviation of 2.93 at the time of harvest. The plots with the highest SPAD values during the vegetative stage were 18, 20, 33, and 52, whereas the plots with the highest SPAD values at the time of harvest were 20, 18, 25, and 11. The SPAD value remained stable between the vegetative and harvest stages, indicating that the crop was adequately nourished throughout its growth period.
In order to determine the genotypes with the highest yield, both the yield per area and the biomass gain were calculated. The plots with the highest yields were 50, 36, 6, and 54 with 10,252.75 kg/ha, 8702.85 kg/ha, 8600.76 kg/ha, and 8330.06 kg/ha, respectively. For the biomass gain, the baseline was taken as the date of sampling during the vegetative stage and compared to the final biomass at the time of harvest. The greatest biomass gain was observed in plots 50, 6, 56, and 28, with averages of 49.91 g/day, 47.27 g/day, 46.78 g/day, and 40.44 g/day, respectively.
The orthomosaics were composed of 400 plots of different genotypes, and the first step in their processing was to locate them using the same spatial reference, allowing for alignment with the same reference point. Although not all 400 genotypes were sampled, an analysis of the VIs was performed on the entire orthomosaic. Figure 6 shows the VIs with the most significant differences in their values. The alignment and orthorectification allowed for the VI statuses to be mapped for the different genotypes. Although the entire crop was rice, significant variations in the VIs were observed. These variations were not necessarily due to different biomass and nitrogen values but rather to the phenotypic expressions of the cultivated genotypes. For rice crops of the same genotype, exploring the VI maps can be useful since they can reveal anomalies in the crops. The phenotypic expression should be similar and vary only due to changes in biomass or nitrogen.
VI calculations were performed on the segmented image dataset of the 59 sampled plots. Figure 7 shows a box plot for the simple ratio (SR) index, which ranged from 0.5 to 2.5. This allowed us to observe the behavior of the simple ratio indices of the 59 selected genotypes and the uncertainty associated with each index. It was observed that genotype 1 had high variations, which further complicated its estimation. The plots with the highest values of this index were 3, 33, and 16. The plots with the most significant variations were 4, 24, and 26.
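A minimal sketch of how per-plot VI values might be obtained from segmented pixels is given below. It uses the commonly cited definitions of SR, NDVI, GNDVI, and SAVI (with a soil factor of 0.5), which may differ in detail from those listed in Table 3, and it restricts the plot average to pixels flagged as plant by a segmentation mask. The masked average is an assumption about how the segmentation weighting could be applied, and the reflectance values are hypothetical.

```python
def simple_ratio(nir, red):
    return nir / red


def ndvi(nir, red):
    return (nir - red) / (nir + red)


def gndvi(nir, green):
    return (nir - green) / (nir + green)


def savi(nir, red, soil_factor=0.5):
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def masked_mean(values, plant_mask):
    """Average only the pixels that the segmentation marks as plant."""
    kept = [v for v, is_plant in zip(values, plant_mask) if is_plant]
    if not kept:
        raise ValueError("mask selects no plant pixels")
    return sum(kept) / len(kept)


# Hypothetical per-pixel reflectances; the last pixel is soil.
nir = [0.60, 0.50, 0.20]
red = [0.30, 0.25, 0.20]
plant_mask = [True, True, False]

sr_pixels = [simple_ratio(n, r) for n, r in zip(nir, red)]
ndvi_pixels = [ndvi(n, r) for n, r in zip(nir, red)]
plot_sr = masked_mean(sr_pixels, plant_mask)
plot_ndvi = masked_mean(ndvi_pixels, plant_mask)
print(plot_sr, plot_ndvi)
```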
Figure 8 presents a box plot of the SR, DVI, GNDVI, and CTVI values for the three genotypes that had two plots each. These indices were selected for visualization because their range of variation was similar. The other indices exhibited smaller variations and similar behavior. These genotypes had a range of variation since the VIs were not uniform across the entire plot. However, sampling was only performed on one plant, assuming that the rest of the plants would have similar values, given that they belonged to the same genotype under the same growing conditions.
For the genotype IRBB 66, the SR average was 1.23 for plot 4 and 1.2 for plot 24, with a standard deviation of 0.24. The range was from 1.1 to 1.3, with outliers between 0.8 and 0.9 for plot 4. This plot had a fresh-weight value of 700 g, a dry-weight value of 240 g, and a SPAD value of 41.68. The range for plot 24 was from 1 to 1.2, with a fresh-weight value of 780 g, a dry-weight value of 280 g, and a SPAD value of 41.2. Although there was a directly proportional relationship with the SPAD values, the relationship with the biomass values was inversely proportional. It is essential to note that the variation between the VIs was low, as was the variation between the ground-truth values.
For the genotype Fedearroz 67, the SR average was 1.65 for plot 36 and 1.4 for plot 44, with a standard deviation of 1. A range of 1.2 to 2.1 was recorded, with outliers between 1 and 1.2 for plot 36. This plot had a fresh-weight value of 1000 g, a dry-weight value of 280 g, and a SPAD value of 35.66. The range for plot 44 was from 0.9 to 1.6, with a fresh-weight value of 660 g, a dry-weight value of 220 g, and a SPAD value of 32.64. This genotype had a directly proportional relationship between the SPAD values and the biomass values. In addition, the variation between the VIs was significant, as was the variation between the ground-truth values.
Finally, for the genotype IR 64-21, the SR average was 1.62 for plot 13 and 1.4 for plot 31, with a standard deviation of 0.6. The range was from 1.4 to 1.8, with the outliers at 1.1 for plot 13. This plot had a fresh-weight value of 940 g, a dry-weight value of 300 g, and a SPAD value of 41. The range for plot 31 was from 0.9 to 1.6, with a fresh-weight value of 900 g, a dry-weight value of 280 g, and a SPAD value of 38.02. This genotype had an inversely proportional relationship with the SPAD values and a directly proportional relationship with the biomass values. The variation between the VIs was significant, as was the variation between the ground-truth values.
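The ranges and outliers quoted above are the kind of summary a box plot encodes. As a rough sketch, the quartiles and outliers of a plot's SR values could be computed as follows, assuming the common 1.5 × IQR whisker rule; the rule actually used for Figures 7 and 8 is not stated, and the SR values below are hypothetical.

```python
import statistics


def box_stats(values):
    """Quartiles, whisker fences, and outliers under the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Hypothetical SR values for the pixels of one plot.
sr_values = [0.8, 1.1, 1.15, 1.2, 1.2, 1.25, 1.3, 1.3]
stats = box_stats(sr_values)
print(stats["median"], stats["outliers"])
```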
Figure 9 shows a correlation matrix for the dataset used, where W-BM is the wet biomass in grams, D-BM is the dry biomass, N is the SPAD value, and He is the average height of the plants. The VIs referenced in Table 3, the multispectral channels, and the GF segmentation values are also included. The correlation matrix revealed strong correlations among the data obtained using traditional methods. The VIs, which are mathematical operations between different reflectance values, also showed high correlations with each other. The VIs with the highest positive correlations with the ground-truth variables were the NDVI and SAVI for wet biomass, GNDVI and NDVI for dry biomass, GNDVI and SAVI for height, and NDVI and ARVI for nitrogen. The features that correlated the most with wet biomass were the REG and NIR channels and the GFkuts segmentation; dry biomass correlated with the same features and, in addition, with the RGB channels. Nitrogen showed fewer correlations with these channels and correlated more closely with the VIs. The genotype ID was highly correlated with the RGB channels and with height. Height, in turn, was highly correlated with the biomass variables, a valuable observation because height can be measured without destructive methods.
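Each cell of such a matrix is a pairwise correlation coefficient. The sketch below computes a Pearson coefficient, assuming that this is the measure behind Figure 9 (the text does not name it), using the dry weight and height of plots 1 to 6 as read from Table 1. It covers a single pair of variables over six plots and is not a reproduction of the published matrix.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Dry weight (g) and height of plots 1-6, as read from Table 1.
dry_weight = [260, 280, 280, 240, 360, 380]
height = [84, 105.2, 96.2, 87, 117.2, 145.8]
r = pearson(dry_weight, height)
print(round(r, 2))
```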
As mentioned in Section 2, tests were conducted for five models. Each of the methods was optimized using Bayesian optimization, which can minimize a model’s confidence interval by adjusting the hyperparameters. The five models selected for training were evaluated using the four feature matrices. This process was executed using a common machine learning technique known as k-fold cross-validation, specifically with five randomly formed folds. In k-fold cross-validation, the original sample is randomly partitioned into k equal-sized subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k-1 subsamples are used as training data.
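The fold construction described above can be sketched in a few lines. This is a generic illustration of k-fold partitioning over sample indices, not the tooling used in the study; the function name and the fixed seed are assumptions made for reproducibility of the example.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k random folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding over the shuffled list gives folds whose sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(k_fold_indices(1888, k=5))
for train, validation in splits:
    print(len(train), len(validation))
```

Each sample appears in exactly one validation fold, so every sample is used for validation once and for training k − 1 times.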
In this case, the folds were created from 80% of the total 2360 rice subplots, resulting in 1888 samples. These samples were used to train the models, taking into account the variability of the different datasets. These datasets were distributed in a 70/30 split, where 70% was used for training and 30% for validation. This split helps prevent overfitting, which is a modeling error that occurs when a function is too closely aligned to a limited set of data points.
In addition, 472 samples, composed of 8 rice subplots of each genotype, were randomly set aside. These were not included in the model training and were reserved for the independent evaluation of the trained models. This is a necessary step to test how well the models generalize to unseen data.
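The sample counts above can be checked with a small sketch of a per-plot hold-out. It assumes 40 subplot images for each of the 59 plots (2360 / 59), which is inferred from the totals rather than stated in the text; the identifiers are placeholders.

```python
import random


def split_per_plot(samples_by_plot, n_holdout=8, seed=0):
    """Randomly reserve n_holdout samples of every plot for independent testing."""
    rng = random.Random(seed)
    train, holdout = [], []
    for plot_id in sorted(samples_by_plot):
        shuffled = list(samples_by_plot[plot_id])
        rng.shuffle(shuffled)
        holdout.extend(shuffled[:n_holdout])
        train.extend(shuffled[n_holdout:])
    return train, holdout


# Placeholder identifiers: (plot ID, subplot index), 40 subplots per plot (assumed).
samples_by_plot = {p: [(p, s) for s in range(40)] for p in range(1, 60)}
train_set, holdout_set = split_per_plot(samples_by_plot)
print(len(train_set), len(holdout_set))
```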
Table 4, Table 5, and Table 6 present the values of the selected metrics: R², MAE, and RMSE. In general, it was found that decision tree regression, regression ensemble, and Gaussian process regression were the machine learning models that yielded the best correlations when trained on the data obtained during the experiment. Decision tree regression and regression ensemble are known for their efficiency and speed in training, whereas the Gaussian process regression, although it produces good results, is considerably slower.
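For reference, the three metrics can be written out directly from their standard definitions. The sketch below is not the evaluation code used in the study, and the example values are hypothetical dry weights in grams.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical ground-truth and estimated values (g).
y_true = [260, 280, 300, 320]
y_pred = [250, 290, 300, 330]
print(mae(y_true, y_pred), rmse(y_true, y_pred), r_squared(y_true, y_pred))
```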
The use of VIs improved the correlations for the estimations of the wet biomass, dry biomass, and nitrogen variables. The highest correlation among this set of training features was found when using the genotype ID, whereas the lowest correlation was associated with plant height. These correlations suggest that the genotype ID is a more significant predictor of crop yield.
The following figures show graphs of the estimations for the wet biomass, dry biomass, and nitrogen variables from the learning models with the highest correlation coefficients, with the estimated values in orange and the ground-truth values in blue. In Figure 10, the estimation for the first variable, wet biomass, is displayed. The results are shown for the 472 held-out images for the ensemble regression, tree regression, and Gaussian process regression estimation models. The graphs present the wet biomass of the samples (Y-axis) versus the samples obtained (X-axis) from each of the characterized images. In general, Gaussian process regression performed the best, obtaining a correlation coefficient of 0.987. This indicates that this model handles uncertainty much better for smaller datasets.
In Figure 11, the estimations for the second variable, dry biomass, are displayed. In general, better performance was observed for Gaussian process regression, which obtained a correlation coefficient of 0.998. This indicates that this model handles uncertainty much better for smaller datasets.
In Figure 12, the estimations for the third variable, SPAD, are displayed. Better performance was observed for Gaussian process regression, which obtained a correlation coefficient of 0.999. This indicates that this model handles uncertainty much better for smaller datasets.

4. Discussion

The comparison of the GPR, SVMR, TR, ER, and NNR techniques provides valuable insights into the potential benefits and drawbacks of each method in the context of estimating the wet biomass, dry biomass, and nitrogen content in rice plots.
One notable aspect of this comparison is the emphasis on the probabilistic nature of GPR. In many practical applications, having a measure of uncertainty about the predictions can be just as important as the predictions themselves. Especially in agricultural applications, this measure of uncertainty can support decision-making tasks in the context of risk. For instance, knowing the uncertainty around the estimated nitrogen content in a rice plot can inform decisions about fertilization that balance potential yield improvements against the risk of over-fertilization and its environmental repercussions.
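To make the probabilistic output concrete, the following sketch implements a minimal zero-mean Gaussian process posterior with a squared-exponential kernel for a single input feature. It is an illustration under strong assumptions (fixed, hand-picked hyperparameters, one feature, hypothetical centered data) and is not the Bayesian-optimized GPR model evaluated in this study. It shows that the predictive variance is small near the training samples and approaches the prior variance far from them.

```python
import math


def rbf(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_predict(x_train, y_train, x_new, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at a single new input."""
    n = len(x_train)
    K = [[rbf(x_train[i], x_train[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [rbf(x, x_new) for x in x_train]
    alpha = solve(K, y_train)           # K^-1 y
    weights = solve(K, k_star)          # K^-1 k*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, weights))
    return mean, max(var, 0.0)


# Hypothetical, centered data: one feature (e.g., a scaled VI) and one target.
x_train = [0.0, 1.0, 2.0]
y_train = [-1.0, 0.0, 1.0]
for x_new in (0.0, 1.5, 10.0):
    mean, var = gp_predict(x_train, y_train, x_new)
    print(x_new, round(mean, 3), round(var, 3))
```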
However, the computational cost of GPR, particularly for large datasets, is a significant drawback. As technological advances facilitate the collection of more and more data through increasingly sophisticated remote sensing technologies, the scalability of models becomes a critical concern. In this respect, the TR and ER methods may have an advantage.
The ability of ER and TR to handle complex, nonlinear relationships using decision rules is a significant advantage, considering the complex interactions among environmental factors, plant physiology, and the resulting variables to be estimated (biomass, nitrogen content). However, kernel-based methods such as GPR and SVMR require the careful selection of the kernel function and fine-tuning of hyperparameters, which can be potential drawbacks, as these require additional computational resources and domain expertise.
Ensemble techniques, which combine multiple base regression models, can potentially offer robustness and improved predictive performance, which is a significant advantage. Additionally, in this study, the estimation models were calibrated and tested by including both time-independent imagery samples and time-dependent vegetation index dynamics throughout each phenological cycle, enabling the characterization of spatio-temporal variations in above-ground biomass and leaf nitrogen.
However, these methods can also pose computational challenges, as multiple base models need to be trained and combined. Additionally, the interpretability of these models can be lost or reduced, which could be a disadvantage in scenarios where understanding the model’s decision process is crucial.
Selecting the most suitable method depends on factors such as the size of the dataset, the complexity of the relationship between the VIs and target variables, and the importance of uncertainty quantification in the predictions. It is also important to consider the computational resources available, the level of domain expertise for model tuning and interpretation, and the specific decision-making context in which the model’s predictions will be used.
It is worth noting that these methods are not mutually exclusive and could potentially be combined in a hybrid approach. For example, one could use a TR or an ER model to handle the main predictive task and a GPR model to provide uncertainty estimates.
Finally, the choice of the model should not only be guided by theoretical considerations but also validated through rigorous empirical testing. Cross-validation and performance metrics are indeed crucial tools for assessing and comparing the predictive performance of the models. Furthermore, whenever possible, models should be evaluated not only on their predictive performance but also on their practical impact when employed for decision making in the field.

5. Conclusions

The orthomosaic application is a useful tool for working with UAV images, as it allows for the integration of images captured by the UAVs in a flight plan, merging them into a single image. For this application to work, the images must have an area of coincidence greater than 40%. This tool enables a general mapping of the crop to be obtained in the form of VIs, classifying the vegetation on the ground by reflectance levels in order to estimate plant health and nutrient distribution patterns. In this study, the variation observed within each genotype reflects the heterogeneous nature of agricultural fields. This emphasizes the importance of using robust statistical methods in analyzing remote sensing data for biomass and nitrogen content estimation.
Multispectral images are useful for estimating wet biomass, dry biomass, and SPAD. Through the use of VIs, characteristics can be identified that closely correlate with the results from in-the-field sampling, offering researchers a non-invasive alternative for measuring these parameters and potentially eliminating the use of destructive sampling methods. The correlation between the VIs and the parameters measured in the field will vary according to the genotype and its phenological expression. In this study, the correlation matrix for the dataset revealed strong relationships between the data gathered conventionally in the field and several VIs. In addition, the VIs that correlated the most with the ground-truth variables were NDVI and SAVI for wet biomass, GNDVI and NDVI for dry biomass, GNDVI and SAVI for height, and NDVI and ARVI for nitrogen.
Moreover, the REG and NIR channels and the GFkuts segmentation showed significant correlations with both wet and dry biomass, with an additional correlation observed with the RGB channels for dry biomass. Nitrogen content exhibited a weaker correlation with these channels and a stronger correlation with the VIs. The genotype ID was highly correlated with the RGB channels and height, indicating a relationship between genetic identity, coloration, and growth. These correlations highlight the potential of remote sensing data in estimating key parameters such as biomass and nitrogen content in rice plots and emphasize the value of non-destructive parameters such as height for these estimation tasks.

Author Contributions

Conceptualization, J.D.C., D.P. and A.F.D.; methodology, A.F.D., O.D.P., E.P., N.A. and N.E.; software, A.F.D.; validation, A.F.D., J.D.C., I.F.M. and O.D.P.; formal analysis, A.F.D., D.P., J.D.C., M.C.R. and E.P.; investigation, J.D.C., D.P., A.F.D., A.J.-B., M.C.R. and I.F.M.; resources, A.J.-B., J.D.C., N.A., N.E. and O.D.P.; data curation, A.F.D., D.P., D.M. and J.D.C.; writing—original draft preparation, A.F.D.; writing—review and editing, D.P., J.D.C., D.M. and A.J.-B.; supervision, J.D.C. and D.P.; project administration, J.D.C.; funding acquisition, A.J.-B. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the OMICAS program: “Optimización Multiescala In-silico de Cultivos Agrícolas Sostenibles (Infraestructura y validación en Arroz y Caña de Azúcar)”, anchored at the Pontificia Universidad Javeriana in Cali and funded within the Colombian Scientific Ecosystem by The World Bank; the Colombian Ministry of Science, Technology, and Innovation; the Colombian Ministry of Education; the Colombian Ministry of Industry and Tourism; and ICETEX under grant ID FP44842-217-2018 and OMICAS Award ID 792-61187.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to a patent in progress.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Castro-Llanos, F.; Hyman, G.; Rubiano, J.; Ramirez-Villegas, J.; Achicanoy, H. Climate change favors rice production at higher elevations in Colombia. Mitig. Adapt. Strateg. Glob. Chang. 2019, 24, 1401–1430. [Google Scholar] [CrossRef]
  2. Hoyos, V.; Plaza, G.; Caicedo, A.L. Characterization of the phenotypic variability in Colombian weedy rice (Oryza spp.). Weed Sci. 2019, 67, 441–452. [Google Scholar] [CrossRef]
  3. Arango-Londoño, D.; Ramírez-Villegas, J.; Barrios-Pérez, C.; Bonilla-Findji, O.; Jarvis, A.; Uribe, J.M. Closing yield gaps in colombian direct seeding rice systems: A stochastic frontier analysis. Agron. Colomb. 2020, 38, 101–110. [Google Scholar] [CrossRef]
  4. Yagioka, A.; Hayashi, S.; Kimiwada, K.; Kondo, M. Kitagenki, a high-yielding rice variety, exhibits a high yield potential under optimum crop management practices. Eur. J. Agron. 2022, 140, 126606. [Google Scholar] [CrossRef]
  5. Nguyen, L.T.; Nanseki, T.; Ogawa, S.; Chomei, Y. Determination of Paddy Rice Yield in the Context of Farmers’ Adoption of Multiple Technologies in Colombia. Int. J. Plant Prod. 2022, 16, 93–104. [Google Scholar] [CrossRef]
  6. Orjuela-Garzon, W.; Quintero, S.; Giraldo, D.P.; Lotero, L.; Nieto-Londoño, C. A theoretical framework for analysing technology transfer processes using agent-based modelling: A case study on massive technology adoption (AMTEC) program on rice production. Sustainability 2021, 13, 1143. [Google Scholar] [CrossRef]
  7. Jaramillo-Botero, A.; Colorado, J.; Quimbaya, M.; Rebolledo, M.C.; Lorieux, M.; Ghneim-Herrera, T.; Arango, C.A.; Tobón, L.E.; Finke, J.; Rocha, C.; et al. The ÓMICAS alliance, an international research program on multi-omics for crop breeding optimization. Front. Plant Sci. 2022, 13, 992663. [Google Scholar] [CrossRef]
  8. Wan, L.; Zhang, J.; Dong, X.; Du, X.; Zhu, J.; Sun, D.; Liu, Y.; He, Y.; Cen, H. Unmanned aerial vehicle-based field phenotyping of crop biomass using growth traits retrieved from PROSAIL model. Comput. Electron. Agric. 2021, 187, 106304. [Google Scholar] [CrossRef]
  9. Wang, J.J.; Li, Z.; Jin, X.; Liang, G.; Struik, P.C.; Gu, J.; Zhou, Y. Phenotyping flag leaf nitrogen content in rice using a three-band spectral index. Comput. Electron. Agric. 2019, 162, 475–481. [Google Scholar] [CrossRef]
  10. Takai, T.; Lumanglas, P.; Simon, E.V.; Arai-Sanoh, Y.; Asai, H.; Kobayashi, N. Identifying key traits in high-yielding rice cultivars for adaptability to both temperate and tropical environments. Crop J. 2019, 7, 685–693. [Google Scholar] [CrossRef]
  11. Yamashita, M.; Ootsuka, C.; Kubota, H.; Adachi, S.; Yamaguchi, T.; Murata, K.; Yamamoto, T.; Ueda, T.; Ookawa, T.; Hirasawa, T. Alleles of high-yielding indica rice that improve root hydraulic conductance also increase flag leaf photosynthesis, biomass, and grain production of japonica rice in the paddy field. Field Crop. Res. 2022, 289, 108725. [Google Scholar] [CrossRef]
  12. Barrero, O.; Ouazaa, S.; Jaramillo-Barrios, C.I.; Quevedo, M.; Chaali, N.; Jaramillo, S.; Beltran, I.; Montenegro, O. Rice Yield Prediction Using On-Farm Data Sets and Machine Learning. In Advances in Smart Technologies Applications and Case Studies; El Moussati, A., Kpalma, K., Ghaouth Belkasmi, M., Saber, M., Guégan, S., Eds.; Springer International Publishing: Cham, Swizerland, 2020; pp. 422–430. [Google Scholar]
  13. Xu, T.; Wang, F.; Xie, L.; Yao, X.; Zheng, J.; Li, J.; Chen, S. Integrating the Textural and Spectral Information of UAV Hyperspectral Images for the Improved Estimation of Rice Aboveground Biomass. Remote Sens. 2022, 14, 2534. [Google Scholar] [CrossRef]
  14. Ge, H.; Ma, F.; Li, Z.; Du, C. Grain Yield Estimation in Rice Breeding Using Phenological Data and Vegetation Indices Derived from UAV Images. Agronomy 2021, 11, 2439. [Google Scholar] [CrossRef]
  15. Colorado, J.D.; Calderon, F.; Mendez, D.; Petro, E.; Rojas, J.P.; Correa, E.S.; Mondragon, I.F.; Rebolledo, M.C.; Jaramillo-Botero, A. A novel NIR-image segmentation method for the precise estimation of above-ground biomass in rice crops. PLoS ONE 2020, 15, e0239591. [Google Scholar] [CrossRef]
  16. Wang, Y.; Zhang, K.; Tang, C.; Cao, Q.; Tian, Y.; Zhu, Y.; Cao, W.; Liu, X. Estimation of rice growth parameters based on linear mixed-effect model using multispectral images from fixed-wing unmanned aerial vehicles. Remote Sens. 2019, 11, 1371. [Google Scholar] [CrossRef] [Green Version]
  17. Mia, S.; Tanabe, R.; Habibi, L.N.; Hashimoto, N.; Homma, K.; Maki, M.; Matsui, T.; Tanaka, T.S.T. Multimodal Deep Learning for Rice Yield Prediction Using UAV-Based Multispectral Imagery and Weather Data. Remote Sens. 2023, 15, 2511. [Google Scholar] [CrossRef]
  18. Yu, F.; Bai, J.; Jin, Z.; Zhang, H.; Guo, Z.; Chen, C. Research on Precise Fertilization Method of Rice Tillering Stage Based on UAV Hyperspectral Remote Sensing Prescription Map. Agronomy 2022, 12, 2893. [Google Scholar] [CrossRef]
  19. Xu, S.; Xu, X.; Blacker, C.; Gaulton, R.; Zhu, Q.; Yang, M.; Yang, G.; Zhang, J.; Yang, Y.; Yang, M.; et al. Estimation of Leaf Nitrogen Content in Rice Using Vegetation Indices and Feature Variable Optimization with Information Fusion of Multiple-Sensor Images from UAV. Remote Sens. 2023, 15, 854. [Google Scholar] [CrossRef]
  20. Wang, L.; Wei, Y. Revised normalized difference nitrogen index (NDNI) for estimating canopy nitrogen concentration in wetlands. Optik 2016, 127, 7676–7688. [Google Scholar] [CrossRef]
  21. Colorado, J.D.; Cera-Bornacelli, N.; Caldas, J.S.; Petro, E.; Rebolledo, M.C.; Cuellar, D.; Calderon, F.; Mondragon, I.F.; Jaramillo-Botero, A. Estimation of nitrogen in rice crops from UAV-captured images. Remote Sens. 2020, 12, 3396. [Google Scholar] [CrossRef]
  22. Fabianto, L.; Hardhienata, M.K.D.; Priandana, K. Multi-UAV coordination for crop field surveillance and fertilization. In Proceedings of the 2020 International Conference on Computer Science and Its Application in Agriculture, ICOSICA 2020, Bogor, Indonesia, 16–17 September 2020; pp. 1–6. [Google Scholar] [CrossRef]
  23. Su, D.; Yao, W.; Yu, F.; Liu, Y.; Zheng, Z.; Wang, Y.; Xu, T.; Chen, C. Single-Neuron PID UAV Variable Fertilizer Application Control System Based on a Weighted Coefficient Learning Correction. Agriculture 2022, 12, 1019. [Google Scholar] [CrossRef]
  24. Panday, U.S.; Pratihast, A.K.; Aryal, J. A Review on Drone-Based Data Solutions for Cereal Crops. Drones 2020, 4, 41. [Google Scholar] [CrossRef]
  25. Qiu, Z.; Ma, F.; Li, Z.; Xu, X.; Ge, H.; Du, C. Estimation of nitrogen nutrition index in rice from UAV RGB images coupled with machine learning algorithms. Comput. Electron. Agric. 2021, 189, 106421. [Google Scholar] [CrossRef]
  26. United States Department of Agriculture, Foreign Agricultural Services. World Rice Production, Consumption and Stocks. RCS-20I. 15 September 2020; USDA, Economic Research Service. Available online: https://www.ers.usda.gov/ (accessed on 30 May 2023).
  27. Rossi, M.; Candiani, G.; Nutini, F.; Gianinetto, M.; Rossi, M.; Candiani, G.; Nutini, F.; Gianinetto, M. Sentinel-2 estimation of CNC and LAI in rice cropping system through hybrid approach modelling approach modelling. Eur. J. Remote Sens. 2022, 1–20. [Google Scholar] [CrossRef]
  28. Longfei, Z.; Ran, M.; Xing, Y.; Yigui, L.; Zehua, H.; Zhengang, L.; Binyuan, X.; Guodong, Y.; Shaobing, P.; Le, X. ScienceDirect Improved Yield Prediction of Ratoon Rice Using Unmanned Aerial Vehicle-Based Multi-Temporal Feature Method. Rice Sci. 2023, 30, 247–256. [Google Scholar] [CrossRef]
  29. Derraz, R.; Melissa Muharam, F.; Nurulhuda, K.; Ahmad Jaafar, N.; Keng Yap, N. Ensemble and single algorithm models to handle multicollinearity of UAV vegetation indices for predicting rice biomass. Comput. Electron. Agric. 2023, 205, 107621. [Google Scholar] [CrossRef]
  30. Shahi, T.B.; Xu, C.Y.; Neupane, A.; Guo, W. Machine learning methods for precision agriculture with UAV imagery: A review. Electron. Res. Arch. 2022, 30, 4277–4317. [Google Scholar] [CrossRef]
  31. Ma, F. Mapping Nitrogen Status in Rice Crops Using Unmanned Aerial Vehicle (uav) Data, Multivariate Methods and Machine Learning Algorithms. Ph.D. Thesis, University of Twente, Enschede, The Netherlands, 2020. [Google Scholar]
  32. Li, H.; Dong, W.; Li, Z.; Cao, X.; Tan, S.; Qi, L.; Chen, X.; Xiao, R.; Gong, H.; Wang, X.; et al. Smartphone application-based measurements of stem-base width and plant height in rice seedling. Comput. Electron. Agric. 2022, 198, 107022. [Google Scholar] [CrossRef]
  33. Kawamura, K.; Asai, H.; Yasuda, T.; Khanthavong, P.; Soisouvanh, P.; Phongchanmixay, S. Field phenotyping of plant height in an upland rice field in Laos using low-cost small unmanned aerial vehicles (UAVs). Plant Prod. Sci. 2020, 23, 452–465. [Google Scholar] [CrossRef]
  34. Shibaeva, T.G.; Mamaev, A.V.; Sherudilo, E.G. Evaluation of a SPAD-502 Plus Chlorophyll Meter to Estimate Chlorophyll Content in Leaves with Interveinal Chlorosis. Russ. J. Plant Physiol. 2020, 67, 690–696. [Google Scholar] [CrossRef]
  35. Devia, C.A.; Rojas, J.P.; Petro, E.; Martinez, C.; Mondragon, I.F.; Patino, D.; Rebolledo, M.C.; Colorado, J. High-Throughput Biomass Estimation in Rice Crops Using UAV Multispectral Imagery. J. Intell. Robot. Syst. Theory Appl. 2019, 96, 573–589. [Google Scholar] [CrossRef]
  36. Jensen, J. Introductory Digital Image Processing: A Remote Sensing Perspective; Prentice Hall series in geographic information science; Prentice Hall: Hoboken, NJ, USA, 2005. [Google Scholar]
  37. Hatfield, J.L.; Prueger, J.H.; Sauer, T.J.; Dold, C.; Brien, P.O.; Wacha, K. Applications of Vegetative Indices from Remote Sensing to Agriculture: Past and Future. Inventions 2019, 4, 71. [Google Scholar] [CrossRef] [Green Version]
  38. Tucker, C.J. Red and photographic infrared linear combinations for monitoring vegetation. Remote Sens. Environ. 1979, 8, 127–150. [Google Scholar] [CrossRef] [Green Version]
  39. Jiang, R.; Sanchez-azofeifa, A.; Laakso, K.; Wang, P.; Xu, Y. UAV-based partially sampling system for rapid NDVI mapping in the evaluation of rice nitrogen use ef fi ciency. J. Clean. Prod. 2021, 289, 125705. [Google Scholar] [CrossRef]
  40. Zheng, H.; Cheng, T. Evaluation of RGB, Color-Infrared and Multispectral Images Acquired from Unmanned Aerial Systems for the Estimation of Nitrogen Accumulation in Rice. Remote Sens. 2018, 10, 824. [Google Scholar] [CrossRef] [Green Version]
  41. Huete, A. A soil-adjusted vegetation index (SAVI). Remote Sens. Environ. 1988, 25, 295–309. [Google Scholar] [CrossRef]
  42. Naito, H.; Ogawa, S.; Orlando, M.; Mohri, H.; Urano, Y.; Hosoi, F.; Shimizu, Y.; Lucia, A.; Ishitani, M.; Gomez, M.; et al. ISPRS Journal of Photogrammetry and Remote Sensing Estimating rice yield related traits and quantitative trait loci analysis under different nitrogen treatments using a simple tower-based field phenotyping system with modified single-lens reflex cameras. ISPRS J. Photogramm. Remote. Sens. 2017, 125, 50–62. [Google Scholar] [CrossRef]
  43. Cen, H.; Wan, L.; Zhu, J.; Li, Y.; Li, X.; Zhu, Y.; Weng, H.; Wu, W.; Yin, W.; Xu, C.; et al. Dynamic monitoring of biomass of rice under different nitrogen treatments using a lightweight UAV with dual image-frame snapshot cameras. Plant Methods 2019, 15, 1–16. [Google Scholar] [CrossRef] [PubMed]
  44. Patel, M.K.; Ryu, D.; Western, A.W.; Suter, H.; Young, I.M. Which multispectral indices robustly measure canopy nitrogen across seasons: Lessons from an irrigated pasture crop. Comput. Electron. Agric. 2021, 182, 106000. [Google Scholar] [CrossRef]
  45. Ahmad, N.; Ullah, S.; Zhao, N.; Mumtaz, F.; Ali, A.; Ali, A.; Tariq, A.; Kareem, M.; Imran, A.B.; Khan, I.A.; et al. Comparative Analysis of Remote Sensing and Geo-Statistical Techniques to Quantify Forest Biomass. Forests 2023, 14, 379. [Google Scholar] [CrossRef]
  46. Razi, M.A.; Athappilly, K. A comparative predictive analysis of neural networks (NNs), nonlinear regression and classification and regression tree ( CART ) models. Expert Syst. Appl. 2005, 29, 65–74. [Google Scholar] [CrossRef]
  47. Breiman, L.E.O. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
  48. Rasmussen, C.E.; Williams, C.K.I.; Processes, G.; Press, M.I.T.; Jordan, M.I. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006. [Google Scholar]
  49. Smola, A.J.; Olkopf, B.S.C.H. A tutorial on support vector regression. Stat. Comput. 2004, 14, 199–222. [Google Scholar] [CrossRef] [Green Version]
  50. Hornik, K.; Stinchcombe, M.; White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989, 2, 359–366. [Google Scholar] [CrossRef]
  51. Wu, J.; Hao, X.-Y.; Zhang, H.; Xiong, L.-D.; Lei, H.; Deng, S.-H. Hyperparameter Optimization for Machine Learning Models Based on Bayesian Optimization. J. Electron. Sci. Technol. 2019, 17, 26–40. [Google Scholar] [CrossRef]
Figure 1. Methodology for canopy nitrogen and biomass estimation in large genotype rice plots using UAV multispectral images.
Figure 2. Non-invasive sampling method using multispectral images.
Figure 3. Preprocessing of the 98 images captured across the red, green, blue, red-edge, and near-infrared channels. (a) Orthomosaic of the different channels (red, green, blue, red-edge, and near-infrared). (b) Orthomosaic red-green-blue. (c) Orthomosaic red, red-edge, near-infrared.
Figure 4. Total dry weight of each plot during the vegetative stage and at harvest time.
Figure 5. SPAD values of each plot at different times.
Figure 6. Distribution of the vegetative indices in the total rice crop. (a) Heatmap of the SR index. (b) Heatmap of the NDVI index. (c) Heatmap of the TVI index.
Figure 7. Box plot distribution for the SR vegetative index. Outliers are shown in red.
Figure 8. Box plots for the plot samples by genotype. (a) Box plot for genotype IRBB 66. (b) Box plot for genotype Fedearroz 67. (c) Box plot for genotype IR 64-21.
Figure 9. Correlation matrix of the selected features.
Figure 10. Models’ fresh-weight estimations. (a) Ensemble regression. (b) Tree regression. (c) Gaussian process regression.
Figure 11. Models’ dry-weight estimations. (a) Ensemble regression. (b) Tree regression. (c) Gaussian process regression.
Figure 12. Models’ SPAD estimations. (a) Ensemble regression. (b) Tree regression. (c) Gaussian process regression.
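Table 1. Average values of measured parameters in the field for rice plots 1–30.

| Plot ID | Fresh weight (g) | Dry weight (g) | Water percentage | SPAD | Height |
|---|---|---|---|---|---|
| 1 | 940 | 260 | 72.3 | 40.48 | 84 |
| 2 | 1000 | 280 | 72.0 | 37.72 | 105.2 |
| 3 | 1020 | 280 | 72.5 | 36.42 | 96.2 |
| 4 | 700 | 240 | 65.7 | 41.68 | 87 |
| 5 | 1380 | 360 | 73.9 | 40.74 | 117.2 |
| 6 | 1380 | 380 | 72.5 | 41.24 | 145.8 |
| 7 | 1140 | 320 | 71.9 | 39.58 | 111 |
| 8 | 860 | 240 | 72.1 | 38.66 | 96.8 |
| 9 | 880 | 260 | 70.5 | 40.2 | 106 |
| 10 | 840 | 280 | 66.7 | 40.02 | 109.4 |
| 11 | 1180 | 320 | 72.9 | 42.26 | 104.4 |
| 12 | 840 | 260 | 69.0 | 35.06 | 122 |
| 13 | 940 | 300 | 68.1 | 41 | 99.4 |
| 14 | 1060 | 280 | 73.6 | 38.64 | 90.8 |
| 15 | 700 | 220 | 68.6 | 34.86 | 104.6 |
| 16 | 820 | 260 | 68.3 | 36.4 | 95.8 |
| 17 | 980 | 300 | 69.4 | 36.04 | 98.2 |
| 18 | 1260 | 400 | 68.3 | 43.48 | 113.6 |
| 19 | 1120 | 320 | 71.4 | 38.62 | 98.8 |
| 20 | 1260 | 360 | 71.4 | 44.76 | 116 |
| 21 | 1200 | 340 | 71.7 | 40.94 | 117.2 |
| 22 | 1160 | 320 | 72.4 | 38.14 | 97.2 |
| 23 | 840 | 260 | 69.0 | 33.5 | 95.8 |
| 24 | 780 | 280 | 64.1 | 41.2 | 90 |
| 25 | 1060 | 340 | 67.9 | 42.36 | 115.2 |
| 26 | 740 | 240 | 67.6 | 41.14 | 96.8 |
| 27 | 780 | 280 | 64.1 | 35.14 | 129.6 |
| 28 | 600 | 200 | 66.7 | 39.28 | 104.4 |
| 29 | 860 | 280 | 67.4 | 38.02 | 100.8 |
| 30 | 840 | 280 | 66.7 | 37 | 122.6 |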
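Table 2. Average values of measured parameters in the field for rice plots 31–59.

| Plot ID | Fresh weight (g) | Dry weight (g) | Water percentage | SPAD | Height |
|---|---|---|---|---|---|
| 31 | 900 | 280 | 68.9 | 38.02 | 106.6 |
| 32 | 1180 | 340 | 71.2 | 38.44 | 116.2 |
| 33 | 980 | 380 | 61.2 | 41.9 | 113.8 |
| 34 | 1440 | 300 | 79.2 | 37.32 | 118.8 |
| 35 | 1180 | 320 | 72.9 | 37 | 131.4 |
| 36 | 1000 | 280 | 72.0 | 35.66 | 111.8 |
| 37 | 760 | 260 | 65.8 | 34.64 | 94.4 |
| 38 | 980 | 300 | 69.4 | 32.7 | 109.2 |
| 39 | 1040 | 340 | 67.3 | 35.16 | 131.4 |
| 40 | 700 | 240 | 65.7 | 37.36 | 100 |
| 41 | 760 | 240 | 68.4 | 41.32 | 99.2 |
| 42 | 960 | 280 | 70.8 | 37.94 | 93.6 |
| 43 | 760 | 260 | 65.8 | 39.28 | 106 |
| 44 | 660 | 220 | 66.7 | 32.64 | 97.6 |
| 45 | 940 | 320 | 66.0 | 33.3 | 106.8 |
| 46 | 1000 | 340 | 66.0 | 34.42 | 152.4 |
| 47 | 980 | 280 | 71.4 | 37.84 | 108 |
| 48 | 1240 | 360 | 71.0 | 37.52 | 109.2 |
| 49 | 1220 | 360 | 70.5 | 34.9 | 121.4 |
| 50 | 760 | 260 | 65.8 | 35.64 | 108 |
| 51 | 1060 | 300 | 71.7 | 36.1 | 118 |
| 52 | 1200 | 380 | 68.3 | 38.14 | 131 |
| 53 | 920 | 300 | 67.4 | 37.42 | 119.6 |
| 54 | 1320 | 360 | 72.7 | 37.58 | 133.6 |
| 55 | 900 | 280 | 68.9 | 40.08 | 112 |
| 56 | 1180 | 340 | 71.2 | 38.72 | 129.2 |
| 57 | 940 | 300 | 68.1 | 34.04 | 103.4 |
| 58 | 780 | 240 | 69.2 | 30.58 | 88 |
| 59 | 1080 | 280 | 74.1 | 39.88 | 91 |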
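The water percentage values in Tables 1 and 2 are consistent with the relative mass lost on drying, that is, (fresh weight − dry weight) / fresh weight × 100. The short Python sketch below checks this relation for the first three plots of Table 1. The function name `water_percentage` is illustrative and does not come from the study.

```python
def water_percentage(fresh_g, dry_g):
    """Share of the fresh mass that is water, in percent."""
    return (fresh_g - dry_g) / fresh_g * 100.0


# (fresh weight, dry weight) in grams for plots 1-3 of Table 1
plots = {1: (940, 260), 2: (1000, 280), 3: (1020, 280)}

for plot_id, (fresh_g, dry_g) in plots.items():
    print(plot_id, round(water_percentage(fresh_g, dry_g), 1))
```

Rounded to one decimal place, the three results match the tabulated 72.3, 72.0, and 72.5.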
Table 3. Vegetation indices used in the experiment.

| Vegetation Index | Formula | Application |
|---|---|---|
| Difference Vegetation Index (DVI) | $\mathrm{NIR} - \mathrm{RED}$ | This index distinguishes between soil and vegetation but does not take into account the difference between the reflectance and radiance caused by atmospheric effects or shadows [38]. |
| Normalized Difference Vegetation Index (NDVI) | $\dfrac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}}$ | NDVI is a widely used vegetation index that measures the difference between the near-infrared (NIR) and red light reflected by vegetation. Healthy vegetation typically reflects more NIR light and less red light, so a high NDVI value indicates a high level of vegetation density and productivity [39]. |
| Green Normalized Difference Vegetation Index (GNDVI) | $\dfrac{\mathrm{NIR} - \mathrm{GREEN}}{\mathrm{NIR} + \mathrm{GREEN}}$ | GNDVI is primarily used to estimate vegetation biomass and monitor vegetation health [40]. |
| Soil-Adjusted Vegetation Index (SAVI) | $\dfrac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED} + L} \cdot (1 + L)$ | SAVI is widely used to estimate vegetation biomass and monitor vegetation health, especially in areas with high soil background noise [41]. |
| Modified Soil-Adjusted Vegetation Index (MSAVI) | $\dfrac{2\,\mathrm{NIR} + 1 - \sqrt{(2\,\mathrm{NIR} + 1)^{2} - 8\,(\mathrm{NIR} - \mathrm{RED})}}{2}$ | MSAVI is widely used to estimate vegetation biomass and monitor vegetation health in a variety of environmental conditions [35]. |
| Corrected Transformed Vegetation Index (CTVI) | $\dfrac{\mathrm{NDVI} + 0.5}{\lvert \mathrm{NDVI} + 0.5 \rvert} \cdot \sqrt{\lvert \mathrm{NDVI} + 0.5 \rvert}$ | CTVI is primarily used to monitor vegetation health and stress [42]. |
| Simple Ratio (SR) | $\dfrac{\mathrm{NIR}}{\mathrm{RED}}$ | SR is widely used to estimate vegetation biomass and monitor vegetation health [43]. |
| Transformed Vegetation Index (TVI) | $\sqrt{\lvert \mathrm{NDVI} + 0.5 \rvert}$ | TVI is used to monitor vegetation health and stress, and is also sensitive to changes in vegetation structure and composition [35]. |
| Enhanced Vegetation Index (EVI) | $G \cdot \dfrac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + C_1\,\mathrm{RED} - C_2\,\mathrm{BLUE} + L}$ | EVI is used to monitor rice growth and canopy biomass [44]. |
| Atmospherically Resistant Vegetation Index (ARVI) | $\dfrac{\mathrm{NIR} - (\mathrm{BLUE} - \gamma\,(\mathrm{RED} - \mathrm{BLUE}))}{\mathrm{NIR} + (\mathrm{BLUE} - \gamma\,(\mathrm{RED} - \mathrm{BLUE}))}$ | ARVI is widely used to estimate vegetation biomass and monitor sensitive changes in vegetation with atmospheric correction [45]. |
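As a minimal sketch of how several of the indices in Table 3 can be evaluated from band reflectances, the Python functions below use only plain arithmetic, so they accept single floats and should also work element-wise on NumPy arrays holding orthomosaic bands. The function names are illustrative. The constants are assumptions, since the table does not state them: L = 0.5 for SAVI and G = 2.5, C1 = 6, C2 = 7.5, L = 1 for EVI are commonly used values, not values confirmed by the study.

```python
def dvi(nir, red):
    return nir - red


def ndvi(nir, red):
    return (nir - red) / (nir + red)


def gndvi(nir, green):
    return (nir - green) / (nir + green)


def savi(nir, red, L=0.5):
    # L is the soil-adjustment factor; 0.5 is an assumed, commonly used value.
    return (nir - red) / (nir + red + L) * (1 + L)


def msavi(nir, red):
    return (2 * nir + 1 - ((2 * nir + 1) ** 2 - 8 * (nir - red)) ** 0.5) / 2


def sr(nir, red):
    return nir / red


def tvi(nir, red):
    return abs(ndvi(nir, red) + 0.5) ** 0.5


def ctvi(nir, red):
    # Keeps the sign of (NDVI + 0.5); undefined when NDVI is exactly -0.5.
    shifted = ndvi(nir, red) + 0.5
    return shifted / abs(shifted) * abs(shifted) ** 0.5


def evi(nir, red, blue, G=2.5, C1=6.0, C2=7.5, L=1.0):
    # G, C1, C2 and L are assumed, commonly used coefficients.
    return G * (nir - red) / (nir + C1 * red - C2 * blue + L)


# Example with made-up reflectances for a single vegetated pixel.
nir, red, green, blue = 0.5, 0.1, 0.12, 0.05
print("NDVI", ndvi(nir, red))
print("SAVI", savi(nir, red))
print("MSAVI", msavi(nir, red))
```

For a per-plot feature, the same functions could be applied to the plant pixels of each plot and averaged, which is one plausible reading of the soil-segmentation weighting described in the abstract.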
Table 4. Evaluation metrics for the implemented models for dry biomass estimation.

| Model | FS1 R² | FS1 MAE | FS1 RMSE | FS2 R² | FS2 MAE | FS2 RMSE | FS3 R² | FS3 MAE | FS3 RMSE | FS4 R² | FS4 MAE | FS4 RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GPR | 0.998 | 0.0305 | 0.119 | 0.97 | 0.0562 | 0.189 | 0.95 | 0.115 | 0.345 | 0.93 | 0.214 | 0.759 |
| ER | 0.96 | 21.509 | 41.668 | 0.96 | 21.606 | 41.391 | 0.83 | 29.769 | 81.012 | 0.93 | 39.303 | 54.028 |
| TR | 0.94 | 14.293 | 49.829 | 0.9 | 15.319 | 61.489 | 0.83 | 29.769 | 81.012 | 0.77 | 44.02 | 94.11 |
| NNR | 0.48 | 114.4 | 140.6 | 0.62 | 95.14 | 125 | 0.54 | 102.5 | 134 | 0.65 | 90.54 | 110 |
| SVMR | 0.42 | 125.2 | 158.6 | 0.58 | 102.65 | 118.6 | 0.53 | 109.4 | 140 | 0.6 | 97.86 | 122 |
Table 5. Evaluation metrics for the implemented models for wet biomass estimation.

| Model | FS1 R² | FS1 MAE | FS1 RMSE | FS2 R² | FS2 MAE | FS2 RMSE | FS3 R² | FS3 MAE | FS3 RMSE | FS4 R² | FS4 MAE | FS4 RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GPR | 0.987 | 0.59112 | 2.4142 | 0.96 | 0.653 | 2.681 | 0.95 | 3.4624 | 7.456 | 0.92 | 3.589 | 9.1563 |
| ER | 0.88 | 11.295 | 15.481 | 0.85 | 13.146 | 18.158 | 0.85 | 19.64 | 23.231 | 0.7 | 20.114 | 25.099 |
| TR | 0.95 | 0.4432 | 5.7246 | 0.83 | 10.146 | 20.4562 | 0.9 | 9.5462 | 13.562 | 0.75 | 19.456 | 21.025 |
| NNR | 0.75 | 24.056 | 28.245 | 0.68 | 24.256 | 29.256 | 0.59 | 23.251 | 31.256 | 0.43 | 27.156 | 33.254 |
| SVMR | 0.64 | 21.055 | 30.025 | 0.42 | 28.745 | 35.254 | 0.43 | 26.152 | 34.521 | 0.31 | 30.376 | 39.272 |
Table 6. Evaluation metrics for the implemented models for nitrogen estimation.

| Model | FS1 R² | FS1 MAE | FS1 RMSE | FS2 R² | FS2 MAE | FS2 RMSE | FS3 R² | FS3 MAE | FS3 RMSE | FS4 R² | FS4 MAE | FS4 RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GPR | 0.999 | 0.0305 | 0.119 | 0.998 | 0.0562 | 0.189 | 0.95 | 0.115 | 0.345 | 0.93 | 0.214 | 0.759 |
| ER | 0.97 | 0.194 | 0.482 | 0.98 | 0.197 | 0.454 | 0.91 | 0.452 | 0.956 | 0.92 | 0.241 | 0.901 |
| TR | 0.97 | 0.282 | 0.531 | 0.97 | 0.198 | 0.523 | 0.9 | 0.458 | 0.898 | 0.9 | 0.526 | 0.918 |
| NNR | 0.77 | 0.793 | 1.082 | 0.79 | 0.642 | 1.325 | 0.61 | 0.889 | 1.185 | 0.62 | 1.101 | 1.004 |
| SVMR | 0.65 | 0.861 | 1.154 | 0.68 | 0.785 | 1.552 | 0.54 | 1.001 | 1.201 | 0.58 | 1.165 | 1.198 |
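For readers who want to reproduce the three metrics reported in Tables 4–6, the following Python sketch computes R², MAE, and RMSE for a set of observed and estimated values. It assumes the standard definitions, with R² taken as the coefficient of determination; the text does not state which definition was used. The observed values are the dry weights of plots 1–5 from Table 1, while the predictions are made-up numbers for illustration only and are not model outputs from the study.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Dry weights (g) of plots 1-5 from Table 1.
observed = [260, 280, 280, 240, 360]
# Hypothetical predictions, chosen only to exercise the functions.
predicted = [250, 290, 270, 250, 350]

print("MAE ", mae(observed, predicted))
print("RMSE", rmse(observed, predicted))
print("R2  ", r2(observed, predicted))
```

Because RMSE squares each residual before averaging, it is never smaller than MAE on the same data, which is a quick sanity check when reading the tables.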
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Duque, A.F.; Patino, D.; Colorado, J.D.; Petro, E.; Rebolledo, M.C.; Mondragon, I.F.; Espinosa, N.; Amezquita, N.; Puentes, O.D.; Mendez, D.; et al. Characterization of Rice Yield Based on Biomass and SPAD-Based Leaf Nitrogen for Large Genotype Plots. Sensors 2023, 23, 5917. https://doi.org/10.3390/s23135917