Article

Headland Identification and Ranging Method for Autonomous Agricultural Machines

1 Information Engineering College, Capital Normal University, Beijing 100048, China
2 National Research Center of Intelligent Equipment for Agriculture, Beijing 100097, China
* Author to whom correspondence should be addressed.
Agriculture 2024, 14(2), 243; https://doi.org/10.3390/agriculture14020243
Submission received: 22 December 2023 / Revised: 23 January 2024 / Accepted: 29 January 2024 / Published: 1 February 2024
(This article belongs to the Special Issue Agricultural Collaborative Robots for Smart Farming)

Abstract

Headland boundary identification and ranging are the key supporting technologies for the automatic driving of intelligent agricultural machinery, and they are also the basis for controlling operational behaviors such as autonomous turning and machine lifting. The complex, unstructured environments of farmland headlands render traditional image feature extraction methods less accurate and adaptable. This study utilizes deep learning and binocular vision technologies to develop a headland boundary identification and ranging system built upon the existing automatic guided tractor test platform. A headland image annotation dataset was constructed, and the MobileNetV3 network, notable for its compact model structure, was employed to achieve binary classification recognition of farmland and headland images. An improved MV3-DeeplabV3+ image segmentation network model, leveraging an attention mechanism, was constructed, achieving a high mean intersection over union (MIoU) value of 92.08% and enabling fast and accurate detection of headland boundaries. Following the detection of headland boundaries, binocular stereo vision technology was employed to measure the boundary distances. Field experiment results indicate that the system’s average relative errors of distance in ranging at distances of 25 m, 20 m, and 15 m are 6.72%, 4.80%, and 4.35%, respectively. This system is capable of meeting the real-time detection requirements for headland boundaries.

1. Introduction

As the application of artificial intelligence technology becomes more prevalent, autonomous agricultural machinery has emerged as a significant trend in the advancement of smart agriculture [1]. One of the key technologies for autonomous agricultural machinery systems is sensor-based environmental perception and recognition [2], such as detecting obstacles in farmland. In fact, agricultural machinery encounters the farmland headland during normal field operations far more often than it encounters obstacles [3]. The headland is the edge area of farmland, offering space for agricultural machinery to turn during field operations [4]. The boundary between the farmland and the adjacent headland is approximately linear over a small range [5]. The automatic navigation technology currently popularized for agricultural machinery still requires manual intervention by the driver to perform complex tasks such as turning at the headland and lifting implements upon approaching the headland area [6]. For intelligent agricultural equipment, achieving fully autonomous driving requires environmental perception of the farmland headland [7]. Achieving this objective requires first automatically identifying the farmland headland area and detecting its boundary, then measuring the distance to the boundary in real time, and ultimately guiding the agricultural machinery to execute steering operations at a reasonable distance from the headland boundary.
Image recognition is a significant field in artificial intelligence, with its key step being the extraction of features from images [8]. Some research employs manual feature extraction methods, focusing on local features such as color, texture, and shape to recognize headland images. Qiao Yujie et al. employed a robust regression method based on machine vision to detect and fit the headland boundary for a typical grayscale-mutation-type headland image during the sowing period [9]. Hong Zijia et al. extracted point clouds using binocular stereo vision, leveraging the height discrepancy between the headland ridge and the plot to detect the headland boundary; this method has been validated to recognize and measure distances to ridges 5–10 m ahead with a significant height difference [10]. Guomin Gao et al. extracted the headland boundary line by combining fixed-threshold segmentation in the hue–saturation–value (HSV) color space with the random sample consensus algorithm [11]. These studies have yielded positive recognition results in particular headland scenarios. In actual production, however, the farmland headland is an unstructured environment characterized by complexity, diverse types, and indistinct boundaries, including vegetated, bare soil, and cement facility headlands. Traditional machine vision methods exhibit poor robustness and real-time performance in natural scenes, rendering them inadequate for the complex and diverse scenarios encountered at the headland.
In recent years, the application of deep learning technology within agriculture has advanced swiftly, yielding novel research progress in areas such as field road detection [12], obstacle detection [13], and agricultural machinery path recognition [14]. Several studies employ deep learning models for the automatic extraction of features from headland images. He Yong et al. proposed a MobileV2-UNet model for segmenting farmland and nonfarmland areas in paddy fields and utilized the Random Sample Consensus (RANSAC) algorithm to detect boundary lines [15]. Li Dongfang et al. developed a deep learning-based network that combines convolutional neural network (CNN) and recurrent neural network (RNN) architectures for segmenting paddy field headlands [16]. Compared to paddy fields, the headland environment in dry fields exhibits greater complexity and diversity, thereby increasing the challenge of headland image recognition. Qiao Yujie et al. constructed a dataset containing six categories of annotated cropland headland images and employed the compact network MobileNetV2 for headland image recognition [17].
The decision-making process in intelligent agricultural machinery for determining proximity to the headland may appear straightforward, yet it encompasses three interconnected steps: recognition of the headland environment, detection of the headland boundary, and measurement of the distance to the headland boundary. Each step must be executed in real time, with the accuracy and speed of processing having a direct impact on the system’s overall performance. Many prior studies have provided only partial solutions to the challenge of headland detection.
This study proposes a comprehensive solution for automatic recognition of headland images and boundary distance measurement in autonomous agricultural machinery. It presents a unique integration and significant enhancement of deep learning and binocular vision technologies for the specific application in autonomous agricultural machinery. A specialized image dataset, encompassing a wide range of farmland and headland scenes, was constructed to enhance the robustness and scalability of headland recognition in unstructured natural environments. In addressing the processing limitations of onboard terminals in autonomous agricultural machinery, our study has implemented two key optimizations to improve system performance without sacrificing accuracy. (1) Our study forgoes intricate image recognition of complex and diverse headlands. Instead, it employs image binary classification and binary region segmentation, which not only fulfills application requirements but also enhances the real-time performance of the system. (2) For segmenting headland images, a cropped version of MobileNetV3-Large is used as the Deeplab V3+ model backbone network for feature extraction, and modifications are made to the atrous spatial pyramid pooling (ASPP) module in the encoder, further reducing the parameters of the semantic segmentation model and increasing the speed of model image segmentation. Finally, we have developed a hardware and software system for headland recognition and boundary distance measurement and evaluated the feasibility and overall performance of integrating multiple models.

2. Technical Route and Data Preparation

2.1. Technical Route

In obstacle-free dry field operation scenarios, the autonomously driving agricultural machinery captures front-view images in real time while in motion. Figure 1 shows the technical route, which consists of three steps. Step I performs image classification. The acquired images can be classified into two categories: farmland images and headland environment images. Given the standard operation mode of agricultural machinery, which involves turning upon approaching the headland, our system requires only the processing of headland images. It utilizes a deep learning image classification network to automatically recognize both farmland and headland images. Step II performs image segmentation. For the identified headland images, a deep learning image segmentation network is employed to distinguish the headland region from the farmland region within these images. Step III performs binocular vision ranging. Following the segmentation results, the headland boundaries are extracted from the image. Stereo matching of the left and right images from a binocular camera yields the corresponding depth information, which is then utilized to calculate the distance from the agricultural machinery to the headland boundary.
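A minimal control-flow sketch of this three-step pipeline is given below; the three callables are hypothetical stand-ins for the classification model, the segmentation model, and the ranging routine described in Section 3.

```python
# Minimal control-flow sketch of the three-step pipeline; `classify`, `segment`,
# and `range_to_boundary` are hypothetical stand-ins for the models and ranging
# routine described in Section 3.
def process_frame(left_img, right_img, classify, segment, range_to_boundary):
    """Return the distance (m) to the headland boundary, or None while still in-field."""
    # Step I: binary classification - only headland frames are processed further.
    if classify(left_img) != "headland":
        return None
    # Step II: per-pixel segmentation into farmland / headland regions.
    label_map = segment(left_img)
    # Step III: stereo matching on the left/right pair yields depth at the boundary.
    return range_to_boundary(label_map, left_img, right_img)
```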

2.2. Dataset of Headland Image Construction

At present, no publicly available dataset exists for headland images. Consequently, this paper presents the construction of two datasets: a binary classification annotated dataset for headland images and a region segmentation annotated dataset for headland images. As depicted in Figure 2, the collection of farmland and headland environment images predominantly occurs in the summer and autumn seasons, spanning dry farmlands nationwide to encompass a wide range of farmland and headland environments and thereby enhance generalizability. At the operation site, videos of typical headlands and farmlands are recorded in accordance with a unified operating standard. Subsequently, images are systematically organized and extracted from these videos to construct the headland image dataset.
(1)
Headland Binary Classification Dataset
The binary classification dataset consists of 9600 images, which have been partitioned into training, validation, and test sets using the hold-out method [18], with a distribution ratio of 4:1:1 for the images. The training set has been augmented fourfold utilizing four methods: vertical flipping, scaling, blurring, and color transformation, as detailed in Table 1. Each of the training, validation, and test sets contains the two types of images outlined in Table 1. The original image data for each category are stored in corresponding folders. For data conversion, the TensorFlow framework version 1.14.0 is utilized to transform the original image data into TFRecord format files, with filenames beginning with the dataset category.
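As a concrete illustration of this conversion step, the sketch below writes one class folder of images into a TFRecord file; the folder layout, file names, and feature keys are assumptions for illustration rather than the exact conversion script used in this study.

```python
# Sketch of converting one class folder of images into a TFRecord file
# (folder layout and feature keys are illustrative).
import os
import tensorflow as tf

def write_tfrecord(image_dir, label, out_path):
    with tf.io.TFRecordWriter(out_path) as writer:
        for name in sorted(os.listdir(image_dir)):
            with open(os.path.join(image_dir, name), "rb") as f:
                img_bytes = f.read()
            example = tf.train.Example(features=tf.train.Features(feature={
                "image/encoded": tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[img_bytes])),
                "image/class/label": tf.train.Feature(
                    int64_list=tf.train.Int64List(value=[label])),
            }))
            writer.write(example.SerializeToString())

write_tfrecord("train/0_farmland", 0, "0_farmland_train.tfrecord")
write_tfrecord("train/1_headland", 1, "1_headland_train.tfrecord")
```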
(2)
Headland Image Segmentation Annotation Dataset
Despite the presence of diverse headland scenes, including green vegetation, ditch/ridge, and weed headlands, the primary objective remains the detection of the headland boundary. Consequently, in the semantic segmentation of headland images, annotation is required for only two categories: the farmland area and the headland area. The Labelme tool version 4.5.13 is utilized for semantic annotation in constructing the image segmentation dataset, comprising a total of 1616 images. This dataset includes 1396 images featuring headland boundary and 120 images exclusively of farmland, aiming to enhance the model’s generalizability and resistance to interference. Datasets containing various types of headlands are divided into training, validation, and test sets, maintaining a distribution ratio of 4:1:1.

3. System Design and Methodology

3.1. Hardware Composition

As depicted in Figure 3, a John Deere 1204 tractor (John Deere, Moline, IL, USA) equipped with an automatic navigation system is used as the experimental platform. A MYNT EYE D1000-50 binocular depth camera (MYNTAI, Wuxi, China) is mounted at the center of the front counterweight of the tractor at a height of 1.13 m and is connected to the onboard terminal through USB for power supply and real-time data transmission. The binocular camera has a horizontal field of view of 90° and a vertical field of view of 60° and synchronously collects left and right images at a resolution of 1280 × 720 pixels and a frame rate of 30 frames per second. Furthermore, it offers hardware-level synchronization of the left and right cameras under identical parameters, facilitating simultaneous exposure and mitigating errors due to unsynchronized data collection times. This feature is advantageous for precise distance measurement of the headland boundary. The Stereo Camera Calibrator toolbox in MATLAB R2019b (MathWorks, Natick, MA, USA) is used to calibrate the binocular camera, computing parameters such as the focal lengths of the left and right cameras, intrinsic and extrinsic parameters, rotation matrices, and translation matrices [19].
The AVT-608M onboard terminal of the automatic navigation system is equipped with an Intel Quad-Core Celeron J1900 CPU (Intel Corporation, Santa Clara, CA, USA) clocked at 2.0 GHz and runs 64-bit Windows 10 Enterprise. The terminal features 32 GB of disk capacity, 4 GB of memory, and an LED-backlit LCD panel.

3.2. Headland Image Classification Based on MobileNetV3

3.2.1. MobileNetV3 Network Model

Considering the limited performance capabilities of the onboard terminal for autonomous agricultural machinery, this paper opts for MobileNetV3, a lightweight convolutional neural network developed by Google, which is advantageous due to its small memory footprint and fast computational speed.
MobileNetV3 [20], building upon its predecessors MobileNetV1 and MobileNetV2, integrates a lightweight attention mechanism (SE attention) into its principal bneck structure block. Combining a hard approximation of the sigmoid with the swish activation function, it introduces h-swish, a hard version of the swish activation function. The structure of MobileNetV3-Large is detailed in Table 2, where 'Input' is the feature map size of each feature layer; 'Operator' is the layer structure that each feature map passes through; 'Exp Size' is the number of channels to which the inverted residual structure in the bneck expands; 'Out' is the number of channels in the feature map after passing through the bneck; 'SE' indicates whether the block contains a squeeze-and-excite module; 'NL' is the type of nonlinearity used, with 'RE' standing for the ReLU activation function and 'HS' for the h-swish activation function; 'Stride' is the stride length during convolution.
The MobileNetV3-Large network employs 5 × 5 depthwise separable convolutions, substantially reducing the network’s parameter count and enhancing model efficiency. Furthermore, MobileNetV3-Large utilizes attention mechanisms and the h-swish activation function in specific layers, thereby improving model accuracy. In classification tasks, the feature map size is reduced to 1/32 of the original image size, which may result in significant loss of spatial and semantic information during computation. Consequently, this network is primarily utilized for classification applications. The MobileNetV3-Large network, by utilizing limited information, is able to reduce the consumption of computing resources, thereby rendering it suitable for the onboard application scenario of binary classification of headland images.
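For illustration, a minimal sketch of such a two-class classifier is shown below, built on MobileNetV3-Large via the TensorFlow 2.x Keras API (the study itself trains with TensorFlow 1.14 and transfer learning); the 512 × 512 input follows Table 2, and the dropout rate and learning rate follow Section 3.2.3.

```python
# Two-class headland/farmland classifier built on MobileNetV3-Large
# (TF 2.x Keras API; 512 x 512 input as in Table 2).
import tensorflow as tf

base = tf.keras.applications.MobileNetV3Large(
    input_shape=(512, 512, 3), include_top=False,
    weights="imagenet", pooling="avg")
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.5),                    # dropout rate from Section 3.2.3
    tf.keras.layers.Dense(2, activation="softmax"),  # farmland vs. headland
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```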

3.2.2. Classification Model Performance Evaluation Method

To ascertain the accuracy of the network model’s classification capabilities, a confusion matrix is employed for evaluation. The performance metrics encompass Precision, Accuracy, Recall, and F1-score, as delineated in Equations (1)–(4).
$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (1)$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \quad (2)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (3)$$

$$F1\text{-}score = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \quad (4)$$
where:
  • True positive (TP) denotes the instances that are correctly identified as positive.
  • False positive (FP) denotes the instances that are incorrectly identified as positive.
  • True negative (TN) denotes the instances that are correctly identified as negative.
  • False negative (FN) denotes the instances that are incorrectly identified as negative.
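A minimal sketch of these metrics, computed directly from the TP/FP/TN/FN counts of the confusion matrix (the function names are ours):

```python
# Equations (1)-(4) computed from TP/FP/TN/FN counts.
def precision(tp, fp):
    return tp / (tp + fp)

def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

def recall(tp, fn):
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)
```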

3.2.3. Classification Model Performance Testing and Analysis

The classification model was trained and tested on a high-performance server running Ubuntu 16.04.5. This server is equipped with an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20 GHz, which is manufactured by Intel Corporation, headquartered in Santa Clara, CA, USA, a GTX1080Ti graphics card, 500 GB of memory, and a 4 TB hard drive. The software environment includes Python version 3.6.4, CUDA version 10.0, and TensorFlow version 1.14 as the framework.
Utilizing a pre-trained model, the MobileNetV3-Large network was trained with specific hyperparameters: a batch size of 16, an iteration number of 200, use of the Adam optimizer, a learning rate of 0.0005, a dropout rate of 0.5, a temperature coefficient (T) of 20 during knowledge distillation, use of the cross-entropy loss function, and a weighted coefficient (γ) of 0.95 for the loss function. To more effectively evaluate the classification performance of the MobileNetV3-Large network, a comparative experiment was carried out with the MobileNetV2 network and the InceptionV3 network [21]. Pre-trained models for each network were sourced via transfer learning, and an identical headland image enhancement dataset and hyperparameters were employed to train all three networks. Post-training, the networks underwent validation on a platform equipped with an i5-8250U CPU. According to the confusion matrix, MobileNetV3-Large attained an average Precision and Recall of 0.99 on the validation set, demonstrating its ability to accurately identify both sample types. Furthermore, the average F1-score of 0.99 indicates that the classification network proficiently identifies farmland headland images in natural environments, showcasing robust performance.
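For reference, the sketch below shows one common way to combine a hard-label cross-entropy term with a temperature-softened soft-label term using T = 20 and γ = 0.95, matching the hyperparameters listed above; how the paper splits the two terms with γ is not stated, so this weighting convention is an assumption.

```python
# One common formulation of a distillation loss with temperature T = 20 and
# weight gamma = 0.95; the exact way gamma splits the hard- and soft-label
# terms is an assumption, not stated in the paper.
import tensorflow as tf

T, gamma = 20.0, 0.95

def distillation_loss(y_true, student_logits, teacher_logits):
    hard = tf.keras.losses.sparse_categorical_crossentropy(
        y_true, tf.nn.softmax(student_logits))
    soft = tf.keras.losses.categorical_crossentropy(
        tf.nn.softmax(teacher_logits / T), tf.nn.softmax(student_logits / T))
    return gamma * hard + (1.0 - gamma) * (T ** 2) * soft
```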
Table 3 presents the recognition results of the three classification networks, with computation time for recognition speed encompassing the total duration for model loading, image reading, pre-processing, and recognition. As evidenced by the table, all three models demonstrate high recognition accuracy. In comparison to the others, MobileNetV2 and InceptionV3 require more computing resources and exhibit slower recognition speeds, while MobileNetV3-Large not only achieves high classification accuracy but also benefits from lower memory occupancy. This combination of attributes renders it more suitable for deployment on onboard terminals.

3.3. Headland Image Segmentation Based on DeeplabV3+

3.3.1. DeeplabV3+ Network

DeeplabV3+ is an advanced architecture that evolved from DeeplabV3 [22], featuring a novel encoding and decoding structure that optimizes the detailed information of object edges. This model applies depthwise separable convolutions to both the ASPP [23] and the decoder module. However, the segmentation speed of DeeplabV3+ still falls short of the real-time recognition requirements for the onboard system of agricultural machinery. To enhance the model’s operational speed, the pruned MobileNetV3-Large lightweight network is selected as the feature extraction network. The feature extraction component of the DeeplabV3+ network model is optimized, and modifications are made to the ASPP module in the encoder. These adjustments aim to further reduce the parameters of the semantic segmentation model, thereby achieving a more rapid image segmentation rate.
Figure 4 illustrates the MV3-DeeplabV3+ network model structure, where the model input is a scaled image from the headland image segmentation dataset, with a resolution of 513 × 513 pixels. In this structure, the MobileNetV3 and the ASPP module serve as components of the encoder. MobileNetV3 primarily extracts semantic information from two distinct layers: shallow and deep. The shallow semantic information is characterized by the feature map output (128 × 128) after the first downsampling, using a 3 × 3 convolution kernel size, which reduces the parameter count and backpropagation delay while preserving high-dimensional features. The deep semantic information is captured in the feature map (32 × 32) after the fourth downsampling, employing the h-swish activation function to significantly lower the computational load while retaining effective nonlinearity.
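The sketch below illustrates an ASPP head built from depthwise-separable atrous convolutions in the spirit of the modified encoder described above; the dilation rates, channel width, and the 32 × 32 × 160 deep-feature shape are illustrative assumptions rather than the exact configuration of MV3-DeeplabV3+.

```python
# Simplified ASPP head with depthwise-separable atrous convolutions; dilation
# rates, channel width, and the 32 x 32 x 160 deep-feature shape are assumptions.
import tensorflow as tf
from tensorflow.keras import layers

def aspp(x, filters=256, rates=(6, 12, 18)):
    # 1 x 1 convolution branch
    branches = [layers.Conv2D(filters, 1, padding="same", activation="relu")(x)]
    # depthwise-separable atrous convolution branches
    for r in rates:
        branches.append(layers.SeparableConv2D(
            filters, 3, padding="same", dilation_rate=r, activation="relu")(x))
    # image-level pooling branch, broadcast back to the feature-map size
    pool = layers.GlobalAveragePooling2D()(x)
    pool = layers.Reshape((1, 1, x.shape[-1]))(pool)
    pool = layers.Conv2D(filters, 1, activation="relu")(pool)
    pool = layers.UpSampling2D(size=(x.shape[1], x.shape[2]),
                               interpolation="bilinear")(pool)
    # fuse all branches and project back to `filters` channels
    fused = layers.Concatenate()(branches + [pool])
    return layers.Conv2D(filters, 1, padding="same", activation="relu")(fused)

deep_features = tf.keras.Input(shape=(32, 32, 160))   # deep feature map (Figure 4)
aspp_out = aspp(deep_features)
```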

3.3.2. Segmentation Model Performance Evaluation Method

For evaluating the accuracy of the segmentation network model, metrics like class pixel accuracy (CPA) and mean intersection over union (MIoU) are commonly employed as assessment criteria.
(1)
CPA: This metric represents the ratio of the number of correctly predicted pixels for a single class to the total number of pixels in that class, as delineated in Equation (5).
$$\mathrm{CPA} = \frac{p_{ii}}{\sum_{j=0}^{k} p_{ji}} \times 100\% \quad (5)$$
(2)
MIoU: This metric calculates the weighted average of the ratio of the intersection to the union of the correctly predicted pixel set and the correctly classified pixel set for each class, as delineated in Equation (6).
$$\mathrm{MIoU} = \frac{1}{k+1} \sum_{i=0}^{k} \frac{p_{ii}}{\sum_{j=0}^{k} p_{ij} + \sum_{j=0}^{k} p_{ji} - p_{ii}} \times 100\% \quad (6)$$

In Equations (5) and (6):
k represents the number of classes;
$p_{ii}$ represents the number of pixels correctly predicted as belonging to class i;
$p_{ij}$ represents the number of pixels belonging to class i but incorrectly predicted as class j;
$p_{ji}$ represents the number of pixels belonging to class j but incorrectly predicted as class i.
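A minimal sketch of Equations (5) and (6) computed from a (k + 1) × (k + 1) confusion matrix (the example pixel counts are illustrative only):

```python
# Equations (5) and (6) from a (k+1) x (k+1) confusion matrix; conf[i, j] counts
# pixels of ground-truth class i predicted as class j.
import numpy as np

def cpa(conf, i):
    return conf[i, i] / conf[:, i].sum() * 100.0      # Eq. (5) for class i

def miou(conf):
    ious = [conf[i, i] / (conf[i, :].sum() + conf[:, i].sum() - conf[i, i])
            for i in range(conf.shape[0])]
    return float(np.mean(ious)) * 100.0               # Eq. (6)

conf = np.array([[9500, 500],      # farmland pixels (illustrative counts)
                 [300, 9700]])     # headland pixels
print(cpa(conf, 0), miou(conf))
```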

3.3.3. Segmentation Model Performance Testing and Analysis

The segmentation network was subjected to model training and testing within a high-performance server environment, configured as detailed in Section 3.2.3 for the classification network model.
During the training of the DeeplabV3+ model, the resolution of images from the headland segmentation dataset was scaled down to 513 × 513 pixels to expedite the computation speed of the deep learning algorithm. Furthermore, data augmentation techniques including random flipping, cropping, and padding were employed, while transfer learning was utilized to boost the overall efficiency of the network model’s training. Considering the dataset’s labels comprised only two classes, the training period was established at a relatively brief 100 iterations, with an initial learning rate of 0.0005 and a batch size of 16. To further reduce the training duration and facilitate faster model convergence, the Adam optimization algorithm [24] was employed.
For the performance evaluation of the MV3-DeeplabV3+ network, both the SegNet and U-Net network models were trained using the headland segmentation training set, facilitating a comparative analysis with the proposed MV3-DeeplabV3+. The performance of the three segmentation network models was assessed on a platform equipped with an Intel i5-8250U CPU, and the comparative results are presented in Table 4. Among the models, the SegNet network exhibited lower CPA and MIoU values. The recognition speeds of the three network models varied by less than 0.1 s, with MV3-DeeplabV3+ demonstrating the fastest performance. While the U-Net network model required less memory, its CPA and MIoU values were inferior compared to those of MV3-DeeplabV3+, and it displayed more pronounced fluctuations in loss values during training. Figure 5 demonstrates that, after 100 iterations, the loss values of both U-Net and MV3-DeeplabV3+ networks decreased to a certain level and subsequently stabilized. The loss curve reveals that MV3-DeeplabV3+ experienced the smallest loss, and its decline in loss values began to plateau from the 20th epoch onwards, indicating a minimal prediction error and effective training.
The segmentation training outcomes indicate that the MV3-DeeplabV3+ network model, utilizing MobileNetV3 as its backbone network, exhibits rapid recognition speed and a smaller memory footprint, while concurrently maintaining superior performance. These attributes fulfill the requirements for model deployment on embedded devices. Through transfer learning, data augmentation, and hyperparameter tuning, the network was optimized, yielding high CPA and MIoU values for the trained model. This outcome indicates the model’s capability to segment different areas of typical headland with considerable accuracy.

3.4. Ranging the Headland Boundary Based on Binocular Vision

Binocular stereo vision technology [25] finds application in various sectors, including mining production [26] and medical machinery [27]. Compared to indoor environments, the bumpy movement of agricultural machinery poses increased challenges for distance measurement using binocular vision technology. Nonetheless, significant advancements have been achieved in detecting obstacles in farmland [28,29] and extracting crop harvesting boundaries [30]. This paper utilizes binocular vision technology for the extraction of headland boundaries, based on segmented model image detection.
The segmentation model classifies each pixel in the input headland image, assigning a label value to each, which results in the creation of a pixel label matrix for the entire image. By traversing the matrix columnwise, the coordinates at which the label value first changes are identified. These coordinates correspond to the boundary pixels delineating the farmland from the headland in the image.
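A sketch of this column-wise boundary extraction is shown below; the 0/1 label convention mirrors the two annotated classes.

```python
# Column-wise boundary extraction from the predicted label map
# (0 = farmland, 1 = headland).
import numpy as np

def extract_boundary(label_map):
    """Return a list of (row, col) pixels where the label first changes in each column."""
    boundary = []
    for col in range(label_map.shape[1]):
        changes = np.flatnonzero(np.diff(label_map[:, col]) != 0)
        if changes.size:                          # first label transition in this column
            boundary.append((int(changes[0]) + 1, col))
    return boundary
```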
Figure 6 illustrates the use of parallel binocular cameras, which capture the left and right camera images of the headland simultaneously from two distinct viewpoints. The disparity value between the imaging planes of the left and right cameras is calculated, followed by the computation of the distance between the headland boundary and the camera using triangulation. This process yields the three-dimensional coordinate information of the boundary. To measure the distance of the boundary pixels in the image captured by the left camera of the binocular system, it is essential to locate the corresponding pixel in the image taken by the right camera. Following this, the principle of camera imaging is employed to calculate the distance between the pixel and the camera. Pixel matching is a crucial step in the process of binocular ranging. In this study, the measurement of the headland boundary is carried out using the block-matching (BM) binocular stereo vision matching algorithm provided by the OpenCV library. The BM algorithm [31], a local stereo matching algorithm, is renowned for its high-speed matching capabilities. It operates on the concept of support windows, which assumes uniform disparity for pixels within each window. The process begins with setting a small window, followed by searching the target image through traversing feature vectors within this window. Subsequently, the similarity between various windows encountered during traversal and the initial window is calculated. The window with the highest similarity is then selected as the final matching result. Figure 7 presents the actual headland test results.
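A hedged sketch of this ranging step with OpenCV's StereoBM matcher and the standard triangulation Z = f·B/d is given below; the matcher parameters, focal length, and baseline are illustrative, not the calibrated values of the camera used in this study.

```python
# Block-matching stereo disparity and triangulation for the boundary pixels;
# numDisparities/blockSize, focal length, and baseline are illustrative values,
# and the input images are assumed to be rectified 8-bit grayscale.
import cv2
import numpy as np

bm = cv2.StereoBM_create(numDisparities=128, blockSize=15)

def boundary_distance(left_gray, right_gray, boundary, focal_px, baseline_m):
    disp = bm.compute(left_gray, right_gray).astype(np.float32) / 16.0  # fixed-point -> pixels
    depths = [focal_px * baseline_m / disp[r, c]                        # Z = f * B / d
              for r, c in boundary if disp[r, c] > 0]
    return float(np.mean(depths)) if depths else None
```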

3.5. Development of Application Software Based on the Onboard Terminal

In this study, the C++ programming language was utilized, and the application program was developed using QT version 5.9.9. The capability to invoke model weight files trained using TensorFlow was facilitated through the deep neural network (DNN) module in the OpenCV version 4.5.4 dynamic link library. The software’s executable (exe) installation package was generated using the windeployqt packaging tool provided by QT and subsequently deployed on the onboard terminal.
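The deployed application is written in C++/QT; the Python sketch below shows the equivalent OpenCV DNN calls for loading and running a frozen TensorFlow model (file names and input size are illustrative).

```python
# Loading and running a frozen TensorFlow model with the OpenCV DNN module
# (file names and input size are illustrative; the onboard app calls the same
# module from C++).
import cv2

net = cv2.dnn.readNetFromTensorflow("headland_classifier_frozen.pb")
img = cv2.imread("frame.jpg")
blob = cv2.dnn.blobFromImage(img, scalefactor=1.0 / 255, size=(512, 512), swapRB=True)
net.setInput(blob)
scores = net.forward()    # e.g. softmax scores for the farmland / headland classes
```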
The software designed for farmland headland environment perception and headland boundary ranging encompasses various functional modules, including video image acquisition, video frame capture, image pre-processing, binocular camera calibration, headland image determination, headland boundary extraction, depth map generation, and ranging. The software’s workflow is depicted in Figure 8.

4. Field Tests and Result Analysis

4.1. Test Environment and Conditions

To confirm the real-time performance and accuracy of headland boundary ranging in a farmland application environment, field tests were carried out at the Xiaotangshan National Precision Agriculture Demonstration Base.
During field operations, the speed of agricultural machinery is generally slow, ranging typically between 1.3 m/s and 2.2 m/s. Upon detecting the headland boundary, the machinery decelerates, followed by the lifting of implements and executing turns within the designated headland area. In our field tests, the machinery maintained a uniform speed of 1.3 m/s.
Under the outdoor test conditions of this study, the binocular ranging system keeps the error within 10% up to a maximum measurement distance of 40 m; the measurement error grows as the measurement distance increases. Taking these factors into account, to fulfill the ranging needs of agricultural machinery and enhance measurement accuracy, accuracy tests were conducted at reference distances of 25 m, 20 m, and 15 m. For each specified distance, 10 depth measurements were executed and then averaged to ascertain the headland distance.

4.2. Evaluation Metrics

In the field tests, four key metrics were employed for evaluation: classification time, average reliability, ranging time, and relative error of distance.
(1)
Classification time ($T_c$).

$$T_c = T_{c1} + T_{c2} + T_{c3} + T_{c4}$$

where:
$T_{c1}$: time to load category names and extract the required labels;
$T_{c2}$: time to load the pre-trained headland binary classification network model;
$T_{c3}$: time to load the image, including the time to convert the image into the input format for the deep learning model;
$T_{c4}$: time to pass the image through the model for forward propagation and obtain the output data.
(2)
Average reliability ($\overline{\mathrm{reliability}}$). Calculated as the mean of the Softmax probabilities for each test, which represent the likelihood of the model predicting the label with the highest score.

$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{c=1}^{C} e^{z_c}}$$

$$\overline{\mathrm{reliability}} = \frac{1}{10}\sum_{j=1}^{10} \mathrm{Softmax}(j)$$

where:
$z_i$: the output value of the i-th node;
C: the number of output nodes, i.e., the number of classification categories. The Softmax function converts the output values of the binary classification into a probability distribution within the range [0, 1];
j: the index of the test.
(3)
Ranging time ($T_d$).

$$T_d = T_s + T_x + T_a$$

where:
$T_s$: time to load the label file, segmentation network model, and image and to obtain output results;
$T_x$: time for stereo image correction and pixel matching;
$T_a$: time to extract boundary pixel depth information and calculate the average distance.
(4)
Relative error of distance ($P_{re}$).

$$P_{re} = \frac{D_{mea} - D_{calc}}{D_{mea}} \times 100\%$$

where $D_{mea}$ is the actual reference distance from the machinery to the headland, and $D_{calc}$ is the distance detected by the algorithm.

4.3. Result Analysis

Ranging tests were carried out in three different farmland headland environments: weedy, green vegetation, and ditch/ridge. The results including classification time, average reliability, ranging time, and relative error of distance at distances of 25 m, 20 m, and 15 m are presented in Table 5, Table 6 and Table 7.
As indicated in Table 5, Table 6 and Table 7, at distances of 25 m, 20 m, and 15 m, the average classification time is approximately 0.6 s and the average reliability is approximately 94%, displaying minimal variation across distances and scenarios. This suggests that the classification network model is stable and perceives the farmland headland environment in a timely manner. The relative errors of distance at the three reference distances are 6.72%, 4.80%, and 4.35%, respectively. At greater distances to the headland, i.e., larger reference distances, the disparities of distinct image points become more similar, which increases the measurement error; conversely, the closer the machinery is to the headland, the smaller the relative error becomes. Observations of the relative error of distance across the three scenarios indicate that for ditches/ridges, characterized by a clear and easily determinable boundary, the measured distance is relatively more accurate. For weedy headland images, the measurement error tends to be higher due to the environment's complexity, the presence of numerous disturbances, and a blurred boundary. The green vegetation headland environment, being moderately complex, yields a measurement error between the other two image types.
In actual farmland environments, agricultural machinery typically operates at speeds of 1.3 to 2.2 m/s with system response times of 1 to 2 s. The average ranging time of approximately 2.0 s allows ranging to be completed within 15 m of the headland, providing effective auxiliary turn control and thereby verifying the effectiveness of the employed ranging method.
In conclusion, the relative error and ranging time associated with the farmland headland environment perception and headland-boundary-ranging method utilized in this study meet the requirements for autonomous headland-ranging of agricultural machinery in actual farmland environments.

5. Conclusions

A system dedicated to the identification and ranging of a headland boundary for autonomous agricultural machinery has been designed and developed. A method for detecting a dry farmland headland boundary utilizing deep learning was proposed, encompassing three main processes: farmland and headland image classification, headland image segmentation and headland boundary extraction, and distance measurement. This study constructed an annotated dataset for farmland headland images and employed the compact MobileNetV3 network architecture for the binary classification of farmland and headland images. An enhanced MV3-DeeplabV3+ image segmentation network was proposed, which reduces computation by incorporating an attention mechanism, demonstrating high efficiency in segmenting headland images. It achieved an MIoU of 92.08%, an average recognition speed of 1.26 s, and a memory usage of 38.45 MB. Headland boundaries were extracted from segmented images, and depth information was acquired by stereo matching the left and right images from a binocular camera, culminating in the accurate measurement of the distance to the headland.
Field tests revealed that the system’s error reduces as the detection distance decreases, exhibiting average relative errors of distance of 6.72%, 4.80%, and 4.35% at headland distances of 25 m, 20 m, and 15 m, respectively. When deployed on an embedded onboard terminal for headland boundary ranging, the classification time is a mere 0.6 s, satisfying the need for real-time inference. The ranging time approximates 2.0 s, meeting the requirements for real-time detection of a dry farmland headland boundary. Additionally, this method is suitable for various farmland headland environments, demonstrating strong generalization capability.
In conclusion, the headland-boundary-ranging method introduced in this paper adequately addresses the environmental perception requirements of autonomous agricultural machinery at the headland. It is aptly suited for integration with agricultural machinery onboard terminals, offering a novel research approach to addressing complex scene recognition in agriculture. While the headland boundary identification and ranging system effectively balances real-time performance and accuracy, its efficacy is somewhat dependent on the hardware system. The recognition speed has not attained the level of the traditional image recognition methods cited in the literature [5], indicating the need for further optimization in future studies.

Author Contributions

Conceptualization, H.L. and Z.M.; methodology, H.L. and L.M.; validation, L.M. and K.L.; formal analysis, L.M. and H.L.; writing—original draft preparation, H.L., L.M. and K.L.; writing—review and editing, H.L. and K.L.; supervision, Z.M. and H.L.; project administration, Z.M. and H.L.; funding acquisition, Z.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Nature Science Foundation of China, Grant No. 31971800.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Han, S.; He, Y.; Fang, H. Recent development in automatic guidance and autonomous vehicle for agriculture: A review. J. Zhejiang Univ. (Agric. Life Sci.) 2018, 44, 381–391+515.
2. Bai, Y.; Zhang, B.; Xu, N.; Zhou, J.; Shi, J.; Diao, Z. Vision-based navigation and guidance for agricultural autonomous vehicles and robots: A review. Comput. Electron. Agric. 2023, 205, 107584.
3. Xue, J.L.; Grift, T.E. Agricultural robot turning in the headland of corn fields. Appl. Mech. Mater. 2011, 63, 780–784.
4. Tu, X.; Tang, L. Headland turning optimisation for agricultural vehicles and those with towed implements. J. Agric. Food Res. 2019, 1, 100009.
5. Qiao, W.; Hui, L.; Pengshu, Y.; Zhijun, M. Detection method of headland boundary line based on machine vision. Trans. Chin. Soc. Agric. Mach. 2020, 51, 18–27.
6. Xie, B.; Wu, Z.; Mao, E. Development and prospect of key technologies on agricultural tractor. Trans. Chin. Soc. Agric. Mach. 2018, 49, 1–17.
7. Olcay, E.; Rui, X.; Wang, R. Headland Turn Automation Concept for Tractor-Trailer System with Deep Reinforcement Learning. In Proceedings of the 2023 IEEE 19th International Conference on Automation Science and Engineering (CASE), Auckland, New Zealand, 26–30 August 2023; pp. 1–7.
8. Li, Y. Research and application of deep learning in image recognition. In Proceedings of the 2022 IEEE 2nd International Conference on Power, Electronics and Computer Applications (ICPECA), Shenyang, China, 21–23 January 2022; pp. 994–999.
9. Qiao, Y.; Yang, P.; Meng, Z.; Liu, H. Detection System of Headland Boundary Line Based on Machine Vision. J. Agric. Mech. Res. 2022, 44, 24–30.
10. Hong, Z.; Li, Y.; Lin, H.; Liu, C. Field Boundary Distance Detection Method in Early Stage of Planting Based on Binocular Vision. Trans. Chin. Soc. Agric. Mach. 2022, 53, 27–33+56.
11. Gao, G.; Guo, H.; Zhang, J.; Zhang, Z.; Wu, T.; Lu, H.; Qiu, Z.; Chen, H.; Lingxuan, Z. An efficient headland-turning navigation system for a safflower picking robot. J. Agric. Eng. 2023, 54, 1539.
12. Yang, L.; Li, Y.; Chang, M.; Xu, Y.; Hu, B.; Wang, X.; Wu, C. Recognition of field roads based on improved U-Net++ Network. Int. J. Agric. Biol. Eng. 2023, 16, 171–178.
13. Li, Y.; Iida, M.; Suyama, T.; Suguri, M.; Masuda, R. Implementation of deep-learning algorithm for obstacle detection and collision avoidance for robotic harvester. Comput. Electron. Agric. 2020, 174, 105499.
14. Kim, W.-S.; Lee, D.-H.; Kim, T.; Kim, G.; Kim, H.; Sim, T.; Kim, Y.-J. One-shot classification-based tilled soil region segmentation for boundary guidance in autonomous tillage. Comput. Electron. Agric. 2021, 189, 106371.
15. He, Y.; Zhang, X.; Zhang, Z.; Fang, H. Automated detection of boundary line in paddy field using MobileV2-UNet and RANSAC. Comput. Electron. Agric. 2022, 194, 106697.
16. Li, D.; Li, B.; Long, S.; Feng, H.; Wang, Y.; Wang, J. Robust detection of headland boundary in paddy fields from continuous RGB-D images using hybrid deep neural networks. Comput. Electron. Agric. 2023, 207, 107713.
17. Qiao, Y.; Liu, H.; Meng, Z.; Chen, J.; Ma, L. Method for the automatic recognition of cropland headland images based on deep learning. Int. J. Agric. Biol. Eng. 2023, 16, 216–224.
18. Dwork, C.; Feldman, V.; Hardt, M.; Pitassi, T.; Reingold, O.; Roth, A. The reusable holdout: Preserving validity in adaptive data analysis. Science 2015, 349, 636–638.
19. Wang, X.; Zhao, L. Binocular vision system research based on MATLAB and OpenCV. Digit. Commun. World 2019, 2, 46–47.
20. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Adam, H.; et al. Searching for MobileNetV3. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019.
21. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2818–2826.
22. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
23. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
24. Kingma, D.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
25. Poggi, M.; Tosi, F.; Batsos, K.; Mordohai, P.; Mattoccia, S. On the synergies between machine learning and binocular stereo for depth estimation from images: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 5314–5334.
26. Zhang, L.; Hao, S.; Wang, H.; Wang, B.; Lin, J.; Sui, Y.; Gu, C. Safety Warning of Mine Conveyor Belt Based on Binocular Vision. Sustainability 2022, 14, 13276.
27. Hu, J.; Sun, Y.; Li, G.; Jiang, G.; Tao, B. Probability analysis for grasp planning facing the field of medical robotics. Measurement 2019, 141, 227–234.
28. Meng, Z.; Yan, B.; Yin, Y.; Wang, Q.; Liu, H.; Ling, L. Machine vision obstacle detection method in unstructured farmland environment. J. Intell. Agric. Mech. 2021, 2, 1–6.
29. Yang, P.; Liu, H.; Wang, X.; Wang, Q.; Meng, Z. Three-dimensional Information Detection Method for Crop Seedling Obstacles Based on Binocular Vision. J. Agric. Mech. Res. 2021, 43, 11–16.
30. Wei, X.; Zhang, M.; Liu, Q.; Li, L. Extraction of Crop Height and Cut-edge Information Based on Binocular Vision. Trans. Chin. Soc. Agric. Mach. 2022, 53, 225–233.
31. Li, X.; Chen, S.; Xiao, H.; Huang, D. 3D Reconstruction Analysis Based on SGBM Algorithm and BM Algorithm. Autom. Inf. Eng. 2019, 40, 6–12.
Figure 1. Technology roadmap.
Figure 2. Examples of farm field images.
Figure 3. Schematic of the hardware system.
Figure 4. Network model structure of MV3-DeeplabV3+.
Figure 5. Comparison of network model training loss changes.
Figure 6. Left and right camera imaging schematic. Note: 'Ol' and 'Or' are the center points of the left and right cameras, respectively. 'Pl' and 'Pr' are the imaging points of the left and right cameras, respectively. 'z' represents the depth distance.
Figure 7. Actual headland test results.
Figure 8. Software flowchart.
Table 1. Distribution of categorized datasets for farmland and headland images.

| Number | Image Type | Training Set | Augmented Training Set | Validation Set | Test Set |
|---|---|---|---|---|---|
| 0 | Farmland | 2200 | 11,000 | 550 | 550 |
| 1 | Headland | 4200 | 21,000 | 1050 | 1050 |
| Total | | 6400 | 32,000 | 1600 | 1600 |
Table 2. MobileNetV3-Large network architecture.

| Input | Operator | Exp Size | Out | SE | NL | Stride |
|---|---|---|---|---|---|---|
| 512 × 512 × 3 | conv2d | - | 16 | - | HS | 2 |
| 256 × 256 × 16 | bneck, 3 × 3 | - | 16 | - | RE | 1 |
| 256 × 256 × 16 | bneck, 3 × 3 | 64 | 24 | - | RE | 2 |
| 128 × 128 × 24 | bneck, 3 × 3 | 72 | 24 | - | RE | 1 |
| 128 × 128 × 24 | bneck, 5 × 5 | 72 | 40 | ✓ | RE | 2 |
| 64 × 64 × 40 | bneck, 5 × 5 | 120 | 40 | ✓ | RE | 1 |
| 64 × 64 × 40 | bneck, 5 × 5 | 120 | 40 | ✓ | RE | 1 |
| 64 × 64 × 40 | bneck, 3 × 3 | 240 | 80 | - | HS | 2 |
| 32 × 32 × 80 | bneck, 3 × 3 | 200 | 80 | - | HS | 1 |
| 32 × 32 × 80 | bneck, 3 × 3 | 184 | 80 | - | HS | 1 |
| 32 × 32 × 80 | bneck, 3 × 3 | 184 | 80 | - | HS | 1 |
| 32 × 32 × 80 | bneck, 3 × 3 | 480 | 112 | ✓ | HS | 1 |
| 32 × 32 × 112 | bneck, 3 × 3 | 672 | 112 | ✓ | HS | 1 |
| 32 × 32 × 112 | bneck, 5 × 5 | 672 | 160 | ✓ | HS | 1 |
| 32 × 32 × 112 | bneck, 5 × 5 | 672 | 160 | ✓ | HS | 2 |
| 16 × 16 × 160 | bneck, 5 × 5 | 960 | 160 | ✓ | HS | 1 |
| 16 × 16 × 160 | conv2d, 1 × 1 | - | 960 | - | HS | 1 |
| 16 × 16 × 960 | pool, 7 × 7 | - | - | - | HS | - |
| 1 × 1 × 960 | conv2d, 1 × 1, NBN | - | 1280 | - | HS | 1 |
| 1 × 1 × 1280 | conv2d, 1 × 1, NBN | - | k | - | - | - |

Note: '✓' indicates that the block contains a squeeze-and-excite module; 'NBN' indicates no batch normalization.
Table 3. Performance comparison of three classification networks.

| Network | Accuracy (%) | Recognition Speed (s/Image) | Memory Usage (MB) | Training Time (h) |
|---|---|---|---|---|
| MobileNetV3-Large | 99.2 | 1.45 | 26.30 | 7 |
| InceptionV3 | 98.3 | 4.49 | 45.00 | 11 |
| MobileNetV2 | 98.9 | 1.90 | 28.95 | 7 |
Table 4. DeeplabV3+ network performance testing.

| Network | Memory Usage (MB) | CPA (%) | MIoU (%) | Training Time (h) | Recognition Speed (s) |
|---|---|---|---|---|---|
| MV3-DeeplabV3+ | 38.45 | 97.03 | 92.08 | 79 | 1.26 |
| U-Net | 25.36 | 96.19 | 90.20 | 75 | 1.35 |
| SegNet | 30.88 | 92.20 | 85.95 | 83 | 1.33 |
Table 5. The measurement result at a distance of 25 m.

| Headland Category | Classification Time (s) | Average Reliability (%) | Ranging Time (s) | Relative Error of Distance (%) |
|---|---|---|---|---|
| Weedy | 0.627 | 93.20 | 1.945 | 8.12 |
| Green vegetation | 0.620 | 96.26 | 2.076 | 6.60 |
| Ditch/Ridge | 0.623 | 90.91 | 1.988 | 5.44 |
| Average | 0.623 | 93.46 | 2.003 | 6.72 |
Table 6. The measurement result at a distance of 20 m.

| Headland Category | Classification Time (s) | Average Reliability (%) | Ranging Time (s) | Relative Error of Distance (%) |
|---|---|---|---|---|
| Weedy | 0.626 | 94.02 | 1.985 | 5.10 |
| Green vegetation | 0.624 | 96.55 | 1.954 | 4.90 |
| Ditch/Ridge | 0.623 | 92.02 | 2.089 | 4.40 |
| Average | 0.624 | 94.20 | 2.009 | 4.80 |
Table 7. The measurement result at a distance of 15 m.

| Headland Category | Classification Time (s) | Average Reliability (%) | Ranging Time (s) | Relative Error of Distance (%) |
|---|---|---|---|---|
| Weedy | 0.626 | 93.75 | 1.918 | 5.00 |
| Green vegetation | 0.620 | 96.78 | 1.976 | 4.33 |
| Ditch/Ridge | 0.622 | 91.98 | 2.088 | 3.73 |
| Average | 0.623 | 94.17 | 1.994 | 4.35 |