Article

Performance Comparison of Feature Detectors on Various Layers of Underwater Acoustic Imagery

1 School of Ocean Engineering, Harbin Institute of Technology, Weihai 264200, China
2 School of Computer Science and Technology, Harbin Institute of Technology, Weihai 264200, China
3 College of Engineering, Peking University, Beijing 100871, China
* Authors to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2022, 10(11), 1601; https://doi.org/10.3390/jmse10111601
Submission received: 6 September 2022 / Revised: 14 October 2022 / Accepted: 18 October 2022 / Published: 31 October 2022
(This article belongs to the Special Issue Application of Sensing and Machine Learning to Underwater Acoustic)

Abstract

Image feature matching is essential in many computer vision applications, and the foundation of matching is feature detection, which is a crucial feature quantification process. This manuscript focuses on detecting more features from underwater acoustic imagery for further ocean engineering applications of autonomous underwater vehicles (AUVs). Currently, the mainstream feature detection operators were developed for optical images, and there is not yet a feature detector oriented to underwater acoustic imagery. To better analyze the suitability of existing feature detectors for acoustic imagery, and to support the future development of an operator that can robustly detect feature points in underwater images, this manuscript compares the performance of well-established handcrafted feature detectors with that of the increasingly popular deep-learning-based detectors, filling a gap in the literature. The datasets tested come from the most commonly used side-scan sonars (SSSs) and forward-looking sonars (FLSs). Additionally, running these detectors on the phase congruency (PC) layer of the acoustic imagery is proposed, with the aim of finding a solution that balances detection accuracy and speed. The experimental results show that the ORB (Oriented FAST and Rotated BRIEF) and BRISK (Binary Robust Invariant Scalable Keypoints) detectors achieve the best overall performance, the FAST detector is the fastest, and the PC and Sobel layers are the most favorable for implementing feature detection.

1. Introduction

In recent years, human and industrial development has become more closely tied to the ocean, with more infrastructure built off the coast and more renewable energy obtained from the deep sea. To explore the ocean more safely and efficiently, AUVs are becoming the predominant underwater exploration equipment; they can effectively replace divers for port and offshore infrastructure maintenance and can also perform exploration tasks in uncharted waters. When AUVs perform exploration tasks in deep water, sonar sensors play an important role, with the forward-looking sonar (FLS) helping AUVs avoid obstacles for their safety and the side-scan sonar (SSS) providing high-resolution views of the seafloor to help researchers understand the deep-sea operating area [1]. With the continuous development of the electronics industry and remote-sensing technology, the detection performance of sonar equipment has improved dramatically. The two main sonar types, FLS and SSS, can provide rich underwater acoustic imagery, and the number and resolution of these images are constantly increasing. It is therefore crucial and far-reaching to design an efficient vision-processing algorithm to extract information from underwater acoustic images, which, in turn, can serve deep-sea exploration activities. Image matching, as a principal technology in computer vision, applies graphics knowledge and the corresponding mathematical theory to analyze the correspondence of grayscale, content, structure, texture, and semantic information in one or more groups of images and, finally, to find similar image targets. Typical matching methods can be divided into region-matching and feature-point-matching approaches [2]. Image matching is essential groundwork for image analysis and processing and a fundamental prerequisite for tasks such as target tracking, visual navigation, map stitching, and 3D reconstruction [3].
The above tasks are bound to arise in practical ocean engineering, so developing an effective acoustic image-matching algorithm is both timely and of far-reaching significance. Underwater acoustic images differ from natural light images in that they have low contrast, a narrow grayscale range, high spatial correlation of adjacent pixels, and poor quality. Traditional matching algorithms designed for optical images, such as SIFT (scale-invariant feature transform) [4] and SURF (speeded-up robust features) [5], cannot effectively perform the matching task on acoustic images. The large variation in the size of underwater acoustic images and the small number of targets of interest make area-based matching methods unsuitable for underwater task scenarios; considering also the real-time requirements of underwater detection tasks, current mainstream research is based on feature-point-matching methods [6,7], which usually include feature detection, feature description, and outlier rejection. Among them, feature detection is the basis of the whole process, representing the salient structural information in the image. It also determines the number of correct matching pairs and the accuracy of the point positions in the two images to be matched. In recent years, more research results have been achieved in the feature description stage, but many bottlenecks remain unsolved in the feature detection stage. Meanwhile, compared with the development of optical image-matching technology, techniques for underwater acoustic imagery matching lag behind, mainly limited by the imaging shortcomings of underwater acoustic images and external factors such as the lack of samples available for research [8].
In recent years, with the development of the electronics industry and the update of sensor technology, the quantity and quality of underwater acoustic imageries available to researchers have been guaranteed. Meanwhile, the research on sonar image feature detection has gradually started to be emphasized. When a search is performed on the Web of Science using the subject term “sonar image feature detection” and setting the search date from 2000 to the present, the following report could be derived, as shown in Figure 1. There has been a significant increase in the number of publications and citation frequency of related research since 2013, which may be related to the rise of deep-learning techniques within the field of computer vision, where new techniques have brought opportunities for underwater acoustic imagery feature detection tasks.
In this paper, a framework for comparing feature detection algorithms is proposed to find a high-quality feature detection scheme and thereby fundamentally advance underwater acoustic image-matching technology to meet the needs of downstream tasks such as underwater visual navigation, seabed mapping, and environment reconstruction. Moreover, the relevant experimental reports are summarized to help newcomers to this field, as well as researchers who are proficient in image processing but unfamiliar with sonar images, participate in acoustic perception tasks. Finally, some potential applications of deep-learning methods in this research field are summarized.
The rest of the manuscript is organized as follows: Section 2 briefly describes the research related to sonar image matching and the recent developments; Section 3 presents the feature detection approach taken in this paper; Section 4 describes the experimental setup, including dataset introduction, evaluation criteria, and parameter settings; Section 5 describes and compares the experimental results; Section 6 discusses the experimental shortcomings and outlook; finally, conclusions are drawn in Section 7.

2. Related Work

In natural light images, feature points are generally selected from pixels that are easily distinguishable and visually salient, such as straight-line segment intersections, corner points, or local center-of-gravity points [2]. Compared with natural light images, underwater acoustic images have poor imaging quality, severe noise interference, grayscale distortion, and weak edge features, all of which reduce the efficiency of feature detectors originally designed for optical images. In addition, because of the significant differences between acoustic and optical imaging mechanisms, sonar images usually exhibit a narrow grayscale range, high spatial correlation of adjacent pixels, and insignificant grayscale variation, and there is no publicly available, mature scheme for preprocessing sonar images. As a result, feature matching on underwater acoustic images almost fails, as evidenced by the small number of detected feature points and the very low matching accuracy.
In the study of underwater acoustic image matching, Daniel et al. [8] proposed using acoustic shadow features and a segmentation algorithm to achieve SSS image matching, based on the geometric distribution characteristics of shadows in SSS images, the echoes, and their corresponding key position information. However, the authors used a manual combination of features rather than individual feature detectors, so the overall pipeline design lacked flexibility. Khater [9] proposed an SSS image feature detection method based on SUSAN (smallest univalue segment assimilating nucleus) and Harris corner information, which can achieve good detection when the sonar image features are stable and uniform. In general, however, sonar images contain a large amount of noise and distortion, and features are scarce and difficult to extract, so the generalization performance of this method is insufficient and its application space in underwater scenes is limited. P. Vanish [6] investigated the effect of the classical SIFT algorithm on SSS images. The results showed that when the image features are rich and stable, SIFT can accomplish SSS image-matching tasks; however, when there is apparent noise interference and image distortion in the sonar image, the matching effect of SIFT is also significantly reduced. Zhang [10] proposed a SIFT-like feature detector and descriptor, MBS-SIFT, that can robustly cope with noise interference in multibeam imaging sonar images and capture the features of underwater objects more accurately than the classical SIFT method. Tao [11] proposed a combined SSS image-matching method based on a priori position information, in which SURF detectors are used to detect feature points from the SSS image that are stable under affine transforms, and the results are finally corrected with the help of RANSAC (random sample consensus). The results show that the matching algorithm is time-consuming and highly accurate. However, this method depends on a priori position data, which limits its application in practical underwater scenarios. Shang [12] proposed a way to automatically match and mosaic SSS images: first, the image overlap region between adjacent strips is automatically determined from the sonar detection track lines and the width of the scanned strips; then, in the overlap region, the SURF detector is used to detect feature points; finally, the geographic coordinates of the feature points are used as constraints. This method uses multiple constraints to complete the matching between SSS image pairs, and the matching results are good but highly dependent on external information. Peter King [7] compared the performance of several classical handcrafted feature-matching algorithms on sonar images using a pre-collected dataset. The results show that the SIFT and SURF methods match well when the sonar images are feature-rich and stable, whereas in complex landscape backgrounds only the SURF method could barely complete the matching. However, the authors did not dissect the feature detection step in depth and only evaluated each algorithm in terms of the overall matching effect. Tueller [13] compared the performance of seven commonly used feature detection algorithms on synthetic sonar image datasets with different background attributes by training an SVM (support vector machine) classifier to evaluate the detectors.
However, the study only covered several traditional feature detectors and tested a single type of image. Ansari [14] investigated the detection effectiveness of two algorithms, SIFT and SURF, in underwater image matching, and the results showed that SIFT and SURF in feature extraction and matching of SSS images offer good performance; but this is limited to scenes where stable textures and independent targets exist, and their performance sharply degrades when hard sediments and rocks are encountered.
Although the above studies have achieved feature detection of sonar images in their respective assumed scenarios, it is evident that the detection is less than ideal, and there are strict requirements for the quality of the acoustic imageries themselves and the feature richness of the detected scenes, which reduce the generalization performances of these methods. When AUVs carry sonar equipment for underwater detection tasks, the idealized characteristics of acoustic images cannot be guaranteed in most cases due to the complexity and unknown nature of the underwater environment, so it is necessary to develop a robust acoustic imagery feature detection method, which is also the cornerstone for designing an overall algorithm for underwater acoustic imagery matching.
With the rapid development of deep-learning techniques within the field of image processing, CNN (convolutional neural network) based feature detectors have been generated, which can be used to extract shallow, in-depth, and combined multidimensional features in images to achieve feature point detection. Zhou [15] proposed to match underwater acoustic images using deep-learning-based detectors, feature descriptors, and style transfer algorithms, and the matching effect was quantified and visualized. The experimental results showed that even though the learned detectors and descriptors were obtained based on the optical image dataset training, they surpassed the traditional handcrafted operators, such as the SIFT, in terms of the overall effect. When the style transfer method was introduced, the matching results of acoustic images were further improved, and this study can provide a reference for the design of deep-learning-based matching algorithms for underwater acoustic images in the future.
The above review of related studies shows that feature-point-based matching methods are more robust and flexible and are currently the leading way to match underwater acoustic images. This kind of method can also work well when there are complex geometric transformations and external interference between two images; therefore, it has been more widely used in underwater acoustic imagery matching. However, most studies in the literature reuse existing optical matching methods, because it is very challenging to develop a high-precision acoustic matching algorithm with adequate generalization performance; it is therefore necessary to individually design the two stages of feature detection and feature description, each of which has a significant impact on the matching results. Obviously, the first step in this process is to find a high-performance feature detection scheme suited to the characteristics of underwater acoustic imagery.
According to the literature survey, no existing work independently analyzes feature detection on underwater acoustic imagery, which is the motivation of this manuscript. We noticed that Oliveira [16] studied the feature extraction of acoustic imagery involved in underwater navigation tasks, and the tested algorithms included SURF, ORB, and BRISK. Their work achieved preliminary results, but the conclusions are limited to underwater navigation tasks, and the evaluated methods are not comprehensive enough to generalize to other underwater tasks, such as 3D reconstruction or underwater mapping.
In contrast, the research in this paper has expanded with multiple applications, devices, detection methods, and evaluation methods, aiming to obtain a more comprehensive solution. Meanwhile, the work in this manuscript is more focused, as reflected by the fact that the research content only explores the feature detection aspect in depth, as it is the cornerstone of feature description and matching. In addition to filling gaps in the existing literature, the study of this paper has made a more valuable contribution: on the one hand, due to the scarcity of underwater acoustic data, the research on deep-learning-based matching methods is still not mature enough, and many algorithms need further experimentation and validation—this manuscript carried out valuable work in this area; on the other hand, the research pipeline designed is flexible and can be updated according to the actual mission requirements to cope with the complex and changing underwater environment information.
The overall research pipeline proposed in this manuscript is shown in Figure 2. First, underwater acoustic data, such as from FLS and SSS, are input and visualized as a set of digital images; meanwhile, the images can be pre-processed to generate regions of interest (ROIs). Second, the layers ready for feature detection, such as the grayscale, gradient, and PC layers, are extracted. Third, detectors, both handcrafted and deep-learning-based, are set up for feature extraction. Finally, the feature detection scheme best tailored to the current scene is used for downstream tasks such as matching, mapping, and tracking.
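A minimal sketch of this pipeline is given below; the helpers map_roi, extract_layer, and build_detector are hypothetical placeholders standing in for the ROI mapping, layer extraction, and detector setup steps described above, not code from this study:

```python
import cv2

def run_pipeline(image_path, layer="pc", detector_name="ORB"):
    """Sketch of the Figure 2 pipeline; helper names are illustrative only."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)   # 1. input acoustic image
    roi_mask = map_roi(img)                               # optional expert/automatic ROI mapping
    det_layer = extract_layer(img, kind=layer)            # 2. grayscale / gradient / PC layer
    detector = build_detector(detector_name)              # 3. handcrafted or learned detector
    keypoints = detector.detect(det_layer, None)          # 4. features for downstream tasks
    return keypoints, roi_mask
```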

3. Methodology

Current feature point-based matching methods usually use grayscale or gradient information to detect keypoints, which are a set of potential feature points. However, this kind of method is easily affected by contrast, grayscale distortion, and noise, which are negative factors commonly found in underwater detection scenarios, and they will make the detection of keypoints unstable and inaccurate. On the one hand, due to the wide noise distribution, some points will be incorrectly detected as keypoints, whereas some correct keypoints may not be detected; on the other hand, low contrast and grayscale distortion phenomena on acoustic images will directly affect the number of detected feature points. Inaccurate feature point detection results will directly affect the design of subsequent descriptors, affecting the final matching results.
Therefore, to address these problems, this manuscript proposed to apply the PC (phase congruency) principle to underwater acoustic imagery feature detection. The PC layer is not affected by image illumination and contrast, and is robust to noise interference; it also provides rich texture, edge, and structure information consistent with human vision, so using the PC layer in underwater acoustic imagery feature detection tasks has the suitable space and apparent advantages.
To more clearly and intuitively understand the effect of PC processing of underwater acoustic imagery, Figure 3 shows the PC layer and original grayscale layer of the raw SSS image while comparing them with the classical gradient layers, which involve the Sobel, Scharr, and Laplacian layers.

3.1. Phase Congruency Layer

Phase Congruency Solving

The principle of PC started in 1981 when Oppenheim and Lim [17] first revealed the effect of phase information on image features, and subsequent researchers have extended PC in various directions [18,19,20]. One of the more classical PC-solving algorithms in image processing is the PC based on the multiscale multidirectional log-Gabor filter proposed by Kovesi in 2003 [21]. The log-Gabor filter is expressed as Equation (1):
$$LG(\hat{w}) = \exp\left(-\frac{\left(\ln(\hat{w}/\hat{w}_0)\right)^2}{2\left(\ln(\sigma_{\hat{w}}/\hat{w}_0)\right)^2}\right) \tag{1}$$
Assuming that $I(x,y)$ denotes the underwater acoustic imagery, and $M^{e}_{n\bar{o}}$ and $M^{o}_{n\bar{o}}$ represent the even- and odd-symmetric log-Gabor wavelets at the $n$th scale and $\bar{o}$th direction, respectively, the two wavelet functions are convolved with the image signal to obtain the corresponding components $e_{n\bar{o}}(x,y)$ and $o_{n\bar{o}}(x,y)$, and the overall computational process is described as Equation (2):
$$\left[\, e_{n\bar{o}}(x,y),\; o_{n\bar{o}}(x,y) \,\right] = \left[\, I(x,y) * M^{e}_{n\bar{o}},\; I(x,y) * M^{o}_{n\bar{o}} \,\right] \tag{2}$$
The amplitude and phase of the underwater acoustic imagery after the wavelet transform at scale $n$ and direction $\bar{o}$ are as follows:
$$A_{n\bar{o}}(x,y) = \sqrt{e_{n\bar{o}}(x,y)^2 + o_{n\bar{o}}(x,y)^2} \tag{3}$$
$$\phi_{n\bar{o}}(x,y) = \arctan\left(o_{n\bar{o}}(x,y) / e_{n\bar{o}}(x,y)\right) \tag{4}$$
Combining the analysis results in each direction, the PC of the underwater acoustic imagery can be obtained as Equation (5):
$$PC_2(x,y) = \frac{\sum_{n}\sum_{\bar{o}} W_{\bar{o}}(x,y)\left\lfloor A_{n\bar{o}}(x,y)\,\Delta\Phi_{n\bar{o}}(x,y) - \hat{T}_{n\bar{o}}\right\rfloor}{\sum_{n}\sum_{\bar{o}} A_{n\bar{o}}(x,y) + \xi} \tag{5}$$
where $W_{\bar{o}}(x,y)$ indicates the two-dimensional (2D) frequency expansion weight factor, $\Delta\Phi_{n\bar{o}}(x,y)$ is the 2D phase deviation function, $\hat{T}_{n\bar{o}}$ is the estimated noise compensation term, $\xi$ is a small positive constant that avoids division by zero, and $\lfloor\cdot\rfloor$ indicates that the enclosed quantity is set to zero when it is negative.
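As a hedged illustration of Equations (1)-(5), the sketch below computes a simplified phase congruency map with NumPy. It uses a single radial log-Gabor bank with the Riesz transform supplying the odd-symmetric responses (a monogenic simplification) and reduces the weighting factor and noise term to constants, so it is an approximation of, not a substitute for, the multidirectional formulation above; all names are illustrative.

```python
import numpy as np

def phase_congruency_mono(img, nscale=4, min_wavelength=3, mult=2.1,
                          sigma_on_f=0.55, noise_T=1.0, eps=1e-4):
    """Simplified monogenic phase-congruency map in the spirit of Eqs. (1)-(5)."""
    img = img.astype(np.float64)
    rows, cols = img.shape
    IM = np.fft.fft2(img)

    # Normalised frequency grid
    U, V = np.meshgrid(np.fft.fftfreq(cols), np.fft.fftfreq(rows))
    radius = np.sqrt(U ** 2 + V ** 2)
    radius[0, 0] = 1.0                                   # avoid division by zero / log(0)

    # Riesz-transform filters provide the odd-symmetric (quadrature) responses
    H1 = 1j * U / radius
    H2 = 1j * V / radius

    sum_f = np.zeros((rows, cols)); sum_h1 = np.zeros((rows, cols))
    sum_h2 = np.zeros((rows, cols)); sum_amp = np.zeros((rows, cols))

    wavelength = min_wavelength
    for _ in range(nscale):
        w0 = 1.0 / wavelength                            # centre frequency of this scale
        lg = np.exp(-(np.log(radius / w0) ** 2) /
                    (2 * np.log(sigma_on_f) ** 2))       # radial log-Gabor, Eq. (1)
        lg[0, 0] = 0.0                                   # suppress the DC component
        f = np.real(np.fft.ifft2(IM * lg))               # even response e_n
        h1 = np.real(np.fft.ifft2(IM * lg * H1))         # odd responses o_n (Riesz x, y)
        h2 = np.real(np.fft.ifft2(IM * lg * H2))
        sum_f += f; sum_h1 += h1; sum_h2 += h2
        sum_amp += np.sqrt(f ** 2 + h1 ** 2 + h2 ** 2)   # A_n(x, y), Eq. (3)
        wavelength *= mult

    energy = np.sqrt(sum_f ** 2 + sum_h1 ** 2 + sum_h2 ** 2)
    return np.maximum(energy - noise_T, 0.0) / (sum_amp + eps)   # Eq. (5), W = 1
```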

3.2. Classic Gradient Layers

3.2.1. Sobel Layer Solving

The Sobel operator is a discrete differentiation operator used to compute the approximate gradient of the image grayscale function. Furthermore, it is also a joint Gaussian smoothing plus differentiation operator with solid noise resistance.
Assume an underwater acoustic image $I$; it is differentiated in both the horizontal and vertical directions by convolving $I$ with a Sobel kernel of size $K$. To balance accuracy and computation speed, $K$ is usually taken as 3. Let $G_x$ and $G_y$ denote the convolution results in the horizontal and vertical directions, respectively; they are calculated as
$$G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} * I \tag{6}$$
$$G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix} * I \tag{7}$$
Then, for any pixel on the underwater acoustic imagery, the gradient of its location can be calculated by using Equation (8).
$$G = \sqrt{G_x^2 + G_y^2} \tag{8}$$
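As a hedged illustration rather than the authors' exact code, the Sobel layer can be produced with OpenCV roughly as follows; the file name is only a placeholder:

```python
import cv2

# Hypothetical file name; any single-channel sonar image works here.
img = cv2.imread("sss_image.png", cv2.IMREAD_GRAYSCALE)

gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)   # horizontal derivative G_x, Eq. (6)
gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)   # vertical derivative G_y, Eq. (7)
grad = cv2.magnitude(gx, gy)                     # G = sqrt(G_x^2 + G_y^2), Eq. (8)
sobel_layer = cv2.convertScaleAbs(grad)          # back to 8-bit for the feature detectors
```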

3.2.2. Scharr Layer Solving

The Scharr operator enhances the derivative extraction capability of the Sobel operator, and its solution principle is similar to that of Sobel. In general, the size of the edge detection filter of the Scharr operator is set to $3 \times 3$, and the detection is likewise divided into two directions, calculated as Equations (9) and (10):
$$G_x = \begin{bmatrix} -3 & 0 & 3 \\ -10 & 0 & 10 \\ -3 & 0 & 3 \end{bmatrix} * I \tag{9}$$
$$G_y = \begin{bmatrix} -3 & -10 & -3 \\ 0 & 0 & 0 \\ 3 & 10 & 3 \end{bmatrix} * I \tag{10}$$
where $I$ represents the input underwater acoustic imagery, and $G_x$ and $G_y$ indicate the resulting image gradient values.

3.2.3. Laplacian Layer Solving

The Laplacian operator is a spatial sharpening filter; it is a second-order differential operator in n-dimensional Euclidean space with rotational invariance. This operator usually treats a pixel whose second-order derivative is 0 as an edge point. For the underwater acoustic imagery $I(x,y)$, the Laplacian of the image is
$$\nabla^2 I(x,y) = \frac{\partial^2 I}{\partial x^2} + \frac{\partial^2 I}{\partial y^2} \tag{11}$$
Taking the discrete second-order derivatives of the image $I(x,y)$ in the $x$ and $y$ directions, the second-order Laplacian operator can be expressed as follows:
$$\nabla^2 I(x,y) = I(x,y+1) + I(x,y-1) + I(x+1,y) + I(x-1,y) - 4I(x,y) \tag{12}$$
Finally, the filter mask of the Laplacian operator can be obtained from Equations (11) and (12) as
$$G_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix} \tag{13}$$
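In the same hedged spirit, the Scharr and Laplacian layers of Equations (9)-(13) can be obtained with OpenCV along these lines (the file name is again a placeholder):

```python
import cv2

img = cv2.imread("sss_image.png", cv2.IMREAD_GRAYSCALE)

# Scharr layer, Eqs. (9)-(10), combined as a gradient magnitude
gx = cv2.Scharr(img, cv2.CV_64F, 1, 0)
gy = cv2.Scharr(img, cv2.CV_64F, 0, 1)
scharr_layer = cv2.convertScaleAbs(cv2.magnitude(gx, gy))

# Laplacian layer; ksize=1 applies the 3x3 mask of Eq. (13)
laplacian_layer = cv2.convertScaleAbs(cv2.Laplacian(img, cv2.CV_64F, ksize=1))
```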
It can be seen from Figure 3 that the PC layer adapts well to the nonlinear features of underwater acoustic imagery. Specifically, it preserves the structural feature information in the raw acoustic imagery, retains edges more faithfully, and effectively resists noise interference. In contrast, the operators based on gradient information perform poorly: they do not adapt easily to the nonlinear gradient features of acoustic imagery and are more sensitive to noise interference. In the next section, this research quantitatively and qualitatively evaluates the performance of the feature detectors on the various layers through experiments.

4. Experiment

4.1. Experimental Method Setting

In terms of experimental method settings, several mainstream feature detection operators, including single-scale detection operators, multiscale detection operators, and deep-learning-based detectors, are selected in this manuscript, as depicted in Table 1. The purpose of the experiments is to fully explore the best performance of the detection operators of various modes on the underwater acoustic imagery layers and to verify whether the deep-learning-based feature detector will bring some opportunities for underwater acoustic imagery feature detection task.
The above-mentioned feature detectors have their own advantages; however, underwater scenes are often complex and variable, resulting in detected sonar images that may have noise interference, grayscale distortion, translation transformations, rotation transformations, scale transformations, mixed rotation and scale transformations, viewpoint transformations, and slight distortions, which make it tough for a single detector to cope with. What is worse, these transformations are often nonlinear, irregular, and difficult to describe in mathematical language. Therefore, it is necessary to further explore a scheme for underwater acoustic detection scenarios that should concentrate on the advantages of existing detection operators and be able to effectively deal with the limitations of images themselves while meeting the real-time needs of underwater applications.
The detectors and layers set up for the experiments in this research are separately shown in Table 1 and Table 2. The detectors for sequences 1 to 9 are implemented using the benchmark OpenCV library [22], and a KeyNet [23] detector for sequence 10 is implemented using the benchmark Kornia library [24], both with default parameters to facilitate comparison and update for future researchers. In terms of the detection scene setting, three types of feature detection layers are selected, which are the grayscale, gradient, and PC layers, where the gradient layers are computed by the Sobel, Scharr, and Laplacian operators.
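As a sketch of that setup (not the authors' exact script), the OpenCV detectors of Table 1 can be instantiated with default parameters as follows; SURF additionally requires an opencv-contrib build with the nonfree modules enabled, and SAR-SIFT and the Kornia-based KeyNet are omitted here:

```python
import cv2

# Default-parameter detector setup (a sketch; thresholds of some detectors
# were fine-tuned in the experiments, see Section 4.3).
detectors = {
    "AKAZE": cv2.AKAZE_create(),
    "BRISK": cv2.BRISK_create(),
    "FAST": cv2.FastFeatureDetector_create(),
    "ORB": cv2.ORB_create(),
    "Shi-Tomasi": cv2.GFTTDetector_create(),
    "SIFT": cv2.SIFT_create(),
}

layer = cv2.imread("pc_layer.png", cv2.IMREAD_GRAYSCALE)  # any detection layer (example file name)
for name, det in detectors.items():
    keypoints = det.detect(layer, None)
    print(f"{name}: {len(keypoints)} keypoints")
```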

4.2. Dataset

To evaluate the detection accuracy and processing time of different detectors on various layers, a random dataset of underwater acoustic imageries was selected in this paper. These images were obtained from various types of SSS and FLS sensors, and they are depicted in Figure 4. For the internal target richness, the dataset can be classified into simple and complex scenes; in terms of external factors, it can be classified into general, scale-transformed, noise-interfered, grayscale-transformed, and other phenomena. The division of this dataset is specified in Table 3, and the test samples are described as follows:
  • Image 1 was provided by the T-SEA Marine Technology company [32] and originated from an SSS device; the image size is 112 × 186. The content of this image is a wreck site, without any pre-processing of the image.
  • Image 2 is a scale transformation of image 1 with a scale-transformation factor of 2. The transformed image size is 224 × 372, and the purpose of the transformation is to check the scale-invariant performances of the detectors.
  • Image 3 was provided by Peng Cheng Laboratory (PC LAB) [33] from an SSS device. The image size is 429 × 424, and the image content is a “ridge-like zone” on the seafloor. This image is used to test the detection ability of the detector for protruding targets in a monotonous background, without any pre-processing of the image.
  • Image 4 was provided by PC LAB from an SSS device. The image size is 337 × 425, and the image content is a seafloor protrusion with severe acoustic shadowing. This image is used to check the detection capability of the detector for targets with severe acoustic shadowing in a sandy background, without any processing of the image.
  • Image 5 was provided by PC LAB from an SSS device. The image size is 210 × 305, and the image content is a depressed area on the seafloor. This image is used to check the detection capability of the detector for depressed targets in a monotonous background, without any pre-processing of the image.
  • Image 6 was provided by PC LAB from an SSS device. The image size is 245 × 264, and the image content is a tubular object on the seafloor. This image is used to check the detection capability of the detector for tubular targets in a monotonous background, without any pre-processing of the image.
  • Image 7 was provided by PC LAB from an SSS device. The image size is 286 × 258, and the image content is a submarine tubular target with significant grayscale distortion. This image is used to check the detector’s detection capability for tubular targets under the influence of grayscale distortion, without any pre-processing of the image.
  • Image 8 was provided by PC LAB from an SSS device. The image size is 495 × 374, and the image content is a rocky area of the seafloor with strong noise interference. The image was denoised using a median filter in order to check the detector’s feature extraction capability under strong noise interference.
  • Image 9 is the original image of image 8 without any pre-processing of the image, the purpose of which is to compare with the test results of image 8.
  • Image 10 was provided by PC LAB from an SSS device. The image size is 446 × 371, and the image content is a seabed area with a lot of noise interference. The image is intended to complement the test scenes of test images 8 and 9. The targets in image 10 are mostly raised, and no pre-processing of the image has been performed.
  • Image 11, from [34], is an FLS image with an image size of 116 × 127. The image contains a tire laid underwater, and the image has significant noise interference. This image is used to check the detection ability of the detector under substantial noise interference, and the image is denoised using a median filter.
  • Image 12 is the original image of image 11, without any pre-processing of the image, which is intended to be compared with the test results of image 11.
  • Image 13, from [34], is an FLS image with an image size of 44 × 50. The image content is a steel frame laid underwater, without any pre-processing of the image.
  • Image 14 is obtained based on image 13 by scale transformation with a scale factor of 2. The transformed image size is 88 × 100. The purpose of the transformation is to check the scale-invariant performances of the detectors.
  • Image 15 is from the Sound Metrics database [35] and is a DIDSON (dual-frequency identification sonar) image with an image size of 351 × 359. The image content is a floor tile underwater, and the image is used to check the feature extraction capability of the detectors for DIDSON images without any pre-processing of the image.
  • Image 16 was provided by DeepVision AB company [36] from an SSS image device with an image size of 800 × 400. The image content is a low-texture, low-contrast depressed region of the seafloor. The image was designed to check the detectors’ feature extraction capability for low contrast and texture areas without any pre-processing of the image.
Next, some representative underwater acoustic imageries are taken to demonstrate the display of each layer to help the reader visualize the information represented by various layers. The demonstration results are described in Figure 5 and Figure 6.
Finally, the feature detectors in Table 1 were used for three gradient layers, one PC layer, and one grayscale layer. Quantitative and qualitative evaluation criteria were used to find the feature detector with the best detection performance, the layer best suited for feature detection, and the best combination of the two; these criteria will be given in the next section.

4.3. Evaluation Criterion

In terms of evaluation criteria, this paper introduces the Number of keypoints (N), Precision (P), Distribution (D), and Time consumption (T). It is worth mentioning that these criteria are computed over the region of interest (ROI), which is mapped by experts in the field of underwater acoustics, rather than over the whole image. The detection performance of a detector is reflected by comparing the number of features correctly identified within the ROI, the precision, and the distribution of feature points; its real-time performance is reflected by comparing the time consumed for feature point detection. In traditional optical image-matching studies, the metric used to evaluate detector performance is feature repeatability, which requires prior knowledge of the transformation relationship between the image pair. However, because underwater acoustic datasets are scarce, such corresponding image pairs cannot be obtained reliably, so the test dataset used in this manuscript consists of single, randomly selected, and heterogeneous images, including both SSS and FLS data. Moreover, the image sizes are not adjusted but kept in their raw state to maximize the fairness of the test.
Firstly, experts in the field of sonar image interpretation were asked to manually map the ROIs, as shown in Figure 7. These ROIs all have clear detection value, and the closer the feature detection results are to these regions, the better the detector's performance. The keypoints that fall into these regions are considered valid feature points, and the detector's performance is evaluated accordingly. In addition, the thresholds of some detectors were fine-tuned during the experiment to ensure that the number of their detected points does not explode, so that the final evaluation results remain within a valid range.

4.3.1. Number of keypoints (N)

N is computed to evaluate the richness of the detection effect. It directly affects the starting quantity of subsequent matches and the accuracy of the solution of the transformation matrix.
$$N = \sum_{i} \left[\, (x_i, y_i) \in ROIs \,\right] \tag{14}$$
where $(x_i, y_i)$ represents a keypoint coordinate that falls within the ROIs, and $[\cdot]$ is the indicator function.

4.3.2. Precision (P)

P is used as the criterion to evaluate the accuracy of detection effect, which is calculated as Equation (15):
$$P = \frac{N}{N_{all}} \tag{15}$$
where $N_{all}$ indicates the number of detected keypoints on the whole image.
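A minimal sketch of how N and P could be computed from detected keypoints and a boolean ROI mask (the function and argument names are illustrative, not from the paper's code):

```python
def count_and_precision(kp_xy, roi_mask):
    """N (Eq. (14)) and P (Eq. (15)). kp_xy is a list of (x, y) keypoint
    coordinates (e.g. [kp.pt for kp in keypoints]); roi_mask is a boolean
    NumPy array that is True inside the expert-mapped ROIs."""
    n_all = len(kp_xy)
    n = sum(bool(roi_mask[int(y), int(x)]) for x, y in kp_xy)  # keypoints inside the ROIs
    p = n / n_all if n_all else 0.0
    return n, p
```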

4.3.3. Distribution (D)

The distribution criterion is used to measure the uniformity of the detection results, since an ideal feature detector should detect feature points that are uniformly distributed. For the distribution measure, the traditional histogram idea is used to divide the whole image into a set of uniform regions. In the experiments, the width and height of each region are 0.1 of the entire image, giving 100 regions in total. Since this work only considers the homogeneity of feature points inside the ROI (the area covered in red in the figure), only the feature points inside the ROI are counted, and only the regions (bins) whose overlap with the ROI has an area larger than 0 are considered, while the remaining regions are ignored, as shown in Figure 8. After counting the number of keypoints $n_i$ in each small region, a chi-square test is used to check the distribution of the keypoints. Specifically, this part of the research proposes the hypothesis $H_0$: the distribution of feature points in the ROI follows a two-dimensional uniform distribution. Under $H_0$, the probability $p_i$ that a feature point falls in each cell region equals the ratio of the area $S_i$ of that region to the total area $S_{ROI}$ of the ROI. The expected number of feature points in each small region $i$ is then calculated as Equation (16):
$$E_i = p_i \times N = \frac{S_i}{S_{ROI}} \times N \tag{16}$$
where $N$ is the total number of keypoints in the ROI. From the above, the chi-square statistic can be calculated from the actual and expected distributions of the keypoints as follows:
$$\chi^2 = \sum_i \frac{(n_i - E_i)^2}{E_i} \tag{17}$$
The probability is then calculated from the value of the chi-square cumulative distribution function $cdf$ at $\chi^2$ with $N_{bin} - 1$ degrees of freedom, where $N_{bin}$ is the number of counted bins and the significance level parameter was set to 0.05. The final calculation formula is
$$D = 1 - cdf\left(\chi^2, N_{bin} - 1\right) \tag{18}$$
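A possible sketch of the chi-square-based distribution criterion, assuming SciPy is available and using illustrative names, is given below:

```python
from scipy.stats import chi2

def distribution_score(kp_xy, roi_mask, n_bins_per_axis=10):
    """Distribution criterion D (Eq. (18)): chi-square test of keypoint
    uniformity inside the ROI. kp_xy holds (x, y) coordinates; roi_mask is
    a boolean NumPy array that is True inside the ROI."""
    h, w = roi_mask.shape
    bin_h, bin_w = h / n_bins_per_axis, w / n_bins_per_axis
    in_roi = [(x, y) for x, y in kp_xy if roi_mask[int(y), int(x)]]
    roi_area = float(roi_mask.sum())
    observed, expected = [], []
    for i in range(n_bins_per_axis):
        for j in range(n_bins_per_axis):
            ys, ye = int(i * bin_h), int((i + 1) * bin_h)
            xs, xe = int(j * bin_w), int((j + 1) * bin_w)
            cell_area = float(roi_mask[ys:ye, xs:xe].sum())
            if cell_area == 0:                     # bins that miss the ROI are ignored
                continue
            n_i = sum(ys <= y < ye and xs <= x < xe for x, y in in_roi)
            observed.append(n_i)
            expected.append(cell_area / roi_area * len(in_roi))          # Eq. (16)
    chi_sq = sum((o - e) ** 2 / e
                 for o, e in zip(observed, expected) if e > 0)            # Eq. (17)
    return 1.0 - chi2.cdf(chi_sq, df=len(observed) - 1)                   # Eq. (18)
```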

4.3.4. Time consumption (T)

T denotes the average time consumed per detected keypoint and is computed as
$$T = \frac{T_{all}}{N_{all}} \tag{19}$$
where $T_{all}$ denotes the total time taken to detect all keypoints ($N_{all}$) on the whole image.

4.4. Testing Environment

All tests were conducted under a 64-bit Windows 10 operating system with an Intel Core i7-10750 2.60 GHz processor, 16 GB of physical memory, and an NVIDIA GeForce RTX 2060 graphics card. All algorithms were implemented in Python 3.7. To ensure the reliability of the test results and avoid the influence of gross errors, the evaluation results were averaged over ten test runs.

5. Experiment Results

The evaluation criteria assess the detectors in four aspects: richness, accuracy, distribution uniformity, and timeliness, giving a comprehensive evaluation scope. In order to more intuitively identify the feature detector most suitable for underwater acoustic images, this paper proposes a normalized evaluation strategy: the first three evaluation criteria, the Number of keypoints (N), Precision (P), and Distribution (D), are normalized into individual index scores, which are then multiplied by their respective weights to obtain the final evaluation score. Each index is assigned a score (0-10) according to its ranking order, and the final score S is calculated as follows:
$$S = \sum_i w_i s_i = w_N s_N + w_P s_P + w_D s_D \tag{20}$$
where $s_i$ is the score of each individual metric of the feature detector and $w_i$ is the weight corresponding to that metric: $s_N$ is the score obtained from the evaluation of the number of keypoints and $w_N$ its weight, $s_P$ is the score obtained from the evaluation of precision and $w_P$ its weight, and $s_D$ is the score obtained from the evaluation of the distribution and $w_D$ its weight. $S$ is the final performance evaluation score of each detector. Considering practical applications in ocean engineering, the experts suggest setting the weights $w_N$, $w_P$, and $w_D$ to 0.35, 0.60, and 0.05, respectively. The results on each test image are presented in the form of a table, where, for the gradient layers, the one with the greatest Precision (P) among the three is selected for presentation. The specific experimental results are shown in Table 4, Table 5, Table 6, Table 7, Table A1, Table A2, Table A3, Table A4, Table A5, Table A6, Table A7, Table A8, Table A9, Table A10, Table A11, Table A12 and Table A13. In order to accommodate the layout of the manuscript, three typical table results were selected for analysis in this section; please refer to Appendix A for the complete test results. The highest scoring items are marked with *.
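For illustration, the weighted scoring of Equation (20) can be expressed as a small sketch with hypothetical per-criterion scores (the detector names and values below are examples only):

```python
def final_score(rank_scores, w_n=0.35, w_p=0.60, w_d=0.05):
    """Weighted final score S of Eq. (20). rank_scores maps each detector to
    its (s_N, s_P, s_D) scores, each already assigned on a 0-10 ranking scale."""
    return {det: w_n * s_n + w_p * s_p + w_d * s_d
            for det, (s_n, s_p, s_d) in rank_scores.items()}

# Hypothetical example with two detectors
print(final_score({"ORB": (9, 10, 6), "FAST": (10, 4, 5)}))
```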
Table 4, Table 5 and Table 6 are utilized as typical schematic tables to illustrate the statistical contents. The horizontal axis of each table is used to illustrate the performance comparison of detection layers, and the vertical axis of each table is used to illustrate the performance comparison of detectors. The performance evaluation metrics include the Number of keypoints (N), Precision (P), and Distribution (D). Finally, all the metrics are weighted to determine the optimal detector and detection layer.
From the statistical results in Table 7, it can be seen that the most suitable detectors for the underwater acoustic imagery are ORB and BRISK, which appear seven and six times, respectively. It is worth mentioning that both detectors use the detection algorithm of the FAST series.
In terms of detection layers, the PC layer has the best performance, appearing seven times, followed by the classic Sobel layer appearing five times. This result shows the PC layer’s great potential in underwater acoustic imagery feature detection.
Next, the paper evaluated the detection speed of various detectors and detection layers, and the results show that the FAST detector and PC layer have an apparent advantage. In comparison, grayscale layers are the slowest to detect. The specific detection time difference is demonstrated in Figure 9.
As shown in Figure 10, the images whose detection results look best visually are also the ones labeled as optimal items in the quantitative statistics of Table 4, Table 5 and Table 6 (the quantitative evaluation results for images 6, 8, and 10, respectively), which validates the soundness of the proposed evaluation framework.
In the following, this manuscript uses the detection scheme that achieved the best performance, the ORB detector on the PC layer, for feature detection on the acoustic imagery dataset. The detailed detection effects are shown in Figure 10. In order to observe the details of feature detection, the feature points detected on the PC layer are mapped back to the original image.
As shown in Figure 10, the ORB detector has an overall good detection effect on the PC layer: it can resist the noise interference in the underwater acoustic images, and the detected feature points are accurately located around the target areas and are more uniformly distributed. This detection scheme allows effective feature detection on underwater acoustic images while meeting real-time requirements.
In addition, the detection effect of images 1 and 2 shows that the scheme can resist scale transformation; the detection effect of image pairs 8 and 9, and image pairs 11 and 12 indicates that the scheme based on the PC layer does not need to denoise the acoustic images in advance and can achieve good results by directly detecting the raw images. It is worth mentioning that this almost noise-independent feature detection scheme can bring significant help to the underwater sonar target tracking tasks [37,38,39], which, in turn, can effectively enhance the autonomous capability of AUVs.
However, this detection scheme is currently unable to overcome the challenges posed by acoustic shadowing, as shown in test images 4, 7, and 16. Moreover, the detection results on small-scale forward-looking sonar images are also unsatisfactory, such as image 13. Both of these are the directions for continued research in the future.
In order to compare the detection effects of the detector on various layers, Figure 11 displays the detection effects of the ORB detector and grayscale layer, Sobel layer, Scharr layer, Laplacian layer, and PC layer combinations. In order to visually compare the detailed effects of feature detection, the detection performances on each layer are mapped to the original image. As can be seen in Figure 11, the PC layer has stronger noise resistance than the gradient layer, whereas richer feature points can be detected in the PC layer than in the grayscale layer. This detection scheme can achieve a good balance between the precision and velocity, making it more suitable for underwater sensing.

6. Discussion

In this manuscript, we designed feature detection experiments from several scientific perspectives, including sonar-type differences, scale transformation, noise interference, detector design methods, and the properties of the detection layers. Beyond the intuitive optimal results above, several valuable research insights can be drawn from the experimental results: (1) KeyNet, a deep-learning-based feature detector, is not the optimal choice, which may be related to its lack of pretraining on acoustic datasets. In the future, if more underwater acoustic datasets can be collected and the original model fine-tuned on them, the final detection results may improve. (2) Both the ORB and BRISK detectors perform well, and the underlying detection operators they use are FAST derivatives. In addition, the FAST detector achieved the best results in the speed evaluation, so developing better FAST-like detectors may present opportunities for underwater acoustic imagery feature detection. (3) Layers are the basis of feature detection. In the experiments of this paper, the best results were achieved by PC-layer-based detection, which balances accuracy and speed. The PC layer has excellent resistance to noise interference and brightness variations, which makes it well-suited to underwater acoustic matching tasks. In the future, if a feature detector designed directly for PC layers can be developed, underwater perception tasks will become more convenient. (4) The test results show that the combination of the ORB detector and the PC layer is not perfect. It is not sensitive to line features in the image, and enhancing its line feature extraction capability will be essential future work because it affects underwater applications such as pipeline maintenance and cable laying. (5) Comparing all the experimental results, the detection results of the various detectors on side-scan sonar images are better than those on forward-looking sonar images, which may be due to the higher resolution and better quality of side-scan sonar images, whose display effects are closer to those of optical images. In the future, it is worth considering how to use deep-learning methods to achieve domain transfer between acoustic and optical images and reduce the imaging differences between the two, which would also provide an idea for underwater acoustic imagery detection and matching research.

7. Conclusions

This manuscript systematically investigated how to more effectively detect feature points on underwater acoustic imagery by enumerating the most commonly used detectors for experiments and comparing their effects on the grayscale, gradient, and PC layers. Tests were conducted on a dataset containing side-scan sonar and forward-looking sonar images, and a new evaluation framework was proposed. The experimental results show that the feature detection using the ORB detector and PC layer is the best, and this combined detection mode can ensure the detection accuracy while also reducing the interference of image noise, and has obvious advantages in terms of speed. It is worth mentioning that this manuscript is the first comprehensive proposal to use the PC layer for underwater acoustic imagery feature detection. Throughout the testing process, no assumptions were made on the sonar type and imagery attributes, and the test results proved that the evaluation framework proposed in this manuscript has good generalization performance.
The research work in this manuscript selected typical acoustic images from practical ocean applications, analyzed and classified the characteristics of each image, and, finally, presented a detailed experimental report for each of them as a reference for practitioners. The conclusions can also provide new ideas for novices in the field of acoustic image processing and for researchers who are proficient in image processing but unfamiliar with acoustic images. However, more detectors could be attempted beyond the methods tested in this research, and techniques such as hyperparameter search could be used to explore the optimal parameter settings for each detector instead of the default parameters. For these cases, the evaluation framework in this manuscript preserves the pipeline and deep-learning interfaces to facilitate testing.
In the future, the research will introduce deep-learning-based assessment metrics while expanding the underwater acoustic dataset to form a more comprehensive and objective assessment framework and open-source it to the community for researchers to use; moreover, it will consider the performance of feature detectors and feature descriptors in combination in order to arrive at a complete solution for underwater acoustic imagery matching.

Author Contributions

X.Z. provided the idea. X.Z. and S.Y. designed the experiments. H.L. and C.Y. analyzed the experiments. X.Z. wrote the paper. X.Y. edited and proofread the article. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Chinese Shandong Provincial Key Research and Development Plan (Grant No. 2019GHZ011; 2021CXGC010702) and Wenhai Program of the S&T Fund of Shandong Province for Pilot National Laboratory for Marine Science and Technology (Qingdao) (NO. 2021WHZZB2000).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors wish to thank the editors and anonymous reviewers for their valuable suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AUV     Autonomous Underwater Vehicle
BRISK   Binary Robust Invariant Scalable Keypoints
DIDSON  Dual-Frequency Identification Sonar
FLS     Forward-Looking Sonar
FAST    Features from Accelerated Segment Test
ORB     Oriented FAST and Rotated BRIEF
PC      Phase Congruency
ROI     Region of Interest
SIFT    Scale-Invariant Feature Transform
SURF    Speeded-Up Robust Features
SSS     Side-Scan Sonar

Appendix A

The appendix is a supplement to the experimental results in Section 5 of this manuscript. The table format and test content are consistent with the schematic table in Section 5, and the test images are derived from the dataset presented in Section 4 of this paper.
Table A1. Comparison of Number of keypoints (N), Precision (P), and Distribution (D) of image 1.
Detectors | Grayscale Layer (N / P / D) | Gradient Layer (Scharr) * (N / P / D) | PC Layer (N / P / D)
AKAZE | 15 / 1.0000 / 0.3298 | 74 / 0.7708 / 0.1713 | 16 / 0.8889 / 0.2303
BRISK | 180 / 0.5556 / 0.0005 | 674 / 0.5570 / 0.0000 | 96 / 0.6713 / 0.0000
FAST | 188 / 0.4574 / 0.0130 | 650 / 0.4159 / 0.9488 | 116 / 0.5604 / 0.0000
Harris | 30 / 0.3093 / 0.4894 | 1 / 1.0000 / 0.9846 | 20 / 0.3704 / 0.8499
ORB * | 167 / 0.7076 / 0.0000 | 232 / 0.8315 / 0.0000 | 110 / 0.7746 / 0.0000
Shi-Tomasi | 36 / 0.3600 / 0.9899 | 36 / 0.3600 / 0.9994 | 36 / 0.3956 / 0.9919
SIFT | 90 / 0.4615 / 0.0953 | 111 / 0.5139 / 0.2532 | 63 / 0.7326 / 0.0000
SURF | 108 / 0.5482 / 0.9383 | 166 / 0.5061 / 0.0176 | 62 / 0.6667 / 0.4383
SAR-SIFT | 51 / 0.9107 / 0.0000 | 2 / 0.2857 / 0.9624 | 201 / 0.4752 / 0.0000
KeyNet | 72 / 0.4865 / 0.3438 | 91 / 0.4866 / 0.9121 | 76 / 0.5135 / 0.2934
Table A2. Comparison of Number of keypoints (N), Precision (P), and Distribution (D) of image 2.
Detectors | Grayscale Layer (N / P / D) | Gradient Layer (Scharr) (N / P / D) | PC Layer * (N / P / D)
AKAZE | 183 / 0.6489 / 0.0000 | 447 / 0.5992 / 0.0863 | 107 / 0.5912 / 0.0000
BRISK * | 1096 / 0.4740 / 0.0000 | 3150 / 0.4827 / 0.0000 | 645 / 0.5051 / 0.0000
FAST | 625 / 0.3714 / 0.0000 | 2329 / 0.3971 / 0.3134 | 403 / 0.4022 / 0.0000
Harris | 130 / 0.2882 / 0.7285 | 1 / 1.0000 / 0.9153 | 111 / 0.3801 / 0.1492
ORB | 364 / 0.8125 / 0.0000 | 357 / 0.7661 / 0.0000 | 309 / 0.7087 / 0.0000
Shi-Tomasi | 43 / 0.4300 / 0.3483 | 39 / 0.3900 / 0.9779 | 42 / 0.4200 / 0.8083
SIFT | 545 / 0.3929 / 0.2014 | 422 / 0.4302 / 0.0000 | 332 / 0.4094 / 0.0000
SURF | 429 / 0.4400 / 0.5324 | 810 / 0.4574 / 0.0870 | 390 / 0.4407 / 0.3716
SAR-SIFT | 120 / 0.8163 / 0.0000 | 30 / 0.8333 / 0.0000 | 802 / 0.4284 / 0.0000
KeyNet | 315 / 0.4280 / 0.6049 | 380 / 0.4279 / 0.9222 | 321 / 0.4315 / 0.2497
Table A3. Comparison of Number of keypoints (N), Precision (P), and Distribution (D) of image 3.
Detectors | Grayscale Layer (N / P / D) | Gradient Layer (Sobel) * (N / P / D) | PC Layer (N / P / D)
AKAZE | 24 / 0.9231 / 0.0007 | 48 / 0.4364 / 0.0000 | 30 / 0.3226 / 0.0000
BRISK * | 68 / 0.2500 / 0.0002 | 546 / 0.2423 / 0.0000 | 337 / 0.2334 / 0.0000
FAST | 25 / 0.1773 / 0.3382 | 334 / 0.2001 / 0.1089 | 235 / 0.2083 / 0.0001
Harris | 103 / 0.1791 / 0.0276 | 109 / 0.1542 / 0.9619 | 43 / 0.1937 / 0.0046
ORB | 164 / 0.4505 / 0.0001 | 188 / 0.3917 / 0.0000 | 150 / 0.3119 / 0.0000
Shi-Tomasi | 19 / 0.1900 / 0.9040 | 18 / 0.1800 / 0.8117 | 16 / 0.1600 / 0.8694
SIFT | 66 / 0.4853 / 0.0001 | 214 / 0.1994 / 0.0384 | 65 / 0.2471 / 0.0000
SURF | 243 / 0.4812 / 0.0485 | 418 / 0.2401 / 0.0918 | 233 / 0.3324 / 0.0000
SAR-SIFT | 13 / 0.7222 / 0.0077 | 273 / 0.1716 / 0.0000 | 597 / 0.1743 / 0.0000
KeyNet | 246 / 0.2064 / 0.9848 | 300 / 0.1795 / 0.6008 | 255 / 0.1941 / 0.6637
Table A4. Comparison of Number of keypoints (N), Precision (P), and Distribution (D) of image 4.
Detectors | Grayscale Layer (N / P / D) | Gradient Layer (Scharr) (N / P / D) | PC Layer * (N / P / D)
AKAZE | 6 / 0.0274 / 0.9868 | 473 / 0.3546 / 0.0000 | 36 / 0.0859 / 0.0000
BRISK | 172 / 0.0899 / 0.0000 | 3929 / 0.3714 / 0.0000 | 295 / 0.1427 / 0.0000
FAST | 123 / 0.0838 / 0.0000 | 3034 / 0.3308 / 0.0000 | 225 / 0.1314 / 0.0000
Harris | 52 / 0.2158 / 0.0000 | 111 / 0.5722 / 0.0000 | 52 / 0.1297 / 0.0002
ORB | 10 / 0.0207 / 0.0000 | 141 / 0.2820 / 0.0000 | 23 / 0.0472 / 0.0000
Shi-Tomasi | 4 / 0.0400 / 0.9423 | 26 / 0.2600 / 0.8645 | 11 / 0.1100 / 0.7849
SIFT | 67 / 0.0847 / 0.0000 | 672 / 0.3630 / 0.0024 | 119 / 0.0869 / 0.0000
SURF | 183 / 0.1801 / 0.0000 | 1113 / 0.3862 / 0.3628 | 151 / 0.1465 / 0.0000
SAR-SIFT * | 132 / 0.6839 / 0.0000 | 327 / 0.8015 / 0.0000 | 720 / 0.3124 / 0.0000
KeyNet | 303 / 0.2803 / 0.0003 | 572 / 0.3801 / 0.6949 | 256 / 0.2681 / 0.0000
Table A5. Comparison of Number of keypoints (N), Precision (P), and Distribution (D) of image 5.
Detectors | Grayscale Layer (N / P / D) | Gradient Layer (Sobel) * (N / P / D) | PC Layer (N / P / D)
AKAZE | 8 / 1.0000 / 0.6830 | 14 / 1.0000 / 0.5247 | 14 / 1.0000 / 0.0001
BRISK | 21 / 0.9545 / 0.0007 | 74 / 0.4744 / 0.0000 | 54 / 0.4186 / 0.0000
FAST | 5 / 1.0000 / 0.0811 | 31 / 0.4247 / 0.0193 | 31 / 0.3974 / 0.0331
Harris | 11 / 0.1170 / 0.6387 | 6 / 0.2727 / 0.7380 | 7 / 0.8750 / 0.1628
ORB * | 74 / 0.8409 / 0.0000 | 165 / 0.5935 / 0.0000 | 140 / 0.5072 / 0.0000
Shi-Tomasi | 11 / 0.1100 / 0.9012 | 12 / 0.1200 / 0.9505 | 11 / 0.1122 / 0.9780
SIFT | 7 / 0.8750 / 0.3071 | 26 / 0.4906 / 0.5564 | 21 / 0.7241 / 0.0204
SURF | 19 / 0.7600 / 0.9658 | 35 / 0.2258 / 0.6841 | 29 / 0.8286 / 0.3505
SAR-SIFT | 2 / 1.0000 / 0.0857 | 68 / 0.7556 / 0.0000 | 60 / 0.0338 / 0.1002
KeyNet | 24 / 0.1043 / 0.0080 | 28 / 0.0532 / 0.0130 | 27 / 0.0961 / 0.0294
Table A6. Comparison of Number of keypoints (N), Precision (P), and Distribution (D) of image 7.
Detectors | Grayscale Layer (N / P / D) | Gradient Layer (Sobel) * (N / P / D) | PC Layer (N / P / D)
AKAZE | 13 / 0.7222 / 0.0884 | 34 / 0.3148 / 0.0002 | 22 / 0.2095 / 0.0107
BRISK | 162 / 0.1826 / 0.0000 | 313 / 0.1389 / 0.0000 | 145 / 0.1284 / 0.0000
FAST | 106 / 0.1715 / 0.0000 | 260 / 0.1284 / 0.0000 | 115 / 0.1192 / 0.0000
Harris | 3 / 0.1034 / 0.3431 | 3 / 0.0667 / 0.9466 | 14 / 0.0680 / 0.1860
ORB * | 105 / 0.3079 / 0.0000 | 153 / 0.3533 / 0.0000 | 80 / 0.1882 / 0.0000
Shi-Tomasi | 16 / 0.1600 / 0.5171 | 19 / 0.1900 / 0.4193 | 17 / 0.1700 / 0.6678
SIFT | 43 / 0.1946 / 0.0000 | 71 / 0.1659 / 0.0000 | 73 / 0.0874 / 0.0000
SURF | 69 / 0.1364 / 0.0000 | 104 / 0.1453 / 0.0000 | 65 / 0.1275 / 0.0000
SAR-SIFT | 12 / 1.0000 / 0.0004 | 26 / 0.3333 / 0.0000 | 102 / 0.1067 / 0.0000
KeyNet | 45 / 0.1184 / 0.0292 | 75 / 0.1276 / 0.8836 | 50 / 0.1160 / 0.0029
Table A7. Comparison of Number of keypoints (N), Precision (P), and Distribution (D) of image 9.
Detectors | Grayscale Layer (N / P / D) | Gradient Layer (Sobel) (N / P / D) | PC Layer * (N / P / D)
AKAZE | 529 / 0.6861 / 0.0000 | 1205 / 0.6646 / 0.0000 | 44 / 0.6875 / 0.0000
BRISK | 7747 / 0.6357 / 0.0000 | 10,499 / 0.6281 / 0.0000 | 676 / 0.6877 / 0.0000
FAST | 5964 / 0.6395 / 0.4485 | 8640 / 0.6219 / 0.9960 | 529 / 0.7178 / 0.0000
Harris | 696 / 0.6005 / 0.1376 | 24 / 0.3810 / 0.0005 | 67 / 0.6700 / 0.0000
ORB | 360 / 0.7200 / 0.0000 | 383 / 0.7660 / 0.0000 | 286 / 0.6560 / 0.0000
Shi-Tomasi | 71 / 0.7100 / 0.2824 | 67 / 0.6700 / 0.0769 | 75 / 0.7500 / 0.0808
SIFT | 1060 / 0.6397 / 0.0000 | 1308 / 0.6384 / 0.0002 | 52 / 0.6500 / 0.0000
SURF * | 2211 / 0.6159 / 0.0000 | 2593 / 0.6149 / 0.0000 | 141 / 0.7790 / 0.0000
SAR-SIFT | 157 / 0.8870 / 0.0000 | 157 / 0.8351 / 0.0000 | 2747 / 0.5889 / 0.0000
KeyNet | 1226 / 0.6284 / 0.9184 | 1285 / 0.6403 / 0.9594 | 728 / 0.6547 / 0.0036
Table A8. Comparison of Number of keypoints (N), Precision (P), and Distribution (D) of image 11.
Detectors | Grayscale Layer (N / P / D) | Gradient Layer (Sobel) (N / P / D) | PC Layer * (N / P / D)
AKAZE | 2 / 1.0000 / 0.9330 | 5 / 1.0000 / 0.9475 | 21 / 1.0000 / 0.6758
BRISK * | 7 / 1.0000 / 0.0012 | 125 / 1.0000 / 0.0000 | 195 / 0.9949 / 0.0014
FAST | 6 / 1.0000 / 0.0001 | 129 / 0.8543 / 0.0000 | 200 / 0.7692 / 0.4320
Harris | 46 / 0.7667 / 0.3578 | 31 / 0.6327 / 0.8912 | 52 / 0.5417 / 0.9936
ORB | 14 / 1.0000 / 0.0660 | 130 / 1.0000 / 0.0000 | 177 / 1.0000 / 0.0000
Shi-Tomasi | 56 / 0.6154 / 1.0000 | 64 / 0.6882 / 1.0000 | 58 / 0.6824 / 1.0000
SIFT | 68 / 0.8831 / 0.2643 | 187 / 0.7924 / 0.0001 | 218 / 0.7676 / 0.0011
SURF | 88 / 0.9362 / 0.6021 | 125 / 0.9615 / 0.4207 | 105 / 0.9211 / 0.8922
SAR-SIFT | 18 / 0.8571 / 0.0000 | 177 / 0.8009 / 0.0000 | 179 / 0.7366 / 0.0000
KeyNet | 76 / 0.9383 / 0.0249 | 95 / 0.9048 / 0.0258 | 115 / 0.9274 / 0.7302
Table A9. Comparison of Number of keypoints (N), Precision (P), and Distribution (D) of image 12.
Layers | Grayscale Layer * | Gradient Layer (Scharr) | PC Layer
Detectors | N P D | N P D | N P D
AKAZE | 6 1.0000 0.4100 | 60 1.0000 0.0573 | 16 1.0000 0.9797
BRISK * | 123 1.0000 0.0000 | 783 0.9962 0.0000 | 196 1.0000 0.0040
FAST | 47 0.7966 0.0001 | 811 0.7502 0.9989 | 198 0.8082 0.6602
Harris | 59 0.6705 0.9429 | 1 1.0000 0.9112 | 43 0.6615 0.9870
ORB | 118 1.0000 0.0000 | 247 1.0000 0.0000 | 139 1.0000 0.0000
Shi-Tomasi | 57 0.6404 1.0000 | 66 0.6804 1.0000 | 56 0.7000 1.0000
SIFT | 220 0.7885 0.3889 | 98 0.8448 0.0001 | 104 0.8125 0.0051
SURF | 105 0.9906 0.1416 | 271 0.9679 0.0134 | 155 0.9810 0.1540
SAR-SIFT | 24 0.8276 0.0000 | 4 1.0000 0.0000 | 207 0.7782 0.0000
KeyNet | 97 0.9604 0.6042 | 126 0.9618 0.9298 | 112 0.9492 0.4372
Table A10. Comparison of Number of keypoints (N), Precision (P), and Distribution (D) of image 13.
Layers | Grayscale Layer * | Gradient Layer (Laplacian) | PC Layer
Detectors | N P D | N P D | N P D
AKAZE | 0 1.0000 1.0000 | 0 1.0000 1.0000 | 0 1.0000 1.0000
BRISK | 6 1.0000 0.1759 | 15 1.0000 0.0385 | 4 1.0000 0.9495
FAST * | 31 1.0000 0.8846 | 64 0.7805 0.8736 | 39 0.9512 0.3238
Harris | 1 0.2500 0.9724 | 1 1.0000 0.9724 | 8 1.0000 0.3102
ORB | 0 1.0000 1.0000 | 0 1.0000 1.0000 | 0 1.0000 1.0000
Shi-Tomasi | 11 0.8462 0.8202 | 8 0.5714 0.9901 | 9 0.9000 0.2691
SIFT | 26 0.9630 0.2636 | 20 0.8333 0.0044 | 22 1.0000 0.3143
SURF | 3 1.0000 0.9135 | 7 1.0000 0.9678 | 5 1.0000 0.9197
SAR-SIFT | 16 1.0000 0.0000 | 17 1.0000 0.0000 | 32 0.6275 0.0000
KeyNet | 7 1.0000 0.1447 | 7 1.0000 0.0313 | 7 1.0000 0.0243
Table A11. Comparison of Number of keypoints (N), Precision (P), and Distribution (D) of image 14.
Layers | Grayscale Layer * | Gradient Layer (Scharr) | PC Layer
Detectors | N P D | N P D | N P D
AKAZE | 12 1.0000 0.8286 | 18 1.0000 0.7376 | 10 1.0000 0.9262
BRISK * | 82 1.0000 0.0019 | 319 0.9785 0.0003 | 83 1.0000 0.0007
FAST | 58 0.8056 0.0523 | 387 0.6355 0.9953 | 104 0.7820 0.4632
Harris | 17 0.3864 0.7894 | 1 1.0000 0.9713 | 14 0.7368 0.9557
ORB | 55 1.0000 0.0000 | 91 1.0000 0.0000 | 49 1.0000 0.0000
Shi-Tomasi | 31 0.5536 0.9953 | 30 0.5357 0.1682 | 23 0.5897 0.9782
SIFT | 80 0.7692 0.8076 | 72 0.6486 0.0002 | 40 0.9756 0.0253
SURF | 52 0.9811 0.8918 | 108 0.9818 0.9223 | 74 0.9610 0.2070
SAR-SIFT | 41 0.9318 0.0000 | 1 0.5000 0.9713 | 69 0.4157 0.0000
KeyNet | 51 0.9273 0.0497 | 58 0.8788 0.9141 | 44 0.9362 0.0040
Table A12. Comparison of Number of keypoints (N), Precision (P), and Distribution (D) of image 15.
Layers | Grayscale Layer | Gradient Layer (Sobel) * | PC Layer
Detectors | N P D | N P D | N P D
AKAZE | 20 1.0000 0.0000 | 56 0.9180 0.0000 | 326 0.7342 0.2802
BRISK | 39 0.9070 0.0001 | 344 0.8190 0.0000 | 1177 0.6386 0.3195
FAST | 6 0.8571 0.0862 | 152 0.7415 0.0000 | 624 0.5714 0.8864
Harris | 24 0.7500 0.1796 | 62 0.7126 0.0663 | 203 0.5867 0.9995
ORB * | 249 0.8384 0.0000 | 398 0.8964 0.0000 | 373 0.7460 0.0000
Shi-Tomasi | 73 0.7300 0.8915 | 71 0.7100 0.6492 | 46 0.4600 0.9765
SIFT | 171 0.8953 0.0000 | 280 0.7609 0.0000 | 1021 0.6713 0.4981
SURF | 290 0.7160 0.2040 | 639 0.6968 0.0041 | 547 0.5295 0.4551
SAR-SIFT | 57 0.1926 0.0000 | 386 0.2338 0.0000 | 544 0.2981 0.0000
KeyNet | 317 0.6967 0.0000 | 359 0.5583 0.0079 | 451 0.4713 0.2504
Table A13. Comparison of Number of keypoints (N), Precision (P), and Distribution (D) of image 16.
Layers | Grayscale Layer | Gradient Layer (Scharr) * | PC Layer
Detectors | N P D | N P D | N P D
AKAZE | 7 0.8750 0.4935 | 373 0.2087 0.0000 | 26 0.2185 0.0244
BRISK * | 64 0.2560 0.0000 | 2882 0.2230 0.0000 | 265 0.2221 0.0000
FAST | 33 0.2025 0.0000 | 2216 0.2014 0.0000 | 189 0.2043 0.0000
Harris | 65 0.2038 0.0000 | 46 0.2738 0.0000 | 52 0.2023 0.0123
ORB | 91 0.2907 0.0000 | 68 0.1360 0.0000 | 113 0.2369 0.0000
Shi-Tomasi | 21 0.2100 0.0045 | 17 0.1700 0.8357 | 17 0.1700 0.0764
SIFT | 33 0.6735 0.0002 | 446 0.1739 0.0256 | 65 0.2273 0.0000
SURF | 135 0.3890 0.0000 | 673 0.2165 0.1812 | 141 0.2117 0.0000
SAR-SIFT | 29 1.0000 0.0026 | 123 0.2808 0.0000 | 583 0.1462 0.0000
KeyNet | 210 0.1888 0.0308 | 409 0.2205 0.9114 | 215 0.1955 0.0000

References

1. Neira, J.; Sequeiros, C.; Huamani, R.; Machaca, E.; Fonseca, P. Review on unmanned underwater robotics, structure designs, materials, sensors, actuators, and navigation control. J. Robot. 2021, 1, 22–30.
2. Ma, J.; Jiang, X.; Fan, A.; Jiang, J.; Yan, J. Image matching from handcrafted to deep features: A survey. Int. J. Comput. Vis. 2021, 129, 23–79.
3. Jiang, X.; Ma, J.; Xiao, G.; Shao, Z.; Guo, X. A review of multimodal image matching: Methods and applications. Inf. Fus. 2021, 73, 22–71.
4. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
5. Bay, H.; Ess, A.; Tuytelaars, T.; Van Gool, L. Surf: Speeded up robust features. In Proceedings of the 2006 European Conference on Computer Vision, Berlin, Germany, 18–22 September 2006.
6. Vandrish, P.; Vardy, A.; Walker, D.; Dobre, O. Side-scan sonar image registration for AUV navigation. In Proceedings of the 2011 IEEE Symposium on Underwater Technology and Workshop on Scientific Use of Submarine Cables and Related Technologies, Tokyo, Japan, 5–8 April 2011.
7. King, P.; Anstey, B.; Vardy, A. Comparison of feature detection techniques for AUV navigation along a trained route. In Proceedings of the 2013 OCEANS-San Diego, San Diego, CA, USA, 23–27 September 2013.
8. Daniel, S.; Le Léannec, F.; Roux, C.; Soliman, B.; Maillard, E. Side-scan sonar image matching. IEEE J. Ocean. Eng. 1998, 23, 245–259.
9. Khater, H.A.; Gad, A.S.; Omran, E.A.; Abdel-Fattah, A. Enhancement matching algorithms using fusion of multiple similarity metrics for sonar images. World Appl. Sci. J. 2009, 6, 759–763.
10. Zhang, W.; Zhou, T.; Xu, C.; Liu, M. A SIFT-Like Feature Detector and Descriptor for Multibeam Sonar Imaging. J. Sens. 2021, 2021, 8845814.
11. Tao, W.; Liu, Y. Combined imaging matching method of side scan sonar images with prior position knowledge. IET Image Process. 2018, 12, 194–199.
12. Shang, X.; Zhao, J.; Zhang, H. Automatic overlapping area determination and segmentation for multiple side scan sonar images mosaic. IEEE J.-Stars 2021, 14, 2886–2900.
13. Tueller, P.; Kastner, R.; Diamant, R. A comparison of feature detectors for underwater sonar imagery. In Proceedings of the OCEANS 2018 MTS/IEEE Charleston, Charleston, SC, USA, 22–25 October 2018.
14. Ansari, S. A review on SIFT and SURF for underwater image feature detection and matching. In Proceedings of the 2019 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT), Coimbatore, India, 20–22 February 2019.
15. Zhou, X.; Yu, C.; Yuan, X.; Luo, C. Matching Underwater Sonar Images by the Learned Descriptor Based on Style Transfer Method. In Proceedings of the 2021 2nd International Conference on Mechatronics Technology and Intelligent Manufacturing (ICMTIM 2021), Hangzhou, China, 13–15 August 2021.
16. Oliveira, A.J.; Ferreira, B.M.; Cruz, N. A Performance Analysis of Feature Extraction Algorithms for Acoustic Image-Based Underwater Navigation. J. Mar. Sci. Eng. 2021, 9, 361.
17. Oppenheim, A.V.; Lim, J.S. The importance of phase in signals. Proc. IEEE 1981, 69, 529–541.
18. Ma, W.; Wu, Y.; Liu, S.; Su, Q.; Zhong, Y. Remote sensing image registration based on phase congruency feature detection and spatial constraint matching. IEEE Access 2018, 6, 77554–77567.
19. Li, J.; Hu, Q.; Ai, M. RIFT: Multi-modal image matching based on radiation-variation insensitive feature transform. IEEE Trans. Image Process. 2019, 29, 3296–3310.
20. Ye, Y.; Shan, J.; Hao, S.; Bruzzone, L.; Qin, Y. A local phase based invariant feature for remote sensing image matching. Remote Sens. 2018, 142, 205–221.
21. Kovesi, P. Phase congruency detects corners and edges. In Proceedings of the Australian Pattern Recognition Society Conference: DICTA, Sydney, Australia, 10–12 December 2003.
22. Bradski, G. The OpenCV library. Dr Dobb's J. Softw. Tools 2020, 120, 122–125.
23. Barroso-Laguna, A.; Riba, E.; Ponsa, D.; Mikolajczyk, K. Key.Net: Keypoint detection by handcrafted and learned CNN filters. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019.
24. Riba, E.; Mishkin, D.; Ponsa, D.; Rublee, E.; Bradski, G. Kornia: An Open Source Differentiable Computer Vision Library for PyTorch. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass, CO, USA, 14 May 2020.
25. Alcantarilla, P.F.; Solutions, T. Fast explicit diffusion for accelerated features in nonlinear scale spaces. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 34, 1281–1298.
26. Leutenegger, S.; Chli, M.; Siegwart, R.Y. BRISK: Binary robust invariant scalable keypoints. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 26 January 2011.
27. Rosten, E.; Porter, R.; Drummond, T. Faster and better: A machine learning approach to corner detection. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 32, 105–119.
28. Harris, C.; Stephens, M. A combined corner and edge detector. In Proceedings of the Alvey Vision Conference, Manchester, UK, 31 August–2 September 1988.
29. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011.
30. Shi, J. Good features to track. In Proceedings of the 1994 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 6 August 1994.
31. Dellinger, F.; Delon, J.; Gousseau, Y.; Michel, J.; Tupin, F. SAR-SIFT: A SIFT-like algorithm for SAR images. IEEE Trans. Geosci. Remote 2014, 53, 453–466.
32. T-SEA Marine Technology Company. Available online: http://www.t-sea.com/ (accessed on 6 September 2022).
33. Pengcheng Laboratory. Available online: https://www.pcl.ac.cn/ (accessed on 6 September 2022).
34. Valdenegro-Toro, M. Improving sonar image patch matching via deep learning. In Proceedings of the 2017 European Conference on Mobile Robots (ECMR), Paris, France, 6–8 September 2017.
35. Sound Metrics. Available online: http://www.soundmetrics.com/ (accessed on 6 September 2022).
36. DeepVision AB Company. Available online: http://deepvision.se/ (accessed on 6 September 2022).
37. Zhang, T.; Liu, S.; He, X.; Huang, H.; Hao, K. Underwater Target Tracking Using Forward-Looking Sonar for Autonomous Underwater Vehicles. Sensors 2020, 20, 102.
38. Kazimierski, W.; Zaniewicz, G. Determination of Process Noise for Underwater Target Tracking with Forward Looking Sonar. Remote Sens. 2021, 13, 1014.
39. Wawrzyniak, N.; Stateczny, A. MSIS Image Positioning in Port Areas with the Aid of Comparative Navigation Methods. Pol. Marit. Res. 2017, 24, 32–41.
Figure 1. Status of publications related to sonar image feature detection since 2000.
Figure 2. Research pipeline proposed in this manuscript.
Figure 3. Schematic diagram of various layers of underwater acoustic imagery.
Figure 4. Schematic diagram of the experimental dataset; the serial numbers 1–16 correspond to the 16 samples described above, whose characteristics are analyzed in the text.
Figure 5. Display of various layers of some side-scan sonar images.
Figure 6. Display of various layers of some forward-looking sonar images.
Figure 7. Schematic diagram of the expert-drawn ROIs (regions delineated by the red lines) on the tested images. The serial numbers 1–16 correspond to the 16 samples in Section 4.2.
Figure 8. Schematic diagram of the Distribution (D) calculation. The area enclosed by the red line is the ROI over which D is computed, and the detected feature points are shown as green circles. Image courtesy of DeepVision AB [36].
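As a rough illustration of the per-ROI bookkeeping behind these metrics, the sketch below counts the detected keypoints that fall inside an expert-drawn ROI polygon such as the one in Figure 8. It is only a hypothetical helper: the exact definitions of N, P, and D used in the tables are those given in the main text, and the ROI polygon and keypoint list here are placeholders.

```python
import cv2
import numpy as np

def keypoints_in_roi(keypoints, roi_polygon):
    """Count detected keypoints lying inside an expert-drawn ROI (red outline in Figure 8).

    keypoints   : list of cv2.KeyPoint returned by a detector
    roi_polygon : (M, 2) array of polygon vertices in pixel coordinates
    Returns the in-ROI count and the in-ROI fraction of all detected keypoints.
    """
    contour = np.asarray(roi_polygon, dtype=np.float32).reshape(-1, 1, 2)
    inside = [kp for kp in keypoints
              if cv2.pointPolygonTest(contour, kp.pt, False) >= 0]
    fraction = len(inside) / len(keypoints) if keypoints else 0.0
    return len(inside), fraction
```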
Figure 9. Comparison of the average time consumed to detect a single feature point.
Figure 10. Feature detection results on the PC layer using the ORB detector; feature points are highlighted in green. The serial numbers 1–16 correspond to the 16 samples in Section 4.2.
Figure 11. Comparison of detection effects of ORB detector on PC layer, grayscale layer, Sobel layer, Scharr layer, and Laplacian layer.
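For visualizations in the style of Figures 10 and 11, OpenCV's drawKeypoints can overlay the detected points in green on the chosen layer. The sketch below is only illustrative and assumes the PC layer has been saved as an 8-bit image under a placeholder file name.

```python
import cv2

# Placeholder: phase congruency layer stored as an 8-bit grayscale image.
pc_layer = cv2.imread("pc_layer.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create()
keypoints = orb.detect(pc_layer, None)

# Overlay the detected keypoints in green, as in Figures 10 and 11.
canvas = cv2.cvtColor(pc_layer, cv2.COLOR_GRAY2BGR)
vis = cv2.drawKeypoints(canvas, keypoints, None, color=(0, 255, 0))
cv2.imwrite("orb_keypoints_on_pc_layer.png", vis)
```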
Table 1. Introduction to detectors used in the experiment.
Sequences | Name | Time | Creator | Characteristic
1 | AKAZE [25] | 2011 | Pablo F. Alcantarilla | Multiscale detector
2 | BRISK 1 [26] | 2011 | S. Leutenegger | Multiscale detector
3 | FAST [27] | 2010 | Edward Rosten | Single-scale detector
4 | Harris [28] | 1988 | C. G. Harris | Single-scale detector
5 | ORB 2 [29] | 2011 | Gary R. Bradski | Multiscale detector
6 | Shi-Tomasi [30] | 1994 | J. Shi & C. Tomasi | Single-scale detector
7 | SIFT [4] | 2004 | Lowe | Multiscale detector
8 | SURF [5] | 2008 | Bay | Multiscale detector
9 | SAR-SIFT [31] | 2014 | Flora Dellinger | Multiscale detector
10 | KeyNet [23] | 2019 | Axel Barroso | Learned detector
1 Detector used by BRISK is FAST-9-16. 2 Detector used by ORB is Oriented FAST.
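For context, most of the handcrafted detectors in Table 1 are available in OpenCV [22], while KeyNet is a learned model distributed separately (e.g., through Kornia [24]) and SAR-SIFT comes from research code. The snippet below is a minimal sketch, not the configuration used in the experiments, of how the OpenCV detectors might be instantiated and run on a grayscale sonar image; the file name is a placeholder.

```python
import cv2

# Placeholder path to a grayscale sonar image.
gray = cv2.imread("sonar_sample.png", cv2.IMREAD_GRAYSCALE)

# Handcrafted detectors from Table 1, created with OpenCV defaults.
detectors = {
    "AKAZE": cv2.AKAZE_create(),
    "BRISK": cv2.BRISK_create(),                 # uses the FAST-9-16 detector internally
    "FAST": cv2.FastFeatureDetector_create(),
    "ORB": cv2.ORB_create(),                     # Oriented FAST detector
    "SIFT": cv2.SIFT_create(),
    # SURF requires an opencv-contrib build: cv2.xfeatures2d.SURF_create()
}
for name, det in detectors.items():
    kps = det.detect(gray, None)
    print(f"{name}: {len(kps)} keypoints")

# Harris and Shi-Tomasi corners are exposed through goodFeaturesToTrack.
shi_tomasi = cv2.goodFeaturesToTrack(gray, maxCorners=500, qualityLevel=0.01, minDistance=5)
harris = cv2.goodFeaturesToTrack(gray, maxCorners=500, qualityLevel=0.01, minDistance=5,
                                 useHarrisDetector=True, k=0.04)
```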
Table 2. Detection layers used in the experiment.
Layer Name | Gradient Layers | PC Layer | Grayscale Layer
Operator | Sobel, Scharr, Laplacian | PC | Pixel
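The gradient layers of Table 2 are standard first- and second-order derivative responses, and the PC layer follows Kovesi's phase congruency algorithm [21], for which OpenCV has no built-in routine. Assuming the gradient layers are built with OpenCV filters and rescaled to 8-bit before detection, a minimal sketch might look as follows; the PC layer would instead come from an implementation of Kovesi's code, and the grayscale layer is simply the raw pixel intensities.

```python
import cv2
import numpy as np

def build_gradient_layers(gray: np.ndarray) -> dict:
    """Sketch of the gradient layers in Table 2, built from a grayscale sonar image."""
    # Sobel layer: gradient magnitude of horizontal and vertical derivatives.
    sx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    sy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    sobel = cv2.magnitude(sx, sy)

    # Scharr layer: same idea with the rotation-optimized 3x3 Scharr kernel.
    cx = cv2.Scharr(gray, cv2.CV_32F, 1, 0)
    cy = cv2.Scharr(gray, cv2.CV_32F, 0, 1)
    scharr = cv2.magnitude(cx, cy)

    # Laplacian layer: second-order derivative response (absolute value kept).
    laplacian = np.abs(cv2.Laplacian(gray, cv2.CV_32F, ksize=3))

    # Rescale each response to 8-bit so the detectors can run on it directly.
    to_u8 = lambda img: cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    return {"Sobel": to_u8(sobel), "Scharr": to_u8(scharr), "Laplacian": to_u8(laplacian)}
```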
Table 3. Characteristics division of dataset scenarios.
Characteristics | Simple Scenes | Complex Scenes
General | 2, 11 | 9, 14
Scale-transformed | 1 | 13
Noise-interfered | 5, 12 | 8, 10
Grayscale-transformed | 6, 7 | 3, 4
Others | 15 | 16
Table 4. Comparison of Number of keypoints (N), Precision (P), and Distribution (D) of image 6.
Layers | Grayscale Layer | Gradient Layer (Sobel) * | PC Layer *
Detectors | N P D | N P D | N P D
AKAZE | 41 0.8913 0.0378 | 100 0.5882 0.0217 | 46 0.5349 0.2422
BRISK | 559 0.4815 0.0000 | 1088 0.4131 0.0000 | 441 0.4608 0.0000
FAST | 332 0.4618 0.0000 | 836 0.3696 0.0038 | 331 0.4206 0.0009
Harris | 9 0.4286 0.3444 | 11 0.2973 0.0539 | 66 0.3158 0.5465
ORB * | 332 0.8448 0.0000 | 322 0.7368 0.0000 | 315 0.7360 0.0000
Shi-Tomasi | 43 0.4300 0.9462 | 43 0.4300 0.9879 | 50 0.5000 0.8790
SIFT | 151 0.4535 0.3539 | 184 0.4044 0.3060 | 185 0.4780 0.0000
SURF | 255 0.4374 0.0135 | 299 0.3705 0.0390 | 236 0.4574 0.0140
SAR-SIFT | 12 0.7059 0.0000 | 30 0.3409 0.0000 | 295 0.3203 0.0000
KeyNet | 159 0.3850 0.7262 | 170 0.3020 0.5512 | 160 0.3922 0.3252
Table 5. Comparison of Number of keypoints (N), Precision (P), and Distribution (D) of image 8.
Layers | Grayscale Layer | Gradient Layer (Laplacian) | PC Layer *
Detectors | N P D | N P D | N P D
AKAZE | 291 0.6063 0.0000 | 16 0.8000 0.0241 | 334 0.6095 0.0000
BRISK | 614 0.6221 0.0000 | 1254 0.6006 0.0000 | 2234 0.5935 0.0000
FAST | 194 0.6978 0.0000 | 924 0.6067 0.0000 | 1410 0.5975 0.0001
Harris | 123 0.6089 0.0000 | 105 0.6069 0.0000 | 381 0.5443 0.0240
ORB * | 344 0.6880 0.0000 | 288 0.6973 0.0000 | 324 0.6480 0.0000
Shi-Tomasi | 70 0.7000 0.0091 | 69 0.6900 0.0008 | 67 0.6700 0.6352
SIFT | 950 0.5675 0.0000 | 77 0.5461 0.0000 | 1363 0.5623 0.0000
SURF | 1125 0.6134 0.0109 | 522 0.5781 0.0000 | 1487 0.6165 0.0000
SAR-SIFT * | 146 0.8795 0.0000 | 2278 0.6142 0.0000 | 2322 0.6180 0.0000
KeyNet | 1059 0.5970 0.9472 | 1021 0.6241 0.9671 | 1119 0.6159 0.0957
Table 6. Comparison of Number of keypoints (N), Precision (P), and Distribution (D) of image 10.
Layers | Grayscale Layer | Gradient Layer (Scharr) | PC Layer *
Detectors | N P D | N P D | N P D
AKAZE | 436 0.9316 0.0000 | 707 0.8559 0.0000 | 45 0.9783 0.0059
BRISK | 7804 0.7963 0.0000 | 9992 0.7930 0.0000 | 614 0.8899 0.0000
FAST | 5541 0.7354 0.0000 | 7810 0.7223 0.9939 | 390 0.8387 0.0000
Harris | 719 0.6548 0.0423 | 4 0.6667 0.7437 | 56 0.8000 0.0405
ORB * | 491 0.9820 0.0000 | 470 0.9514 0.0000 | 432 0.9841 0.0000
Shi-Tomasi | 84 0.8400 0.0927 | 73 0.7300 0.2023 | 84 0.8400 0.0234
SIFT | 994 0.7821 0.0000 | 1231 0.7245 0.2000 | 46 0.8519 0.0000
SURF | 2400 0.7846 0.0000 | 3292 0.7735 0.0000 | 123 0.9111 0.0000
SAR-SIFT | 203 0.9144 0.0000 | 3 1.0000 0.0000 | 3015 0.7320 0.0000
KeyNet | 1197 0.7389 0.9982 | 1365 0.7708 0.9445 | 668 0.7687 0.0024
Table 7. Statistics of the experimental test results; (n) denotes the number of tests (out of 16 in total) in which the detector or layer achieved the best performance.
Test Items | Detector | Layer
Best | ORB 1 (7) | PC (7)
Next best | BRISK 2 (6) | Sobel (5)
1 Specific detection operator is Oriented FAST. 2 Specific detection operator is FAST-9-16.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
