In low-resolution remote sensing data, targets mainly present unclear texture and blurred edges, and their gray distribution tends to be consistent with the background. In addition, sequence image frames suffer from problems such as significant frame shaking and the complex registration of sea-surface scenes. At the same time, the wide field of view produces a considerable amount of data, which poses a great challenge to the real-time performance of data processing [21]. Therefore, the detection of low-resolution wide-area remote sensing targets mainly relies on feature recognition and motion correlation. This section presents a multilevel discriminant detection algorithm that improves accuracy and reduces the false alarm rate. First, the basic algorithm framework is introduced in Section 2.1. Then, Section 2.2 discusses adaptive segmentation preprocessing and SVM false alarm elimination with simplified HOG features. Finally, Section 2.3 discusses the target association of sequence images and the false alarm identification of sequence data based on S-YOLO.
2.1. Basic Framework
The basic framework of the sequence target detection algorithm is shown in Figure 1 and includes four steps: preprocessing, feature discrimination, association discrimination, and S-YOLO discrimination.
The preprocessing step includes morphological processing, adaptive threshold segmentation, and the elimination of false alarms by connected-domain area. Morphological processing is often used to enhance the edges of dim or fragmented objects; this framework adopts multiple iterations of morphological processing to improve the adaptability of the enhancement to different complex scenes. The adaptive threshold segmentation uses the ratio of the target-region mean to the noise-region mean to distinguish foreground from background, which adapts well to low resolution and large fields of view. Finally, the connected domains of the binary result are labeled, the area of each region is calculated, and regions that are too large or too small are eliminated.
For the remaining suspected targets, feature discrimination is performed. Considering that the features are relatively simple, and to ensure real-time performance, S-HOG (a simplified HOG algorithm [13]) is used for analysis. After removing the gamma correction step, each first-level suspected target image is divided into 16 cells, and an eight-bin angular direction histogram is calculated for each cell. The 9 × 32 S-HOG feature is obtained by merging 2 × 2 adjacent cells into 9 blocks. The S-HOG feature is input to a binary nonlinear SVM [14] to generate the feature discrimination result and eliminate false alarm targets.
In target sequence association, position prediction and search across multi-frame sequence images eliminate suspected targets that cannot form trajectory information between frames, yielding the trajectory information of the third-level suspected targets. In the S-YOLO identification process, the third-level suspected targets are further refined through feature labeling and CNN network identification. Considering the small scale of the third-level suspected target slices, a simplified four-layer YOLO network [15] is adopted to identify the associated sequence slices. A sequence is judged invalid when the number of false alarms in it exceeds a limiting threshold.
2.2. Preprocessing
In ship target detection, especially in low-resolution, low signal-to-noise ratio image data, it is often difficult to distinguish the gray information of the target from the surrounding background and to eliminate complex sea-state interference. At the same time, for wide-area, large-width images, the amount of data in a single frame is large, so directly using high-performance, large-parameter CNN networks makes it challenging to ensure real-time performance.
To solve the above problems, morphological enhancement is first carried out: an iterative TopHat transform is used to improve the contrast of the target edge relative to the background and to reconstruct the target contour. Second, contrast-based binary segmentation is used to remove background noise. Finally, the connected domains are labeled, and regions whose areas are too large or too small are removed, yielding the first-level suspected targets.
I. Morphological Reconstruction
Define $I_0 = I$ as an image matrix of size $M \times N$; then the iterative TopHat transformation process is as follows:
$$I_k = I_{k-1} + w_1 \cdot \mathrm{WTH}(I_{k-1}) - w_2 \cdot \mathrm{BTH}(I_{k-1}), \quad k = 1, 2, \ldots, n$$
where $\mathrm{WTH}(I) = I - (I \ominus B) \oplus B$ and $\mathrm{BTH}(I) = (I \oplus B) \ominus B - I$, respectively, stand for the forward and inverse top-hat transformations, $B$ stands for a square all-ones structuring element, $n$ stands for the number of iterations, $\oplus$ stands for the dilation (expansion) operation, $\ominus$ stands for the erosion (corrosion) operation, and $w_1$ and $w_2$, respectively, stand for weight parameters (the default is 1). The result of morphological reconstruction is $I_n$.
Figure 2 shows the gray distribution of a ship target before and after reconstruction, along with that of a broken-cloud target before and after reconstruction. After reconstruction, the ship target's energy is more concentrated, its contour edge is easier to distinguish from the background noise, and the background interference noise is effectively suppressed. The contour of the broken-cloud target is also more apparent after reconstruction, so most false alarm targets can be eliminated by contour. In the reconstruction process, the number of iterations must be set according to the target size: too few iterations yield no enhancement, while too many increase the computation.
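The iterative TopHat enhancement described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 3 × 3 structuring element, edge padding, and default weights are assumptions, and the dilation/erosion are written out explicitly so the sketch is self-contained.

```python
import numpy as np

def dilate(img, k=3):
    """Grayscale dilation with a k x k all-ones structuring element."""
    r = k // 2
    p = np.pad(img, r, mode="edge")
    stack = [p[i:i + img.shape[0], j:j + img.shape[1]]
             for i in range(k) for j in range(k)]
    return np.max(stack, axis=0)

def erode(img, k=3):
    """Grayscale erosion with a k x k all-ones structuring element."""
    r = k // 2
    p = np.pad(img, r, mode="edge")
    stack = [p[i:i + img.shape[0], j:j + img.shape[1]]
             for i in range(k) for j in range(k)]
    return np.min(stack, axis=0)

def iterative_tophat(img, n=3, w1=1.0, w2=1.0, k=3):
    """Iterative TopHat enhancement:
    I_k = I_{k-1} + w1*WTH(I_{k-1}) - w2*BTH(I_{k-1})."""
    out = img.astype(float)
    for _ in range(n):
        opened = dilate(erode(out, k), k)   # morphological opening
        closed = erode(dilate(out, k), k)   # morphological closing
        wth = out - opened                  # white (forward) top-hat
        bth = closed - out                  # black (inverse) top-hat
        out = out + w1 * wth - w2 * bth
    return out
```

On a flat background, a small bright target is amplified on each iteration while the background stays unchanged, which matches the energy-concentration effect described for Figure 2.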
II. Adaptive Threshold Segmentation
The enhanced image data $I_n$ is segmented with an adaptive threshold. The detection process includes the information statistics of three square windows: the target window $W_t$, the protection window $W_p$, and the background window $W_b$, whose side lengths are $L_t$, $L_p$, and $L_b$, respectively. The target window mainly includes the gray information of the target to be detected. The protection window mainly consists of the gray information between the target and the background and is used to prevent the dispersed part of the target from being counted into the background window. The background window mainly covers the sea-surface noise information. The judgment basis for a target detected in the target window is:
$$\mathrm{SNR} = \frac{\mu_t}{\mu_b} > T$$
where $\mu_t$ is the mean value of the target window, $\mu_b$ is the mean value of the background window, $T$ is the comparison threshold, and $\mathrm{SNR}$ is the mean signal-to-noise ratio of the reconstruction result of the target region.
The result of the binary image obtained by threshold segmentation is:
$$I_B(x, y) = \begin{cases} 1, & \mathrm{SNR}(x, y) > T \\ 0, & \text{otherwise} \end{cases}$$
After adaptive threshold segmentation, the weak target is extracted according to its contrast, which reduces the amount of image data to be processed later and eliminates background-noise interference on the suspected weak target. As can be seen in Figure 3, after adaptive segmentation the dim target is clearly extracted and the background noise is isolated.
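The three-window SNR test can be sketched as below. The window sizes (3/5/9) and threshold (2.0) are illustrative placeholders, not the paper's tuned values; the protection ring is excluded from the background mean by masking, as described in the text.

```python
import numpy as np

def adaptive_threshold(img, lt=3, lp=5, lb=9, T=2.0):
    """Per-pixel SNR segmentation: mean of the target window divided by the
    mean of the background ring (background window minus protection window).
    Window sizes lt/lp/lb and threshold T are illustrative defaults."""
    h, w = img.shape
    rt, rp, rb = lt // 2, lp // 2, lb // 2
    p = np.pad(img.astype(float), rb, mode="edge")
    out = np.zeros((h, w), dtype=np.uint8)
    for i in range(h):
        for j in range(w):
            ci, cj = i + rb, j + rb
            tgt = p[ci - rt:ci + rt + 1, cj - rt:cj + rt + 1]
            bg = p[ci - rb:ci + rb + 1, cj - rb:cj + rb + 1].copy()
            bg[rb - rp:rb + rp + 1, rb - rp:rb + rp + 1] = np.nan  # mask protection window
            mu_t = tgt.mean()
            mu_b = np.nanmean(bg)
            if mu_b > 0 and mu_t / mu_b > T:
                out[i, j] = 1
    return out
```

A bright point on a uniform sea surface yields a high target-to-background ratio and is marked 1; uniform background pixels have a ratio near 1 and stay 0.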
III. Connected Domain Labeling
The connected-domain set $C = \{c_1, c_2, \ldots, c_{N_c}\}$ is obtained by labeling the connected domains of the binary image $I_B$, where $N_c$ represents the number of regions, $X_i$ represents the set of $x$ coordinates of the $i$-th region, and $Y_i$ represents the set of $y$ coordinates of the $i$-th region. In the labeling result, small regions are removed as false alarm targets, and large regions are removed as clouds, land, and other targets, yielding the binary image $I_C$:
$$c_i = \begin{cases} \varnothing, & S_i < S_{\min} \ \text{or} \ S_i > S_{\max} \\ c_i, & \text{otherwise} \end{cases}$$
where $\varnothing$ represents the null set, $S_{\min}$ and $S_{\max}$ represent the lower and upper limits of the area, respectively, and $S_i$ represents the area of the $i$-th region.
After connected-domain labeling and area elimination, the first-level suspected target set $Q_1$ (containing $N_1$ regions) is obtained.
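The labeling and area-gating step can be sketched with a simple flood-fill labeler. The area limits used here are illustrative placeholders for the paper's statistically derived bounds.

```python
import numpy as np

def label_and_filter(binary, s_min=2, s_max=100):
    """4-connected component labeling on a binary image, removing regions
    whose area is below s_min or above s_max (illustrative limits)."""
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    kept = np.zeros((h, w), dtype=np.uint8)
    regions = []
    for i in range(h):
        for j in range(w):
            if binary[i, j] and not seen[i, j]:
                stack, comp = [(i, j)], []
                seen[i, j] = True
                while stack:
                    y, x = stack.pop()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                if s_min <= len(comp) <= s_max:   # area gate: keep mid-sized regions
                    regions.append(comp)
                    for y, x in comp:
                        kept[y, x] = 1
    return kept, regions
```

Isolated single pixels (likely noise) and oversized blobs (cloud or land) are dropped; the surviving regions form the first-level suspected target set.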
In the preprocessing process, considering that the target width is about three pixels, the size of the square all-ones structuring element $B$ used in morphological reconstruction is $3 \times 3$. The number of reconstruction iterations $n$ is related to the mean signal-to-noise ratio (SNR) of the reconstruction results of the target area, as well as to illumination conditions, imaging side-swing angle, target sea state, and other conditions. In practical applications, given the working mode, the same area is often observed in the same period; therefore, the SNR was observed after processing a large number of targets in the region. As shown in Figure 4, 80 groups of data were counted: the SNR reaches its optimum when the number of iterations is 3, so the optimal number of reconstruction iterations is $n = 3$. The threshold $T$ in adaptive threshold segmentation is mainly related to the mean and variance of the sea state in the imaging area and to the gray level of the sea-surface background. Considering that the gray-level distribution of the sea surface tends to be uniform after reconstruction at low resolution, the 80 groups of data were statistically analyzed to obtain $T$. The connected domains were then labeled, the area of each suspected target region was calculated, and the larger and smaller targets were eliminated according to the upper and lower area limits $S_{\max}$ and $S_{\min}$, respectively.
2.3. Feature Identification
The feature identification process calculates the gray features of all targets in the first-level suspected target set and classifies whether each is a ship target according to those features. The simplified HOG feature (S-HOG) is used for the gray-feature calculation, and a nonlinear binary SVM is used for the classification.
I. S-HOG Feature Calculation
After preprocessing, the edges and gray distribution of the target are already enhanced, while gamma correction would amplify the noise interference on the target, so it is omitted. The calculation process of S-HOG is therefore divided into the following steps (as shown in Figure 5):
Step 1: For any region $c_i$, take its centroid as the center, extract a slice $P_i$ with a side length of 52 pixels from the original image $I$, and perform the S-HOG feature calculation;
Step 2: The slice is divided into cells of 13 × 13 pixels each (4 × 4 = 16 cells). The gradient features of each cell are binned into 8 directions, so each cell has 8 feature values;
Step 3: The features of every 2 × 2 adjacent cells form a block, giving 9 blocks in total; each block has 32 eigenvalues, and the final eigenvector $F_i$ has 9 × 32 = 288 eigenvalues in total.
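The three steps above can be sketched as follows. This is an interpretation of the description, not the authors' code: the gradient operator, unsigned orientation binning, and per-block L2 normalization are assumptions; only the geometry (52 px slice, 13 × 13 cells, 8 bins, 2 × 2 cells per block, 9 blocks, 288 values) follows the text.

```python
import numpy as np

def s_hog(slice52, cell=13, bins=8):
    """Simplified HOG (no gamma correction): a 52x52 slice -> 4x4 cells of
    13x13 px, an 8-bin orientation histogram per cell, then 2x2 cells merged
    into overlapping blocks (3x3 = 9 blocks) -> 288-dim feature vector."""
    g = slice52.astype(float)
    gy, gx = np.gradient(g)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)           # unsigned orientation in [0, pi)
    n = slice52.shape[0] // cell                      # 4 cells per side
    cells = np.zeros((n, n, bins))
    for ci in range(n):
        for cj in range(n):
            m = mag[ci*cell:(ci+1)*cell, cj*cell:(cj+1)*cell].ravel()
            a = ang[ci*cell:(ci+1)*cell, cj*cell:(cj+1)*cell].ravel()
            idx = np.minimum((a / np.pi * bins).astype(int), bins - 1)
            for k in range(bins):
                cells[ci, cj, k] = m[idx == k].sum()  # magnitude-weighted histogram
    blocks = []
    for bi in range(n - 1):                           # 3x3 = 9 overlapping blocks
        for bj in range(n - 1):
            v = cells[bi:bi+2, bj:bj+2].ravel()       # 4 cells x 8 bins = 32 values
            blocks.append(v / (np.linalg.norm(v) + 1e-6))  # per-block L2 normalization
    return np.concatenate(blocks)                     # 9 x 32 = 288 values
```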
II. Nonlinear Binary SVM Calculation
The eigenvector of each slice image is classified by a binary SVM. Considering the high feature dimension, a nonlinear SVM is adopted. The training and decision calculation are divided into the following steps:
Step 1: Input the training feature vectors and categories $\{(F_i, y_i)\}$, where '0' represents an interference target and '1' represents a ship target;
Step 2: Construct and solve the optimization formula
$$\min_{\alpha} \; \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(F_i, F_j) - \sum_{i=1}^{N} \alpha_i$$
The constraint condition is
$$\sum_{i=1}^{N} y_i \alpha_i = 0, \quad 0 \le \alpha_i \le C, \quad i = 1, \ldots, N$$
where $C$ is the loss parameter, $\alpha_i$ is the Lagrange multiplier, and $K$ is the radial basis kernel function:
$$K(F_i, F_j) = \exp\!\left(-\frac{\|F_i - F_j\|^2}{2\sigma^2}\right)$$
According to Equations (5)–(7), the optimal Lagrange multiplier solution $\alpha^*$ is obtained, and the threshold $b^*$ is calculated as follows:
$$b^* = y_j - \sum_{i=1}^{N} y_i \alpha_i^* K(F_i, F_j)$$
Step 3: The final decision function formula is
$$f(F) = \operatorname{sgn}\!\left(\sum_{i=1}^{N} \alpha_i^* y_i K(F_i, F) + b^*\right)$$
Therefore, after feature identification of the first-level suspected target set $Q_1$, the result of the second-level suspected target set $Q_2$ (the number of effective targets is $N_2$) is as follows:
$$Q_2 = \{\, c_i \in Q_1 \mid f(F_i) = 1 \,\}$$
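The decision side of the kernel SVM can be sketched as below. The support vectors, multipliers, and threshold would come from solving the dual problem during training; here they are toy placeholder values in 2-D standing in for the 288-dimensional S-HOG features.

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    """Radial basis kernel K(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

def svm_decide(x, sv, sv_y, alpha, b, sigma=1.0):
    """Kernel SVM decision: sign(sum_i alpha_i y_i K(sv_i, x) + b).
    sv, sv_y, alpha, b are training outputs; here they are toy values."""
    s = sum(a * y * rbf_kernel(v, x, sigma) for v, y, a in zip(sv, sv_y, alpha))
    return 1 if s + b > 0 else 0   # 1 = ship target, 0 = interference

# Toy 2-D placeholder support set (internally uses -1/+1 labels):
sv = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
sv_y = [-1, 1]
alpha = [1.0, 1.0]
b = 0.0
```

Points near the positive support vector are labeled ship (1); points near the negative one are labeled interference (0).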
2.4. Association Identification
The association identification mainly eliminates interference false-alarm targets through the correlation of targets in consecutive frames. The assumption is that the target moves at a constant speed over the short observation time, so the target position difference between adjacent frames equals the sum of the moving distance and the inter-frame pose and orbit error. Considering the randomness of the pixel-level error between adjacent frames in orbit, a conservative circular search is adopted in the inter-frame search, with the search radius proportional to the sum of the moving distance and the inter-frame pose and orbit error. The more frames selected for association discrimination, the higher the detection accuracy, but multi-frame association also degrades the real-time performance of the calculation.
The image in frame $q$ is defined as $I_q$, and the $p$-th target in frame $q$ is marked as $T_q^p$, where $p = 1, 2, \ldots, N_2$; the image in frame $q+1$ is $I_{q+1}$. The coordinate $T_q(p, x_q, y_q)$ of the target $T_q^p$ in frame $q$ is taken as the center of a circle with radius $r$ to search for the target in frame $q+1$. That is, the third-level suspected target set $Q_3$ in frame $q$ (the number of effective targets is $N_3$) can be expressed as follows:
$$Q_3 = \left\{\, T_q^p \;\middle|\; d\!\left(T_{q-1}^{p'}, T_q^{p}\right) \le r \;\wedge\; d\!\left(T_q^{p}, T_{q+1}^{p''}\right) \le r \,\right\}$$
where $d(T_{q-1}^{p'}, T_q^{p})$ and $d(T_q^{p}, T_{q+1}^{p''})$ represent the Mahalanobis distance between two coordinate points. Therefore, if the target appears within the radius in three consecutive frames, it is considered to exist; otherwise, it is considered not to exist. In practical applications, the number of frames associated in Formula (11) can be increased according to the real-time requirements to improve the confidence of target discrimination.
After association identification, the false alarm caused by the interference target, especially the broken cloud, can be eliminated well. However, it is also necessary to consider the missing detection of a frame caused by the cloud cover and sea state interference in the sequence association process and increase the tolerance appropriately to improve the adaptability of association identification.
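The three-frame circular search can be sketched as follows. Plain Euclidean distance is used here as a stand-in for the Mahalanobis distance in the text, and the function names and radius are illustrative.

```python
import math

def associate(prev_pts, cur_pts, next_pts, r):
    """Keep a current-frame target only if some detection lies within radius r
    in both the previous and the next frame (three consecutive frames).
    Euclidean distance stands in for the paper's Mahalanobis distance."""
    def near(p, pts):
        return any(math.dist(p, q) <= r for q in pts)
    return [p for p in cur_pts if near(p, prev_pts) and near(p, next_pts)]
```

A broken-cloud false alarm that appears in only one frame has no neighbors within the radius in the adjacent frames and is discarded; to tolerate a single missed detection, as discussed above, the `near` test could be relaxed to require only one of the two adjacent frames.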
2.5. S-YOLO Identification
After the three levels of false-alarm elimination, broken-cloud interference whose motion-association features resemble those of ship targets may still remain. In this case, the detailed features of the image should be considered for further false-alarm elimination. At this stage, the data are retained in the form of slices, and the amount of target and false-alarm data has decreased sharply, so a CNN can be considered for refined target feature recognition.
In preprocessing, the foreground/background segmentation limits the accuracy of target centroid positioning, and the HOG feature cannot effectively describe refined features such as low-resolution cloud and ship edges and trailing traces. Therefore, a simplified version of the YOLO algorithm (S-YOLO) was designed in this section: a lightweight network with YOLO V2 Tiny [10] as the baseline for further feature recognition on the third-level suspected target set $Q_3$.
Table 1 shows the architecture of the S-YOLO network. The input is 52 × 52 image data, and the network mainly comprises three convolutional layers, a max-pooling layer, and one YOLO prediction layer, totaling 23,296 parameters. S-YOLO outputs the target's position, category, length, and width on the 52 × 52 slice image.
For the sequence set $\{s_i^q\}$ of target $i$ in the sequence images, after S-YOLO identification, if the number of effective-target judgments in the sequence exceeds the threshold $D$, the sequence is considered an effective sequence and target; that is, the fourth-level suspected target set $Q_4$ is calculated as follows:
$$Q_4 = \left\{\, i \;\middle|\; \sum_{q} f_{\mathrm{YOLO}}\!\left(s_i^q\right) \ge D \,\right\}$$
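The sequence-level vote can be sketched as follows, with the per-slice S-YOLO judgment abstracted as a 0/1 value; the function names and the threshold semantics (at least D valid slices) are illustrative.

```python
def sequence_valid(yolo_results, D):
    """yolo_results: per-slice S-YOLO judgments for one associated target
    (1 = ship, 0 = false alarm). The sequence is kept only if at least D
    slices are judged valid."""
    return sum(yolo_results) >= D

def fourth_level_set(sequences, D):
    """sequences maps target id -> list of per-frame S-YOLO judgments;
    returns the ids forming the fourth-level suspected target set."""
    return {i for i, res in sequences.items() if sequence_valid(res, D)}
```

Raising D trades recall for a lower false-alarm rate: a broken-cloud sequence with only sporadic positive judgments is rejected, while a ship consistently recognized across the sequence is retained.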