3.2.2. Specific Algorithm Flow
The codebook algorithm proposed by Kim et al. first takes the video frame sequence as input; then, it uses the idea of clustering to classify each pixel, adds codewords one by one, and updates the codebook regularly to obtain the final background model. It occupies little memory, and its principle is simple and intuitive, comprising background modeling and foreground detection. The following briefly introduces each in turn:
The codebook algorithm first uses the input N video frames for training. X = {x_1, x_2, …, x_N} is used to represent the sequence of N RGB vectors observed at each pixel, and C = {c_1, c_2, …, c_L} is used to represent that each pixel has a codebook with L codewords. The codebook model of each pixel contains a different number of codewords according to the variation of its samples. Each codeword c_i consists of an RGB vector v_i = (R̄_i, Ḡ_i, B̄_i) and a tuple aux_i with six elements, where:

aux_i = ⟨I_min,i, I_max,i, f_i, λ_i, p_i, q_i⟩

where R̄_i, Ḡ_i, and B̄_i, respectively, represent the average values of the color components of the pixels belonging to the codeword c_i; I_min,i and I_max,i represent the minimum and maximum light intensity values of the pixels belonging to the codeword c_i, respectively; f_i represents the frequency of occurrence of the codeword c_i; λ_i represents the maximum interval in which the codeword c_i does not reappear during the training phase; and p_i and q_i represent the first and last access times of the codeword c_i, respectively.
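The codeword structure above can be sketched as a small record type; this is a minimal illustration with field names of our own choosing, not the authors' implementation:

```python
from dataclasses import dataclass

# Sketch of a codeword c_i: the mean color vector v_i plus the six-element
# tuple aux_i = <I_min, I_max, f, lam, p, q> described in the text.
@dataclass
class Codeword:
    v: tuple        # (R, G, B) mean color vector of matched pixels
    I_min: float    # minimum brightness of matched pixels
    I_max: float    # maximum brightness of matched pixels
    f: int          # frequency of occurrence
    lam: int        # maximum no-reappearance interval (lambda)
    p: int          # first access time
    q: int          # last access time

cw = Codeword(v=(10.0, 20.0, 30.0), I_min=37.4, I_max=37.4, f=1, lam=0, p=1, q=1)
```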
In the training phase, each pixel is compared with all the codewords in its current codebook to determine whether it can be matched to an existing codeword. The specific content of the algorithm is shown in Algorithm 1:
Algorithm 1 Codebook background modeling algorithm

Input: N three-channel video frames x_t = (R_t, G_t, B_t), t = 1, 2, …, N; threshold ε_1
Output: a background model M
1: L ← 0, set the codebook C to the empty set
2: for t = 1 to N do
3:  x_t = (R, G, B), I ← √(R² + G² + B²)
4:  for each c_m ∈ C do
5:   if colordist(x_t, v_m) ≤ ε_1 and brightness(I, ⟨I_min,m, I_max,m⟩) = true then
6:    v_m ← ((f_m R̄_m + R)/(f_m + 1), (f_m Ḡ_m + G)/(f_m + 1), (f_m B̄_m + B)/(f_m + 1))
7:    aux_m ← ⟨min(I, I_min,m), max(I, I_max,m), f_m + 1, max(λ_m, t − q_m), p_m, t⟩
8:   else
9:    L ← L + 1
10:   create a new codeword c_L by setting
11:   v_L ← (R, G, B)
12:   aux_L ← ⟨I, I, 1, t − 1, t, t⟩
13:  end if
14:  end for
15: end for
16: for each i = 1 to L do
17:  λ_i ← max(λ_i, N − q_i + p_i − 1)
18: end for
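The modeling stage can be sketched for a single pixel as follows; this is a runnable illustration, not the authors' code, and the threshold ε_1 and the α, β values are illustrative choices (matching is done with first-match semantics):

```python
import math

def colordist(x, v):
    # Color distortion: sqrt(||x||^2 - <x, v>^2 / ||v||^2).
    x2 = sum(c * c for c in x)
    v2 = sum(c * c for c in v)
    dot = sum(a * b for a, b in zip(x, v))
    p2 = dot * dot / v2 if v2 > 0 else 0.0
    return math.sqrt(max(x2 - p2, 0.0))

def brightness_ok(I, I_min, I_max, alpha=0.5, beta=1.2):
    # Brightness band: I_low = alpha*I_max, I_hi = min(beta*I_max, I_min/alpha).
    return alpha * I_max <= I <= min(beta * I_max, I_min / alpha)

def train_pixel(samples, eps1=10.0):
    """Build a codebook [v, I_min, I_max, f, lam, p, q] from N RGB samples."""
    codebook = []
    N = len(samples)
    for t, x in enumerate(samples, start=1):
        I = math.sqrt(sum(c * c for c in x))
        match = None
        for cw in codebook:
            if colordist(x, cw[0]) <= eps1 and brightness_ok(I, cw[1], cw[2]):
                match = cw
                break
        if match is not None:
            f, lam, q = match[3], match[4], match[6]
            match[0] = tuple((f * vi + xi) / (f + 1) for vi, xi in zip(match[0], x))
            match[1] = min(I, match[1])   # I_min
            match[2] = max(I, match[2])   # I_max
            match[3] = f + 1              # frequency
            match[4] = max(lam, t - q)    # MNRL so far
            match[6] = t                  # last access time
        else:
            codebook.append([tuple(x), I, I, 1, t - 1, t, t])
    for cw in codebook:                   # wrap-around MNRL after training
        cw[4] = max(cw[4], N - cw[6] + cw[5] - 1)
    return codebook
```

A stable background pixel yields a single codeword whose frequency equals the number of training frames, while a color change mid-sequence spawns a second codeword.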
ε_1 in the algorithm is a manually given threshold, and the two judgment conditions represent the color distortion and brightness limits, respectively. For the input pixel x_t = (R, G, B) and codeword c_m, there are:

‖x_t‖² = R² + G² + B²,  ‖v_m‖² = R̄_m² + Ḡ_m² + B̄_m²,  ⟨x_t, v_m⟩² = (R̄_m R + Ḡ_m G + B̄_m B)²

The color distortion can be calculated as:

p² = ⟨x_t, v_m⟩² / ‖v_m‖²,  colordist(x_t, v_m) = √(‖x_t‖² − p²)

In order to allow the brightness to change during detection, the upper and lower limits of brightness are given:

I_low = α I_max,m,  I_hi = min{β I_max,m, I_min,m / α}

where α is usually between 0.4 and 0.7, and β is usually between 1.1 and 1.5. Then:

brightness(I, ⟨I_min,m, I_max,m⟩) = true, if I_low ≤ I ≤ I_hi; false, otherwise.
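A small numeric check of these formulas (function names are ours): a pixel that is a pure brightness scaling of the codeword color has zero color distortion, and the brightness test then decides whether the intensity change stays within the allowed band.

```python
import math

def colordist(x, v):
    # sqrt(||x||^2 - <x, v>^2 / ||v||^2): distance of x from the line through v.
    x2 = sum(c * c for c in x)
    v2 = sum(c * c for c in v)
    dot = sum(a * b for a, b in zip(x, v))
    return math.sqrt(max(x2 - dot * dot / v2, 0.0))

def brightness_ok(I, I_min, I_max, alpha=0.5, beta=1.2):
    I_low = alpha * I_max
    I_hi = min(beta * I_max, I_min / alpha)
    return I_low <= I <= I_hi

v = (100.0, 50.0, 25.0)
doubled = tuple(2.0 * c for c in v)        # same hue, double brightness
print(colordist(doubled, v))               # ~0.0: no color distortion
I_v = math.sqrt(sum(c * c for c in v))
print(brightness_ok(I_v, I_v, I_v))        # True: unchanged intensity
print(brightness_ok(2.0 * I_v, I_v, I_v))  # False: doubling exceeds I_hi
```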
In the background update stage, the codewords that have not appeared for a long time are removed according to the maximum interval λ, and the improved background model M is obtained:

M = {c_m | c_m ∈ C, λ_m ≤ T_M}

where T_M is the threshold, set to half the number of training frames, that is, T_M = N/2.
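This temporal filtering step can be sketched directly; the (lam, label) tuples below are purely illustrative:

```python
# Keep only codewords whose maximum no-reappearance interval lam is at most
# T_M = N/2; codewords caused by transient objects are dropped.
def filter_background(codebook, N):
    T_M = N // 2
    return [cw for cw in codebook if cw[0] <= T_M]

# With N = 100 training frames, a codeword idle for 80 consecutive frames
# (e.g. produced by a passing car) is removed from the background model.
print(filter_background([(3, "road"), (80, "passing car")], 100))
```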
When the background model of the video scene has been obtained, the codebook foreground detection algorithm can perform the background subtraction operation on each new input pixel in the test set, which is fast and intuitive, as shown in Algorithm 2.
Algorithm 2 Codebook foreground detection algorithm

Input: pixel x containing (R, G, B) components; threshold ε_2; background model M
Output: type of the pixel x
1: I ← √(R² + G² + B²)
2: for each c_m ∈ M do
3:  if colordist(x, v_m) ≤ ε_2 and brightness(I, ⟨I_min,m, I_max,m⟩) = true then
4:   update v_m as in Algorithm 1
5:   update aux_m as in Algorithm 1
6:   return (background)
7:  else
8:   return (foreground)
9:  end if
10: end for
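The detection stage can be sketched as follows; this is an illustrative implementation (codeword updates omitted), with ε_2 and the α, β values chosen arbitrarily, not the authors' code:

```python
import math

def matches(x, cw, eps, alpha=0.5, beta=1.2):
    # Color-distortion and brightness tests against one codeword.
    v, I_min, I_max = cw["v"], cw["I_min"], cw["I_max"]
    x2 = sum(c * c for c in x)
    v2 = sum(c * c for c in v)
    dot = sum(a * b for a, b in zip(x, v))
    cdist = math.sqrt(max(x2 - dot * dot / v2, 0.0))
    I = math.sqrt(x2)
    I_low, I_hi = alpha * I_max, min(beta * I_max, I_min / alpha)
    return cdist <= eps and I_low <= I <= I_hi

def detect(x, model, eps2=20.0):
    # A pixel matching any background codeword is background; the matched
    # codeword would also be updated here, as in the modeling stage.
    for cw in model:
        if matches(x, cw, eps2):
            return "background"
    return "foreground"

model = [{"v": (120.0, 120.0, 120.0), "I_min": 200.0, "I_max": 215.0}]
print(detect((121.0, 119.0, 120.0), model))  # background
print(detect((200.0, 30.0, 30.0), model))    # foreground
```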
ε_2 is the threshold for detection; if the pixel does not match any codeword in the background model, the pixel is defined as foreground; otherwise, it is background.
Although the original codebook algorithm is fast, takes color information into account, and sets upper and lower limits on light intensity to reduce the interference caused by illumination changes, the experimental results show that its effect is only average. As shown in Figure 4, the highway video frame sequence from the baseline category of the CDNet2014 dataset [36] is used to evaluate the target detection effect of the original codebook algorithm.
It can be seen from Figure 4 that the original codebook algorithm obtains detection results quickly and can roughly detect the contour of the moving object. However, owing to factors such as the roadside leaves swaying in the wind and changes in light intensity in the video, the original algorithm produces a large number of false detections at the trees in the upper left, and there are many missed detections inside the cars. In addition, the shadows of the cars follow the same trajectory as the cars, which leads to a certain degree of adhesion between the shadows and the cars in the detection results, and more noise points appear elsewhere.
Considering that the capture of a depth image is independent of brightness and color, this paper adds depth information to the codeword to expand its dimension, and it optimizes the output result map to reduce noise. Therefore, this paper proposes a codebook algorithm based on RGB-D to improve the performance and robustness of the algorithm. The algorithm includes background modeling and foreground detection.
The algorithm proposed in this paper adds one-dimensional depth information to the RGB vector v_i of the original codeword to convert it into an RGBD vector, namely v_i = (R̄_i, Ḡ_i, B̄_i, D̄_i). Then, we add the minimum and maximum depth values D_min,i and D_max,i of all the pixels belonging to codeword c_i to the original six-element tuple to make aux_i an eight-element tuple, namely aux_i = ⟨I_min,i, I_max,i, D_min,i, D_max,i, f_i, λ_i, p_i, q_i⟩. In addition, based on the two judgment conditions of the background modeling and foreground detection algorithms, a depth deviation limit is added, and its definition follows that of the light intensity limit, that is:

depth(D, ⟨D_min,m, D_max,m⟩) = true, if D_low ≤ D ≤ D_hi or D = 0; false, otherwise
where

D_low = α_D D_max,m,  D_hi = min{β_D D_max,m, D_min,m / α_D}

and α_D is usually between 0.4 and 0.7, while β_D is usually between 1.1 and 1.5. If the boundary range is too large, some foreground points will not be detected because their depth values change little; if the range is too small, then, owing to the quality problems of the dataset's depth maps, noise points whose depth values change are mistakenly detected as moving objects. The condition D = 0 is added to the formula to take into account that the depth of a moving object may be empty. In the judgment, the algorithm proposed in this paper not only takes the intersection after comparing color and depth changes separately but also fuses the two, with color changes as the dominant factor, in order to reduce the interference of shadows and light intensity changes. The judgment conditions cond_1 and cond_2 are set as follows:

cond_1 = (colordist(x_t, v_m) ≤ ε_1) ∧ (brightness(I, ⟨I_min,m, I_max,m⟩) = true) ∧ (depth(D, ⟨D_min,m, D_max,m⟩) = true)

cond_2 = (colordist(x_t, v_m) ≤ ε_1′) ∧ (brightness′(I, ⟨I_min,m, I_max,m⟩) = true) ∧ (depth(D, ⟨D_min,m, D_max,m⟩) = true)

where ε_1′ is another threshold value greater than ε_1, and brightness′ is the value of brightness computed with another set of α and β, making the upper and lower limits wider. The movement of a shadow and a change of illumination will inevitably change the color and light intensity of the corresponding pixel, but the depth value of the pixel is not affected. Therefore, these two conditions use whether the change of the depth value exceeds the threshold as an auxiliary test: although the color and brightness of a pixel may change, if its depth change is still within the set range, the possibility that the pixel belongs to a moving object is largely excluded.
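The depth deviation limit can be sketched as follows; the D == 0 clause reflects the remark that the depth of a moving object may be empty, and the α_D, β_D values are illustrative:

```python
# Depth-deviation test: D must lie within [alpha_d*D_max, min(beta_d*D_max,
# D_min/alpha_d)], except that a missing depth reading (D == 0, a hole in
# the depth map) does not veto the match.
def depth_ok(D, D_min, D_max, alpha_d=0.5, beta_d=1.2):
    if D == 0:
        return True
    D_low = alpha_d * D_max
    D_hi = min(beta_d * D_max, D_min / alpha_d)
    return D_low <= D <= D_hi

print(depth_ok(0, 900.0, 1000.0))    # True: depth hole, no veto
print(depth_ok(950.0, 900.0, 1000.0))  # True: within the allowed band
print(depth_ok(400.0, 900.0, 1000.0))  # False: depth changed too much
```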
D̄_i, D_min,i, and D_max,i are updated in the same way as R̄_i, Ḡ_i, B̄_i, I_min,i, and I_max,i, and three conditions are now required to determine whether a pixel x_t has a matching codeword c_m, namely:

colordist(x_t, v_m) ≤ ε_1,  brightness(I, ⟨I_min,m, I_max,m⟩) = true,  depth(D, ⟨D_min,m, D_max,m⟩) = true
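The three matching conditions can be combined into a single predicate; this is a minimal sketch with illustrative thresholds and codeword layout, not the authors' code:

```python
import math

# Hypothetical RGB-D codeword: (mean color v, I_min, I_max, D_min, D_max).
def matches_rgbd(x, D, cw, eps1=10.0, alpha=0.5, beta=1.2):
    v, I_min, I_max, D_min, D_max = cw
    x2 = sum(c * c for c in x)
    v2 = sum(c * c for c in v)
    dot = sum(a * b for a, b in zip(x, v))
    cdist = math.sqrt(max(x2 - dot * dot / v2, 0.0))        # color distortion
    I = math.sqrt(x2)
    bright = alpha * I_max <= I <= min(beta * I_max, I_min / alpha)
    depth = (D == 0) or (alpha * D_max <= D <= min(beta * D_max, D_min / alpha))
    return cdist <= eps1 and bright and depth               # all three tests

cw = ((120.0, 120.0, 120.0), 200.0, 215.0, 900.0, 1000.0)
print(matches_rgbd((121.0, 119.0, 120.0), 950.0, cw))  # True
print(matches_rgbd((121.0, 119.0, 120.0), 400.0, cw))  # False: depth changed
```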
It can be seen from the detection results in Figure 4 that, in addition to the detected foreground objects, the original codebook algorithm produces some noise points. Therefore, this paper uses morphological methods to remove the redundant noise points. Firstly, the erosion operation from the basic morphological operations on binary images is used to remove the generated noise points. The erosion operation ablates the boundaries of objects according to the size of the structuring element, so that objects smaller than the structuring element disappear; therefore, all noise points that do not fully contain the structuring element can be removed. However, the disadvantage of the erosion operation is that it also changes the shape of the foreground object to a certain extent. As shown in Figure 5b, after erosion, although the sporadic noise spots are completely eliminated, the detected hand region is also eroded and thinned. Therefore, morphological reconstruction is used to restore the image. The reconstruction involves two images, one of which is used as a marker and the other as a mask. The reconstruction method can accurately recover the target region as it was before erosion and thus remove the noise without changing the detection result of the foreground object. The results are shown in Figure 5c. In this paper, the morphological erosion and morphological reconstruction functions of MATLAB are used.
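An equivalent of this erosion-plus-reconstruction step can be sketched in Python with SciPy (the paper itself uses MATLAB's functions); `scipy.ndimage.binary_propagation` performs binary morphological reconstruction of the eroded marker inside the original mask:

```python
import numpy as np
from scipy import ndimage

# Toy foreground mask: a 4x4 "object" plus an isolated single-pixel noise point.
mask = np.zeros((10, 10), dtype=bool)
mask[2:6, 2:6] = True      # detected foreground object
mask[8, 8] = True          # isolated noise point

# Step 1: erosion removes anything smaller than the 3x3 structuring element,
# but also thins the object itself (cf. Figure 5b in the text).
marker = ndimage.binary_erosion(mask, structure=np.ones((3, 3)))

# Step 2: morphological reconstruction grows the surviving marker back
# inside the original mask, restoring the object's exact shape, while the
# noise point, having no surviving marker, stays removed (cf. Figure 5c).
clean = ndimage.binary_propagation(marker, mask=mask)

print(clean[2:6, 2:6].all())   # True: object fully restored
print(bool(clean[8, 8]))       # False: noise point removed
```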