Article

Fast Image Block Matching for Image Detection and MPEG Encoding Based on Underlying Symmetries

Department of Computer Engineering, Hanyang Cyber University, 220, Wangsimni-ro, Seongdong-gu, Seoul 04763, Republic of Korea
Symmetry 2023, 15(7), 1333; https://doi.org/10.3390/sym15071333
Submission received: 28 May 2023 / Revised: 25 June 2023 / Accepted: 28 June 2023 / Published: 29 June 2023
(This article belongs to the Section Engineering and Materials)

Abstract

Image block matching is one of the representative methods for image detection and motion compensation in MPEG. Block matching between two images is the problem of finding symmetry between two images by matching macro blocks that are symmetrical to each other in the two given images. The greater the PSNR value, the greater the symmetry of the two images. In the two given images, the two macro blocks with the minimum matching error value are regarded as symmetrical to each other. The classical method of calculating the matching error function for every pixel in the entire search area and choosing the smallest of them guarantees global convergence but requires a lot of computation, especially for large images. For this reason, many sparse search methods have been developed to reduce the amount of computation. In this paper, we introduce a gradient descent vector optimization algorithm with guaranteed global convergence to the image block matching problem by utilizing the conceptual symmetry between the vector optimization problem, which is posed over continuous variables, and the image block matching problem, which is posed over discrete variables. By blurring the image, we transform the matching cost function into a form closer to unimodal so that the descent-type algorithm works well. As a result, although the proposed method is simple, it reduces the amount of computation remarkably and is more robust to large displacements of image blocks than existing sparse search methods.

1. Introduction

In geometry, left–right similarity is called symmetry, and in physics, invariance (lack of change) is called symmetry. Generalizing this concept a little more, in this paper, symmetry refers to something that retains its similarity regardless of the point of view or space. From this point of view, a form in which similar modules are repeated, such as block matching, is a representative example of symmetry. Moreover, the mutual similarity between the vector space optimization problem and block matching in the image space, which is the basis of the proposed algorithm, is a representative example of symmetry between two spaces. Likewise, a module that is less similar to its surroundings is an example of broken symmetry.
In view of this symmetry, we first analyze the existing block matching methods, and based on this, we propose a speed-up method that makes use of the symmetry of block matching. Block matching has highly symmetric properties with respect to the repetition of similar modules. This block-based approach divides the image into macro blocks and searches for each macro block within a specified search window. This approach uses the pixel values of the image to obtain the block’s position difference. Block-based motion estimation has been adopted in all the existing video coding standards [1,2,3]. This is achieved by allowing blocks from currently coded frames to be matched with ones from reference frames. A metric for matching a macro block with another block is based on a cost function. The most popular cost functions for block matching are MAD (Mean Absolute Difference) and MSE (Mean Squared Error). Block matching algorithms for motion estimation have been broadly adopted by current video coding standards such as H.264/MPEG-4 AVC [4] and HEVC [5] due to their high compression efficiency and implementation simplicity. Block matching algorithms are also employed for hardware realization due to their regularity and simplicity [6,7,8].
Conventional ES (Exhaustive Search) algorithms adopt FS (Full Search) that divides the current frame into blocks, and each block is searched for the best match within a search window in the reference image. For each block, a motion vector corresponding to that displacement is associated [9].
Because ES calculates the cost function for block matching at each possible location in the search window, this leads to the best possible match of the macro block in the reference frame with a block in another frame. The resulting motion-compensated image has the highest peak signal-to-noise ratio compared to any other block matching algorithm. However, it is the most computationally intensive block matching algorithm of all. A larger search window requires a greater number of computations. Thus, ES is not ready for real-time applications [10].
There are several sparse search approaches to reducing the computational complexity of ES. In these approaches, the motion estimation process is performed using SS (Sparse Search) algorithms instead of FS (Full Search). They search for the best motion vectors in a sparse (coarse-to-fine) search pattern. Based on an analysis of the distribution of motion vectors in real-world image sequences, the matching error tends to decrease as the search moves toward the position of the global error minimum. This method of gradually finding the global minimum is based on the assumption that the error surface due to motion in every macro block is unimodal. A unimodal surface is a bowl-shaped surface such that the weights generated by the cost function increase monotonically away from the global minimum.
The first of these SS (Sparse Search) methods was TSS/3SS (Three Step Search) [11]. TSS adopts a coarse (large step size)-to-fine (small step size) search method, expressed as: ‘The search starts from the center with step size S = 4 and search parameter value of 7, where the search parameter is the number of pixels on all four sides of the corresponding macro-block in the previous frame. We have to search for these eight locations +/− S pixels around location (0, 0). From these nine locations it picks the one giving least cost and makes it the new search origin. In the next step, let the new step size S = S/2 and repeat the search procedure until S = 1. The resulting location for S = 1 is the one with minimum cost function and the macro block at this location is the best match.’
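As a concrete (and purely illustrative) rendering of the quoted procedure, the following Python sketch implements the TSS pattern; the callable cost(dy, dx), the boundary handling and the default search parameter p = 7 are assumptions of this sketch, not part of the original specification.

```python
def three_step_search(cost, p=7):
    """Illustrative TSS/3SS sketch: cost(dy, dx) returns the matching error of a
    candidate displacement; p is the search parameter (+/- p pixels)."""
    cy, cx = 0, 0                       # the search starts at the window centre
    step = 4                            # initial (coarse) step size
    while step >= 1:
        best = (cost(cy, cx), cy, cx)
        for dy in (-step, 0, step):     # the 8 points located +/- step around the origin
            for dx in (-step, 0, step):
                if dy == 0 and dx == 0:
                    continue
                y, x = cy + dy, cx + dx
                if abs(y) <= p and abs(x) <= p:
                    best = min(best, (cost(y, x), y, x))
        _, cy, cx = best                # move the origin to the cheapest of the nine points
        step //= 2                      # coarse-to-fine: 4 -> 2 -> 1 -> stop
    return cy, cx                       # estimated motion vector
```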
After TSS, algorithms such as NTSS (New Three Step Search) [12], 4SS (Four Step Search) [13], SES (Simple and Efficient Search) [14], DS (Diamond Search) [15] and SD (Star-Diamond Search) [16], which improved TSS, were proposed sequentially. Although they have improved some of the features of TSS, they commonly adopt the coarse-to-fine search method, which is the basic strategy of a sparse search such as TSS. Therefore, they have the disadvantage of a coarse-to-fine search in common. The goal of this paper is to supplement the common disadvantages of the above-listed sparse search methods; that is, the disadvantages of the coarse-to-fine search. Therefore, we use the best-performing algorithm among them as a representative benchmark for the proposed method. Studies [17,18] provide a detailed performance comparison of Sparse Search methods.
From the literature, it was confirmed that the ES method was the most accurate but the most computationally intensive, and the SD method was the most efficient. In other words, SD is known to produce results with similar accuracy to ES with the smallest amount of computation. Based on the analysis of the literature, ES and SD were selected as benchmarking approaches for the proposed method in the Results section.
The goal of this paper is to develop a block matching algorithm that achieves an accurate result closer to ES with less computational effort than SD without compromising the symmetry characteristics of block matching (without adding an asymmetric process or module). Existing sparse search algorithms, including SD, are a type of descent algorithm that use a variable step size (coarse-to-fine search) method to find a direction in which the matching cost decreases. For this reason, existing sparse search algorithms implicitly assume the unimodal matching cost, which is the basic condition of gradient descent.
However, it should be noted that real images generally do not satisfy the unimodal matching cost condition. Moreover, there is a risk of deviating from the optimal descent direction from the beginning due to the coarse-to-fine search strategy of the existing sparse search algorithm. Due to these problems, existing sparse search algorithms have eventually become less accurate than ES, and unnecessary trial and error may be repeated even in the process of reaching a convergence point.
In order to compensate for these disadvantages, in this paper, we intend to develop a descent algorithm in a more systematic way. To this end, by using symmetry (similarity) between the optimization problem of vector space and the block matching problem of an image, we introduce the gradient descent algorithm of vector space, which is a continuous variable that guarantees global convergence, to the block matching problem of an image, which is a discrete variable. The proposed algorithm pays attention to the following points.
First, we design the proposed algorithm to be applicable regardless of the shape of the matching cost function and even if the gradient of the image is not given. To do this, we reinterpret the process of the gradient descent algorithm, which calculates the gradient and finds the direction of descent, as follows: Compare the matching cost values in eight neighboring pixels, and move one pixel to the pixel with the minimum matching cost.
As such, the proposed algorithm compensates for the disadvantage that the initial coarse search of the sparse search, including SD, may miss the optimal descent direction by consistently using only a dense search with a step size of 1 from the beginning.
Second, in order for the proposed descent algorithm to work well, the image is blurred and transformed into a unimodal form. By satisfying the unimodal assumption in this artificial way, errors in sparse search caused by the difference between the unimodal assumption and the actual image can be prevented.
Through this improvement, the proposed descent algorithm was able to achieve a smaller amount of computation than SD and an accuracy approaching that of ES. In addition, the proposed algorithm is more accurate than SD and comparable to ES not only for frames adjacent to the reference frame, where the change in block position is small, but also for frames distant from the reference frame, where the change in block position is large. This result is useful when there is a large difference in the position of a block between two adjacent image frames due to the high speed of the object. Furthermore, it is shown that there is no need to perform motion estimation for every pair of adjacent video frames, and motion estimation can be performed by skipping several frames. In addition, it shows that the method can be used for fast image detection within a limited range.

2. Proposed Descent Line Search Algorithm for Image Block Matching—Utilization of Symmetry between Vector Space Optimization and Image Block Matching

In this section, we develop a method for effectively improving the matching speed while maintaining the symmetry of block matching. In order not to break the symmetry of block matching, we adopt a sparse search method, which is a method that increases computational efficiency without adding an asymmetric preprocessing process or module. Therefore, the final goal of this section is to develop a method for improving search speed that is more effective than SD, which is known to have the highest performance among sparse search methods. We try to find a clue to developing such an algorithm by using the hidden symmetry between the optimization problem in vector space with continuous variables and the block matching problem in images with discrete variables.
Utilizing the symmetry of these two different spaces, we try to apply the line search optimization algorithm in vector space to the block matching problem in the image.
Figure 1 shows the concept of block matching and the motion (displacement) vector. Block matching locates matching macro blocks in image frames for the purpose of motion estimation. Motion vectors (vectors of displacement) between two frames are estimated by subdividing a frame into blocks of identical size and supposing that all pixels of the same block have the same displacement, as shown in Figure 1. The criterion for matching a macro block with another block is based on a matching cost function.
The matching cost function is defined as a metric that represents the difference between the image intensities of the reference block and the current block. Therefore, the matching cost function takes the form of a function having the motion (displacement) vector d ∈ ℝ² as an argument. Denoting the cost function as J(d), the block matching problem is reduced to an optimization problem:
Problem 1. Image Block Matching Problem
Find the motion vector d ∈ ℝ² that minimizes the matching cost function
J(d) ∈ ℝ   (1)
(Here, the cost function J(d) ∈ ℝ is deliberately not specified in order to emphasize that the proposed algorithm is applicable regardless of the type or shape of the cost function.)
A conventional direct way to find the d ∈ ℝ² that minimizes the matching cost function J(d) ∈ ℝ is to use the full search algorithm described in Algorithm 1.
Algorithm 1. Full search algorithm (often called ES: Exhaustive Search)
  • Step 1:
  For each row on the search image
   For each column on the search image
   For d = (row, column), compute the matching cost function J(d) ∈ R
  END FOR
 END FOR
  • Step 2: IF the matching cost function is minimum for (row*, column*) THEN
      d* = (row*, column*) is the matching position.
     END IF
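For reference, Algorithm 1 can be written as the following sketch; the grayscale numpy arrays, the SSD cost and the handling of window positions that fall outside the image are assumptions of this illustration, not details prescribed by the algorithm itself.

```python
import numpy as np

def full_search(ref_block, search_img, top, left, M):
    """Illustrative Algorithm 1 (ES): evaluate the matching cost at every
    displacement of an M x M search window centred on (top, left)."""
    h, w = ref_block.shape
    T = ref_block.astype(float)
    best_cost, best_d = np.inf, (0, 0)
    half = M // 2
    for dy in range(-half, half + 1):              # every row of the search window
        for dx in range(-half, half + 1):          # every column of the search window
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > search_img.shape[0] or x + w > search_img.shape[1]:
                continue                           # skip candidates outside the image
            cand = search_img[y:y + h, x:x + w].astype(float)
            cost = np.sum((cand - T) ** 2)         # SSD used here as the matching cost
            if cost < best_cost:
                best_cost, best_d = cost, (dy, dx)
    return best_d, best_cost                       # motion vector and its matching cost
```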
Because a full search over every column and row requires too much computational load, a more efficient search algorithm is preferable. For this purpose, we draw attention to the following fact: ‘the block matching approach can be considered as an optimization problem where J(d) ∈ ℝ is minimized with respect to d’.
Using this similarity between the image block matching problem and the vector space optimization problem, we try to apply a vector space optimization algorithm, e.g., a line search algorithm, to image block matching. Among such line search algorithms, the gradient descent algorithm is often used because it assures global convergence, i.e., convergence is assured for any choice of starting position. To this end, let us first look at the gradient descent algorithm (Algorithm 2).
Algorithm 2. Contemporary gradient descent algorithm for a smooth function J(d) of continuous variable d
  • Step 1: Choose any value of d as starting position
  • Step 2: Compute
dnext = d − η (∂J/∂d),   where ∂J/∂d is the gradient of J(d) and 0 < η < 1   (2)
  • Step 3: If ‖∂J/∂d‖ is less than the convergence condition, then iteration stops.
    Otherwise, let d = dnext and go to Step 2.
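For comparison with the discrete version developed below, Algorithm 2 in code for a smooth cost of a continuous variable might look as follows; the toy quadratic cost, the learning rate and the tolerance are illustrative assumptions only.

```python
import numpy as np

def gradient_descent(grad_J, d0, eta=0.1, tol=1e-6, max_iter=1000):
    """Illustrative Algorithm 2: step against the gradient of a smooth cost J(d)
    until the gradient norm falls below the tolerance."""
    d = np.asarray(d0, dtype=float)
    for _ in range(max_iter):
        g = grad_J(d)                   # Step 2: gradient of J at the current d
        if np.linalg.norm(g) < tol:     # Step 3: convergence test
            break
        d = d - eta * g                 # d_next = d - eta * (dJ/dd)
    return d

# Toy example: J(d) = ||d - (3, -2)||^2, whose gradient is 2(d - (3, -2)).
d_star = gradient_descent(lambda d: 2 * (d - np.array([3.0, -2.0])), d0=[0.0, 0.0])
```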
The gradient descent algorithm (Algorithm 2) is only applicable to smooth functions J ( d ) of continuous variables d , but image block matching does not satisfy this necessary condition. To make the image block matching problem satisfy the smooth function requirement, we blur the intensity maps of reference and matching images. Equation (2) contains a continuous variable and a gradient with non-integer position values. However, the position of the image is not continuous but discrete at 1-pixel intervals. To avoid this problem, we reinterpret the meaning of Equation (2) from a different perspective. Equation (2) can be understood as each iterative step proceeding in a direction in which the matching cost is minimized. Considering that the position variable in the image is discrete by one pixel, Equation (2) can be modified as described below.
Each iteration step in Algorithm 2 can be interpreted as a search process that proceeds to one of the 8 connected neighboring pixels for which Equation (1) is minimized. Algorithm 2 guarantees convergence for all starting positions, but convergence can be faster if the starting point of the iterative step is well chosen near the convergence point. Applying the same principle to images, block matching for consecutive images can predict the position in the next frame, and in this case, the predicted position is selected as the starting position of the search. If the expected position cannot be estimated, the position of the given reference block in the reference image is selected as the starting position of the search. Based on this reinterpretation of Algorithm 2, the proposed algorithm can be summarized as Algorithm 3.
Algorithm 3. Proposed descent algorithm for image block matching
  • Step 1: Blur image intensity maps of reference and matching images.
  • Step 2: Choose initial d where to start iteration. For continuous images, select the block matched position of the previous target image as the starting position of block matching of the current target image. If the starting position cannot be estimated, select the position of the reference block in the reference image as the starting position for the search.
  • Step 3: For the 8 neighboring pixels dᵢ (described in Equation (3)) connected to d, compute the values of the matching cost function J(dᵢ)
     
d₁ = d + (−1, −1),  d₂ = d + (−1, 0),  d₃ = d + (−1, 1),  d₄ = d + (0, −1),
d₅ = d + (0, 1),  d₆ = d + (1, −1),  d₇ = d + (1, 0),  d₈ = d + (1, 1)   (3)
  • Step 4: Choose dnext = argmin(J(d₁), J(d₂), …, J(d₈))
  • Step 5: If J(dnext) = J(d) then stops. Otherwise, let d = dnext and go to Step 3.
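A minimal sketch of Algorithm 3 follows, assuming grayscale numpy arrays, a Gaussian blur from scipy and an SSD cost (defined later in Equation (4)); the blur strength, the iteration cap and the handling of out-of-image candidates are choices of this sketch, not of the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def descent_block_match(ref_block, search_img, start, sigma=2.0, max_iter=500):
    """Illustrative Algorithm 3: blur both images (Step 1), then move one pixel at a
    time towards the neighbour with the smallest SSD until no neighbour improves."""
    T = gaussian_filter(ref_block.astype(float), sigma)    # Step 1: blurred reference block
    I = gaussian_filter(search_img.astype(float), sigma)   # Step 1: blurred search image
    h, w = T.shape

    def cost(y, x):
        if y < 0 or x < 0 or y + h > I.shape[0] or x + w > I.shape[1]:
            return np.inf                                   # reject out-of-image candidates
        return np.sum((I[y:y + h, x:x + w] - T) ** 2)       # SSD matching cost

    y, x = start                                            # Step 2: starting position
    for _ in range(max_iter):
        current = cost(y, x)
        # Steps 3-4: evaluate the 8-connected neighbours and take the cheapest one
        cands = [(cost(y + dy, x + dx), y + dy, x + dx) for dy, dx in NEIGHBOURS]
        best_cost, by, bx = min(cands)
        if best_cost >= current:                            # Step 5: no improvement, stop
            break
        y, x = by, bx
    return y, x                                             # matched (row, column) position
```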
All iterative steps in Algorithm 3 proceed in the same gradient descent direction as in Algorithm 2. Therefore, Algorithm 3 guarantees global convergence as with Algorithm 2.
Another thing to note is that Algorithm 2 uses a gradient to find the direction of descent. Therefore, it is applicable only when the gradient of the matching cost function J(d) can be calculated or the value of the gradient field of the image is given. On the other hand, Algorithm 3 calculates and compares matching cost values instead of using gradients to find the downhill direction. Therefore, Algorithm 3 can be applied regardless of the form in which the matching cost function is given, and the gradient of the image is not required.
Algorithm 3 can be applied regardless of the form of the matching cost function J(d), but in the Results section, the most widely used cost, the SSD, was selected for quantitative performance analysis. The SSD (sum of squared differences) used in the Results section is described in Equation (4).
J(d) = Σᵢ (I(qᵢ + d) − T(qᵢ))²   (4)
where qᵢ, i = 1, …, M, are the image positions (in pixel coordinates) of the reference image and M is the number of image positions. T and I are the image intensity maps of the reference and current (matching) images, respectively. In the literature, the SSD is commonly known as a block matching approach [19].
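Equation (4) translates directly into code; the sketch below assumes the reference block T and the displaced window of the current image I are same-sized grayscale numpy arrays, with the block’s pixel coordinates taken to start at (0, 0).

```python
import numpy as np

def ssd(T, I, d):
    """Equation (4) as code: sum of squared differences between the reference block T
    and the window of the current image I displaced by d = (dy, dx)."""
    dy, dx = d
    h, w = T.shape
    window = I[dy:dy + h, dx:dx + w].astype(float)
    return float(np.sum((window - T.astype(float)) ** 2))
```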
In block matching, the reference frame and the matching frame are adjacent frames of the video, so the motion vector between them is usually small. However, in the case of a fast object, the motion vector may be large even between adjacent frames. In order to design a block matching algorithm that works well even when the motion vector is large, the Results section of this paper evaluates the performance from matching frames adjacent to the reference frame to matching frames far from the reference frame. As can be seen in the Results section, because Algorithm 3 uses a blurred image, it has been confirmed that it works relatively well even with a large motion vector compared to other contemporary sparse search algorithms.

3. Conceptual Depiction of Proposed Algorithm (Algorithm 3)

In this section, in order to understand the idea of the proposed gradient descent line search algorithm (Algorithm 3) more easily, we illustrate it with a one-dimensional analogue.
Figure 2 shows a conceptual depiction of a one-dimensional metaphor of Algorithm 3. Figure 2a,b shows the image intensities at each pixel coordinate of the original search image and the original reference block. Figure 2c shows the matching cost (SSD) values for each displacement (motion vector) of the original reference block in the original search image. As seen in Figure 2c, the SSD values are neither smooth nor unimodal. Therefore, the descent algorithm cannot be applied. To solve this problem, Figure 2a,b is blurred. Figure 2d,e shows the image intensities at each pixel coordinate of the blurred search image and the blurred reference block. Figure 2f shows the SSD values for each displacement (motion vector) of the blurred reference block in the blurred search image. As seen in Figure 2f, the SSD values of the blurred image are smooth and close to unimodal. Therefore, the proposed descent algorithm can be applied.
Figure 2g shows how the descent algorithm approaches a match position, i.e., minimum of SSD. Compute the SSD values for the pixels that are neighboring (i.e., one pixel away) to position P2, i.e., position P1 and position P3. Because J3 (SSD value at position P3) is smaller than J1 (SSD value at position P1), the current search position P2 moves to position P3. At position P3, compute SSD values for the neighboring pixels, i.e., position P2 and position P4. Because J4 (SSD value at position P4) is smaller than J2 (SSD value at position P2), current search position P3 moves to position P4. Repeating this procedure, the current search position moves to a match position (minimum of SSD values).
As seen in this example, Algorithm 3 moves with step size 1 (one pixel at a time) in the descent direction of the SSD values. To find the descent direction, Algorithm 3 does not require a gradient. Instead, Algorithm 3 computes the matching cost values for the neighboring pixels and then chooses the minimum among them. The direction vector from the current pixel position to the neighboring pixel position with the minimum SSD value is the gradient descent direction.
The reason we can find the descent direction without using a gradient is due to the fact that the image position is a discrete variable with a distance of one pixel, rather than a continuous variable. Due to the discrete coordinate characteristics of the image, eight pixels neighboring the current pixel can be specified, and the minimum value can be found by directly calculating the matching cost for the specified eight neighboring pixels. In addition, the direction of moving to the neighboring pixel corresponding to the minimum matching cost is the descent direction in which the matching cost decreases. Since Algorithm 3 uses this method to find the direction of descent, it does not require a gradient. Therefore, Algorithm 3 can be applied even when the gradient of the matching cost cannot be calculated or the gradient field of the image is not given.
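The walk of Figure 2g can be reproduced on synthetic data; the signal, blur strength and starting position below are arbitrary assumptions chosen so that the blurred SSD curve is close to unimodal, and the snippet only illustrates the one-pixel descent, not the actual data behind the figure.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# Toy 1-D analogue of Figure 2: a smooth bump plus noise, a template cut from the
# blurred signal at offset 120, and a +/-1 descent on the blurred SSD curve.
rng = np.random.default_rng(0)
x = np.arange(200.0)
signal = np.exp(-((x - 130.0) ** 2) / (2 * 20.0 ** 2)) + 0.05 * rng.standard_normal(200)
blurred = gaussian_filter1d(signal, 5.0)            # blurred search signal
template = blurred[120:140]                         # blurred reference block (true offset 120)

def ssd1d(p):
    return np.sum((blurred[p:p + len(template)] - template) ** 2)

pos = 100                                           # starting position of the search
while True:
    cands = {p: ssd1d(p) for p in (pos - 1, pos, pos + 1)
             if 0 <= p <= len(blurred) - len(template)}
    best = min(cands, key=cands.get)
    if best == pos:                                 # neither neighbour improves: stop
        break
    pos = best
print(pos)                                          # expected: 120 (the true offset)
```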

4. Conceptual Differences between Proposed Descent Line Search Algorithm and Contemporary Algorithms

Although many improved versions of TSS/3SS (Three Step Search) have been proposed, they commonly adopt the coarse-to-fine search method, which is the basic strategy of a sparse search such as TSS. Therefore, they have the disadvantages of a coarse-to-fine search in common. The goal of this paper is to supplement the common disadvantages of the above-listed sparse search methods; that is, the disadvantages of a coarse-to-fine search. Therefore, when comparing how well the proposed method compensates for the weakness of a coarse-to-fine search, among the sparse search algorithms using a coarse-to-fine search, the algorithm with the best performance is used as a representative benchmark. Studies [17,18] provide a detailed performance comparison of such sparse search methods.
From the literature, it was confirmed that the ES method was the most accurate but the most computationally intensive, and the SD (Star-Diamond Search) method was the most efficient. In other words, SD is known to produce results with similar accuracy to ES (Exhaustive Search) with the smallest amount of computation. Based on the analysis of these studies, ES and SD were selected as benchmarking approaches for the proposed method in the Results section.

4.1. Benchmarking Method 1: ES (Exhaustive Search)

ES is a global search method, and a conceptual description of ES can be found in reference [9], which is as follows.
‘Block-based matching algorithms find the optimal motion vectors which minimize the difference between the reference block and candidate blocks. Exhaustive search (ES) or the Full search algorithm (FSA) is a simple method for motion estimation. The search examines a total of (2p + 1) × (2p + 1) positions, where p is the search range for the block. The full search is brute force in nature and it delivers good accuracy in searching for the best match. But because a large amount of computation is involved, ES is not ready for real-time applications [10]’.
Moreover, if ES is expressed in the form of an algorithm, it can be described as the Full Search in Algorithm 4.
Algorithm 4. Full search
  • Step 1: FOR each row on the search image
       FOR each column on the search image
         For d = (row, column), compute the matching cost function
       END FOR
     END FOR
  • Step 2: IF the matching cost function is minimum for (row*, column*)
     THEN d* = (row*, column*) is the matching position.
     END IF

4.2. Benchmarking Method 2: SD (Star-Diamond Search)

Because SD is based on FSS/4SS (Four Step Search), which is an improved version of TSS/3SS (Three Step Search), we first review FSS. An easy-to-understand description of FSS can be found in [14] and is summarized in Algorithm 5 (a simplified code sketch follows the listing):
Algorithm 5. Four Step Search (FSS)
  • Step 1: Start with search location at center
  • Step 2: Set step size S = 2, (irrespective of search parameter p)
  • Step 3: Search 8 locations +/− S pixels around location (0, 0)
  • Step 4: Pick among the 9 locations searched, the one with minimum matching cost function
  • Step 5: If the minimum weight is found at the center of the search window:
    (1) Set the new step size as S = S/2 (that is S = 1)
    (2) Repeat the search procedure from steps 3 to 4
    (3) Select location with the least weight as motion vector
  • Step 6: If the minimum weight is found at one of the 8 locations other than the center:
    (1) Set the new origin to this location
    (2) Fix the step size as S = 2
    (3) Repeat the search procedure from steps 3 to 4.
       Depending on location of new origin,
       search through 5 locations or 3 locations
    (4) Select the location with the least weight
    (5) If the least weight location is at the center of new window,
       go to step 5.
       Else go to step 6
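A compact sketch of the pattern in Algorithm 5 is given below for orientation; it evaluates the full 3 × 3 pattern at every step and therefore omits the reduced 5- or 3-point patterns of Step 6(3), so it is illustrative only, and the callable cost(dy, dx) is an assumption of the sketch.

```python
def four_step_search(cost, max_steps=50):
    """Illustrative FSS/4SS sketch: a 3x3 pattern with step size 2 until the minimum
    sits at the centre, then a single refinement pass with step size 1."""
    def best_of_pattern(cy, cx, step):
        cands = [(cost(cy + dy, cx + dx), cy + dy, cx + dx)
                 for dy in (-step, 0, step) for dx in (-step, 0, step)]
        return min(cands)[1:]                       # (row, column) of the cheapest point

    cy, cx = 0, 0
    for _ in range(max_steps):
        by, bx = best_of_pattern(cy, cx, step=2)    # coarse pattern (step size 2)
        if (by, bx) == (cy, cx):                    # minimum at the centre:
            return best_of_pattern(cy, cx, step=1)  # one fine pass (step size 1) and stop
        cy, cx = by, bx                             # otherwise move the origin, keep step 2
    return cy, cx
```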
SD uses a Star-Diamond search point pattern, and the algorithm runs exactly the same as 4SS/FSS (Four Step Search). However, there is no limit on the number of steps that the algorithm can take. A recent version of SD is SDAT (Star-Diamond search with Adaptive Threshold), given in [17]. The SDAT algorithm is based on determining an adaptive threshold for the matching error to eliminate stationary motion earlier in the search procedure [17]. Two search patterns are employed to track and refine different motion types: the Star pattern followed by a Small Diamond pattern.

4.3. Conceptual Differences between Proposed Method and Benchmarking Methods

After ES independently checks all candidate locations in the search window, it selects the location with the lowest cost value. Therefore, ES is not a line search algorithm that sequentially approaches the minimum of the matching cost function. Because its result is obtained by considering all candidate positions within the search window, its accuracy is the highest, but its amount of computation is also the largest.
On the other hand, SD and the proposed method do not independently check all candidate positions in the search window. At each step, the next location is chosen as the adjacent position with the smallest value of the matching cost function among the eight neighboring positions; by repeating this step, the algorithm gradually approaches the minimum of the matching cost function. Therefore, SD and the proposed method are line search algorithms.
Since a line search algorithm does not check all locations within the search window as ES does, the amount of calculation is greatly reduced, but the accuracy is lower than that of a result obtained by considering all positions within the window.
The first difference between SD and the proposed method is that SD starts with a sparse search with a step size of 2 and then changes to a dense search with a step size of 1. A sparse search with a step size of 2 may miss the optimal descent direction, but the uniformly dense search with a step size of 1 used by the proposed algorithm can prevent this problem.
The second difference between SD and the proposed method is that the eight neighboring search points used by SD are arranged in a sparse Star-Diamond shape with a step size of 2, whereas the proposed method uses a dense simple square with a step size of 1. Since the arrangement of the eight candidate search points in the proposed method is more compact, the search accuracy can be improved. In addition, since the proposed method has a simpler layout, it also has an advantage in terms of computational complexity.
The third difference between SD and the proposed method lies in the complexity of the algorithm for finding the next position with the lowest cost. The algorithm of SD is complex and asymmetric, consisting of several steps and dividing the algorithm into two cases. On the other hand, the proposed algorithm is extremely simple because it is actually composed of one step, and it is a symmetrical form that is equally applied in all cases. Therefore, the proposed method has many advantages over SD in terms of computational complexity and ease of implementation.
The fourth difference is that most of the existing block matching algorithms, including ES and SD, use clear images, whereas the proposed method uses blurred images. In addition, SD and the proposed method sequentially find the adjacent location with the smallest matching cost value, which means that they move in the direction of decreasing matching cost value; that is, in the direction of gradient descent. Therefore, SD and the proposed method correspond to gradient descent.
Moreover, as explained in Figure 2, the matching cost value of the blurred image is close to unimodal, so the gradient descent method can work optimally. For this reason, the accuracy of the proposed gradient descent algorithm using blurred images can be improved. Of course, since the matching cost of the blurred images is not perfectly unimodal, the method may be slightly less accurate than ES. However, more accurate results can be obtained than with contemporary Sparse Search algorithms such as SD that use clear images. This characteristic means that the proposed method is more robust than SD even when the motion vector of the video is relatively large.
As such, the robustness to a relatively large motion vector can be very useful. First, even when the motion vector is large due to the high speed of the object, the proposed algorithm can produce more accurate results.
Second, in applications where the motion vector is not too large, the proposed block matching method can be used for template image detection.
Third, it means that there is no need to perform block matching between two adjacent frames having a small motion vector in a video, and it is okay to perform block matching between frames far apart having a large motion vector. In this case, the amount of calculation is reduced by the amount of block matching between the missing intermediate frames.
In order to confirm the robustness against such a large motion vector, in the Results section, block matching is performed between the first reference frame and the remaining frames, not between adjacent image frames having a small motion vector. Through this experiment, it was confirmed that the proposed method showed better block matching performance (in accuracy and speed) than SD, from the case of a small motion vector to the case of a large motion vector.

5. Results

In this section, we numerically verify how much the proposed algorithm (Algorithm 3) can reduce the amount of block matching computation compared to Full Search (Algorithm 1).

5.1. Qualitative Analysis

5.1.1. Full Search (Algorithm 1)

We consider the Full Search case given in Algorithm 1. It can be assumed that the matching image block lies within a certain area adjacent to the reference image block. Then, the ‘square area of side length M centered on the reference image block’ can be selected as the ‘search area’. When the search area is determined in this way, the given matching error function is calculated for all pixels in the search area (the square area of side length M), and the pixel corresponding to the smallest matching error value among them is determined as the matching position. In such block matching, the matching error function is computed for each of the M² pixels. As a result, M² computations of the matching error function are required.

5.1.2. Proposed Line Search (Algorithm 3)

We consider the proposed line search case given in Algorithm 3. As in the case of the Full Search just discussed, we can assume that the matching image block is adjacent to the reference image block. In this case, the search can be started from the location of the reference image block. To determine which pixel the search position will advance to, the matching error function value is calculated for the eight adjacent pixels. The search position advances to the pixel position corresponding to the smallest of the eight matching error values. Then, at the current search position, the matching error function values are calculated for the eight adjacent pixels. Among them, the search position advances to the pixel position corresponding to the smallest matching error value. By repeating this process, the search position advances one pixel at a time in the direction in which the value of the matching error function is minimized. In addition, if the minimum value of the matching error for the eight adjacent pixels is not different from the minimum value of the matching error at the previous search position, the search is terminated and the current search position is determined as the matching position. If K searches are performed before the end of the search, the calculation of the matching error function value is performed a total of 8K times.

5.1.3. Comparison

The 8K computations of the proposed Line Search (Algorithm 3) are much fewer than the M² computations of the Full Search (Algorithm 1). Moreover, if M is large, the amount of computation of the Full Search increases quadratically, whereas the proposed Line Search is not affected by M. Therefore, the efficiency of the proposed Line Search is maximized for larger blocks or images.
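As a purely hypothetical illustration of this comparison (the window side M = 33 and the step count K = 12 below are made-up numbers, not measurements from the paper):

```python
def full_search_evals(M: int) -> int:
    return M * M            # one cost evaluation per pixel of the M x M search window

def descent_evals(K: int) -> int:
    return 8 * K            # eight neighbour evaluations per descent step

print(full_search_evals(33))   # 1089 evaluations
print(descent_evals(12))       # 96 evaluations
```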

5.2. Quantitative Analysis

For quantitative analysis, we consider the sample images given in Figure 3. Figure 3 shows some images from the public data sets provided in reference [20]. An open data set was chosen for this experiment because it is more objective: the experimental conditions cannot be controlled by the author. To emphasize that the proposed method can be applied to various subjects, many types of sample images were selected. In each image sequence in Figure 3, the first image is the reference image frame, the area highlighted by the white box is the reference image block, and we consider the block matching problem of locating the reference block in the rest of the images.
The reason image sequences are used as sample images for this experiment is not for image tracking (matching between adjacent image frames), but because image frames in an image sequence contain various image blocks and poses. In other words, the purpose of this experiment is to check the block matching performance for these various configurations and poses. Therefore, it should be noted that in this experiment, block matching is not performed between adjacent image frames but between the reference frame and the remaining image frames.
The results of the proposed block matching search method are compared with the results of ES (Exhaustive Search) described in [9], a recent full search block matching method, and SDAT (Star-Diamond search with Adaptive Threshold) [17], a recent sparse search block matching method.
Figure 4 is a plot of the average number of calculations when the proposed method, ES and SDAT are applied to each sample image in Figure 3 for easy comparison. The results of applying the proposed method, ES and SDAT are indicated by solid (‘-’), dotted (‘--’) and asterisk (‘*’) lines, respectively. In this figure, the number of computations refers to how many matching measures were computed during the matching process. The y-axis is shown in logarithmic scale.
Figure 5 is a plot of the average matching errors when the proposed method, ES and SDAT are applied to each sample image in Figure 3 for easy comparison. The results of applying the proposed method, ES and SDAT are indicated by solid (‘-’), dotted (‘--’) and asterisk (‘*’) lines, respectively. In this figure, the matching error is measured by the peak signal-to-noise ratio (PSNR) between the reference image block B_r and the matched image block B_m. In this regard, a given block B_r in the reference frame is considered a noise-free black-and-white image, and a matched block B_m in the current frame is considered a noisy approximation of B_r. The PSNR (in dB) is then expressed as Equation (5). The larger the PSNR value, the smaller the matching error.
PSNR = 10 × log₁₀(MAX_r² / MSE)   (5)
Here, MAX_r is the maximum possible pixel value of the image block B_r, and the MSE (Mean Squared Error) is given in Equation (6):
MSE = (1/(mn)) Σ_{i=0}^{m−1} Σ_{j=0}^{n−1} [B_r(i, j) − B_m(i, j)]²   (6)
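Equations (5) and (6) translate directly into code; the sketch below assumes 8-bit grayscale blocks stored as numpy arrays and is only a restatement of the metric, not part of the matching algorithms.

```python
import numpy as np

def psnr(ref_block, matched_block, max_value=255.0):
    """PSNR of Equations (5) and (6): the matched block is treated as a noisy
    approximation of the reference block (8-bit grayscale assumed)."""
    diff = ref_block.astype(float) - matched_block.astype(float)
    mse = np.mean(diff ** 2)                        # Equation (6)
    if mse == 0:
        return np.inf                               # identical blocks: infinite PSNR
    return 10.0 * np.log10(max_value ** 2 / mse)    # Equation (5)
```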
Looking at the results in Figure 4 and Figure 5, it can be seen that the proposed method performs block matching with almost the same accuracy (based on PSNR) as ES at a computational cost of about 0.08% to 0.62% of that of ES.
Compared to SDAT, it can be seen that the proposed method performs block matching with generally better accuracy at a computational cost of about 1.6% to 7.6%.
In Section 4, we explained that the accuracy of the proposed gradient descent algorithm using a blurred image can be improved because the matching cost function of a blurred image becomes close to unimodal. Of course, since the matching cost of the blurred images is not perfectly unimodal, the method may be slightly less accurate than ES. However, more accurate results can be obtained than with contemporary Sparse Search algorithms such as SD that use clear images. This characteristic means that the proposed method is more robust than SD even when the motion vector of the video is relatively large. Such robustness to a relatively large motion vector can be very useful. For example, it means that there is no need to perform block matching between two adjacent frames having a small motion vector in a video, and it is acceptable to perform block matching between frames far apart having a large motion vector. The experimental results confirming the above facts are given in Figure 6.
Figure 6 is a plot of the PSNR between the initial and final frames of each sample image in Figure 3, i.e., PSNR between frames with a large motion vector. The results of applying the proposed method, ES and SDAT are indicated by solid (‘-’), dotted (‘--’) and asterisk (‘*’) lines, respectively. The point of this experiment is that the motion vector between both ends of the video sample (initial and final frames) will be much larger than the motion vector between adjacent frames. Looking at the results in Figure 6, it can be seen that the accuracy of the existing Sparse Search (SDAT) is greatly reduced. Therefore, SDAT is less robust to large motion vectors. On the other hand, the proposed method consistently shows almost the same accuracy as Full Search (ES) for all image samples. Therefore, it can be confirmed that the proposed method has almost the same robustness as Full Search (ES) for a relatively large motion vector.

6. Conclusions and Future Work

In this paper, we studied an image block matching method. Image block matching involves intensive computation for a sequential pixel-by-pixel full search. To reduce the computation, many other methods propose a sequential pixel-by-pixel sparse search. However, these sparse search methods have limitations in terms of reducing computation, and they often reduce matching accuracy or add complexity in usage (they often need additional complex processes). Different from these sparse search algorithms, the proposed method adopts a line search algorithm for image block matching. We found a clue to developing such an algorithm by using the hidden symmetry between the optimization problem in vector space with continuous variables and the block matching problem in images with discrete variables. Utilizing the symmetry of these two different spaces, we could apply the line search optimization algorithm in vector space to the block matching problem in the image. The proposed method gives acceptable results in matching accuracy together with a dramatic reduction in the amount of computation. In addition, the proposed method assures global convergence and convenient usage (it does not need additional complex processes). The efficiency of the proposed algorithm is best realized when the block and the image are large; therefore, the proposed algorithm is best suited to large blocks and images. (This does not mean that the proposed algorithm is efficient only when blocks and images are larger than a specific size range; rather, the larger the blocks and images, the greater the relative gain in computation compared to Full Search. In other words, if the blocks and images are not large enough, the relative gain of the proposed algorithm compared to Full Search will not be dramatically large.)

Funding

This research received no external funding.

Data Availability Statement

The data used in the experiment can be obtained from the website (https://vision.middlebury.edu/flow/, accessed on 1 May 2023). Other than this, no new data was created.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Kibeya, H.; Belghith, F.; Loukil, H.; Ayed, M.A.B.; Masmoudi, N. TZSearch pattern search improvement for HEVC motion estimation modules. In Proceedings of the 2014 1st International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), Sousse, Tunisia, 17–19 March 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 95–99.
  2. Muzammil, M.; Khan, Z.A.; Ullah, M.O.; Ali, I. Performance analysis of block matching motion estimation algorithms for HD videos with different search parameters. In Proceedings of the 2016 International Conference on Intelligent Systems Engineering (ICISE), Islamabad, Pakistan, 15–17 January 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 306–311.
  3. Sahu, S.K.; Shukla, D. A Review Paper on Motion Estimation Techniques. Int. J. Recent Innov. Trends Comput. Commun. (IJRITCC) 2017, 5, 26–32.
  4. Yasakethu, S.L.P.; Hewage, C.T. Efficient decoding algorithm for 3D video over wireless channels. Multimed. Tools Appl. 2018, 77, 30683–30701.
  5. Richardson, I.E.G. H.264 and MPEG-4 Video Compression, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2010.
  6. Barjatya, A. Block matching algorithms for motion estimation. IEEE Trans. Evol. Comput. 2004, 8, 225–239.
  7. Ankita, P.C.; Rohit, R.P.; Shahida, G.C. Comparative study on diamond search algorithm for motion estimation. Int. J. Eng. Res. Technol. (IJERT) 2012, 1, 1–6.
  8. Nakum, N.K.; Kothari, A.M. A Review paper on Implementation & Comparative Analysis of Motion Estimation Algorithm in Video Compression. Int. J. Recent Technol. Eng. (IJRITE) 2012, 1, 57–60.
  9. Manjunatha, D.V. Comparison and implementation of fast block matching motion estimation algorithms for video compression. Int. J. Eng. Sci. Technol. (IJEST) 2011, 3, 7608–7613.
  10. Shanableh, T.; Peixoto, E.; Izquierdo, E. MPEG-2 to HEVC video transcoding with content-based modeling. IEEE Trans. Circuits Syst. Video Technol. 2013, 23, 1191–1196.
  11. Koga, T.; Iinuma, K.; Hirano, A.; Iijima, Y.; Ishiguro, T. Motion compensated interframe image coding for video conference. In Proceedings of the NTC81, New Orleans, LA, USA, 29 November–3 December 1981; p. G5.
  12. Li, R.; Zeng, B.; Liou, M.L. A new three-step search algorithm for block motion estimation. IEEE Trans. Circuits Syst. Video Technol. 1994, 4, 438–442.
  13. Basher, H.A. Two minimum three step search algorithm for motion estimation of images from moving IR camera. In Proceedings of the 2011 IEEE Southeastcon, Nashville, TN, USA, 17–20 March 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 384–389.
  14. Po, L.M.; Ma, W.C. A novel four-step search algorithm for fast block motion estimation. IEEE Trans. Circuits Syst. Video Technol. 1996, 6, 313–317.
  15. Zhu, S.; Ma, K.K. A new diamond search algorithm for fast block-matching motion estimation. IEEE Trans. Image Process. 2000, 9, 287–290.
  16. Djoudi, K.; Belbachir, M.F. Star diamond: An efficient algorithm for fast block matching motion estimation in H264/AVC video codec. Multimed. Tools Appl. 2016, 75, 3161–3175.
  17. Kerfa, D.; Saidane, A.K. An efficient algorithm for fast block matching motion estimation using an adaptive threshold scheme. Multimed. Tools Appl. 2020, 79, 24173–24184.
  18. Satish, K.S.; Dolly, S. Star diamond-diamond search block matching motion estimation algorithm for H.264/AVC video codec. Int. J. Innov. Res. Comput. Commun. Eng. 2018, 6, 1–8.
  19. Swaroop, P.; Sharma, N. An overview of various template matching methodologies in image processing. Int. J. Comput. Appl. 2016, 153, 8–14.
  20. Middlebury Dataset. Available online: https://vision.middlebury.edu/flow/ (accessed on 1 May 2023).
Figure 1. Concept of block matching and motion (displacement) vector.
Figure 2. Conceptual depiction of a one-dimensional metaphor of Algorithm 3. (a) Image intensity map of original search image, (b) image intensity map of original reference block, (c) matching cost (SSD) values between original images (a,b) for each displacement (motion vector), (d) intensity map of blurred image of (a), (e) intensity map of the blurred image of (b), (f) matching cost (SSD) values between blurred images (d,e) for each displacement (motion vector), (g) application of the proposed descent algorithm to (f). Looking at (c), it can be seen that when the motion vector d is 12, the SSD (sum of squared distances) between (a,b) is minimum. The location of the reference block in (b) when the motion vector d is 12 coincides with the position of the corresponding block in (a). The same principle applies to blurred images (df). (g) shows the process of the proposed descent algorithm finding the minimum point in (f).
Figure 3. A block matching experiment to locate the reference block (the white boxed block in the first image) in the remaining images.
Figure 4. Plot of the average number of calculations when the proposed method, full search method and sparse search method are applied to each sample image in Figure 3. The solid line (‘-’), dotted line (‘--’) and asterisk line (‘*’) show the results of applying the proposed method, ES and SDAT, respectively.
Figure 5. Plot of the average PSNR when the proposed method, full search method and sparse search method are applied to each sample image in Figure 3. The solid line (‘-’), dotted line (‘--’) and asterisk line (‘*’) show the results of applying the proposed method, ES and SDAT, respectively.
Figure 6. Plot of the PSNR between initial and final frames of each sample image in Figure 3. The solid line (‘-’), dotted line (‘--’) and asterisk line (‘*’) show the results of applying the proposed method, ES and SDAT, respectively.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
