Article

Three-Dimensional Block Matching Using Orthonormal Tree-Structured Haar Transform for Multichannel Images

1 Information and Communications Engineering, Tokyo Institute of Technology, Tokyo 152-8552, Japan
2 Department of Telecommunications and Information Processing, Ghent University, 9000 Gent, Belgium
* Author to whom correspondence should be addressed.
J. Imaging 2020, 6(2), 4; https://doi.org/10.3390/jimaging6020004
Submission received: 11 November 2019 / Revised: 23 January 2020 / Accepted: 6 February 2020 / Published: 11 February 2020

Abstract

Multichannel images, i.e., images of the same object or scene taken in different spectral bands or with different imaging modalities/settings, are common in many applications. For example, multispectral images contain several wavelength bands and hence carry richer information than color images. Multichannel magnetic resonance imaging and multichannel computed tomography images are common in medical imaging diagnostics, and multimodal images are also routinely used in art investigation. All the methods for grayscale images can be applied to multichannel images by processing each channel/band separately. However, this requires vast computation time, especially for the task of searching for overlapping patches similar to a given query patch. To address this problem, we propose a three-dimensional orthonormal tree-structured Haar transform (3D-OTSHT) targeting fast full-search-equivalent three-dimensional block matching in multichannel images. The use of a three-dimensional integral image significantly reduces the time needed to obtain the 3D-OTSHT coefficients. We demonstrate superior performance of the proposed block matching.

1. Introduction

Block matching is a fundamental tool used to search for blocks (patches) similar to a given query. It has been widely used in solving various image processing problems, such as object recognition and tracking [1], image registration [2], image analysis [3], and image restoration [4], to name a few. Block matching searches for patches of the same size as the query and is sensitive to deformation. In this sense, it differs from common image descriptors such as the scale-invariant feature transform (SIFT) [5] and speeded-up robust features (SURF) [6], which extract features robust to deformation. In block matching, a full search (FS) algorithm that exhaustively compares all the pixel intensities of all the mutually overlapping candidates is generally the most accurate, but it requires vast computation due to the huge number of candidates in a large search space. The larger the image size and the number of image bands, the harder it is to use FS.
Fast block matching has been studied from both algorithmic and architectural perspectives. On the architecture side, there are several works on hardware acceleration and on configuration by custom instructions, especially for motion estimation in video coding [7,8,9,10,11]. The algorithms fall into two categories: FS-equivalent algorithms and non-FS-equivalent algorithms. Some non-FS-equivalent algorithms, such as three-step search [12] and diamond search [13], reduce the amount of computation by limiting the search area, and others do so by approximating patterns [14,15]; in both cases there is a trade-off between accuracy and efficiency. The scope of this paper is limited to FS-equivalent algorithms.
Fast FS-equivalent algorithms have been intensively developed in order to address the computational complexity of FS [16,17,18,19]. Although these methods can be applied only to patches whose size is a power of 2, they prove that searching in a transform domain is much more efficient than searching in the spatial domain: the additional computation needed to obtain the transform coefficients is fully compensated by the reduction in the number of candidates. The orthogonal Haar transform (OHT) reportedly performs fastest in this setting [20,21]. One of the reasons is that there is a unique way to calculate the transform coefficients using an integral image [22,23]; once an integral image is built, each transform coefficient can be calculated with a few arithmetic operations. This follows from the fact that the Haar transform matrix is sparse and composed of rectangular functions, unlike most other transforms. In order to overcome the limitation on patch sizes, the two-dimensional orthonormal tree-structured Haar transform (2D-OTSHT), a generalization of OHT, was proposed [24]. Compared to OHT, the normalization factor of 2D-OTSHT is not an integer, but this has little effect on the overall speedup.
In this paper, we propose the three-dimensional orthonormal tree-structured Haar transform (3D-OTSHT) with a three-dimensional integral image for multichannel images. In the proposed 3D-OTSHT, the transform coefficients can be obtained with a few arithmetic operations regardless of the patch size. Focusing on the pruning performance in the transformed domain, we consider the combination with FS, where the pruning process is stopped at a certain level of reduction of the candidates. We demonstrate superior results regarding the savings in computation time compared to the straightforward solution, in which a fast FS-equivalent method for grayscale images is applied to each band separately. Experimental results are obtained using a standard dataset of color images and a five-band multispectral image dataset.
The paper is organized as follows: In Section 2, we state the problem setting and briefly review the required techniques as preliminaries. We present the design of the 3D-OTSHT for the three-dimensional (3D) integral image and the 3D block matching using the 3D-OTSHT in Section 3. Our evaluations of the pruning performance and speedup are detailed in Section 4. Finally, in Section 5, we conclude our study.

2. Preliminaries

First, the problem targeted in this paper is stated. Next, a couple of techniques required for the proposed method are briefly described.

2.1. Problem Statement

Consider the problem of searching for patches similar to a given query in a multichannel image. In the full search (FS) algorithm, the matching patches are detected with a threshold in a sliding-window manner by measuring the similarity of all the candidates in the whole search space. Generally, the sum of squared differences (SSD) over all the intensities of a candidate is used as the similarity measure.
Let $q(x, y, z)$ be a query patch of size $N \times N$ having $B$ bands, and $I(u, v, w)$ an image of size $M \times M$ having $B$ bands. The SSD of all the intensities of all the candidates is calculated, for $0 \le u \le M - N$ and $0 \le v \le M - N$, as

$$SSD(u, v) = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} \sum_{z=0}^{B-1} \big( I(u+x, v+y, z) - q(x, y, z) \big)^2. \qquad (1)$$

It turns out that $(2(M-N+1)^2 N^2 B - 1)$ additions and $(M-N+1)^2 N^2 B$ multiplications are required for the search. The aim of this paper is to reduce the computational complexity of search in multichannel images while keeping the same accuracy as FS, using the same threshold as FS uses.
A part of $I(u, v, w)$, i.e., the $i$-th candidate, and the query are hereafter simply expressed in vector form as $\mathbf{p}_i$ and $\mathbf{q}$, respectively; e.g., the SSD of the $i$-th candidate is represented as

$$SSD_i = \| \mathbf{p}_i - \mathbf{q} \|_2^2. \qquad (2)$$
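The FS baseline of Equations (1) and (2) can be sketched as follows. This is a minimal NumPy illustration of the problem setting, not the paper's implementation; the function and variable names are our own.

```python
# Minimal sketch of full search (FS) block matching on a multichannel
# image, following Section 2.1. Every overlapping candidate window is
# compared to the query by the sum of squared differences (SSD).
import numpy as np

def full_search_ssd(image, query, threshold):
    """Return (u, v) offsets of all candidates whose SSD to the query is
    within the threshold. image: (M, M, B), query: (N, N, B)."""
    M, _, B = image.shape
    N = query.shape[0]
    matches = []
    for u in range(M - N + 1):        # slide the window over all
        for v in range(M - N + 1):    # (M - N + 1)^2 candidates
            cand = image[u:u + N, v:v + N, :].astype(np.int64)
            ssd = np.sum((cand - query) ** 2)
            if ssd <= threshold:
                matches.append((u, v))
    return matches

# Usage: plant the query at offset (2, 3) and recover it exactly.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(16, 16, 3))
q = img[2:7, 3:8, :].copy()
print(full_search_ssd(img, q, threshold=0))
```

Even this tiny example performs $(M-N+1)^2$ full SSD evaluations; the methods of Section 3 aim to reject most of these candidates early.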

2.2. Tree-Structured Haar Transform

Tree-structured Haar transform (TSHT) is a generalization of the Haar transform, which can be applied to signals with arbitrary length [25]. In the conventional Haar transform [26], the basis is built by dividing an interval equally into two intervals. In TSHT, on the other hand, the basis is built by dividing an interval unequally into two intervals putting weights on them. The complete division of the intervals can be expressed by a binary tree structure.
Figure 1a shows an example of a binary tree having 3 leaves. A circle represents a node. The topmost node is the root, and a node with no child nodes is a leaf. The number inside a circle represents the number of leaves the node has, which determines the internal division ratio when dividing an interval and the weight of the intervals in the basis function. Figure 1b shows the intervals corresponding to the nodes of the binary tree. The interval corresponding to a node is divided internally into two subintervals, in a ratio equal to the number of leaves of the left child node to that of the right child node.
Let $\alpha$ be a node of a binary tree having $N$ leaves. Let $\alpha_0$ and $\alpha_1$ be the left and right child nodes of $\alpha$, respectively. We denote by $\iota_\alpha$ the interval corresponding to $\alpha$. The basis function for the interval $\iota_{root}$ is given as

$$h(t) = \frac{1}{N}, \quad t \in \iota_{root}, \qquad (3)$$

and the basis functions for the other intervals as

$$h(t) = \begin{cases} \nu(\alpha_1), & t \in \iota_{\alpha_0} \\ -\nu(\alpha_0), & t \in \iota_{\alpha_1} \\ 0, & \text{otherwise}, \end{cases} \qquad (4)$$

where $\nu(\alpha)$ represents the number of leaves that $\alpha$ has. Thus, except for $\iota_{root}$, the absolute value of the weight on each of the two intervals is inversely proportional to its number of leaves. Figure 1c shows the basis.
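A small sketch of building such a basis from a binary tree may help. We represent a tree as nested tuples (a leaf is `None`) and include the normalization factor so that the resulting matrix is orthonormal, as required by the orthonormal variant (OTSHT) used later in the paper; all helper names are our own.

```python
# Sketch: build a tree-structured Haar basis from a binary tree given as
# nested tuples, e.g. the 3-leaf tree of Figure 1 as ((None, None), None).
# Weights are proportional to the leaf count of the opposite subinterval;
# a normalization factor makes the basis orthonormal.
import numpy as np

def leaves(node):
    return 1 if node is None else leaves(node[0]) + leaves(node[1])

def tsht_basis(tree):
    N = leaves(tree)
    basis = [np.full(N, 1.0 / np.sqrt(N))]   # constant function on the root
    def recurse(node, start):
        if node is None:
            return
        a0, a1 = node
        n, n0, n1 = leaves(node), leaves(a0), leaves(a1)
        h = np.zeros(N)
        c = 1.0 / np.sqrt(n * n0 * n1)        # normalization factor
        h[start:start + n0] = c * n1          # weight from the right child
        h[start + n0:start + n0 + n1] = -c * n0   # weight from the left child
        basis.append(h)
        recurse(a0, start)
        recurse(a1, start + n0)
    recurse(tree, 0)
    return np.array(basis)

G = tsht_basis(((None, None), None))      # 3-leaf tree -> 3x3 basis matrix
print(np.allclose(G @ G.T, np.eye(3)))    # orthonormality check
```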

2.3. Three-Dimensional Integral Image

The three-dimensional (3D) integral image $J(x, y, z)$ is generated from an image $I(u, v, w)$ of size $M \times M$ having $B$ bands, for $x = 0, 1, 2, \ldots, M$; $y = 0, 1, 2, \ldots, M$; and $z = 0, 1, 2, \ldots, B$, as

$$J(x, y, z) = \sum_{u=0}^{x-1} \sum_{v=0}^{y-1} \sum_{w=0}^{z-1} I(u, v, w), \qquad (5)$$

where $J(0, y, z) = 0$, $J(x, 0, z) = 0$, and $J(x, y, 0) = 0$. Observe that $(3B - 1)M^2$ additions are required to build a 3D integral image.
With the 3D integral image, the sum of all the intensities in a region $A$ (ABCD-EFGH), whose diagonal starts at location $(s_X, s_Y, s_Z)$ and ends at $(e_X, e_Y, e_Z)$, as shown in Figure 2, is calculated by seven additions regardless of the region size, as

$$\begin{aligned} regionSum(A) = {} & J(e_X{+}1, e_Y{+}1, e_Z{+}1) - J(e_X{+}1, e_Y{+}1, s_Z) - J(e_X{+}1, s_Y, e_Z{+}1) \\ & - J(s_X, e_Y{+}1, e_Z{+}1) + J(s_X, s_Y, e_Z{+}1) + J(s_X, e_Y{+}1, s_Z) \\ & + J(e_X{+}1, s_Y, s_Z) - J(s_X, s_Y, s_Z). \end{aligned} \qquad (6)$$
This property is a key to significant speedup of the proposed method.
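The construction and the constant-time box sum of Equation (6) can be sketched as follows; this is a NumPy illustration under our own naming, using zero-padded leading planes so the boundary conditions hold automatically.

```python
# Sketch of the 3D integral image of Section 2.3 and the seven-addition
# region sum of Equation (6). J carries an extra zero plane along each
# axis so that J[x, y, z] equals the sum of I[:x, :y, :z].
import numpy as np

def integral_image_3d(I):
    J = np.zeros(tuple(s + 1 for s in I.shape))
    J[1:, 1:, 1:] = I.cumsum(axis=0).cumsum(axis=1).cumsum(axis=2)
    return J

def region_sum(J, sX, sY, sZ, eX, eY, eZ):
    # Inclusion-exclusion over the 8 corners of the box (Equation (6)).
    return (J[eX + 1, eY + 1, eZ + 1]
            - J[eX + 1, eY + 1, sZ] - J[eX + 1, sY, eZ + 1]
            - J[sX, eY + 1, eZ + 1]
            + J[sX, sY, eZ + 1] + J[sX, eY + 1, sZ] + J[eX + 1, sY, sZ]
            - J[sX, sY, sZ])

rng = np.random.default_rng(1)
I = rng.integers(0, 256, size=(8, 8, 3))
J = integral_image_3d(I)
# Brute-force check of the same box sum:
print(region_sum(J, 1, 2, 0, 4, 6, 2) == I[1:5, 2:7, 0:3].sum())
```

Note that once `J` is built, `region_sum` costs the same handful of operations for any box size, which is exactly the property the proposed method exploits.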

3. Three-Dimensional Block Matching for Multichannel Images

We propose here fast FS-equivalent three-dimensional (3D) block matching for multichannel images using the three-dimensional orthonormal tree-structured Haar transform (3D-OTSHT). First, we construct the 3D-OTSHT so that its coefficients can be computed with few operations. Next, we describe the 3D block matching using the 3D-OTSHT.

3.1. Three-Dimensional Orthonormal Tree-Structured Haar Transform

One of the vectors forming the basis of the vector space for a three-dimensional region is referred to in this paper as a basis block. The 3D-OTSHT consists of basis blocks built by subdividing a three-dimensional region formed by intervals along the X, Y, and Z axes. For rapid calculation of the 3D-OTSHT coefficients via the integral image, we design each basis block to have at most two regions in it, each of which is assigned a constant.
Let us consider the basis blocks for a query of size $N \times N$ having $B$ bands. We generate the set of basis blocks by basis block functions. Let $T_X$, $T_Y$, and $T_Z$ be the binary trees for the X axis having $N$ leaves, the Y axis having $N$ leaves, and the Z axis having $B$ leaves, respectively. The nodes of $T_X$, $T_Y$, and $T_Z$ are denoted by $\alpha$, $\beta$, and $\gamma$, respectively.
We define the basis block function for the region $(\iota_{root} \times \iota_{root} \times \iota_{root})$ as

$$\varphi_0(x, y, z) = \frac{1}{N \sqrt{B}}, \quad (x, y, z) \in \iota_{root} \times \iota_{root} \times \iota_{root}, \qquad (7)$$

and the basis block functions for the other regions as

$$\varphi_1(x, y, z) = \begin{cases} c_{\varphi_1}^{+}, & (x, y, z) \in \iota_{\alpha_0} \times \iota_{\beta} \times \iota_{\gamma} \\ c_{\varphi_1}^{-}, & (x, y, z) \in \iota_{\alpha_1} \times \iota_{\beta} \times \iota_{\gamma} \\ 0, & \text{otherwise} \end{cases} \qquad (8)$$

$$\varphi_2(x, y, z) = \begin{cases} c_{\varphi_2}^{+}, & (x, y, z) \in \iota_{\alpha_0} \times \iota_{\beta_0} \times \iota_{\gamma} \\ c_{\varphi_2}^{-}, & (x, y, z) \in \iota_{\alpha_0} \times \iota_{\beta_1} \times \iota_{\gamma} \\ 0, & \text{otherwise} \end{cases} \qquad (9)$$

$$\varphi_3(x, y, z) = \begin{cases} c_{\varphi_3}^{+}, & (x, y, z) \in \iota_{\alpha_1} \times \iota_{\beta_0} \times \iota_{\gamma} \\ c_{\varphi_3}^{-}, & (x, y, z) \in \iota_{\alpha_1} \times \iota_{\beta_1} \times \iota_{\gamma} \\ 0, & \text{otherwise} \end{cases} \qquad (10)$$

$$\varphi_4(x, y, z) = \begin{cases} c_{\varphi_4}^{+}, & (x, y, z) \in \iota_{\alpha_0} \times \iota_{\beta_0} \times \iota_{\gamma_0} \\ c_{\varphi_4}^{-}, & (x, y, z) \in \iota_{\alpha_0} \times \iota_{\beta_0} \times \iota_{\gamma_1} \\ 0, & \text{otherwise} \end{cases} \qquad (11)$$

$$\varphi_5(x, y, z) = \begin{cases} c_{\varphi_5}^{+}, & (x, y, z) \in \iota_{\alpha_0} \times \iota_{\beta_1} \times \iota_{\gamma_0} \\ c_{\varphi_5}^{-}, & (x, y, z) \in \iota_{\alpha_0} \times \iota_{\beta_1} \times \iota_{\gamma_1} \\ 0, & \text{otherwise} \end{cases} \qquad (12)$$

$$\varphi_6(x, y, z) = \begin{cases} c_{\varphi_6}^{+}, & (x, y, z) \in \iota_{\alpha_1} \times \iota_{\beta_0} \times \iota_{\gamma_0} \\ c_{\varphi_6}^{-}, & (x, y, z) \in \iota_{\alpha_1} \times \iota_{\beta_0} \times \iota_{\gamma_1} \\ 0, & \text{otherwise} \end{cases} \qquad (13)$$

$$\varphi_7(x, y, z) = \begin{cases} c_{\varphi_7}^{+}, & (x, y, z) \in \iota_{\alpha_1} \times \iota_{\beta_1} \times \iota_{\gamma_0} \\ c_{\varphi_7}^{-}, & (x, y, z) \in \iota_{\alpha_1} \times \iota_{\beta_1} \times \iota_{\gamma_1} \\ 0, & \text{otherwise}, \end{cases} \qquad (14)$$

where $c_{\varphi_n}^{+}$ and $c_{\varphi_n}^{-}$ $(n = 1, 2, \ldots, 7)$ are a positive constant and a negative constant, respectively, which include the normalization factor and weight:

$$c_{\varphi_1}^{+} = \frac{\nu(\alpha_1)}{\sqrt{\nu(\alpha)\nu(\beta)\nu(\gamma)\nu(\alpha_0)\nu(\alpha_1)}}, \quad c_{\varphi_1}^{-} = \frac{-\nu(\alpha_0)}{\sqrt{\nu(\alpha)\nu(\beta)\nu(\gamma)\nu(\alpha_0)\nu(\alpha_1)}}, \qquad (15)$$

$$c_{\varphi_2}^{+} = \frac{\nu(\beta_1)}{\sqrt{\nu(\alpha_0)\nu(\beta)\nu(\gamma)\nu(\beta_0)\nu(\beta_1)}}, \quad c_{\varphi_2}^{-} = \frac{-\nu(\beta_0)}{\sqrt{\nu(\alpha_0)\nu(\beta)\nu(\gamma)\nu(\beta_0)\nu(\beta_1)}}, \qquad (16)$$

$$c_{\varphi_3}^{+} = \frac{\nu(\beta_1)}{\sqrt{\nu(\alpha_1)\nu(\beta)\nu(\gamma)\nu(\beta_0)\nu(\beta_1)}}, \quad c_{\varphi_3}^{-} = \frac{-\nu(\beta_0)}{\sqrt{\nu(\alpha_1)\nu(\beta)\nu(\gamma)\nu(\beta_0)\nu(\beta_1)}}, \qquad (17)$$

$$c_{\varphi_4}^{+} = \frac{\nu(\gamma_1)}{\sqrt{\nu(\alpha_0)\nu(\beta_0)\nu(\gamma)\nu(\gamma_0)\nu(\gamma_1)}}, \quad c_{\varphi_4}^{-} = \frac{-\nu(\gamma_0)}{\sqrt{\nu(\alpha_0)\nu(\beta_0)\nu(\gamma)\nu(\gamma_0)\nu(\gamma_1)}}, \qquad (18)$$

$$c_{\varphi_5}^{+} = \frac{\nu(\gamma_1)}{\sqrt{\nu(\alpha_0)\nu(\beta_1)\nu(\gamma)\nu(\gamma_0)\nu(\gamma_1)}}, \quad c_{\varphi_5}^{-} = \frac{-\nu(\gamma_0)}{\sqrt{\nu(\alpha_0)\nu(\beta_1)\nu(\gamma)\nu(\gamma_0)\nu(\gamma_1)}}, \qquad (19)$$

$$c_{\varphi_6}^{+} = \frac{\nu(\gamma_1)}{\sqrt{\nu(\alpha_1)\nu(\beta_0)\nu(\gamma)\nu(\gamma_0)\nu(\gamma_1)}}, \quad c_{\varphi_6}^{-} = \frac{-\nu(\gamma_0)}{\sqrt{\nu(\alpha_1)\nu(\beta_0)\nu(\gamma)\nu(\gamma_0)\nu(\gamma_1)}}, \qquad (20)$$

$$c_{\varphi_7}^{+} = \frac{\nu(\gamma_1)}{\sqrt{\nu(\alpha_1)\nu(\beta_1)\nu(\gamma)\nu(\gamma_0)\nu(\gamma_1)}}, \quad c_{\varphi_7}^{-} = \frac{-\nu(\gamma_0)}{\sqrt{\nu(\alpha_1)\nu(\beta_1)\nu(\gamma)\nu(\gamma_0)\nu(\gamma_1)}}. \qquad (21)$$
The series of functions (8) to (14) is repeated on all the intervals corresponding to the nodes of the three binary trees. Eventually, $N^2 B$ basis blocks are generated. Each basis block is denoted in vector form as $G_k$ $(k = 1, 2, \ldots, N^2 B)$; accordingly, $c_{\varphi_n}^{+}$ and $c_{\varphi_n}^{-}$ are replaced by $c_k^{+}$ and $c_k^{-}$, respectively. Let $G$ be the set of basis blocks, i.e., $G = [G_1, G_2, \ldots, G_{N^2 B}]^T$, where $T$ denotes transposition. $G$ is orthonormal.
Figure 3 shows the structure of the 3D-OTSHT. Function $\varphi_1$ divides a region along the X axis into two regions, $c^{+}$ and $c^{-}$, where $c^{+}$ and $c^{-}$ are the regions assigned to $c_{\varphi_n}^{+}$ and $c_{\varphi_n}^{-}$ as in (15) through (21), respectively. Functions $\varphi_2$ and $\varphi_3$ divide along the Y axis the regions $c^{+}$ and $c^{-}$, respectively, built by $\varphi_1$; $\varphi_4$ and $\varphi_5$ divide along the Z axis the regions $c^{+}$ and $c^{-}$, respectively, built by $\varphi_2$; and $\varphi_6$ and $\varphi_7$ divide along the Z axis the regions $c^{+}$ and $c^{-}$, respectively, built by $\varphi_3$.
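As a small numerical check of the constants in (15), the following sketch (our own naming, plain Python) computes $c_{\varphi_1}^{\pm}$ from example leaf counts and verifies that the resulting two-region basis block has zero mean and unit norm, which is what makes $G$ orthonormal.

```python
# Sketch checking Equation (15): for leaf counts v(a0), v(a1) of the
# X-axis subtrees and v(b), v(g) of the Y- and Z-axis nodes, the basis
# block phi_1 with constants (c+, c-) has zero mean and unit norm.
import math

def c_phi1(va0, va1, vb, vg):
    va = va0 + va1                       # v(a) = v(a0) + v(a1)
    d = math.sqrt(va * vb * vg * va0 * va1)
    return va1 / d, -va0 / d             # (c+, c-) as in Equation (15)

cp, cm = c_phi1(va0=3, va1=2, vb=5, vg=4)   # example leaf counts (ours)
vol_p, vol_m = 3 * 5 * 4, 2 * 5 * 4         # voxels in the c+ / c- regions
print(abs(cp * vol_p + cm * vol_m) < 1e-12)            # zero mean
print(abs(cp**2 * vol_p + cm**2 * vol_m - 1) < 1e-12)  # unit norm
```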

3.2. 3D Block Matching Using 3D-OTSHT with 3D Integral Image

In the proposed 3D block matching, the SSD of the 3D-OTSHT coefficients is calculated as the similarity measure, but not all the transform coefficients of all the candidates are used. A candidate that does not match is rejected from the search in the middle of processing, which is called pruning.
The transform coefficients are obtained efficiently with a 3D integral image. From (6), the $k$-th 3D-OTSHT coefficient $P(k)$ of a patch $\mathbf{p}$ in vector form is obtained as

$$P(k) = G_k \mathbf{p} = c_k^{+} \times regionSum(p^{+}) + c_k^{-} \times regionSum(p^{-}), \qquad (22)$$

where $p^{+}$ and $p^{-}$ are the regions in the patch corresponding to the regions to which $c_k^{+}$ and $c_k^{-}$ are assigned in the $k$-th basis block, respectively.
For pruning, the similarity of the candidates is calculated using a subset of the 3D-OTSHT. Let $G^k$ be the subset of $G$ that contains the first $k$ basis blocks, i.e., $G^k = [G_1, G_2, \ldots, G_k]^T$. The similarity of the $i$-th candidate is calculated with $G^k$ as

$$SSD_i^k = \| P_i^k - Q^k \|_2^2, \qquad (23)$$

where $P_i^k = G^k \mathbf{p}_i$ and $Q^k = G^k \mathbf{q}$. At every $k$ $(k = 1, 2, \ldots, N^2 B)$, $SSD_i^k$ is compared with a threshold. Once $SSD_i^k$ exceeds the threshold, the $i$-th candidate is rejected from the search, and neither OTSHT coefficients nor the SSD are calculated for it afterward. As long as the same threshold as in FS is used, the unmatched candidates are securely rejected. The theory behind this is that an orthonormal transform preserves the energy of a signal in the transformed domain. Therefore, it holds that

$$\| P_i - Q \|_2^2 = \| \mathbf{p}_i - \mathbf{q} \|_2^2, \qquad (24)$$

where $P_i = G \mathbf{p}_i$ and $Q = G \mathbf{q}$. Since $\| P_i^k - Q^k \|_2^2 \le \| P_i - Q \|_2^2$ for $k = 1, 2, \ldots, N^2 B$, secure rejection is guaranteed. For this reason, the transform should be orthonormal and the SSD should be used as the similarity measure.
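The pruning rule of (23) can be sketched as follows. We use a generic 2 × 2 orthonormal matrix in place of the 3D-OTSHT for brevity; the names are our own. Because the partial SSD is non-decreasing in $k$ and bounded by the full SSD, rejecting as soon as it exceeds the threshold can never discard a true match.

```python
# Sketch of the pruning of Section 3.2: partial SSDs in an orthonormal
# transform domain grow monotonically with k, so a candidate is safely
# rejected as soon as its partial SSD exceeds the threshold.
import numpy as np

def prune(P, Q, threshold):
    """P: (num_candidates, K) coefficients of the candidates,
    Q: (K,) coefficients of the query. Returns surviving indices."""
    survivors = []
    for i, Pi in enumerate(P):
        partial = 0.0
        for k in range(len(Q)):
            partial += (Pi[k] - Q[k]) ** 2   # SSD_i^k, non-decreasing in k
            if partial > threshold:
                break                        # secure rejection
        else:
            survivors.append(i)
    return survivors

# With an orthonormal G (here a Haar-like 2x2 example), thresholding the
# coefficient SSD matches thresholding the pixel SSD exactly.
G = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
q = np.array([10.0, 12.0])
cands = np.array([[10.0, 13.0], [50.0, 0.0]])
print(prune(cands @ G.T, G @ q, threshold=2.0))
```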

3.3. 3D Block Matching Using Limited 3D-OTSHT

All the basis blocks of the 3D-OTSHT together detect patches with the same accuracy as FS. However, it is inefficient to use all the basis blocks, because $G_k$ becomes sparser as $k$ increases. To avoid this, a limited number of basis blocks is used for pruning, and after the number of candidates has been reduced, the remaining candidates are scrutinized by FS. That is, $G^K$ with a certain $K$ is used instead of $G$ for $SSD_i^k$ in (23), and at every $k$ $(k = 1, 2, \ldots, K)$, $SSD_i^k$ is compared with the threshold for pruning.
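The two-stage scheme above can be sketched as follows, again with a generic orthonormal matrix (a normalized 4 × 4 Hadamard) standing in for the 3D-OTSHT; function and variable names are ours.

```python
# Sketch of Section 3.3: prune with only the first K basis blocks (G_K),
# then scrutinize the survivors by the exact FS check.
import numpy as np

def match_limited(cands, q, G, K, threshold):
    GK = G[:K]                      # G_K: first K rows of an orthonormal G
    Qk = GK @ q
    out = []
    for i, p in enumerate(cands):
        if np.sum((GK @ p - Qk) ** 2) <= threshold:   # pruning stage
            if np.sum((p - q) ** 2) <= threshold:     # final FS check
                out.append(i)
    return out

# Normalized 4x4 Hadamard matrix: rows are orthonormal.
H = np.array([[1, 1, 1, 1], [1, -1, 1, -1],
              [1, 1, -1, -1], [1, -1, -1, 1]]) / 2.0
q = np.array([5.0, 5.0, 5.0, 5.0])
cands = np.array([[5.0, 5.0, 5.0, 6.0],    # near match (SSD = 1)
                  [9.0, 9.0, 9.0, 9.0]])   # far candidate, pruned early
print(match_limited(cands, q, H, K=2, threshold=2.0))
```

Because the pruning stage only ever over-approximates the set of matches, the final FS pass restores exact FS accuracy at any choice of $K$.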

4. Evaluation

We performed 3D block matching in order to evaluate the pruning performance and elapsed time of the proposed method using multichannel images.

4.1. Methods and Environments

We compared the performance of the following five methods for search: FS, two-dimensional OTSHT with a two-dimensional integral image with single judge (2D-OTSHT-2DI-S) [24], two-dimensional OTSHT with a two-dimensional integral image with whole judge (2D-OTSHT-2DI-W), two-dimensional OTSHT with a 3D integral image (2D-OTSHT-3DI) [27], and the proposed 3D-OTSHT with a 3D integral image (3D-OTSHT-3DI). 2D-OTSHT-2DI-S and -W use the OTSHT for grayscale images on each channel and judge the candidates in different ways: the single judge (-S) decides whether to reject a candidate based on a single channel, while the whole judge (-W) evaluates the candidate based on all the channels. In 2D-OTSHT-3DI, a basis block for the 3D integral image is formed by piling up copies of the same 2D basis image along the band axis.
We use two image datasets: a color image dataset and a five-band multispectral image dataset. The color image dataset, called SIDBA, contains 12 scenes of size 256 × 256 having 3 color channels [28]. The 5-band image dataset contains 11 scenes of size 1824 × 1368 having 5 bands [29]. For each dataset, five patch sizes are used. We chose 10 queries randomly in an image per patch size. Then, we obtain the ground truth patches similar to the query by FS with a threshold. If the SSD of the $i$-th patch satisfies

$$\| \mathbf{p}_i - \mathbf{q} \|_2^2 \le threshold, \qquad (25)$$

we set the $i$-th patch as ground truth. In this experiment, the threshold is $10 N^2 B$ for the color images and $2 N^2 B$ for the 5-band multispectral images. The same threshold is used in all the methods. Table 1 summarizes the number of ground truth patches. From Table 1, it can be seen that the mean number of ground truth patches tends to decrease as the patch size increases and to increase as the image size increases. We confirmed that the number of ground truth patches chosen by queries with low standard deviations of pixel intensities is likely to be large, and that the number chosen by queries with high standard deviations is likely to be one. Therefore, the threshold should be set appropriately considering the size and characteristics of the images in practical applications. Generally, a larger threshold yields more ground truth patches; however, a threshold that is too large becomes meaningless.
All the algorithms are written in C as single thread tasks, compiled with Xcode 10.1, and run on a macOS system with 4 GHz Intel core i7 and 16 GB RAM, where eight active processor cores with hardware multithreading are used.

4.2. Pruning Performance

The pruning performance is evaluated by the ratio $R(K)$ of the number of remaining extra patches detected with $K$ basis images/blocks to the number of all the candidates, defined by

$$R(K) = \frac{N_d(K) - N_g}{N_c} \times 100, \qquad (26)$$

where $N_d(K)$ refers to the number of patches detected with $K$ basis images/blocks, $N_g$ is the number of ground truth patches, and $N_c$ represents the number of all the candidates. A lower $R(K)$ means better performance.
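As a concrete reading of this metric, the following one-liner (our own naming, illustrative numbers) computes the pruning-performance ratio of Section 4.2.

```python
# The pruning-performance ratio of Section 4.2: remaining extra patches
# after pruning with K basis images/blocks, as a percentage of all
# candidates (lower is better).
def pruning_ratio(num_detected, num_ground_truth, num_candidates):
    return (num_detected - num_ground_truth) / num_candidates * 100.0

# e.g. 60 patches still alive, 10 of them true matches, 62,500 candidates
print(round(pruning_ratio(60, 10, 62500), 4))
```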
Figure 4 and Figure 5 show the pruning performance $R(K)$ $(K = 2, 4, 8, \ldots, K_{max})$ in the color images and the 5-band multispectral images, respectively, where $K_{max}$ refers to the maximum number of basis images/blocks. In the proposed 3D-OTSHT-3DI, $K_{max} = N^2 B$, while in the other methods, $K_{max} = N^2$. We confirmed that all the methods detect every ground truth patch at any $K$, i.e., there are no false negatives. Therefore, the final accuracy (the accuracy after the remaining candidates are scrutinized by FS) is the same as the accuracy of FS. The proposed 3D-OTSHT-3DI method yields the best pruning performance on both the 3-band and 5-band images for all patch sizes. In both image datasets, we observe that when $K \ge 8$, 3D-OTSHT-3DI is better than the other methods, and that as $K$ increases, the rate of change decreases. These facts suggest that not all the basis blocks are required for the overall speedup.
Figure 6 shows an example of patches detected at $K$ basis images/blocks by different methods and for different values of $K$. Each detected patch of size 13 × 13 is shown as an orange square, and the detected overlapping patches cover different areas depending on the employed method and the value of $K$. It can be seen that the number of candidate patches decreases as $K$ increases. The result of 3D-OTSHT-3DI does not differ significantly between $K = 8$ and $K = 16$ and, clearly, these results agree best with the ground truth.

4.3. Computational Complexity

Here, we consider the number of arithmetic operations with respect to the number of basis images/blocks, counting only the operations of the OTSHT stage, i.e., those not included in the arithmetic operations of FS. Table 2 shows the number of additions and multiplications per pixel for searching for patches similar to a query of size N × N having B bands, including building an integral image. It shows the worst case, where no candidates are rejected at any of the K basis images/blocks. In 2D-OTSHT-2DI-S and -W, two additions per pixel are needed for building a 2D integral image in each band, and (8K − 1) additions and 3K multiplications per pixel are required for the SSD in each band; thus, in total, (2 + 8K − 1)B additions and 3KB multiplications per pixel. In 2D- and 3D-OTSHT-3DI, (3B − 1) additions per pixel are needed for a 3D integral image, and (16K − 1) additions and 3K multiplications per pixel are required for the SSD. For the same K, the number of additions for 2D- and 3D-OTSHT-3DI is smaller than that for 2D-OTSHT-2DI-S and -W, except for images having 2 bands, while the number of multiplications for 2D- and 3D-OTSHT-3DI is smaller than that for 2D-OTSHT-2DI-S and -W for any image with more than one band.
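These per-pixel counts can be tabulated directly; the following sketch (our own helper names) evaluates the worst-case formulas from the paragraph above and compares the two families of methods for an example setting.

```python
# Worked check of the per-pixel operation counts of Section 4.3 (worst
# case, no pruning). 2D-OTSHT-2DI needs (2 + 8K - 1)B additions and 3KB
# multiplications; the 3D-integral-image variants need
# (3B - 1) + (16K - 1) additions and 3K multiplications.
def ops_2di(K, B):
    return ((2 + 8 * K - 1) * B, 3 * K * B)   # (additions, multiplications)

def ops_3di(K, B):
    return ((3 * B - 1) + (16 * K - 1), 3 * K)

# For K = 8 basis images/blocks and B = 3 bands, the 3D-integral-image
# variants need fewer operations of both kinds:
print(ops_2di(8, 3), ops_3di(8, 3))
```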

4.4. Speedup

In this section, we analyze when to stop the pruning process for the overall speedup in combination with FS, showing the mean elapsed time at $K$ basis images/blocks for each method, and compare the final performance of the five methods. Figure 7 and Figure 8 show the mean ratio (%) of the elapsed time of each method at $K$ basis images/blocks to the elapsed time of FS. Table 3 and Table 4 detail the mean elapsed time and the ratio to the elapsed time of FS when we use $G^K$ $(K = 1, 2, 4, \ldots, 32)$ for the color images and the 5-band multispectral images, respectively. The fastest time at each patch size is shown in bold. From Table 3 and Table 4, it can be seen that the proposed method outperforms the other methods except for patches of size 9 × 9 in the color images and patches of size 5 × 5 in the 5-band images. In the color images, for patch size 5 × 5, the fastest mean elapsed time was 1.03 ms, which is 23.17% of the mean elapsed time of FS, while for 21 × 21, it was 1.30 ms, which is 1.78% of the FS time. In the 5-band images, for patch size 5 × 5, the fastest mean elapsed time was 0.050 s, which is 17.44% of the mean elapsed time of FS, while for 45 × 45, it was 0.164 s, which is 0.78% of the FS time. The larger the patch size and the larger the number of bands, the higher the efficiency of the proposed method.
We confirmed that the mean ratio did not differ much between eight active processor cores with hardware multithreading and one core in the same system.
We also compared the five methods, each with its best-performing value of $K$, to A2DHT [20]. A2DHT is an FS-equivalent algorithm for grayscale images that reportedly performs fastest among FS-equivalent algorithms; its source code is provided by the authors [30]. We modified the code so that the method was applied to each band separately. Since A2DHT requires the patch size to be a power of 2, queries of size 16 × 16 were used for the color image dataset, and queries of size 16 × 16 and 32 × 32 were used for the 5-band image dataset. The number of ground truth patches is summarized in Table 5. We confirmed that all the methods detected every ground truth patch. Table 6 shows the mean elapsed time and the mean ratio to the FS time, where the fastest time and its ratio at each patch size are shown in bold. It can be seen that the proposed method outperforms state-of-the-art methods.

5. Conclusions

We proposed a fast FS-equivalent 3D block matching method to search for the patch(es) similar to a given query in multichannel images. The proposed method uses the 3D-OTSHT and reduces the number of candidates with the SSD in the transformed domain. The pruning process rejects unmatched candidates during the block matching processing. Moreover, in combination with FS, the pruning process is stopped partway through for the overall speedup. We designed the 3D-OTSHT to make the most of the 3D integral image, rather than as a direct extension of the one-dimensional TSHT. Unmatched patches are securely rejected because the transform is orthonormal and the SSD is used as the similarity measure. We analyzed the pruning performance and the mean elapsed time using a color image dataset and a 5-band multispectral image dataset, and demonstrated that the proposed method outperforms state-of-the-art methods. In the color images, the mean elapsed time was shortened to less than 2% of the FS time. In the 5-band multispectral images, the search time was shortened to around 0.8% of the FS time, hence allowing more than 100-times faster processing without sacrificing accuracy. We believe that these huge savings in computation time can enable new applications of patch matching in multichannel images, which were not feasible before due to the prohibitive computational complexity.

Author Contributions

Conceptualization, I.I., A.P.; methodology, I.I.; software, I.I.; validation, I.I.; investigation, I.I.; resources, I.I.; data curation, I.I.; writing—original draft preparation, I.I.; writing—review and editing, A.P.; visualization, I.I.; project administration, I.I. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Dufour, R.; Miller, E.; Galatsanos, N. Template matching based object recognition with unknown geometric parameters. IEEE Trans. Image Process. 2002, 11, 1385–1396.
  2. Ding, L.; Goshtasby, A.; Satter, M. Volume image registration by template matching. Image Vis. Comput. 2001, 19, 821–832.
  3. Sarraf, S.; Saverino, C.; Colestani, A.M. A robust and adaptive decision-making algorithm for detecting brain networks using functional MRI within the spatial and frequency domain. In Proceedings of the IEEE-EMBS International Conference on Biomedical and Health Informatics, Las Vegas, NV, USA, 24–27 February 2016; pp. 53–56.
  4. Papyan, V.; Elad, M. Multi-scale patch-based image restoration. IEEE Trans. Image Process. 2016, 25, 249–261.
  5. Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; pp. 1150–1157.
  6. Bay, H.; Tuytelaars, T.; Van Gool, L. SURF: Speeded Up Robust Features. In Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2006; Volume 3951, pp. 404–417.
  7. Pastuszak, G.; Trochimiuk, M. Architecture design of the high-throughput compensator and interpolator for the H.265/HEVC encoder. J. Real-Time Image Process. 2016, 11, 663–673.
  8. González, D.; Botella, G.; García, C.; Prieto, M.; Tirado, F. Acceleration of block-matching algorithms using a custom instruction-based paradigm on a Nios II microprocessor. EURASIP J. Adv. Signal Process. 2013, 2013, 118.
  9. González, D.; Botella, G.; Meyer-Baese, U.; García, C.; Sanz, C.; Prieto-Matías, M.; Tirado, F. A low cost matching motion estimation sensor based on the NIOS II microprocessor. Sensors 2012, 12, 13126–13149.
  10. Nguyen, A.H.; Pickering, M.R.; Lambert, A. The FPGA implementation of a one-bit-per-pixel image registration algorithm. J. Real-Time Image Process. 2016, 11, 799–815.
  11. Li, D.X.; Zheng, W.; Zhang, M. Architecture design for H.264/AVC integer motion estimation with minimum memory bandwidth. IEEE Trans. Consum. Electron. 2007, 53, 1053–1060.
  12. Koga, T.; Iinuma, K.; Hirano, A.; Iijima, Y. Motion compensated interframe coding for video conferencing. In Proceedings of the National Telecommunications Conference, New Orleans, LA, USA, 29 November–3 December 1981; pp. G5.3.1–G5.3.5.
  13. Zhu, S.; Ma, K. A new diamond search algorithm for fast block motion estimation. IEEE Trans. Image Process. 2000, 9, 287–290.
  14. Simard, P.; Bottou, L.; Haffner, P.; LeCun, Y. Boxlets: A fast convolution algorithm for signal processing and neural networks. Adv. Neural Inf. Process. Syst. 1999, 11, 571–577.
  15. Tang, F.; Crabb, R.; Tao, H. Representing images using non-orthogonal Haar-like bases. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 2120–2134.
  16. Tombari, F.; Mattoccia, S.; Di Stefano, L. Full search-equivalent pattern matching with incremental dissimilarity approximations. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 129–141.
  17. Ouyang, W.; Cham, W.K. Fast algorithm for Walsh Hadamard transform on sliding windows. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 165–171.
  18. Ouyang, W.; Zhang, R.; Cham, W.-K. Segmented gray-code kernels for fast pattern matching. IEEE Trans. Image Process. 2013, 22, 1512–1525.
  19. Ouyang, W.; Zhang, R.; Cham, W.-K. Fast pattern matching using orthogonal Haar transform. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 3050–3057.
  20. Ouyang, W.; Zhao, T.; Cham, W.-K.; Wei, L. Fast full-search-equivalent pattern matching using asymmetric Haar wavelet packets. IEEE Trans. Circuits Syst. Video Technol. 2018, 28, 819–833.
  21. Li, Y.; Li, H.; Cai, Z. Fast orthogonal Haar transform pattern matching via image square sum. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 1748–1760.
  22. Crow, F. Summed-area tables for texture mapping. SIGGRAPH 1984, 18, 207–212.
  23. Viola, P.; Jones, M. Robust real-time object detection. Int. J. Comput. Vis. 2001, 57, 137–154.
  24. Ito, I.; Egiazarian, K. Two-dimensional orthonormal tree-structured Haar transform for fast block matching. J. Imaging 2018, 4, 131.
  25. Egiazarian, K.; Astola, J. Tree-structured Haar transform. J. Math. Imaging Vis. 2002, 16, 269–279.
  26. Haar, A. Zur Theorie der orthogonalen Funktionensysteme. Math. Ann. 1910, 69, 331–371.
  27. Ito, I.; Pižurica, A. Fast cube matching using orthogonal tree-structured Haar transform for multispectral images. In Proceedings of the 11th International Symposium on Image and Signal Processing and Analysis, Dubrovnik, Croatia, 23–25 September 2019; pp. 70–75.
  28. Standard Image Data BAse (SIDBA). Available online: http://www.ess.ic.kanagawa-it.ac.jp/app_images_j.html (accessed on 10 February 2020).
  29. Monno, Y.; Tanaka, M.; Okutomi, M. TokyoTech 5-Band Multispectral Image Dataset and Demosaicking Codes. Available online: www.ok.sc.e.titech.ac.jp/res/MSI/MSIdata.html (accessed on 10 February 2020).
  30. Fast Full-Search Equivalent Pattern Matching Using Asymmetric Haar Wavelet Packets. Available online: https://wlouyang.github.io (accessed on 10 February 2020).
Figure 1. Binary tree and corresponding intervals for the basis. (a) A binary tree with N = 3 leaves, where every node except the root is labeled 0 or 1 outside its circle; (b) the intervals corresponding to the nodes; (c) the basis built from the intervals.
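The caption of Figure 1 summarizes how a tree-structured Haar basis is built from the intervals of a binary tree. As a hedged illustration (not the paper's code: the particular split {0} | {1, 2} and the helper name `node_vector` are our assumptions), an orthonormal basis for a tree with N = 3 leaves can be sketched in Python as:

```python
import numpy as np

def node_vector(support, left, n):
    # One Haar-type basis vector for an internal tree node: constant
    # positive on the left-child interval, constant negative on the
    # right-child interval, zero mean, unit norm.
    v = np.zeros(n)
    right = [i for i in support if i not in left]
    p, q = len(left), len(right)
    v[left] = np.sqrt(q / (p * (p + q)))
    v[right] = -np.sqrt(p / (q * (p + q)))
    return v

# One possible tree with N = 3 leaves: the root interval {0, 1, 2} is
# split into {0} | {1, 2}, and {1, 2} is split into {1} | {2}.
N = 3
basis = np.vstack([
    np.full(N, 1.0 / np.sqrt(N)),     # scaling (all-ones) vector
    node_vector([0, 1, 2], [0], N),
    node_vector([1, 2], [1], N),
])
```

Each internal node contributes one vector, and together with the scaling vector these form an orthonormal set: `basis @ basis.T` is the identity matrix.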
Figure 2. Specific property of 3D integral image. The sum of the intensities in region ABCD-EFGH whose diagonal starts at A: ( s X , s Y , s Z ) and ends at G: ( e X , e Y , e Z ) is obtained by seven additions.
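The property in Figure 2, where the sum over any cuboid follows from eight corner lookups in the 3D integral image using seven additions/subtractions, can be sketched in Python/NumPy as follows (a minimal illustration with our own helper names, not the paper's C implementation):

```python
import numpy as np

def integral_image_3d(volume):
    # Zero-padded 3D integral image: J[x, y, z] holds the sum of all
    # intensities in volume[:x, :y, :z].
    J = np.zeros(tuple(s + 1 for s in volume.shape))
    J[1:, 1:, 1:] = volume.cumsum(axis=0).cumsum(axis=1).cumsum(axis=2)
    return J

def cuboid_sum(J, sX, sY, sZ, eX, eY, eZ):
    # Sum over the cuboid with 0-based inclusive corners (sX, sY, sZ)
    # and (eX, eY, eZ): inclusion-exclusion over the eight corner
    # values of J, i.e., seven additions/subtractions per query.
    return (J[eX + 1, eY + 1, eZ + 1]
            - J[sX, eY + 1, eZ + 1] - J[eX + 1, sY, eZ + 1] - J[eX + 1, eY + 1, sZ]
            + J[sX, sY, eZ + 1] + J[sX, eY + 1, sZ] + J[eX + 1, sY, sZ]
            - J[sX, sY, sZ])
```

After the one-time cumulative-sum pass, every cuboid sum costs a constant number of operations regardless of the cuboid size, which is what makes computing the 3D-OTSHT coefficients fast.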
Figure 3. Three-dimensional orthonormal tree-structured Haar transform (3D-OTSHT) basis blocks built by subdivision.
Figure 4. Percentage of extra patches remaining at K basis images/blocks in color images of size 256 × 256 .
Figure 5. Percentage of extra patches remaining at K basis images/blocks in 5-band images of size 1824 × 1368 .
Figure 6. An example of the patches detected for a query patch by different methods at K basis images/blocks. The query patch of size 13 × 13 and the detected patches are shown as orange squares.
Figure 7. Mean ratio [%] of the elapsed time of each method at K basis images/blocks to the elapsed time of FS in color images of size 256 × 256.
Figure 8. Mean ratio [%] of the elapsed time of each method at K basis images/blocks to the elapsed time of FS in 5-band images of size 1824 × 1368.
Table 1. The number of ground truth patches. Min., Mean, and Max. give the number of patches found per query.

| Data Set | Image Size | Bands | Scenes | Patch Size | Samples | Min. | Mean | Max. |
|----------|-------------|-------|--------|------------|---------|------|------|---------|
| 1 | 256 × 256 | 3 | 12 | 5 × 5 | 120 | 1 | 265 | 7541 |
| | | | | 9 × 9 | 120 | 1 | 308 | 6097 |
| | | | | 13 × 13 | 120 | 1 | 234 | 4937 |
| | | | | 17 × 17 | 120 | 1 | 51 | 1564 |
| | | | | 21 × 21 | 120 | 1 | 4 | 127 |
| 2 | 1824 × 1368 | 5 | 11 | 5 × 5 | 110 | 1 | 6158 | 103,161 |
| | | | | 13 × 13 | 110 | 1 | 5634 | 113,573 |
| | | | | 21 × 21 | 110 | 1 | 2508 | 89,534 |
| | | | | 30 × 30 | 110 | 1 | 1543 | 85,131 |
| | | | | 45 × 45 | 110 | 1 | 1882 | 74,967 |
Table 2. The number of additions and multiplications per pixel for searching patches having B bands with K basis images/blocks.

| Method | Additions | Multiplications |
|--------|-----------|-----------------|
| 2D-OTSHT-2DI-S and -W | (2 + 8K − 1)B | 3KB |
| 2D- and 3D-OTSHT-3DI | 3B − 1 + 16K − 1 | 3K |
Table 3. Mean elapsed time and ratio to full search (FS) time in color images of size 256 × 256. Each cell gives Time [ms] / Ratio [%] relative to FS.

| Size | FS [ms] | K | 2D-OTSHT-2DI-S+FS [24] | 2D-OTSHT-2DI-W+FS | 2D-OTSHT-3DI+FS [27] | 3D-OTSHT-3DI+FS (Proposed) |
|------|---------|----|------------------------|-------------------|----------------------|----------------------------|
| 5 × 5 | 4.46 | 1 | 1.18 / 26.57 | 1.17 / 26.26 | 1.04 / 23.23 | 1.07 / 23.98 |
| | | 2 | 1.30 / 29.07 | 1.33 / 29.78 | 1.03 / 23.18 | 1.03 / 23.17 |
| | | 4 | 1.55 / 34.89 | 1.66 / 37.21 | 1.06 / 23.79 | 1.07 / 24.00 |
| | | 8 | 2.10 / 47.12 | 2.38 / 53.30 | 1.24 / 27.78 | 1.23 / 27.53 |
| | | 16 | 3.15 / 70.77 | 3.60 / 80.71 | 1.59 / 35.75 | 1.55 / 34.71 |
| | | 32 | – | – | – | 2.23 / 49.96 |
| 9 × 9 | 14.51 | 1 | 1.46 / 10.07 | 1.55 / 10.67 | 1.68 / 11.60 | 1.70 / 11.72 |
| | | 2 | 1.44 / 9.92 | 1.53 / 10.54 | 1.29 / 8.91 | 1.29 / 8.87 |
| | | 4 | 1.62 / 11.18 | 1.76 / 12.12 | 1.18 / 8.14 | 1.19 / 8.18 |
| | | 8 | 2.14 / 14.74 | 2.38 / 16.38 | 1.34 / 9.24 | 1.26 / 8.70 |
| | | 16 | 3.13 / 21.58 | 3.55 / 24.50 | 1.64 / 11.31 | 1.58 / 10.88 |
| | | 32 | 5.11 / 35.19 | 5.84 / 40.22 | 2.30 / 15.86 | 2.19 / 15.12 |
| 13 × 13 | 28.39 | 1 | 1.82 / 6.39 | 2.11 / 7.43 | 2.77 / 9.76 | 2.78 / 9.80 |
| | | 2 | 1.64 / 5.77 | 1.77 / 6.24 | 1.71 / 6.03 | 1.70 / 5.97 |
| | | 4 | 1.72 / 6.06 | 1.83 / 6.45 | 1.29 / 4.54 | 1.26 / 4.44 |
| | | 8 | 2.14 / 7.53 | 2.35 / 8.29 | 1.38 / 4.85 | 1.34 / 4.70 |
| | | 16 | 3.00 / 10.55 | 3.33 / 11.73 | 1.62 / 5.71 | 1.58 / 5.58 |
| | | 32 | 4.80 / 16.90 | 5.38 / 18.96 | 2.22 / 7.81 | 2.19 / 7.70 |
| 17 × 17 | 49.84 | 1 | 2.19 / 4.39 | 2.82 / 5.66 | 4.35 / 8.74 | 4.35 / 8.72 |
| | | 2 | 1.71 / 3.42 | 2.01 / 4.03 | 2.04 / 4.09 | 2.01 / 4.04 |
| | | 4 | 1.66 / 3.34 | 1.91 / 3.82 | 1.34 / 2.68 | 1.34 / 2.69 |
| | | 8 | 2.00 / 4.02 | 2.26 / 4.53 | 1.33 / 2.67 | 1.27 / 2.55 |
| | | 16 | 2.78 / 5.58 | 3.09 / 6.19 | 1.51 / 3.04 | 1.46 / 2.93 |
| | | 32 | 4.36 / 8.74 | 4.87 / 9.77 | 2.01 / 4.04 | 1.97 / 3.96 |
| 21 × 21 | 73.16 | 1 | 2.94 / 4.02 | 4.10 / 5.60 | 6.17 / 8.44 | 6.19 / 8.46 |
| | | 2 | 2.04 / 2.79 | 2.60 / 3.55 | 2.63 / 3.59 | 2.57 / 3.51 |
| | | 4 | 1.81 / 2.48 | 2.15 / 2.94 | 1.44 / 1.97 | 1.43 / 1.95 |
| | | 8 | 2.08 / 2.84 | 2.44 / 3.33 | 1.38 / 1.88 | 1.30 / 1.78 |
| | | 16 | 2.71 / 3.71 | 3.13 / 4.28 | 1.49 / 2.03 | 1.47 / 2.01 |
| | | 32 | 4.24 / 5.80 | 4.87 / 6.66 | 1.95 / 2.66 | 1.90 / 2.60 |

All the algorithms are written in C as single-thread tasks, compiled with Xcode 10.1, and run on a macOS system with a 4 GHz Intel Core i7 and 16 GB RAM, where eight active processor cores with hardware multithreading are used.
Table 4. Mean elapsed time and ratio to FS time in 5-band images of size 1824 × 1368. Each cell gives Time [s] / Ratio [%] relative to FS.

| Size | FS [s] | K | 2D-OTSHT-2DI-S+FS [24] | 2D-OTSHT-2DI-W+FS | 2D-OTSHT-3DI+FS [27] | 3D-OTSHT-3DI+FS (Proposed) |
|------|--------|----|------------------------|-------------------|----------------------|----------------------------|
| 5 × 5 | 0.285 | 1 | 0.099 / 34.913 | 0.102 / 35.845 | 0.052 / 18.239 | 0.051 / 17.873 |
| | | 2 | 0.125 / 44.007 | 0.126 / 44.373 | 0.050 / 17.437 | 0.050 / 17.604 |
| | | 4 | 0.173 / 60.934 | 0.179 / 62.793 | 0.052 / 18.389 | 0.053 / 18.487 |
| | | 8 | 0.271 / 95.202 | 0.286 / 100.577 | 0.062 / 21.896 | 0.059 / 20.621 |
| | | 16 | 0.460 / 161.597 | 0.496 / 174.368 | 0.078 / 27.316 | 0.073 / 25.471 |
| | | 32 | – | – | – | 0.097 / 34.203 |
| 13 × 13 | 1.753 | 1 | 0.137 / 7.792 | 0.169 / 9.646 | 0.126 / 7.192 | 0.125 / 7.141 |
| | | 2 | 0.153 / 8.742 | 0.176 / 10.065 | 0.092 / 5.250 | 0.092 / 5.231 |
| | | 4 | 0.196 / 11.160 | 0.217 / 12.361 | 0.077 / 4.392 | 0.077 / 4.401 |
| | | 8 | 0.288 / 16.453 | 0.314 / 17.910 | 0.078 / 4.473 | 0.072 / 4.134 |
| | | 16 | 0.473 / 26.997 | 0.512 / 29.224 | 0.090 / 5.112 | 0.083 / 4.724 |
| | | 32 | 0.843 / 48.094 | 0.914 / 52.168 | 0.115 / 6.587 | 0.107 / 6.101 |
| 21 × 21 | 4.833 | 1 | 0.211 / 4.360 | 0.324 / 6.696 | 0.302 / 6.250 | 0.301 / 6.235 |
| | | 2 | 0.195 / 4.035 | 0.275 / 5.681 | 0.177 / 3.658 | 0.176 / 3.637 |
| | | 4 | 0.221 / 4.577 | 0.276 / 5.719 | 0.122 / 2.516 | 0.122 / 2.525 |
| | | 8 | 0.306 / 6.322 | 0.351 / 7.266 | 0.108 / 2.228 | 0.094 / 1.939 |
| | | 16 | 0.483 / 9.983 | 0.529 / 10.936 | 0.111 / 2.299 | 0.098 / 2.034 |
| | | 32 | 0.834 / 17.252 | 0.899 / 18.607 | 0.133 / 2.747 | 0.117 / 2.428 |
| 30 × 30 | 9.415 | 1 | 0.352 / 3.742 | 0.589 / 6.252 | 0.616 / 6.547 | 0.616 / 6.547 |
| | | 2 | 0.287 / 3.053 | 0.459 / 4.871 | 0.332 / 3.527 | 0.330 / 3.503 |
| | | 4 | 0.286 / 3.034 | 0.418 / 4.437 | 0.209 / 2.222 | 0.207 / 2.195 |
| | | 8 | 0.357 / 3.788 | 0.470 / 4.990 | 0.169 / 1.798 | 0.142 / 1.512 |
| | | 16 | 0.527 / 5.595 | 0.623 / 6.620 | 0.154 / 1.631 | 0.133 / 1.407 |
| | | 32 | 0.874 / 9.284 | 0.985 / 10.460 | 0.163 / 1.730 | 0.147 / 1.559 |
| 45 × 45 | 20.993 | 1 | 0.642 / 3.057 | 1.202 / 5.724 | 1.294 / 6.163 | 1.297 / 6.178 |
| | | 2 | 0.444 / 2.113 | 0.832 / 3.961 | 0.606 / 2.888 | 0.605 / 2.883 |
| | | 4 | 0.371 / 1.768 | 0.651 / 3.102 | 0.328 / 1.565 | 0.327 / 1.556 |
| | | 8 | 0.409 / 1.948 | 0.659 / 3.138 | 0.257 / 1.224 | 0.193 / 0.920 |
| | | 16 | 0.554 / 2.638 | 0.768 / 3.659 | 0.211 / 1.004 | 0.164 / 0.781 |
| | | 32 | 0.876 / 4.173 | 1.076 / 5.124 | 0.200 / 0.952 | 0.168 / 0.802 |
Table 5. The number of ground truth patches of size power-of-2. Min., Mean, and Max. give the number of patches found per query.

| Dataset | Image Size | Bands | Scenes | Patch Size | Samples | Min. | Mean | Max. |
|---------|-------------|-------|--------|------------|---------|------|------|---------|
| 1 | 256 × 256 | 3 | 12 | 16 × 16 | 120 | 1 | 15 | 732 |
| 2 | 1824 × 1368 | 5 | 11 | 16 × 16 | 110 | 1 | 4880 | 100,278 |
| | | | | 32 × 32 | 110 | 1 | 1370 | 91,325 |
Table 6. Mean elapsed time and ratio to FS time for patches of size power-of-2. Each cell gives Time / Ratio [%] relative to FS; the selected K for each OTSHT-based method is given in parentheses.

| Size | Data Set | FS | A2DHT [20] | 2D-OTSHT-2DI-S+FS [24] | 2D-OTSHT-2DI-W+FS | 2D-OTSHT-3DI+FS [27] | 3D-OTSHT-3DI+FS (Proposed) |
|------|----------|-----|------------|------------------------|-------------------|----------------------|----------------------------|
| 16 × 16 | 1 | 46.04 ms / 100 | 2.65 ms / 5.76 | 1.69 ms / 3.68 (K = 4) | 1.94 ms / 4.22 (K = 4) | 1.33 ms / 2.89 (K = 4) | 1.23 ms / 2.67 (K = 8) |
| 16 × 16 | 2 | 2.517 s / 100 | 0.099 s / 3.933 | 0.136 s / 5.398 (K = 2) | 0.166 s / 6.614 (K = 2) | 0.069 s / 2.728 (K = 4) | 0.067 s / 2.647 (K = 8) |
| 32 × 32 | 2 | 10.854 s / 100 | 0.124 s / 1.142 | 0.246 s / 2.262 (K = 4) | 0.342 s / 3.155 (K = 4) | 0.120 s / 1.105 (K = 16) | 0.108 s / 0.995 (K = 16) |

Ito, I.; Pižurica, A. Three-Dimensional Block Matching Using Orthonormal Tree-Structured Haar Transform for Multichannel Images. J. Imaging 2020, 6, 4. https://doi.org/10.3390/jimaging6020004
