
An Improved Stereo Matching Algorithm Based on Joint Similarity Measure and Adaptive Weights

1 School of Automation, University of Electronic Science and Technology of China, Chengdu 610054, China
2 School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325000, China
3 College of Resource and Environment Engineering, Guizhou University, Guiyang 550025, China
4 Department of Geography and Anthropology, Louisiana State University, Baton Rouge, LA 70803, USA
* Authors to whom correspondence should be addressed.
Appl. Sci. 2023, 13(1), 514; https://doi.org/10.3390/app13010514
Submission received: 29 November 2022 / Revised: 24 December 2022 / Accepted: 27 December 2022 / Published: 30 December 2022
(This article belongs to the Special Issue Advances in Signal and Image Processing for Biomedical Applications)

Abstract
Stereo matching is the operation of obtaining the disparity value between two images by matching all the corresponding image points, thus producing a dense disparity image. How to obtain accurate disparity images has always been a key problem in the field of stereo vision. At present, in research on 3D reconstruction based on binocular stereo vision, the main direction of scholars at home and abroad is to improve the efficiency and accuracy of stereo matching, while there is comparatively little literature on soft tissue. This paper proposes an improved stereo matching algorithm based on a joint similarity measure and adaptive weights. The algorithm improves the matching cost calculation based on the joint similarity measure to fit color images of heart soft tissue, and it draws on the idea of graph cuts to improve the adaptive weights. The experimental results show that both the improved joint similarity measure and the improved adaptive weights effectively reduce the mismatch rate, and that using both together gives a better matching effect than using the improved joint similarity measure alone.

1. Introduction

Binocular stereo vision uses two cameras to obtain two-dimensional images of the same scene from different positions simultaneously. The disparity image between the two images is obtained by matching them by computer. Through related calculations combined with the parameters of the binocular camera, the 3D point cloud data corresponding to the object surface can be obtained, and the 3D reconstruction of the scene is realized from the point cloud data [1,2]. Zhang et al. [3] used the spatiotemporal correlation of 3D motion in 3D reconstruction technology. Currently, feature-based 3D reconstruction and tracking technology is widely applied in the medical field [4]. Binocular stereo vision is widely used in robot vision, industrial measurement, medical information processing, and other fields because of its simple equipment and easy operation.
Stereo matching algorithms can be divided into two types: local and global. Local stereo matching algorithms can be further divided into three categories: region-based, feature-based, and phase-based. Global methods include dynamic programming stereo matching, graph cuts stereo matching, belief propagation stereo matching, and other global stereo matching algorithms. As a difficult and active topic in stereo vision, stereo matching has been studied by many scholars, who have solved various problems in it. For example, for the problem of support window selection in region-based stereo matching, the adaptive window approach of Kanade et al. [5] has been widely used. For the problems of high algorithmic complexity and low matching efficiency of the adaptive window, Fusiello et al. [6] proposed a matching method based on a multi-window mechanism, which selects the optimal matching window from a certain number of preset windows of different shapes, reducing the complexity of the algorithm to a certain extent. Later, to further improve matching precision, Samadi et al. [7] proposed a matching method that introduces non-parametric transforms on the basis of adaptive windows to achieve fast and accurate stereo matching of images, with satisfactory results.
Aiming at the difficulty of extracting accurate and unique features in feature-based matching algorithms, Lowe proposed in 1999, and later improved, a new local feature extraction operator, SIFT [8,9]. The corresponding feature points are invariant to scale, rotation, translation, and illumination, and can avoid the influence of affine transformation and noise to a certain extent. Since then, research in this direction has advanced continuously, and many improved versions of SIFT have emerged [10,11]. To solve the phase-wrapping problem in phase acquisition in phase-based stereo matching, Zadeh et al. [12] and Hawi et al. [13] respectively proposed a method that takes the correlation of band-pass signals obtained by wavelet transform as the matching cost, and a stereo matching algorithm that uses a phase correlation function to obtain the similarity between corresponding image points, which alleviate this problem effectively. However, some problems remain in phase-based matching: the matching accuracy decreases as the disparity search range increases, and when the output signal amplitude of the band-pass filter is too low, phase singularities may occur and lead to mismatches.
To overcome the problem that continuity constraints in the horizontal and vertical directions cannot be effectively fused in dynamic programming, avoid fringe defects, and obtain higher matching accuracy, Boykov et al. [14] proposed an energy function optimization algorithm based on graph cuts, and many improved algorithms have been developed on this basis [15,16]. This kind of algorithm currently achieves the best matching accuracy, but it has high complexity and poor real-time performance. In addition, many scholars have studied stereo matching algorithms based on belief propagation [17,18], Markov random fields [19,20], and artificial intelligence [21,22].
At present, research on binocular stereo vision is mainly concentrated on stereo matching algorithms, but its application background is mainly industry, aerospace, and other fields [22]. Research on the stereo-endoscope-based three-dimensional reconstruction of soft tissue in surgical robots is comparatively scarce [23], and much of it focuses on the three-dimensional reconstruction of sparse points, which involves two steps: first, sparse three-dimensional space points are reconstructed; then, a surface is fitted through the sparse points. This method has large errors.
In this study, we focus on a stereo matching algorithm whose joint similarity measure combines the absolute value of the gray difference with the Hamming distance of the census transform, together with adaptive weights. The matching cost calculation based on the joint similarity measure, the matching cost aggregation based on adaptive weights, and the specific forms of the disparity calculation and post-processing operations are studied. Problems in the original algorithm are then analyzed and corresponding improvements proposed. The matching cost calculation is improved so that the sum of the absolute gray differences of the color channels of two pixels in the color image [24] and the Hamming distance of the improved census transform serve as the joint similarity measure, and the adaptive weights of the original algorithm are improved: drawing on the idea of graph cuts, segmentation rules are formulated, and each pixel in the support window is segmented according to its color and distance similarity to the central pixel.

2. Dataset

In this study, experiments on the improved stereo matching algorithm, which is based on the joint similarity measure and adaptive weights, are carried out. The algorithm is programmed in MATLAB R2010b. The experimental materials are two sets of heart model stereo images with corresponding CT scan data and a set of actual heart soft tissue stereo images, provided by Imperial College London and available on its open data website (https://imperialcollegelondon.app.box.com/s/kits2r3uha3fn7zkoyuiikjm1gjnyle3 (accessed on 1 February 2022)). The standard disparity maps take the left image as the reference image and are derived from the CT-scanned 3D point cloud data and the corresponding stereo endoscope vision system model; the disparity search ranges are [0, 40] and [0, 50], respectively. The actual scene images are heart soft tissue stereo images corrected by an image rectification algorithm, and the matching accuracy of the algorithm is evaluated by the mismatched-pixel criterion, shown in Equation (1) [8,9].
B = \frac{1}{N} \sum_{(u,v)} \eta(u,v), \qquad \eta(u,v) = \begin{cases} 1, & \left| d_c(u,v) - d_t(u,v) \right| > \delta_d \\ 0, & \left| d_c(u,v) - d_t(u,v) \right| \le \delta_d \end{cases}
where (u, v) refers to the pixel coordinates of a point; d_c(u,v) and d_t(u,v) refer to the calculated disparity value and the standard disparity value of the corresponding point; N is the number of pixels in the image; δ_d is the set disparity deviation threshold; and B is the ratio of the number of pixels whose deviation between the computed disparity map and the standard disparity map exceeds the threshold to the total number of pixels. The smaller B is, the higher the matching accuracy and the better the algorithm's performance.
Since the standard disparity map corresponding to the heart model images was obtained from the CT-scanned 3D point cloud data, the vision system model of the corresponding stereo endoscope, and related calculations, its disparity values are double-precision, i.e., they contain decimal fractions. To set a reasonable allowable error range, a disparity deviation threshold of δ_d = 1.5 was used: if the deviation between the obtained disparity value and the standard disparity value is greater than 1.5, the point is identified as a wrong matching point; otherwise, it is identified as a correct matching point.
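The evaluation criterion above can be sketched as a short function. The paper's experiments were run in MATLAB; the following Python/NumPy version is only an illustrative sketch of Equation (1) using the δ_d = 1.5 threshold described here.

```python
import numpy as np

def mismatch_rate(d_calc, d_true, delta_d=1.5):
    """Fraction of pixels whose disparity error exceeds delta_d (Equation (1))."""
    d_calc = np.asarray(d_calc, dtype=float)
    d_true = np.asarray(d_true, dtype=float)
    eta = np.abs(d_calc - d_true) > delta_d  # eta(u, v) = 1 where mismatched
    return eta.mean()                        # B = (1/N) * sum of eta
```

A lower returned value means higher matching accuracy, as the text states.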

3. Methods

In binocular stereo vision, an image pair is obtained by the binocular vision system, and the disparity image between the two images is then obtained by processing the pair with a stereo matching algorithm. Feature-based stereo matching can obtain sparse disparity images quickly; global stereo matching can obtain the globally optimal dense disparity image, but its complexity is too high for real-time processing; region-based stereo matching has relatively low complexity and can obtain a dense disparity map. Since the objective of this project is to reconstruct the surface of heart soft tissue, a dense disparity map is needed. At the same time, considering the real-time requirements of endoscopic use, region-based stereo matching is selected as the main idea of the stereo matching algorithm in this project.

3.1. The Stereo Matching Algorithm in this Paper

The stereo matching algorithm in this project is a new stereo matching algorithm based on the region stereo matching algorithm. The matching cost calculation method combines SAD [25] and census non-parametric transformation. The aggregation of matching costs is based on the idea of adaptive weight, which changes the weight of each pixel according to the local distribution of pixels in the support window. Finally, the dense disparity image is obtained by left-right consistency detection and post-processing (interpolation, median filtering, etc.).
The matching cost aggregation based on adaptive weight first assigns a weight to each pixel in the support window based on the gray-scale and distance similarity between each pixel in the support window and the central pixel; that is, the closer the color and the smaller the distance, the greater the weight. Then, the aggregated matching cost of all pixels in the support window is obtained from the assigned weights and corresponding matching costs. This method can effectively avoid the mismatching caused by depth discontinuity and color difference. The gray-scale and distance similarity in the adaptive weight refer to the absolute gray difference between two pixels and the geometric distance between pixel coordinates, respectively; that is, the weight is assigned according to the absolute gray difference between each pixel in the support window and the central pixel and the geometric distance between their pixel coordinates. The distribution rule is shown in Equation (2).
w(p,q) = \exp\!\left( -\frac{D_c(q)}{\lambda_c} - \frac{D_d(q)}{\lambda_d} \right), \qquad D_c(q) = \left| I(q) - I(p) \right|, \qquad D_d(q) = \sqrt{(u_q - u_p)^2 + (v_q - v_p)^2}, \quad q \in N(p)
where p is the center pixel of the support window, N(p) is the set of all pixels in the window, w(p,q) is the weight of pixel q in the support window, (u_q, v_q) and (u_p, v_p) are the pixel coordinates of points q and p, respectively, D_c(q) and D_d(q) are the absolute gray difference and the geometric pixel-coordinate distance between points p and q, respectively, and λ_c and λ_d are gray and distance factors used to adjust the influence of the absolute gray difference and the geometric distance on the weight.
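Equation (2) can be sketched directly. This Python/NumPy fragment is illustrative only (the paper used MATLAB), and the λ_c and λ_d values in the signature are hypothetical placeholders, since the text does not fix them here.

```python
import numpy as np

def support_weight(I, p, q, lam_c=10.0, lam_d=10.0):
    """Weight of pixel q with respect to centre pixel p (Equation (2)).

    I is a grayscale image; p and q are (row, col) coordinates.
    lam_c and lam_d are assumed example values for the gray and
    distance factors."""
    Dc = abs(float(I[q]) - float(I[p]))        # absolute gray difference D_c(q)
    Dd = np.hypot(q[0] - p[0], q[1] - p[1])    # geometric distance D_d(q)
    return np.exp(-Dc / lam_c - Dd / lam_d)
```

A pixel identical in gray value and location to the centre gets weight 1; weights decay exponentially with gray and spatial distance.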

3.2. Matching Cost Calculation Method

The calculation of the matching cost is the first step of the stereo matching operation, and its method directly affects the accuracy of the matching results and the complexity of the matching algorithm. The SAD algorithm is based on the difference of pixel gray values: it assumes that the gray values of the same feature point in the two images are equal, and its matching accuracy is directly related to the size of the support window. Its advantages are simple, fast computation and the ability to obtain dense disparity images. However, it is difficult to complete correct matching in large textureless or weakly textured areas, and the method is very sensitive to shadows and noise. The census algorithm has a relatively simple structure and a small amount of computation, and it can also obtain dense disparity images. Its result depends only on the relative ordering of pixel gray values, so it has suitable robustness and can effectively avoid the influence of radiometric distortion and halos. However, it easily produces false matches in areas with repeated or similar textures, reducing matching performance. To combine the advantages of the two algorithms and reduce the mismatching that occurs when either is used alone, this paper performs a weighted fusion of the matching cost calculation methods of the two algorithms to obtain a new matching cost calculation method.
The SAD algorithm uses the sum of the absolute differences of the gray values of all pixels in the support window as its matching cost. Considering that the images here are generally color images, to make full use of the information they contain, the sum of the absolute differences of the gray values of the three color channels is used as the matching cost of two pixels. The matching cost between two corresponding points in the left and right image support windows is shown in Equation (3).
C_{AD}(p,d) = \sum_{c \in \{r,g,b\}} \left| I_l^c(p) - I_r^c(p-d) \right|
Among them, C_{AD}(p,d) is the gray similarity (matching cost) between the two pixels; I_l^c(p) and I_r^c(p-d) represent the gray values, on the color channel c (c ∈ {r, g, b}), of pixel p in the left image and of the corresponding pixel p - d in the right image; and d refers to the disparity.
Noise is an inevitable attribute when a camera acquires an image. It is a point with a jump in the intensity value compared with the surrounding area, so it will lead to a high matching cost. To reduce its adverse effects, truncated absolute differences (TAD) are introduced here, as shown in Equation (4).
C_{AD}(p,d) = \min\left( T_c,\; C_{AD}(p,d) \right)
where T_c is the set truncation absolute error value.
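Equations (3) and (4) can be sketched as follows. This Python/NumPy fragment is illustrative only, and the default truncation value T_c = 60 is an assumed placeholder, not a value given in the text.

```python
import numpy as np

def cad_cost(left, right, u, v, d, T_c=60.0):
    """Truncated sum of absolute colour differences (Equations (3)-(4)).

    left and right are H x W x 3 colour images; the disparity d shifts
    the column index into the right image. T_c is an assumed truncation
    value limiting the effect of noisy pixels."""
    pl = left[v, u].astype(float)
    pr = right[v, u - d].astype(float)
    cad = np.abs(pl - pr).sum()   # sum over the r, g, b channels (Eq. (3))
    return min(T_c, cad)          # truncation (Eq. (4))
```

The truncation caps the per-pixel cost so a single noisy pixel cannot dominate the aggregated window cost.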
The census algorithm is a non-parametric transformation method that transforms the pixels in the matching window according to the transformation rules. The size relationship between the gray value of the center pixel of the window and the neighboring pixels in the window is transformed into a continuous bit string. The Hamming distance of the bit string of the corresponding block in the left and right images is used to measure the matching degree. The smaller the distance, the higher the matching degree.
However, the basic census method is used for the gray image. To make full use of the information contained in the color image, the color distance between the neighborhood pixel and the center pixel in the matching window is used as a reference. To reduce the algorithm’s complexity, the Manhattan distance is selected as the color distance, as shown in Equation (5).
d_{mc}(p,q) = \sum_{c \in \{r,g,b\}} \left| I^c(p) - I^c(q) \right|
where d_{mc}(p,q) is the Manhattan color distance between points p and q, and I^c(p) refers to the gray value on the corresponding color channel c (c ∈ {r, g, b}) at the point p.
The traditional census algorithm is very dependent on the center pixel's gray value; that is, if the center point changes suddenly due to noise, the bit string generated from it will no longer be correct, making the transform vulnerable to noise interference. To solve this problem, the standard deviation of the Manhattan color distances between all neighborhood pixels and the center pixel in the support window, and the difference between each neighborhood pixel's Manhattan color distance and the average Manhattan color distance, are selected to replace the gray values of the center and neighborhood pixels in the traditional census algorithm. In this way, even when pixels change suddenly due to noise, the color-image census transform retains suitable robustness. The color-image census transform rule is shown in Equation (6).
C(p) = \bigotimes_{q \in N(p)} \delta\left( d_m(q),\, d_{\mathrm{std}}(p) \right), \qquad \delta(a,b) = \begin{cases} 1, & a < b \\ 0, & a \ge b \end{cases}
where q is any pixel in the support window, d_{std}(p) is the standard deviation of the Manhattan color distances between all neighborhood pixels and the center pixel, d_m(q) = d_{mc}(q) - d_{mean}(p) is the difference between the Manhattan color distance of point q to the center point p and the average Manhattan color distance of all neighborhood pixels to the center pixel, and d_{mean}(p) is shown in Equation (7).
d_{\mathrm{mean}}(p) = \frac{1}{N} \sum_{q \in N(p)} d_{mc}(p,q)
The left and right color images are transformed according to the census transform rule shown in Equation (6); after the transform, the value of each pixel in the left and right images is a bit string composed of 0 s and 1 s. When matching, the matching cost between two corresponding points in the left and right images is the Hamming distance between their bit strings, as shown in Equation (8).
C_{cen}(p,d) = \mathrm{Hamming}\left( C(p),\, C(p-d) \right)
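The modified census transform of Equations (5)-(8) can be sketched as below. This Python/NumPy code is an illustrative reading of the text, not the authors' MATLAB implementation; the window radius n = 1 and the use of the population standard deviation are assumptions.

```python
import numpy as np

def census_bits(img, v, u, n=1):
    """Modified census transform at centre (v, u) over a (2n+1)^2 window
    (Equations (5)-(7)): each neighbour's Manhattan colour distance to the
    centre is compared against the window's mean and standard deviation
    instead of comparing raw gray values."""
    center = img[v, u].astype(float)
    dists = []
    for dv in range(-n, n + 1):
        for du in range(-n, n + 1):
            if dv == 0 and du == 0:
                continue  # the centre pixel itself is excluded
            dists.append(np.abs(img[v + dv, u + du].astype(float) - center).sum())
    dists = np.array(dists)
    d_mean, d_std = dists.mean(), dists.std()
    # bit = 1 when (d_mc(q) - d_mean) < d_std, else 0 (Equation (6))
    return [1 if (d - d_mean) < d_std else 0 for d in dists]

def hamming(bits_a, bits_b):
    """Matching cost: number of differing bits (Equation (8))."""
    return sum(a != b for a, b in zip(bits_a, bits_b))
```

Identical patches yield identical bit strings and therefore a Hamming cost of 0.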
By combining the matching costs of the SAD and census algorithms, the matching cost between two corresponding points in the left and right image support windows is obtained, as shown in Equation (9), where λ adjusts the proportion between the SAD and census terms.
C(p,d) = \lambda\, C_{AD}(p,d) + (1-\lambda)\, C_{cen}(p,d)

3.3. Match Cost Aggregation Methods

In the aggregation operation of the matching cost, the algorithm selects a support window with a fixed size and shape. Matching cost aggregation accumulates the matching costs of all pixels in the support window to obtain the overall matching cost of the left and right image regions, which is then used as the matching cost between the corresponding pixels in the left and right images. However, this method is based on the premise that all pixels in the window have similar depth information. In fact, edge areas with depth discontinuities or color differences often do not meet this condition, so false matches are easily produced.
To solve this problem, this algorithm uses an adaptive weight method based on color similarity and distance similarity to aggregate the matching cost: the more similar a neighborhood point's color to the center point's, the greater its weight; the closer a window point is to the center point in space, the greater its weight. Color similarity refers to the degree of color similarity between two pixels and, to reduce computation, is described by the Manhattan color distance; distance similarity refers to the geometric distance between two pixels and is described by the Euclidean distance between pixel coordinates, as shown in Equation (10).
d_e(p,q) = \sqrt{(u_q - u_p)^2 + (v_q - v_p)^2}
where ( u , v ) is the pixel coordinate of the pixel, d e ( p , q ) is the Euclidean distance between the two points p , q .
For a central pixel p , the weight of any pixel q in the support window is determined by the color distance and space distance corresponding to the point p , and the weight is shown in Equation (11)
w(q) = \exp\!\left( -\frac{d_e(p,q)}{\lambda_e} - \frac{d_{mc}(p,q)}{\lambda_c} \right)
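Equations (10) and (11) can be sketched together. This Python/NumPy fragment is illustrative; the λ_e and λ_c defaults are hypothetical placeholders, as the text does not fix their values here.

```python
import numpy as np

def adaptive_weight(img, p, q, lam_e=17.5, lam_c=13.0):
    """Weight of support pixel q with respect to centre p (Equations (10)-(11)),
    combining Manhattan colour distance and Euclidean pixel distance.

    img is an H x W x 3 colour image; p and q are (row, col) coordinates.
    lam_e and lam_c are assumed example values."""
    d_mc = np.abs(img[q].astype(float) - img[p].astype(float)).sum()  # colour similarity
    d_e = np.hypot(q[0] - p[0], q[1] - p[1])                          # spatial distance
    return np.exp(-d_e / lam_e - d_mc / lam_c)
```

The aggregated window cost is then the weight-scaled sum of the per-pixel matching costs.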

3.4. Parallax Calculation and Post-Processing

After the aggregation calculation of the matching cost, the disparity value corresponding to the minimum matching cost is selected as the disparity value of the corresponding pixel according to the WTA principle, as shown in Equation (12).
d_p = \arg\min_{d \in D} E(p,d)
D is the set of all parallax values within the allowable parallax range.
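The WTA selection of Equation (12) amounts to a per-pixel argmin over the cost volume. A minimal Python/NumPy sketch (illustrative only; the cost-volume layout is an assumption):

```python
import numpy as np

def wta_disparity(cost_volume):
    """Winner-takes-all (Equation (12)): for each pixel, pick the disparity
    index with the minimum aggregated cost.

    cost_volume is assumed to have shape (D, H, W), where axis 0 indexes
    the candidate disparities."""
    return np.argmin(cost_volume, axis=0)
```

The result is an integer disparity map, one value per pixel.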
In the disparity map obtained under the WTA principle, there are some mismatched points. To eliminate these mismatches, the left-right consistency constraint is used to check the resulting disparity map. In stereo matching, when the left image is taken as the reference image, a matching point p′ is found in the right image for a point p in the left image. If the right image is then used as the reference image, the matching point of p′ in the left image must be p; if not, p is a mismatched point.
The left-right consistency constraint is shown in Equation (13). Left-right consistency detection uses the initial disparity maps obtained with the left and right images as reference images to determine whether the disparity values of corresponding matching points in the left and right images are equal. If they are equal, the points are considered correct matching points; if not, they are identified as wrong matching points and their disparity values are invalidated by setting them to 0.
d_l(p) = d_r\left( p - d_l(p) \right)
where d_l(p) and d_r(p - d_l(p)) are the disparity values of point p in the left image and of the point p - d_l(p) in the right image, respectively.
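The left-right check of Equation (13) can be sketched as below; failed pixels are marked invalid (disparity 0), as described in the text. This Python/NumPy version is an illustrative sketch that assumes integer disparity maps.

```python
import numpy as np

def lr_check(d_left, d_right, invalid=0):
    """Left-right consistency (Equation (13)): a pixel p in the left map is
    kept only if d_l(p) == d_r(p - d_l(p)); otherwise its disparity is set
    to `invalid` (0 in the text)."""
    h, w = d_left.shape
    out = d_left.copy()
    for v in range(h):
        for u in range(w):
            d = d_left[v, u]
            ur = u - d  # column of the matching point in the right map
            if ur < 0 or d_right[v, ur] != d:
                out[v, u] = invalid
    return out
```

Pixels whose match falls outside the right image are also invalidated, since no consistent correspondence exists for them.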
For a detected mismatched point, if the disparity values of its left and right neighboring pixels are valid and close to each other, it is a single mismatched point, and its disparity value can be directly replaced by the average of its neighbors' disparity values. If it does not meet this condition, that is, it is not a single mismatched point, a support window centered on the point is taken in the reference image. Each point in the window with a valid disparity value is assigned a weight according to its gray-level and distance similarity to the central pixel, mismatched points in the window are given weight 0, and the weighted average of the disparity values of all points in the window replaces the disparity value of the current point, as shown in Equation (14).
d_p = \frac{\sum_{q \in N(p)} w_q d_q}{\sum_{q \in N(p)} w_q}
Among them, d_p is the new disparity value of the non-single mismatched point p, N(p) is the set of all pixels in the window, and w_q and d_q are the weight and disparity value of point q, respectively.
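Equation (14)'s weighted replacement can be sketched as a one-liner over the window's disparities and weights (mismatched points carry weight 0, so they drop out of the average). An illustrative Python/NumPy sketch:

```python
import numpy as np

def fill_mismatch(window_d, window_w):
    """Replace the disparity of a non-single mismatched point by the weighted
    average of the valid disparities in its window (Equation (14)).
    Mismatched points in the window are expected to carry weight 0."""
    window_d = np.asarray(window_d, dtype=float)
    window_w = np.asarray(window_w, dtype=float)
    return (window_w * window_d).sum() / window_w.sum()
```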
Because the disparity obtained in the matching process is an integer, and the true matching point of a left-image pixel almost never falls exactly on a pixel in the right image, the preceding matching operation loses precision. To reduce this loss, the disparity image is refined by sub-pixel enhancement to obtain sub-pixel precision. The interpolation equation is shown in Equation (15).
d^* = d + \frac{E(p, d^-) - E(p, d^+)}{2\left( E(p, d^-) - 2E(p,d) + E(p, d^+) \right)}
Among them, E(p, d^-), E(p, d), and E(p, d^+) represent the matching cost of point p at disparities d - 1, d, and d + 1, respectively, and d^* is the sub-pixel disparity value obtained after interpolation. Finally, median filtering is used to smooth the refined disparity map and eliminate possible noise points.
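The sub-pixel interpolation of Equation (15) is a parabolic fit through the costs at d - 1, d, and d + 1, returning the disparity at the parabola's minimum. A minimal sketch (the zero-denominator guard is an added safety check, not from the text):

```python
def subpixel_disparity(d, e_minus, e_0, e_plus):
    """Parabolic sub-pixel refinement (Equation (15)): fit a parabola through
    the matching costs at disparities d-1, d, d+1 and return the disparity
    at its minimum."""
    denom = 2.0 * (e_minus - 2.0 * e_0 + e_plus)
    if denom == 0:  # flat cost curve: keep the integer disparity
        return float(d)
    return d + (e_minus - e_plus) / denom
```

When the cost curve is symmetric about d, the correction term vanishes and the integer disparity is returned unchanged.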

3.5. Improved Stereo Matching Algorithm Based on Joint Similarity Measure and Adaptive Weight

In this paper, the stereo images of heart soft tissue are color images. Compared with a gray image, a color image contains more information. To make full use of this information, this section improves the basic stereo matching algorithm based on the joint similarity measure and adaptive weights: first, the matching cost calculation is adapted to color images; second, some problems in the adaptive weights are improved.
The implementation process of the improved stereo matching algorithm based on the joint similarity measure and adaptive weights is as follows. The sum of the absolute gray differences of the color channels between two pixels (called CAD) and the Hamming distance between the bit strings of the improved census transform (called CHD) are computed, and a linear combination of the two serves as the similarity measure. In the algorithm, the improved census transform is first applied to the rectified left and right images, and the CAD and CHD values are obtained from the per-channel gray values and the improved census transform values. Next, each pixel in the support window is assigned a weight by the improved adaptive weight method, and the matching costs of all pixels in the support window are aggregated. After the aggregated matching cost is obtained, the WTA principle is used to search for the best matching point of each pixel, the initial disparity maps with the left and right images as reference images are obtained, and the disparity map post-processing operations are carried out on the obtained initial disparity maps.

3.6. Matching Cost Calculation Based on the Improved Joint Similarity Measure

Considering that the corresponding heart soft tissue images in this project are color images, to make full use of the information they contain, the sum of the absolute values of the differences between the gray values of the three color channels is selected as the matching cost between two pixels, and the matching cost between two corresponding points in the left and right image support windows is shown in Equation (16).
C_{CAD}(p,d) = \sum_{c \in \{r,g,b\}} \left| I_l^c(p) - I_r^c(p-d) \right|
The traditional census algorithm is used for the grayscale image. In order to make full use of the information contained in the color image, the color distance between the neighborhood pixel and the center pixel in the matching window is used as the feature of each point. To reduce the complexity of the algorithm, Manhattan distance is selected as the color distance, as shown in Equation (17).
d_{mc}(p,q) = \sum_{c \in \{r,g,b\}} \left| I^c(p) - I^c(q) \right|
where d_{mc}(p,q) is the Manhattan color distance between points p and q, and I^c(p) refers to the gray value of the corresponding color channel c (c ∈ {r, g, b}) at the point p.
The traditional census algorithm is very dependent on the center pixel's gray value; that is, if the center point changes suddenly due to noise, the bit string generated from it will no longer be correct. To solve this problem, the standard deviation of the Manhattan color distances between all neighborhood pixels and the center pixel in the support window replaces the center point's gray value in the traditional census algorithm, and the difference between each neighborhood pixel's Manhattan color distance and the average Manhattan color distance replaces the gray values of the neighborhood pixels. In this way, when the gray value of the central pixel changes due to noise, the whole comparison changes relatively evenly, leaving the corresponding bit string basically unchanged and ensuring the robustness of the census transform. The improved census transform rule is shown in Equation (18).
C_C(p) = \bigotimes_{q \in N(p)} \delta\left( d_m(q),\, d_{\mathrm{std}}(p) \right), \qquad \delta(a,b) = \begin{cases} 1, & a < b \\ 0, & a \ge b \end{cases}
Among them, d_{std}(p) is the standard deviation of the Manhattan color distances between all neighborhood pixels and the center pixel in the support window, and d_m(q) = d_{mc}(q) - d_{mean}(p) is the difference between the Manhattan color distance d_{mc}(q) of point q to the center point p and the average d_{mean}(p) of the Manhattan color distances of all pixels except the center point in the neighborhood. The expression of d_{mean}(p) is shown in Equation (19), where N(p) is the set of all pixels except the center pixel in the support window and N is the number of points in N(p).
d_{\mathrm{mean}}(p) = \frac{1}{N} \sum_{q \in N(p)} d_{mc}(p,q)
The left and right color images are transformed according to the improved census transform rule shown in Equation (18), and a bit string composed of 0 and 1 is obtained. When matching between the left and right images, the matching cost in the two corresponding points is the Hamming distance between the corresponding bit strings of these two points. The matching cost is shown in Equation (20).
$$C_{CH}(p, d) = \mathrm{Hamming}\big(C_C(p),\, C_C(p - d)\big)$$
where $C_C(p)$ and $C_C(p-d)$ are the improved census transform results of the point to be matched in the left image and of the corresponding pixel in the right image at disparity $d$, respectively, and $C_{CH}(p, d)$ is the matching cost at disparity $d$.
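The Hamming-distance cost of Equation (20) reduces to counting differing bit positions between the two census strings; a minimal sketch (function name assumed):

```python
def hamming_cost(bits_left, bits_right):
    """Matching cost of Equation (20): the Hamming distance between the
    improved-census bit strings of p (left image) and p - d (right image)."""
    assert len(bits_left) == len(bits_right), "bit strings must match in length"
    return sum(a != b for a, b in zip(bits_left, bits_right))
```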
To obtain the joint similarity measure composed of $C_{CAD}(p,d)$ and $C_{CH}(p,d)$, the gray values of each color channel of the color image are first normalized, and the sum of the absolute values of the gray differences over the color channels is computed; the result is called CAD. Then the ratio of the Hamming distance of the corresponding census transform to the length of the bit string is computed, as shown in Equation (21); the result is called CHD, where $N$ is the length of the bit string.
$$C_{CHD}(p, d) = \frac{C_{CH}(p, d)}{N}$$
To avoid the influence of noise and unmatched points, truncated absolute differences (TAD) are introduced here, and the matching cost is shown in Equation (22).
$$C(p, d) = \lambda \min\big(T_a,\, C_{CAD}(p, d)\big) + (1 - \lambda) \min\big(T_h,\, C_{CHD}(p, d)\big)$$
Here, $T_a$ is the truncation threshold set for CAD, $T_h$ is the truncation threshold set for CHD, and $\lambda$ adjusts the proportion between CAD and CHD.
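A minimal sketch of the truncated joint cost of Equation (22); the default values of $T_a$, $T_h$, and $\lambda$ below are illustrative placeholders, since the paper does not fix them here:

```python
def joint_cost(cad, chd, lam=0.5, t_a=60.0, t_h=0.4):
    """Truncated joint similarity measure of Equation (22).

    cad  : sum of per-channel absolute differences for the pixel pair (CAD)
    chd  : Hamming distance divided by the bit-string length N (CHD)
    lam  : lambda, balancing the two terms
    t_a, t_h : truncation thresholds T_a, T_h (illustrative values)
    """
    # Truncation caps the influence of noise and unmatched points (TAD idea)
    return lam * min(t_a, cad) + (1 - lam) * min(t_h, chd)
```

Truncation means that once either component exceeds its threshold, outliers cannot dominate the combined cost, which is the stated purpose of introducing TAD.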

3.7. Matching Cost Aggregation Based on Improved Adaptive Weight

The size and shape of the support window must be determined first for the matching cost aggregation operation. The general support window is a square window of a certain size containing all pixels within it; to reduce computation time, this algorithm selects a sparse support window for both the improved census transform and the matching cost aggregation. Paper [27] compares the matching effect of the sparse window with that of the ordinary window and demonstrates the feasibility of the sparse window. The two window types are shown in Figure 1. If the window size is $(2n+1) \times (2n+1)$, the ordinary window contains $(2n+1) \times (2n+1)$ points, whereas the sparse window contains only $(n+1) \times (n+1) + 1$, about a quarter of the ordinary window.
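The point counts quoted above can be verified with a short calculation (the helper name is ours); for the 31 × 31 window used in the experiments, $n = 15$:

```python
def window_point_counts(n):
    """Point counts from the text for a (2n+1) x (2n+1) support window:
    the ordinary window holds every pixel, while the sparse window holds
    only (n+1)*(n+1) + 1 points, roughly a quarter of the ordinary one."""
    full = (2 * n + 1) ** 2
    sparse = (n + 1) ** 2 + 1
    return full, sparse
```

With `n = 15` this gives 961 points for the ordinary window and 257 for the sparse one, a ratio of about 0.27, consistent with the "about 1/4" figure in the text.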
When dealing with mismatching in edge regions such as depth discontinuities, matching cost aggregation based on graph cuts can often achieve better results than adaptive weights. However, the matching accuracy of the graph-cut method depends heavily on the segmentation quality. In general, high matching accuracy comes at the cost of a more robust segmentation algorithm, which makes the overall complexity of the algorithm very high, so the method is unsuitable for applications requiring real-time performance. The adaptive weight method is relatively stable and less complex than the graph-cut algorithm, but its matching accuracy is lower.
To combine the advantages of graph cuts and adaptive weights, we build on the adaptive weight method and first use the color and distance similarity between each pixel in the support window and the center pixel to segment the window. If a pixel satisfies the set conditions, it is deemed to belong to the same segmentation unit as the center point and is assigned the maximum weight of 1; otherwise, its weight is assigned according to the color and distance similarity between the two pixels, so the complexity of the improved adaptive weighting algorithm remains unchanged. The rule for judging whether a pixel belongs to the same segmentation unit as the center point is shown in Equation (23).
$$q \in S_p \iff \big(D_c(p,q) \le c_1 \ \text{and}\ D_d(p,q) \le d_1\big) \ \text{or}\ \big(D_c(p,q) \le c_2 \ \text{and}\ d_1 < D_d(p,q) \le d_2\big)$$
where $S_p$ is the segmentation unit of the central pixel $p$, and $D_c(p, q)$ and $D_d(p, q)$ are the color difference and the distance between the two points $p$ and $q$, as shown in Equation (24); $c_1$ and $c_2$ are the color-difference thresholds with $c_1 < c_2$, and $d_1$ and $d_2$ are the distance thresholds. When the distance between two pixels is very small, a slightly looser color-difference range is used to judge whether they belong to the same segmentation unit as the center pixel. As the distance increases, the color threshold is tightened, so only pixels with very similar colors are recognized as belonging to the same segmentation unit. When the distance is too large, the pixels are considered not to belong to the same unit.
$$D_c(p,q) = \sqrt{(R_p - R_q)^2 + (G_p - G_q)^2 + (B_p - B_q)^2}, \qquad D_d(p,q) = \sqrt{(u_q - u_p)^2 + (v_q - v_p)^2}$$
According to this method, the weight distribution rule for each pixel in the support window is obtained, as shown in Equation (25).
$$w(p, q) = \begin{cases} 1, & q \in S_p \\ \exp\!\left(-\left(\dfrac{D_c(p,q)}{\lambda_c} + \dfrac{D_d(p,q)}{\lambda_d}\right)\right), & \text{otherwise} \end{cases}$$
Among them, λ c and λ d are color and distance factors, which are used to adjust the influence of color difference and distance on weight.
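Equations (23)–(25) can be combined into a single weight function; this sketch assumes Euclidean color and spatial distances per Equation (24), and the threshold and λ defaults are illustrative values we chose, not the paper's:

```python
import math

def weight(p_color, q_color, p_pos, q_pos,
           c1=10.0, c2=25.0, d1=5.0, d2=15.0,
           lam_c=10.0, lam_d=10.0):
    """Improved adaptive weight of Equations (23)-(25).

    A pixel q in the same segmentation unit as the center p (Equation (23))
    gets the maximum weight 1; otherwise the classic adaptive weight based
    on color and distance similarity (Equation (25)) is used.
    """
    # Equation (24): Euclidean color difference and spatial distance
    dc = math.sqrt(sum((a - b) ** 2 for a, b in zip(p_color, q_color)))
    dd = math.hypot(q_pos[0] - p_pos[0], q_pos[1] - p_pos[1])
    # Equation (23): two-stage segmentation rule (c1 < c2, d1 < d2)
    same_unit = (dc <= c1 and dd <= d1) or (dc <= c2 and d1 < dd <= d2)
    if same_unit:
        return 1.0
    # Equation (25): exponential falloff controlled by lambda_c, lambda_d
    return math.exp(-(dc / lam_c + dd / lam_d))
```

Because pixels in the center's segmentation unit simply receive weight 1, the extra segmentation test adds no asymptotic cost over the original adaptive weight method, as claimed in the text.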
When matching, the weights of the pixels in the support windows of both the left and right images are considered; that is, the aggregated matching cost for two corresponding points in the left and right images is shown in Equation (26).
$$E(p_l, d) = \frac{\displaystyle\sum_{q_l \in N_{p_l},\, q_r \in N_{p_r}} w(p_l, q_l)\, w(p_r, q_r)\, C(q_l, d)}{\displaystyle\sum_{q_l \in N_{p_l},\, q_r \in N_{p_r}} w(p_l, q_l)\, w(p_r, q_r)}$$
where $d$ is the disparity between the matching points, $p_l$ and $p_r$ are the center points of the corresponding support windows in the left and right images, $q_l$ and $q_r$ are corresponding points in the left and right support windows, respectively, and $C(q_l, d)$ is the matching cost between $q_l$ and $q_r$.
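The aggregation of Equation (26) is a weighted average of the per-pixel costs, with weights taken from both support windows; a sketch over flattened window lists (names assumed):

```python
def aggregate_cost(weights_left, weights_right, costs):
    """Aggregated cost E(p_l, d) of Equation (26): the weighted average of
    the per-pixel matching costs C(q_l, d), where each pair (q_l, q_r)
    contributes the product of its left- and right-window weights."""
    num = sum(wl * wr * c
              for wl, wr, c in zip(weights_left, weights_right, costs))
    den = sum(wl * wr for wl, wr in zip(weights_left, weights_right))
    return num / den
```

Pairs whose pixels lie in the center's segmentation unit in both images carry weight 1 · 1 and dominate the average, which is how the graph-cut-inspired segmentation sharpens the aggregation at depth edges.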

4. Results

4.1. Improved Joint Similarity Measure Matching Experiment

To verify the feasibility of the improved census transform rule, the region stereo matching algorithms based on the traditional census transform and on the improved census transform were both implemented. Neither algorithm performs matching cost aggregation; that is, the support window is used directly to compare the bit strings produced by the census transform rules. The only difference between the two algorithms is therefore the census transform rule: the traditional census transform operates on the gray image, while the improved census transform operates on the color image. The two matching algorithms are used to stereo-match the corrected heart model images, with the left image as the reference and without post-processing, as shown in Figure 2; a 31 × 31 square window is used as the support window.
The mismatching rates [26] of the disparity map corresponding to the traditional census and the improved census are counted, respectively, as shown in Table 1.
To compare the sensitivity of the traditional and improved census algorithms to illumination distortion, the left and right images of the two groups of heart models are adjusted: 1/2 gray level and 3/2 gray level images are obtained by multiplying the original image by 1/2 and 3/2, respectively. The three brightness levels of one group of heart model images are shown in Figure 3.
These images are combined in pairs, giving nine combinations. The traditional and improved census algorithms are each used to stereo-match the nine combinations of the two sets of image pairs; the traditional census uses the gray image obtained from the corresponding color image, while the improved census directly uses the brightness-adjusted color images. The mismatching rates for the different combinations are shown in Figure 4.
The improved joint similarity measure combines the sum of the absolute values of the gray-level differences between two pixels in the color image with the Hamming distance between the bit strings obtained by the improved census transform. To verify its feasibility, the improved joint similarity measure is combined with the adaptive weight of the original stereo matching algorithm based on joint similarity measure and adaptive weight. The parallax images without post-processing, with the left image as reference, are shown in Figure 5. The mismatching rates of the disparity maps corresponding to the joint similarity measures before and after the improvement are given in Table 2.

4.2. Improved Adaptive Weight Matching Experiment

To verify the effectiveness of the matching cost aggregation method in this work, the improved joint similarity measure is combined with the adaptive weight method before and after the improvement to perform the stereo matching operation on two sets of heart model images and obtain the parallax image without post-processing with the left image as the reference image, as shown in Figure 6.
Comparing the standard parallax image with the result parallax images of the adaptive weight (AW) before and after the improvement shows that, under the same conditions, both matching algorithms obtain complete parallax images. However, the result of AW before the improvement has more mismatched areas than that of the improved AW; these mismatched areas are mainly distributed in regions with darker colors or some illumination distortion and in regions with obvious color changes. The matching result of the improved AW is closer to the standard parallax image.
The mismatching rates of the AW matching results before and after the improvement are given in Table 3. For the two sets of heart model images, the mismatching rate of the improved AW is lower than that before the improvement, with average mismatching rates of 7.99% and 5.92% before and after the improvement, respectively; that is, the improved AW yields an obvious gain in matching quality. The main reason is that the improved AW draws on the idea of graph cuts and therefore assigns weights to the pixels in the support window in a more targeted way; compared with the AW before the improvement, the number of mismatched points is reduced, and the matching effect is better in edge regions with color changes.

4.3. Three-Dimensional Image Matching Experiment of Heart Soft Tissue

To verify the correctness of the matching cost calculation of this algorithm, the mismatching rate of the two groups of heart model images without post-processing is calculated for different values of $\lambda$ in the range [0, 1]. The mismatching rate and average mismatching rate of the disparity maps for the two groups of heart model images are shown in Figure 7.
The parallax maps with the left image as the reference and the corresponding mismatching rates of the stereo matching algorithm based on joint similarity measure and adaptive weight, before and after the improvement, are shown in Table 4.
After processing the initial disparity image at λ = 0.5 , the final result disparity image is obtained, and the final result disparity map is compared with the standard disparity map. According to the above judgment rules of mismatching points, the distribution map of mismatching points is obtained. The mismatching points are represented in white, as shown in Figure 8.
To verify the effectiveness of post-processing, the corresponding mismatching rate of parallax images before and after post-processing is counted, as shown in Table 5.
To test the sensitivity to illumination distortion, the left and right images of the two sets of cardiac models are adjusted in grayscale: multiplying the original image by 1/2 and 3/2 yields 1/2 gray level, original, and 3/2 gray level images. The left and right images are then combined in pairs, giving nine combinations. This algorithm, the matching algorithm combining CAD with the improved adaptive weight method (CAD+AW), and the matching algorithm combining CHD with the improved adaptive weight method (CHD+AW) are each used to match these nine combinations. The mismatching rate of the result parallax map obtained by each algorithm for each brightness combination of the two sets of heart model images is computed, and the rates of the two sets are averaged. The average mismatching rate for each brightness combination is shown in Figure 9.
The stereo matching algorithm in this paper was used to perform the stereo matching operation on the actual stereo images of cardiac soft tissue corrected by the stereo endoscope, and the final parallax image was obtained with the left image as the reference image, as shown in Figure 10.

5. Discussion

This section presents the relevant experiments on the improved stereo matching algorithm based on joint similarity measures and adaptive weights. The following can be concluded from the experiments. Comparing the standard parallax image with the parallax images of the census transform before and after the improvement shows that, with the same support window size, both the traditional and improved census algorithms obtain relatively complete parallax images. However, the parallax images of the traditional census algorithm contain more obvious mismatched areas, and those of the improved census algorithm are closer to the standard parallax image. The mismatching rates of the parallax images of the census algorithm before and after the improvement are shown in Table 1. For the two sets of cardiac model images, the mismatching rate of the traditional census algorithm exceeds 25%, while that of the improved algorithm is only about 14%; that is, the improved census algorithm achieves a better matching effect under the same conditions.
When the brightness changes, both algorithms show some robustness to illumination distortion; however, the change in the improved census algorithm's results is relatively small, and its mismatching rate is significantly lower.
Comparing the result parallax images with the standard parallax images for the joint similarity measure before and after the improvement shows that both versions obtain relatively complete dense parallax images. However, the result parallax images of the improved joint similarity measure contain fewer obvious mismatches. The mismatching rates of the result parallax images before and after the improvement are given in Table 2. For the two groups of heart model images, the mismatching rate of the improved joint similarity measure is lower than that before the improvement, with average mismatching rates of 12.46% and 7.99% before and after, respectively; that is, the matching accuracy after the improvement is greatly increased.
Comparing the standard parallax image with the result parallax images of the adaptive weight (AW) before and after the improvement shows that, under the same conditions, both matching algorithms obtain complete parallax images, but the AW before the improvement produces more mismatched areas, mainly distributed in regions with dark colors or some illumination distortion and in regions with obvious color changes; the matching result of the improved AW is closer to the standard parallax image. The mismatching rates of the AW matching results before and after the improvement are given in Table 3. For the two groups of heart model images, the mismatching rate of the improved AW is lower, with average mismatching rates of 7.99% and 5.92% before and after the improvement, respectively; that is, the matching effect of the improved AW is significantly better. The main reason is that the improved AW draws on the idea of graph cuts, so it assigns weights to the pixels in the support window in a more targeted way; compared with the AW before the improvement, the mismatched points are reduced, and the matching effect is better in edge regions with color changes.
As shown in Table 4, for the two groups of heart model images, the mismatching rates of the algorithm before the improvement are 11.74% and 13.17%, respectively, with an average of 12.46%, while those of the improved algorithm are 6.13% and 5.17%, respectively, with an average of 5.92%. Combined with the previous analysis, the matching accuracy of the stereo matching algorithm with the improved joint similarity measure and adaptive weight is greatly improved, and the overall matching effect is better.
In Figure 8, the first column is the left image of the two groups of heart model images, the second column is the corresponding standard parallax image, the third column is the final result parallax image, and the fourth column is the distribution of mismatched points in the final result parallax image. According to the result parallax images and the mismatched-point distributions, the mismatched points are mainly located in curved surface areas and in areas with severe illumination distortion. However, the parallax of the mismatched points in the illumination distortion areas differs little from the standard parallax values, which shows that the algorithm is robust to illumination distortion. The mismatched points in the surface areas form small stepped stripes, mainly caused by the relatively gentle change of the surface: the disparity between adjacent pixels changes, but only by a very small amount. The algorithm takes a single pixel as the minimum unit and then performs sub-pixel enhancement according to the matching cost to obtain a sub-pixel parallax image, so some errors appear in the layered areas of the unprocessed parallax image. Nevertheless, comparing the standard parallax image with the final result parallax image shows that the deviation of these mismatched points from the true disparity is still small; that is, the matching effect of the algorithm is suitable.
To verify the effectiveness of the post-processing, the mismatching rates of the parallax images before and after post-processing are counted. As shown in Table 5, post-processing significantly reduces the mismatching rate of the result parallax image; that is, the post-processing operation is worthwhile. Combined with the mismatched-point distribution of the post-processed parallax map, it can be seen that the mismatching rate of the dense parallax images obtained by this algorithm is low and the result maps basically conform to the images; that is, a suitable dense parallax image of the cardiac soft tissue can be obtained with this algorithm.
According to the average mismatching rate of each algorithm for the different brightness combinations, when the brightness of the left and right images differs, the mismatching rate of CAD+AW exceeds 80%, so it essentially fails. The mismatching rates of CHD+AW and of this paper's algorithm remain around 15%, and the difference between CHD+AW's mismatching rates with differing and equal brightness is small. When the brightness of the left and right images differs, this algorithm's matching effect is basically the same as that of CHD+AW: when there is a brightness difference, CAD is essentially replaced by its truncation threshold in the disparity calculation, so the algorithm behaves as if only CHD were used as the similarity measure, and its mismatching rate is only slightly higher than that of CHD+AW. At the same time, when there is no brightness difference, its matching effect is better than that of CAD+AW and CHD+AW. In general, this algorithm inherits the robustness of the census transform to illumination distortion, and when there is no brightness difference, its matching effect is better than that of a single similarity measure.
In Figure 10, the first column is the left image of the corrected three-dimensional heart soft tissue images, the second column is the corresponding right image, the third column is the result parallax image of the selected region of interest obtained by this algorithm with the left image as reference, and the fourth column is the result parallax image after dilation. From the dilated images and the result parallax images, it can be seen that the stereo matching algorithm of this paper obtains a dense parallax image of the cardiac soft tissue region of interest that basically reflects the general outline of the corresponding cardiac tissue, with few obvious errors; that is, the algorithm can also obtain a good parallax map in an actual scenario.

6. Conclusions

The main work of this paper is as follows:
Firstly, a stereo matching algorithm based on the joint similarity measure of the absolute gray difference and the Hamming distance of the census transform, together with adaptive weights, is studied. The matching cost calculation based on the joint similarity measure, the matching cost aggregation based on adaptive weights, and the disparity calculation and post-processing operations are analyzed.
Then, several problems are studied and corresponding improvements are proposed. To make full use of the richer information in color images, the matching cost calculation based on the joint similarity measure in the original algorithm is improved: the sum of the absolute gray differences of each color channel between two pixels and the Hamming distance of the improved census transform are used together as the joint similarity measure. The adaptive weight in the original algorithm is also improved. Drawing on the idea of graph cuts, segmentation rules are formulated, and each pixel in the support window is segmented using its color and distance similarity to the central pixel. If a pixel conforms to the segmentation rules, it is considered to belong to the same segmentation unit as the central point and is assigned the maximum weight of 1, as explained in Section 3.7; otherwise, it is judged not to belong to the same segmentation unit, and the original adaptive weight method is used to assign its weight.
Finally, we used the test images (stereo images of the heart model) to conduct verification experiments on this algorithm. The experimental results show that both the improved similarity measure and the improved adaptive weight can effectively reduce the mismatching rate. Meanwhile, compared with using only one similarity measure, the matching effect based on the improved joint similarity measure is better. After the improvement, the matching effect of the stereo matching algorithm based on the joint similarity measure and adaptive weight is significantly better: the dense parallax image basically matches the approximate outline of the corresponding region of interest, and there is no obvious mismatched area.
This paper has made some achievements and progress in terms of the stereo image matching algorithm of cardiac soft tissue. However, there is still room for improvement. This experiment can be improved in the following aspects:
Although dense parallax images with high accuracy can be obtained with this algorithm, the algorithm is relatively complicated, so its real-time performance is weak and needs improvement. The census-transform-based method can be accelerated by hardware implementation, and the region-based stereo matching algorithm can be accelerated by GPU parallelization; how to use these technologies to improve the real-time performance of the stereo matching algorithm will be considered next.
The research content of this paper is biased toward theoretical research and lacks practical application. Therefore, how to combine the theoretical results with practical application will be considered next.

Author Contributions

Conceptualization, W.Z. and B.Y.; methodology, Z.Y.; software, B.M.; formal analysis, W.Z. and B.Y.; data curation, X.L. and B.M.; writing—original draft preparation, X.L., Z.Y. and M.L.; writing—review and editing, X.L., L.Y. and W.Z.; visualization, L.Y.; supervision, B.Y. and M.L.; funding acquisition, W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Sichuan Science and Technology Program (2021YFQ0003).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this paper are provided by the Hamlyn Center at the Imperial College London at: https://imperialcollegelondon.app.box.com/s/kits2r3uha3fn7zkoyuiikjm1gjnyle3 (accessed on 1 February 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gorpas, D.; Politopoulos, K.; Yova, D. A binocular machine vision system for three-dimensional surface measurement of small objects. Comput. Med. Imaging Graph. 2007, 31, 625–637. [Google Scholar] [CrossRef]
  2. Liu, X.; Zheng, W.; Mou, Y.; Li, Y.; Yin, L. Microscopic 3D reconstruction based on point cloud data generated using defocused images. Meas. Control. 2021, 54, 1309–1318. [Google Scholar] [CrossRef]
  3. Zhang, W.; Yao, G.; Yang, B.; Zheng, W.; Liu, C. Motion Prediction of Beating Heart Using Spatio-Temporal LSTM. IEEE Signal Process. Lett. 2022, 29, 787–791. [Google Scholar] [CrossRef]
  4. Zhang, Z.; Liu, Y.; Tian, J.; Liu, S.; Yang, B.; Xiang, L.; Yin, L.; Zheng, W. Study on Reconstruction and Feature Tracking of Silicone Heart 3D Surface. Sensors 2021, 21, 7570. [Google Scholar] [CrossRef] [PubMed]
  5. Kanade, T.; Okutomi, M. A stereo matching algorithm with an adaptive window: Theory and experiment. IEEE Trans. Pattern Anal. Mach. Intell. 1994, 16, 920–932. [Google Scholar] [CrossRef] [Green Version]
  6. Fusiello, A.; Roberto, V.; Trucco, E. Efficient stereo with multiple windowing. In Proceedings of the IEEE Computer Society conference on computer vision and pattern recognition, San Juan, Puerto Rico, 17–19 June 1997; pp. 858–863. [Google Scholar]
  7. Samadi, M.; Othman, M.F. A new fast and robust stereo matching algorithm for robotic systems. Adv. Intell. Syst. Comput. 2013, 209, 281–290. [Google Scholar]
  8. Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Corfu, Greece, 20–27 September 1999; Volume 2, pp. 1150–1157. [Google Scholar]
  9. Lowe, D.G. Distinctive image features from scale-invariant key points. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
  10. Ke, Y.; Sukthankar, R. PCA-SIFT: A more distinctive representation for local image descriptors. In Proceedings of the 2014 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; p. 2. [Google Scholar]
  11. Abdel-Hakim, A.E.; Farag, A.A. CSIFT: A SIFT descriptor with color invariant characteristics. In Proceedings of the Computer Society Conference on Computer Vision and Pattern Recognition, New York, NY, USA, 17–22 June 2006; Volume 2, pp. 1978–1983. [Google Scholar]
  12. Zadeh, P.B.; Serdean, C.V. Stereo correspondence matching using multiwavelets. In Proceedings of the Fifth International Conference on Digital Telecommunications, Athens, Greece, 13–19 June 2010; pp. 153–157. [Google Scholar]
  13. Hawi, F.; Sawan, M. Phase-based passive stereovision systems dedicated to cortical visual stimulators. In Proceedings of the International Conference on Computer Design, Montreal, QC, Canada, 30 September–3 October 2012; pp. 256–262. [Google Scholar]
  14. Boykov, Y.; Veksler, O.; Zabih, R. Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 1222–1239. [Google Scholar] [CrossRef] [Green Version]
  15. Bleyer, M.; Gelautz, M. Graph-cut-based stereo matching using image segmentation with symmetrical treatment of occlusions. Signal Process. Image Commun. 2007, 22, 127–143. [Google Scholar] [CrossRef]
  16. Fezza, S.A.; Ouddane, S. Fast stereo matching via graph cuts. In Proceedings of the 7th International Workshop on Systems, Signal Processing and their Applications (WOSSPA), Tipaza, Algeria, 9–11 May 2011; pp. 115–118. [Google Scholar]
  17. Hirschmüller, H. Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 328–341. [Google Scholar] [CrossRef] [PubMed]
  18. Simon, H.; Sandino, M.; Reinhard, K. Half-resolution semi-global stereo matching. In Proceedings of the IEEE Intelligent Vehicles Symposium, Baden-Baden, Germany, 5–9 June 2011; pp. 201–206. [Google Scholar]
  19. Ruichek, Y. Multilevel and neural-network based stereo-matching method for real-time obstacle detection using linear cameras. IEEE Trans. Intell. Transp. Syst. 2005, 6, 54–62. [Google Scholar] [CrossRef]
  20. Hua, X.J.; Yokomichi, M.; Kono, M. Stereo correspondence using color based on competitive-cooperative neural networks. In Proceedings of the Sixth International Conference on Parallel and Distributed Computing Applications and Technologies (PDCAT'05), Dalian, China, 5–8 December 2005. [Google Scholar]
  21. Irijanti, E.; Nayan, M.; Yusoff, M.Z. Fast Stereo Correspondence Using Small-Color Census Transform. In Proceedings of the IEEE International Conference on Intelligent and Advanced Systems, Kuala Lumpur, Malaysia, 12–14 June 2012; Volume 4, pp. 685–690. [Google Scholar]
  22. Wang, S.X.; Ding, J.N.; Yun, J.T.; Li, Q.Z.; Han, B.P. A robotic system with force feedback for micro-surgery. In Proceedings of the 2005 IEEE International Conference on Robotics and Automation, Barcelona, Spain, 18–22 April 2005; pp. 199–204. [Google Scholar]
  23. Yang, B.; Liu, C.; Zheng, W.; Liu, S. Motion prediction via online instantaneous frequency estimation for vision-based beating heart tracking. Inf. Fusion 2017, 35, 58–67. [Google Scholar] [CrossRef]
  24. Ye, M.; Giannarou, S.; Meining, A.; Yang, G.-Z. Online Tracking and Retargeting with Applications to Optical Biopsy in Gastrointestinal Endoscopic Examinations. Med. Image Anal. 2015, 30, 144–157. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  25. Ambrosch, K.; Kubinger, W.; Humenberger, M.; Steininger, A. Hardware implementation of an SAD based stereo vision algorithm. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007; pp. 1–6. [Google Scholar]
  26. Humenberger, M. A fast stereo matching algorithm suitable forembedded real-time systems. Comput. Vis. Image Underst. 2010, 114, 1180–1200. [Google Scholar] [CrossRef]
  27. Mikolajczyk, K.; Schmid, C. A performance evaluation of local descriptors. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1615–1630. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Two types of support windows. (a) Sparse type; (b) common type.
Figure 2. The corresponding matching results of the census before and after improvement. (a) Left image of heart model; (b) standard parallax map; (c) parallax map corresponding to the traditional census; (d) parallax map corresponding to the improved census.
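Figure 2 compares parallax maps produced by the traditional census transform and the improved one. For readers unfamiliar with the baseline, a minimal sketch of the traditional census transform and its Hamming-distance matching cost follows; this is a simplified illustration, not the paper's improved variant (for brevity, window borders wrap around via `np.roll`, where a full implementation would pad the image):

```python
import numpy as np

def census_transform(img, win=3):
    # Encode each pixel as a bit string: one bit per neighbour in a
    # win x win window, set when the neighbour is darker than the centre.
    r = win // 2
    codes = np.zeros(img.shape, dtype=np.uint32)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue
            # np.roll wraps at the image border; a full implementation
            # would pad instead.
            shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            codes = (codes << 1) | (shifted < img).astype(np.uint32)
    return codes

def census_cost(codes_l, codes_r):
    # Matching cost = Hamming distance (popcount of the XOR) between
    # the two census bit strings.
    x = codes_l ^ codes_r
    cost = np.zeros_like(x)
    while np.any(x):
        cost += x & 1
        x >>= 1
    return cost
```

Because the cost depends only on local intensity orderings rather than absolute values, the census transform is largely insensitive to the brightness changes tested in Figure 3 and Table 1.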
Figure 3. Left and right images of three different brightness cardiac models. (a) 1/2 brightness left and right images; (b) original brightness left and right images; (c) 3/2 brightness left and right images.
The mismatching rates of the disparity maps corresponding to the traditional census and the improved census are counted, respectively, as shown in Table 1.
Figure 4. Census correspondence mismatch before and after improvement.
Figure 5. The corresponding matching results of the joint similarity measure before and after the improvement. (a,b) standard parallax maps; (c) disparity map corresponding to the joint similarity measure before improvement; (d) disparity map corresponding to the improved joint similarity measure.
Figure 6. Two adaptive weight methods correspond to the matching results. (a) Left map of heart model; (b) standard disparity map; (c) disparity map corresponding to adaptive weight before improvement; (d) disparity map corresponding to adaptive weight after improvement.
Figure 7. Mismatching rate of disparity map with different λ values.
Figure 8. Images and their matching results. (a) The left map of the heart model; (b) standard parallax map; (c) final result disparity map; (d) distribution of mismatching points.
Figure 9. Mean mismatching rate of images with different brightness combinations.
Figure 10. Stereo images of cardiac soft tissue and corresponding matching results. (a) Corrected left image; (b) corrected right image; (c) resulting parallax map; (d) enlarged parallax map.
Table 1. The corresponding mismatching rate of the census algorithm before and after improvement.
Matching Algorithm | Model Image 1 | Model Image 2 | Average Mismatch Rate
Traditional census | 25.64% | 29.28% | 27.46%
Improved census | 13.57% | 14.72% | 14.15%
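Table 1 and the later tables report mismatching rates as percentages of bad pixels. As a reference for how such a figure is typically computed, here is a sketch assuming a ground-truth disparity map in which 0 marks unknown pixels and a 1-pixel error threshold; the paper's exact evaluation protocol may differ:

```python
import numpy as np

def mismatch_rate(disp, gt, threshold=1.0):
    # Percentage of valid pixels whose disparity differs from the
    # ground truth by more than `threshold` (the usual bad-pixel metric).
    valid = gt > 0                      # 0 marks unknown ground truth
    bad = np.abs(disp - gt) > threshold
    return 100.0 * np.count_nonzero(bad & valid) / np.count_nonzero(valid)

# Averaging per-image rates, as in the last column of Table 1
rates = [25.64, 29.28]
print(round(sum(rates) / len(rates), 2))  # 27.46
```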
Table 2. Corresponding mismatching rate of joint similarity measure before and after improvement.
Matching Algorithm | Model Image 1 | Model Image 2 | Average Mismatch Rate
Joint similarity measure before improvement | 11.74% | 13.17% | 12.46%
Improved joint similarity measure | 8.17% | 7.81% | 7.99%
Table 3. Error-matching rate of adaptive weight before and after improvement.
Matching Algorithm | Model Image 1 | Model Image 2 | Average Mismatch Rate
Adaptive weight before improvement | 8.17% | 7.81% | 7.99%
Improved adaptive weight | 6.13% | 5.71% | 5.92%
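Table 3 compares adaptive-weight aggregation before and after the improvement. The baseline follows the general idea of adaptive support weights: a neighbour's contribution to the aggregated cost decays with its colour distance and spatial distance from the window centre. A minimal sketch of such a weight (the parameters `gamma_c` and `gamma_s` are illustrative values, not the paper's):

```python
import numpy as np

def support_weight(center_rgb, neighbor_rgb, d_spatial,
                   gamma_c=10.0, gamma_s=10.5):
    # Weight decays exponentially with colour distance (Euclidean in RGB)
    # and with spatial distance from the window centre.
    d_color = np.linalg.norm(np.asarray(center_rgb, dtype=float)
                             - np.asarray(neighbor_rgb, dtype=float))
    return float(np.exp(-(d_color / gamma_c + d_spatial / gamma_s)))
```

Pixels that are close to the centre and similar in colour thus dominate the aggregated matching cost, which helps preserve disparity edges on the weakly textured heart surface.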
Table 4. Corresponding error-matching rate of the improved algorithm.
Matching Algorithm | Model Image 1 | Model Image 2 | Average Mismatch Rate
Algorithm before improvement | 11.74% | 13.17% | 12.46%
Improved algorithm | 6.13% | 5.71% | 5.92%
Table 5. Matching error rate of parallax images before and after post-processing.
Matching Algorithm | Model Image 1 | Model Image 2 | Average Mismatch Rate
Before post-processing | 6.13% | 5.71% | 5.92%
After post-processing | 4.31% | 3.12% | 3.72%
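Table 5 shows the gain from post-processing the disparity map. One standard post-processing step for stereo disparity maps is the left-right consistency check, sketched below under the assumptions of integer disparities and 0 as the invalid-pixel marker; the paper's exact post-processing pipeline is not reproduced here:

```python
import numpy as np

def left_right_check(disp_l, disp_r, tol=1):
    # Keep a left-image disparity only if the corresponding right-image
    # pixel maps back to (roughly) the same point; otherwise mark it 0.
    h, w = disp_l.shape
    xs = np.arange(w)
    out = disp_l.copy()
    for y in range(h):
        xr = xs - disp_l[y].astype(int)          # matched column in right image
        in_range = xr >= 0
        xr_clipped = np.clip(xr, 0, w - 1)
        consistent = np.abs(disp_r[y, xr_clipped] - disp_l[y]) <= tol
        out[y, ~(in_range & consistent)] = 0     # invalidate failed pixels
    return out
```

Invalidated pixels are then typically filled from reliable neighbours and smoothed with a median filter, which is what removes most of the remaining outliers between the two rows of Table 5.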
Lai, X.; Yang, B.; Ma, B.; Liu, M.; Yin, Z.; Yin, L.; Zheng, W. An Improved Stereo Matching Algorithm Based on Joint Similarity Measure and Adaptive Weights. Appl. Sci. 2023, 13, 514. https://doi.org/10.3390/app13010514
