1. Introduction
The complementarity of infrared and visible light imaging has attracted increasing interest in image fusion. Visible light sensors can capture high-resolution images with abundant texture and detailed information, but their image quality is strongly affected by the lighting environment: poor illumination conditions such as glare, smoke, and overexposure degrade visual quality. Infrared imaging, by contrast, is less affected by illumination conditions and penetrates glare, smoke, etc. Images captured by infrared sensors provide thermal information about the target and offer high contrast, but they lack texture information and are insensitive to non-thermal targets. Infrared and visible image fusion is therefore expected to combine the merits of the individual imaging technologies while minimizing their respective defects.
With the increasing demand for infrared and visible image fusion in applications such as video surveillance, autonomous driving, and military reconnaissance, there has been a surge in fusion methods, which can be roughly divided into three categories: pixel-level, feature-level, and decision-level fusion [1]. Pixel-level fusion is currently the most common approach and can be roughly categorized into multi-scale decomposition (MSD), sparse representation (SR), deep learning (DL), saliency detection (SD), and hybrid methods.
The MSD-based method decomposes the source image into different layers that represent its various spatial- and frequency-domain information, applies specific fusion rules to each layer to obtain the corresponding fused layer, and finally reconstructs the fused sub-layers into the fused image. The central challenges of MSD-based methods are therefore image decomposition and fusion [2]. Pyramid transforms were the earliest methodology applied to image fusion; typical approaches include the Laplacian pyramid [3,4] and the contrast pyramid [5]. However, fusion methods based on pyramid transforms introduce halo artifacts. To reduce these artifacts, wavelet-family transforms were later applied to infrared and visible image fusion, such as the wavelet transform [6], discrete wavelet transform [7], dual-tree discrete wavelet transform [8], curvelet transform [9], and contourlet transform [10]. Although wavelet-based methods can effectively reduce halo artifacts, they cannot preserve the edges of the source image in complex spatial structures. Consequently, edge-preserving filters such as the bilateral filter, guided filter, and rolling guidance filter [11,12,13,14] were proposed for image processing; these filters effectively preserve the edges of the source image during decomposition. For example, Li et al. [15] proposed an image fusion algorithm based on the guided filter. Bavirisetti et al. [16] proposed a multi-scale decomposition rule that applies the guided filter iteratively to obtain the different sub-layers. Liu et al. [17] combined the guided filter with other methods to construct fusion weight maps, which enhance the detail and edge information of the fused image. Shreyamsha Kumar et al. [18] proposed a fusion method based on the cross bilateral filter, combining the similarity and spatial structure of local areas. Lin et al. [19] proposed a multi-scale decomposition method using the rolling guidance filter. In addition to the above, anisotropic heat diffusion [20] and the log-Gabor transform [21] have also been successfully integrated to improve the quality of MSD-based image fusion.
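As an illustration of the edge-preserving decomposition these filters enable, a minimal guided filter in the box-filter formulation of He et al. can be sketched as follows; the window radius `r` and regularization `eps` are illustrative choices, not parameters from the surveyed papers, and a base/detail split is obtained by subtracting the filtered image from the source:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, p, r, eps):
    """Edge-preserving guided filter: guide image I, input image p, window radius r."""
    size = 2 * r + 1
    mean_I = uniform_filter(I, size)
    mean_p = uniform_filter(p, size)
    cov_Ip = uniform_filter(I * p, size) - mean_I * mean_p
    var_I = uniform_filter(I * I, size) - mean_I * mean_I
    a = cov_Ip / (var_I + eps)            # per-window linear coefficients
    b = mean_p - a * mean_I
    return uniform_filter(a, size) * I + uniform_filter(b, size)

def two_scale(img, r=4, eps=0.01):
    """Split an image into an edge-preserved base layer and a detail layer."""
    base = guided_filter(img, img, r, eps)
    return base, img - base               # img == base + detail by construction
```

Fusing each layer with its own rule and summing the fused layers then reconstructs the output, mirroring the decompose/fuse/reconstruct MSD pipeline described above.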
Saliency is an attention mechanism that attracts human visual perception; its key merit is that salient regions capture the human visual senses more efficiently than a point's neighborhood or surrounding areas [22]. Because saliency detection focuses on regions of interest, it is well suited to image fusion. SD-based methods are usually not used alone but are combined with multi-scale decomposition, proceeding sequentially: (1) the source image is decomposed into a detail layer and a base layer, and (2) saliency detection is applied to the detail or base layer. Bavirisetti et al. [22] use the average filter and median filter to extract saliency information and construct a fusion weight map for detail-layer fusion. Duan et al. [23] use local average gradient energy to extract multiple saliency features from the infrared and visible detail layers, enhancing the detail information of the fused image. Lin et al. [24] proposed a saliency detection rule based on local brightness contrast to extract a saliency layer containing the brightness contrast information of the source image. Other image fusion methods [25,26] also use saliency detection to improve the expressiveness of the fused image. These studies collectively support that SD-based methods maximize focus on regions of interest.
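To make the weight-map construction concrete, the mean-minus-median saliency rule of [22] can be sketched as below; the window size and the small stabilizing constant are illustrative assumptions, and the normalization is a simplified version of weight-map construction:

```python
import numpy as np
from scipy.ndimage import uniform_filter, median_filter

def saliency(img, size=3):
    """Saliency as in [22]: absolute difference of mean- and median-filtered images."""
    return np.abs(uniform_filter(img, size) - median_filter(img, size))

def saliency_weights(ir, vis, size=3, eps=1e-12):
    """Normalized per-pixel weights for detail-layer fusion; eps guards flat regions."""
    s_ir, s_vis = saliency(ir, size), saliency(vis, size)
    total = s_ir + s_vis + 2 * eps
    return (s_ir + eps) / total, (s_vis + eps) / total

# fused_detail = w_ir * detail_ir + w_vis * detail_vis
```

The weights sum to one at every pixel, so the fused detail layer is a convex per-pixel combination biased toward the more salient source.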
Besides the methods mentioned above, SR-based, DL-based, and hybrid methods [27,28,29,30,31,32,33,34,35,36,37] have also been reported for infrared and visible image fusion. Specifically, in SR-based methods an over-complete dictionary is used to represent the source images, and the sparse coefficients obtained over this dictionary are fused. However, the over-complete dictionary is often difficult to acquire, and a single dictionary is usually insufficient to ensure robustness across images with different structures. With the development of deep learning, DL-based methods have become popular in image fusion. For instance, Li et al. proposed three infrared and visible image fusion methods using different neural network models: VGG19 [27], ResNet [29], and DenseFuse [30]. These methods can not only autonomously extract image features and fit suitable fusion coefficients but also yield fused images of good quality. However, the fused images weaken the thermal target information and lack visual detail. To improve image detail and maintain edge information, Liu et al. proposed a method based on ResNet and the rolling guidance filter [31]. Although DL-based methods have advantages, training requires a large amount of data and is very time-consuming, leading to low computational efficiency. Hybrid methods [35,36,37] can yield fused images of good quality by exploiting complicated computational models, but their computational efficiency is also very low.
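The SR mechanism described above can be illustrated with a toy patch-level sketch; the random dictionary, patch size, sparsity level, and the simple max-L1 selection rule are all illustrative assumptions (not a method from the cited works), using scikit-learn's orthogonal matching pursuit for sparse coding:

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))           # toy over-complete dictionary for 8x8 patches
D /= np.linalg.norm(D, axis=0)               # unit-norm atoms

p_ir, p_vis = rng.standard_normal((2, 64))   # toy vectorized co-located source patches
a_ir = orthogonal_mp(D, p_ir, n_nonzero_coefs=8)    # sparse code of the infrared patch
a_vis = orthogonal_mp(D, p_vis, n_nonzero_coefs=8)  # sparse code of the visible patch

# max-L1 rule: keep the sparse code with the larger activity for this patch pair
a_fused = a_ir if np.abs(a_ir).sum() >= np.abs(a_vis).sum() else a_vis
fused_patch = D @ a_fused                    # reconstruct the fused patch
```

In practice the dictionary is learned from training patches rather than drawn at random, which is precisely the acquisition difficulty noted above.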
Because of its parallel computing capacity, the FPGA has become a promising implementation platform for computational acceleration, and many researchers have implemented image fusion on FPGAs to improve fusion speed. For instance, methods based on pyramid transforms [38,39] and wavelet transforms [40,41] are widely implemented on FPGAs to meet real-time requirements, but these methods introduce artifacts and comparatively lack edge structure. In addition, Aydin et al. implemented an image fusion method on FPGA using high-level synthesis (HLS) tools, based on color space transformation, mean, and variance [42], to improve the image's color information, and Mishra et al. implemented a fusion method based on two-scale decomposition using the average filter, fusing the detail layer with a modified Frei-Chen operator [43]. These methods achieve image fusion at good processing speed on FPGA, but the quality of the fused image is reduced and remains affected by glare, smoke, etc., restricted by the fusion model.
As discussed above, the time consumption of these advanced algorithms is too high to meet real-time requirements, and the fused image is degraded by glare, smoke, etc., which reduce its quality. To solve these problems, this paper proposes an MSD-based image fusion algorithm using the guided filter and saliency detection that has high affinity for hardware. The method decomposes the source image into three-scale layers while preserving edge structure, and constructs a saliency-layer fusion weight map using an attention mechanism to eliminate the influence of glare, smoke, etc. on the fused image. In addition, the method is designed, tested, and verified on FPGA using HLS, which accelerates the fusion process to meet real-time requirements. Compared with many advanced fusion methods, this method enhances the performance of the fused image at the lowest possible computational complexity and eliminates the effects of glare, smoke, etc. Moreover, owing to its simplified computational operations, the method is particularly well suited to hardware acceleration.
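Schematically, a three-scale decomposition and per-layer fusion of this kind can be sketched as below; the stand-in `smooth` operator (a simple box filter instead of the guided filter), the window size, and the per-layer rules are illustrative assumptions, not the exact design detailed in Section 2:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def three_scale(img, smooth=lambda x: uniform_filter(x, 5)):
    """Split an image into base, mid-scale, and detail layers via repeated smoothing."""
    s1 = smooth(img)
    s2 = smooth(s1)
    return s2, s1 - s2, img - s1          # base + mid + detail == img exactly

def fuse(ir, vis, w_ir, w_vis):
    """Toy per-layer fusion: average the bases, weight the finer layers per pixel."""
    b_i, m_i, d_i = three_scale(ir)
    b_v, m_v, d_v = three_scale(vis)
    base = 0.5 * (b_i + b_v)
    return base + w_ir * (m_i + d_i) + w_vis * (m_v + d_v)
```

Because each layer is a difference of successive smoothings, summing the fused layers reconstructs a full-resolution output, and the weights can come from a saliency map as outlined earlier.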
Section 2 and Section 3 introduce the fusion method in detail and analyze the fusion results. The FPGA implementation of the method is introduced in Section 4.
5. Discussion
At present, quality and speed remain major challenges in the field of infrared and visible image fusion. Numerous complex methods have been proposed to enhance the performance of the fused image and have made progress in this respect, but their fusion speed is reduced by complex computations, while simpler fusion models cannot guarantee the quality of the fused image. Striking a balance between quality and speed so that image fusion meets real-time requirements is therefore important.
In view of the slow fusion speed, most researchers implement algorithms on FPGA to speed up the fusion process. At present, mainstream FPGA-based algorithms mainly rely on pyramid transforms, wavelet transforms, and other multi-scale transforms. Their fusion quality is relatively poor, and they cannot highlight salient regions, which is not conducive to fusing regions of interest.
In this paper, a hardware-friendly infrared and visible light image fusion method based on the guided filter and saliency detection is developed, exploiting an FPGA as the hardware platform; it presents a viable solution for real-time scenarios such as autonomous driving, video surveillance, and military reconnaissance. Compared with other advanced infrared and visible image fusion methods, the computational cost and complexity of the proposed method are significantly lower, and the method eliminates the effects of glare, smoke, etc. on the fused image. We first analyzed the quality of the fused images on a PC: the experimental results show that the fused images obtained by the proposed method have good expressiveness, with a time consumption of about 1.15 s per image at a resolution of 640 × 470. Although this is still insufficient for real-time image processing, it is much faster than other advanced image fusion methods. To accelerate processing further, we designed an FPGA-based hardware circuit exploiting the FPGA's parallel computation capacity, which raises the processing speed to about 18 ms per image, realizing real-time output of the fused image. The FPGA-based image fusion method thus offers a balance between fusion speed and image quality and shows promise for real-world applications.