Article

EiCSNet: Efficient Iterative Neural Network for Compressed Sensing Reconstruction

College of Information Science & Electronic Engineering, Zhejiang University, Hangzhou 310058, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(1), 30; https://doi.org/10.3390/electronics12010030
Submission received: 14 November 2022 / Revised: 29 November 2022 / Accepted: 30 November 2022 / Published: 22 December 2022
(This article belongs to the Special Issue Advances of Electronics Research from Zhejiang University)

Abstract

The rapid growth of sensing data demands compressed sensing (CS) in order to achieve high-density storage and fast data transmission. Deep neural networks (DNNs) have been under intensive development for the reconstruction of high-quality images from compressed data. However, the complicated auxiliary structures of DNN models in pursuit of better recovery performance lead to low computational efficiency and long reconstruction times. Furthermore, it is difficult for conventional neural network designs to reconstruct extra-high-frequency information at a very low sampling rate. In this work, we propose an efficient iterative neural network for CS reconstruction (EiCSNet). An efficient gradient-extraction module is designed to replace the complex auxiliary structures in order to train the DNNs more efficiently. An iterative enhancement network is applied to make full use of the limited information available in CS for better iterative recovery. In addition, a frequency-aware weighted loss is further proposed for better image restoration quality. Our proposed compact model, EiCSNet2*1, improved the performance slightly while being nearly seven times faster than its counterparts, which shows its highly efficient network design. Additionally, our complete model, EiCSNet6*1, achieved the best performance at this stage, improving the average PSNR by 0.37 dB across all testing sets and sampling rates.

1. Introduction

With the explosive growth of sensor devices, the acquisition, transmission, and processing of information are proliferating, which brings great challenges for data storage and transmission. According to Nyquist sampling theory [1], the sampling frequency (the Nyquist rate) must be at least twice the bandwidth of the analog signal spectrum. However, under some extreme conditions with limited bandwidth and battery resources, such as in aerospace or underwater applications, high-quality sampling is difficult to achieve. In addition, radiology requires shorter sampling times in order to reduce patients' burden and the discomfort caused by claustrophobia. Therefore, new sampling theories and methods are needed in order to reduce the amount of sampled data.
Compressed sensing (CS) [2] theory has a high potential to break through the limitations of traditional Nyquist sampling theory. CS theory demonstrates that information can still be reconstructed even if it has been sampled significantly below the Nyquist rate [2]. It senses information as measurements through a measurement matrix, and recovery is theoretically guaranteed to be possible if the original signal and the measurement matrix satisfy certain mathematical conditions [3]. Reconstruction algorithms are then employed to decode the measurements and estimate the important information in the original signal.
In recent years, many algorithms have been proposed to deal with CS reconstruction. These methods can be divided into two categories: traditional and deep learning (DL) methods. Traditional methods usually have theoretical guarantees and inherit interpretability. However, they inevitably suffer from the high computational cost dictated by iterative calculations [4]. In contrast, DL methods realize the mapping from compressed data to the original signals by training large numbers of parameters in DNNs. However, the performance of some DL methods is mainly improved by enlarging the DNN model size, ignoring the algorithmic designs of traditional methods. Moreover, with the addition of complicated auxiliary structures, it is not possible to make full use of graphics processing units (GPUs), and this severely reduces the run speed. In addition, high-frequency information is difficult to recover because the signals are sampled at a very low frequency.
CS is widely applied in magnetic resonance imaging (MRI) [5], hyperspectral imaging [6], wireless communications [7], space flight [8], and so on. However, these fields are still pursuing higher speeds and better reconstruction performance. For example, in the applications of space flight and MRI, there is hope to achieve a lower pressure of information transmission and a shorter sampling time, respectively, by reconstructing signals that are sampled with a lower compression ratio. Furthermore, imaging or acoustics [9] systems with certain real-time pursuits are eager to achieve the shortest reconstruction time.
In order to address the issues of the existing methods mentioned above and to make the application of CS to systems of signal compression and reconstruction more efficient, we propose an efficient iterative neural network for CS reconstruction (EiCSNet) and make the following contributions:
  • Unlike the SOTA DL methods, we propose a fast elementary reconstruction block based on a gradient-extraction module, which realizes efficient nonlinear mapping in the reconstruction task and speeds up inference without any complicated auxiliary structures;
  • Considering the known sampling matrix and measurements, the iterative enhancement module makes full use of the limited information available in order to achieve better reconstruction performance with the efficient designs employed in the operations of sampling, upsampling, and reshaping, thus greatly improving the run speed;
  • For poor reconstruction at a low sampling rate, a frequency-aware weighted loss that is suitable for CS is further proposed in order to pay more attention to the reconstruction of high-frequency information.
The rest of this paper is organized as follows. The related work is introduced and discussed in Section 2. Our proposed method is described in Section 3. The experimental results are provided in Section 4. Finally, the conclusion is drawn in Section 5.

2. Related Work

In this section, a brief review of CS and landmark work is provided. The methods can be divided into traditional and DL methods.

2.1. Compressed Sensing

CS breaks through the limitations of the Shannon–Nyquist sampling theorem [2]. It helps to simultaneously realize the processes of signal sampling and compression. Instead of directly measuring the signal $X$, it sets a non-adaptive linear projection in order to obtain the overall structure of $X$ at a low sampling rate. Suppose that the signal is $X \in \mathbb{R}^{N \times 1}$ and the sampling matrix is $\Phi \in \mathbb{R}^{M \times N}$ with $M \ll N$; then, the measurement process can be expressed as follows:
$$ Y = \Phi X, \tag{1} $$
where $Y \in \mathbb{R}^{M \times 1}$ represents the CS measurement (sampled data). This means that the number of observations $M$ is far lower than the signal dimension $N$. The acquisition is carried out in the form of direct compressive sampling with no other form of sophisticated encoding. Therefore, the burden of quality reconstruction falls solely on the receiver side [10], which greatly reduces the cost of data acquisition, transmission, and storage.
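For illustration, the following is a minimal Python sketch of this measurement process (Equation (1)), using an assumed random Gaussian measurement matrix; the dimensions are illustrative only, and the proposed network instead learns the sampling matrix as a convolution.

import numpy as np

N = 1024          # original signal dimension
M = 102           # number of measurements (about a 10% sampling rate), M << N

rng = np.random.default_rng(0)
Phi = rng.normal(scale=1.0 / np.sqrt(M), size=(M, N))  # measurement matrix
X = rng.normal(size=(N, 1))                            # original signal

Y = Phi @ X       # compressed measurements
print(Y.shape)    # (102, 1): far fewer observations than signal samples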
Generally, underdetermined equations are impossible to solve completely. However, a large number of signals, such as natural images, can be approximated as sparse in some domain $\Psi$ [11]. The sampling projection can capture the more important structure or information of $X$ in the sparse domain $\Psi$. On this basis, CS theory makes the recovery of $X$ from the corresponding measurements $Y$ possible [12]. Most CS reconstruction problems can then be addressed as optimization problems:
$$ \min_{X} \| \Psi X \|_p \quad \text{such that} \quad Y = \Phi X, \tag{2} $$
where $\Psi X$ is the sparse representation in the $\Psi$ domain and $\| \cdot \|_p$ denotes the $\ell_p$ norm. When $p$ is set to 1 or 0, the norm characterizes the sparsity of the vector. Therefore, the reconstruction of the compressed signal $Y$ can be understood as finding the solution $\hat{X}$ of $X$ with the maximum probability of generating $Y = \Phi X$, where $Y$, $\Psi$, and $\Phi$ are the known parameters.

2.2. Traditional Methods

Various methods, such as convex optimization, greedy algorithms, and total variation, have been proposed to solve the reconstruction problem.
The convex optimization methods usually translate the $\ell_0$ norm constraint (nonconvex) in Equation (2) into an $\ell_1$ norm problem (convex) so that the reconstruction of $\hat{X}$ can be conducted [13]. They can achieve accurate and robust recovery results. However, the convex optimization methods usually suffer from high computational complexity and from "tweaked" requirements when processing image signals.
To reduce the computational complexity, greedy algorithms such as Matching Pursuit (MP) [14] and Projected Landweber (PL) [15] have been proposed for CS reconstruction. Orthogonal MP (OMP) [16] and stage-wise OMP [17] take their source from MP algorithms; they generally refine the results through the iterative residual. Compared with other traditional approaches, MP methods have relatively low computational complexity at the cost of lower reconstruction quality. PL-based methods [18,19,20,21,22] have been proposed as an alternative; they obtain the reconstructed image by successively projecting and thresholding. They not only have lower computational complexity than convex-programming approaches but are also flexible in terms of incorporating additional optimization criteria [18]. However, the PL methods have to conduct numerous iterations to obtain the final results, and heavy matrix operations need to be executed in each iteration.
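For illustration, the following is a compact OMP sketch in the spirit of [16], showing the greedy, residual-driven iterations described above; the dimensions and sparsity level are assumptions of ours, not values from the paper.

import numpy as np

def omp(Phi, Y, K):
    """Recover a K-sparse signal from measurements Y = Phi @ X."""
    M, N = Phi.shape
    residual = Y.copy()
    support = []
    for _ in range(K):
        # Pick the column most correlated with the current residual.
        idx = int(np.argmax(np.abs(Phi.T @ residual)))
        support.append(idx)
        # Least-squares fit on the selected support, then update the residual.
        coef, *_ = np.linalg.lstsq(Phi[:, support], Y, rcond=None)
        residual = Y - Phi[:, support] @ coef
    X_hat = np.zeros((N, 1))
    X_hat[support] = coef
    return X_hat

# Usage sketch: X_hat = omp(Phi, Y, K=10)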
For the task of CS image reconstruction, some existing works have established more sophisticated models in which a Total Variation (TV) regularization constraint is employed to better express image priors [23,24]. Paying more attention to image priors yields more accurate and robust overall recovery but slows down the run speed. In addition, a few details may be lost due to the over-smooth reconstruction.
Though traditional methods have been widely used in many practical projects [25,26], the large number of iterations and the heavy matrix operations greatly reduce their operation speed.

2.3. Deep Learning Methods

Due to powerful GPU devices and sufficient data resources, DL methods conducted on GPUs have become possible in the field of CS [27,28]. They have achieved excellent recovery performance and orders-of-magnitude speed improvements. Ref. [27] first employed a Convolutional Neural Network (CNN) in CS and realized non-iterative reconstruction, which achieved the highest run speed at that time. However, it still consumed much time because of the picture rearrangement and the heavy fully connected network for upsampling. In [28], a Stacked Denoising Autoencoder was applied to capture statistical dependencies between the different elements of certain signals, which improved the signal recovery performance. In [29], ISTA-Net was proposed for CS reconstruction based on the iterative shrinkage-thresholding algorithm.
Compared with the traditional methods, DL methods have greatly improved the recovery performance and run speed. Through training on large data sets, they realize more flexible nonlinear mappings, as shown in Equation (3):
$$ \hat{X} = f_{\theta}(\Psi, Y), \tag{3} $$
where $f_{\theta}(\cdot)$ represents the mapping that solves the underdetermined equations with learnable parameters $\theta$. In this way, it can be more consistent with the statistical distribution of real image sets. In addition, efficiently utilizing the information obtained from $Y$ has been a focus of attention.
In [4], a hybrid framework was proposed to leverage the local spatial information from CNN and the global context provided by a transformer. However, its run time is similar to the traditional methods, which is hundreds of times that of the other SOTA DL methods. Ref. [30] unfolded the iterative denoising process of the well-known Approximate Message Passing (AMP) algorithm. It integrated deblocking modules to eliminate the blocking artifacts that usually appear in the CS of visual images. Ref. [31] built a trainable deep compressed sensing model by combining Convolution Generative Adversarial Networks and a Variational Autoencoder. In [32], ISTA-Net++ had the adaptability to handle CS problems with different ratios through a single model.
However, the abovementioned DL methods cannot make full use of the large-scale parallel computing capability of GPU devices for three reasons. First, it is difficult to train huge end-to-end DNNs without auxiliary structures, such as in [18]; the heavy full connection between any two successive layers [27] and the auxiliary structures that strengthen nonlinear operations [4,18,30,33] greatly affect the utilization efficiency of GPU devices. Second, some methods cut the image into blocks for processing [27] or reshape the upsampled matrix, which spends time on matrix splicing and the subsequent deblocking filtering. Finally, an unreasonable network structure design may require a deeper network to achieve better results at the cost of more reconstruction time. Therefore, new DL models should be developed to improve the reconstruction speed by fully utilizing GPU resources.

3. Methods

In order to achieve a higher reconstruction speed while keeping the recovery performance as effective as possible, an efficient model, EiCSNet, is proposed by taking the characteristics of the DL methods and CS algorithms into consideration. EiCSNet, as shown in Figure 1, comprises the following modules:
  • The Gradient-Extraction Module (GEM) and Fast Elementary Reconstruction Block (FERB) aim for a hardware-friendly structure without auxiliary branches to speed up the inference of CS reconstruction (Section 3.1);
  • The Iterative Enhancement Module (IEM) better combines the elaborate design of traditional algorithms based on greedy iteration and makes full use of intermediate information to strengthen and supplement images to achieve efficient and high-quality reconstruction (Section 3.2);
  • The Frequency-Aware Weighted Loss (FAWL) function is proposed to pay particular attention to the reconstruction of high-frequency details (Section 3.3).

3.1. Hardware-Friendly GEM and FERB

Stochastic Gradient Descent [34,35] is widely used to train neural networks. However, the parameters in deeper networks cannot be fully trained because the gradients gradually decrease during backpropagation [36] as the number of layers grows, which is known as the vanishing gradient problem [37].
To solve the vanishing gradient problem, the Residual Network (ResNet) was introduced in 2015 by researchers at Microsoft Research. It uses skip connections that bypass a few layers and connect directly to their outputs. The advantage of adding this type of skip connection is that if any layer hurts the performance of the architecture, it can be skipped by regularization; in this way, network collapse can be effectively avoided by jumping over some layers. Furthermore, the gradients can be transferred to deeper layers without too much attenuation.
The basic block of ResNet and the traditional CNN options are shown in Figure 2. The block can be divided into two branches: the Main Branch (MB) for the main nonlinear calculation and the Residual-link Branch (RB). The RB realizes a skip connection from the deeper feature maps to the shallower ones so that the gradient variables can be recorded, preserved, and transferred effectively. When the dimensions of the I/O feature maps change, the RB needs to go through a $1 \times 1$ convolution to keep the residual connection, as shown in Figure 2b.
However, the additional RB places an extra burden on the hardware. The two branches cannot be calculated, completed, and discarded simultaneously, so extra memory space and memory accesses are required to store and transfer the data. In addition, the RB needs many point-wise addition operations, and the additions can only be carried out at the end of the MB. As a result, the different operations of the two branches cannot make full use of the GPU, seriously degrading the parallel performance.
Table 1 lists the inference time of three basic structures in Figure 2. Under various input and output conditions, the structure in Figure 2a is faster than the variant one in Figure 2b. It can also be clearly seen that the res-free structure (Figure 2c) achieves the best performance in terms of inference time. This phenomenon is more obvious when the input and output feature maps become larger.
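As a rough illustration of the Table 1 comparison, the following sketch (our own benchmark with assumed layer sizes, not the authors' measurement code) times a residual block against a residual-free chain of the same depth on a GPU.

import time
import torch
import torch.nn as nn

class ResBlock(nn.Module):          # Figure 2a style: main branch plus skip
    def __init__(self, c=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1))
    def forward(self, x):
        return torch.relu(self.body(x) + x)   # point-wise add on the RB

class PlainBlock(nn.Module):        # Figure 2c style: no residual branch
    def __init__(self, c=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True))
    def forward(self, x):
        return self.body(x)

def bench(block, x, n=100):
    with torch.no_grad():
        for _ in range(10):                   # warm-up passes
            block(x)
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(n):
            block(x)
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / n

x = torch.randn(1, 64, 256, 256, device="cuda")
print("residual: ", bench(ResBlock(64).cuda().eval(), x))
print("res-free: ", bench(PlainBlock(64).cuda().eval(), x))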
To address the vanishing gradient issue while achieving a high reconstruction speed, we propose a hardware-friendly Gradient-Extraction Module (GEM), as shown in Figure 3. The blue cubes and gray boxes in Figure 3 represent feature maps and convolution operations, respectively. The arrows with different sizes and shades of green represent the gradient calculated during backpropagation. The outputs of each stage, marked by green boxes, are calculated from the intermediate feature maps in the forward pass through GEMs. In the main branch, the smaller and lighter arrows illustrate how the overall magnitude of the gradient gradually decreases, or even vanishes, as the network deepens. It is noted that all CNN blocks work together to minimize the difference between the final output and the ground truth. In this way, the RB structure in Figure 2a,b is replaced by the GEM in the proposed reconstruction neural network. The deeper parameters obtain an extra effective gradient, which drives faster and more effective training. The GEM can also play a restrictive role in the training of the deep layers to keep them from drifting away from the objective: the parameters of the deeper layers continuously perceive the objective loss function through the leaf branches of the GEMs, so the deeper leaf branches help to transfer and enhance the gradient and achieve a more complete training effect, even if parts of the gradient vanish. Because the parameters of the shallower layers are trained easily, there is no need to add a GEM to assist their training. The GEM not only helps to train the deep layers of the neural network but also greatly improves the inference speed due to the removal of the residual branch.
The GEM was employed as the understructure of the Fast Elementary Reconstruction Block (FERB) to realize fast and efficient reconstruction. The testing and training pipelines of the FERB are shown in Figure 4a,b, respectively. Both input and output channels were set to 64. ReLU modules, which introduce little hardware overhead, were applied to achieve nonlinear fitting and allow the deeper stacking of network modules. The GEM helps to train the parameters in the deep layers for better results without any auxiliary structures. It should be noted that the GEM was removed at inference to improve speed, because no intermediate stage outputs were required to assist the gradient.
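To illustrate the idea, the following is a minimal sketch (our assumption of the layout, not the released implementation) of a FERB-style block whose GEM leaf branch exists only at training time: the leaf emits a stage output that receives its own loss and thus injects an extra gradient path, while inference runs a plain, residual-free convolutional chain.

import torch.nn as nn

class FERB(nn.Module):
    def __init__(self, c=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True))
        self.gem = nn.Conv2d(c, 1, 3, padding=1)  # leaf branch: stage output

    def forward(self, x):
        feat = self.body(x)
        if self.training:                # GEM output is used only by the loss
            return feat, self.gem(feat)
        return feat                      # inference: residual-free fast path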

3.2. Iterative Enhancement Module

The mapping solution of DL is accompanied by a large number of parameters that realize the nonlinear transformation from the measurements to the original images. However, such networks tend to produce brute-force solutions that fall into averaged, generic expressions, subject to the overall distribution of the training data. In this way, the final reconstruction tends toward the distribution of the training datasets and the recovery of low-frequency information that is easy to realize or simply belongs to a common pattern. To this extent, the network's practicality also becomes highly dependent on whether the testing data have a distribution similar to the training data. These methods do not build on and develop the predecessors' exploratory research and thus have weaknesses in practical use. Other DL methods, such as [30], are firmly entrenched in the process of traditional methods, carrying out the upsampling by $\Psi^T Y$; in this way, the high-dimensional nonlinear mapping realized by the large number of parameters in DL cannot be fully utilized. To address the above issues, an effective Iterative Enhancement Module (IEM) based on a greedy reconstruction algorithm is proposed, which has the following advantages:
  • Through hardware-friendly structures and high-speed parallel operation, better performance and faster speed can be obtained;
  • Better results can be obtained as the number of iterations increases;
  • Combined with the characteristics in the field of CS, the network structure can be explained to a certain extent.
The main flow of the IEM is illustrated in Figure 5. In the first iteration $i = 1$, the input sampled data ($\mathrm{SD}$) were sampled through the sampling matrix and reconstructed as the initial reconstruction image $\mathrm{IRI}_1$:
$$ \mathrm{SD} = X \ast \mathrm{SC}, \qquad \mathrm{IRI}_1 = \mathrm{reshape}(\mathrm{SD} \ast \mathrm{UC}_1), \tag{4} $$
where $X$ and $\mathrm{SD}$ denote the real image input and the sampled data, respectively. $\mathrm{SC}$ and $\mathrm{UC}_1$ represent the sampling matrix and the upsampling matrix, realized by convolutions of size $32 \times 32 \times 1 \times (1024r)$ and $1 \times 1 \times (1024r) \times 1024$, respectively ($r$ represents the sampling ratio). Furthermore, the $1 \times 1$ convolution and pixel shuffle were used to stretch and reshape the upsampled information. The understructure of the upsampling and reshaping module is shown in Figure 6. All operations are easily deployed on the GPU to achieve high efficiency. A comparison of the performance and run time, and the settings of $N_I$ and $N_B$, are provided in Section 4.
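A minimal sketch of this sampling and upsampling/reshaping pipeline (our reading of Figure 6; the layer shapes follow the stated convolution sizes) is given below.

import torch
import torch.nn as nn

r = 0.10                               # sampling ratio
m = int(1024 * r)                      # measurements per 32x32 image block

sample   = nn.Conv2d(1, m, kernel_size=32, stride=32, bias=False)   # SC
upsample = nn.Conv2d(m, 1024, kernel_size=1, bias=False)            # UC
reshape  = nn.PixelShuffle(32)         # 1024 channels -> one 32x32 block

x   = torch.randn(1, 1, 256, 256)      # input image X
sd  = sample(x)                        # SD: shape (1, m, 8, 8)
iri = reshape(upsample(sd))            # IRI_1: shape (1, 1, 256, 256)
print(sd.shape, iri.shape)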
In the following iterations $i = 2, 3, \ldots, N_I$, the stage sampled data $\mathrm{SSD}_i$ were obtained by sampling $\mathrm{SO}_{i-1,N_B}$, the stage output after $N_B$ FERB operations (introduced in Section 3.1) in iteration $i-1$. Thereafter, the remaining error between $\mathrm{SD}$ and $\mathrm{SSD}_i$ was calculated. The Error Enhancement Image $\mathrm{EEI}_i$ was stretched from this remaining error through upsampling and reshaping in the same way as for $i = 1$ (defined in Equations (4) and (5)). The $\mathrm{EEI}_i$ was added to the last iteration result $\mathrm{SO}_{i-1,N_B}$ to strengthen it and compensate for its errors. In this process, the $\mathrm{IRI}_i$ was calculated as follows:
$$ \mathrm{SSD}_i = \mathrm{SO}_{i-1,N_B} \ast \mathrm{SC}, \qquad \mathrm{EEI}_i = \mathrm{reshape}\big( (\mathrm{SD} - \mathrm{SSD}_i) \ast \mathrm{UC}_i \big), \qquad \mathrm{IRI}_i = \mathrm{SO}_{i-1,N_B} + \mathrm{EEI}_i, \tag{5} $$
where $\mathrm{UC}_i$ represents the upsampling matrix that stretches the error in the compressed domain, $(\mathrm{SD} - \mathrm{SSD}_i)$. Therefore, various mapping interpretations are realized to supplement the $\mathrm{IRI}_i$, which has a greater effect on the recovery process. In this way, different supplementary information in different iteration rounds is obtained through the sampling error to strengthen the final results.
The stage output $\mathrm{SO}_{i,j}$ was calculated as follows:
$$ \mathrm{SO}_{i,j} = \begin{cases} \mathrm{FERB}_{i,j}\big(\mathrm{CNN}_{1 \to 64}(\mathrm{IRI}_i)\big), & \text{if } j = 1 \\ \mathrm{CNN}_{64 \to 1}\big(\mathrm{FERB}_{i,j}(\mathrm{SO}_{i,j-1})\big), & \text{if } j = N_B \\ \mathrm{FERB}_{i,j}(\mathrm{SO}_{i,j-1}), & \text{otherwise}. \end{cases} \tag{6} $$
Here, $\mathrm{FERB}_{i,j}(\cdot)$ denotes the $j$-th FERB in iteration $i$. After $N_B$ FERB operations, the final stage output in iteration $i$ can be represented as $\mathrm{SO}_{i,N_B}$. To adapt the I/O channels between the IEMs and the FERBs, two convolution layers ($\mathrm{CNN}_{1 \to 64}$ and $\mathrm{CNN}_{64 \to 1}$) were added to adjust the channel dimension.
The pseudo-code is provided in Algorithm 1. The proposed method realizes the reconstruction stage by stage. $\mathrm{SD}$ is calculated directly from the real image; as a result, this information can be employed throughout to guide the reconstruction network. In this way, $\mathrm{EEI}_i$ was calculated from $\mathrm{SO}_{i-1,N_B}$ and the real sampled data $\mathrm{SD}$ at the beginning of each subsequent iteration $i = 2, 3, \ldots, N_I$ (see Equation (5)). They were expected to play an important role in the induction and summary of the previous iteration results and were also regarded as important compensation to guide the next response.
Algorithm 1 PREDICTION of IEM
1:  PREDICT(Sampled data $\mathrm{SD}$)
2:      for each $i \in [1, N_I]$ do
3:          if $i == 1$ then
4:              $\mathrm{IRI}_i \leftarrow \mathrm{reshape}(\mathrm{SD} \ast \mathrm{UC}_1)$
5:          else
6:              $\mathrm{SSD}_i \leftarrow \mathrm{SO}_{i-1,N_B} \ast \mathrm{SC}$
7:              $\mathrm{EEI}_i \leftarrow \mathrm{reshape}((\mathrm{SD} - \mathrm{SSD}_i) \ast \mathrm{UC}_i)$
8:              $\mathrm{IRI}_i \leftarrow \mathrm{SO}_{i-1,N_B} + \mathrm{EEI}_i$
9:          end if
10:         for each $j \in [1, N_B]$ do
11:             if $j == 1$ then
12:                 $\mathrm{SO}_{i,1} \leftarrow \mathrm{FERB}_{i,1}(\mathrm{CNN}_{1 \to 64}(\mathrm{IRI}_i))$
13:             elif $j == N_B$ then
14:                 $\mathrm{SO}_{i,N_B} \leftarrow \mathrm{CNN}_{64 \to 1}(\mathrm{FERB}_{i,j}(\mathrm{SO}_{i,j-1}))$
15:             else
16:                 $\mathrm{SO}_{i,j} \leftarrow \mathrm{FERB}_{i,j}(\mathrm{SO}_{i,j-1})$
17:             end if
18:         end for
19:     end for
20:     return $\mathrm{SO}_{N_I, N_B}$
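A minimal PyTorch-style sketch of Algorithm 1 follows; the module names (sample, upsamples, reshape, recon) are placeholders of our own that stand for SC, the per-iteration UC plus pixel shuffle, and the chained CNN/FERB reconstruction of each iteration.

def predict(sd, sample, upsamples, reshape, recon, n_iter):
    so = None
    for i in range(n_iter):
        if i == 0:
            iri = reshape(upsamples[0](sd))        # initial reconstruction image
        else:
            ssd = sample(so)                       # re-sample the stage output
            eei = reshape(upsamples[i](sd - ssd))  # error enhancement image
            iri = so + eei                         # strengthened iteration input
        so = recon[i](iri)                         # N_B FERBs for iteration i
    return so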
Our proposed network aims not only to reconstruct the final output to be similar to the target but also to attain the following detailed goals. On the one hand, over the continuous iterations, the IEMs and FERBs ensure that the intermediate stage outputs $\mathrm{SO}_{i,N_B}$ gradually approach the target, reflected by the pursuit of a minimal error between $\mathrm{SSD}_i$ and $\mathrm{SD}$ in the compressed domain. In addition, the $\mathrm{EEI}_i$ turns darker, indicating that the error between $\mathrm{SO}_{i,N_B}$ and the ground truth is gradually reduced toward zero over the continuous iterations. On the other hand, it also means that the difficulty of reconstruction decreases gradually. In the previous work [18], the network parameters had to realize the complete mapping from the sampled data to the target; now, however, a series of $\mathrm{EEI}_i$ supplements, improves, and strengthens the output at the beginning of each round, so the parameters can be used more efficiently. In the step-by-step iterative process, the error supplement gradually refines the results, which means that the network has a certain interpretability rather than being a completely brute-force solution process.

3.3. Frequency-Aware Weighted Loss

There are two problems encountered in the process of solving the underdetermined reconstruction. First, only a small amount of effective information is contained in the measurements; to a large extent, it can only guide the reconstruction of low-frequency energy. Second, DL methods are updated generically during training. Because low-frequency patterns are encountered most often, the network gradually settles on an averaged solution that avoids extreme penalties from the objective function. In such a scenario, it excessively pursues the overall mean performance for stability and reliability, rather than the integrity of the reconstructed information at the fewer high-frequency points.
To address the abovementioned issues, a compensation method, the Frequency-Aware Weighted Loss (FAWL), is proposed. We expanded the outer edge of the image by symmetric mapping. Examples of the original image and the extended image $\mathrm{EI}$, with the expansion length $E = 1$, are shown in Figure 7. The edge mapping ensures that pixels at the original edge can also be handled by the following formulas, and that the calculated weight information at the edge is not ignored even though there are insufficient surrounding pixels.
The frequency-aware coefficient masks $\mathrm{CM}$ in Figure 8 show that the farther a pixel is from the target pixel $(i, j)$, the darker its color in the $\mathrm{CM}$. In other words, long-distance pixels have less impact on calculating the high-frequency parts of the information.
The $\mathrm{CM}$ was calculated as follows:
$$ D_{i,j,m,n} = \sqrt{(i-m)^2 + (j-n)^2}, \qquad \mathrm{CM}_{i,j,m,n} = \begin{cases} 1, & i = m,\ j = n \\ \dfrac{1}{D_{i,j,m,n}}, & 0 < |i-m|, |j-n| \le E \\ 0, & \text{otherwise}. \end{cases} \tag{7} $$
Here, $D_{i,j,m,n}$ denotes the distance between pixels $(i, j)$ and $(m, n)$. $\mathrm{CM}_{i,j,m,n}$ represents the coefficient mask, and only pixels within the range $E$ are perceived. $E$ can thus be understood as the range of frequency perception.
The changing information of the surrounding pixels within the range $E$ was multiplied by the $\mathrm{CM}$. The comprehensive frequency characteristic weight of the central pixel $(i, j)$ was then obtained by synthesizing the frequency gradient. The frequency-aware weighted feature ($\mathrm{FAWF}$) was calculated through Equation (8):
$$ \mathrm{FAWF}_{i,j} = \frac{ \sum_{m=i-E}^{i+E} \sum_{n=j-E}^{j+E} \mathrm{CM}_{i,j,m,n} \, \big| \mathrm{EI}_{i,j} - \mathrm{EI}_{m,n} \big| }{ (2E+1)^2 } \tag{8} $$
Figure 9 shows the original image and the results of the $\mathrm{FAWF}$ with $E = 1$, 10, and 50. Comparing the different $\mathrm{FAWF}$s shows that the range of frequency perception changes with the choice of $E$. When $E$ is relatively small, the $\mathrm{FAWF}$ is very sensitive to rapid pixel changes at edges and concentrates more strongly on high-frequency information. When $E$ is set relatively large, it acts as a broader frequency-sensing effect, paying more attention to wider-scale information; the surrounding information appears more in the form of color blocks in the $\mathrm{FAWF}$ when $E = 10$ and 50. As $E \to \infty$, the $\mathrm{FAWF}$ tends to a constant color area and represents an invalid feature without any key weighted information.
After obtaining the $\mathrm{FAWF}$, the $\mathrm{FAWL}$ was calculated as follows:
$$ \mathrm{MSE}_F(O, T) = \lambda \, \frac{ \sum_{i=0}^{W-1} \sum_{j=0}^{H-1} \big( O_{i,j} - T_{i,j} \big)^2 \, \mathrm{FAWF}_{i,j} }{ W H }, \qquad \mathrm{FAWL}(O, T) = \frac{ \mathrm{MSE}(O, T) + \mathrm{MSE}_F(O, T) }{2}, \tag{9} $$
where $W$ and $H$ represent the width and height of the output image $O$ and target image $T$, respectively. $\lambda$ balances $\mathrm{MSE}(O, T)$ and $\mathrm{MSE}_F(O, T)$; it was set to 10 so that the two terms are of the same order of magnitude, and it can be regarded as an empirical parameter.
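As a concrete illustration, the following sketch is our reading of Equations (7)–(9), not the authors' code: it computes the $\mathrm{CM}$, the $\mathrm{FAWF}$, and the $\mathrm{FAWL}$ for single-channel image tensors of shape (B, 1, H, W), with reflection padding standing in for the symmetric edge mapping of Figure 7; all function names are our own.

import torch
import torch.nn.functional as F

def coefficient_mask(E):
    """CM over a (2E+1)x(2E+1) window: 1 at the center, 1/distance elsewhere."""
    ij = torch.arange(-E, E + 1, dtype=torch.float32)
    dy, dx = torch.meshgrid(ij, ij, indexing="ij")
    d = torch.sqrt(dy ** 2 + dx ** 2)
    cm = torch.where(d > 0, 1.0 / d.clamp(min=1e-8), torch.ones_like(d))
    return cm.reshape(1, -1, 1)                        # (1, (2E+1)^2, 1)

def fawf(img, E=1):
    B, _, H, W = img.shape
    padded = F.pad(img, (E, E, E, E), mode="reflect")  # symmetric edge mapping
    patches = F.unfold(padded, kernel_size=2 * E + 1)  # (B, (2E+1)^2, H*W)
    center = img.reshape(B, 1, H * W)
    diff = (patches - center).abs() * coefficient_mask(E).to(img.device)
    return diff.sum(dim=1).reshape(B, 1, H, W) / (2 * E + 1) ** 2

def fawl(output, target, E=1, lam=10.0):
    mse = F.mse_loss(output, target)
    # FAWF weights are computed on the target, so they can be precomputed.
    mse_f = lam * ((output - target) ** 2 * fawf(target, E)).mean()
    return 0.5 * (mse + mse_f)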
Finally, combined with the IEM, the loss function was set as follows:
$$ \mathrm{loss} = \frac{1}{\sum_{i}^{N_I} \lambda_i} \sum_{i}^{N_I} \lambda_i \Big[ 2 f(\mathrm{SO}_{i,N_B}) + f(\mathrm{IRI}_i) + \sum_{j}^{N_B} f(\mathrm{GEMO}_{i,j}) \Big], \tag{10} $$
where $\lambda_i$ was set to 10 when $i = N_I$ and 5 otherwise, in order to retain the feedback balance between iterations while paying more attention to the last one. $f(\cdot)$ represents $\mathrm{FAWL}(\cdot, T)$, as defined in Equation (9). The corresponding output of the GEM is expressed as $\mathrm{GEMO}_{i,j}$.
The FAWL can be integrated with any loss function based on the MSE. It is very suitable for the proposed reconstruction method because different degrees of attention to high-frequency information can be introduced into the different GEMs. The only disadvantage is that the data preprocessing may require more preparation time compared with the direct use of the MSE; however, this time is not reflected in the inference process. In Section 4.7, we evaluate the effectiveness of the loss.

4. Experiment

4.1. Datasets

The datasets for training and testing were prepared according to the experimental details in [18]. BSDS500 [38] was applied as the training set: its 400 images were cropped into small patches of 96 × 96 pixels with a stride of 57. Each patch was augmented into eight (i.e., the original image, its flipped version, and its rotations by 90, 180, and 270 degrees, each with and without flipping). Finally, all 89,600 images were used as the training set. Set5 [39], Set14 [40], and BSD100 [41] were employed as the testing sets because they have been widely used in almost all similar tasks and works. The specific information of these datasets is listed in Table 2.
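A sketch of this preparation step (our own helper functions, assuming grayscale numpy arrays) might look as follows.

import numpy as np

def crop_patches(img, size=96, stride=57):
    """Crop size x size patches from a 2D image with the given stride."""
    H, W = img.shape
    return [img[i:i + size, j:j + size]
            for i in range(0, H - size + 1, stride)
            for j in range(0, W - size + 1, stride)]

def augment_x8(patch):
    """Expand one patch into its eight rotation/flip variants."""
    outs = []
    for k in range(4):                  # rotations by 0, 90, 180, 270 degrees
        rot = np.rot90(patch, k)
        outs.append(rot)
        outs.append(np.fliplr(rot))     # plus the flipped version
    return outs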

4.2. Index

To quantitatively evaluate the performance of different methods, the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) were employed. The PSNR is commonly used to quantify the reconstruction quality of images and videos. The PSNR was derived as:
$$ \mathrm{MSE} = \frac{1}{WH} \sum_{i=0}^{W-1} \sum_{j=0}^{H-1} \big( O_{i,j} - T_{i,j} \big)^2, \qquad \mathrm{PSNR} = 10 \log_{10} \frac{\mathrm{MAX}^2}{\mathrm{MSE}}, \tag{11} $$
where MAX represents the range of pixel values and is calculated as 255 in the case of 8-bit images. W and H are the width and height of the images, respectively. In addition, O and T are the matrices of the output image and target label image, respectively.
SSIM [42], with the best possible value being 1, pays more attention to capturing the structural features of images. It reflects the similarity of the two images, and a larger SSIM indicates better performance [43]. The SSIM was derived as:
$$ \mathrm{SSIM} = \frac{ (2 \mu_O \mu_T + c_1)(2 \sigma_{OT} + c_2) }{ (\mu_O^2 + \mu_T^2 + c_1)(\sigma_O^2 + \sigma_T^2 + c_2) }, \tag{12} $$
where $\mu_{\ast}$, $\sigma_{\ast}$, and $\sigma_{OT}$ are the mean of matrix $\ast$, the standard deviation of matrix $\ast$, and the covariance between the matrices $O$ and $T$, respectively. $c_1$ and $c_2$ are two constants to avoid division by zero and were set to $(0.01\,\mathrm{MAX})^2$ and $(0.03\,\mathrm{MAX})^2$, respectively.
When processing multichannel images, they are converted into YCbCr format, and the PSNR and SSIM are then calculated on the Y channel [18].
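A sketch of this evaluation protocol follows; the BT.601 luma conversion is our assumption of the YCbCr step, and the SSIM is delegated to scikit-image's structural_similarity for brevity.

import numpy as np
from skimage.metrics import structural_similarity

def rgb_to_y(img):
    """Approximate Y channel via the BT.601 luma weights (8-bit range)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def psnr(out, tgt, max_val=255.0):
    mse = np.mean((out.astype(np.float64) - tgt.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# Stand-in images; in practice these are the reconstruction and ground truth.
out_rgb = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
tgt_rgb = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
y_out, y_tgt = rgb_to_y(out_rgb), rgb_to_y(tgt_rgb)
print(psnr(y_out, y_tgt), structural_similarity(y_out, y_tgt, data_range=255.0))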

4.3. Settings

The settings of the training hyperparameters are detailed in Table 3. Due to the IEMs, the sampling matrix in our network structure is used $N_I$ times in the iterative process, and the parameters for the reconstruction process are updated along with changes in the sampling matrix. Therefore, the sampling matrix should not fluctuate significantly during the iterative process, so the learning rate of the sampling parameters ($L_{SAM}$) was set smaller than that of the reconstruction parameters ($L_{REC}$). The model was constructed and tuned on the open-source framework PyTorch 1.6.0 with Python 3.7. All experiments were conducted on a CPU (Intel Xeon CPU E5-2678 v3 @ 2.50 GHz) and one GPU (GeForce GTX 1080 Ti).
Four traditional methods, including discrete wavelet transform (DWT) [44], total variation (TV) [23], multi-hypothesis (MH) [21], and group sparse representation (GSR) [45], were used for comparison in terms of the run time and the quantitative evaluation of the PSNR and SSIM. We also implemented four DL-based methods, ReconNet [27], ISTANet++ [32], CSNET [18], and AMPNet [30], as the comparative baseline under the same software framework and hardware environment. The four DL works were reproduced according to their papers and tested on the unified testing sets introduced in Section 4.1. It should be noted that when the performance of a reproduced model was lower than the results provided in its paper, we took the specific performance index from the original work. In addition, ReconNet [27] needs to be cascaded with a BM3D denoiser, which takes more than ten seconds for each 256 × 256 image in practical use. Therefore, in the testing of the DL methods, the qualitative and quantitative evaluations were both conducted without any auxiliary filter or denoiser cascaded, in order to compare their performance more intuitively.
There were two hyperparameter settings in our proposed model: the number of iterations $N_I$ and the number of blocks $N_B$ in each iteration, which have a decisive impact on the run time and performance of the whole network. Two representative settings were employed to show the performance in different situations: the first was $N_I = 2$, $N_B = 1$ and the second was $N_I = 6$, $N_B = 1$, denoted EiCSNet2*1 and EiCSNet6*1, respectively. EiCSNet2*1 is the smaller and faster choice, achieving the highest speed while maintaining excellent reconstruction performance. EiCSNet6*1 achieves the best recovery performance at this stage, and its run time is still lower than that of the other SOTA methods.

4.4. Quantitative Evaluation

Table 4 lists the PSNR and SSIM performance tested on the three testing sets at seven different sampling rates (1%, 5%, 10%, 20%, 30%, 40%, and 50%). The proposed method achieved the best results on all datasets and sampling rates. Compared with the three SOTA methods (ISTANet++ [32], CSNET [18], and AMPNet [30]), our proposed compact model EiCSNet2*1 achieved similar or even slightly better PSNR and SSIM results while also having the highest speed (refer to Section 4.6). EiCSNet6*1 showed a much larger performance improvement, especially at low compression rates such as $r = 1\%$. The average PSNR increased by nearly 0.4 dB across all datasets and ratios, which is very helpful for image reconstruction quality and the practical use of CS.
Furthermore, some other recent works were taken into consideration. Because these works used different testing sets or the performance in the paper was not successfully reproduced, the indicators of the PSNR and SSIM were directly extracted from the paper [46] for comparison, as shown in Table 5. The proposed method was trained and tested under the same conditions described in [46] and showed the best performance on each testing set and under every sampling rate.
The better reconstruction performance was due to our proposed method integrating more CS characteristics, segmenting and simplifying the reconstruction task, and applying multilevel enhancement.

4.5. Qualitative Evaluation

For the qualitative evaluation, three sampling ratios (10%, 20%, and 30%) were selected for comparison, and images were randomly selected from each of the three datasets to fully demonstrate the intuitive performance of the reconstruction. To show the details of the results more clearly, we display the full images and enlarged parts simultaneously in Figure 10; the indicators of the whole images and of the detailed parts are also calculated and listed. The results of the proposed method show clearer texture information and sharper, more faithful shapes, with fewer artifacts and blurred parts in the reconstructed images than the conventional counterparts. The comparison between the different methods fully shows that our method has greater advantages in the processing of picture details.

4.6. Inference Speed

Images with a fixed size of 256 × 256 were employed to compare the run time of the different methods. The batch size $B$ was set to 1 to measure the actual run time of every single image. All images were cycled 15 times, and the last 10 runs were recorded to obtain the average time for each image. The average run times of the different methods are shown in Table 6.
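A sketch of this timing protocol (our own harness; the warm-up and timed counts follow the 15/10 scheme above) is given below.

import time
import torch

def average_runtime(model, device="cuda", warmup=5, timed=10):
    """Run 15 passes over a 256x256 input and average the last 10."""
    model = model.to(device).eval()
    x = torch.randn(1, 1, 256, 256, device=device)   # batch size B = 1
    times = []
    with torch.no_grad():
        for i in range(warmup + timed):
            torch.cuda.synchronize()
            t0 = time.perf_counter()
            model(x)
            torch.cuda.synchronize()
            if i >= warmup:                          # discard the first runs
                times.append(time.perf_counter() - t0)
    return sum(times) / len(times)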
From Table 6, it can be seen that the traditional methods, which were run on the CPU device, have severe shortcomings in speed. The run speed of our method on the GPU was much higher than that of all existing methods. The EiCSNet2*1 model achieved similar performance and a nearly seven-times speed improvement over the SOTA AMPNet (shown in Table 4 and Table 6). In addition, our large model EiCSNet6*1 achieved the best PSNR and SSIM performance while its speed remained higher than that of the SOTA methods. The speed improvement benefits from the hardware-friendly structures, such as the IEM and FERB, which greatly improve parallel efficiency and reduce unnecessary memory accesses. In each iteration, the completion and enhancement of the image increase the interpretability of the network and simplify its task difficulty; hence, good results can be obtained with fewer auxiliary operations.
We also compared the detailed processes of sampling, upsampling, and reshaping among all of the DL methods on the GPU to show the efficiency of the image restoration. From Table 7 and Figure 11, it can be seen that the run time of our method was very stable under different sampling rates. Our method avoids the time loss caused by inefficient networks or auxiliary structures through its hardware-friendly designs.

4.7. Ablation Experiment

In the ablation study, we explored the effectiveness and settings of our proposed network structure to achieve the best performance. Furthermore, we also conducted ablation experiments on the FAWL and GEM to illustrate the improvements they brought to the task of CS recovery.
The Verification of the Module Settings: We calculated the average PSNR and SSIM over the three datasets and seven compression ratios under different network structure settings. They are detailed in Table 8 and shown in Figure 12 to visualize the differences. The structure settings are identified in $N_I \times N_B$ format. We set up 10 models ($2 \times 1$, $3 \times 1$, $4 \times 1$, $5 \times 1$, $6 \times 1$, $2 \times 2$, $2 \times 3$, $3 \times 2$, $3 \times 3$, and $1 \times 5$) to explore better structure settings.
Because the nonlinear operations are almost entirely concentrated in the block convolutions, we compared network structures with the same total number of blocks ($4 \times 1$ vs. $2 \times 2$; $6 \times 1$ vs. $3 \times 2$ vs. $2 \times 3$; $5 \times 1$ vs. $1 \times 5$). It can be seen that, for the same total number of convolutions, the more iterative reinforcement was applied, the more effective the network was. Therefore, distributing the convolution calculations across more iterations is more effective; that is, setting $N_B$ to 1 offers better cost performance. This also confirms that the DL network operates by making up for the errors in the step-by-step iterative process.
From the data for $2 \times 1$, $3 \times 1$, $4 \times 1$, $5 \times 1$, and $6 \times 1$, we found that the network performance improved as $N_I$ increased. The experiments stopped at $N_I = 6$, mainly because continuing to enlarge the network would not greatly improve the performance. Furthermore, we found that EiCSNet6*1 already held advantages in both speed and performance over the existing methods.
FAWL and GEM: To verify the effectiveness of each part, we tested the final performance of the two network models at a ratio of 0.01 in the following four cases:
  • Nothing: neither the FAWL nor the GEMs were used; only the MSE loss worked;
  • W/O FAWL: no FAWL was used, but the GEMs played a part in the training process;
  • W/O GEM: no GEM was added, but the FAWL was considered;
  • ALL: both the FAWL and the GEMs acted with united strength.
The PSNR and SSIM results are tabulated in Table 9. It can be clearly seen that, with the supplementation of the FAWL, the network performance improved for the different models. Furthermore, with the addition of the GEM, the training of the network became relatively simple and stable, supporting better convergence on this task. Additionally, as the network structure became deeper, the GEM introduced a larger improvement; in contrast, in a shallower network structure, there was only a small performance increase.

5. Conclusions

In this paper, EiCSNet was proposed for better and faster CS image reconstruction. The FERB based on the GEM removes the additional auxiliary structures and improves parallel efficiency with no performance degradation. The IEM combines the characteristics and requirements of CS, which makes the whole network structure more efficient and compact and yields a significant performance improvement. The FAWL makes the image reconstruction network more effective and robust, avoiding blurred reconstruction of high-frequency information. Our method not only achieved better reconstruction performance but was also nearly seven times faster than other SOTA methods during inference. There is strong potential to run the model on a mobile terminal, which may be valuable for future CS image restoration.

Author Contributions

Methodology, Z.Z.; software, Z.Z.; investigation, F.L.; data curation, Z.W.; writing—original draft preparation, Z.Z.; writing—review and editing, H.S.; visualization, F.L.; supervision, Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CS        Compressed Sensing
DNN       Deep Neural Network
CNN       Convolutional Neural Network
EiCSNet   Efficient Iterative Neural Network for CS Reconstruction
DL        Deep Learning
GPU       Graphics Processing Unit
MRI       Magnetic Resonance Imaging
PSNR      Peak Signal-to-Noise Ratio
SSIM      Structural Similarity
SOTA      State of the Art
PL        Projected Landweber
MP        Matching Pursuit
AMP       Approximate Message Passing
OMP       Orthogonal MP
TV        Total Variation
GSR       Group Sparse Representation
MH        Multi-Hypothesis
DWT       Discrete Wavelet Transform
IEM       Iterative Enhancement Module
FERB      Fast Elementary Reconstruction Block
GEM       Gradient-Extraction Module
FAWL      Frequency-Aware Weighted Loss
ResNet    Residual Network
MB        Main Branch of ResNet
RB        Residual-link Branch of ResNet
$r$       Sampling ratio
$X$       Input signal or the real image input
$\Phi$    Sampling matrix
$Y$       CS measurement (sampled data)
$\Psi$    Sparse domain
$f_{\theta}(\cdot)$   Mapping to solve underdetermined equations with learnable parameters $\theta$
EEI       Error Enhancement Image
IRI       Initial Reconstruction Image
$N_I$     Number of iterations
$N_B$     Number of FERBs per iteration
SD        Sampled Data
SC        Sampling matrix realized by a $32 \times 32 \times 1 \times (1024r)$ convolution
UC        Upsampling matrix realized by a $1 \times 1 \times (1024r) \times 1024$ convolution
SO        Stage Output
SSD       Stage Sampled Data
EI        Extended Image
$E$       Expansion length
CM        Frequency-Aware Coefficient Mask
FAWF      Frequency-Aware Weighted Feature
GEMO      Corresponding output of the GEM

References

  1. Shannon, C. Communication in the Presence of Noise. Proc. Inst. Radio Eng. 1949, 37, 10–21.
  2. Donoho, D. Compressed sensing. IEEE Trans. Inf. Theory 2006, 52, 1289–1306.
  3. Arjoune, Y.; Kaabouch, N.; El Ghazi, H.; Tamtaoui, A. A performance comparison of measurement matrices in compressive sensing. Int. J. Commun. Syst. 2018, 31, e3576.
  4. Ye, D.; Ni, Z.; Wang, H.; Zhang, J.; Wang, S.; Kwong, S. CSformer: Bridging Convolution and Transformer for Compressive Sensing. arXiv 2021, arXiv:2112.15299.
  5. Lustig, M.; Donoho, D.L.; Santos, J.M.; Pauly, J.M. Compressed sensing MRI. IEEE Signal Process. Mag. 2008, 25, 72–82.
  6. Caiafa, C.F.; Cichocki, A. Multidimensional compressed sensing and their applications. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2013, 3, 355–380.
  7. Choi, J.W.; Shim, B.; Ding, Y.; Rao, B.; Kim, D.I. Compressed sensing for wireless communications: Useful tips and tricks. IEEE Commun. Surv. Tutor. 2017, 19, 1527–1550.
  8. Korde-Patel, A.; Barry, R.K.; Mohsenin, T. Compressive Sensing Based Space Flight Instrument Constellation for Measuring Gravitational Microlensing Parallax. Signals 2022, 3, 559–576.
  9. Gerstoft, P.; Mecklenbräuker, C.F.; Seong, W.; Bianco, M. Introduction to special issue on compressive sensing in acoustics. J. Acoust. Soc. Am. 2018, 143, 3731–3736.
  10. Liu, Y.; Li, M.; Pados, D.A. Motion-Aware Decoding of Compressed-Sensed Video. IEEE Trans. Circuits Syst. Video Technol. 2013, 23, 438–444.
  11. Azghani, M.; Karimi, M.; Marvasti, F. Multihypothesis Compressed Video Sensing Technique. IEEE Trans. Circuits Syst. Video Technol. 2016, 26, 627–635.
  12. Shi, W.; Liu, S.; Jiang, F.; Zhao, D. Video Compressed Sensing Using a Convolutional Neural Network. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 425–438.
  13. Chen, S.S.; Donoho, D.L.; Saunders, M.A. Atomic decomposition by basis pursuit. SIAM Rev. 2001, 43, 129–159.
  14. Mallat, S.G.; Zhang, Z. Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 1993, 41, 3397–3415.
  15. Bertero, M.; Boccacci, P.; De Mol, C. Introduction to Inverse Problems in Imaging; CRC Press: Boca Raton, FL, USA, 2021.
  16. Tropp, J.A.; Gilbert, A.C. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans. Inf. Theory 2007, 53, 4655–4666.
  17. Donoho, D.L.; Tsaig, Y.; Drori, I.; Starck, J.L. Sparse solution of underdetermined systems of linear equations by stagewise orthogonal matching pursuit. IEEE Trans. Inf. Theory 2012, 58, 1094–1121.
  18. Shi, W.; Jiang, F.; Liu, S.; Zhao, D. Image Compressed Sensing Using Convolutional Neural Network. IEEE Trans. Image Process. 2020, 29, 375–388.
  19. Mun, S.; Fowler, J.E. Residual reconstruction for block-based compressed sensing of video. In Proceedings of the 2011 Data Compression Conference, Snowbird, UT, USA, 29–31 March 2011; pp. 183–192.
  20. Haupt, J.; Nowak, R. Signal reconstruction from noisy random projections. IEEE Trans. Inf. Theory 2006, 52, 4036–4048.
  21. Chen, C.; Tramel, E.W.; Fowler, J.E. Compressed-sensing recovery of images and video using multihypothesis predictions. In Proceedings of the IEEE 2012 46th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 4–7 November 2012; pp. 1193–1198.
  22. Gan, L. Block compressed sensing of natural images. In Proceedings of the International Conference on Digital Signal Processing, Cardiff, UK, 1–4 July 2007; pp. 403–406.
  23. Li, C.; Yin, W.; Zhang, Y. TVAL3: TV Minimization by Augmented Lagrangian and Alternating Direction Algorithm, 2009. Available online: https://www.caam.rice.edu/optimization/L1/TVAL3/ (accessed on 1 January 2013).
  24. Saba, T.; Rehman, A.; Haseeb, K.; Bahaj, S.A.; Jeon, G. Energy-Efficient Edge Optimization Embedded System Using Graph Theory with 2-Tiered Security. Electronics 2022, 11, 2942.
  25. Wang, R.; Qin, Y.; Wang, Z.; Zheng, H. Group-Based Sparse Representation for Compressed Sensing Image Reconstruction with Joint Regularization. Electronics 2022, 11, 182.
  26. Tian, X.; Wei, G.; Wang, J. Target Location Method Based on Compressed Sensing in Hidden Semi Markov Model. Electronics 2022, 11, 1715.
  27. Kulkarni, K.; Lohit, S.; Turaga, P.; Kerviche, R.; Ashok, A. ReconNet: Non-Iterative Reconstruction of Images from Compressively Sensed Measurements. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 449–458.
  28. Mousavi, A.; Patel, A.B.; Baraniuk, R.G. A deep learning approach to structured signal recovery. In Proceedings of the 53rd Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 29 September–2 October 2015; pp. 1336–1343.
  29. Zhang, J.; Ghanem, B. ISTA-Net: Interpretable optimization-inspired deep network for image compressive sensing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1828–1837.
  30. Zhang, Z.; Liu, Y.; Liu, J.; Wen, F.; Zhu, C. AMP-Net: Denoising-Based Deep Unfolding for Compressive Image Sensing. IEEE Trans. Image Process. 2021, 30, 1487–1500.
  31. Zheng, B.; Zhang, J.; Sun, G.; Ren, X. EnGe-CSNet: A Trainable Image Compressed Sensing Model Based on Variational Encoder and Generative Networks. Electronics 2021, 10, 89.
  32. You, D.; Xie, J.; Zhang, J. ISTA-NET++: Flexible Deep Unfolding Network for Compressive Sensing. In Proceedings of the 2021 IEEE International Conference on Multimedia and Expo (ICME), Shenzhen, China, 5–9 July 2021; pp. 1–6.
  33. Li, N.; Zhou, C.C. AMPA-Net: Optimization-Inspired Attention Neural Network for Deep Compressed Sensing. In Proceedings of the 2020 IEEE 20th International Conference on Communication Technology (ICCT), Nanning, China, 28–31 October 2020; pp. 1338–1344.
  34. Sultana, F.; Sufian, A.; Dutta, P. A review of object detection models based on convolutional neural network. In Intelligent Computing: Image Processing Based Applications. Advances in Intelligent Systems and Computing; Springer: Singapore, 2020; pp. 1–16.
  35. Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53.
  36. Lan, Z. Applications of BP, Convolutional and RBF Networks. In Proceedings of the 2021 2nd International Conference on Computing and Data Science (CDS), Stanford, CA, USA, 28–29 January 2021; pp. 543–547.
  37. Liu, M.; Chen, L.; Du, X.; Jin, L.; Shang, M. Activated Gradients for Deep Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 1–13.
  38. Arbeláez, P.; Maire, M.; Fowlkes, C.; Malik, J. Contour Detection and Hierarchical Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 898–916.
  39. Bevilacqua, M.; Roumy, A.; Guillemot, C.; Alberi-Morel, M.L. Low-Complexity Single-Image Super-Resolution based on Nonnegative Neighbor Embedding. In Proceedings of the 23rd British Machine Vision Conference (BMVC), Surrey, UK, 7–10 September 2012; pp. 135.1–135.10.
  40. Zeyde, R.; Elad, M.; Protter, M. On single image scale-up using sparse-representations. In Proceedings of the International Conference on Curves and Surfaces, Avignon, France, 24–30 June 2010; pp. 711–730.
  41. Martin, D.; Fowlkes, C.; Tal, D.; Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the Eighth IEEE International Conference on Computer Vision, Vancouver, BC, Canada, 7–14 July 2001; Volume 2, pp. 416–423.
  42. Wang, Z.; Bovik, A.; Sheikh, H.; Simoncelli, E. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
  43. Wang, M.; Wei, S.; Liang, J.; Zhou, Z.; Qu, Q.; Shi, J.; Zhang, X. TPSSI-Net: Fast and Enhanced Two-Path Iterative Network for 3D SAR Sparse Imaging. IEEE Trans. Image Process. 2021, 30, 7317–7332.
  44. Mun, S.; Fowler, J.E. Block compressed sensing of images using directional transforms. In Proceedings of the 2010 IEEE International Conference on Image Processing, Hong Kong, China, 26–29 September 2010; pp. 3021–3024.
  45. Zhang, J.; Zhao, D.; Gao, W. Group-Based Sparse Representation for Image Restoration. IEEE Trans. Image Process. 2014, 23, 3336–3351.
  46. Song, J.; Chen, B.; Zhang, J. Memory-Augmented Deep Unfolding Network for Compressive Sensing. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event, China, 20–24 October 2021; pp. 4249–4258.
  47. Zhou, S.; He, Y.; Liu, Y.; Li, C.; Zhang, J. Multi-Channel Deep Networks for Block-Based Image Compressive Sensing. IEEE Trans. Multimedia 2021, 23, 2627–2640.
  48. Zhang, J.; Zhao, C.; Gao, W. Optimization-Inspired Compact Deep Compressive Sensing. IEEE J. Sel. Top. Signal Process. 2020, 14, 765–774.
Figure 1. The overview of the proposed method. The real image input $X$ is marked by a solid black wireframe. The error enhancement images $\mathrm{EEI}$ and initial reconstruction images $\mathrm{IRI}$ represent the intermediate variables and the outputs of the $N_I$ Iterative Enhancement Modules (IEMs, introduced in Section 3.2), which are marked as blue solid and red dashed boxes, respectively. Each iterative recovery contains $N_B$ Fast Elementary Reconstruction Blocks (FERBs, introduced in Section 3.1) to accomplish the nonlinear problem solution of image restoration. The outputs of the FERBs are marked with green boxes. Two kinds of CNN blocks, marked in blue and green, represent two kinds of one-layer CNN. They are employed to adjust the I/O channels to adapt to the FERBs and IEMs.
Figure 2. The flow chart of the basic block of ResNet and common CNN options. (a) The structure of ResNet, (b) the variant of ResNet applied when the I/O channels are different, and (c) the basic CNN options without RB.
Figure 3. The flow chart of GEM.
Figure 4. The pipeline of FERB. (a) Testing pipeline. (b) Training pipeline.
Figure 5. The flow chart of IEM.
Figure 6. The understructure of the upsampling and reshaping module. The first stage stretches the dimension of SD to match that of X through convolution to obtain the IRI; the second stage performs hardware-friendly reshaping based on the pixel shuffle.
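Since the caption names the pixel shuffle as the basis of the reshaping stage, a minimal PyTorch sketch of the two-stage idea may help; the block size, sampling ratio, and layer names below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class UpsampleReshape(nn.Module):
    """Sketch of a conv-then-pixel-shuffle upsampling/reshaping stage."""
    def __init__(self, block=32, ratio=0.1):
        super().__init__()
        m = int(round(block * block * ratio))        # measurements per block (assumed)
        # Stage 1: a 1x1 convolution stretches the measurement channels (SD)
        # to block*block channels, one per pixel of the block (the IRI).
        self.expand = nn.Conv2d(m, block * block, kernel_size=1)
        # Stage 2: pixel shuffle rearranges channels into a block x block patch.
        self.shuffle = nn.PixelShuffle(block)

    def forward(self, sd):                           # sd: (N, m, H/block, W/block)
        return self.shuffle(self.expand(sd))         # -> (N, 1, H, W)

x = torch.randn(1, 102, 8, 8)                        # 32*32*0.1 ≈ 102 channels
print(UpsampleReshape(32, 0.1)(x).shape)             # torch.Size([1, 1, 256, 256])
```

Because the pixel shuffle is a pure memory rearrangement, this stage avoids the transposed convolutions or explicit matrix reshapes that other methods time in Table 7.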
Figure 7. The original image and the extended image. The processed images are shown on the right. Black arrows represent the symmetric mapping.
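For concreteness, below is a minimal sketch of such a symmetric border extension, assuming NumPy's `symmetric` padding mode and a block size of 32; the pad-size computation (extending each image to the next multiple of the block size) is an assumption.

```python
import numpy as np

def extend_symmetric(img, block=32):
    """Mirror the image at its edges so both sides become multiples of `block`."""
    h, w = img.shape[:2]
    pad_h = (-h) % block                  # rows needed to reach a multiple of block
    pad_w = (-w) % block                  # columns needed
    return np.pad(img, ((0, pad_h), (0, pad_w)), mode="symmetric")

img = np.random.rand(481, 321)            # a BSD-sized grayscale stand-in
print(extend_symmetric(img).shape)        # (512, 352)
```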
Figure 8. The frequency-aware mask.
Figure 9. The resulting frequency-aware weight feature when E = 1, 10, and 50.
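As a rough illustration of how such a frequency-aware weighting could be generated, the sketch below builds a radial mask whose weights grow with distance from the DC component, so high-frequency errors are penalized more; the exponential form and the role of E here are assumptions made for visualization only, not the paper's exact formulation.

```python
import numpy as np

def freq_weight_mask(h, w, E=10.0):
    """One possible frequency-aware weight mask: heavier at high frequencies."""
    fy = np.fft.fftshift(np.fft.fftfreq(h))
    fx = np.fft.fftshift(np.fft.fftfreq(w))
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    radius /= radius.max()                 # normalize distance from DC to [0, 1]
    mask = np.exp(E * radius)              # larger E -> steeper high-frequency emphasis
    return mask / mask.mean()              # keep the average weight at 1

for E in (1, 10, 50):
    print(E, round(float(freq_weight_mask(64, 64, E).max()), 2))
```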
Figure 10. The results of the qualitative evaluation. (a) Ground truth, (b) ReconNet, (c) ISTA-Net++, (d) CSNet, (e) AMPNet, (f) EiCSNet2*1, and (g) EiCSNet6*1. Three sampling ratios (0.1, 0.2, and 0.3) and one image from each of the three datasets were selected for comparison. Regions of each image are marked with red boxes and shown enlarged below the corresponding images. The indicators of the complete and enlarged images are calculated and listed.
Figure 11. The time consumption of the sampling, upsampling, and reshaping of the different methods. All images were processed with B = 1 .
Figure 12. The average PSNR and SSIM for verifying the best model settings. The averages are calculated over the three datasets and seven ratios. The best choice, 6 × 1, is marked with the red dotted line.
Table 1. The comparison of the three basic structures. Each structure is stacked ten times, and the total times over 1000 testing rounds are listed for various input/output dimensions W × H × C.
| W × H × C | Figure 2a | Figure 2b | Figure 2c |
|---|---|---|---|
| 4 × 4 × 512 | 3.32 | 3.82 | 2.92 |
| 8 × 8 × 256 | 2.92 | 3.75 | 2.39 |
| 16 × 16 × 128 | 2.40 | 3.18 | 1.87 |
| 32 × 32 × 64 | 2.36 | 3.13 | 1.85 |
| 64 × 64 × 32 | 2.42 | 3.23 | 1.94 |
| 128 × 128 × 16 | 2.35 | 3.12 | 1.86 |
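A minimal sketch of the kind of micro-benchmark summarized in Table 1: stack a basic block ten times and time 1000 forward passes at a given W × H × C. The plain 3 × 3-convolution block below stands in for the residual-free option of Figure 2c; the exact block definitions, warm-up policy, and hardware are assumptions.

```python
import time
import torch
import torch.nn as nn

def bench(block_fn, c, hw, rounds=1000):
    """Time `rounds` forward passes of ten stacked blocks on a (1, c, hw, hw) input."""
    net = nn.Sequential(*[block_fn(c) for _ in range(10)]).eval()
    x = torch.randn(1, c, hw, hw)
    with torch.no_grad():
        net(x)                                  # warm-up pass
        t0 = time.perf_counter()
        for _ in range(rounds):
            net(x)
    return time.perf_counter() - t0

plain = lambda c: nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.ReLU())
print(f"{bench(plain, 64, 32):.2f} s for 32x32x64")
```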
Table 2. Summary of the datasets.
| Dataset | Number | Comments |
|---|---|---|
| BSDS500 | 500 | 400 for training |
| Set5 | 5 | 5 for testing, unfixed size |
| Set14 | 14 | 14 for testing, unfixed size |
| BSD100 | 100 | 100 for testing, fixed size |
Table 3. Summary of training hyperparameter settings.
| Parameter | Value |
|---|---|
| Batch size B | 64 |
| Learning rate for REC, L_REC | 0.0001 |
| Learning rate for SAM, L_SAM | 0.00001 |
| Epochs E | 300 |
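The two learning rates suggest separate parameter groups for the reconstruction (REC) and sampling (SAM) sub-networks; below is a minimal sketch of that wiring, assuming an Adam optimizer and stand-in modules named `rec` and `sam` (neither the optimizer choice nor the module names are stated here in the source).

```python
import torch

rec = torch.nn.Conv2d(1, 1, 3, padding=1)      # stand-in for the REC network
sam = torch.nn.Conv2d(1, 1, 3, padding=1)      # stand-in for the SAM network

# One optimizer, two parameter groups with the learning rates from Table 3.
optimizer = torch.optim.Adam([
    {"params": rec.parameters(), "lr": 1e-4},  # L_REC
    {"params": sam.parameters(), "lr": 1e-5},  # L_SAM
])
```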
Table 4. The performance results from the different methods. All methods were tested with the three testing datasets and the seven sampling ratios. The indicators are shown in PSNR / SSIM format. The best indicators are marked in bold for each column.
| Method | Ratio | SET5 | SET14 | BSD100 |
|---|---|---|---|---|
| DWT | 0.01 | 9.27/0.1402 | 8.97/0.0989 | 9.63/0.1067 |
| | 0.05 | 14.27/0.3559 | 14.52/0.2933 | 14.81/0.2935 |
| | 0.1 | 24.74/0.7680 | 24.16/0.6798 | 23.46/0.6343 |
| | 0.2 | 30.83/0.8749 | 28.13/0.7882 | 27.26/0.7516 |
| | 0.3 | 33.61/0.9050 | 30.38/0.8389 | 29.23/0.8108 |
| | 0.4 | 35.32/0.9249 | 31.99/0.8753 | 30.72/0.8524 |
| | 0.5 | 36.87/0.9409 | 33.54/0.9044 | 32.17/0.8862 |
| | Avg. | 26.42/0.7014 | 24.53/0.6398 | 23.90/0.6194 |
| TV | 0.01 | 15.53/0.4554 | 15.26/0.3890 | 15.98/0.3995 |
| | 0.05 | 23.16/0.6678 | 22.24/0.5815 | 23.05/0.5690 |
| | 0.1 | 27.07/0.7865 | 25.24/0.6887 | 25.46/0.6612 |
| | 0.2 | 30.45/0.8709 | 28.07/0.7844 | 27.58/0.7557 |
| | 0.3 | 32.75/0.9107 | 30.12/0.8424 | 29.27/0.8191 |
| | 0.4 | 34.89/0.9363 | 32.03/0.8837 | 30.86/0.8660 |
| | 0.5 | 36.75/0.9540 | 33.84/0.9148 | 32.46/0.9019 |
| | Avg. | 28.66/0.7974 | 26.69/0.7264 | 26.38/0.7103 |
| MH | 0.01 | 18.08/0.4472 | 17.23/0.4218 | 18.21/0.4076 |
| | 0.05 | 23.67/0.6566 | 21.64/0.6528 | 21.36/0.5169 |
| | 0.1 | 28.57/0.8211 | 26.38/0.7433 | 25.16/0.6673 |
| | 0.2 | 32.08/0.8881 | 29.47/0.8278 | 28.09/0.7746 |
| | 0.3 | 34.06/0.9158 | 31.37/0.8732 | 29.85/0.8307 |
| | 0.4 | 35.65/0.9337 | 33.03/0.9084 | 31.35/0.8695 |
| | 0.5 | 37.21/0.9482 | 34.52/0.9314 | 32.86/0.9012 |
| | Avg. | 29.90/0.8015 | 27.66/0.7655 | 26.70/0.7097 |
| GSR | 0.01 | 18.87/0.4909 | 17.87/0.4337 | 18.90/0.4431 |
| | 0.05 | 24.95/0.7270 | 22.54/0.6140 | 22.16/0.5682 |
| | 0.1 | 29.99/0.8654 | 27.50/0.7705 | 25.91/0.7071 |
| | 0.2 | 34.17/0.9257 | 31.22/0.8642 | 29.18/0.8156 |
| | 0.3 | 36.83/0.9492 | 33.74/0.9071 | 31.33/0.8723 |
| | 0.4 | 38.81/0.9626 | 35.78/0.9336 | 33.20/0.9096 |
| | 0.5 | 40.65/0.9724 | 37.66/0.9522 | 34.94/0.9359 |
| | Avg. | 32.04/0.8419 | 29.47/0.7822 | 27.95/0.7503 |
| ReconNet | 0.01 | 20.60/0.5107 | 20.06/0.4557 | 21.10/0.4609 |
| | 0.05 | 24.92/0.6608 | 23.40/0.5768 | 23.74/0.5587 |
| | 0.1 | 26.68/0.7294 | 24.74/0.6380 | 24.83/0.6147 |
| | 0.2 | 28.55/0.7944 | 26.10/0.6988 | 25.93/0.6693 |
| | 0.3 | 30.44/0.8465 | 27.74/0.7603 | 27.16/0.7262 |
| | 0.4 | 32.95/0.8985 | 29.92/0.8347 | 29.06/0.8061 |
| | 0.5 | 33.77/0.9094 | 30.54/0.8519 | 29.61/0.8255 |
| | Avg. | 28.27/0.7642 | 26.07/0.6880 | 25.92/0.6659 |
| ISTA-Net++ | 0.01 | 21.47/0.5918 | 20.69/0.5171 | 21.62/0.5089 |
| | 0.05 | 27.24/0.7933 | 25.41/0.6827 | 25.17/0.6420 |
| | 0.1 | 30.71/0.8713 | 28.14/0.7725 | 27.15/0.7274 |
| | 0.2 | 34.43/0.9246 | 31.30/0.8572 | 29.74/0.8215 |
| | 0.3 | 36.77/0.9469 | 33.61/0.9013 | 31.72/0.8763 |
| | 0.4 | 38.62/0.9609 | 35.52/0.9292 | 33.49/0.9124 |
| | 0.5 | 40.32/0.9707 | 37.19/0.9485 | 35.18/0.9381 |
| | Avg. | 32.80/0.8656 | 30.27/0.8012 | 29.15/0.7752 |
| CSNet | 0.01 | 24.18/0.6478 | 22.83/0.5630 | 23.76/0.5484 |
| | 0.05 | 29.74/0.8485 | 26.93/0.7331 | 26.78/0.6976 |
| | 0.1 | 32.59/0.9062 | 29.13/0.8169 | 28.53/0.7834 |
| | 0.2 | 36.05/0.9481 | 32.15/0.8941 | 31.05/0.8721 |
| | 0.3 | 38.25/0.9644 | 34.34/0.9297 | 33.08/0.9171 |
| | 0.4 | 40.11/0.9740 | 36.16/0.9502 | 34.91/0.9443 |
| | 0.5 | 41.79/0.9803 | 37.89/0.9631 | 36.68/0.9618 |
| | Avg. | 34.67/0.8956 | 31.35/0.8357 | 30.68/0.8178 |
| AMPNet | 0.01 | 23.48/0.6103 | 22.77/0.5502 | 23.58/0.5367 |
| | 0.05 | 29.80/0.8443 | 27.19/0.7336 | 26.81/0.6973 |
| | 0.1 | 33.28/0.9096 | 29.88/0.8247 | 28.78/0.7861 |
| | 0.2 | 36.57/0.9466 | 32.84/0.8960 | 31.31/0.8714 |
| | 0.3 | 38.89/0.9633 | 35.23/0.9364 | 33.61/0.9186 |
| | 0.4 | 41.05/0.9732 | 37.25/0.9521 | 35.53/0.9453 |
| | 0.5 | 42.72/0.9818 | 39.01/0.9648 | 37.37/0.9627 |
| | Avg. | 35.11/0.8899 | 32.02/0.8368 | 30.99/0.8169 |
| EiCSNet2*1 | 0.01 | 24.41/0.6513 | 23.14/0.5689 | 23.86/0.5504 |
| | 0.05 | 30.00/0.8495 | 27.22/0.7364 | 26.87/0.7003 |
| | 0.1 | 33.13/0.9094 | 29.63/0.8234 | 28.72/0.7871 |
| | 0.2 | 36.46/0.9478 | 32.60/0.8976 | 31.28/0.8747 |
| | 0.3 | 38.81/0.9645 | 34.85/0.9326 | 33.40/0.9200 |
| | 0.4 | 40.83/0.9748 | 36.79/0.9534 | 35.42/0.9480 |
| | 0.5 | 42.75/0.9818 | 38.71/0.9670 | 37.41/0.9661 |
| | Avg. | 35.20/0.8970 | 31.85/0.8399 | 31.00/0.8209 |
| EiCSNet6*1 | 0.01 | 24.61/0.6683 | 23.34/0.5796 | 23.97/0.5560 |
| | 0.05 | 30.48/0.8608 | 27.61/0.7472 | 27.05/0.7076 |
| | 0.1 | 33.62/0.9156 | 30.06/0.8310 | 28.96/0.7931 |
| | 0.2 | 37.11/0.9511 | 33.21/0.9027 | 31.63/0.8784 |
| | 0.3 | 39.40/0.9663 | 35.37/0.9361 | 33.81/0.9225 |
| | 0.4 | 41.33/0.9757 | 37.29/0.9561 | 35.82/0.9496 |
| | 0.5 | 43.31/0.9824 | 39.20/0.9686 | 37.83/0.9672 |
| | Avg. | 35.69/0.9029 | 32.29/0.8459 | 31.30/0.8249 |
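For reference, the PSNR/SSIM indicators used in Tables 4 and 5 follow the standard definitions [42]; below is a minimal sketch using scikit-image's implementations on stand-in 8-bit grayscale images (in practice, the test images replace the random arrays).

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Stand-in ground truth and a lightly perturbed "reconstruction".
gt = (np.random.rand(256, 256) * 255).astype(np.uint8)
noise = np.random.randint(-3, 4, gt.shape)
rec = np.clip(gt.astype(np.int16) + noise, 0, 255).astype(np.uint8)

print(f"PSNR: {peak_signal_noise_ratio(gt, rec, data_range=255):.2f} dB")
print(f"SSIM: {structural_similarity(gt, rec, data_range=255):.4f}")
```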
Table 5. The performance results of the different methods under the testing datasets SET11 [27] and BSD68 [41]. The indicators are shown in PSNR / SSIM format. The best indicators are marked in bold for each column.
| Method | Ratio | SET11 [27] | BSD68 [41] |
|---|---|---|---|
| BCS-Net [47] | 0.1 | 29.42/0.8673 | 27.98/0.8015 |
| | 0.3 | 35.63/0.9495 | 32.70/0.9301 |
| | 0.4 | 36.68/0.9667 | 35.14/0.9397 |
| | 0.5 | 39.58/0.9734 | 36.85/0.9682 |
| | Avg. | 35.33/0.9392 | 33.17/0.9099 |
| OPINE-NET+ [48] | 0.1 | 29.81/0.8904 | 27.82/0.8045 |
| | 0.3 | 35.79/0.9541 | 32.35/0.9215 |
| | 0.4 | 37.96/0.9633 | 34.95/0.9261 |
| | 0.5 | 40.19/0.9800 | 36.35/0.9660 |
| | Avg. | 35.97/0.9469 | 32.87/0.9045 |
| MADUN [46] | 0.1 | 29.91/0.8986 | 28.15/0.8229 |
| | 0.3 | 36.94/0.9676 | 33.35/0.9379 |
| | 0.4 | 39.15/0.9772 | 35.42/0.9606 |
| | 0.5 | 40.77/0.9832 | 37.11/0.9730 |
| | Avg. | 36.69/0.9567 | 33.50/0.9236 |
| EiCSNet2*1 | 0.1 | 30.42/0.9177 | 28.96/0.8517 |
| | 0.3 | 36.46/0.9721 | 33.69/0.9461 |
| | 0.4 | 38.78/0.9817 | 35.68/0.9650 |
| | 0.5 | 40.94/0.9879 | 37.68/0.9773 |
| | Avg. | 36.65/0.9648 | 34.00/0.9350 |
| EiCSNet6*1 | 0.1 | 30.95/0.9240 | 29.20/0.8560 |
| | 0.3 | 37.19/0.9750 | 34.09/0.9480 |
| | 0.4 | 39.50/0.9833 | 36.12/0.9665 |
| | 0.5 | 41.67/0.9886 | 38.11/0.9782 |
| | Avg. | 37.33/0.9677 | 34.38/0.9372 |
Note: The indicator values of BCS-Net, OPINE-NET+, and MADUN were taken from [46]. Since [46] does not report PSNR/SSIM values for the sampling ratios of 0.01, 0.05, and 0.2, only the remaining ratios are compared; for this comparison, the proposed method was trained and tested on the same datasets as in [46].
Table 6. The run time of the different methods.
| Methods | Ratio 0.01 | Ratio 0.1 |
|---|---|---|
| DWT | 10.3176/- | 10.5539/- |
| TV | 2.4006/- | 2.7405/- |
| MH | 23.1006/- | 19.0405/- |
| GSR | 235.6297/- | 230.4755/- |
| ReconNet | 0.5193/0.0244 | 0.5258/0.0289 |
| ISTA-Net++ | 0.5550/0.0356 | 0.5785/0.0377 |
| CSNet | 0.8960/0.0262 | 0.9024/0.0287 |
| AMPNet | 0.5440/0.0361 | 0.5440/0.0396 |
| EiCSNet2*1 | 0.1737/0.0052 | 0.1742/0.0054 |
| EiCSNet6*1 | 0.4970/0.0141 | 0.4915/0.0141 |
Note: The run time is shown in the format of CPU(s)/GPU(s). All images were processed with B = 1.
Table 7. The time consumption of the sampling, upsampling, and reshaping of different methods. All images were processed with B = 1 .
| Method | Stage | Ratio 0.01 | Ratio 0.1 |
|---|---|---|---|
| ReconNet | sampling | - | - |
| | upsampling | 0.00075 | 0.00076 |
| | reshaping | 0.00330 | 0.00377 |
| | All | 0.00405 | 0.00453 |
| ISTA-Net++ | sampling | 0.00392 | 0.00426 |
| | upsampling | 0.00140 | 0.00178 |
| | reshaping | 0.00511 | 0.00581 |
| | All | 0.01043 | 0.01185 |
| CSNet | sampling | 0.00031 | 0.00026 |
| | upsampling | 0.00016 | 0.00016 |
| | reshaping | 0.00530 | 0.00551 |
| | All | 0.00577 | 0.00593 |
| AMPNet | sampling | 0.00085 | 0.00094 |
| | upsampling | 0.01142 | 0.01126 |
| | reshaping | 0.00619 | 0.00697 |
| | All | 0.01846 | 0.01917 |
| EiCSNet2*1 | sampling | 0.00039 | 0.00036 |
| | upsampling | 0.00025 | 0.00026 |
| | reshaping | 0.00018 | 0.00019 |
| | All | 0.00082 | 0.00081 |
| EiCSNet6*1 | sampling | 0.00096 | 0.00087 |
| | upsampling | 0.00060 | 0.00066 |
| | reshaping | 0.00045 | 0.00050 |
| | All | 0.00201 | 0.00203 |
Note: Run times are given in seconds. All images were processed with B = 1.
Table 8. The average PSNR and SSIM for verification of the best model settings. Each PSNR and SSIM is the average of the three datasets and seven ratios.
| Settings | PSNR | SSIM | Settings | PSNR | SSIM |
|---|---|---|---|---|---|
| 2 × 1 | 32.6808 | 0.8526 | 2 × 2 | 32.7042 | 0.8535 |
| 3 × 1 | 32.8703 | 0.8550 | 2 × 3 | 32.7256 | 0.8541 |
| 4 × 1 | 33.0155 | 0.8565 | 3 × 2 | 32.9374 | 0.8561 |
| 5 × 1 | 33.0381 | 0.8572 | 3 × 3 | 32.9642 | 0.8566 |
| 6 × 1 | 33.0947 | 0.8579 | 1 × 5 | 32.2708 | 0.8487 |
Table 9. The average PSNR and SSIM for the ablation experiment. Each PSNR and SSIM was averaged over the three datasets.
| Settings | 2 × 1 PSNR | 2 × 1 SSIM | 6 × 1 PSNR | 6 × 1 SSIM |
|---|---|---|---|---|
| Nothing | 23.65 | 0.5839 | 23.79 | 0.5906 |
| W/O F | 23.75 | 0.5867 | 23.92 | 0.5957 |
| W/O GE | 23.72 | 0.5837 | 23.86 | 0.5917 |
| ALL | 23.80 | 0.5902 | 23.97 | 0.6013 |