Article

EiCSNet: Efficient Iterative Neural Network for Compressed Sensing Reconstruction

College of Information Science & Electronic Engineering, Zhejiang University, Hangzhou 310058, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(1), 30; https://doi.org/10.3390/electronics12010030
Submission received: 14 November 2022 / Revised: 29 November 2022 / Accepted: 30 November 2022 / Published: 22 December 2022
(This article belongs to the Special Issue Advances of Electronics Research from Zhejiang University)

Abstract

The rapid growth of sensing data demands compressed sensing (CS) in order to achieve high-density storage and fast data transmission. Deep neural networks (DNNs) have been under intensive development for the reconstruction of high-quality images from compressed data. However, the complicated auxiliary structures of DNN models in pursuit of better recovery performance lead to low computational efficiency and long reconstruction times. Furthermore, it is difficult for conventional neural network designs to reconstruct extra-high-frequency information at a very low sampling rate. In this work, we propose an efficient iterative neural network for CS reconstruction (EiCSNet). An efficient gradient-extraction module is designed to replace the complex auxiliary structures in order to train the DNNs more efficiently. An iterative enhancement network is applied to make full use of the limited information available in CS for better iterative recovery. In addition, a frequency-aware weighted loss is further proposed for better image restoration quality. Our proposed compact model, EiCSNet2*1, improved the performance slightly while being nearly seven times faster than its counterparts, which shows its highly efficient network design. Additionally, our complete model, EiCSNet6*1, achieved the best performance at this stage, improving the average PSNR by 0.37 dB across all testing sets and sampling rates.

1. Introduction

With the explosive growth of sensor devices, the acquisition, transmission, and processing of information are proliferating, which brings great challenges for data storage and transmission. According to Nyquist sampling theory [1], the sampling frequency (the Nyquist rate) must be at least twice the bandwidth of the analog signal spectrum. However, under some extreme conditions with limited bandwidth and battery resources, such as in aerospace or underwater applications, high-quality sampling is difficult to achieve. In addition, radiology requires shorter sampling times in order to reduce patients' burden and the discomfort caused by claustrophobia. Therefore, new sampling theories and methods are needed in order to reduce the amount of sampled data.
Compressed sensing (CS) [2] theory has a high potential to break through the limitations of traditional Nyquist sampling theory. CS theory demonstrates that information can still be reconstructed even if it has been sampled significantly below the Nyquist rate [2]. It senses information as measurements through a measurement matrix, and recovery is theoretically guaranteed to be possible if the original signal and the measurement matrix satisfy certain mathematical conditions [3]. Reconstruction algorithms are then employed to decode the measurements and estimate the important information in the original signal.
In recent years, many algorithms have been proposed to deal with CS reconstruction. These methods can be divided into two categories: traditional and deep learning (DL) methods. Traditional methods usually have theoretical guarantees and inherit interpretability. However, they inevitably suffer from the high computational cost dictated by iterative calculations [4]. In contrast, DL methods realize the mapping from compressed data to the original signals by training large numbers of parameters in DNNs. However, the performance of some DL methods is mainly improved by enlarging the DNN model size, ignoring the algorithmic designs of traditional methods. Moreover, with the addition of complicated auxiliary structures, it is not possible to make full use of graphics processing units (GPUs), and this severely reduces the run speed. In addition, high-frequency information is difficult to recover because the signals are sampled at a very low frequency.
CS is widely applied in magnetic resonance imaging (MRI) [5], hyperspectral imaging [6], wireless communications [7], space flight [8], and so on. However, these fields are still pursuing higher speeds and better reconstruction performance. For example, in the applications of space flight and MRI, there is hope to achieve a lower pressure of information transmission and a shorter sampling time, respectively, by reconstructing signals that are sampled with a lower compression ratio. Furthermore, imaging or acoustics [9] systems with certain real-time pursuits are eager to achieve the shortest reconstruction time.
In order to address the issues of the existing methods mentioned above and to make the application of CS to systems of signal compression and reconstruction more efficient, we propose an efficient iterative neural network for CS reconstruction (EiCSNet) and make the following contributions:
  • Unlike the SOTA DL methods, we propose a fast elementary reconstruction block based on a gradient-extraction module, which realizes efficient nonlinear mapping in the reconstruction task and speeds up inference without any complicated auxiliary structures;
  • Considering the known sampling matrix and measurements, the iterative enhancement module makes full use of the limited information available in order to achieve better reconstruction performance with the efficient designs employed in the operations of sampling, upsampling, and reshaping, thus greatly improving the run speed;
  • For poor reconstruction at a low sampling rate, a frequency-aware weighted loss that is suitable for CS is further proposed in order to pay more attention to the reconstruction of high-frequency information.
The rest of this paper is organized as follows. The related work is introduced and discussed in Section 2. Our proposed method is described in Section 3. The experimental results are provided in Section 4. Finally, the conclusion is drawn in Section 5.

2. Related Work

In this section, a brief review of CS and landmark work is provided. The methods can be divided into traditional and DL methods.

2.1. Compressed Sensing

CS breaks through the limitations of the Shannon–Nyquist sampling theorem [2]. It helps to simultaneously realize the processes of signal sampling and compression. Instead of directly measuring the signal $X$, it sets a non-adaptive linear projection in order to obtain the overall structure of $X$ at a low sampling rate. Suppose that the signal is $X \in \mathbb{R}^{N \times 1}$ and the sampling matrix is $\Phi \in \mathbb{R}^{M \times N}$ with $M \ll N$; then, the measurement process can be expressed as follows:
$$ Y = \Phi X, \tag{1} $$
where $Y \in \mathbb{R}^{M \times 1}$ represents the CS measurement (sampled data). This means that the number of observations $M$ is far lower than the signal dimension $N$. The acquisition is carried out in the form of direct compressive sampling with no other form of sophisticated encoding. Therefore, the burden of quality reconstruction falls solely on the receiver side [10], which greatly reduces the cost of data acquisition, transmission, and storage.
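For illustration, the following is a minimal Python sketch of this measurement process (Equation (1)), using an assumed random Gaussian measurement matrix; the dimensions are illustrative only, and the proposed network instead learns the sampling matrix as a convolution.

import numpy as np

N = 1024          # original signal dimension
M = 102           # number of measurements (about a 10% sampling rate), M << N

rng = np.random.default_rng(0)
Phi = rng.normal(scale=1.0 / np.sqrt(M), size=(M, N))  # measurement matrix
X = rng.normal(size=(N, 1))                            # original signal

Y = Phi @ X       # compressed measurements
print(Y.shape)    # (102, 1): far fewer observations than signal samples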
Generally, underdetermined equations are impossible to solve completely. However, a large number of signals, such as natural images, can be approximated as sparse in some domain $\Psi$ [11]. The sampling projection can capture the more important structure or information of $X$ in the sparse domain $\Psi$. On this basis, CS theory makes the recovery of $X$ from the corresponding measurements $Y$ possible [12]. Most CS reconstruction problems can then be addressed as optimization problems:
$$ \min_{X} \| \Psi X \|_p \quad \text{such that} \quad Y = \Phi X, \tag{2} $$
where $\Psi X$ is the sparse representation in the $\Psi$ domain and $\| \cdot \|_p$ denotes the $\ell_p$ norm. When $p$ is set to 1 or 0, the norm characterizes the sparsity of the vector. Therefore, the reconstruction of the compressed signal $Y$ can be understood as finding the solution $\hat{X}$ of $X$ with the maximum probability of generating $Y = \Phi X$, where $Y$, $\Psi$, and $\Phi$ are the known parameters.

2.2. Traditional Methods

Various methods, such as convex optimization, greedy algorithms, and total variation, have been proposed to solve the reconstruction problem.
The convex optimization methods usually translate the $\ell_0$ norm constraint (nonconvex) in Equation (2) into an $\ell_1$ norm problem (convex) so that the reconstruction of $\hat{X}$ can be conducted [13]. They can achieve accurate and robust recovery results. However, the convex optimization methods usually suffer from high computational complexity and from "tweaked" requirements when processing image signals.
To reduce the computational complexity, greedy algorithms such as Matching Pursuit (MP) [14] and Projected Landweber (PL) [15] have been proposed for CS reconstruction. Orthogonal MP (OMP) [16] and stage-wise OMP [17] take their source from MP algorithms; they generally refine the results through the iterative residual. Compared with other traditional approaches, MP methods have relatively low computational complexity at the cost of lower reconstruction quality. PL-based methods [18,19,20,21,22] have been proposed as an alternative; they obtain the reconstructed image by successively projecting and thresholding. They not only have lower computational complexity than convex-programming approaches but are also flexible in terms of incorporating additional optimization criteria [18]. However, the PL methods have to conduct numerous iterations to obtain the final results, and heavy matrix operations need to be executed in each iteration.
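For illustration, the following is a compact OMP sketch in the spirit of [16], showing the greedy, residual-driven iterations described above; the dimensions and sparsity level are assumptions of ours, not values from the paper.

import numpy as np

def omp(Phi, Y, K):
    """Recover a K-sparse signal from measurements Y = Phi @ X."""
    M, N = Phi.shape
    residual = Y.copy()
    support = []
    for _ in range(K):
        # Pick the column most correlated with the current residual.
        idx = int(np.argmax(np.abs(Phi.T @ residual)))
        support.append(idx)
        # Least-squares fit on the selected support, then update the residual.
        coef, *_ = np.linalg.lstsq(Phi[:, support], Y, rcond=None)
        residual = Y - Phi[:, support] @ coef
    X_hat = np.zeros((N, 1))
    X_hat[support] = coef
    return X_hat

# Usage sketch: X_hat = omp(Phi, Y, K=10)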
For the task of CS image reconstruction, some existing works have established more sophisticated models in which a Total Variation (TV) regularization constraint is employed to better express image priors [23,24]. Paying more attention to image priors yields more accurate and robust overall recovery but slows down the run speed. In addition, a few details may be lost due to the over-smooth reconstruction.
Though traditional methods have been widely used in many practical projects [25,26], the large number of iterations and the heavy matrix operations greatly reduce their operation speed.

2.3. Deep Learning Methods

Due to powerful GPU devices and sufficient data resources, DL methods conducted on GPUs have become possible in the field of CS [27,28]. They have achieved excellent recovery performance and orders-of-magnitude speed improvements. Ref. [27] first employed a Convolutional Neural Network (CNN) in CS and realized non-iterative reconstruction, which achieved the highest run speed at that time. However, it still consumed much time because of the picture rearrangement and the heavy fully connected network for upsampling. In [28], a Stacked Denoising Autoencoder was applied to capture statistical dependencies between the different elements of certain signals, which improved the signal recovery performance. In [29], ISTA-Net was proposed for CS reconstruction based on the iterative shrinkage-thresholding algorithm.
Compared with the traditional methods, DL methods have greatly improved the recovery performance and run speed. Through training on large data sets, they realize more flexible nonlinear mappings, as shown in Equation (3):
$$ \hat{X} = f_{\theta}(\Psi, Y), \tag{3} $$
where $f_{\theta}(\cdot)$ represents the mapping that solves the underdetermined equations with learnable parameters $\theta$. In this way, it can be more consistent with the statistical distribution of real image sets. In addition, efficiently utilizing the information obtained from $Y$ has been a focus of attention.
In [4], a hybrid framework was proposed to leverage the local spatial information from CNN and the global context provided by a transformer. However, its run time is similar to the traditional methods, which is hundreds of times that of the other SOTA DL methods. Ref. [30] unfolded the iterative denoising process of the well-known Approximate Message Passing (AMP) algorithm. It integrated deblocking modules to eliminate the blocking artifacts that usually appear in the CS of visual images. Ref. [31] built a trainable deep compressed sensing model by combining Convolution Generative Adversarial Networks and a Variational Autoencoder. In [32], ISTA-Net++ had the adaptability to handle CS problems with different ratios through a single model.
However, the abovementioned DL methods cannot make full use of the large-scale parallel computing capability of GPU devices for three reasons. First, it is difficult to train huge end-to-end DNNs without auxiliary structures, such as in [18]; the heavy full connection between any two successive layers [27] and the auxiliary structures that strengthen nonlinear operations [4,18,30,33] greatly affect the utilization efficiency of GPU devices. Second, some methods cut the image into blocks for processing [27] or reshape the upsampled matrix, which spends time on matrix splicing and the subsequent deblocking filtering. Finally, an unreasonable network structure design may require a deeper network to achieve better results at the cost of more reconstruction time. Therefore, new DL models should be developed to improve the reconstruction speed by fully utilizing GPU resources.

3. Methods

In order to achieve a higher reconstruction speed while keeping the recovery performance as effective as possible, an efficient model, EiCSNet, is proposed by taking the characteristics of the DL methods and CS algorithms into consideration. EiCSNet, as shown in Figure 1, comprises the following modules:
  • The Gradient-Extraction Module (GEM) and Fast Elementary Reconstruction Block (FERB) aim for a hardware-friendly structure without auxiliary branches to speed up the inference of CS reconstruction (Section 3.1);
  • The Iterative Enhancement Module (IEM) better combines the elaborate design of traditional algorithms based on greedy iteration and makes full use of intermediate information to strengthen and supplement images to achieve efficient and high-quality reconstruction (Section 3.2);
  • The Frequency-Aware Weighted Loss (FAWL) function is proposed to pay particular attention to the reconstruction of high-frequency details (Section 3.3).

3.1. Hardware-Friendly GEM and FERB

Stochastic Gradient Descent [34,35] is widely used to train neural networks. However, the parameters in deeper networks cannot be fully trained because the gradients gradually decrease during backpropagation [36] as the number of layers grows, which is known as the vanishing gradient problem [37].
To solve the vanishing gradient problem, the Residual Network (ResNet) was introduced in 2015 by researchers at Microsoft Research. It uses skip connections that bypass a few layers and connect directly to their outputs. The advantage of adding this type of skip connection is that if any layer hurts the performance of the architecture, it can be skipped by regularization; in this way, network collapse can be effectively avoided by jumping over some layers. Furthermore, the gradients can be transferred to deeper layers without too much attenuation.
The basic block of ResNet and the traditional CNN options are shown in Figure 2. The block can be divided into two branches: the Main Branch (MB) for the main nonlinear calculation and the Residual-link Branch (RB). The RB realizes a skip connection from the deeper feature maps to the shallower ones so that the gradient variables can be recorded, preserved, and transferred effectively. When the dimensions of the I/O feature maps change, the RB needs to go through a $1 \times 1$ convolution to keep the residual connection, as shown in Figure 2b.
However, the additional RB places an extra burden on the hardware. The two branches cannot be calculated, completed, and discarded simultaneously, so extra memory space and memory accesses are required to store and transfer the data. In addition, the RB needs many point-wise addition operations, and the additions can only be carried out at the end of the MB. As a result, the different operations of the two branches cannot make full use of the GPU, seriously degrading the parallel performance.
Table 1 lists the inference time of three basic structures in Figure 2. Under various input and output conditions, the structure in Figure 2a is faster than the variant one in Figure 2b. It can also be clearly seen that the res-free structure (Figure 2c) achieves the best performance in terms of inference time. This phenomenon is more obvious when the input and output feature maps become larger.
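As a rough illustration of the Table 1 comparison, the following sketch (our own benchmark with assumed layer sizes, not the authors' measurement code) times a residual block against a residual-free chain of the same depth on a GPU.

import time
import torch
import torch.nn as nn

class ResBlock(nn.Module):          # Figure 2a style: main branch plus skip
    def __init__(self, c=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1))
    def forward(self, x):
        return torch.relu(self.body(x) + x)   # point-wise add on the RB

class PlainBlock(nn.Module):        # Figure 2c style: no residual branch
    def __init__(self, c=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True))
    def forward(self, x):
        return self.body(x)

def bench(block, x, n=100):
    with torch.no_grad():
        for _ in range(10):                   # warm-up passes
            block(x)
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(n):
            block(x)
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / n

x = torch.randn(1, 64, 256, 256, device="cuda")
print("residual: ", bench(ResBlock(64).cuda().eval(), x))
print("res-free: ", bench(PlainBlock(64).cuda().eval(), x))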
To address the vanishing gradient issue while achieving a high reconstruction speed, we propose a hardware-friendly Gradient-Extraction Module (GEM), as shown in Figure 3. The blue cubes and gray boxes in Figure 3 represent feature maps and convolution operations, respectively. The arrows with different sizes and shades of green represent the gradient calculated during backpropagation. The outputs of each stage, marked by green boxes, are calculated from the intermediate feature maps in the forward pass through GEMs. In the main branch, the smaller and lighter arrows illustrate how the overall magnitude of the gradient gradually decreases, or even vanishes, as the network deepens. It is noted that all CNN blocks work together to minimize the difference between the final output and the ground truth. In this way, the RB structure in Figure 2a,b is replaced by the GEM in the proposed reconstruction neural network. The deeper parameters obtain an extra effective gradient, which drives faster and more effective training. The GEM can also play a restrictive role in the training of the deep layers to keep them from drifting away from the objective: the parameters of the deeper layers continuously perceive the objective loss function through the leaf branches of the GEMs, so the deeper leaf branches help to transfer and enhance the gradient and achieve a more complete training effect, even if parts of the gradient vanish. Because the parameters of the shallower layers are trained easily, there is no need to add a GEM to assist their training. The GEM not only helps to train the deep layers of the neural network but also greatly improves the inference speed due to the removal of the residual branch.
The GEM was employed as the understructure of the Fast Elementary Reconstruction Block (FERB) to realize fast and efficient reconstruction. The testing and training pipelines of the FERB are shown in Figure 4a,b, respectively. Both input and output channels were set to 64. ReLU modules, which introduce little hardware overhead, were applied to achieve nonlinear fitting and allow the deeper stacking of network modules. The GEM helps to train the parameters in the deep layers for better results without any auxiliary structures. It should be noted that the GEM was removed at inference to improve speed, because no intermediate stage outputs were required to assist the gradient.
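To illustrate the idea, the following is a minimal sketch (our assumption of the layout, not the released implementation) of a FERB-style block whose GEM leaf branch exists only at training time: the leaf emits a stage output that receives its own loss and thus injects an extra gradient path, while inference runs a plain, residual-free convolutional chain.

import torch.nn as nn

class FERB(nn.Module):
    def __init__(self, c=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True))
        self.gem = nn.Conv2d(c, 1, 3, padding=1)  # leaf branch: stage output

    def forward(self, x):
        feat = self.body(x)
        if self.training:                # GEM output is used only by the loss
            return feat, self.gem(feat)
        return feat                      # inference: residual-free fast path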

3.2. Iterative Enhancement Module

The mapping solution of DL is accompanied by a large number of parameters that realize the nonlinear transformation from the measurements to the original images. However, such networks tend to produce brute-force solutions that fall into averaged, generic expressions, subject to the overall distribution of the training data. In this way, the final reconstruction tends toward the distribution of the training datasets and the recovery of low-frequency information that is easy to realize or simply belongs to a common pattern. To this extent, the network's practicality also becomes highly dependent on whether the testing data have a distribution similar to the training data. These methods do not build on and develop the predecessors' exploratory research and thus have weaknesses in practical use. Other DL methods, such as [30], are firmly entrenched in the process of traditional methods, carrying out the upsampling by $\Psi^T Y$; in this way, the high-dimensional nonlinear mapping realized by the large number of parameters in DL cannot be fully utilized. To address the above issues, an effective Iterative Enhancement Module (IEM) based on a greedy reconstruction algorithm is proposed, which has the following advantages:
  • Through hardware-friendly structures and high-speed parallel operation, better performance and faster speed can be obtained;
  • Better results can be obtained as the number of iterations increases;
  • Combined with the characteristics in the field of CS, the network structure can be explained to a certain extent.
The main flow of the IEM is illustrated in Figure 5. In the first iteration $i = 1$, the input sampled data ($\mathrm{SD}$) were sampled through the sampling matrix and reconstructed as the initial reconstruction image $\mathrm{IRI}_1$:
$$ \mathrm{SD} = X \ast \mathrm{SC}, \qquad \mathrm{IRI}_1 = \mathrm{reshape}(\mathrm{SD} \ast \mathrm{UC}_1), \tag{4} $$
where $X$ and $\mathrm{SD}$ denote the real image input and the sampled data, respectively. $\mathrm{SC}$ and $\mathrm{UC}_1$ represent the sampling matrix and the upsampling matrix, realized by convolutions of size $32 \times 32 \times 1 \times (1024r)$ and $1 \times 1 \times (1024r) \times 1024$, respectively ($r$ represents the sampling ratio). Furthermore, the $1 \times 1$ convolution and pixel shuffle were used to stretch and reshape the upsampled information. The understructure of the upsampling and reshaping module is shown in Figure 6. All operations are easily deployed on the GPU to achieve high efficiency. A comparison of the performance and run time, and the settings of $N_I$ and $N_B$, are provided in Section 4.
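A minimal sketch of this sampling and upsampling/reshaping pipeline (our reading of Figure 6; the layer shapes follow the stated convolution sizes) is given below.

import torch
import torch.nn as nn

r = 0.10                               # sampling ratio
m = int(1024 * r)                      # measurements per 32x32 image block

sample   = nn.Conv2d(1, m, kernel_size=32, stride=32, bias=False)   # SC
upsample = nn.Conv2d(m, 1024, kernel_size=1, bias=False)            # UC
reshape  = nn.PixelShuffle(32)         # 1024 channels -> one 32x32 block

x   = torch.randn(1, 1, 256, 256)      # input image X
sd  = sample(x)                        # SD: shape (1, m, 8, 8)
iri = reshape(upsample(sd))            # IRI_1: shape (1, 1, 256, 256)
print(sd.shape, iri.shape)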
In the following iterations $i = 2, 3, \ldots, N_I$, the stage sampled data $\mathrm{SSD}_i$ were obtained by sampling $\mathrm{SO}_{i-1,N_B}$, the stage output after $N_B$ FERB operations (introduced in Section 3.1) in iteration $i-1$. Thereafter, the remaining error between $\mathrm{SD}$ and $\mathrm{SSD}_i$ was calculated. The Error Enhancement Image $\mathrm{EEI}_i$ was stretched from this remaining error through upsampling and reshaping in the same way as for $i = 1$ (defined in Equations (4) and (5)). The $\mathrm{EEI}_i$ was added to the last iteration result $\mathrm{SO}_{i-1,N_B}$ to strengthen it and compensate for its errors. In this process, the $\mathrm{IRI}_i$ was calculated as follows:
$$ \mathrm{SSD}_i = \mathrm{SO}_{i-1,N_B} \ast \mathrm{SC}, \qquad \mathrm{EEI}_i = \mathrm{reshape}\big( (\mathrm{SD} - \mathrm{SSD}_i) \ast \mathrm{UC}_i \big), \qquad \mathrm{IRI}_i = \mathrm{SO}_{i-1,N_B} + \mathrm{EEI}_i, \tag{5} $$
where $\mathrm{UC}_i$ represents the upsampling matrix that stretches the error in the compressed domain, $(\mathrm{SD} - \mathrm{SSD}_i)$. Therefore, various mapping interpretations are realized to supplement the $\mathrm{IRI}_i$, which has a greater effect on the recovery process. In this way, different supplementary information in different iteration rounds is obtained through the sampling error to strengthen the final results.
The stage output $\mathrm{SO}_{i,j}$ was calculated as follows:
$$ \mathrm{SO}_{i,j} = \begin{cases} \mathrm{FERB}_{i,j}\big(\mathrm{CNN}_{1 \to 64}(\mathrm{IRI}_i)\big), & \text{if } j = 1 \\ \mathrm{CNN}_{64 \to 1}\big(\mathrm{FERB}_{i,j}(\mathrm{SO}_{i,j-1})\big), & \text{if } j = N_B \\ \mathrm{FERB}_{i,j}(\mathrm{SO}_{i,j-1}), & \text{otherwise}. \end{cases} \tag{6} $$
Here, $\mathrm{FERB}_{i,j}(\cdot)$ denotes the $j$-th FERB in iteration $i$. After $N_B$ FERB operations, the final stage output in iteration $i$ can be represented as $\mathrm{SO}_{i,N_B}$. To adapt the I/O channels between the IEMs and the FERBs, two convolution layers ($\mathrm{CNN}_{1 \to 64}$ and $\mathrm{CNN}_{64 \to 1}$) were added to adjust the channel dimension.
The pseudo-code is provided in Algorithm 1. The proposed method realizes the reconstruction stage by stage. $\mathrm{SD}$ is calculated directly from the real image; as a result, this information can be employed throughout to guide the reconstruction network. In this way, $\mathrm{EEI}_i$ was calculated from $\mathrm{SO}_{i-1,N_B}$ and the real sampled data $\mathrm{SD}$ at the beginning of each subsequent iteration $i = 2, 3, \ldots, N_I$ (see Equation (5)). They were expected to play an important role in the induction and summary of the previous iteration results and were also regarded as important compensation to guide the next response.
Algorithm 1 PREDICTION of IEM
1:  PREDICT(Sampled data $\mathrm{SD}$)
2:      for each $i \in [1, N_I]$ do
3:          if $i == 1$ then
4:              $\mathrm{IRI}_i \leftarrow \mathrm{reshape}(\mathrm{SD} \ast \mathrm{UC}_1)$
5:          else
6:              $\mathrm{SSD}_i \leftarrow \mathrm{SO}_{i-1,N_B} \ast \mathrm{SC}$
7:              $\mathrm{EEI}_i \leftarrow \mathrm{reshape}((\mathrm{SD} - \mathrm{SSD}_i) \ast \mathrm{UC}_i)$
8:              $\mathrm{IRI}_i \leftarrow \mathrm{SO}_{i-1,N_B} + \mathrm{EEI}_i$
9:          end if
10:         for each $j \in [1, N_B]$ do
11:             if $j == 1$ then
12:                 $\mathrm{SO}_{i,1} \leftarrow \mathrm{FERB}_{i,1}(\mathrm{CNN}_{1 \to 64}(\mathrm{IRI}_i))$
13:             elif $j == N_B$ then
14:                 $\mathrm{SO}_{i,N_B} \leftarrow \mathrm{CNN}_{64 \to 1}(\mathrm{FERB}_{i,j}(\mathrm{SO}_{i,j-1}))$
15:             else
16:                 $\mathrm{SO}_{i,j} \leftarrow \mathrm{FERB}_{i,j}(\mathrm{SO}_{i,j-1})$
17:             end if
18:         end for
19:     end for
20:     return $\mathrm{SO}_{N_I, N_B}$
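A minimal PyTorch-style sketch of Algorithm 1 follows; the module names (sample, upsamples, reshape, recon) are placeholders of our own that stand for SC, the per-iteration UC plus pixel shuffle, and the chained CNN/FERB reconstruction of each iteration.

def predict(sd, sample, upsamples, reshape, recon, n_iter):
    so = None
    for i in range(n_iter):
        if i == 0:
            iri = reshape(upsamples[0](sd))        # initial reconstruction image
        else:
            ssd = sample(so)                       # re-sample the stage output
            eei = reshape(upsamples[i](sd - ssd))  # error enhancement image
            iri = so + eei                         # strengthened iteration input
        so = recon[i](iri)                         # N_B FERBs for iteration i
    return so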
Our proposed network aims not only to reconstruct the final output to be similar to the target but also to attain the following detailed goals. On the one hand, over the continuous iterations, the IEMs and FERBs ensure that the intermediate stage outputs $\mathrm{SO}_{i,N_B}$ gradually approach the target, reflected by the pursuit of a minimal error between $\mathrm{SSD}_i$ and $\mathrm{SD}$ in the compressed domain. In addition, the $\mathrm{EEI}_i$ turns darker, indicating that the error between $\mathrm{SO}_{i,N_B}$ and the ground truth is gradually reduced toward zero over the continuous iterations. On the other hand, it also means that the difficulty of reconstruction decreases gradually. In the previous work [18], the network parameters had to realize the complete mapping from the sampled data to the target; now, however, a series of $\mathrm{EEI}_i$ supplements, improves, and strengthens the output at the beginning of each round, so the parameters can be used more efficiently. In the step-by-step iterative process, the error supplement gradually refines the results, which means that the network has a certain interpretability rather than being a completely brute-force solution process.

3.3. Frequency-Aware Weighted Loss

There are two problems encountered in the process of solving the underdetermined reconstruction. First, only a small amount of effective information is contained in the measurements; to a large extent, it can only guide the reconstruction of low-frequency energy. Second, DL methods are updated generically during training. Because low-frequency patterns are encountered most often, the network gradually settles on an averaged solution that avoids extreme penalties from the objective function. In such a scenario, it excessively pursues the overall mean performance for stability and reliability, rather than the integrity of the reconstructed information at the fewer high-frequency points.
To address the abovementioned issues, a compensation method, the Frequency-Aware Weighted Loss (FAWL), is proposed. We expanded the outer edge of the image by symmetric mapping. Examples of the original image and the extended image $\mathrm{EI}$, with the expansion length $E = 1$, are shown in Figure 7. The edge mapping ensures that pixels at the original edge can also be handled by the following formulas, and that the calculated weight information at the edge is not ignored even though there are insufficient surrounding pixels.
The frequency-aware coefficient masks $\mathrm{CM}$ in Figure 8 show that the farther a pixel is from the target pixel $(i, j)$, the darker its color in the $\mathrm{CM}$. In other words, long-distance pixels have less impact on calculating the high-frequency parts of the information.
The $\mathrm{CM}$ was calculated as follows:
$$ D_{i,j,m,n} = \sqrt{(i-m)^2 + (j-n)^2}, \qquad \mathrm{CM}_{i,j,m,n} = \begin{cases} 1, & i = m,\ j = n \\ \dfrac{1}{D_{i,j,m,n}}, & 0 < |i-m|, |j-n| \le E \\ 0, & \text{otherwise}. \end{cases} \tag{7} $$
Here, $D_{i,j,m,n}$ denotes the distance between pixels $(i, j)$ and $(m, n)$. $\mathrm{CM}_{i,j,m,n}$ represents the coefficient mask, and only pixels within the range $E$ are perceived. $E$ can thus be understood as the range of frequency perception.
The changing information of the surrounding pixels within the range $E$ was multiplied by the $\mathrm{CM}$. The comprehensive frequency characteristic weight of the central pixel $(i, j)$ was then obtained by synthesizing the frequency gradient. The frequency-aware weighted feature ($\mathrm{FAWF}$) was calculated through Equation (8):
$$ \mathrm{FAWF}_{i,j} = \frac{ \sum_{m=i-E}^{i+E} \sum_{n=j-E}^{j+E} \mathrm{CM}_{i,j,m,n} \, \big| \mathrm{EI}_{i,j} - \mathrm{EI}_{m,n} \big| }{ (2E+1)^2 } \tag{8} $$
Figure 9 shows the original image and the results of the $\mathrm{FAWF}$ with $E = 1$, 10, and 50. Comparing the different $\mathrm{FAWF}$s shows that the range of frequency perception changes with the choice of $E$. When $E$ is relatively small, the $\mathrm{FAWF}$ is very sensitive to rapid pixel changes at edges and concentrates more strongly on high-frequency information. When $E$ is set relatively large, it acts as a broader frequency-sensing effect, paying more attention to wider-scale information; the surrounding information appears more in the form of color blocks in the $\mathrm{FAWF}$ when $E = 10$ and 50. As $E \to \infty$, the $\mathrm{FAWF}$ tends to a constant color area and represents an invalid feature without any key weighted information.
After obtaining the $\mathrm{FAWF}$, the $\mathrm{FAWL}$ was calculated as follows:
$$ \mathrm{MSE}_F(O, T) = \lambda \, \frac{ \sum_{i=0}^{W-1} \sum_{j=0}^{H-1} \big( O_{i,j} - T_{i,j} \big)^2 \, \mathrm{FAWF}_{i,j} }{ W H }, \qquad \mathrm{FAWL}(O, T) = \frac{ \mathrm{MSE}(O, T) + \mathrm{MSE}_F(O, T) }{2}, \tag{9} $$
where $W$ and $H$ represent the width and height of the output image $O$ and target image $T$, respectively. $\lambda$ balances $\mathrm{MSE}(O, T)$ and $\mathrm{MSE}_F(O, T)$; it was set to 10 so that the two terms are of the same order of magnitude, and it can be regarded as an empirical parameter.
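As a concrete illustration, the following sketch is our reading of Equations (7)–(9), not the authors' code: it computes the $\mathrm{CM}$, the $\mathrm{FAWF}$, and the $\mathrm{FAWL}$ for single-channel image tensors of shape (B, 1, H, W), with reflection padding standing in for the symmetric edge mapping of Figure 7; all function names are our own.

import torch
import torch.nn.functional as F

def coefficient_mask(E):
    """CM over a (2E+1)x(2E+1) window: 1 at the center, 1/distance elsewhere."""
    ij = torch.arange(-E, E + 1, dtype=torch.float32)
    dy, dx = torch.meshgrid(ij, ij, indexing="ij")
    d = torch.sqrt(dy ** 2 + dx ** 2)
    cm = torch.where(d > 0, 1.0 / d.clamp(min=1e-8), torch.ones_like(d))
    return cm.reshape(1, -1, 1)                        # (1, (2E+1)^2, 1)

def fawf(img, E=1):
    B, _, H, W = img.shape
    padded = F.pad(img, (E, E, E, E), mode="reflect")  # symmetric edge mapping
    patches = F.unfold(padded, kernel_size=2 * E + 1)  # (B, (2E+1)^2, H*W)
    center = img.reshape(B, 1, H * W)
    diff = (patches - center).abs() * coefficient_mask(E).to(img.device)
    return diff.sum(dim=1).reshape(B, 1, H, W) / (2 * E + 1) ** 2

def fawl(output, target, E=1, lam=10.0):
    mse = F.mse_loss(output, target)
    # FAWF weights are computed on the target, so they can be precomputed.
    mse_f = lam * ((output - target) ** 2 * fawf(target, E)).mean()
    return 0.5 * (mse + mse_f)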
Finally, combined with the IEM, the loss function was set as follows:
$$ \mathrm{loss} = \frac{1}{\sum_{i}^{N_I} \lambda_i} \sum_{i}^{N_I} \lambda_i \Big[ 2 f(\mathrm{SO}_{i,N_B}) + f(\mathrm{IRI}_i) + \sum_{j}^{N_B} f(\mathrm{GEMO}_{i,j}) \Big], \tag{10} $$
where $\lambda_i$ was set to 10 when $i = N_I$ and 5 otherwise, in order to retain the feedback balance between iterations while paying more attention to the last one. $f(\cdot)$ represents $\mathrm{FAWL}(\cdot, T)$, as defined in Equation (9). The corresponding output of the GEM is expressed as $\mathrm{GEMO}_{i,j}$.
The FAWL can be integrated with any loss function based on the MSE. It is very suitable for the proposed reconstruction method because different degrees of attention to high-frequency information can be introduced into the different GEMs. The only disadvantage is that the data preprocessing may require more preparation time compared with the direct use of the MSE; however, this time is not reflected in the inference process. In Section 4.7, we evaluate the effectiveness of the loss.

4. Experiment

4.1. Datasets

The datasets for training and testing were prepared according to the experimental details in [18]. BSDS500 [38] was applied as the training set: its 400 images were cropped into small patches of 96 × 96 pixels with a stride of 57. Each patch was augmented into eight (i.e., the original image, its flipped version, and its rotations by 90, 180, and 270 degrees, each with and without flipping). Finally, all 89,600 images were used as the training set. Set5 [39], Set14 [40], and BSD100 [41] were employed as the testing sets because they have been widely used in almost all similar tasks and works. The specific information of these datasets is listed in Table 2.
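A sketch of this preparation step (our own helper functions, assuming grayscale numpy arrays) might look as follows.

import numpy as np

def crop_patches(img, size=96, stride=57):
    """Crop size x size patches from a 2D image with the given stride."""
    H, W = img.shape
    return [img[i:i + size, j:j + size]
            for i in range(0, H - size + 1, stride)
            for j in range(0, W - size + 1, stride)]

def augment_x8(patch):
    """Expand one patch into its eight rotation/flip variants."""
    outs = []
    for k in range(4):                  # rotations by 0, 90, 180, 270 degrees
        rot = np.rot90(patch, k)
        outs.append(rot)
        outs.append(np.fliplr(rot))     # plus the flipped version
    return outs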

4.2. Index

To quantitatively evaluate the performance of different methods, the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) were employed. The PSNR is commonly used to quantify the reconstruction quality of images and videos. The PSNR was derived as:
$$ \mathrm{MSE} = \frac{1}{WH} \sum_{i=0}^{W-1} \sum_{j=0}^{H-1} \big( O_{i,j} - T_{i,j} \big)^2, \qquad \mathrm{PSNR} = 10 \log_{10} \frac{\mathrm{MAX}^2}{\mathrm{MSE}}, \tag{11} $$
where MAX represents the range of pixel values and is calculated as 255 in the case of 8-bit images. W and H are the width and height of the images, respectively. In addition, O and T are the matrices of the output image and target label image, respectively.
SSIM [42], with the best possible value being 1, pays more attention to capturing the structural features of images. It reflects the similarity of the two images, and a larger SSIM indicates better performance [43]. The SSIM was derived as:
$$ \mathrm{SSIM} = \frac{ (2 \mu_O \mu_T + c_1)(2 \sigma_{OT} + c_2) }{ (\mu_O^2 + \mu_T^2 + c_1)(\sigma_O^2 + \sigma_T^2 + c_2) }, \tag{12} $$
where $\mu_{\ast}$, $\sigma_{\ast}$, and $\sigma_{OT}$ are the mean of matrix $\ast$, the standard deviation of matrix $\ast$, and the covariance between the matrices $O$ and $T$, respectively. $c_1$ and $c_2$ are two constants to avoid division by zero and were set to $(0.01\,\mathrm{MAX})^2$ and $(0.03\,\mathrm{MAX})^2$, respectively.
When processing multichannel images, they are converted into YCbCr format, and the PSNR and SSIM are then calculated on the Y channel [18].
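A sketch of this evaluation protocol follows; the BT.601 luma conversion is our assumption of the YCbCr step, and the SSIM is delegated to scikit-image's structural_similarity for brevity.

import numpy as np
from skimage.metrics import structural_similarity

def rgb_to_y(img):
    """Approximate Y channel via the BT.601 luma weights (8-bit range)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def psnr(out, tgt, max_val=255.0):
    mse = np.mean((out.astype(np.float64) - tgt.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# Stand-in images; in practice these are the reconstruction and ground truth.
out_rgb = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
tgt_rgb = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
y_out, y_tgt = rgb_to_y(out_rgb), rgb_to_y(tgt_rgb)
print(psnr(y_out, y_tgt), structural_similarity(y_out, y_tgt, data_range=255.0))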

4.3. Settings

The settings of the training hyperparameters are detailed in Table 3. Due to the IEMs, the sampling matrix in our network structure is used $N_I$ times in the iterative process, and the parameters for the reconstruction process are updated along with changes in the sampling matrix. Therefore, the sampling matrix should not fluctuate significantly during the iterative process, so the learning rate of the sampling parameters ($L_{SAM}$) was set smaller than that of the reconstruction parameters ($L_{REC}$). The model was constructed and tuned on the open-source framework PyTorch 1.6.0 with Python 3.7. All experiments were conducted on a CPU (Intel Xeon CPU E5-2678 v3 @ 2.50 GHz) and one GPU (GeForce GTX 1080 Ti).
Four traditional methods, including discrete wavelet transform (DWT) [44], total variation (TV) [23], multi-hypothesis (MH) [21], and group sparse representation (GSR) [45], were used for comparison in terms of the run time and the quantitative evaluation of the PSNR and SSIM. We also implemented four DL-based methods, ReconNet [27], ISTANet++ [32], CSNET [18], and AMPNet [30], as the comparative baseline under the same software framework and hardware environment. The four DL works were reproduced according to their papers and tested on the unified testing sets introduced in Section 4.1. It should be noted that when the performance of a reproduced model was lower than the results provided in its paper, we took the specific performance index from the original work. In addition, ReconNet [27] needs to be cascaded with a BM3D denoiser, which takes more than ten seconds for each 256 × 256 image in practical use. Therefore, in the testing of the DL methods, the qualitative and quantitative evaluations were both conducted without any auxiliary filter or denoiser cascaded, in order to compare their performance more intuitively.
There were two hyperparameter settings in our proposed model: the number of iterations $N_I$ and the number of blocks $N_B$ in each iteration, which have a decisive impact on the run time and performance of the whole network. Two representative settings were employed to show the performance in different situations: the first was $N_I = 2$, $N_B = 1$ and the second was $N_I = 6$, $N_B = 1$, denoted EiCSNet2*1 and EiCSNet6*1, respectively. EiCSNet2*1 is the smaller and faster choice, achieving the highest speed while maintaining excellent reconstruction performance. EiCSNet6*1 achieves the best recovery performance at this stage, and its run time is still lower than that of the other SOTA methods.

4.4. Quantitative Evaluation

Table 4 lists the PSNR and SSIM performance tested on the three testing sets at seven different sampling rates (1%, 5%, 10%, 20%, 30%, 40%, and 50%). The proposed method achieved the best results on all datasets and sampling rates. Compared with the three SOTA methods (ISTANet++ [32], CSNET [18], and AMPNet [30]), our proposed compact model EiCSNet2*1 achieved similar or even slightly better PSNR and SSIM results while also having the highest speed (refer to Section 4.6). EiCSNet6*1 showed a much larger performance improvement, especially at low compression rates such as $r = 1\%$. The average PSNR increased by nearly 0.4 dB across all datasets and ratios, which is very helpful for image reconstruction quality and the practical use of CS.
Furthermore, some other recent works were taken into consideration. Because these works used different testing sets or the performance in the paper was not successfully reproduced, the indicators of the PSNR and SSIM were directly extracted from the paper [46] for comparison, as shown in Table 5. The proposed method was trained and tested under the same conditions described in [46] and showed the best performance on each testing set and under every sampling rate.
The better reconstruction performance was due to our proposed method integrating more CS characteristics, segmenting and simplifying the reconstruction task, and applying multilevel enhancement.

4.5. Qualitative Evaluation

For the qualitative evaluation, three sampling ratios (10%, 20%, and 30%) were selected for comparison, and images were randomly selected from each of the three datasets to fully demonstrate the intuitive performance of the reconstruction. To show the details of the results more clearly, we display the full images and enlarged parts simultaneously in Figure 10; the indicators of the whole images and of the detailed parts are also calculated and listed. The results of the proposed method show clearer texture information and sharper, more faithful shapes, with fewer artifacts and blurred parts in the reconstructed images than the conventional counterparts. The comparison between the different methods fully shows that our method has greater advantages in the processing of picture details.

4.6. Inference Speed

Images with a fixed size of 256 × 256 were employed to compare the run time of the different methods. The batch size $B$ was set to 1 to measure the actual run time of every single image. All images were cycled 15 times, and the last 10 runs were recorded to obtain the average time for each image. The average run times of the different methods are shown in Table 6.
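A sketch of this timing protocol (our own harness; the warm-up and timed counts follow the 15/10 scheme above) is given below.

import time
import torch

def average_runtime(model, device="cuda", warmup=5, timed=10):
    """Run 15 passes over a 256x256 input and average the last 10."""
    model = model.to(device).eval()
    x = torch.randn(1, 1, 256, 256, device=device)   # batch size B = 1
    times = []
    with torch.no_grad():
        for i in range(warmup + timed):
            torch.cuda.synchronize()
            t0 = time.perf_counter()
            model(x)
            torch.cuda.synchronize()
            if i >= warmup:                          # discard the first runs
                times.append(time.perf_counter() - t0)
    return sum(times) / len(times)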
From Table 6, it can be seen that the traditional methods, which were run on the CPU device, have severe shortcomings in speed. The run speed of our method on the GPU was much higher than that of all existing methods. The EiCSNet2*1 model achieved similar performance and a nearly seven-times speed improvement over the SOTA AMPNet (shown in Table 4 and Table 6). In addition, our large model EiCSNet6*1 achieved the best PSNR and SSIM performance while its speed remained higher than that of the SOTA methods. The speed improvement benefits from the hardware-friendly structures, such as the IEM and FERB, which greatly improve parallel efficiency and reduce unnecessary memory accesses. In each iteration, the completion and enhancement of the image increase the interpretability of the network and simplify its task difficulty; hence, good results can be obtained with fewer auxiliary operations.
We also compared the detailed processes of sampling, upsampling, and reshaping among all of the DL methods on the GPU to show the efficiency of the image restoration. From Table 7 and Figure 11, it can be seen that the run time of our method was very stable under different sampling rates. Our method avoids the time loss caused by inefficient networks or auxiliary structures through its hardware-friendly designs.

4.7. Ablation Experiment

In the ablation study, we explored the effectiveness and settings of our proposed network structure to achieve the best performance. Furthermore, we also conducted ablation experiments on the FAWL and GEM to illustrate the improvements they brought to the task of CS recovery.
The Verification of the Module Settings: We calculated the average PSNR and SSIM over the three datasets and seven compression ratios under different network structure settings. They are detailed in Table 8 and shown in Figure 12 to visualize the differences. The structure settings are identified in $N_I \times N_B$ format. We set up 10 models ($2 \times 1$, $3 \times 1$, $4 \times 1$, $5 \times 1$, $6 \times 1$, $2 \times 2$, $2 \times 3$, $3 \times 2$, $3 \times 3$, and $1 \times 5$) to explore better structure settings.
Because the nonlinear operations are almost entirely concentrated in the block convolutions, we compared network structures with the same total number of blocks ($4 \times 1$ vs. $2 \times 2$; $6 \times 1$ vs. $3 \times 2$ vs. $2 \times 3$; $5 \times 1$ vs. $1 \times 5$). It can be seen that, for the same total number of convolutions, the more iterative reinforcement was applied, the more effective the network was. Therefore, distributing the convolution calculations across more iterations is more effective; that is, setting $N_B$ to 1 offers better cost performance. This also confirms that the DL network operates by making up for the errors in the step-by-step iterative process.
From the data for $2 \times 1$, $3 \times 1$, $4 \times 1$, $5 \times 1$, and $6 \times 1$, we found that the network performance improved as $N_I$ increased. The experiments stopped at $N_I = 6$, mainly because continuing to enlarge the network would not greatly improve the performance. Furthermore, we found that EiCSNet6*1 already held advantages in both speed and performance over the existing methods.
FAWL and GEM: To verify the effectiveness of each part, we tested the final performance of the two network models at a ratio of 0.01 in the following four cases:
  • Nothing: neither the FAWL nor the GEMs were used; only the MSE loss worked;
  • W/O FAWL: no FAWL was used, but the GEMs played a part in the training process;
  • W/O GEM: no GEM was added, but the FAWL was considered;
  • ALL: both the FAWL and the GEMs acted with united strength.
The PSNR and SSIM results are tabulated in Table 9. It can be clearly seen that, with the supplementation of the FAWL, the network performance improved for the different models. Furthermore, with the addition of the GEM, the training of the network became relatively simple and stable, supporting better convergence on this task. Additionally, as the network structure became deeper, the GEM introduced a larger improvement; in contrast, in a shallower network structure, there was only a small performance increase.

5. Conclusions

In this paper, EiCSNet was proposed for better and faster CS image reconstruction. The FERB based on the GEM removes the additional auxiliary structures and improves parallel efficiency with no performance degradation. The IEM combines the characteristics and requirements of CS, which makes the whole network structure more efficient and compact and yields a significant performance improvement. The FAWL makes the image reconstruction network more effective and robust, avoiding blurred reconstruction of high-frequency information. Our method not only achieved better reconstruction performance but was also nearly seven times faster than other SOTA methods during inference. There is strong potential to run the model on a mobile terminal, which may be valuable for future CS image restoration.

Author Contributions

Methodology, Z.Z.; software, Z.Z.; investigation, F.L.; data curation, Z.W.; writing—original draft preparation, Z.Z.; writing—review and editing, H.S.; visualization, F.L.; supervision, Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CS        Compressed Sensing
DNN       Deep Neural Network
CNN       Convolutional Neural Network
EiCSNet   Efficient Iterative Neural Network for CS Reconstruction
DL        Deep Learning
GPU       Graphics Processing Unit
MRI       Magnetic Resonance Imaging
PSNR      Peak Signal-to-Noise Ratio
SSIM      Structural Similarity
SOTA      State of the Art
PL        Projected Landweber
MP        Matching Pursuit
AMP       Approximate Message Passing
OMP       Orthogonal MP
TV        Total Variation
GSR       Group Sparse Representation
MH        Multi-Hypothesis
DWT       Discrete Wavelet Transform
IEM       Iterative Enhancement Module
FERB      Fast Elementary Reconstruction Block
GEM       Gradient-Extraction Module
FAWL      Frequency-Aware Weighted Loss
ResNet    Residual Network
MB        Main Branch of ResNet
RB        Residual-link Branch of ResNet
$r$       Sampling ratio
$X$       Input signal or the real image input
$\Phi$    Sampling matrix
$Y$       CS measurement (sampled data)
$\Psi$    Sparse domain
$f_{\theta}(\cdot)$   Mapping to solve underdetermined equations with learnable parameters $\theta$
EEI       Error Enhancement Image
IRI       Initial Reconstruction Image
$N_I$     Number of iterations
$N_B$     Number of FERBs per iteration
SD        Sampled Data
SC        Sampling matrix realized by a $32 \times 32 \times 1 \times (1024r)$ convolution
UC        Upsampling matrix realized by a $1 \times 1 \times (1024r) \times 1024$ convolution
SO        Stage Output
SSD       Stage Sampled Data
EI        Extended Image
$E$       Expansion length
CM        Frequency-Aware Coefficient Mask
FAWF      Frequency-Aware Weighted Feature
GEMO      Corresponding output of the GEM

References

  1. Shannon, C. Communication in the Presence of Noise. Proc. Inst. Radio Eng. 1949, 37, 10–21.
  2. Donoho, D. Compressed sensing. IEEE Trans. Inf. Theory 2006, 52, 1289–1306.
  3. Arjoune, Y.; Kaabouch, N.; El Ghazi, H.; Tamtaoui, A. A performance comparison of measurement matrices in compressive sensing. Int. J. Commun. Syst. 2018, 31, e3576.
  4. Ye, D.; Ni, Z.; Wang, H.; Zhang, J.; Wang, S.; Kwong, S. CSformer: Bridging Convolution and Transformer for Compressive Sensing. arXiv 2021, arXiv:2112.15299.
  5. Lustig, M.; Donoho, D.L.; Santos, J.M.; Pauly, J.M. Compressed sensing MRI. IEEE Signal Process. Mag. 2008, 25, 72–82.
  6. Caiafa, C.F.; Cichocki, A. Multidimensional compressed sensing and their applications. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2013, 3, 355–380.
  7. Choi, J.W.; Shim, B.; Ding, Y.; Rao, B.; Kim, D.I. Compressed sensing for wireless communications: Useful tips and tricks. IEEE Commun. Surv. Tutor. 2017, 19, 1527–1550.
  8. Korde-Patel, A.; Barry, R.K.; Mohsenin, T. Compressive Sensing Based Space Flight Instrument Constellation for Measuring Gravitational Microlensing Parallax. Signals 2022, 3, 559–576.
  9. Gerstoft, P.; Mecklenbräuker, C.F.; Seong, W.; Bianco, M. Introduction to special issue on compressive sensing in acoustics. J. Acoust. Soc. Am. 2018, 143, 3731–3736.
  10. Liu, Y.; Li, M.; Pados, D.A. Motion-Aware Decoding of Compressed-Sensed Video. IEEE Trans. Circuits Syst. Video Technol. 2013, 23, 438–444.
  11. Azghani, M.; Karimi, M.; Marvasti, F. Multihypothesis Compressed Video Sensing Technique. IEEE Trans. Circuits Syst. Video Technol. 2016, 26, 627–635.
  12. Shi, W.; Liu, S.; Jiang, F.; Zhao, D. Video Compressed Sensing Using a Convolutional Neural Network. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 425–438.
  13. Chen, S.S.; Donoho, D.L.; Saunders, M.A. Atomic decomposition by basis pursuit. SIAM Rev. 2001, 43, 129–159.
  14. Mallat, S.G.; Zhang, Z. Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 1993, 41, 3397–3415.
  15. Bertero, M.; Boccacci, P.; De Mol, C. Introduction to Inverse Problems in Imaging; CRC Press: Boca Raton, FL, USA, 2021.
  16. Tropp, J.A.; Gilbert, A.C. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans. Inf. Theory 2007, 53, 4655–4666.
  17. Donoho, D.L.; Tsaig, Y.; Drori, I.; Starck, J.L. Sparse solution of underdetermined systems of linear equations by stagewise orthogonal matching pursuit. IEEE Trans. Inf. Theory 2012, 58, 1094–1121.
  18. Shi, W.; Jiang, F.; Liu, S.; Zhao, D. Image Compressed Sensing Using Convolutional Neural Network. IEEE Trans. Image Process. 2020, 29, 375–388.
  19. Mun, S.; Fowler, J.E. Residual reconstruction for block-based compressed sensing of video. In Proceedings of the 2011 Data Compression Conference, Snowbird, UT, USA, 29–31 March 2011; pp. 183–192.
  20. Haupt, J.; Nowak, R. Signal reconstruction from noisy random projections. IEEE Trans. Inf. Theory 2006, 52, 4036–4048.
  21. Chen, C.; Tramel, E.W.; Fowler, J.E. Compressed-sensing recovery of images and video using multihypothesis predictions. In Proceedings of the IEEE 2012 46th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 4–7 November 2012; pp. 1193–1198.
  22. Gan, L. Block compressed sensing of natural images. In Proceedings of the International Conference on Digital Signal Processing, Cardiff, UK, 1–4 July 2007; pp. 403–406.
  23. Li, C.; Yin, W.; Zhang, Y. TVAL3: TV Minimization by Augmented Lagrangian and Alternating Direction Algorithm, 2009. Available online: https://www.caam.rice.edu/optimization/L1/TVAL3/ (accessed on 1 January 2013).
  24. Saba, T.; Rehman, A.; Haseeb, K.; Bahaj, S.A.; Jeon, G. Energy-Efficient Edge Optimization Embedded System Using Graph Theory with 2-Tiered Security. Electronics 2022, 11, 2942.
  25. Wang, R.; Qin, Y.; Wang, Z.; Zheng, H. Group-Based Sparse Representation for Compressed Sensing Image Reconstruction with Joint Regularization. Electronics 2022, 11, 182.
  26. Tian, X.; Wei, G.; Wang, J. Target Location Method Based on Compressed Sensing in Hidden Semi Markov Model. Electronics 2022, 11, 1715.
  27. Kulkarni, K.; Lohit, S.; Turaga, P.; Kerviche, R.; Ashok, A. ReconNet: Non-Iterative Reconstruction of Images from Compressively Sensed Measurements. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 449–458.
  28. Mousavi, A.; Patel, A.B.; Baraniuk, R.G. A deep learning approach to structured signal recovery. In Proceedings of the 53rd Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 29 September–2 October 2015; pp. 1336–1343.
  29. Zhang, J.; Ghanem, B. ISTA-Net: Interpretable optimization-inspired deep network for image compressive sensing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1828–1837.
  30. Zhang, Z.; Liu, Y.; Liu, J.; Wen, F.; Zhu, C. AMP-Net: Denoising-Based Deep Unfolding for Compressive Image Sensing. IEEE Trans. Image Process. 2021, 30, 1487–1500.
  31. Zheng, B.; Zhang, J.; Sun, G.; Ren, X. EnGe-CSNet: A Trainable Image Compressed Sensing Model Based on Variational Encoder and Generative Networks. Electronics 2021, 10, 89.
  32. You, D.; Xie, J.; Zhang, J. ISTA-NET++: Flexible Deep Unfolding Network for Compressive Sensing. In Proceedings of the 2021 IEEE International Conference on Multimedia and Expo (ICME), Shenzhen, China, 5–9 July 2021; pp. 1–6.
  33. Li, N.; Zhou, C.C. AMPA-Net: Optimization-Inspired Attention Neural Network for Deep Compressed Sensing. In Proceedings of the 2020 IEEE 20th International Conference on Communication Technology (ICCT), Nanning, China, 28–31 October 2020; pp. 1338–1344.
  34. Sultana, F.; Sufian, A.; Dutta, P. A review of object detection models based on convolutional neural network. In Intelligent Computing: Image Processing Based Applications. Advances in Intelligent Systems and Computing; Springer: Singapore, 2020; pp. 1–16.
  35. Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53.
  36. Lan, Z. Applications of BP, Convolutional and RBF Networks. In Proceedings of the 2021 2nd International Conference on Computing and Data Science (CDS), Stanford, CA, USA, 28–29 January 2021; pp. 543–547.
  37. Liu, M.; Chen, L.; Du, X.; Jin, L.; Shang, M. Activated Gradients for Deep Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 1–13.
  38. Arbeláez, P.; Maire, M.; Fowlkes, C.; Malik, J. Contour Detection and Hierarchical Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 898–916.
  39. Bevilacqua, M.; Roumy, A.; Guillemot, C.; Alberi-Morel, M.L. Low-Complexity Single-Image Super-Resolution based on Nonnegative Neighbor Embedding. In Proceedings of the 23rd British Machine Vision Conference (BMVC), Surrey, UK, 7–10 September 2012; pp. 135.1–135.10.
  40. Zeyde, R.; Elad, M.; Protter, M. On single image scale-up using sparse-representations. In Proceedings of the International Conference on Curves and Surfaces, Avignon, France, 24–30 June 2010; pp. 711–730.
  41. Martin, D.; Fowlkes, C.; Tal, D.; Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the Eighth IEEE International Conference on Computer Vision, Vancouver, BC, Canada, 7–14 July 2001; Volume 2, pp. 416–423.
  42. Wang, Z.; Bovik, A.; Sheikh, H.; Simoncelli, E. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
  43. Wang, M.; Wei, S.; Liang, J.; Zhou, Z.; Qu, Q.; Shi, J.; Zhang, X. TPSSI-Net: Fast and Enhanced Two-Path Iterative Network for 3D SAR Sparse Imaging. IEEE Trans. Image Process. 2021, 30, 7317–7332.
  44. Mun, S.; Fowler, J.E. Block compressed sensing of images using directional transforms. In Proceedings of the 2010 IEEE International Conference on Image Processing, Hong Kong, China, 26–29 September 2010; pp. 3021–3024.
  45. Zhang, J.; Zhao, D.; Gao, W. Group-Based Sparse Representation for Image Restoration. IEEE Trans. Image Process. 2014, 23, 3336–3351.
  46. Song, J.; Chen, B.; Zhang, J. Memory-Augmented Deep Unfolding Network for Compressive Sensing. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event, China, 20–24 October 2021; pp. 4249–4258.
  47. Zhou, S.; He, Y.; Liu, Y.; Li, C.; Zhang, J. Multi-Channel Deep Networks for Block-Based Image Compressive Sensing. IEEE Trans. Multimedia 2021, 23, 2627–2640.
  48. Zhang, J.; Zhao, C.; Gao, W. Optimization-Inspired Compact Deep Compressive Sensing. IEEE J. Sel. Top. Signal Process. 2020, 14, 765–774.
Figure 1. The overview of the proposed method. The real image input $X$ is marked by a solid black wireframe. The error enhancement images $\mathrm{EEI}$ and initial reconstruction images $\mathrm{IRI}$ represent the intermediate variables and the outputs of the $N_I$ Iterative Enhancement Modules (IEMs, introduced in Section 3.2), which are marked as blue solid and red dashed boxes, respectively. Each iterative recovery contains $N_B$ Fast Elementary Reconstruction Blocks (FERBs, introduced in Section 3.1) to accomplish the nonlinear problem solution of image restoration. The outputs of the FERBs are marked with green boxes. Two kinds of CNN blocks, marked in blue and green, represent two kinds of one-layer CNN. They are employed to adjust the I/O channels to adapt to the FERBs and IEMs.
Figure 2. The flow chart of the basic block of ResNet and common CNN options. (a) The structure of ResNet, (b) the variant of ResNet applied when the I/O channels are different, and (c) the basic CNN options without RB.
Figure 3. The flow chart of GEM.
Figure 4. The pipeline of FERB. (a) Testing pipeline. (b) Training pipeline.
Figure 5. The flow chart of IEM.
Figure 6. The understructure of the upsampling and reshaping module. The first stage stretches the dimension of SD to match that of X through convolution to obtain the IRI; the second stage performs hardware-friendly reshaping based on the pixel shuffle.
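Since the caption names the pixel shuffle as the basis of the reshaping stage, a minimal PyTorch sketch of the two-stage idea may help; the block size, sampling ratio, and layer names below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class UpsampleReshape(nn.Module):
    """Sketch of a conv-then-pixel-shuffle upsampling/reshaping stage."""
    def __init__(self, block=32, ratio=0.1):
        super().__init__()
        m = int(round(block * block * ratio))        # measurements per block (assumed)
        # Stage 1: a 1x1 convolution stretches the measurement channels (SD)
        # to block*block channels, one per pixel of the block (the IRI).
        self.expand = nn.Conv2d(m, block * block, kernel_size=1)
        # Stage 2: pixel shuffle rearranges channels into a block x block patch.
        self.shuffle = nn.PixelShuffle(block)

    def forward(self, sd):                           # sd: (N, m, H/block, W/block)
        return self.shuffle(self.expand(sd))         # -> (N, 1, H, W)

x = torch.randn(1, 102, 8, 8)                        # 32*32*0.1 ≈ 102 channels
print(UpsampleReshape(32, 0.1)(x).shape)             # torch.Size([1, 1, 256, 256])
```

Because the pixel shuffle is a pure memory rearrangement, this stage avoids the transposed convolutions or explicit matrix reshapes that other methods time in Table 7.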
Figure 7. The original image and the extended image. The processed images are shown on the right. Black arrows represent the symmetric mapping.
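For concreteness, below is a minimal sketch of such a symmetric border extension, assuming NumPy's `symmetric` padding mode and a block size of 32; the pad-size computation (extending each image to the next multiple of the block size) is an assumption.

```python
import numpy as np

def extend_symmetric(img, block=32):
    """Mirror the image at its edges so both sides become multiples of `block`."""
    h, w = img.shape[:2]
    pad_h = (-h) % block                  # rows needed to reach a multiple of block
    pad_w = (-w) % block                  # columns needed
    return np.pad(img, ((0, pad_h), (0, pad_w)), mode="symmetric")

img = np.random.rand(481, 321)            # a BSD-sized grayscale stand-in
print(extend_symmetric(img).shape)        # (512, 352)
```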
Figure 8. The frequency-aware mask.
Figure 9. The resulting frequency-aware weight feature when E = 1, 10, and 50.
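As a rough illustration of how such a frequency-aware weighting could be generated, the sketch below builds a radial mask whose weights grow with distance from the DC component, so high-frequency errors are penalized more; the exponential form and the role of E here are assumptions made for visualization only, not the paper's exact formulation.

```python
import numpy as np

def freq_weight_mask(h, w, E=10.0):
    """One possible frequency-aware weight mask: heavier at high frequencies."""
    fy = np.fft.fftshift(np.fft.fftfreq(h))
    fx = np.fft.fftshift(np.fft.fftfreq(w))
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    radius /= radius.max()                 # normalize distance from DC to [0, 1]
    mask = np.exp(E * radius)              # larger E -> steeper high-frequency emphasis
    return mask / mask.mean()              # keep the average weight at 1

for E in (1, 10, 50):
    print(E, round(float(freq_weight_mask(64, 64, E).max()), 2))
```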
Figure 10. The results of the qualitative evaluation. (a) Ground truth, (b) ReconNet, (c) ISTA-Net++, (d) CSNet, (e) AMPNet, (f) EiCSNet2*1, and (g) EiCSNet6*1. Three sampling ratios (0.1, 0.2, and 0.3) and one image from each of the three datasets were selected for comparison. Regions of each image are marked with red boxes and shown enlarged below the corresponding images. The indicators of the complete and enlarged images are calculated and listed.
Figure 11. The time consumption of the sampling, upsampling, and reshaping of the different methods. All images were processed with B = 1 .
Figure 12. The average PSNR and SSIM for verifying the best model settings. The averages are calculated over the three datasets and seven ratios. The best choice, 6 × 1, is marked with the red dotted line.
Table 1. The comparison of the three basic structures. Each structure is stacked ten times, and the total times over 1000 testing rounds are listed for various input/output dimensions W × H × C.
| W × H × C | Figure 2a | Figure 2b | Figure 2c |
|---|---|---|---|
| 4 × 4 × 512 | 3.32 | 3.82 | 2.92 |
| 8 × 8 × 256 | 2.92 | 3.75 | 2.39 |
| 16 × 16 × 128 | 2.40 | 3.18 | 1.87 |
| 32 × 32 × 64 | 2.36 | 3.13 | 1.85 |
| 64 × 64 × 32 | 2.42 | 3.23 | 1.94 |
| 128 × 128 × 16 | 2.35 | 3.12 | 1.86 |
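A minimal sketch of the kind of micro-benchmark summarized in Table 1: stack a basic block ten times and time 1000 forward passes at a given W × H × C. The plain 3 × 3-convolution block below stands in for the residual-free option of Figure 2c; the exact block definitions, warm-up policy, and hardware are assumptions.

```python
import time
import torch
import torch.nn as nn

def bench(block_fn, c, hw, rounds=1000):
    """Time `rounds` forward passes of ten stacked blocks on a (1, c, hw, hw) input."""
    net = nn.Sequential(*[block_fn(c) for _ in range(10)]).eval()
    x = torch.randn(1, c, hw, hw)
    with torch.no_grad():
        net(x)                                  # warm-up pass
        t0 = time.perf_counter()
        for _ in range(rounds):
            net(x)
    return time.perf_counter() - t0

plain = lambda c: nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.ReLU())
print(f"{bench(plain, 64, 32):.2f} s for 32x32x64")
```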
Table 2. Summary of the datasets.
| Dataset | Number | Comments |
|---|---|---|
| BSDS500 | 500 | 400 for training |
| Set5 | 5 | 5 for testing, unfixed size |
| Set14 | 14 | 14 for testing, unfixed size |
| BSD100 | 100 | 100 for testing, fixed size |
Table 3. Summary of training hyperparameter settings.
| Parameter | Value |
|---|---|
| Batch size B | 64 |
| Learning rate for REC, L_REC | 0.0001 |
| Learning rate for SAM, L_SAM | 0.00001 |
| Epochs E | 300 |
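The two learning rates suggest separate parameter groups for the reconstruction (REC) and sampling (SAM) sub-networks; below is a minimal sketch of that wiring, assuming an Adam optimizer and stand-in modules named `rec` and `sam` (neither the optimizer choice nor the module names are stated here in the source).

```python
import torch

rec = torch.nn.Conv2d(1, 1, 3, padding=1)      # stand-in for the REC network
sam = torch.nn.Conv2d(1, 1, 3, padding=1)      # stand-in for the SAM network

# One optimizer, two parameter groups with the learning rates from Table 3.
optimizer = torch.optim.Adam([
    {"params": rec.parameters(), "lr": 1e-4},  # L_REC
    {"params": sam.parameters(), "lr": 1e-5},  # L_SAM
])
```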
Table 4. The performance results from the different methods. All methods were tested with the three testing datasets and the seven sampling ratios. The indicators are shown in PSNR / SSIM format. The best indicators are marked in bold for each column.
| Method | Ratio | SET5 | SET14 | BSD100 |
|---|---|---|---|---|
| DWT | 0.01 | 9.27/0.1402 | 8.97/0.0989 | 9.63/0.1067 |
| | 0.05 | 14.27/0.3559 | 14.52/0.2933 | 14.81/0.2935 |
| | 0.1 | 24.74/0.7680 | 24.16/0.6798 | 23.46/0.6343 |
| | 0.2 | 30.83/0.8749 | 28.13/0.7882 | 27.26/0.7516 |
| | 0.3 | 33.61/0.9050 | 30.38/0.8389 | 29.23/0.8108 |
| | 0.4 | 35.32/0.9249 | 31.99/0.8753 | 30.72/0.8524 |
| | 0.5 | 36.87/0.9409 | 33.54/0.9044 | 32.17/0.8862 |
| | Avg. | 26.42/0.7014 | 24.53/0.6398 | 23.90/0.6194 |
| TV | 0.01 | 15.53/0.4554 | 15.26/0.3890 | 15.98/0.3995 |
| | 0.05 | 23.16/0.6678 | 22.24/0.5815 | 23.05/0.5690 |
| | 0.1 | 27.07/0.7865 | 25.24/0.6887 | 25.46/0.6612 |
| | 0.2 | 30.45/0.8709 | 28.07/0.7844 | 27.58/0.7557 |
| | 0.3 | 32.75/0.9107 | 30.12/0.8424 | 29.27/0.8191 |
| | 0.4 | 34.89/0.9363 | 32.03/0.8837 | 30.86/0.8660 |
| | 0.5 | 36.75/0.9540 | 33.84/0.9148 | 32.46/0.9019 |
| | Avg. | 28.66/0.7974 | 26.69/0.7264 | 26.38/0.7103 |
| MH | 0.01 | 18.08/0.4472 | 17.23/0.4218 | 18.21/0.4076 |
| | 0.05 | 23.67/0.6566 | 21.64/0.6528 | 21.36/0.5169 |
| | 0.1 | 28.57/0.8211 | 26.38/0.7433 | 25.16/0.6673 |
| | 0.2 | 32.08/0.8881 | 29.47/0.8278 | 28.09/0.7746 |
| | 0.3 | 34.06/0.9158 | 31.37/0.8732 | 29.85/0.8307 |
| | 0.4 | 35.65/0.9337 | 33.03/0.9084 | 31.35/0.8695 |
| | 0.5 | 37.21/0.9482 | 34.52/0.9314 | 32.86/0.9012 |
| | Avg. | 29.90/0.8015 | 27.66/0.7655 | 26.70/0.7097 |
| GSR | 0.01 | 18.87/0.4909 | 17.87/0.4337 | 18.90/0.4431 |
| | 0.05 | 24.95/0.7270 | 22.54/0.6140 | 22.16/0.5682 |
| | 0.1 | 29.99/0.8654 | 27.50/0.7705 | 25.91/0.7071 |
| | 0.2 | 34.17/0.9257 | 31.22/0.8642 | 29.18/0.8156 |
| | 0.3 | 36.83/0.9492 | 33.74/0.9071 | 31.33/0.8723 |
| | 0.4 | 38.81/0.9626 | 35.78/0.9336 | 33.20/0.9096 |
| | 0.5 | 40.65/0.9724 | 37.66/0.9522 | 34.94/0.9359 |
| | Avg. | 32.04/0.8419 | 29.47/0.7822 | 27.95/0.7503 |
| ReconNet | 0.01 | 20.60/0.5107 | 20.06/0.4557 | 21.10/0.4609 |
| | 0.05 | 24.92/0.6608 | 23.40/0.5768 | 23.74/0.5587 |
| | 0.1 | 26.68/0.7294 | 24.74/0.6380 | 24.83/0.6147 |
| | 0.2 | 28.55/0.7944 | 26.10/0.6988 | 25.93/0.6693 |
| | 0.3 | 30.44/0.8465 | 27.74/0.7603 | 27.16/0.7262 |
| | 0.4 | 32.95/0.8985 | 29.92/0.8347 | 29.06/0.8061 |
| | 0.5 | 33.77/0.9094 | 30.54/0.8519 | 29.61/0.8255 |
| | Avg. | 28.27/0.7642 | 26.07/0.6880 | 25.92/0.6659 |
| ISTA-Net++ | 0.01 | 21.47/0.5918 | 20.69/0.5171 | 21.62/0.5089 |
| | 0.05 | 27.24/0.7933 | 25.41/0.6827 | 25.17/0.6420 |
| | 0.1 | 30.71/0.8713 | 28.14/0.7725 | 27.15/0.7274 |
| | 0.2 | 34.43/0.9246 | 31.30/0.8572 | 29.74/0.8215 |
| | 0.3 | 36.77/0.9469 | 33.61/0.9013 | 31.72/0.8763 |
| | 0.4 | 38.62/0.9609 | 35.52/0.9292 | 33.49/0.9124 |
| | 0.5 | 40.32/0.9707 | 37.19/0.9485 | 35.18/0.9381 |
| | Avg. | 32.80/0.8656 | 30.27/0.8012 | 29.15/0.7752 |
| CSNet | 0.01 | 24.18/0.6478 | 22.83/0.5630 | 23.76/0.5484 |
| | 0.05 | 29.74/0.8485 | 26.93/0.7331 | 26.78/0.6976 |
| | 0.1 | 32.59/0.9062 | 29.13/0.8169 | 28.53/0.7834 |
| | 0.2 | 36.05/0.9481 | 32.15/0.8941 | 31.05/0.8721 |
| | 0.3 | 38.25/0.9644 | 34.34/0.9297 | 33.08/0.9171 |
| | 0.4 | 40.11/0.9740 | 36.16/0.9502 | 34.91/0.9443 |
| | 0.5 | 41.79/0.9803 | 37.89/0.9631 | 36.68/0.9618 |
| | Avg. | 34.67/0.8956 | 31.35/0.8357 | 30.68/0.8178 |
| AMPNet | 0.01 | 23.48/0.6103 | 22.77/0.5502 | 23.58/0.5367 |
| | 0.05 | 29.80/0.8443 | 27.19/0.7336 | 26.81/0.6973 |
| | 0.1 | 33.28/0.9096 | 29.88/0.8247 | 28.78/0.7861 |
| | 0.2 | 36.57/0.9466 | 32.84/0.8960 | 31.31/0.8714 |
| | 0.3 | 38.89/0.9633 | 35.23/0.9364 | 33.61/0.9186 |
| | 0.4 | 41.05/0.9732 | 37.25/0.9521 | 35.53/0.9453 |
| | 0.5 | 42.72/0.9818 | 39.01/0.9648 | 37.37/0.9627 |
| | Avg. | 35.11/0.8899 | 32.02/0.8368 | 30.99/0.8169 |
| EiCSNet2*1 | 0.01 | 24.41/0.6513 | 23.14/0.5689 | 23.86/0.5504 |
| | 0.05 | 30.00/0.8495 | 27.22/0.7364 | 26.87/0.7003 |
| | 0.1 | 33.13/0.9094 | 29.63/0.8234 | 28.72/0.7871 |
| | 0.2 | 36.46/0.9478 | 32.60/0.8976 | 31.28/0.8747 |
| | 0.3 | 38.81/0.9645 | 34.85/0.9326 | 33.40/0.9200 |
| | 0.4 | 40.83/0.9748 | 36.79/0.9534 | 35.42/0.9480 |
| | 0.5 | 42.75/0.9818 | 38.71/0.9670 | 37.41/0.9661 |
| | Avg. | 35.20/0.8970 | 31.85/0.8399 | 31.00/0.8209 |
| EiCSNet6*1 | 0.01 | 24.61/0.6683 | 23.34/0.5796 | 23.97/0.5560 |
| | 0.05 | 30.48/0.8608 | 27.61/0.7472 | 27.05/0.7076 |
| | 0.1 | 33.62/0.9156 | 30.06/0.8310 | 28.96/0.7931 |
| | 0.2 | 37.11/0.9511 | 33.21/0.9027 | 31.63/0.8784 |
| | 0.3 | 39.40/0.9663 | 35.37/0.9361 | 33.81/0.9225 |
| | 0.4 | 41.33/0.9757 | 37.29/0.9561 | 35.82/0.9496 |
| | 0.5 | 43.31/0.9824 | 39.20/0.9686 | 37.83/0.9672 |
| | Avg. | 35.69/0.9029 | 32.29/0.8459 | 31.30/0.8249 |
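For reference, the PSNR/SSIM indicators used in Tables 4 and 5 follow the standard definitions [42]; below is a minimal sketch using scikit-image's implementations on stand-in 8-bit grayscale images (in practice, the test images replace the random arrays).

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Stand-in ground truth and a lightly perturbed "reconstruction".
gt = (np.random.rand(256, 256) * 255).astype(np.uint8)
noise = np.random.randint(-3, 4, gt.shape)
rec = np.clip(gt.astype(np.int16) + noise, 0, 255).astype(np.uint8)

print(f"PSNR: {peak_signal_noise_ratio(gt, rec, data_range=255):.2f} dB")
print(f"SSIM: {structural_similarity(gt, rec, data_range=255):.4f}")
```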
Table 5. The performance results of the different methods under the testing datasets SET11 [27] and BSD68 [41]. The indicators are shown in PSNR / SSIM format. The best indicators are marked in bold for each column.
| Method | Ratio | SET11 [27] | BSD68 [41] |
|---|---|---|---|
| BCS-Net [47] | 0.1 | 29.42/0.8673 | 27.98/0.8015 |
| | 0.3 | 35.63/0.9495 | 32.70/0.9301 |
| | 0.4 | 36.68/0.9667 | 35.14/0.9397 |
| | 0.5 | 39.58/0.9734 | 36.85/0.9682 |
| | Avg. | 35.33/0.9392 | 33.17/0.9099 |
| OPINE-NET+ [48] | 0.1 | 29.81/0.8904 | 27.82/0.8045 |
| | 0.3 | 35.79/0.9541 | 32.35/0.9215 |
| | 0.4 | 37.96/0.9633 | 34.95/0.9261 |
| | 0.5 | 40.19/0.9800 | 36.35/0.9660 |
| | Avg. | 35.97/0.9469 | 32.87/0.9045 |
| MADUN [46] | 0.1 | 29.91/0.8986 | 28.15/0.8229 |
| | 0.3 | 36.94/0.9676 | 33.35/0.9379 |
| | 0.4 | 39.15/0.9772 | 35.42/0.9606 |
| | 0.5 | 40.77/0.9832 | 37.11/0.9730 |
| | Avg. | 36.69/0.9567 | 33.50/0.9236 |
| EiCSNet2*1 | 0.1 | 30.42/0.9177 | 28.96/0.8517 |
| | 0.3 | 36.46/0.9721 | 33.69/0.9461 |
| | 0.4 | 38.78/0.9817 | 35.68/0.9650 |
| | 0.5 | 40.94/0.9879 | 37.68/0.9773 |
| | Avg. | 36.65/0.9648 | 34.00/0.9350 |
| EiCSNet6*1 | 0.1 | 30.95/0.9240 | 29.20/0.8560 |
| | 0.3 | 37.19/0.9750 | 34.09/0.9480 |
| | 0.4 | 39.50/0.9833 | 36.12/0.9665 |
| | 0.5 | 41.67/0.9886 | 38.11/0.9782 |
| | Avg. | 37.33/0.9677 | 34.38/0.9372 |
Note: The indicator values of BCS-Net, OPINE-NET+, and MADUN were taken from [46]. Since [46] does not report PSNR/SSIM values for the sampling ratios of 0.01, 0.05, and 0.2, only the remaining ratios are compared; for this comparison, the proposed method was trained and tested on the same datasets as in [46].
Table 6. The run time of the different methods.
| Methods | Ratio 0.01 | Ratio 0.1 |
|---|---|---|
| DWT | 10.3176/- | 10.5539/- |
| TV | 2.4006/- | 2.7405/- |
| MH | 23.1006/- | 19.0405/- |
| GSR | 235.6297/- | 230.4755/- |
| ReconNet | 0.5193/0.0244 | 0.5258/0.0289 |
| ISTA-Net++ | 0.5550/0.0356 | 0.5785/0.0377 |
| CSNet | 0.8960/0.0262 | 0.9024/0.0287 |
| AMPNet | 0.5440/0.0361 | 0.5440/0.0396 |
| EiCSNet2*1 | 0.1737/0.0052 | 0.1742/0.0054 |
| EiCSNet6*1 | 0.4970/0.0141 | 0.4915/0.0141 |
Note: The run time is shown in the format of CPU(s)/GPU(s). All images were processed with B = 1.
Table 7. The time consumption of the sampling, upsampling, and reshaping of different methods. All images were processed with B = 1 .
| Method | Stage | Ratio 0.01 | Ratio 0.1 |
|---|---|---|---|
| ReconNet | sampling | - | - |
| | upsampling | 0.00075 | 0.00076 |
| | reshaping | 0.00330 | 0.00377 |
| | All | 0.00405 | 0.00453 |
| ISTA-Net++ | sampling | 0.00392 | 0.00426 |
| | upsampling | 0.00140 | 0.00178 |
| | reshaping | 0.00511 | 0.00581 |
| | All | 0.01043 | 0.01185 |
| CSNet | sampling | 0.00031 | 0.00026 |
| | upsampling | 0.00016 | 0.00016 |
| | reshaping | 0.00530 | 0.00551 |
| | All | 0.00577 | 0.00593 |
| AMPNet | sampling | 0.00085 | 0.00094 |
| | upsampling | 0.01142 | 0.01126 |
| | reshaping | 0.00619 | 0.00697 |
| | All | 0.01846 | 0.01917 |
| EiCSNet2*1 | sampling | 0.00039 | 0.00036 |
| | upsampling | 0.00025 | 0.00026 |
| | reshaping | 0.00018 | 0.00019 |
| | All | 0.00082 | 0.00081 |
| EiCSNet6*1 | sampling | 0.00096 | 0.00087 |
| | upsampling | 0.00060 | 0.00066 |
| | reshaping | 0.00045 | 0.00050 |
| | All | 0.00201 | 0.00203 |
Note: Run times are given in seconds. All images were processed with B = 1.
Table 8. The average PSNR and SSIM for verification of the best model settings. Each PSNR and SSIM is the average of the three datasets and seven ratios.
| Settings | PSNR | SSIM | Settings | PSNR | SSIM |
|---|---|---|---|---|---|
| 2 × 1 | 32.6808 | 0.8526 | 2 × 2 | 32.7042 | 0.8535 |
| 3 × 1 | 32.8703 | 0.8550 | 2 × 3 | 32.7256 | 0.8541 |
| 4 × 1 | 33.0155 | 0.8565 | 3 × 2 | 32.9374 | 0.8561 |
| 5 × 1 | 33.0381 | 0.8572 | 3 × 3 | 32.9642 | 0.8566 |
| 6 × 1 | 33.0947 | 0.8579 | 1 × 5 | 32.2708 | 0.8487 |
Table 9. The average PSNR and SSIM for the ablation experiment. Each PSNR and SSIM was averaged over the three datasets.
| Settings | 2 × 1 PSNR | 2 × 1 SSIM | 6 × 1 PSNR | 6 × 1 SSIM |
|---|---|---|---|---|
| Nothing | 23.65 | 0.5839 | 23.79 | 0.5906 |
| W/O F | 23.75 | 0.5867 | 23.92 | 0.5957 |
| W/O GE | 23.72 | 0.5837 | 23.86 | 0.5917 |
| ALL | 23.80 | 0.5902 | 23.97 | 0.6013 |