Simultaneous Audio Encryption and Compression Using Compressive Sensing Techniques

Moreno-Alvarado, Rodolfo; Rivera-Jaramillo, Eduardo; Nakano, Mariko; Perez-Meana, Hector

doi:10.3390/electronics9050863


¹ División de Ingeniería y Ciencias Exactas, Universidad Anahuac Mayab, Merida Yucatan 97302, Mexico

² Mechanical and Electrical Engineering School, Culhuacan Campus, Instituto Politécnico Nacional, Mexico City 04440, Mexico

* Author to whom correspondence should be addressed.

Electronics 2020, 9(5), 863; https://doi.org/10.3390/electronics9050863

Submission received: 6 April 2020 / Revised: 14 May 2020 / Accepted: 15 May 2020 / Published: 22 May 2020

(This article belongs to the Section Circuit and Signal Processing)


Abstract

The development of coding schemes capable of simultaneously encrypting and compressing audio signals is a subject of active research because of the increasing need to transmit sensitive audio information over insecure communication channels. Several schemes have been developed; some of them first compress the digital information and subsequently encrypt the result. These schemes compress and encrypt the information efficiently. However, they may compromise the information, because it can be accessed before encryption. To overcome this problem, a compressive sensing-based system to simultaneously compress and encrypt audio signals is proposed, in which the audio signal is segmented into frames of 1024 samples, and each frame is transformed into a sparse frame using the discrete cosine transform (DCT). Each frame is then multiplied by a different sensing matrix generated using the chaotic mixing scheme. This allows the proposed scheme to satisfy the extended Wyner secrecy (EWS) criterion. The evaluation results, obtained using several genres of audio signals, show that the proposed system can simultaneously compress and encrypt audio signals while satisfying the EWS criterion.

Keywords:

extended Wyner secrecy (EWS); compressive sensing (CS); M-sequence; chaotic mixing; Pearson correlation coefficient; UACI; NSCR

1. Introduction

The large amount of digital information transmitted over insecure channels has led to the need for efficient schemes that increase the amount of information that can be transmitted over existing insecure communication channels while also improving the security of the transmitted information. To meet these two requirements, many efforts have been undertaken to develop encoding schemes able to simultaneously compress and encrypt audio signals before their transmission over insecure communication channels [1,2]. These topics have attracted the attention of a significant number of researchers, leading to the development of several efficient schemes that first compress and subsequently encrypt the compressed information. These schemes intuitively simplify the encryption task, because the redundant information has been eliminated during the compression operation. However, because the compressed information is stored before encryption, its security may be compromised, as it can be accessed before the encryption is performed. To overcome this problem, several schemes have been proposed in which the information is first encrypted and the result is then compressed [1]. The main disadvantage of such schemes is that a lossless compression scheme must be used to avoid destroying the encrypted information. A suitable approach to reduce these limitations is the development of algorithms that simultaneously encrypt and compress audio signals, such as those based on compressive sensing [3,4], which is a suitable scheme for the encryption of digital information [5].

Because of its growing number of practical applications, compressive sensing has attracted the attention of a large number of researchers working in fields such as audio, image, and video processing [6,7]. As a result, several algorithms based on compressive sensing techniques that simultaneously encrypt and compress digital information have been proposed in recent years [4,8,9], because these schemes can meet both requirements simultaneously using simple matrix operations. In encryption systems based on compressive sensing, the encoded signal is obtained by first transforming the input frame into a sparse one using the discrete cosine transform (DCT). The transformed frame is then multiplied by a sensing matrix whose number of rows is much smaller than its number of columns, such that, when a compressive sensing (CS) approach is used, the audio signal is simultaneously compressed and encrypted. Using CS, an encrypted signal is obtained because, to properly decode the encoded signal, the sensing matrix used for decoding must be the same as that used in the encoding stage [5]. Therefore, the sensing matrix can be considered the private key of the CS-based encryption-compression system [4,8,9].

The CS-based joint compression and encryption system has several advantages. Decoding is carried out using only standard matrix operations, and thus it generally has lower computational complexity than other previously proposed systems [4,8,9]. Because the signals to be encoded must first be segmented and transformed into sparse signals before applying CS, each audio segment can be independently encoded and sent to the receiver side in any order. This is an important advantage of the CS-based system, shared with other block-based encryption schemes used to jointly compress and encrypt any kind of audio signal. However, several drawbacks must be solved to develop trustworthy CS-based audio compression and encryption systems. First, to properly recover the original signal, the sensing matrices used in the transmission and reception stages must be the same [5]; therefore, a mechanism is needed to generate the same sensing matrix in both the encryption and decryption stages. Second, because both the encryption and decryption stages use only linear operations, the security of the CS-based encryption system must be ensured [3,10,11,12].

Taking into account the requirements described above, this paper proposes a compression-encryption system based on CS, in which the audio signal is first segmented into L non-overlapping frames of 1024 samples. Each frame is then independently compressed and encrypted using a different sensing matrix, which is generated using three secret keys provided by the user. These secret keys, encrypted with a public key cryptography algorithm, are transmitted to the receiver side before the digital information is sent. The rest of this paper is organized as follows. Section 2 presents the development of the proposed CS-based system, together with a review of compressive sensing and an analysis of the security of the proposed system. Section 3 provides a detailed evaluation of the proposed system, and, finally, Section 4 contains the conclusions of this research.

2. Proposed System for Simultaneous Compression and Encryption of Audio Signals

This section describes the proposed encoding and decoding system, shown in Figure 1, which allows the simultaneous encryption and compression of audio signals. In the proposed structure, the incoming audio signal X(k) is segmented into frames of n samples, which are encrypted using a CS approach with a compression rate of n/m, where m is the number of samples in the compressed frame. To this end, the user first enters the values n and m into the encoder stage, together with three secret keys, k₁, k₂, and k₃, which are then used to generate the sensing matrix in the transmission stage. Because these secret keys are also required to generate the sensing matrix in the decoder stage, they are transmitted to the receiver side encrypted using the Rivest–Shamir–Adleman (RSA) public key algorithm. This allows the proposed system to operate in a multiuser mode, even with different frame sizes and compression rates for each user. On the receiver side, the sensing matrix required for decoding is generated using the secret keys k₁, k₂, and k₃.

After the random matrix A is generated, the sensing matrix A₀, of size m × n, used for compressing and encrypting the first block of the audio signal, is obtained using the chaotic mixing approach described in Section 2.1. Next, the first block of the input signal, given by X₀(k) = X(k), k = 1, 2, …, n, is extracted and transformed using the DCT to obtain a sparse representation of this block, S₀. Then, S₀ is multiplied by the sensing matrix A₀ to generate the compressed and encrypted version of the first block of the input signal, y₀, which is transmitted to the receiver side. In general, the input signal X(k) is segmented into non-overlapping blocks given by X_i(k) = X(k + in), k = 1, 2, …, n; i = 0, 1, 2, …, which are then transformed using the DCT to generate sparse frames, S_i. Next, using the chaotic mixing method [13,14], the m × n sensing matrix of the i-th frame, A_i, is generated from the random matrix A. This allows a different sensing matrix to be generated for each frame without significantly increasing the computational complexity, while satisfying the extended Wyner secrecy (EWS) criterion [12]. Finally, the sparse vector S_i, obtained using the DCT, is multiplied by the sensing matrix A_i to obtain the encrypted frame y_i, with a compression rate of n/m, which is sent to the receiver side.

In the receiver stage, shown in Figure 1b, the received information is decoded using the RSA decryption module, which recovers the values of m and n, as well as the user's secret keys k₁, k₂, and k₃. These parameters are then used to generate the matrix A, and then, using the chaotic mixing method, the sensing matrix for the i-th block, A_i, is generated in the same way as in the encoding stage. Next, the sensing matrix A_i and the received frame y_i are fed into the CS recovery stage to obtain Ŝ_i. Then, the inverse DCT (IDCT) of Ŝ_i is computed to obtain X̂_i, which is concatenated with the previously decoded frames to form the decoded signal. The encoding and decoding process described above is performed on each frame of the input signal X. The following sections describe each stage of the proposed system.

2.1. Sensing Matrix Generation

The sensing matrix, A, required in a CS-based audio compression system, becomes the secret key of the proposed encryption system. To obtain sufficiently accurate signal decoding, the sensing matrix A must satisfy the restricted isometry property (RIP) given by [6,15,16]

(1 - δ_{k}) ∥ S ∥_{2}^{2} \leq ∥ A S ∥_{2}^{2} \leq (1 + δ_{k}) ∥ S ∥_{2}^{2}, \quad \text{for every } k\text{-sparse } S,

(1)

where 0 ≤ δ_k < 1 is the restricted isometry constant of order k.

Thus, because

∥ A S ∥_{2}^{2} = {(A S)}^{T} A S = S^{T} A^{T} A S

(2)

and assuming that AᵀA = σ²I, from Equation (1), it follows that

(1 - δ_{k}) ∥ S ∥_{2}^{2} \leq σ^{2} ∥ S ∥_{2}^{2} \leq (1 + δ_{k}) ∥ S ∥_{2}^{2} .

(3)

Thus, A satisfies the RIP if σ² = 1.
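The condition σ² = 1 can be spot-checked numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure: it draws a Gaussian matrix with i.i.d. N(0, 1/m) entries (the 1/m variance scaling is an assumption, chosen so that E[AᵀA] = I), applies it to random k-sparse vectors, and reports the ratio ∥AS∥²/∥S∥², which the RIP requires to lie in [1 − δ_k, 1 + δ_k]. Sampling random vectors only probes typical behaviour; it does not certify the RIP, which must hold for every k-sparse vector. The helper names and the values of m and k are hypothetical.

```python
import math
import random


def gaussian_sensing_matrix(m, n, rng):
    # i.i.d. N(0, 1/m) entries (assumed scaling), so that E[A^T A] = I.
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(m)]


def random_sparse_vector(n, k, rng):
    # k nonzero entries at random positions, with standard normal values.
    s = [0.0] * n
    for idx in rng.sample(range(n), k):
        s[idx] = rng.gauss(0.0, 1.0)
    return s


def isometry_ratio(a, s):
    # ||A s||_2^2 / ||s||_2^2, computed only over the support of s.
    support = [j for j, v in enumerate(s) if v != 0.0]
    a_s = [sum(row[j] * s[j] for j in support) for row in a]
    return sum(v * v for v in a_s) / sum(s[j] * s[j] for j in support)


rng = random.Random(2020)
m, n, k = 256, 1024, 10          # n = 1024 as in the paper; m and k are illustrative
A = gaussian_sensing_matrix(m, n, rng)
ratios = [isometry_ratio(A, random_sparse_vector(n, k, rng)) for _ in range(20)]
print(f"min ratio = {min(ratios):.3f}, max ratio = {max(ratios):.3f}")
```

For a fixed S, each ratio is distributed as χ²_m/m, so with m = 256 the values typically stay within roughly ±0.2 of 1.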

If the sensing matrix A satisfies (3), the signal S can be accurately recovered [6]. In the encoding stage, the sensing matrix A is constructed using a pseudo-random number generator whose initial value (seed) is the key k₁ provided by the user, while in the decoding stage, the same user key, k₁, is used to generate the sensing matrix A required to decode S. To this end, L²/2 pairs of uniformly distributed random numbers are first generated, that is, L/2 pairs (U_j, V_j), j = 0, 1, …, L/2 − 1, for each row of A [16]. Next, using the Marsaglia polar method [16], these L²/2 pairs are converted into the L² Gaussian-distributed random numbers that form the L × L matrix A. In this method, U_j and V_j are uniformly distributed in (−1, 1), pairs for which S ≥ 1 or S = 0 are rejected and redrawn, and, for each accepted pair,

S = U_{j}^{2} + V_{j}^{2},

(4)

then

A (p, 2 j) = U_{j} \sqrt{\frac{- 2 \ln (S)}{S}}, j = 0, 1, \dots, \frac{L}{2} - 1; p = 0, 1, \dots, L - 1

(5)

A (p, 2 j + 1) = V_{j} \sqrt{\frac{- 2 \ln (S)}{S}}, j = 0, 1, \dots, \frac{L}{2} - 1; p = 0, 1, \dots, L - 1

(6)

The matrix A, described by (5) and (6), is used to generate the sensing matrices required to compress and encrypt the input signal, satisfying both the RIP [15,17] and the EWS criterion [10].
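The key-seeded construction of A in (4)–(6) can be sketched in Python as follows. This is a minimal illustration, assuming Python's `random.Random` (a Mersenne Twister) as the pseudo-random number generator seeded with k₁; the paper does not specify which generator is used, and `marsaglia_pair` and `random_matrix_from_key` are hypothetical names. The essential property is that the same k₁ reproduces the same A at the encoder and at the decoder.

```python
import math
import random


def marsaglia_pair(rng):
    # Marsaglia polar method: draw (U, V) uniformly in [-1, 1]^2, reject points
    # outside the open unit disc (or at the origin), then transform them.
    while True:
        u = rng.uniform(-1.0, 1.0)
        v = rng.uniform(-1.0, 1.0)
        s = u * u + v * v                      # Eq. (4)
        if 0.0 < s < 1.0:
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return u * factor, v * factor      # Eqs. (5) and (6)


def random_matrix_from_key(k1, size):
    # Hypothetical helper: fills the size x size matrix A row by row with
    # L/2 Gaussian pairs per row, using a PRNG seeded with the user key k1.
    if size % 2:
        raise ValueError("size (L) must be even")
    rng = random.Random(k1)
    a = [[0.0] * size for _ in range(size)]
    for p in range(size):
        for j in range(size // 2):
            a[p][2 * j], a[p][2 * j + 1] = marsaglia_pair(rng)
    return a


A_encoder = random_matrix_from_key(12345, 64)   # illustrative key and size
A_decoder = random_matrix_from_key(12345, 64)
print(A_encoder == A_decoder)
```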

To satisfy the EWS criterion [10], a different sensing matrix, which must also satisfy the RIP, must be used for each frame. To generate a different sensing matrix for each frame, the chaotic mixing scheme [13,14] is applied to the random matrix A, as described in Figure 2 and Equations (7)–(15).

The chaotic mixing method is used because it modifies only the positions of the matrix elements and not their values; that is, it performs a mapping from M_L to M_L. To achieve this goal, the location (x, y) of the element A(x, y) of the matrix is modified using the matrix B_L given by [13,14]

B_{L} = \begin{pmatrix} 1 & 1 \\ k_{2} & k_{2} + 1 \end{pmatrix},

(7)

where B_L satisfies det(B_L) = 1 and trace(B_L) = λ₁ + 1/λ₁, where λ₁ is the largest eigenvalue of B_L. The largest and smallest eigenvalues of B_L are given by [13,14]

λ_{1} = \frac{1}{2} [k_{2} + 2 + \sqrt{4 k_{2} + k_{2}^{2}}]

(8)

and

λ_{2} = \frac{1}{2} [k_{2} + 2 - \sqrt{4 k_{2} + k_{2}^{2}}] .

(9)

Next, from (8) and (9), it follows that

λ_{1} λ_{2} = \frac{1}{4} [k_{2} + 2 + \sqrt{4 k_{2} + k_{2}^{2}}] [k_{2} + 2 - \sqrt{4 k_{2} + k_{2}^{2}}] = \frac{(k_{2} + 2)^{2} - (k_{2}^{2} + 4 k_{2})}{4} = 1,

(10)

\operatorname{trace} (B) = k_{2} + 2 = λ_{1} + λ_{2} = λ_{1} + (1 / λ_{1}),

(11)

and

\det (B) = (k_{2} + 1) - k_{2} = 1 .

(12)
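The eigenvalue expressions (8) and (9) follow from the characteristic polynomial of B_L; the intermediate steps, which the derivation above leaves implicit, are:

```latex
\det(B_{L} - \lambda I) = (1 - \lambda)(k_{2} + 1 - \lambda) - k_{2}
                        = \lambda^{2} - (k_{2} + 2)\lambda + 1 = 0,
\lambda_{1,2} = \frac{(k_{2} + 2) \pm \sqrt{(k_{2} + 2)^{2} - 4}}{2}
              = \frac{1}{2}\left[k_{2} + 2 \pm \sqrt{k_{2}^{2} + 4 k_{2}}\right],
\lambda_{1} \lambda_{2} = 1 = \det(B_{L}), \qquad
\lambda_{1} + \lambda_{2} = k_{2} + 2 = \operatorname{trace}(B_{L}).
```

For any positive integer k₂, the trace exceeds 2, so λ₁ > 1 > λ₂ = 1/λ₁ > 0; and because det(B_L) = 1, the map (13) is invertible modulo L, so it only permutes element positions.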

Thus, because B_L is nonsingular, B_L⁻¹ exists, and from (8)–(12), it follows that the positions of the elements of A(x, y) are updated using B_L as follows

\begin{pmatrix} x_{r + 1} \\ y_{r + 1} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ k_{2} & k_{2} + 1 \end{pmatrix} \begin{pmatrix} x_{r} \\ y_{r} \end{pmatrix} \bmod L, \quad r = 1, 2, \dots, k_{3},

(13)

x_{r + 1} = (x_{r} + y_{r}) \bmod L, \quad r = 1, 2, \dots, k_{3},

(14)

y_{r + 1} = (k_{2} x_{r} + (k_{2} + 1) y_{r}) \bmod L, \quad r = 1, 2, \dots, k_{3},

(15)

where k₂ and k₃ are secret keys provided by the user, and the key k₃ determines the number of iterations. Thus, Equations (14) and (15) are iterated for r = 1, 2, …, k₃, where x_r, y_r ∈ {0, 1, …, L − 1} denote the position of the element (x, y) during the r-th iteration. In this way, the chaotic mixing method produces a new matrix for each frame: the random matrix A(x, y) of the i-th frame is obtained by applying (14) and (15) k₃ times to the random matrix of the (i − 1)-th frame. Then, the m × n sensing matrix of the i-th frame, A_i(x, y), is given by A_i(x, y) = A(x, y), x = 0, 1, …, m − 1; y = 0, 1, …, n − 1.
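The per-frame position scrambling of (14) and (15) can be sketched as below. This is an illustration under assumptions, not the authors' code: the helper names are hypothetical, the mixing is applied once per frame (including before the first frame, as described for A₀), and each frame's m × n sensing matrix is taken as the top-left block of the mixed L × L matrix, which requires m, n ≤ L.

```python
def chaotic_mix(a, k2, k3):
    # Move every entry of the L x L matrix `a` from (x, y) to the position
    # reached after k3 iterations of Eqs. (14)-(15). Values are unchanged;
    # because det(B_L) = 1, the map is a bijection on positions modulo L.
    size = len(a)
    mixed = [[0.0] * size for _ in range(size)]
    for x0 in range(size):
        for y0 in range(size):
            x, y = x0, y0
            for _ in range(k3):
                # Tuple assignment uses the old x in both updates.
                x, y = (x + y) % size, (k2 * x + (k2 + 1) * y) % size
            mixed[x][y] = a[x0][y0]
    return mixed


def frame_sensing_matrices(a, k2, k3, m, n, frames):
    # Yields A_0, A_1, ...: the matrix is re-mixed before each frame and its
    # leading m x n block is used as that frame's sensing matrix.
    current = a
    for _ in range(frames):
        current = chaotic_mix(current, k2, k3)
        yield [row[:n] for row in current[:m]]
```

Because the map only moves entries, every A_i keeps the Gaussian entries of A; k₂ fixes the mixing matrix and k₃ the number of iterations, so both must match at the receiver.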

2.2. Public Key Encryption of the Secret User Keys

In the encoding stage of the proposed system, the compression parameters m and n, together with the user's secret keys k₁, k₂, and k₃, are first encrypted with the RSA public key algorithm [14,18,19,20,21], whose security depends on the difficulty of factoring large integers into their prime factors. For the transmitter A to be able to send this information to the receiver B using the RSA algorithm, the receiver B must send to the transmitter A the product N_B = p_B q_B, where p_B and q_B are two secret prime numbers of B, together with a non-secret public encryption exponent E_B. Thus, before sending the secret keys, the transmitter A must first receive N_B and E_B from the receiver B. Additionally, the receiver generates its secret decryption exponent D_B. Then, using the parameters received from B, the encoder stage transmits the secret keys of the proposed scheme as [18,21]

Y_{i} = (K_{i})^{E_{B}} \bmod N_{B}, \quad i = 1, 2, 3 .

(16)

Next, using D_B, the receiver decrypts the secret keys sent by the transmitter, which are used for generating the sensing matrix A_i, as follows [18,21]:

K_{i} = (Y_{i})^{D_{B}} \bmod N_{B}, \quad i = 1, 2, 3,

(17)

where D_B satisfies the relation

E_{B} D_{B} \equiv 1 \pmod{(p_{B} - 1)(q_{B} - 1)},

(18)

where p_B and q_B are the two secret prime numbers generated on the receiver side. Finally, the receiver sends the transmitter a confirmation message, given by [18,21]

C = M^{D_{B}} \bmod N_{B},

(19)

which is decrypted by the transmitter as follows [18,21]

M = C^{E_{B}} \bmod N_{B} .

(20)

Because E_B is public, any member of the network can recover the message M; however, because only the receiver B knows D_B, only B can have sent the confirmation C of the message M.
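The key exchange of (16)–(20) can be illustrated with textbook RSA and deliberately tiny primes. This is a toy sketch for clarity only: the primes, exponent, key values, and function names are hypothetical, there is no padding, and real deployments use moduli of at least 2048 bits with a padding scheme such as OAEP. Each transmitted value must be smaller than N_B, and `pow(e, -1, phi)` requires Python 3.8 or later.

```python
def rsa_keygen(p, q, e):
    # Receiver B: public modulus N_B = p q and secret exponent D_B (toy sizes).
    n_b = p * q
    phi = (p - 1) * (q - 1)
    d_b = pow(e, -1, phi)          # Eq. (18): E_B * D_B = 1 mod (p - 1)(q - 1)
    return n_b, e, d_b


def rsa_encrypt(value, e_b, n_b):
    # Eq. (16): Y_i = K_i^E_B mod N_B, valid only for 0 <= value < N_B.
    if not 0 <= value < n_b:
        raise ValueError("value must be smaller than N_B")
    return pow(value, e_b, n_b)


def rsa_decrypt(cipher, d_b, n_b):
    # Eq. (17): K_i = Y_i^D_B mod N_B
    return pow(cipher, d_b, n_b)


n_b, e_b, d_b = rsa_keygen(61, 53, 17)
keys = [1234, 7, 3]                                 # hypothetical k1, k2, k3
ciphers = [rsa_encrypt(k, e_b, n_b) for k in keys]  # sent by transmitter A
recovered = [rsa_decrypt(c, d_b, n_b) for c in ciphers]

# Eqs. (19)-(20): B produces C with D_B; anyone can check it with public E_B.
m_conf = 42
c_conf = pow(m_conf, d_b, n_b)
print(recovered, pow(c_conf, e_b, n_b))
```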

2.3. Encrypted and Compressed Signal

In the transmission stage, the input audio signal, X(k), is first segmented into a set of non-overlapping frames, such that its i-th frame is given by X_i(k) = X(k + in), where 0 ≤ k ≤ n − 1. X_i is then transformed to the DCT domain, which provides a k-sparse representation with only k ≪ n nonzero terms, that is, S_i = Ψ X_i.

Finally, the encrypted and compressed signal is computed by multiplying the sparse vector S_i by the i-th sensing matrix A_i, of size m × n. Thus, the i-th frame of the transmitted signal is given by [6,17,22]

y = A_{i} S_{i} = A_{i} Ψ X_{i},

(21)

where Ψ denotes the DCT matrix, whose rows are the DCT basis functions. According to compressive sensing theory, S_i can be reconstructed if the input frame is represented with at least m samples, where m is of the order of k log n [6,17,22].
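The transmitter side of (21) can be sketched in plain Python. The sketch assumes an orthonormal DCT-II as Ψ (so that Ψ⁻¹ = Ψᵀ), zero-pads a trailing partial frame (the paper does not say how a final incomplete frame is handled), and accepts any m × n matrix A_i as input; all helper names are hypothetical.

```python
import math


def dct_matrix(n):
    # Orthonormal DCT-II matrix Psi (n x n): S = Psi X, and Psi^{-1} = Psi^T.
    psi = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        psi.append([scale * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                    for t in range(n)])
    return psi


def mat_vec(a, x):
    # Plain matrix-vector product.
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def split_frames(signal, n):
    # Non-overlapping frames X_i(k) = X(k + i n), 0 <= k <= n - 1; a trailing
    # partial frame is zero-padded (an assumption of this sketch).
    frames = []
    for start in range(0, len(signal), n):
        frame = list(signal[start:start + n])
        frames.append(frame + [0.0] * (n - len(frame)))
    return frames


def encode_frame(frame, psi, a_i):
    # Eq. (21): y_i = A_i S_i = A_i Psi X_i (m values for n input samples).
    return mat_vec(a_i, mat_vec(psi, frame))
```

A constant frame illustrates the sparsifying role of Ψ: all its energy falls into the first DCT coefficient.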

2.4. Decrypted and Decompressed Signal

The transmitted signal is decrypted and decompressed by ℓ₁-norm minimization: if the transmitted frame S_i is sparse enough, the signal recovered in this way is, with very high probability, almost equal to the original one [6,17,22]. That is, for a given sensing matrix A_i ∈ R^{m×n} and a received vector y_i ∈ R^m, the i-th transmitted sparse vector, Ŝ_i, can be estimated by solving [6,17,22]

\min_{\hat{S}_{i} \in R^{n}} ∥ \hat{S}_{i} ∥_{1} \quad \text{subject to} \quad y_{i} = A_{i} \hat{S}_{i},

(22)

a problem that is approximated here using orthogonal matching pursuit (OMP) [6]. Finally, the transmitted frame X_i is estimated by computing the inverse DCT of Ŝ_i, that is, X̂_i = Ψ⁻¹ Ŝ_i. Because k, m ≪ n and S_i is sparse, it can be recovered with about k × n × m operations [6]; thus, the CS-based scheme is a highly competitive compression-encryption system. However, as in any other encryption system, its security is of great importance. For this reason, the next subsection presents a security analysis of the CS-based encryption system.
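The recovery step can be sketched with a plain-Python orthogonal matching pursuit. This is a generic textbook OMP, not the authors' implementation: it assumes the sparsity level k is known, refits the selected columns by least squares through the normal equations solved with Gaussian elimination, and uses hypothetical names such as `omp`. OMP is a greedy method that approximates the sparse solution rather than solving the ℓ₁ problem (22) exactly.

```python
import math
import random


def _solve(g, b):
    # Gaussian elimination with partial pivoting for the small system g z = b.
    size = len(b)
    aug = [row[:] + [b_i] for row, b_i in zip(g, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            f = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, size))
        z[r] = (aug[r][size] - tail) / aug[r][r]
    return z


def omp(a, y, k, tol=1e-10):
    # Greedily add the column most correlated with the residual (normalized by
    # the column norm), then refit all selected columns by least squares.
    m, n = len(a), len(a[0])
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in cols]
    support, coef, residual = [], [], list(y)
    for _ in range(k):
        if math.sqrt(sum(r * r for r in residual)) < tol:
            break
        scores = [abs(sum(c * r for c, r in zip(cols[j], residual))) / norms[j]
                  if j not in support else -1.0 for j in range(n)]
        support.append(max(range(n), key=scores.__getitem__))
        # Normal equations (A_S^T A_S) z = A_S^T y for the current support.
        g = [[sum(p * q for p, q in zip(cols[u], cols[v])) for v in support]
             for u in support]
        b = [sum(p * q for p, q in zip(cols[u], y)) for u in support]
        coef = _solve(g, b)
        fit = [sum(coef[t] * cols[j][i] for t, j in enumerate(support))
               for i in range(m)]
        residual = [y_i - f_i for y_i, f_i in zip(y, fit)]
    s_hat = [0.0] * n
    for t, j in enumerate(support):
        s_hat[j] = coef[t]
    return s_hat


# Illustrative run: a 3-sparse vector measured with a 32 x 64 Gaussian matrix.
rng = random.Random(1)
a_demo = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(32)]
s_true = [0.0] * 64
for j in rng.sample(range(64), 3):
    s_true[j] = rng.uniform(1.0, 2.0)
y_demo = [sum(row[j] * s_true[j] for j in range(64)) for row in a_demo]
s_est = omp(a_demo, y_demo, 3)
print(max(abs(u - v) for u, v in zip(s_est, s_true)))
```

With an orthonormal DCT as Ψ, X̂_i then follows as Ψᵀ Ŝ_i.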

2.5. Security Analysis of CS-Based System

The security of the proposed system strongly depends on how different the encoding and decoding sensing matrices are from each other. To carry out this analysis, consider the binary hypothesis testing framework developed by Ramezani-Mayiami et al. [5], based on the Neyman–Pearson test. Using this framework, it can be shown that, when the same sensing matrix, A_i, is used in both the encoding and decoding stages, the probability of correctly detecting the transmitted signal for a false-alarm probability α, P_S(α), is given by [5]

P_{S} (α) = Q [Q^{- 1} (α) - \frac{∥ A_{i} S_{i} ∥_{2}}{σ}] .

(23)

Meanwhile, when two different matrices, A_i and B_i, are used in the encoding and decoding stages, the probability of correct detection, P_d(α), is given by

P_{d} (α) = Q [Q^{- 1} (α) - \frac{∥ A_{i} S_{i} ∥_{2} 〈 A_{i} S_{i}, B_{i} S_{i} 〉}{σ ∥ A_{i} S_{i} ∥_{2} ∥ B_{i} S_{i} ∥_{2}}],

(24)

P_{d} (α) = Q [Q^{- 1} (α) - \frac{∥ A_{i} S_{i} ∥_{2} ∥ A_{i} S_{i} ∥_{2} ∥ B_{i} S_{i} ∥_{2} \cos (θ)}{σ ∥ A_{i} S_{i} ∥_{2} ∥ B_{i} S_{i} ∥_{2}}],

(25)

P_{d} (α) = Q [Q^{- 1} (α) - \frac{∥ A_{i} S_{i} ∥_{2} \cos (θ)}{σ}],

(26)

where θ is the angle between the vectors A_iS_i and B_iS_i. Because 0 ≤ cos(θ) ≤ 1, from (23) and (26), it follows that, when A_iS_i and B_iS_i are orthogonal, that is, cos(θ) = 0, the detection probability becomes P_d(α) = Q(Q⁻¹(α)) = α. In this situation, the probability of detection equals the false-alarm probability, so the decoder performs no better than a random guess, and the decoded signal is completely useless. From the information-theoretic perspective, this situation corresponds to perfect secrecy [5].
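Equations (23) and (26) can be evaluated numerically with the standard-library normal distribution, using Q(x) = 1 − Φ(x) and Q⁻¹(α) = Φ⁻¹(1 − α). The values of α, ∥A_iS_i∥₂, and σ below are arbitrary illustrative choices, and the function names are hypothetical; the sketch only shows how the detection probability falls from P_S(α) toward α as cos(θ) goes from 1 to 0.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist(0.0, 1.0)


def q_function(x):
    # Gaussian tail probability Q(x) = 1 - Phi(x).
    return 1.0 - _STANDARD_NORMAL.cdf(x)


def q_inverse(alpha):
    # Inverse tail function Q^{-1}(alpha) = Phi^{-1}(1 - alpha), 0 < alpha < 1.
    return _STANDARD_NORMAL.inv_cdf(1.0 - alpha)


def detection_probability(alpha, signal_norm, sigma, cos_theta=1.0):
    # Eq. (26); with cos_theta = 1 (same matrix at both ends) it reduces to Eq. (23).
    return q_function(q_inverse(alpha) - signal_norm * cos_theta / sigma)


alpha, signal_norm, sigma = 0.05, 3.0, 1.0     # illustrative values only
for cos_theta in (1.0, 0.5, 0.0):
    p = detection_probability(alpha, signal_norm, sigma, cos_theta)
    print(f"cos(theta) = {cos_theta:.1f}: P_d = {p:.3f}")
```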

Equations (23)–(26) show that, to correctly decode the incoming signal, the decoding sensing matrix must be the same as the encoding one. Thus, a possible attack is to estimate the sensing matrix from several received frames by means of blind signal separation methods, such as independent component analysis (ICA). It is therefore necessary to determine the conditions that increase the security against ICA or other blind separation attacks. To analyze the security of the CS-based cryptosystem, it is assumed that the sensing matrix A is kept secret. To this end, consider the plaintext to be k-sparse, where k < n, such that there are at most k nonzero elements in a frame S_i of length n, so that the CS-based encoded signal is given by [6,10]

y = A_{i} S_{i},

(27)

where y ∈ R^m is the encoded vector, A ∈ R^{m×n} is the sensing matrix, and S ∈ R^n is the input vector. Next, partition A = [A^a, A^b] and S = [(S^a)ᵀ, (S^b)ᵀ]ᵀ, with A^a ∈ R^{m×(n−k)}, A^b ∈ R^{m×k}, S^a ∈ R^{(n−k)×1}, and S^b ∈ R^{k×1}; without loss of generality, assume [10]

S^{a} = {[S_{1}, S_{2}, S_{3}, \dots, S_{n - k}]}^{T},

(28)

S^{b} = {[S_{n - k + 1}, S_{n - k + 2}, S_{n - k + 3}, \dots, S_{n}]}^{T},

(29)

A^{a} = [a_{1}, a_{2}, a_{3}, \dots, a_{n - k}],

(30)

A^{b} = [a_{n - k + 1}, a_{n - k + 2}, a_{n - k + 3}, \dots, a_{n}]

(31)

where a_j ∈ R^{m×1}, j = 1, 2, …, n, is the j-th column of A.

Substituting (28)–(31) into (27) gives

y = A^{a} S^{a} + A^{b} S^{b}

(32)

and, using the Moore–Penrose pseudoinverse, S^b is given by [10]

S^{b} = ((A^{b})^{T} A^{b})^{-1} (A^{b})^{T} (y - A^{a} S^{a}) .

(33)

Next, consider the conditional entropy of S^a given S, which satisfies [10,19]

H (S^{a} / S) = 0,

(34)

where the entropy of S and the conditional entropies satisfy

H (S) \leq H (S^{a}) + H (S^{b}),

(35)

H (S / S^{a}) = H (S, S^{a}) - H (S^{a}),

(36)

H (S^{a} / S) = H (S, S^{a}) - H (S)

(37)

and then

H (S, S^{a}) = H (S^{a} / S) + H (S) .

(38)

Substituting (38) into (36) and using (34) and (35), it follows that

H (S / S^{a}) = H (S^{a} / S) + H (S) - H (S^{a}),

(39)

H (S / S^{a}) \leq H (S^{a}) + H (S^{b}) - H (S^{a}),

(40)

H (S / S^{a}) \leq H (S^{b}) .

(41)

Because the entropy of S^b is smaller than or equal to the sum of the entropies of its elements,

H (S^{b}) \leq H (S_{n - k + 1}) + H (S_{n - k + 2}) + \dots + H (S_{n - k + j}) + \dots + H (S_{n}) .

(42)

Using the fact that all elements of S^b given by (33) have the same distribution, and that each of them takes at most M = 2^B values, it follows that

H (S^{b}) \leq \sum_{j = 1}^{k} \log (M) = k \log (M),

(43)

where B is the number of bits used to represent an information sample, that is, an audio sample in an audio signal or a pixel in an image.

Next, consider the conditional mutual information of y and S, given S^a, which is given by [6]

I (y; S / S^{a}) = H (S / S^{a}) - H (S / y, S^{a}) .

(44)

Substituting (41) into (44), it follows that

I (y; S / S^{a}) \leq H (S^{b}) - H (S / y, S^{a}),

(45)

and, from (43),

I (y; S / S^{a}) \leq k \log (M) - H (S / y, S^{a}),

(46)

I (y; S / S^{a}) \leq k \log (M) .

(47)

Next, consider the mutual information between the encoded vector y and the sensing matrix A given S^a, I(y; A / S^a), which, using the chain rule and the fact that A = [A^a, A^b], can be expressed as follows [10]:

I (y; A / S^{a}) = I (y; A^{a}, A^{b} / S^{a}),

(48)

I (y; A / S^{a}) = I (y; A^{b} / S^{a}) + I (y; A^{a} / A^{b}, S^{a}),

(49)

I (y; A / S^{a}) = I (y; A^{b} / S^{a}) + H (A^{a} / A^{b}, S^{a}) - H (A^{a} / y, A^{b}, S^{a}),

(50)

Because A^a is independent of A^b, S is independent of A, and the elements of A and S are statistically independent, it follows that

H (A^{a} / A^{b}, S) = H (A^{a} / S) - I (A^{a}; A^{b} / S),

(51)

H (A^{a} / A^{b}, S) = H (A^{a} / S) - [H (A^{a} / S) + H (A^{b} / S) - H (A^{a}, A^{b} / S)] .

(52)

Because A^a and A^b are statistically independent of S, from (52), it follows that [10]

H (A^{a} / A^{b}, S) = - H (A^{b}) + H (A),

(53)

H (A^{a} / A^{b}, S) = - H (A^{b}) + H (A^{a}) + H (A^{b}),

(54)

H (A^{a} / A^{b}, S) = H (A^{a}) .

(55)

Next, if S^a = 0, from (32), it follows that

y = A^{b} S^{b} .

(56)

Then,

H (A^{a} / y, A^{b}, S^{a}) = H (A^{a} / A^{b} S^{b}, A^{b}, S^{a}),

(57)

H (A^{a} / y, A^{b}, S^{a}) \geq H (A^{a} / A^{b}, S^{b}, A^{b}, S^{a}),

(58)

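H (A^{a} / y, A^{b}, S^{a}) \geq H (A^{a} / A^{b}, S) .

(59)

Then, from (59) and (55), and because conditioning cannot increase entropy, that is, H(A^a / y, A^b, S^a) ≤ H(A^a), it follows that

H (A^{a} / y, A^{b}, S^{a}) = H (A^{a}) .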

(60)

Next, consider again Equation (50),

I (y; A / S^{a}) = I (y; A^{b} / S^{a}) + H (A^{a} / A^{b}, S^{a}) - H (A^{a} / y, A^{b}, S^{a}),

(61)

and, substituting (60) and (55) into (61), it follows that

I (y; A / S^{a}) = I (y; A^{b} / S^{a}) + H (A^{a}) - H (A^{a}),

(62)

I (y; A / S^{a}) = I (y; A^{b} / S^{a}) .

(63)

Next, consider the mutual information of y and A, given S^a, which, by (63), equals I(y; A^b / S^a) and is given by [10]

I (y; A / S^{a}) = H (A^{b} / S^{a}) - H (A^{b} / y, S^{a}) .

(64)

Next, assuming that A^b has k × m entries, which are mutually independent and also independent of S^a, it follows that

H (A^{b} / S^{a}) = H (A^{b}),

(65)

H (A^{b} / S^{a}) = \sum_{i = 1}^{m} \sum_{j = n - k + 1}^{n} H (a_{i j}) \leq k m \log (C),

(66)

where C denotes the number of distinct values that each entry a_{ij} can take. Because

I (y; A / S^{a}) = H (A^{b} / S^{a}) - H (A^{b} / y, S^{a})

(67)

and

H (A^{b} / y, S^{a}) \geq 0

(68)

it follows that

I (y; A / S^{a}) \leq k m \log (C) .

(69)

Thus, from (47) and (69), it follows that [10]

I (y; S / S^{a}) + I (y; A / S^{a}) \leq k \log (M) + k m \log (C) .

(70)

Finally, considering that S^a = 0, from (70), it follows that

I (y; S) + I (y; A) \leq k \log (M) + k m \log (C) .

(71)

Then, from (71), it follows that

\lim_{n \to \infty} \frac{I (y; S) + I (y; A)}{n} \leq \lim_{n \to \infty} \frac{k \log (M) + k m \log (C)}{n} = 0 .

(72)

Because mutual information is always non-negative, (72) implies that (I(y; S) + I(y; A))/n approaches zero as n increases. Hence, the CS-based joint encryption-compression system satisfies the EWS criterion [10] when the key is used only once.

3. Experimental Results

To evaluate the compression and encryption capability of the proposed algorithm, different genres of audio signals, such as Mexican, Caribbean, classical, pop, and rock music, as well as speech signals, are simultaneously compressed and encrypted at different compression rates. To this end, these signals are encoded and decoded using either the same or different sensing matrices. To evaluate the security performance of the proposed system, several tests are performed, which are described in the following subsections.

3.1. Waveform Plotting

One of the most common evaluations of system performance is waveform plotting, which allows a visual comparison of the similarity between the original audio and the decrypted/decompressed signals. Figure 3a–e show a 1.1 s segment of a violin audio signal from a Bach concert, sampled at 44 kHz with 16 bits/sample, together with its decrypted/decompressed versions. Figure 3a plots the original signal without compression, that is, with a bit rate of 704 kb/s. Figure 3b shows the decrypted signal when the same sensing matrix is used for both the encryption and decryption processes, without compression. Figure 3c shows the decrypted signal when the sensing matrix used for encryption is different from that used for decryption; in this case, the original signal was also encoded without any compression. Figure 3d plots the decrypted/decompressed signal when the sensing matrices used for encryption/compression and decryption/decompression are the same; in this situation, the original signal was encoded at 176 kb/s. Finally, Figure 3e shows the decoded signal when the sensing matrix used for decoding is different from that used during the encoding process.
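The bit rates quoted in this section follow from simple arithmetic. The sketch below assumes, as the quoted rates suggest but the text does not state explicitly, that each of the m measurements of a frame is stored with the same 16 bits as an input sample, so that the bit rate scales with m/n; the helper name is hypothetical.

```python
def measurements_for_rate(n, bit_rate_kbps, fs_khz=44, bits_per_sample=16):
    # Assumes every measurement uses bits_per_sample bits, so rate scales with m / n.
    full_rate = fs_khz * bits_per_sample          # 44 kHz x 16 bits = 704 kb/s
    ratio = full_rate / bit_rate_kbps             # compression rate n / m
    return full_rate, ratio, round(n / ratio)     # m for a frame of n samples


for rate in (704, 352, 176, 88):
    full_rate, ratio, m = measurements_for_rate(1024, rate)
    print(f"{rate:>3} kb/s: n/m = {ratio:g}, m = {m} of n = 1024")
```

Under this assumption, the 176 kb/s and 88 kb/s cases correspond to compression rates of 4 and 8, that is, m = 256 and m = 128 for frames of n = 1024 samples.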

Figure 3f–h show a decrypted/decompressed segment of popular music sampled at 44 kHz. Figure 3f plots the original signal. Figure 3g plots the decoded signal when the encoding and decoding sensing matrices are the same and the original signal was compressed to 352 kb/s. Finally, Figure 3h plots the decoded signal obtained when different sensing matrices are used during the encoding and decoding processes; in this case, the transmission rate was also 352 kb/s. Figure 3a–h show that, when the same sensing matrix is used for encoding and decoding, the decoded signal closely resembles the original one, regardless of the audio genre and compression rate used. On the other hand, when the sensing matrix used for decoding is different from that used for encoding, the decoded signal is quite different from the original one, although, for some genres, the envelope shows some similarity.

3.2. Spectrogram

Another important evaluation method consists of comparing the spectral characteristics of the original, encrypted, and decrypted signals at different compression rates. Figure 4a–f show the spectrograms of violin music from a Bach concert, encrypted using compressive sensing at different compression rates. Figure 4a shows the spectrogram of the original Bach concert signal. Figure 4b shows the spectrogram of the encrypted signal without compression. Figure 4c shows the spectrogram of the decrypted and decompressed signal when the encoded signal is decoded using the same sensing matrix used during the encoding process; here, the original signal is encoded at a bit rate of 176 kb/s. Figure 4d shows the spectrogram of the signal decoded using a sensing matrix different from that used during the encoding process; here, the original signal was also encoded at 176 kb/s. Figure 4e shows the spectrogram of the decoded signal obtained when the original signal is encoded at 88 kb/s and decoded using the same matrix used during the encoding process. Finally, Figure 4f shows the spectrogram of the signal decoded using a sensing matrix different from that used during the encryption and compression processes. These figures show that the spectrum obtained when the sensing matrices used for encoding and decoding are different is almost flat, which makes it very difficult to infer the signal shown in Figure 4a from knowledge of the signal in Figure 4b. On the other hand, the spectrograms of the signals obtained when the same sensing matrix is used for encoding and decoding clearly resemble that of the original signal, whereas those obtained with different sensing matrices in the encoding and decoding stages are clearly different from it.

3.3. Pearson Correlation Analysis

Another important parameter used to evaluate the similarity between the original and decoded signals is the Pearson correlation coefficient, which, for a frame of M samples, is given by

\frac{M \sum_{n=0}^{M-1} x_{o}(n)\, x_{d}(n) - \bar{x}_{o} \bar{x}_{d}}{\sqrt{M \overline{x_{o}^{2}} - \bar{x}_{o}^{2}} \sqrt{M \overline{x_{d}^{2}} - \bar{x}_{d}^{2}}},

(73)

where

\bar{x}_{(o,d)} = \sum_{n=0}^{M-1} x_{(o,d)}(n),

(74)

\overline{x_{(o,d)}^{2}} = \sum_{n=0}^{M-1} x_{(o,d)}^{2}(n)

(75)

and x_(o,d)(n) denotes either the original signal, x_o(n), or the decoded one, x_d(n). Figure 5a–h compare the per-frame Pearson correlation coefficient obtained when the received signal is decoded using the same sensing matrix used for encoding, R_{xx_S}(k), with the Pearson correlation coefficient obtained when the received signal is decoded using a sensing matrix different from the one used during encoding, R_{xx_d}(k). Figure 5a shows the Pearson correlation coefficients when the original signal is popular music with a bit rate of 704 kb/s, and Figure 5b shows them when this signal is encoded at 352 kb/s. Figure 5c,d show the Pearson correlation coefficients when popular music is encoded at 176 kb/s and 88 kb/s, respectively. Figure 5e,f show the Pearson correlation coefficients when the original signal is a segment of a Bach concert signal with bit rates of 704 kb/s and 352 kb/s, respectively, and Figure 5g,h show the per-frame Pearson correlation coefficients when the original Bach concert signal is encoded at 176 kb/s and 88 kb/s, respectively. Figure 5i,j show the dispersion (scatter) diagrams of the original versus the decoded popular music signal at a bit rate of 352 kb/s. Figure 5i shows that, when the same matrix is used for encoding and decoding, the points lie close to a straight line, which means that the input signal can be recovered from the decoded one. Figure 5j shows that, when the sensing matrices used for encoding and decoding differ, the points are widely spread, so the original signal cannot be inferred from the decoded one.

The evaluation results show that, when the sensing matrices used in the encoding and decoding stages differ, the per-frame Pearson correlation coefficient between the original and decoded signals is close to zero, around 10^{-2}. The dispersion diagram in Figure 5j confirms that, in this case, the dependence between the original and decoded signals is so weak that the original signal cannot be estimated from the decoded one; it would therefore be very difficult for an intruder to recover the audio signal intercepted during transmission. On the other hand, when the sensing matrices used for encoding and decoding are the same, the per-frame correlation coefficients are close to one, even at relatively low bit rates. This can also be observed in the dispersion diagram of Figure 5i, which plots the original against the decoded signal: the points cluster around a straight line, so the original signal can be accurately inferred from the decoded one. Thus, the proposed system allows secure, high-quality audio transmission.
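The per-frame coefficients plotted in Figure 5a–h can be computed with a short sketch that follows (73)–(75) literally, including the convention that the barred quantities are sums rather than means. It assumes the signals are plain Python lists of samples and that only complete frames (1024 samples by default) are evaluated; the helper names are hypothetical, and returning NaN for a constant frame is an illustrative choice, since (73) is undefined there.

```python
import math
import random


def pearson(xo, xd):
    """Pearson correlation of two equal-length frames, written as in (73)-(75)."""
    m = len(xo)
    s_o, s_d = sum(xo), sum(xd)                  # (74): sums, not means
    s_oo = sum(v * v for v in xo)                # (75), original frame
    s_dd = sum(v * v for v in xd)                # (75), decoded frame
    s_od = sum(a * b for a, b in zip(xo, xd))    # cross term of (73)
    num = m * s_od - s_o * s_d
    # max(0, .) guards against tiny negative values caused by rounding
    den = math.sqrt(max(0.0, m * s_oo - s_o ** 2)) * math.sqrt(max(0.0, m * s_dd - s_d ** 2))
    if den == 0.0:
        return float("nan")  # undefined for a constant (e.g. silent) frame
    return num / den


def framewise_pearson(xo, xd, frame_len=1024):
    """One coefficient per complete frame of frame_len samples."""
    return [
        pearson(xo[s:s + frame_len], xd[s:s + frame_len])
        for s in range(0, len(xo) - frame_len + 1, frame_len)
    ]
```

A coefficient near one (for example, when the decoded frame equals the original up to a gain and an offset) corresponds to the straight-line dispersion diagram of Figure 5i, while independent noise of the same length gives values of the order of 1/sqrt(1024), about 0.03, consistent with the small coefficients obtained with an incorrect sensing matrix.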

3.4. Normalized Mean Square Error Analysis

Another important parameter used to evaluate the quality of the proposed system is the normalized mean square error (MSE) between the original and decoded signals, when the sensing matrices used for decoding are the same as or different from those used for encoding. For the m-th frame of 1024 samples, the normalized MSE is given by

MSE(m) = \frac{\sum_{n=1}^{1024} \left( x_{o}(1024(m-1)+n) - x_{d}(1024(m-1)+n) \right)^{2}}{\sum_{n=1}^{1024} x_{o}^{2}(1024(m-1)+n)},

(76)

where x_o(n) and x_d(n) are the original and recovered signals, respectively. To evaluate the performance of the proposed system, several audio signals were used, including popular Mexican and Caribbean music, pop music, classical music, and rock music, sampled at 44 kHz. Each signal is encoded using 16 bits/sample, that is, at a bit rate of 704 kb/s. For encoding, as described in Section 2.1, each signal is divided into frames of 1024 samples, the DCT of each frame is computed, and the resulting vector is multiplied by the sensing matrix. Figure 6a–h show the MSE obtained when the input signals are decoded using either the same sensing matrices used for encoding or different ones, that is, with the correct or an incorrect private secret key. These figures show that the MSE is large when the sensing matrix used for decoding differs from the one used during encoding (an incorrect private key), whereas it is close to zero when the correct sensing matrix is used. This can also be observed in Figure 5i, which shows that, when the same matrix is used for encoding and decoding, the decoded signal closely approaches the original one, resulting in an approximation error close to zero. Since the normalized MSE given by (76) is the inverse of the signal-to-noise ratio (SNR), that is, SNR = MSE^{-1}, the evaluation results show that, when the same matrix is used for encoding and decoding, the decoded signal has a high SNR, that is, high quality. Meanwhile, when the sensing matrices used for encoding and decoding differ, the decoded signal is very noisy, with an SNR below 0 dB (an MSE greater than one), and is therefore unintelligible. Thus, the proposed system can be expected to allow the secure transmission of high-quality audio signals. As expected, the quality of the decoded signal decreases as the compression rate increases.
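The per-frame normalized MSE of (76) and the corresponding SNR in decibels can be sketched as follows. The 1024-sample frame length follows the text; the helper names are hypothetical, and the code assumes that each original frame has nonzero energy and that the MSE passed to `snr_db` is positive.

```python
import math


def frame_nmse(xo, xd, frame_len=1024):
    """Normalized MSE of (76) for each complete frame m."""
    values = []
    for start in range(0, len(xo) - frame_len + 1, frame_len):
        fo = xo[start:start + frame_len]
        fd = xd[start:start + frame_len]
        error = sum((a - b) ** 2 for a, b in zip(fo, fd))
        energy = sum(a * a for a in fo)  # assumes the frame is not all zeros
        values.append(error / energy)
    return values


def snr_db(nmse):
    """SNR = 1/MSE expressed in dB; requires nmse > 0, negative when nmse > 1."""
    return 10.0 * math.log10(1.0 / nmse)
```

An MSE of 0.01 corresponds to an SNR of 20 dB, an MSE of 1 (for example, a decoder that outputs silence) to 0 dB, and any MSE above one to a negative SNR in dB, which is the regime observed with an incorrect key.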

3.5. Spectral Similarity Analysis

Another metric that can be used to evaluate the security and reconstruction quality of the proposed system is the spectral similarity (SMSE), which is given by

SMSE(m) = \frac{\sum_{k=1}^{1024} \left( X_{fo}(1024(m-1)+k) - X_{fd}(1024(m-1)+k) \right)^{2}}{\sum_{k=1}^{1024} X_{fo}^{2}(1024(m-1)+k)},

(77)

where X_{fo}(1024(m-1)+k) and X_{fd}(1024(m-1)+k) are the k-th spectral components of the m-th frame of the original and decoded signals, respectively. Figure 7a–d show the spectral similarity obtained when the sensing matrix used for decoding is equal to or different from that used in the encoding stage. Figure 7a,b show the spectral similarity obtained with sensing matrices equal to and different from the one used for encoding, for classical music signals with bit rates of 704 kb/s and 352 kb/s, respectively.

Figure 7c,d show the spectral similarity obtained when the sensing matrix is equal to and different from the one used for encoding, for classical music signals with bit rates of 176 kb/s and 88 kb/s, respectively.

The evaluation results show that the MSE obtained when the signal is transmitted without compression, or with a compression rate of 50%, is close to zero, providing secure communication with high-quality decoded signals. Meanwhile, as the compression rate increases, the quality of the decoded signal decreases.
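Equation (77) does not state which spectrum X_{fo} and X_{fd} denote. The sketch below assumes DFT magnitudes, and this is a deliberate choice: if an orthonormal transform such as the orthonormal DCT were compared coefficient by coefficient instead, Parseval's relation would make (77) identical to (76). The direct O(N^2) DFT and the hypothetical helper names keep the example dependency-free; an FFT would be used for real 1024-sample frames.

```python
import cmath
import math


def dft_magnitude(frame):
    """|DFT| of one frame in direct O(N^2) form (an FFT would be used in practice)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


def smse(xo, xd, frame_len=1024):
    """Spectral similarity (77) per complete frame, on DFT magnitudes (assumed spectrum)."""
    values = []
    for start in range(0, len(xo) - frame_len + 1, frame_len):
        so = dft_magnitude(xo[start:start + frame_len])
        sd = dft_magnitude(xd[start:start + frame_len])
        values.append(sum((a - b) ** 2 for a, b in zip(so, sd)) / sum(a * a for a in so))
    return values
```

Because only magnitudes are compared, a sign-inverted frame yields an SMSE of zero even though its MSE in (76) equals four, so under this assumption (77) complements rather than replaces the time-domain measure.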

Table 1 shows the MSE, SMSE, and correlation coefficient of the proposed algorithm when it is used to encode popular Mexican music, and Table 2 shows its performance when it is used to compress and encrypt classical music. Table 3 shows the NSCR and UACI parameters obtained when the incoming signal is classical music, popular music, or pop music at different bit rates.

3.6. NSCR and UACI Parameters

Two other important parameters included in the NIST recommendations for assessing the quality of speech encryption are the NSCR and UACI, which measure, respectively, the rate of changed samples and the average change in intensity of the encrypted speech. The number of sample change rate (NSCR) and the unified average changing intensity (UACI) are given by

NSCR = \frac{1}{N} \sum_{i=1}^{N} D_{i} \times 100\%,

(78)

where

D_{i} = \begin{cases} 1, & x_{i} \neq x_{i}' \\ 0, & x_{i} = x_{i}' \end{cases}

(79)

UACI = \frac{1}{N \max_{n} x'(n)} \sum_{i=1}^{N} \left| x_{i} - x_{i}' \right| \times 100\%,

(80)

where x_i and x_i' are the i-th samples of two ciphered audio signals whose original versions differ in only one sample, and N denotes the length of the audio frame. Table 3 provides the NSCR and UACI obtained when the proposed algorithm is used to compress and encrypt several genres of audio signals.

Table 3 shows that the NSCR and UACI values provided by the proposed scheme are close to the optimum values reported in the literature [23].
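Equations (78)–(80) translate directly into code. In this minimal sketch, `c1` and `c2` stand for the two ciphered frames x_i and x_i', and the UACI is normalized by the maximum of `c2`, as written in (80); the function name is hypothetical, and the code assumes that this maximum is positive.

```python
import random


def nscr_uaci(c1, c2):
    """NSCR (78)-(79) and UACI (80), in percent, for two ciphered frames.

    c1 and c2 are ciphertexts of two plaintexts that differ in one sample.
    UACI is normalized by max(c2), i.e. max x'(n) as written in (80),
    which is assumed to be positive.
    """
    n = len(c1)
    changed = sum(1 for a, b in zip(c1, c2) if a != b)  # sum of D_i
    nscr = 100.0 * changed / n
    uaci = 100.0 * sum(abs(a - b) for a, b in zip(c1, c2)) / (n * max(c2))
    return nscr, uaci
```

For two independent ciphertexts uniformly distributed on [0, 1), every sample changes, so NSCR is 100%, and the expected value of |x_i − x_i'| is 1/3, so UACI approaches about 33%, which matches the reference values quoted in the Conclusions.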

3.7. Comparison with Other Reported Schemes

An important evaluation of the proposed scheme is the comparison of its performance with that of previously proposed schemes. Table 4 compares the Pearson correlation coefficients and the mean square error provided by the proposed scheme and other previously proposed schemes when the sensing matrix used for decoding the encrypted signal is either the same as or different from the one used for encoding the audio signals.

Table 4 and Table 5 compare the correlation coefficient and MSE provided by the proposed scheme with those of the systems proposed by G. Sudhish et al. [3], Sathiyamurthi [23,24], and Kordov [25] when the input signals are classical and popular music with a bit rate of 704 kb/s. Table 6 compares the correlation coefficient and MSE provided by the proposed scheme and the system proposed by G. Sudhish et al. [3] when the input signals are classical and popular music with bit rates of 352 kb/s and 176 kb/s, respectively. Finally, Table 7 compares the NSCR and UACI parameters provided by the proposed scheme with those of the systems proposed by Sathiyamurthi [23] and Kordov [25] when the input signals are classical and popular music with a bit rate of 704 kb/s.

Table 4, Table 5, Table 6 and Table 7 show that the proposed scheme is quite competitive with previously proposed schemes. It provides the same correlation coefficients as, and a smaller MSE than, other previously proposed schemes when the audio signals are transmitted without compression [3,21,24]. When the audio signals are simultaneously compressed and encrypted, the proposed scheme remains competitive with previously proposed schemes [3].

4. Conclusions

This paper presents a CS-based encoding system for jointly encrypting and compressing audio signals. In the proposed scheme, the audio signals are first segmented into frames of 1024 samples, which are then transformed using the DCT to generate sparse frames. Each frame is then multiplied by a different sensing matrix for compression and encryption; these matrices are constructed using a Gaussian random number generator and a chaotic mixing scheme. This ensures that a different sensing matrix is used in each frame, so the proposed system satisfies the EWS criterion.

The evaluation results show that the proposed algorithm provides a secure transmission system with very good decoded signal quality: when the same matrix is used for encoding and decoding, the correlation coefficient is close to one, while the MSE and SMSE are close to zero. Meanwhile, when the sensing matrices used for encoding and decoding are different, the per-frame correlation coefficients are close to zero, and the MSE and SMSE become larger than one, even when the bit rates used are relatively low. In addition, the NSCR and UACI obtained are close to 100% and 33%, respectively. Thus, the proposed scheme allows the secure transmission of high-quality audio signals.

Finally, the evaluation results show that the proposed scheme is quite competitive with previously proposed schemes: it provides the same correlation coefficients as, and a smaller MSE than, other previously proposed schemes when the audio signals are transmitted without compression, and it remains competitive when the audio signals are simultaneously compressed and encrypted.

Author Contributions

The authors' contributions are as follows: conceptualization, R.M.-A. and H.P.-M.; methodology and software, E.R.-J.; validation, R.M.-A., H.P.-M., and M.N.; formal analysis, E.R.-J. and R.M.-A.; investigation, M.N.; data analysis, R.M.-A.; writing and original draft preparation, H.P.-M.; review, R.M.-A. and M.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

The authors thank the National Science and Technology Council of Mexico and the National Polytechnic Institute of Mexico for the financial support provided during the realization of this research.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Gadanayak, B.; Prodhan, C. Selective Encryption of MP3 Compression. In Proceedings of the International Conference on Information Systems and Technology, Shanghai, China, 4–7 December 2011; pp. 23–26.
2. Kaur, M.; Kaur, S. Survey of various encryption techniques for audio data. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 2014, 4, 1314–1317.
3. Sudhish, G.; Nishanth, A.; Deepthi, P. Audio security through compressive sensing and cellular automata. Multimed. Tools Appl. 2005, 74, 10291–10417.
4. Cambareri, V.; Mangia, M.; Plow, F.; Pareschi, C.; Rovatti, R.; Setti, G. Low complexity multiclass encryption by compressed sensing. IEEE Trans. Signal Process. 2015, 63, 2183–2195.
5. Ramezani-Matimi, M.; Bafghi, H.; Seyfe, B. Compressive sensing encryption: Compressive sensing meets detection theory. J. Commun. 2018, 13, 82–87.
6. Eldar, Y.; Kutiniok, G. Compressive Sensing: Theory and Applications; Cambridge University Press: Cambridge, MA, USA, 2012.
7. Parkale, Y.; Nalbalwar, S. Application of 1-D Discrete Wavelet Transform based Compressed Sensing Matrices for Speech Compression; Springer: Berlin/Heidelberg, Germany, 2017; Volume 20, pp. 1–60.
8. Duta, C.; Gheorhe, L.; Tapus, N. Performance Comparison of Voice Encryption Algorithms Implemented in Blackfin Platform. In Proceedings of the International Conference on Information Systems, Security and Privacy, Rome, Italy, 19–21 February 2016; pp. 169–191.
9. Ponnanian, D.; Chandranbabu, K. Crypt Analysis Compression-encryption algorithm and a modified scheme using compressive sensing. Optik 2017, 147, 265–276.
10. Yang, Z.; Yan, W.; Xiang, Y. On the Security of Compressed Sensing-Based Signal Cryptosystem. IEEE Trans. Emerg. Top. Comput. 2015, 3, 363–371.
11. Moreno-Alvarado, R.; Rivera-Jaramillo, E.; Nakano, M.; Perez-Meana, H. Joint Encryption and Compression of Audio Based on Compressive Sensing. In Proceedings of the International Conference on Telecommunications and Signal Processing, Budapest, Hungary, 1–3 July 2019; pp. 58–61.
12. Al-Azawi, M.M.; Gaze, A. Combined speech compression and encryption using chaotic compressive sensing with large key size. IET Signal Process. 2018, 12, 214–218.
13. Reyes, R.; Cruz, C.; Nakano, M.; Perez-Meana, H. Digital Video Watermarking in DWT Domain Using Chaotic Mixtures. IEEE Lat. Am. Trans. 2010, 8, 304–310.
14. Alvarez-Hernandez, M.; Shinbrot, T.; Zalc, J. Practical chaotic mixing. Chem. Eng. Sci. 2002, 57, 3749–3753.
15. Candes, E. Compressive Sampling. In Proceedings of the International Congress of Mathematicians, Madrid, Spain, 20–30 August 2006; pp. 1–20.
16. Marsaglia, G.; Bray, T.A. A Convenient Method for Generating Normal Variables. SIAM Rev. 1964, 6, 260–264.
17. Donoho, D. Compressed sensing. IEEE Trans. Inf. Theory 2006, 52, 1289–1306.
18. Cohen, A.; Parhi, K. Architecture Optimizations for the ERSA Public Key Cryptosystems: A Tutorial. IEEE Circuit Syst. Mag. 2011, 11, 24–34.
19. Cover, T.; Thomas, J. Elements of Information Theory; John Wiley & Sons: New York, NY, USA, 1991.
20. Van Tilborg, H.; Jalodia, S. Encyclopedia of Cryptography and Security; Springer: Boston, MA, USA, 2011.
21. Newman, D.; Omura, J.; Pickholtz, R. Public key management for network security. Netw. Mag. 1987, 1, 11–16.
22. Candes, E.; Romberg, J.; Tao, T. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 2006, 52, 489–509.
23. Sathiyamurthi, P.; Ramakrishnan, S. Speech encryption algorithm using FFT and 3D-Lorenz–logistic chaotic map. Multimed. Tools Appl. 2020, 79.
24. Sathiyamurthi, P.; Ramakrishnan, S. Speech encryption using chaotic shift keying for secure speech communications. EURASIP J. Audio Speech Music. Process. 2017, 20, 1–11.
25. Kordov, K. A Novel Audio Encryption Algorithm with Permutation-Substitution Architecture. Electronics 2019, 8, 530.

Figure 1. Proposed encryption/decryption system. (a) Encryption and compression stage; (b) decryption and decompression stage. DCT, Discrete Cosine Transform; IDCT, Inverse DCT; CS, Compressive Sensing; RSA, Rivest, Shamir, and Adleman.

Figure 2. Generation of matrix A(x, y) using chaotic mixing. (a) Matrix A during frame j − 1, (b) matrix A during frame j, and (c) matrix A_i(x, y) during frame j.


Figure 3. (a) Original violin signal with a bit rate of 704 kb/s. (b) Decoded violin signal using the same sensing matrix as in the encoding process, with a bit rate of 704 kb/s. (c) Decoded violin signal using different sensing matrices for the encoding and decoding processes, with a bit rate of 704 kb/s. (d) Decoded violin signal using the same sensing matrix as in the encoding process, with a bit rate of 176 kb/s. (e) Decoded violin signal using different sensing matrices for the encoding and decoding processes, with a bit rate of 176 kb/s. (f) Original popular music segment with a bit rate of 704 kb/s. (g) Decoded popular music signal using the same sensing matrix for the encoding and decoding processes, with a bit rate of 352 kb/s. (h) Decoded popular music signal using different sensing matrices for the encoding and decoding processes, with a bit rate of 352 kb/s.

Figure 4. (a) Spectrogram of a music signal from a Bach concert with a bit rate of 704 kb/s. (b) Spectrogram of the encrypted Bach violin signal with a bit rate of 704 kb/s. (c) Spectrogram of a decoded signal obtained using the same sensing matrix for the encoding and decoding processes, with a bit rate of 176 kb/s. (d) Spectrogram of a decoded signal obtained using different sensing matrices for the encoding and decoding processes, with a bit rate of 176 kb/s. (e) Spectrogram of a decoded signal obtained using the same sensing matrix for the encoding and decoding processes, with a bit rate of 176 kb/s. (f) Spectrogram of a decoded signal obtained using different sensing matrices for the encoding and decoding processes, with a bit rate of 176 kb/s.