Article

A Novel LSB Matching Algorithm Based on Information Pre-Processing

1 Department of Cryptogram Engineering, Information Engineering University, Zhengzhou 450001, China
2 School of Information Science and Technology, Hainan Normal University, Haikou 571158, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(1), 8; https://doi.org/10.3390/math10010008
Submission received: 18 November 2021 / Revised: 10 December 2021 / Accepted: 16 December 2021 / Published: 21 December 2021

Abstract

This paper considers two forms of secret data: random bits and scanned documents. The secret data were pre-processed by halftone, quadtree, and S-Box transformations, which reduced the size of the scanned document by a factor of 8.11. A novel low-distortion LSB matching algorithm is proposed for the embedding step. For the first time, the golden ratio was applied to find the optimal embedding position and to design the matching function. Both theory and experiment demonstrate that the proposed scheme offers a good trade-off between high capacity and low distortion and is superior to other related schemes.

1. Introduction

With the development of the internet, the transmission and sharing of information have become increasingly convenient. However, with this convenience, criminals may tamper with or intercept information on the internet. To solve the apparently conflicting open access of the network and information security, many privacy protection methods have been studied [1,2,3,4]. Encryption can protect privacy, but the spread of encrypted files on the internet easily attracts the attention of attackers. Information hiding technology, which hides secret information in the carrier, emerged in the 1990s. After more than 20 years of research and development, the technology has gained a measure of maturity, although it is still the focus of research in network security.
According to whether the original cover image can be reconstructed after embedding, information hiding is divided into two types: reversible and irreversible. Reversible information hiding is usually divided into four categories: lossless compression [5,6,7], difference expansion [8,9,10], prediction error expansion [11,12,13], and histogram shifting [14,15,16]. All reversible information-hiding schemes can extract the secret information and restore the original image; however, their hiding capacity is not high. Most of the time, we need to embed a large volume of information with low distortion, and it does not matter whether the original image can be reconstructed entirely. The least significant bit (LSB) algorithm is a classic spatial information-hiding algorithm in which the secret data are embedded into the least significant bit of the pixel value. The LSB algorithm has low complexity, simple operation, and greater hiding capacity, but its robustness is poor. Today, there are many LSB matching algorithms with low distortion.
There are two major concerns when selecting a gray image as the carrier to convey information. The first relates to the capacity available through pixel modification: if the payload of each pixel of a cover image is less than 3 bpp, human vision is not able to detect the visual artifacts of a steganographic image. An LSB++ scheme was developed to improve the power of LSB-based algorithms. Generally, all these methods have tried to use the redundant space in the cover image more fully. The second concern is the quality of the steganographic image. Digital gray images are widespread on the internet, and in many cases secret data are embedded into cover images without noticeable visual artifacts. However, a high embedding capacity can distort the image, so the transfer is not secure.
With the digital development of life and office work, the secure transfer of documents and mail through the internet has become necessary. Although current steganography methods send information as random bits, few people have embedded scanned documents into cover images to transmit them safely [17,18,19,20]. In those works, the scanned documents were pre-processed and embedded into the cover image, so people could share more information with the same capacity. The earliest steganography method was a simple LSB (least-significant-bit) substitution [21]. In many images, differences in the least significant bits of a pixel are imperceptible, so they seem suitable for embedding sensitive information into a cover image. To hide greater volumes of data, many improved methods have been proposed [22,23,24,25,26]. The authors of [22,23,24] improved the visual quality of the simple LSB methods with low complexity; however, they were ineffective when the embedding rate was 1 bpp. The schemes in [25,26] designed embedding units consisting of more than one pixel, and their distortion was lower than that of [22,23,24]. Refs. [36,37] proposed dual-layer LSB matching algorithms with high embedding efficiency, in which the cover image can be reconstructed completely.
There are three problems associated with current methods: the embedding method of the scanned document is rarely specified; high capacity and low distortion cannot be achieved together; and a completely reconstructed high-capacity image cannot be achieved. The proposed method improves LSB matching algorithms by embedding the secret data in the optimal position determined by the golden ratio. The major improvements of the proposed scheme are outlined below:
1. The secret data included random bits and scanned documents. They were pre-processed by halftone, quadtree, and S-Box transformations, which reduced the size of the scanned document by a factor of 8.11.
2. The golden ratio was applied to find the optimal embedding position and to design the matching function.
3. This study achieved a good trade-off between high capacity and low distortion.
This paper presents our solution to the three obstacles and proposes a new LSB matching algorithm based on scanned document pre-processing. Section 2 introduces related work, including data-hiding schemes based on random bit streams and scanned document images. In Section 3, we describe details of the proposed method, including secret data pre-processing, scanned-document hiding, data extraction, and image recovery. Our study investigated three candidates for pre-processing secret data: the halftone, quadtree, and simple substitution. A novel LSB matching algorithm with low distortion and based on the golden ratio is proposed for the embedding step. Pre-processing provides a steganographic image with low distortion and more transformed secret information than current methods offer. Our LSB data hiding method guarantees approximate cover image reconstruction. In Section 4, we report the experimental results and analysis. In Section 5, our conclusions are presented, and future work is proposed.

2. Related Work

The following methods were evaluated for their hiding capacity of two types of information related to this paper. The main ideas, hiding capacity, and image quality are briefly discussed.
In 2017, Soleymani et al. [17] proposed high-capacity image data hiding on a sparse message of a scanned document image. They compressed the scanned document image by halftone technology and converted the binary strings to their equivalent decimal values. Then, they embedded this information into the cover image using 3-LSB. The average payload was 5.43 bpp, and the quality of the steganographic image was 36 dB. However, this method also coded the background area of the binary image. In 2018, Soleymani et al. [18] improved [17] by using a more effective quadtree algorithm to code only the content of the binary image. The average payload was 7.98 bpp, the PSNR (peak signal to noise ratio) was 38.83 dB, and the SSIM (structural similarity index) was 0.93. Generally, for a high-quality visual image, the PSNR was greater than 50 dB.
Unlike [17,18], which tried to improve embedding capacity using the vacated room of the information and the cover image, Ref. [20] hid secret data in a gray image with the mapping method. The binary values of each pixel image and character were divided into four parts. After that, they selected two bits of the secret data, searched for a two-bit similarity in the image pixels, and saved the location of the match. This approach tried to leave the cover image unchanged and send the matches to the receiver secretly. However, when the message capacity was high, the data could not be embedded completely, and it was hard to recover the original information.
In [19], a high-capacity embedding technique and high-quality encoded image were proposed. The secret data were first converted to their equivalent decimal values then into binary strings. They hid the secret data in the edges of four similar gray images using LSB. In this approach, the PSNR of each encoded image was equal to 81.23 dB. However, the receiver needed to obtain the four images simultaneously to extract all the information.
The earliest steganography method for grayscale images was proposed in [21], which offered a simple method for embedding data in cover images. This scheme embedded information by replacing the LSB plane of the gray-level pixel values, so the change was imperceptible. The main disadvantages of this scheme were its low capacity and poor security. When the volume of secret data was high, so was the distortion of the cover image. To reduce the distortion of the LSB algorithm, the authors of [22,23,24] proposed the optimal LSB method. The optimal LSB algorithm generates three candidate steganographic pixel values by the remainder operator and selects the one with the least distortion. Both the simple LSB method and the optimal LSB method consider one pixel as an embedding unit. The LSB matching revisited schemes [25,26] consider more than one pixel as an embedding unit. In [25], the cover image was divided into non-overlapping pixel pairs, and two bits of secret information were embedded into the first pixel and a binary function of the pair. In [26], three pixels of the cover image were considered as the embedding unit. This scheme utilized the first and second most significant bits; then, the remaining six bits were XORed. The secret data were embedded by comparing the result of the XOR with three bits of the secret information. The revisited LSB matching schemes minimized the image distortion, but the embedding capacity was limited, and the original image could not be recovered completely.

3. Proposed Method

Current data-hiding methods try to provide high embedding capacity with low distortion. We propose a novel LSB matching algorithm with low distortion that embeds high-capacity data in the cover images. The contributions of this paper are as follows: (1) the scanned document was pre-processed by halftone, quadtree, decimal coding, and S-Box; (2) a novel LSB matching algorithm with the lowest distortion was applied, based on the golden ratio.

3.1. Pre-Processing Step

3.1.1. S-Box

We examined the DES [27] algorithm, a classic encryption algorithm. Each of its eight S-Boxes is a non-linear substitution table: according to the row and column selected by the input bits, a 6-bit input is mapped to a compressed 4-bit value. For any S-Box, assuming the input is $I = i_1 i_2 i_3 i_4 i_5 i_6$, let $k = i_2 i_3 i_4 i_5$ and $h = i_1 i_6$. According to the values of $k$ and $h$, we look up the S-Box entry in row $h$ and column $k$, $O = o_1 o_2 o_3 o_4$, a compressed value. The secret data are thus compressed from 6 bits to 4 bits. For example, consider $I = 111000$, so that $k = (1100)_2 = 12$ and $h = (10)_2 = 2$. In row 2 and column 12 of the S8-Box in Table 1, we find $O = (15)_{10} = (1111)_2$. The size of the secret data is reduced by a factor of 1.5. In this study, to make good use of the working principle of the S-Box, secret information in a bitstream was divided into 6-bit groups and then compressed by the substitution operator.
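The row and column lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `sbox_lookup` and `compress_stream` are hypothetical, the table is the standard DES S8 box, and applying S8 to every 6-bit group is an assumption, since the text does not say which S-Box serves which group. The sketch covers only the forward substitution.

```python
# Standard DES S8 substitution table: 4 rows (h) by 16 columns (k).
S8 = [
    [13, 2, 8, 4, 6, 15, 11, 1, 10, 9, 3, 14, 5, 0, 12, 7],
    [1, 15, 13, 8, 10, 3, 7, 4, 12, 5, 6, 11, 0, 14, 9, 2],
    [7, 11, 4, 1, 9, 12, 14, 2, 0, 6, 10, 13, 15, 3, 5, 8],
    [2, 1, 14, 7, 4, 10, 8, 13, 15, 12, 9, 0, 3, 5, 6, 11],
]


def sbox_lookup(bits6, table=S8):
    """Map a 6-character bit string i1..i6 to a 4-character bit string."""
    if len(bits6) != 6 or set(bits6) - {"0", "1"}:
        raise ValueError("expected exactly six bits")
    h = int(bits6[0] + bits6[5], 2)  # row: outer bits i1 i6
    k = int(bits6[1:5], 2)           # column: inner bits i2 i3 i4 i5
    return format(table[h][k], "04b")


def compress_stream(bits, table=S8):
    """Split a bit string into 6-bit groups and substitute each group.

    Assumption: the same table is used for every group.
    """
    if len(bits) % 6:
        raise ValueError("bit string length must be a multiple of 6")
    return "".join(sbox_lookup(bits[i:i + 6], table)
                   for i in range(0, len(bits), 6))
```

For the example in the text, `sbox_lookup("111000")` selects row 2 and column 12 and returns `"1111"`.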

3.1.2. Halftone and Quadtree

When the secret information was a scanned document, it was converted to embeddable bits by halftone and quadtree techniques. Halftone methods are divided into the error-diffusion [28,29,30] and dither types [31,32]. The halftone image generated by the dither method usually contains an artificial periodic texture; thus, we used the error-diffusion method in our study. By considering the correlation between neighboring pixels, the halftone scheme converts each 8-bit pixel to 0 or 1. Thus, the size of the secret information is reduced by a factor of 8. The halftone image of a scanned document usually includes signs and white backgrounds, represented by 0 and 1 bits, respectively. People are mainly concerned only with the document content; thus, it is necessary to separate the content from the background with a quadtree algorithm (applicable to any image dimensions). The error-diffusion method consists of three steps:
Step 1: For any scanned document, the integer matrix is converted into a real matrix B by dividing the pixel value by 255.
Step 2: Assume that the threshold $t$ is 1/2 and the real matrix $B$ is accessed in raster scan order. If the element of the real matrix is less than $t$, the halftone pixel value $I(i,j)$ is 0; otherwise, it is 1.
Step 3: Here, we define the error value $w_c(i,j) = B(i,j) - I(i,j)$. The error of the current pixel is transferred in a ratio of 7:3:5:1 and superimposed on four adjacent pixels. When all the pixels have been processed, we obtain the halftone image $I$.
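The three error-diffusion steps can be sketched as follows. This is an illustrative sketch only: the function name `error_diffusion_halftone` is hypothetical, and assigning the 7:3:5:1 weights to the right, lower-left, lower, and lower-right neighbours (the classic Floyd-Steinberg layout, normalised by 16) is an assumption, because the text gives the ratio but not the neighbour positions.

```python
def error_diffusion_halftone(gray, threshold=0.5):
    """Convert a 2-D list of 8-bit gray values into a 0/1 halftone image."""
    rows, cols = len(gray), len(gray[0])
    # Step 1: scale the integer matrix to a real matrix B in [0, 1].
    b = [[value / 255.0 for value in row] for row in gray]
    halftone = [[0] * cols for _ in range(rows)]
    # Assumed neighbour layout (d_row, d_col) for the 7:3:5:1 weights.
    neighbours = [((0, 1), 7 / 16), ((1, -1), 3 / 16),
                  ((1, 0), 5 / 16), ((1, 1), 1 / 16)]
    for i in range(rows):            # raster scan order
        for j in range(cols):
            # Step 2: threshold the (error-adjusted) value.
            halftone[i][j] = 0 if b[i][j] < threshold else 1
            # Step 3: diffuse the quantisation error w_c = B - I.
            error = b[i][j] - halftone[i][j]
            for (di, dj), weight in neighbours:
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    b[ni][nj] += error * weight
    return halftone
```

Error that would fall outside the image border is simply dropped in this sketch; other border policies are possible.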
The quadtree method also consists of three steps:
Step 1: The matrix of the halftone image I was divided into four sub-rectangles. If the size of the sub-rectangle was larger than 1 × 1, the sub-rectangles were divided until the size of all the sub-rectangles was 1 × 1.
Step 2: Some sub-rectangles did not contain information, so only the content and coordinates of sub-rectangles that contain information were kept.
Step 3: All sub-rectangles that contain content are merged into larger rectangles.
Figure 1 shows the process of scanning a document. As seen in Figure 1c, not all sub-rectangles contained a message. Figure 1d shows that it was necessary to save only the content and coordinates of the sub-rectangles that contained the message. The more sub-rectangles there were, the more content and coordinates needed to be saved. As in Figure 1e, to reduce the number of coordinates, all the sub-rectangles that contained messages were merged by scanning neighbor rectangles horizontally and vertically.
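A simplified version of the quadtree idea is sketched below. It is not the authors' procedure: instead of splitting down to 1 × 1 and then merging neighbours, this sketch stops splitting as soon as a block is entirely content, which yields larger rectangles directly. The function name `quadtree_content_blocks`, the `min_size` parameter, and the `(top, left, height, width)` output format are hypothetical, and the background is assumed to be encoded as 1.

```python
def quadtree_content_blocks(img, background=1, min_size=1):
    """Return (top, left, height, width) for blocks that contain content."""
    blocks = []

    def visit(top, left, height, width):
        if height == 0 or width == 0:
            return
        cells = [img[r][c]
                 for r in range(top, top + height)
                 for c in range(left, left + width)]
        if all(v == background for v in cells):
            return  # pure background: neither content nor coordinates are kept
        small = height <= min_size and width <= min_size
        if small or all(v != background for v in cells):
            blocks.append((top, left, height, width))
            return
        # Mixed block: split into four sub-rectangles and recurse.
        h2, w2 = (height + 1) // 2, (width + 1) // 2
        visit(top, left, h2, w2)
        visit(top, left + w2, h2, width - w2)
        visit(top + h2, left, height - h2, w2)
        visit(top + h2, left + w2, height - h2, width - w2)

    visit(0, 0, len(img), len(img[0]))
    return blocks
```

Raising `min_size` trades fewer stored coordinates for more background bits inside each kept block, which mirrors the segment-size trade-off discussed in the experiments.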

3.1.3. Decimal Coding

Usually, zeros on the left side of a binary string do not affect its value. In our study, the content and the merged coordinates were processed by decimal coding and S-Box substitution. In the first step, the bit strings of the content and the coordinates were converted to decimal values. In the second step, the values were divided into 6-bit groups and then compressed into 4-bit groups by S-Box substitution. In Figure 1, the title of the paper was tested, and the size of the original scanned document image was 17.6 KB (18,106 B). After the above steps, the size of the result was reduced to 1953 B. The secret data were therefore compressed by a factor of 18,106/1953 ≈ 9.27.

3.2. Data Embedding

Mielikainen [25] proposed a simple LSB matching algorithm that modifies a pixel by ±1 and uses two pixels as an embedding unit. The embedding and extraction procedure of Mielikainen's scheme is as follows:
Let $p$ and $q$ be the cover pixel pair, and let $c_1$ and $c_2$ be two bits of secret data. The embedding rule is given in Equation (1). After embedding, the stego image is obtained, and $p'$ and $q'$ are the modified pixel pair. The secret bit $c_1$ can be extracted from the least significant bit of $p'$. The secret bit $c_2$ can be extracted according to Equation (2).
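$$
(p', q') =
\begin{cases}
(p, q), & \mathrm{LSB}(p) = c_1 \text{ and } \mathrm{LSB}\left(\left\lfloor \frac{p}{2} \right\rfloor + q\right) = c_2 \\
(p, q + 1), & \mathrm{LSB}(p) = c_1 \text{ and } \mathrm{LSB}\left(\left\lfloor \frac{p}{2} \right\rfloor + q\right) \neq c_2 \\
(p - 1, q), & \mathrm{LSB}(p) \neq c_1 \text{ and } \mathrm{LSB}\left(\left\lfloor \frac{p - 1}{2} \right\rfloor + q\right) = c_2 \\
(p + 1, q), & \mathrm{LSB}(p) \neq c_1 \text{ and } \mathrm{LSB}\left(\left\lfloor \frac{p - 1}{2} \right\rfloor + q\right) \neq c_2
\end{cases}
\tag{1}
$$
$$
c_2 = \mathrm{LSB}\left(\left\lfloor \frac{p'}{2} \right\rfloor + q'\right)
\tag{2}
$$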
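The pair-based embedding and extraction rules of Mielikainen's scheme can be checked with a short sketch. The helper names `lsb`, `f`, `embed_pair`, and `extract_pair` are hypothetical, and the integer division by two is the usual reading of the binary function in [25]. Saturation at the ends of the 0 to 255 range is deliberately not handled here, so the sketch is only meaningful for pixels away from those limits.

```python
def lsb(x):
    """Least significant bit of an integer."""
    return x & 1


def f(p, q):
    """Binary function of a pixel pair: LSB(floor(p / 2) + q)."""
    return lsb(p // 2 + q)


def embed_pair(p, q, c1, c2):
    """Embed bits c1, c2 into the pair (p, q), changing at most one pixel by 1."""
    if lsb(p) == c1:
        if f(p, q) == c2:
            return p, q          # both bits already match
        return p, q + 1          # flip the binary function only
    if f(p - 1, q) == c2:
        return p - 1, q          # fixes LSB(p) and keeps/sets f to c2
    return p + 1, q


def extract_pair(p_stego, q_stego):
    """Recover (c1, c2) from a stego pair without the cover image."""
    return lsb(p_stego), f(p_stego, q_stego)
```

For instance, `embed_pair(100, 57, 1, 0)` returns a pair from which `extract_pair` yields `(1, 0)`, with only one of the two pixels moved by one gray level.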
In this section, the information compressed by halftone, quadtree, decimal coding, and S-Box substitution is embedded into a cover image by a novel revisited LSB matching method. To improve the data-hiding capacity and the transmission security, the secret data were compressed and then embedded into the cover image by an LSB matching algorithm based on the golden ratio. For the first time, the golden ratio point was used to find the best embedding position and applied as the basic criterion to design the mapping function. First, because the output of the S-Box was 4 bits, the cover image was divided into non-overlapping groups, and every four pixels were defined as a group. Second, the optimal embedding positions were found according to the golden ratio. Finally, the XOR operation combined the eight lowest bits of the group (the two least significant bits of each pixel) to yield the four secret bits from the embedding unit. Our new scheme is described below:
➀ In raster scan order, the cover image was divided into non-overlapping groups of four pixels. Assuming the four pixels $P_i$, $P_{i+1}$, $P_{i+2}$, and $P_{i+3}$ comprise a hiding unit, the four bits of secret information are $s_1 s_2 s_3 s_4$.
➁ Each pixel was converted into eight binary bits, and the embedding positions were found according to the calculation $8 \times (1 - 0.618) \approx 3$. Normally, changing the three least significant bits of the pixel value does not affect human vision. To get better visual quality, the optimal embedding position was found according to the calculation $3 \times (1 - 0.618) \approx 1$, so the least significant bit can be used to embed information. Assuming $P_i = a_8 a_7 a_6 a_5 a_4 a_3 a_2 a_1$, $P_{i+1} = b_8 b_7 b_6 b_5 b_4 b_3 b_2 b_1$, $P_{i+2} = c_8 c_7 c_6 c_5 c_4 c_3 c_2 c_1$, and $P_{i+3} = d_8 d_7 d_6 d_5 d_4 d_3 d_2 d_1$, the four bits of secret data are embedded into the exact location. Here, we define four values $A$, $B$, $C$, and $D$, obtained according to Equation (3):
$$
\begin{cases}
A = a_1 \oplus a_2 \oplus b_1 \\
B = b_1 \oplus b_2 \oplus c_1 \\
C = c_1 \oplus c_2 \oplus d_1 \\
D = d_1 \oplus d_2 \oplus a_1
\end{cases}
\tag{3}
$$
As shown in Equation (3), the values of $A$ and $D$ are controlled by changing $a_1$ of $P_i$. Similarly, the values of $A$ and $B$ are controlled by the least significant bit $b_1$ of $P_{i+1}$, $B$ and $C$ are controlled by the least significant bit $c_1$ of $P_{i+2}$, and $C$ and $D$ are controlled by the least significant bit $d_1$ of $P_{i+3}$. When the pixel $P_{i+1}$ is an odd number, $A$ alone is controlled by modifying both bits $b_1$ and $b_2$ through $P_{i+1} + 1$. When the pixel is an even number, $A$ alone is controlled by modifying both bits $b_1$ and $b_2$ through $P_{i+1} - 1$. Similarly, $B$, $C$, and $D$ are each controlled by modifying the least significant bit and the second least significant bit of $P_{i+2}$, $P_{i+3}$, and $P_i$, respectively.
➂ We compared the four secret bits with the four values $A$, $B$, $C$, and $D$. If they were equal, the four pixels did not need to be altered in the data-hiding process. Otherwise, we modified the pixels until they were equal. We describe the scheme in detail as follows:
Step 1: If $(s_1 = A)$ && $(s_2 = B)$ && $(s_3 = C)$ && $(s_4 = D)$, the four pixels do not need to be altered in the data-hiding process.
Step 2: If exactly one of $(s_1 \neq A)$, $(s_2 \neq B)$, $(s_3 \neq C)$, or $(s_4 \neq D)$ holds, one pixel is modified. For $s_1 \neq A$: if $P_{i+1}$ is odd, it becomes $P_{i+1} + 1$; otherwise, it becomes $P_{i+1} - 1$, so that $s_1 = A$. For $s_2 \neq B$: if $P_{i+2}$ is odd, it becomes $P_{i+2} + 1$; otherwise, it becomes $P_{i+2} - 1$, so that $s_2 = B$. For $s_3 \neq C$: if $P_{i+3}$ is odd, it becomes $P_{i+3} + 1$; otherwise, it becomes $P_{i+3} - 1$, so that $s_3 = C$. For $s_4 \neq D$: if $P_i$ is odd, it becomes $P_i + 1$; otherwise, it becomes $P_i - 1$, so that $s_4 = D$.
Step 3: If exactly two values differ, there are six cases. For $(s_1 \neq A)$ && $(s_2 \neq B)$: if $P_{i+1}$ is odd, it becomes $P_{i+1} - 1$; otherwise, it becomes $P_{i+1} + 1$, so that $(s_1 = A)$ && $(s_2 = B)$. For $(s_1 \neq A)$ && $(s_3 \neq C)$: each of $P_{i+1}$ and $P_{i+3}$ is increased by 1 if it is odd and decreased by 1 otherwise, so that $(s_1 = A)$ && $(s_3 = C)$. For $(s_1 \neq A)$ && $(s_4 \neq D)$: if $P_i$ is odd, it becomes $P_i - 1$; otherwise, it becomes $P_i + 1$, so that $(s_1 = A)$ && $(s_4 = D)$. For $(s_2 \neq B)$ && $(s_3 \neq C)$: if $P_{i+2}$ is odd, it becomes $P_{i+2} - 1$; otherwise, it becomes $P_{i+2} + 1$, so that $(s_2 = B)$ && $(s_3 = C)$. For $(s_2 \neq B)$ && $(s_4 \neq D)$: each of $P_i$ and $P_{i+2}$ is increased by 1 if it is odd and decreased by 1 otherwise, so that $(s_2 = B)$ && $(s_4 = D)$. For $(s_3 \neq C)$ && $(s_4 \neq D)$: if $P_{i+3}$ is odd, it becomes $P_{i+3} - 1$; otherwise, it becomes $P_{i+3} + 1$, so that $(s_3 = C)$ && $(s_4 = D)$.
Step 4: If exactly three values differ, that is, $(s_1 \neq A)$ && $(s_2 \neq B)$ && $(s_3 \neq C)$, or $(s_1 \neq A)$ && $(s_2 \neq B)$ && $(s_4 \neq D)$, or $(s_1 \neq A)$ && $(s_3 \neq C)$ && $(s_4 \neq D)$, or $(s_2 \neq B)$ && $(s_3 \neq C)$ && $(s_4 \neq D)$, two pixels are modified. In the first case, if $P_{i+1}$ is odd, it becomes $P_{i+1} + 1$; otherwise, it becomes $P_{i+1} - 1$. If $P_{i+2}$ is odd, it becomes $P_{i+2} - 1$; otherwise, it becomes $P_{i+2} + 1$, so that $(s_1 = A)$ && $(s_2 = B)$ && $(s_3 = C)$. In the same manner, we modify the other pixels to obtain $(s_1 = A)$ && $(s_2 = B)$ && $(s_4 = D)$, $(s_1 = A)$ && $(s_3 = C)$ && $(s_4 = D)$, and $(s_2 = B)$ && $(s_3 = C)$ && $(s_4 = D)$.
Step 5: If $(s_1 \neq A)$ && $(s_2 \neq B)$ && $(s_3 \neq C)$ && $(s_4 \neq D)$: if $P_i$ is odd, it becomes $P_i + 1$; otherwise, it becomes $P_i - 1$. If $P_{i+1}$ is odd, it becomes $P_{i+1} - 1$; otherwise, it becomes $P_{i+1} + 1$. If $P_{i+3}$ is odd, it becomes $P_{i+3} + 1$; otherwise, it becomes $P_{i+3} - 1$. Lastly, we obtain $(s_1 = A)$ && $(s_2 = B)$ && $(s_3 = C)$ && $(s_4 = D)$.
According to the scheme above, the four bits of secret data $s_1$, $s_2$, $s_3$, and $s_4$ are guaranteed to be embedded into the pixel group $P_i$, $P_{i+1}$, $P_{i+2}$, and $P_{i+3}$.
For example, as Table 2 shows, $s_1$, $s_2$, $s_3$, and $s_4$ represent any four bits of secret information. When $P_i = (101)_{10} = (01100101)_2$, $P_{i+1} = (50)_{10} = (00110010)_2$, $P_{i+2} = (213)_{10} = (11010101)_2$, and $P_{i+3} = (210)_{10} = (11010010)_2$, we obtain $A = 1$, $B = 0$, $C = 1$, and $D = 0$ according to Equation (3). We adjust the pixel values by the above rules and let $P_i'$, $P_{i+1}'$, $P_{i+2}'$, and $P_{i+3}'$ denote the adjusted pixel values.
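The embedding rules for one four-pixel unit can be sketched as below, using the XOR relations that define A, B, C, and D. The helper names (`unit_values`, `step`, `flip_one`, `flip_pair`, `embed_unit`) are hypothetical. Two points are assumptions of this sketch rather than statements from the text: when all four values differ, it changes two pixels (one adjacent-pair flip on the second pixel and one on the fourth), which is one way to reach the two-pixel bound quoted in the distortion analysis; and pixels at 0 or 255 are not given any special treatment.

```python
def unit_values(px):
    """Return (A, B, C, D) from the two lowest bits of four pixels."""
    lo = [p & 1 for p in px]          # a1, b1, c1, d1
    hi = [(p >> 1) & 1 for p in px]   # a2, b2, c2, d2
    return tuple(lo[k] ^ hi[k] ^ lo[(k + 1) % 4] for k in range(4))


def step(p, both_bits):
    """Move p by +1 or -1.

    both_bits=True  flips bits 1 and 2 (odd: +1, even: -1).
    both_bits=False flips bit 1 only   (odd: -1, even: +1).
    """
    odd = p & 1
    if both_bits:
        return p + 1 if odd else p - 1
    return p - 1 if odd else p + 1


def flip_one(px, k):
    """Flip value k only (A=0, B=1, C=2, D=3) via the next pixel in the unit."""
    j = (k + 1) % 4
    px[j] = step(px[j], both_bits=True)


def flip_pair(px, k):
    """Flip values k and k+1 (cyclically) via the pixel they share."""
    j = (k + 1) % 4
    px[j] = step(px[j], both_bits=False)


def embed_unit(pixels, secret):
    """Return four stego pixels whose values (A, B, C, D) equal `secret`."""
    px = list(pixels)
    values = unit_values(px)
    wrong = [k for k in range(4) if values[k] != secret[k]]
    if len(wrong) == 1:
        flip_one(px, wrong[0])
    elif len(wrong) == 2:
        k, m = wrong
        if m - k == 1:
            flip_pair(px, k)
        elif (k, m) == (0, 3):
            flip_pair(px, 3)           # D and A share the first pixel
        else:
            flip_one(px, k)            # A with C, or B with D
            flip_one(px, m)
    elif len(wrong) == 3:
        ok = ({0, 1, 2, 3} - set(wrong)).pop()
        flip_one(px, (ok + 1) % 4)
        flip_pair(px, (ok + 2) % 4)
    elif len(wrong) == 4:
        flip_pair(px, 0)               # assumption: two pair flips
        flip_pair(px, 2)
    return tuple(px)
```

With the pixels of the example, `unit_values((101, 50, 213, 210))` gives `(1, 0, 1, 0)`, and embedding the secret bits `(0, 0, 1, 0)` changes only the second pixel, from 50 to 49.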
As seen from Table 1, the probability that four pixels needed to be modified was 1/16, the probability that three pixels needed to be modified was 4/16, the probability that two pixels needed to be modified was 6/16, the probability that one pixel needed to be modified was 4/16, and the probability that the original pixels were preserved was 1/16. The expected number of changed pixels in this case was:
(1/16) × 4 + (4/16) × 3 + (6/16) × 2 + (4/16) × 1 + (1/16) × 0 = 32/16 = 2
The expected number of modifications per pixel was: 2 ÷ 4 = 0.5.
As Table 3 shows, one of the most important properties of the proposed LSB matching revisited scheme is that each modified pixel changes by only +1 or −1 when carrying four bits of secret information, and at most two of the four pixels are modified. Changing four pixels at the same time does not occur. The probability of two pixels needing modification was 7/16, the probability of one pixel needing modification was 8/16, and the probability that the original pixels were preserved was 1/16. The expected number of changed pixels of the proposed algorithm was (7/16) × 2 + (8/16) × 1 + (1/16) × 0 = 22/16. The expected number of modifications per pixel was (22/16) ÷ 4 ≈ 0.344. Because the secret data were pre-processed, the effective figure is lower: when the secret data were a bit stream, the expected number of modifications per pixel was (22/16) ÷ 6 ≈ 0.229, and when the secret data were a scanned document, it was (22/16) ÷ 32 ≈ 0.0430. This result demonstrates that the proposed approach effectively limits pixel distortion after data hiding.
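The 7/16, 8/16, and 1/16 probabilities can be reproduced by enumerating all 16 mismatch patterns. The sketch below assumes that the 16 patterns are equally likely and that one pixel suffices whenever the mismatching values are cyclically adjacent (A with B, B with C, C with D, or D with A); the function name `pixels_changed` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def pixels_changed(mismatch):
    """Pixels needed for a 4-tuple of mismatch flags (A, B, C, D)."""
    count = sum(mismatch)
    if count <= 1:
        return count
    if count == 2:
        # Two cyclically adjacent mismatches share a single pixel.
        adjacent = any(mismatch[k] and mismatch[(k + 1) % 4] for k in range(4))
        return 1 if adjacent else 2
    return 2  # three or four mismatches: two pixels


patterns = list(product((0, 1), repeat=4))
histogram = {n: Fraction(sum(pixels_changed(m) == n for m in patterns),
                         len(patterns))
             for n in (0, 1, 2)}
expected = sum(n * prob for n, prob in histogram.items())
per_pixel = expected / 4

print(histogram)          # probabilities of changing 0, 1, or 2 pixels
print(expected, float(per_pixel))
```

Dividing `expected` by 6 or by 32 instead of 4 gives the pre-processed figures quoted for bit streams and scanned documents.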
Figure 2 compares, for every four bits of data embedded, the pixel-modification probabilities of the three methods: the LSB method, Mielikainen's scheme, and the proposed method. Mielikainen [25] proposed an LSB matching revisited scheme that groups two pixels as an embedding unit, and the LSB matching scheme has low computational complexity. It can be seen that our proposed method modified at most two pixels in every four pixels, and the magnitude of each modification was 1. The LSB scheme and Mielikainen's approach modified more pixel values. Our study set every four pixels as a unit, and the computational complexity was lower.

3.3. Extraction

During extraction, the receiver can acquire the secret data without any knowledge of the cover image. There are two steps:
(1) Reading the steganographic image: the steganographic image was divided in raster scan order into non-overlapping groups of four pixels.
(2) Extracting the secret data: the four bits of embedded information can be extracted from each group using Equation (3), without knowing the original image information. If the secret data were a scanned image, the coordinates and the S-Box were used to recover the secret information according to the content.
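The two extraction steps can be sketched as a single pass over the stego pixels, using the XOR relations that define A, B, C, and D. The function name `extract_bits` and the flat raster-order list input are assumptions of this sketch, and trailing pixels that do not fill a complete group of four are ignored.

```python
def extract_bits(stego_pixels):
    """Recover the embedded bits from a flat, raster-ordered list of pixels."""
    bits = []
    usable = len(stego_pixels) - len(stego_pixels) % 4
    for start in range(0, usable, 4):
        group = stego_pixels[start:start + 4]
        for k in range(4):
            current, following = group[k], group[(k + 1) % 4]
            # Value k = bit1(current) XOR bit2(current) XOR bit1(following)
            bits.append((current & 1) ^ ((current >> 1) & 1) ^ (following & 1))
    return bits
```

For example, `extract_bits([101, 50, 213, 210])` returns `[1, 0, 1, 0]`, matching the values A, B, C, and D of the worked embedding example. No cover-image data is consulted at any point.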

4. Experimental Results and Comparisons

This section presents the results obtained from our study of the proposed LSB matching algorithm, using 20 standard images from the USC-SIPI image database. PSNR and SSIM were used to evaluate image quality. Section 4.1 gives a detailed example. Our aim was to discover a general method to improve the hiding capacity of images, and we found an effective trade-off between high capacity and low distortion. In Section 4.2, we compare the efficiency of our scheme with other schemes and discuss its implications.

4.1. A Detailed Example

Eight scanned documents from [26,33] were used as secret data. Table 4 lists the eight pages of scanned documents. Figure 3 shows the relation between segment size and compression ratio. For the same scanned document, with segment sizes of 1 × 1, 4 × 4, 8 × 8, 16 × 16, and 32 × 32, the compression ratios were 8.110523, 4.573963, 2.051653, 2.051653, and 1.665751, respectively. Figure 4 shows the relation between the minimum rectangular size and the mean embedding capacity for the Lena image. The mean embedding capacities for the five segment sizes were 1.66186 bpp, 2.940742 bpp, 4.494556 bpp, 6.567377 bpp, and 8.084202 bpp. Figure 3 and Figure 4 show that when the segment size was 1 × 1, the compression ratio was the best, and the volume of secret data transmitted was the highest. In Figure 5, the PSNR values of the steganographic image for the five segment sizes were 44.35711 dB, 44.16443 dB, 44.16444 dB, 44.16445 dB, and 44.16446 dB. When the segment size was 1 × 1, the visual quality was the best.
With the other algorithms, embedding the 32 KB scanned document required 262,144 bits of secret information. In our study, we had to embed only 32,324 bits. Usually, for a small rectangle, the smaller the divided area was, the greater the time cost. It was confirmed that the larger the segmentation area, the rougher the result and the lower the time cost. Figure 6 shows two scanned documents of 3 KB and 35 KB. Table 5 lists the actual embedding amounts and the times from embedding to complete extraction for the two differently sized scanned documents. The smaller the segmentation size was, the more accurate and the smaller the time cost. The fastest processing time for the 3 KB scanned document was 1 s, and the slowest processing time was 3 s. For the 35 KB scanned document, the fastest processing time was 6 s, and the slowest processing time was 48 s. The larger the document, the longer the processing time, especially with the 32 × 32 block size, which exceeded the user’s time limit.
Table 6 compares the actual embedding amount and PSNR. When the segmentation size is 1 × 1, the actual embedding amount of the two documents is the smallest, and the image quality is also the best. The PSNR values were 69.547 dB and 57.4617 dB. The distortion of the image increases as the amount of embedding increases. For the same document, the larger the segmentation area, the more redundant the messages and, therefore, the greater the distortion of the steganographic image. For the five segmentation sizes of the 35 KB document, background redundancy was eliminated to varying degrees, but the PSNR values were all above 50 dB, which shows that the information pre-processing and the matching mapping function of this algorithm are effective and practical. Figure 7 shows the images and their histograms, where (a) is the cover image and its histogram, (b) is the stego image with document (a) embedded and its histogram, and (c) is the stego image with document (b) embedded and its histogram. Visually, it is impossible to distinguish the images. The proposed algorithm has good imperceptibility, and the PSNR values are all greater than 57 dB. Because the distortion is relatively low, the stego image is unlikely to attract the attention of a third party when transmitted on an open channel.
Taking Figure 6 as the secret document, we evaluated our approach against attacks such as cropping, rotation, Gaussian noise, and salt-and-pepper noise. The results of the experiment, including the extraction accuracy, are given in Table 7.

4.2. Comparisons with Related Studies and Discussion

We compared our study with nine state-of-the-art schemes in terms of hiding capacity and image distortion. Table 8 shows the PSNR comparison results for the same scanned document (262,144 bits). The PSNR of the LSB scheme [22,23,24] was 51.154 dB, and the revisited LSB matching methods [25,26] raise the PSNR by 1.247 dB and 1.763 dB, respectively. Lu [36] proposed a dual-image reversible data-hiding algorithm by improving the LSB matching scheme of [25]. Because there were two stego images, the embedding capacity was 524,288 bits, and the average PSNR was 49.24 dB. Sahu [37] improved Lu’s scheme by using a dual-layer LSB matching algorithm. The secret data were embedded into four stego images, and the PSNR and embedding capacity were 46.51 dB and 1,572,864 bits, respectively. Our study is also a revisited LSB matching method, but it can embed both bit streams and scanned document images into the cover image, with average PSNRs of 53.025 dB and 65.55372 dB, respectively. Lu’s and Sahu’s schemes offer higher embedding capacity with low distortion; however, our scheme can embed two forms of secret data. We believe that our study demonstrates a significant improvement. Table 9 and Table 10 compare our study with similar work. In the comparisons, the same cover images were processed by [18] with the quadtree and LSB algorithm, which significantly improved the embedding amount and image quality over [34]. The average PSNR of our study was 44.44 dB, and the value of SSIM was closer to 1. Table 11 summarizes the proposed scheme’s average quality and data-hiding capacity for comparison with [17,18,34,35,36,37]. In our study, the information was pre-processed, and the matching function keeps the distortion small. This gives us information hiding with high embedding capacity and low distortion.
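The PSNR values compared above follow the standard definition for 8-bit images, PSNR = 10 log10(255² / MSE). The sketch below is a minimal pure-Python illustration, not the evaluation code used in the experiments; the function names `mse` and `psnr` and the flat-list image representation are assumptions.

```python
import math


def mse(cover, stego):
    """Mean squared error between two equally sized flat pixel lists."""
    if len(cover) != len(stego) or not cover:
        raise ValueError("images must be non-empty and the same size")
    return sum((c - s) ** 2 for c, s in zip(cover, stego)) / len(cover)


def psnr(cover, stego, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(cover, stego)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)
```

As a sanity check on the scale, if half of the pixels change by one gray level the MSE is 0.5 and the PSNR is about 51.14 dB, which is in line with the value reported for the plain LSB scheme.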

5. Discussion

In our study, we proposed a novel LSB matching algorithm based on information pre-processing. The experiments showed that our scheme achieves high capacity and low distortion. To the best of our knowledge, it is the first LSB matching algorithm that incorporates information pre-processing. Furthermore, we want to discuss two issues:
(1) Application: In our opinion, our study is most suitable for a digital office, because as everyday life and office work become digital, the secure transfer of documents and mail through the internet has become necessary.
(2) Future work: Because we did not evaluate our study against the most common attacks, it is only a data-hiding scheme for pre-processed secret data. In the future, we plan to strengthen the study of robustness.

6. Conclusions

We present in this paper a novel, efficient LSB matching algorithm. Experiments showed that it had the lowest distortion, outperforming other related schemes. Before embedding secret data, the information was pre-processed by halftone, quadtree, decimal coding, and substitution treatment, and the size was reduced by at least a factor of eight. In the data hiding step, the cover image was divided into 1 × 1 sub-blocks. The compressed information was inserted into pixels by a new revisited LSB matching scheme based on the golden ratio. The receiver can extract the information without any knowledge. Therefore, our method has general applicability and provides the best trade-off between capacity and PSNR.
In our study, we saved the additional information and sent it to the receiver secretly. In future work, we plan to improve the speed of pre-processing and reconstruct the cover image completely. Therefore, it is suggested that a more efficient scheme for text documents should be developed.

Author Contributions

Conceptualization, Y.H. and X.L.; methodology, X.L. and J.M.; validation, X.L. and Y.H.; writing—review and editing, X.L., Y.H., and J.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Foundation of Science and Technology on Information Assurance Laboratory (No. KJ-15-108) and Hainan Provincial Reform in Education Project of China (No. Hnjg2020-31).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The corresponding author can provide the data sets utilized in this work upon reasonable request.

Acknowledgments

The authors thank Yang for the review and for the deep and constructive comments on the first version of the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Liu, X.M.; Choo, K.K.R.; Deng, R.H.; Lu, R.X.; Weng, J. Efficient and Privacy-Preserving Outsourced Calculation of Rational Numbers. IEEE Trans. Dependable Secur. Comput. 2018, 15, 27–39.
  2. Xiong, J.B.; Ma, R.; Chen, L.; Tian, Y.L.; Li, Q.; Liu, X.M.; Yao, Z.Q. A personalized privacy protection framework for mobile crowdsensing in IIoT. IEEE Trans. Ind. Inform. 2020, 16, 4231–4241.
  3. Chen, Z.; Tian, Y.; Peng, C. An incentive-compatible rational secret sharing scheme using blockchain and smart contract. Sci. China Inf. Sci. 2021, 64, 202301.
  4. Liu, X.M.; Deng, R.H.; Choo, K.K.R.; Yang, Y. Privacy-Preserving Outsourced Support Vector Machine Design for Secure Drug Discovery. IEEE Trans. Cloud Comput. 2020, 8, 610–622.
  5. Fridrich, J.; Goljan, M.; Du, R. Invertible authentication. In Proceedings of the Security and Watermarking of Multimedia Contents III, San Jose, CA, USA, 20 January 2001.
  6. Fridrich, J.; Goljan, M.; Du, R. Lossless data embedding for all image formats. In Proceedings of the Security and Watermarking of Multimedia Contents IV, San Jose, CA, USA, 29 April 2002.
  7. Celik, M.U.; Sharma, G.; Tekalp, A.M.; Saber, E. Lossless generalized-LSB data embedding. IEEE Trans. Image Process. 2005, 14, 253–266.
  8. Jun, T. Reversible data embedding using a difference expansion. IEEE Trans. Circuits Syst. Video Technol. 2003, 13, 890–896.
  9. Alattar, A.M. Reversible watermark using the difference expansion of a generalized integer transform. IEEE Trans. Image Process. 2004, 13, 1147–1156.
  10. Thodi, D.M.; Rodriguez, J.J. Expansion embedding techniques for reversible watermarking. IEEE Trans. Image Process. 2007, 16, 721–730.
  11. Li, X.L.; Li, J.; Li, B.; Yang, B. High-fidelity reversible data hiding scheme based on pixel-value-ordering and prediction-error expansion. Signal Process. 2013, 93, 198–205.
  12. Ou, B.; Li, X.L.; Zhao, Y.; Ni, R.R. Reversible data hiding using invariant pixel-value-ordering and prediction-error expansion. Signal Process. Image Commun. 2014, 29, 760–772.
  13. Qu, X.C.; Kim, H.J. Pixel-based pixel value ordering predictor for high-fidelity reversible data hiding. Signal Process. 2015, 111, 249–260.
  14. Ni, Z.C.; Shi, Y.Q.; Ansari, N.; Su, W. Reversible data hiding. IEEE Trans. Circuits Syst. Video Technol. 2006, 16, 354–362.
  15. Lin, C.C.; Hsueh, N.L. A lossless data hiding scheme based on three-pixel block differences. Pattern Recognit. 2008, 41, 1415–1425.
  16. Tsai, P.; Hu, Y.C.; Yeh, H.L. Reversible image hiding scheme using predictive coding and histogram shifting. Signal Process. 2009, 89, 1129–1143.
  17. Soleymani, S.H.; Taherinia, A.H. High capacity image steganography on sparse message of scanned document image (SMSDI). Multimed. Tools Appl. 2017, 76, 20847–20867.
  18. Soleymani, S.H.; Taherinia, A.H. High capacity image data hiding of scanned text documents using improved quadtree. arXiv 2018, arXiv:1803.11286.
  19. Basheer, N.M.; Aaref, A.M.; Ayyed, D.J. Proposed method of text hiding in image edges. Int. J. Comput. Appl. 2015, 126, 33–37.
  20. Hussein, H.L.; Abbass, A.A.; Naji, S.A.; Al-Augby, S.; Lafta, J.H. Hiding text in gray image using mapping technique. J. Phys. Conf. Ser. 2018, 1003, 012032.
  21. Cox, I.J.; Kilian, J.; Leighton, T.; Shamoon, T. A secure, robust watermark for multimedia. In Information Hiding; Springer: Berlin/Heidelberg, Germany, 1996.
  22. Chan, C.K.; Cheng, L.M. Hiding data in images by simple LSB substitution. Pattern Recognit. 2004, 37, 469–474.
  23. Thien, C.C.; Lin, J.C. A simple and high-hiding capacity method for hiding digit-by-digit data in images based on modulus function. Pattern Recognit. 2003, 36, 2875–2881.
  24. Wang, S.J. Steganography of capacity required using modulo operator for embedding secret image. Appl. Math Comput. 2005, 164, 99–116.
  25. Mielikainen, J. LSB matching revisited. IEEE Signal Process. Lett. 2006, 13, 285–287.
  26. Wu, N.I.; Hwang, M.S. A novel LSB data hiding scheme with the lowest distortion. Imaging Sci. J. 2017, 65, 371–378.
  27. Chen, L.S.; Shen, S.Y. Modern Cryptography, 2nd ed.; Science Press: Beijing, China, 2008.
  28. Liu, L.Y.; Chen, W.; Zheng, W.T.; Geng, W.D. Structure-aware error-diffusion approach using entropy-constrained threshold modulation. Vis. Comput. 2014, 30, 1145–1156.
  29. Li, X. Edge-directed error diffusion halftoning. IEEE Signal Process. Lett. 2006, 13, 688–690.
  30. Singh, Y.K. Generalized error diffusion method for halftoning. In Proceedings of the IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT), Tamil Nadu, India, 5–7 March 2015.
  31. Zhou, Z.; Arce, G.R.; Crescenzo, G.D. Halftone visual cryptography. IEEE Trans. Image Process. 2006, 15, 2441–2453.
  32. Alasseur, C.; Constantinides, A.G.; Husson, L. Colour quantisation through dithering techniques. In Proceedings of the International Conference on Image Processing (Cat. No.03CH37429), Barcelona, Spain, 14–17 September 2003.
  33. Li, X.Y.; Zhou, X.B.; Zhou, Q.L.; Han, S.J.; Liu, Z. High-capacity reversible data hiding in encrypted images by information preprocessing. Complexity 2020, 2020, 6989452.
  34. Jana, B. High payload reversible data hiding scheme using weighted matrix. Optik 2016, 127, 3347–3358.
  35. Bai, J.L.; Chang, C.C.; Nguyen, T.S.; Zhu, C.; Liu, Y.J. A high payload steganographic algorithm based on edge detection. Displays 2017, 46, 42–51.
  36. Lu, T.C.; Tseng, C.Y.; Wu, J.H. Dual imaging-based reversible hiding technique using LSB matching. Signal Process. 2015, 108, 77–89.
  37. Aditya, K.S.; Gandharba, S. Reversible Image Steganography Using Dual-Layer LSB Matching. Sens. Imaging 2020, 21, 1.
Figure 1. (a) Secret document; (b) halftone process; (c) sub-rectangles; (d) content sub-rectangles; (e) content sub-rectangles merging.
Figure 2. Relationship between the number of modified pixels and the probability.
Figure 3. Relationship between segment size and compression ratio.
Figure 4. Relationship between segment size and bpp.
Figure 5. Relationship between segment size and PSNR. PSNR—peak signal to noise ratio.
Figure 6. Scanned documents. (a) Scanned document in 3 KB; (b) Scanned document in 35 KB.
Figure 7. Images and their histograms. (a) cover image and histogram; (b) stego image and histogram of Figure 6a; (c) stego image and histogram of Figure 6b.
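Table 1. An example for S8-Box.
| Row \ Col. | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13 | 2 | 8 | 4 | 6 | 15 | 11 | 1 | 10 | 9 | 3 | 14 | 5 | 0 | 12 | 7 |
| 1 | 1 | 15 | 13 | 8 | 10 | 3 | 7 | 4 | 12 | 5 | 6 | 11 | 0 | 14 | 9 | 2 |
| 2 | 7 | 11 | 4 | 1 | 9 | 12 | 14 | 2 | 0 | 6 | 10 | 13 | 15 | 3 | 5 | 8 |
| 3 | 2 | 1 | 14 | 7 | 4 | 10 | 8 | 13 | 15 | 12 | 9 | 0 | 3 | 5 | 6 | 11 |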
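A lookup in the S8-Box of Table 1 might be sketched as follows. The entries of Table 1 coincide with the eighth S-box of DES, so the sketch assumes the DES indexing convention (the first and last bits of a 6-bit input select the row, the middle four bits select the column). That convention and the name `sbox_substitute` are assumptions made for illustration.

```python
# Table 1, row by row.
S8_BOX = [
    [13, 2, 8, 4, 6, 15, 11, 1, 10, 9, 3, 14, 5, 0, 12, 7],
    [1, 15, 13, 8, 10, 3, 7, 4, 12, 5, 6, 11, 0, 14, 9, 2],
    [7, 11, 4, 1, 9, 12, 14, 2, 0, 6, 10, 13, 15, 3, 5, 8],
    [2, 1, 14, 7, 4, 10, 8, 13, 15, 12, 9, 0, 3, 5, 6, 11],
]


def sbox_substitute(six_bits):
    """Map a 6-bit integer to a 4-bit integer via Table 1.

    Assumed indexing (DES convention): the first and last bits select the
    row, the middle four bits select the column.
    """
    if not 0 <= six_bits < 64:
        raise ValueError("input must fit in 6 bits")
    row = ((six_bits >> 5) << 1) | (six_bits & 1)
    col = (six_bits >> 1) & 0b1111
    return S8_BOX[row][col]


print(sbox_substitute(0b011011))  # row 1, column 13
```

Each row of the table is a permutation of 0 to 15, so for a fixed row the substitution is invertible.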
Table 2. Pixel variation using the LSB algorithm after data hiding. LSB—least significant bit.
| Secret Data | q_i | q_{i+1} | q_{i+2} | q_{i+3} | Secret Data | q_i | q_{i+1} | q_{i+2} | q_{i+3} |
|---|---|---|---|---|---|---|---|---|---|
| (0000)_2 | 101 − 1 | 50 | 213 − 1 | 210 | (1000)_2 | 101 | 50 | 213 − 1 | 210 |
| (0001)_2 | 101 − 1 | 50 | 213 − 1 | 210 + 1 | (1001)_2 | 101 | 50 | 213 − 1 | 210 + 1 |
| (0010)_2 | 101 − 1 | 50 | 213 | 210 | (1010)_2 | 101 | 50 | 213 | 210 |
| (0011)_2 | 101 − 1 | 50 | 213 | 210 + 1 | (1011)_2 | 101 | 50 | 213 | 210 + 1 |
| (0100)_2 | 101 − 1 | 50 + 1 | 213 − 1 | 210 | (1100)_2 | 101 | 50 + 1 | 213 − 1 | 210 |
| (0101)_2 | 101 − 1 | 50 + 1 | 213 − 1 | 210 + 1 | (1101)_2 | 101 | 50 + 1 | 213 − 1 | 210 + 1 |
| (0110)_2 | 101 − 1 | 50 + 1 | 213 | 210 | (1110)_2 | 101 | 50 + 1 | 213 | 210 |
| (0111)_2 | 101 − 1 | 50 + 1 | 213 | 210 + 1 | (1111)_2 | 101 | 50 + 1 | 213 | 210 + 1 |
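The rows of Table 2 can be reproduced with a few lines of code. This is a minimal sketch of plain LSB substitution applied to the four example pixels (101, 50, 213, 210). The helper names `lsb_substitute` and `to_bits` are illustrative.

```python
def lsb_substitute(pixels, bits):
    """Overwrite the least significant bit of each pixel with the matching secret bit."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)]


def to_bits(pattern):
    """Turn a string such as '0101' into [0, 1, 0, 1]."""
    return [int(ch) for ch in pattern]


pixels = [101, 50, 213, 210]  # q_i .. q_{i+3} from Table 2
for pattern in ("0000", "0101", "1010", "1111"):
    print(pattern, lsb_substitute(pixels, to_bits(pattern)))
```

A pixel changes only when its current least significant bit differs from the secret bit. The pattern 1010 therefore leaves all four pixels untouched, while 0101 changes all of them.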
Table 3. Pixel variation using the proposed LSB matching revisited after data hiding. LSB—least significant bit.
| Secret Data | q_i | q_{i+1} | q_{i+2} | q_{i+3} | Secret Data | q_i | q_{i+1} | q_{i+2} | q_{i+3} |
|---|---|---|---|---|---|---|---|---|---|
| (0000)_2 | 101 | 50 − 1 | 213 | 210 − 1 | (1000)_2 | 101 | 50 | 213 | 210 − 1 |
| (0001)_2 | 101 | 50 − 1 | 213 | 210 + 1 | (1001)_2 | 101 | 50 | 213 | 210 + 1 |
| (0010)_2 | 101 | 50 − 1 | 213 | 210 | (1010)_2 | 101 | 50 | 213 | 210 |
| (0011)_2 | 101 − 1 | 50 | 213 | 210 | (1011)_2 | 101 + 1 | 50 | 213 | 210 |
| (0100)_2 | 101 | 50 | 213 − 1 | 210 − 1 | (1100)_2 | 101 | 50 | 213 − 1 | 210 |
| (0101)_2 | 101 − 1 | 50 | 213 − 1 | 210 | (1101)_2 | 101 + 1 | 50 | 213 − 1 | 210 |
| (0110)_2 | 101 | 50 + 1 | 213 | 210 | (1110)_2 | 101 | 50 | 213 + 1 | 210 |
| (0111)_2 | 101 | 50 + 1 | 213 + 1 | 210 | (1111)_2 | 101 + 1 | 50 | 213 + 1 | 210 |
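To make the difference between Table 2 and Table 3 concrete, the sketch below counts how many pixels change across all 16 secret patterns for the example pixel group (101, 50, 213, 210). The per-pixel changes of Table 3 are transcribed as data, and the code does not implement the matching function itself. It assumes that the 16 patterns are equally likely, and the names are illustrative.

```python
PIXELS = (101, 50, 213, 210)

# Changes applied to (q_i, q_{i+1}, q_{i+2}, q_{i+3}), transcribed from Table 3.
TABLE3_DELTAS = {
    "0000": (0, -1, 0, -1), "1000": (0, 0, 0, -1),
    "0001": (0, -1, 0, 1),  "1001": (0, 0, 0, 1),
    "0010": (0, -1, 0, 0),  "1010": (0, 0, 0, 0),
    "0011": (-1, 0, 0, 0),  "1011": (1, 0, 0, 0),
    "0100": (0, 0, -1, -1), "1100": (0, 0, -1, 0),
    "0101": (-1, 0, -1, 0), "1101": (1, 0, -1, 0),
    "0110": (0, 1, 0, 0),   "1110": (0, 0, 1, 0),
    "0111": (0, 1, 1, 0),   "1111": (1, 0, 1, 0),
}


def substitution_deltas(pattern):
    """Changes made by plain LSB substitution (Table 2) for one secret pattern."""
    return tuple(((p & ~1) | int(b)) - p for p, b in zip(PIXELS, pattern))


def count_changes(deltas_by_pattern):
    """Number of modified pixels summed over all patterns."""
    return sum(1 for deltas in deltas_by_pattern for d in deltas if d != 0)


substitution_changes = count_changes(substitution_deltas(p) for p in TABLE3_DELTAS)
matching_changes = count_changes(TABLE3_DELTAS.values())
mse = sum(d * d for deltas in TABLE3_DELTAS.values() for d in deltas) / (16 * 4)
print(substitution_changes, matching_changes, mse)
```

For this pixel group, substitution modifies 32 pixels over the 16 patterns and the values in Table 3 modify 22, which is about 0.34 changes per pixel instead of 0.5. The count covers a single example group, so it is not expected to reproduce the averages in Table 8 exactly.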
Table 4. Eight scanned documents and their sizes.
| Page Number | Size (B) |
|---|---|
| 1 | 308,002 |
| 2 | 507,020 |
| 3 | 524,171 |
| 4 | 467,128 |
| 5 | 419,473 |
| 6 | 436,539 |
| 7 | 492,878 |
| 8 | 378,691 |
Table 5. The relationship between block size, embedding amount, and time.
| Scanned Document | Norm | 1 × 1 | 4 × 4 | 8 × 8 | 16 × 16 | 32 × 32 |
|---|---|---|---|---|---|---|
| (a) | Embedding amount (bits) | 1460 | 1972 | 2588 | 5148 | 5148 |
| | Time (s) | 3 | 1 | 2 | 2 | 2 |
| (b) | Embedding amount (bits) | 24,828 | 42,068 | 82,036 | 82,036 | 115,780 |
| | Time (s) | 6 | 10 | 27 | 27 | 48 |
Table 6. The relationship between block size, embedding amount, and PSNR.
| Scanned Document | Norm | 1 × 1 | 4 × 4 | 8 × 8 | 16 × 16 | 32 × 32 |
|---|---|---|---|---|---|---|
| (a) | Embedding amount (bits) | 1460 | 1972 | 2588 | 5148 | 5148 |
| | PSNR (dB) | 69.547 | 68.2919 | 67.1127 | 64.1924 | 64.1924 |
| (b) | Embedding amount (bits) | 24,828 | 42,068 | 82,036 | 82,036 | 115,780 |
| | PSNR (dB) | 62.7946 | 55.168 | 52.254 | 52.254 | 50.7378 |
Table 7. PSNR and accuracy under attacks.
| Attack | PSNR (dB) | Accuracy |
|---|---|---|
| No attack | 62.7946 | 100% |
| Cropping (1:128,1:128) | 17.5888 | 93.6695% |
| Rotate (3°) | 16.3833 | 93.7371% |
| Salt&pepper (0.01) | 25.3582 | 98.9409% |
| Salt&pepper (0.03) | 20.6751 | 97.1126% |
| Gaussian noise (0.01) | 20.0709 | 93.0112% |
| Gaussian noise (0.03) | 15.5579 | 91.4116% |
Table 8. Comparison between the method proposed and [22,23,24,25,26,36,37].
| Cover Image | [22,23,24] | [25] | [26] | [36] | [37] | Proposed Method (Bit Stream) | Proposed Method (Scanned Document) |
|---|---|---|---|---|---|---|---|
| Lena | 51.156 | 52.404 | 52.916 | 49.25 | 46.50 | 53.021 | 65.5538 |
| Airplane | 51.143 | 52.4 | 52.925 | 49.22 | 46.49 | 53.043 | 65.5541 |
| Baboon | 51.161 | 52.405 | 52.917 | 49.26 | 46.51 | 53.022 | 65.5538 |
| Elaine | 51.17 | 52.407 | 52.914 | 49.27 | 46.52 | 53.014 | 65.5536 |
| Man | 51.138 | 52.39 | 52.911 | 49.22 | 46.48 | 53.025 | 65.5533 |
| Average | 51.154 | 52.401 | 52.917 | 49.24 | 46.51 | 53.025 | 65.55372 |
Table 9. Comparison between the method proposed and [18,34].
| Image | Proposed Method PSNR (dB) | Proposed Method bpp | [18] PSNR (dB) | [18] bpp | [34] PSNR (dB) | [34] bpp |
|---|---|---|---|---|---|---|
| Pepper | 44.29 | 15.90 | 37.48 | 9.41 | 34.93 | 3.95 |
| Lena | 44.36 | 10.73 | 37.71 | 6.35 | N/A | N/A |
| Aerial | 44.31 | 11.77 | 37.66 | 6.97 | N/A | N/A |
| Jetplane | 44.78 | 12.69 | 38.14 | 7.51 | 34.67 | 3.95 |
| Average | 44.44 | 12.77 | 37.74 | 7.56 | 34.8 | 3.95 |
Table 10. Comparison between the method proposed and [17,18,35].
| Cover Image | Proposed Method PSNR (dB) | Proposed Method bpp | [18] PSNR (dB) | [18] bpp | [17] PSNR (dB) | [17] bpp | [35] PSNR (dB) | [35] bpp |
|---|---|---|---|---|---|---|---|---|
| Blonde | 44.35 | 15.98 | 37.48 | 9.46 | 35.23 | 5.65 | 37.31 | 3.04 |
| Pepper | 44.29 | 15.90 | 37.48 | 9.41 | 36.19 | 5.65 | 37.27 | 3.05 |
| Jetplane | 44.78 | 12.69 | 38.14 | 7.51 | 36.84 | 4.77 | 33.85 | 3.91 |
| Boat | 44.30 | 16.25 | 37.46 | 9.62 | 37.00 | 5.70 | 33.57 | 3.91 |
| Average | 44.43 | 15.20 | 37.64 | 9.00 | 36.31 | 5.43 | 35.50 | 3.47 |
Table 11. Comparison between proposed method and [17,18,34,35,36,37].
| Method | PSNR (dB) | Embedding Rate (bpp) |
|---|---|---|
| [17] | 36.31 | 5.43 |
| [18] | 37.83 | 7.98 |
| [34] | 34.8 | 3.95 |
| [35] | 35.50 | 3.47 |
| [36] | 49.24 | 4 |
| [37] | 46.51 | 6 |
| Proposed method | 44.35 | 13.48 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Hu, Y.; Li, X.; Ma, J. A Novel LSB Matching Algorithm Based on Information Pre-Processing. Mathematics 2022, 10, 8. https://doi.org/10.3390/math10010008
