Article

Just Noticeable Difference Model for Images with Color Sensitivity

School of Electrical Engineering, Shanghai University of Engineering Science, No. 333, Longteng Road, Songjiang District, Shanghai 201620, China
* Author to whom correspondence should be addressed.
Sensors 2023, 23(5), 2634; https://doi.org/10.3390/s23052634
Submission received: 15 January 2023 / Revised: 20 February 2023 / Accepted: 23 February 2023 / Published: 27 February 2023
(This article belongs to the Special Issue Image/Signal Processing and Machine Vision in Sensing Applications)

Abstract

The just noticeable difference (JND) model reflects the visibility limitations of the human visual system (HVS), which plays an important role in perceptual image/video processing and is commonly applied to perceptual redundancy removal. However, existing JND models are usually constructed by treating the color components of the three channels equally, and their estimation of the masking effect is inadequate. In this paper, we introduce visual saliency and color sensitivity modulation to improve the JND model. Firstly, we comprehensively combined contrast masking, pattern masking, and edge protection to estimate the masking effect. Then, the visual saliency of the HVS was taken into account to adaptively modulate the masking effect. Finally, we built color sensitivity modulation according to the perceptual sensitivities of the HVS, to adjust the sub-JND thresholds of the Y, Cb, and Cr components. Thus, the color-sensitivity-based JND model (CSJND) was constructed. Extensive experiments and subjective tests were conducted to verify the effectiveness of the CSJND model. We found that the CSJND model was more consistent with the HVS than existing state-of-the-art JND models.

1. Introduction

With the development of digital media technology, the volume of image and video data is exploding, and compressing it efficiently has become a major challenge. In traditional image/video compression schemes, spatial and temporal redundancies are removed according to the statistical correlation of signals, to achieve high-efficiency compression [1]. However, as the final recipient of these signals, the human visual system (HVS) has perceptual redundancies owing to its own limitations. Researchers are committed to improving processing techniques by considering the characteristics of the HVS to remove as much perceptual redundancy as possible and achieve higher compression ratios without reducing perceptual quality. One popular method that measures perceptual redundancy and has been intensively researched is the just noticeable difference (JND) model.
Research has shown that the visibility of the HVS is limited for all visual signals. Human eyes can only notice changes beyond a certain threshold of visibility, and this threshold is called the just noticeable difference (JND) [2,3]. By simulating the perceptual properties of the HVS, the JND model was developed to estimate the visibility threshold of human eyes. When the change in pixel value is below this threshold, the change is not perceived by human eyes. Therefore, an appropriate JND model can accurately estimate the visibility threshold of the HVS and is frequently applied to image/video processing, such as information hiding and watermarking [4,5,6], perceptual image/video coding [7,8], and perceptual quality assessment [9,10,11].
Pixel-domain JND models typically consider the background luminance adaptation effect and the masking effect [12]. The luminance adaptation effect reflects the visual sensitivity of the HVS to different degrees of background luminance. The masking effect, on the other hand, is a complex perceptual mechanism that results from multiple stimuli interfering and interacting with each other. For example, image features such as texture [13], structure [14], spatio-temporal frequency [15], and pattern [16] may influence the visibility threshold. It has been found that the HVS is highly skilled at summarizing the regularity of images and extracting repetitive visual patterns for comprehension [17]. The complexity of visual patterns is closely related to the masking effect. Moreover, the HVS devotes more attention to edge regions [18]; distortion around edge regions is more easily perceived than in other regions. Thus, it is necessary to protect edge regions to prevent obvious distortions.
Visual saliency (VS) [19] and color sensitivity (CS) are two significant perceptual characteristics of the HVS. According to the visual saliency mechanism, regions with high saliency make a greater contribution to perceptual quality. Human eyes preferentially focus on regions with high saliency and dwell on them for longer. Therefore, introducing visual saliency is beneficial for estimating the masking effect more accurately. Moreover, the HVS has different sensitivities to different color components, and the degree of distortion that can be tolerated differs according to color. Therefore, the color sensitivities of the HVS should be accounted for in the construction of the JND model.
In this paper, we propose a color-sensitivity-based JND model (CSJND) in the pixel-domain. The performance of the CSJND model was verified through extensive experiments and subjective tests. Our main contributions are threefold:
  • The masking effect, estimated by combining contrast masking, pattern masking, and edge protection, is adaptively modulated by visual saliency, yielding a more comprehensive measure of the masking effect.
  • As the sensitivity of the HVS differs across color components, we applied color sensitivity modulation to the Y, Cb, and Cr components based on the visual sensitivity of human eyes. To our knowledge, this is the first JND model that accounts for color sensitivity.
  • The proposed CSJND model was utilized to guide the injection of noise into the images. Our experimental results demonstrated that the CSJND model could tolerate more noise and had better perceptual quality.
The organization of this paper is as follows: In Section 2, we provide a literature review of related work. Section 3 introduces the proposed CSJND model in detail. The experimental results and analyses are provided in Section 4. Finally, Section 5 concludes the paper and provides an outlook for future work.

2. Related Works

During the past two decades, plenty of JND models have been intensively researched and developed. Existing JND models can be classified into two types depending on the domain in which the JND threshold is computed. One type is the pixel-domain JND model [12,13,14], where JND thresholds are computed directly from image pixels. The second type is the transform-domain (or subband-domain) JND model [20,21], where images must first be converted to a transform domain. The pixel-domain JND can be considered a compound effect of all transform domains. In terms of practical operational efficiency, it is more convenient to calculate the JND directly from pixel values without a transformation step.
In terms of pixel-domain JND models, Chou et al. [12] constructed an early JND model using luminance adaptation (LA) and contrast masking (CM), but this model ignored the overlapping effect of these two factors. Based on this study, Yang et al. [13] designed a nonlinear additivity model for masking (NAMM) to reduce the overlapping effect between LA and CM. However, the above two JND models overestimated the JND threshold of the edge regions. Liu et al. [14] decomposed images into a structural image and a textural image, then computed the edge masking effect and texture masking effect, respectively. Chen et al. [15] developed a foveated JND (FJND) model consisting of a foveation model, spatial model, and temporal model. In [22], Wu et al. introduced the free-energy principle to determine the JND thresholds of disordered regions. Due to the distinct properties of screen content images (SCI), Wang et al. [23] developed a JND model for SCI, which could be applied to screen compression algorithms. Wu et al. [24] proposed an enhanced JND model for images with pattern complexity. Chen et al. [25] constructed an asymmetric foveated JND model using the effect of eccentricity on visual sensitivity. Wang et al. [26] proposed a superpixel-wise JND model based on region modulation. Jiang et al. [27] took a different approach and proposed a JND model based on a top-down design philosophy. They utilized the KLT transform to predict a critical perceptual lossless (CPL) image from the original image, and then used the difference between the CPL and original image as the JND map. Pixel-domain JND models can be applied to image quality assessment, for example, to guide artificially added noise, to construct distortion metrics, or in combination with referenceless quality metrics [9].
For transform-domain JND models, Ahumada et al. [28] proposed a JND model for grayscale images by computing the spatial contrast sensitivity function (CSF). Watson [29] further considered the influence of luminance adaptation and contrast masking, and proposed the DCTune method. Zhang et al. [30] took spatial CSF, luminance adaptation, and intra- and inter-band contrast masking into account to establish the JND model. Wei et al. [21] introduced gamma correction, a block classification method, and a temporal modulation factor to improve the accuracy of the JND model. Wang et al. [31] developed a Saliency-JND (S-JND) model consisting of a visual attention model and visual sensitivity model. Wan et al. [32] designed an orientation-based JND model, which determined orientation regularity according to the DCT coefficient distribution. Wang et al. [33] presented an adaptive foveated weighting JND model based on a fixation point estimation method. Transform-domain JND models are usually applied in the transform and quantization steps of video coding [21].
With the development of artificial intelligence in recent years, researchers have considered building JND models based on deep learning. Ki et al. [34] proposed two learning-based just-noticeable-quantization-distortion (JNQD) models and used them for preprocessing in video coding. Liu et al. [35] introduced a picture-wise JND prediction model based on deep learning, which could be applied to image compression. Shen et al. [36] proposed a patch-level structural visibility learning method to infer the JND profile.
However, existing JND models ignore an important perceptual characteristic of the HVS: its sensitivity varies across color components. For example, in the RGB color space, human eyes have different sensitivities to the R, G, and B components, being most sensitive to green, followed by red, and finally blue [37]. In the YCbCr color space, human eyes are most sensitive to the luminance component Y, followed by the chrominance components Cb and Cr [38]. Reviewing previous JND models, it is evident that they did not adequately consider the masking effect or the differences in sensitivity to the different color channels. In this paper, we propose a color-sensitivity-based JND model (CSJND), which addresses these shortcomings. To our knowledge, this is the first work that has considered color sensitivity in the establishment of a JND model.

3. The Proposed CSJND Model

The proposed CSJND model is composed of three main modules: the luminance adaptation effect (LA), the visual masking effect with saliency modulation (VM_θ^S), and color sensitivity modulation (CS_θ). The framework of the proposed CSJND model is shown in Figure 1. Firstly, we considered the luminance adaptation effect (LA), whereby the visual threshold of human eyes changes with the level of background luminance. Then, contrast masking (CM_θ), pattern masking (PM_θ), and edge protection (EP_θ) were combined to estimate the masking effect. Visual saliency was employed to adaptively modulate the masking effect, which leads to a more accurate computation of the visual masking effect (VM_θ^S). Next, the sub-JND thresholds JND_θ^S of the three components Y, Cb, and Cr were calculated by the NAMM [13]. Finally, taking the sensitivities of human eyes to the Y, Cb, and Cr components into account, we added color sensitivity modulation (CS_θ) to adjust the sub-JND thresholds of the Y, Cb, and Cr components. Therefore, CSJND_θ is defined as a function of CS_θ and JND_θ^S, which is expressed as
$$ CSJND_{\theta}(x) = CS_{\theta} \cdot JND_{\theta}^{S}(x), \qquad (1) $$
$$ JND_{\theta}^{S}(x) = LA(x) + VM_{\theta}^{S}(x) - \alpha \cdot \min\left\{ LA(x),\, VM_{\theta}^{S}(x) \right\}, \qquad (2) $$
where α is the gain reduction factor due to the overlapping effect between LA and VM_θ^S. In this paper, we set α = 0.3, which is the same as in [13]. θ represents the three color components Y, Cb, and Cr.
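To make the combination concrete, the sketch below assembles a per-channel CSJND map from precomputed LA and VM_θ^S maps. It is a minimal illustration rather than the authors' code; the NumPy interface and array/parameter names are assumptions.

```python
import numpy as np

ALPHA = 0.3  # gain reduction factor for the LA/VM overlap, as in [13]

def csjnd_channel(la, vm_s, cs_theta):
    """Per-channel CSJND map, Equations (1)-(2).

    la       -- luminance adaptation map LA(x)
    vm_s     -- saliency-modulated masking map VM_theta^S(x)
    cs_theta -- scalar color sensitivity weight CS_theta (Equation (19))
    """
    jnd_s = la + vm_s - ALPHA * np.minimum(la, vm_s)  # NAMM, Equation (2)
    return cs_theta * jnd_s                           # Equation (1)
```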

3.1. Luminance Adaptation Effect

It has been shown that HVS visibility changes according to different levels of background luminance [12]. The visibility of human eyes is limited in dark environments, but improves under better lighting conditions. As a result, the threshold of visibility differs depending on the luminance of the background. This threshold can be simulated by the luminance adaptation effect LA, which is given by
$$ LA(x) = \begin{cases} 17 \times \left( 1 - \sqrt{\dfrac{l(x)}{127}} \right) + 3, & \text{if } l(x) \le 127 \\[2mm] \dfrac{3}{128} \times \left( l(x) - 127 \right) + 3, & \text{if } l(x) > 127, \end{cases} \qquad (3) $$
where l(x) is the average luminance of the neighborhood in which x is located (e.g., a 5 × 5 neighborhood).
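A minimal sketch of Equation (3), assuming the Y channel is a float array in [0, 255] and that l(x) is the mean over a 5 × 5 window (the window size follows the example above); SciPy's uniform_filter computes the local mean.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def luminance_adaptation(y, win=5):
    """Luminance adaptation threshold LA(x) of Equation (3)."""
    l = uniform_filter(y.astype(np.float64), size=win)   # background luminance l(x)
    dark = 17.0 * (1.0 - np.sqrt(l / 127.0)) + 3.0       # branch for l(x) <= 127
    bright = 3.0 / 128.0 * (l - 127.0) + 3.0             # branch for l(x) > 127
    return np.where(l <= 127.0, dark, bright)
```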

3.2. Visual Masking Effect with Saliency Modulation

In this paper, contrast masking (CM_θ), pattern masking (PM_θ), and edge protection (EP_θ) were comprehensively considered to estimate the masking effect. As these three factors are positively correlated, we combined them by multiplication, as shown in Equation (4). In addition, visual saliency was employed to adaptively modulate the masking effect to improve estimation accuracy. Thus, an improved visual masking effect with saliency modulation, VM_θ^S, was obtained, which is given by Equation (5).
$$ VM_{\theta}(x) = CM_{\theta}(x) \cdot PM_{\theta}(x) \cdot EP_{\theta}(x), \qquad (4) $$
$$ VM_{\theta}^{S}(x) = VM_{\theta}(x) \cdot U_{S}(x), \qquad (5) $$
where U_S is the saliency modulation factor.

3.2.1. Contrast Masking

The masking effect is weaker and the visibility threshold is lower in uniform regions with no contrast variation. On the contrary, regions with large contrast variation have a stronger masking effect and a higher visibility threshold. For an image F, the contrast variation c(x) can be calculated as the variance of the pixels around x. The classical contrast masking function computes the visibility threshold as a linear function of contrast [12]. However, this method overestimates the threshold of regions with large contrast. Perceptual research has shown that the response of human eyes to changes in light intensity is nonlinear, and the growth rate of the visibility threshold should decrease as contrast increases [16]. Therefore, a nonlinear transducer was introduced to calculate contrast masking, which is denoted by
$$ CM_{\theta}(x) = 0.115 \times \frac{a_1 \cdot c(x)^{2.4}}{c(x)^{2} + a_2^{2}}, \qquad (6) $$
where a_1 and a_2 are two control parameters. We set a_1 = 16 and a_2 = 26, which is the same as in [16]. The response map for contrast masking of the Y component is shown in Figure 2b.
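A sketch of Equation (6). The local contrast c(x) is taken as the variance over a small window, as described above; the 5 × 5 window size and the NumPy/SciPy implementation are assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def contrast_masking(channel, a1=16.0, a2=26.0, win=5):
    """Contrast masking CM_theta(x) of Equation (6) for one color component."""
    f = channel.astype(np.float64)
    mean = uniform_filter(f, size=win)
    c = np.maximum(uniform_filter(f * f, size=win) - mean ** 2, 0.0)  # local variance c(x)
    return 0.115 * (a1 * c ** 2.4) / (c ** 2 + a2 ** 2)               # nonlinear transducer
```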

3.2.2. Pattern Masking

Research has shown that the complexity of a pattern is highly associated with the orientation of pixels in the pattern. Complex patterns contain more different orientations and have a stronger masking effect, whereas simple patterns have fewer orientations and their masking effect is weaker. According to [24], the orientation of pixels can be regarded as the gradient direction, which is calculated as
$$ \varphi(x) = \arctan\left( \frac{g_v(x)}{g_h(x)} \right), \qquad (7) $$
where g_v and g_h represent the gradients in the vertical and horizontal directions, respectively. In this paper, we use the Prewitt kernels, as shown in Figure 3.
The histogram of φ(x) is a valid representation for describing the distribution of orientations. Therefore, φ(x) is quantized as φ̂(x) to generate the histogram H(x), which is given by
$$ H_i(x) = \sum_{x' \in R(x)} \delta\left( \hat{\varphi}(x'),\, i \right), \qquad (8) $$
where δ(·,·) is a pulse function that equals 1 when its two arguments are equal and 0 otherwise, and R(x) is the local region centered at x. The histogram of a complex pattern is dense because it contains many different orientations, whereas the histogram of a simple pattern is sparse. Thus, the pattern complexity PC(x) is computed from the sparsity of its histogram, which is given by
$$ PC(x) = \sum_{i=1}^{N} \left\| H_i(x) \right\|_{0}, \qquad (9) $$
where ‖·‖_0 represents the L_0 norm and N is the number of orientation bins. The pattern masking is modeled as
$$ PM_{\theta}(x) = \frac{b_1 \cdot PC(x)^{b_2}}{PC(x)^{2} + b_3^{2}}, \qquad (10) $$
where b_1 is a proportional constant, b_2 is an exponential parameter (the larger b_2 is, the faster the gain), and b_3 is a small constant. We set b_1 = 0.8, b_2 = 2.7, and b_3 = 0.1, which is the same as in [24]. The response map for pattern masking of the Y component is shown in Figure 2c.
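The sketch below follows Equations (7)–(10): Prewitt gradients give the orientation φ(x), orientations are quantized into a small number of bins, the number of non-zero histogram bins inside a local region R(x) gives PC(x), and Equation (10) maps PC(x) to the masking value. The 3 × 3 Prewitt kernels, the 5 × 5 region, and N = 12 bins are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import convolve, generic_filter

PREWITT_H = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float)  # horizontal gradient g_h
PREWITT_V = PREWITT_H.T                                                  # vertical gradient g_v

def pattern_masking(channel, n_bins=12, win=5, b1=0.8, b2=2.7, b3=0.1):
    """Pattern masking PM_theta(x) of Equations (7)-(10) for one color component."""
    f = channel.astype(np.float64)
    gh = convolve(f, PREWITT_H, mode="nearest")
    gv = convolve(f, PREWITT_V, mode="nearest")
    phi = np.arctan2(gv, gh)                      # Equation (7); arctan2 avoids division by zero
    bins = (np.floor((phi + np.pi) / (2.0 * np.pi) * n_bins)).astype(int) % n_bins

    def local_pc(patch):
        # Equation (9): count of non-zero orientation-histogram bins in R(x)
        return float(np.count_nonzero(np.bincount(patch.astype(int), minlength=n_bins)))

    pc = generic_filter(bins.astype(float), local_pc, size=win, mode="nearest")
    return (b1 * pc ** b2) / (pc ** 2 + b3 ** 2)  # Equation (10)
```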

3.2.3. Edge Protection

As edge structures attract a lot of attention, the HVS has a relatively low tolerance for distortion near edge structures, and distortion in edge regions is more likely to be noticed by human eyes. If edge and non-edge regions are not distinguished by the algorithm, the image may appear obviously distorted. Therefore, we took edge information into account and applied protection to edge regions, which is calculated as
$$ EP_{\theta}(x) = \lambda_{\theta} \cdot G(x) \cdot W(x), \qquad (11) $$
$$ G(x) = \max_{j=1,2,3,4} \left| \mathrm{grad}_j(x) \right|, \qquad (12) $$
where G(x) represents the maximal weighted average of gradients around the pixel at x, and W(x) represents the edge-related weight of the pixel at x. λ_θ was set to 0.117, 0.65, and 0.45 for the Y, Cb, and Cr components, respectively, which is the same as in [13].
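Because the exact directional weighting kernels and the edge-related weight map W(x) come from [13] and are not reproduced here, the sketch below only illustrates the structure of Equations (11)–(12): a maximum over directional gradient responses, attenuated near edges. The Prewitt-style kernels, the Canny edge detector, and the Gaussian smoothing of the edge map are stand-in assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import convolve, gaussian_filter
from skimage.feature import canny  # illustrative edge-detector choice

LAMBDA_THETA = {"Y": 0.117, "Cb": 0.65, "Cr": 0.45}  # lambda_theta values from [13]

def edge_protection(channel, theta="Y"):
    """Structural sketch of EP_theta(x) in Equations (11)-(12)."""
    f = channel.astype(np.float64)
    kh = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float) / 3.0
    kv = kh.T
    g = np.maximum(np.abs(convolve(f, kh, mode="nearest")),
                   np.abs(convolve(f, kv, mode="nearest")))      # stand-in for G(x)
    edges = canny(f / 255.0, sigma=2.0)                          # binary edge map (assumption)
    w = 1.0 - gaussian_filter(edges.astype(float), sigma=2.0)    # lower weight near edges -> protection
    return LAMBDA_THETA[theta] * g * np.clip(w, 0.0, 1.0)
```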

3.2.4. Saliency Modulation

According to the visual saliency of the HVS, regions with high saliency attract more attention from human eyes and have an important influence on the perceptual quality of an image. For regions with different visual saliencies, the masking effect is also different. Zhang et al. [39] compared several saliency detection methods, among which SDSP [40] offered high prediction accuracy at a low computational cost. The SDSP method is expressed as
$$ S(x) = S_F(x) \cdot S_D(x) \cdot S_C(x), \qquad (13) $$
where S_F, S_D, and S_C denote the frequency prior, location prior, and color prior, respectively. The detailed process is described in [40].
The saliency map is normalized by Equation (14). As shown in Figure 2d, the brighter parts of a saliency prediction map indicate that S̄(x) is closer to 1 and the saliency of that region is higher. Because people pay more attention to high-saliency regions, the masking effect in high-saliency regions is weaker, whereas in low-saliency regions it is stronger. Visual saliency was therefore employed to adaptively modulate the masking effect of high-saliency and low-saliency regions. The saliency modulation factor U_S is defined in Equation (15):
$$ \bar{S}(x) = \frac{S(x) - \min(S)}{\max(S) - \min(S)}. \qquad (14) $$
$$ U_S(x) = 1 - \bar{S}(x). \qquad (15) $$
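A small sketch of Equations (14)–(15), assuming S is the SDSP saliency map as a float array; the subtraction form of Equation (15) follows the behavior described above (high saliency, weaker masking).

```python
import numpy as np

def saliency_modulation(s, eps=1e-12):
    """Normalized saliency (Equation (14)) and modulation factor U_S (Equation (15))."""
    s = s.astype(np.float64)
    s_bar = (s - s.min()) / (s.max() - s.min() + eps)  # Equation (14)
    return 1.0 - s_bar                                 # Equation (15): weaker masking where saliency is high
```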

3.3. Color Sensitivity Modulation

Research has shown that the sensitivity of the HVS to different color components varies, and the degree of distortion that human eyes can tolerate in each color component also varies. Compared with the Cb and Cr components, the HVS is more sensitive to distortion in the Y component. However, previous JND models have not fully accounted for this color sensitivity of the HVS. There was no difference in the processing of different color components, whether in the RGB or YCbCr space. After the sub-JND thresholds of the three components were separately calculated, they were simply combined with a CS_θ ratio of 1 : 1 : 1 to establish the JND model. If color sensitivity is ignored, the JND model will not be accurate enough.
Based on the transform relationship between the YCbCr and RGB color spaces, Shang et al. [38] conducted extensive subjective tests of contrast sensitivity to quantify the visual sensitivity of human eyes to the Y, Cb, and Cr components. This provided a theoretical basis for the development of a JND model that considers the color sensitivity of the different components. According to the test results in [38], the average ratio of the critical distances D_θ (θ = Y, Cb, or Cr) at which the Y, Cb, and Cr components can be perceived by human eyes is 1 : 0.432 : 0.501. When the distance is fixed, the Y, Cb, and Cr components are considered to be a perceptual sub-unit of a pixel with a side length ratio of (1/D_Y) : (1/D_Cb) : (1/D_Cr). Then, the percentage of each sub-unit in a unit area was computed to obtain the perceptual sensitivity parameters of the Y, Cb, and Cr components, which are given by
$$ S_Y = 0.695, \quad S_{Cb} = 0.130, \quad S_{Cr} = 0.175. \qquad (16) $$
As shown in Equation (16), the perceptual sensitivity parameter of human eyes to the Y component is the largest. Evidently, the HVS is most sensitive to the Y component, and distortion in the Y component is the most easily perceived. Therefore, the JND threshold corresponding to the Y component should be smaller. As the HVS is relatively insensitive to the Cb and Cr components, a larger degree of distortion in the Cb and Cr components can be tolerated, so the corresponding JND thresholds can be slightly larger.
A good JND model should guide the noise to regions that are insensitive to human eyes and hide the noise as much as possible. Therefore, the color sensitivity weights in the JND model should be inversely proportional to the perceptual sensitivity parameters of the Y, Cb, and Cr components. Taking the inverses of the perceptual sensitivity parameters of Y, Cb, and Cr, denoted S′_θ, we obtain
$$ S'_Y = \frac{1}{S_Y}, \quad S'_{Cb} = \frac{1}{S_{Cb}}, \quad S'_{Cr} = \frac{1}{S_{Cr}}. \qquad (17) $$
The inverse sensitivity parameters S′_θ of the three components are normalized by Equation (18) so that the three weights sum to 3. Thus, we obtain the color sensitivity weights CS_θ, as follows:
$$ CS_Y = \frac{3 \cdot S'_Y}{S'_Y + S'_{Cb} + S'_{Cr}}, \quad CS_{Cb} = \frac{3 \cdot S'_{Cb}}{S'_Y + S'_{Cb} + S'_{Cr}}, \quad CS_{Cr} = \frac{3 \cdot S'_{Cr}}{S'_Y + S'_{Cb} + S'_{Cr}}. \qquad (18) $$
$$ CS_Y = 0.291, \quad CS_{Cb} = 1.554, \quad CS_{Cr} = 1.155. \qquad (19) $$
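The weights in Equation (19) can be reproduced directly from the sensitivity parameters; the short script below is only a numerical check of Equations (17)–(19).

```python
# Reproducing Equations (17)-(19) from the sensitivity parameters of Equation (16).
S = {"Y": 0.695, "Cb": 0.130, "Cr": 0.175}                   # Equation (16), from [38]
inv = {k: 1.0 / v for k, v in S.items()}                     # Equation (17)
total = sum(inv.values())
CS = {k: round(3.0 * v / total, 3) for k, v in inv.items()}  # Equation (18)
print(CS)  # {'Y': 0.291, 'Cb': 1.554, 'Cr': 1.155}, matching Equation (19)
```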
Existing JND models simply combine the sub-JND thresholds of the three components with a CS_θ ratio of 1 : 1 : 1 to establish the JND model, which contradicts the perceptual properties of human eyes. In contrast to previous JND models, we assigned different weights to the three color components. The sub-JND thresholds of the Y, Cb, and Cr components were adjusted according to color sensitivity. The JND map of the Y component is shown in Figure 2e. We also provide a contaminated image guided by JND noise, as shown in Figure 2f. The image (with PSNR = 27.00 dB) has almost no noticeable distortion. Through color sensitivity modulation, the threshold of the Y component was smaller, whereas the thresholds of the Cb and Cr components were relatively larger. Correspondingly, the amount of noise injected into the Y, Cb, and Cr components also changed. The CSJND model guided less noise into the Y component and more noise into the chrominance components Cb and Cr, which is consistent with the perceptual properties of human eyes.

4. Experimental Results and Analysis

In this section, we describe the extensive experiments and subjective tests conducted to evaluate the effectiveness and accuracy of the CSJND model. Firstly, we analyze the impact of the proposed factors on the JND model through comparative experiments. Then, the CSJND model is tested in comparison with other JND models to verify its performance. Finally, the correlation between the CSJND model and subjective perception is evaluated by subjective viewing tests.
In order to evaluate the performance of our model compared with different JND models, we injected random noise into each pixel of the test images, which is given by
$$ \hat{F}_{\theta}(x) = F_{\theta}(x) + \beta \cdot r(x) \cdot JND_{\theta}(x), \qquad (20) $$
where F̂_θ(x) is the image after noise injection guided by the JND model, r(x) is random noise taking values of ±1, and β is a noise level controller that makes the noise injected by different JND models have the same energy.
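A minimal sketch of Equation (20) for one color component; the seeded NumPy random generator and the per-channel interface are assumptions, and in the experiments β is tuned so that different JND models inject noise of equal energy.

```python
import numpy as np

def inject_jnd_noise(component, jnd_map, beta=1.0, seed=0):
    """JND-guided noise injection of Equation (20) for one color component."""
    rng = np.random.default_rng(seed)
    r = rng.choice([-1.0, 1.0], size=component.shape)   # random +/-1 noise r(x)
    return component.astype(np.float64) + beta * r * jnd_map
```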
In the field of image quality assessment, the most widely used metrics are PSNR and SSIM. However, they do not always reflect subjective perception well, and their results are sometimes inconsistent with subjective judgments. VMAF (visual multi-method assessment fusion) [41,42] is an image quality assessment standard that correlates well with subjective scores. It uses a large amount of subjective data as a training set and fuses algorithms from different assessment dimensions by means of machine learning. In this paper, we used SSIM and VMAF to evaluate the objective quality and subjective quality of images, respectively. In addition, we conducted subjective tests, and these metrics were combined to provide a more accurate assessment.

4.1. Analysis of the Proposed Factors

To test the influence of visual saliency and color sensitivity modulation on the performance of the JND model, experiments were conducted using the control variable method. We took JND_θ^B, defined in Equation (21), as the basic model and tested the influence of the proposed factors on this basis. The JND model composed of the basic model and saliency modulation was defined as JND_θ^S, as shown in Equation (2). The JND model composed of the basic model and color sensitivity modulation was defined as JND_θ^C, as shown in Equation (22). The proposed JND model CSJND_θ was composed of the basic model, saliency modulation, and color sensitivity modulation, as shown in Equation (1).
$$ JND_{\theta}^{B}(x) = LA(x) + VM_{\theta}(x) - \alpha \cdot \min\left\{ LA(x),\, VM_{\theta}(x) \right\}. \qquad (21) $$
$$ JND_{\theta}^{C}(x) = CS_{\theta} \cdot JND_{\theta}^{B}(x). \qquad (22) $$
With the help of Equation (20), the image was contaminated under the guidance of the JND models based on the different factors, including JND_θ^B, JND_θ^S, JND_θ^C, and CSJND_θ. The comparison of contaminated images from JND models based on the different proposed factors is shown in Figure 4. The contaminated images (size 768 × 512) had the same level of noise, with PSNR = 28.25 dB. However, their perceptual quality was significantly different.
As shown in Figure 4b, the basic model JND_θ^B, which considers luminance adaptation and the masking effect, showed distortion across the whole image, with a VMAF score of 80.10. As shown in Figure 4c, the model JND_θ^S, based on the basic model and saliency modulation, was slightly better, although many distorted areas in the wall and balcony regions were still perceptible; the VMAF score was 84.42. As shown in Figure 4d, the model JND_θ^C, based on the basic model and color sensitivity modulation, showed a significant quality improvement, with only slight distortion and a VMAF score of 88.04. For the models based on visual saliency and color sensitivity modulation, the perceptual quality was clearly better than the basic model. In contrast, the proposed model CSJND_θ had the best perceptual quality, with almost unnoticeable distortion and a VMAF score of 94.75, as shown in Figure 4e. The experimental results indicate that these two factors are beneficial in constructing a JND model that can accurately and efficiently measure perceptual redundancy.

4.2. Comparison of JND Models

4.2.1. Performance Comparison of JND-Guided Noise Injection

For a comprehensive analysis, four state-of-the-art JND models were selected for comparison to verify the effectiveness of the CSJND model: Wu's model [16] (Wu2013), Wu's model [24] (Wu2017), Chen's model [25] (Chen2019), and Jiang's model [27] (Jiang2022). As the CSJND model builds on traditional methods, it was compared only with relevant pixel-domain JND models and not with deep learning models.
With the help of Equation (20), the image was injected with an equal level of noise under the guidance of the different JND models. In the case of equal PSNR, the better the perceptual quality, the higher the agreement between the JND model and the HVS. This suggests that the JND model can improve perceptual quality by guiding more noise to regions with higher visual redundancy that are insensitive to human eyes. A detailed example of the comparison is shown in Figure 5; the perceptual quality achieved by the CSJND model was better than that of the other JND models.
As shown in Figure 5b, the VMAF score for Wu2013 was 82.65. Wu2013 proposed that disorderly regions can hide more noise, so the model injects less noise into the ordered fence regions and more noise into the disordered grassland regions. However, this model overestimates the visibility threshold of disordered regions, resulting in many spots in the grassland and sky regions. As shown in Figure 5c, the VMAF score of Wu2017 was 83.41. Wu2017 introduced pattern complexity to estimate the masking effect. Although this model correctly highlights the complex grassland and sky regions, it still guides too much noise into the fence and lighthouse regions, resulting in obvious distortion. As shown in Figure 5d, the VMAF score for Chen2019 was 87.44. Chen2019 built an asymmetric foveated JND model; the fence and lighthouse regions near the center of focus are not distorted, but the sky and grass regions away from the center are. As shown in Figure 5e, the VMAF score for Jiang2022 was 90.34. Jiang2022 used the KLT transform to predict the corresponding critical perceptual lossless (CPL) image from the original image, which is a different approach from previous JND models. However, this model does not consider the visual saliency and color sensitivity of the HVS, which leads to perceptible distortions in regions such as the walls, lighthouse, and grassland. As shown in Figure 5f, our proposed CSJND model effectively improves this situation, with a VMAF score of 94.99. By introducing color sensitivity modulation to adjust the sub-JND thresholds of the different components, more noise was injected into the Cb and Cr components, whereas less noise was injected into the Y component. This noise is not easily perceived because human eyes are less sensitive to the Cb and Cr components. Although human eyes are most sensitive to the Y component, the noise in this component was almost imperceptible due to its low level.
A set of images was selected for the comparison experiment. The test images are shown in Figure 6, where I1–I6 (size 512 × 512) are frequently utilized for JND estimation [43] and I7–I12 (size 768 × 512) are frequently utilized for quality assessment [44]. The performance comparison results of noise injection are shown in Table 1. It can be seen that the CSJND model had the highest SSIM and VMAF scores under the same PSNR. The average VMAF score of the CSJND model was 9.75 higher than Wu2013, 8.14 higher than Wu2017, 5.85 higher than Chen2019, and 3.63 higher than Jiang2022. The SSIM and VMAF scores were in good agreement with the subjective perceptual quality, which supports the reliability of the results. From these experiments, we found that the CSJND model performed better in perceptual redundancy measurement than the other JND models.

4.2.2. Performance Comparison of Maximum Tolerable Noise Level

An effective JND model needs to hide the noise in images without reducing perceptual quality. This suggests that a JND model that can measure the perceptual redundancy more accurately can tolerate more noise. Therefore, we further tested the maximum tolerable noise level for the different JND models. We used PSNR as an indicator: the lower the PSNR, the more noise the model can tolerate. The performance comparison results of the maximum tolerable noise levels are shown in Table 2. It can be seen that the CSJND model had the lowest PSNR and could protect the subjective quality while hiding noise, which again demonstrates the validity of the CSJND model.
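For reference, the PSNR indicator used in Table 2 follows its standard definition for 8-bit images; the helper below is a generic sketch, not code taken from the paper.

```python
import numpy as np

def psnr(original, distorted, peak=255.0):
    """Standard PSNR in dB: lower values at equal perceptual quality mean more tolerated noise."""
    mse = np.mean((original.astype(np.float64) - distorted.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```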

4.3. Comparison of Subjective Quality

Subjective viewing tests were conducted to further evaluate the performance of the CSJND model. During the experiments, we used a 24.5-inch professional-grade OLED monitor (SONY PVM-2541). Using Equation (20), the images were contaminated with the same level of noise under the guidance of the CSJND model and the comparison models, respectively. Then, pairs of images of the same scene contaminated by JND noise were displayed side by side, with their left/right placement on the screen randomized. The subjects were required to determine which image was better and by how much. If the left image was considered better, subjects gave a positive score; otherwise, they gave a negative score. The evaluation standard for the subjective quality comparison is shown in Table 3. The subjective viewing tests were conducted in strict compliance with the ITU-R BT.500-11 standard [45].
Forty subjects with normal or corrected-to-normal vision were invited to participate in this experiment: 24 men and 16 women, with an average age of 24 years, none of whom had previous experience with image processing. The experimental results of the subjective quality comparison are presented in Table 4, where "Mean" is the average of the subjective quality scores given by the subjects and "Std" is their standard deviation. A positive (or negative) score indicates that the perceptual quality of images processed by the CSJND model was better (or worse) than that of the other JND model.
From the comparison with the four other JND models (i.e., Wu2013, Wu2017, Chen2019, and Jiang2022) in the subjective viewing tests, we can see that the CSJND model outperformed the other models on all the images (positive mean scores in Table 4). For three images in particular, I3, I10, and I11, there was a significant perceptual quality improvement at the same level of noise (corresponding to larger positive mean scores). In addition, for the majority of the scores presented in Table 4, the standard deviation values were small, meaning that the scores given by the subjects were generally consistent and the results are therefore reliable. Overall, the CSJND model showed better consistency with the HVS than the Wu2013, Wu2017, Chen2019, and Jiang2022 models (mean scores were 1.016, 0.952, 0.893, and 0.844 higher, respectively). Some images from the subjective viewing tests are provided in the Supplementary Materials. The CSJND model correlated well with the HVS because the color sensitivity of human eyes to the different components was taken into account, leading to better performance in perceptual redundancy measurement.

5. Conclusions

In this paper, we propose a color-sensitivity-based JND model to measure perceptual redundancy. In contrast to previous JND models, we introduce visual saliency modulation to improve the estimation of masking effect. In addition, color sensitivity modulation based on the perceptual sensitivity of human eyes to the Y, Cb, and Cr components is performed to adjust the sub-JND thresholds of the three components. The experimental results demonstrate that the CSJND model can conceal more noise and maintain better perceptual quality with the same PSNR. In the future, we will focus on applying the CSJND model to image quality assessment and video coding.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/s23052634/s1.

Author Contributions

Conceptualization, Z.Z., X.S., G.L. and G.W.; methodology, Z.Z., X.S., G.L. and G.W.; software, Z.Z. and X.S.; validation, Z.Z. and X.S.; writing—original draft preparation, Z.Z.; writing—review and editing, Z.Z. and X.S.; funding acquisition, X.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by National Natural Science Foundation of China: 62001283.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Shang, X.; Li, G.; Zhao, X.; Zuo, Y. Low complexity inter coding scheme for Versatile Video Coding (VVC). J. Vis. Commun. Image Represent. 2023, 90, 103683. [Google Scholar] [CrossRef]
  2. Wu, J.; Shi, G.; Lin, W. Survey of visual just noticeable difference estimation. Front. Comput. Sci. 2019, 13, 4–15. [Google Scholar] [CrossRef]
  3. Lin, W.; Ghinea, G. Progress and Opportunities in Modelling Just-Noticeable Difference (JND) for Multimedia. IEEE Trans. Multimed. 2021, 24, 3706–3721. [Google Scholar] [CrossRef]
  4. Wan, W.; Zhou, K.; Zhang, K.; Zhan, Y.; Li, J. JND-guided perceptually color image watermarking in spatial domain. IEEE Access 2020, 8, 164504–164520. [Google Scholar] [CrossRef]
  5. Zhang, Y.; Wang, Z.; Zhan, Y.; Meng, L.; Sun, J.; Wan, W. JND-aware robust image watermarking with tri-directional inter-block correlation. Int. J. Intell. Syst. 2021, 36, 7053–7079. [Google Scholar] [CrossRef]
  6. Wan, W.; Li, W.; Liu, W.; Diao, Z.; Zhan, Y. QuatJND: A Robust Quaternion JND Model for Color Image Watermarking. Entropy 2022, 24, 1051. [Google Scholar] [CrossRef] [PubMed]
  7. Ki, S.; Do, J.; Kim, M. Learning-based JND-directed HDR video preprocessing for perceptually lossless compression with HEVC. IEEE Access 2020, 8, 228605–228618. [Google Scholar] [CrossRef]
  8. Nami, S.; Pakdaman, F.; Hashemi, M.R.; Shirmohammadi, S. BL-JUNIPER: A CNN-Assisted Framework for Perceptual Video Coding Leveraging Block-Level JND. IEEE Trans. Multimed. 2022, 1–16. [Google Scholar] [CrossRef]
  9. Dai, T.; Gu, K.; Niu, L.; Zhang, Y.b.; Lu, W.; Xia, S.T. Referenceless quality metric of multiply-distorted images based on structural degradation. Neurocomputing 2018, 290, 185–195. [Google Scholar] [CrossRef]
  10. Seo, S.; Ki, S.; Kim, M. A novel just-noticeable-difference-based saliency-channel attention residual network for full-reference image quality predictions. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 2602–2616. [Google Scholar] [CrossRef]
  11. Sendjasni, A.; Larabi, M.C.; Cheikh, F.A. Perceptually-weighted CNN for 360-degree image quality assessment using visual scan-path and JND. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; pp. 1439–1443. [Google Scholar]
  12. Chou, C.H.; Li, Y.C. A perceptually tuned subband image coder based on the measure of just-noticeable-distortion profile. IEEE Trans. Circuits Syst. Video Technol. 1995, 5, 467–476. [Google Scholar] [CrossRef]
  13. Yang, X.; Ling, W.; Lu, Z.; Ong, E.P.; Yao, S. Just noticeable distortion model and its applications in video coding. Signal Process. Image Commun. 2005, 20, 662–680. [Google Scholar] [CrossRef]
  14. Liu, A.; Lin, W.; Paul, M.; Deng, C.; Zhang, F. Just noticeable difference for images with decomposition model for separating edge and textured regions. IEEE Trans. Circuits Syst. Video Technol. 2010, 20, 1648–1652. [Google Scholar] [CrossRef]
  15. Chen, Z.; Guillemot, C. Perceptually-friendly H. 264/AVC video coding based on foveated just-noticeable-distortion model. IEEE Trans. Circuits Syst. Video Technol. 2010, 20, 806–819. [Google Scholar] [CrossRef]
  16. Wu, J.; Lin, W.; Shi, G.; Wang, X.; Li, F. Pattern masking estimation in image with structural uncertainty. IEEE Trans. Image Process. 2013, 22, 4892–4904. [Google Scholar] [CrossRef]
  17. Wu, J.; Lin, W.; Shi, G.; Li, L.; Fang, Y. Orientation selectivity based visual pattern for reduced-reference image quality assessment. Inf. Sci. 2016, 351, 18–29. [Google Scholar] [CrossRef]
  18. Eckert, M.P.; Bradley, A.P. Perceptual quality metrics applied to still image compression. Signal Process. 1998, 70, 177–200. [Google Scholar] [CrossRef]
  19. Borji, A.; Itti, L. State-of-the-art in visual attention modeling. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 185–207. [Google Scholar] [CrossRef]
  20. Zhang, X.; Lin, W.; Xue, P. Improved estimation for just-noticeable visual distortion. Signal Process. 2005, 85, 795–808. [Google Scholar] [CrossRef]
  21. Wei, Z.; Ngan, K.N. Spatio-temporal just noticeable distortion profile for grey scale image/video in DCT domain. IEEE Trans. Circuits Syst. Video Technol. 2009, 19, 337–346. [Google Scholar]
  22. Wu, J.; Shi, G.; Lin, W.; Liu, A.; Qi, F. Just noticeable difference estimation for images with free-energy principle. IEEE Trans. Multimed. 2013, 15, 1705–1710. [Google Scholar] [CrossRef]
  23. Wang, S.; Ma, L.; Fang, Y.; Lin, W.; Ma, S.; Gao, W. Just noticeable difference estimation for screen content images. IEEE Trans. Image Process. 2016, 25, 3838–3851. [Google Scholar] [CrossRef] [PubMed]
  24. Wu, J.; Li, L.; Dong, W.; Shi, G.; Lin, W.; Kuo, C.C.J. Enhanced just noticeable difference model for images with pattern complexity. IEEE Trans. Image Process. 2017, 26, 2682–2693. [Google Scholar] [CrossRef] [PubMed]
  25. Chen, Z.; Wu, W. Asymmetric foveated just-noticeable-difference model for images with visual field inhomogeneities. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 4064–4074. [Google Scholar] [CrossRef]
  26. Wang, C.; Wang, Y.; Lian, J. A Superpixel-Wise Just Noticeable Distortion Model. IEEE Access 2020, 8, 204816–204824. [Google Scholar] [CrossRef]
  27. Jiang, Q.; Liu, Z.; Wang, S.; Shao, F.; Lin, W. Towards Top-Down Just Noticeable Difference Estimation of Natural Images. IEEE Trans. Image Process. 2022, 31, 3697–3712. [Google Scholar] [CrossRef] [PubMed]
  28. Ahumada, A.J., Jr.; Peterson, H.A. Luminance-model-based DCT quantization for color image compression. In Proceedings of the Human Vision, Visual Processing, and Digital Display III, San Jose, CA, USA, 10–13 February 1992; Volume 1666, pp. 365–374. [Google Scholar]
  29. Watson, A.B. DCTune: A technique for visual optimization of DCT quantization matrices for individual images. In Proceedings of the SID International Symposium Digest of Technical Papers, Society for Information Display, Playa del Rey, CA, USA, 26 January 1993; Volume 24, p. 946. [Google Scholar]
  30. Zhang, X.; Lin, W.; Xue, P. Just-noticeable difference estimation with pixels in images. J. Vis. Commun. Image Represent. 2008, 19, 30–41. [Google Scholar] [CrossRef]
  31. Wang, H.; Wang, L.; Hu, X.; Tu, Q.; Men, A. Perceptual video coding based on saliency and just noticeable distortion for H. 265/HEVC. In Proceedings of the 2014 International Symposium on Wireless Personal Multimedia Communications (WPMC), Sydney, Australia, 7–10 September 2014; pp. 106–111. [Google Scholar]
  32. Wan, W.; Wu, J.; Xie, X.; Shi, G. A novel just noticeable difference model via orientation regularity in DCT domain. IEEE Access 2017, 5, 22953–22964. [Google Scholar] [CrossRef]
  33. Wang, H.; Yu, L.; Wang, S.; Xia, G.; Yin, H. A novel foveated-JND profile based on an adaptive foveated weighting model. In Proceedings of the 2018 IEEE Visual Communications and Image Processing (VCIP), Taichung, Taiwan, 9–12 December 2018; pp. 1–4. [Google Scholar]
  34. Ki, S.; Bae, S.H.; Kim, M.; Ko, H. Learning-based just-noticeable-quantization-distortion modeling for perceptual video coding. IEEE Trans. Image Process. 2018, 27, 3178–3193. [Google Scholar] [CrossRef]
  35. Liu, H.; Zhang, Y.; Zhang, H.; Fan, C.; Kwong, S.; Kuo, C.C.J.; Fan, X. Deep learning-based picture-wise just noticeable distortion prediction model for image compression. IEEE Trans. Image Process. 2019, 29, 641–656. [Google Scholar] [CrossRef]
  36. Shen, X.; Ni, Z.; Yang, W.; Zhang, X.; Wang, S.; Kwong, S. Just noticeable distortion profile inference: A patch-level structural visibility learning approach. IEEE Trans. Image Process. 2020, 30, 26–38. [Google Scholar] [CrossRef]
  37. Shang, X.; Wang, G.; Zhao, X.; Zuo, Y.; Liang, J.; Bajić, I.V. Weighting quantization matrices for HEVC/H. 265-coded RGB videos. IEEE Access 2019, 7, 36019–36032. [Google Scholar] [CrossRef]
  38. Shang, X.; Liang, J.; Wang, G.; Zhao, H.; Wu, C.; Lin, C. Color-sensitivity-based combined PSNR for objective video quality assessment. IEEE Trans. Circuits Syst. Video Technol. 2018, 29, 1239–1250. [Google Scholar] [CrossRef]
  39. Zhang, L.; Shen, Y.; Li, H. VSI: A visual saliency-induced index for perceptual image quality assessment. IEEE Trans. Image Process. 2014, 23, 4270–4281. [Google Scholar] [CrossRef] [PubMed]
  40. Zhang, L.; Gu, Z.; Li, H. SDSP: A novel saliency detection method by combining simple priors. In Proceedings of the 2013 IEEE International Conference on Image Processing, Melbourne, Australia, 15–18 September 2013; pp. 171–175. [Google Scholar]
  41. Li, Z.; Aaron, A.; Katsavounidis, I.; Moorthy, A.; Manohara, M. Toward a Practical Perceptual Video Quality Metric. Netflix Technology Blog, 6 June 2016. [Google Scholar]
  42. Li, Z.; Bampis, C.; Novak, J.; Aaron, A.; Swanson, K.; Moorthy, A.; Cock, J. VMAF: The Journey Continues. Netflix Technology Blog, 25 October 2018. [Google Scholar]
  43. Sheikh, H.R. Image and Video Quality Assessment Research at LIVE. 2003. The University of Texas. Available online: https://sipi.usc.edu/database/database.php (accessed on 10 January 2022).
  44. Franzen, R. Kodak Lossless True Color Image Suite. Available online: http://www.r0k.us/graphics/kodak/ (accessed on 1 February 2022).
  45. Methodology for the Subjective Assessment of the Quality of Television Pictures; Document ITU-R BT. 500-11; International Telecommunication Union: Geneva, Switzerland, 2002.
Figure 1. The framework of the proposed CSJND model.
Figure 2. An example of JND generation and a contaminated image guided by JND noise: (a) the original image; (b) response map for contrast masking of Y component; (c) response map for pattern masking of Y component; (d) saliency prediction map; (e) JND map of Y component; and (f) JND-contaminated image, with PSNR = 27.00 dB.
Figure 3. The Prewitt kernels in vertical and horizontal directions.
Figure 4. The comparison of contaminated images from JND models based on different proposed factors. The contaminated images have the same level of noise, with PSNR = 28.25 dB. (a) The original image. (b) The basic model JND_θ^B, VMAF = 80.10. (c) The model JND_θ^S based on the basic model and saliency modulation, with VMAF = 84.42. (d) The model JND_θ^C based on the basic model and color sensitivity modulation, with VMAF = 88.04. (e) The proposed model CSJND_θ, with VMAF = 94.75.
Figure 5. An example of the comparison of contaminated images from different JND models. The contaminated images have the same level of noise, with PSNR = 28.91 dB. (a) The original image; (b) Wu2013, VMAF = 82.65; (c) Wu2017, VMAF = 83.41; (d) Chen2019, VMAF = 87.44; (e) Jiang2022, VMAF = 90.34; and (f) The proposed CSJND model, VMAF = 94.99.
Figure 6. The set of test images, in order from I1–I12.
Table 1. Performance comparison results of noise injection based on different JND models.
Image | PSNR | Wu2013 SSIM/VMAF | Wu2017 SSIM/VMAF | Chen2019 SSIM/VMAF | Jiang2022 SSIM/VMAF | Proposed SSIM/VMAF
I1 | 28.23 | 0.80/87.05 | 0.81/89.31 | 0.84/92.37 | 0.85/94.17 | 0.88/96.01
I2 | 24.52 | 0.71/78.16 | 0.72/80.76 | 0.75/83.48 | 0.77/85.56 | 0.82/90.98
I3 | 26.47 | 0.80/88.24 | 0.80/88.69 | 0.82/90.71 | 0.83/92.48 | 0.87/95.85
I4 | 25.95 | 0.74/83.33 | 0.75/85.04 | 0.78/87.34 | 0.82/90.02 | 0.85/94.50
I5 | 26.18 | 0.78/84.12 | 0.79/85.31 | 0.80/86.75 | 0.81/89.91 | 0.85/94.08
I6 | 26.91 | 0.79/86.95 | 0.81/87.76 | 0.82/88.54 | 0.83/91.36 | 0.84/95.74
I7 | 27.49 | 0.80/85.87 | 0.82/86.99 | 0.84/90.47 | 0.85/92.47 | 0.86/94.15
I8 | 24.91 | 0.73/83.42 | 0.79/84.64 | 0.81/88.40 | 0.82/89.40 | 0.84/93.03
I9 | 26.37 | 0.72/82.86 | 0.73/83.72 | 0.73/84.01 | 0.75/86.01 | 0.78/89.73
I10 | 24.56 | 0.71/80.83 | 0.75/85.76 | 0.76/88.58 | 0.79/90.37 | 0.82/94.59
I11 | 26.38 | 0.79/86.43 | 0.80/87.49 | 0.82/89.45 | 0.83/92.80 | 0.85/94.43
I12 | 25.16 | 0.73/82.82 | 0.75/83.88 | 0.77/86.79 | 0.79/88.94 | 0.84/93.91
Average | 26.09 | 0.76/84.17 | 0.78/85.78 | 0.80/88.07 | 0.81/90.29 | 0.84/93.92
Table 2. Performance comparison results of maximum tolerable noise level based on different JND models.
Image | Wu2013 | Wu2017 | Chen2019 | Jiang2022 | Proposed (values are PSNR in dB)
I1 | 35.72 | 35.28 | 34.45 | 32.56 | 32.25
I2 | 36.86 | 34.68 | 35.27 | 33.25 | 31.92
I3 | 35.42 | 36.82 | 35.58 | 34.84 | 32.82
I4 | 37.24 | 35.65 | 35.27 | 33.48 | 32.45
I5 | 35.54 | 34.78 | 34.82 | 33.65 | 32.28
I6 | 33.64 | 34.48 | 33.86 | 32.94 | 32.35
I7 | 36.53 | 35.85 | 33.46 | 34.52 | 32.93
I8 | 35.58 | 34.69 | 35.87 | 35.72 | 33.68
I9 | 33.64 | 33.93 | 33.41 | 33.85 | 31.38
I10 | 35.92 | 34.34 | 34.48 | 33.57 | 32.01
I11 | 34.82 | 35.14 | 34.78 | 34.42 | 32.57
I12 | 36.62 | 35.48 | 34.56 | 34.25 | 31.94
Average | 35.63 | 35.09 | 34.65 | 33.92 | 32.38
Table 3. Evaluation standard for subjective quality comparison.
Description | Same Quality | Slightly Better | Better | Much Better
Score | 0 | 1 | 2 | 3
Table 4. Comparison results of subjective quality.
Image | Proposed vs. Wu2013 (Mean/Std) | Proposed vs. Wu2017 (Mean/Std) | Proposed vs. Chen2019 (Mean/Std) | Proposed vs. Jiang2022 (Mean/Std)
I1 | 0.667/0.577 | 0.934/0.455 | 0.778/0.574 | 0.834/0.574
I2 | 0.762/0.539 | 0.952/0.540 | 0.836/0.650 | 0.752/0.458
I3 | 1.619/0.669 | 1.532/0.565 | 1.389/0.600 | 1.235/0.745
I4 | 0.905/0.700 | 0.946/0.432 | 0.862/0.512 | 0.746/0.375
I5 | 1.143/1.062 | 0.864/0.742 | 1.124/0.648 | 0.962/0.784
I6 | 1.190/0.750 | 1.183/0.790 | 0.854/0.820 | 0.943/0.824
I7 | 0.714/0.463 | 0.644/0.452 | 0.684/0.620 | 0.648/0.848
I8 | 0.857/1.153 | 0.843/0.604 | 0.793/0.704 | 0.745/0.762
I9 | 0.381/0.590 | 0.416/0.580 | 0.177/0.850 | 0.211/0.650
I10 | 1.333/0.658 | 1.348/0.694 | 1.368/0.569 | 1.248/0.480
I11 | 1.429/0.676 | 0.968/0.480 | 0.983/0.834 | 0.954/0.568
I12 | 1.190/0.602 | 0.793/0.675 | 0.867/0.704 | 0.853/0.565
Average | 1.016/0.703 | 0.952/0.584 | 0.893/0.674 | 0.844/0.636
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
