Article

An Optimization-Based Family of Predictive, Fusion-Based Models for Full-Reference Image Quality Assessment

Domonkos Varga, Ronin Institute, Montclair, NJ 07043, USA
J. Imaging 2023, 9(6), 116; https://doi.org/10.3390/jimaging9060116
Submission received: 27 April 2023 / Revised: 1 June 2023 / Accepted: 5 June 2023 / Published: 8 June 2023

Abstract

Given the reference (distortion-free) image, full-reference image quality assessment (FR-IQA) algorithms seek to assess the perceptual quality of the test image. Over the years, many effective, hand-crafted FR-IQA metrics have been proposed in the literature. In this work, we present a novel framework for FR-IQA that combines multiple metrics and tries to leverage the strength of each by formulating FR-IQA as an optimization problem. Following the idea of other fusion-based metrics, the perceptual quality of a test image is defined as the weighted product of several already existing, hand-crafted FR-IQA metrics. Unlike other methods, the weights are determined in an optimization-based framework and the objective function is defined to maximize the correlation and minimize the root mean square error between the predicted and ground-truth quality scores. The obtained metrics are evaluated on four popular benchmark IQA databases and compared to the state of the art. This comparison has revealed that the compiled fusion-based metrics are able to outperform other competing algorithms, including deep learning-based ones.

1. Introduction

With social media and streaming applications booming, systems that transmit large numbers of images are expected to do so quickly while providing the best possible user experience [1]. However, various distortions are introduced into digital images during storage, compression, and transmission. Therefore, the continuous evaluation and monitoring of image quality is of great importance to content providers [2]. As a consequence, objective image quality assessment (IQA) has become a very active research topic [3], as it aims to devise mathematical models that provide perceptual quality estimates consistent with human judgment. The literature usually divides objective IQA into three branches [4,5] based on the availability of the reference (distortion-free) images in the quality evaluation process. As the terminology suggests, full-reference (FR) IQA evaluates the quality of distorted images with full access to their reference counterparts, while no-reference (NR) IQA has no access and reduced-reference (RR) IQA has partial access to them.
Because the underlying model of the human visual system (HVS) is extremely complex and many of its elements are not fully understood [6], researchers have proposed, over the years, many FR-IQA algorithms that take different aspects of the HVS into consideration. Recently, there have been numerous attempts to increase the performance of FR-IQA by combining several already existing FR-IQA metrics to compile a “super” evaluator. First, Okarma [7] introduced such a fusion-based metric by applying the product and power of MS-SSIM [8], VIF [62], and R-SVD [10]. Later, this idea was developed further in several directions. One line of work utilized optimization or regression techniques to determine optimal weights or exponents in sums or products of already existing FR-IQA metrics. For instance, Oszust [11] determined the optimal weights using a genetic algorithm with a root mean square error (RMSE) objective function calculated between the predicted and ground-truth scores. Bakurov et al. [12] chose a similar solution, but the authors revisited the SSIM [13] and MS-SSIM [8] metrics to find optimal parameters in their formulas, using evolutionary and swarm intelligence methods instead of the originally proposed grid search. On the other hand, Okarma [14] used the MATLAB fminsearch function to determine the optimal exponents in a weighted product of traditional FR-IQA metrics. Another line of work utilizes machine learning techniques to create fusion-based methods. An illustrative example is the paper of Lukin et al. [15], where the results of traditional FR-IQA metrics were used as a feature vector to train a shallow neural network. Amirshahi et al. [16] compiled a fusion-based metric by comparing the activation maps produced for the reference and distorted images by an AlexNet [17] convolutional neural network using traditional image quality metrics.

1.1. Contributions

In this paper, we make the following contributions. We propose a novel framework for FR-IQA that combines multiple metrics and tries to leverage the strength of each by formulating FR-IQA as an optimization problem. Following the idea of other fusion-based metrics, the perceptual quality of a test image is defined as the weighted product of several already existing, hand-crafted FR-IQA metrics. Unlike other methods [7,18,19], the weights in the product are determined in a novel optimization-based framework and the objective function is defined to maximize the correlation strength and minimize the root mean square error between the predicted and ground-truth quality scores.

1.2. Structure of the Paper

To provide a clear and organized presentation of our work, this paper is structured as follows. In Section 2, we discuss the main approaches of FR-IQA and summarize significant methods of the field. Section 3 outlines our proposed method. In Section 4, we present the experimental results and analyze the performance of our method by comparing it to other state-of-the-art methods. We conclude this paper in Section 5 and discuss potential applications and future research directions.

2. Related Work

Taking the mean square error between the reference and distorted images is a simple and straightforward FR-IQA metric. However, the provided quality scores do not correlate well with human judgment [20]. Similarly, PSNR [21] is also simple and straightforward but fails to give satisfactory results [22]. Other metrics take the sensitivity of the HVS to structural degradation into consideration, such as the structural similarity index (SSIM) [13]. On the basis of SSIM [13], a large number of FR-IQA metrics have been proposed over the years, such as MS-SSIM [8], CW-SSIM [23], ESSIM [24], GSSIM [25], IW-SSIM [26], and 3-SSIM [27]. In SSIM [13], a comparison between the distorted and reference (distortion-free) images is performed on the basis of three features, i.e., luminance, contrast, and structure. To be more specific, the SSIM between two images (denoted here by A and B) in an image patch centered at coordinates (x, y) is given as
$SSIM(x,y) = l(x,y)^{\alpha} \times c(x,y)^{\beta} \times s(x,y)^{\gamma},$ (1)
where the luminance component is defined as
$l(x,y) = \frac{2\mu_A(x,y)\,\mu_B(x,y) + C_1}{\mu_A(x,y)^2 + \mu_B(x,y)^2 + C_1},$ (2)
the contrast component is given as
$c(x,y) = \frac{2\sigma_A(x,y)\,\sigma_B(x,y) + C_2}{\sigma_A(x,y)^2 + \sigma_B(x,y)^2 + C_2},$ (3)
and the structure component is determined as
$s(x,y) = \frac{\sigma_{AB}(x,y) + C_3}{\sigma_A(x,y)\,\sigma_B(x,y) + C_3}.$ (4)
In Equations (1)–(4), $\mu_A(x,y)$ and $\mu_B(x,y)$ denote the averages of the pixel values in the image patch around (x, y) in images A and B, respectively. Similarly, $\sigma_A(x,y)^2$ and $\sigma_B(x,y)^2$ stand for the variances. Further, $\sigma_{AB}(x,y)$ is the covariance calculated between the patches from A and B. The constants are calculated as $C_1 = (K_1 L)^2$, $C_2 = (K_2 L)^2$, and $C_3 = C_2/2$, where L stands for the dynamic range of the pixel values; for images with 8-bit depth, L = 255 is the recommended value. By default, $K_1 = 0.01$ and $K_2 = 0.03$ are also constants whose exact values were chosen by Wang et al. [13] after an ablation study. To give the perceptual quality of the distorted image in possession of the reference image, the arithmetic average of $SSIM(x,y)$ is taken. As already mentioned, a huge number of FR-IQA metrics have been inspired by the original SSIM. For comprehensive overviews of SSIM-motivated methods, the following papers can be recommended [12,28,29,30,31]; several representative methods are mentioned here. The authors of multi-scale SSIM [8] extended the idea of SSIM to multiple scales. Sampat et al. [23] replaced the components of SSIM by complex wavelet coefficients [32]. In contrast, Zhang et al. [24] defined an edge strength-based image quality metric where the strength of edges was defined in horizontal and diagonal directions using directional derivatives. Chen et al. [25] took a similar approach, but the edge information was characterized by gradient magnitudes. In [26], the authors used the information content measure as a weighting factor in the pooling process of SSIM [13] to obtain improved prediction results. This idea was further improved by Larson et al. [33], where low-level distortions, which are nearly imperceptible, were modeled by local luminance and contrast masking, while high-level distortions were modeled using spatial-frequency components. Kolaman and Yadid-Pecht [34] extended the SSIM metric to color images by modeling colors with quaternions. In [35], the authors analyzed different strategies for the usage of visual saliency maps [36] in improving IQA algorithms; one proposal was to weight local estimates by local saliency values. In [37], first- and second-order Riesz-transform [38] coefficients were used to create feature maps for the reference and the distorted images, which were compared to give an estimation of the perceptual image quality. Similarly, Zhang et al. [39] compared feature maps to quantify image quality, but the authors used phase congruency [40] and gradient magnitude maps.
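To make Equations (1)–(4) concrete, the following is a minimal sketch of an SSIM computation in Python, assuming 8-bit grayscale inputs, α = β = γ = 1, and local statistics taken over an 11 × 11 uniform window (the original work uses a Gaussian-weighted window); the function name and window size are illustrative choices, not part of the original formulation.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def ssim(img_a, img_b, K1=0.01, K2=0.03, L=255, win=11):
    a = img_a.astype(np.float64)
    b = img_b.astype(np.float64)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    C3 = C2 / 2.0

    mu_a = uniform_filter(a, win)                         # local means
    mu_b = uniform_filter(b, win)
    var_a = uniform_filter(a * a, win) - mu_a ** 2        # local variances
    var_b = uniform_filter(b * b, win) - mu_b ** 2
    cov_ab = uniform_filter(a * b, win) - mu_a * mu_b     # local covariance

    sig_a = np.sqrt(np.maximum(var_a, 0.0))
    sig_b = np.sqrt(np.maximum(var_b, 0.0))

    l = (2 * mu_a * mu_b + C1) / (mu_a ** 2 + mu_b ** 2 + C1)   # Eq. (2)
    c = (2 * sig_a * sig_b + C2) / (var_a + var_b + C2)         # Eq. (3)
    s = (cov_ab + C3) / (sig_a * sig_b + C3)                    # Eq. (4)

    # Eq. (1) with alpha = beta = gamma = 1, pooled by arithmetic averaging.
    return float(np.mean(l * c * s))
```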
Recently, the scientific community has paid more and more attention to the deployment of machine and deep learning models in almost all computer vision tasks [41], and the field of image quality assessment has followed this trend [3,42]. For instance, Tang et al. [43] extracted spatial and frequency domain features from reference–distorted image pairs and trained a random forest regressor for image quality prediction. In contrast, Bosse et al. [44] devised a convolutional neural network (CNN) architecture which can be trained end-to-end on single images or on image pairs; as a consequence, it can be used for both NR- and FR-IQA. Similarly, Zhang et al. [45] trained an end-to-end CNN in a patch-wise fashion for FR-IQA and compared the effectiveness of deep features extracted from different pretrained CNNs. In [46], the authors proposed a pairwise-learning framework for FR-IQA. Several works extracted deep features via pretrained CNNs from reference–distorted image pairs and compared them to assess the perceptual image quality. For instance, Amirshahi et al. [47] compared the histograms of deep features using a histogram intersection kernel (HIK) [48] at multiple levels. The perceptual quality was obtained by aggregating the similarity scores provided by the HIKs. Later, this approach was further developed in [16] by replacing the HIK used for comparing convolutional feature maps with a traditional image similarity metric. In [49], the authors used the error map calculated between the reference and distorted images and the subjective saliencies of the distorted images to train a CNN for perceptual image quality estimation.
Recently, several researchers have devised fusion-based FR-IQA methods whose goal is to create a “super-evaluator” from already known FR-IQA metrics to achieve advanced performance. A large number of fusion-based algorithms determine weights for each FR-IQA metric in a weighted sum or product [1]. An illustrative example is the method proposed by Okarma [7]: the properties of three different FR-IQA metrics were examined thoroughly, and a combined metric was devised based on the metrics’ arithmetical product and power. By using mathematical optimization techniques, the parameter values of this fusion-based metric were refined in [14]. Oszust [50] and Yuan et al. [51] also developed this approach further by applying lasso regression and kernel ridge regression, respectively. Oszust [11] determined the weights in a linear combination of traditional FR-IQA metrics by applying a genetic algorithm. In [52], this approach was further developed by using multi-gene genetic programming [53]. The effectiveness of this approach was also demonstrated on screen content images [54]. Simulated annealing was also applied in this framework [55]. Machine learning techniques have also been used to create fusion-based algorithms. An illustrative example is Lukin et al.’s [15] work: the authors used the outcomes of several FR-IQA metrics as features and trained a neural network on top of them to predict perceptual quality. A similar approach using a neural network was proposed for the quality assessment of remote sensing images [56].
In summary, this section has highlighted the various approaches that have been proposed in the literature for FR-IQA. Although the reviewed studies have contributed significantly to the field, a detailed overview of IQA or FR-IQA is out of the scope of this study. For a general overview of the field of IQA, the PhD dissertations of Jenadeleh [57] and Men [58] can be recommended, while Gao et al. [59], Phadikar et al. [60], George et al. [61], and Pedersen et al. [30] provide surveys of the state of the art in FR-IQA.

3. Proposed Method

In [7], Okarma took into account the different properties of three different FR-IQA metrics (MS-SSIM [8], VIF [62], and R-SVD [10]) and defined a combined quality metric (CQM):
$CQM = (MS\text{-}SSIM)^{a} \times (VIF)^{b} \times (R\text{-}SVD)^{c},$ (5)
where the values a = 7, b = 0.3, and c = 0.15 were used because they led to a near-optimal solution on an IQA benchmark database. Following this basic idea of Okarma [7], a fusion-based metric is defined as the weighted product of the results of n different traditional FR-IQA methods:
$Q_p = \prod_{i=1}^{n} q_i^{\alpha_i},$ (6)
where the $q_i$ are the results of the applied FR-IQA metrics and the $\alpha_i$ are the associated weights. Specifically, we chose n = 18 and the following metrics were utilized: FSIM [39], FSIMc [39], GSM [63], IFC [9], IFS [81], IW-SSIM [26], MAD [33], MS-SSIM [8], NQM [64], PSNR [21], RFSIM [37], SFF [65], SSIM [13], SR-SIM [66], UQI [67], VIF [62], VSI [68], and VSNR [69]. A summary of the acronyms of the used FR-IQA metrics can be found in Table 1. In the literature, the parameters of an FR-IQA metric are usually tuned on a smaller subset of images. In the case of a traditional metric, such as SSIM [13] given by Equations (1)–(4), the number of tunable parameters is one or two; as a consequence, appropriate values can easily be found by exhaustive search (nested loops) over the parameter space. In contrast, our fusion-based metric given by Equation (6) contains n = 18 parameters, and an optimization task is defined to find their exact values. To determine the optimal weights (parameters) in Equation (6), the following optimization problem is defined:
$\max_{\alpha} \; \frac{SROCC(Q_p, S) + KROCC(Q_p, S)}{RMSE(F(Q_p, \beta), S)}, \quad \text{subject to} \;\; \alpha_i \in \mathbb{R}, \; n \in \mathbb{N}, \; \beta \geq 0,$ (7)
where $Q_p$ and S are vectors containing the predicted and ground-truth quality scores, respectively. $SROCC(\cdot,\cdot)$ and $KROCC(\cdot,\cdot)$ denote Spearman’s rank-order correlation coefficient and Kendall’s rank-order correlation coefficient calculated between two vectors, respectively. Further, $RMSE(\cdot,\cdot)$ is the root mean square error determined between two vectors. Prior to the calculation of the RMSE, a non-linear mapping is applied to the predicted scores following the recommendations of [4]. In this paper, the following non-linear function was applied:
$F(Q_p, \beta) = \beta_1 \left( \frac{1}{2} - \frac{1}{1 + e^{\beta_2 (Q_p - \beta_3)}} \right) + \beta_4 Q_p + \beta_5,$ (8)
with the following β parameters: $\beta_1 = 10$, $\beta_2 = 0$, $\beta_3 = \mathrm{mean}(Q_p)$, $\beta_4 = 1$, and $\beta_5 = 0.1$, which were also used in the MATLAB implementation of the VSI [68] method.
Because an FR-IQA metric is supposed to provide objective scores that have a high correlation and a low RMSE with respect to subjective quality scores collected from human observers, the numerator of the objective function given by Equation (7) consists of the sum of the SROCC and KROCC, while the denominator corresponds to the RMSE. Our preliminary investigations revealed that considering only the SROCC or KROCC may result in a higher RMSE than those of the state of the art. That is why we decided to divide the sum of the SROCC and KROCC by the RMSE.
Two nature-inspired optimization methods, the genetic algorithm (GA) [70] and pattern search (PS) [71], were applied to the problem defined by Equation (7) to determine the optimal weights. Further, the simplex method of Lagarias et al. [72], which is implemented in the fminsearch function of MATLAB’s Optimization Toolbox, was also used. To improve the efficiency of the fusion, each method was allowed to perform model selection (i.e., to decide which FR-IQA metrics to include in the product). The main motivation behind the choice of optimization methods was to collect algorithms that are able to give at least approximate solutions for NP-hard (non-deterministic polynomial-time hard) problems. Figure 1 and Figure 2 depict the compilation of the proposed fusion-based FR-IQA metric. Specifically, the fusion was carried out on 20% of the reference images and their corresponding distorted counterparts. In the literature, 20% is a common choice for setting the parameters of a derived formula [73,74], but there are also researchers who used 30% [39] or 80% [75]. In total, four fusion strategies were realized for each optimization method, one per benchmark database, and the fusion strategies were also cross-database tested. Each optimization method was run 100 times and the best solution was finally selected. We codenamed our method OFIQA to refer to the fact that the decision fusion was carried out via optimization.
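The following is a minimal Python sketch of how the objective of Equation (7) can be evaluated and searched on the training split. SciPy's differential evolution is used here only as a readily available stand-in for the GA, PS, and Nelder–Mead solvers applied in the paper, and the model-selection step is omitted; the score matrix q, the subjective scores s, the β values, and the exponent bounds are hypothetical placeholders.

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau
from scipy.optimize import differential_evolution

def logistic_map(q_p, beta):
    # Five-parameter non-linear mapping of Equation (8), applied before the RMSE term.
    b1, b2, b3, b4, b5 = beta
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (q_p - b3)))) + b4 * q_p + b5

def neg_objective(alpha, q, s, beta):
    # q: (n_images x n_metrics) base-metric scores, s: subjective scores.
    q_p = np.prod(q ** alpha, axis=1)                 # weighted product, Eq. (6)
    srocc = spearmanr(q_p, s)[0]
    krocc = kendalltau(q_p, s)[0]
    rmse = np.sqrt(np.mean((logistic_map(q_p, beta) - s) ** 2))
    return -(srocc + krocc) / rmse                    # negated because SciPy minimizes

# Hypothetical call; the exponent bounds and solver settings are illustrative only.
# beta = [10.0, 0.0, 0.5, 1.0, 0.1]                   # beta_3 would be mean(Q_p) in the text
# result = differential_evolution(neg_objective, bounds=[(-25.0, 25.0)] * q.shape[1],
#                                 args=(q, s, beta), maxiter=100, seed=0)
# alpha_opt = result.x
```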
In the GA, the population size and the number of generations were set to 100. The best solutions on the four benchmark databases were provided by the following equations:
$OFIQA_{LIVE}^{GA} = FSIM^{3.6888} \times GSM^{12.5693} \times IW\text{-}SSIM^{0.9556} \times IFS^{1.8159},$ (9)
$OFIQA_{TID2013}^{GA} = VSI^{13.9336} \times FSIMc^{2.2946} \times GSM^{10.864} \times NQM^{0.1713} \times SR\text{-}SIM^{2.4651} \times IFS^{0.5139},$ (10)
$OFIQA_{TID2008}^{GA} = VSI^{7.0221} \times FSIM^{0.259} \times FSIMc^{1.0055} \times GSM^{19.8267} \times PSNR^{0.1471} \times VIF^{0.1452} \times SFF^{2.4029},$ (11)
$OFIQA_{CSIQ}^{GA} = FSIMc^{2.7532} \times MAD^{0.9692} \times MS\text{-}SSIM^{1.1892} \times SSIM^{1.6561} \times VIF^{0.75} \times IFS^{3.4013} \times SFF^{2.2901}.$ (12)
In the case of PS, after 100 runs, the following fusion-based metrics were obtained:
$OFIQA_{LIVE}^{PS} = FSIM^{0.6964} \times FSIMc^{2.6056} \times MAD^{1.0817} \times MS\text{-}SSIM^{0.4711} \times SSIM^{0.7302} \times UQI^{0.9946},$ (13)
$OFIQA_{TID2013}^{PS} = VSI^{24.1037} \times FSIM^{0.9292} \times GSM^{19.5555} \times IW\text{-}SSIM^{2.1053} \times MS\text{-}SSIM^{6.1562} \times PSNR^{0.4649} \times VIF^{0.5463} \times SFF^{4.3998},$ (14)
$OFIQA_{TID2008}^{PS} = VSI^{23.5097} \times FSIM^{1.2155} \times FSIMc^{0.2494} \times GSM^{21.8595} \times IW\text{-}SSIM^{1.2984} \times MS\text{-}SSIM^{1.9792} \times PSNR^{0.5571} \times SSIM^{1.8374} \times VIF^{0.5491},$ (15)
$OFIQA_{CSIQ}^{PS} = FSIM^{0.3471} \times FSIMc^{0.7575} \times GSM^{60.4948} \times MAD^{1.8509} \times NQM^{0.8049} \times PSNR^{0.8181} \times UQI^{0.5142} \times VIF^{0.1294}.$ (16)
Using the method of Lagarias et al. [72], the following fusion metrics can be obtained:
$OFIQA_{LIVE}^{fmin} = VSI^{0.442} \times FSIMc^{1.1986} \times GSM^{1.0479} \times IFC^{0.1531} \times IW\text{-}SSIM^{1.8895} \times MAD^{1.6539} \times MS\text{-}SSIM^{0.8459} \times NQM^{0.4463} \times PSNR^{0.2781} \times RFSIM^{0.1719} \times SR\text{-}SIM^{1.1448} \times SSIM^{0.0811} \times UQI^{0.0955} \times VIF^{0.6669} \times VSNR^{0.0765} \times SFF^{0.1433},$ (17)
$OFIQA_{TID2013}^{fmin} = VSI^{0.6577} \times FSIM^{0.1742} \times FSIMc^{0.7197} \times GSM^{0.6092} \times IFC^{0.3768} \times IW\text{-}SSIM^{0.8904} \times MAD^{0.4482} \times MS\text{-}SSIM^{0.9852} \times NQM^{0.6304} \times PSNR^{0.8077} \times RFSIM^{0.3861} \times SSIM^{0.7985} \times UQI^{0.3612} \times VIF^{0.0149} \times VSNR^{0.3338} \times IFS^{0.8979} \times SFF^{0.9113},$ (18)
$OFIQA_{TID2008}^{fmin} = VSI^{2.0607} \times FSIM^{0.4211} \times FSIMc^{0.8323} \times GSM^{0.1672} \times IFC^{0.0322} \times MAD^{0.0753} \times NQM^{0.098} \times PSNR^{0.3727} \times RFSIM^{0.5523} \times SR\text{-}SIM^{0.5783} \times SSIM^{0.2377} \times UQI^{0.3083} \times VIF^{0.5273} \times VSNR^{0.0292} \times IFS^{1.9221} \times SFF^{0.0902},$ (19)
$OFIQA_{CSIQ}^{fmin} = VSI^{0.4328} \times FSIM^{0.4287} \times GSM^{0.0521} \times IFC^{0.0569} \times IW\text{-}SSIM^{1.4847} \times MAD^{0.982} \times MS\text{-}SSIM^{0.3763} \times NQM^{0.3919} \times PSNR^{0.2141} \times RFSIM^{0.1544} \times SR\text{-}SIM^{0.7473} \times UQI^{0.2373} \times VIF^{0.2243} \times VSNR^{0.1147} \times IFS^{1.3102} \times SFF^{1.7745}.$ (20)
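For illustration, the short sketch below shows how one of the compiled metrics is applied in the evaluation stage (Figure 2): it evaluates $OFIQA_{LIVE}^{GA}$ of Equation (9) with the exponents exactly as printed above, and the base-metric scores passed in are hypothetical placeholders.

```python
# A minimal sketch of evaluating the compiled OFIQA_LIVE^GA metric of Equation (9):
# the perceptual quality estimate is the weighted product of the scores of the
# base metrics selected by the GA. The example scores are hypothetical.
def ofiqa_live_ga(fsim, gsm, iw_ssim, ifs):
    return (fsim ** 3.6888) * (gsm ** 12.5693) * (iw_ssim ** 0.9556) * (ifs ** 1.8159)

# fsim, gsm, iw_ssim, and ifs would come from running the respective FR-IQA
# metrics on a reference-distorted image pair.
print(ofiqa_live_ga(fsim=0.95, gsm=0.98, iw_ssim=0.93, ifs=0.94))
```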

4. Results

In this section, the experimental numerical results are presented. First, the used benchmark IQA databases are introduced in Section 4.1. Next, Section 4.2 defines the applied evaluation metrics. An ablation study with respect to the applied optimization methods is presented in Section 4.3. Finally, the results of a comparison to the state of the art are given in Section 4.4.

4.1. Databases

For evaluation, four IQA benchmark databases are used, i.e., LIVE (Laboratory for Image and Video Engineering) [4], TID2013 (Tampere Image Database) [76], TID2008 [77], and CSIQ (Categorical Image Quality) [33]. Each contains a small set of reference images (whose perceptual quality is considered perfect) and a large set of quality-annotated distorted images generated from the reference images using different distortion types at different distortion levels. The main characteristics of the applied databases are given in Table 2.

4.2. Evaluation Metrics

In this study, four different performance indices, i.e., root mean square error (RMSE), Pearson’s linear correlation coefficient (PLCC), Spearman’s rank-order correlation coefficient (SROCC), and Kendall’s rank-order correlation coefficient (KROCC), are applied to characterize the performance of the proposed fusion-based metric and other considered state-of-the-art methods in an ablation study and a comparison to the state of the art. The RMSE and PLCC are calculated after a non-linear mapping of the vector of predicted scores. This mapping has already been given by Equation (8). The RMSE is given as
$RMSE(Q_p, S) = \sqrt{\frac{(Q_p - S)^T (Q_p - S)}{m}},$ (21)
where $Q_p$ is the vector of predicted scores after the non-linear mapping, S is the vector of ground-truth scores, and m denotes the number of samples (in this case, images). The PLCC is given as
$PLCC(Q_p, S) = \frac{\bar{Q}_p^T \bar{S}}{\sqrt{\bar{Q}_p^T \bar{Q}_p}\,\sqrt{\bar{S}^T \bar{S}}},$ (22)
where $\bar{Q}_p$ and $\bar{S}$ are the mean-removed vectors. The SROCC is defined as
$SROCC(Q, S) = 1 - \frac{6 \sum_{i=1}^{m} d_i^2}{m (m^2 - 1)},$ (23)
where $d_i$ stands for the difference between the ranks of Q and S at the ith entry. The KROCC is defined as
$KROCC(Q, S) = \frac{m_c - m_d}{\frac{1}{2} m (m - 1)},$ (24)
where $m_c$ and $m_d$ are the numbers of concordant and discordant pairs in the database, respectively.
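As a cross-check of the four indices above, a minimal Python sketch follows; it assumes the predicted scores have already been passed through the non-linear mapping of Equation (8) before the RMSE and PLCC are computed, and it relies on SciPy for the rank correlations (which, unlike the simplified Equations (23) and (24), also handle ties).

```python
# A minimal sketch of the performance indices of Equations (21)-(24).
import numpy as np
from scipy.stats import spearmanr, kendalltau

def rmse(q_p, s):
    q_p, s = np.asarray(q_p, float), np.asarray(s, float)
    return np.sqrt(np.mean((q_p - s) ** 2))                      # Eq. (21)

def plcc(q_p, s):
    q_c = np.asarray(q_p, float) - np.mean(q_p)                  # mean-removed vectors
    s_c = np.asarray(s, float) - np.mean(s)
    return (q_c @ s_c) / np.sqrt((q_c @ q_c) * (s_c @ s_c))      # Eq. (22)

def srocc(q, s):
    return spearmanr(q, s)[0]                                    # Eq. (23), tie-aware

def krocc(q, s):
    return kendalltau(q, s)[0]                                   # Eq. (24), tie-aware
```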
In Table 3, the details of the computer configuration applied in our experiments are given.

4.3. Ablation Study

In this subsection, an ablation study is carried out with respect to the applied optimization method. As already mentioned, the GA [70], PS [71], and the simplex method of Lagarias et al. [72] (implemented in MATLAB’s fminsearch) were considered, and the obtained metrics have already been given by Equations (9)–(20). The results in terms of the RMSE and SROCC are summarized in Table 4 and Table 5. From these numerical results, it can be clearly seen that the weights obtained by the GA are significantly better than those of the other two methods. Namely, they provide a consistently good performance in terms of the RMSE and SROCC on all the databases, while the other weights fail to provide a good performance in several cases and sometimes give an unacceptably high RMSE or a low SROCC (the underlined values in Table 4 and Table 5).

4.4. Comparison to the State of the Art

In this subsection, the proposed FR-IQA metrics are compared to a set of state-of-the-art methods (2stepQA [78], CSV [79], DISTS [80], ESSIM [24], FSIM [39], FSIMc [39], GSM [63], IFC [9], IFS [81], IW-SSIM [26], MAD [33], MS-SSIM [8], NQM [64], PSNR [21], ReSIFT [82], RFSIM [37], RVSIM [83], SFF [65], SR-SIM [66], SSIM [13], SUMMER [84], VIF [62], and VSI [68]) whose MATLAB source code was made available by their authors. Further, we reimplemented the SSIM-CNN method proposed by Amirshahi et al. [16] and made it available to the research community (https://github.com/Skythianos/SSIM-CNN, accessed on 1 January 2023). In addition, the results of the recently published GP-SSIM [85] and of the deep learning-based DeepSIM [86], DeepIQA [44], PieAPP [46], and LPIPS [45] are also included in our comparison, based on Bakurov et al.’s [85] study. The proposed fusion-based FR-IQA metrics, given exactly by Equations (9)–(20), are also used in this comparison.
The numerical results measured on the different databases are summarized in Table 6, Table 7 and Table 8. From these results, it can be seen that $OFIQA_{TID2013}^{GA}$ provides the lowest error on TID2013 [76] and the second lowest on TID2008 [77]. On the other hand, $OFIQA_{TID2008}^{GA}$ gives the lowest, second lowest, and third lowest error on TID2008 [77], CSIQ [33], and TID2013 [76], respectively. The RMSE values for each database are summarized in Figure 3, Figure 4, Figure 5 and Figure 6. If we take a look at the correlation strengths, we can observe the following. On LIVE [4], the deep learning-based DeepSIM provides the highest correlation performance. However, the proposed $OFIQA_{CSIQ}^{GA}$’s results closely follow those of DeepSIM; namely, the differences between the two methods are 0.01 and 0.02 in terms of PLCC and SROCC, respectively. Further, $OFIQA_{TID2008}^{GA}$ provides the third highest results in terms of PLCC and SROCC on LIVE [4]. On TID2013 [76], $OFIQA_{TID2013}^{GA}$ gives the highest results in terms of PLCC and KROCC, while $OFIQA_{TID2008}^{GA}$ provides the third highest PLCC. On TID2008 [77], $OFIQA_{TID2008}^{GA}$ has the highest SROCC and KROCC values, and in terms of PLCC it is outperformed only by the deep learning-based DeepIQA. Table 9 gives a summary of the direct and weighted averages of the correlation performance indices on the considered databases. It can be seen that $OFIQA_{TID2008}^{GA}$ has the highest performance in terms of PLCC and KROCC if we consider the direct averages of the correlation strengths. If we consider the weighted averages, $OFIQA_{TID2008}^{GA}$ remains first in terms of PLCC and gives the second highest KROCC, while $OFIQA_{TID2013}^{GA}$ is the second best in terms of PLCC/SROCC and the third best in terms of KROCC. In the weighted averages, VSI [68] is the best in terms of SROCC and the third best in terms of PLCC/KROCC; however, it gives a higher RMSE on the considered IQA databases than $OFIQA_{TID2013}^{GA}$, which has the second best weighted SROCC.

5. Conclusions

This paper proposed a novel decision-fusion framework based on optimization for FR-IQA. First, the fusion-based metric was defined as a weighted product of n different traditional FR-IQA measures following Okarma’s [7] ideas. Next, an optimization problem was specified using SROCC, KROCC, and the RMSE between the predicted and the ground-truth quality scores in the objective function to maximize the correlation strength and minimize the error. Then, several optimization techniques were applied to determine the weights in the weighted product of quality measures. To obtain a fusion-based metric with its parameters, 20% of the reference images and their distorted counterparts were used. Our analysis revealed that a GA is a suitable choice to solve the defined optimization problem. The experimental results and a comparison to the state of the art on four large, widely accepted benchmark databases, LIVE [4], TID2013 [76], TID2008 [77], and CSIQ [33], uncovered that the FR-IQA metrics coming from our optimization-based framework are able to outperform other traditional and deep learning-based state-of-the-art algorithms. Future research may involve the usage of other objective functions or multi-objective optimization.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

In this paper, the following publicly available benchmark databases were used: 1. LIVE: https://live.ece.utexas.edu/research/quality/subjective.htm (accessed on 12 April 2023), 2. TID2013: http://www.ponomarenko.info/tid2013.htm (accessed on 12 April 2023), 3. TID2008: http://www.ponomarenko.info/tid2008.htm (accessed on 12 April 2023), and 4. CSIQ: https://isp.uv.es/data_quality.html (accessed on 12 April 2023).

Acknowledgments

We thank the anonymous reviewers and the academic editor for their careful reading of our manuscript and their many insightful comments and suggestions.

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CNN: convolutional neural network
CPU: central processing unit
CQM: combined quality metric
CSIQ: categorical image quality
FR: full reference
FR-IQA: full-reference image quality assessment
GA: genetic algorithm
GPU: graphics processing unit
HIK: histogram intersection kernel
HVS: human visual system
IQA: image quality assessment
KROCC: Kendall’s rank-order correlation coefficient
LIVE: laboratory for image and video engineering
NR: no reference
PLCC: Pearson’s linear correlation coefficient
PS: pattern search
RMSE: root mean square error
RR: reduced reference
SROCC: Spearman’s rank-order correlation coefficient
SSIM: structural similarity index
TID: Tampere image database

References

  1. Ding, K.; Ma, K.; Wang, S.; Simoncelli, E.P. Comparison of full-reference image quality models for optimization of image processing systems. Int. J. Comput. Vis. 2021, 129, 1258–1281. [Google Scholar] [CrossRef]
  2. Chami, Z.A.; Jaoude, C.A.; Chbeir, R.; Barhamgi, M.; Alraja, M.N. A No-Reference and Full-Reference image quality assessment and enhancement framework in real-time. Multimed. Tools Appl. 2022, 81, 32491–32517. [Google Scholar] [CrossRef]
  3. Saupe, D.; Hahn, F.; Hosu, V.; Zingman, I.; Rana, M.; Li, S. Crowd workers proven useful: A comparative study of subjective video quality assessment. In Proceedings of the QoMEX 2016: 8th International Conference on Quality of Multimedia Experience, Lisbon, Portugal, 6–8 June 2016. [Google Scholar]
  4. Sheikh, H.R.; Sabir, M.F.; Bovik, A.C. A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Trans. Image Process. 2006, 15, 3440–3451. [Google Scholar] [CrossRef] [PubMed]
  5. Lin, H.; Hosu, V.; Saupe, D. KADID-10k: A large-scale artificially distorted IQA database. In Proceedings of the 2019 Eleventh International Conference on Quality of Multimedia Experience (QoMEX), Berlin, Germany, 5–7 June 2019; pp. 1–3. [Google Scholar]
  6. Asadi, H.; Mohamed, S.; Lim, C.P.; Nahavandi, S. A review on otolith models in human perception. Behav. Brain Res. 2016, 309, 67–76. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  7. Okarma, K. Combined full-reference image quality metric linearly correlated with subjective assessment. In Proceedings of the International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland, 13–17 June 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 539–546. [Google Scholar]
  8. Wang, Z.; Simoncelli, E.P.; Bovik, A.C. Multiscale structural similarity for image quality assessment. In Proceedings of the Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, USA, 9–12 November 2003; Volume 2, pp. 1398–1402. [Google Scholar]
  9. Sheikh, H.R.; Bovik, A.C.; De Veciana, G. An information fidelity criterion for image quality assessment using natural scene statistics. IEEE Trans. Image Process. 2005, 14, 2117–2128. [Google Scholar] [CrossRef] [Green Version]
  10. Mansouri, A.; Aznaveh, A.M.; Torkamani-Azar, F.; Jahanshahi, J.A. Image quality assessment using the singular value decomposition theorem. Opt. Rev. 2009, 16, 49–53. [Google Scholar] [CrossRef]
  11. Oszust, M. Full-reference image quality assessment with linear combination of genetically selected quality measures. PLoS ONE 2016, 11, e0158333. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  12. Bakurov, I.; Buzzelli, M.; Schettini, R.; Castelli, M.; Vanneschi, L. Structural similarity index (SSIM) revisited: A data-driven approach. Expert Syst. Appl. 2022, 189, 116087. [Google Scholar] [CrossRef]
  13. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [Green Version]
  14. Okarma, K.; Lech, P.; Lukin, V.V. Combined Full-Reference Image Quality Metrics for Objective Assessment of Multiply Distorted Images. Electronics 2021, 10, 2256. [Google Scholar] [CrossRef]
  15. Lukin, V.V.; Ponomarenko, N.N.; Ieremeiev, O.I.; Egiazarian, K.O.; Astola, J. Combining full-reference image visual quality metrics by neural network. In Proceedings of the Human Vision and Electronic Imaging XX, SPIE, San Francisco, CA, USA, 8–12 February 2015; Volume 9394, pp. 172–183. [Google Scholar]
  16. Amirshahi, S.A.; Pedersen, M.; Beghdadi, A. Reviving traditional image quality metrics using CNNs. In Proceedings of the Color and Imaging Conference. Society for Imaging Science and Technology, Vancouver, BC, Canada, 12–16 November 2018; Volume 2018, pp. 241–246. [Google Scholar]
  17. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef] [Green Version]
  18. Okarma, K. Combined image similarity index. Opt. Rev. 2012, 19, 349–354. [Google Scholar] [CrossRef]
  19. Okarma, K. Extended hybrid image similarity–combined full-reference image quality metric linearly correlated with subjective scores. Elektron. Elektrotechnika 2013, 19, 129–132. [Google Scholar] [CrossRef] [Green Version]
  20. Girod, B. What’s wrong with mean-squared error? In Digital Images and Human Vision; MIT Press: Cambridge, MA, USA, 1993; pp. 207–220. [Google Scholar]
  21. Hore, A.; Ziou, D. Image quality metrics: PSNR vs. SSIM. In Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; pp. 2366–2369. [Google Scholar]
  22. Lin, W.; Kuo, C.C.J. Perceptual visual quality metrics: A survey. J. Vis. Commun. Image Represent. 2011, 22, 297–312. [Google Scholar] [CrossRef]
  23. Sampat, M.P.; Wang, Z.; Gupta, S.; Bovik, A.C.; Markey, M.K. Complex wavelet structural similarity: A new image similarity index. IEEE Trans. Image Process. 2009, 18, 2385–2401. [Google Scholar] [CrossRef] [PubMed]
  24. Zhang, X.; Feng, X.; Wang, W.; Xue, W. Edge strength similarity for image quality assessment. IEEE Signal Process. Lett. 2013, 20, 319–322. [Google Scholar] [CrossRef]
  25. Chen, G.H.; Yang, C.L.; Xie, S.L. Gradient-based structural similarity for image quality assessment. In Proceedings of the 2006 International Conference on Image Processing, Atlanta, GA, USA, 8–11 October 2006; pp. 2929–2932. [Google Scholar]
  26. Wang, Z.; Li, Q. Information content weighting for perceptual image quality assessment. IEEE Trans. Image Process. 2010, 20, 1185–1198. [Google Scholar] [CrossRef]
  27. Li, C.; Bovik, A.C. Content-weighted video quality assessment using a three-component image model. J. Electron. Imaging 2010, 19, 011003. [Google Scholar]
  28. Rouse, D.M.; Hemami, S.S. Understanding and simplifying the structural similarity metric. In Proceedings of the 2008 15th IEEE International Conference on Image Processing, San Diego, CA, USA, 12–15 October 2008; pp. 1188–1191. [Google Scholar]
  29. Nilsson, J.; Akenine-Möller, T. Understanding ssim. arXiv 2020, arXiv:2006.13846. [Google Scholar]
  30. Pedersen, M.; Hardeberg, J.Y. Full-reference image quality metrics: Classification and evaluation. Found. Trends® Comput. Graph. Vis. 2012, 7, 1–80. [Google Scholar]
  31. Zhang, L.; Zhang, L.; Mou, X.; Zhang, D. A comprehensive evaluation of full reference image quality assessment algorithms. In Proceedings of the 2012 19th IEEE International Conference on Image Processing, Orlando, FL, USA, 30 September–3 October 2012; pp. 1477–1480. [Google Scholar]
  32. Kingsbury, N. The dual-tree complex wavelet transform: A new efficient tool for image restoration and enhancement. In Proceedings of the 9th European Signal Processing Conference (EUSIPCO 1998), Rhodes, Greece, 8–11 September 1998; pp. 1–4. [Google Scholar]
  33. Larson, E.C.; Chandler, D.M. Most apparent distortion: Full-reference image quality assessment and the role of strategy. J. Electron. Imaging 2010, 19, 011006. [Google Scholar]
  34. Kolaman, A.; Yadid-Pecht, O. Quaternion structural similarity: A new quality index for color images. IEEE Trans. Image Process. 2011, 21, 1526–1536. [Google Scholar] [CrossRef]
  35. Lukin, V.; Bataeva, E.; Abramov, S. Saliency map in image visual quality assessment and processing. Radioelectron. Comput. Syst. 2023, 1, 112–121. [Google Scholar] [CrossRef]
  36. Gu, K.; Zhai, G.; Lin, W.; Yang, X.; Zhang, W. Visual saliency detection with free energy theory. IEEE Signal Process. Lett. 2015, 22, 1552–1555. [Google Scholar] [CrossRef]
  37. Zhang, L.; Zhang, L.; Mou, X. RFSIM: A feature based image quality assessment metric using Riesz transforms. In Proceedings of the 2010 IEEE International Conference on Image Processing, Hong Kong, China, 26–29 September 2010; pp. 321–324. [Google Scholar]
  38. Martin, G.; Iwaniec, T. Riesz transforms and related singular integrals. J. Reine Angew. Math. (Crelles J.) 1996, 1996, 25–58. [Google Scholar]
  39. Zhang, L.; Zhang, L.; Mou, X.; Zhang, D. FSIM: A feature similarity index for image quality assessment. IEEE Trans. Image Process. 2011, 20, 2378–2386. [Google Scholar] [CrossRef] [Green Version]
  40. Kovesi, P. Image features from phase congruency. Videre J. Comput. Vis. Res. 1999, 1, 1–26. [Google Scholar]
  41. Götz-Hahn, F.; Hosu, V.; Lin, H.; Saupe, D. KonVid-150k: A Dataset for No-Reference Video Quality Assessment of Videos in-the-Wild. IEEE Access 2021, 9, 72139–72160. [Google Scholar] [CrossRef]
  42. Hosu, V.; Hahn, F.; Jenadeleh, M.; Lin, H.; Men, H.; Szirányi, T.; Li, S.; Saupe, D. The Konstanz natural video database (KoNViD-1k). In Proceedings of the 2017 Ninth international conference on quality of multimedia experience (QoMEX), Erfurt, Germany, 31 May–2 June 2017; pp. 1–6. [Google Scholar]
  43. Tang, Z.; Zheng, Y.; Gu, K.; Liao, K.; Wang, W.; Yu, M. Full-reference image quality assessment by combining features in spatial and frequency domains. IEEE Trans. Broadcast. 2018, 65, 138–151. [Google Scholar] [CrossRef]
  44. Bosse, S.; Maniry, D.; Müller, K.R.; Wiegand, T.; Samek, W. Deep neural networks for no-reference and full-reference image quality assessment. IEEE Trans. Image Process. 2017, 27, 206–219. [Google Scholar] [CrossRef] [Green Version]
  45. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 586–595. [Google Scholar]
  46. Prashnani, E.; Cai, H.; Mostofi, Y.; Sen, P. Pieapp: Perceptual image-error assessment through pairwise preference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1808–1817. [Google Scholar]
  47. Amirshahi, S.A.; Pedersen, M.; Yu, S.X. Image quality assessment by comparing CNN features between images. J. Imaging Sci. Technol. 2016, 60, 60410-1–60410-10. [Google Scholar] [CrossRef]
  48. Barla, A.; Franceschi, E.; Odone, F.; Verri, A. Image kernels. In Proceedings of the International Workshop on Support Vector Machines, Niagara Falls, ON, Canada, 10 August 2002; Springer: Berlin, Germany, 2002; pp. 83–96. [Google Scholar]
  49. Ai, D.; Liu, Y.; Yang, Y.; Lu, M.; Liu, Y.; Ling, N. A Full-Reference Image Quality Assessment Method with Saliency and Error Feature Fusion. In Proceedings of the 2022 IEEE International Symposium on Circuits and Systems (ISCAS), Austin, TX, USA, 28 May–1 June 2022; pp. 3165–3169. [Google Scholar]
  50. Oszust, M. Image quality assessment with lasso regression and pairwise score differences. Multimed. Tools Appl. 2017, 76, 13255–13270. [Google Scholar] [CrossRef] [Green Version]
  51. Yuan, Y.; Guo, Q.; Lu, X. Image quality assessment: A sparse learning way. Neurocomputing 2015, 159, 227–241. [Google Scholar] [CrossRef]
  52. Merzougui, N.; Djerou, L. Multi-gene Genetic Programming based Predictive Models for Full-reference Image Quality Assessment. J. Imaging Sci. Technol. 2021, 65, 60409-1–60409-13. [Google Scholar] [CrossRef]
  53. Koza, J.R. Genetic programming as a means for programming computers by natural selection. Stat. Comput. 1994, 4, 87–112. [Google Scholar] [CrossRef]
  54. Merzougui, N.; Djerou, L. Genetic Programming for Screen Content Image Quality Assessment. In Proceedings of the Artificial Intelligence: Theories and Applications: First International Conference, ICAITA 2022, Mascara, Algeria, 7–8 November 2022; Revised Selected Papers. Springer: Berlin, Germany, 2023; pp. 52–64. [Google Scholar]
  55. Varga, D. Full-Reference Image Quality Assessment Based on an Optimal Linear Combination of Quality Measures Selected by Simulated Annealing. J. Imaging 2022, 8, 224. [Google Scholar] [CrossRef] [PubMed]
  56. Ieremeiev, O.; Lukin, V.; Okarma, K.; Egiazarian, K. Full-reference quality metric based on neural network to assess the visual quality of remote sensing images. Remote Sens. 2020, 12, 2349. [Google Scholar] [CrossRef]
  57. Jenadeleh, M. Blind Image and Video Quality Assessment. Ph.D. Thesis, University of Konstanz, Konstanz, Germany, 2018. [Google Scholar]
  58. Men, H. Boosting for Visual Quality Assessment with Applications for Frame Interpolation Methods. Ph.D. Thesis, University of Konstanz, Konstanz, Germany, 2022. [Google Scholar]
  59. Gao, M.-J.; Dang, H.-S.; Wei, L.-L.; Liu, G.-J.; Zhang, X.-D. Review and Prospect of Full Reference Image Quality Assessment. Acta Electonica Sin. 2021, 49, 2261. [Google Scholar]
  60. Phadikar, B.S.; Maity, G.K.; Phadikar, A. Full reference image quality assessment: A survey. In Industry Interactive Innovations in Science, Engineering and Technology, Proceedings of the International Conference, I3SET 2016, Singapore, 10–12 June 2018; Springer: Singapore, 2018; pp. 197–208. [Google Scholar]
  61. George, A.; Livingston, S.J. A survey on full reference image quality assessment algorithms. Int. J. Res. Eng. Technol. 2013, 2, 303–307. [Google Scholar]
  62. Sheikh, H.R.; Bovik, A.C. Image information and visual quality. IEEE Trans. Image Process. 2006, 15, 430–444. [Google Scholar] [CrossRef]
  63. Liu, A.; Lin, W.; Narwaria, M. Image quality assessment based on gradient similarity. IEEE Trans. Image Process. 2011, 21, 1500–1512. [Google Scholar]
  64. Damera-Venkata, N.; Kite, T.D.; Geisler, W.S.; Evans, B.L.; Bovik, A.C. Image quality assessment based on a degradation model. IEEE Trans. Image Process. 2000, 9, 636–650. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  65. Chang, H.W.; Yang, H.; Gan, Y.; Wang, M.H. Sparse feature fidelity for perceptual image quality assessment. IEEE Trans. Image Process. 2013, 22, 4007–4018. [Google Scholar] [CrossRef] [PubMed]
  66. Zhang, L.; Li, H. SR-SIM: A fast and high performance IQA index based on spectral residual. In Proceedings of the 2012 19th IEEE International Conference on Image Processing, Orlando, FL, USA, 30 September–3 October 2012; pp. 1473–1476. [Google Scholar]
  67. Wang, Z.; Bovik, A.C. A universal image quality index. IEEE Signal Process. Lett. 2002, 9, 81–84. [Google Scholar] [CrossRef]
  68. Zhang, L.; Shen, Y.; Li, H. VSI: A visual saliency-induced index for perceptual image quality assessment. IEEE Trans. Image Process. 2014, 23, 4270–4281. [Google Scholar] [CrossRef] [Green Version]
  69. Chandler, D.M.; Hemami, S.S. VSNR: A wavelet-based visual signal-to-noise ratio for natural images. IEEE Trans. Image Process. 2007, 16, 2284–2298. [Google Scholar] [CrossRef]
  70. Goldberg, D. Genetic Algorithms in Search, Optimization & Machine Learning; Addison-Wesley: Boston, MA, USA, 1989. [Google Scholar]
  71. Audet, C.; Dennis Jr, J.E. Analysis of generalized pattern searches. SIAM J. Optim. 2002, 13, 889–903. [Google Scholar] [CrossRef] [Green Version]
  72. Lagarias, J.C.; Reeds, J.A.; Wright, M.H.; Wright, P.E. Convergence properties of the Nelder–Mead simplex method in low dimensions. SIAM J. Optim. 1998, 9, 112–147. [Google Scholar] [CrossRef] [Green Version]
  73. Shi, C.; Lin, Y. Full reference image quality assessment based on visual salience with color appearance and gradient similarity. IEEE Access 2020, 8, 97310–97320. [Google Scholar] [CrossRef]
  74. Shi, C.; Lin, Y. Image Quality Assessment Based on Three Features Fusion in Three Fusion Steps. Symmetry 2022, 14, 773. [Google Scholar] [CrossRef]
  75. Wu, J.; Lin, W.; Shi, G. Image quality assessment with degradation on spatial structure. IEEE Signal Process. Lett. 2014, 21, 437–440. [Google Scholar] [CrossRef]
  76. Ponomarenko, N.; Ieremeiev, O.; Lukin, V.; Egiazarian, K.; Jin, L.; Astola, J.; Vozel, B.; Chehdi, K.; Carli, M.; Battisti, F.; et al. Color image database TID2013: Peculiarities and preliminary results. In Proceedings of the European Workshop on Visual Information Processing (EUVIP), Paris, France, 10–12 June 2013; pp. 106–111. [Google Scholar]
  77. Ponomarenko, N.; Lukin, V.; Zelensky, A.; Egiazarian, K.; Carli, M.; Battisti, F. TID2008-a database for evaluation of full-reference visual quality assessment metrics. Adv. Mod. Radioelectron. 2009, 10, 30–45. [Google Scholar]
  78. Yu, X.; Bampis, C.G.; Gupta, P.; Bovik, A.C. Predicting the quality of images compressed after distortion in two steps. IEEE Trans. Image Process. 2019, 28, 5757–5770. [Google Scholar] [CrossRef]
  79. Temel, D.; AlRegib, G. CSV: Image quality assessment based on color, structure, and visual system. Signal Process. Image Commun. 2016, 48, 92–103. [Google Scholar] [CrossRef] [Green Version]
  80. Ding, K.; Ma, K.; Wang, S.; Simoncelli, E.P. Image quality assessment: Unifying structure and texture similarity. arXiv 2020, arXiv:2004.07728. [Google Scholar] [CrossRef] [PubMed]
  81. Chang, H.w.; Zhang, Q.w.; Wu, Q.g.; Gan, Y. Perceptual image quality assessment by independent feature detector. Neurocomputing 2015, 151, 1142–1152. [Google Scholar] [CrossRef]
  82. Temel, D.; AlRegib, G. ReSIFT: Reliability-weighted sift-based image quality assessment. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 2047–2051. [Google Scholar]
  83. Yang, G.; Li, D.; Lu, F.; Liao, Y.; Yang, W. RVSIM: A feature similarity method for full-reference image quality assessment. EURASIP J. Image Video Process. 2018, 2018, 1–15. [Google Scholar] [CrossRef] [Green Version]
  84. Temel, D.; AlRegib, G. Perceptual image quality assessment through spectral analysis of error representations. Signal Process. Image Commun. 2019, 70, 37–46. [Google Scholar] [CrossRef] [Green Version]
  85. Bakurov, I.; Buzzelli, M.; Schettini, R.; Castelli, M.; Vanneschi, L. Full-Reference Image Quality Expression via Genetic Programming. IEEE Trans. Image Process. 2023, 32, 1458–1473. [Google Scholar] [CrossRef]
  86. Gao, F.; Wang, Y.; Li, P.; Tan, M.; Yu, J.; Zhu, Y. Deepsim: Deep similarity for image quality assessment. Neurocomputing 2017, 257, 104–114. [Google Scholar] [CrossRef]
Figure 1. Twenty percent of the reference images with their corresponding distorted counterparts are selected to determine the parameters of the proposed fusion-based metric in the optimization process. The resulting metric is codenamed as OFIQA.
Figure 2. The weighted product of the selected FR-IQA metrics is used to estimate the perceptual quality of a distorted image in the evaluation stage.
Figure 3. RMSE measured on LIVE [4].
Figure 4. RMSE measured on TID2013 [76].
Figure 5. RMSE measured on TID2008 [77].
Figure 6. RMSE measured on CSIQ [33].
Table 1. Acronyms of the FR-IQA metrics applied in our fusion-based method.
Method’s Acronym | Full Name
FSIM [39] | feature similarity index
FSIMc [39] | feature similarity index extension
GSM [63] | gradient similarity measure
IFC [9] | information fidelity criterion
IFS [81] | independent feature similarity
IW-SSIM [26] | information content weighted SSIM
MAD [33] | most apparent distortion
MS-SSIM [8] | multi-scale SSIM
NQM [64] | noise quality measure
PSNR [21] | peak signal-to-noise ratio
RFSIM [37] | Riesz-transform-based feature similarity metric
SFF [65] | sparse feature fidelity
SSIM [13] | structural similarity index measure
SR-SIM [66] | spectral residual-based similarity
UQI [67] | universal image quality index
VIF [62] | visual information fidelity
VSI [68] | visual saliency-induced index
VSNR [69] | visual signal-to-noise ratio
Table 2. Applied benchmark IQA databases.
 | LIVE [4] | TID2013 [76] | TID2008 [77] | CSIQ [33]
Image resolution | 768 × 512 | 512 × 384 | 512 × 384 | 500 × 500
No. of reference images | 29 | 25 | 25 | 30
No. of distorted images | 779 | 3000 | 1700 | 866
No. of distortions | 5 | 24 | 17 | 6
No. of levels | 5 | 5 | 4 | 4–5
No. of observers | 161 | 917 | 838 | 35
Table 3. Computer configuration applied in our experiments.
Computer model | STRIX Z270H Gaming
Operating system | Windows 10
Memory | 15 GB
CPU | Intel(R) Core(TM) i7-7700K CPU 4.20 GHz (8 cores)
GPU | Nvidia GeForce GTX 1080
Table 4. RMSE performance comparison with respect to the optimization method applied in the proposed fusion-based FR-IQA metrics. Results are given for LIVE [4], TID2013 [76], TID2008 [77], and CSIQ [33]. The best results for each database are typed in red, the second best results are in green, the third best results are in blue, and the worst results are underlined.
FR-IQA Metric | LIVE [4] | TID2013 [76] | TID2008 [77] | CSIQ [33]
$OFIQA_{LIVE}^{GA}$ | 7.895 | 0.593 | 0.641 | 0.109
$OFIQA_{TID2013}^{GA}$ | 8.062 | 0.526 | 0.606 | 0.091
$OFIQA_{TID2008}^{GA}$ | 7.078 | 0.570 | 0.557 | 0.068
$OFIQA_{CSIQ}^{GA}$ | 6.918 | 0.665 | 0.645 | 0.067
$OFIQA_{LIVE}^{PS}$ | 6.860 | 0.675 | 1.116 | 0.165
$OFIQA_{TID2013}^{PS}$ | 7.503 | 1.150 | 0.537 | 0.069
$OFIQA_{TID2008}^{PS}$ | 7.662 | 0.766 | 0.544 | 0.081
$OFIQA_{CSIQ}^{PS}$ | 7.882 | 0.658 | 0.672 | 0.124
$OFIQA_{LIVE}^{fmin}$ | 6.612 | 0.642 | 0.648 | 0.060
$OFIQA_{TID2013}^{fmin}$ | 16.195 | 0.856 | 0.834 | 0.833
$OFIQA_{TID2008}^{fmin}$ | 14.456 | 0.833 | 0.631 | 0.630
$OFIQA_{CSIQ}^{fmin}$ | 7.167 | 0.812 | 0.680 | 0.104
Table 5. SROCC performance comparison with respect to the optimization method applied in the proposed fusion-based FR-IQA metrics. Results are given for LIVE [4], TID2013 [76], TID2008 [77], and CSIQ [33]. The best results for each database are typed in red, the second best results are in green, the third best results are in blue, and the worst results are underlined.
FR-IQA Metric | LIVE [4] | TID2013 [76] | TID2008 [77] | CSIQ [33]
$OFIQA_{LIVE}^{GA}$ | 0.961 | 0.863 | 0.888 | 0.938
$OFIQA_{TID2013}^{GA}$ | 0.957 | 0.890 | 0.904 | 0.923
$OFIQA_{TID2008}^{GA}$ | 0.967 | 0.825 | 0.911 | 0.964
$OFIQA_{CSIQ}^{GA}$ | 0.972 | 0.808 | 0.882 | 0.965
$OFIQA_{LIVE}^{PS}$ | 0.968 | 0.790 | 0.585 | 0.807
$OFIQA_{TID2013}^{PS}$ | 0.965 | 0.826 | 0.914 | 0.962
$OFIQA_{TID2008}^{PS}$ | 0.963 | 0.822 | 0.915 | 0.944
$OFIQA_{CSIQ}^{PS}$ | 0.965 | 0.794 | 0.861 | 0.964
$OFIQA_{LIVE}^{fmin}$ | 0.972 | 0.864 | 0.867 | 0.969
$OFIQA_{TID2013}^{fmin}$ | 0.781 | 0.845 | 0.769 | 0.496
$OFIQA_{TID2008}^{fmin}$ | 0.834 | 0.717 | 0.888 | 0.586
$OFIQA_{CSIQ}^{fmin}$ | 0.974 | 0.823 | 0.846 | 0.955
Table 6. RMSE performance comparison of the proposed fusion-based FR-IQA metrics with the state of the art on LIVE [4], TID2013 [76], TID2008 [77], and CSIQ [33]. The best results are typed in red, the second best results are in green, and the third best results are in blue.
FR-IQA Metric | LIVE [4] | TID2013 [76] | TID2008 [77] | CSIQ [33]
2stepQA [78] | 7.856 | 0.776 | 0.861 | 0.139
CSV [79] | 5.945 | 0.624 | 0.677 | 0.091
DISTS [80] | 6.005 | 0.541 | 0.558 | 0.091
ESSIM [24] | 7.689 | 0.630 | 0.638 | 0.101
FSIM [39] | 7.678 | 0.635 | 0.653 | 0.108
FSIMc [39] | 7.530 | 0.596 | 0.647 | 0.103
GSM [63] | 8.433 | 0.723 | 0.847 | 0.115
IFC [9] | 8.001 | 0.756 | 0.783 | 0.115
IFS [81] | 7.776 | 0.591 | 0.635 | 0.076
IW-SSIM [26] | 8.347 | 0.688 | 0.690 | 0.115
MAD [33] | 6.907 | 0.698 | 0.747 | 0.082
MS-SSIM [8] | 8.619 | 0.686 | 0.717 | 0.115
NQM [64] | 11.347 | 1.123 | 1.382 | 0.153
PSNR [21] | 13.596 | 1.240 | 1.099 | 0.158
ReSIFT [82] | 6.145 | 0.640 | 0.779 | 0.153
RFSIM [37] | 6.989 | 0.657 | 0.778 | 0.148
RVSIM [83] | 7.927 | 0.772 | 0.813 | 0.101
SFF [65] | 7.346 | 0.610 | 0.633 | 0.070
SR-SIM [66] | 8.081 | 0.635 | 0.621 | 0.100
SSIM [13] | 9.831 | 0.968 | 1.160 | 0.161
SSIM-CNN [16] | 6.967 | 0.856 | 0.705 | 0.146
SUMMER [84] | 6.002 | 0.638 | 0.777 | 0.151
VIF [62] | 9.240 | 0.786 | 0.790 | 0.098
VSI [68] | 8.681 | 0.580 | 0.647 | 0.098
GP-SSIM [85] | - | - | - | -
DeepSIM [86] | - | - | - | -
DeepIQA [44] | - | - | - | -
PieAPP [46] | - | - | - | -
LPIPS [45] | - | - | - | -
$OFIQA_{LIVE}^{GA}$ | 7.895 | 0.593 | 0.641 | 0.109
$OFIQA_{TID2013}^{GA}$ | 8.062 | 0.526 | 0.606 | 0.091
$OFIQA_{TID2008}^{GA}$ | 7.078 | 0.570 | 0.557 | 0.068
$OFIQA_{CSIQ}^{GA}$ | 6.918 | 0.665 | 0.645 | 0.067
Table 7. PLCC, SROCC, and KROCC performance comparison of the proposed fusion-based FR-IQA metrics with the state of the art on LIVE [4] and TID2013 [76]. The best results are typed in red, the second best results are in green, and the third best results are in blue.
 | LIVE [4] | | | TID2013 [76] | |
FR-IQA Metric | PLCC | SROCC | KROCC | PLCC | SROCC | KROCC
2stepQA [78] | 0.937 | 0.932 | 0.828 | 0.736 | 0.733 | 0.550
CSV [79] | 0.967 | 0.959 | 0.834 | 0.852 | 0.848 | 0.657
DISTS [80] | 0.954 | 0.954 | 0.811 | 0.759 | 0.711 | 0.524
ESSIM [24] | 0.963 | 0.962 | 0.840 | 0.740 | 0.797 | 0.627
FSIM [39] | 0.960 | 0.963 | 0.833 | 0.859 | 0.802 | 0.629
FSIMc [39] | 0.961 | 0.965 | 0.836 | 0.877 | 0.851 | 0.667
GSM [63] | 0.944 | 0.955 | 0.831 | 0.789 | 0.787 | 0.593
IFC [9] | 0.927 | 0.926 | 0.758 | 0.554 | 0.539 | 0.394
IFS [81] | 0.959 | 0.960 | 0.825 | 0.879 | 0.870 | 0.679
IW-SSIM [26] | 0.952 | 0.956 | 0.817 | 0.832 | 0.778 | 0.598
MAD [33] | 0.967 | 0.967 | 0.842 | 0.827 | 0.778 | 0.600
MS-SSIM [8] | 0.941 | 0.951 | 0.804 | 0.794 | 0.785 | 0.604
NQM [64] | 0.912 | 0.909 | 0.741 | 0.690 | 0.643 | 0.474
PSNR [21] | 0.872 | 0.876 | 0.687 | 0.616 | 0.646 | 0.467
ReSIFT [82] | 0.961 | 0.962 | 0.838 | 0.630 | 0.623 | 0.471
RFSIM [37] | 0.935 | 0.940 | 0.782 | 0.833 | 0.774 | 0.595
RVSIM [83] | 0.641 | 0.630 | 0.495 | 0.763 | 0.683 | 0.520
SFF [65] | 0.963 | 0.965 | 0.836 | 0.871 | 0.851 | 0.658
SR-SIM [66] | 0.955 | 0.962 | 0.829 | 0.859 | 0.800 | 0.631
SSIM [13] | 0.941 | 0.951 | 0.804 | 0.618 | 0.616 | 0.437
SSIM-CNN [16] | 0.965 | 0.963 | 0.838 | 0.759 | 0.752 | 0.566
SUMMER [84] | 0.967 | 0.959 | 0.833 | 0.623 | 0.622 | 0.472
VIF [62] | 0.941 | 0.964 | 0.828 | 0.774 | 0.677 | 0.515
VSI [68] | 0.948 | 0.952 | 0.805 | 0.900 | 0.894 | 0.677
GP-SSIM [85] | 0.908 | 0.918 | - | 0.846 | 0.808 | -
DeepSIM [86] | 0.968 | 0.974 | - | 0.872 | 0.846 | -
DeepIQA [44] | 0.940 | 0.947 | - | 0.834 | 0.831 | -
PieAPP [46] | 0.908 | 0.919 | - | 0.859 | 0.876 | -
LPIPS [45] | 0.932 | 0.934 | - | 0.749 | 0.670 | -
$OFIQA_{LIVE}^{GA}$ | 0.957 | 0.961 | 0.828 | 0.878 | 0.863 | 0.672
$OFIQA_{TID2013}^{GA}$ | 0.956 | 0.957 | 0.814 | 0.906 | 0.890 | 0.713
$OFIQA_{TID2008}^{GA}$ | 0.966 | 0.967 | 0.839 | 0.888 | 0.825 | 0.651
$OFIQA_{CSIQ}^{GA}$ | 0.967 | 0.972 | 0.854 | 0.844 | 0.808 | 0.634
Table 8. PLCC, SROCC, and KROCC performance comparison of the proposed fusion-based FR-IQA metrics with the state of the art on TID2008 [77] and CSIQ [33]. The best results are typed in red, the second best results are in green, and the third best results are in blue.
 | TID2008 [77] | | | CSIQ [33] | |
FR-IQA Metric | PLCC | SROCC | KROCC | PLCC | SROCC | KROCC
2stepQA [78] | 0.757 | 0.769 | 0.574 | 0.841 | 0.849 | 0.655
CSV [79] | 0.852 | 0.848 | 0.657 | 0.933 | 0.933 | 0.766
DISTS [80] | 0.705 | 0.668 | 0.488 | 0.930 | 0.930 | 0.764
ESSIM [24] | 0.658 | 0.876 | 0.696 | 0.814 | 0.933 | 0.768
FSIM [39] | 0.874 | 0.881 | 0.695 | 0.912 | 0.924 | 0.757
FSIMc [39] | 0.876 | 0.884 | 0.699 | 0.919 | 0.931 | 0.769
GSM [63] | 0.782 | 0.781 | 0.578 | 0.896 | 0.911 | 0.737
IFC [9] | 0.575 | 0.568 | 0.424 | 0.837 | 0.767 | 0.590
IFS [81] | 0.879 | 0.869 | 0.678 | 0.958 | 0.958 | 0.817
IW-SSIM [26] | 0.842 | 0.856 | 0.664 | 0.804 | 0.921 | 0.753
MAD [33] | 0.831 | 0.829 | 0.639 | 0.950 | 0.947 | 0.797
MS-SSIM [8] | 0.838 | 0.846 | 0.648 | 0.899 | 0.913 | 0.739
NQM [64] | 0.608 | 0.624 | 0.461 | 0.743 | 0.740 | 0.564
PSNR [21] | 0.447 | 0.489 | 0.346 | 0.853 | 0.809 | 0.599
ReSIFT [82] | 0.627 | 0.632 | 0.484 | 0.884 | 0.868 | 0.695
RFSIM [37] | 0.865 | 0.868 | 0.678 | 0.912 | 0.930 | 0.765
RVSIM [83] | 0.789 | 0.743 | 0.566 | 0.923 | 0.903 | 0.728
SFF [65] | 0.871 | 0.851 | 0.658 | 0.964 | 0.960 | 0.826
SR-SIM [66] | 0.859 | 0.799 | 0.631 | 0.925 | 0.932 | 0.773
SSIM [13] | 0.669 | 0.675 | 0.485 | 0.812 | 0.812 | 0.606
SSIM-CNN [16] | 0.770 | 0.737 | 0.551 | 0.952 | 0.946 | 0.794
SUMMER [84] | 0.817 | 0.823 | 0.623 | 0.826 | 0.830 | 0.658
VIF [62] | 0.808 | 0.749 | 0.586 | 0.928 | 0.920 | 0.754
VSI [68] | 0.898 | 0.896 | 0.709 | 0.928 | 0.942 | 0.785
GP-SSIM [85] | 0.859 | 0.892 | - | 0.928 | 0.953 | -
DeepSIM [86] | 0.876 | 0.887 | - | 0.919 | 0.919 | -
DeepIQA [44] | 0.917 | 0.908 | - | 0.901 | 0.909 | -
PieAPP [46] | 0.610 | 0.788 | - | 0.877 | 0.892 | -
LPIPS [45] | 0.772 | 0.731 | - | 0.896 | 0.876 | -
$OFIQA_{LIVE}^{GA}$ | 0.879 | 0.888 | 0.700 | 0.910 | 0.938 | 0.786
$OFIQA_{TID2013}^{GA}$ | 0.892 | 0.904 | 0.722 | 0.938 | 0.923 | 0.754
$OFIQA_{TID2008}^{GA}$ | 0.910 | 0.911 | 0.738 | 0.966 | 0.964 | 0.833
$OFIQA_{CSIQ}^{GA}$ | 0.877 | 0.882 | 0.693 | 0.967 | 0.965 | 0.835
Table 9. PLCC, SROCC, and KROCC performance comparison of the proposed fusion-based FR-IQA metrics with the state of the art. The best results are typed in red, the second best results are in green, and the third best results are in blue.
 | Direct Average | | | Weighted Average | |
FR-IQA Metric | PLCC | SROCC | KROCC | PLCC | SROCC | KROCC
2stepQA [78] | 0.818 | 0.821 | 0.652 | 0.781 | 0.783 | 0.605
CSV [79] | 0.901 | 0.897 | 0.729 | 0.877 | 0.873 | 0.694
DISTS [80] | 0.837 | 0.816 | 0.647 | 0.792 | 0.759 | 0.582
ESSIM [24] | 0.794 | 0.892 | 0.733 | 0.756 | 0.857 | 0.691
FSIM [39] | 0.901 | 0.893 | 0.729 | 0.883 | 0.860 | 0.689
FSIMc [39] | 0.908 | 0.908 | 0.743 | 0.893 | 0.885 | 0.710
GSM [63] | 0.853 | 0.859 | 0.685 | 0.821 | 0.823 | 0.638
IFC [9] | 0.723 | 0.700 | 0.542 | 0.644 | 0.625 | 0.473
IFS [81] | 0.919 | 0.914 | 0.750 | 0.900 | 0.893 | 0.715
IW-SSIM [26] | 0.857 | 0.878 | 0.708 | 0.846 | 0.840 | 0.664
MAD [33] | 0.894 | 0.880 | 0.720 | 0.862 | 0.838 | 0.667
MS-SSIM [8] | 0.868 | 0.874 | 0.699 | 0.838 | 0.839 | 0.659
NQM [64] | 0.738 | 0.729 | 0.560 | 0.703 | 0.684 | 0.516
PSNR [21] | 0.697 | 0.705 | 0.525 | 0.634 | 0.654 | 0.480
ReSIFT [82] | 0.776 | 0.771 | 0.622 | 0.705 | 0.700 | 0.550
RFSIM [37] | 0.886 | 0.878 | 0.705 | 0.865 | 0.841 | 0.663
RVSIM [83] | 0.779 | 0.740 | 0.577 | 0.777 | 0.723 | 0.558
SFF [65] | 0.917 | 0.908 | 0.745 | 0.895 | 0.880 | 0.703
SR-SIM [66] | 0.900 | 0.873 | 0.716 | 0.880 | 0.838 | 0.675
SSIM [13] | 0.760 | 0.764 | 0.583 | 0.698 | 0.700 | 0.518
SSIM-CNN [16] | 0.861 | 0.849 | 0.687 | 0.814 | 0.800 | 0.626
SUMMER [84] | 0.808 | 0.809 | 0.647 | 0.745 | 0.746 | 0.582
VIF [62] | 0.863 | 0.828 | 0.671 | 0.825 | 0.765 | 0.605
VSI [68] | 0.919 | 0.921 | 0.744 | 0.909 | 0.908 | 0.716
GP-SSIM [85] | 0.885 | 0.893 | - | 0.868 | 0.864 | -
DeepSIM [86] | 0.909 | 0.907 | - | 0.891 | 0.883 | -
DeepIQA [44] | 0.898 | 0.899 | - | 0.878 | 0.877 | -
PieAPP [46] | 0.814 | 0.869 | - | 0.801 | 0.860 | -
LPIPS [45] | 0.837 | 0.803 | - | 0.798 | 0.747 | -
$OFIQA_{LIVE}^{GA}$ | 0.906 | 0.913 | 0.747 | 0.892 | 0.892 | 0.714
$OFIQA_{TID2013}^{GA}$ | 0.923 | 0.919 | 0.751 | 0.913 | 0.906 | 0.733
$OFIQA_{TID2008}^{GA}$ | 0.933 | 0.917 | 0.765 | 0.914 | 0.884 | 0.722
$OFIQA_{CSIQ}^{GA}$ | 0.914 | 0.907 | 0.754 | 0.885 | 0.869 | 0.704
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
