It is common to measure image quality with a single metric such as contrast or blurriness. Our goal is to provide a more complex and formal definition of human quality perception by identifying the top factors responsible for visual quality. To eliminate subjectivity, we consider quality as an objective, non-reference, multidimensional measure that we want to compute for each image independently, without comparing it to other images. Our practical goal is to find a restricted set of features that are most responsible for quality perception. Such a set would be a first step toward solving the practical problem of creating a useful tool for displaying medical images and improving their quality.
1. Non-reference image quality measures
Most research published on image quality uses quality measures estimated for an original image and its distorted copies []. In this study we use so-called non-reference measures, where quality is estimated for a single image independently. We use a number of previously developed measures as well as several basic measures, such as contrast, as described below.
1.1 Blurriness measures
Even a partly blurred image affects human perception of quality. That is why we consider blurriness an important factor of image quality perception. In this work we use two different blurriness measures.
The first, described by F. Crete and T. Dolmiere [1], uses a low-pass filter and is based on the principle that the gray levels of neighboring pixels in a less blurred image vary more than in its blurred copy. Accordingly, the absolute vertical and horizontal differences D between neighboring pixels are computed for the original and blurred images (Eq. 1):
D_ver(x, y) = |I(x, y) − I(x − 1, y)|   (Eq. 1a)
D_hor(x, y) = |I(x, y) − I(x, y − 1)|   (Eq. 1b)
where I(x,y) is the intensity value at pixel (x,y), and h and w are the height and width of the image. After that, the variation of neighboring pixels before and after blurring is analyzed: if the variation is high, the original image is considered sufficiently sharp. To evaluate the variation, only the differences that decreased after blurring are kept, giving the variation V for the vertical and horizontal directions (Eq. 2):
V_ver(x, y) = max(0, D_ver(x, y) − DB_ver(x, y))   (Eq. 2)
where DB_ver(x,y) is the absolute difference for the blurred image B. Then the blurriness for the vertical direction is computed as:
Fblur_ver = (Σ D_ver − Σ V_ver) / Σ D_ver   (Eq. 3)
Horizontal blurriness is computed in the same way. Finally, the maximum of the two is selected as the final blurriness measure: Fblur = max(Fblur_hor, Fblur_ver). Further we will write it as Fblur_1.
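The whole procedure above can be sketched in a few lines of NumPy. This is a hedged sketch: the 9-tap averaging kernel and the per-direction normalization are assumptions, since the exact filter of [1] is not reproduced here.

```python
import numpy as np

def crete_blur(img):
    """Sketch of Fblur_1 (Crete-Dolmiere): compare neighbour-pixel
    variation before and after an extra low-pass filtering step."""
    img = img.astype(float)
    k = np.ones(9) / 9.0  # assumed 9-tap averaging low-pass filter
    blur_ver = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, img)
    blur_hor = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)

    d_ver = np.abs(np.diff(img, axis=0))        # Eq. 1a
    d_hor = np.abs(np.diff(img, axis=1))        # Eq. 1b
    db_ver = np.abs(np.diff(blur_ver, axis=0))
    db_hor = np.abs(np.diff(blur_hor, axis=1))

    v_ver = np.maximum(0.0, d_ver - db_ver)     # Eq. 2: keep only decreased variation
    v_hor = np.maximum(0.0, d_hor - db_hor)

    # Eq. 3: relative loss of variation; an already-blurred input loses little.
    b_ver = (d_ver.sum() - v_ver.sum()) / max(d_ver.sum(), 1e-12)
    b_hor = (d_hor.sum() - v_hor.sum()) / max(d_hor.sum(), 1e-12)
    return max(b_ver, b_hor)

rng = np.random.default_rng(0)
sharp = rng.uniform(0, 255, (64, 64))
smooth = np.apply_along_axis(
    lambda r: np.convolve(r, np.ones(5) / 5, mode="same"), 1, sharp)
```

On this synthetic pair the pre-smoothed copy should score higher, matching the intuition that Fblur_1 grows with blurriness.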
Another blurriness measure was presented by Min Goo Choi [2] and is based on edge extraction using the intensity gradient. The authors define the horizontal and vertical absolute difference values of a pixel, computed as the difference between its left and right (or upper and lower) neighboring pixels. Then they obtain the mean horizontal and vertical absolute differences, e.g. Dhor_mean, over the entire image, as in (Eq. 4).
(Eq. 4)
Then each pixel's absolute horizontal difference is compared with the mean absolute horizontal difference of the whole image to select edge candidates Chor(x,y):
(Eq. 5)
If a candidate pixel Chor(x,y) has an absolute horizontal difference larger than its horizontal neighbors, it is classified as an edge pixel Ehor(x,y), as shown in (Eq. 6).
(Eq. 6)
Each edge pixel is then examined to determine whether it corresponds to a blurred edge. First, the horizontal blurriness of a pixel is computed according to (Eq. 7).
(Eq. 7)
The vertical value is obtained in the same way, and the maximum of the two is used for the final decision: a pixel is considered blurred if its value is larger than a predefined threshold (0.1 is suggested in the paper).
(Eq. 8)
Finally, the resulting measure of blurriness for the whole image, called inverse blurriness, is computed as the ratio of the blurred-edge pixel count to the edge pixel count (Eq. 9).
(Eq. 9)
We will term this measure Fblur_2 to distinguish it from the blur measure described in [1]. We assume that an increase in blurriness should negatively affect quality perception, because a very blurred image loses important information and is less attractive.
1.2 Image entropy
The basic idea behind entropy is to measure the uncertainty of the image. The more information and the less noise an image contains, the more useful it is, and we may relate image usefulness to its objective quality. In our study, Shannon entropy was computed for the entire image, its foreground, and its background according to (Eq. 10).
Fent = − Σk p(Ik) · log2 p(Ik)   (Eq. 10)
where p(Ik) is the probability of the particular intensity value Ik. We assume that higher entropy means more signal is contained in the image: for example, if there are fewer details and more plain surfaces, entropy is lower. However, a noisy image has higher entropy, so we consider entropy at three levels of the image.
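As a concrete illustration, the entropy of (Eq. 10) can be computed from the intensity histogram; the 256-bin histogram and log base 2 below are assumptions about the exact setup.

```python
import numpy as np

def shannon_entropy(img, bins=256):
    """Shannon entropy of the intensity histogram (Eq. 10), in bits."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                        # zero-probability bins contribute nothing
    return float(-(p * np.log2(p)).sum())

flat = np.zeros((32, 32))               # one intensity level: zero entropy
rng = np.random.default_rng(1)
noisy = rng.integers(0, 256, (32, 32))  # near-uniform histogram: close to 8 bits
```

A constant image yields zero entropy while uniform noise approaches the 8-bit maximum, which illustrates why entropy alone cannot separate signal from noise and why we compute it separately for the foreground and background.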
1.3 Segmentation
Presented in [3], this measure shows how well various segments of an image can be separated. We use the simplest yet most intuitive implementation, comparing the two major segments: image background (seg1) and foreground (seg2). In this study we simply computed the average intensity value, classified all pixels with lower intensity as background, and treated the rest as foreground. To compute the segmentation measure, the average difference U between neighboring pixels in a 3x3 sliding window is computed for each image segment (Eq. 11):
(Eq. 11)
This leads to the following measure W:
(Eq. 12)
Then we compute the average pixel intensity in each segment and obtain the squared difference between the average intensities of every pair of segments; in our case there is only one pair. The inverse sum of squared differences of average intensities is called B:
(Eq. 13)
The resulting separability measure is obtained as:
sep = 1000*W+B (Eq. 14)
It will be high for images with high separability between segments and low variability within segments. In our case this measure makes sense only for the set of images depicting trees, because the medical image set mostly presents a dark background that is already clearly separated from the foreground.
1.4 Flatness
This measure, described in [4], is based on the two-dimensional discrete Fourier transform of the image. First, we obtain the 2D DFT of the image and transform it into a one-dimensional vector FV. Next, spectral flatness SF is computed as the ratio of the geometric to the arithmetic mean:
SF = (Πk FVk)^(1/n) / ((1/n) Σk FVk)   (Eq. 15)
The resulting measure proposed in the paper is called entropy power and is obtained as the product of the spectral flatness SF from (Eq. 15) and the image variance, as shown in (Eq. 16):
(Eq. 16)
where the variance is computed with respect to the average intensity value of the image. This measure is assumed to be higher for less informative, non-predictive, and redundant images.
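The computation above can be sketched as follows; the use of the magnitude spectrum and the handling of (near-)zero coefficients are assumptions.

```python
import numpy as np

def spectral_flatness(img):
    """Eq. 15: geometric over arithmetic mean of the DFT magnitude vector FV."""
    fv = np.abs(np.fft.fft2(img)).ravel()
    fv = fv[fv > 1e-12]                          # drop (near-)zero coefficients
    return float(np.exp(np.log(fv).mean()) / fv.mean())

def entropy_power(img):
    """Eq. 16: spectral flatness times image variance."""
    return spectral_flatness(img) * float(img.var())

rng = np.random.default_rng(2)
noise = rng.normal(128, 20, (64, 64))            # flat, noise-like spectrum
ramp = np.tile(np.linspace(0, 255, 64), (64, 1)) # energy concentrated at low frequencies
```

By the AM-GM inequality flatness lies in (0, 1]: a noise-like image has a nearly flat spectrum (SF close to 1), while a smooth ramp concentrates its energy in a few coefficients (SF close to 0).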
1.5 Sharpness
This measure [5] is based on the assumption that the differences between neighboring pixels change more in areas with sharp edges. Therefore the authors compute the second-order difference of neighboring pixels, as a discrete analog of the second derivative, for the image passed through a denoising median filter:
, (Eq. 17)
where Im is the original image passed through the median filter. The authors define the vertical sharpness Sver for each pixel as shown below:
, (Eq. 18)
Then each pixel is treated as sharp if its sharpness exceeds 0.0001. The number of sharp pixels NSver is computed, and the edge pixels are found with the Canny method, their count being NEver. The same process is repeated in the horizontal direction, and the sharp-to-edge pixel ratio for the vertical and horizontal directions is computed as:
(Eq. 19)
We assume that a sharper image should be perceived as more attractive and informative.
1.6 Blockness measure
This measure assesses the image from the standpoint of block artifacts [6]. Absolute intensity differences of neighboring pixels are obtained for the vertical and horizontal directions as in (Eq. 1); each element of the resulting matrix is then normalized:
(Eq. 20)
By averaging each column of the matrix we obtain the horizontal profile Phor of the image, as shown in (Eq. 21):
(Eq. 21)
The vertical profile is obtained in the same way, and a 1-D DFT is applied to both profiles. The magnitude M of the DFT coefficients is then considered:
(Eq. 22)
where 0 ≤ T ≤ w − 2.
The blockness measure Bl for a block size Z is computed as shown in (Eq. 23). Due to the nature of the DFT, Mhor(T) has peaks at the positions T corresponding to multiples of (w − 1)/Z, with b = 1, 2, …, Z − 1. The values of Mhor(T) at these peak points give the horizontal blockness of the image, Blhor:
(Eq. 23)
The vertical blockness measure is obtained similarly. In our study, block widths of 2, 4, 6, and 8 pixels were used. The resulting measure is shown in (Eq. 24):
, (Eq. 24)
where r and 1 − r are the weights for the horizontal and vertical measures; we use r = 0.5. This measure is higher for images distorted by block artifacts.
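The profile-and-DFT procedure of (Eqs. 20-24) can be sketched as follows. Normalizing the peak magnitudes by the DC component is an assumption, as the exact normalization of (Eq. 20) is not reproduced here.

```python
import numpy as np

def blockness(img, Z=8, r=0.5):
    """Sketch of Eqs. 20-24: mean difference profiles, their 1-D DFT,
    and the magnitudes at the harmonics of the assumed block size Z."""
    img = img.astype(float)

    def directional(axis):
        d = np.abs(np.diff(img, axis=axis))       # neighbour differences (Eq. 1)
        profile = d.mean(axis=1 - axis)           # Eq. 21: average each column/row
        m = np.abs(np.fft.fft(profile))           # Eq. 22: DFT magnitude
        n = len(profile)
        # Eq. 23: sample the magnitude at the block-size harmonics.
        peaks = [m[round(b * n / Z)] for b in range(1, Z) if round(b * n / Z) < n]
        return np.mean(peaks) / (m[0] + 1e-12)    # relative to the DC term

    return r * directional(1) + (1 - r) * directional(0)  # Eq. 24

rng = np.random.default_rng(5)
noise_img = rng.uniform(0, 255, (64, 64))
blocky = np.kron(rng.integers(0, 256, (8, 8)), np.ones((8, 8)))  # 8x8 block artifacts
```

An image assembled from constant 8x8 blocks should score far higher than an unstructured noise image of the same size.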
1.7 Fractal dimension
The idea of a possible relation between image quality and the amount of image detail brings us to measures of fractal dimension. We detect the main contours in the image using the Canny method and then estimate the fractal dimension of the obtained curves. We use box counting to compute the dimension (Eq. 25), where N stands for the number of square blocks of side ε covering the curve, with ε = 2, 3, 4, and 5.
We assume that higher values of fractal dimension correspond to more detailed, more informative images.
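Box counting can be sketched directly; the binary edge map is assumed to be given (in the paper it comes from the Canny detector).

```python
import numpy as np

def box_count_dimension(binary, sizes=(2, 3, 4, 5)):
    """Eq. 25: slope of log N(eps) versus log(1/eps), where N(eps) is the
    number of eps-sized boxes containing at least one contour pixel."""
    h, w = binary.shape
    counts = []
    for eps in sizes:
        n = sum(
            binary[i:i + eps, j:j + eps].any()
            for i in range(0, h, eps)
            for j in range(0, w, eps)
        )
        counts.append(n)
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(sizes)), np.log(counts), 1)
    return float(slope)

line = np.zeros((64, 64)); line[32, :] = 1.0   # a straight contour
square = np.ones((64, 64))                     # a completely filled region
```

As a sanity check, a straight line comes out with dimension close to 1 and a filled region close to 2; intricate contours land in between.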
1.8 Noise level
It is natural to assume that the presence of noise is detrimental to perceived image quality. Therefore we included a noise measure developed by Masayuki T. [7]. In this work, the noise level is described as the standard deviation of Gaussian noise, and the authors propose a patch-based algorithm. First, the image is decomposed into overlapping patches, and the model for the whole image is written as pi = zi + ni, where zi is the original image patch centered at the i-th pixel, transformed into a one-dimensional vector; pi is the observed patch (also transformed into a vector) distorted by Gaussian noise; and ni is the noise vector. To estimate the noise level, we need to obtain the unknown standard deviation using only the observed noisy image. The image patches are treated as data points in Euclidean space, and their variance can be projected onto a single axis whose direction is defined by a vector u. The variance V of the data projected onto u can be written as:
(Eq. 26)
where σ is the standard deviation of the Gaussian noise. The direction of minimum data variance is then found using Principal Component Analysis (PCA). First, the data covariance matrix π is defined as:
(Eq. 27)
where b is the number of patches and m is the average of the dataset {pi}. The variance of the original data projected onto the minimum-variance direction equals the minimum eigenvalue:
, (Eq. 28)
where ϕ is the covariance matrix for the noise-free patches z. The noise level can be estimated by decomposing the minimum eigenvalue of the noisy-patch covariance matrix; however, this is an ill-posed problem, because the minimum eigenvalue of the noiseless-patch covariance matrix is unknown. The authors therefore suggest selecting weakly textured patches from the noisy image: such patches span a low-dimensional space, the minimum eigenvalue of their covariance matrix is close to zero, and their noise level Fnoise can be estimated as:
, (Eq. 29)
where π is the covariance matrix of the weakly textured patches.
Undoubtedly, the most important part of the proposed algorithm is the selection of weakly textured patches. The main idea is to compare the maximum eigenvalue of the gradient covariance matrix of a patch with a threshold. The gradient covariance matrix Cj of patch j is computed as:
(Eq. 30)
where Gj = [Dhorj, Dverj], and Dhor and Dver are horizontal and vertical derivative operators. To select weakly textured patches, a statistical hypothesis is tested: the null hypothesis (the patch has a weak, flat texture) is accepted if the maximum eigenvalue of its gradient covariance matrix Cj is less than a threshold. The threshold τ for the maximum eigenvalue can be found as:
(Eq. 31)
where the significance level is set to 0.99 and the inverse-gamma cumulative distribution function has shape parameter b/2 and the corresponding scale parameter. The inverse-gamma cumulative distribution function is defined as:
(Eq. 32)
where Γ(·) denotes the gamma function, a is a scale parameter, and b is a shape parameter. The gamma function for a positive integer n is defined as:
(Eq. 33)
We assume that noisier images have worse quality and are less informative.
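A much-simplified sketch of the estimator (Eq. 29) is shown below. The weak-texture patch selection of (Eqs. 30-31) is omitted here, so this version is only reliable on images that are already weakly textured.

```python
import numpy as np

def estimate_noise_sigma(img, patch=7):
    """Square root of the smallest eigenvalue of the patch covariance
    matrix (Eqs. 27-29), without weak-texture patch selection."""
    h, w = img.shape
    patches = np.array([
        img[i:i + patch, j:j + patch].ravel()
        for i in range(h - patch + 1)
        for j in range(w - patch + 1)
    ])
    cov = np.cov(patches, rowvar=False)          # Eq. 27: patch covariance
    lam_min = np.linalg.eigvalsh(cov)[0]         # smallest eigenvalue (Eq. 28)
    return float(np.sqrt(max(lam_min, 0.0)))

rng = np.random.default_rng(3)
clean = np.tile(np.linspace(0, 255, 64), (64, 1))   # smooth ramp: weak texture
noisy_img = clean + rng.normal(0, 10, clean.shape)  # known sigma = 10
```

On this weakly textured ramp the estimate lands near the true sigma of 10, slightly below it, since the smallest sample eigenvalue of a finite patch set is biased low.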
1.9 Average gradient and edge intensity
Both measures are taken from [8]. The average gradient FAG shows how much pixel values change on average in the vertical and horizontal directions, according to:
(Eq. 34)
The edge intensity FEI is computed as:
(Eq. 35)
where Gver and Ghor are the vertical and horizontal gradients, obtained as:
(Eq. 36)
(Eq. 37)
Finally, we use a number of simple image quality metrics. First, the average intensity FAI is computed as:
(Eq. 38)
The image contrast FC and the contrast per pixel FCPP are obtained as:
(Eq. 39)
(Eq. 40)
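The exact bodies of (Eqs. 34-40) did not survive extraction, so the sketch below uses common textbook definitions as stand-ins: average intensity as the mean, contrast as the intensity standard deviation, CPP as the mean absolute neighbour difference, and the average gradient as the RMS of forward differences. Edge intensity (typically a mean Sobel magnitude) is omitted.

```python
import numpy as np

def basic_metrics(img):
    """Hedged stand-ins for FAI, FC, FCPP and FAG (Eqs. 34, 38-40)."""
    img = img.astype(float)
    f_ai = float(img.mean())                       # FAI: average intensity
    f_c = float(img.std())                         # FC: global (RMS) contrast
    d_h = np.abs(np.diff(img, axis=1)).mean()      # mean |horizontal difference|
    d_v = np.abs(np.diff(img, axis=0)).mean()      # mean |vertical difference|
    f_cpp = float((d_h + d_v) / 2.0)               # FCPP: contrast per pixel
    g_h = np.diff(img, axis=1)[:-1, :]             # forward differences, cropped
    g_v = np.diff(img, axis=0)[:, :-1]             # to a common shape
    f_ag = float(np.sqrt((g_h ** 2 + g_v ** 2) / 2.0).mean())  # FAG
    return {"FAI": f_ai, "FC": f_c, "FCPP": f_cpp, "FAG": f_ag}
```

A constant image gives FC = FCPP = FAG = 0, which is a quick sanity check for any implementation of these metrics.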
Table 1. Correspondence between the described measures and feature names in our dataset. Prefixes 0, 1, and 2 relate to images on the three levels of the Laplacian pyramid.

Metric                    | Name   | Corresponding variables
No-reference blur metric  | Fblur1 | blur10, blur11, blur12
Min Goo Choi method       | Fblur2 | blur20, blur21, blur22
Shannon entropy           | Fent   | ent10, ent11, ent12
Local Shannon entropy     |        | entB0, entF0
Separability measure      | Fsep   | sep0, sep1, sep2
Flatness measure          | Fflat  | flat0, flat1, flat2
Sharpness                 | Fsharp | sharp0, sharp1, sharp2
Contrast                  | FC     | contr20, contr21, contr22
Blockness measure         | Fblock | block20, block40, block60, block80, block21, etc.
Fractal dimension         | Ffrac  | frac0, frac1, frac2
Average intensity         | FAI    | intens0, intens1, intens2
Noise level               | Fnoise | noise0, noise1, noise2
Contrast per pixel (CPP)  | FCPP   | contr10, contr11, contr12
Average gradient          | FAG    | AG0, AG1, AG2
Edge intensity            | FEI    | EI0, EI1, EI2
2. Research design, data collection and image markup
In order to evaluate the performance of the various quality measures and validate the results, we used two datasets of grayscale images of different nature and quality. The quality of each image was assessed twice: first by human observers (thus capturing our visual perception of image quality), and second by the set of metrics described above. The metrics were applied to the original images as well as to their lower-resolution copies derived by Laplacian pyramid decomposition, producing a total of 57 quality metric measurements per image. Our main intention was to find the best sets of numerical metrics that would explain the observed human perception of image quality. Each image dataset consisted of similar images: the first set contained 50 medical images (CT scans of the abdomen), and the second contained 50 scenery photographs of trees and forest landscapes. We intentionally chose images of a rather abstract and emotion-free nature to exclude subjective bias in human perception.
The human perception ranks for the images were obtained with pairwise comparisons between all images in each dataset. The images were presented in random pairs to 15 human observers, who were asked to choose the better of the two. The task was implemented using Amazon Mechanical Turk; Figure 1 shows a screenshot of the assignment. To ensure comparison robustness, we used markup with triple overlap: each pair of images was compared three times by different observers, and the final choice was computed by majority rule. As a result, more than 7000 pairs were presented and compared. To obtain image features, 19 basic quality measures were computed for three copies of each image: the original image and two lower-resolution copies derived as two levels of the Laplacian pyramid. The resulting 57 measurements were treated as 57-dimensional image feature vectors and used as independent variables in the models.
Figure 1. Mechanical Turk assignment for image markup
3. Experimental Results
3.1 Linear regression with known target variable
In the first step of the research, we try to solve the task using the known quality measure of every image; that is, we fit models to predict a known outcome. Based on the pairwise image comparison results, we computed a quality index for every image as the number of that image's wins divided by its number of comparisons. This allowed us to put the images in a linear quality order. Note that in general this linear order cannot correspond to all the recorded comparisons: in some instances an image with a higher quality index may have been perceived as inferior to some lower-quality image. This non-linearity in image grades originates from differences in quality perception between human observers, and we call such image pairs inverted. Overall, 10% of pairs were inverted in the medical dataset and 14% in the trees dataset. Using the linear quality indices (rankings) as the target variable, we implemented linear regression with the L2 norm as a basic model. We considered all possible regression models containing combinations of k features, k = 1…57, and extracted the best model for each k as the one with the least regression error. Note that this amounts to an exhaustive search through millions of possible models (feature combinations), so we used a branch-and-bound algorithm to speed up the search. The regression error E for L2 regression was defined as:
(Eq. 41)
where Wp stands for the model-predicted image quality and W for the observed quality. One of the main goals of the study was to find a set of factors responsible for the human perception of image quality. We validated our feature-modeling results on the medical (MS) and trees (TS) image datasets separately, to make sure that models performing well on one dataset would also be good for the other.
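The quality index and the inverted-pair count can be illustrated on a toy comparison matrix. The matrix layout below (wins[i, j] = 1 if image i beat image j) is an assumption about the bookkeeping, not the study's actual data format.

```python
import numpy as np

def quality_indices(wins):
    """Quality index = wins / comparisons for every image."""
    comparisons = (wins + wins.T).sum(axis=1)
    return wins.sum(axis=1) / np.maximum(comparisons, 1)

def inverted_share(wins, idx):
    """Share of compared pairs whose outcome contradicts the linear order."""
    n, inv, total = len(idx), 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            if wins[i, j] + wins[j, i] == 1:      # this pair was compared
                total += 1
                winner, loser = (i, j) if wins[i, j] else (j, i)
                if idx[winner] < idx[loser]:      # upset w.r.t. the ranking
                    inv += 1
    return inv / total

# Four images, round robin, with one upset: image 3 beats image 0.
wins = np.zeros((4, 4))
wins[0, 1] = wins[0, 2] = 1
wins[1, 2] = wins[1, 3] = 1
wins[2, 3] = 1
wins[3, 0] = 1
idx = quality_indices(wins)
```

Here the single upset makes one of the six pairs inverted, mirroring on a small scale the 10% and 14% inverted-pair rates observed on the real datasets.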
Figure 2 shows various models with 1, 2, 3, 4, and 5 features. We used R-squared to evaluate each model, as the fraction of the original data variation explained by the model. Treating the concept of image quality as a function of our visual perception rather than of image selection, we assumed that a good model should perform well on both the MS and TS datasets. As Figure 2 shows, R-squared does not increase dramatically beyond 6 features, so we show only the models with up to 5 predictors. Circle sizes correspond to the average error of each model: the largest circles are close to 0.27, while the best models have errors close to 0.08.
Figure 2. Regression models for both datasets
We can also observe that the circles on the plot tend to cluster along the diagonal line, which means that most models perform similarly on both the MS and TS datasets. Moreover, the higher k (the number of model features/predictors), the closer the circles are to the diagonal. As a result, higher k generally corresponds to more accurate and more image-independent models, which can provide optimal quality predictions for both the MS and TS sets.
Figures 3a and 3b illustrate the best models obtained for MS and TS independently. As the figure indicates, the models selected as best for one dataset perform well on the other. This can already be viewed as a strong demonstration of the objectivity of human image quality perception: despite the obvious differences between CT scans and forest landscapes, the models optimal for one set were among the best performers for the other.
Figure 3 a, b. Best models obtained for the MS and TS datasets independently
Finally, Figure 4 demonstrates the top ten models of each size, sorted by the average mean error on the two datasets. Most models lie along the diagonal line, and models with 4, 5, and 6 features become increasingly close to each other due to high R-squared on both datasets. Table 2 summarizes the best predictors selected for each number of features, using the names defined in Table 1, and provides some significant insights. First of all, there is a limited set of quality measures that occur in most optimal models derived for the MS and TS data. It can be assumed that these factors play the most important role in our perception of image quality:
Figure 4. Best ten models for both sets
· Entropy power of the image on the first and second levels of the Laplacian pyramid (metrics flat0, flat1). It is the product of spectral flatness and image variance, reflects signal compressibility, and shows how much useful signal the image contains.
· Entropy of the background (entB0, entB1) and entropy of the whole image, present in many optimal models for both sets.
· Blockness measures for all block sizes (2, 4, 6, and 8 pixels) are important for both sets of images on all three levels of the pyramid.
· Both blur measures, sharpness, contrast, and edge intensity on all resolution levels are significant for all datasets, confirming that perception of contrast and blurriness is among the major image quality factors.
·Fractal dimension on all levels of image resolution can be found in models for both sets.
· Average gradient is especially important for the trees dataset. This measure shows how much pixel values change on average; according to it, images with higher-contrast edges between objects score higher.
· Object separability on the first and second levels of the pyramid can be found in models for both sets. This measure is higher for images with distinguishable, higher-contrast parts.
As a result, we identify the following major factors responsible for the human perception of image quality:
· The amount of information contained in the image, which can be described by the spectral flatness and entropy measures. It is remarkable that random noise is not taken into account, while larger objects have some impact.
· Contrast, average gradient, and blurriness are the most important non-reference quality measures affecting visual perception of the whole image, while sharpness and noise level hardly appear in the best models. This might be explained by the sensitivity of the metrics used.
· Artifact measures such as blockness appear significant in most models.
· Background entropy performs well only as an add-on factor that explains variance not already covered by the other factors.
All things considered, we obtained models with restricted feature sets that are able to explain quality perception. However, the underlying matrix of pairwise comparisons remains our ground truth and main source of information. To measure the quality of the described approach, we compared each pair of images by the predicted quality measures computed by the best five-feature models mentioned above. To obtain the vector of predicted values, we performed leave-one-out cross-validation on each of the two sets; this procedure yields a more stable resulting vector of quality measures. At each step, one image was held out, the model weights were learned on the remaining images, and the quality measure was predicted for the held-out image. The final vector of model quality measures was constructed from the predicted values and normalized.
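The leave-one-out loop can be sketched with ordinary least squares standing in for the L2 regression; the synthetic features and weights below are placeholders, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 5))                  # 50 images, 5 selected features
true_w = np.array([0.5, -0.2, 0.1, 0.0, 0.3]) # hypothetical ground-truth weights
y = X @ true_w + rng.normal(0, 0.01, 50)      # simulated quality indices

pred = np.empty_like(y)
for i in range(len(y)):
    keep = np.arange(len(y)) != i             # hold image i out
    w, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    pred[i] = X[i] @ w                        # predict the held-out image

loo_mse = float(np.mean((pred - y) ** 2))
```

Each image's quality is predicted by a model that never saw it, which is what makes the resulting vector of predicted measures stable.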
The average share of inverted pairs computed for the predicted quality measures, compared with the initial comparison matrix, is 31% for medical images and 29% for trees. This result is still far from the original 10-14% and could be improved.
Table 2. Best predictors for models with restricted sets of factors. The table contains the best three models according to the average error on the two datasets.
Model size N | Best L2 predictors, both datasets | Best L2 predictors, trees dataset | Best L2 predictors, medical dataset
1    | blur10, blur12 · sep0 · blur20, blur22 · intens2 · entF0 | AG1 · sharp1 · EI1, EI0 | ent10 · ent11 · sep0
2    | blur20, sep0 · blur20, sep1 · entF1, blur20 · blur20/21, intens0/1/2 | blur20, entF0 · entB0, block60 · entB0, frac0 | blur20, sep0 · blur20, sep1 · blur20, intens0
3    | blur20, entB0, sharp1 · blur10, blur11, blur22 · block measures + blur · blur20, entB0, frac0 | blur20, entB0, frac2 · entB0, sep0, flat2 | blur10, blur11, blur22 · contr20, blur21, noise2 · contr20, intens0, ent11 · blur20, block22, block62
4    | blur20 + blockness measures · blur11, entB1, intens1, block22 · contr22, noise2, blur21, entB0 | entB0, sep0, block80, flat2 · blur10, entB0, sep0, flat2 | block62, blur20, contr10, block22 · blur20, contr20, block62, block22
5, 6 | entB0, blur21, flat1, EI1, frac2 · entB0, blur21, flat1, EI1, block62 | blur10, entB0, sep0, block40, flat2 · blur10, entB0, sep0, block80, flat2 |