Human Vision

Colours

The human vision system perceives images in colour using receptors on the retina of the eye which respond to three relatively broad colour bands in the regions of red, green and blue (RGB) in the colour spectrum (red, orange, yellow, green, blue, indigo, violet).

Colours in between these are perceived as different linear combinations of RGB. Hence colour TVs and monitors can form almost any perceivable colour by controlling the relative intensities of R, G and B light sources. Thus most colour images which exist in electronic form are fundamentally represented by 3 intensities (R, G and B) at each picture element (pel) position.

The numerical values used for these intensities are usually chosen such that equal increments in value result in approximately equal apparent increases in brightness. In practice this means that the numerical value is approximately proportional to the log of the true light intensity (energy of the wave); this follows from Weber's Law. Throughout this course, we shall refer to these numerical values as intensities, since for compression it is most convenient to use a subjectively linear scale.

The YUV Colour Space

The eye is much more sensitive to overall intensity (luminance) changes than to colour changes. Usually most of the information about a scene is contained in its luminance rather than its colour (chrominance).

This is why black-and-white (monochrome) reproduction was acceptable for photography and TV for many years, until technology provided colour reproduction at a sufficiently low price to make its modest advantages worth having.

The luminance (Y) of a pel may be obtained from its RGB components as:

$$Y = 0.3R + 0.6G + 0.1B \tag{1}$$

These coefficients are only approximate, and are the values defined in the JPEG Book. In other places values of 0.3, 0.59 and 0.11 are used.

RGB representations of images are normally defined so that if R=G=B, the pel is always some shade of gray. For Y to equal R, G and B in these cases, the 3 coefficients in Equation 1 must sum to unity.

When Y defines the luminance of a pel, its chrominance is usually defined by U and V such that:

$$U = 0.5(B - Y), \qquad V = 0.625(R - Y) \tag{2}$$

Note that gray pels will always have U=V=0.

The transformation between RGB and YUV colour spaces is linear and may be achieved by a 3×3 matrix C and its inverse:

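$$\begin{pmatrix} Y \\ U \\ V \end{pmatrix} = C \begin{pmatrix} R \\ G \\ B \end{pmatrix} \tag{3}$$

where

$$C = \begin{pmatrix} 0.3 & 0.6 & 0.1 \\ -0.15 & -0.3 & 0.45 \\ 0.4375 & -0.3750 & -0.0625 \end{pmatrix}$$

and

$$\begin{pmatrix} R \\ G \\ B \end{pmatrix} = C^{-1} \begin{pmatrix} Y \\ U \\ V \end{pmatrix} \tag{4}$$

where

$$C^{-1} = \begin{pmatrix} 1 & 0 & 1.6 \\ 1 & -0.3333 & -0.8 \\ 1 & 2 & 0 \end{pmatrix}$$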
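As an illustration of Equations 3 and 4, the following is a minimal sketch of the forward and inverse transforms in plain Python. The function names (`mat_vec`, `rgb_to_yuv`, `yuv_to_rgb`) are chosen here for illustration and are not part of the course material; the matrix entries are the ones given above, with the rounded value -0.3333 written as -1/3.

```python
# Forward matrix C from Equation 3 (rows give Y, U, V in terms of R, G, B).
C = [
    [0.3, 0.6, 0.1],
    [-0.15, -0.3, 0.45],
    [0.4375, -0.375, -0.0625],
]

# Inverse matrix from Equation 4 (rows give R, G, B in terms of Y, U, V).
# The text quotes -0.3333; -1/3 is used here so the round trip is closer to exact.
C_INV = [
    [1.0, 0.0, 1.6],
    [1.0, -1.0 / 3.0, -0.8],
    [1.0, 2.0, 0.0],
]


def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a length-3 vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def rgb_to_yuv(rgb):
    """Apply Equation 3 to one pel given as [R, G, B]."""
    return mat_vec(C, rgb)


def yuv_to_rgb(yuv):
    """Apply Equation 4 to one pel given as [Y, U, V]."""
    return mat_vec(C_INV, yuv)


if __name__ == "__main__":
    # A gray pel (R = G = B) should give Y equal to that value and U = V = 0,
    # up to floating-point rounding.
    print(rgb_to_yuv([128, 128, 128]))
    # A coloured pel should survive the round trip RGB -> YUV -> RGB.
    print(yuv_to_rgb(rgb_to_yuv([200, 120, 40])))
```

For the pel R=200, G=120, B=40, Equations 1 and 2 give Y=136, U=-48 and V=40 by hand, which is a convenient check on the matrix form.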

Visual Sensitivity

**Figure 1:** Sensitivity of the eye to luminance and chrominance intensity changes.

Figure 1 shows the sensitivity of the eye to luminance (Y) and chrominance (U, V) components of images. The horizontal scale is spatial frequency, and represents the frequency of an alternating pattern of parallel stripes with sinusoidally varying intensity. The vertical scale is the contrast sensitivity of human vision, which is the ratio of the maximum visible range of intensities to the minimum discernible peak-to-peak intensity variation at the specified frequency.

In Figure 1 we see that:

The maximum sensitivity to Y occurs for spatial frequencies around 5 cycles / degree, which corresponds to striped patterns with a half-period (stripe width) of about 1.8 mm at a distance of 1 m (~arm's length).
The eye has very little response above 100 cycles / degree, which corresponds to a stripe width of 0.1 mm at 1 m. On a standard PC display of width 250 mm, this would require 2500 pels per line! Hence the current SVGA standard of 1024×768 pels still falls somewhat short of the ideal and is limited by CRT spot size. Modern laptop displays have a pel size of about 0.3 mm, but are pleasing to view because the pel edges are so sharp (and there is no flicker).
The sensitivity to luminance drops off at low spatial frequencies, showing that we are not very good at estimating absolute luminance levels as long as they do not change with time. The luminance sensitivity to temporal fluctuations (flicker) does not fall off at low spatial frequencies.
The maximum chrominance sensitivity is much lower than the maximum luminance sensitivity, with blue-yellow (U) sensitivity being about half of red-green (V) sensitivity and about 1/6 of the maximum luminance sensitivity.
The chrominance sensitivities fall off above 1 cycle / degree, requiring a much lower spatial bandwidth than luminance.
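The stripe widths quoted in the list above follow from simple geometry: one degree of visual angle spans a fixed width at a given viewing distance, and a stripe is half of one cycle. The sketch below reproduces that arithmetic; the function names and the default 1 m viewing distance are assumptions made for illustration.

```python
import math


def stripe_width_mm(cycles_per_degree, distance_mm=1000.0):
    """Half-period (stripe width) in mm of a grating viewed at distance_mm."""
    # Width subtended by one degree of visual angle at the viewing distance.
    mm_per_degree = 2.0 * distance_mm * math.tan(math.radians(0.5))
    period_mm = mm_per_degree / cycles_per_degree
    return period_mm / 2.0


def pels_per_line(display_width_mm, cycles_per_degree, distance_mm=1000.0):
    """Pels needed across a display if each pel is one stripe wide."""
    return display_width_mm / stripe_width_mm(cycles_per_degree, distance_mm)


if __name__ == "__main__":
    print(stripe_width_mm(5))         # roughly 1.75 mm
    print(stripe_width_mm(100))       # roughly 0.09 mm
    print(pels_per_line(250, 100))    # a few thousand pels
```

At 1 m, one degree spans about 17.5 mm, so 5 cycles / degree gives a stripe width near 1.75 mm and 100 cycles / degree gives about 0.09 mm. These agree with the rounded figures in the text. Using the unrounded stripe width, the pel count for a 250 mm display comes out somewhat above the 2500 obtained from the rounded 0.1 mm value.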

We can now see why it is better to convert to the YUV domain before attempting image compression. The U and V components may be sampled at a lower rate than Y (due to narrower bandwidth) and may be quantised more coarsely (due to lower contrast sensitivity).

A colour demonstration on the computer will show this effect.

Colour Compression Strategy

The 3 RGB samples at each pel are transformed into 3 YUV samples using Equation 3.

Most image compression systems then subsample the U and V information by 2:1 horizontally and vertically so that there is one U and one V pel for each 2×2 block of Y pels. The subsampled U and V pels are obtained by averaging the four U and V samples from Equation 3 in each block. The quarter-size U and V subimages are then compressed using the same techniques as the full-size Y image, except that coarser quantisation may be used for U and V, so the total cost of adding colour may be only about a 25% increase in bit rate. Sometimes U and V are subsampled 4:1 each way (16:1 total), giving an even lower cost of colour.
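The 2:1 subsampling step can be sketched as follows. This is only an illustration, assuming a chrominance plane stored as a list of equal-length rows with even height and width; the function names are hypothetical, and edge handling for odd sizes is left out.

```python
def subsample_2x2(plane):
    """Average each 2x2 block of a chrominance plane (list of rows).

    Assumes the plane has even height and width.
    """
    rows, cols = len(plane), len(plane[0])
    out = []
    for r in range(0, rows, 2):
        out_row = []
        for c in range(0, cols, 2):
            total = (plane[r][c] + plane[r][c + 1]
                     + plane[r + 1][c] + plane[r + 1][c + 1])
            out_row.append(total / 4.0)
        out.append(out_row)
    return out


def sample_count_ratio(factor):
    """Ratio of (Y + U + V) samples to Y samples when U and V are each
    subsampled by `factor` both horizontally and vertically."""
    return 1.0 + 2.0 / (factor * factor)


if __name__ == "__main__":
    u_plane = [[1, 3, 5, 7],
               [5, 7, 9, 11]]
    print(subsample_2x2(u_plane))      # one row of two averaged pels
    print(sample_count_ratio(2))       # 2:1 each way
    print(sample_count_ratio(4))       # 4:1 each way
```

Counting samples alone, 2:1 subsampling each way adds 50% to the number of Y samples (two quarter-size subimages), and 4:1 each way adds 12.5%. The lower figure of about 25% quoted above for the 2:1 case also relies on the coarser quantisation of U and V.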

From now on we will mostly be considering compression of the monochrome Y image, and assume that similar techniques will be used for the smaller U and V subimages.

Activity Masking

A final feature of human vision, which is useful for compression, is that the contrast sensitivity to a given pattern is reduced in the presence of other patterns (activity) in the same region. This is known as activity masking.

It is a complicated subject, as it depends on the similarity between the given pattern and the background activity. However, in general, the higher the variance of the pels in a given region (typically ~ 8 to 16 pels across), the lower the contrast sensitivity.

Hence compression schemes which adapt the quantisation to local image activity tend to perform better than those which use uniform quantisation.
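One way to picture activity-adaptive quantisation is to measure the variance of each block and use a larger quantiser step where the variance is high. The sketch below is a toy illustration only: the rule in `adaptive_step` (step size growing linearly with the local standard deviation, up to a cap) and its parameters `base_step`, `gain` and `max_scale` are assumptions invented for this example, not a rule taken from the course or from any standard.

```python
def block_variance(block):
    """Variance of the pel values in a block given as a list of rows."""
    pels = [p for row in block for p in row]
    mean = sum(pels) / len(pels)
    return sum((p - mean) ** 2 for p in pels) / len(pels)


def adaptive_step(block, base_step=8.0, gain=0.05, max_scale=4.0):
    """Hypothetical rule: enlarge the quantiser step in busy regions.

    The step grows with the local standard deviation and is capped at
    max_scale times base_step. All three parameters are illustrative.
    """
    std = block_variance(block) ** 0.5
    scale = min(1.0 + gain * std, max_scale)
    return base_step * scale


def quantise(value, step):
    """Uniform quantisation of one value to the nearest multiple of step."""
    return step * round(value / step)


if __name__ == "__main__":
    flat = [[100] * 8 for _ in range(8)]        # no activity
    busy = [[0, 255] * 4 for _ in range(8)]     # strong vertical stripes
    print(adaptive_step(flat))   # base step, since the variance is zero
    print(adaptive_step(busy))   # larger step, limited by max_scale
```

A flat block keeps the base step, so small errors stay below the visibility threshold there, while a highly active block tolerates a coarser step because masking reduces the contrast sensitivity in that region.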

A computer demonstration will show the effect of reduced sensitivity to quantisation effects when noise is added to an image.