As stated before, we will need a large number of rectangles, of
which we need the average color. In order to compute the average
colors of rectangles fast we use summed area
tables [Crow84]. A separate table is built for each of the
three CIE XYZ color components for both images. That makes six
tables, each containing a number of entries equal to the total
number of pixels in the image (assuming that the compared images have
the same size). Element T(i,j) of the table T (see
fig. 8.3) contains the sum of values of all pixels X(x,y)
such that
and
. The average of the rectangle
defined with points (k,l) and (i,j) such that
and
is then:

Now, only the position and size of the rectangle has to be given to compute the average with only a few operations.
The weighting of the particular rectangles according to the contrast
sensitivity function is done implicitly using importance sampling. The
integral of the contrast sensitivity function
is precomputed and normalized by
dividing all values with g(60) (see fig. 8.4).

Figure 8.4: Contrast sensitivity function and its integral + random sampling

Figure 8.5: Possible orientations of the rectangles
Randomly sampling over the domain of the inverse of g(f) produces a frequency distribution as desired, so that no additional weights are necessary to get a distribution proportional to A(f) (see bottom line in fig. 8.4).
Our idea is to select the position on the y axis quasi-randomly, by
doing so, the metric has some meaning during the computation process
as well. Once the size of a rectangle is determined, the
orientation, i.e. longer and shorter side size, should be
determined. The size of the rectangle corresponds to the rectangle's
diagonal, and the maximum allowed ratio of the longer to the shorter
side of the rectangle is the golden section ratio
(
). We do not allow narrow rectangles as they
are not so important in visual comparison. We speculated using
various shapes, but since all shapes will be covered using a
sufficient number of rectangles, we use only rectangles which make
the method fast as well. The orientation of the rectangle is
determined by choosing the angle between the diagonal and the
horizontal axis. This angle is in the range
(see fig. 8.5) and a
particular angle is chosen quasi randomly. When the size and the
orientation of the rectangle are known, the position of the
rectangle can be determined. The rectangle position in the image is
also determined by quasi random determination of its lower left
corner. The position domain for this procedure is reduced, such that
the complete rectangle lies within the image. Figure 8.6
shows the first 100, 250, 500 and 1000 rectangles placed in an
image, assuming the maximum displayable frequency is 60
cycles/degree. Although there are lot of uncovered image areas after
1000 rectangles are placed in the image, this result will rarely be
significantly different from final solution. According to our
experiments 10000 rectangles will be sufficient in most cases.
Therefore we have a 4-dimensional quasi random problem. We have chosen the Halton sequence [Glas95] to compute these quasi-random numbers.
The Halton sequence for an N-dimensional point
is defined as:

where
refers to the ith prime number, and the function
is the radical-inverse function of m to the base
r. The value of the radical-inverse function
is
obtained by simply reflecting the digits of m written in base r
around the decimal point. Therefore if m is:

the radical inverse function
is:

As our problem is 4-dimensional we need the
,
,
and
functions. The size will be chosen using
, the orientation using
and the position in the
image using
and
. Now, the user needs only to set
the total number of rectangles and the corresponding metric can be
computed. Note, that the metric produces (slightly) different values
for different numbers of rectangles, i.e. for every number of
rectangles a different metric is actually defined.
If the maximum displayable frequency is much lower than 60 cycles/degree it could be a good idea to subsample the image and to make the summed area tables correspond to the sub-pixel resolution, e.g. four entries per pixel. In this way sub-pixel areas can be computed, and a pixel size area can cover four quarters of the neighboring pixels. It seems to us that a finer subdivision than four sub-pixels per pixel would be rarely needed. If the subdivision is not done, the weights of the sub-pixel range should be somehow added to the highest, still displayable frequency.
The whole idea of placing various rectangles in the image space, overcomes the drawback of wavelet transforms that various resolution sub-images have fixed positions. A quarter of an image sized square placed exactly in the center of the image will never be considered as an entity using wavelet transforms.