ICSV - Image Color Similarity Visualization
Vienna University of Technology - Course 'Visualization 2'. Author: Stefan Spelitz (0925601)
Documentation

Paper

A Metric for Distributions with Applications to Image Databases, Rubner et al., 1998

Links

Demonstration of the project

Project sources


Graphical User Interface

gui.png

Controls

  • Zooming: Mouse wheel, 2-finger scroll (touchpad), 2-finger drag (touchscreen)
  • Rotation: Left mouse button, 1-finger drag (touchscreen)
  • Panning: Right mouse button, 3-finger drag (touchscreen)

Camera rotation has two modes (in the 3D representation):

  1. Rotation around a target point (typically the origin of the coordinate system)
  2. Free camera rotation

Explanation: The first mode is enabled while the camera is far away from the images. The second mode is enabled automatically as soon as the camera gets close to the images, which makes it possible to turn the viewpoint and look around at nearby images.
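The mode switch can be sketched as a simple distance check. The threshold value and the function name below are hypothetical, not taken from the actual implementation:

```javascript
// Select the camera rotation mode from the distance between the camera
// and the nearest image. FREE_ROTATION_THRESHOLD is a hypothetical value.
const FREE_ROTATION_THRESHOLD = 50; // assumed world-space units

function rotationMode(distanceToNearestImage) {
  return distanceToNearestImage < FREE_ROTATION_THRESHOLD
    ? 'free'   // close to images: rotate the camera itself
    : 'orbit'; // far away: rotate around the target point (origin)
}
```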


Description

In their paper, Rubner et al. describe a visualization method for presenting a large number of images grouped by similarity. Similarity is measured either by the color or by the texture of the images.

This project considers only similarity by color. The goal is to arrange the images in such a way that images with similar colors are placed close together, while images with a different color appearance (e.g. an image of a sunrise and an image of the sea) are far apart.

The original paper only considers arrangement in a 2D space. While this project also allows 2D visualization, it focuses on arranging the images in a 3D Euclidean space.

Images in a 3D space are arranged along three axes, according to the CIE Lab color space. These axes are:

  • 'L' (lightness), ranging from black to white
  • 'a', the red/green opposing channel
  • 'b', the yellow/blue opposing channel

Examples: An image containing mainly blue colors would be placed on the negative side of the 'b' axis. An image in which both blue and red colors dominate would lie between the 'a' and 'b' axes (i.e. between the negative side of the 'b' axis and the positive side of the 'a' axis). In the example below, such an image would be located in the 'red/blue mix' area.

topview.png
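The placement described in the example can be illustrated by classifying the signs of the 'a' and 'b' components of an image's dominant color. This is a simplification for illustration only; the function name and the sign-only classification are not project code:

```javascript
// Roughly describe where a dominant CIE Lab color would be placed in
// the a/b plane, based only on the signs of its components.
function abRegion(a, b) {
  const parts = [];
  if (a > 0) parts.push('red');        // positive 'a' leans red
  else if (a < 0) parts.push('green'); // negative 'a' leans green
  if (b > 0) parts.push('yellow');     // positive 'b' leans yellow
  else if (b < 0) parts.push('blue');  // negative 'b' leans blue
  return parts.length ? parts.join('/') : 'neutral';
}
```

A dominant color with positive 'a' and negative 'b' ends up in the 'red/blue' region, matching the 'red/blue mix' area in the example.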

Technical Details

To arrange a given set of images according to their color similarities, the following steps are necessary:

  1. Convert RGB images to CIE Lab color space
  2. Create a signature for each image by using kd-trees for clustering
  3. Calculate the Earth Mover's Distance (EMD) between each image
  4. Use Multidimensional Scaling (MDS) to find positions for the images according to the calculated distances
  5. Scale/Rotate/Flip the resulting Euclidean space such that the axes are always aligned in the same way

The conversion to the CIE Lab color space is necessary because this color space is perceptually uniform: colors that appear similar to a human observer have a smaller distance than colors that appear different. Since a meaningful distance metric for color similarity is needed, the CIE Lab color space is more suitable than the RGB color space.
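The conversion can be sketched with the standard sRGB-to-XYZ-to-Lab formulas and the D65 white point. This is the textbook conversion, not necessarily the exact code used by the server:

```javascript
// Convert an 8-bit sRGB color to CIE Lab (D65 reference white).
function srgbToLab(r, g, b) {
  // 1. Undo the sRGB gamma curve.
  const lin = [r, g, b].map(v => {
    v /= 255;
    return v <= 0.04045 ? v / 12.92 : Math.pow((v + 0.055) / 1.055, 2.4);
  });
  // 2. Linear RGB -> XYZ (sRGB matrix, D65).
  const x = 0.4124 * lin[0] + 0.3576 * lin[1] + 0.1805 * lin[2];
  const y = 0.2126 * lin[0] + 0.7152 * lin[1] + 0.0722 * lin[2];
  const z = 0.0193 * lin[0] + 0.1192 * lin[1] + 0.9505 * lin[2];
  // 3. XYZ -> Lab, normalized by the D65 reference white.
  const f = t => t > 0.008856 ? Math.cbrt(t) : 7.787 * t + 16 / 116;
  const [fx, fy, fz] = [x / 0.95047, y / 1.0, z / 1.08883].map(f);
  return {
    L: 116 * fy - 16,   // lightness, 0 (black) to 100 (white)
    a: 500 * (fx - fy), // green (-) to red (+)
    b: 200 * (fy - fz)  // blue (-) to yellow (+)
  };
}
```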

Signatures can be seen as a generalized form of histograms, but with variable-sized bins (see the original paper for details). Signatures are created by clustering the pixels of an image with a kd-tree. This is done in multiple stages:

  1. kd-tree clustering, which stops once the size of a cluster falls below a given threshold (e.g. a minimum cluster size of 25)
  2. averaging of the points per cluster to create a centroid
  3. another kd-tree clustering based on the centroids, but with the space coordinates shifted by half the minimum cluster size (i.e. 12.5)
  4. final averaging of the centroids of the first stage per cluster.

This clustering is proposed by Rubner et al. in The Earth Mover's Distance as a Metric for Image Retrieval. Please note that the kd-tree clustering does not split at the median of a point set, but splits an interval into two equal sub-intervals.
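The first clustering stage can be sketched as follows, assuming that points are Lab triples, that the bounding box is split at the midpoint of its longest side, and that the size threshold applies to that side length. The second, coordinate-shifted stage and the final merging of centroids are omitted for brevity:

```javascript
// Minimum side length of a cluster's bounding box (assumed meaning of
// the "minimum cluster size" of 25 described above).
const MIN_SIZE = 25;

// Recursively split the bounding box of `points` at the midpoint of its
// longest side until that side is shorter than MIN_SIZE.
function kdCluster(points, dims = ['L', 'a', 'b']) {
  const lo = {}, hi = {};
  for (const d of dims) {
    lo[d] = Math.min(...points.map(p => p[d]));
    hi[d] = Math.max(...points.map(p => p[d]));
  }
  const axis = dims.reduce((m, d) => hi[d] - lo[d] > hi[m] - lo[m] ? d : m);
  if (hi[axis] - lo[axis] < MIN_SIZE) return [points]; // small enough
  const mid = (lo[axis] + hi[axis]) / 2;
  const left = points.filter(p => p[axis] < mid);
  const right = points.filter(p => p[axis] >= mid);
  if (!left.length || !right.length) return [points];
  return [...kdCluster(left, dims), ...kdCluster(right, dims)];
}

// Average each cluster into a weighted centroid (one signature entry).
function centroids(clusters) {
  return clusters.map(c => ({
    L: c.reduce((s, p) => s + p.L, 0) / c.length,
    a: c.reduce((s, p) => s + p.a, 0) / c.length,
    b: c.reduce((s, p) => s + p.b, 0) / c.length,
    weight: c.length
  }));
}
```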

The EMD is calculated by using an adaptation of the JFastEMD framework.
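The EMD is the minimum total cost of moving the "earth" (weight) of one signature so that it matches the other. For one-dimensional signatures with equal total weight it reduces to a single sweep that carries surplus weight to the next position. This simplified form only illustrates the idea; it is not what JFastEMD computes for full three-dimensional color signatures:

```javascript
// 1-D Earth Mover's Distance between two weighted signatures with equal
// total weight. Each signature is an array of { pos, weight } entries.
function emd1d(sigA, sigB) {
  const events = [];
  for (const s of sigA) events.push({ pos: s.pos, w: s.weight });
  for (const s of sigB) events.push({ pos: s.pos, w: -s.weight });
  events.sort((x, y) => x.pos - y.pos);
  let carried = 0, cost = 0, prev = events[0].pos;
  for (const e of events) {
    cost += Math.abs(carried) * (e.pos - prev); // move surplus earth
    carried += e.w;
    prev = e.pos;
  }
  return cost;
}
```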

The MDS is approximated with an algorithm based on landmark points. See V. de Silva, J.B. Tenenbaum, Sparse multidimensional scaling using landmark points. A fast implementation is provided in the MDSJ library.
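Landmark MDS first embeds a small subset of "landmark" points with classical MDS and then places all remaining points relative to them. The classical-MDS core can be sketched for the simplest case, a symmetric distance matrix that embeds exactly in one dimension. This is a sketch using power iteration, not the MDSJ implementation:

```javascript
// Recover 1-D coordinates from a symmetric matrix of pairwise distances
// via classical MDS: double-center the squared distances, then take the
// dominant eigenpair by power iteration.
function classicalMds1d(D, iters = 200) {
  const n = D.length;
  const sq = D.map(row => row.map(d => d * d));
  // B = -1/2 * J * D^2 * J (J subtracts row, column and grand means).
  const rowMean = sq.map(r => r.reduce((s, v) => s + v, 0) / n);
  const total = rowMean.reduce((s, v) => s + v, 0) / n;
  const B = sq.map((r, i) =>
    r.map((v, j) => -0.5 * (v - rowMean[i] - rowMean[j] + total)));
  // Power iteration for the dominant eigenpair of B.
  let v = B.map((_, i) => i + 1), lambda = 0;
  for (let t = 0; t < iters; t++) {
    const w = B.map(row => row.reduce((s, bij, j) => s + bij * v[j], 0));
    lambda = Math.hypot(...w);
    v = w.map(x => x / lambda);
  }
  return v.map(x => x * Math.sqrt(lambda)); // coordinates along one axis
}
```

The recovered coordinates are unique only up to translation, sign, and rotation, which is exactly why the alignment step described below is needed.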

To calculate the axes of the CIE Lab color space (more precisely: the positions of the axes for the given image set), the following algorithm is proposed:

  1. Create two signatures for each axis.
    • This means signatures for L+, L-, a+, a-, b+ and b- are created.
    • L+ and L- are white and black, respectively.
    • a+ and a- are the colors with medium lightness and the maximum/minimum possible values on the 'a' axis of the CIE Lab color space.
    • b+ and b- are the colors with medium lightness and the maximum/minimum possible values on the 'b' axis of the CIE Lab color space.
  2. Calculate the EMD between the signatures of the axes and the signatures of the current image set.
  3. Include the distances between the images and the axes in the MDS algorithm.
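The axis signatures from step 1 can be sketched as single-cluster signatures. The extreme value of ±100 for 'a' and 'b' and the medium lightness of 50 are assumptions for illustration; the exact usable extremes depend on the gamut:

```javascript
const A_B_EXTREME = 100; // assumed extreme for the 'a' and 'b' axes
const MEDIUM_L = 50;     // assumed medium lightness

// One single-cluster signature per axis endpoint.
function axisSignatures() {
  return {
    'L+': [{ L: 100, a: 0, b: 0, weight: 1 }], // white
    'L-': [{ L: 0, a: 0, b: 0, weight: 1 }],   // black
    'a+': [{ L: MEDIUM_L, a: A_B_EXTREME, b: 0, weight: 1 }],
    'a-': [{ L: MEDIUM_L, a: -A_B_EXTREME, b: 0, weight: 1 }],
    'b+': [{ L: MEDIUM_L, a: 0, b: A_B_EXTREME, weight: 1 }],
    'b-': [{ L: MEDIUM_L, a: 0, b: -A_B_EXTREME, weight: 1 }]
  };
}
```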

The MDS then delivers a position for each endpoint of the axes, but the axes will be arbitrarily rotated and the resulting Euclidean space arbitrarily scaled. Therefore, as a final step to allow a consistent presentation of the resulting space, the following is done:

  1. Scaling of the resulting positions to a predefined size
  2. Rotation of the 'L' axis to match the vector (0,1,0) of the coordinate system
  3. Rotation of the 'a' axis, such that 'a-' is pointing in the same direction as the vector (-1, 0, 0), without rotating the 'L' axis or distorting the resulting space
  4. If necessary, the space is flipped, such that 'b-' is pointing in the same direction as the vector (0, 0, 1), without rotating or distorting the space
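Step 2 needs a rotation that maps the recovered 'L' axis direction onto (0,1,0). One standard way to rotate one unit vector onto another is Rodrigues' rotation formula. This is a sketch; the function names are not from the project code, and the degenerate case of opposite vectors (u = -t) is not handled:

```javascript
// Rotation matrix that maps unit vector u onto unit vector t
// (Rodrigues' formula, assuming u !== -t).
function rotationTo(u, t) {
  const x = u[1] * t[2] - u[2] * t[1]; // cross product u x t
  const y = u[2] * t[0] - u[0] * t[2];
  const z = u[0] * t[1] - u[1] * t[0];
  const c = u[0] * t[0] + u[1] * t[1] + u[2] * t[2]; // cos(angle)
  const k = 1 / (1 + c);
  return [
    [c + k * x * x, k * x * y - z, k * x * z + y],
    [k * x * y + z, c + k * y * y, k * y * z - x],
    [k * x * z - y, k * y * z + x, c + k * z * z]
  ];
}

// Apply a 3x3 matrix to a 3-vector.
const apply = (R, v) =>
  R.map(row => row[0] * v[0] + row[1] * v[1] + row[2] * v[2]);
```

The same matrix is then applied to every image position, so the space is rotated as a whole without being distorted.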

In addition, the Gram-Schmidt process could be used to create an orthogonal coordinate system. This is left as future work.


Software Architecture

The project consists of:

  • a server, written in C#
  • a client, written in JavaScript

The server is responsible for loading images from Flickr and for calculating the positions in a 2D/3D Euclidean space. The client then uses this information to visualize the images with WebGL and the Three.js framework.

This architecture has the advantage of allowing low-performance (mobile) clients to use the application without having to run the computationally expensive algorithms, such as MDS and the EMD.


Contributions of this project in comparison to the original paper

  1. 2D and 3D representation (instead of only 2D)
  2. Automatic calculation and visualization of the color space axes
  3. Interactive exploration of the 'image cloud' in real-time
  4. Images in front cast shadows on the images behind them to emphasize depth perception
  5. Images are 'pushed away' automatically if the camera gets close to them, to avoid intersection with the camera and improve navigation in 3D space
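The push-away behavior of contribution 5 can be sketched as clamping the camera-to-image distance to a minimum. The threshold is a hypothetical value, not taken from the actual implementation:

```javascript
// Minimum allowed camera-to-image distance (hypothetical value).
const MIN_DISTANCE = 5; // assumed world-space units

// If an image comes closer to the camera than MIN_DISTANCE, move it
// outward along the camera-to-image direction to that distance.
function pushAway(camera, image) {
  const d = image.map((v, i) => v - camera[i]);
  const len = Math.hypot(...d);
  if (len >= MIN_DISTANCE || len === 0) return image; // far enough
  const s = MIN_DISTANCE / len;
  return camera.map((v, i) => v + d[i] * s);
}
```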

Remarks

  • The axes are not necessarily orthogonal, since the MDS algorithm used delivers only an approximation of the positions in Euclidean space
  • Subsequent searches with the same term can be calculated faster since the server caches previous image results and previously calculated EMDs
  • The shape of the CIE Lab color space is not symmetrical; thus images are not necessarily arranged in a symmetrical way (see the image below, taken from Color Spaces)
cielab.png