Chapter 2 describes in detail the significance of color sensation as a human experience. Several color measurement systems are introduced: the first standardized model, created at the beginning of the twentieth century, and those that are widely applied today. The role and efficiency of these representations in computer vision applications are also examined. The idea of color constancy is defined as well, since it played a significant role in our color classification procedures. The last section describes the place recognition task and presents the structure of Susan B.’s updated environment model.
2.1 Color Sensation
The Merriam-Webster Dictionary gives, among others, the following three definitions for the term "color":
The above examples clearly demonstrate that the relationship between color measurement and color perception is far from trivial. For humans to reproduce their visual experiences, and for machines to imitate the perception process, a standardized procedure was needed that would allow visual sensations to be expressed more precisely. The search for such a representation lies at the heart of the study of computer vision. Intriguing studies have been carried out in both psychology and computer science to better explain, interpret and imitate the color perception process.
2.2 Related Research Applying Color
Several research projects have demonstrated the significance of color as a segmentation feature in color computer vision. Segmentation is one of the first steps of low-level image processing, during which the input image is divided into distinguishable units or areas based on a collection of properties. In [Holla, 1982] a model of the human visual system was used as a preprocessing tool for scene analysis. Color vision was modeled by a pair of opponent color channels (red-green and yellow-blue) forming a two-dimensional feature, and luminance was distinguished from chrominance. Luminance proved to be the better detector of small details, while chrominance, whose importance the study emphasized, performed better in rendering coarser structures and areas.
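The luminance/chrominance split described above can be illustrated with a minimal sketch. It assumes a simple, commonly used opponent transform (luminance as the mean of R, G and B; red-green as R - G; yellow-blue as the mean of R and G minus B). The exact weights used in [Holla, 1982] are not given here, so these are illustrative.

```python
import numpy as np


def opponent_channels(rgb: np.ndarray) -> np.ndarray:
    """Split an (..., 3) RGB array into luminance and two opponent chrominance channels.

    The weights are an assumed, commonly used opponent transform,
    not necessarily the one used in [Holla, 1982].
    """
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    luminance = (r + g + b) / 3.0       # achromatic channel
    red_green = r - g                   # positive = reddish, negative = greenish
    yellow_blue = (r + g) / 2.0 - b     # positive = yellowish, negative = bluish
    return np.stack([luminance, red_green, yellow_blue], axis=-1)
```

For any gray pixel (R = G = B) both chrominance channels are zero, so they carry only chromatic information, while the luminance channel carries the fine detail.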
In [Mustafa, 1996] spectral information was used together with curvature to achieve color-based three-dimensional object identification. Surface signatures obtained by photometric stereo described the input images. The signatures were normalized histogram distributions that are invariant to changes in pose, partial occlusion or shadowing effects.
In the head-tracking application of [Birchfield, 1998], color was used because it does not depend on geometric information. The intensity gradient around the head’s perimeter and the color histogram of the head’s interior were combined. Because the two cues are nearly orthogonal, each could compensate when the other failed.
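Normalized color histograms, such as those used in [Mustafa, 1996] and [Birchfield, 1998], can be compared in several ways. The sketch below assumes histogram intersection as the similarity measure and an 8×8×8 RGB binning; both are illustrative choices, not necessarily those of the cited works.

```python
import numpy as np


def color_histogram(pixels: np.ndarray, bins: int = 8) -> np.ndarray:
    """Normalized 3-D color histogram of an (N, 3) array of RGB values in [0, 1]."""
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins),
                             range=((0, 1), (0, 1), (0, 1)))
    return hist / hist.sum()  # dividing by the pixel count removes the effect of region size


def histogram_intersection(h1: np.ndarray, h2: np.ndarray) -> float:
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return float(np.minimum(h1, h2).sum())
```

Because each histogram is divided by its pixel count, a small and a large patch of the same color yield the same histogram, which makes such signatures robust to changes in the apparent size of a region.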
An image retrieval system, FOCUS, demonstrated improved performance by analyzing color histograms [Das, 1997]. The peaks of the histograms characterized the color content of the image, which was matched against the query object. The spatial relationships between the examined color regions were then analyzed. The study put heavy emphasis on hue as a color feature.
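Extracting dominant colors from histogram peaks, in the manner described for FOCUS, can be sketched on a hue histogram. The 10° bin width and the rule that a peak must hold at least 5% of the pixels are hypothetical parameters, not values from [Das, 1997]. Because hue is an angle, the first and last bins are treated as neighbours.

```python
import numpy as np


def hue_histogram_peaks(hues_deg, bins=36, min_fraction=0.05):
    """Return the centers (degrees) of local maxima in a circular hue histogram."""
    hist, edges = np.histogram(np.mod(hues_deg, 360.0), bins=bins, range=(0.0, 360.0))
    hist = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2.0
    peaks = []
    for i in range(bins):
        left, right = hist[i - 1], hist[(i + 1) % bins]  # wrap around: hue is circular
        # Strict ">" on the left and ">=" on the right reports a flat top only once.
        if hist[i] >= min_fraction and hist[i] > left and hist[i] >= right:
            peaks.append(float(centers[i]))
    return peaks
```

For example, pixels with hues of 355° and 5° fall into adjacent bins across the 0° boundary and are reported as a single red peak rather than two.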
Color information was also successfully used in the image filtering application of [Tomasi, 1998]. Bilateral filtering (combined domain and range filtering) smoothed the input image while preserving perceptually visible edges.
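A bilateral filter in the spirit of [Tomasi, 1998] can be sketched as follows. This brute-force version handles a single grayscale channel with values in [0, 1]. The window radius and the two Gaussian widths (sigma_d for the spatial domain, sigma_r for the intensity range) are illustrative defaults, not values from the paper.

```python
import numpy as np


def bilateral_filter(image, radius=2, sigma_d=2.0, sigma_r=0.1):
    """Brute-force bilateral filter for a 2-D grayscale image with values in [0, 1].

    Each output pixel is a weighted mean of its neighbours. The weight is the
    product of a domain Gaussian (spatial closeness) and a range Gaussian
    (intensity similarity), so pixels across a strong edge contribute little.
    """
    img = np.asarray(image, dtype=float)
    padded = np.pad(img, radius, mode="edge")
    h, w = img.shape
    out = np.zeros_like(img)
    norm = np.zeros_like(img)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbour at offset (dy, dx) for every pixel at once.
            shifted = padded[radius + dy: radius + dy + h, radius + dx: radius + dx + w]
            domain = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_d ** 2))
            similarity = np.exp(-((shifted - img) ** 2) / (2.0 * sigma_r ** 2))
            weight = domain * similarity
            out += weight * shifted
            norm += weight
    return out / norm  # norm >= 1 because the center pixel always has weight 1
```

Across a step edge the range weight becomes negligible, so the edge survives, whereas small fluctuations inside a uniform area are averaged away. The original work also applies the filter to color images; this sketch covers only the grayscale case.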
Although these research projects applied color as a property of secondary
or tertiary significance, color has not yet been widely examined as the sole
descriptor of surfaces and places. Why such an important object feature has
been "neglected" in some research applications was an intriguing question,
and it provided one of the main motivations for the experiments of the
presented research.
2.3 Standard Color Measurements
2.3.1 Munsell Color Charts
It was Albert H. Munsell who, at the beginning of the twentieth century, published the first version of his thorough analysis of color. His influential book, A Color Notation [Munsell, 1979], has had a major impact on the way color is categorized. In it, Munsell described a representation of the three-dimensional color space in great detail. He introduced a Hue-Chroma-Value color basis whose counterpart, the Hue-Saturation-Intensity (HSI) system, is widely used today. Munsell defined hue as the name of the color, which represents the family name for a group of chromatic colors. He used the term "chroma" to describe the saturation of a color; according to him, this is the property humans most often ignore when describing colors. In Munsell’s vocabulary, "value" refers to the lightness of the color: it ranges from black to white and describes how light or dark the hue of the examined color is.
In A Color Notation, Munsell gave graphic representations of the color space and presented his theory through vivid descriptions. For instance, he presented the models of a color sphere and a color tree. Both are excellent illustrations of Munsell’s color space and help the reader visualize the three-dimensional structure of colors. In the case of the sphere, the north pole is white and the south pole is black. The equator consists of the middle values of five colors (red, yellow, green, blue and purple), and parallels above and below it describe lighter and darker values, respectively. The vertical axis is a scale of grays; lines perpendicular to it are scales of saturation.
Munsell’s greatest contribution to the study of colors lies in the introduction of a standard color notation and of color charts and atlases. Color charts depict different cross sections of the color solid, and an atlas is a set of such charts. More specifically, a Munsell color atlas is a collection of Munsell chips: colored papers spanning a broad range of natural colors. Each chip is assigned a hue-value-chroma triplet that uniquely identifies it. These tools are used both in industry and in the sciences: they play an essential role in classifying perceived colors, and they are also used for calibrating cameras.
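Munsell notation writes a chip’s triplet as hue followed by value/chroma, for example 5R 4/14 (hue 5R, value 4, chroma 14). The following is a minimal parser. It assumes only chromatic colors; neutral grays, written with N, are not handled. The MunsellColor type is our own illustration.

```python
import re
from typing import NamedTuple


class MunsellColor(NamedTuple):
    hue_step: float    # position within the hue family (e.g. 5 in "5R")
    hue_family: str    # R, YR, Y, GY, G, BG, B, PB, P or RP
    value: float       # lightness, from 0 (black) to 10 (white)
    chroma: float      # saturation; larger means more vivid


# Hue step, hue family, value and chroma, e.g. "2.5YR 6/8".
_NOTATION = re.compile(
    r"^\s*(\d+(?:\.\d+)?)(RP|YR|GY|BG|PB|R|Y|G|B|P)\s+(\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)\s*$"
)


def parse_munsell(notation: str) -> MunsellColor:
    """Parse chromatic Munsell notation such as '5R 4/14' (hue value/chroma)."""
    match = _NOTATION.match(notation)
    if match is None:
        raise ValueError(f"not a chromatic Munsell notation: {notation!r}")
    step, family, value, chroma = match.groups()
    return MunsellColor(float(step), family, float(value), float(chroma))
```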
2.3.2 Color Representations Nowadays
Today, numerous other schemes attempt to specify color appearance precisely, and they serve different purposes: some aim to provide an intuitive representation for humans, others an efficient description for computation. One of the most commonly applied is the RGB color system. Its three-dimensional space is useful for specifying colors in hardware and for displaying them. However, it is neither intuitive nor adequate for distinctly representing colors in classification procedures. For instance, its color assignment is not unique. The response of each of the three cone types (sensitive to short, middle and long wavelengths) is the integral over wavelength of the product of the cone sensitivity and the light stimulus. Consequently, all stimuli that produce the same three integrals are sensed as the same color and are expressed with the same RGB values.
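The non-uniqueness argument can be made concrete with a discretized version of the cone-response integral, R_k = ∫ S_k(λ) E(λ) dλ, for cone type k with sensitivity S_k and stimulus E. The five-band sensitivity table below is made up for illustration; real cone sensitivities are smooth curves. Because five spectral samples map to only three responses, any change to the spectrum that lies in the null space of the sensitivity matrix leaves the responses, and hence the sensed color, unchanged.

```python
import numpy as np

# Hypothetical, coarsely sampled cone sensitivities (rows: S, M, L cones)
# over five wavelength bands; the numbers are illustrative only.
SENSITIVITY = np.array([
    [0.9, 0.4, 0.1, 0.0, 0.0],   # short-wavelength cones
    [0.1, 0.5, 0.9, 0.5, 0.1],   # middle-wavelength cones
    [0.0, 0.1, 0.5, 0.9, 0.6],   # long-wavelength cones
])


def cone_responses(spectrum, sensitivity=SENSITIVITY, band_width=1.0):
    """Discrete version of the integral of sensitivity(lambda) * stimulus(lambda)."""
    return (sensitivity @ np.asarray(spectrum, dtype=float)) * band_width


def metamer(spectrum, sensitivity=SENSITIVITY, scale=0.1):
    """Return a different spectrum that produces the same three cone responses."""
    # The last right-singular vector of a full-rank 3x5 matrix lies in its null space.
    _, _, vt = np.linalg.svd(sensitivity)
    null_vector = vt[-1]
    return np.asarray(spectrum, dtype=float) + scale * null_vector
```

Spectra related in this way are called metamers: physically different light that the three cone types, and therefore an RGB description, cannot tell apart.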
Most of the other color bases commonly used in color vision describe colors with one dimension for intensity and two dimensions for chromaticity. These include HSI, opponent processes and YIQ.
The HSI model (already mentioned above) provides an intuitive way to specify colors. Hue (H) refers to the color itself (e.g., yellow, red, green), saturation (S) to the "purity" of the color and intensity (I) to its brightness. For example, pink is an unsaturated red that can be either bright or dim. Artists and computer-graphics practitioners use this system extensively.
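The conversion from RGB to HSI can be sketched with widely used geometric formulas: intensity as the mean of the three bands, saturation from the minimum band, and hue as an angle in the chromaticity plane. This is one common textbook variant, since HSI definitions differ between sources. Reporting hue 0 for achromatic pixels is our own convention.

```python
import math


def rgb_to_hsi(r: float, g: float, b: float):
    """Convert RGB values in [0, 1] to (hue in degrees, saturation, intensity)."""
    intensity = (r + g + b) / 3.0
    if intensity == 0.0:
        return 0.0, 0.0, 0.0                       # black: hue and saturation undefined
    saturation = 1.0 - min(r, g, b) / intensity
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0.0:
        return 0.0, saturation, intensity          # gray (r == g == b): hue undefined
    # Clamp against rounding error before taking the arccosine.
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    hue = theta if b <= g else 360.0 - theta
    return hue, saturation, intensity
```

The pink example from the text is reproduced: rgb_to_hsi(1.0, 0.5, 0.5) has the same hue (0°) as pure red, rgb_to_hsi(1.0, 0.0, 0.0), but a saturation of 0.25 instead of 1.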
Opponent processes are believed to have neurological correlates, and their complex structure accounts for psychophysical data. The model originates in Hering’s theory of color vision, which posits opponent hue pairs (red-green and yellow-blue) that cancel each other out when superimposed. Though intriguing in itself, this color model cannot be efficiently applied for computational purposes: it severely restricts the lighting characteristics for its algorithms, and it does not clearly specify how hue values other than these two pairs could be defined.
The YIQ color system can be obtained by a linear transformation of the RGB color cube. It consists of luminance (Y) and two color-difference signals: in-phase (I) and quadrature (Q). Luminance measures intensity and is defined by the CIE standard to be:
$$Y = 0.299\,R + 0.587\,G + 0.114\,B$$
The above formula is also the accepted one for converting colors to grayscale or black-and-white. Because Y alone carries the grayscale image and the chrominance signals can be transmitted with restricted bandwidth, the representation allowed color broadcasts to remain backward compatible with black-and-white television. This is why the National Television System Committee (NTSC) adopted the YIQ color system. Black-and-white televisions use only the Y component of the transmission, which contains relative luminance information. This Y component is defined to be the same as the CIE Y component.
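Because YIQ is a linear transformation of RGB, converting between the two is a single matrix product. The sketch assumes the commonly quoted, rounded NTSC coefficients. Published values differ slightly in the last digit, so treat the matrix as illustrative. Its first row is the luminance formula used for grayscale conversion, and the I and Q rows each sum to zero, so a gray pixel has no chrominance.

```python
import numpy as np

# Commonly quoted (rounded) NTSC RGB-to-YIQ coefficients; sources differ slightly.
RGB_TO_YIQ = np.array([
    [0.299,  0.587,  0.114],   # Y: luminance (also the grayscale conversion)
    [0.596, -0.274, -0.322],   # I: in-phase chrominance
    [0.211, -0.523,  0.312],   # Q: quadrature chrominance
])


def rgb_to_yiq(rgb):
    """Convert an (..., 3) RGB array to YIQ with one linear transformation."""
    return np.asarray(rgb, dtype=float) @ RGB_TO_YIQ.T


def yiq_to_rgb(yiq):
    """Invert the linear transformation."""
    return np.asarray(yiq, dtype=float) @ np.linalg.inv(RGB_TO_YIQ).T
```

A black-and-white receiver corresponds to keeping only the first output component.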
The place recognition research project does not directly use any of the above-mentioned systems. Since the intensity values are obtained from the color camera as RGB triplets, an attempt is made to convert them into surface reflectance values, because surface reflectance values are proposed to be illumination invariant. The following section and Chapter 3 describe the mechanics and significance of this transform in detail.
2.4 Color Perception
In his Opticks, Newton put forward the following proposition about color sensation: "Every body reflects the Rays of its own Colour more copiously than the rest, and from their excess and predominance in the reflected Light has its Colour." [Newton, 1704] Although it is true that the only information reaching the human eye is the flux of radiant energy (the product of the reflectance of the object and the incident illumination), it was later observed that the perceived color is not determined by this flux [Land, 1973].
Edwin H. Land was a photographer and inventor who studied the mechanisms of human vision extensively. His first studies could be termed the "black and white experiments", as they focused solely on the grayscale information of regions. In a non-uniformly illuminated room, he carefully arranged a collection of gray cards with different reflectance values. Cards with lower reflectance were placed in brightly lit areas, and cards with higher reflectance in dimmer places. With a photometer, Land ascertained that the same amount of light reached the observer from each gray surface. According to Newton’s theory, then, the cards should all have been perceived as the same shade of gray. However, that did not happen: despite the identical amount of radiance reaching the observers’ eyes, they could clearly distinguish the cards of different reflectance. These and similar early experiments taught Land that non-uniformity of illumination, the size and shape of an area and the length of its edges were irrelevant to lightness.
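A worked example with made-up numbers (arbitrary units) shows why Newton’s account predicts identical colors in Land’s setup. The flux reaching the eye is the product of reflectance and incident illumination, so a dark card in bright light and a light card in dim light can send exactly the same flux:

```latex
\begin{aligned}
\text{flux} &= \text{reflectance} \times \text{illumination} \\
\text{dark card in bright light:}\quad 0.2 \times 5.0 &= 1.0 \\
\text{light card in dim light:}\quad 0.5 \times 2.0 &= 1.0
\end{aligned}
```

Observers nevertheless see the first card as darker than the second: their judgment follows reflectance (0.2 versus 0.5) rather than flux.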
He soon turned his attention to the color Mondrian experiments, which were similar in nature to those described above. He arranged scenes with separately controlled light settings and color boards with calibrated Munsell chips. With a photometer, he balanced the brightness of each individual region to be the same. Human observers could again identify the different colors despite the identical radiance reaching their eyes. These results further supported the conclusion that human vision can recover lightness values independently of flux, and that these values are not significantly modified either by immediately neighboring areas or by larger surrounding ones. To summarize his findings, Land introduced the color constancy theorem.
Land continued his experiments. His next goal was to understand how humans extract reliable color information from an unevenly illuminated environment. From this study [Land, 1973] he concluded that color is interpreted by an ensemble of biological mechanisms that convert flux into a pattern of lightness values. In other words, the light arriving at the eye from the observed surface is processed by the human cone systems. These systems are receptive to short, middle or long wavelengths, and each creates a lightness image independently of the others. A lightness image, the final product of this biological system, is an estimate of the surface reflectance values of the input. According to Land [Land, 1971], the lightness images are then compared rather than mixed (simply superimposed or added) to produce the eventual color sensation. As Land suspected that the whole color perception process took place in the retina-and-cortex system, he named it "retinex". He also demonstrated through laboratory experiments that neither chromatic adaptation nor eye motion is involved in producing the perceived colors. His findings on this topic are collectively referred to as the Retinex Theory.
Land also introduced a computational model. It emulates the properties of human color vision and computes color descriptor values that are meant to be independent of lighting conditions. These lightness measures are closely related to surface reflectance values, from which the color of surfaces can be estimated. Chapter 3 describes this algorithm in detail.
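The core idea of the computational model is to recover relative lightness from intensity ratios while ignoring slow illumination changes. It can be illustrated with a heavily simplified, hypothetical sketch: a single one-dimensional path through one color band, with a fixed threshold. Land’s model uses many paths over a two-dimensional image, and the algorithm actually used in this research is the one described in Chapter 3. This sketch only illustrates the ratio, threshold and reset steps.

```python
import numpy as np


def retinex_path_lightness(intensities, threshold=0.05):
    """Simplified single-path, 1-D retinex for one color band (illustrative only).

    Intensities must be positive. Adjacent log-intensity differences (log
    ratios) smaller than `threshold` are treated as gradual illumination change
    and discarded; larger ones are kept as reflectance edges. The result is
    normalized so that the lightest point on the path gets lightness 1.0
    (assumed to be white).
    """
    log_i = np.log(np.asarray(intensities, dtype=float))
    steps = np.diff(log_i)                          # log of neighbouring ratios
    steps[np.abs(steps) < threshold] = 0.0          # threshold: drop small ratios
    log_lightness = np.concatenate([[0.0], np.cumsum(steps)])  # product of ratios
    log_lightness -= log_lightness.max()            # reset: brightest point = white
    return np.exp(log_lightness)
```

Applying the function separately to the short-, middle- and long-wavelength bands yields three lightness images, in line with Land’s claim that each cone system produces its own. For instance, a path crossing from a surface of reflectance 0.2 to one of reflectance 0.8 under illumination that rises smoothly by 20% yields lightness values of about 0.25 and 1.0, each constant within its surface, whereas the raw intensities drift with the illumination.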
2.5 The Environment Model
The environment surrounding the mobile agent is described as a semantic net [Fennema, 1990, 1994]. This model is a hierarchical network of spatial entities, called locales, that are related to each other by "whole-part"/"part of" relationships [Figure 2]. In each such relationship, one locale is the "parent" and the other the "child". In Figure 2, for example, the "second floor" and the "library" are connected, as the library (the child) is "contained by" the second floor (the parent). In such a hierarchy, the topmost locale, which has no parent, is called the "root", and the locales at the bottom level, which contain no sub-locales, are called "leaves". In Figure 2, "Environment" is the root and "Closet" is one of the leaves of the environment structure.
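The parent/child structure of the semantic net can be sketched as a simple tree. The Locale class and its methods are hypothetical illustrations, not the VLSys data structures. The placement of the closet under the second floor is assumed for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare locales by identity; parent links make the graph cyclic
class Locale:
    """A spatial entity in a whole-part hierarchy (hypothetical sketch)."""
    name: str
    parent: Optional[Locale] = field(default=None, repr=False)
    children: List[Locale] = field(default_factory=list, repr=False)

    def add_child(self, child: Locale) -> Locale:
        """Record that `child` is "contained by" this locale."""
        child.parent = self
        self.children.append(child)
        return child

    def root(self) -> Locale:
        """Follow parent links up to the locale that has no parent."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def leaves(self) -> List[Locale]:
        """All locales at or below this one that contain no sub-locales."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


environment = Locale("Environment")
second_floor = environment.add_child(Locale("Second floor"))
library = second_floor.add_child(Locale("Library"))
closet = second_floor.add_child(Locale("Closet"))   # placement assumed for illustration
```

Here library.root() returns the Environment locale, and environment.leaves() lists the Library and the Closet.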

The position and orientation (the pose) of each locale is precisely
defined in the three-dimensional coordinate system of the root. This
enables the robot to reconstruct the model of its current neighborhood at
any location: once the robot’s pose is given in this coordinate system,
the agent reconstructs its surrounding scene by projecting the visible
portion of its three-dimensional environment onto a two-dimensional plane.
An example of such a reconstruction is shown in Figure 3.
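Projecting the visible part of the three-dimensional model onto an image plane for a given pose can be sketched with a pinhole camera. Everything about the camera here is an assumption made for illustration. It sits at a fixed height above the robot’s floor position, looks horizontally along the robot’s heading, and has a focal length expressed in pixels. The actual reconstruction procedure of Susan B. may differ.

```python
import numpy as np


def project_points(points_world, robot_xy, heading, camera_height=1.0, focal=500.0):
    """Project 3-D points (root frame: x, y on the floor, z up) to image coordinates.

    Hypothetical pinhole camera at `camera_height`, looking horizontally along
    `heading` (radians from the x axis). Returns (u, v) pixel offsets from the
    image centre (u to the right, v downward) and a mask of points in front.
    """
    p = np.asarray(points_world, dtype=float) - np.array(
        [robot_xy[0], robot_xy[1], camera_height])
    c, s = np.cos(heading), np.sin(heading)
    forward = c * p[:, 0] + s * p[:, 1]    # distance along the viewing direction
    left = -s * p[:, 0] + c * p[:, 1]      # lateral offset, positive to the left
    up = p[:, 2]                           # height relative to the camera
    visible = forward > 1e-9
    with np.errstate(divide="ignore", invalid="ignore"):
        u = -focal * left / forward
        v = -focal * up / forward
    return np.stack([u, v], axis=1), visible
```

Points with non-positive forward distance lie behind the camera and are masked out. A full reconstruction would also clip to the image bounds and resolve occlusions between faces.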

The structure of a locale can be broken down into a network of elements, as in the example of Figure 4. These elements are implemented as "frames" and convey topological, geometric and physical information that characterizes the locale. The spatial characteristics and shape of a frame are described by its surfaces, a collection of planar surfaces or "faces". In Figure 4, the surfaces are the East and North walls. Surfaces, just like locales, are depicted as a network of nodes. Surface areas with common perceptual attributes are defined as "regions". These units store their relative location (with respect to the coordinate system of the containing locale), their size and the nature of their characteristic attributes. In Susan B.’s case, the features examined most are surface reflectance indicators, which define the color of the individual regions in order to differentiate between them.
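The face/region decomposition can be sketched as two small record types. Several details are hypothetical simplifications: the field names, the 2-D coordinates measured on the face itself (rather than in the containing locale’s coordinate system), the units and the numeric extents in the usage example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Region:
    """An area of a face with common perceptual attributes (hypothetical layout)."""
    name: str
    u_range: Tuple[float, float]             # horizontal extent on the face (assumed metres)
    v_range: Tuple[float, float]             # vertical extent on the face (assumed metres)
    reflectance: Tuple[float, float, float]  # one value per color band

    def contains(self, u: float, v: float) -> bool:
        return (self.u_range[0] <= u <= self.u_range[1]
                and self.v_range[0] <= v <= self.v_range[1])


@dataclass
class Face:
    """A planar surface of a locale, subdivided into regions."""
    name: str
    regions: List[Region] = field(default_factory=list)

    def region_at(self, u: float, v: float) -> Optional[Region]:
        """Return the first region covering point (u, v) of the face, or None."""
        return next((r for r in self.regions if r.contains(u, v)), None)


east_wall = Face("east_wall", [
    Region("baseboard", (0.0, 4.0), (0.0, 0.15), (0.30, 0.20, 0.10)),
    Region("wall", (0.0, 4.0), (0.15, 2.5), (0.70, 0.68, 0.60)),
])
```

With this layout, a point near the floor resolves to the baseboard region and a point at eye level resolves to the wall, each carrying its own reflectance triplet.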

The current environment model of Susan B. represents Clapp Laboratory at Mount Holyoke College. All experiments of our research project took place on the fourth floor of this science building. The "4th floor" locale depicts the long hallway, classrooms and several offices; its detailed floor plan is shown in Figure 5.

Prior to this research study, the description of this environment in VLSys had been completed only to a very basic level. It contained spatial information about the largest surfaces, such as doors, walls, the ceiling and the floor, but it lacked further details both about the characteristic attributes of these surfaces and about smaller entities. To investigate the proposed place recognition task, the environment model had to be completed. First, the current locations of benches and sofas were measured and defined, and the coordinates of the cabinets, bulletin boards and baseboards along the hallway were added. Then the appropriate region descriptors of these structures had to be identified.
Regions are elementary surface components that possess essential properties pertaining to perception, and in the current case to color visibility. To describe the distinguishing features of these environment components a new variable, the "appearance", was introduced. This region descriptor refers to a group of attributes that can be useful in identification tasks. The following list briefly explains the nature of the appearance representations. Naturally, these measurements all have three distinct values corresponding to the three main color bands.
Appearance descriptors for image regions:
Face: south_wall_1a {{p88 p87 c87 c88}}
InsideRegions:
Figure 6 displays a code segment describing a surface in the current environment model of Susan B. The location of "south_wall_1a" is given by four constants in the first line (each constant refers to a triplet). Then the coordinates and reflectance values are defined for its three main regions: the wall, the upper baseboard (the part with wooden texture) and the lower baseboard (the part made of black rubber). At present, the algorithm uses only the Lambertian reflectance values of these region-characterizing features. That is why only one indicator is specified in the "Description" of the regions, and the others keep the default F value. By relying on only one of the variables, the aim was to find a solution for a simplified setting first and then introduce the remaining evaluators, such as specularity and fluorescence, for more sophisticated search strategies.
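The Lambertian reflectance triplets could be used to tell the three regions of south_wall_1a apart through a nearest-neighbour match. The following sketch shows one way to do this. Several elements are hypothetical choices for illustration and are not taken from Figure 6: the Description record, its field names, the reflectance numbers and the Euclidean distance. Python’s False plays the role of the default F value.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Description:
    """Hypothetical mirror of a region "Description": only the Lambertian
    reflectance is filled in; the other evaluators keep a False default."""
    lambertian: Optional[Tuple[float, float, float]] = None
    specularity: bool = False
    fluorescence: bool = False


def closest_region(observed, regions: Dict[str, Description]) -> str:
    """Name of the region whose Lambertian triplet is nearest (Euclidean) to `observed`."""
    candidates = {name: d.lambertian for name, d in regions.items()
                  if d.lambertian is not None}
    return min(candidates, key=lambda name: math.dist(observed, candidates[name]))


# Illustrative reflectance values, one per color band.
south_wall_1a = {
    "wall": Description(lambertian=(0.70, 0.68, 0.60)),
    "upper_baseboard": Description(lambertian=(0.35, 0.22, 0.12)),
    "lower_baseboard": Description(lambertian=(0.05, 0.05, 0.05)),
}
```

For example, an observed triplet of (0.33, 0.25, 0.10) is assigned to the upper baseboard. Keeping the unused evaluators as explicit fields makes it straightforward to add specularity or fluorescence terms to the match later.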