On a peaceful night, let's embark on a journey into a mysterious and fascinating topic: how do images transform from the scenes our eyes see to the data in the digital world? This is a tale of colors, light, and digital magic. Our journey starts with the history of color models.
It began with the CIE RGB model, based on how our eyes perceive red, green, and blue. To address the limitations of the RGB model, CIE introduced the XYZ color model, redefining the mathematical foundation of colors. In the golden era of television, to tackle the challenges of analog signals, the NTSC developed the YIQ model, making color TV possible. Later, PAL improved this concept with the YUV model, enhancing the transmission of chrominance information. Eventually, in the digital age, ITU-R created the YCbCr model, now a widely used standard.
In our story, there's a key character - gamma correction. It's a technique adapted for early CRT monitors, addressing the difference between display output and human perception. Although times have changed, the principles of gamma correction are still in use today.
This story might be complex, but it reveals the gradual development of the mathematical description of images. I hope this journey inspires you to explore the rich history behind digital images.
How to mathematically describe an image?
In the fascinating world of digital images, we now find ourselves at a new starting point: how can we describe these colorful images in the language of mathematics? This question leads us into a magical stage of colors and light. Color, as the most central visual element in images, becomes key in our mathematical description.
Background of color modeling
Color, the core element of visual perception in images, has always been a focal point for scientists. It all started with CIE's (International Commission on Illumination) innovative attempt to define color models mathematically. Back in the 1920s, Wright and Guild's experiments led to the birth of the CIE RGB color model. Then, the emergence of the CIE XYZ color model opened a new chapter for the mathematical representation of colors.
As we entered the era of analog television, different technological standards emerged around the world. North America and Japan adopted the “NTSC standard,” based on the NTSC color space defined in 1953. China and Europe, on the other hand, chose the “PAL standard,” sharing the EBU color space with the SECAM standard. These diverse color models reflected not only technological advancements but also cultural and visual preferences in different regions.
From CIE RGB to CIE XYZ, and then to NTSC YIQ, PAL YUV, and finally to the commonly used ITU-R YCbCr model today, the concept of color has continually evolved throughout history, becoming an indispensable part of our audio-visual development.
Although our journey is filled with complex technical details, it reveals the significance and diversity of color in the digital age. I hope this journey deepens your understanding of the rich stories behind color models.
The CIE RGB color model
Light and color dance together, creating the colorful world we know. Our story begins with a fascinating phenomenon called the "additive color model." When lights of different frequencies mix, like a painter blending pigments, we can create an array of astonishing hues.
The “additive color model” is typically applied to “active objects”. An object that can emit light waves on its own is called an active object. Its color is determined by the light waves it emits. When the object emits light of a certain color and it is received by the eyes, the corresponding color is perceived. Examples include the sun, electric lights, and display screens.
Corresponding to the additive color model is the "subtractive color model," which represents the color obtained by subtracting certain colors from the light source; CMYK, for example, is a subtractive color model. The subtractive color model is usually applied to "passive objects." A passive object is one that does not emit light waves on its own; its color is determined by the light waves it absorbs and reflects. When a third-party light source shines on the object, some wavelengths are absorbed and some are reflected, and the reflected light waves reach the eyes, which perceive the corresponding color. Examples include oil paintings and printed materials.
In this process, three primary colors—red, green, and blue—become the cornerstone of our entire color world. They not only mix to produce countless colors but also create dazzling white light. Yet, even this magic has its limits. The range of colors our eyes can perceive far exceeds what these three primary colors can combine to form.
In 1931, the International Commission on Illumination (CIE) conducted a revolutionary experiment. They chose specific wavelengths of red, green, and blue light, opening a new understanding of color. Their experiments not only defined the RGB representation but also discovered that some colors, like those around 500 nanometers, cannot be reproduced by simple RGB displays.
The experimental results obtained by CIE explain a phenomenon known as metamerism: lights with different spectral energy distributions can appear as exactly the same color, making them indistinguishable in perception. Note that two textiles or paints that match under one type of illumination will not necessarily match under another type of illumination.
Imagine a cube defined on the RGB coordinate axes; this is our world of colors. In this cube, from the dark origin to the bright white diagonal, from each vertex representing a primary color to their complementary colors, each point tells a unique story of color.
RGB is an additive color model where the quantities of various primary colors are added together to generate another color. Each color point on the boundary of the cube can be represented as a weighted vector sum of the three primary colors.
In the RGB system, standard white light is a mixture of red, green, and blue light flux in the proportions 1 : 4.5907 : 0.0601.
Light flux is a physical quantity that represents the power of light, measured in “lumens (lm)”, and is used as an indicator to measure the overall brightness of a light source. It refers to the light energy emitted by the light source or absorbed by the illuminated object per unit time.
Usually, red light with a light flux of 1 lm, green light with a light flux of 4.5907 lm, and blue light with a light flux of 0.0601 lm are used as the unit primary quantities of the three primary colors, represented by the unit vectors R→, G→, and B→. The proportions of the three primaries are represented by the coefficients R, G, and B. Therefore, any colored light with a certain brightness can be written as:

C(λ) = R·R→ + G·G→ + B·B→
Where C(λ) represents the luminance of the colored light, corresponding to the brightness attribute of the color. When we only care about the chromaticity of the color, which depends on hue and saturation and reflects the proportional relationship between R, G, and B, we can normalize:

r = R / (R + G + B), g = G / (R + G + B), b = B / (R + G + B)
We call r, g, b chromaticity coordinates. The chromaticity coordinates discard the absolute brightness of the given color sample and only represent its pure color. Since r + g + b = 1, any color only needs r, g two chromaticity coordinates for description, that is, the chromaticity space is two-dimensional. Generally, the RGB color model chromaticity diagram is given with r-g as the chromaticity coordinates. The position of the standard white light is at (r = 1/3, g = 1/3).
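As a quick sketch in code, the normalization above can be written as follows (the function name and the sample values are my own, chosen for illustration):

```python
def chromaticity(R, G, B):
    """Normalize tristimulus values (R, G, B) to chromaticity coordinates (r, g, b)."""
    total = R + G + B
    if total == 0:
        raise ValueError("black has no defined chromaticity")
    return R / total, G / total, B / total

# Standard white light sits at r = g = 1/3; since r + g + b = 1,
# only (r, g) are needed to describe the chromaticity.
r, g, b = chromaticity(1.0, 1.0, 1.0)
print(r, g)  # 0.3333... 0.3333...
```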
CIE XYZ color model
In a world seeking perfection, scientists faced a challenge: how to navigate more accurately in the ocean of colors? This is the story of the CIE XYZ color model.
The journey begins with the RGB model, a magical yet limited approach. In this model, we encountered a problem known as the "negative light" phenomenon, which hindered our accurate depiction of certain colors. To overcome this challenge, the International Commission on Illumination (CIE) proposed a new idea: the XYZ color model. This model is not based on actual visible colors but is a purely mathematical creation, a theoretical representation of color.
CIE replaced the RGB functions with a set of XYZ color matching functions, which are based on the linear combination of RGB but avoid the issue of negative color matching. Thus, the XYZ model not only provides a new method for describing any spectral color but also becomes the international standard for defining various colors.
In the XYZ color model, each color can be precisely expressed through X, Y, Z coefficients. These coefficients are like the DNA of colors, allowing us to precisely locate each color in the three-dimensional color space.
Here, X, Y, Z are calculated from the color matching functions shown in the figure above:

X = k ∫ I(λ) fX(λ) dλ
Y = k ∫ I(λ) fY(λ) dλ
Z = k ∫ I(λ) fZ(λ) dλ
In the above calculation, the parameter k is 683 lumens per watt (lm/W), the maximum luminous efficacy of radiation. The function I(λ) represents the spectral radiance, that is, the intensity of the light in a given direction. The parameter Y, obtained from the color matching function fY, is the brightness of the color. Brightness values generally range from 0 to 100.0, where 100.0 represents the brightness of white light.
In the XYZ color space, any color can be represented as the additive combination of the unit vectors X→, Y→, Z→ of the three primaries, that is:

C(λ) = X·X→ + Y·Y→ + Z·Z→
Where C(λ) represents the luminance of the colored light, corresponding to the brightness feature of the color. X, Y, Z are the proportion coefficients of the three primary colors. The XYZ color model represents colors and satisfies the following 3 conditions:
The proportion coefficients X, Y, Z of the three primaries are all non-negative;
The value of Y exactly represents the brightness value of the light;
When X = Y = Z, it represents standard white light.
Based on these conditions, the conversion relationship between the RGB model and the XYZ model can be obtained:

X = 0.490R + 0.310G + 0.200B
Y = 0.177R + 0.813G + 0.011B
Z = 0.000R + 0.010G + 0.990B
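In code, this conversion can be sketched with the commonly cited 1931 coefficients (the matrix values are textbook figures that I am assuming here, and the function name is illustrative):

```python
import numpy as np

# CIE RGB -> XYZ as a linear combination; the middle row is the
# luminance Y, and equal R = G = B yields X ≈ Y ≈ Z (standard white).
M_RGB_TO_XYZ = np.array([
    [0.490, 0.310, 0.200],
    [0.177, 0.813, 0.011],
    [0.000, 0.010, 0.990],
])

def rgb_to_xyz(rgb):
    return M_RGB_TO_XYZ @ np.asarray(rgb, dtype=float)

print(rgb_to_xyz([1.0, 1.0, 1.0]))  # ≈ [1.0, 1.001, 1.0]
```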
Just like the RGB model, when we only care about the chromaticity of the color, we can normalize:

x = X / (X + Y + Z), y = Y / (X + Y + Z), z = Z / (X + Y + Z)
Similarly, here we call x, y, z chromaticity coordinates. Since x + y + z = 1, any color can be represented by x and y. Generally, the CIE XYZ color model chromaticity diagram is given with x-y as the chromaticity coordinates. The CIE 1931 x-y chromaticity diagram is shown in the figure below:
In the figure, the points on the curve are pure colors in the electromagnetic spectrum, marked from the red end to the purple end of the spectrum in order of wavelength. The straight line connecting the red and purple spectrum points is called the purple line; it does not belong to the spectrum. The points inside the curve represent all possible combinations of visible colors. The point C at (x = 1/3, y = 1/3) represents white, and in practice point C is often used as an approximation of the chromaticity of white light or daylight. Because of the normalization, there are no brightness values in the chromaticity diagram: all colors with the same chromaticity but different brightness are mapped to the same point. The chromaticity diagram is mainly used for:
Comparing the entire color range for different primary color groups;
Identifying complementary colors;
Determining the main wavelength of a specified color;
Determining the saturation of a specified color.
1) Color Range
In a chromaticity diagram, color ranges can be represented by line segments or polygons.
For instance, if we take two colors C1 and C2 as base colors, their color range is the line connecting them. All the colors on the line from C1 to C2 in the diagram below can be obtained by mixing appropriate amounts of C1 and C2. If C1 has a larger proportion, the resulting color will be closer to C1.
The color range of three base colors C3, C4, and C5 is the triangular area (including the edges) formed by connecting them. These three base colors can only generate colors within this triangular area. Therefore, the chromaticity diagram helps us understand why no set of three base colors can generate all colors through additive color mixing, as no triangle can encompass all colors.
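This "no triangle covers everything" argument can be made concrete with a small point-in-triangle test. The primary coordinates below are sRGB-like values chosen for illustration, not taken from the text:

```python
def in_gamut(p, a, b, c):
    """True if chromaticity point p lies inside (or on) triangle a-b-c."""
    def cross(o, u, v):
        # z-component of the cross product (u - o) x (v - o)
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)   # p is on the same side of all three edges

red, green, blue = (0.64, 0.33), (0.30, 0.60), (0.15, 0.06)
print(in_gamut((1/3, 1/3), red, green, blue))  # True: white is reproducible
print(in_gamut((0.1, 0.8), red, green, blue))  # False: a green outside the triangle
```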
On the CIE chromaticity diagram, it is easy to compare the color ranges of different color standards. For example, the diagram below shows the color ranges of several different standards:
The table below lists, for several different standards that use RGB as the three base colors, the chromaticity coordinates of the triangle vertices in the CIE chromaticity diagram:
2) Complementary Colors
As mentioned before, complementary colors are two colors that mix together to form white. Since the color range of two base colors on the chromaticity diagram is a line segment, a pair of complementary colors must be located on either side of the white point C on the chromaticity diagram, and their connecting line passes through C. As shown in the diagram below, a certain amount of C1 and C2 can produce white, and the distance of C1 and C2 from C determines the amount of the two colors needed to produce white.
3) Dominant Wavelength
How do we determine the dominant wavelength of a color in the chromaticity diagram? Taking C1 in the diagram below as an example, we draw a line from C through C1 to intersect the spectral curve at Cs. The color C1 can then be regarded as a mixture of white light C and the spectral color Cs, so the dominant wavelength of C1 is the wavelength at Cs.
However, this method of determining the dominant wavelength is not applicable to the color points between C and the purple line. Taking C2 in the diagram below as an example, we draw a line from C through C2 to intersect the purple line at Cp. Since Cp is not in the visible spectrum, point C2 is called a non-spectral color. Its dominant wavelength is described by its complement Csp, the point where the line from C2 through C, extended in the opposite direction, intersects the spectral curve. Non-spectral colors lie in the purple-magenta range, and their spectral distribution can be described as white light minus the complementary wavelength (such as Csp).
4) Saturation
For the calculation of saturation, we still take the color C1 in the diagram above as an example. We calculate the relative position of C1 to C along the line from C to Cs to determine the saturation. If dC1 represents the distance from C to C1, and dCs represents the distance from C to Cs, we can calculate the saturation by the ratio dC1/dCs. In the diagram above, the purity of color C1 is approximately 25%, because it is located about one quarter of the way from C to Cs. The saturation of the color at Cs is 100%.
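The distance-ratio computation can be sketched in code as follows; the coordinates of Cs (and hence C1) below are invented for illustration, not read off a real diagram:

```python
import math

def excitation_purity(white, sample, spectral):
    """Saturation as the distance ratio d(white, sample) / d(white, spectral)."""
    return math.dist(white, sample) / math.dist(white, spectral)

C  = (1/3, 1/3)                      # white point
Cs = (0.20, 0.70)                    # assumed intersection with the spectral curve
C1 = tuple(c + 0.25 * (s - c) for c, s in zip(C, Cs))  # 25% of the way from C to Cs

print(excitation_purity(C, C1, Cs))  # ≈ 0.25: saturation of about 25%
print(excitation_purity(C, Cs, Cs))  # 1.0: a spectral color is fully saturated
```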
NTSC YIQ color model
In an era filled with innovation and change, a new color model was born—the NTSC YIQ color model. This is a story about technological adaptation and progress, illustrating how television technology evolved from black and white to color while maintaining compatibility with older technology.
The YIQ color model is defined based on the CIE XYZ color model. Its parameter Y is the same as the parameter Y in the XYZ model, representing the brightness information of the image. In the absence of chroma, Y corresponds to a black-and-white image; black-and-white TVs receive only the Y signal. Parameter I contains orange-cyan color information, the axis along which the eye resolves color detail most sharply. Parameter Q contains green-magenta color information.
The design of the NTSC composite color signal allows black-and-white TVs to extract the required grayscale information from an image with a bandwidth of 6 MHz. Therefore, YIQ information must be encoded under the restriction of 6 MHz bandwidth. Brightness values and chroma values are encoded with different analog signals, so that color information is added within the original bandwidth, and black-and-white TVs can still obtain the original brightness signal in the original way.
Brightness information (the Y value) is transmitted by amplitude modulation with a carrier bandwidth of about 4.2 MHz; chroma information (the I and Q values) is combined and transmitted with a carrier bandwidth of about 1.8 MHz. The parameter names I and Q stand for the in-phase and quadrature components of the modulation used to encode the color information on the carrier. Newer encoding methods have since been developed, but we won't discuss them here.
In the NTSC signal, the encoding precision of brightness information (4.2 MHz bandwidth) is higher than that of chroma information (1.8 MHz bandwidth). This is because the human eye is more sensitive to changes in brightness than to changes in chroma. Therefore, NTSC’s transmission of chroma information with lower precision does not cause a significant decline in image color quality.
The YIQ used in the NTSC system can be converted to and from RGB (where R, G, B are gamma corrected, denoted as R', G', B'). The brightness signal Y in YIQ is defined as:

Y = 0.299R' + 0.587G' + 0.114B'
The chrominance signals defined in YIQ are as follows:

I = 0.596R' - 0.274G' - 0.322B'
Q = 0.211R' - 0.523G' + 0.312B'
When mapped onto the chromaticity diagram, it can also be seen that parameter I contains orange-cyan color information and parameter Q contains green-magenta color information; together, I and Q determine the hue and saturation of the color. From the formulas above, the conversion matrix between YIQ and R'G'B' is:

[Y]   [0.299  0.587  0.114] [R']
[I] = [0.596 -0.274 -0.322] [G']
[Q]   [0.211 -0.523  0.312] [B']
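A sketch of this conversion in code, using the commonly published NTSC coefficients (the function names, and the use of a matrix solve for the reverse direction instead of hard-coded rounded inverse values, are my own choices):

```python
import numpy as np

M_YIQ = np.array([
    [0.299,  0.587,  0.114],   # Y: luminance
    [0.596, -0.274, -0.322],   # I: orange-cyan axis
    [0.211, -0.523,  0.312],   # Q: green-magenta axis
])

def rgb_to_yiq(rgb):
    return M_YIQ @ np.asarray(rgb, dtype=float)

def yiq_to_rgb(yiq):
    # Invert the forward matrix numerically for the reverse conversion.
    return np.linalg.solve(M_YIQ, np.asarray(yiq, dtype=float))

print(rgb_to_yiq([1.0, 1.0, 1.0]))  # ≈ [1, 0, 0]: white carries no chroma
```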
The value ranges for the normalized signals R', G', B' and Y, I, Q are: R', G', B' ∈ [0, 1]; Y ∈ [0, 1]; I ∈ [-0.596, 0.596]; Q ∈ [-0.523, 0.523].
In practical use, the above equations are usually scaled to simplify the implementation of encoding and decoding of NTSC system digital signals.
PAL YUV color model
Since the combined analog video signal of NTSC YIQ allocates a lower bandwidth to chroma information, the color quality of NTSC YIQ images can suffer. As higher image quality was pursued, this limitation needed to be addressed, and various variants of YIQ encoding emerged to improve the color quality of video transmission.
The "YUV color model" is one such variant: it provides combined color information for video transmission in the PAL broadcasting system, and it can adjust the amount of chroma information transmitted by using different sampling formats.
The YUV color model uses brightness and chroma to represent color. Its brightness information and chroma information are separated, where Y represents the brightness channel, and U and V represent the chroma channels. If there is only Y information, without U, V information, then the image represented is a grayscale image. YUV is commonly used in various image processing scenarios. When encoding photos or videos in YUV, considering that the human eye is more sensitive to brightness information than chroma information, it allows the reduction of chroma bandwidth.
The conversion equations between the YUV used in the PAL system and RGB are as follows (here R, G, B are gamma corrected, denoted as R', G', B'):

Y = 0.299R' + 0.587G' + 0.114B'
U = 0.492(B' - Y) = -0.147R' - 0.289G' + 0.436B'
V = 0.877(R' - Y) = 0.615R' - 0.515G' - 0.100B'
Expressed in matrix operations:

[Y]   [ 0.299  0.587  0.114] [R']
[U] = [-0.147 -0.289  0.436] [G']
[V]   [ 0.615 -0.515 -0.100] [B']
The value ranges for the normalized signals R', G', B' and Y, U, V are: R', G', B' ∈ [0, 1]; Y ∈ [0, 1]; U ∈ [-0.436, 0.436]; V ∈ [-0.615, 0.615].
In practical use, the above equations are usually scaled to simplify the implementation of encoding and decoding of PAL system digital signals.
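A sketch of the PAL conversion in code, using the scaled color-difference form U = 0.492(B' - Y) and V = 0.877(R' - Y); the function name and sample inputs are illustrative:

```python
def rgb_to_yuv(r, g, b):
    """R', G', B' in [0, 1] -> (Y, U, V) using the PAL scaling factors."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)   # blue color difference, scaled
    v = 0.877 * (r - y)   # red color difference, scaled
    return y, u, v

print(rgb_to_yuv(0.0, 0.0, 1.0))  # pure blue: U reaches its maximum, ≈ 0.436
print(rgb_to_yuv(1.0, 1.0, 1.0))  # white: U ≈ V ≈ 0, only luminance remains
```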
ITU-R YCbCr color model
The YCbCr color model, a widely used variant of YUV, is designed for digital video conversion. YCbCr was first established by ITU-R (International Telecommunication Union - Radiocommunication Sector, formerly known as the International Radio Consultative Committee CCIR) in ITU-R BT.601 and is also involved in subsequent standards such as ITU-R BT.709 and ITU-R BT.2020.
The mentioned ITU-R BT.601/709/2020 series of standards specify the sampling rate used when converting color video to digital images, the conversion relationship between the RGB and YCbCr color models, and so on. They target Standard Definition Television (SDTV), High Definition Television (HDTV), and Ultra High Definition Television (UHDTV) application scenarios respectively.
The coordinates of the RGB primaries and the white point of each standard's color gamut in the CIE 1931 x-y chromaticity diagram are as follows:
In fact, YCbCr is not an absolute color model (an absolute color model refers to a model that can accurately represent colors without relying on any external factors), but a scaled and offset version based on the YUV color model. In it, Y is consistent with Y in YUV, representing brightness; Cb and Cr, like U and V, represent chroma, but they are different in representation. Cb reflects the difference between the blue part of the RGB input signal and the brightness value of the RGB signal, and Cr reflects the difference between the red part and the brightness.
YCbCr is a color model widely used in digital image processing. JPEG, MPEG, DVD, camcorders, digital TV, etc., all use this model. Therefore, the commonly referred to YUV is mostly referring to YCbCr.
1) The conversion formula between YCbCr and RGB for SDTV
Here, R, G, B are gamma corrected, represented as R', G', B':

Y = 0.299R' + 0.587G' + 0.114B'
Cb = -0.172R' - 0.339G' + 0.511B' + 128
Cr = 0.511R' - 0.428G' - 0.083B' + 128
Expressed in matrix operation:

[Y ]   [ 0.299  0.587  0.114] [R']   [  0]
[Cb] = [-0.172 -0.339  0.511] [G'] + [128]
[Cr]   [ 0.511 -0.428 -0.083] [B']   [128]
In theory, the ranges of values for R', G', B' and Y, Cb, Cr are: R', G', B' ∈ [16, 235]; Y ∈ [16, 235]; Cb, Cr ∈ [16, 240].
When converting Y, Cb, Cr to R’, G’, B’, the nominal range of R’, G’, B’ is 16-235, but it is also possible to get values of 0-15 and 236-255. This is because video data processing or noise may cause the values of Y and CbCr to fall outside of 16-235 and 16-240. Therefore, when processing data, we still need to give them a numerical space of 0-255 to prevent overflow problems.
In computer systems, if the range of R'G'B' can be guaranteed to be 0-255, the following conversion is more convenient:

Y = 0.257R' + 0.504G' + 0.098B' + 16
Cb = -0.148R' - 0.291G' + 0.439B' + 128
Cr = 0.439R' - 0.368G' - 0.071B' + 128
The ranges of values for R', G', B' and Y, Cb, Cr are: R', G', B' ∈ [0, 255]; Y ∈ [16, 235]; Cb, Cr ∈ [16, 240].
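As a sketch, the computer-range SDTV conversion can be coded as below. The coefficients are the commonly published BT.601 values for 0-255 R'G'B' input; skipping rounding and clipping is my own simplification:

```python
def rgb_to_ycbcr_601(r, g, b):
    """Full-range R'G'B' (0-255) -> studio-swing YCbCr (Y 16-235, Cb/Cr 16-240)."""
    y  =  0.257 * r + 0.504 * g + 0.098 * b + 16
    cb = -0.148 * r - 0.291 * g + 0.439 * b + 128   # blue difference + offset
    cr =  0.439 * r - 0.368 * g - 0.071 * b + 128   # red difference + offset
    return y, cb, cr

print(rgb_to_ycbcr_601(255, 255, 255))  # ≈ (235, 128, 128): white, neutral chroma
print(rgb_to_ycbcr_601(0, 0, 0))        # ≈ (16, 128, 128): black, neutral chroma
```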
2) The conversion formula between YCbCr and RGB for HDTV
Similar to SDTV, when the nominal range of R', G', B' is 16-235, the conversion between YCbCr and RGB is:

Y = 0.213R' + 0.715G' + 0.072B'
Cb = -0.117R' - 0.394G' + 0.511B' + 128
Cr = 0.511R' - 0.464G' - 0.047B' + 128
In theory, the ranges of values for R', G', B' and Y, Cb, Cr are: R', G', B' ∈ [16, 235]; Y ∈ [16, 235]; Cb, Cr ∈ [16, 240].
In computer systems, if the range of R'G'B' can be guaranteed to be 0-255, the following conversion is more convenient:

Y = 0.183R' + 0.614G' + 0.062B' + 16
Cb = -0.101R' - 0.338G' + 0.439B' + 128
Cr = 0.439R' - 0.399G' - 0.040B' + 128
The ranges of values for R', G', B' and Y, Cb, Cr are: R', G', B' ∈ [0, 255]; Y ∈ [16, 235]; Cb, Cr ∈ [16, 240].
3) YCbCr Sampling Formats
Common YCbCr sampling formats include:
4:4:4: Represents full sampling, retaining the complete information of the CbCr component.
4:2:2: Indicates that the CbCr component has a 2:1 horizontal sampling and full vertical sampling. Compared to the complete information, it compresses 33.3% of the data volume.
4:1:1: Indicates that the CbCr component has a 4:1 horizontal sampling and full vertical sampling. Compared to the complete information, it compresses 50% of the data volume.
4:2:0: Indicates that the CbCr components have 2:1 horizontal sampling and 2:1 vertical sampling. This does not mean that there is only Cb and no Cr; rather, each row stores only one of the two chroma components: if one row is sampled as 4:2:0, the next row is sampled as 4:0:2, and so on. Compared to the complete information, it compresses 50% of the data volume.
The diagrams below illustrate the sampling patterns of 4:4:4, 4:2:2, 4:1:1, and 4:2:0.
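The data savings can be checked with a small subsampling sketch. Averaging each 2x2 chroma block is one common implementation choice, not something the 4:2:0 format itself mandates:

```python
import numpy as np

def subsample_420(y, cb, cr):
    """Keep Y at full resolution; average each 2x2 block of Cb and Cr."""
    h, w = cb.shape
    cb_sub = cb.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    cr_sub = cr.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return y, cb_sub, cr_sub

y  = np.zeros((4, 4))
cb = np.arange(16, dtype=float).reshape(4, 4)
cr = np.ones((4, 4))
_, cb_sub, cr_sub = subsample_420(y, cb, cr)

print(cb_sub.shape)  # (2, 2): one chroma sample per 2x2 block
# Samples per pixel drop from 3 (4:4:4) to 1.5, i.e. a 50% reduction.
```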
4) YCbCr Storage Formats
In YCbCr, common storage formats are generally divided into two types:
Planar format: all Y samples are stored contiguously first, followed by all Cb samples and then all Cr samples.
Packed format: the Y, Cb, and Cr of each pixel are stored interleaved, pixel by pixel.
Sometimes, the planar format and packed format are used in combination.
The terms we often hear, such as I420, YV12, NV12, NV21, etc., actually correspond to the storage methods of various components of YCbCr. They all belong to the 4:2:0 sampling format, but they are divided into two major categories according to the storage method:
420P (planar) planar format, I420 and YV12 belong to this category.
420SP (semi-planar) mixed format, NV12 and NV21 belong to this category.
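The difference between the two categories can be sketched as plane layouts for a W x H 4:2:0 frame. The helper names and the (offset, length) representation are my own illustration:

```python
def i420_layout(w, h):
    """I420 (planar): Y plane, then the whole U (Cb) plane, then the V (Cr) plane."""
    y_size = w * h
    c_size = (w // 2) * (h // 2)              # each chroma plane is quarter-sized
    return {"Y": (0, y_size),
            "U": (y_size, c_size),
            "V": (y_size + c_size, c_size)}   # (byte offset, length)

def nv12_layout(w, h):
    """NV12 (semi-planar): Y plane, then one interleaved CbCrCbCr... plane."""
    y_size = w * h
    c_size = (w // 2) * (h // 2)
    return {"Y": (0, y_size),
            "UV": (y_size, 2 * c_size)}

print(i420_layout(4, 4))  # {'Y': (0, 16), 'U': (16, 4), 'V': (20, 4)}
print(nv12_layout(4, 4))  # {'Y': (0, 16), 'UV': (16, 8)}
```

YV12 is the same as I420 with the U and V planes swapped, and NV21 swaps the order of Cb and Cr within the interleaved plane.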
Gamma correction
In the previous introduction of color model conversion, we mentioned gamma correction, but did not explain what it is. Let's explore this concept here.
If you have come across older TVs or computer monitors, you may know a display device called a Cathode Ray Tube (CRT). This device displays images by using a voltage to drive an electron beam that makes its screen emit light. People found a problem when using CRTs: adjusting the voltage to n times the original does not increase the screen brightness by n times; instead, brightness follows a power-law curve, roughly represented by the formula:

brightness ≈ voltage^γ
The brightness produced by a typical CRT monitor is approximately the 2.2 power of the input voltage, that is, in the above formula, γ is 2.2, which is called display gamma.
Because of the display gamma problem, a corresponding gamma correction needs to be applied before the image is sent to the display, so that the brightness of the final displayed image is linearly proportional to the brightness of the captured real scene. This processing is called encoding gamma, and it is usually performed in the circuitry of the image acquisition device. For example, in a TV camera, the perceived brightness Y is remapped by the inverse gamma, with a typical value of 1/γ = 0.45:

Y' = Y^(1/γ) = Y^0.45
The nonlinear conversion of gamma correction not only solves the display gamma problem but also brings a beneficial side effect: noise added during transmission (in the era of analog signals) is attenuated, after gamma decoding at the receiver, in the darker signal range where it would be most visible. Because our visual system is sensitive to relative brightness differences, the nonlinear gradient after gamma correction appears noticeably more uniform to the eye, as shown in the figure below:
After the invention of color TV, the R, G, B signals are gamma corrected separately and then combined for encoding.
To this day, although we no longer have analog noise in the transmission system, quantization is still needed when compressing signals, so gamma correction on sensor data is still useful.
However, in some image processing scenarios in computer vision, the brightness information of the image needs to be in a linear space to proceed, so it is necessary to undo the gamma correction before processing. After the processing is completed, gamma correction may need to be redone before inputting the image to the display.
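That linearize, process, re-encode workflow can be sketched as follows; the gamma value of 2.2 matches the typical display gamma mentioned above, and the halving step stands in for an arbitrary linear-space operation:

```python
import numpy as np

GAMMA = 2.2

def decode_gamma(img):
    """Gamma-encoded values in [0, 1] -> linear light."""
    return np.power(img, GAMMA)

def encode_gamma(img):
    """Linear light -> gamma-encoded values (exponent 1/2.2 ≈ 0.45)."""
    return np.power(img, 1.0 / GAMMA)

img = np.array([0.0, 0.25, 0.5, 1.0])   # gamma-encoded pixel samples
linear = decode_gamma(img)              # undo gamma before processing
processed = 0.5 * linear                # e.g. halve the physical light level
out = encode_gamma(processed)           # redo gamma before display

print(out)  # halving the light does NOT halve the encoded pixel values
```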
Therefore, a complete image acquisition and display system requires at least two gamma values: 1) encoding gamma, which reflects the relationship between the scene brightness value obtained by the device and the encoded pixel value; 2) display gamma, which reflects the relationship between the encoded pixel value and the brightness of the display.
The product of encoding gamma and display gamma is the end-to-end gamma of the entire image system. If this product is 1, then the brightness of the displayed image is linearly proportional to the brightness of the captured real scene.
When your display has been calibrated to the standard gamma of 2.2, the figure below shows the impact of different display gammas on the system's end-to-end gamma and the final image display effect when the encoding gamma is determined:
As mentioned above, display gamma is a problem introduced by CRT monitors. Nowadays we have largely said goodbye to CRTs and commonly use LCD monitors, so is display gamma no longer an issue? In fact, LCD panels themselves do not exhibit the gamma response of CRTs, but for backward compatibility, LCDs and other non-CRT display devices deliberately emulate this gamma behavior.