Section 3

Basic Image Operations


After acquiring an image, the first step is to understand the basic operations for processing it, such as separating image colors, modifying pixels, stretching and rotating the image, and even adding basic shapes to the image for simple processing. Therefore, this chapter focuses on the fundamental image operations provided in OpenCV 4, including an introduction to color spaces, pixel operations, image shape transformations, drawing geometric shapes, and generating image pyramids.

Image Color Space

By mixing different proportions of red, green, and blue, an image can display a wide range of colors. This model is called the RGB color model. The RGB color model is one of the most common color models and is often used to represent and display images. To represent the mixing of these three colors, an image stores the red, green, and blue components separately in multiple channels. In addition to the RGB color model, there are also other color models such as YUV and HSV, which represent components like brightness, chrominance, and saturation. Understanding image color spaces is helpful for segmenting images with color-distinctive features. For example, extracting a red object from an image can be achieved by comparing pixel values in the red channel.

Color Models and Conversion

This section introduces several common color models that can be converted between each other in OpenCV 4, such as the RGB model, HSV model, Lab model, YUV model, and GRAY model. It also covers the mathematical conversion relationships between these models and the transformation functions provided in OpenCV 4 for converting between them.

1. RGB Color Model

The RGB color model, introduced earlier, is named after the first letters of its three colors: Red, Green, and Blue. Although the name places red first, OpenCV reverses the order by default: the first channel is the blue (B) component, the second is the green (G) component, and the third is the red (R) component. OpenCV 4 also provides a format with the opposite storage order, in which the first channel is the red component. Both formats share the same color space, as illustrated in Figure 3-1. The three channels describe color over the same range, so the RGB color model forms a cube. In the RGB color model, all colors are produced by mixing these three colors in different proportions. If all three components are 0, the result is black; if all three components are at their maximum, the result is white. Each channel represents the intensity of one color on a scale from 0 to 1, and images with different bit depths divide this range into different numbers of levels. For example, in an 8UC3 image, each channel quantizes the range into 256 levels, represented by the values 0 to 255. Adding a fourth channel to this model gives the RGBA model, where the fourth channel represents transparency. When transparency is not needed, the RGBA model reduces to the RGB model.

[Figure 3-1: RGB color model]
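The channel order and the 256-level quantization described above can be sketched in plain C++ without OpenCV. The helper names `quantize8` and `make_bgr` are hypothetical and exist only for this illustration; rounding to the nearest level is an assumption of the sketch.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

// Hypothetical helper: map an intensity in [0, 1] onto one of 256 levels (0..255).
// Values outside [0, 1] are clamped before quantization.
inline std::uint8_t quantize8(double intensity) {
    const double clamped = std::max(0.0, std::min(1.0, intensity));
    return static_cast<std::uint8_t>(std::lround(clamped * 255.0));
}

// Hypothetical helper: store R, G, B intensities in B, G, R order,
// mirroring the default channel layout described in the text.
inline std::array<std::uint8_t, 3> make_bgr(double r, double g, double b) {
    return {quantize8(b), quantize8(g), quantize8(r)};
}
```

With these helpers, `make_bgr(1.0, 0.0, 0.0)` describes pure red, and the red level ends up in the third element rather than the first.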

2. YUV Color Model

The YUV model is a color encoding method used in television signal systems. The three variables represent the brightness (Y) of a pixel, the difference between the blue component and brightness (U), and the difference between the red component and brightness (V). This color model is primarily used for video and image transmission, and its development is closely tied to the evolution of television sets. Since color televisions were invented after black-and-white televisions, the video signals used for color TVs needed to be compatible with black-and-white TVs. Color TVs require three channels of data to display color, while black-and-white TVs only need one channel. Therefore, to make video signals compatible with both color and black-and-white TVs, the RGB encoding method was converted into the YUV encoding method. The Y channel represents the brightness of the image, and black-and-white TVs only need this channel to display a black-and-white video image. Meanwhile, color TVs can display color images by converting the YUV encoding back to RGB encoding, which solves the problem of making a single video signal compatible with different types of televisions. The conversion relationship between the RGB model and the YUV model is shown in Equation (3-1), where the RGB values range from 0 to 255.

[Equation (3-1): conversion between the RGB and YUV models]

3. HSV Color Model

HSV stands for Hue, Saturation, and Value. As the name suggests, this model describes colors using these three characteristics. Hue is the fundamental attribute of color—what we commonly refer to as colors like red or blue. Saturation refers to the purity of a color; the higher the saturation, the purer and more vivid the color, while lower saturation makes the color gradually appear grayer and darker. Saturation ranges from 0 to 100%. Value indicates the brightness of a color, ranging from 0 to the maximum value allowed by the computer. Because the ranges of hue, saturation, and value differ, the color space model is represented as a cone, as shown in Figure 3-2. Compared to the RGB model, where the relationship between the three color components and the final color is not intuitive, the HSV model aligns more closely with how humans perceive color: in terms of hue, depth, and brightness.

[Figure 3-2: HSV color model]

4. Lab Color Model

The Lab color model compensates for the shortcomings of the RGB model and is a device-independent color model based on physiological characteristics. In this model, L represents lightness, while a and b are two color channels, each ranging from -128 to 127. The a channel transitions from green to red as its value increases, and the b channel transitions from blue to yellow as its value increases. The color space formed by the Lab color model is spherical, as shown in Figure 3-3.

[Figure 3-3: Lab color model]

5. GRAY Color Model

The GRAY model is not a color model, but a grayscale image model. Its name uses the English word "gray" in all capital letters. A grayscale image has only a single channel, and the grayscale values range from 0 to the maximum value depending on the image bit depth, representing black to white. For example, in the 8UC1 format, black to white is quantized into 256 levels, represented by 0 to 255, where 255 represents white. Color images have the characteristics of rich color and high information content, but grayscale images still have certain advantages in image processing. For instance, grayscale images of the same size and compression format occupy less storage space, are easier to capture, and are more convenient for transmission. A common method for converting an RGB model to a grayscale image is shown in Equation (3-2).

[Equation (3-2): conversion from the RGB model to a grayscale image]
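A common weighted-sum form of the RGB to grayscale conversion is Gray = 0.299R + 0.587G + 0.114B. The sketch below assumes that form and 8-bit inputs; `rgb_to_gray` is a hypothetical name, and a library implementation may use fixed-point arithmetic, so its results could differ by one level.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Sketch of a weighted-sum grayscale conversion for 8-bit components.
// The weights are the commonly used luma coefficients (an assumption here).
inline std::uint8_t rgb_to_gray(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    const double gray = 0.299 * r + 0.587 * g + 0.114 * b;
    // Round to the nearest level and keep the result inside 0..255.
    const double clamped = std::max(0.0, std::min(255.0, gray));
    return static_cast<std::uint8_t>(std::lround(clamped));
}
```

Because the weights sum to 1, white stays at 255 and black stays at 0, while a pure green pixel comes out noticeably brighter than a pure blue one.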

6. Conversion Between Different Color Models

For converting between different color models of images, OpenCV 4 provides the cvtColor() function to perform the conversion. The prototype of this function is given in Code Listing 3-1.

Code Listing 3-1 cvtColor() function prototype

void cv::cvtColor(InputArray src,
                  OutputArray dst,
                  int code,
                  int dstCn = 0);

src: The original image whose color model is to be converted.
dst: The target image after color model conversion.
code: Color space conversion flags, such as from RGB space to HSV space. Commonly used flags and their meanings are given in Table 3-1.
dstCn: The number of channels in the destination image. If the parameter is 0, the number of channels is automatically derived from src and code.

This function converts an image from one color model to another. The first two parameters are the input image and the output image of the conversion. The third parameter specifies which conversion to perform; commonly used flags are listed in Table 3-1, and readers can refer to the OpenCV 4 tutorial for the full list of flags. The fourth parameter generally does not need to be set and can keep its default value. Pay attention to the pixel value range before and after the conversion: 8-bit unsigned images hold values from 0 to 255, 16-bit unsigned images hold values from 0 to 65535, and 32-bit floating-point images hold values from 0 to 1. For linear transformations the range is not a concern, because the target image's pixels will not exceed the range. For nonlinear transformations, however, the input RGB image must be normalized to the appropriate range to obtain correct results. For example, when an 8-bit unsigned image is converted to a 32-bit floating-point image, the pixel values must first be scaled to the range 0 to 1 by dividing by 255 to prevent erroneous results.

Note: If an alpha channel (the fourth channel in the RGB model, representing transparency) is added during the conversion, its value will be set to the maximum of the corresponding channel range: 255 for CV_8U, 65535 for CV_16U, and 1 for CV_32F.
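The effect described in the note can be pictured with a plain C++ sketch that appends an alpha value to every pixel of an interleaved 8-bit B, G, R buffer. The function name `add_alpha_8u` and the flat `std::vector` layout are assumptions made for this illustration, not OpenCV's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Sketch: turn interleaved B,G,R bytes into B,G,R,A bytes, filling the new
// alpha channel with the maximum of the 8-bit range (255).
inline std::vector<std::uint8_t> add_alpha_8u(const std::vector<std::uint8_t>& bgr) {
    assert(bgr.size() % 3 == 0);  // the buffer must hold whole pixels
    std::vector<std::uint8_t> bgra;
    bgra.reserve(bgr.size() / 3 * 4);
    for (std::size_t i = 0; i < bgr.size(); i += 3) {
        bgra.push_back(bgr[i]);      // B
        bgra.push_back(bgr[i + 1]);  // G
        bgra.push_back(bgr[i + 2]);  // R
        bgra.push_back(std::numeric_limits<std::uint8_t>::max());  // A = 255
    }
    return bgra;
}
```

The color components are copied unchanged; only the fully opaque alpha value is new.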

Table 3-1 Common flag parameters for color model conversion in the cvtColor() function

Flag	Value	Effect
`COLOR_BGR2BGRA`	0	Add an alpha channel to an RGB image.
`COLOR_BGR2RGB`	4	Change the channel order of a color image.
`COLOR_BGR2GRAY`	6	Convert a color image to a grayscale image.
`COLOR_GRAY2BGR`	8	Convert a grayscale image to a color image (pseudo-color).
`COLOR_BGR2YUV`	82	Convert the RGB color model to the YUV color model.
`COLOR_YUV2BGR`	84	Convert the YUV color model to the RGB color model.
`COLOR_BGR2HSV`	40	Convert the RGB color model to the HSV color model.
`COLOR_HSV2BGR`	54	Convert the HSV color model to the RGB color model.
`COLOR_BGR2Lab`	44	Convert the RGB color model to the Lab color model.
`COLOR_Lab2BGR`	56	Convert the Lab color model to the RGB color model.

To intuitively see what the same image looks like in different color spaces, Code Listing 3-2 provides a program for converting between the several color models mentioned earlier. The results are shown in Figure 3-4. It should be noted that the Lab color model includes negative values, and images displayed using the imshow() function cannot show negative values. Therefore, the results include an example of displaying the image in the Lab model using the Image Watch plugin. In the program, to prevent numerical overflow after conversion, we first convert the CV_8U type to CV_32F before performing the color model conversion.

Code Listing 3-2 myCvColor.cpp: Converting Between Image Color Models

#include "chapter3_1_cvclolor/inc/CvColor.hpp"
#include <iostream>
#include <string>
#include <opencv2/opencv.hpp>


// Convert one image to the HSV, YUV, Lab, and GRAY models and display each result.
int opencv_function9(void)
{
  const std::string file_name = std::string(MEDIA_PATH) + "林星阑L.jpg";

  cv::Mat img = cv::imread(file_name);
  if(img.empty())
  {
      std::cout << "Could not read the image; please check the file and its format." << std::endl;
      return 1;
  }
  else
  {
      std::cout << "Image read successfully!" << std::endl;
  }
  cv::Mat img32;
  cv::Mat gray,HSV,YUV,Lab;
  // The scale factor 1.0/255 maps the 8-bit range [0, 255] onto the floating-point range [0, 1].
  img.convertTo(img32,CV_32F,1.0/255);
  cv::cvtColor(img32,HSV,cv::COLOR_BGR2HSV);
  cv::cvtColor(img32,YUV,cv::COLOR_BGR2YUV);
  cv::cvtColor(img32,Lab,cv::COLOR_BGR2Lab);
  cv::cvtColor(img32,gray,cv::COLOR_BGR2GRAY);

  cv::namedWindow("Original BGR",cv::WINDOW_FREERATIO);
  cv::namedWindow("HSV",cv::WINDOW_FREERATIO);
  cv::namedWindow("YUV",cv::WINDOW_FREERATIO);
  cv::namedWindow("Lab",cv::WINDOW_FREERATIO);
  cv::namedWindow("gray",cv::WINDOW_FREERATIO);
  cv::imshow("Original BGR",img32);
  cv::imshow("HSV",HSV);
  cv::imshow("YUV",YUV);
  cv::imshow("Lab",Lab);
  cv::imshow("gray",gray);
  cv::waitKey(0);  // wait for a key press before closing

  return 0;
}

[Figure 3-4: results of myCvColor.cpp]

In our program, we used the convertTo() function, a built-in data type conversion method of the Mat class in OpenCV 4. When working with image data, we often encounter the need to convert between different data types. Therefore, the usage of this conversion function is explained in detail below. Code Listing 3-3 provides the function prototype.

Code Listing 3-3 convertTo() function prototype

void cv::Mat::convertTo(OutputArray m,
                        int rtype,
                        double alpha = 1,
                        double beta = 0);

m: The image output after converting the type.
rtype: The data type of the converted image.
alpha: scaling factor during the conversion process.
beta: bias factor during the conversion process.

This function is used to convert an existing image into an image of a specified data type. The first parameter outputs the image after the data type conversion, and the second parameter declares the data type of the converted image. The third and fourth parameters specify the conversion relationship between the two data types, with the specific conversion form shown in Equation (3-3).

[Equation (3-3): linear conversion performed by convertTo()]

From the conversion formula, it can be seen that this conversion method performs a linear transformation on the original data and outputs it according to the specified data type. Based on its conversion rules, this function can not only convert between different data types but also perform linear transformations within the same data type. In Code Listing 3-2, we provide an example of converting between the CV_8U and CV_32F types. Conversions between other types are similar and will not be repeated here. Readers are encouraged to explore on their own and gain hands-on experience with this function through practice.
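The linear relationship can be illustrated without OpenCV. The sketch below applies dst = alpha * src + beta element by element, once from 8-bit to float and once from float back to 8-bit, where the result is rounded and clamped to the valid range instead of wrapping around. The function names `linear_to_float` and `linear_to_u8` are hypothetical, and the rounding rule is an assumption of this example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Sketch of the 8-bit -> float direction: dst = alpha * src + beta.
inline std::vector<float> linear_to_float(const std::vector<std::uint8_t>& src,
                                          double alpha, double beta) {
    std::vector<float> dst;
    dst.reserve(src.size());
    for (std::uint8_t value : src) {
        dst.push_back(static_cast<float>(alpha * value + beta));
    }
    return dst;
}

// Sketch of the float -> 8-bit direction: the linear result is clamped
// (saturated) to [0, 255] and rounded to the nearest level.
inline std::vector<std::uint8_t> linear_to_u8(const std::vector<float>& src,
                                              double alpha, double beta) {
    std::vector<std::uint8_t> dst;
    dst.reserve(src.size());
    for (float value : src) {
        const double scaled = alpha * value + beta;
        const double clamped = std::max(0.0, std::min(255.0, scaled));
        dst.push_back(static_cast<std::uint8_t>(std::lround(clamped)));
    }
    return dst;
}
```

Calling `linear_to_float` with alpha = 1.0/255 and beta = 0 reproduces the scaling used in Code Listing 3-2, and calling `linear_to_u8` with alpha = 255 maps the values back to 8-bit levels.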

Multi-Channel Separation and Merging

In image color models, different components are stored in separate channels. If we only need a specific component of a color model—for example, processing only the red channel of an RGB image—we can extract the red channel from the three-channel data and process it separately. This approach reduces the memory occupied by the data and speeds up program execution. At the same time, after processing multiple channels individually, we need to merge all the channels back together to reconstruct the RGB image. For splitting and merging multi-channel images, OpenCV 4 provides the split() function and the merge() function to meet these needs.

1. Multi-channel separation function split()

In OpenCV 4, the multi-channel separation function split() has two overloaded prototypes, as shown in Code Listing 3-4.

Code Listing 3-4 split() function prototype

void cv::split(const Mat &src,
               Mat *mvbegin);

void cv::split(InputArray m,
               OutputArrayOfArrays mv);

src: Multi-channel image to be separated.
mvbegin: The separated single-channel images, in array form; the array size must match the number of channels in the image.
m: Multi-channel image to be separated.
mv: The separated single-channel images, in vector form.

This function is primarily used to split a multi-channel image into several single-channel images. The difference between the two function prototypes is that in the former, the second parameter outputs an array of type Mat, whose length must equal the number of channels in the multi-channel image and be defined in advance; in the second function prototype, the second parameter outputs a vector<Mat> container, without needing to know the number of channels in the multi-channel image. Although the types of the input parameters differ between the two function prototypes, the principle of channel splitting is the same.
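Conceptually, channel splitting turns interleaved pixel data into one plane per channel. The plain C++ sketch below shows that idea on a flat byte buffer; `split_channels` and the `std::vector` layout are hypothetical stand-ins for illustration and are not how cv::Mat stores or splits data internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch: split interleaved data (c0 c1 c2 c0 c1 c2 ...) into one plane per channel.
inline std::vector<std::vector<std::uint8_t>> split_channels(
        const std::vector<std::uint8_t>& interleaved, std::size_t channels) {
    assert(channels > 0 && interleaved.size() % channels == 0);
    const std::size_t pixels = interleaved.size() / channels;
    std::vector<std::vector<std::uint8_t>> planes(
        channels, std::vector<std::uint8_t>(pixels));
    for (std::size_t p = 0; p < pixels; ++p) {
        for (std::size_t c = 0; c < channels; ++c) {
            planes[c][p] = interleaved[p * channels + c];
        }
    }
    return planes;
}
```

Like the vector-based prototype, the sketch sizes its output from the channel count, so the caller does not need to prepare a fixed-length array in advance.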

2. Multi-channel merge function merge()

In OpenCV 4, the multi-channel merging function merge() also has two overloaded prototypes, as shown in Code Listing 3-5.

Code Listing 3-5 merge() function prototype

void cv::merge(const Mat *mv,
               size_t count,
               OutputArray dst);

void cv::merge(InputArrayOfArrays mv,
               OutputArray dst);

mv (first overloaded prototype): an array of images to be merged, where each image must have the same size and data type.
count: The length of the input image array; must be greater than 0.
mv (second overloaded prototype): a vector of images to be merged, where each image must have the same size and data type.
dst: The merged output image. It has the same size and data type as mv[0], and its number of channels equals the sum of the channels of all input images.

This function is primarily used to merge multiple images into a single multi-channel image. It also has two different function prototypes, each corresponding to the split() function. The two prototypes accept image data in array form and vector form, respectively. In the prototype that takes array-form data, the length of the array must also be provided. The output of the merge function is a multi-channel image, where the total number of channels is the sum of the channel counts of all input images. It is worth noting that the images being merged do not all have to be single-channel; they can also be images with different numbers of channels merged into an image with even more channels. Although the channel counts of these images can differ, all images must have the same size and data type.
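The reverse operation can be sketched the same way: several equally sized planes are interleaved into one multi-channel buffer. For simplicity this plain C++ sketch only handles single-channel planes, whereas the text notes that merge() also accepts multi-channel inputs; `merge_channels` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch: interleave equally sized single-channel planes into one buffer.
// The output has planes.size() channels per pixel.
inline std::vector<std::uint8_t> merge_channels(
        const std::vector<std::vector<std::uint8_t>>& planes) {
    assert(!planes.empty());
    const std::size_t pixels = planes.front().size();
    for (const auto& plane : planes) {
        assert(plane.size() == pixels);  // all inputs must have the same size
    }
    std::vector<std::uint8_t> interleaved;
    interleaved.reserve(pixels * planes.size());
    for (std::size_t p = 0; p < pixels; ++p) {
        for (const auto& plane : planes) {
            interleaved.push_back(plane[p]);
        }
    }
    return interleaved;
}
```

The size check mirrors the requirement that every input image share the same size and data type.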

3. Image Multi-Channel Separation and Merging Example

To help readers become more familiar with splitting and merging image channels, and to deepen their understanding of the role each channel plays, Code Listing 3-6 implements multi-channel image splitting and merging. The program uses the two split() prototypes to split an RGB image and an HSV image, respectively. To verify that the merge() function can merge images with different numbers of channels, the program uses both merge() prototypes to merge images with varying channel counts. The merged image in the array-based function has 5 channels and cannot be displayed using the imshow() function, so we use the Image Watch plugin to view the merged result. Since the three separate channels of the RGB image all appear gray and look very similar, Figure 3-5 does not show the split results. Instead, it only shows the merged image displayed in green, along with the HSV split results. Readers can run the program themselves to view the other results.

Code Listing 3-6 mySplitAndMerge.cpp: Image splitting and merging

#include "chapter3_1_cvclolor/inc/SplitAndMerge.hpp"
#include <cstdio>
#include <string>
#include <vector>
#include <opencv2/opencv.hpp>

bool Mat_Arr_Split_merge(const std::string &file_name);
bool Vector_Split_merge(const std::string &file_name);


int opencv_function10(void)
{
  Mat_Arr_Split_merge(std::string(MEDIA_PATH) + "林星阑L.jpg");
//   Vector_Split_merge(std::string(MEDIA_PATH) + "林星阑L.jpg");

  return 0;
}

bool Mat_Arr_Split_merge(const std::string &file_name)
{
  cv::Mat img = cv::imread(file_name);
  if(img.empty() == false)
  {
    printf("Image read successfully!\n");
  }
  else
  {
    printf("Could not read the image; please check that the file exists and its format is correct!\n");
    return false;
  }
  cv::Mat imgs[3];
  cv::Mat result[2];
  cv::split(img,imgs);  // array-based prototype: imgs must hold one Mat per channel
  cv::namedWindow("RGB-B channel",cv::WINDOW_FREERATIO);
  cv::namedWindow("RGB-G channel",cv::WINDOW_FREERATIO);
  cv::namedWindow("RGB-R channel",cv::WINDOW_FREERATIO);
  cv::imshow("RGB-B channel",imgs[0]);
  cv::imshow("RGB-G channel",imgs[1]);
  cv::imshow("RGB-R channel",imgs[2]);
  imgs[2] = img;  // replace one plane with the 3-channel image so the channel counts differ
  cv::merge(imgs,3,result[0]);  // 1 + 1 + 3 = 5 channels
  // cv::namedWindow("Merged result 0",cv::WINDOW_FREERATIO);
  // cv::imshow("Merged result 0",result[0]);   // imshow() shows at most 4 channels; use the Image Watch plugin
  cv::Mat zero = cv::Mat::zeros(img.rows,img.cols,CV_8UC1);  // single-channel matrix of zeros
  imgs[0] = zero;
  imgs[2] = zero;
  cv::merge(imgs,3,result[1]);  // only the G channel keeps data, so the image appears green
  cv::namedWindow("Merged result 1",cv::WINDOW_FREERATIO);
  cv::imshow("Merged result 1",result[1]);   // display the merged result
  cv::waitKey(0);
  cv::destroyAllWindows();
  return true;
}

bool Vector_Split_merge(const std::string &file_name)
{
  cv::Mat img = cv::imread(file_name);
  if(img.empty() == false)
  {
    printf("Image read successfully!\n");
  }
  else
  {
    printf("Could not read the image; please check that the file exists and its format is correct!\n");
    return false;
  }
  cv::Mat HSV;
  cv::Mat result;
  cv::cvtColor(img,HSV,cv::COLOR_BGR2HSV);  // imread() returns BGR data, so use the BGR flag
  std::vector<cv::Mat> imgv;
  cv::split(HSV,imgv);  // vector-based prototype: the vector is sized automatically
  cv::namedWindow("HSV-H channel",cv::WINDOW_FREERATIO);
  cv::namedWindow("HSV-S channel",cv::WINDOW_FREERATIO);
  cv::namedWindow("HSV-V channel",cv::WINDOW_FREERATIO);
  cv::imshow("HSV-H channel",imgv.at(0));
  cv::imshow("HSV-S channel",imgv.at(1));
  cv::imshow("HSV-V channel",imgv.at(2));
  imgv.push_back(HSV);  // append the 3-channel image so the channel counts in imgv differ
  cv::merge(imgv,result);
  // cv::namedWindow("Merged result 2",cv::WINDOW_FREERATIO);
  // cv::imshow("Merged result 2",result);   // imshow() shows at most 4 channels; use the Image Watch plugin
  cv::waitKey(0);
  cv::destroyAllWindows();
  return true;
}

[Figure 3-5: results of mySplitAndMerge.cpp]

Image Pixel Operations

After understanding the different channels of an image, the next step is to introduce operations related to image pixels within each channel. The concept of pixels has already been covered earlier—for example, in a CV_8U image, pixel values range from black to white across 256 levels, with grayscale values from 0 to 255 representing this transition. Therefore, the grayscale value of a pixel indicates the brightness level at a given position, and the degree of change in grayscale values also reflects the variation in image texture. As a result, understanding pixel-related operations is the first step in comprehending image content.

Image Pixel Statistics

We can understand a digital image as a matrix of a certain size, where the value of each element represents the brightness of each pixel in the image. Therefore, finding the maximum value in the matrix means locating the pixel with the highest grayscale value in the image, and calculating the average means computing the mean grayscale of the image pixels, which can be used to represent the overall brightness of the image. As a result, statistical operations on matrix data also hold significance and utility for image pixels. OpenCV 4 integrates numerous functions for statistical analysis of image pixels, such as finding the maximum, minimum, mean, and standard deviation. The following sections provide a detailed introduction to these related functions.
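Before turning to the library functions, the statistics themselves can be sketched in plain C++ over a flat buffer of grayscale values. The `PixelStats` struct and `compute_stats` function are hypothetical names for this illustration, and the use of the population standard deviation (dividing by N) is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical container for the statistics of one grayscale image.
struct PixelStats {
    std::uint8_t min_value;
    std::uint8_t max_value;
    double mean;
    double stddev;
};

// Sketch: minimum, maximum, mean, and population standard deviation
// of a non-empty buffer of 8-bit grayscale values.
inline PixelStats compute_stats(const std::vector<std::uint8_t>& pixels) {
    assert(!pixels.empty());
    const auto extremes = std::minmax_element(pixels.begin(), pixels.end());

    double sum = 0.0;
    for (std::uint8_t value : pixels) {
        sum += value;
    }
    const double mean = sum / static_cast<double>(pixels.size());

    double squared_diff = 0.0;
    for (std::uint8_t value : pixels) {
        squared_diff += (value - mean) * (value - mean);
    }
    const double stddev = std::sqrt(squared_diff / static_cast<double>(pixels.size()));

    return {*extremes.first, *extremes.second, mean, stddev};
}
```

The mean gives a single number for overall brightness, while the standard deviation indicates how strongly the grayscale values vary across the image.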