Section 2

Data loading, display, and saving


To process an image, you first need to obtain it. In everyday life, we can capture photos using devices like cameras and smartphones, and store them on a hard drive in a certain format. Similarly, when performing image processing in a computer program, it is also necessary to obtain image data in a specific way, store it in an appropriate container using a certain data type, and then display it to the user in some form. Therefore, this chapter will cover the loading, storage, and output of image data, including reading images and videos, creating and using image storage containers, and saving processed results in the form of images or videos.

Image storage container

Unlike the images we see in daily life, digital images are stored in computers in the form of matrices. Each element in the matrix is used to describe some kind of information in the image, such as brightness, color, etc., as shown in Figure 2-1. The essence of digital image processing is the process of extracting deeper information from these matrix data through a series of operations. Therefore, the first step in learning image processing is to master how to manipulate these matrix data. For readers familiar with C++ programming, strings are typically stored as the string type, and integers as the int type. Similarly, OpenCV provides a Mat class for storing matrix data. This section will provide a detailed introduction to the usage of the Mat class and the operations it supports. By studying this, you will be able to flexibly use variables of the Mat type in your programs.

Figure 2-1 A digital image represented as a matrix

Introduction to the Mat Class

In early versions of OpenCV (the 1.x series), images were stored using a C language structure called IplImage. As a result, it can still be seen in some older OpenCV tutorials.

However, the IplImage type has a clear drawback: it requires the user to release memory manually. If any IplImage variables remain unreleased when the program ends, memory leaks result.

As OpenCV continues to evolve, the library has introduced a C++ interface and provides the Mat class for data storage. The Mat class uses an automatic memory management mechanism, effectively solving the problem of memory release. When a variable is no longer in use, the memory it occupies is automatically freed.

The Mat class is used to store matrix-type data, including vectors, matrices, as well as grayscale images and color images.

Structurally, the Mat class consists of two parts: a matrix header and a pointer to the actual data. The matrix header contains information such as the matrix size, storage method, data address, and reference count. The size of the matrix header is fixed and does not change with the matrix dimensions.

In the vast majority of cases, the matrix header occupies far less space than the matrix data itself. Therefore, during image copying and transmission, the primary overhead comes from the data portion.

To reduce this overhead, OpenCV does not copy the full data when copying or passing images; it copies only the matrix header and the pointer to the data. Therefore, when creating a Mat object, you can first create the matrix header and then assign data to it. The specific method is shown in Listing 2-1.

cv::Mat a;                      // Create a matrix header named a
a = cv::imread("test.jpg");    // Read the image so that a points to its pixel data
cv::Mat b = a;                 // Copy the matrix header into a new variable b

The code above first creates a matrix header named a, then reads an image and makes the matrix pointer in a point to the pixel data of that image. Finally, it copies the matrix header of a to b. It is important to note that although a and b each have their own independent matrix headers, their matrix pointers point to the same block of data. Therefore, modifying the data through either matrix header will also change the data accessed by the other matrix header. Additionally, when variable a is deleted, b does not become empty data; the underlying matrix data is only truly released when both a and b have been freed. This is because a reference count is maintained in the matrix header, which records how many objects are currently sharing the same block of data. The data is only released when the reference count drops to zero.

Tip: Using reference counting to release stored content is a common approach in C++. This method prevents program crashes that could occur if data is deleted while still being referenced by a variable, while also significantly reducing the memory footprint during program execution.
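The sharing behavior described above can be modeled without OpenCV. The following is a minimal sketch in plain C++, assuming a hypothetical SharedImage type in which std::shared_ptr plays the role of the reference count. It is not how cv::Mat is implemented internally, only an illustration of the header-plus-shared-data idea.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for cv::Mat: a small header (rows, cols) plus a
// reference-counted pixel buffer shared between copies.
struct SharedImage {
    int rows = 0;
    int cols = 0;
    std::shared_ptr<std::vector<unsigned char>> data;

    SharedImage() = default;
    SharedImage(int r, int c)
        : rows(r), cols(c),
          data(std::make_shared<std::vector<unsigned char>>(
              static_cast<std::size_t>(r) * static_cast<std::size_t>(c))) {}

    // Number of headers that currently share the buffer.
    long refCount() const { return data ? data.use_count() : 0; }

    // Deep copy: the result owns its own buffer (similar in spirit to clone()).
    SharedImage deepCopy() const {
        SharedImage out;
        out.rows = rows;
        out.cols = cols;
        if (data) {
            out.data = std::make_shared<std::vector<unsigned char>>(*data);
        }
        return out;
    }
};
```

Copying a SharedImage copies only the header and raises the count, so a write through one copy is visible through the other. The buffer is freed when the last header goes out of scope.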

Next, we will explain the data types that can be stored in the Mat class. According to the official Mat class inheritance diagram shown in Figure 2-2, the data types that can be stored in the Mat class include double, float, uchar (that is, unsigned char), as well as custom templates.

Figure 2-2 Inheritance diagram of the Mat class

We can declare a Mat class variable that stores a specified type using the method shown in Code Listing 2-2.

cv::Mat A = cv::Mat_<double>(3, 3);  // Create a 3×3 matrix of type double

Since the Mat class in OpenCV is primarily used to store image data, the range of pixel values directly affects image quality. Using 8-bit unsigned integers to store 16-bit image data can cause severe color distortion and even data errors.

Additionally, compilers with different bit widths may define data type lengths differently. To prevent program issues caused by inconsistent variable bit widths across different platforms or environments, OpenCV standardizes data types.

Therefore, OpenCV defines a dedicated data type system based on the number of bits used to store numeric variables. Table 2-1 lists the commonly used data types in OpenCV and their value ranges.

Table 2-1 Commonly used data types in OpenCV and their value ranges

Data type	Description	Value range
CV_8U	8-bit unsigned integer	0 to 255
CV_8S	8-bit signed integer	-128 to 127
CV_16U	16-bit unsigned integer	0 to 65535
CV_16S	16-bit signed integer	-32768 to 32767
CV_32S	32-bit signed integer	-2147483648 to 2147483647
CV_32F	32-bit floating-point number	-FLT_MAX to FLT_MAX, INF, NaN
CV_64F	64-bit floating-point number	-DBL_MAX to DBL_MAX, INF, NaN

Data types alone are not sufficient to fully describe image data; the number of image channels also needs to be defined. For example, a grayscale image is single-channel data, while a color image is typically 3-channel (RGB) or 4-channel (such as RGBA) data.

To meet this requirement, OpenCV defines channel count identifiers: C1, C2, C3, and C4, representing single-channel, dual-channel, three-channel, and four-channel data respectively. Since each data type may be combined with a different number of channels, OpenCV joins the "data type" and the "number of channels" to fully describe the image data type. For example:

CV_8UC1: 8-bit unsigned, single-channel data, typically used for grayscale images.
CV_8UC3: 8-bit unsigned, three-channel data, typically used for color images.

We can create a Mat object with a specified data type and number of channels using the method shown in Code Listing 2-3.

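cv::Mat a(640, 480, CV_8UC3);  // Create a 3-channel matrix with 640 rows and 480 columns (for color images)
cv::Mat b(3, 3, CV_8UC1);      // Create a 3×3 single-channel matrix of 8-bit unsigned integers
cv::Mat c(3, 3, CV_8U);        // Create a single-channel matrix; the C1 suffix can be omitted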

Although both uchar and CV_8U describe 8-bit unsigned integers in a 64-bit compiler, there is a strict distinction between them: uchar is a C++ type, whereas CV_8U is an integer constant that serves only as a data type identifier within the Mat class. Writing Mat_<CV_8U>(3,3) or Mat a(3,3,uchar) therefore fails to compile.
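The reason identifiers such as CV_8UC3 cannot be used as C++ types is that they are plain integers that pack a depth code and a channel count together. The sketch below reproduces that packing in standalone C++, assuming the depth codes and the 3-bit shift used by the OpenCV 4.x headers. The names makeType, depthOf, and channelsOf are hypothetical and are not OpenCV functions.

```cpp
// Depth codes as defined by OpenCV 4.x (assumed here; other major versions may differ).
constexpr int kDepth8U = 0;
constexpr int kDepth8S = 1;
constexpr int kDepth16U = 2;
constexpr int kDepth16S = 3;
constexpr int kDepth32S = 4;
constexpr int kDepth32F = 5;
constexpr int kDepth64F = 6;
constexpr int kChannelShift = 3;  // the low 3 bits hold the depth code

// Combine a depth code and a channel count into one integer type code.
constexpr int makeType(int depth, int channels) {
    return depth + ((channels - 1) << kChannelShift);
}

// Recover the two parts from a type code.
constexpr int depthOf(int type) { return type & ((1 << kChannelShift) - 1); }
constexpr int channelsOf(int type) { return (type >> kChannelShift) + 1; }

// With one channel the code equals the bare depth, which is why C1 can be omitted.
static_assert(makeType(kDepth8U, 1) == kDepth8U, "single-channel code equals the depth code");
```

Under these assumptions an 8-bit, three-channel type evaluates to 16 and a 32-bit float, three-channel type evaluates to 21.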

Mat Class Construction and Assignment

The previous section introduced three methods for constructing Mat class variables, but the latter two did not initialize the variables with values. This section will focus on how to flexibly construct and assign values to Mat class variables. According to the OpenCV source code definition, there are over 20 ways to construct the Mat class. However, in many simple everyday applications, most of these complex construction methods are not particularly useful. Therefore, this book emphasizes the construction and assignment methods that the author commonly uses in learning and project development.

1. Construction of the Mat Class

(1) Using the default constructor (see Code Listing 2-4)

cv::Mat::Mat();

Using Code Listing 2-4, a Mat class is constructed with the default constructor. This construction method does not require any input parameters. When assigning values to the variable later, it automatically determines the matrix type and size, enabling flexible storage. It is commonly used to store read image data and the output results of certain function operations.

(2) Construct based on the input matrix size and type (see Code Listing 2-5)

cv::Mat::Mat(int rows,
             int cols,
             int type);

rows: Number of rows in the constructed matrix.
cols: The number of columns in the matrix.
type: In addition to CV_8UC1, CV_64FC4, and other 1-to-4 channel types, it also provides parameters for more channels. A multi-channel matrix can be constructed using n in CV_8UC(n), where n can be up to 512.

We have already seen this constructor in the previous section. It constructs the matrix by specifying the number of rows, the number of columns, and the stored data type. This form is clear, intuitive, and easy to read, and it is commonly used when the data size and type are known in advance, such as for a camera's intrinsic matrix or an object's rotation matrix. A variation of this constructor combines the rows and columns into a Size() structure. Code Listing 2-6 provides the prototype for this construction method.

Code Listing 2-6 Constructing a Mat class using the Size() structure

cv::Mat::Mat(Size size,
             int type);

size: The dimension of a two-dimensional array variable, assigned using Size(cols, rows).
type: Consistent with the parameters in Code Listing 2-5.

When constructing a Mat class in this way, be especially careful: in the Size() structure, the order of rows and columns is reversed compared to the method in Code Listing 2-5. When using Size() (see Code Listing 2-7), the column comes first and the row comes after. If you don't pay attention, the Mat class will still be constructed successfully, but when you need to access a specific element, you may not realize the rows and columns are swapped, which could lead to array out-of-bounds errors.

Code Listing 2-7 Example of constructing a Mat using the Size structure

cv::Mat a(cv::Size(480, 640), CV_8UC1);   // Construct a single-channel matrix with 640 rows and 480 columns
cv::Mat b(cv::Size(480, 640), CV_32FC3);  // Construct a 3-channel matrix with 640 rows and 480 columns
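The row and column swap is easy to get wrong, so it may help to see it isolated from OpenCV. This is a small sketch with hypothetical stand-in types: SizeWH mirrors the (width, height) order of Size(), and Shape mirrors the (rows, cols) order of the other constructor.

```cpp
// Hypothetical stand-ins: SizeWH uses the (width, height) order of Size(),
// Shape uses the (rows, cols) order of the Mat(rows, cols, type) constructor.
struct SizeWH { int width; int height; };
struct Shape { int rows; int cols; };

// width is the number of columns, height is the number of rows.
inline Shape shapeFromSize(SizeWH s) { return Shape{s.height, s.width}; }

// True when (row, col) is a valid element position for the given shape.
inline bool inBounds(Shape s, int row, int col) {
    return row >= 0 && row < s.rows && col >= 0 && col < s.cols;
}
```

For Size(480, 640) the resulting shape has 640 rows and 480 columns, so position (600, 100) is valid while (100, 600) is out of bounds.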

(3) Construct using an existing matrix (see Code Listing 2-8)

cv::Mat::Mat(const Mat &m);

m: The Mat class matrix data that has already been constructed.

This construction method is very simple and can create a variable that stores the same content as an existing Mat class variable. Note that this method only copies the matrix header of the Mat class, and the matrix pointer points to the same address. Therefore, if the data in the matrix is modified through one Mat class variable, the data in the other variable will also change.

Tip: If you want to create two identical Mat classes that do not affect each other, you can use the clone() function.

If the matrix to be constructed is smaller than the existing matrix and stores a subset of its content, the method in Code Listing 2-9 can be used to build it.

Code Listing 2-9 Constructing a Mat class from a subregion of an existing Mat class

cv::Mat::Mat(const Mat &m,
             const Range &rowRange,
             const Range &colRange = Range::all());

m: The Mat class matrix data that has already been constructed.
rowRange: The range of rows to extract from the existing matrix, given as a Range variable. The range is half-open: Range(2,5) selects rows 2, 3, and 4 (counting from 0) and excludes row 5.
colRange: The range of columns to extract from the existing matrix, given as a Range variable with the same half-open convention, so Range(2,5) selects columns 2, 3, and 4. When no value is entered, all columns are extracted.

This method is mainly used for cropping a region out of the original image. However, it's important to note that the Mat class constructed in this way shares the same data with the existing Mat class, meaning that if the data in one Mat class changes, the other changes accordingly.

Code Listing 2-10: Extracting a sub-Mat class from the original Mat

cv::Mat b(a, cv::Range(2, 5), cv::Range(2, 5));  // Construct b from part of the data in a
cv::Mat c(a, cv::Range(2, 5));                   // Construct c with the last parameter left at its default
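How a sub-matrix can share data with its parent is sketched below in plain C++. HalfOpenRange and BlockView are hypothetical names. The sketch assumes a row-major, single-channel parent buffer and the half-open [start, end) convention that cv::Range follows.

```cpp
#include <cstddef>
#include <vector>

// Half-open index range [start, end): end itself is not included.
struct HalfOpenRange { int start; int end; };

// Hypothetical view onto a block of a row-major, single-channel buffer.
// It stores no pixels of its own, only where the block sits in the parent.
struct BlockView {
    std::vector<int>* parent;
    int parentCols;
    HalfOpenRange rows;
    HalfOpenRange cols;

    int height() const { return rows.end - rows.start; }
    int width() const { return cols.end - cols.start; }

    // Element (r, c) of the view is element (rows.start + r, cols.start + c) of the parent.
    int& at(int r, int c) {
        const std::size_t row = static_cast<std::size_t>(rows.start + r);
        const std::size_t col = static_cast<std::size_t>(cols.start + c);
        return (*parent)[row * static_cast<std::size_t>(parentCols) + col];
    }
};
```

A view built with ranges (2, 5) and (2, 5) is 3×3, and writing to its element (0, 0) changes element (2, 2) of the parent.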

2. Assigning values to the Mat class

Once the Mat class is constructed, the variable does not yet contain any data; data needs to be assigned to it. For different scenarios, OpenCV 4.1 provides multiple assignment methods. The following explains how to assign values to Mat class variables.

(1) Assignment during construction (see Code Listing 2-11)

cv::Mat::Mat(int rows,
             int cols,
             int type,
             const Scalar &s);

rows: The number of rows in the matrix.
cols: The number of columns in the matrix.
type: The type of data to be stored.
s: Parameter variable for assigning values to each pixel in the matrix, for example Scalar(0,0,255).

This method assigns values during construction (see Code Listing 2-12), by placing the value to assign to each element into a Scalar structure. Note that this approach assigns the same value to every element in the image; for example, Scalar(0,0,255) assigns the three channel values of each pixel to 0, 0, and 255 respectively.

Code Listing 2-12 Example of assignment during construction

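cv::Mat a(2, 2, CV_8UC3, cv::Scalar(0, 0, 255));  // Create a 3-channel matrix in which every pixel is (0, 0, 255)
cv::Mat b(2, 2, CV_8UC2, cv::Scalar(0, 255));     // Create a 2-channel matrix in which every pixel is (0, 255)
cv::Mat c(2, 2, CV_8UC1, cv::Scalar(255));        // Create a single-channel matrix in which every pixel is 255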

We add a breakpoint before the program's return statement for debugging, and use Image Watch to inspect the data in each Mat class variable. The results are shown in Figure 2-3, confirming that the matrix has been successfully constructed and assigned values.

Tip: The number of variables in the Scalar structure must correspond to the number of channels defined. If the number of variables in the Scalar structure exceeds the number of channels, the values beyond the channel count will not be read. For example, after executing a(2,2,CV_8UC2,Scalar(0,0,255)), each pixel value will be (0,0), and 255 will not be read. If the number of variables in the Scalar structure is less than the number of channels, the missing values will be filled with 0.
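The truncation and zero-filling rule from the tip can be written out as a short standalone sketch. Scalar4 and pixelFromScalar are hypothetical names. The sketch assumes a Scalar holds four values, with unspecified ones defaulting to 0, and that the matrix has between 1 and 4 channels.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a Scalar: four values, unspecified ones are 0.
using Scalar4 = std::array<double, 4>;

// Values one pixel receives when a matrix with `channels` channels (1 to 4)
// is filled from the scalar: channel k takes component k, extras are ignored.
inline std::vector<double> pixelFromScalar(const Scalar4& s, int channels) {
    std::vector<double> pixel;
    for (int k = 0; k < channels && k < 4; ++k) {
        pixel.push_back(s[static_cast<std::size_t>(k)]);
    }
    return pixel;
}
```

With the values (0, 0, 255) and two channels, the pixel is (0, 0) and 255 is never read. With the single value 255 and three channels, the pixel is (255, 0, 0).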

Figure 2-3 The Mat class variables inspected in Image Watch

(2) Assignment using enumeration

This assignment method lists all the elements in the matrix one by one and assigns them to the Mat class in the form of a data stream. The specific assignment format is shown in Code Listing 2-13.

Code Listing 2-13 Example of Assignment Using Enumeration

cv::Mat a = (cv::Mat_<int>(3, 3) << 1, 2, 3, 4, 5, 6, 7, 8, 9);
cv::Mat b = (cv::Mat_<double>(2, 3) << 1.0, 2.1, 3.2, 4.0, 5.1, 6.2);

The first line of code creates a 3×3 matrix containing the integers 1 through 9. The matrix is filled row by row: the first row stores 1, 2, 3; the second row stores 4, 5, 6; and the third row stores 7, 8, 9. The second line of code creates a 2×3 matrix, which is filled in the same way as matrix a.

Tip: When using the enumeration method, the number of input data items must match the number of matrix elements. For example, in the first line of code in Listing 2-13, if only 8 numbers (1 through 8) are entered, the assignment process will result in an error. Therefore, this method is typically used when the matrix data is relatively small.

(3) Assigning values using a loop

Similar to the enumeration assignment method, the loop assignment method also assigns values to each element in the matrix. However, it does not require assignment at the time of variable declaration, and it allows assigning values to any part of the matrix. The specific assignment format is shown in Code Listing 2-14.

Code Listing 2-14 Example of Assignment Using a Loop


cv::Mat c = cv::Mat(3, 3, CV_8UC1);  // Define a 3×3 matrix
for (int i = 0; i < c.rows; i++)    // Loop over the matrix rows
{
    for (int j = 0; j < c.cols; j++)  // Loop over the matrix columns
    {
        c.at<uchar>(i, j) = i + j;
    }
}

The code above also creates a 3×3 matrix, assigning a value to each element in the matrix using a for loop. It is important to note that when assigning values to each element of the matrix, the variable type declared in the assignment function must match the variable type defined for the matrix. That is, the variable types in line 1 and line 6 of Code Listing 2-14 must be the same. If line 6 is changed to c.at<double>, the program will report an error and fail to assign the values.
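Why a mismatched type breaks element access can be seen from the address arithmetic alone. The sketch below is standalone C++ and assumes the accessor locates an element as row × step + col × sizeof(T), where step is the number of bytes per row. The function names are hypothetical.

```cpp
#include <cstddef>

// Byte offset that an at<T>(row, col) style accessor computes: it jumps
// `step` bytes per row, then treats the row as an array of T.
template <typename T>
std::size_t byteOffsetAt(int row, int col, std::size_t step) {
    return static_cast<std::size_t>(row) * step + static_cast<std::size_t>(col) * sizeof(T);
}

// True when reading a T at (row, col) stays inside a buffer of totalBytes bytes.
template <typename T>
bool fitsInBuffer(int row, int col, std::size_t step, std::size_t totalBytes) {
    return byteOffsetAt<T>(row, col, step) + sizeof(T) <= totalBytes;
}
```

For a 3×3 matrix of 8-bit values (3 bytes per row, 9 bytes in total), element (2, 2) sits at byte 8 when accessed as unsigned char. Accessed as an 8-byte double, the same coordinates point well past the end of the buffer.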

(4) Class method assignment

The Mat class provides methods for quick assignment, allowing initialization of specified matrices. For example, generating identity matrices, diagonal matrices, or matrices where all elements are 0 or 1. The specific usage is shown in Code Listing 2-15.

Code Listing 2-15 Example of Assignment Using Class Methods

cv::Mat a = cv::Mat::eye(3, 3, CV_8UC1);
cv::Mat b = (cv::Mat_<int>(1, 3) << 1, 2, 3);
cv::Mat c = cv::Mat::diag(b);
cv::Mat d = cv::Mat::ones(3, 3, CV_8UC1);
cv::Mat e = cv::Mat::zeros(4, 2, CV_8UC3);

The role of each function in the code above and the meaning of its parameters are explained as follows:

eye(): Constructs an identity matrix. The first two parameters are the number of rows and columns of the matrix, and the third parameter is the data type and number of channels for the matrix storage. If the number of rows and columns are not equal, the values at the main diagonal positions (1,1), (2,2), (3,3), etc. are set to 1.
diag(): Constructs a diagonal matrix. Its parameter must be a one-dimensional variable of type Mat, used to store the values of the diagonal elements.
ones(): Constructs a matrix of all ones, with the same parameter meanings as eye().
zeros(): Constructs a matrix filled entirely with zeros. The parameters have the same meaning as those in eye().

(5) Using arrays for assignment

This method is similar to the enumeration method, but it allows changing the number of channels in the Mat class matrix based on requirements, and can be seen as an extension of the enumeration method. Code Listing 2-16 shows the assignment format for this method.

Code Listing 2-16 Example of assignment using arrays

float a[8] = {5, 6, 7, 8, 1, 2, 3, 4};
cv::Mat b = cv::Mat(2, 2, CV_32FC2, a);
cv::Mat c = cv::Mat(2, 4, CV_32FC1, a);

This assignment method first stores the values to be placed into the Mat object in an array. Then, by setting the dimensions and number of channels of the Mat matrix, the array is split into a matrix. This splitting method allows the number of channels in the matrix to be freely defined. When the number of elements in the matrix is greater than the amount of data in the array, the extra elements are read from memory beyond the end of the array, because the Mat wraps the array without copying it or checking its length. Those elements hold meaningless values such as 1.0737418e+08 and should not be relied on. When the number of elements in the matrix is less than the amount of data in the array, the remaining data in the array is not assigned. The process of assigning from the array to the matrix first fills all channels of the first element in order, then moves on to the next element. To better understand this process, the defined matrices b and c are shown in Figure 2-4.
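The fill order described above amounts to a single index formula. The following sketch is standalone C++ with a hypothetical helper name, and it assumes a packed row-major layout with interleaved channels.

```cpp
#include <cstddef>

// Position in the source array of channel `ch` of element (row, col),
// assuming a packed row-major layout with interleaved channels.
inline std::size_t flatIndex(int row, int col, int ch, int cols, int channels) {
    const std::size_t element = static_cast<std::size_t>(row) * static_cast<std::size_t>(cols) +
                                static_cast<std::size_t>(col);
    return element * static_cast<std::size_t>(channels) + static_cast<std::size_t>(ch);
}
```

Applied to the array {5, 6, 7, 8, 1, 2, 3, 4}, the 2×2 two-channel matrix has (5, 6) and (7, 8) in its first row and (1, 2) and (3, 4) in its second. The 2×4 single-channel matrix has rows 5 6 7 8 and 1 2 3 4.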

Figure 2-4 Matrices b and c constructed from the array

Operations supported by the Mat class

When processing data, it is often necessary to perform addition, subtraction, multiplication, and division operations. For example, operations like image filtering and enhancement require pixel-level arithmetic. To facilitate these computations, variables of the Mat class support matrix arithmetic operations. This means that when using Mat variables, they can be treated as ordinary matrices. For instance, multiplying a Mat variable by a constant follows the same rules as matrix-scalar multiplication. When performing arithmetic between a Mat object and a constant, the standard arithmetic operators (+, -, *, /) can be used directly. Listing 2-17 provides an example program demonstrating addition, subtraction, multiplication, and division between a Mat variable and a constant.

Code Listing 2-17: Arithmetic operations on the Mat class

cv::Mat a = (cv::Mat_<int>(3, 3) << 1, 2, 3, 4, 5, 6, 7, 8, 9);
cv::Mat b = (cv::Mat_<int>(3, 3) << 1, 2, 3, 4, 5, 6, 7, 8, 9);
cv::Mat c = (cv::Mat_<double>(3, 3) << 1.0, 2.1, 3.2, 4.0, 5.1, 6.2, 2, 2, 2);
cv::Mat d = (cv::Mat_<double>(3, 3) << 1.0, 2.1, 3.2, 4.0, 5.1, 6.2, 2, 2, 2);
cv::Mat e, f, g, h, i;
e = a + b;
f = c - d;
g = 2 * a;
h = d / 2.0;
i = a - 1;

It is important to note that when performing addition or subtraction between two Mat class variables, the data types of both matrices must be identical. For example, two Mat class variables storing int and double data types respectively cannot be added or subtracted. Unlike regular multiplication and division, when a constant is used in an operation with a Mat class variable, the resulting data type retains that of the Mat class variable. For instance, if a double constant is used with an int type Mat class variable, the final result will still be of type int. In the last line of code in Listing 2-17, subtracting a constant from a Mat class variable means that every element in the Mat class variable is reduced by that constant.

When performing convolution operations on an image, two matrices need to be multiplied. OpenCV not only provides multiplication operations for two Mat class matrices, but also defines the inner product and element-wise multiplication of two matrices. Code Listing 2-18 shows the code implementation for multiplying two Mat class matrices.

Code Listing 2-18 Multiplication of two Mat class matrices

cv::Mat j, m;
double k;
j = c * d;     // Matrix product
k = a.dot(b);  // Inner product
m = a.mul(b);  // Element-wise product

The matrix definition and assignment in Listing 2-18 are the same as in Listing 2-17. In the code, two Mat class variables and one double variable are defined, implementing multiplication, inner product, and element-wise multiplication of the two Mat class matrices, respectively.

The * operator in line 3 represents the mathematical product of two matrices. For example, given two matrices A₃ₓ₃ and B₃ₓ₃, the result of the * operation is matrix C₃ₓ₃, where each element of C₃ₓ₃ is expressed as: $c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + a_{i3} b_{3j}$

It is important to note that the "*" operation requires the number of columns in the first Mat matrix to equal the number of rows in the second Mat matrix. Additionally, this operation requires the data type in the Mat class to be one of the following four types: CV_32FC1, CV_64FC1, CV_32FC2, or CV_64FC2. In other words, for a two-dimensional Mat matrix, the stored data type must be either float or double.

Line 4 in Code Listing 2-18 represents the inner product of two Mat class matrices. Based on the output, the dot() method returns a double type variable. This operation computes the dot product of a row vector and a column vector. For example, given two vectors d = [d₁ d₂ d₃] and e = [e₁ e₂ e₃], the result of the dot() method is:

$$d \cdot e = d_1 e_1 + d_2 e_2 + d_3 e_3$$

It is important to note that the two input Mat matrices must have the same number of elements. However, regardless of the dimensions of the two input Mat matrices, both matrices will be expanded into a row vector and a column vector. Therefore, the result of the dot() operation is always a double-type variable.
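The difference between the matrix product and the inner product can be made concrete with a standalone sketch on row-major buffers. matMul and flatDot are hypothetical helpers, not OpenCV functions. They only mirror the size rules stated above: inner dimensions must agree for the product, and element counts must agree for the inner product.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Row-major matrix product: A is n x k, B is k x m, the result is n x m.
inline std::vector<double> matMul(const std::vector<double>& A, const std::vector<double>& B,
                                  std::size_t n, std::size_t k, std::size_t m) {
    if (A.size() != n * k || B.size() != k * m) {
        throw std::invalid_argument("columns of A must equal rows of B");
    }
    std::vector<double> C(n * m, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < m; ++j) {
            for (std::size_t p = 0; p < k; ++p) {
                C[i * m + j] += A[i * k + p] * B[p * m + j];
            }
        }
    }
    return C;
}

// dot()-style inner product: both inputs are treated as flat vectors.
inline double flatDot(const std::vector<double>& a, const std::vector<double>& b) {
    if (a.size() != b.size()) {
        throw std::invalid_argument("element counts must match");
    }
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

For the 3×3 matrix holding 1 through 9, the inner product with itself is the sum of the squares of 1 through 9, which is 285.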

Line 5 in Code Listing 2-18 represents the element-wise product of two Mat class matrices. Based on the output, it can be seen that the result of the mul() method is also a Mat class matrix. For two matrices A₃ₓ₃ and B₃ₓ₃, each element in the resulting matrix C₃ₓ₃ from the mul() method can be expressed as: $c_{ij} = a_{ij} b_{ij}$

It is important to note that, unlike the first two multiplication operations, the data stored in the two Mat matrices involved in the mul() method can be of any type, provided both use the same type, and the default output data type is the same as that of the two Mat matrices. In the field of image processing, the commonly used data type is CV_8U, which ranges from 0 to 255. When two relatively large integers are multiplied together, the product exceeds this range and is saturated, so the output value is clamped to 255. Therefore, when using the mul() method, care must be taken to prevent data overflow.
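The clamping behavior for 8-bit data can be reproduced in a few lines of standalone C++. The helper names are hypothetical. mulWide shows one possible way to avoid the clamp, namely computing in a wider type, and is an illustration rather than a recommendation from the original text.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Clamp an integer to the 0..255 range of an 8-bit unsigned value.
inline unsigned char saturateU8(int v) {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return static_cast<unsigned char>(v);
}

// Element-wise product of two 8-bit buffers, clamped to 255 on overflow.
inline std::vector<unsigned char> mulSaturated(const std::vector<unsigned char>& a,
                                               const std::vector<unsigned char>& b) {
    if (a.size() != b.size()) throw std::invalid_argument("sizes must match");
    std::vector<unsigned char> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = saturateU8(static_cast<int>(a[i]) * static_cast<int>(b[i]));
    }
    return out;
}

// Same product kept in int, so no information is lost to clamping.
inline std::vector<int> mulWide(const std::vector<unsigned char>& a,
                                const std::vector<unsigned char>& b) {
    if (a.size() != b.size()) throw std::invalid_argument("sizes must match");
    std::vector<int> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = static_cast<int>(a[i]) * static_cast<int>(b[i]);
    }
    return out;
}
```

With inputs 20 and 20, the saturated product is 255 while the wide product keeps the true value 400.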

Reading Mat class elements

For reading and modifying Mat class matrices, we have already introduced how to use the at method to assign values to each element of the matrix in the section on matrix loop assignment. This is just one of the many ways OpenCV provides to read matrix elements. This section will detail how to read elements from a Mat class matrix and modify their values.

Before learning how to read the elements of a Mat class matrix, you first need to understand how Mat class variables are stored in a computer. A multi-channel Mat matrix is similar to three-dimensional data, but a computer's storage space is two-dimensional. Therefore, when a Mat matrix is stored in a computer, the three-dimensional data is flattened into two-dimensional data: the data for each channel of the first element is stored first, followed by the data for each channel of the second element. The elements in each row are stored in this manner. Thus, if we find the starting position of each element, we can locate the data for each channel within that element. Figure 2-5 illustrates the storage method of a three-channel matrix, where consecutive blue, green, and red squares represent the three channels of each element.

Figure 2-5 Storage layout of a three-channel matrix

Now that we understand how Mat class variables are stored, let's look at the properties of the Mat class. Table 2-2 lists the commonly used properties of the Mat class matrix, along with a detailed explanation of each property's purpose.

Table 2-2 Common attributes of the Mat class matrix

Attribute	Description
cols	The number of columns in the matrix
rows	Number of rows in the matrix
step	Effective width of the matrix in bytes
elemSize()	Number of bytes per element
total()	Number of elements in a matrix
channels()	The number of channels in the matrix

Combining these attributes can yield most properties of a Mat matrix. For example, combining the step attribute with the cols attribute can calculate the number of bytes occupied by each element. Further combining this with the channels() attribute reveals the number of bytes per channel, thereby indicating the type of data stored in the matrix. The following example illustrates the use of each attribute: define a matrix using Mat(3, 4, CV_32FC3). In this case, the number of channels channels() is 3; the number of columns cols is 4; the number of rows rows is 3; the total number of elements in the matrix is 3 × 4, resulting in 12; the number of bytes per element is 32/8 × channels(), which in this example gives 12; the effective length in bytes, step, is elemSize() × cols, resulting in 48 in this example.
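The arithmetic in this example can be checked with a small standalone sketch. Layout is a hypothetical struct, not an OpenCV type, and it assumes rows are packed with no padding between them.

```cpp
#include <cstddef>

// Hypothetical summary of the layout-related properties from Table 2-2,
// assuming rows are packed with no padding between them.
struct Layout {
    int rows;
    int cols;
    int channels;
    std::size_t bytesPerChannel;

    // Bytes per element: all channels of one pixel.
    std::size_t elemSize() const { return bytesPerChannel * static_cast<std::size_t>(channels); }
    // Bytes per row.
    std::size_t step() const { return elemSize() * static_cast<std::size_t>(cols); }
    // Number of elements (pixels), independent of the channel count.
    std::size_t total() const {
        return static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols);
    }
};
```

For 3 rows, 4 columns, 3 channels, and 4 bytes per channel, this gives 12 elements, 12 bytes per element, and 48 bytes per row. Dividing the row width by the column count and then by the channel count recovers the 4 bytes of a 32-bit value.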

Common methods for reading elements of the Mat class matrix include reading via the at method, reading via the pointer ptr, reading via iterators, and reading via matrix element address positioning. The following provides a detailed introduction to these four reading methods.

1. Reading elements from a Mat class matrix using the at method

Reading matrix elements using the at method is divided into methods for single-channel and multi-channel access. Code Listing 2-19 provides the code for reading single-channel matrix elements via the at method.

Code Listing 2-19 Reading single-channel matrix elements of the Mat class using the at method

cv::Mat a = (cv::Mat_<uchar>(3, 3) << 1, 2, 3, 4, 5, 6, 7, 8, 9);
int value = (int)a.at<uchar>(0, 0);

When reading an element using the at method, you need to append "<data type>" after it. If the data type specified here does not match the data type defined for the matrix, an error will occur due to the data type mismatch. The coordinates of the element to be read are given in the form (row, column). It is important to note that if the matrix is defined with the uchar data type, the value must be explicitly cast to int when it is output; otherwise it is printed as a character rather than as a number.

Since a single-channel image is a two-dimensional matrix, you can access the element at a given position by providing the 2D planar coordinates at the end of the at method. In a multi-channel matrix, each coordinate contains multiple data points, so a variable is introduced to represent these multiple data values for the same element. In OpenCV, for three-channel matrices, six types are defined to represent the three channel data of the same element: cv::Vec3b, cv::Vec3s, cv::Vec3w, cv::Vec3d, cv::Vec3f, and cv::Vec3i. From these six data types, a naming convention can be inferred: the number indicates the number of channels, and the final letter is an abbreviation of the data type — b for uchar, s for short, w for ushort, d for double, f for float, and i for int. OpenCV also defines corresponding variable types for two-channel and four-channel matrices, following the same naming convention. For example, the uchar types for two and four channels are represented as cv::Vec2b and cv::Vec4b, respectively. Listing 2-20 provides the implementation code for reading a multi-channel matrix using the at method.

Code Listing 2-20 Reading multi-channel matrix elements of the Mat class using the at method

cv::Mat b(3, 4, CV_8UC3, cv::Scalar(0, 0, 1));
cv::Vec3b vc3 = b.at<cv::Vec3b>(0, 0);
int first = (int)vc3.val[0];
int second = (int)vc3.val[1];
int third = (int)vc3.val[2];

When using multi-channel variable types, it is also important to ensure that the data type given in the at method corresponds to the data type of the matrix. Additionally, when outputting the data of each channel from a cv::Vec3b variable, the value must be explicitly cast to int. However, if the data read by the at method is assigned directly to a cv::Vec3i variable, no type cast is needed when outputting the data of each channel.

2. Reading elements from a Mat class matrix using the pointer ptr

Earlier, we analyzed how Mat class matrices are stored in memory. The elements in each row of the matrix are stored contiguously. If we find the starting address of each row's elements, we can read elements at different positions within that row by moving the pointer backward or forward by a certain number of positions from the starting address. Code Listing 2-21 provides an implementation for reading Mat class matrix elements using the pointer ptr.

Code Listing 2-21 Reading Mat class matrix elements using the pointer ptr

cv::Mat b(3, 4, CV_8UC3, cv::Scalar(0, 0, 1));
for (int i = 0; i < b.rows; i++)
{
    uchar* ptr = b.ptr<uchar>(i);
    for (int j = 0; j < b.cols * b.channels(); j++)
    {
        std::cout << (int)ptr[j] << std::endl;
    }
}

In the program, the outer loop iterates over the rows of the matrix. Inside it, a pointer of type uchar is defined. When defining it, the variable type of the Mat class matrix must be declared, and the parentheses at the end of the definition specify which row of the Mat class matrix the pointer points to. The second loop controls the output of data from all channels in each row of the matrix. According to the storage format shown in Figure 2-5, the amount of data stored in each row is the product of the number of columns and the number of channels. That is, the pointer can move backward by cols × channels() positions, as shown in line 7 of the code, where the number of positions the pointer moves backward is given in brackets. The program provides a method for traversing every piece of data in a Mat class matrix using a loop. When we know which data needs to be accessed, we can access it directly by specifying the row number and the number of positions the pointer has moved. For example, the third piece of data in the second row can be accessed directly via b.ptr<uchar>(1)[2].
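The same row-pointer traversal can be sketched on a plain byte buffer. This is standalone C++ with hypothetical helper names, assuming 8-bit data and rows that are step bytes apart with no padding.

```cpp
#include <cstddef>
#include <vector>

// Start of row `row` in a packed buffer whose rows are `step` bytes apart.
inline const unsigned char* rowPtr(const std::vector<unsigned char>& buffer,
                                   std::size_t step, int row) {
    return buffer.data() + static_cast<std::size_t>(row) * step;
}

// Sum every channel value of an 8-bit image by walking each row pointer.
inline long sumAll(const std::vector<unsigned char>& buffer, int rows, int cols, int channels) {
    const std::size_t step = static_cast<std::size_t>(cols) * static_cast<std::size_t>(channels);
    long sum = 0;
    for (int i = 0; i < rows; ++i) {
        const unsigned char* ptr = rowPtr(buffer, step, i);
        for (std::size_t j = 0; j < step; ++j) {
            sum += ptr[j];  // j runs over cols * channels values in this row
        }
    }
    return sum;
}
```

For a 3×4 three-channel buffer in which every pixel is (0, 0, 1), the sum is 12, and the third value of the second row, rowPtr(buffer, 12, 1)[2], is 1.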

3. Accessing elements in a Mat class matrix via iterators

A Mat class variable is also a container, so it provides iterators for accessing its data. Using iterators, you can traverse every element in the matrix. The code implementation is provided in Code Listing 2-22.

Code Listing 2-22 Reading Mat class matrix elements via iterator

cv::MatIterator_<uchar> it = a.begin<uchar>();
cv::MatIterator_<uchar> it_end = a.end<uchar>();
for (int i = 0; it != it_end; it++)
{
    cout << (int)(*it) << " ";
    if ((++i % a.cols) == 0)
    {
        cout << endl;
    }
}

The iterator type for the Mat class is cv::MatIterator_<>, and the element data type must be declared inside the angle brackets when defining it. Iteration starts at Mat.begin<>() and ends at Mat.end<>(), as with other iterators. The iterator is advanced with the ++ operator, and data is read by first reading every channel of the first element, then every channel of the second element, and so on, until the last channel of the last element.

4. Accessing elements via matrix element address positioning

The first three methods of reading elements all require knowing the data type stored in the Mat matrix. In addition, it is often more intuitive to read the data in a specific channel by stating "row x, column y, channel z." A method for reading data in this way is provided in Code Listing 2-23.

Code Listing 2-23 Accessing elements via address positioning of matrix elements

(int)(*(b.data + b.step[0] * row + b.step[1] * col + channel));

In the code, the variable row represents the row index of a data element, col represents the column index, and channel represents the channel index. This approach is similar to reading data through pointers, where the address of the first data element is shifted by a certain number of positions to point to the desired data. However, this method allows direct reading by specifying the row, column, and channel numbers, so the user does not need to calculate the position of a data element within the row's storage space. Note that the offset is computed in bytes, so the expression as written applies to 8-bit data, where each channel value occupies one byte.
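The address calculation in Code Listing 2-23 can be checked with ordinary arithmetic. The sketch below is plain C++ rather than OpenCV code. The function name byteOffset and the channelSize parameter are assumptions added for illustration; step0 and step1 stand for b.step[0] (bytes per row) and b.step[1] (bytes per pixel).

```cpp
#include <cstddef>

// Byte offset of one channel value from the start of the matrix data.
// step0: bytes per row; step1: bytes per pixel;
// channelSize: bytes per channel value (1 for 8-bit data, as in Listing 2-23).
std::size_t byteOffset(std::size_t step0, std::size_t step1,
                       std::size_t channelSize,
                       std::size_t row, std::size_t col, std::size_t channel)
{
    return step0 * row + step1 * col + channelSize * channel;
}
```

For a continuous 3 × 4 CV_8UC3 matrix, step0 is 12 and step1 is 3, so the value at row 1, column 2, channel 1 lies 19 bytes after the start of the data. With channelSize set to 1 the formula reduces to the expression in the listing.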

Reading and Displaying Images

This section will provide a detailed introduction to the functions related to image reading and display.

Image reading function imread

We have already introduced how to call the image reading function imread() (see Code Listing 1-1). Here we provide the function prototype (see Code Listing 2-24).

Code Listing 2-24 Prototype of the imread() function

cv::Mat cv::imread(const String &filename,
                   int flags = IMREAD_COLOR);

filename: The name of the image file to read, including the image path, name, and file extension.
flags: Flags for reading images, such as reading a color image as a grayscale image. The default parameter reads the image in color format. Optional parameters are listed in Table 2-3.

The function reads the specified image and returns it as a Mat class variable. If the image file does not exist, is corrupted, or is in an unsupported format, the image cannot be read and the function returns an empty matrix. You can therefore check whether the image was read successfully by testing whether the data attribute of the returned matrix is null or whether the empty() function returns true. If reading fails, data is a null pointer (0) and empty() returns true.

The function can read image files in multiple formats. However, due to differences in codecs across operating systems, an image file that can be read on one system may not be readable on another. BMP and DIB files are always readable regardless of the system. On Windows and macOS, OpenCV uses its built-in codecs (libjpeg, libpng, libtiff, and libjasper) by default, so it can read JPEG (jpg, jpeg, jpe), PNG, and TIFF (tiff, tif) files. On Linux systems, these codecs need to be installed manually; once installed, the same file types can be read.

Note, however, that whether this function can read a file does not depend on the file extension; the image type is determined from the file's content. For example, if a file's extension is changed from .png to .exe, the function can still read the image, but if a file's extension is changed from .exe to .png, the function cannot load it.

The first parameter of this function gives the path of the image to be read as a string, and the second parameter sets the mode for reading the image. The default mode reads the image in color, and the parameter can be changed as needed. OpenCV 4.1 offers 13 modes for reading images, which can be summarized as reading in the original format, reading as a grayscale image, reading as a color image, reading at higher bit depths, and reading while reducing the image to a certain size. The selectable flags and their effects are shown in Table 2-3. Note that converting a color image to grayscale inside the codec may produce results that differ from converting a color image to grayscale within an OpenCV program. Flags whose functions do not conflict can be specified together, separated by "|".

Table 2-3 Parameters for reading image formats in the imread() function

Flag	Value	Effect
IMREAD_UNCHANGED	-1	Read the image as-is, preserving the alpha channel (4th channel).
IMREAD_GRAYSCALE	0	Convert the image to a single-channel grayscale image when reading.
IMREAD_COLOR	1	Convert the image to a 3-channel BGR color image.
IMREAD_ANYDEPTH	2	Preserve the original image's 16-bit or 32-bit depth; if this flag is not specified, the image is converted to 8-bit when read.
IMREAD_ANYCOLOR	4	Read the image in any possible color format.
IMREAD_LOAD_GDAL	8	Load the image using the GDAL driver.
IMREAD_REDUCED_GRAYSCALE_2	16	Convert the image to a single-channel grayscale image and reduce its size to 1/2. Change the final digit to 4 or 8 to reduce the size to 1/4 or 1/8.
IMREAD_REDUCED_COLOR_2	17	Convert the image to a 3-channel color image and reduce its size to 1/2. Change the final digit to 4 or 8 to reduce the size to 1/4 or 1/8.
IMREAD_IGNORE_ORIENTATION	128	Do not rotate the image based on EXIF orientation.

Note: By default, the number of pixels in an image to be read must be less than 2³⁰. This requirement does not affect most image processing fields, but satellite remote sensing images and ultra-high-resolution images may exceed this threshold. You can adjust the maximum number of readable pixels by setting the OPENCV_IO_MAX_IMAGE_PIXELS environment variable.
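The rule that non-conflicting flags can be combined with "|" follows from the numeric values in Table 2-3. The sketch below uses local constants that copy those values so that it compiles without OpenCV. The names kGrayscale, kColor, kAnyDepth, kAnyColor, kAnyDepthAnyColor, and hasFlag are hypothetical; a real program would pass cv::IMREAD_ANYDEPTH | cv::IMREAD_ANYCOLOR to imread().

```cpp
// Local copies of the numeric values listed in Table 2-3 (assumption:
// used here only so the sketch does not depend on the OpenCV headers).
enum ReadFlag {
    kGrayscale = 0,
    kColor     = 1,
    kAnyDepth  = 2,
    kAnyColor  = 4
};

// Two flags combined with bitwise OR, as in imread(path, A | B).
const int kAnyDepthAnyColor = kAnyDepth | kAnyColor;

// Test whether a non-zero flag is contained in a combined value.
// A flag whose value is 0 sets no bit, so it can never be detected this way.
bool hasFlag(int flags, int flag)
{
    return flag != 0 && (flags & flag) == flag;
}
```

Because the grayscale flag has the value 0, OR-ing it with other flags leaves them unchanged, which is one reason to check which flags conflict before combining them.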

Image window function namedWindow

In our previous programs, we did not introduce window functions because, when an image is displayed without an explicitly defined window, the program automatically generates a window for display. However, sometimes it is necessary to perform operations on the image window before displaying the image, such as adding a slider. In such cases, the image window needs to be created in advance. Code Listing 2-25 provides the prototype of the window creation function.

Code Listing 2-25 Prototype of the namedWindow() function

void cv::namedWindow(const String &winname,
                     int flags = WINDOW_AUTOSIZE);

winname: Window name, used as the window identifier.
flags: Window property setting flags.

This function creates a window that can be used to display images and trackbars. The window is referenced by its name. If a window with the same name already exists when this function is called, the function does nothing. Creating a window consumes some memory, so a window created with this function should be closed when it is no longer needed. OpenCV provides two functions for closing windows: cv::destroyWindow() and cv::destroyAllWindows(). As their names suggest, the first closes the window with a given name, which is passed as a string argument. The second closes all windows in the program and is typically used at the end of the program.

In a simple program, however, these functions do not have to be called, because all application resources and windows are closed automatically when the program exits. Note that OpenCV 4.0 reports an error about unreleased windows when the program terminates, while OpenCV 4.1 does not.

The first parameter of this function declares the name of the window, which uniquely identifies it. The second parameter declares the window's properties, mainly whether the window size is adjustable and whether the displayed image fills the window. The selectable flags and their meanings are given in Table 2-4. By default, the flag used by the function is WINDOW_AUTOSIZE | WINDOW_KEEPRATIO | WINDOW_GUI_EXPANDED.

Table 2-4 namedWindow() function window property flag parameters

Flag	Value	Effect
WINDOW_NORMAL	0x00000000	After displaying the image, allow the user to freely resize the window.
WINDOW_AUTOSIZE	0x00000001	Display the window according to the image size, and do not allow the user to resize it.
WINDOW_OPENGL	0x00001000	When creating a window, OpenGL will be supported.
WINDOW_FULLSCREEN	1	Full-screen display window
WINDOW_FREERATIO	0x00000100	Resize the image to fill the window.
WINDOW_KEEPRATIO	0x00000000	Maintain the image's aspect ratio.
WINDOW_GUI_EXPANDED	0x00000000	The created window allows adding a toolbar and a status bar.
WINDOW_GUI_NORMAL	0x00000010	Create a window without a status bar and toolbar.

Image display function imshow

We have already introduced how to call the image display function imshow(). Here we provide the function prototype (see Code Listing 2-26).

Code Listing 2-26 Prototype of the imshow() function

void cv::imshow(const String &winname,
                InputArray mat);

winname: The name of the window to display the image, assigned as a string.
mat: The image matrix to be displayed.

This function displays an image in the specified window. If no image window with the same name has been created before this function is called, it creates a window with the WINDOW_AUTOSIZE flag, displaying the image at its original size. If an image window already exists, the image is scaled to fit the window's properties. The function scales the image based on its depth, following the specific scaling rules below:

If the image is of an 8-bit unsigned type, it will be displayed as is.
If the image is of 16-bit unsigned type or 32-bit integer type, the pixel values are divided by 256, mapping the range [0, 255×256] to [0, 255].
If the image is of 32-bit or 64-bit floating-point type, the pixel values are multiplied by 255, mapping the range [0, 1] to [0, 255].
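The three scaling rules above can be written as small functions. This is a plain C++ model of the arithmetic, not the code that imshow() runs. The function names are hypothetical, and the truncation toward zero and the clamping of out-of-range floating-point values are assumptions of this sketch.

```cpp
#include <algorithm>

// 8-bit unsigned pixel: displayed unchanged.
int displayed8U(unsigned char v)
{
    return v;
}

// 16-bit unsigned pixel: divided by 256, so [0, 255*256] maps to [0, 255].
int displayed16U(unsigned short v)
{
    return v / 256;
}

// Floating-point pixel: multiplied by 255, so [0, 1] maps to [0, 255].
// Values outside [0, 1] are clamped here (assumption of this sketch).
int displayedFloat(double v)
{
    double scaled = std::max(0.0, std::min(255.0, v * 255.0));
    return static_cast<int>(scaled);
}
```

One practical consequence is that a floating-point image holding values in the range 0 to 255 appears almost entirely white unless it is first scaled to the range 0 to 1 or converted to an 8-bit type.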

The first parameter of the function is the name of the image display window, and the second parameter is the image matrix to be displayed. Note that the type of the second parameter is not Mat but InputArray, an interface type that OpenCV defines to mark input parameters. When we encounter it, we can treat it as requiring Mat class data as input. Similarly, OpenCV defines the OutputArray type for outputs, which we can treat as producing Mat class data.

After this function runs, the program continues executing subsequent code. If the subsequent code finishes and exits directly, the displayed image may flash and disappear instantly. Therefore, in programs that need to display images, the imshow() function is often followed by the cv::waitKey() function, which pauses the program for a period of time. The waitKey() function specifies a waiting duration in milliseconds. If the parameter is left as default or set to 0, it means the function will wait until the user presses a key to end it.

Code to display an image:

#include "chapter2_2_show_image/inc/show_image.hpp"
#include <iostream>
#include <opencv2/opencv.hpp>


void opencv_function1(void)
{
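    cv::Mat picture_demo_mat = cv::imread(std::string(MEDIA_PATH) + "林星阑L.jpg");
    if (picture_demo_mat.empty())   // imread returns an empty matrix on failure
    {
        std::cout << "Failed to read the image; check the file path." << std::endl;
        return;
    }
    cv::imshow("xiaoshen", picture_demo_mat);

    std::cout << "OpenCV ran successfully!" << std::endl;

    cv::waitKey(0);             // keep the window open until a key is pressed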
}

Code to display an image (using NVIDIA GPU CUDA acceleration):

#include "chapter2_2_show_image/inc/show_image_CUDA.hpp"
#include <iostream>
#include <opencv2/opencv.hpp>

void opencv_function2(void)
{
    cv::Mat picture_demo_mat = cv::imread(std::string(MEDIA_PATH) + "林星阑H.jpg");
    cv::cuda::GpuMat gpuImage;
    gpuImage.upload(picture_demo_mat);   // copy the image from host memory to GPU memory
    cv::Mat result;
    gpuImage.download(result);           // copy it back; no GPU processing is done in between
    cv::namedWindow("林星阑",cv::WINDOW_NORMAL);
    cv::imshow("林星阑", result);
    std::cout << "CUDA ran successfully!" << std::endl;
    cv::waitKey(0);             // keep the window open until a key is pressed

}

Video loading and camera access

Earlier, we introduced how to read image data through programs. This section will cover the VideoCapture class in OpenCV, which is designed for reading video files and accessing cameras.

Reading video data

Although a video file is composed of multiple images, the imread() function cannot read video files directly. A dedicated video reading class is required to read the video and save each frame into a Mat class matrix. Code Listing 2-27 shows the constructors of the VideoCapture class used when reading a video file.

Code Listing 2-27 VideoCapture class constructors for reading video files

cv::VideoCapture::VideoCapture();  // default constructor

cv::VideoCapture::VideoCapture(const String& filename,
                               int apiPreference = CAP_ANY);

filename: The name of the video file or image sequence to read.
apiPreference: Properties set when reading data, such as encoding format, whether to call OpenNI, etc.

This function constructs a video stream capable of reading and processing video files. The first line in Code Listing 2-27 is the default constructor of the VideoCapture class, which simply declares an object capable of reading video data. The video file to be read is specified later using the open() function. For example, cap.open("1.avi") makes the VideoCapture variable cap read the video file 1.avi.

The second constructor declares the variable and assigns the video data to it at the same time. The types of files that can be read include video files (e.g., video.avi), image sequences, and the URL of a video stream. When reading an image sequence, the image names must follow the format "prefix + number," and the sequence is specified as "prefix + %02d." For example, if a folder contains the images img_00.jpg, img_01.jpg, and img_02.jpg, the file name is specified as img_%02d.jpg when loading. The second parameter defaults to searching automatically for an appropriate flag, so in everyday use it can be left at its default and only the video name needs to be provided.

Like the imread() function, the constructor may fail to read the file, so the result should be checked with the isOpened() function. It returns true if the file was opened successfully and false otherwise.

The constructor only loads the video file into a VideoCapture variable. To use the images in the video, we must export them from the VideoCapture variable into a Mat variable for subsequent processing. This can be done with the >> operator, which assigns images from the VideoCapture variable to the Mat variable in the order in which they appear in the video. Once all images have been assigned, any further assignment leaves the Mat variable as an empty matrix. Therefore, the empty() method can be used to check whether all images in the VideoCapture variable have been read.
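To see which file names a pattern such as img_%02d.jpg stands for, the number can be formatted with the standard %02d conversion. The sketch below is plain C++; the helper name sequenceName is hypothetical, and the assumption that OpenCV expands the pattern in the same way as snprintf is based only on the naming rule described above.

```cpp
#include <cstdio>
#include <string>

// Build one file name of an image sequence from a prefix, an index,
// and an extension, formatting the index with "%02d" (at least two
// digits, padded with leading zeros).
std::string sequenceName(const std::string& prefix, int index,
                         const std::string& extension)
{
    char number[16];
    std::snprintf(number, sizeof(number), "%02d", index);
    return prefix + number + extension;
}
```

With the prefix "img_" and the extension ".jpg", indices 0, 1, and 2 give img_00.jpg, img_01.jpg, and img_02.jpg, which are the names used in the example above.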

The VideoCapture class also provides the get() function to view video properties by passing specific flags to retrieve attributes such as pixel dimensions, frame count, frame rate, and more. Commonly used flags and their meanings for the get() method in the VideoCapture class are listed in Table 2-5.

Table 2-5 Flag parameters in the get() method of the VideoCapture class

Flag	Value	Effect
CAP_PROP_POS_MSEC	0	The current position of the video file (in milliseconds)
CAP_PROP_FRAME_WIDTH	3	Width of the image in the video stream
CAP_PROP_FRAME_HEIGHT	4	Height of the image in the video stream
CAP_PROP_FPS	5	Frame rate of images in a video stream (frames per second)
CAP_PROP_FOURCC	6	4-character code for codec
CAP_PROP_FRAME_COUNT	7	Number of frames in a video stream
CAP_PROP_FORMAT	8	The format of the returned Mat object
CAP_PROP_BRIGHTNESS	10	Image brightness (only applicable to supported cameras)
CAP_PROP_CONTRAST	11	Image contrast (camera only)
CAP_PROP_SATURATION	12	Image saturation (camera only)
CAP_PROP_HUE	13	Image hue (camera only)
CAP_PROP_GAIN	14	Image gain (only applicable to supported cameras)

To become more familiar with the VideoCapture class, Code Listing 2-28 provides a program that reads a video, outputs its properties, and displays it at the original frame rate. The results are shown in Figure 2-6.

Code Listing 2-28 VideoCapture.cpp reads a video file.

#include "chapter2_3_video_capture/inc/read_video.hpp"
#include <cstdio>
#include <opencv2/opencv.hpp>


int opencv_function3(void)
{
    cv::VideoCapture video__(std::string(MEDIA_PATH) + "hei.mp4");
    if(video__.isOpened() == true)  // check whether the video was opened successfully
    {
        printf("Frame width = %lf\n",video__.get(cv::CAP_PROP_FRAME_WIDTH));
        printf("Frame height = %lf\n",video__.get(cv::CAP_PROP_FRAME_HEIGHT));
        printf("Frame rate = %lf\n",video__.get(cv::CAP_PROP_FPS));
        printf("Total number of frames = %lf\n",video__.get(cv::CAP_PROP_FRAME_COUNT));
    }
    else
    {
        printf("Failed to open the video; please check the video file\n");
        return 1;
    }

    while(true)
    {
        cv::Mat frame__;
        video__ >> frame__;
        if(frame__.empty() == true) // an empty frame means the last frame has already been read
        {
            break;
        }
        cv::imshow("Video playback",frame__);
        // FPS is the number of frames per second, so 1000 ms / FPS is the display time of one frame
        cv::waitKey(static_cast<int>(1000 / video__.get(cv::CAP_PROP_FPS)));
    }
    cv::waitKey(0);             // keep the window open after playback ends
    return 0;
}

Figure 2-6 Video reading program execution result
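The per-frame delay used for playback is 1000 ms divided by the frame rate. A capture source may report a frame rate of 0, in which case this division is not meaningful. The helper below is a hedged sketch in plain C++: the name frameDelayMs and the default fallback of 33 ms are assumptions made for illustration, and the function is intended for ordinary frame rates.

```cpp
#include <cmath>

// Delay in milliseconds between frames for playback at `fps`.
// Returns `fallbackMs` when fps is zero, negative, or NaN.
// The result is never below 1, because a delay of 0 passed to
// waitKey() means "wait until a key is pressed".
int frameDelayMs(double fps, int fallbackMs = 33)
{
    if (!(fps > 0.0)) {
        return fallbackMs;
    }
    int delay = static_cast<int>(std::lround(1000.0 / fps));
    return delay < 1 ? 1 : delay;
}
```

In a playback loop, the value returned for the reported frame rate could be passed to cv::waitKey() in place of the inline division.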

Direct camera call

The VideoCapture class can also call the camera, with the construction method shown in Code Listing 2-29.

Code Listing 2-29 VideoCapture class calls the camera constructor

cv::VideoCapture::VideoCapture(int index,
                               int apiPreference = CAP_ANY);

Compared with Code Listing 2-27, the only difference between calling a camera and reading a video file is the first parameter. When calling a camera, the first parameter is the ID of the camera device to open, with IDs starting from 0. Image data is read from the camera in the same way as from a video file: the >> operator retrieves the frame captured by the camera at the current moment. The properties of the VideoCapture class used when reading video are also available. We changed the video file in Code Listing 2-28 to the camera ID (0) and ran the modified program again. The result is shown in Figure 2-7.

To query the camera IDs on Linux:

ls /dev/video*

Example output:

/dev/video0  /dev/video1  /dev/video2  /dev/video3

The four devices listed here are not necessarily all real cameras. On my machine, device 0 is the laptop's built-in camera and device 2 is an external USB camera, so simply try each ID in turn.

Figure 2-7 Result of running the camera program

Data saving

During image processing, new images are generated—for example, sharpening a blurry image through algorithms or converting a color image to grayscale. The processed results also need to be saved as image or video files. This section will detail how to save a Mat class matrix as an image or video file.

Saving images

OpenCV provides the imwrite() function for saving a Mat class matrix as an image file. The prototype of this function is given in Code Listing 2-30.

Code Listing 2-30 imwrite() function prototype

bool cv::imwrite(const String& filename,
                 InputArray img,
                 const std::vector<int>& params = std::vector<int>());

filename: The path and filename for saving the image, including the file extension.
img: The Mat class matrix to be saved.
params: Flags that set format-specific save options.

This function saves a Mat matrix as an image file. It returns true if the image is saved successfully, and false otherwise. For supported image formats, refer to the image file formats that the imread() function can read. Typically, this function can only save 8-bit single-channel images and 3-channel BGR color images, but some file formats support other bit depths and channel counts:

16-bit unsigned (CV_16U) images can be saved as PNG, JPEG 2000, or TIFF files.
32-bit floating-point (CV_32F) images can be saved as PFM, TIFF, OpenEXR, and Radiance HDR files.
4-channel images (with an alpha channel) can be saved as PNG files.

In general, the third parameter of this function does not need to be filled in. To save in a particular file format, simply change the file extension in the first parameter. However, when the data in the Mat matrix to be saved is special (e.g., 16-bit depth data), the third parameter needs to be set. The method for setting the third parameter is shown in Code Listing 2-31, and common optional flags are given in Table 2-6.

Code Listing 2-31 Setting the third parameter in the imwrite() function

vector<int> compression_params;
compression_params.push_back(IMWRITE_PNG_COMPRESSION);
compression_params.push_back(9);
imwrite(filename, img, compression_params);

Table 2-6 Selectable flags and their effects for the third parameter of the imwrite() function

Flag	Value	Effect
IMWRITE_JPEG_QUALITY	1	Image quality for files saved in JPEG format, from 0 to 100; the default is 95.
IMWRITE_JPEG_PROGRESSIVE	2	Enable progressive JPEG; set to 1 to enable. The default is 0 (false).
IMWRITE_JPEG_OPTIMIZE	3	Enable JPEG optimization; set to 1 to enable. The default is 0 (false).
IMWRITE_JPEG_LUMA_QUALITY	5	Separate luma quality level for JPEG files, from 0 to 100; the default is 0.
IMWRITE_JPEG_CHROMA_QUALITY	6	Separate chroma quality level for JPEG files, from 0 to 100; the default is 0.
IMWRITE_PNG_COMPRESSION	16	Compression level for files saved in PNG format, from 0 to 9. Higher values mean a smaller file size and a longer compression time. The default is 1 (best speed setting).
IMWRITE_TIFF_COMPRESSION	259	Compression scheme for files saved in TIFF format.

To better understand how to use the imwrite() function, Code Listing 2-32 provides a program that generates a matrix with an Alpha channel and saves it as a PNG image. After running the program, a 4-channel PNG image is saved. To visualize the result more intuitively, Figure 2-8 shows the image as seen in the Image Watch plugin alongside the saved PNG image.

Code Listing 2-32 imgWriter.cpp Saving an Image

#include "chapter2_4_save_media_file/inc/save_image.hpp"
#include <cstdio>
#include <opencv2/opencv.hpp>


void AlphaMat(cv::Mat *mat);

int opencv_function6(void)
{
    cv::Mat mat__(480,640,CV_8UC4);
    AlphaMat(&mat__);
    std::vector<int> compression_params;
    compression_params.push_back(cv::IMWRITE_PNG_COMPRESSION);   // PNG compression flag
    compression_params.push_back(9);   // highest compression level
    if(cv::imwrite(std::string(SRCSRC_PATH) + "chapter2_4_save_media_file/save_files/save_image_test1.png",mat__,compression_params) == false)
    {
        printf("Failed to save the PNG image\n");
        return 1;
    }
    printf("PNG image saved successfully\n");
    return 0;
}

void AlphaMat(cv::Mat *mat)
{
  CV_Assert(mat->channels() == 4);  // throws an exception if the matrix does not have 4 channels
  for(int i = 0;i < mat->rows;i++)   // rows
  {
    for(int j = 0;j < mat->cols;j++)    // columns
    {
      /*
      Alternative using the val member (bgra is a copy here, so the matrix itself is not modified):
      cv::Vec4b bgra = mat->at<cv::Vec4b>(i,j);
      bgra.val[0] = cv::saturate_cast<uint8_t>(255);    // B: 255
      bgra.val[1] = cv::saturate_cast<uint8_t>(0);      // G: 0
      bgra.val[2] = cv::saturate_cast<uint8_t>(0);      // R: 0
      bgra.val[3] = cv::saturate_cast<uint8_t>(180);    // alpha: 180
      */
      cv::Vec4b &bgra = mat->at<cv::Vec4b>(i,j);
      bgra[0] = cv::saturate_cast<uint8_t>(255);     // B: 255; saturate_cast clamps the value to 0-255 to prevent overflow
      bgra[1] = cv::saturate_cast<uint8_t>(0);      // G: 0; uses the overloaded operator[]
      bgra[2] = cv::saturate_cast<uint8_t>(0);      // R: 0
      bgra[3] = cv::saturate_cast<uint8_t>(90);     // alpha: 90
    }
  }
}

Figure 2-8 The 4-channel image in the program and after saving (left: Image Watch, right: PNG file)

Saving the video

Sometimes we need to generate a video from multiple images, or directly save camera-captured data as a video file. OpenCV provides the VideoWriter class for saving multiple images into a video file. The prototype of this class's constructor is given in Code Listing 2-33.

Code Listing 2-33 VideoWriter class constructors for saving video files

cv::VideoWriter::VideoWriter();  // default constructor

cv::VideoWriter::VideoWriter(const String& filename,
                             int fourcc,
                             double fps,
                             Size frameSize,
                             bool isColor = true);

filename: The address and filename for saving the video, including the video format.
fourcc: A 4-character codec code for compressing frames, with detailed parameters provided in Table 2-7.
fps: The frame rate at which the video is saved, meaning the number of images per second in the video.
frameSize: The dimensions of the video frame.
isColor: Whether the saved video is a color video.

The default constructor in the first line of Code Listing 2-33 is used in the same way as that of VideoCapture: it creates a data stream for saving video, and the open() function is called later to set parameters such as the output file name, codec, and frame rate.

The second constructor takes the name of the video file to save as its first parameter and the codec identifier as its second parameter. The available codec options are listed in Table 2-7. If the value -1 is assigned, a suitable codec is searched for automatically. Note that the way the codec is specified differs slightly between OpenCV 4.0 and OpenCV 4.1; the differences are given in Table 2-7. The third parameter is the frame rate of the saved video, which can be set freely as required, for example to play the original video at double speed or in slow motion. The fourth parameter sets the frame size of the saved video. This size must match the dimensions of the images written to it; otherwise, the video cannot be saved. The last parameter sets whether the saved video is in color. By default, the video is saved as a color video.

This function is very similar to VideoCapture(). Both can use the isOpened() function to check whether a video stream was successfully created, and get() to view various properties of the video stream. When saving a video, you simply assign each generated frame to the video stream one by one using the << operator (or the write() function), and finally close the video stream with release().

Table 2-7 Video encoding formats

OpenCV 4.1 flag	OpenCV 4.0 flag	Effect
`VideoWriter::fourcc('D', 'I', 'V', 'X')`	`CV_FOURCC('D', 'I', 'V', 'X')`	MPEG-4 encoding
`VideoWriter::fourcc('P', 'I', 'M', '1')`	`CV_FOURCC('P', 'I', 'M', '1')`	MPEG-1 encoding
`VideoWriter::fourcc('M', 'J', 'P', 'G')`	`CV_FOURCC('M', 'J', 'P', 'G')`	JPEG encoding (average performance)
`VideoWriter::fourcc('M', 'P', '4', '2')`	`CV_FOURCC('M', 'P', '4', '2')`	MPEG-4.2 encoding
`VideoWriter::fourcc('D', 'I', 'V', '3')`	`CV_FOURCC('D', 'I', 'V', '3')`	MPEG-4.3 encoding
`VideoWriter::fourcc('U', '2', '6', '3')`	`CV_FOURCC('U', '2', '6', '3')`	H.263 encoding
`VideoWriter::fourcc('I', '2', '6', '3')`	`CV_FOURCC('I', '2', '6', '3')`	H263I encoding
`VideoWriter::fourcc('F', 'L', 'V', '1')`	`CV_FOURCC('F', 'L', 'V', '1')`	FLV1 encoding

To better understand how to use the VideoWriter() class, Code Listing 2-34 provides an example of generating a new video file using existing video file data or directly from a camera feed. Readers should focus on understanding the similarities between the VideoWriter() and VideoCapture() classes, as well as important considerations when using them.

Code Listing 2-34 VideoWriter.cpp saves a video file

#include "chapter2_4_save_media_file/inc/save_video.hpp"
#include <cstdio>
#include <opencv2/opencv.hpp>


template <typename T>
bool Save_Video(T file_opened_name_or_index,const std::string & file_saved_name);

int opencv_function5(void)
{
    // Capture a segment from a video file and save it locally
    Save_Video<const std::string &>(std::string(MEDIA_PATH) + "hei.mp4",std::string(SRCSRC_PATH) + "chapter2_4_save_media_file/save_files/save_video_test1.mp4");

    // Capture a segment from the camera and save it locally
    // Save_Video<int>(2,std::string(SRCSRC_PATH) + "chapter2_4_save_media_file/save_files/save_video_test2.mp4");
    
    return 0;
}


template <typename T>
bool Save_Video(T file_opened_name_or_index,const std::string & file_saved_name)
{
    cv::Mat img;
    cv::VideoCapture video(file_opened_name_or_index);
    if(video.isOpened() == true)
    {
        printf("Camera or video opened successfully!\n");
        printf("Frame width = %lf\n",video.get(cv::CAP_PROP_FRAME_WIDTH));
        printf("Frame height = %lf\n",video.get(cv::CAP_PROP_FRAME_HEIGHT));
        printf("Frame rate = %lf\n",video.get(cv::CAP_PROP_FPS));
        printf("Total number of frames = %lf\n",video.get(cv::CAP_PROP_FRAME_COUNT));
    }
    else
    {
        printf("Failed to open the camera or video; check that the camera is connected or that the video file exists\n");
        return false;
    }
    video >> img;   // read one frame to obtain the frame size and type
    if(img.empty() == true)
    {
        printf("Failed to grab a frame!\n");
        return false;
    }
    cv::VideoWriter writer;
    auto file_name = file_saved_name;
    int codec = writer.fourcc('M','J','P','G');
    double fps = 30.0;
    auto size = img.size();
    bool isColor = (img.type() == CV_8UC3);
    writer.open(file_name,codec,fps,size,isColor);

    if(writer.isOpened() == true)  // check whether the output video stream was created
    {
        printf("Video stream created successfully!\n");
    }
    else
    {
        printf("Failed to create the video stream; check that the inputs are valid!\n");
        return false;
    }

    while(true)
    {
        if(video.read(img) == false)   // check whether another frame can be read from the camera or video
        {
            printf("Camera disconnected or end of video reached!\n");
            break;
        }
        writer << img;  // equivalent to writer.write(img);
        cv::namedWindow("Live");
        cv::imshow("Live",img);
        // FPS is the number of frames per second, so 1000 ms / FPS is the display time of one frame
        int keyboard_value = cv::waitKey(static_cast<int>(1000 / video.get(cv::CAP_PROP_FPS)));
        if(keyboard_value == 27)  // 27 is the ASCII code of the ESC key; press ESC to leave the loop
        {
            break;
        }
    }
    video.release();
    writer.release();
    return true;
}

Code to save a snapshot from the camera stream:

#include "chapter2_4_save_media_file/inc/save_image_in_video.hpp"
#include <cstdio>
#include <opencv2/opencv.hpp>


int opencv_function7(void)
{
    cv::VideoCapture video__(0);
    cv::Mat frame__;
    if(video__.isOpened() == true)  //判断视频是否导入成功
    {
        printf("视频中图像的宽度=%lf",video__.get(cv::CAP_PROP_FRAME_WIDTH));
        printf("视频中图像的高度=%lf",video__.get(cv::CAP_PROP_FRAME_HEIGHT));
        printf("视频的帧率=%lf",video__.get(cv::CAP_PROP_FPS));
        printf("视频的总帧数=%lf",video__.get(cv::CAP_PROP_FRAME_COUNT));
    }
    else
    {
        printf("导入摄像头视频失败，请确认摄像头是否正常打开");
        return 1;
    }
    printf("按下空格键截图!");
    printf("按下ESC结束播放!");
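    while(true)
    {
        video__ >> frame__;
        if(frame__.empty() == true) // An empty frame means the camera disconnected or the last frame has already been read
        {
            printf("No more frames to read!\n");
            break;
        }
        cv::imshow("Live",frame__);
        // Cameras may report an FPS of 0, so fall back to about 30 FPS to avoid dividing by zero
        double fps__ = video__.get(cv::CAP_PROP_FPS);
        int32_t key_boards_val = cv::waitKey(fps__ > 0.0 ? static_cast<int>(1000.0 / fps__) : 33);
        if(key_boards_val==32)   // The space bar was pressed
        {
            if(cv::imwrite(std::string(SRCSRC_PATH) + "chapter2_4_save_media_file/save_files/save_image_test2.png",frame__) == false)
            {
                printf("Failed to save the snapshot!\n");
            }
            else
            {
                printf("Snapshot saved!\n");
            }
        }
        else if(key_boards_val==27)   // The ESC key was pressed
        {
            break;
        }
    }
    video__.release();
    printf("Playback was stopped or has finished!\n");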

    return 0;
}
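The program above always writes to the same file name, so each new snapshot overwrites the previous one. One possible way around this is to number the files. The sketch below is only an illustration: makeSnapshotName is a hypothetical helper, and the zero-padded four-digit suffix is an assumed naming scheme, not something the original program uses. The index is assumed to be non-negative.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: builds names such as "<prefix>_0007.png" so that
// successive snapshots are saved to different files.
std::string makeSnapshotName(const std::string &prefix, int index)
{
    std::ostringstream name;
    // setw(4) pads short indices with zeros; longer indices are not truncated.
    name << prefix << '_' << std::setw(4) << std::setfill('0') << index << ".png";
    return name.str();
}
```

In the capture loop, a counter incremented after every successful cv::imwrite() call could supply the index.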

Save and read XML and YAML files

Besides image data, a program sometimes needs to save smaller pieces of data, such as small Mat matrices, strings, and arrays. Such data is typically stored in XML or YAML files. This section introduces how to use the functions in OpenCV 4 to save data to XML or YAML files, and how to read data back from both formats.

XML is a meta-markup language, meaning that users can define their own tags according to their needs. For example, tags such as <age> and <color> can be used to define the meaning of data: <age>24</age> indicates that the value of the age data is 24. XML is also a structured language that makes the hierarchical relationships between data explicit. For instance, <color><red>100</red><blue>150</blue></color> indicates that the color data contains two items named red and blue, with values of 100 and 150 respectively. Because of the tags, as long as a file conforms to the XML format, the data read from it is free from confusion and ambiguity, no matter how the data is stored.

YAML is a data-centric language that represents each value in the form "variable: value" and uses indentation levels to express the structure and hierarchy of the data. YAML is highly readable and is often used as a data serialization format. It draws inspiration from several languages, including XML, C, Python, and Perl.

OpenCV 4 provides the FileStorage class for generating and reading XML and YAML files. This class defines methods for initializing the object, writing data, and reading data. A FileStorage object must first be initialized, which can be understood as declaring the file to operate on and the type of operation. OpenCV 4 offers two initialization methods: one takes no parameters (it merely declares the object without initializing it), and the other takes a file name and an operation type as input.

The prototype of the constructor for the latter initialization method is given in Code Listing 2-35.

Code Listing 2-35 FileStorage() function prototype

cv::FileStorage::FileStorage(const String &filename,
                             int flags,
                             const String &encoding = String());

filename: The name of the opened file.
flags: The type of operation flag applied to the file. Common parameters and their meanings are given in Table 2-8.
encoding: Encoding format. UTF-16 XML encoding is currently not supported; UTF-8 XML encoding must be used.

Table 2-8 Common flags and their meanings for file operation types in the FileStorage() constructor

Flag	Value	Meaning
READ	0	Read data from the file.
WRITE	1	Write data to the file, overwriting any existing data.
APPEND	2	Continue writing to the file, appending the new data after the existing data.
MEMORY	4	Read data from, or write data to, an internal buffer instead of a file.

This function is the constructor of the FileStorage class; it declares the file to be opened and the type of operation. The first parameter is the name of the file, given as a string, with the extension ".xml", ".yaml", or ".yml". The file does not need to exist when it is opened for writing, but it must already exist when it is opened for reading. The second parameter is the operation type flag, such as reading or writing; common flags and their meanings are given in Table 2-8. Because these flags are defined in the FileStorage class, the class name must be used as a prefix, for example FileStorage::WRITE. The last parameter is the file's encoding format. UTF-16 XML encoding is currently not supported, so UTF-8 XML encoding must be used; in most cases the default value of this parameter is sufficient.

After opening a file, you can use the isOpened() function in the FileStorage class to check whether the file was successfully opened. If the file opens successfully, the function returns true; if it fails to open, the function returns false.

Since the default constructor in the FileStorage class has no parameters, it does not declare the opened file or the type of operation. In this case, the open() function in the FileStorage class must be used to declare them separately. The prototype of this function is given in Code Listing 2-36.

Code Listing 2-36 open() function prototype

virtual bool cv::FileStorage::open(const String &filename,
                                    int flags,
                                    const String &encoding = String());

filename: The name of the opened file.
flags: The type of operation flag applied to the file. Common parameters and their meanings are given in Table 2-8.
encoding: Encoding format. UTF-16 XML encoding is currently not supported; UTF-8 XML encoding must be used.

This function makes up for the fact that the default constructor does not declare a file: it specifies the file that the FileStorage object operates on. If the file is opened successfully, the return value is true; otherwise, it is false. All parameters of this function have the same meanings as those in Code Listing 2-35, so they are not repeated here. As before, after opening a file with this function, you can use the isOpened() function in the FileStorage class to check whether the file was opened successfully.

After opening a file, you can use the "<<" operator to write data to the file or the ">>" operator to read data from it, much like working with a data stream in C++. You can also write data to the file with the write() function in the FileStorage class, whose prototype is given in Code Listing 2-37.

Code Listing 2-37 write() function prototype

void cv::FileStorage::write(const String &name,
                            int val);

name: The variable name written in the file.
val: variable value.

This function writes a variable name and a value to a file. The first parameter is the variable name to be written to the file, and the second parameter is the variable value. In Code Listing 2-37 the value is of type int, but the FileStorage class provides several overloads of write() for values of type double, String, Mat, and vector<String>.

Writing data with operators is similar to using the write() function: both require a variable name and a variable value. For example, if the variable name is "age" and the value is 24, this is written as file << "age" << 24. If the value of a variable is an array, use [] to mark the values that belong to the same variable, e.g., file << "age" << "[" << 24 << 25 << "]". If some variables belong to another variable, use {} to express the hierarchy, e.g., file << "age" << "{" << "Xiaoming" << 24 << "Wanghua" << 25 << "}".

When reading data, you can use file["x"] >> xRead to read the value of a variable named x. However, when a variable contains multiple values or sub-variables, you need the FileNode node type and the FileNodeIterator iterator to read it. For example, if a variable's value is an array, first define a FileNode variable such as file["age"], and then traverse the data with an iterator. Alternatively, you can avoid the iterator by applying the [] indexing operator to the node: FileNode[0] accesses the first element of an array variable, and FileNode["Xiaoming"] accesses the "Xiaoming" sub-variable of the "age" variable. Chaining [] in this way lets you read data nested several levels deep.

To understand how to generate and read XML and YAML files, Code Listing 2-38 provides an example program that implements file writing and reading. This program uses both the write() function and the << operator to write data to files, and uses both iterators and [] (address) to read data from files. The methods for writing and reading data have been introduced earlier; in Code Listing 2-38, the key focus is on understanding how to implement writing and reading through code. The data in the XML and YAML files generated by this program is shown in Figure 2-9, and the results of reading the file data are shown in Figure 2-10.

Code Listing 2-38 myXMLandYAML.cpp: Saving and Reading XML and YAML Files

#include "chapter2_4_save_media_file/inc/save_XMLandYMAL.hpp"
#include <cstdio>
#include <cstdlib>   // system()
#include <iostream>  // cout, endl
#include <opencv2/opencv.hpp>

using namespace std;
using namespace cv;

int opencv_function8(void)
{
    system("color F0");  // Change the console background and text colors (Windows only)
    // string fileName = std::string(SRCSRC_PATH) + "chapter2_4_save_media_file/save_files/datas.xml";  // File name
    string fileName = std::string(SRCSRC_PATH) + "chapter2_4_save_media_file/save_files/datas.yaml"; // File name
    // Open the file in write mode
    cv::FileStorage fwrite(fileName, cv::FileStorage::WRITE);
    
    // Store Mat data
    Mat mat = Mat::eye(3, 3, CV_8U);
    fwrite.write("mat", mat);  // Write the data with the write() function
    // Store floating-point data in a node named x
    float x = 100;
    fwrite << "x" << x;
    // Store string data in a node named str
    String str = "Learn OpenCV 4";
    fwrite << "str" << str;
    // Store an array in a node named number_array
    fwrite << "number_array" << "[" <<4<<5<<6<< "]";
    // Store data with several child nodes under the parent node multi_nodes
    fwrite << "multi_nodes" << "{" << "month" << 8 << "day" << 28 << "year"
        << 2019 << "time" << "[" << 0 << 1 << 2 << 3 << "]" << "}";

    // Close the file
    fwrite.release();

    // Open the file in read mode
    cv::FileStorage fread(fileName, cv::FileStorage::READ);
    // Check whether the file was opened successfully
    if (!fread.isOpened())
    {
        cout << "Failed to open the file; check that the file name is correct!" << endl;
        return -1;
    }

    // Read the data in the file
    float xRead;
    fread["x"] >> xRead;  // Read the floating-point data
    cout << "x=" << xRead << endl;

    // Read the string data
    string strRead;
    fread["str"] >> strRead;
    cout << "str=" << strRead << endl;

    // Read the number_array node, which contains several values
    FileNode fileNode = fread["number_array"];
    cout << "number_array=[";
    // Iterate over every value
    for (FileNodeIterator i = fileNode.begin(); i != fileNode.end(); i++)
    {
        float a;
        *i >> a;
        cout << a<<" ";
    }
    cout << "]" << endl;

    // Read the Mat data (the node was written under the name "mat")
    Mat matRead;
    fread["mat"] >> matRead;
    cout << "mat=" << matRead << endl;

    // Read a node that has several child nodes, using [] indexing instead of an iterator
    FileNode fileNode1 = fread["multi_nodes"];
    int month = (int)fileNode1["month"];
    int day = (int)fileNode1["day"];
    int year = (int)fileNode1["year"];
    cout << "multi_nodes:" << endl 
        << "  month=" << month << "  day=" << day << "  year=" << year;
    cout << "  time=[";
    for (int i = 0; i < 4; i++)
    {
        int a = (int)fileNode1["time"][i];
        cout << a << " ";
    }
    cout << "]" << endl;
    
    // Close the file
    fread.release();
    return 0;
}

Figure 2-9 XML and YAML files generated by the myXMLandYAML.cpp program

Figure 2-10 Results of reading the file data in the myXMLandYAML.cpp program

## Chapter Summary

In this chapter, we first introduced how to use the Mat class in OpenCV 4 to store image data, then covered image reading and display, video loading, and camera access, and finally discussed how to save images and video files and how to save and read XML and YAML files.

Here is the list of main functions in this chapter.

Function name	Description	Code Listing
`imread()`	Read an image file.	2-24
`namedWindow()`	Create a window for displaying an image.	2-25
`imshow()`	Display an image in a specified window.	2-26
`VideoCapture()`	Open a camera or read a video file.	2-27
`imwrite()`	Save an image to a file.	2-30
`VideoWriter()`	Save a sequence of frames as a video file.	2-33
`FileStorage()`	Read or save XML and YAML files.	2-35