Section 2

Data Types and Data Storage


Data Type

integer

Function: The integer variable represents ==integer type== data.

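The types that can represent integers in C++ include the following; they differ in the amount of memory they occupy. The sizes shown are the typical ones on mainstream platforms; the C++ standard only guarantees minimum widths.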

Data Type	Space Usage	value range
short (short integer type)	2 bytes	(-2^15 ~ 2^15-1)
int (integer type)	4 bytes	(-2^31 ~ 2^31-1)
long (long integer)	8 bytes on 64-bit Linux; 4 bytes on 64-bit Windows and in 32-bit environments	Determined by the platform
long long (long long integer)	8 bytes	(-2^63 ~ 2^63-1)

sizeof keyword

Purpose: The sizeof keyword ==determines the memory size occupied by a data type==.

Syntax: sizeof(data type / variable)

Example:

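#include <iostream>
using namespace std;

int main() {
    // Execution starts at main; the statements below run in order.

    cout << "Memory occupied by short: " << sizeof(short) << endl;

    cout << "Memory occupied by int: " << sizeof(int) << endl;

    cout << "Memory occupied by long: " << sizeof(long) << endl;

    cout << "Memory occupied by long long: " << sizeof(long long) << endl;

    // Returning 0 indicates the program finished normally.
    return 0;
}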

Running/Observation Results: The program prints the size in bytes of each type, one per line. On a typical 64-bit Linux system the four values are 2, 4, 8 and 8; on Windows, long is reported as 4.

Integer Type Conclusion: ==short <= int <= long <= long long== (on common platforms short is strictly smaller than int)

Real type (floating-point)

Function: Used to ==represent decimals==

Floating-point variables come in two main kinds:

Single precision: float
Double precision: double

The difference between the two lies in how many significant digits they can represent.

Data Type	Space Usage	Range of significant figures
float	4 bytes	7 significant figures
double	8 bytes	15-16 significant figures

Example:

#include <iostream>
using namespace std;

int main() {

    float f1 = 3.14f;
    double d1 = 3.14;

    cout << f1 << endl;
    cout << d1<< endl;

    cout << "float  sizeof = " << sizeof(f1) << endl;
    cout << "double sizeof = " << sizeof(d1) << endl;

    // Scientific notation
    float f2 = 3e2; // 3 * 10 ^ 2 
    cout << "f2 = " << f2 << endl;

    float f3 = 3e-2;  // 3 * 0.1 ^ 2
    cout << "f3 = " << f3 << endl;


    return 0;
}

character type

Function: Character-type variables are used to represent a single character.

Syntax: char ch = 'a';

Note 1: When writing a character value, enclose the character in single quotes, not double quotes.

Note 2: Only one character can be inside single quotes, not a string.

In C and C++, character variables occupy exactly 1 byte.
A character variable does not store the character itself directly in memory; instead, it stores the corresponding ASCII encoding in a memory unit.

Example:

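#include <iostream>
using namespace std;

int main() {

    char ch = 'a';
    cout << ch << endl;
    cout << sizeof(char) << endl;

    //ch = "abcde"; // Error: double quotes cannot be used
    //ch = 'abcde'; // Error: only one character may appear inside single quotes

    cout << (int)ch << endl;  // Show the ASCII code of the character a
    ch = 97; // An ASCII value can be assigned directly to a char variable
    cout << ch << endl;

    return 0;
}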

ASCII code table:

ASCII value	Control characters	ASCII value	Character	ASCII value	Character	ASCII value	Character
0	NUL	32	(space)	64	@	96	`
1	SOH	33	!	65	A	97	a
2	STX	34	"	66	B	98	b
3	ETX	35	#	67	C	99	c
4	EOT	36	$	68	D	100	d
5	ENQ	37	%	69	E	101	e
6	ACK	38	&	70	F	102	f
7	BEL	39	'	71	G	103	g
8	BS	40	(	72	H	104	h
9	HT	41	)	73	I	105	i
10	LF	42	*	74	J	106	j
11	VT	43	+	75	K	107	k
12	FF	44	,	76	L	108	l
13	CR	45	-	77	M	109	m
14	SO	46	.	78	N	110	n
15	SI	47	/	79	O	111	o
16	DLE	48	0	80	P	112	p
17	DC1	49	1	81	Q	113	q
18	DC2	50	2	82	R	114	r
19	DC3	51	3	83	S	115	s
20	DC4	52	4	84	T	116	t
21	NAK	53	5	85	U	117	u
22	SYN	54	6	86	V	118	v
23	ETB	55	7	87	W	119	w
24	CAN	56	8	88	X	120	x
25	EM	57	9	89	Y	121	y
26	SUB	58	:	90	Z	122	z
27	ESC	59	;	91	[	123	{
28	FS	60	<	92	\	124	|
29	GS	61	=	93	]	125	}
30	RS	62	>	94	^	126	~
31	US	63	?	95	_	127	DEL

ASCII codes are roughly composed of the following two main parts:

ASCII non-printing control characters: The numbers 0-31 on the ASCII table, together with 127 (DEL), are assigned to control characters, which are used to control peripheral devices such as printers.
ASCII printable characters: numbers 32-126 are assigned to characters found on a keyboard, which appear when viewing or printing documents.

escape character

Purpose: Used to represent ==ASCII characters that cannot be displayed==

Currently, the escape characters we commonly use are: \n \\ \t

Escape character	Definition	ASCII code value (decimal)
\a	Alert (BEL)	007
\b	Backspace (BS), moves the cursor to the previous column	008
\f	Form feed (FF), moves the current position to the beginning of the next page	012
\n	Line feed (LF), moves the current position to the beginning of the next line	010
\r	Carriage return (CR), moves the cursor to the beginning of the current line	013
\t	Horizontal tab (HT), jumps to the next tab position	009
\v	Vertical tab (VT)	011
\\	Represents a backslash character "\"	092
\'	Represents a single quotation mark (apostrophe) character	039
\"	Represents a double quote character	034
\?	Represents a question mark	063
\0	Null character (value 0)	000
\ddd	Octal escape, one to three digits, with each d ranging from 0 to 7	Value written in octal
\xhh	Hexadecimal escape, with each h in the range 0-9, a-f, A-F	Value written in hexadecimal

Example:

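#include <iostream>
using namespace std;

int main() {
    // Execution starts at main; the statements below run in order.

    cout << "\\" << endl;        // prints a single backslash
    cout << "\tHello" << endl;   // prints a tab, then Hello
    cout << "\n" << endl;        // prints two line breaks: one from \n, one from endl

    // Returning 0 indicates the program finished normally.
    return 0;
}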

String type

Purpose: Used to represent a string of characters

Two Styles

C-style string: char variable_name[] = "string value"

Example:

#include <iostream>
using namespace std;

int main() {

    char str1[] = "hello world";
    cout << str1 << endl;
    

    return 0;
}

Note: C-style strings should be enclosed in double quotes.

C++ style strings: string variable_name = "string value"

Example:

#include <iostream>
#include <string>
using namespace std;

int main() {

    string str = "hello world";
    cout << str << endl;
    

    return 0;
}

Note: C++ style strings require the header ==#include<string>==.

Boolean type bool

Function: The Boolean data type represents true or false values.

A bool type has only two values:

true --- logically true (essentially 1)
false --- logically false (essentially 0)

The bool type ==typically occupies 1 byte== (the C++ standard leaves sizeof(bool) implementation-defined)

Example:

#include <iostream>
using namespace std;

int main() {

    bool flag = true;
    cout << flag << endl; // 1

    flag = false;
    cout << flag << endl; // 0

    cout << "size of bool = " << sizeof(bool) << endl; //1
    

    return 0;
}

type alias typedef

C/C++ provides the typedef keyword, which you can use to assign a new name to a type. The following example defines the alias byte for a single-byte number, along with aliases for a few other types:

typedef unsigned char byte;
typedef unsigned char uint8_t;
typedef float fp32;
typedef double fp64;

Run/Observation Results: These lines only declare aliases and produce no output by themselves. They need to be compiled together with the code that uses them, and must appear before the first use of the alias, usually near the top of the file or in a header.

Data Units, Binary, and Complement Code

Data units and their conversion
1. 1 byte = 8 bits = 8 binary digits
2. Data exists in computer memory in the form of binary.
3. Number system conversion
  1. In C and C++, a literal starting with 0b is binary and a literal starting with 0x is hexadecimal (0b is standard since C++14 and C23, and was a compiler extension before that). Since binary notation is lengthy, binary values are usually written in hexadecimal in code.
4. The original code (sign-magnitude form) of the data
  1. Unsigned data types: All bits in unsigned data represent the magnitude of the value. Assuming it occupies 8 binary bits, the maximum value is 255 (corresponding binary: 1111 1111), and the minimum value is 0 (corresponding binary: 0000 0000).
  2. Signed data types: In signed data, the highest binary bit serves as the sign bit: 0 indicates a positive number and 1 indicates a negative number. Assuming an 8-bit representation, after removing 1 bit for the sign, 7 bits remain to represent the magnitude. The maximum value is 127 (binary 0111 1111). Read as sign and magnitude, 1111 1111 is -127 and 1000 0000 would be -0. Because 0 already has the pattern 0000 0000, a second zero has no practical meaning, so the pattern 1000 0000 is assigned to -128 instead, which makes -128 the minimum value. This is exactly the pattern -128 has in the two's complement storage described later.
5. Data Naming:
  1. Since data type names like unsigned char and int don't clearly indicate whether the data is signed or unsigned, or how many bits it occupies, we use typedef to give them new aliases (such as the new aliases uint8_t, int32_t).
    1. The u in uint8_t stands for unsigned, meaning no sign, and 8 indicates it occupies 8 bits of binary, which is 1 byte.
    2. int32_t does not have a u prefix, so it is signed; it occupies 32 binary bits, that is 4 bytes, so on common platforms it is an alias for the int type.
    3. Because signed char occupies 1 byte (8 binary bits) and is signed, it is called int8_t. Plain char is not used here because whether it is signed depends on the compiler.
    4. Because unsigned short int is unsigned, occupies 2 bytes, 16 bits, and is a 16-bit binary value, it is called uint16_t. The same logic applies to others.
    5. float occupies 4 bytes, or 32 bits in binary, which is why it is called fp32.
    6. double occupies 8 bytes, or 64 bits in binary, which is why it is called fp64.
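  2. Specific code (the standard header <cstdint> already provides uint8_t, int16_t, uint32_t and the other fixed-width integer aliases, so those lines can be omitted when it is included; fp32 is not a standard name and has to be defined by hand):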
    1. typedef unsigned char uint8_t;
    2. typedef short int int16_t;
    3. typedef unsigned int uint32_t;
    4. typedef float fp32;
Data Parsing
1. Application Scenario Introduction: Sensors typically continuously send data to our microcontroller or industrial PC. For example, they might transmit the distance traveled by our robot (assuming the unit is millimeters and the value is a short integer).
2. Data processing goal: for example, obtaining distance (short integer, 2 bytes, 16-bit binary).
3. Sensor data transmission: In typical communication, sensors send data byte by byte continuously to microcontrollers and industrial PCs. Generally, a physical quantity that occupies n bytes is split into n bytes, meaning n variables, for transmission. Since data is stored in binary form, the highest 8 bits are usually sent first.
4. Data received by the microcontroller and industrial computer: For example, to obtain the robot's traveled distance, this data occupies 2 bytes, so the sensor must split it into two variables for transmission. The variable received first is called DH (the high 8 bits of the data in binary), and the variable received later is called DL (the low 8 bits of the data in binary).
5. Data processing approach: We currently have two ordinary 8-bit binary variables (of type uint8_t), and we want to obtain signed 16-bit binary data representing the robot's traveled distance (of type int16_t).
  1. Bit manipulation: For example, if DH is 0x9D (binary 1001 1101) and DL is 0x57 (binary 0101 0111), then the 16-bit data we want is formed by treating DH as the upper 8 bits and DL as the lower 8 bits — that is, the desired DATA (binary 1001 1101 0101 0111). To achieve this, we need bit manipulation: shift DH left by 8 bits to get 1001 1101 0000 0000, then perform a bitwise OR between the shifted DH and DL — that is, 1001 1101 0000 0000 OR 0000 0000 0101 0111 — which finally yields DATA as 1001 1101 0101 0111.
  2. Type casting: The step above gives the binary pattern of the data. It now has to be cast to a signed number, so that the highest bit (1) is interpreted as the sign bit. The specific code is int16_t DATA = (int16_t)(((uint16_t)DH << 8) | DL); which completes the processing.
Binary representation of data in memory
1. Prerequisite Knowledge for Computer Data Storage
  1. When computers store data at the hardware level, they use binary numbers. However, when storing a number, the computer does not directly store the binary representation of that number. Instead, it stores the two's complement of the number's binary form.
  2. Machine numbers: The form in which a number is stored in a computer is a binary number, and these binary numbers are called machine numbers. Machine numbers are signed: the highest (leftmost) bit is the sign bit, where 0 represents a positive number and 1 represents a negative number.
  3. True value: Because machine numbers include a sign bit, their formal value is not equal to the actual value they represent (true value). Taking the machine number 1000 0001 as an example, the actual value it represents (with the first bit as the sign bit) is -1, while its formal value (where the first bit represents 1) is 129. Therefore, the actual value represented by a signed machine number is called its true value.
2. Original Code, One's Complement, Two's Complement:
  1. Original code (sign-magnitude): it is written the same way as the true value of a machine number, i.e., the first bit indicates the sign, and the remaining bits represent the magnitude. That is,

Positive numbers: they are their corresponding binary numbers. Negative numbers: change the leftmost bit of the binary representation of the absolute value to 1.

```text
[+1] = original code: [ 0000 0001 ]
[-1] = original code: [ 1000 0001 ]
```
2.  **One's complement:**

Positive numbers: Same as the original code.  
Negative numbers: Based on their original code, the sign bit remains unchanged, and all other bits are inverted.

```text
[+1] = original code: [ 0000 0001 ] = one's complement: [ 0000 0001 ]
[-1] = original code: [ 1000 0001 ] = one's complement: [ 1111 1110 ]
```
3.  **Two's complement**:

Positive numbers: Their two's complement is the same as their original code.  
Negative numbers: The two's complement is obtained by keeping the sign bit unchanged, inverting all other bits, and then adding 1 (i.e., adding 1 to the one's complement).

```text
[+1] = original code: [ 0000 0001 ] = one's complement: [ 0000 0001 ] = two's complement: [ 0000 0001 ]
[-1] = original code: [ 1000 0001 ] = one's complement: [ 1111 1110 ] = two's complement: [ 1111 1111 ]
```

3.  Data storage format in a computer.

1.  Computers actually only store two's complement, so the process of converting from sign-magnitude to two's complement can also be understood as the process of storing data into computer memory:

![](https://cdn.tungchiahui.cn/tungwebsite/assets/images/2023/10/05/image4.webp)

3.  Positive numbers: In original, one's complement, and two's complement codes, the representation of positive numbers is exactly the same.

4.  Negative numbers: The representation of negative numbers differs, so for the two's complement of a negative number, we cannot directly convert it to a decimal value using base conversion, because that would not yield the actual decimal number stored in the computer. Instead, we should first convert it to its original code (sign-magnitude representation), and then convert that original code to a decimal number (machine numbers include the sign bit).

5.  Why were sign-magnitude, ones' complement, and two's complement created?

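    1.  Reason: For addition and subtraction involving only positive numbers, using the original code poses no issues. When negative numbers are involved, however, adding two original-code patterns bit by bit gives the wrong result: 1 + (-1) would be 0000 0001 + 1000 0001 = 1000 0010, which reads as -2. The hardware would have to treat the sign bit specially. One's complement fixes the arithmetic but leaves two encodings of zero (0000 0000 and 1111 1111). Two's complement removes the second zero and lets subtraction be carried out as ordinary addition.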

4.  Summary:

1.  **The most significant bit of a signed binary number is the sign bit: 0 indicates a positive number, 1 indicates a negative number.**

2.  The original code, one's complement, and two's complement of a positive number are all the same; the three codes coincide.

3.  Negative number

    1.  The one's complement of a negative number = keep its sign bit unchanged, and invert all other bits.

    2.  The two's complement of a negative number = its one's complement + 1. The one's complement of a negative number = the two's complement of the negative number - 1.

4.  The one's complement and two's complement of 0 are both 0.

5.  **When computers perform calculations, they always do so using "two's complement" representation.**

6.  When the result of the operation is displayed, it is converted back to **original code** form.