The character type char in Rust represents a single Unicode scalar value and always occupies 4 bytes (32 bits). Unlike C, where char is a single byte, Rust's char is Unicode-aware: it can hold any Unicode scalar value, that is, any code point from U+0000 to U+D7FF or from U+E000 to U+10FFFF (surrogates are excluded). This covers ASCII, Chinese characters, emoji, and so on.
let c1: char = 'a';
let c2: char = '中';
let c3: char = '❤';
println!("{}, {}, {}", c1, c2, c3);
// Output: a, 中, ❤
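A char's fixed in-memory size is not the same as the space a character takes inside a string. The sketch below prints std::mem::size_of::<char>() and, for a few characters, the code point (c as u32) together with c.len_utf8(), the number of bytes the character needs when encoded as UTF-8, which is how String and &str store text. The describe function is a hypothetical helper used only for this illustration.

```rust
use std::mem::size_of;

// Hypothetical helper for this illustration: formats a char's code point
// and the number of bytes it needs when encoded as UTF-8.
fn describe(c: char) -> String {
    format!("{} = U+{:04X}, {} byte(s) in UTF-8", c, c as u32, c.len_utf8())
}

fn main() {
    // Every char occupies 4 bytes in memory, whatever character it holds.
    println!("size_of::<char>() = {}", size_of::<char>());

    // The same characters need different amounts of space inside a UTF-8 string.
    for c in ['a', 'ß', '中', '😀'] {
        println!("{}", describe(c));
    }
}
```

Each of these chars is 4 bytes as a standalone value, while inside a String they take 1, 2, 3, and 4 bytes respectively.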
You can also write a char as a Unicode escape, giving its code point in hexadecimal:
let emoji: char = '\u{1F600}'; // 😀
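A \u{...} escape is checked at compile time, so an escape for a surrogate or an out-of-range value is rejected by the compiler. When the number is only known at run time, char::from_u32 performs the same check and returns Option<char>: it yields None for surrogate code points (0xD800 to 0xDFFF) and for values above 0x10FFFF. A minimal sketch; to_char is a hypothetical wrapper name.

```rust
// Hypothetical wrapper for this illustration.
// Returns None for surrogates (0xD800..=0xDFFF) and for values above 0x10FFFF.
fn to_char(code: u32) -> Option<char> {
    char::from_u32(code)
}

fn main() {
    let emoji: char = '\u{1F600}';
    // The escape and the literal glyph denote the same char.
    println!("{} {} U+{:X}", emoji, emoji == '😀', emoji as u32);

    for code in [0x41, 0x4E2D, 0xD800, 0x110000] {
        match to_char(code) {
            Some(c) => println!("0x{:X} -> {}", code, c),
            None => println!("0x{:X} -> not a valid char", code),
        }
    }
}
```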
Note: Rust uses single quotes (') for char literals and double quotes (") for string literals, which have type &str. Using one where the other is expected is a compile-time error.
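Because 'a' and "a" have different types, converting between them is explicit. The sketch below turns a char into a String with to_string and takes the first char of a string slice with chars().next(), which returns None for an empty string. first_char is a hypothetical helper name.

```rust
// Hypothetical helper: returns the first char of a string slice, or None if it is empty.
fn first_char(s: &str) -> Option<char> {
    s.chars().next()
}

fn main() {
    let c: char = 'a'; // single quotes: a char
    let s: &str = "a"; // double quotes: a string slice, even if it holds one character

    // char -> String
    let owned: String = c.to_string();
    println!("{} {}", owned, owned == s); // String can be compared with &str

    // &str -> char
    println!("{:?} {:?}", first_char(s), first_char(""));

    // let bad: char = "a"; // does not compile: expected `char`, found `&str`
}
```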
Basic Operations of the Character Type
The char type provides built-in methods for classifying and converting characters. The following program checks which categories the character 'A' belongs to:
fn main() {
    let c = 'A';
    // Character classification (results shown for 'A')
    println!("Is it alphabetic? {}", c.is_alphabetic());          // true
    println!("Is it numeric? {}", c.is_numeric());                // false
    println!("Is it alphanumeric? {}", c.is_alphanumeric());      // true
    println!("Is it a control character? {}", c.is_control());    // false
    println!("Is it whitespace? {}", c.is_whitespace());          // false
    println!("Is it lowercase? {}", c.is_lowercase());            // false
    println!("Is it uppercase? {}", c.is_uppercase());            // true
}
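The program above covers classification only. For conversion, char offers ASCII-only helpers such as to_ascii_uppercase, full Unicode case mapping through to_uppercase and to_lowercase (which return iterators, because one character can map to several), and to_digit and char::from_digit for digits in a given radix. A short sketch; shout is a hypothetical helper name.

```rust
// Hypothetical helper: uppercases a whole string with full Unicode case mapping.
fn shout(s: &str) -> String {
    // to_uppercase() yields an iterator of chars, so flat_map flattens them into one String.
    s.chars().flat_map(char::to_uppercase).collect()
}

fn main() {
    // ASCII-only conversion: returns exactly one char; non-ASCII input is returned unchanged.
    println!("{} {}", 'a'.to_ascii_uppercase(), '中'.to_ascii_uppercase());

    // Full Unicode mapping: 'ß' uppercases to the two chars "SS".
    let upper: String = 'ß'.to_uppercase().collect();
    println!("{}", upper);

    // Digits: to_digit returns None if the char is not a digit in the given radix.
    println!("{:?} {:?} {:?}", '7'.to_digit(10), 'f'.to_digit(16), 'g'.to_digit(16));
    println!("{:?}", char::from_digit(7, 10));

    println!("{}", shout("straße"));
}
```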
Single-Byte Characters
In Rust, a char always occupies 4 bytes. To work with single-byte characters as in C, use u8 (the type of byte literals such as b'A') or, for signed values, i8.
// C language:
// char c = 'A'; // 1 byte; whether plain char is signed is implementation-defined
// unsigned char uc = 'B'; // 1 byte, unsigned
// signed char sc = -1; // 1 byte, signed
// Rust:
let c: u8 = b'A'; // 1 byte, equivalent to unsigned char
let sc: i8 = -1; // 1 byte, equivalent to signed char
let raw: u8 = 65; // 1 byte, written as a plain number (same value as b'A')
println!("{}", c as char); // Output: A (only u8 can be cast to char with `as`)
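Converting between bytes and chars is asymmetric. Every u8 maps to a valid char (bytes 0 to 255 become U+0000 to U+00FF), so char::from(b) always succeeds, while going from char to u8 can fail and uses TryFrom (available for char to u8 in recent stable Rust). An i8 cannot be cast to char directly and has to be reinterpreted as u8 first. The helpers byte_to_char and char_to_byte are hypothetical names for this sketch.

```rust
use std::convert::TryFrom;

// Hypothetical helper: always succeeds, same result as `b as char`.
fn byte_to_char(b: u8) -> char {
    char::from(b)
}

// Hypothetical helper: succeeds only for U+0000..=U+00FF (Latin-1, not just ASCII).
fn char_to_byte(c: char) -> Option<u8> {
    u8::try_from(c).ok()
}

fn main() {
    let c: u8 = b'A';
    let raw: u8 = 65;
    println!("{}", c == raw); // a byte literal is just a number

    // `sc as char` would not compile; reinterpret the bits as u8 first (-1i8 becomes 255u8).
    let sc: i8 = -1;
    println!("{:?}", byte_to_char(sc as u8));

    println!("{:?} {:?}", char_to_byte('A'), char_to_byte('中'));

    // A byte string literal is a reference to a fixed-size u8 array, not a &str.
    let bytes: &[u8; 2] = b"Hi";
    println!("{:?}", bytes);
}
```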