The char type in Rust represents a single Unicode scalar value and always occupies 4 bytes (32 bits). Unlike the one-byte char of C, Rust's char can hold any Unicode scalar value, that is, any code point except the surrogate range U+D800 to U+DFFF. This covers ASCII letters, Chinese characters, emoji, and so on.

let c1: char = 'a';
let c2: char = '中';
let c3: char = '❤';
println!("{}, {}, {}", c1, c2, c3);
// Output: a, 中, ❤
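The fixed 4-byte size applies to a char value in memory, not to the same character stored inside a UTF-8 string. A minimal sketch of the difference, where sizes is a hypothetical helper written only for this illustration:

```rust
use std::mem::size_of;

/// Hypothetical helper: returns (size of a char value in memory,
/// number of bytes the character needs when encoded as UTF-8).
fn sizes(c: char) -> (usize, usize) {
    (size_of::<char>(), c.len_utf8())
}

fn main() {
    for c in ['a', '中', '❤'] {
        let (in_memory, in_utf8) = sizes(c);
        println!("{}: {} bytes as char, {} bytes in UTF-8", c, in_memory, in_utf8);
    }
    // Output:
    // a: 4 bytes as char, 1 bytes in UTF-8
    // 中: 4 bytes as char, 3 bytes in UTF-8
    // ❤: 4 bytes as char, 3 bytes in UTF-8
}
```

This is why a String is usually more compact than a Vec<char> holding the same text.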

You can also use Unicode escapes:

let emoji: char = '\u{1F600}';
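The number inside \u{...} is the character's code point in hexadecimal. The sketch below shows one way to move between a char and its numeric code point; from_code_point is a hypothetical wrapper around the standard char::from_u32, added only for illustration:

```rust
/// Hypothetical wrapper: converts a raw code point to a char,
/// or returns None if the value is not a Unicode scalar value.
fn from_code_point(cp: u32) -> Option<char> {
    char::from_u32(cp)
}

fn main() {
    let emoji = '\u{1F600}';
    println!("{} is U+{:X}", emoji, emoji as u32); // 😀 is U+1F600

    println!("{:?}", from_code_point(0x4E2D));   // Some('中')
    println!("{:?}", from_code_point(0xD800));   // None (surrogate)
    println!("{:?}", from_code_point(0x110000)); // None (above U+10FFFF)
}
```

Casting a char to u32 with as always succeeds, while the reverse direction returns an Option because not every u32 is a valid char.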

Note: Rust uses single quotes ' for character literals and double quotes " for string literals. Using one where the other is expected results in a compilation error.
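To make the distinction concrete, here is a small sketch that puts a char and a string slice side by side and converts between them. The helper first_char is hypothetical and exists only for this example:

```rust
/// Hypothetical helper: returns the first char of a string slice, if any.
fn first_char(s: &str) -> Option<char> {
    s.chars().next()
}

fn main() {
    let c: char = 'a'; // single quotes: exactly one char
    let s: &str = "a"; // double quotes: a string slice
    // let bad: char = "a"; // would not compile: mismatched types

    assert_eq!(first_char(s), Some(c)); // &str -> char
    assert_eq!(c.to_string(), s);       // char -> String

    println!("{}", "中❤".chars().count()); // 2
}
```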

Basic Operations of the Character Type

Rust provides various built-in methods for the char type for character classification and conversion.

fn main() {
    let c = 'A';

    // Classification checks; each method returns a bool
    println!("Is it alphabetic? {}", c.is_alphabetic());          // true
    println!("Is it numeric? {}", c.is_numeric());                // false
    println!("Is it alphanumeric? {}", c.is_alphanumeric());      // true
    println!("Is it a control character? {}", c.is_control());    // false
    println!("Is it whitespace? {}", c.is_whitespace());          // false
    println!("Is it lowercase? {}", c.is_lowercase());            // false
    println!("Is it uppercase? {}", c.is_uppercase());            // true
}
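Besides classification, char also offers conversion methods. The sketch below covers case conversion and digit parsing using standard library methods; the upper function is a hypothetical helper introduced for this example. Note that to_uppercase returns an iterator rather than a single char, because some characters expand to several characters when their case changes.

```rust
/// Hypothetical helper: uppercases a char and collects the result into a
/// String, since the mapping may produce more than one char.
fn upper(c: char) -> String {
    c.to_uppercase().collect()
}

fn main() {
    println!("{}", upper('a')); // A
    println!("{}", upper('ß')); // SS

    // ASCII-only conversion returns a plain char
    println!("{}", 'A'.to_ascii_lowercase()); // a

    // Digit parsing in a given radix
    println!("{:?}", '7'.to_digit(10)); // Some(7)
    println!("{:?}", 'f'.to_digit(16)); // Some(15)
    println!("{:?}", 'x'.to_digit(10)); // None
}
```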

Single-Byte Characters

In Rust, char always occupies 4 bytes. If you need single-byte characters like those in C, use u8 (or i8 for a signed byte). A byte literal such as b'A' has type u8 and accepts only ASCII characters and escapes.

// C language:
// char c = 'A';           // 1 byte, may be signed or unsigned
// unsigned char uc = 'B'; // 1 byte, unsigned
// signed char sc = -1;    // 1 byte, signed

// Rust:
let c: u8 = b'A';      // 1 byte, equivalent to unsigned char
let sc: i8 = -1;       // 1 byte, equivalent to signed char
let raw: u8 = 65;      // 1 byte, direct numerical value

println!("{}", c as char); // Output: A
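Going the other way, from char to a single byte, is only lossless for ASCII characters. A minimal sketch of a checked conversion, together with a byte string literal; ascii_byte is a hypothetical helper and not part of the standard library:

```rust
/// Hypothetical helper: converts a char to one byte if it is ASCII,
/// otherwise returns None instead of silently truncating.
fn ascii_byte(c: char) -> Option<u8> {
    if c.is_ascii() {
        Some(c as u8)
    } else {
        None
    }
}

fn main() {
    let bytes: &[u8; 3] = b"ABC";         // byte string literal
    println!("{:?}", bytes);              // [65, 66, 67]
    println!("{}", char::from(bytes[0])); // A

    println!("{:?}", ascii_byte('A'));    // Some(65)
    println!("{:?}", ascii_byte('中'));   // None
}
```

A plain c as u8 cast also compiles for non-ASCII characters, but it keeps only the lowest 8 bits of the code point, so a checked conversion is usually the safer choice.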