Unicode

#146 May 2026

146. char::MAX_LEN_UTF8 — Size UTF-8 Buffers Without Magic Numbers

Every time you’ve called char::encode_utf8, you’ve written [0u8; 4] from memory. Rust 1.93 stabilises char::MAX_LEN_UTF8 so you don’t have to keep that magic number in your head.

The magic number you keep typing

encode_utf8 writes the UTF-8 bytes of a char into a &mut [u8] and returns a &mut str pointing at the written portion. The slice has to be big enough (encode_utf8 panics if it isn’t), which means knowing that the worst-case UTF-8 encoding of a single char is 4 bytes:

let mut buf = [0u8; 4]; // why 4? because UTF-8, that's why
let s = '🦀'.encode_utf8(&mut buf);
assert_eq!(s, "🦀");

That 4 is correct but unexplained. Anyone reading your code has to either trust you or go re-derive the UTF-8 spec.
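If you’d rather re-derive the 4 than take it on trust, char::len_utf8 reports how many bytes a given char needs. Here’s a small sketch that checks one char from each UTF-8 width class (the sample chars are arbitrary picks) and encodes each one through a hypothetical helper, encoded_len, to show that a 4-byte buffer covers them all:

```rust
/// Hypothetical helper: encodes `c` into a 4-byte buffer and returns
/// how many bytes encode_utf8 actually wrote.
fn encoded_len(c: char) -> usize {
    let mut buf = [0u8; 4];
    // encode_utf8 returns a &mut str covering only the bytes it wrote.
    c.encode_utf8(&mut buf).len()
}

fn main() {
    // One char from each UTF-8 width class, paired with its expected width.
    let samples = [
        ('a', 1),  // U+0061, range U+0000..=U+007F
        ('é', 2),  // U+00E9, range U+0080..=U+07FF
        ('€', 3),  // U+20AC, range U+0800..=U+FFFF
        ('🦀', 4), // U+1F980, range U+10000..=U+10FFFF
    ];

    for (c, expected) in samples {
        assert_eq!(c.len_utf8(), expected);
        assert_eq!(encoded_len(c), expected);
    }

    // Even the largest scalar value needs only 4 bytes, hence [0u8; 4].
    assert_eq!(char::MAX.len_utf8(), 4);
}
```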

The named version

Rust 1.93 stabilises two constants on char:

assert_eq!(char::MAX_LEN_UTF8, 4);
assert_eq!(char::MAX_LEN_UTF16, 2);

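MAX_LEN_UTF8 is the maximum number of bytes (u8s) encode_utf8 can write for any single char. MAX_LEN_UTF16 is the same bound for encode_utf16, counted in u16 code units: chars outside the Basic Multilingual Plane need a surrogate pair, which is 2 u16s. Drop them straight into your buffer declarations: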

let mut buf = [0u8; char::MAX_LEN_UTF8];
let s = '🦀'.encode_utf8(&mut buf);
assert_eq!(s, "🦀");
assert_eq!(s.len(), 4);

let mut wide = [0u16; char::MAX_LEN_UTF16];
let w = '🦀'.encode_utf16(&mut wide);
assert_eq!(w.len(), 2);

Same behaviour, but the intent is self-documenting: the buffer is sized to hold any single char, by definition.

Sizing a buffer for N chars

Where this really pays off is when you’re computing a buffer for several chars on the stack:

const N: usize = 8;
let mut buf = [0u8; N * char::MAX_LEN_UTF8];

let mut pos = 0;
for c in ['h', 'é', 'l', 'l', 'o'] {
    let s = c.encode_utf8(&mut buf[pos..]);
    pos += s.len();
}

assert_eq!(&buf[..pos], "héllo".as_bytes());

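Written as N * char::MAX_LEN_UTF8, the size reads as “room for N chars of any width”, and it is correct by construction: if MAX_LEN_UTF8 ever grew, the buffer would grow with it, while a hardcoded 4 would not. In practice that day is unlikely to come, since the Unicode code space is capped at U+10FFFF and every scalar value up to that limit fits in 4 bytes. Note too that in safe Rust an undersized buffer is not a silent overflow: encode_utf8 checks the slice length and panics if the char doesn’t fit.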

Why bother?

It’s a small change: one constant, no new behaviour. But it kills a real source of off-by-one bugs (people writing [0u8; 3] because they “only handle the Basic Multilingual Plane”, then hitting a panic on the first emoji) and makes UTF-8 buffer code legible at a glance. Available since Rust 1.93 (January 2026).