#224 Jun 26, 2026

224. String::from_utf8_lossy — Returns a Cow, So Valid Bytes Cost Zero

from_utf8_lossy doesn’t always allocate. It hands back a Cow<str> that borrows your bytes when they’re already valid UTF-8 — you only pay for a String when there’s an invalid byte to replace.

The assumption that costs allocations

It’s easy to read this and assume every call builds a fresh String:

let text = String::from_utf8_lossy(bytes);

It doesn’t. The return type is Cow<'_, str> — clone-on-write. If bytes is valid UTF-8 (the common case for most files, headers, and protocol fields), you get back a Cow::Borrowed that points straight at your slice. No copy, no heap allocation.

use std::borrow::Cow;

fn read_name(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}

let valid = b"hello world";
let s = read_name(valid);
assert!(matches!(s, Cow::Borrowed(_))); // borrowed — nothing allocated
assert_eq!(s, "hello world");
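One way to convince yourself that the borrowed case is a view of the input rather than a copy is to compare addresses. The sketch below assumes non-empty input, and shares_input_buffer is a hypothetical helper name made up for this illustration, not something from the standard library:

```rust
use std::borrow::Cow;

/// Hypothetical helper: true when the decoded text is a borrowed view
/// covering exactly the same memory as the (non-empty) input bytes.
fn shares_input_buffer(bytes: &[u8]) -> bool {
    match String::from_utf8_lossy(bytes) {
        // Borrowed: the &str is the input slice reinterpreted as text.
        Cow::Borrowed(s) => s.as_ptr() == bytes.as_ptr() && s.len() == bytes.len(),
        // Owned: a new String was built to hold the U+FFFD replacement.
        Cow::Owned(_) => false,
    }
}
```

For b"hello world" this should report true; for bytes containing 0xFF it reports false, because the text now lives in a freshly built String.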

You only pay on the rare path

The allocation happens only when there’s an invalid byte to swap for the replacement character U+FFFD. Then — and only then — it builds an owned String:

let invalid = &[b'c', b'a', b'f', 0xFF];
let s2 = read_name(invalid);
assert!(matches!(s2, Cow::Owned(_))); // owned — had to fix a bad byte
assert_eq!(s2, "caf\u{fffd}");

So the allocation cost scales with how messy your input is, not with how often you call it. Every call still scans the bytes to validate them; what valid input skips is the copy.
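To see that on a batch of inputs, you can count how many calls came back owned. This is only a sketch: count_owned is a hypothetical helper invented for the example, and the Cow variant is used as a stand-in for "an allocation happened":

```rust
use std::borrow::Cow;

/// Hypothetical helper: counts how many inputs forced from_utf8_lossy
/// to build an owned String, i.e. contained invalid UTF-8.
fn count_owned(inputs: &[&[u8]]) -> usize {
    let mut owned = 0;
    for &bytes in inputs {
        // Valid inputs come back as Cow::Borrowed and are not counted.
        if let Cow::Owned(_) = String::from_utf8_lossy(bytes) {
            owned += 1;
        }
    }
    owned
}
```

Feeding it two valid inputs and one with a stray 0xFF should give a count of 1: only the messy input paid for a String.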

Don’t undo it with a reflexive .to_string()

The anti-pattern is forcing an allocation right back on:

let owned = String::from_utf8_lossy(bytes).to_string(); // ⚠️ always allocates

Keep the Cow for as long as you’re only reading. If a caller genuinely needs ownership, call into_owned(): it allocates on the borrowed path but hands back the existing String on the owned path, whereas to_string() copies the text even when the Cow already owns it:

let owned: String = read_name(b"abc").into_owned();
assert_eq!(owned, "abc");
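The buffer reuse on the owned path can be observed the same way, by comparing the address before and after the conversion. A rough sketch, with into_owned_reuses_buffer being a hypothetical name chosen for this post:

```rust
/// Hypothetical helper: decodes lossily, then reports whether
/// into_owned() returned the same buffer the Cow was already holding.
fn into_owned_reuses_buffer(bytes: &[u8]) -> bool {
    let cow = String::from_utf8_lossy(bytes);
    // Address of the text the Cow currently points at (borrowed or owned).
    let before = cow.as_ptr();
    // Owned: the String is moved out as-is. Borrowed: a new String is allocated.
    let owned: String = cow.into_owned();
    owned.as_ptr() == before
}
```

With invalid input the Cow is already Owned, so the helper should return true. With valid, non-empty input it returns false, because into_owned() had to copy the borrowed text into a new String.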

When you’re decoding bytes you’ll mostly just inspect, let from_utf8_lossy stay a Cow. Valid input — the usual case — flows through without touching the heap.
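As an illustration of the inspect-only case, here is a small sketch that checks a raw header value without ever asking for a String. The function name and the header value are assumptions made up for the example:

```rust
/// Hypothetical helper: checks a raw header value in place.
/// The Cow derefs to &str, so the usual read-only str methods just work.
fn is_json_content_type(raw: &[u8]) -> bool {
    let value = String::from_utf8_lossy(raw);
    value.trim().eq_ignore_ascii_case("application/json")
}
```

For valid input the whole check runs against the caller’s bytes; an owned String only shows up if the header contains invalid UTF-8.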

This post is licensed under CC BY 4.0 by the author.