224. String::from_utf8_lossy Returns a Cow, So Valid Bytes Cost Zero Allocations
from_utf8_lossy doesn’t always allocate. It hands back a Cow<str> that borrows your bytes when they’re already valid UTF-8 — you only pay for a String when there’s an invalid byte to replace.
The assumption that costs allocations
It’s easy to read a call like String::from_utf8_lossy(&bytes) and assume it builds a fresh String every time.
It doesn’t. The return type is Cow<'_, str>, a clone-on-write smart pointer. If bytes is valid UTF-8 (usually the case for text files, headers, and protocol fields), you get back a Cow::Borrowed that points straight at your slice: no copy, no heap allocation.
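A minimal sketch of the borrowed path follows. The helper name borrows_input is hypothetical, used only for illustration. It compares the address of the decoded text with the address of the input slice; the two match only when nothing was copied.

```rust
use std::borrow::Cow;

// Hypothetical helper: true when lossy decoding borrowed `bytes` instead of copying them.
fn borrows_input(bytes: &[u8]) -> bool {
    match String::from_utf8_lossy(bytes) {
        // A borrowed result points at the very same memory as the input slice.
        Cow::Borrowed(s) => s.as_ptr() == bytes.as_ptr(),
        Cow::Owned(_) => false,
    }
}

fn main() {
    // Valid UTF-8: ASCII plus one multi-byte character.
    let bytes: &[u8] = "Content-Type: text/plain; café".as_bytes();
    let text = String::from_utf8_lossy(bytes);

    match &text {
        Cow::Borrowed(_) => println!("borrowed, no allocation: {text}"),
        Cow::Owned(_) => println!("allocated a new String: {text}"),
    }
    assert!(borrows_input(bytes));
}
```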
You only pay on the rare path
The allocation happens only when the input contains an invalid sequence that has to be replaced with the replacement character U+FFFD. Only then does it build an owned String, returned as Cow::Owned.
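The invalid path can be sketched the same way. Here decode is a hypothetical wrapper that also reports whether the result came back as Cow::Owned. Byte 0xFF is used because it can never appear in well-formed UTF-8.

```rust
use std::borrow::Cow;

// Hypothetical wrapper: returns the decoded text plus whether a String was allocated.
fn decode(bytes: &[u8]) -> (Cow<'_, str>, bool) {
    let text = String::from_utf8_lossy(bytes);
    let allocated = matches!(text, Cow::Owned(_));
    (text, allocated)
}

fn main() {
    // 0xFF is never valid in UTF-8, so it forces the owned path.
    let (text, allocated) = decode(b"caf\xFF");
    assert!(allocated);
    // The invalid byte has been replaced with U+FFFD.
    assert_eq!(text, "caf\u{FFFD}");

    // Valid input takes the borrowed path and allocates nothing.
    let (text, allocated) = decode(b"cafe");
    assert!(!allocated);
    assert_eq!(text, "cafe");
}
```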
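So whether you allocate depends on whether the input contains invalid bytes, not on how often you call it. Every call still scans the slice to validate it, but valid input never touches the heap.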
Don’t undo it with a reflexive .to_string()
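The anti-pattern is reflexively calling .to_string() on the result. That forces an allocation even when the input was valid, and on the invalid path it copies the already-owned String a second time.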
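Here is a sketch of the difference, using a hypothetical header-name check. Both functions give the same answer, but the first pays for a heap allocation on every call, while the second reads through the Cow (which derefs to &str) and allocates only when the input is invalid.

```rust
// Anti-pattern: .to_string() copies into a new String on every call,
// even when from_utf8_lossy had returned a zero-copy Cow::Borrowed.
fn is_content_type_eager(raw: &[u8]) -> bool {
    let name: String = String::from_utf8_lossy(raw).to_string();
    name.eq_ignore_ascii_case("content-type")
}

// Preferred: read through the Cow; it derefs to &str, so no copy is needed.
fn is_content_type(raw: &[u8]) -> bool {
    let name = String::from_utf8_lossy(raw);
    name.eq_ignore_ascii_case("content-type")
}

fn main() {
    let inputs: [&[u8]; 3] = [b"Content-Type", b"Host", b"content-type\xFF"];
    for raw in inputs {
        // Same answer either way; only the allocation behavior differs.
        assert_eq!(is_content_type_eager(raw), is_content_type(raw));
    }
}
```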
Keep the Cow for as long as you’re only reading. If a caller genuinely needs ownership, call into_owned(): it allocates on the borrowed path but hands back the existing String on the owned path, so nothing is allocated twice.
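A sketch that checks this with pointer comparisons follows. Both helper names are hypothetical. The check relies on a String’s heap buffer keeping its address when the String is moved, so matching addresses mean no new allocation was made; for .to_string(), the original Cow is still alive, so the copy must live in a separate buffer.

```rust
use std::borrow::Cow;

// Hypothetical helper: does into_owned() hand back the buffer from_utf8_lossy already built?
fn into_owned_reuses_buffer(bytes: &[u8]) -> bool {
    let text = String::from_utf8_lossy(bytes);
    let was_owned = matches!(text, Cow::Owned(_));
    let before = text.as_ptr();
    // Owned: the String is moved out as-is. Borrowed: a new String is allocated.
    let owned: String = text.into_owned();
    was_owned && owned.as_ptr() == before
}

// Hypothetical helper: does to_string() copy even when the Cow already owns a String?
// Meant for non-empty input, where two live buffers must have different addresses.
fn to_string_copies(bytes: &[u8]) -> bool {
    let text = String::from_utf8_lossy(bytes);
    let copy: String = text.to_string();
    copy.as_ptr() != text.as_ptr()
}

fn main() {
    let invalid: &[u8] = b"caf\xFF";
    assert!(into_owned_reuses_buffer(invalid));
    assert!(to_string_copies(invalid));

    // On the borrowed path, into_owned() allocates exactly once.
    let owned = String::from_utf8_lossy(b"cafe").into_owned();
    assert_eq!(owned, "cafe");
}
```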
When you’re decoding bytes you’ll mostly just inspect, let the result of from_utf8_lossy stay a Cow. Valid input, the usual case, flows through without touching the heap.
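As an illustration, here is a hypothetical log scan that decodes each line lossily and only inspects it. The count_errors name and the choice to split on b'\n' are assumptions for the example; the point is that each Cow is read and then dropped, so only lines containing invalid bytes ever allocate.

```rust
// Hypothetical example: count the lines of a byte buffer that start with "ERROR".
// Each line is decoded lossily and only inspected, so valid lines stay borrowed;
// only a line containing invalid bytes allocates a String (and only briefly).
fn count_errors(log: &[u8]) -> usize {
    log.split(|&b| b == b'\n')
        .map(String::from_utf8_lossy)
        .filter(|line| line.starts_with("ERROR"))
        .count()
}

fn main() {
    let log: &[u8] = b"ERROR disk full\nINFO started\nERROR bad \xFF byte\n";
    println!("{} error lines", count_errors(log));
}
```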