This CL avoids allocating in utf16.Decode for code point sequences
with less than 64 elements. It does so by splitting the function in two,
one that can be inlined that preallocates a buffer and the other that
does the heavy-lifting.
The mid-stack inliner will allocate the buffer in the caller stack,
and in many cases this will be enough to avoid the allocation.
unicode/utf16 benchmarks:
name old time/op new time/op delta
DecodeValidASCII-12 60.1ns ± 3% 16.0ns ±20% -73.40% (p=0.000 n=8+10)
DecodeValidJapaneseChars-12 61.3ns ±10% 14.9ns ±39% -75.71% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
DecodeValidASCII-12 48.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10)
DecodeValidJapaneseChars-12 48.0B ± 0% 0.0B -100.00% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
DecodeValidASCII-12 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
DecodeValidJapaneseChars-12 1.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
I've also benchmarked os.File.ReadDir with this change applied
to demonstrate that it does make a difference in the caller site, in this
case via syscall.UTF16ToString:
name old time/op new time/op delta
ReadDir-12 592µs ± 8% 620µs ±16% ~ (p=0.280 n=10+10)
name old alloc/op new alloc/op delta
ReadDir-12 30.4kB ± 0% 22.4kB ± 0% -26.10% (p=0.000 n=8+10)
name old allocs/op new allocs/op delta
ReadDir-12 402 ± 0% 272 ± 0% -32.34% (p=0.000 n=10+10)
Change-Id: I65cf5caa3fd3b3a466c0ed837a50a96e975bbe6b
Reviewed-on: https://go-review.googlesource.com/c/go/+/453415 Reviewed-by: Damien Neil <dneil@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Quim Muntal <quimmuntal@gmail.com>