With the Garbage collector (GC), we observed a false-sharing between
work.full and work.empty. (referenced most from runtime.gcDrain and
runtime.getempty)
This false-sharing becomes worse and impact performance on multi core
system. On Intel Xeon 8480+ and default GC setting(GC=100), we can
observed top HITM>4% (by perf c2c)caused by it.
After resolveed this false-sharing issue, we can get performance 8%~9.7%
improved. Verify workloads:
DeathStarBench/hotelReservation: 9.7% of RPS improved
https://github.com/delimitrou/DeathStarBench/tree/master/hotelReservation
gRPC-go/benchmark: 8% of RPS improved
https://github.com/grpc/grpc-go/tree/master/benchmark
gRPC-go/benchmark 9 iterations' data with master branch:
master w/ fs opt.
208862.4 246390.9
221680.0 266019.3
223886.9 248789.7
212169.3 257837.8
219922.4 234331.8
197401.7 261627.7
214562.4 255429.7
214328.5 237087.8
229443.2 230591.3
max 229443.2 266019.3 116%
med 214562.4 248789.7 116%
avg 215806.3 248678.5 115%
Change-Id: Ib386de021cd2dbb802a107f487556d848ba9212d
Reviewed-on: https://go-review.googlesource.com/c/go/+/496915
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
type workType struct {
full lfstack // lock-free list of full blocks workbuf
+ _ cpu.CacheLinePad // prevents false-sharing between full and empty
empty lfstack // lock-free list of empty blocks workbuf
- pad0 cpu.CacheLinePad // prevents false-sharing between full/empty and nproc/nwait
+ _ cpu.CacheLinePad // prevents false-sharing between empty and nproc/nwait
wbufSpans struct {
lock mutex