runtime: implement GC pacer redesign

This change implements the GC pacer redesign outlined in #44167 and the
accompanying design document, behind a GOEXPERIMENT flag that is on by
default.

In addition to adding the new pacer, this CL also includes code to track
and account for stack and globals scan work in the pacer and in the
assist credit system.

The new pacer also deviates slightly from the document in that it
increases the bound on the minimum trigger ratio from 0.6 (scaled by
GOGC) to 0.7. The reasoning is that the new pacer hits the heap goal
much more consistently (good!), which leads to slightly less frequent
but _longer_ GC cycles (in this case, bad!). It turns out that the cost
of having the GC on hurts throughput significantly (per byte of memory
used), though tail latencies can improve by up to 10%! To be
conservative, this change moves the value to 0.7, where there is a small
improvement to both throughput and latency, given the memory use.
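
As a rough illustration of what raising this bound means, the sketch
below places the earliest allowed trigger point at a fixed fraction of
the distance between the live heap and the heap goal. The function name
minTrigger and its parameters are hypothetical, and the formula is a
simplification for illustration, not the runtime's actual code.

```go
package main

import "fmt"

// minTrigger is a hypothetical helper: it returns the smallest heap size
// (in bytes) at which a GC cycle may start, assuming the bound is a
// percentage of the runway between the live heap and the heap goal.
func minTrigger(liveHeap uint64, gogc, boundPercent uint64) uint64 {
	// The runway is the amount the heap may grow before reaching the goal.
	runway := liveHeap * gogc / 100
	return liveHeap + runway*boundPercent/100
}

func main() {
	const live = 100 << 20 // assume 100 MiB of live heap, GOGC=100

	fmt.Println(minTrigger(live, 100, 60)>>20, "MiB") // 160 MiB
	fmt.Println(minTrigger(live, 100, 70)>>20, "MiB") // 170 MiB
}
```

Under these assumptions a higher bound starts the cycle later, which
shortens the time the GC (and its write barrier) is on, at the cost of
less headroom before the goal.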

Because the new pacer accounts for the two most significant sources of
scan work after heap objects, it is now also safer to reduce the minimum
heap size without leading to very poor amortization. This change thus
decreases the minimum heap size to 512 KiB, which reflects the fact that
the runtime always has around 200 KiB of scannable globals up front,
providing a baseline of scan work.
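
To make the amortization argument concrete, here is a minimal sketch
that assumes the heap goal gains extra runway proportional to stack and
globals scan work, as described in the design document for #44167. The
name heapGoal, its parameters, and the simple clamp to a minimum are
assumptions made for illustration; the runtime applies its minimum
differently in detail.

```go
package main

import "fmt"

const KiB = 1 << 10

// heapGoal is a hypothetical helper: the live heap plus GOGC percent of
// all scannable memory (heap, stacks, globals), clamped to minHeap.
func heapGoal(liveHeap, stacks, globals, gogc, minHeap uint64) uint64 {
	goal := liveHeap + (liveHeap+stacks+globals)*gogc/100
	if goal < minHeap {
		// Tiny heaps still get at least minHeap before the next cycle.
		return minHeap
	}
	return goal
}

func main() {
	// Assumed inputs: 8 KiB of stacks and 200 KiB of globals.
	fmt.Println(heapGoal(64*KiB, 8*KiB, 200*KiB, 100, 512*KiB)/KiB, "KiB")   // 512 KiB
	fmt.Println(heapGoal(1024*KiB, 8*KiB, 200*KiB, 100, 512*KiB)/KiB, "KiB") // 2256 KiB
}
```

In the first call the computed goal (336 KiB) falls below the minimum,
so the minimum applies; in the second the roots add 208 KiB of runway
on top of the usual doubling.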

Benchmark results: https://perf.golang.org/search?q=upload:20211001.6

tile38's KNearest benchmark shows a memory increase, but throughput (and
latency) per byte of memory used is better.

gopher-lua showed an increase in both CPU time and memory usage, but
subsequent attempts to reproduce this behavior gave inconsistent
results: sometimes the overall performance is better, sometimes it's
worse. This suggests that the benchmark is fairly noisy in a way not
captured by the benchmarking framework itself.

biogo-igor is the only benchmark to show a significant performance loss.
This benchmark exhibits a very high GC rate, with relatively little work
to do in each cycle. The idle mark workers are quite active. In the new
pacer, mark phases are longer, mark assists are fewer, and some of that
time in mark assists has shifted to idle workers. Linux perf indicates
that the difference in CPU time can be mostly attributed to calls
related to the write-barrier slow path, which in turn indicates that the
write barrier being on for longer is the primary culprit. This also
explains the memory
increase, as a longer mark phase leads to more memory allocated black,
surviving an extra cycle and contributing to the heap goal.

For #44167.

Change-Id: I8ac7cfef7d593e4a642c9b2be43fb3591a8ec9c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/309869
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>

author	Michael Anthony Knyszek <mknyszek@google.com>
	Tue, 13 Apr 2021 03:07:27 +0000 (03:07 +0000)
committer	Michael Knyszek <mknyszek@google.com>
	Thu, 4 Nov 2021 20:00:31 +0000 (20:00 +0000)
commit	a108b280bc724779ebaa6656d35f0fb307fb2a9b
tree	a3966a35f17d9a768c17ad140446d18d5d916870
parent	988efd58197205060ace508d29984fbab6eb3840

src/internal/goexperiment/exp_pacerredesign_off.go	[new file with mode: 0644]
src/internal/goexperiment/exp_pacerredesign_on.go	[new file with mode: 0644]
src/internal/goexperiment/flags.go
src/runtime/export_test.go
src/runtime/mgc.go
src/runtime/mgcmark.go
src/runtime/mgcpacer.go
src/runtime/mgcpacer_test.go
src/runtime/mgcwork.go