cmd/compile: add prefetch intrinsic support on PPC64
This CL enables intrinsic support on PPC64 for the following prefetch
operations, which the compiler already emits on other platforms:
1. Prefetch - prefetches data from a memory address into the cache;
2. PrefetchStreamed - prefetches data from a memory address, with a
   hint that this data is being streamed.
Benchmarks picked from go/test/bench/garbage
Parameters tested with:
GOMAXPROCS=8
tree2 -heapsize=1000000000 -cpus=8
tree -n=18
parser
peano
Performance results with this change on POWER9
name      old time/op  new time/op  delta
Tree2-8   75.3ms ± 2%  65.0ms ± 6%  -13.61%  (p=0.003 n=5+7)
Tree-8     576ms ± 2%   576ms ± 1%     ~     (p=0.756 n=11+10)
Parser-8   3.60s ± 2%   3.59s ± 1%     ~     (p=0.818 n=6+6)
Peano-8   84.8ms ± 1%  84.6ms ± 1%     ~     (p=0.180 n=6+6)
Results on POWER8 and POWER10 are similar.
Change-Id: If4ac95a85aaa7b2266014e1f8fb7cd7440cbf906
Reviewed-on: https://go-review.googlesource.com/c/go/+/353730
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Trust: Michael Knyszek <mknyszek@google.com>