internal/bytealg: optimize Count/CountString in arm64
For #63678
goos: darwin
goarch: arm64
pkg: strings
│ count_old.txt │ count_new.txt │
│ sec/op │ sec/op vs base │
CountHard1-8 368.7µ ± 11% 332.0µ ± 1% -9.95% (p=0.002 n=10)
CountHard2-8 348.8µ ± 5% 333.1µ ± 1% -4.51% (p=0.000 n=10)
CountHard3-8 402.7µ ± 25% 359.5µ ± 1% -10.75% (p=0.000 n=10)
CountTorture-8 10.536µ ± 23% 9.913µ ± 0% -5.91% (p=0.000 n=10)
CountTortureOverlapping-8 74.86µ ± 9% 67.56µ ± 1% -9.75% (p=0.000 n=10)
CountByte/10-8 6.905n ± 3% 6.690n ± 1% -3.11% (p=0.001 n=10)
CountByte/32-8 3.247n ± 13% 3.207n ± 2% -1.23% (p=0.030 n=10)
CountByte/4096-8 83.72n ± 1% 82.58n ± 1% -1.36% (p=0.007 n=10)
CountByte/
4194304-8 85.17µ ± 5% 84.02µ ± 8% ~ (p=0.075 n=10)
CountByte/
67108864-8 1.497m ± 8% 1.397m ± 2% -6.69% (p=0.000 n=10)
geomean 9.977µ 9.426µ -5.53%
│ count_old.txt │ count_new.txt │
│ B/s │ B/s vs base │
CountByte/10-8 1.349Gi ± 3% 1.392Gi ± 1% +3.20% (p=0.002 n=10)
CountByte/32-8 9.180Gi ± 11% 9.294Gi ± 2% +1.24% (p=0.029 n=10)
CountByte/4096-8 45.57Gi ± 1% 46.20Gi ± 1% +1.38% (p=0.007 n=10)
CountByte/
4194304-8 45.86Gi ± 5% 46.49Gi ± 7% ~ (p=0.075 n=10)
CountByte/
67108864-8 41.75Gi ± 8% 44.74Gi ± 2% +7.16% (p=0.000 n=10)
geomean 16.10Gi 16.55Gi +2.85%
Change-Id: Ifc2173ba3a926b0fa9598372d4404b8645929d45
Reviewed-on: https://go-review.googlesource.com/c/go/+/538116
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: shuang cui <imcusg@gmail.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>