math/rand/v2: add, optimize N, UintN, Uint32N, Uint64N
Now that we can break the value stream, we can take advantage
of better algorithms that have been suggested since the original
code was written.
Also optimizes IntN, Int32N, Int64N, Perm (indirectly).
All the N variants (IntN, Int32N, Int64N, UintN, N, etc) now
return the same values given a Source and parameter n, so that
for example uint(r.IntN(10)) and r.UintN(10) and r.N(uint(10))
are completely interchangeable.
Int64N4e18 gets slower but that is a near worst case for
the algorithm and is extremely unlikely in practice.
32-bit Int32N variants got slower too, by 15-30%, in exchange
for speeding up everything on 64-bit systems and consistency
across the N functions.
Also rename previously missed benchmark
GlobalInt63Parallel to GlobalInt64Parallel.
goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│
11ad9fdddc.amd64 │
4d84a369d1.amd64 │
│ sec/op │ sec/op vs base │
SourceUint64-32 1.335n ± 1% 1.348n ± 2% ~ (p=0.335 n=20)
GlobalInt64-32 2.046n ± 1% 2.082n ± 2% ~ (p=0.310 n=20)
GlobalInt63Parallel-32 0.1037n ± 1%
GlobalInt64Parallel-32 0.1036n ± 1%
GlobalUint64-32 2.075n ± 0% 2.077n ± 2% ~ (p=0.228 n=20)
GlobalUint64Parallel-32 0.1013n ± 1% 0.1012n ± 1% ~ (p=0.878 n=20)
Int64-32 1.726n ± 2% 1.750n ± 0% +1.39% (p=0.000 n=20)
Uint64-32 1.673n ± 1% 1.707n ± 2% +2.03% (p=0.002 n=20)
GlobalIntN1000-32 3.895n ± 2% 3.192n ± 1% -18.05% (p=0.000 n=20)
IntN1000-32 3.403n ± 1% 2.462n ± 2% -27.65% (p=0.000 n=20)
Int64N1000-32 3.053n ± 2% 2.470n ± 1% -19.11% (p=0.000 n=20)
Int64N1e8-32 2.718n ± 1% 2.503n ± 2% -7.91% (p=0.000 n=20)
Int64N1e9-32 2.712n ± 1% 2.487n ± 1% -8.31% (p=0.000 n=20)
Int64N2e9-32 2.690n ± 1% 2.487n ± 1% -7.57% (p=0.000 n=20)
Int64N1e18-32 3.084n ± 2% 3.006n ± 2% -2.53% (p=0.000 n=20)
Int64N2e18-32 4.026n ± 1% 3.368n ± 1% -16.33% (p=0.000 n=20)
Int64N4e18-32 4.049n ± 2% 4.763n ± 1% +17.62% (p=0.000 n=20)
Int32N1000-32 2.730n ± 0% 2.403n ± 1% -11.94% (p=0.000 n=20)
Int32N1e8-32 2.916n ± 2% 2.405n ± 1% -17.53% (p=0.000 n=20)
Int32N1e9-32 3.375n ± 1% 2.402n ± 2% -28.83% (p=0.000 n=20)
Int32N2e9-32 3.292n ± 1% 2.384n ± 1% -27.58% (p=0.000 n=20)
Float32-32 2.673n ± 1% 2.641n ± 2% ~ (p=0.147 n=20)
Float64-32 2.485n ± 1% 2.483n ± 1% ~ (p=0.804 n=20)
ExpFloat64-32 3.577n ± 2% 3.486n ± 2% -2.57% (p=0.000 n=20)
NormFloat64-32 3.797n ± 2% 3.648n ± 1% -3.92% (p=0.000 n=20)
Perm3-32 35.79n ± 2% 33.04n ± 1% -7.68% (p=0.000 n=20)
Perm30-32 205.1n ± 1% 171.9n ± 1% -16.14% (p=0.000 n=20)
Perm30ViaShuffle-32 111.2n ± 2% 100.3n ± 1% -9.76% (p=0.000 n=20)
ShuffleOverhead-32 100.5n ± 2% 102.5n ± 1% +1.99% (p=0.007 n=20)
Concurrent-32 2.188n ± 5% 2.101n ± 0% ~ (p=0.013 n=20)
goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
│
11ad9fdddc.arm64 │
4d84a369d1.arm64 │
│ sec/op │ sec/op vs base │
SourceUint64-8 2.272n ± 1% 2.261n ± 1% ~ (p=0.172 n=20)
GlobalInt64-8 2.155n ± 1% 2.160n ± 1% ~ (p=0.482 n=20)
GlobalInt63Parallel-8 0.4352n ± 0%
GlobalInt64Parallel-8 0.4299n ± 0%
GlobalUint64-8 2.173n ± 1% 2.169n ± 1% ~ (p=0.262 n=20)
GlobalUint64Parallel-8 0.4340n ± 0% 0.4293n ± 1% -1.08% (p=0.000 n=20)
Int64-8 2.544n ± 1% 2.473n ± 1% -2.83% (p=0.000 n=20)
Uint64-8 2.552n ± 1% 2.453n ± 1% -3.90% (p=0.000 n=20)
GlobalIntN1000-8 3.856n ± 0% 2.814n ± 2% -27.02% (p=0.000 n=20)
IntN1000-8 3.820n ± 0% 2.933n ± 2% -23.22% (p=0.000 n=20)
Int64N1000-8 3.219n ± 2% 2.934n ± 2% -8.85% (p=0.000 n=20)
Int64N1e8-8 3.221n ± 2% 2.935n ± 2% -8.91% (p=0.000 n=20)
Int64N1e9-8 3.276n ± 2% 2.934n ± 2% -10.44% (p=0.000 n=20)
Int64N2e9-8 3.217n ± 0% 2.935n ± 2% -8.78% (p=0.000 n=20)
Int64N1e18-8 3.502n ± 2% 3.778n ± 1% +7.91% (p=0.000 n=20)
Int64N2e18-8 4.968n ± 1% 4.359n ± 1% -12.26% (p=0.000 n=20)
Int64N4e18-8 4.963n ± 0% 6.546n ± 1% +31.92% (p=0.000 n=20)
Int32N1000-8 3.189n ± 1% 2.940n ± 2% -7.81% (p=0.000 n=20)
Int32N1e8-8 3.514n ± 1% 2.937n ± 2% -16.41% (p=0.000 n=20)
Int32N1e9-8 4.133n ± 0% 2.938n ± 0% -28.91% (p=0.000 n=20)
Int32N2e9-8 4.137n ± 0% 2.938n ± 2% -28.97% (p=0.000 n=20)
Float32-8 3.468n ± 1% 3.486n ± 0% +0.52% (p=0.000 n=20)
Float64-8 3.478n ± 0% 3.480n ± 0% ~ (p=0.063 n=20)
ExpFloat64-8 4.563n ± 0% 4.533n ± 0% -0.67% (p=0.000 n=20)
NormFloat64-8 4.768n ± 0% 4.764n ± 0% -0.07% (p=0.001 n=20)
Perm3-8 28.94n ± 0% 26.66n ± 0% -7.88% (p=0.000 n=20)
Perm30-8 175.9n ± 0% 143.4n ± 0% -18.50% (p=0.000 n=20)
Perm30ViaShuffle-8 152.6n ± 1% 142.9n ± 0% -6.29% (p=0.000 n=20)
ShuffleOverhead-8 119.6n ± 1% 120.7n ± 0% +0.96% (p=0.000 n=20)
Concurrent-8 2.452n ± 3% 2.360n ± 2% -3.73% (p=0.007 n=20)
goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
│
11ad9fdddc.386 │
4d84a369d1.386 │
│ sec/op │ sec/op vs base │
SourceUint64-32 2.091n ± 1% 2.101n ± 2% ~ (p=0.672 n=20)
GlobalInt64-32 3.514n ± 2% 3.518n ± 2% ~ (p=0.723 n=20)
GlobalInt63Parallel-32 0.3197n ± 0%
GlobalInt64Parallel-32 0.3206n ± 0%
GlobalUint64-32 3.542n ± 1% 3.538n ± 1% ~ (p=0.304 n=20)
GlobalUint64Parallel-32 0.3218n ± 0% 0.3231n ± 0% ~ (p=0.071 n=20)
Int64-32 2.552n ± 2% 2.554n ± 2% ~ (p=0.693 n=20)
Uint64-32 2.566n ± 1% 2.575n ± 2% ~ (p=0.606 n=20)
GlobalIntN1000-32 5.965n ± 2% 6.292n ± 1% +5.46% (p=0.000 n=20)
IntN1000-32 4.652n ± 1% 4.735n ± 1% +1.77% (p=0.000 n=20)
Int64N1000-32 14.485n ± 1% 5.489n ± 2% -62.11% (p=0.000 n=20)
Int64N1e8-32 14.675n ± 1% 5.528n ± 2% -62.33% (p=0.000 n=20)
Int64N1e9-32 16.805n ± 2% 5.438n ± 2% -67.64% (p=0.000 n=20)
Int64N2e9-32 14.515n ± 1% 5.474n ± 1% -62.28% (p=0.000 n=20)
Int64N1e18-32 16.165n ± 1% 9.053n ± 1% -44.00% (p=0.000 n=20)
Int64N2e18-32 17.945n ± 2% 9.685n ± 2% -46.03% (p=0.000 n=20)
Int64N4e18-32 18.35n ± 2% 12.18n ± 1% -33.62% (p=0.000 n=20)
Int32N1000-32 3.608n ± 1% 4.862n ± 1% +34.77% (p=0.000 n=20)
Int32N1e8-32 3.767n ± 1% 4.758n ± 2% +26.31% (p=0.000 n=20)
Int32N1e9-32 4.130n ± 2% 4.772n ± 1% +15.54% (p=0.000 n=20)
Int32N2e9-32 4.206n ± 1% 4.847n ± 0% +15.24% (p=0.000 n=20)
Float32-32 22.18n ± 4% 22.18n ± 4% ~ (p=0.195 n=20)
Float64-32 20.75n ± 4% 21.21n ± 3% ~ (p=0.394 n=20)
ExpFloat64-32 12.58n ± 3% 12.39n ± 2% ~ (p=0.032 n=20)
NormFloat64-32 7.920n ± 3% 7.422n ± 1% -6.29% (p=0.000 n=20)
Perm3-32 40.27n ± 1% 38.00n ± 2% -5.65% (p=0.000 n=20)
Perm30-32 213.2n ± 2% 212.7n ± 1% ~ (p=0.995 n=20)
Perm30ViaShuffle-32 164.2n ± 2% 187.5n ± 2% +14.22% (p=0.000 n=20)
ShuffleOverhead-32 134.7n ± 2% 159.7n ± 1% +18.52% (p=0.000 n=20)
Concurrent-32 3.301n ± 2% 3.470n ± 0% +5.10% (p=0.000 n=20)
For #61716.
Change-Id: Id1481b04202883cd0b23e21bb58d1bca4e482bd3
Reviewed-on: https://go-review.googlesource.com/c/go/+/502500
Reviewed-by: Rob Pike <r@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>