The original goal of rounding to readable b.N
was to make it easier to eyeball times.
However, proper analysis requires tooling
(such as benchstat) anyway.
Instead, take b.N as it comes.
This will reduce the impact of external noise
such as GC on benchmarks.
This requires reworking our iteration estimates.
We used to calculate the estimated ns/op
and then divide our target ns by that estimate.
However, this order of operations was destructive
when the ns/op was very small; rounding could
hide almost an order of magnitude of variation.
Instead, multiply first, then divide.
Also, make n an int64 to avoid overflow.
Prior to this change, we attempted to cap b.N at 1e9.
Due to rounding up, it was possible to get b.N as high as 2e9.
This change consistently enforces the 1e9 cap.
This change also reduces the wall time required to run benchmarks.
Here's the impact of this change on the wall time to run
all benchmarks once with benchtime=1s on some std packages: