Now, this is embarrassing. For the whole Go 1.20 and Go 1.21 cycles, we
based RSA public key operation (verification and decryption) benchmarks
on the keys in rsa_test.go, which had E = 3. Most keys in use, including
all those generated by GenerateKey, have E = 65537. This significantly
skewed even relative benchmarks, because the new constant-time
algorithms would incur a larger slowdown for larger exponents.
I noticed this only because I got a production profile for an
application that does a lot of RSA verifications, saw ExpShort show up,
made ExpShort faster, and the crypto/rsa profiles didn't move.
We were measuring the wrong thing, and the slowdown was worse than we
thought. My apologies.
(If E had not been parametrized, it would have avoided issues like this
one, too. Grumble. https://words.filippo.io/parameters/#fn9)