internal/bytealg: port more performance-critical functions to ABIInternal
CL 308931 ported several runtime assembly functions to ABIInternal so
that compiler-generated ABIInternal calls don't go through ABI
wrappers, but it missed the runtime assembly functions that are
actually defined in internal/bytealg.
This eliminates the cost of wrappers for the BleveQuery and
GopherLuaKNucleotide benchmarks, but there's still more to do for
Tile38.