runtime: never call into race detector with retaken P
cgocall could previously invoke the race detector on an M whose P had
been retaken. The race detector would attempt to use the P-local state
from this stale P, racing with the thread that was actually wired to
that P. The result was memory corruption of ThreadSanitizer's internal
data structures that presented as hard-to-understand assertion failures
and segfaults.
Reorder cgocall so that it always acquires a P before invoking the race
detector, and add a test that stresses the interaction between cgo and
the race detector to protect against future bugs of this kind.