chore: optimize dictionary access in strip_padding numpy (#3994)

* emplace field descriptors
* reserve sufficient capacity
* remove std::move
* properly iterate through dict
* make handle casting more explicit
* Revert to old dict api