Add compile-time option PB_BUFFER_ONLY. This allows slight optimizations if only memory buffer support (as opposed to stream callbacks) is wanted. On ARM difference is -12% execution time, -4% code size when enabled.