Rewrite of the decoder (#263)

The new code is smaller (in both source size and compiled size) and faster.

# Speed

The decoder is faster on every machine I tested, though the speedup on `ParseDescriptor` varies from about 12% to 27%. `CreateArena` is about 4-5% slower. I was only able to test Intel CPUs.

### Linux Desktop
```
CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
OS: Linux

name             old time/op  new time/op  delta
CreateArena      4.72ns ± 0%  4.93ns ± 0%   +4.47%  (p=0.000 n=11+11)
ParseDescriptor  12.4µs ± 1%   9.1µs ± 1%  -26.65%  (p=0.000 n=11+11)
```

### Mac Laptop
```
CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
OS: macOS

name             old time/op  new time/op  delta
CreateArena      5.33ns ± 3%  5.58ns ± 2%   +4.69%  (p=0.000 n=12+12)
ParseDescriptor  15.0µs ± 2%  11.9µs ± 2%  -20.20%  (p=0.000 n=12+12)
```

### Linux Workstation
```
CPU: Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
OS: Linux

name             old time/op  new time/op  delta
CreateArena      5.29ns ± 0%  5.52ns ± 0%   +4.37%  (p=0.000 n=10+12)
ParseDescriptor  18.6µs ± 0%  16.4µs ± 0%  -11.54%  (p=0.000 n=12+12)
```
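
For reading the `delta` column: as far as I understand, benchstat reports the relative change in mean time, `(new - old) / old`. The sketch below recomputes it from the rounded `ParseDescriptor` values in the tables above. The `percent_delta` helper and the `timings` dict are illustrative names, not part of the benchmark script, and because the inputs are already rounded the results can differ from the reported deltas by a few tenths of a percentage point.

```python
def percent_delta(old, new):
    """Relative change from old to new, as a percentage of old."""
    return (new - old) / old * 100.0

# Rounded ParseDescriptor timings in microseconds, copied from the tables
# above as (old, new) pairs. benchstat works from unrounded samples, so its
# reported deltas will not match these exactly.
timings = {
    "Linux Desktop": (12.4, 9.1),
    "Mac Laptop": (15.0, 11.9),
    "Linux Workstation": (18.6, 16.4),
}

for machine, (old, new) in timings.items():
    print("{}: {:+.1f}%".format(machine, percent_delta(old, new)))
```

A negative value means the new code takes less time per operation.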

# Size

A few source files grow slightly because some arena functionality moved inline, but `upb/decode.c` shrinks by 30% on Linux, so the total drops by 738 bytes:

```
     VM SIZE    
 -------------- 
  +2.1%    +283    upb/json_decode.c
   +24%    +205    upb/msg.c
  +8.4%    +115    upb/upb.c
  +0.9%     +28    upb/reflection.c
  [ = ]       0    upb/def.c
  [ = ]       0    upb/encode.c
  [ = ]       0    upb/json_encode.c
  [ = ]       0    upb/table.c
 -30.3% -1.51Ki    upb/decode.c
  -0.7%    -738    TOTAL
```
diff --git a/benchmark.py b/benchmark.py
new file mode 100755
index 0000000..9c59674
--- /dev/null
+++ b/benchmark.py
@@ -0,0 +1,32 @@
+#!/usr/bin/env python3
+
+import json
+import subprocess
+import re
+
+def Run(cmd):
+  subprocess.check_call(cmd, shell=True)
+
+def RunAgainstBranch(branch, outfile, runs=12):
+  tmpfile = "/tmp/bench-output.json"
+  Run("rm -rf {}".format(tmpfile))
+  Run("git checkout {}".format(branch))
+  Run("bazel build -c opt :benchmark")
+
+  Run("./bazel-bin/benchmark --benchmark_out_format=json --benchmark_out={} --benchmark_repetitions={}".format(tmpfile, runs))
+
+  with open(tmpfile) as f:
+    bench_json = json.load(f)
+
+  with open(outfile, "w") as f:
+    for run in bench_json["benchmarks"]:
+      name = re.sub(r'^BM_', 'Benchmark', run["name"])
+      if name.endswith("_mean") or name.endswith("_median") or name.endswith("_stddev"):
+        continue
+      values = (name, run["iterations"], run["cpu_time"])
+      print("{} {} {} ns/op".format(*values), file=f)
+
+RunAgainstBranch("master", "/tmp/old.txt")
+RunAgainstBranch("decoder", "/tmp/new.txt")
+
+Run("~/go/bin/benchstat /tmp/old.txt /tmp/new.txt")