| benchmark |
| ========= |
[Build Status](https://travis-ci.org/google/benchmark)
[Windows Build Status](https://ci.appveyor.com/project/google/benchmark/branch/master)
[Coverage Status](https://coveralls.io/r/google/benchmark)
| |
A library to support the benchmarking of functions, similar to unit tests.
| |
| Discussion group: https://groups.google.com/d/forum/benchmark-discuss |
| |
| IRC channel: https://freenode.net #googlebenchmark |
| |
| Example usage |
| ------------- |
| Define a function that executes the code to be measured a |
| specified number of times: |
| |
| ```c++ |
| static void BM_StringCreation(benchmark::State& state) { |
| while (state.KeepRunning()) |
| std::string empty_string; |
| } |
| // Register the function as a benchmark |
| BENCHMARK(BM_StringCreation); |
| |
| // Define another benchmark |
| static void BM_StringCopy(benchmark::State& state) { |
| std::string x = "hello"; |
| while (state.KeepRunning()) |
| std::string copy(x); |
| } |
| BENCHMARK(BM_StringCopy); |
| |
| BENCHMARK_MAIN(); |
| ``` |
| |
| Sometimes a family of microbenchmarks can be implemented with |
| just one routine that takes an extra argument to specify which |
| one of the family of benchmarks to run. For example, the following |
| code defines a family of microbenchmarks for measuring the speed |
| of `memcpy()` calls of different lengths: |
| |
| ```c++ |
| static void BM_memcpy(benchmark::State& state) { |
  char* src = new char[state.range_x()];
  char* dst = new char[state.range_x()];
| memset(src, 'x', state.range_x()); |
| while (state.KeepRunning()) |
| memcpy(dst, src, state.range_x()); |
| state.SetBytesProcessed(int64_t(state.iterations()) * |
| int64_t(state.range_x())); |
| delete[] src; |
| delete[] dst; |
| } |
| BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(1<<10)->Arg(8<<10); |
| ``` |
| |
The preceding code is quite repetitive and can be replaced with the following
shorthand. The invocation below picks a few exponentially spaced arguments in
the specified range and generates a microbenchmark for each one.
| |
| ```c++ |
| BENCHMARK(BM_memcpy)->Range(8, 8<<10); |
| ``` |
| |
| You might have a microbenchmark that depends on two inputs. For |
| example, the following code defines a family of microbenchmarks for |
| measuring the speed of set insertion. |
| |
| ```c++ |
| static void BM_SetInsert(benchmark::State& state) { |
| while (state.KeepRunning()) { |
| state.PauseTiming(); |
| std::set<int> data = ConstructRandomSet(state.range_x()); |
| state.ResumeTiming(); |
| for (int j = 0; j < state.range_y(); ++j) |
| data.insert(RandomNumber()); |
| } |
| } |
| BENCHMARK(BM_SetInsert) |
| ->ArgPair(1<<10, 1) |
| ->ArgPair(1<<10, 8) |
| ->ArgPair(1<<10, 64) |
| ->ArgPair(1<<10, 512) |
| ->ArgPair(8<<10, 1) |
| ->ArgPair(8<<10, 8) |
| ->ArgPair(8<<10, 64) |
| ->ArgPair(8<<10, 512); |
| ``` |
| |
The preceding code is quite repetitive and can be replaced with the following
shorthand. The invocation below picks a few appropriate pairs in the product
of the two specified ranges and generates a microbenchmark for each such pair.
| |
| ```c++ |
| BENCHMARK(BM_SetInsert)->RangePair(1<<10, 8<<10, 1, 512); |
| ``` |
| |
For more complex patterns of inputs, passing a custom function to `Apply`
allows programmatic specification of an arbitrary set of arguments on which
to run the microbenchmark. The following example enumerates a dense range on
one parameter and a sparse range on the second.
| |
| ```c++ |
| static void CustomArguments(benchmark::internal::Benchmark* b) { |
| for (int i = 0; i <= 10; ++i) |
| for (int j = 32; j <= 1024*1024; j *= 8) |
| b->ArgPair(i, j); |
| } |
| BENCHMARK(BM_SetInsert)->Apply(CustomArguments); |
| ``` |
| |
Templated microbenchmarks work the same way. The following example produces
and then consumes a batch of messages on each iteration, measuring throughput
in the absence of multiprogramming:
| |
| ```c++ |
template <class Q> static void BM_Sequential(benchmark::State& state) {
| Q q; |
| typename Q::value_type v; |
| while (state.KeepRunning()) { |
| for (int i = state.range_x(); i--; ) |
| q.push(v); |
| for (int e = state.range_x(); e--; ) |
| q.Wait(&v); |
| } |
| // actually messages, not bytes: |
| state.SetBytesProcessed( |
| static_cast<int64_t>(state.iterations())*state.range_x()); |
| } |
| BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10); |
| ``` |
| |
| Three macros are provided for adding benchmark templates. |
| |
| ```c++ |
| #if __cplusplus >= 201103L // C++11 and greater. |
| #define BENCHMARK_TEMPLATE(func, ...) // Takes any number of parameters. |
| #else // C++ < C++11 |
| #define BENCHMARK_TEMPLATE(func, arg1) |
| #endif |
| #define BENCHMARK_TEMPLATE1(func, arg1) |
| #define BENCHMARK_TEMPLATE2(func, arg1, arg2) |
| ``` |
| |
In a multithreaded test (benchmark invoked by multiple threads simultaneously),
it is guaranteed that none of the threads will start until all have called
`KeepRunning`, and all will have finished before `KeepRunning` returns false. As
such, any global setup or teardown you want to do can be
wrapped in a check against the thread index:
| |
| ```c++ |
| static void BM_MultiThreaded(benchmark::State& state) { |
| if (state.thread_index == 0) { |
| // Setup code here. |
| } |
| while (state.KeepRunning()) { |
| // Run the test as normal. |
| } |
| if (state.thread_index == 0) { |
| // Teardown code here. |
| } |
| } |
| BENCHMARK(BM_MultiThreaded)->Threads(2); |
| ``` |
| |
| If the benchmarked code itself uses threads and you want to compare it to |
| single-threaded code, you may want to use real-time ("wallclock") measurements |
| for latency comparisons: |
| |
| ```c++ |
| BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime(); |
| ``` |
| |
| Without `UseRealTime`, CPU time is used by default. |
| |
To prevent a value or expression from being optimized away by the compiler,
the `benchmark::DoNotOptimize(...)` function can be used.
| |
| ```c++ |
| static void BM_test(benchmark::State& state) { |
| while (state.KeepRunning()) { |
| int x = 0; |
    for (int i = 0; i < 64; ++i) {
| benchmark::DoNotOptimize(x += i); |
| } |
| } |
| } |
| ``` |
| |
| Benchmark Fixtures |
| ------------------ |
Fixture tests are created by first defining a type that derives from
`::benchmark::Fixture` and then creating/registering the tests using the
following macros:
| |
| * `BENCHMARK_F(ClassName, Method)` |
| * `BENCHMARK_DEFINE_F(ClassName, Method)` |
| * `BENCHMARK_REGISTER_F(ClassName, Method)` |
| |
For example:
| |
| ```c++ |
| class MyFixture : public benchmark::Fixture {}; |
| |
| BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) { |
| while (st.KeepRunning()) { |
| ... |
| } |
| } |
| |
| BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) { |
| while (st.KeepRunning()) { |
| ... |
| } |
| } |
| /* BarTest is NOT registered */ |
| BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2); |
| /* BarTest is now registered */ |
| ``` |
| |
| Output Formats |
| -------------- |
The library supports multiple output formats. Use the
`--benchmark_format=<tabular|json|csv>` flag to set the format type. `tabular`
is the default format.
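
For example, to request JSON output from a benchmark binary (called
`mybenchmark` here for illustration):

```
$ ./mybenchmark --benchmark_format=json
```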
| |
The tabular format is intended to be human-readable. By default it generates
color output. The context is output on stderr and the
tabular data on stdout. Example tabular output looks like:
| ``` |
| Benchmark Time(ns) CPU(ns) Iterations |
| ---------------------------------------------------------------------- |
| BM_SetInsert/1024/1 28928 29349 23853 133.097kB/s 33.2742k items/s |
| BM_SetInsert/1024/8 32065 32913 21375 949.487kB/s 237.372k items/s |
| BM_SetInsert/1024/10 33157 33648 21431 1.13369MB/s 290.225k items/s |
| ``` |
| |
The JSON format outputs human-readable JSON split into two top-level attributes.
The `context` attribute contains information about the run in general, including
information about the CPU and the date.
The `benchmarks` attribute contains a list of every benchmark run. Example JSON
output looks like:
```json
| { |
| "context": { |
| "date": "2015/03/17-18:40:25", |
| "num_cpus": 40, |
| "mhz_per_cpu": 2801, |
| "cpu_scaling_enabled": false, |
| "build_type": "debug" |
| }, |
| "benchmarks": [ |
| { |
| "name": "BM_SetInsert/1024/1", |
| "iterations": 94877, |
| "real_time": 29275, |
| "cpu_time": 29836, |
| "bytes_per_second": 134066, |
| "items_per_second": 33516 |
| }, |
| { |
| "name": "BM_SetInsert/1024/8", |
| "iterations": 21609, |
| "real_time": 32317, |
| "cpu_time": 32429, |
| "bytes_per_second": 986770, |
| "items_per_second": 246693 |
| }, |
| { |
| "name": "BM_SetInsert/1024/10", |
| "iterations": 21393, |
| "real_time": 32724, |
| "cpu_time": 33355, |
| "bytes_per_second": 1199226, |
| "items_per_second": 299807 |
| } |
| ] |
| } |
| ``` |
| |
| The CSV format outputs comma-separated values. The `context` is output on stderr |
| and the CSV itself on stdout. Example CSV output looks like: |
| ``` |
| name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label |
| "BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942, |
| "BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115, |
| "BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06, |
| ``` |
| |
| Debug vs Release |
| ---------------- |
| By default, benchmark builds as a debug library. You will see a warning in the output when this is the case. To build it as a release library instead, use: |
| |
| ``` |
| cmake -DCMAKE_BUILD_TYPE=Release |
| ``` |
| |
To enable link-time optimisation, use:
| |
| ``` |
| cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_LTO=true |
| ``` |
| |
| Linking against the library |
| --------------------------- |
When using gcc, it is necessary to link against pthread to avoid runtime exceptions. This is due to how gcc implements `std::thread`. See [issue #67](https://github.com/google/benchmark/issues/67) for more details.
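
For example, a typical gcc link line looks like the following (a sketch; the
file names are illustrative and additional include/library paths may be needed
depending on your setup):

```
g++ mybenchmark.cc -lbenchmark -lpthread -o mybenchmark
```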