# benchmark
[Build Status](https://travis-ci.org/google/benchmark)
[Build status](https://ci.appveyor.com/project/google/benchmark/branch/master)
[Coverage Status](https://coveralls.io/r/google/benchmark)

A library to support the benchmarking of functions, similar to unit tests.

Discussion group: https://groups.google.com/d/forum/benchmark-discuss

IRC channel: https://freenode.net #googlebenchmark

[Known issues and common problems](#known-issues)

## Example usage
### Basic usage
Define a function that executes the code to be measured.

```c++
static void BM_StringCreation(benchmark::State& state) {
  while (state.KeepRunning())
    std::string empty_string;
}
// Register the function as a benchmark
BENCHMARK(BM_StringCreation);

// Define another benchmark
static void BM_StringCopy(benchmark::State& state) {
  std::string x = "hello";
  while (state.KeepRunning())
    std::string copy(x);
}
BENCHMARK(BM_StringCopy);

BENCHMARK_MAIN();
```

### Passing arguments
Sometimes a family of benchmarks can be implemented with just one routine that
takes an extra argument to specify which one of the family of benchmarks to
run. For example, the following code defines a family of benchmarks for
measuring the speed of `memcpy()` calls of different lengths:

```c++
static void BM_memcpy(benchmark::State& state) {
  char* src = new char[state.range(0)];
  char* dst = new char[state.range(0)];
  memset(src, 'x', state.range(0));
  while (state.KeepRunning())
    memcpy(dst, src, state.range(0));
  state.SetBytesProcessed(int64_t(state.iterations()) *
                          int64_t(state.range(0)));
  delete[] src;
  delete[] dst;
}
BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(1<<10)->Arg(8<<10);
```

The preceding code is quite repetitive; it can be replaced with the following
shorthand, which picks a few appropriate arguments in the specified range and
generates a benchmark for each of them.

```c++
BENCHMARK(BM_memcpy)->Range(8, 8<<10);
```

By default the arguments in the range are generated in multiples of eight, so
the command above selects [ 8, 64, 512, 4k, 8k ]. In the following code the
range multiplier is changed to multiples of two.

```c++
BENCHMARK(BM_memcpy)->RangeMultiplier(2)->Range(8, 8<<10);
```
The arguments generated are now [ 8, 16, 32, 64, 128, 256, 512, 1k, 2k, 4k, 8k ].

You might have a benchmark that depends on two or more inputs. For example, the
following code defines a family of benchmarks for measuring the speed of set
insertion.

```c++
static void BM_SetInsert(benchmark::State& state) {
  while (state.KeepRunning()) {
    state.PauseTiming();
    std::set<int> data = ConstructRandomSet(state.range(0));
    state.ResumeTiming();
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert)
    ->Args({1<<10, 1})
    ->Args({1<<10, 8})
    ->Args({1<<10, 64})
    ->Args({1<<10, 512})
    ->Args({8<<10, 1})
    ->Args({8<<10, 8})
    ->Args({8<<10, 64})
    ->Args({8<<10, 512});
```

The preceding code is quite repetitive; it can be replaced with the following
shorthand. The macro below picks a few appropriate arguments in the product of
the two specified ranges and generates a benchmark for each such pair.

```c++
BENCHMARK(BM_SetInsert)->Ranges({{1<<10, 8<<10}, {1, 512}});
```

For more complex patterns of inputs, passing a custom function to `Apply` allows
programmatic specification of an arbitrary set of arguments on which to run the
benchmark. The following example enumerates a dense range on one parameter,
and a sparse range on the second.

```c++
static void CustomArguments(benchmark::internal::Benchmark* b) {
  for (int i = 0; i <= 10; ++i)
    for (int j = 32; j <= 1024*1024; j *= 8)
      b->Args({i, j});
}
BENCHMARK(BM_SetInsert)->Apply(CustomArguments);
```

### Calculate asymptotic complexity (Big O)
Asymptotic complexity can be calculated for a family of benchmarks. The
following code will calculate the coefficient of the high-order term in the
running time, as well as the normalized root-mean-square error, for string
comparison.

```c++
static void BM_StringCompare(benchmark::State& state) {
  std::string s1(state.range(0), '-');
  std::string s2(state.range(0), '-');
  while (state.KeepRunning()) {
    benchmark::DoNotOptimize(s1.compare(s2));
  }
  state.SetComplexityN(state.range(0));
}
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(benchmark::oN);
```

As shown in the following invocation, the asymptotic complexity can also be
deduced automatically.

```c++
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity();
```

Asymptotic complexity can also be specified with a lambda function, which can
be used to customize the calculation of the high-order term.

```c++
BENCHMARK(BM_StringCompare)->RangeMultiplier(2)
    ->Range(1<<10, 1<<18)->Complexity([](int n) -> double { return n; });
```

### Templated benchmarks
Templated benchmarks work the same way. This example produces and consumes
messages of size `sizeof(v)` `state.range(0)` times. It also outputs throughput
in the absence of multiprogramming.

```c++
template <class Q> void BM_Sequential(benchmark::State& state) {
  Q q;
  typename Q::value_type v;
  while (state.KeepRunning()) {
    for (int i = state.range(0); i--; )
      q.push(v);
    for (int e = state.range(0); e--; )
      q.Wait(&v);
  }
  // actually messages, not bytes:
  state.SetBytesProcessed(
      static_cast<int64_t>(state.iterations())*state.range(0));
}
BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10);
```

Three macros are provided for adding benchmark templates.

```c++
#if __cplusplus >= 201103L // C++11 and greater.
#define BENCHMARK_TEMPLATE(func, ...) // Takes any number of parameters.
#else // C++ < C++11
#define BENCHMARK_TEMPLATE(func, arg1)
#endif
#define BENCHMARK_TEMPLATE1(func, arg1)
#define BENCHMARK_TEMPLATE2(func, arg1, arg2)
```

### Passing arbitrary arguments to a benchmark
In C++11 it is possible to define a benchmark that takes an arbitrary number
of extra arguments. The `BENCHMARK_CAPTURE(func, test_case_name, ...args)`
macro creates a benchmark that invokes `func` with the `benchmark::State` as
the first argument followed by the specified `args...`.
The `test_case_name` is appended to the name of the benchmark and
should describe the values passed.

```c++
template <class ...ExtraArgs>
void BM_takes_args(benchmark::State& state, ExtraArgs&&... extra_args) {
  [...]
}
// Registers a benchmark named "BM_takes_args/int_string_test" that passes
// the specified values to `extra_args`.
BENCHMARK_CAPTURE(BM_takes_args, int_string_test, 42, std::string("abc"));
```
Note that elements of `...args` may refer to global variables. Users should
avoid modifying global state inside of a benchmark.

### Using RegisterBenchmark(name, fn, args...)

The `RegisterBenchmark(name, func, args...)` function provides an alternative
way to create and register benchmarks.
`RegisterBenchmark(name, func, args...)` creates, registers, and returns a
pointer to a new benchmark with the specified `name` that invokes
`func(st, args...)` where `st` is a `benchmark::State` object.

Unlike the `BENCHMARK` registration macros, which can only be used at global
scope, `RegisterBenchmark` can be called anywhere. This allows for benchmark
tests to be registered programmatically.

Additionally, `RegisterBenchmark` allows any callable object to be registered
as a benchmark, including capturing lambdas and function objects.

For example:
```c++
auto BM_test = [](benchmark::State& st, auto Inputs) { /* ... */ };

int main(int argc, char** argv) {
  for (auto& test_input : { /* ... */ })
    benchmark::RegisterBenchmark(test_input.name(), BM_test, test_input);
  benchmark::Initialize(&argc, argv);
  benchmark::RunSpecifiedBenchmarks();
}
```

### Multithreaded benchmarks
In a multithreaded test (benchmark invoked by multiple threads simultaneously),
it is guaranteed that none of the threads will start until all have called
`KeepRunning`, and all will have finished before `KeepRunning` returns false.
As such, any global setup or teardown can be wrapped in a check against the
thread index:

```c++
static void BM_MultiThreaded(benchmark::State& state) {
  if (state.thread_index == 0) {
    // Setup code here.
  }
  while (state.KeepRunning()) {
    // Run the test as normal.
  }
  if (state.thread_index == 0) {
    // Teardown code here.
  }
}
BENCHMARK(BM_MultiThreaded)->Threads(2);
```

If the benchmarked code itself uses threads and you want to compare it to
single-threaded code, you may want to use real-time ("wallclock") measurements
for latency comparisons:

```c++
BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime();
```

Without `UseRealTime`, CPU time is used by default.

### Manual timing
For benchmarking something for which neither CPU time nor real time is
correct or accurate enough, completely manual timing is supported using
the `UseManualTime` function.

When `UseManualTime` is used, the benchmarked code must call
`SetIterationTime` once per iteration of the `KeepRunning` loop to
report the manually measured time.

An example use case for this is benchmarking GPU execution (e.g. OpenCL
or CUDA kernels, OpenGL or Vulkan or Direct3D draw calls), which cannot
be accurately measured using CPU time or real time. Instead, it can be
measured accurately using a dedicated API, and these measurement results
can be reported back with `SetIterationTime`.

```c++
static void BM_ManualTiming(benchmark::State& state) {
  int microseconds = state.range(0);
  std::chrono::duration<double, std::micro> sleep_duration {
    static_cast<double>(microseconds)
  };

  while (state.KeepRunning()) {
    auto start = std::chrono::high_resolution_clock::now();
    // Simulate some useful workload with a sleep
    std::this_thread::sleep_for(sleep_duration);
    auto end = std::chrono::high_resolution_clock::now();

    auto elapsed_seconds =
        std::chrono::duration_cast<std::chrono::duration<double>>(
            end - start);

    state.SetIterationTime(elapsed_seconds.count());
  }
}
BENCHMARK(BM_ManualTiming)->Range(1, 1<<17)->UseManualTime();
```

### Preventing optimisation
To prevent a value or expression from being optimized away by the compiler,
the `benchmark::DoNotOptimize(...)` and `benchmark::ClobberMemory()`
functions can be used.

```c++
static void BM_test(benchmark::State& state) {
  while (state.KeepRunning()) {
    int x = 0;
    for (int i = 0; i < 64; ++i) {
      benchmark::DoNotOptimize(x += i);
    }
  }
}
```

`DoNotOptimize(<expr>)` forces the *result* of `<expr>` to be stored in either
memory or a register. For GNU-based compilers it acts as a read/write barrier
for global memory. More specifically, it forces the compiler to flush pending
writes to memory and reload any other values as necessary.

Note that `DoNotOptimize(<expr>)` does not prevent optimizations on `<expr>`
in any way. `<expr>` may even be removed entirely when the result is already
known. For example:

```c++
/* Example 1: `<expr>` is removed entirely. */
int foo(int x) { return x + 42; }
while (...) DoNotOptimize(foo(0)); // Optimized to DoNotOptimize(42);

/* Example 2: Result of '<expr>' is only reused */
int bar(int) __attribute__((const));
while (...) DoNotOptimize(bar(0)); // Optimized to:
// int __result__ = bar(0);
// while (...) DoNotOptimize(__result__);
```

The second tool for preventing optimizations is `ClobberMemory()`. In essence,
`ClobberMemory()` forces the compiler to perform all pending writes to global
memory. Memory managed by block-scope objects must be "escaped" using
`DoNotOptimize(...)` before it can be clobbered. In the example below,
`ClobberMemory()` prevents the call to `v.push_back(42)` from being optimized
away.

```c++
static void BM_vector_push_back(benchmark::State& state) {
  while (state.KeepRunning()) {
    std::vector<int> v;
    v.reserve(1);
    benchmark::DoNotOptimize(v.data()); // Allow v.data() to be clobbered.
    v.push_back(42);
    benchmark::ClobberMemory(); // Force 42 to be written to memory.
  }
}
```

Note that `ClobberMemory()` is only available for GNU-based compilers.
### Set time unit manually
If a benchmark runs for a few milliseconds, it may be hard to visually compare
the measured times, since the output data is given in nanoseconds by default.
To set the time unit manually, specify it on the registered benchmark:

```c++
BENCHMARK(BM_test)->Unit(benchmark::kMillisecond);
```

## Controlling number of iterations
In all cases, the number of iterations for which the benchmark is run is
governed by the amount of time the benchmark takes. Concretely, the benchmark
runs for at least one iteration and at most 1e9 iterations, increasing the
count until either the CPU time exceeds the minimum time or the wallclock time
exceeds 5x the minimum time. The minimum time is set with the
`--benchmark_min_time` flag, or per benchmark by calling `MinTime` on the
registered benchmark object.

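As a sketch (reusing the hypothetical `BM_test` from the earlier examples), the minimum time can be set per benchmark like so, or globally with `--benchmark_min_time=2.0`:

```c++
// Run BM_test until at least 2 seconds have been spent per repetition,
// instead of the default minimum time.
BENCHMARK(BM_test)->MinTime(2.0);
```
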
## Reporting the mean and standard deviation of repeated benchmarks
By default each benchmark is run once and that single result is reported.
However, benchmarks are often noisy and a single result may not be
representative of the overall behavior. For this reason it's possible to
repeatedly rerun the benchmark.

The number of runs of each benchmark is specified globally by the
`--benchmark_repetitions` flag or on a per-benchmark basis by calling
`Repetitions` on the registered benchmark object. When a benchmark is run
more than once, the mean and standard deviation of the runs will be reported.

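For example, a minimal sketch (again assuming a benchmark function `BM_test`):

```c++
// Run the benchmark 10 times; the mean and standard deviation of the
// 10 runs are reported in addition to each individual run.
BENCHMARK(BM_test)->Repetitions(10);
```
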
Additionally the `--benchmark_report_aggregates_only={true|false}` flag or
`ReportAggregatesOnly(bool)` function can be used to change how repeated tests
are reported. By default the result of each repeated run is reported. When this
option is `true`, only the mean and standard deviation of the runs are reported.
Calling `ReportAggregatesOnly(bool)` on a registered benchmark object overrides
the value of the flag for that benchmark.

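Combining the two, a hedged sketch that suppresses the individual runs:

```c++
// Only the mean and standard deviation of the 10 runs are reported.
BENCHMARK(BM_test)->Repetitions(10)->ReportAggregatesOnly(true);
```
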
## Fixtures
Fixture tests are created by first defining a type that derives from
`::benchmark::Fixture` and then creating/registering the tests using the
following macros:

* `BENCHMARK_F(ClassName, Method)`
* `BENCHMARK_DEFINE_F(ClassName, Method)`
* `BENCHMARK_REGISTER_F(ClassName, Method)`

For example:

```c++
class MyFixture : public benchmark::Fixture {};

BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) {
  while (st.KeepRunning()) {
    ...
  }
}

BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) {
  while (st.KeepRunning()) {
    ...
  }
}
/* BarTest is NOT registered */
BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2);
/* BarTest is now registered */
```

## Exiting Benchmarks in Error

When errors caused by external influences, such as file I/O and network
communication, occur within a benchmark, the
`State::SkipWithError(const char* msg)` function can be used to skip that run
of the benchmark and report the error. Note that only future iterations of the
`KeepRunning()` loop are skipped. Users may explicitly `return` to exit the
benchmark immediately.

The `SkipWithError(...)` function may be used at any point within the benchmark,
including before and after the `KeepRunning()` loop.

For example:

```c++
static void BM_test(benchmark::State& state) {
  auto resource = GetResource();
  if (!resource.good()) {
    state.SkipWithError("Resource is not good!");
    // KeepRunning() loop will not be entered.
  }
  while (state.KeepRunning()) {
    auto data = resource.read_data();
    if (!resource.good()) {
      state.SkipWithError("Failed to read data!");
      break; // Needed to skip the rest of the iteration.
    }
    do_stuff(data);
  }
}
```

## Output Formats
The library supports multiple output formats. Use the
`--benchmark_format=<console|json|csv>` flag to set the format type. `console`
is the default format.

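For example, assuming a benchmark binary named `mybenchmark` (a hypothetical name), JSON output can be requested with:

```
./mybenchmark --benchmark_format=json
```
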
The Console format is intended to be a human-readable format. By default
the format generates color output. Context is output on stderr and the
tabular data on stdout. Example tabular output looks like:
```
Benchmark                Time(ns)    CPU(ns) Iterations
----------------------------------------------------------------------
BM_SetInsert/1024/1         28928      29349      23853  133.097kB/s  33.2742k items/s
BM_SetInsert/1024/8         32065      32913      21375  949.487kB/s  237.372k items/s
BM_SetInsert/1024/10        33157      33648      21431  1.13369MB/s  290.225k items/s
```

The JSON format outputs human-readable JSON split into two top-level attributes.
The `context` attribute contains information about the run in general, including
information about the CPU and the date.
The `benchmarks` attribute contains a list of every benchmark run. Example JSON
output looks like:
```json
{
  "context": {
    "date": "2015/03/17-18:40:25",
    "num_cpus": 40,
    "mhz_per_cpu": 2801,
    "cpu_scaling_enabled": false,
    "build_type": "debug"
  },
  "benchmarks": [
    {
      "name": "BM_SetInsert/1024/1",
      "iterations": 94877,
      "real_time": 29275,
      "cpu_time": 29836,
      "bytes_per_second": 134066,
      "items_per_second": 33516
    },
    {
      "name": "BM_SetInsert/1024/8",
      "iterations": 21609,
      "real_time": 32317,
      "cpu_time": 32429,
      "bytes_per_second": 986770,
      "items_per_second": 246693
    },
    {
      "name": "BM_SetInsert/1024/10",
      "iterations": 21393,
      "real_time": 32724,
      "cpu_time": 33355,
      "bytes_per_second": 1199226,
      "items_per_second": 299807
    }
  ]
}
```

The CSV format outputs comma-separated values. The `context` is output on stderr
and the CSV itself on stdout. Example CSV output looks like:
```
name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label
"BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942,
"BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115,
"BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06,
```

## Output Files
The library supports writing the output of the benchmark to a file specified
by `--benchmark_out=<filename>`. The format of the output can be specified
using `--benchmark_out_format={json|console|csv}`. Specifying
`--benchmark_out` does not suppress the console output.

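For example (again with the hypothetical `mybenchmark` binary), the following writes JSON results to a file while still printing to the console:

```
./mybenchmark --benchmark_out=results.json --benchmark_out_format=json
```
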
## Debug vs Release
By default, benchmark builds as a debug library. You will see a warning in the
output when this is the case. To build it as a release library instead, use:

```
cmake -DCMAKE_BUILD_TYPE=Release
```

To enable link-time optimisation, use

```
cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_LTO=true
```

## Linking against the library
When using gcc, it is necessary to link against pthread to avoid runtime
exceptions. This is due to how gcc implements `std::thread`.
See [issue #67](https://github.com/google/benchmark/issues/67) for more details.

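As an illustrative sketch (the binary and source names here are assumptions, and paths depend on your installation), a typical invocation might look like the following; note that `-lpthread` comes after `-lbenchmark` so the linker can resolve the library's thread dependencies:

```
g++ mybenchmark.cc -std=c++11 -lbenchmark -lpthread -o mybenchmark
```
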
## Compiler Support

Google Benchmark uses C++11 when building the library. As such we require
a modern C++ toolchain, both compiler and standard library.

The following minimum versions are strongly recommended to build the library:

* GCC 4.8
* Clang 3.4
* Visual Studio 2013

Anything older *may* work.

Note: Using the library and its headers in C++03 is supported. C++11 is only
required to build the library.

# Known Issues

### Windows

* Users must manually link `shlwapi.lib`. Failure to do so may result
  in unresolved symbols.
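
One hedged way to do this from source with MSVC is a linker comment pragma in any translation unit of the benchmark binary:

```c++
// MSVC-specific: ask the linker to pull in shlwapi.lib.
#pragma comment(lib, "shlwapi")
```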