Add user-defined counters. (#262)

* Add user counters, and move the use of bytes_processed and items_processed to the user counter logic.

Each counter is a string-value pair. The counters were
made available through the State class. Two helper virtual
methods were added to the Fixture class to allow convenient
initialization and termination of the counters: InitState()
and TerminateState(). The reporting of the counters is buggy
and is still a work in progress, to be completed in the next commits.

* fix bad removal of BenchmarkCounters code during the merge

* add myself to AUTHORS/CONTRIBUTORS

* fix printing to std::cout in csv_reporter

* bytes_per_second and items_per_second are now in the UserCounters class

* add user counters to json reporter

* move bytes_per_second and items_per_second back to their previous location

* ConsoleReporter now handles user counters correctly.

* update unit tests for user counters

* CSVReporter now prints user counters too.

* cleanup user counters

* revert changes to CMake files that should have gone into later commits

* fixture_test: fix gcc 4.6 compilation

* remove ctor with default argument

see https://github.com/google/benchmark/pull/262#discussion_r72298055

* use (auto-defined) BENCHMARK_HAS_CXX11 instead of BENCHMARK_INITLIST.

https://github.com/google/benchmark/pull/262#discussion_r72298310

* slim down the counters API

Discussions:
API complexity: https://github.com/google/benchmark/pull/262#discussion_r72298731
remove std::string dependency (WIP): https://github.com/google/benchmark/pull/262#discussion_r72298142
spacing & alignment: https://github.com/google/benchmark/pull/262#discussion_r72298422

* remove std::string dependency on public API - changed counter name storage to char*

* Counter ctor: use overloads instead of default arguments

discussion:
https://github.com/google/benchmark/pull/262#discussion_r72298055

* Use raw pointers to remove dependency on std::vector from public API.

For more info, see the discussion at https://github.com/google/benchmark/pull/262#discussion_r72319678

* Move counter implementation from benchmark.cc to counter.cc.

See discussion: https://github.com/google/benchmark/pull/262#discussion_r72298980

* Remove unused (commented-out) code.

* Moved thread counters to ThreadStats.

* Counters: fixed copy and move constructors.

* Counter: use an inplace buffer for small names.

* benchmark_test: move counters test out of CXX11 preprocessor conditional.

* Counter: fix VS2013 compilation error in char[] initialization.

* Fix typo.

* Expose counters from State.

See discussion: https://github.com/google/benchmark/pull/262#issuecomment-237156951

* Changed counters interface to map-like.

* Fix printing of user counters in ConsoleReporter.

* Applied clang-format to counter.cc and console_reporter.cc.

Command was `clang-format -style=Google -i counter.cc console_reporter.cc`.
I also applied it to all other files, but the changes were very
far-reaching, so I rolled those back.

* Rename Counter::Flags_e to Counter::Flags

* Fix use of reserved names in Counter and BenchmarkCounters.

* Counter: Fix move ctor bug + change order of members.

* Fixture: remove tentative methods InitState() and TerminateState().

* Update fixture_test to the new Fixture interface.

* BenchmarkCounters: fix a bug in the move ctor. Remove call to CHECK_LT().

CHECK_LT() was making the size_t lookup take roughly twice as long as a string lookup!

* BenchmarkCounters: add option to not print zero counters (defaults to false).

* Add test to compare counter storage and access with std::map.

* README: clarify cost of counter access modes.

* move counter access test to its own test.

* BenchmarkCounters: add move Insert()

* Counters access test: add accelerated lookup by name.

* Fix old range syntax.

* Fix missing include of cstdio

* Fix Visual Studio warning

* VS2013 and lower: fix use of snprintf()

* VS2013: fix use of char[] as a member of std::pair<>.

* change counter storage to std::map

* Remove skipZeroCounters logic

* Fix VS compilation error.

* Implement requested changes to PR #262.

* PR #262: More requested changes.

* README: cleanup counter text.

* PR #262: remove clang-format changes for preexisting code

* Complexity+Counters: fix counter flags which were being ignored.

* Document all Counter::Flag members

* fixed loss of counter values

* ConsoleReporter: remove tabular printing of user counters.

* ConsoleReporter: header printing should not be contingent on user counter names.

* Minor white space and alignment fixes.

* cxx03_test + counters: reuse the BM_empty() function.

* user counters: add note to README on how counters are gathered across threads
diff --git a/AUTHORS b/AUTHORS
index 5a545fa..8c15b87 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -19,6 +19,7 @@
 Felix Homann <linuxaudio@showlabor.de>
 Google Inc.
 Ismael Jimenez Martinez <ismael.jimenez.martinez@gmail.com>
+Joao Paulo Magalhaes <joaoppmagalhaes@gmail.com>
 JianXiong Zhou <zhoujianxiong2@gmail.com>
 Jussi Knuuttila <jussi.knuuttila@gmail.com>
 Kaito Udagawa <umireon@gmail.com>
diff --git a/CONTRIBUTORS b/CONTRIBUTORS
index 33cd941..96cd15a 100644
--- a/CONTRIBUTORS
+++ b/CONTRIBUTORS
@@ -34,6 +34,7 @@
 Evgeny Safronov <division494@gmail.com>
 Felix Homann <linuxaudio@showlabor.de>
 Ismael Jimenez Martinez <ismael.jimenez.martinez@gmail.com>
+Joao Paulo Magalhaes <joaoppmagalhaes@gmail.com>
 JianXiong Zhou <zhoujianxiong2@gmail.com>
 Jussi Knuuttila <jussi.knuuttila@gmail.com>
 Kaito Udagawa <umireon@gmail.com>
diff --git a/README.md b/README.md
index 2cfb70b..456b0a6 100644
--- a/README.md
+++ b/README.md
@@ -432,6 +432,65 @@
 /* BarTest is now registered */
 ```
 
+
+## User-defined counters
+
+You can add your own counters with user-defined names. The example below
+adds the columns "Foo", "Bar" and "Baz" to its output:
+
+```c++
+static void UserCountersExample1(benchmark::State& state) {
+  double numFoos = 0, numBars = 0, numBazs = 0;
+  while (state.KeepRunning()) {
+    // ... count Foo, Bar and Baz events
+  }
+  state.counters["Foo"] = numFoos;
+  state.counters["Bar"] = numBars;
+  state.counters["Baz"] = numBazs;
+}
+```
+
+The `state.counters` object is a `std::map` with `std::string` keys
+and `Counter` values. `Counter` behaves like a `double` through an implicit
+conversion to `double&`, so you can use the standard arithmetic assignment
+operators (`=`, `+=`, `-=`, `*=`, `/=`) to change the value of each counter.
+
+In multithreaded benchmarks, each counter is set on the calling thread only.
+When the benchmark finishes, the counters from each thread will be summed;
+the resulting sum is the value which will be shown for the benchmark.
+
+The `Counter` constructor accepts two parameters: the value as a `double`
+and a bit flag which allows you to show counters as rates and/or as
+per-thread averages:
+
+```c++
+  // Set a simple counter.
+  state.counters["Foo"] = numFoos;
+
+  // Set the counter as a rate. It will be presented divided
+  // by the duration of the benchmark.
+  state.counters["FooRate"] = benchmark::Counter(numFoos, benchmark::Counter::kIsRate);
+
+  // Set the counter as a thread-average quantity. It will
+  // be presented divided by the number of threads.
+  state.counters["FooAvg"] = benchmark::Counter(numFoos, benchmark::Counter::kAvgThreads);
+
+  // There's also a combined flag:
+  state.counters["FooAvgRate"] = benchmark::Counter(numFoos, benchmark::Counter::kAvgThreadsRate);
+```
+
+When you're compiling in C++11 mode or later, you can use `insert()` with a
+`std::initializer_list`. As with any `std::map`, it does not overwrite existing counters:
+
+```c++
+  // With C++11, this can be done:
+  state.counters.insert({{"Foo", numFoos}, {"Bar", numBars}, {"Baz", numBazs}});
+  // ... instead of:
+  state.counters["Foo"] = numFoos;
+  state.counters["Bar"] = numBars;
+  state.counters["Baz"] = numBazs;
+```
+
 ## Exiting Benchmarks in Error
 
 When errors caused by external influences, such as file I/O and network
@@ -503,7 +562,7 @@
 information about the CPU and the date.
-The `benchmarks` attribute contains a list of ever benchmark run. Example json
+The `benchmarks` attribute contains a list of every benchmark run. Example json
 output looks like:
-``` json
+```json
 {
   "context": {
     "date": "2015/03/17-18:40:25",
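Because `state.counters` is an ordinary `std::map`, the `insert()` form from the README and assignment through `operator[]` behave differently when a counter already exists: `insert()` keeps the existing entry. The sketch below illustrates this with a hypothetical, stripped-down `Counter` struct (not the library's class). The helper names `SetWithSubscript`, `SetWithInsert` and `AddToFoo` are illustrative only. The sketch also shows a compound update made through the `double&` conversion that the README describes.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for benchmark::Counter, reduced to what this sketch
// needs: a value plus the implicit conversions described in the README.
struct Counter {
  double value;
  Counter(double v = 0.) : value(v) {}
  operator double const&() const { return value; }
  operator double&() { return value; }
};

typedef std::map<std::string, Counter> Counters;

// Assigns through operator[], which overwrites an existing entry.
void SetWithSubscript(Counters& c) {
  c["Foo"] = 1;
  c["Bar"] = 2;
}

// Inserts from an initializer list. Like any std::map::insert(), this leaves
// keys that are already present untouched and only adds the missing ones.
void SetWithInsert(Counters& c) {
  c.insert({{"Foo", 10}, {"Baz", 30}});
}

// A compound update through a double& bound to the counter's value.
void AddToFoo(Counters& c, double delta) {
  double& foo = c["Foo"];
  foo += delta;
}
```

If `SetWithInsert()` runs after `SetWithSubscript()`, `Foo` still holds 1 and only `Baz` is added. So when a benchmark needs to replace a counter's value, it should assign through `operator[]`.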
diff --git a/include/benchmark/benchmark_api.h b/include/benchmark/benchmark_api.h
index 66cbd7e..58bf987 100644
--- a/include/benchmark/benchmark_api.h
+++ b/include/benchmark/benchmark_api.h
@@ -155,11 +155,13 @@
 
 #include <string>
 #include <vector>
+#include <map>
 
 #include "macros.h"
 
 #if defined(BENCHMARK_HAS_CXX11)
 #include <type_traits>
+#include <initializer_list>
 #include <utility>
 #endif
 
@@ -248,6 +250,39 @@
 // FIXME Add ClobberMemory() for non-gnu compilers
 #endif
 
+
+
+// This class is used for user-defined counters.
+class Counter {
+public:
+
+  enum Flags {
+    kDefaults   = 0,
+    // Mark the counter as a rate. It will be presented divided
+    // by the duration of the benchmark.
+    kIsRate     = 1,
+    // Mark the counter as a thread-average quantity. It will be
+    // presented divided by the number of threads.
+    kAvgThreads = 2,
+    // Mark the counter as a thread-average rate. See above.
+    kAvgThreadsRate = kIsRate|kAvgThreads
+  };
+
+  double value;
+  Flags  flags;
+
+  BENCHMARK_ALWAYS_INLINE
+  Counter(double v = 0., Flags f = kDefaults) : value(v), flags(f) {}
+
+  BENCHMARK_ALWAYS_INLINE operator double const& () const { return value; }
+  BENCHMARK_ALWAYS_INLINE operator double      & ()       { return value; }
+
+};
+
+// This is the container for the user-defined counters.
+typedef std::map<std::string, Counter> BenchmarkCounters;
+
+
 // TimeUnit is passed to a benchmark in order to specify the order of magnitude
 // for the measured time.
 enum TimeUnit { kNanosecond, kMicrosecond, kMillisecond };
@@ -438,6 +473,8 @@
   bool error_occurred_;
 
  public:
+  // Container for user-defined counters.
+  BenchmarkCounters counters;
   // Index of the executing thread. Values from [0, threads).
   const int thread_index;
   // Number of threads concurrently executing the benchmark.
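The `Counter` class above combines its flags as bits. Its converting constructor is used whenever a plain `double` is assigned to a counter. The standalone copy below reproduces the diff's class without `BENCHMARK_ALWAYS_INLINE`; the helpers `HasFlag`, `AssignDouble` and `AssignValue` are hypothetical. The sketch shows that assigning a `double` replaces the whole object and resets the flags to `kDefaults`, while writing to `value` keeps the flags.

```cpp
#include <cassert>

// Copy of the Counter class from benchmark_api.h, minus the
// BENCHMARK_ALWAYS_INLINE macro, so the sketch compiles on its own.
class Counter {
 public:
  enum Flags {
    kDefaults = 0,
    kIsRate = 1,
    kAvgThreads = 2,
    kAvgThreadsRate = kIsRate | kAvgThreads
  };

  double value;
  Flags flags;

  Counter(double v = 0., Flags f = kDefaults) : value(v), flags(f) {}

  operator double const&() const { return value; }
  operator double&() { return value; }
};

// True if the given bit is set in the counter's flags.
bool HasFlag(const Counter& c, Counter::Flags bit) {
  return (c.flags & bit) != 0;
}

// Replaces the counter with a plain double. The right-hand side goes through
// Counter(double), so the flags are reset to kDefaults.
void AssignDouble(Counter& c, double v) { c = v; }

// Updates only the value member, which keeps the flags intact.
void AssignValue(Counter& c, double v) { c.value = v; }
```

The same effect applies to any code that writes `counter = counter + other`: the sum is a `double`, so the flags are lost.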
diff --git a/include/benchmark/reporter.h b/include/benchmark/reporter.h
index 8c39e7f..63dbebb 100644
--- a/include/benchmark/reporter.h
+++ b/include/benchmark/reporter.h
@@ -19,6 +19,7 @@
 #include <string>
 #include <utility>
 #include <vector>
+#include <set>
 
 #include "benchmark_api.h"  // For forward declaration of BenchmarkReporter
 
@@ -54,7 +55,8 @@
           complexity_lambda(),
           complexity_n(0),
           report_big_o(false),
-          report_rms(false) {}
+          report_rms(false),
+          counters() {}
 
     std::string benchmark_name;
     std::string report_label;  // Empty if not set by benchmark.
@@ -93,6 +95,8 @@
     // Inform print function whether the current run is a complexity report
     bool report_big_o;
     bool report_rms;
+
+    BenchmarkCounters counters;
   };
 
   // Construct a BenchmarkReporter with the output stream set to 'std::cout'
@@ -163,7 +167,10 @@
 
  protected:
   virtual void PrintRunData(const Run& report);
+  virtual void PrintHeader(const Run& report);
+
   size_t name_field_width_;
+  bool printed_header_;
 
  private:
   bool color_output_;
@@ -184,11 +191,15 @@
 
 class CSVReporter : public BenchmarkReporter {
  public:
+  CSVReporter() : printed_header_(false) {}
   virtual bool ReportContext(const Context& context);
   virtual void ReportRuns(const std::vector<Run>& reports);
 
  private:
   void PrintRunData(const Run& report);
+
+  bool printed_header_;
+  std::set<std::string> user_counter_names_;
 };
 
 inline const char* GetTimeUnitString(TimeUnit unit) {
diff --git a/src/benchmark.cc b/src/benchmark.cc
index d37dbd9..9aab2de 100644
--- a/src/benchmark.cc
+++ b/src/benchmark.cc
@@ -37,6 +37,7 @@
 #include "colorprint.h"
 #include "commandlineflags.h"
 #include "complexity.h"
+#include "counter.h"
 #include "log.h"
 #include "mutex.h"
 #include "re.h"
@@ -145,6 +146,7 @@
     std::string report_label_;
     std::string error_message_;
     bool has_error_ = false;
+    BenchmarkCounters counters;
   };
   GUARDED_BY(GetBenchmarkMutex()) Result results;
 
@@ -249,6 +251,7 @@
     report.complexity_n = results.complexity_n;
     report.complexity = b.complexity;
     report.complexity_lambda = b.complexity_lambda;
+    report.counters = results.counters;
   }
   return report;
 }
@@ -272,6 +275,7 @@
     results.bytes_processed += st.bytes_processed();
     results.items_processed += st.items_processed();
     results.complexity_n += st.complexity_length_n();
+    internal::Increment(&results.counters, st.counters);
   }
   manager->NotifyThreadComplete();
 }
@@ -386,6 +390,7 @@
       items_processed_(0),
       complexity_n_(0),
       error_occurred_(false),
+      counters(),
       thread_index(thread_i),
       threads(n_threads),
       max_iterations(max_iters),
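The runner folds each thread's `State::counters` into a shared result with `internal::Increment()` while it holds the benchmark mutex. The following is a minimal, single-threaded sketch of that merge. It assumes a simplified `Counter` with an `int` flags field and uses hypothetical helper names. Names seen in several threads are summed, and names seen in only one thread are copied. Only `value` is updated for shared names, so the flags survive the merge.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Simplified stand-in for benchmark::Counter: a value plus its flag bits.
struct Counter {
  double value;
  int flags;
};

typedef std::map<std::string, Counter> Counters;

// Merges one thread's counters `r` into the running totals `*l`: values of
// names present in both are added, names present only in `r` are copied.
// Only `value` is touched for shared names, so the flags stored in *l stay.
void Increment(Counters* l, const Counters& r) {
  for (Counters::const_iterator it = r.begin(); it != r.end(); ++it) {
    Counters::iterator found = l->find(it->first);
    if (found == l->end()) {
      l->insert(*it);
    } else {
      found->second.value += it->second.value;
    }
  }
}

// Folds every thread's counters into one map. The real runner does this under
// a mutex as each thread finishes; this sketch simply runs sequentially.
Counters SumThreads(const std::vector<Counters>& per_thread) {
  Counters total;
  for (const Counters& c : per_thread) Increment(&total, c);
  return total;
}
```

A single pass over `r` gives the same result as the two loops in counter.cc, because each name in `r` is either added to an existing entry or inserted.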
diff --git a/src/benchmark_api_internal.h b/src/benchmark_api_internal.h
index 8b97ce6..d1ae5b7 100644
--- a/src/benchmark_api_internal.h
+++ b/src/benchmark_api_internal.h
@@ -24,6 +24,7 @@
   bool use_manual_time;
   BigO complexity;
   BigOFunc* complexity_lambda;
+  BenchmarkCounters counters;
   bool last_benchmark_instance;
   int repetitions;
   double min_time;
diff --git a/src/complexity.cc b/src/complexity.cc
index dfab791..015db4c 100644
--- a/src/complexity.cc
+++ b/src/complexity.cc
@@ -171,6 +171,22 @@
   // All repetitions should be run with the same number of iterations so we
   // can take this information from the first benchmark.
   int64_t const run_iterations = reports.front().iterations;
+  // create stats for user counters
+  struct CounterStat {
+    Counter c;
+    Stat1_d s;
+  };
+  std::map< std::string, CounterStat > counter_stats;
+  for(Run const& r : reports) {
+    for(auto const& cnt : r.counters) {
+      auto it = counter_stats.find(cnt.first);
+      if(it == counter_stats.end()) {
+        counter_stats.insert({cnt.first, {cnt.second, Stat1_d{}}});
+      } else {
+        CHECK_EQ(it->second.c.flags, cnt.second.flags);
+      }
+    }
+  }
 
   // Populate the accumulators.
   for (Run const& run : reports) {
@@ -183,6 +199,12 @@
         Stat1_d(run.cpu_accumulated_time / run.iterations, run.iterations);
     items_per_second_stat += Stat1_d(run.items_per_second, run.iterations);
     bytes_per_second_stat += Stat1_d(run.bytes_per_second, run.iterations);
+    // user counters
+    for(auto const& cnt : run.counters) {
+      auto it = counter_stats.find(cnt.first);
+      CHECK_NE(it, counter_stats.end());
+      it->second.s += Stat1_d(cnt.second, run.iterations);
+    }
   }
 
   // Get the data from the accumulator to BenchmarkReporter::Run's.
@@ -196,6 +218,11 @@
   mean_data.bytes_per_second = bytes_per_second_stat.Mean();
   mean_data.items_per_second = items_per_second_stat.Mean();
   mean_data.time_unit = reports[0].time_unit;
+  // user counters
+  for(auto const& kv : counter_stats) {
+    auto c = Counter(kv.second.s.Mean(), kv.second.c.flags);
+    mean_data.counters[kv.first] = c;
+  }
 
   // Only add label to mean/stddev if it is same for all runs
   mean_data.report_label = reports[0].report_label;
@@ -215,6 +242,11 @@
   stddev_data.bytes_per_second = bytes_per_second_stat.StdDev();
   stddev_data.items_per_second = items_per_second_stat.StdDev();
   stddev_data.time_unit = reports[0].time_unit;
+  // user counters
+  for(auto const& kv : counter_stats) {
+    auto c = Counter(kv.second.s.StdDev(), kv.second.c.flags);
+    stddev_data.counters[kv.first] = c;
+  }
 
   results.push_back(mean_data);
   results.push_back(stddev_data);
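For repetitions, each user counter gets a `Stat1_d` accumulator weighted by `run.iterations`. The mean and the standard deviation are then copied into the aggregate runs together with the counter's flags. The sketch below computes an iteration-weighted mean and population standard deviation from running sums. Treat it as an assumption about what `Stat1_d` computes, not a copy of it. The `Sample`, `MeanStdDev` and `WeightedStats` names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One repetition's contribution: the counter value and its weight (the diff
// weights each repetition by run.iterations).
struct Sample {
  double value;
  double weight;
};

struct MeanStdDev {
  double mean;
  double stddev;
};

// Iteration-weighted mean and population standard deviation, computed from
// running sums of w, w*x and w*x*x.
MeanStdDev WeightedStats(const std::vector<Sample>& samples) {
  double w = 0, wx = 0, wxx = 0;
  for (const Sample& s : samples) {
    w += s.weight;
    wx += s.weight * s.value;
    wxx += s.weight * s.value * s.value;
  }
  MeanStdDev r = {0.0, 0.0};
  if (w <= 0) return r;  // no samples: report zeros
  r.mean = wx / w;
  const double var = wxx / w - r.mean * r.mean;
  r.stddev = var > 0 ? std::sqrt(var) : 0.0;  // clamp rounding below zero
  return r;
}
```

With two repetitions of 10 iterations each, holding counter values 2 and 4, the sketch gives a mean of 3 and a standard deviation of 1.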
diff --git a/src/console_reporter.cc b/src/console_reporter.cc
index 7e0cca3..3f3de02 100644
--- a/src/console_reporter.cc
+++ b/src/console_reporter.cc
@@ -14,6 +14,7 @@
 
 #include "benchmark/reporter.h"
 #include "complexity.h"
+#include "counter.h"
 
 #include <algorithm>
 #include <cstdint>
@@ -34,6 +35,7 @@
 
 bool ConsoleReporter::ReportContext(const Context& context) {
   name_field_width_ = context.name_field_width;
+  printed_header_ = false;
 
   PrintBasicContext(&GetErrorStream(), context);
 
@@ -45,16 +47,32 @@
     color_output_ = false;
   }
 #endif
-  std::string str =
-      FormatString("%-*s %13s %13s %10s\n", static_cast<int>(name_field_width_),
-                   "Benchmark", "Time", "CPU", "Iterations");
-  GetOutputStream() << str << std::string(str.length() - 1, '-') << "\n";
 
   return true;
 }
 
+void ConsoleReporter::PrintHeader(const Run& run) {
+  std::string str =
+      FormatString("%-*s %13s %13s %10s", static_cast<int>(name_field_width_),
+                   "Benchmark", "Time", "CPU", "Iterations");
+  if(!run.counters.empty()) {
+    str += " UserCounters...";
+  }
+  std::string line = std::string(str.length(), '-');
+  GetOutputStream() << line << "\n" << str << "\n" << line << "\n";
+}
+
 void ConsoleReporter::ReportRuns(const std::vector<Run>& reports) {
-  for (const auto& run : reports) PrintRunData(run);
+  for (const auto& run : reports) {
+    // print the header if none was printed yet
+    if (!printed_header_) {
+      printed_header_ = true;
+      PrintHeader(run);
+    }
+    // As an alternative to printing the headers like this, we could sort
+    // the benchmarks by header and then print like that.
+    PrintRunData(run);
+  }
 }
 
 static void IgnoreColorPrint(std::ostream& out, LogColor, const char* fmt,
@@ -114,6 +132,11 @@
     printer(Out, COLOR_CYAN, "%10lld", result.iterations);
   }
 
+  for (auto& c : result.counters) {
+    auto const& s = HumanReadableNumber(c.second.value);
+    printer(Out, COLOR_DEFAULT, " %s=%s", c.first.c_str(), s.c_str());
+  }
+
   if (!rate.empty()) {
     printer(Out, COLOR_DEFAULT, " %*s", 13, rate.c_str());
   }
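`ConsoleReporter::PrintHeader()` draws a dashed rule above and below the header row and sizes the rule from the row's string length. The following standalone sketch of that layout uses `std::snprintf` in place of the library's internal `FormatString()`. It formats the row without a trailing newline, so the rule length equals the visible row width even when the ` UserCounters...` suffix is appended. The function names are hypothetical, and the fixed `buf` limits the row to 255 characters.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats the header row without a trailing newline, so its length is
// exactly the visible width. A user-counter suffix may be appended.
std::string HeaderRow(int name_width, bool has_user_counters) {
  char buf[256];
  std::snprintf(buf, sizeof(buf), "%-*s %13s %13s %10s", name_width,
                "Benchmark", "Time", "CPU", "Iterations");
  std::string row(buf);
  if (has_user_counters) row += " UserCounters...";
  return row;
}

// Rule, header row, rule: each rule matches the row's full width.
std::string HeaderBlock(int name_width, bool has_user_counters) {
  const std::string row = HeaderRow(name_width, has_user_counters);
  const std::string rule(row.length(), '-');
  return rule + "\n" + row + "\n" + rule + "\n";
}
```

With a 10-character name column, the plain row is 49 characters wide and the row with the suffix is 65 characters wide. Both rules match those widths.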
diff --git a/src/counter.cc b/src/counter.cc
new file mode 100644
index 0000000..4cf6fcb
--- /dev/null
+++ b/src/counter.cc
@@ -0,0 +1,68 @@
+// Copyright 2015 Google Inc. All rights reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include "counter.h"
+
+namespace benchmark {
+namespace internal {
+
+double Finish(Counter const& c, double cpu_time, double num_threads) {
+  double v = c.value;
+  if (c.flags & Counter::kIsRate) {
+    v /= cpu_time;
+  }
+  if (c.flags & Counter::kAvgThreads) {
+    v /= num_threads;
+  }
+  return v;
+}
+
+void Finish(BenchmarkCounters *l, double cpu_time, double num_threads) {
+  for (auto &c : *l) {
+    c.second = Finish(c.second, cpu_time, num_threads);
+  }
+}
+
+void Increment(BenchmarkCounters *l, BenchmarkCounters const& r) {
+  // add counters present in both or just in *l
+  for (auto &c : *l) {
+    auto it = r.find(c.first);
+    if (it != r.end()) {
+      c.second.value += it->second.value;
+    }
+  }
+  // add counters present in r, but not in *l
+  for (auto const &tc : r) {
+    auto it = l->find(tc.first);
+    if (it == l->end()) {
+      (*l)[tc.first] = tc.second;
+    }
+  }
+}
+
+bool SameNames(BenchmarkCounters const& l, BenchmarkCounters const& r) {
+  if (&l == &r) return true;
+  if (l.size() != r.size()) {
+    return false;
+  }
+  for (auto const& c : l) {
+    if ( r.find(c.first) == r.end()) {
+      return false;
+    }
+  }
+  return true;
+}
+
+} // end namespace internal
+} // end namespace benchmark
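`internal::Finish()` converts an accumulated counter into its reported value. `kIsRate` divides by the CPU time passed in, and `kAvgThreads` divides by the thread count, so `kAvgThreadsRate` applies both divisions. The sketch below mirrors that arithmetic with a plain `int` flags argument. `FinishValue` is a hypothetical name, and the unit of `cpu_time` is assumed to be seconds. For example, take a counter that sums to 100 across 4 threads over 2 seconds of CPU time. It is reported as 100, 50, 25 or 12.5, depending on the flags.

```cpp
#include <cassert>

// Flag bits with the same values as benchmark::Counter::Flags in the diff.
enum CounterFlags {
  kDefaults = 0,
  kIsRate = 1,
  kAvgThreads = 2,
  kAvgThreadsRate = kIsRate | kAvgThreads
};

// Turns an accumulated (summed) counter value into its reported value,
// following internal::Finish(): a rate is divided by the CPU time and a
// thread average by the number of threads.
double FinishValue(double value, int flags, double cpu_time,
                   double num_threads) {
  double v = value;
  if (flags & kIsRate) {
    assert(cpu_time > 0);
    v /= cpu_time;
  }
  if (flags & kAvgThreads) {
    assert(num_threads > 0);
    v /= num_threads;
  }
  return v;
}
```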
diff --git a/src/counter.h b/src/counter.h
new file mode 100644
index 0000000..d0c70a7
--- /dev/null
+++ b/src/counter.h
@@ -0,0 +1,26 @@
+// Copyright 2015 Google Inc. All rights reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include "benchmark/benchmark_api.h"
+
+namespace benchmark {
+
+// these counter-related functions are hidden to reduce API surface.
+namespace internal {
+void Finish(BenchmarkCounters *l, double time, double num_threads);
+void Increment(BenchmarkCounters *l, BenchmarkCounters const& r);
+bool SameNames(BenchmarkCounters const& l, BenchmarkCounters const& r);
+} // end namespace internal
+
+} //end namespace benchmark
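`internal::SameNames()` compares the sizes and then looks up each name from one map in the other. Because `std::map` iterates its keys in sorted order, an equivalent check can walk both maps in lockstep. The sketch below does this for a simplified `std::map<std::string, double>`; this is an assumption, since the library maps names to `Counter`.

```cpp
#include <cassert>
#include <map>
#include <string>

typedef std::map<std::string, double> Counters;  // simplified value type

// Same result as internal::SameNames(): true when both maps contain exactly
// the same keys. Both maps iterate in key order, so comparing keys pairwise
// takes linear time instead of one find() per key.
bool SameNames(const Counters& l, const Counters& r) {
  if (l.size() != r.size()) return false;
  Counters::const_iterator a = l.begin(), b = r.begin();
  for (; a != l.end(); ++a, ++b) {
    if (a->first != b->first) return false;
  }
  return true;
}
```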
diff --git a/src/csv_reporter.cc b/src/csv_reporter.cc
index 18ab3b6..6779815 100644
--- a/src/csv_reporter.cc
+++ b/src/csv_reporter.cc
@@ -24,6 +24,7 @@
 
 #include "string_util.h"
 #include "timers.h"
+#include "check.h"
 
 // File format reference: http://edoceo.com/utilitas/csv-file-format.
 
@@ -38,21 +39,51 @@
 
 bool CSVReporter::ReportContext(const Context& context) {
   PrintBasicContext(&GetErrorStream(), context);
-
-  std::ostream& Out = GetOutputStream();
-  for (auto B = elements.begin(); B != elements.end();) {
-    Out << *B++;
-    if (B != elements.end()) Out << ",";
-  }
-  Out << "\n";
   return true;
 }
 
-void CSVReporter::ReportRuns(const std::vector<Run>& reports) {
-  for (const auto& run : reports) PrintRunData(run);
+void CSVReporter::ReportRuns(const std::vector<Run>& reports) {
+  std::ostream& Out = GetOutputStream();
+
+  if (!printed_header_) {
+    // save the names of all the user counters
+    for (const auto& run : reports) {
+      for (const auto& cnt : run.counters) {
+        user_counter_names_.insert(cnt.first);
+      }
+    }
+
+    // print the header
+    for (auto B = elements.begin(); B != elements.end();) {
+      Out << *B++;
+      if (B != elements.end()) Out << ",";
+    }
+    for (auto B = user_counter_names_.begin(); B != user_counter_names_.end();) {
+      Out << ",\"" << *B++ << "\"";
+    }
+    Out << "\n";
+
+    printed_header_ = true;
+  } else {
+    // check that all the current counters are saved in the name set
+    for (const auto& run : reports) {
+      for (const auto& cnt : run.counters) {
+        CHECK(user_counter_names_.find(cnt.first) != user_counter_names_.end())
+              << "All counters must be present in each run. "
+              << "Counter named \"" << cnt.first
+              << "\" was not in the header built from the first reported runs";
+      }
+    }
+  }
+
+  // print results for each run
+  for (const auto& run : reports) {
+    PrintRunData(run);
+  }
+
 }
 
 void CSVReporter::PrintRunData(const Run& run) {
   std::ostream& Out = GetOutputStream();
 
   // Field with embedded double-quote characters must be doubled and the field
@@ -102,6 +133,13 @@
     Out << "\"" << label << "\"";
   }
   Out << ",,";  // for error_occurred and error_message
+
+  // Print user counters
+  for (const auto &ucn : user_counter_names_) {
+    auto it = run.counters.find(ucn);
+    CHECK(it != run.counters.end());
+    Out << "," << it->second;
+  }
   Out << '\n';
 }
 
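`CSVReporter` builds its header from the user counter names that appear in the first batch of reports and stores them in a `std::set`. Each row then prints its counter values in that sorted order. The sketch reproduces this column layout with a simplified, hypothetical row type made of a name plus its counters. It omits the library's fixed columns. One difference is deliberate: where the reporter `CHECK`s that every header counter exists in a run, the sketch writes an empty cell instead.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One run's user counters, keyed by name (simplified to plain doubles).
typedef std::map<std::string, double> RunCounters;
// A run reduced to its benchmark name and its counters.
typedef std::pair<std::string, RunCounters> CsvRun;

// Writes a header row and one data row per run. Counter columns follow the
// sorted order of a std::set, like user_counter_names_ in CSVReporter.
std::string WriteCsv(const std::vector<CsvRun>& runs) {
  // Collect the union of counter names across all runs.
  std::set<std::string> names;
  for (const CsvRun& run : runs) {
    for (const auto& kv : run.second) names.insert(kv.first);
  }

  std::ostringstream out;
  out << "name";
  for (const std::string& n : names) out << ",\"" << n << "\"";
  out << "\n";

  for (const CsvRun& run : runs) {
    out << "\"" << run.first << "\"";
    for (const std::string& n : names) {
      out << ",";
      RunCounters::const_iterator it = run.second.find(n);
      if (it != run.second.end()) out << it->second;  // else: empty cell
    }
    out << "\n";
  }
  return out.str();
}
```

Because the columns come from a sorted set, `Bar` precedes `Foo` in the header, whatever order the benchmark set them in.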
diff --git a/src/json_reporter.cc b/src/json_reporter.cc
index cea5f9b..5a65308 100644
--- a/src/json_reporter.cc
+++ b/src/json_reporter.cc
@@ -154,10 +154,15 @@
         << indent
         << FormatKV("items_per_second", RoundDouble(run.items_per_second));
   }
+  for(auto &c : run.counters) {
+    out << ",\n"
+        << indent
+        << FormatKV(c.first, RoundDouble(c.second));
+  }
   if (!run.report_label.empty()) {
     out << ",\n" << indent << FormatKV("label", run.report_label);
   }
   out << '\n';
 }
 
 }  // end namespace benchmark
diff --git a/test/benchmark_test.cc b/test/benchmark_test.cc
index d832f81..dfcf092 100644
--- a/test/benchmark_test.cc
+++ b/test/benchmark_test.cc
@@ -209,11 +209,27 @@
                   std::pair<int, double>(42, 3.8));
 
 void BM_non_template_args(benchmark::State& state, int, double) {
-  while (state.KeepRunning()) {
-  }
+  while (state.KeepRunning()) {}
 }
 BENCHMARK_CAPTURE(BM_non_template_args, basic_test, 0, 0);
 
+static void BM_UserCounter(benchmark::State& state) {
+  static const int depth = 1024;
+  while (state.KeepRunning()) {
+    benchmark::DoNotOptimize(CalculatePi(depth));
+  }
+  state.counters["Foo"] = 1;
+  state.counters["Bar"] = 2;
+  state.counters["Baz"] = 3;
+  state.counters["Bat"] = 5;
+#ifdef BENCHMARK_HAS_CXX11
+  state.counters.insert({{"Foo", 2}, {"Bar", 3}, {"Baz", 5}, {"Bat", 6}});
+#endif
+}
+BENCHMARK(BM_UserCounter)->Threads(8);
+BENCHMARK(BM_UserCounter)->ThreadRange(1, 32);
+BENCHMARK(BM_UserCounter)->ThreadPerCpu();
+
 #endif  // __cplusplus >= 201103L
 
 static void BM_DenseThreadRanges(benchmark::State& st) {
diff --git a/test/cxx03_test.cc b/test/cxx03_test.cc
index 4f3d0fb..a79d964 100644
--- a/test/cxx03_test.cc
+++ b/test/cxx03_test.cc
@@ -39,4 +39,10 @@
 BENCHMARK_TEMPLATE(BM_template1, long);
 BENCHMARK_TEMPLATE1(BM_template1, int);
 
+void BM_counters(benchmark::State& state) {
+  BM_empty(state);
+  state.counters["Foo"] = 2;
+}
+BENCHMARK(BM_counters);
+
 BENCHMARK_MAIN()
diff --git a/test/reporter_output_test.cc b/test/reporter_output_test.cc
index 2e6d2b2..cb52aec 100644
--- a/test/reporter_output_test.cc
+++ b/test/reporter_output_test.cc
@@ -9,8 +9,10 @@
 // ---------------------- Testing Prologue Output -------------------------- //
 // ========================================================================= //
 
-ADD_CASES(TC_ConsoleOut, {{"^Benchmark %s Time %s CPU %s Iterations$", MR_Next},
-                          {"^[-]+$", MR_Next}});
+ADD_CASES(TC_ConsoleOut,
+          {{"^[-]+$", MR_Next},
+           {"^Benchmark %s Time %s CPU %s Iterations$", MR_Next},
+           {"^[-]+$", MR_Next}});
 ADD_CASES(TC_CSVOut,
           {{"name,iterations,real_time,cpu_time,time_unit,bytes_per_second,"
             "items_per_second,label,error_occurred,error_message"}});