TraceProcessor: optimize std::sort<TimestampedTracePiece>

TraceSorter::Sort() is a very hot path in trace processor. It runs std::sort on a CircularQueue<TTP>, where TTP is TimestampedTracePiece. std::sort uses std::swap, which falls back on the generic move-based implementation (one move construction plus two move assignments) when no specialized swap is provided. TTP happens to be exactly 512 bits wide. This makes it possible to implement a very efficient swap that leverages XMM registers on x86_64. This saves 7-10% of trace ingestion time on a large trace.

Bug: 205302474
Change-Id: Ie57fc26e79599b2e27945dc5cef176c1d03d95cc
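The swap described in the commit message can be illustrated with a minimal sketch. This is not the code from the change itself: `Piece64`, its fields, and the `sketch` namespace are hypothetical stand-ins for TimestampedTracePiece, chosen only so that the type is exactly 64 bytes and trivially copyable. The idea is to load both objects into four 128-bit registers each and store them back crossed over, with a plain copy-based fallback where SSE2 is not available.

```cpp
#include <algorithm>
#include <cstdint>
#include <type_traits>
#include <vector>

#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics: _mm_loadu_si128 / _mm_storeu_si128
#endif

namespace sketch {

// Hypothetical stand-in for TimestampedTracePiece: 8 + 7 * 8 = 64 bytes.
struct Piece64 {
  int64_t timestamp;
  uint64_t payload[7];

  bool operator<(const Piece64& other) const {
    return timestamp < other.timestamp;
  }
};

static_assert(sizeof(Piece64) == 64, "swap below assumes a 512-bit type");
static_assert(std::is_trivially_copyable<Piece64>::value,
              "a raw byte-wise swap is only valid for trivially copyable types");

// Non-member swap in the type's namespace, so unqualified swap() calls
// can find it through argument-dependent lookup.
inline void swap(Piece64& a, Piece64& b) noexcept {
#if defined(__SSE2__)
  __m128i* pa = reinterpret_cast<__m128i*>(&a);
  __m128i* pb = reinterpret_cast<__m128i*>(&b);

  // Load both objects completely (4 x 128 bits each) before storing
  // anything, which also keeps self-swap correct.
  const __m128i a0 = _mm_loadu_si128(pa + 0);
  const __m128i a1 = _mm_loadu_si128(pa + 1);
  const __m128i a2 = _mm_loadu_si128(pa + 2);
  const __m128i a3 = _mm_loadu_si128(pa + 3);
  const __m128i b0 = _mm_loadu_si128(pb + 0);
  const __m128i b1 = _mm_loadu_si128(pb + 1);
  const __m128i b2 = _mm_loadu_si128(pb + 2);
  const __m128i b3 = _mm_loadu_si128(pb + 3);

  _mm_storeu_si128(pa + 0, b0);
  _mm_storeu_si128(pa + 1, b1);
  _mm_storeu_si128(pa + 2, b2);
  _mm_storeu_si128(pa + 3, b3);
  _mm_storeu_si128(pb + 0, a0);
  _mm_storeu_si128(pb + 1, a1);
  _mm_storeu_si128(pb + 2, a2);
  _mm_storeu_si128(pb + 3, a3);
#else
  // Portable fallback: plain copies through a temporary.
  Piece64 tmp = a;
  a = b;
  b = tmp;
#endif
}

}  // namespace sketch
```

Whether std::sort actually picks up such an overload depends on the standard library implementation, and some phases of the sort may move elements directly instead of swapping them, so any gain should be confirmed by measurement on a real trace.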
Perfetto is a production-grade open-source stack for performance instrumentation and trace analysis. It offers services and libraries for recording system-level and app-level traces, native and Java heap profiling, a library for analyzing traces using SQL, and a web-based UI to visualize and explore multi-GB traces.
See https://perfetto.dev/docs or the /docs/ directory for documentation.