Tprof: Performance profiling via structural aggregation and automated analysis of distributed systems traces

Lexiang Huang, Timothy Zhu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

24 Scopus citations

Abstract

The traditional approach for performance debugging relies upon performance profilers (e.g., gprof, VTune) that provide average function runtime information. These aggregate statistics help identify slow regions affecting the entire workload, but they are ill-suited for identifying slow regions that only impact a fraction of the workload, such as tail latency effects. This paper takes a new approach to performance profiling by utilizing distributed tracing systems (e.g., Dapper, Zipkin, Jaeger). Since traces provide detailed timing information on a per-request basis, it is possible to group and aggregate tracing data in many different ways to identify the slow parts of the system. Our new approach to trace aggregation uses the structure embedded within traces to hierarchically group similar traces and calculate increasingly detailed aggregate statistics based on how the traces are grouped. We also develop an automated tool for analyzing the hierarchy of statistics to identify the most likely performance issues. Our case study across two complex distributed systems illustrates how our tool is able to find multiple performance issues that lead to 10x and 28x performance improvements in terms of average and tail latency, respectively. Our comparison with a state-of-the-art industry tool shows that our tool can pinpoint performance slowdowns more accurately than current approaches.
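The idea of hierarchically grouping traces by structure and computing statistics per group can be illustrated with a minimal sketch. This is not the paper's implementation: the trace format (a list of `(span_name, duration_ms)` pairs with the root span first), the three grouping levels, and the function names `group_keys` and `aggregate` are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trace format: a list of (span_name, duration_ms) pairs,
# where the first span is the root span covering the whole request.


def group_keys(trace):
    """Return increasingly specific grouping keys for one trace."""
    names = [name for name, _ in trace]
    root = names[0]
    span_set = tuple(sorted(set(names)))
    return [
        (root,),                         # level 1: request type only
        (root, span_set),                # level 2: plus which spans occurred
        (root, span_set, tuple(names)),  # level 3: plus the order of spans
    ]


def aggregate(traces):
    """Compute latency statistics for every group at every level."""
    groups = defaultdict(list)
    for trace in traces:
        latency = trace[0][1]  # root span duration taken as request latency
        for key in group_keys(trace):
            groups[key].append(latency)
    return {
        key: {"count": len(vals), "mean": mean(vals), "max": max(vals)}
        for key, vals in groups.items()
    }
```

Comparing a coarse group's statistics with those of its finer subgroups shows whether a slowdown affects every request of a type or only the requests with a particular structure, which is the kind of partial-workload effect that whole-workload averages hide.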

Original language: English (US)
Title of host publication: SoCC 2021 - Proceedings of the 2021 ACM Symposium on Cloud Computing
Publisher: Association for Computing Machinery, Inc
Pages: 76-91
Number of pages: 16
ISBN (Electronic): 9781450386388
State: Published - Nov 1 2021
Event: 12th Annual ACM Symposium on Cloud Computing, SoCC 2021 - Virtual, Online, United States
Duration: Nov 1 2021 to Nov 4 2021

Publication series

Name: SoCC 2021 - Proceedings of the 2021 ACM Symposium on Cloud Computing

Conference

Conference: 12th Annual ACM Symposium on Cloud Computing, SoCC 2021
Country/Territory: United States
City: Virtual, Online
Period: 11/1/21 to 11/4/21

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Science Applications
