CAREER: Auto-generated experimentation for performance diagnosis of distributed systems

Project: Research project

Project Details

Description

Debugging, the process of finding and fixing problems in systems, is one of the most critical and time-consuming activities for computer scientists. This research focuses on performance debugging, one of the most challenging forms of debugging. Performance debugging is difficult because slowdowns typically do not break functionality in easily identifiable locations. Diagnosing where the slowdowns are, and their causes, requires gathering and analyzing detailed performance measurements. This is particularly challenging for slowdowns that only appear sporadically or only affect a fraction of the workload. Coupled with the fact that many large and small companies build distributed systems composed of hundreds to thousands of services/components, it is no surprise that companies often need to hire teams of specialized performance engineers to track down the main performance issues. The goal of this research is to develop new tools and methodologies for automatically diagnosing performance issues within distributed systems. Rather than identifying faulty or misbehaving components, this research tackles the harder problem of identifying fundamental inefficiencies within the design and implementation of a system. The research will pioneer a novel diagnosis approach that auto-generates experiments to validate or refute performance hypotheses. Experiments generated based on these hypotheses will be used to progressively narrow down the problem scope and identify the root cause(s) of slowdowns. The resulting tools will provide engineers insights into where and what to investigate so that their efforts will be focused on fixing problems rather than diagnosing them.

The direct benefit of this research is in developing new automated performance diagnosis methodologies and open-sourced tools for assisting both general software developers and specialized performance engineers in finding sources of slowdowns in their systems. This saves costly engineering time and could help engineers build more cost- and energy-efficient systems. By integrating code analysis and performance modeling principles into the automated tool, the ideas from this research are more easily accessible to a broader base of engineers that might not otherwise have this specialized knowledge. To have a lasting effect on debugging methodologies and practices, this project also includes a significant education component that aims to transform debugging education in undergraduate curricula through (i) developing a new debugging course, where concepts from this research will be integrated as a course module; and (ii) creating a teaching assistant module for training teaching assistants on how to teach debugging.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Status: Active
Effective start/end date: 5/1/23 to 4/30/28

Funding

  • National Science Foundation: $599,389.00
