TY - JOUR
T1 - Techniques for classifying executions of deployed software to support software engineering tasks
AU - Haran, Murali
AU - Karr, Alan
AU - Last, Michael
AU - Orso, Alessandro
AU - Porter, Adam A.
AU - Sanil, Ashish
AU - Fouche, Sandro
N1 - Funding Information:
This work was supported in part by US National Science Foundation awards CCF-0205118 to the US National Institute of Statistical Sciences (NISS), CCR-0098158 and CCR-0205265 to the University of Maryland, and CCR-0205422, CCR-0306372, and CCF-0541080 to the Georgia Institute of Technology. The authors used the R statistical computing software and the randomForest library, available at http://cran.r-project.org/, to perform all statistical analyses. Jim Jones prepared and provided the 19 single-fault program versions. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the US National Science Foundation.
PY - 2007/5
Y1 - 2007/5
AB - There is an increasing interest in techniques that support analysis and measurement of fielded software systems. These techniques typically deploy numerous instrumented instances of a software system, collect execution data when the instances run in the field, and analyze the remotely collected data to better understand the system's in-the-field behavior. One common need for these techniques is the ability to distinguish execution outcomes (e.g., to collect only data corresponding to some behavior or to determine how often and under which conditions a specific behavior occurs). Most current approaches, however, do not perform any kind of classification of remote executions and either focus on easily observable behaviors (e.g., crashes) or assume that outcome classifications are provided externally (e.g., by the users). To address the limitations of existing approaches, we have developed three techniques for automatically classifying execution data as belonging to one of several classes. In this paper, we introduce our techniques and apply them to the binary classification of passing and failing behaviors. Our three techniques impose different overheads on program instances and, thus, each is appropriate for different application scenarios. We performed several empirical studies to evaluate and refine our techniques and to investigate the trade-offs among them. Our results show that 1) the first technique can build very accurate models, but requires a complete set of execution data; 2) the second technique produces slightly less accurate models, but needs only a small fraction of the total execution data; and 3) the third technique allows for even further cost reductions by building the models incrementally, but requires some sequential ordering of the software instances' instrumentation.
UR - http://www.scopus.com/inward/record.url?scp=34247624499&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34247624499&partnerID=8YFLogxK
DO - 10.1109/TSE.2007.1004
M3 - Article
AN - SCOPUS:34247624499
SN - 0098-5589
VL - 33
SP - 287
EP - 304
JO - IEEE Transactions on Software Engineering
JF - IEEE Transactions on Software Engineering
IS - 5
ER -