TY - JOUR
T1 - Process mining on noisy logs - Can log sanitization help to improve performance?
AU - Cheng, Hsin Jung
AU - Kumar, Akhil
N1 - Funding Information:
Hsin-Jung Cheng was a visiting researcher at Penn State from Taiwan when this work was done. She was partly supported by a grant from HP. Wen Yao wrote the code for the noise generation program. We also thank the anonymous reviewers for their constructive and helpful comments.
Publisher Copyright:
© 2015 Elsevier B.V. All rights reserved.
PY - 2015/11/12
Y1 - 2015/11/12
AB - Process mining techniques are designed to read process logs and extract process models from them. However, real-world logs are often noisy, and such logs produce poor, spaghetti-like process models. We propose a technique to sanitize noisy logs by first building a classifier on a subset of the log and then applying the classifier rules to remove noisy traces from the log. The improvement in the quality of the resulting process models is evaluated on synthetic logs from benchmark models of increasing complexity, using both behavioral and structural recall and precision metrics. The results show that models mined from such preprocessed logs are superior on several evaluation metrics: they show better fidelity to the reference models and are also more compact, with fewer elements. A useful feature of the rule-based approach is that it generalizes to any noise pattern, which matters because the nature of noise varies from one log to another. The rules can also be explained and may be further modified manually. We also give results from experiments with a real dataset.
UR - http://www.scopus.com/inward/record.url?scp=84941280475&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84941280475&partnerID=8YFLogxK
U2 - 10.1016/j.dss.2015.08.003
DO - 10.1016/j.dss.2015.08.003
M3 - Article
AN - SCOPUS:84941280475
SN - 0167-9236
VL - 79
SP - 138
EP - 149
JO - Decision Support Systems
JF - Decision Support Systems
ER -