Abstract
In recent years, many algorithms have been proposed to extract process models from process execution logs. These models describe the ordering relationships between tasks in a process in terms of standard constructs such as sequence, parallel, choice, and loop. Most algorithms assume that each trace in a log represents a correct execution sequence of the underlying model. In practice, logs are often noisy, and algorithms designed for correct logs cannot handle them. In this paper, we share key insights from a study of noise in both real and synthetic process logs. We found that any process log can be explained by a block-structured model that uses two special structures, self-loop and optional, which makes it trivial to build a fully accurate process model for any given log, even one that contains inaccurate data or noise. However, such a model suffers from low quality. By controlling the use of self-loop and optional structures on tasks and blocks of tasks, we can balance the trade-off between quality and accuracy and derive high-quality process models that explain a given percentage of the traces in the log. Finally, we describe new quality metrics and a novel quality-based algorithm for extracting models from noisy logs, and we report and analyze in detail the results of experiments with the algorithm on real and synthetic data.
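
The trade-off between accuracy and quality described in the abstract can be illustrated with a minimal sketch. It is not the algorithm from the paper. It assumes that each task is a single character, that a trace is a string, and that a block-structured model built only from sequence, optional, and self-loop structures can be rendered as a regular expression. The helper names `block`, `task`, and `fraction_explained`, as well as the toy log, are hypothetical.

```python
import re

def block(parts, optional=False, self_loop=False):
    """Regex for a sequence of sub-blocks, optionally wrapped in the
    self-loop and/or optional structure (illustrative encoding only)."""
    pattern = "(?:" + "".join(parts) + ")"
    if self_loop:
        pattern += "+"                      # block may repeat one or more times
    if optional:
        pattern = "(?:" + pattern + ")?"    # block may be skipped entirely
    return pattern

def task(name, optional=False, self_loop=False):
    """A single task is the smallest block."""
    return block([re.escape(name)], optional, self_loop)

def fraction_explained(model, log):
    """Share of traces in the log that the model accepts in full."""
    compiled = re.compile(model)
    return sum(1 for trace in log if compiled.fullmatch(trace)) / len(log)

# Hypothetical toy log: three clean traces, one repeated task, one scrambled trace.
LOG = ["ABC", "ABC", "ABC", "ABBC", "CA"]
TASKS = ["A", "B", "C"]

# Plain sequence A, B, C: precise, but explains only the clean traces.
strict = block([task(t) for t in TASKS])

# Allowing a self-loop on B alone also explains the trace "ABBC".
tolerant = block([task("A"), task("B", self_loop=True), task("C")])

# Every task optional and the whole block in a self-loop: accepts any
# trace over these tasks, so it is fully accurate but says nothing
# about the ordering of tasks.
catch_all = block([task(t, optional=True) for t in TASKS], self_loop=True)

for label, model in [("strict", strict), ("tolerant", tolerant), ("catch_all", catch_all)]:
    print(label, fraction_explained(model, LOG))
```

On this toy log the three models explain 60%, 80%, and 100% of the traces respectively. The catch-all model reaches full accuracy only because it permits every ordering, which is the low-quality extreme the abstract refers to; restricting where the two special structures may appear moves the model back toward the precise end.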
| Original language | English (US) |
|---|---|
| Pages (from-to) | 311-327 |
| Number of pages | 17 |
| Journal | INFORMS Journal on Computing |
| Volume | 24 |
| Issue number | 2 |
| State | Published - Mar 2012 |
All Science Journal Classification (ASJC) codes
- Software
- Information Systems
- Computer Science Applications
- Management Science and Operations Research