TY - GEN
T1 - Searching correlated objects in a long sequence
AU - Lee, Ken C.K.
AU - Lee, Wang Chien
AU - Peuquet, Donna
AU - Zheng, Baihua
N1 - Copyright 2008 Elsevier B.V., All rights reserved.
PY - 2008
Y1 - 2008
N2 - Sequences, which appear widely in applications such as event logs and text documents, are ordered lists of objects. Exploring correlated objects in a sequence can reveal useful knowledge about the objects, e.g., event causality in event logs and word phrases in documents. In this paper, we introduce the correlation query, which finds correlated pairs of objects that often appear close to each other in a given sequence. A correlation query is specified by two control parameters: a distance bound, which sets the required closeness of objects, and a correlation threshold, which sets the minimum correlation strength of result pairs. Instead of processing the query by scanning the sequence multiple times, an approach called the Multi-Scan Algorithm (MSA), we propose the One-Scan Algorithm (OSA) and the Index-Based Algorithm (IBA). OSA accesses the queried sequence only once, while IBA takes the correlation threshold into account during execution and effectively eliminates unneeded candidates from detailed examination. An extensive set of experiments is conducted to evaluate all these algorithms. Among them, IBA significantly outperforms the others and is the most efficient.
AB - Sequences, which appear widely in applications such as event logs and text documents, are ordered lists of objects. Exploring correlated objects in a sequence can reveal useful knowledge about the objects, e.g., event causality in event logs and word phrases in documents. In this paper, we introduce the correlation query, which finds correlated pairs of objects that often appear close to each other in a given sequence. A correlation query is specified by two control parameters: a distance bound, which sets the required closeness of objects, and a correlation threshold, which sets the minimum correlation strength of result pairs. Instead of processing the query by scanning the sequence multiple times, an approach called the Multi-Scan Algorithm (MSA), we propose the One-Scan Algorithm (OSA) and the Index-Based Algorithm (IBA). OSA accesses the queried sequence only once, while IBA takes the correlation threshold into account during execution and effectively eliminates unneeded candidates from detailed examination. An extensive set of experiments is conducted to evaluate all these algorithms. Among them, IBA significantly outperforms the others and is the most efficient.
UR - http://www.scopus.com/inward/record.url?scp=49049102115&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=49049102115&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-69497-7_28
DO - 10.1007/978-3-540-69497-7_28
M3 - Conference contribution
AN - SCOPUS:49049102115
SN - 3540694765
SN - 9783540694762
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 436
EP - 454
BT - Scientific and Statistical Database Management - 20th International Conference, SSDBM 2008, Proceedings
T2 - 20th International Conference on Scientific and Statistical Database Management, SSDBM 2008
Y2 - 9 July 2008 through 11 July 2008
ER -