TY - GEN
T1 - Allan Variance-based Granulation Technique for Large Temporal Databases
AU - Sinanaj, Lorina
AU - Haeri, Hossein
AU - Gao, Liming
AU - Maddipatla, Satya Prasad
AU - Chen, Cindy
AU - Jerath, Kshitij
AU - Beal, Craig
AU - Brennan, Sean
N1 - Publisher Copyright:
Copyright © 2021 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.
PY - 2021
Y1 - 2021
N2 - In the era of Big Data, conducting complex data analysis tasks efficiently becomes increasingly important and challenging due to the large amounts of data available. In order to decrease query response time with limited main memory and storage space, data reduction techniques that preserve data quality are needed. Existing data reduction techniques, however, are often computationally expensive and rely on heuristics for deciding how to split or reduce the original dataset. In this paper, we propose an effective granular data reduction technique for temporal databases, based on Allan Variance (AVAR). AVAR is used to systematically determine the temporal window length over which data remains relevant. The entire dataset to be reduced is then separated into granules with size equal to the AVAR-determined window length. Data reduction is achieved by generating aggregated information for each such granule. The proposed method is tested using a large database that contains temporal information for vehicular data. Comparison experiments are then conducted, and the outstanding runtime performance is illustrated by comparison with three clustering-based data reduction methods. The performance results demonstrate that the proposed Allan Variance-based technique can efficiently generate a reduced representation of the original data without losing data quality, while significantly reducing computation time.
AB - In the era of Big Data, conducting complex data analysis tasks efficiently becomes increasingly important and challenging due to the large amounts of data available. In order to decrease query response time with limited main memory and storage space, data reduction techniques that preserve data quality are needed. Existing data reduction techniques, however, are often computationally expensive and rely on heuristics for deciding how to split or reduce the original dataset. In this paper, we propose an effective granular data reduction technique for temporal databases, based on Allan Variance (AVAR). AVAR is used to systematically determine the temporal window length over which data remains relevant. The entire dataset to be reduced is then separated into granules with size equal to the AVAR-determined window length. Data reduction is achieved by generating aggregated information for each such granule. The proposed method is tested using a large database that contains temporal information for vehicular data. Comparison experiments are then conducted, and the outstanding runtime performance is illustrated by comparison with three clustering-based data reduction methods. The performance results demonstrate that the proposed Allan Variance-based technique can efficiently generate a reduced representation of the original data without losing data quality, while significantly reducing computation time.
UR - http://www.scopus.com/inward/record.url?scp=85139945637&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85139945637&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85139945637
T3 - International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K - Proceedings
SP - 17
EP - 28
BT - 13th International Conference on Knowledge Management and Information Systems, KMIS 2021 as part of IC3K 2021 - Proceedings of the 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management
A2 - Bernardino, Jorge
A2 - Masciari, Elio
A2 - Rolland, Colette
A2 - Filipe, Joaquim
PB - Science and Technology Publications, Lda
T2 - 13th International Conference on Knowledge Management and Information Systems, KMIS 2021 as part of 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2021
Y2 - 25 October 2021 through 27 October 2021
ER -