TY - GEN
T1 - LP-Explain
T2 - 20th IEEE International Conference on Data Mining, ICDM 2020
AU - Liu, Haoyu
AU - Ma, Fenglong
AU - Wang, Yaqing
AU - He, Shibo
AU - Chen, Jiming
AU - Gao, Jing
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/11
Y1 - 2020/11
N2 - Outlier detection is of vital importance for various fields and applications. Existing works mainly focus on identifying outliers from underlying datasets, while how to provide sense-making explanations is largely ignored. In this paper, we propose to visualize data points in a set of scatter plots on two-dimensional (2-D) feature spaces that can provide meaningful explanations about the outlying behavior of outliers. Data are typically multidimensional and the number of 2-D combinations could be huge. Also, outliers may have diverse characteristics, and thus the global scatter plots containing all of the outliers may degrade the explanation effectiveness for those outliers having idiosyncratic abnormal 2-D spaces. To address this problem, we propose a new outlier explanation approach, called LP-Explain, which tries to identify the set of best Local Pictorial explanations (defined as the scatter plots in the 2-D space of the feature pairs) that can Explain the behavior of clusters of outliers. We first define an effective measure to quantify the similarity between outliers, and then cluster outliers into different groups based on their abnormal feature pairs. We then propose to weigh the importance of feature pairs within each cluster through a multi-task learning framework to select the set of top feature pairs that best explain various outlier clusters. By adjusting a user-defined parameter indicating the 'localization level', the proposed method can attain both global and local results for the explanation of the outliers. 2-D visual explanations can be plotted for the top-weighted feature pairs of each cluster. We conduct experiments on various public datasets, which show that the proposed approach can provide more meaningful explanations about the outlying behavior in a dataset.
AB - Outlier detection is of vital importance for various fields and applications. Existing works mainly focus on identifying outliers from underlying datasets, while how to provide sense-making explanations is largely ignored. In this paper, we propose to visualize data points in a set of scatter plots on two-dimensional (2-D) feature spaces that can provide meaningful explanations about the outlying behavior of outliers. Data are typically multidimensional and the number of 2-D combinations could be huge. Also, outliers may have diverse characteristics, and thus the global scatter plots containing all of the outliers may degrade the explanation effectiveness for those outliers having idiosyncratic abnormal 2-D spaces. To address this problem, we propose a new outlier explanation approach, called LP-Explain, which tries to identify the set of best Local Pictorial explanations (defined as the scatter plots in the 2-D space of the feature pairs) that can Explain the behavior of clusters of outliers. We first define an effective measure to quantify the similarity between outliers, and then cluster outliers into different groups based on their abnormal feature pairs. We then propose to weigh the importance of feature pairs within each cluster through a multi-task learning framework to select the set of top feature pairs that best explain various outlier clusters. By adjusting a user-defined parameter indicating the 'localization level', the proposed method can attain both global and local results for the explanation of the outliers. 2-D visual explanations can be plotted for the top-weighted feature pairs of each cluster. We conduct experiments on various public datasets, which show that the proposed approach can provide more meaningful explanations about the outlying behavior in a dataset.
UR - http://www.scopus.com/inward/record.url?scp=85100886292&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85100886292&partnerID=8YFLogxK
U2 - 10.1109/ICDM50108.2020.00046
DO - 10.1109/ICDM50108.2020.00046
M3 - Conference contribution
AN - SCOPUS:85100886292
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 372
EP - 381
BT - Proceedings - 20th IEEE International Conference on Data Mining, ICDM 2020
A2 - Plant, Claudia
A2 - Wang, Haixun
A2 - Cuzzocrea, Alfredo
A2 - Zaniolo, Carlo
A2 - Wu, Xindong
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 17 November 2020 through 20 November 2020
ER -