TY - GEN
T1 - Reconciling malware labeling discrepancy via consensus learning
AU - Wang, Ting
AU - Hu, Xin
AU - Meng, Shicong
AU - Sailer, Reiner
PY - 2014
Y1 - 2014
AB - Anti-virus systems developed by different vendors often show strong discrepancies in the labels they assign to a given malware sample, which significantly hinders threat intelligence sharing. The key challenge in addressing this discrepancy stems from the difficulty of re-standardizing systems that are already in use. In this paper we explore a non-intrusive alternative. We propose to leverage the correlation between the malware labels of different anti-virus systems to create a 'consensus' classification system, through which different systems can share information without modifying their own labeling conventions. To this end, we present Latin, a novel classification integration framework that exploits the correspondence between participating anti-virus systems as reflected in heterogeneous information at the instance-instance, instance-class, and class-class levels. We provide results from extensive experimental studies using real datasets and concrete use cases to verify the efficacy of Latin in reconciling the malware labeling discrepancy.
UR - https://www.scopus.com/pages/publications/84901772299
UR - https://www.scopus.com/inward/citedby.url?scp=84901772299&partnerID=8YFLogxK
U2 - 10.1109/ICDEW.2014.6818308
DO - 10.1109/ICDEW.2014.6818308
M3 - Conference contribution
AN - SCOPUS:84901772299
SN - 9781479934805
T3 - Proceedings - International Conference on Data Engineering
SP - 84
EP - 89
BT - 2014 IEEE 30th International Conference on Data Engineering Workshops, ICDEW 2014
PB - IEEE Computer Society
T2 - 2014 IEEE 30th International Conference on Data Engineering Workshops, ICDEW 2014
Y2 - 31 March 2014 through 4 April 2014
ER -