TY - GEN
T1 - Towards a two-phase unsupervised system for cybersecurity concepts extraction
AU - Xiao, Zhifeng
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2018/6/21
Y1 - 2018/6/21
N2 - This paper explores a novel named entity recognition approach to locate and classify cybersecurity concepts in unstructured texts. The proposed system follows a two-phase procedure. Phase one aims at named entity location, which can be achieved by any existing NER system that is well trained on generic, annotated English articles. The output of phase one is a processed text in which named entities are located and clearly marked. In phase two, we prepare two core components that define the domain knowledge. One is a word2vec-based domain model trained on a large corpus of cybersecurity articles, and the other is a domain ontology that comprises hierarchical concept classes as well as instances. With these two components and the output from phase one, we propose a voting-based model to classify the marked named entities into fine-grained classes. The proposed NER system is unsupervised, meaning that no annotated training corpus is needed. As such, it can be easily customized to suit a variety of domains in addition to cybersecurity. Our evaluation shows promising results that indicate the potential of the proposed system.
AB - This paper explores a novel named entity recognition approach to locate and classify cybersecurity concepts in unstructured texts. The proposed system follows a two-phase procedure. Phase one aims at named entity location, which can be achieved by any existing NER system that is well trained on generic, annotated English articles. The output of phase one is a processed text in which named entities are located and clearly marked. In phase two, we prepare two core components that define the domain knowledge. One is a word2vec-based domain model trained on a large corpus of cybersecurity articles, and the other is a domain ontology that comprises hierarchical concept classes as well as instances. With these two components and the output from phase one, we propose a voting-based model to classify the marked named entities into fine-grained classes. The proposed NER system is unsupervised, meaning that no annotated training corpus is needed. As such, it can be easily customized to suit a variety of domains in addition to cybersecurity. Our evaluation shows promising results that indicate the potential of the proposed system.
UR - http://www.scopus.com/inward/record.url?scp=85050246096&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85050246096&partnerID=8YFLogxK
U2 - 10.1109/FSKD.2017.8393106
DO - 10.1109/FSKD.2017.8393106
M3 - Conference contribution
AN - SCOPUS:85050246096
T3 - ICNC-FSKD 2017 - 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery
SP - 2161
EP - 2168
BT - ICNC-FSKD 2017 - 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery
A2 - Zhao, Liang
A2 - Wang, Lipo
A2 - Cai, Guoyong
A2 - Li, Kenli
A2 - Liu, Yong
A2 - Xiao, Guoqing
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery, ICNC-FSKD 2017
Y2 - 29 July 2017 through 31 July 2017
ER -