Towards a two-phase unsupervised system for cybersecurity concepts extraction

Zhifeng Xiao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Scopus citations

Abstract

This paper explores a novel named entity recognition (NER) approach to locating and classifying cybersecurity concepts in unstructured text. The proposed system follows a two-phase procedure. Phase one locates named entities, which can be done by any existing NER system that is well trained on generic, annotated English articles. The output of phase one is a processed text in which named entities are located and marked. In phase two, we prepare two core components that define the domain knowledge: a word2vec-based domain model trained on a large corpus of cybersecurity articles, and a domain ontology that comprises hierarchical concept classes and their instances. With these two components and the output of phase one, we propose a voting-based model that classifies the marked named entities into fine-grained classes. The proposed NER system is unsupervised, meaning that no annotated training corpus is needed. As such, it can be readily customized to domains other than cybersecurity. Our evaluation shows promising results that indicate the potential of the proposed system.
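The abstract does not spell out the voting scheme, so the following is only an illustrative sketch of how phase two could work, not the paper's actual method. It assumes that phase one has already marked the entities, that a nearest-neighbour majority vote over ontology instances is an acceptable stand-in for the voting-based model, and that the hand-made vectors in `EMBEDDINGS` stand in for a trained word2vec domain model. The names `EMBEDDINGS`, `ONTOLOGY`, `cosine`, and `classify` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy vectors standing in for a word2vec domain model
# trained on cybersecurity articles (values are made up for illustration).
EMBEDDINGS = {
    "wannacry": [0.90, 0.10, 0.00],
    "petya": [0.80, 0.20, 0.10],
    "zeus": [0.85, 0.05, 0.20],
    "heartbleed": [0.10, 0.90, 0.10],
    "shellshock": [0.20, 0.80, 0.00],
    "poodle": [0.15, 0.85, 0.20],
    "notpetya": [0.82, 0.15, 0.05],
}

# Hypothetical flat slice of a domain ontology: concept class -> known instances.
ONTOLOGY = {
    "Malware": ["wannacry", "petya", "zeus"],
    "Vulnerability": ["heartbleed", "shellshock", "poodle"],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(entity, k=3):
    """Assign a marked entity to the class that wins a vote among its
    k most similar ontology instances (assumed voting rule)."""
    key = entity.lower()
    vec = EMBEDDINGS.get(key)
    if vec is None:
        return None  # entity is outside the domain model's vocabulary
    scored = [
        (cosine(vec, EMBEDDINGS[inst]), cls)
        for cls, instances in ONTOLOGY.items()
        for inst in instances
        if inst in EMBEDDINGS and inst != key
    ]
    top = sorted(scored, reverse=True)[:k]
    votes = Counter(cls for _, cls in top)
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    # Entities as they might arrive from a generic phase-one NER system.
    for marked in ["NotPetya", "POODLE", "Guilin"]:
        print(marked, "->", classify(marked))
```

Because the class labels come from the ontology and the similarities from the embedding model, adapting this sketch to another domain would mean swapping those two components rather than annotating a new training corpus.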

Original language: English (US)
Title of host publication: ICNC-FSKD 2017 - 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery
Editors: Liang Zhao, Lipo Wang, Guoyong Cai, Kenli Li, Yong Liu, Guoqing Xiao
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2161-2168
Number of pages: 8
ISBN (Electronic): 9781538621653
State: Published - Jun 21 2018
Event: 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery, ICNC-FSKD 2017 - Guilin, Guangxi, China
Duration: Jul 29 2017 to Jul 31 2017

Publication series

Name: ICNC-FSKD 2017 - 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery

Other

Other: 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery, ICNC-FSKD 2017
Country/Territory: China
City: Guilin, Guangxi
Period: 7/29/17 to 7/31/17

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Information Systems
  • Information Systems and Management
  • Logic
  • Modeling and Simulation
  • Statistics and Probability
