TY - GEN
T1 - Towards the detection of inconsistencies in public security vulnerability reports
AU - Dong, Ying
AU - Guo, Wenbo
AU - Chen, Yueqi
AU - Xing, Xinyu
AU - Zhang, Yuqing
AU - Wang, Gang
PY - 2019/1/1
Y1 - 2019/1/1
N2 - Public vulnerability databases such as the Common Vulnerabilities and Exposures (CVE) and the National Vulnerability Database (NVD) have achieved great success in promoting vulnerability disclosure and mitigation. While these databases have accumulated massive data, there is a growing concern for their information quality and consistency. In this paper, we propose an automated system VIEM to detect inconsistent information between the fully standardized NVD database and the unstructured CVE descriptions and their referenced vulnerability reports. VIEM allows us, for the first time, to quantify the information consistency at a massive scale, and provides the needed tool for the community to keep the CVE/NVD databases up-to-date. VIEM is developed to extract vulnerable software names and vulnerable versions from unstructured text. We introduce customized designs to deep-learning-based named entity recognition (NER) and relation extraction (RE) so that VIEM can recognize previous unseen software names and versions based on sentence structure and contexts. Ground-truth evaluation shows the system is highly accurate (0.941 precision and 0.993 recall). Using VIEM, we examine the information consistency using a large dataset of 78,296 CVE IDs and 70,569 vulnerability reports in the past 20 years. Our result suggests that inconsistent vulnerable software versions are highly prevalent. Only 59.82% of the vulnerability reports/CVE summaries strictly match the standardized NVD entries, and the inconsistency level increases over time. Case studies confirm the erroneous information of NVD that either overclaims or underclaims the vulnerable software versions.
AB - Public vulnerability databases such as the Common Vulnerabilities and Exposures (CVE) and the National Vulnerability Database (NVD) have achieved great success in promoting vulnerability disclosure and mitigation. While these databases have accumulated massive data, there is a growing concern for their information quality and consistency. In this paper, we propose an automated system VIEM to detect inconsistent information between the fully standardized NVD database and the unstructured CVE descriptions and their referenced vulnerability reports. VIEM allows us, for the first time, to quantify the information consistency at a massive scale, and provides the needed tool for the community to keep the CVE/NVD databases up-to-date. VIEM is developed to extract vulnerable software names and vulnerable versions from unstructured text. We introduce customized designs to deep-learning-based named entity recognition (NER) and relation extraction (RE) so that VIEM can recognize previous unseen software names and versions based on sentence structure and contexts. Ground-truth evaluation shows the system is highly accurate (0.941 precision and 0.993 recall). Using VIEM, we examine the information consistency using a large dataset of 78,296 CVE IDs and 70,569 vulnerability reports in the past 20 years. Our result suggests that inconsistent vulnerable software versions are highly prevalent. Only 59.82% of the vulnerability reports/CVE summaries strictly match the standardized NVD entries, and the inconsistency level increases over time. Case studies confirm the erroneous information of NVD that either overclaims or underclaims the vulnerable software versions.
UR - http://www.scopus.com/inward/record.url?scp=85076336215&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85076336215&partnerID=8YFLogxK
M3 - Conference contribution
T3 - Proceedings of the 28th USENIX Security Symposium
SP - 869
EP - 885
BT - Proceedings of the 28th USENIX Security Symposium
PB - USENIX Association
T2 - 28th USENIX Security Symposium
Y2 - 14 August 2019 through 16 August 2019
ER -