
Cleaning the NVD: Comprehensive Quality Assessment, Improvements, and Analyses

Abstract:

Vulnerability databases are vital sources of information on emergent software security concerns. Security professionals, from system administrators to developers to researchers, heavily depend on these databases to track vulnerabilities and analyze security trends. However, are these databases reliable and accurate? In this article, we explore this question with the National Vulnerability Database (NVD), the U.S. government's repository of vulnerability information that arguably serves as the industry standard. Through a systematic investigation, we uncover inconsistent or incomplete data in the NVD that can impact its practical uses, affecting information such as the vulnerability publication dates, applications affected by the vulnerability, their severity scores, and their high level type categorization. We explore the extent of these discrepancies and identify methods for their automated corrections. Finally, we demonstrate the impact that these data issues can pose by comparing analyses using the original and our rectified versions of the NVD. Ultimately, our investigation of the NVD not only produces an improved source of vulnerability information, but also provides important insights and guidance for the security community on the curation and use of such data sources.

Published in: IEEE Transactions on Dependable and Secure Computing ( Volume: 19, Issue: 6, 01 Nov.-Dec. 2022)

Page(s): 4255 - 4269

Date of Publication: 04 November 2021


DOI: 10.1109/TDSC.2021.3125270


SECTION 1

Introduction

Securing computer systems in practice entails identifying, understanding, and remediating the stream of software security concerns that are continuously uncovered. To do so effectively, security professionals and researchers depend on various sources of information to acquaint themselves with new security issues. One vital source is vulnerability databases, which operate as repositories of vulnerability information. However, is the information in these vulnerability databases reliable?

In this work, we explore this question by identifying limitations of existing vulnerability datasets and their implications for real-world security operations. While several vulnerability databases exist, we focus on the one that is (arguably) the most widely used: the National Vulnerability Database (NVD). The NVD, maintained by the US government, strives to accurately document all publicly known vulnerabilities, and effectively serves as the industry's standard. Both commercial security services (e.g., Hakiri [1], Snyk [2], and SourceClear [3]) and open-source security tools (e.g., Bundler-audit [4], OWASP OSSIndex [5], and Dependency-check [6]) depend on the NVD's vulnerability information to function effectively. Furthermore, researchers [7], [8], [9] have used the NVD as a core data source to shed light on aspects of the vulnerability discovery and remediation process. Given the importance of the NVD, it is crucial that we understand the quality of its data, lest incorrect information lead to a critical security lapse [10].

Prior work [9], [11], [12], [13] has investigated certain types of data quality concerns in the NVD. However, to the best of our knowledge, there has not been a systematic and comprehensive analysis of the inconsistencies and incompleteness of the data in the NVD to date. To close this gap, in this paper we perform an in-depth, large-scale analysis of the NVD, systematically evaluating each data field it contains. In particular, we identify significant data issues with the vulnerability publication date, the vendors and products affected by the vulnerability, their severity scores, and their type. We quantify the scope of each issue within the NVD, providing an understanding of each issue's ramifications. Then, we develop accurate and automated methods of correcting the information, thus producing an improved and more reliable NVD dataset for the security community to use. We have open-sourced the tools created for correcting the NVD data quality concerns, as well as the rectified dataset itself. Finally, we perform several analysis case studies using our improved NVD. Beyond providing more reliable analysis results for core questions on vulnerability discovery, disclosure, and remediation, our case studies demonstrate how analysis conclusions and their practical implications can differ greatly due to data quality issues. Ultimately, this work not only directly impacts real-world security by providing an improved dataset for use in practice, but also highlights common pitfalls that can affect other sources of vulnerability information, providing lessons for improving them as well as for using them effectively.

Applications and Implications. We show the pitfalls of using the NVD by highlighting its various inconsistencies, and we propose methods to fix them. Overall, the study can be utilized by the NVD towards the following end goals: 1) The estimated disclosure date identification can enrich the vulnerability report for the end-user's perusal. 2) The vendor and product inconsistency finding tool can be leveraged during the vulnerability reporting process to suggest suitable vendor and product names to analysts. Moreover, the observations from our analyses and measurements can be used as a best practice when adding new vendor and product names to the NVD. 3) The deep learning-based CVSS v3 prediction engine can be leveraged by the NVD and security analysts alike for uniform severity metric generation across the vulnerabilities in the database.

Contributions. This work studies the incompleteness and inconsistencies in the NVD, making the following contributions. (1) Through an extensive data-driven approach backed by web scraping, manual investigation, and machine learning-based automation, we assess the quality of the NVD, identifying concerns affecting each vulnerability data field. (2) We identify methods to automatically remedy the data quality issues in the NVD, providing a more reliable source of vulnerability information. (3) As case studies, we conduct several large-scale analyses of vulnerabilities, providing the most accurate findings to several basic but core questions on vulnerability discovery, disclosure, and remediation. (4) We shared the results of this work with the US National Institute of Standards and Technology, which maintains the NVD. Following that, the NVD's schemas have been updated to remove the free-form vendor and product names that we identify as often problematic [14].

We note that a significant contribution of this paper is the introduction of various "case studies" in the form of measurements obtained from the refined NVD. In conducting those measurements, we frame each analysis as a research question, and answer the research questions through an extended analysis with rationale, procedure, findings, and takeaways. While various security vendors customarily perform some of this work in their annual reports, such reports are oftentimes limited in scope and nature, justifying the measurement contribution in this paper. For example, our measurements span a number of verticals that, to the best of our knowledge, are not covered in any single report that we could find online. Moreover, our analysis is unique in providing a longitudinal study from which those analyses are obtained, providing insights into long-term trends.

Organization. We provide a review of the literature in Section 2, followed by an overview of the dataset in Section 3. In Section 4, we analyze, identify, and fix the inconsistencies, followed by studying the impact of their improvement. We then conduct case studies on the improved dataset in Section 5. We then discuss the implications and analysis outcomes in Section 6. We conclude our work in Section 7.

The reader of this work should be mindful of the justification for this organization: in order to provide up-to-date and faithful (accurate) measurements of the various aspects of interest from the NVD, one has to start by addressing various shortcomings of the vulnerability database, which we pursue in Section 4, followed by the analyses in Section 5, and discussion in Section 6.

SECTION 2

Related Work

Reliability of NVD. Quality issues in vulnerability databases have been previously noted and studied. Nguyen and Massacci [13] pointed out that the affected product versions in the NVD are often incorrect: 25% of Google Chrome CVEs had an incorrect Chrome version string. Christey and Martin [15] similarly explored issues in the NVD data and suggested reporting biases as a root cause. Attila et al. [7] showed that CVSS metrics are more suitable for enterprise software products than personal ones. Dong et al. [12] analyzed the inconsistencies in public security vulnerability reports, including the NVD, and found overclaims and underclaims in the affected software product versions.

While these studies call attention to certain inconsistencies, our study stands out by providing a comprehensive and systematic investigation of incompleteness and inconsistencies across the NVD data fields. In addition to identifying and quantifying the data quality issues therein, we also develop methods for correcting them.

Vulnerability Analysis. Our work provides vulnerability analyses using more consistent vulnerability information, thus expanding on the literature on vulnerability dynamics.

Previously, Shahzad et al. [16] analyzed the vulnerability life cycle, and pointed out that remotely exploitable vulnerabilities represent 80% of all vulnerabilities. Earlier, Clark et al. [17] outlined a relation between a product's familiarity and its first vulnerability disclosure: familiar products show a shorter time between product release and first vulnerability discovery. Ozment and Schechter [18] observed that 62% of vulnerabilities in the OpenBSD system were foundational and took 2.5 years to be reported.

Stock et al. [19] and Li et al. [20] studied the vulnerability notification channels and their significance. Zhao et al. [21] empirically studied data from two web vulnerability discovery ecosystems for trend analyses. Trinh et al. [22] studied vulnerabilities in web applications. Saha [23] extended an attack graph-based vulnerability analysis framework to include complex security policies for efficient vulnerability analysis. Zhang et al. [24] used data from NVD to predict the time to next vulnerability, and argued that NVD provides poor predictions while pointing out inconsistencies, e.g., missing version information, release time, and other obvious errors. Votipka et al. [25] suggested integrating hackers and improved security training for testers in vulnerability discovery. Xiao et al. [26] detected vulnerability exploitation at a 90% rate. Sabottke et al. [27] proposed a Twitter-based detector to identify vulnerabilities likely to be exploited. Homaei and Shahriari [28] analyzed vulnerability reports between 2008 and 2014 and observed that security professionals can prevent 60% of them using only seven vulnerability categories. William et al. [29] proposed a framework to discover evolutionary patterns in the vulnerabilities.

Differences With Zhang et al. [30]. Zhang et al. [30] analyze the NVD to study the features that are reflective of the number of vulnerabilities an application contains. Although that work had not appeared while we were conducting ours, it was published before our work, so we contrast the two. Conducting a study on the vulnerabilities in open-source applications in the NVD, they identify the unique CVEs in their dataset. They find that CVEs may not include references to GitHub repositories, leading them to analyze the products affected by the vulnerabilities. To account for that, they use a series of heuristics to map the products to their GitHub repositories. Overall, while Zhang et al. analyze open-source projects, the analyzed vulnerabilities number fewer than 5,800, which is far smaller than the actual size of the NVD. While the previous work inspects the products affected in a CVE, we take a two-step process. First, we identify the inconsistencies in the vendor names throughout the NVD. Second, we cluster the products of the vendors under a consistent name and then analyze the inconsistencies in the product names. With our efforts, we analyze all the occurrences and re-occurrences of open-source and closed-source vulnerabilities alike, and propose methods to limit their re-occurrence in the future (see Section 6).

SECTION 3

Dataset

We study the National Vulnerability Database (NVD) [31], the U.S. government's repository of public vulnerability information, actively maintained by the National Institute of Standards and Technology (NIST). While there are other databases, we focused on the NVD because it is widely used (in part because it is public and free), and arguably serves as the industry standard for tracking vulnerabilities. Nevertheless, our exploration of the NVD can provide insights into using other vulnerability databases.

NVD Studied Attributes. For the NVD, reported vulnerabilities are analyzed and added in a standardized format. Specifically, NVD entries contain the following. (1) A Common Vulnerabilities and Exposures (CVE) ID number [32] that uniquely identifies the vulnerability. (2) The vulnerability entry's publication date. (3) The vulnerability type/category, as classified by the Common Weakness Enumeration (CWE) [33]. (4) The severity, as rated by the Common Vulnerability Scoring System (CVSS) [34]. Note that there are two CVSS versions, the historical CVSS v2 (v2) and the modern CVSS v3 (v3) [35], both on a scale from 0 to 10. Table 1 shows the CVSS severity level thresholds. Note that v3 introduces a critical level of severity. (5) A list of vendors and products affected, as classified under the Common Platform Enumeration (CPE) [36]. (6) Free-form vulnerability descriptions. There can be multiple descriptions, although the typical one explains the security concern. Another common description is a comment by the CVE entry evaluator. (7) Optionally, reference URLs (e.g., security advisories) that provide vulnerability details.

TABLE 1 Score Thresholds of v2 & v3 CVSS Severity Levels
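
The body of Table 1 is not reproduced in this text. As a stand-in, the sketch below encodes the qualitative severity bands as we understand them from the published CVSS v2 (as used by the NVD) and CVSS v3 rating scales. The function names are illustrative and are not part of the authors' tooling.

```python
def cvss_v2_severity(score: float) -> str:
    """Map a CVSS v2 base score to the NVD's v2 severity label."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores lie in [0, 10]")
    if score >= 7.0:
        return "HIGH"
    if score >= 4.0:
        return "MEDIUM"
    return "LOW"


def cvss_v3_severity(score: float) -> str:
    """Map a CVSS v3 base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores lie in [0, 10]")
    if score == 0.0:
        return "NONE"
    if score >= 9.0:
        return "CRITICAL"  # level that exists only in v3
    if score >= 7.0:
        return "HIGH"
    if score >= 4.0:
        return "MEDIUM"
    return "LOW"
```

Under these bands, the same numeric score can carry different labels in the two versions: a 9.8 is HIGH under v2 but CRITICAL under v3.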

NVD Scale and Longitude. We use a snapshot of NVD captured on May 21, 2018. This snapshot includes 107.2K CVEs added to NVD over two decades (1998–2018). These vulnerabilities are categorized into 453 CWE types, affecting 18.9K vendors and 46.6K products. We observe that 37.5K recent CVEs have the modern v3 severity label, in addition to v2 labels, while the remaining CVEs only have v2 labels.

SECTION 4

Inconsistencies and Improvements

The quality of data in a vulnerability database can heavily impact vulnerability tracking and trend analyses. Prior work by Mu et al. [11] already identified that crowd-sourcing vulnerability information has limitations. In this section, we analyzed the NVD CVE entries for inconsistencies and explored methods for rectifying them.

We assess the standardized non-free-form fields, i.e., publication date, CWE class, CVSS rating, and the affected CPE. The remaining NVD fields (the vulnerability description and reference URLs) are free-form without a standardized structure, making it challenging to conceptually define and identify inconsistencies. Since the description is not guided by standardized rules, the extracted features are not predictable and may not be meaningful.

Note that we focused on data consistency issues, not data error problems. We assumed that the data in the NVD is correct but perhaps represented inconsistently, such that one could identify the correct information without resorting to investigation beyond what is provided through the NVD.

4.1 Publication Dates

Incompleteness. Vulnerability analysis often depends on tracking when vulnerabilities became public. For example, security analysts must consider how long a vulnerability has been public when prioritizing patching, calculating windows of exposure, or investigating incidents (such as in log analysis). NVD records have a publication date, but this date only indicates when the entry was added to the database. We observed cases where the NVD publication date does not give a clear picture of when the vulnerability became public. For example, CVE-2011-0700 is a WordPress XSS vulnerability with an NVD publication date of March 14, 2011. However, the CVE entry includes a reference URL for a public advisory disclosing the vulnerability over a month earlier.

Identification and Improvement. We identify the disclosure dates leveraging the reference URLs. Li and Paxson [9] and Anwar et al. [8] previously suggested approximating the disclosure date by mining these references, as many are web pages about the vulnerability and its publication date.

We first extracted the domains from the URL references, finding that the 591.4K URLs in our data corresponded to 5,997 domains. We focused on the top 50 domains, covering more than 85% of all URLs (we observed diminishing returns from considering additional domains).
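
The domain-ranking step described above can be sketched as follows. This is a minimal illustration, not the authors' crawler code: it uses the URL host name (with a leading `www.` stripped) as a stand-in for the domain and does not collapse other subdomains, which is an assumption on our part. The helper names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    """Return the lower-cased host of a URL, without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def top_domain_coverage(urls: list[str], k: int) -> tuple[list[str], float]:
    """Return the k most frequent domains and the fraction of URLs they cover."""
    counts = Counter(domain_of(u) for u in urls)
    top = counts.most_common(k)
    covered = sum(n for _, n in top)
    fraction = covered / len(urls) if urls else 0.0
    return [d for d, _ in top], fraction
```

Calling `top_domain_coverage(urls, 50)` on the full list of reference URLs would yield the kind of coverage figure quoted above; increasing `k` and watching the returned fraction is one way to observe the diminishing returns the authors mention.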

These top domains fall into three high-level categories: (1) other vulnerability databases (e.g., SecurityFocus), (2) bug reports or email archive threads (e.g., Bugzilla), and (3) security advisories (e.g., cisco.com). Note that some domains are not in English (e.g., jvn.jp is in Japanese). Each of the webpages may have a different structure. Thus, we built a separate crawler for each domain to extract the relevant publication date for the vulnerability information (if any). We note that 14 domains are no longer responsive (e.g., osvdb.org shut down in 2016). For a given CVE, we approximated its public disclosure date as the minimum of the dates extracted from the reference URLs and the NVD publication date.
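
The final approximation step (taking the earliest available date) can be sketched as below. The per-domain crawlers are out of scope here; we simply assume each one returns either a date or `None` when no date could be extracted. The function name and the example dates are hypothetical.

```python
from datetime import date
from typing import Iterable, Optional


def estimate_disclosure_date(
    nvd_published: date,
    reference_dates: Iterable[Optional[date]],
) -> date:
    """Earliest of the NVD publication date and all dates found in references.

    `None` entries stand for references whose crawler found no date
    (for example, because the domain no longer responds).
    """
    candidates = [d for d in reference_dates if d is not None]
    candidates.append(nvd_published)
    return min(candidates)


# Hypothetical entry: published in the NVD in mid-March, with one dated
# advisory from early February and one dead reference link.
estimated = estimate_disclosure_date(
    date(2011, 3, 14), [date(2011, 2, 1), None]
)
```

Because the NVD publication date is always among the candidates, the estimate can never be later than the date the NVD already records.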

Improvement Impact. We evaluated how many days our estimated disclosure date preceded the CVE publication date in the NVD, which we call the lag time. Fig. 1 plots the cumulative percentage of CVEs by lag time. Notice that ≈38% of the vulnerabilities have a lag of zero days. The curve flattens beyond a lag of 6 days, by which point ≈70% of the vulnerabilities are covered. We observed that ≈28% of the vulnerabilities have a lag of more than a week. Moreover, we broke down the lag by v2 severity label and observed that we improved on the publication date for only 37% of low severity vulnerabilities, in comparison to 41% of medium and 65% of high severity vulnerabilities. This observation is particularly interesting, as vulnerability tracking and analysis of high severity vulnerabilities are likely most valuable and can be most affected by this inconsistency.
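
As a small illustration of the lag-time measurement, the sketch below computes the lag for one CVE and the empirical CDF value at a given threshold. The function names and the toy lag list are ours, not the authors'.

```python
from datetime import date


def lag_days(nvd_published: date, estimated_disclosure: date) -> int:
    """Days between the estimated disclosure and the NVD publication date."""
    return (nvd_published - estimated_disclosure).days


def cdf_at(lags: list[int], threshold: int) -> float:
    """Fraction of CVEs whose lag is at most `threshold` days."""
    if not lags:
        raise ValueError("need at least one lag value")
    return sum(1 for lag in lags if lag <= threshold) / len(lags)


# Toy data: four CVEs, two of which entered the NVD on their disclosure day.
toy_lags = [0, 0, 3, 10]
share_no_lag = cdf_at(toy_lags, 0)       # 0.5
share_within_6d = cdf_at(toy_lags, 6)    # 0.75
```

Evaluating `cdf_at` at 0 and at 6 on the real lag values corresponds to the ≈38% and ≈70% points read off Fig. 1.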

Fig. 1. CDF of vulnerability lag times. Lag time is the number of days after our estimated disclosure date when a vulnerability enters into the NVD. Note, ≈38% of the vulnerabilities have no lag.

Limitations. To estimate vulnerability disclosure dates, we crawled the top 50 domains, which cover more than 85% of all reference URLs. We note that including additional domains may lead to an earlier disclosure date for some CVEs, although it would entail more engineering effort and time.

4.2 Vendor and Product Names

Inconsistencies. Practitioners depend on lists of vendors and products affected by a CVE to identify vulnerabilities affecting software they use [37], or to monitor the security trends of various software systems. We observed inconsistencies in these vendor and product names. For example, BEA Systems (vendor) is labeled as both bea (171 associated CVEs) and bea_systems (14 associated CVEs). We observed AVG's anti-virus product has multiple names, including antivirus and anti-virus. Thus, those monitoring for vulnerabilities by vendor or product names will obtain incorrect results unless carefully accounting for these inconsistencies.

Product Version Inconsistency. The NVD is also subject to inconsistent product versions, as demonstrated by Nguyen and Massacci [13]. Dong et al. [12] leveraged NLP methods to find and correct inconsistencies in product versions through mining the NVD reference URLs. Thus, we did not investigate product versions further.

Identification and Improvement. Initially, we lacked a general understanding of the nature of the vendor and product name inconsistencies. Thus, we resorted to manually analyzing name pairs to determine if both names represent the same entity (which we will call matching pairs). However, manual analysis does not scale to the number of unique name pairs. We therefore used heuristics to filter pairs down to those that are likely matching (i.e., related to the same entity yet with inconsistent names). We recognize that these heuristics provide broad coverage but may not be truly comprehensive.

Vendor Names. Informed by manual exploration, we developed three heuristics to identify probable inconsistent vendor name pairs as follows. (1) Vendor name pairs share characters in common. This accounts for various scenarios such as where one name is misspelled (e.g., microsoft and microsft), represented in a different format (e.g., avast and avast!), abbreviated (e.g., lan_management_system and lms), or a strict substring of another (e.g., lynx and lynx_project). (2) A product name is used as a vendor name (e.g., microsoft and windows both appearing as vendors). (3) Vendor pairs share the same product name.
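
The three heuristics can be sketched as a single candidate filter. This is our own minimal reading, not the authors' tool: heuristic (1) is approximated by the length of the longest common substring, with a `min_overlap` threshold that is an assumption, so non-contiguous abbreviations such as lan_management_system and lms would need a separate check. The `products` mapping (vendor name to its set of product names) is a hypothetical input format.

```python
from difflib import SequenceMatcher


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous block shared by a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def is_candidate_vendor_pair(
    v1: str,
    v2: str,
    products: dict[str, set[str]],
    min_overlap: int = 3,  # assumed threshold, not taken from the paper
) -> bool:
    """Flag a vendor pair for manual review if any heuristic fires."""
    # (1) The two names share characters (contiguous overlap).
    if longest_common_substring(v1, v2) >= min_overlap:
        return True
    p1 = products.get(v1, set())
    p2 = products.get(v2, set())
    # (2) One vendor name is listed as a product of the other vendor.
    if v1 in p2 or v2 in p1:
        return True
    # (3) Both vendors list a product with the same name.
    return bool(p1 & p2)
```

A pair flagged here is only a candidate; as the text notes, each remaining pair still has to be confirmed manually.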

We filtered out vendor name pairs that do not satisfy any of these heuristics, and manually investigated each remaining pair by checking their products, developers, and associated organizations. For each group of matching name pairs for the same vendor, we created a mapping of vendor names to consolidate those representing the same vendor under a consistent name. Note that there may be multiple matching pairs associated with the same vendor, indicating multiple inconsistent names. For the names associated with a vendor, we considered the one with the most associated CVEs as the consistent name, and remapped inconsistent vendor names in the NVD using our mapping. While the examples above are simple and reasonably easy for a human to spot, our approach also identifies less obvious inconsistent vendor names, such as kingsoft (CVE-2018-7546¹) and ksoffice (CVE-2010-5208²). The identification of such inconspicuous inconsistencies demonstrates the usefulness of our tool.
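
One way to realize the consolidation step is sketched below: confirmed matching pairs are merged into groups with a small union-find structure, and every name in a group is mapped to the member with the most associated CVEs. The union-find approach and the alphabetical tie-break are our assumptions; the source only states that the name with the most CVEs is chosen.

```python
from collections import defaultdict


def build_canonical_map(
    matching_pairs: list[tuple[str, str]],
    cve_counts: dict[str, int],
) -> dict[str, str]:
    """Map every inconsistent name to its group's most-used name."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge names that were manually confirmed to denote the same entity.
    for a, b in matching_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups: dict[str, list[str]] = defaultdict(list)
    for name in list(parent):
        groups[find(name)].append(name)

    mapping: dict[str, str] = {}
    for names in groups.values():
        # Most CVEs wins; ties are broken alphabetically (arbitrary choice).
        canonical = max(names, key=lambda n: (cve_counts.get(n, 0), n))
        for n in names:
            mapping[n] = canonical
    return mapping


vendor_map = build_canonical_map(
    [("bea", "bea_systems")], {"bea": 171, "bea_systems": 14}
)
```

With the BEA Systems counts quoted earlier, both names map to bea. Grouping transitively matters because a vendor may appear under more than two spellings.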

To shed light on common patterns in inconsistent vendor naming, Table 2 lists those patterns, as well as how likely each pattern is to signal a matching pair. We observed that 260 name pairs were identical except for the inclusion of special characters (e.g., ! or _), and all were matching vendor name pairs. For other name pairs, when the longest substring match was at least 3 characters, the majority (at least 60%) of name pairs were matching under the other patterns. Notably, when the two vendor names in the pair were both associated with the same product name, or when one vendor name was a string prefix of the other, the pair matched in over 90% of cases. When the longest substring match was less than 3 characters, only a minority of name pairs were still matching under the different patterns.

TABLE 2 Common Inconsistency Patterns in Vendor Naming

Product Names. After consolidating vendor names (above), we identified likely matching product names under the same (consolidated) vendor using two heuristics, and then manually evaluated the pairs. For the first heuristic, we tokenized product names by splitting by white spaces and special characters, and considered a product name pair as likely matching if the two tokenized names are identical. This captures cases such as internet-explorer, internet_explorer, and internet explorer.

For the second heuristic, if one product name in the pair is tokenized into multiple components and the other is a single component, we concatenated the first character of each component of the multi-component name and compared the concatenated string with the other product name. This captures abbreviations, such as internet-explorer and ie. Next, we investigated the replacement, addition, and swapping of characters by computing the edit distance between product name pairs, followed by manual verification of the pairs. Product names that differ by only a few characters can be different products altogether, e.g., cisco's ucs-e160dp-m1_firmware and ucs-e140dp-m1_firmware have an edit distance of one, but are different products. In our analysis, we therefore focused on pairs that are likely the result of human error, e.g., nativesolutions's tbe_banner_engine and the_banner_engine. As with vendor names, we mapped inconsistent product names to a consistent name based on the name associated with the most CVEs, and remapped product names in the NVD based on this mapping. Table 3 shows that we found over 3K inconsistently named products, affecting 700 vendors.
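The two tokenization heuristics and the edit-distance check described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the separator set (anything other than letters and digits) is an assumption, and plain Levenshtein distance is used, which counts a swap of two adjacent characters as two edits.

```python
import re


def tokenize(name: str) -> list[str]:
    """Split a product name on whitespace and special characters (assumed separator set)."""
    return [t for t in re.split(r"[^a-z0-9]+", name.lower()) if t]


def same_tokens(a: str, b: str) -> bool:
    """Heuristic 1: the two names tokenize to identical sequences."""
    return tokenize(a) == tokenize(b)


def is_abbreviation(a: str, b: str) -> bool:
    """Heuristic 2: one name equals the initials of the other's components."""
    ta, tb = tokenize(a), tokenize(b)
    if len(ta) > 1 and len(tb) == 1:
        parts, short = ta, tb[0]
    elif len(tb) > 1 and len(ta) == 1:
        parts, short = tb, ta[0]
    else:
        return False
    return "".join(p[0] for p in parts) == short


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]
```

Pairs flagged by any of these checks are only candidates. As the cisco firmware example shows, a small edit distance does not imply the same product, so manual verification remains necessary.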

TABLE 3 Vendor and Product Name Inconsistencies in NVD, SecurityFocus (SF), and SecurityTracker (ST)

We note these two heuristics are more limited than those considered for vendor names, as we found that product names are often quite similar without representing the same product. For example, we explored using substring matching heuristics (as with vendor names), but found the number of pairs flagged for analysis to be too large and with many false positives (i.e., non-matching pairs).

Improvement Impact. Table 3 lists the extent of the vendor and product naming inconsistencies we identified. The NVD includes ≈19K distinct vendors, and about 10% of them were impacted by vendor naming inconsistencies. These ≈1.8K vendor names could be consolidated under 871 vendor names, thus removing ≈5% of distinct vendors. Inconsistencies similarly affected 6% of distinct NVD product names, and consolidating names would reduce the number of product names also by about 5%. Thus, inconsistencies affect a non-trivial fraction of vendors and products. These numbers are lower bounds on the extent of vendor and product name inconsistencies in the NVD, since our identification and correction method relied on heuristics that may not be all-encompassing.

We also explored vendor naming inconsistencies in two other vulnerability databases that record this information, SecurityTracker [38] and SecurityFocus [39]. We took the vendor name mapping that we generated above for the NVD and applied it to the vendor strings in these two databases. As shown in Table 3, we found that 3% and 8% of vendor names were inconsistent for SecurityTracker and SecurityFocus, respectively. A dedicated exploration of these databases would likely yield further inconsistencies. This highlights that the data quality issue is prominent in vulnerability databases generally, and that our approach for rectifying the NVD could be used for other datasets as well.
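The consolidation step (choose the name associated with the most CVEs, then reuse the mapping on another database) could look like the sketch below. The function names, the input shapes, and the vendor strings in the usage example are hypothetical. Counting a name as inconsistent only when the mapping would rewrite it is one possible definition, not necessarily the one behind the figures in Table 3.

```python
def build_canonical_map(groups, cve_counts):
    """Map every name in a group of matching names to the member with the most CVEs.

    groups: iterable of sets of names judged to refer to the same vendor.
    cve_counts: dict from name to the number of CVEs associated with it.
    Ties are broken alphabetically so the result is deterministic.
    """
    mapping = {}
    for group in groups:
        canonical = max(sorted(group), key=lambda n: cve_counts.get(n, 0))
        for name in group:
            mapping[name] = canonical
    return mapping


def inconsistent_fraction(names, mapping):
    """Fraction of distinct names in another database that the mapping would rewrite."""
    distinct = set(names)
    if not distinct:
        return 0.0
    rewritten = {n for n in distinct if mapping.get(n, n) != n}
    return len(rewritten) / len(distinct)
```

For instance, with `groups = [{"microsoft", "microsoft_corp"}]` and CVE counts of 100 and 2, `build_canonical_map` maps both names to `microsoft`, and a second database listing `microsoft_corp`, `microsoft`, `apache`, and `oracle` has one of its four distinct vendor names rewritten.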

We now delve deeper into the affected vulnerabilities to understand what types of vulnerabilities are impacted by such inconsistencies. Are they unimportant, with little impact on host systems, so that they can be ignored? To answer these questions, we consider the vulnerabilities that have inconsistent vendor or product names. Among those corresponding to well-known vendors, we select 10 CVEs at random, shown in Table 4. To evaluate their impact, we focus on their severity and vulnerability type. Notice that all except one (CVE-2006-6601) are of High severity (v2). CVE-2006-6601, although of Medium severity, is a vulnerability in windows media player that can be exploited through a crafted header of a .MID (MIDI) file to cause a DoS attack. Among the other nine vulnerabilities, four can be exploited remotely. Additionally, CVE-2018-16983, a vulnerability in tor browser, can be exploited by an attacker to bypass script blocking by using the text/html;/json Content-Type, which can pose a privacy risk.

TABLE 4 Case Study: A Sample of Vulnerabilities Corresponding to Known Vendors

These analyses show that the vulnerabilities associated with inconsistent vendor names are impactful and severe, and thus cannot be ignored. They also underscore the importance of consistent vendor and product names.

We note that Dong et al. [12] also investigated product names specifically. Their heuristic was to split product names by white spaces into words and label two products as matching if they shared words. In comparison with ours, their method does not account for abbreviations or special character separators, and yields false positives when different products share similar words (e.g., Microsoft's Internet Explorer and Internet Information Services products).

Limitations. The vendor and product inconsistency numbers present a lower bound on the inconsistencies that the NVD may have. In our experiments, we do not group two vendor names when one vendor acquired the other, even if the pair is a probable inconsistency. For example, CVE-2021-2161 is a vulnerability in Java SE. Although Java was previously owned by Sun, recent vulnerabilities in it have been associated with Oracle only, rather than with Sun or with both Sun and Oracle. Tightening the bound would require determining the acquisition date of the probable inconsistent vendor and then correlating it with the estimated disclosure dates of the associated vulnerabilities. Moreover, we take into account project forks during our inconsistency analysis, i.e., open-source applications being utilized by other applications. However, we argue that forks cannot be considered as inconsistencies.

4.3 Severity Scores

Inconsistencies. NVD uses the CVSS standard for rating severity [34]. However, CVSS has had multiple versions, with the modern v3 addressing limitations of prior versions. As v3 was only released in 2015, only a third of the CVEs in our NVD dataset have v3 scores. Security analysts monitoring vulnerabilities over time must either rely on v2 and its limitations (e.g., inaccurate security ratings), or evaluate a subset of the NVD data. Vulnerabilities pre-dating the release of v3 are still relevant, as age-old vulnerabilities are often still used in active attacks. For example, CVE-2011-0997 (a DHCP client vulnerability) was disclosed in 2011 yet could be used to target Avaya desk and IP conference phones in 2019 [40]. Similarly, CVE-2004-0113 is a medium severity vulnerability under v2 that was actively exploited in 2018 (over 14 years after disclosure) to exploit hosts and install crypto-mining malware [41]. Thus, we would ideally be able to backport v3 scores throughout the NVD, providing a more modern security rating for all vulnerabilities.

Motivation. Vulnerabilities that have occurred in the past have been shown to re-appear as new attack vectors. This has been attributed to the inability of security teams to generate a prioritized list of patches for the operations team [42]. With vulnerability disclosures increasing over the years, it is essential to re-assess the priority of vulnerabilities given the current threat landscape. CVSS v3 serves this objective. However, the vulnerabilities in the NVD that do not have v3 scores are left behind and cannot be assigned an updated priority. Therefore, we estimate the v3 score of those vulnerabilities that have a v2 score but no v3 score.

Identification and Improvement. Identifying CVEs with only v2 scores is straightforward, as NVD entries list the CVSS version associated with a score. The challenge is then improving the NVD by automatically assigning v3 scores to all CVEs that only have v2 scores. Both CVSS versions are calculated from a weighted aggregation of an input set of feature values, with v3 providing additional features and refined weightings. Thus, our approach is to develop a machine learning model that takes v2 features, as well as other CVE entry information, as input and outputs approximate yet meaningful v3 scores (despite lacking explicit features that are normally input into the v3 calculations). When evaluating accuracy, we aimed not necessarily to reproduce the exact severity score that v3 would output, but to predict the correct v3 severity category (low, medium, high, critical), which is commonly used for vulnerability prioritization [34]. We specifically applied machine and deep learning approaches to model the potentially complex weighting and interactions between different features.

Features. While most parameters required for the severity scores remain the same in v3 as in v2, the parameters in v3 capture the impact of a vulnerability at a finer granularity. For example, “access vector” in v2 was transformed into “attack vector” in v3, which distinguishes Physical (P), Network (N), Adjacent (A), and Local (L) attacks. Where v2 considered P attacks as L, v3 separates the two, and it introduces a new scope parameter for vulnerabilities whose impact extends beyond the exploitable system. The access complexity in v2 was divided into attack complexity and user interaction in v3, and the influence of the temporal metrics is decreased in v3. To this end, we used the following v2 parameters as features to extrapolate v3 scores: access vector, access complexity, authentication, integrity, availability, and the all privilege, user privilege, and other privilege flags.

Holm and Afridi [43] studied CVSS reliability with a survey covering 384 experts and 3,000 vulnerabilities, and concluded that the reliability depends on the vulnerability type. In light of their study, we also include the CWE-ID as an input feature for the v3 approximation.
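To make the feature list concrete, the sketch below one-hot encodes the listed v2 metrics into a numeric vector. The paper does not state how the features were encoded, so the one-hot scheme, the dictionary keys, and the function name are all assumptions for illustration. The categorical values are the standard CVSS v2 base metric values. The CWE-ID feature is left out for brevity.

```python
# Standard CVSS v2 base metric values; the dictionary keys are hypothetical names.
V2_CATEGORIES = {
    "access_vector": ["LOCAL", "ADJACENT_NETWORK", "NETWORK"],
    "access_complexity": ["HIGH", "MEDIUM", "LOW"],
    "authentication": ["MULTIPLE", "SINGLE", "NONE"],
    "integrity": ["NONE", "PARTIAL", "COMPLETE"],
    "availability": ["NONE", "PARTIAL", "COMPLETE"],
}
# Boolean privilege flags, again under hypothetical key names.
V2_FLAGS = ["all_privilege", "user_privilege", "other_privilege"]


def encode_v2(entry: dict) -> list[float]:
    """One-hot encode the categorical v2 metrics and append the boolean flags."""
    vec = []
    for field, values in V2_CATEGORIES.items():
        vec.extend(1.0 if entry[field] == v else 0.0 for v in values)
    vec.extend(1.0 if entry.get(flag) else 0.0 for flag in V2_FLAGS)
    return vec


example = {
    "access_vector": "NETWORK",
    "access_complexity": "LOW",
    "authentication": "NONE",
    "integrity": "PARTIAL",
    "availability": "NONE",
    "user_privilege": True,
}
features = encode_v2(example)  # 5 metrics x 3 values + 3 flags = 18 entries
```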

Ground Truth Dataset. Building our system requires a ground truth dataset with a mapping between v2 and v3 scores (or categories). For that, we used the recent CVEs (≈37K CVEs) in the NVD that have both v2 and v3 CVSS scores. The v3 score is designed to express the impact of a vulnerability better. The effect of these changes on the vulnerabilities is summarized in Table 5. We notice that there is no drastic change in label across severity levels, i.e., no vulnerability moves from Low in v2 to Critical in v3. Similarly, no vulnerability moves from High in v2 to Low in v3.

TABLE 5 Transformation From v2 to v3 in Numbers
表 5 从 v2 到 v3 的数值转换

Model Training. Using the aforementioned features, we predicted the v3 base scores for vulnerabilities that do not have v3 metrics. We began by splitting the ground truth data into 80% training and 20% testing datasets, evenly distributed among classes. Additionally, we observed non-linear patterns between the v2 and v3 scores.³ We then applied a range of machine and deep learning algorithms to predict the v3 scores: (1) Linear Regression (LR), (2) Support Vector Regression (SVR), (3) Convolutional Neural Networks (CNN), and (4) Deep Neural Networks (DNN). Linear regression finds the linear relationship between a target and one or more features. For SVR, we used a Support Vector Machine (SVM) as a regression method to predict the v3 base score; we conducted the prediction using various combinations of parameters and report the best performing model on the training dataset (kernel type = rbf (radial basis function), kernel coefficient = 0.1, and penalty parameter = 2). We leveraged different deep learning techniques to extract deep feature representations for the vulnerabilities. We implemented a CNN model consisting of four consecutive convolutional layers. The first two layers consist of 64 filters and the remaining two layers consist of 128 filters, with a filter size of 3×3. The convolutional layers are followed by a flattening operation and a fully connected layer with 512 neurons. Next, a single neuron with a sigmoid activation function outputs the prediction of the model. The sigmoid activation function is defined as $f(x) = \frac{1}{1 + e^{-x}}$. Similarly, we implemented a DNN model consisting of four fully connected layers of size 128, 128, 256, and 256, respectively. The fully connected layers are followed by a single neuron with a sigmoid activation function that outputs the prediction of the model. We trained the deep learning models over 100 epochs using the mean squared error loss function, $\frac{1}{N}\sum_{i=1}^{N}\left(y(x_i) - f(x_i)\right)^2$, and the Adam optimizer with a learning rate of 0.001.

For evaluation, we defined the average error (AE) as $\frac{1}{N}\sum_{i=1}^{N}\left|y(x_i) - f(x_i)\right|$, where $x_i$ is the $i$th sample of the testing dataset, $y(\cdot)$ is the v3 severity score of the sample, $f(\cdot)$ is the predicted v3 severity score of the sample, and $N$ is the size of the testing dataset. Similarly, we defined the average error rate (AER) as $\frac{1}{N}\sum_{i=1}^{N}\frac{\left|y(x_i) - f(x_i)\right|}{y(x_i)}$.
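The loss and evaluation metrics defined above are simple to compute. The following sketch implements the sigmoid, the mean squared error, AE, and AER in plain Python. The function names and the three sample scores are illustrative and not taken from the paper. AER divides by the true score, so the sketch assumes every true v3 score is nonzero.

```python
import math


def sigmoid(x: float) -> float:
    """f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def mean_squared_error(y_true, y_pred):
    """Mean of the squared differences between true and predicted scores."""
    return sum((y - f) ** 2 for y, f in zip(y_true, y_pred)) / len(y_true)


def average_error(y_true, y_pred):
    """AE: mean absolute difference between true and predicted scores."""
    return sum(abs(y - f) for y, f in zip(y_true, y_pred)) / len(y_true)


def average_error_rate(y_true, y_pred):
    """AER: mean absolute difference relative to the true score (assumed nonzero)."""
    return sum(abs(y - f) / y for y, f in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only: true and predicted v3 base scores for three CVEs.
true_scores = [7.5, 9.8, 5.0]
pred_scores = [7.0, 9.8, 6.0]
ae = average_error(true_scores, pred_scores)        # (0.5 + 0 + 1.0) / 3
aer = average_error_rate(true_scores, pred_scores)  # (0.5/7.5 + 0 + 1.0/5.0) / 3
```

AE is expressed in CVSS score points, whereas AER is relative to the true score, so the same absolute miss weighs more heavily on low-severity vulnerabilities.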

Model Learning Results. Table 6 shows the average error and average error rate for the different learning algorithms. The table shows that CNN has the lowest average error and error rate. Moreover, we translated the predicted v3 base scores to their respective severity labels according to the ranges in Table 1. Table 8 lists the accuracy per input class. We found that the model performs best for the input class High, with 93.55% accuracy, and worst for the input class Low, with 82.84% accuracy. The overall accuracy of 86.29% means that our model could not predict the correct v3 label for 13.71% of the vulnerabilities in our dataset. We also observed that DNN performs slightly better than CNN for the input class Low. Furthermore, we tried other machine learning algorithms and found that the deep learning-based models (CNN and DNN) outperformed those alternatives. Given that the CNN-based model outperforms the DNN-based model by ≈2% overall, we chose the CNN-based model for prediction.
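Translating predicted scores into labels and measuring accuracy per input class can be sketched as below. The thresholds are the standard CVSS v3 qualitative severity ranges, which we assume are the ranges given in Table 1. The record format and function names are hypothetical.

```python
from collections import defaultdict


def v3_label(score: float) -> str:
    """Map a v3 base score to its qualitative severity label (standard CVSS v3 ranges)."""
    if score >= 9.0:
        return "Critical"
    if score >= 7.0:
        return "High"
    if score >= 4.0:
        return "Medium"
    if score > 0.0:
        return "Low"
    return "None"


def accuracy_by_input_class(records):
    """records: iterable of (v2_label, true_v3_score, predicted_v3_score) tuples.

    Returns the fraction of correctly predicted v3 labels for each v2 input class.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for v2_class, true_score, pred_score in records:
        totals[v2_class] += 1
        hits[v2_class] += v3_label(true_score) == v3_label(pred_score)
    return {c: hits[c] / totals[c] for c in totals}
```

Because accuracy is judged on labels, a prediction of 9.1 for a true score of 9.8 counts as correct, while 6.8 for a true score of 7.5 does not, even though the second absolute error is the same.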

TABLE 6 Prediction Results: Average Error (AE) and AE Rate (AER)

TABLE 7 The v2 and v3, Where v3 Labels are Predicted by Our Model

TABLE 8 Prediction Accuracy

Improvement Impact. With our model, we can assign v3 scores and severity levels to all vulnerabilities in the NVD that only have v2 scores. For the over 74K CVEs with only v2 scores, Table 7 depicts their severity categories under v2 and under our predicted v3. We observed that 48K CVEs change severity levels under v3, and 29K CVEs still change severity categories if we consider v2 High and v3 Critical to be equivalent (as v2 lacks a Critical level). Thus, nearly 40% of CVEs have a different severity once the severity score is updated with the predicted v3. Overall, the change is skewed towards higher severity ratings. We hypothesize that this is because v3 was designed in part to account for the scope of software affected, which can elevate the severity of a vulnerability when other sensitive systems are involved beyond the immediate vulnerable system. As a result, users of the NVD can better prioritize the vulnerabilities that they analyze and address.
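The two ways of counting severity changes, with and without treating v2 High and v3 Critical as equivalent, can be expressed as a short sketch. The function names and the four sample label pairs are illustrative only.

```python
def severity_changed(v2_label, v3_label, merge_high_critical=False):
    """Return True if the label differs between v2 and the predicted v3.

    With merge_high_critical=True, a move from v2 High to v3 Critical is not
    counted as a change, since v2 has no Critical level.
    """
    if merge_high_critical and v2_label == "High" and v3_label == "Critical":
        return False
    return v2_label != v3_label


def count_changes(pairs, merge_high_critical=False):
    """Count (v2_label, v3_label) pairs whose severity changed."""
    return sum(severity_changed(a, b, merge_high_critical) for a, b in pairs)


# Illustrative label pairs, not data from Table 7.
pairs = [("Medium", "High"), ("High", "Critical"), ("High", "High"), ("Low", "Medium")]
strict = count_changes(pairs)                              # every label difference
relaxed = count_changes(pairs, merge_high_critical=True)   # High -> Critical ignored
```

Applied to the figures in the text, the relaxed count of 29K out of 74K CVEs corresponds to the reported share of nearly 40%.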

The vulnerabilities most impacted by v3 do not adhere to any pattern, as confirmed by the prediction results, highlighting the power of our learning techniques in capturing complex mappings. Note that both of the old but still exploited vulnerabilities mentioned earlier (i.e., CVE-2011-0997 and CVE-2004-0113) are more properly categorized as critical severity under our model, whereas under the v2 labels one was medium severity and the other was high severity.

In conducting our v3 extrapolation, we also argue that the predicted labels will help users prioritize vulnerabilities better. In particular, we found that confidentiality, base score, and integrity are important features that impact the performance of our prediction model, i.e., the degree of information disclosure, the cumulative score of the vulnerability, and the degree of impact on the integrity of the victim system. Allodi et al. [45] evaluated the information affecting severity assessment. Our work extends their findings by showing which features determine the v3 score of a vulnerability.

Limitations. For the vulnerabilities that do not have v3 scores, we utilize ML algorithms to approximate their v3 scores from their v2 metrics. However, we acknowledge that the v3 score does not depend solely on the v2 metrics, as v3 introduces additional parameters for measurement. We attribute the gap left by the overall accuracy of 86.29% in extrapolating v3 scores from v2 metrics to these added v3 parameters, which we do not consider.

4.4 Vulnerability Types

Inconsistencies. In the NVD, a CVE should be assigned a vulnerability type under the CWE classification [33] to provide users with an overview of the nature and risk of the vulnerability. Security analysts and developers leverage the vulnerability type to understand the attack vectors that may impact their software, decide which types of defenses to deploy, and track shifts in security concerns over time [46]. However, we identified that the CWE field for CVEs is not consistently populated with a correct CWE-ID value.

We found CVEs without CWE values, as well as those with a CWE entry of NVD-CWE-Other. By itself, this is missing data rather than inconsistent data, and is out of the scope of our investigation (although worth noting for those analyzing NVD vulnerability types). However, we observed that the free-form CVE description (particularly the description provided by one of the vulnerability's evaluators) often contains the CWE-ID. For example, CVE-2007-0838 lists NVD-CWE-Other as its CWE-ID, while its evaluator description includes “CWE-835: Loop with Unreachable Exit Condition ('Infinite Loop')”. We also observed CVEs whose descriptions list additional relevant CWE-IDs beyond those listed in the CWE field. In these cases, the CWE information is accessible in the CVE entry, but inconsistently provided.

Identification and Improvement. The CWE-ID follows a standard and distinct format that allows us to easily identify IDs in description strings through a regular expression (i.e., CWE-[0-9]*). For all CVEs, we applied this regular expression to the description strings to extract any CWE-IDs and add them to the set of CWE-IDs listed in the CWE field, if any. From this set of CWE-IDs, we filtered any CWE-ID values that indicate missing or non-specific CWEs (e.g., NVD-CWE-Other). In theory, descriptions could list CWE-IDs that are not relevant to the CVE (e.g., if discussing another vulnerability). However, through manually inspecting a random sample, we did not observe any erroneous cases where the CWE-ID in the description is not correct. Evidently, the CVE description outlines the traces of a vulnerability, which can be used to determine the type of vulnerability. We, therefore, investigated the capability of the CVE descriptions to extrapolate their corresponding types. We did so by utilizing different Natural Language Processing, machine learning, and deep learning techniques.
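The extraction step can be sketched with Python's `re` module as follows. The function name and input shapes are hypothetical. The sketch uses `CWE-[0-9]+` rather than `CWE-[0-9]*`, an assumption made so that the pattern requires at least one digit and does not match the bare `CWE-` inside placeholder values such as NVD-CWE-Other.

```python
import re

# Require at least one digit so that "NVD-CWE-Other" yields no match.
CWE_PATTERN = re.compile(r"CWE-[0-9]+")

# Placeholder values that indicate a missing or non-specific CWE.
NON_SPECIFIC = {"NVD-CWE-Other", "NVD-CWE-noinfo"}


def collect_cwe_ids(cwe_field, descriptions):
    """Merge CWE-IDs from the CWE field with those found in description strings.

    cwe_field: list of values already present in the entry's CWE field.
    descriptions: list of free-form description strings for the entry.
    Returns the specific CWE-IDs, sorted for a deterministic result.
    """
    ids = set(cwe_field)
    for text in descriptions:
        ids.update(CWE_PATTERN.findall(text))
    return sorted(ids - NON_SPECIFIC)
```

For the CVE-2007-0838 example, calling `collect_cwe_ids(["NVD-CWE-Other"], [...])` with the evaluator description quoted earlier yields `["CWE-835"]`.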

Because vulnerability descriptions are crowd-sourced, they do not follow a standard descriptive pattern. Therefore, we began by preprocessing the data. In particular, we unified the case (converting text to lower case), removed special characters and stop words (commonly used words that do not affect the meaning of the sentence, e.g., This capability can be accessed is changed to capability access), replaced contractions (e.g., identifier’s is changed to identifier), and normalized tense (past tense is changed to present tense, e.g., used is changed to use). Then, we used the Universal Sentence Encoder [47], a pre-trained transformer that maps text into a high-dimensional vector representation reflecting semantic similarity, to represent the descriptions as vectors of size 1×512. The encoded vectors were then used to train and evaluate several machine learning and deep learning techniques, namely, k-Nearest Neighbor (k-NN), CNN, and DNN. We observed that k-NN (k = 1) provides the best results, predicting 151 different types with 65.60% accuracy. While this accuracy seems high considering the number of target classes, the predictions cannot be reliably used given the criticality of the application.
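A rough sketch of the preprocessing and the 1-nearest-neighbor prediction is given below. It makes several assumptions: the stop-word list is a tiny illustrative sample, tense normalization is omitted because it requires a lemmatizer, the embeddings are taken as given (the Universal Sentence Encoder itself is not invoked), and cosine similarity is assumed as the nearness measure, which the text does not specify. All function names are hypothetical.

```python
import math
import re

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"this", "can", "be", "the", "a", "an", "is", "to", "of", "in", "and"}


def preprocess(text: str) -> str:
    """Lower-case, drop possessive suffixes, strip special characters and stop words."""
    text = text.lower()
    text = re.sub(r"[’']s\b", "", text)        # identifier's -> identifier
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # remove special characters
    return " ".join(w for w in text.split() if w not in STOP_WORDS)


def cosine(u, v) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_1nn(query_vec, train_vecs, train_labels):
    """Return the label of the training vector most similar to the query (k = 1)."""
    best = max(range(len(train_vecs)), key=lambda i: cosine(query_vec, train_vecs[i]))
    return train_labels[best]
```

With k = 1, the predicted type is simply the CWE-ID of the single most similar labeled description, so a mislabeled or atypical neighbor directly produces a wrong prediction.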

Improvement Impact. By applying our CWE-ID extraction to CVE descriptions and matching each CWE-ID against the names in the CWE list published on the CWE website [48], we correct the CWE field for 2,456 vulnerabilities. These include vulnerabilities without a labeled type as well as those that already have types assigned. Statistically, the existing database includes 26,312 vulnerabilities with the NVD-CWE-Other label, 7,566 with the NVD-CWE-noinfo label, and 1,293 with no assigned label, aggregating to ≈31% of all the vulnerabilities. Additionally, we observed that most of the CVEs affected by our inconsistency fixes are those of type NVD-CWE-Other. Our analysis finds appropriate labels for 1,732 of the NVD-CWE-Other vulnerabilities and 14 of both the NVD-CWE-noinfo and unassigned vulnerabilities, making up ≈5% of those vulnerabilities.

Limitations. We analyze the description field to obtain information about the weakness of the vulnerability, finding a CWE-ID for 2,456 vulnerabilities. To improve the coverage of the vulnerabilities with inconsistent CWE-ID, it would be essential to employ program analysis techniques, e.g., analyze the code segments before and after the patch.

SECTION 5

Case Studies

With an improved and more consistent NVD, we conduct several vulnerability analyses as case studies on the impact of our NVD corrections. For each analysis, we describe what questions are being asked, how the answers might be valuable in practice, the results from the analysis using both the original and rectified NVD data, and the impact of our improvements on the analysis outcome.

We recognize that there are a variety of potential analysis directions. This subset is by no means comprehensive, but rather involves informative questions one might reasonably ask when using the CVE fields we investigated from the NVD. We note that security reports are a common practice, especially in the security industry, where various companies release summaries highlighting the state of vulnerability reporting, much like the analyses we conduct in this paper. To this end, the ultimate goal of these case studies is to demonstrate how analysis results can be affected by the NVD data issues that we correct, which may in turn affect the reports eventually produced. We also believe that the findings of this study can be leveraged directly by vulnerability databases: by programmatically implementing the heuristics developed in this study, they can address the underlying sources of inconsistencies and limit the issues therein.

5.1 Vulnerability Disclosures

RQ1. When are vulnerabilities most frequently disclosed?

Analysis Value and Rationale. Understanding the times associated with high levels of vulnerability disclosures could shed light on underlying decisions in the disclosure process, as well as the impact of those decisions. The published date from the NVD has been utilized to draw conclusions on vulnerability reporting trends [49], [50]. Hypothetically, vendors could opt to disclose vulnerabilities at the end of the week or near holidays. As many people (including those working for media organizations) are off work during the subsequent periods, the vulnerabilities may draw less negative attention. As a consequence, though, vulnerability remediation may be substantially delayed. It is important to understand whether this indeed happens frequently.

Analysis Results. Table 9 shows the top 10 dates in terms of the number of vulnerability disclosures (based on our estimated disclosure date), as well as the day of the week for each date. When considering US holidays, we do not notice any particular pattern of pre-holiday disclosures. Rather, several of these top dates are within a couple of weeks after a US holiday, such as Independence Day (7/9/18, 7/5/17, 7/18/17, 7/14/15, and 7/17/18), Labor Day (9/9/14), and New Year's Day (1/17/17 and 1/19/16). Additionally, we note that these dates are primarily on Mondays and Tuesdays. To investigate this observation more broadly, Fig. 2 shows the number of vulnerabilities disclosed on each day of the week. We find that beyond the top 10 dates, vulnerabilities are most frequently disclosed in the first half of a week (with fewer disclosures on Friday or over the weekend). In this analysis, we consider US holidays as most vendors in the NVD are US-based companies. However, we recognize that other nations celebrate many other holidays, and leave a more detailed global analysis for future work. We note that most vulnerabilities are disclosed during reasonable periods, where security professionals can obtain and act on information promptly.

TABLE 9 Top 10 Dates With the Most Vulnerabilities by CVE Publication and Our Estimated Disclosure Dates (EDD)

Fig. 2. The number of CVEs disclosed per weekday (using our estimated disclosure dates) and published to NVD.

Impact of NVD Data Issues. For the top CVE publication dates from Table 9, New Year's Eve accounts for four of the top 10 most active days, whereas it does not appear anywhere among the top 10 dates by our estimated disclosure dates. Most notably, on 12/31/2004, over 1K CVEs were added to the NVD, accounting for over 44% of CVEs for that year. Yet according to our estimated disclosure date, only 175 were publicly disclosed that day. This discrepancy suggests an NVD artifact, where a large number of CVEs may be added to the database before a new year arrives or backdated to the last day of a prior year, rather than a more fundamental aspect of vulnerability reporting. Using the raw NVD data for vulnerability frequency analysis could therefore produce inaccurate conclusions, such as high vulnerability reporting during holidays. Similarly, Fig. 2 indicates a more equal distribution of CVE publication dates throughout the week, which would incorrectly suggest that many CVEs are indeed disclosed near weekends.
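One way to surface such a year-end artifact is to compute, per year, the share of CVEs dated 31 December under each date field. The sketch below assumes each CVE is available as a pair of publication date and estimated disclosure date; the records are toy values for illustration, not NVD data.

```python
from collections import Counter
from datetime import date

def year_end_share(dates):
    """Return {year: fraction of that year's dates that fall on 31 December}."""
    per_year = Counter(d.year for d in dates)
    on_dec_31 = Counter(d.year for d in dates if (d.month, d.day) == (12, 31))
    return {year: on_dec_31[year] / total for year, total in per_year.items()}

# Toy records (hypothetical): (NVD publication date, estimated disclosure date).
records = [
    (date(2004, 12, 31), date(2004, 3, 2)),
    (date(2004, 12, 31), date(2004, 12, 31)),
    (date(2004, 12, 31), date(2004, 8, 17)),
    (date(2004, 6, 1), date(2004, 5, 30)),
]

published_share = year_end_share([published for published, _ in records])
disclosed_share = year_end_share([disclosed for _, disclosed in records])
print(published_share, disclosed_share)
```

A large gap between the two shares for the same year points to a bookkeeping effect in the publication date rather than a real burst of disclosures.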

Takeaway and Answer to RQ1. While the earlier days of the week show higher vulnerability disclosure trends, the latter days show higher publication trends.

5.2 Vulnerability Severity

RQ2. What is the severity distribution of vulnerabilities?

Analysis Value. As thousands of vulnerabilities are identified annually, it is vital that security practitioners can prioritize the most severe ones first. Furthermore, understanding what fraction of vulnerabilities receive each severity label allows them to identify how many vulnerabilities they may need to contend with. For the security community, it is also valuable to understand whether disclosed vulnerabilities skew towards low or high severity ones, shedding light on the nature of vulnerabilities being uncovered.

Analysis Results. Recall that in Section 4.3, we augmented the NVD by automatically applying accurate v3 severity ratings to all CVEs, rather than just relying on the most recent CVEs reported since v3 became standard. In Table 10, we present the distribution of CVE severity (across all CVEs in the NVD) for both v2 and our predicted v3. In total, 8.25% of all CVEs are low severity under v2, with the majority as medium severity. In contrast, under our predicted v3, less than 2% are low severity, and the severity distribution is skewed towards the higher end, with the majority of vulnerabilities as high or critical severity. From both the v2 and v3 distributions, the small proportion of low severity vulnerabilities suggests some bias against discovering, reporting, or disclosing less urgent security concerns. However, v3's skew towards high severity ratings could spur different vulnerability remediation behavior, as many vulnerabilities rated as medium under v2 but higher under v3 might have been ignored by security practitioners earlier.
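For readers less familiar with the two scales, the sketch below shows how numeric CVSS base scores map to qualitative labels and how a percentage distribution such as the one in Table 10 can be tallied. The thresholds follow the standard v2 and v3 qualitative rating scales; the scores are toy values. The sketch only buckets scores: the predicted v3 labels in this work come from a model (Section 4.3), not from re-bucketing v2 scores.

```python
from collections import Counter

def v2_severity(score):
    """CVSS v2 qualitative rating: LOW, MEDIUM, or HIGH (no critical level)."""
    if score >= 7.0:
        return "HIGH"
    if score >= 4.0:
        return "MEDIUM"
    return "LOW"

def v3_severity(score):
    """CVSS v3 qualitative rating, which adds NONE and CRITICAL."""
    if score >= 9.0:
        return "CRITICAL"
    if score >= 7.0:
        return "HIGH"
    if score >= 4.0:
        return "MEDIUM"
    if score > 0.0:
        return "LOW"
    return "NONE"

def distribution(labels):
    """Percentage of labels at each severity level."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}

toy_v2_scores = [2.1, 5.0, 7.5, 10.0]  # hypothetical scores
v2_dist = distribution(v2_severity(s) for s in toy_v2_scores)
print(v2_dist)
```

Note that the same numeric score of 9.8 is labeled HIGH under v2 but CRITICAL under v3, which is one reason the two distributions are not directly comparable.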

TABLE 10 CVSS Severity Score Distributions Over all CVEs

Fig. 3 further breaks down the yearly distribution of CVEs across different severity categories, for v2, v3, and our predicted v3. Using our predicted v3 severity scores, we observe a decreasing trend in the proportion of critical severity CVEs over the years. For example, from 2011 onwards, less than 20% of each year's CVEs were critical, compared to roughly 30-40% in the early 2000s. This change indicates that the severity distribution of vulnerabilities is shifting over time. While we are uncertain of the cause of this shift, one hypothesis is that the increasing use of program analysis and fuzzing tools may be producing larger vulnerability populations than before, while the number of critical ones remains similar, thus resulting in a smaller proportion. Future work could investigate this phenomenon in more depth.

Fig. 3. Distribution of CVEs across severity categories over the years with different severity scoring methods: v2, v3, and pv3 (our predicted v3 scores applied to all CVEs in the NVD; Section 4.3). Recall that v3 was only released in 2015, and all CVEs after 2017 were labeled with v3 scores. However, a subset of CVEs before 2017 was retroactively labeled with v3 scores.

Impact of NVD Data Issues. In NVD, all CVEs since 2017 are assigned v3 scores. However, no CVE before 1999 has an assigned v3 score, and before 2013, no more than 35 CVEs each year have a v3 score retroactively labeled (as v3 was officially released at the end of 2015 [51]). This minority of CVEs with assigned v3 scores is too limited for many analyses. For example, as seen in Fig. 3, CVEs with assigned v3 scores in certain years are unrepresentative of the likely real severity distribution. In 2000-2002, 2004-2006, and 2009, only one severity level appears for all CVEs with assigned v3 scores. While security analysts could rely on v2 instead, v3 was explicitly designed to overcome limitations of v2. Thus, our predicted v3 affords comprehensive severity analysis across the entire NVD dataset. This historical perspective is particularly important as vulnerabilities remain viable for years after disclosure [41].

Takeaway and Answer to RQ2. While the number of critical vulnerabilities remains similar over time, their proportion has decreased.

5.3 Vulnerability Types

RQ3. Which vulnerability type has the most critical vulnerabilities?

Analysis Value. Understanding which vulnerability types are associated with the most critical CVEs is useful for both security practitioners and researchers, as it allows them to prioritize which tools or defense systems to invest in or investigate.

Analysis Results. Our analysis involves the CWE and CVSS severity fields. In Table 11, we list the top 10 CWE categories by the number of high/critical severity CVEs, using v2, v3, and pv3 severity scores. By both correcting CWE labels and using our predicted v3 scores, we identify that SQL injection has the most critical CVEs, with almost twice as many as the next vulnerability type (buffer overflows). Meanwhile, for high-but-not-critical CVEs, buffer overflows are most common, and SQL injection does not appear within the top 10. This suggests that when SQL injection vulnerabilities are identified, they are typically of the utmost severity.
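The ranking in Table 11 amounts to counting CVEs per CWE label, restricted to a given severity level. A minimal sketch, assuming each CVE is reduced to a pair of CWE label and severity label; the records are toy values, with CWE-89 (SQL injection), CWE-119 (memory buffer bounds errors), and CWE-79 (cross-site scripting) used only as familiar identifiers.

```python
from collections import Counter

def top_types(cves, severities, k=10):
    """Rank CWE labels by how many CVEs carry one of the given severities."""
    counts = Counter(cwe for cwe, severity in cves if severity in severities)
    return counts.most_common(k)

# Toy (CWE label, severity) pairs, not NVD data.
cves = [
    ("CWE-89", "CRITICAL"), ("CWE-89", "CRITICAL"), ("CWE-119", "CRITICAL"),
    ("CWE-119", "HIGH"), ("CWE-119", "HIGH"), ("CWE-79", "MEDIUM"),
]

critical_rank = top_types(cves, {"CRITICAL"})
high_rank = top_types(cves, {"HIGH"})
print(critical_rank, high_rank)
```

Running the same ranking once per labeling scheme (v2, v3, pv3) gives the side-by-side columns of the table, which makes differences between the schemes easy to spot.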

TABLE 11 Top 10 Vulnerability Types by the Number of Critical or High Severity CVEs Using v2, v3, and Our Predicted v3 (pv3) Scores

Impact of NVD Data Issues. Buffer overflow and SQL injection are consistently the most frequent types under v2, v3, and our pv3. However, we note that overall, the top 10 CWE types for our pv3 more closely resemble those of v2 than those of v3. For example, access control, command injection, and hard-coded credentials are in the top 10 for v3 critical CVEs, but not for v2 or our pv3. Thus, our corrected NVD results appear more consistent than those using the original CWE and v3 NVD labels.

Takeaway and Answer to RQ3. The 10 most frequent vulnerability types among the severe vulnerabilities remain the same across CVSS versions.

5.4 Vendor and Product Names

RQ4. Which vendors have most CVEs or vulnerable products?

Analysis Value. Analysts may inform their operation using the vulnerability impact information across vendors, e.g., which vendors to track for new vulnerabilities, or which products to analyze.

Analysis Results. Table 12 shows the top 10 vendors per the associated CVEs and affected products, as a count and a fraction of all CVEs and affected products associated with each vendor. The statistics are presented for before and after our NVD corrections, but we will use the post-correction values for our analysis. We observe that the top vendors represent a significant fraction of all CVEs and products. The top 10 vendors account for about 36% of all CVEs and 22% of all products. Thus, the impact of vulnerabilities is concentrated on a small set of vendors, with a long tail of the remaining, less-impacted ones. It is also interesting to note that the top vendors by CVE count are quite different from those by product count, with only 4 vendors in common. This difference suggests that the concentration of CVEs among top vendors is not simply due to these vendors supporting a large number of products.
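The concentration figures above can be reproduced from two lists of pairs, one linking vendors to CVEs and one linking vendors to products. The sketch below is an assumption-laden illustration: the pair representation, the function names, and the toy data are ours, and shares are computed over distinct pairs, so a CVE associated with several vendors counts once per vendor.

```python
from collections import Counter

def top_k(pairs, k=10):
    """Top-k vendors by number of distinct items (CVE IDs or product names)."""
    distinct = set(pairs)  # drop repeated (vendor, item) rows
    counts = Counter(vendor for vendor, _ in distinct)
    return counts.most_common(k)

def share_of_total(top, pairs):
    """Fraction of all distinct (vendor, item) pairs held by the top vendors."""
    return sum(n for _, n in top) / len(set(pairs))

# Toy data (hypothetical vendors, CVE ids, and products).
vendor_cves = [("a", 1), ("a", 2), ("a", 3), ("b", 4), ("b", 5),
               ("c", 6), ("d", 7), ("a", 1)]
vendor_products = [("c", "p1"), ("c", "p2"), ("c", "p3"),
                   ("a", "p4"), ("a", "p5"), ("b", "p6")]

top_by_cve = top_k(vendor_cves, k=2)
top_by_product = top_k(vendor_products, k=2)
cve_share = share_of_total(top_by_cve, vendor_cves)
common = {v for v, _ in top_by_cve} & {v for v, _ in top_by_product}
print(top_by_cve, top_by_product, round(cve_share, 3), common)
```

Computing the same quantities on the vendor and product names before and after correction shows how much the naming fixes shift each vendor's counts.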

TABLE 12 Top 10 Vendors per the Number of Associated CVEs and Affected Products, After and Before Name Corrections (# is a Count and % as a Percent of CVEs or Products Associated With That Vendor)

Impact of NVD Data Issues. The impact of product and vendor name inconsistencies is less dramatic for this analysis, as ultimately the order of top vendors remains the same before and after corrections. However, the changes in vulnerability counts can be notable. For example, Oracle had over 100 more associated CVEs after our naming fixes, and Debian had 95 more CVEs. Even when the number of CVEs with a mislabeled vendor or product is small, the security risk can be high. In Table 13, we consider all CVEs with the corrected vendor or product label, and break down their severity levels using v2 and our predicted v3. While only several thousand CVEs were mislabeled and subsequently corrected, over a third are high severity under v2 and a quarter are critical under our predicted v3. In total, nearly 1,000 mislabeled CVEs are critically severe. A security analyst tracking a particular product or vendor could easily miss relevant severe vulnerabilities, putting their systems at risk. (After all, it only takes one missed vulnerability to permit a security situation, such as with Equifax [52].)

Takeaway and Answer to RQ4. More than two-thirds of the vulnerabilities with vendor name inconsistency have high/critical severity in the uniform severity scores.

TABLE 13 CVEs With Mislabeled Vendors/Products by Severity Levels Using v2 and Our Predicted v3 (pv3) Labels

SECTION 6

Discussion

The Need for a Reliable Vulnerability Database. Given the wide range of applications of vulnerability databases, in both industry and the research community, the reliability of the information in them is of the utmost importance. However, some of the key takeaways of this work show that the information in the NVD is inconsistent, as demonstrated by the associated quantification, thereby raising questions about the NVD's reliability. The inconsistencies vary, ranging from the delay between a vulnerability's disclosure and its publication date in the NVD, to its vendor and product names, to its severity metrics, to its vulnerability type. By identifying these inconsistencies, this work highlights the pitfalls of using the NVD. Given the non-uniform state of the vulnerable systems, some inconsistencies require manual effort: we conducted a manual investigation and then used its results to build an automated system to identify inconsistencies. For the others, we built automated tools that can be used to recover consistency.

While the estimated disclosure date in this study fundamentally questions the completeness of the NVD, the other fixes address the NVD's inconsistency. It may be argued that the reports listed in the reference links in the NVD might not have been public or known at the time of their insertion into the NVD. In addition, the vulnerability information can be modified multiple times, as is the practice with incremental vulnerability reporting. Given such practices and operational caveats, the proposed approach can be used to update the estimated disclosure date of a vulnerability whenever its entry is modified. Moreover, recall that the inconsistencies identified in the NVD are also present in other vulnerability databases, indicating that the inconsistencies spread, possibly due to information sharing.

6.1 Prediction Performance

In Table 5, we observed that v2 vulnerabilities with High severity are approximately equally split between the High and Critical severity levels when transformed to v3. However, the prediction results for the vulnerabilities with no v3 severity in Table 7 show that the number of v2-High vulnerabilities that transform to the Critical severity level is approximately twice the number that transform to High severity in v3. To validate the performance of our prediction, we check the behavior of the model on the ground truth dataset. We begin by using our model to predict the severity of the vulnerabilities that have v3 labels. Table 14 shows the results of this experiment. Recall from Table 5 that only 1% of v2-medium and 9.5% of v2-low vulnerabilities transformed to the low severity level in v3. We therefore see fewer vulnerabilities in the v3 low severity level. Considering that this experiment includes the training dataset, which makes up 80% of our overall dataset, we now look into only the testing dataset, removing possible biases.

TABLE 14 Ground Truth - Prediction Results

Table 15 shows the actual representation of the ground truth testing dataset, while Table 16 shows the movements of the same vulnerabilities under our prediction model. Notice that low severity vulnerabilities in v2 are only 10% of the total testing dataset, of which only 1.38% of the samples remain low in v3, so most of the low severity vulnerabilities in v2 move to the medium severity level in v3. In Tables 14 and 16, we see that the v2-high vulnerabilities have transformed proportionally to v3-high and v3-critical. Considering these results, the only explanation for the presence of approximately twice as many transformed v3-critical vulnerabilities as v3-high ones (from v2-high) is the nature of their feature space, rather than a possible aberration in our model.
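The movement tables discussed here (Tables 5 and 14 to 16) can be read as row-normalized transition tables from v2 severity to v3 severity. A minimal sketch of how such a table could be built from label pairs follows; the level names, function name, and toy pairs are illustrative assumptions, not the tables' actual values.

```python
from collections import Counter

V2_LEVELS = ["LOW", "MEDIUM", "HIGH"]
V3_LEVELS = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]

def transition_table(pairs):
    """Of the CVEs at each v2 level, the percentage at each v3 level."""
    counts = Counter(pairs)
    table = {}
    for v2 in V2_LEVELS:
        row_total = sum(counts[(v2, v3)] for v3 in V3_LEVELS)
        table[v2] = {
            v3: (100.0 * counts[(v2, v3)] / row_total if row_total else 0.0)
            for v3 in V3_LEVELS
        }
    return table

# Toy (v2 label, v3 label) pairs, not the paper's data.
pairs = ([("HIGH", "HIGH")] * 2 + [("HIGH", "CRITICAL")] * 2
         + [("LOW", "MEDIUM")] * 3 + [("LOW", "LOW")])

table = transition_table(pairs)
print(table["HIGH"], table["LOW"])
```

Building one table from (v2, assigned v3) pairs and another from (v2, predicted v3) pairs on the same test CVEs allows a row-by-row comparison of the kind made between Tables 15 and 16.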

TABLE 15 Test Dataset - Ground Truth Data

TABLE 16 Test Dataset - Prediction Results

6.2 Root Cause of Inconsistencies

Understanding the root causes of the inconsistencies in the NVD can help eliminate them. Our analyses provide various plausible explanations for these root causes. For vendor/product inconsistencies, we noticed that they were clearly due to incorrect naming conventions, the use of developers as vendors, vendor acquisitions, and typos by analysts. Among those root causes, acquisitions are dynamic and therefore difficult to mitigate, while the other causes can be addressed by standardizing a nomenclature.

The reason behind the inconsistencies in the v3 severity is the adoption of a new severity scoring system, which did not exist at the time the severity of older vulnerabilities was scored. Given the absence of the parameters that differentiate v3 from v2, v3 was not generalized to those vulnerabilities. However, such a generalization was done by the NVD with considerable accuracy when adopting v2 throughout.⁴ Similarly, by leveraging deep learning-based algorithms, we determined the v3 labels from the v2 labels. We also investigated the severity of the vulnerabilities with a lag between the estimated disclosure date and the NVD date. Fig. 4 shows the average lag, in days, for the different v3 severity levels. We observe that the average across the various severity levels ranges between 47.6 and 66.8 days, thereby demonstrating that the delay in the insertion of a vulnerability into the NVD has no relationship with the severity of the vulnerability.
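The per-severity averages plotted in Fig. 4 can be computed by grouping the lag, in days, between the estimated disclosure date and the NVD date. The sketch below assumes each record is a triple of v3 severity label, estimated disclosure date, and NVD publication date; the records are toy values.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def mean_lag_by_severity(records):
    """Average days between estimated disclosure and NVD publication, per severity."""
    lags = defaultdict(list)
    for severity, disclosed, published in records:
        lags[severity].append((published - disclosed).days)
    return {severity: mean(days) for severity, days in lags.items()}

# Toy records (hypothetical): (v3 severity, estimated disclosure, NVD date).
records = [
    ("HIGH", date(2018, 1, 1), date(2018, 1, 31)),
    ("HIGH", date(2018, 3, 1), date(2018, 3, 11)),
    ("LOW", date(2018, 5, 1), date(2018, 5, 6)),
]

avg_lag = mean_lag_by_severity(records)
print(avg_lag)
```

Because a few very late entries can dominate a mean, reporting the median alongside it would be a reasonable robustness check.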

Fig. 4. Average lag time by v3 severity level.

6.3 Observations: Inconsistent Vendor and Product

From our analysis, we observed several interesting naming patterns that reflect the complex software ecosystem and highlight difficulties that can arise in managing vendor and product names. For example: ① In the NVD, various entities may be deemed the vendor. Interestingly, a primary software developer is sometimes listed as a vendor, and different maintainers over time may list the same product. For example, Igor Sysoev was the original author of nginx, which is now maintained by nginx.inc, and both of them are listed as vendors with nginx as a product. Additionally, developers can be referenced with variations of their real name, leading to inconsistency (e.g., provos and neilsprovos). Acquired companies can also be listed as products under the acquiring vendor (e.g., ICQ and AOL). Note that our vendor heuristics allow us to select these vendor pairs for manual analysis. ② A vendor could be a parent company while the product is the subsidiary. Here, the subsidiary can be both a vendor (listing its own software) as well as a product, which is also detected by our vendor heuristics. ③ A vendor could change name (e.g., cat became quickheal). We note that our vendor heuristics may catch this if the old and new vendor names share characters or product names, but may miss cases otherwise.

Thus, the NVD would benefit from defining consistent rules for vendor and product naming, such as on the use of white spaces, special characters, and abbreviations. One path forward would be to require vulnerability reporters to check their name submissions against a tool or online interface that searches existing names that likely match, perhaps using an approach such as our identification method.
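To make the idea of such a lookup tool concrete, the sketch below normalizes a submitted name and searches existing names for near matches. It uses generic string similarity from Python's `difflib` as a stand-in; it is not the identification method of this work, and the normalization rule, cutoff value, and name list are assumptions chosen for illustration.

```python
import difflib
import re

def normalize(name):
    """Lowercase and collapse runs of non-alphanumerics to a single underscore."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")

def suggest(name, known_names, n=3, cutoff=0.8):
    """Return existing names whose normalized form is close to the submitted one."""
    index = {normalize(known): known for known in known_names}
    hits = difflib.get_close_matches(normalize(name), list(index), n=n, cutoff=cutoff)
    return [index[hit] for hit in hits]

# Hypothetical list of names already present in the database.
known = ["quickheal", "nginx", "oracle"]

print(suggest("Quick Heal", known))
print(suggest("zzz", known))
```

A submission that yields a suggestion would prompt the reporter to reuse the existing name, while an empty result would indicate a new vendor or product. String similarity alone cannot catch renames with unrelated spellings, which is where shared product names would be needed.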

6.4 Applications

This work highlights inconsistencies in the NVD data fields, and proposes methods to fix them. The diversified inconsistencies warrant multiple tools, dealing with one at a time. As a result, this study can be utilized by the analysts at NVD towards the following goals:

The estimated disclosure date identification can enrich the vulnerability report for the end-user's perusal. The tool enables the analysts to scrape through the different vulnerability reports and disclosures from the reference links of the recently added vulnerabilities and notify them of the disclosure date.
While we do not present a fully automated tool to pinpoint inconsistencies in product and vendor names, our heuristics for finding inconsistent vendor and product names can be leveraged during vulnerability reporting. We believe that engineering this tool, while interesting in itself, falls well within the contribution of this paper: inconsistency identification, heuristics for remediation, and measurements based on an improved vulnerability database. With such a tool, we envision that individual reporters (analysts) can enter the vendor and product name as they perceive them, and the tool will suggest the appropriate vendor and product name from the generated consistent database. The reporter will then choose the consistent vendor and product name if available. Additionally, NVD analysts can use the tool to re-assess the vendor and product names when generating CPE URIs (both 2.2 and 2.3). Moreover, for new vendor and/or product names, our observed inconsistencies and their root causes can help control inconsistencies in the future.
Our tool to determine the CVSS v3 metrics can be leveraged to approximate a uniform severity metric and score across vulnerabilities in the database. Moreover, it can be used by the users (analysts) of the NVD to prioritize their patching. For example, although the v3 scoring system update affects vulnerabilities that predate 2015, the continued exploitation of older vulnerabilities underscores the need for an updated severity based on the current threats they pose to systems.

The last point can be made even stronger with some recent evidence suggesting that adversaries exploit older vulnerabilities, sometimes as old as 14 years (e.g., CVE-2004-0113). While it is true that these should have been patched earlier, that is not the case, and reassigning a lower severity vulnerability to critical severity to emphasize this scenario would catch the eye of security analysts.

Leveraging the improved NVD, we formulate analysis questions as case studies to understand the impact of our corrective measures. Although there were numerous analyses that we came up with, we present the questions that a user might have when using the corrected fields. We observe that while public disclosures happen in the early days of the week, the inclusion of them in the NVD happens on the latter days. Additionally, the high reportage of CVEs on the last day of a year can be due to their retroactive inclusion when only the year was known.

The temporal analysis of software weaknesses can help in understanding trends and identifying up-and-coming vulnerabilities. These emerging software weaknesses may be the result of a recently found attack vector. Such trends can be used during software product development, can help prioritize patching processes, and can indicate what to emphasize during the various phases of the software development life cycle. A consistent database would give a better picture of the trends, including their exploitation window (depending upon the disclosure date of a vulnerability and the date it is discovered on a host computer).

Limitations. To estimate the disclosure date, we consider the domain names representing 85% of the URLs. The 15% reduction in coverage may lead to an imprecise estimation of the disclosure date. Moreover, the vendor and product inconsistency numbers present a lower bound on the inconsistencies that the NVD may have. During our experimentation, we did not group vendors if another vendor had acquired a probably inconsistent vendor. Improving the bounds would require determining the acquisition date of the probably inconsistent vendor and then correlating it with the corresponding estimated disclosure dates.
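The coverage trade-off in the first limitation can be illustrated by selecting the most frequent reference domains until a target share of URLs is reached. This is a sketch of one plausible selection procedure, not a description of how the domains were actually chosen in this work; the function name and the toy URLs are assumptions.

```python
from collections import Counter
from urllib.parse import urlparse

def domains_for_coverage(urls, target=0.85):
    """Most-frequent domains, added in order until they cover the target share of URLs."""
    if not urls:
        return [], 0.0
    counts = Counter(urlparse(url).netloc.lower() for url in urls)
    chosen, covered = [], 0
    for domain, n in counts.most_common():
        if covered / len(urls) >= target:
            break
        chosen.append(domain)
        covered += n
    return chosen, covered / len(urls)

# Toy reference URLs on reserved example domains.
urls = (["https://a.example/advisory"] * 6
        + ["https://b.example/bug"] * 3
        + ["https://c.example/post"])

chosen, coverage = domains_for_coverage(urls)
print(chosen, coverage)
```

The URLs on domains left out of the chosen set are the ones whose dates are never inspected, which is the source of the imprecision noted above.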

SECTION 7

Conclusion

Given the importance of a database such as the NVD for security operations, identifying, measuring, and fixing its inconsistencies is essential. We pursue this through various tools, including multi-sourced web scraping, manual vetting, and deep learning algorithms, to remedy inconsistencies in the publication dates, vendor names, product names, severity categories, and vulnerability types. The inconsistency-fixed database revealed exciting insights about the NVD and vulnerability reporting in general, and showed how basing an analysis on the current NVD leads to different conclusions than basing it on the fixed one. The most frequent days for estimated public disclosure and for publication show the prevalence of the early days of the week (Monday and Tuesday) among disclosure dates and of the latter days among publication dates in the NVD. The fixed vendor names show decreasing inconsistencies over time, while product names need more attention for better resolution. The v3 fix reveals a better distribution of the v3 metric, and the vulnerability type fix identifies additional types beyond the ones listed in the NVD.
