Evaluation of Feature Engineering Techniques for Improving CVE Vulnerability Classification
DOI:
https://doi.org/10.47392/irjash.2023.S026
Keywords:
CVE, Vulnerability, Feature Engineering, Classification
Abstract
This paper presents a three-stage approach to analyzing Common Vulnerabilities and Exposures (CVE) datasets using machine learning techniques. In the first stage, K-Means clustering and Latent Dirichlet Allocation (LDA) topic modeling are applied to identify distinct clusters and topics within the dataset. The Elbow method is used to determine the optimal number of clusters for K-Means, while Grid Search is used to find the best topic model for LDA. Next, 100 random samples from each cluster are labeled. In the third stage, the labeled data is split into training and testing sets and used with various classification algorithms. The paper contributes to the field by proposing a novel approach to analyzing CVE datasets that combines clustering and classification techniques. K-Means clustering and LDA topic modeling identify distinct clusters and topics within the dataset, which can be used to improve the accuracy of classification algorithms. The study highlights the importance of using pre-trained word embeddings and discusses the limitations of the proposed approach. Overall, the paper provides insights into the analysis of CVE datasets and offers a framework for future research in this area.
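The abstract does not give implementation details, so the following is only a minimal sketch of how the Elbow method can pick the number of clusters for K-Means. It rests on several assumptions that are not from the paper: the toy 2-D points stand in for vectorized CVE descriptions, the farthest-point initialization is chosen only to keep the run deterministic, and the elbow is taken as the k with the largest second difference of the inertia curve. All function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coord) / n for coord in zip(*cluster))


def farthest_first(points, k):
    """Deterministic initialization (an assumption of this sketch):
    start at the first point, then repeatedly add the point that is
    farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans_inertia(points, k, max_iter=100):
    """Run Lloyd's K-Means and return the within-cluster sum of squares."""
    centers = farthest_first(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [
            mean(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def elbow_k(inertias):
    """Pick the k with the largest second difference of the inertia curve.
    `inertias` maps consecutive k values to their inertia."""
    candidates = sorted(inertias)[1:-1]
    return max(
        candidates,
        key=lambda k: inertias[k - 1] - 2 * inertias[k] + inertias[k + 1],
    )


# Toy data: three well-separated groups of four points each.
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5)]
points = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx, dy in offsets]

inertias = {k: kmeans_inertia(points, k) for k in range(1, 6)}
best_k = elbow_k(inertias)
print(best_k)  # prints 3 for this toy data
```

On real CVE text, the points would be high-dimensional feature vectors, and a library implementation of K-Means with multiple random restarts would normally replace the hand-written loop; the second-difference rule is one of several ways to read an elbow off the curve.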
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.