Evaluation of Feature Engineering Techniques for Improving CVE Vulnerability Classification

Authors

  • Mounesh Marali, Department of Computer Science and Engineering, National Institute of Technology Puducherry, Karaikal, India. Author
  • Dhanalakshmi R, Department of Computer Science and Engineering, Indian Institute of Information Technology Tiruchirappalli, India. Author
  • Narendran Rajagopalan, Department of Computer Science and Engineering, National Institute of Technology Puducherry, Karaikal, India. Author

DOI:

https://doi.org/10.47392/irjash.2023.S026

Keywords:

CVE, Vulnerability, Feature Engineering, Classification

Abstract

This paper presents a three-stage approach to analyzing Common Vulnerabilities and Exposures (CVE) vulnerability datasets using machine learning techniques. In the first stage, K-Means clustering and Latent Dirichlet Allocation (LDA) topic modeling are applied to identify distinct clusters and topics within the dataset. The Elbow method is used to determine the optimal number of clusters for K-Means, while Grid Search is used to find the best topic model for LDA. Next, 100 random samples from each cluster are labeled, and in the third stage the data is split into training and testing sets for use in various classification algorithms. The paper contributes to the field by proposing a novel approach to analyzing CVE vulnerability datasets that combines clustering and classification techniques: the clusters and topics identified by K-Means and LDA can be used to improve the accuracy of classification algorithms. The study also highlights the importance of using pre-trained word embeddings and discusses the limitations of the proposed approach. Overall, the paper offers a framework for future research on the analysis of CVE vulnerability datasets.
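
The cluster-count selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 2-D points are made-up stand-ins for document feature vectors (for example TF-IDF or embedding vectors), the function names are hypothetical, the K-Means initialization is a simple deterministic farthest-point rule chosen for reproducibility, and the "largest bend" rule is only one assumed way to automate an Elbow choice that is usually read off a plot.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster))


def init_centroids(points, k):
    """Deterministic farthest-point initialization (an assumption of this sketch)."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def kmeans(points, k, iters=100):
    """Lloyd's algorithm; returns (centroids, inertia)."""
    centroids = init_centroids(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, inertia


def elbow_k(inertias):
    """Pick the k where the inertia curve bends most (largest second difference)."""
    ks = sorted(inertias)
    best_k, best_bend = None, float("-inf")
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        bend = (inertias[prev] - inertias[k]) - (inertias[k] - inertias[nxt])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Toy data: three well-separated groups of four points each (hypothetical).
offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
points = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (0, 10)] for dx, dy in offsets]

inertias = {k: kmeans(points, k)[1] for k in range(1, 6)}
best_k = elbow_k(inertias)
print(inertias)
print("elbow at k =", best_k)
```

On this toy data the inertia drops sharply up to three clusters and then flattens, so the heuristic selects k = 3. With real CVE descriptions the curve is typically much less clean, which is why the choice is often made by inspecting the plot rather than by a fixed rule.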

Published

2023-05-28