Performance Analysis of Feature Selection Techniques for Text Classification

Authors

  • Hemlata Patel, Student, Dept. of Computer Science & Engineering, Dr. A.P.J. Abdul Kalam University, Indore, MP, India
  • Dr. Dhanraj Verma, Professor, Dept. of Computer Science & Engineering, Dr. A.P.J. Abdul Kalam University, Indore, MP, India

DOI:

https://doi.org/10.47392/irjash.2020.259

Keywords:

Web Data Mining, GINI Index, Information Gain, K-Nearest Neighbour, Support Vector Machine

Abstract

The Internet is a convenient, highly available, and low-cost publishing medium, so a significant amount of data is hosted and published on websites. Some of this data is directly available to the general public, and some is not publicly distributed. Service providers and administrators can use such data for business intelligence and similar applications. In this work, web data mining is the key area of investigation and experimental study. Web data mining can be divided into three major classes: web content mining, web structure mining, and web usage mining. This work considers web content mining and web usage mining. Web content mining is explored first: a system is developed for a comparative performance study of different content feature selection techniques. In this experiment, the GINI index, Information Gain, DFS, and Odds Ratio are compared using a real-world collection of web pages. A Support Vector Machine (SVM) is applied to classify the features extracted from the web content. The comparative study demonstrates that Information Gain (IG) and the GINI index (GI) are suitable feature selection techniques that work well with the SVM classifier.
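As a rough illustration of how two of the compared criteria score a term, the sketch below computes Information Gain and a GINI index for a single term over a toy labelled corpus. The abstract does not state the exact formulas used in the study, so the definitions here are assumptions: Information Gain is taken as the reduction in class entropy from knowing whether the term is present, and the GINI index follows one common text-classification variant, the sum over classes of P(t|c)^2 · P(c|t)^2. The function names, the toy documents, and the labels are all hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Assumed definition: H(C) - P(t) H(C|t) - P(not t) H(C|not t)."""
    classes = sorted(set(labels))
    with_t = Counter(l for d, l in zip(docs, labels) if term in d)
    without_t = Counter(l for d, l in zip(docs, labels) if term not in d)
    n = len(docs)
    n_t = sum(with_t.values())
    h_c = entropy([labels.count(c) for c in classes])
    h_t = entropy([with_t[c] for c in classes])
    h_not = entropy([without_t[c] for c in classes])
    return h_c - (n_t / n) * h_t - ((n - n_t) / n) * h_not


def gini_index(docs, labels, term):
    """Assumed variant: sum over classes of P(t|c)^2 * P(c|t)^2."""
    with_t = Counter(l for d, l in zip(docs, labels) if term in d)
    n_t = sum(with_t.values())
    if n_t == 0:
        return 0.0
    score = 0.0
    for c in set(labels):
        p_t_given_c = with_t[c] / labels.count(c)
        p_c_given_t = with_t[c] / n_t
        score += (p_t_given_c ** 2) * (p_c_given_t ** 2)
    return score


# Hypothetical toy corpus: each document is a set of tokens.
docs = [
    {"goal", "match"},
    {"goal", "team"},
    {"vote", "party"},
    {"vote", "team"},
]
labels = ["sport", "sport", "politics", "politics"]

for term in ("goal", "team"):
    print(term, information_gain(docs, labels, term), gini_index(docs, labels, term))
```

In this toy corpus, "goal" appears only in one class and receives the highest score under both criteria, while "team" is spread evenly across classes and scores low. A feature selector would rank all terms this way and pass only the top-scoring ones to the classifier.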

         

Published

2020-12-01