Enriching and Clustering Short Text Using KNN

Authors

  • Ms. Shalika, Assistant Professor, Department of Computer Applications, KIET Group of Institutions, Ghaziabad, India
  • Mr. Veepin Kumar, Assistant Professor, Department of Information Technology, KIET Group of Institutions, Ghaziabad, India

DOI:

https://doi.org/10.47392/irjash.2021.219

Keywords:

Short Text, k-Nearest Neighbor, Semantic Enrichment, Hashing

Abstract

Semantic hashing compresses the meaning of short texts into compact binary codes, so deciding whether two short texts are alike in meaning reduces to comparing their binary codes. A deep neural network, trained on bag-of-words representations of the texts, performs the encoding. Unfortunately, for short texts such as titles, tweets, or queries, a bag-of-words representation does not capture enough of the underlying semantics. We propose adding semantic signals so that short texts can be grouped more reliably by meaning. Specifically, we enrich each short text by retrieving, from a knowledge base, the concepts and co-occurring terms associated with each of its terms. In addition, we use a k-Nearest Neighbor based approach for hashing. Multiple experiments indicate that as the number of semantic signals increases, the neural network captures the meaning of short texts better, which supports applications such as information retrieval, classification, and other short-text processing.
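
The two ideas in the abstract, enriching a short text from a knowledge base and comparing binary codes, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the knowledge base `TOY_KB`, the function names, and the bit patterns are all hypothetical, and the binary codes are taken as given rather than produced by a trained neural network.

```python
# Hypothetical toy knowledge base: term -> related concepts / co-occurring terms.
# A real system would query an external knowledge base instead.
TOY_KB = {
    "jaguar": ["animal", "cat"],
    "python": ["language", "programming"],
}

def enrich(text, kb=TOY_KB):
    """Return the text's tokens followed by the knowledge-base entries of each token."""
    tokens = text.lower().split()
    extra = [concept for t in tokens for concept in kb.get(t, [])]
    return tokens + extra

def hamming(a, b):
    """Number of differing bits between two binary codes stored as integers."""
    return bin(a ^ b).count("1")

def knn(query_code, codes, k):
    """Indices of the k codes closest to query_code in Hamming distance.

    Ties are broken by position in the list.
    """
    order = sorted(range(len(codes)),
                   key=lambda i: (hamming(query_code, codes[i]), i))
    return order[:k]

if __name__ == "__main__":
    print(enrich("Python tutorial"))
    # Made-up 4-bit codes standing in for the output of an encoder.
    print(knn(0b1010, [0b1010, 0b0101, 0b1000], k=2))
```

Storing each code as an integer keeps the comparison to a single XOR followed by a bit count, which is what makes matching binary codes cheap compared with comparing full bag-of-words vectors.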

Published

2021-07-01