Enriching and Clustering Short Text Using KNN
International Research Journal on Advanced Science Hub,
2021, Volume 3, Issue Special Issue 7S, Pages 111-116
AbstractSemantic Hashing technique wraps the meaning of short texts into compressed binary codes. So, to find out that whether two short texts are alike or not in their meaning, their binary codes need to be matched. A deep neural network is used for encoding. Bag-of-words representation of texts is used to train the neural network. Unfortunately, the fundamental semantics are not sufficiently captured by the above mentioned form of representation for short texts such as titles, tweets, or queries. We propose adding additional semantic signals to better group short texts using their meaning. More specifically, we procure the co-occurring terms and concepts of every term in the short text via a knowledge database to further enhance the short text. Additionally, we use a k-Nearest Neighbor based approach id for hashing. Multiple experiments provide evidence that by increasing the number of semantic signals, our neural network is better capable to capture the meaning of short texts, which enables various uses like retrieving information, classifying data, and processing of short texts.
- Article View: 93
- PDF Download: 83