Using K Nearest Neighbors for text segmentation with feature similarity

2017 International Conference on Communication, Control, Computing and Electronics Engineering (ICCCCEE) Pub Date : 1900-01-01 DOI:10.1109/ICCCCEE.2017.7866706

T. Jo

Citations: 15

Abstract

In this research, we propose a version of K Nearest Neighbor (KNN) that considers the similarity among attributes when computing the similarity between feature vectors. The text segmentation task is cast as binary classification, in which each pair of sentences or paragraphs is classified according to whether a boundary is placed between them; the proposed version produced successful results in previous work on text categorization and clustering. We define a similarity measure based on both attributes and values, modify KNN to use it, and apply the modified version to the text segmentation task. We expect a more compact representation of data items and improved performance in text segmentation as well as in other text mining tasks. The goal of this research is therefore to implement a text segmentation system that provides these benefits.
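The abstract does not state the similarity formula, so the following is only a minimal sketch of the general idea, not the author's implementation. It assumes a soft-cosine-style measure in which a feature-to-feature similarity matrix weights every pair of attributes, and it assumes each training vector encodes one pair of adjacent paragraphs labeled `boundary` or `no_boundary`. The names `feature_aware_similarity`, `knn_predict`, `FEATURE_SIM`, the three-word vocabulary, and all numeric values are hypothetical.

```python
import math
from collections import Counter


def feature_aware_similarity(x, y, feature_sim):
    """Similarity that also credits matches between *similar* features.

    Assumed form (soft-cosine style, not taken from the paper):
    sum_ij S[i][j] * x[i] * y[j], normalised by the same form on x and y.
    """
    def bilinear(a, b):
        return sum(feature_sim[i][j] * a[i] * b[j]
                   for i in range(len(a)) for j in range(len(b)))

    denom = math.sqrt(bilinear(x, x)) * math.sqrt(bilinear(y, y))
    return bilinear(x, y) / denom if denom > 0 else 0.0


def knn_predict(query, examples, labels, feature_sim, k=3):
    """Majority vote over the k training vectors most similar to the query."""
    ranked = sorted(
        range(len(examples)),
        key=lambda i: feature_aware_similarity(query, examples[i], feature_sim),
        reverse=True,
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical vocabulary: ["price", "cost", "weather"].
# "price" and "cost" are treated as similar features (0.8); values are made up.
FEATURE_SIM = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Each vector stands for one pair of adjacent paragraphs (toy encoding).
EXAMPLES = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.1, 0.0, 1.0]]
LABELS = ["no_boundary", "no_boundary", "boundary", "boundary"]

QUERY = [0.0, 1.0, 0.0]  # mentions only "cost"
PREDICTION = knn_predict(QUERY, EXAMPLES, LABELS, FEATURE_SIM, k=3)
print(PREDICTION)
```

In this toy setup, a vector containing only "price" and one containing only "cost" share no nonzero attribute, so plain cosine similarity would score them 0, while the feature-aware measure scores them 0.8 because the two attributes are declared similar. That is the sense in which considering attribute similarity can make sparse text vectors more comparable.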

