Using K Nearest Neighbors for text segmentation with feature similarity

T. Jo
2017 International Conference on Communication, Control, Computing and Electronics Engineering (ICCCCEE)
DOI: 10.1109/ICCCCEE.2017.7866706
Citations: 15

Abstract

In this research, we propose a version of K Nearest Neighbor (KNN) that considers the similarity among attributes when computing the similarity between feature vectors. Text segmentation is cast as a binary classification task in which each pair of sentences or paragraphs is classified according to whether a boundary is placed between them. The proposed version produced successful results in previous work on text categorization and clustering. Here, we define a similarity measure based on both attributes and values, modify KNN to use it, and apply the modified version to the text segmentation task. We expect a more compact representation of data items and improved performance in text segmentation as well as in other text mining tasks. The goal of this research is therefore to implement a text segmentation system that delivers these benefits.
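The abstract does not state the similarity formula, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a soft-cosine-style measure as a stand-in: a feature-to-feature similarity matrix (`feature_sim`, hypothetical) lets two vectors score above zero even when their non-zero features differ but are related. The function names, the toy vectors, and the labels `boundary` / `no_boundary` are all illustrative assumptions.

```python
import math
from collections import Counter


def feature_aware_similarity(x, y, feature_sim):
    """Soft-cosine-style similarity (an assumed stand-in, not the paper's formula).

    feature_sim[i][j] is the similarity between feature i and feature j,
    with feature_sim[i][i] == 1. With an identity matrix this reduces to
    ordinary cosine similarity.
    """
    n = len(x)

    def bilinear(a, b):
        # Sum over all feature pairs, weighted by how similar the features are.
        return sum(feature_sim[i][j] * a[i] * b[j]
                   for i in range(n) for j in range(n))

    norm_x = math.sqrt(bilinear(x, x))
    norm_y = math.sqrt(bilinear(y, y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return bilinear(x, y) / (norm_x * norm_y)


def knn_predict(query, examples, labels, feature_sim, k=3):
    """Majority vote among the k training examples most similar to the query."""
    ranked = sorted(
        zip(examples, labels),
        key=lambda pair: feature_aware_similarity(query, pair[0], feature_sim),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data: each vector stands for one encoded pair of sentences or paragraphs.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
examples = [[1, 0, 0], [0.9, 0.1, 0], [0, 0, 1], [0, 0.1, 0.9]]
labels = ["boundary", "boundary", "no_boundary", "no_boundary"]

prediction = knn_predict([1, 0, 0], examples, labels, identity, k=3)
```

With the identity matrix, two vectors that share no features score 0; raising the off-diagonal entries of `feature_sim` gives such pairs partial credit, which is the effect the abstract attributes to considering similarity among attributes.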