{"title":"Using K Nearest Neighbors for text segmentation with feature similarity","authors":"T. Jo","doi":"10.1109/ICCCCEE.2017.7866706","DOIUrl":null,"url":null,"abstract":"In this research, we propose the version of K Nearest Neighbor which considers similarity among attributes for computing the similarity between feature vectors. The text segmentation task is viewed into the binary classification where each pair of sentences or paragraphs is classified into whether we put the boundary or not, and the proposed version resulted in the successful results in previous works concerned with the text categorization and clustering. In this research, we define the similarity measure based on both attributes and values, modify the KNN using it, and apply the modified version into the text segmentation task. We may expect more compact representation of data items and improved performance in the text segmentation task as well as other tasks of text mining. Therefore, the goal of this research is to implement the text segmentation system which provides the benefits.","PeriodicalId":227798,"journal":{"name":"2017 International Conference on Communication, Control, Computing and Electronics Engineering (ICCCCEE)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 International Conference on Communication, Control, Computing and Electronics Engineering (ICCCCEE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCCCEE.2017.7866706","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 15
Abstract
In this research, we propose the version of K Nearest Neighbor which considers similarity among attributes for computing the similarity between feature vectors. The text segmentation task is viewed into the binary classification where each pair of sentences or paragraphs is classified into whether we put the boundary or not, and the proposed version resulted in the successful results in previous works concerned with the text categorization and clustering. In this research, we define the similarity measure based on both attributes and values, modify the KNN using it, and apply the modified version into the text segmentation task. We may expect more compact representation of data items and improved performance in the text segmentation task as well as other tasks of text mining. Therefore, the goal of this research is to implement the text segmentation system which provides the benefits.