Specializing K Nearest Neighbor for Content Based Segmentation of News Article by Graph Similarity Metric
T. Jo
2019 International Conference on Green and Human Information Technology (ICGHIT), 2019
DOI: 10.1109/ICGHIT.2019.00024
Citations: 0
Abstract
This research concerns a graph-based version of the K Nearest Neighbor (KNN) algorithm as a text segmentation tool. Text segmentation is mapped into a binary classification task in which each paragraph pair is classified as either a boundary or a continuance, and graphs serve as visualized text representations. We encode the paragraph pairs generated from full texts into graphs, define a similarity metric between graphs, and modify the KNN algorithm by replacing its existing similarity metric with the proposed one for the text segmentation task. The proposed version is empirically shown to be the better one for segmenting news articles. The approach requires an entire text to be classified into its corresponding domain before text segmentation is carried out.
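The pipeline described above (paragraph pair to graph, graph similarity, KNN vote over "boundary" vs "continuance") can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the abstract does not specify the graph encoding or the similarity metric, so the word co-occurrence graph, the averaged Jaccard similarity over vertices and edges, the function names (`encode_pair`, `graph_similarity`, `knn_classify`, `segment`), and the toy training data are all hypothetical.

```python
from collections import Counter

# Hypothetical graph representation: (vertex set, set of undirected edges).
Graph = tuple[frozenset[str], frozenset[frozenset[str]]]


def encode_pair(para_a: str, para_b: str, window: int = 2) -> Graph:
    """Hypothetical encoding: words are vertices, and an edge joins two words
    that occur within `window` tokens of each other in either paragraph."""
    vertices: set[str] = set()
    edges: set[frozenset[str]] = set()
    for para in (para_a, para_b):
        tokens = [t.lower().strip(".,;:!?\"'") for t in para.split()]
        tokens = [t for t in tokens if t]
        vertices.update(tokens)
        for i, u in enumerate(tokens):
            for v in tokens[i + 1 : i + 1 + window]:
                if u != v:
                    edges.add(frozenset((u, v)))
    return frozenset(vertices), frozenset(edges)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard overlap of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def graph_similarity(g1: Graph, g2: Graph) -> float:
    """Assumed metric: average of vertex-set and edge-set Jaccard overlaps."""
    (v1, e1), (v2, e2) = g1, g2
    return 0.5 * jaccard(v1, v2) + 0.5 * jaccard(e1, e2)


def knn_classify(query: Graph, training: list[tuple[Graph, str]], k: int = 3) -> str:
    """KNN with the graph similarity in place of a vector similarity.
    Ties in the vote go to the label seen first among the nearest neighbors."""
    ranked = sorted(training, key=lambda item: graph_similarity(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def segment(paragraphs: list[str], training: list[tuple[Graph, str]], k: int = 3) -> list[int]:
    """Return indices i where a boundary is predicted between paragraphs i and i+1."""
    return [
        i
        for i in range(len(paragraphs) - 1)
        if knn_classify(encode_pair(paragraphs[i], paragraphs[i + 1]), training, k) == "boundary"
    ]


# Toy, hypothetical training pairs labeled as boundary or continuance.
TRAINING = [
    (encode_pair("stocks fell sharply", "markets fell again"), "continuance"),
    (encode_pair("stocks fell sharply", "the team won the final"), "boundary"),
]
PARAGRAPHS = ["stocks fell sharply", "markets fell again", "the team won the final"]

if __name__ == "__main__":
    print(segment(PARAGRAPHS, TRAINING, k=1))  # prints [1]
```

Swapping in a different graph encoding or similarity only requires changing `encode_pair` and `graph_similarity`; the KNN voting and the pairwise segmentation loop stay the same, which mirrors the abstract's idea of modifying KNN solely by replacing its similarity metric.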