An Adaptive Normalized Google Distance Similarity Measure for Extractive Text Summarization
Albaraa Abuobieda, A. H. Osman
2020 2nd International Conference on Computer and Information Sciences (ICCIS), published 2020-10-13
DOI: https://doi.org/10.1109/ICCIS49240.2020.9257668
Any clustering algorithm that relies on an unsuitable similarity calculation loses clustering accuracy, and applications built on such an algorithm consequently produce poor results. In previous work, we found that using the Normalized Google Distance (NGD) similarity measure to cluster a document's sentences for text summarization is unreasonable, because NGD was designed to work with large databases. Term-weighting approaches, on the other hand, are widely used to characterize a document's contents. In this paper, a term-weighting approach is integrated with the NGD similarity measure to adapt the latter so that it can work on a small database (a single document). The Differential Evolution (DE) algorithm is used to train and test the proposed method. The DUC2002 dataset is preprocessed and used as a test bed. The results show that the proposed method outperforms our previous work in terms of the F-score evaluation measure and also outperforms the standard baseline methods, Microsoft Word and Copernic Summarizer.
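
For orientation, the standard NGD between two terms x and y is (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f counts the units containing a term and N is the total number of units. The sketch below computes this standard measure with the sentences of a single document as the units. It is not the adaptive, term-weighted variant proposed in the paper, whose details the abstract does not give. The function names `ngd` and `term_ngd`, the whitespace tokenization, and the handling of zero counts are all assumptions made for illustration.

```python
import math


def ngd(fx, fy, fxy, n):
    """Standard Normalized Google Distance from raw counts.

    fx, fy: number of units containing term x and term y, respectively
    fxy:    number of units containing both terms
    n:      total number of units
    """
    # Assumption: terms that never occur (or never co-occur) are treated
    # as infinitely distant, since log(0) is undefined.
    if fx == 0 or fy == 0 or fxy == 0:
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    denominator = math.log(n) - min(log_fx, log_fy)
    # Assumption: if both terms occur in every unit the denominator is
    # zero; the numerator is zero as well, so return a distance of 0.
    if denominator == 0:
        return 0.0
    return (max(log_fx, log_fy) - log_fxy) / denominator


def term_ngd(sentences, x, y):
    """NGD between two terms, using one document's sentences as the units."""
    # Hypothetical preprocessing: lowercase and split on whitespace.
    token_sets = [set(s.lower().split()) for s in sentences]
    fx = sum(x in tokens for tokens in token_sets)
    fy = sum(y in tokens for tokens in token_sets)
    fxy = sum(x in tokens and y in tokens for tokens in token_sets)
    return ngd(fx, fy, fxy, len(token_sets))


if __name__ == "__main__":
    doc = ["the cat sat", "the cat ran", "a dog ran", "a dog sat"]
    print(term_ngd(doc, "the", "cat"))
    print(term_ngd(doc, "cat", "dog"))
```

With only a handful of sentences, the counts take very few distinct values, so many term pairs end up at a distance of exactly zero or infinity. This is one way to see why a measure designed for large databases is a poor fit for a single document.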