An Adaptive Normalized Google Distance Similarity Measure for Extractive Text Summarization

2020 2nd International Conference on Computer and Information Sciences (ICCIS). Pub Date: 2020-10-13. DOI: 10.1109/ICCIS49240.2020.9257668

Albaraa Abuobieda, A. H. Osman

Citations: 1

Abstract

Any clustering algorithm that relies on an improper similarity calculation suffers reduced clustering accuracy. Consequently, applications built on such an algorithm are negatively affected and produce poor results. In previous work, we found that using the Normalized Google Distance (NGD) similarity measure to cluster a document's sentences for text summarization is unsuitable, since NGD was originally designed to work with large databases. Term-weighting approaches, on the other hand, are widely used to represent a document's contents. In this paper, a term-weighting approach is integrated with the NGD similarity measure to adapt the latter to a small database (a single document). The Differential Evolution (DE) algorithm is used to train and test the proposed method. The DUC2002 dataset is preprocessed and used as a test bed. The results show that the proposed method outperforms the previous work in terms of the F-score evaluation measure, and also outperforms the standard baselines Microsoft Word and Copernic Summarizer.
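
The abstract does not spell out the adapted formula, so what follows is only a minimal sketch of the ingredients it names. The standard Normalized Google Distance is NGD(x, y) = [max(log f(x), log f(y)) - log f(x, y)] / [log N - min(log f(x), log f(y))], where f(x) is the number of indexed pages containing term x, f(x, y) is the number containing both terms, and N is the total number of pages. The sketch assumes that each sentence of a single document plays the role of a page. It also adds a hypothetical term-weighted sentence similarity: inverse-sentence-frequency weights used to average exp(-NGD) over every term pair of two sentences. The helper names `tokenize`, `ngd`, `isf_weights` and `weighted_similarity`, as well as the weighting scheme, are illustrative assumptions and not the authors' method.

```python
import math
import re


def tokenize(sentence):
    """Return the set of lowercase alphanumeric tokens in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def ngd(x, y, sentence_terms):
    """Normalized Google Distance between terms x and y.

    Assumption: each sentence of the single document stands in for a web
    page, so f(.) counts sentences and N is the number of sentences.
    """
    n = len(sentence_terms)
    fx = sum(1 for s in sentence_terms if x in s)
    fy = sum(1 for s in sentence_terms if y in s)
    fxy = sum(1 for s in sentence_terms if x in s and y in s)
    if fxy == 0:
        return math.inf  # never co-occur (also covers fx == 0 or fy == 0)
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    denom = math.log(n) - min(lx, ly)
    if denom == 0:
        return 0.0  # both terms appear in every sentence
    return (max(lx, ly) - lxy) / denom


def isf_weights(sentence_terms):
    """Hypothetical term weights: inverse sentence frequency, log(N / n_t) + 1."""
    n = len(sentence_terms)
    vocab = set().union(*sentence_terms)
    return {t: math.log(n / sum(1 for s in sentence_terms if t in s)) + 1.0
            for t in vocab}


def weighted_similarity(i, j, sentence_terms, weights):
    """Hypothetical term-weighted similarity of sentences i and j in [0, 1]:
    the weighted mean of exp(-NGD) over all of their term pairs."""
    num = den = 0.0
    for x in sentence_terms[i]:
        for y in sentence_terms[j]:
            w = weights[x] * weights[y]
            num += w * math.exp(-ngd(x, y, sentence_terms))  # exp(-inf) == 0.0
            den += w
    return num / den if den else 0.0


if __name__ == "__main__":
    doc = [
        "Clustering groups similar sentences.",
        "Similar sentences share terms.",
        "Differential evolution tunes parameters.",
    ]
    terms = [tokenize(s) for s in doc]
    weights = isf_weights(terms)
    print(ngd("similar", "sentences", terms))         # 0.0: always co-occur
    print(weighted_similarity(0, 2, terms, weights))  # 0.0: no shared context
```

In this toy setting, term pairs that never share a sentence receive an infinite distance and contribute nothing to the similarity. With only a handful of sentences, most pairs fall into that case, which offers one plausible intuition for why raw NGD struggles on a single document and why extra information such as term weights might help.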
