An Adaptive Normalized Google Distance Similarity Measure for Extractive Text Summarization

Albaraa Abuobieda, A. H. Osman
Published in: 2020 2nd International Conference on Computer and Information Sciences (ICCIS), 2020-10-13
DOI: https://doi.org/10.1109/ICCIS49240.2020.9257668
Citations: 1

Abstract

An improper similarity calculation reduces the accuracy of any clustering algorithm that relies on it, so applications built on such an algorithm are negatively affected and produce poor results. In previous work, we found that using the Normalized Google Distance (NGD) similarity measure to cluster a document's sentences for text summarization is unreasonable, since NGD was designed to work with large databases. Term weighting, on the other hand, is widely used to characterize a document's contents. In this paper, a term-weighting approach is integrated with the NGD similarity measure to adapt the latter to a small database (a single document). The Differential Evolution (DE) algorithm is used to train and test the proposed method. The DUC2002 dataset is preprocessed and used as a test bed. The results show that the proposed method outperforms the previous work in terms of F-score, and also outperforms the standard baseline methods Microsoft Word and Copernic Summarizer.
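For readers unfamiliar with NGD, the following is a minimal sketch of the standard (non-adaptive) measure applied inside a single document, with each sentence playing the role of an indexed page. It is not the adaptive, term-weighted variant proposed in the paper, whose details the abstract does not give. The function name `ngd`, the representation of sentences as token sets, and the handling of zero counts are all assumptions made for illustration.

```python
import math


def ngd(term_x, term_y, sentences):
    """Standard NGD, with sentences standing in for indexed pages.

    `sentences` is a list of token sets (an assumed representation).
    Returns math.inf when either term is absent or the terms never co-occur.
    """
    n = len(sentences)
    fx = sum(1 for s in sentences if term_x in s)
    fy = sum(1 for s in sentences if term_y in s)
    fxy = sum(1 for s in sentences if term_x in s and term_y in s)
    if fx == 0 or fy == 0 or fxy == 0:
        return math.inf  # assumed convention for undefined logarithms
    log_fx, log_fy = math.log(fx), math.log(fy)
    denom = math.log(n) - min(log_fx, log_fy)
    if denom == 0:
        return 0.0  # both terms occur in every sentence
    return (max(log_fx, log_fy) - math.log(fxy)) / denom


# Hypothetical four-sentence document, already tokenized.
doc = [
    {"text", "summarization", "clusters", "sentences"},
    {"clusters", "need", "similarity"},
    {"similarity", "drives", "clusters"},
    {"weather", "is", "nice"},
]
print(ngd("clusters", "similarity", doc))
print(ngd("text", "weather", doc))
```

With only a handful of sentences the frequency counts are very small and many term pairs never co-occur, so the distance is often undefined or coarse. This is one plausible illustration of why a measure built for large databases behaves poorly on a single document.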