Graph-Based Method for Arabic Text Summarization

Nabil Burmani, H. Alami, Said Lafkiar, Mohamed Zouitni, Mohammed Taleb, Noureddine En Nahnahi
Published in: 2022 International Conference on Intelligent Systems and Computer Vision (ISCV)
Publication date: 2022-05-18
DOI: https://doi.org/10.1109/ISCV54655.2022.9806127
Citations: 0

Abstract

The volume of Arabic textual data is growing rapidly, which creates a need to condense it so that it is easier to use while retaining only the essential content of the original text. To this end, natural language processing researchers are developing extractive and abstractive summarization tools. In this work, we explore an extractive approach to generating summaries of single Arabic documents. We focus on graph-based methods that identify and extract the most important sentences, combining a variety of text representation methods (such as TF-IDF, fastText, and Word2Vec), similarity measures, and graph ranking methods. We test our system on the EASC (Essex Arabic Summaries Corpus) and evaluate it with the ROUGE metric. The results show that the combination of TF-IDF representation, cosine similarity, and PageRank ranking achieves good performance and can generate high-quality summaries.
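The abstract does not give implementation details, so the following is only a minimal sketch of the general technique it names: sentences become TF-IDF vectors, cosine similarity weights the edges of a sentence graph, and PageRank ranks the nodes. Everything specific here is an assumption rather than the authors' method: the function names (`split_sentences`, `tfidf_vectors`, `cosine`, `pagerank`, `summarize`) are hypothetical, the tokenizer is a naive regular expression with no Arabic-specific preprocessing, the IDF is a smoothed variant chosen for convenience, and the damping factor of 0.85 is the conventional default.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter on ., !, ? and the Arabic question mark (assumption).
    parts = re.split(r"(?<=[.!?\u061F])\s+", text.strip())
    return [p for p in parts if p]


def tfidf_vectors(sentences):
    # Each sentence is treated as one "document"; tokens are plain \w+ runs.
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(tokenized)
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF: log((1 + n) / (1 + df)) + 1 (an assumed variant).
        vectors.append({t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pagerank(weights, damping=0.85, iterations=50):
    # Weighted PageRank by power iteration over a dense adjacency matrix.
    n = len(weights)
    scores = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                if out_sums[j] > 0:
                    rank += weights[j][i] / out_sums[j] * scores[j]
                else:
                    rank += scores[j] / n  # dangling node: spread evenly
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def summarize(text, k=2):
    # Return the k highest-ranked sentences in their original order.
    sentences = split_sentences(text)
    if len(sentences) <= k:
        return sentences
    vecs = tfidf_vectors(sentences)
    n = len(sentences)
    weights = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    scores = pagerank(weights)
    top = sorted(sorted(range(n), key=lambda i: scores[i], reverse=True)[:k])
    return [sentences[i] for i in top]
```

Calling `summarize(document_text, k=3)` would return the three sentences that are most central in the similarity graph. Swapping `tfidf_vectors` for averaged fastText or Word2Vec embeddings would change only the representation step, leaving the graph construction and ranking untouched.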