Graph based method for Arabic text summarization

2022 International Conference on Intelligent Systems and Computer Vision (ISCV). Pub Date: 2022-05-18. DOI: 10.1109/ISCV54655.2022.9806127

Nabil Burmani, H. Alami, Said Lafkiar, Mohamed Zouitni, Mohammed Taleb, Noureddine En Nahnahi


Citations: 0

Abstract

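The amount of Arabic textual data is growing rapidly, creating a need to condense it into a form that is easier to use while keeping only the essential content of the original text. To this end, natural language processing researchers are developing both extractive and abstractive summarization tools. In this work, we explore an extractive approach to generating summaries of single Arabic documents. We focus on graph-based methods that identify and extract the most important sentences, and we experiment with a variety of text representations (TF-IDF, fastText, and Word2Vec), similarity measures, and graph ranking methods. We evaluate the system on EASC (Essex Arabic Summaries Corpus) using the ROUGE metric. The results show that TF-IDF representation, PageRank ranking, and cosine similarity achieve good performance and can generate high-quality summaries.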
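The abstract describes a pipeline of sentence representation, a similarity graph over sentences, graph ranking, top-sentence extraction, and ROUGE evaluation. The Python sketch below illustrates the combination reported to work well (TF-IDF, cosine similarity, PageRank). It is not the authors' implementation: the naive sentence splitter and tokenizer (no Arabic stemming or stop-word removal), the smoothed IDF formula, the damping factor 0.85, the summary length `k`, and the simplified ROUGE-1 recall (not the official ROUGE toolkit) are all assumptions, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ".", "!", "?" or the Arabic
    # question mark U+061F when followed by whitespace.
    parts = re.split(r"(?<=[.!?\u061F])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    # \w is Unicode-aware in Python 3, so Arabic letters are matched too.
    # No stemming or stop-word removal (a real Arabic system would add both).
    return re.findall(r"\w+", sentence.lower())


def tfidf_vectors(token_lists):
    # One sparse TF-IDF vector per sentence, treating sentences as documents.
    # Smoothed IDF, log((1 + n) / (1 + df)) + 1, keeps every weight positive.
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def pagerank(weights, damping=0.85, max_iter=100, tol=1e-8):
    # Weighted PageRank by power iteration on a dense similarity matrix.
    n = len(weights)
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        # Sentences with no similar neighbour spread their score uniformly.
        dangling = sum(s for s, o in zip(scores, out_sum) if o == 0)
        new = [
            (1 - damping) / n
            + damping * dangling / n
            + damping * sum(scores[j] * weights[j][i] / out_sum[j]
                            for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores


def summarize(text, k=3):
    # Return the k highest-ranked sentences, in original document order.
    sentences = split_sentences(text)
    if len(sentences) <= k:
        return sentences
    vectors = tfidf_vectors([tokenize(s) for s in sentences])
    n = len(sentences)
    # Complete graph: edge weight = cosine similarity, no self-loops.
    weights = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    scores = pagerank(weights)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def rouge1_recall(candidate, reference):
    # Simplified ROUGE-1 recall: clipped unigram overlap / reference length.
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum(min(c, cand[t]) for t, c in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0


if __name__ == "__main__":
    doc = ("Cats are small animals. Cats like milk. "
           "Dogs are loyal animals. The stock market fell today.")
    summary = summarize(doc, k=2)
    print(summary)
    print(rouge1_recall(" ".join(summary), "Cats are small animals."))
```

Selected sentences are emitted in document order rather than rank order so the extract reads more coherently. Trying fastText or Word2Vec, as the abstract mentions, would mean replacing `tfidf_vectors` with sentence embeddings (for example, averaged word vectors); that variant is not shown here.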



