Multi-document Arabic text summarisation
Mahmoud El-Haj, Udo Kruschwitz, C. Fox
2011 3rd Computer Science and Electronic Engineering Conference (CEEC), 2011-07-01
DOI: 10.1109/CEEC.2011.5995822 (https://doi.org/10.1109/CEEC.2011.5995822)

In this paper we present our generic extractive Arabic and English multi-document summarisers. We also describe the use of machine translation to evaluate the generated Arabic multi-document summaries against English extractive gold standards. First, we address the lack of Arabic multi-document corpora for summarisation and the absence of automatic and manual Arabic gold-standard summaries, which are required to evaluate any automatic Arabic summariser. Second, we demonstrate the use of Google Translate to create an Arabic version of the DUC-2002 dataset. The resulting parallel Arabic/English dataset is summarised using the Arabic and English summarisation systems. The automatically generated summaries are evaluated using the ROUGE metric, as well as precision and recall, and the results are compared with the top five systems in the DUC-2002 multi-document summarisation task.
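
To make the evaluation step concrete, the following is a minimal sketch of a ROUGE-1 style comparison between a system summary and a gold-standard summary. It is an illustration only, not the official ROUGE toolkit or the setup used in the paper: the function name `rouge_1`, the whitespace tokenisation, and the lowercasing are all assumptions made for this example.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Simplified ROUGE-1: clipped unigram overlap between two texts.

    Assumption: tokens are lowercased and split on whitespace, with no
    stemming or stop-word removal.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Counter intersection keeps the minimum count of each shared token,
    # so a repeated word is only credited as often as it appears in both.
    overlap = sum((cand_counts & ref_counts).values())

    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())

    # Precision: share of candidate tokens that also occur in the reference.
    precision = overlap / cand_total if cand_total else 0.0
    # Recall: share of reference tokens recovered by the candidate.
    recall = overlap / ref_total if ref_total else 0.0

    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy summaries, not taken from DUC-2002.
scores = rouge_1("the cat sat on the mat", "the cat lay on the mat")
empty_scores = rouge_1("", "the cat lay on the mat")
```

Here recall measures how much of the gold-standard summary the system summary covers, while precision measures how much of the system summary is supported by the gold standard. Scores from the actual ROUGE toolkit will generally differ, because that toolkit applies its own tokenisation and optional stemming.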