Automatic Arabic text summarization using clustering and keyphrase extraction
Hamzah Noori Fejer, N. Omar
Proceedings of the 6th International Conference on Information Technology and Multimedia, 2014-11-01
DOI: 10.1109/ICIMU.2014.7066647 (https://doi.org/10.1109/ICIMU.2014.7066647)
Citations: 20
Abstract
As the number of electronic documents grows rapidly, so does the need for faster techniques to assess their relevance. A summary is a concise representation of the underlying text. Forming an ideal summary requires a full understanding of the document, but achieving such understanding is difficult or impossible for computers. Therefore, the most common technique in automated text summarization is to select important sentences from the original text and present them as the summary. This paper proposes a hybrid clustering method (partitioning and hierarchical) to group many Arabic documents into several clusters. A keyphrase extraction module is then applied to extract important keyphrases from each cluster, which helps identify the most important sentences and find similar sentences based on several similarity algorithms. One sentence is extracted from each group of similar sentences, while the other similar sentences (i.e., sentences whose similarity exceeds the predefined threshold) are ignored. The model is designed for both single- and multi-document Arabic text summarization. The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metric was used for the evaluation. The Essex Arabic Summaries Corpus, which contains many topic-based articles with multiple human summaries, was used as the summarization dataset. The model achieved an accuracy of 80% for single-document and 62% for multi-document summarization.
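The redundancy step described in the abstract, keeping one sentence from each group of similar sentences, can be illustrated with a minimal sketch. This is not the authors' implementation. The bag-of-words cosine similarity, the greedy selection order, the function names `cosine_similarity` and `drop_redundant`, and the threshold of 0.7 are all assumptions made for illustration. The abstract does not name the similarity algorithms or the threshold value. The example uses English placeholder sentences, and real Arabic text would additionally need proper tokenization and normalization.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between whitespace-tokenized bag-of-words vectors."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def drop_redundant(sentences: list[str], threshold: float = 0.7) -> list[str]:
    """Keep a sentence only if no already-kept sentence exceeds the threshold.

    Assumption: `sentences` is already ordered by importance (most important
    first), so the first sentence of each similar group is the one kept.
    """
    kept: list[str] = []
    for s in sentences:
        if all(cosine_similarity(s, k) <= threshold for k in kept):
            kept.append(s)
    return kept


# Hypothetical ranked candidate sentences (placeholders, not corpus data).
ranked = [
    "the model groups arabic documents into clusters",
    "the model groups arabic documents into several clusters",
    "keyphrases are extracted from each cluster",
]
summary = drop_redundant(ranked, threshold=0.7)
```

Here the second sentence is nearly identical to the first and is dropped, while the third shares no tokens with the first and is kept. The greedy pass is one simple way to realize the "one sentence per similar group" idea; a clustering-based grouping would be an equally plausible reading of the abstract.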