Authors: Muhammad Wasif Bhatti, Muhammad Aslam
Venue: 2019 International Conference on Engineering and Emerging Technologies (ICEET)
Publication date: 2019-02-01
DOI: 10.1109/CEET1.2019.8711842 (https://doi.org/10.1109/CEET1.2019.8711842)
ISUTD: Intelligent System for Urdu Text De-Summarization
Text de-summarization is a method that expands a document and explains the substantial points of its text. It is difficult for humans to manually explain the central subject of a large article. De-summarization can be divided into two branches: abstractive and extractive approaches. The extractive approach gathers the important paragraphs or sentences from the original document and presents them as an explanation. Urdu inherits much of its vocabulary from Arabic, Persian, and the native languages of South Asia. As a result, Urdu has a complex morphology. In terms of syntax, it has a relatively free word order (Subject, Object, Verb). Despite being spoken by millions of people, Urdu is an under-resourced language in terms of available computational resources. We extend the single-document extractive de-summarization methodology to Urdu, based on the sentence weight algorithm, especially for topics such as news, sports, and health. We process the manuscript with preprocessing (sentence segmentation, tokenization, stop-word removal, and lemmatization) and then apply the sentence weight algorithm.
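The abstract names the pipeline stages but does not define the sentence weight algorithm itself. The following is a minimal sketch of how such a pipeline might look, under stated assumptions: sentences are split on the Urdu full stop (U+06D4), the Arabic question mark (U+061F), and "!"; tokens are split on whitespace; the stop-word set is a small illustrative subset; `lemmatize` is a hypothetical identity placeholder; and a sentence's weight is assumed to be the average document frequency of its content terms. None of these choices are confirmed as the authors' method.

```python
import re
from collections import Counter

# Illustrative subset only; a real system would need a full Urdu stop-word list.
STOP_WORDS = {"کا", "کی", "کے", "ہے", "ہیں", "میں", "اور", "سے", "کو", "پر"}


def segment_sentences(text):
    """Split on the Urdu full stop (U+06D4), Arabic question mark (U+061F), or '!'."""
    parts = re.split(r"[\u06D4\u061F!]", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Whitespace tokenization (an assumption; Urdu word segmentation is harder in practice)."""
    return sentence.split()


def lemmatize(token):
    """Hypothetical placeholder: returns the token unchanged."""
    return token


def content_terms(sentence):
    """Tokenize, drop stop words, and lemmatize the remaining tokens."""
    return [lemmatize(t) for t in tokenize(sentence) if t not in STOP_WORDS]


def sentence_weights(text):
    """Assumed weighting: mean document-level frequency of a sentence's content terms."""
    sentences = segment_sentences(text)
    term_lists = [content_terms(s) for s in sentences]
    freq = Counter(t for terms in term_lists for t in terms)
    weights = []
    for sentence, terms in zip(sentences, term_lists):
        weight = sum(freq[t] for t in terms) / len(terms) if terms else 0.0
        weights.append((sentence, weight))
    return weights


def top_sentences(text, k=1):
    """Return the k highest-weighted sentences; ties keep document order."""
    ranked = sorted(sentence_weights(text), key=lambda sw: sw[1], reverse=True)
    return [s for s, _ in ranked[:k]]
```

Calling `top_sentences(article_text, k=3)` on a hypothetical `article_text` string would return the three sentences whose content terms are, on average, most frequent in the document. Averaging rather than summing is a design choice in this sketch so that long sentences are not favored purely for their length.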