Kamran Shaukat Dar, Ahmad Bin Shafat, Muhammad Umair Hassan
An efficient stop word elimination algorithm for Urdu language
2017 14th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON)
Published: 2017-06-27
DOI: https://doi.org/10.1109/ECTICON.2017.8096386
Citations: 9
Abstract
Stop words occur many times in a document, yet they contribute the least semantic value to its sentences. They make up a considerable share of a document's text without carrying semantic significance, so they should be removed for better language description. In this paper, we propose an efficient algorithm that eliminates stop words from Urdu documents. Considerable effort has gone into areas such as natural language processing (NLP), stemming for the Urdu language, and sentence boundary disambiguation. However, no work is available for Urdu that removes the stop words from an Urdu document. This gap is the motivation for the stop word elimination algorithm for Urdu documents proposed here, which addresses the task for the first time in the Urdu language.
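The abstract does not describe the algorithm's internals, so the following is only a minimal sketch of the general idea of stop word elimination, not the authors' method. It assumes whitespace tokenization and a simple set lookup; the names `URDU_STOP_WORDS`, `PUNCTUATION`, and `remove_stop_words` are hypothetical, and the ten-word list is a small illustrative sample of common Urdu function words rather than the list used in the paper.

```python
# Illustrative sample of common Urdu function words (an assumption,
# not the stop word list from the paper).
URDU_STOP_WORDS = frozenset(
    {"کا", "کی", "کے", "میں", "ہے", "اور", "سے", "کو", "پر", "یہ"}
)

# Urdu full stop, comma, and question mark, plus a few ASCII marks.
PUNCTUATION = "۔،؟!.,"


def remove_stop_words(text, stop_words=URDU_STOP_WORDS):
    """Return the tokens of `text` that are not in `stop_words`.

    Tokens are split on whitespace, and leading or trailing
    punctuation is stripped before the lookup.
    """
    kept = []
    for token in text.split():
        word = token.strip(PUNCTUATION)
        if word and word not in stop_words:
            kept.append(word)
    return kept


if __name__ == "__main__":
    # "This book is on the table."
    print(remove_stop_words("یہ کتاب میز پر ہے۔"))
```

Using a `frozenset` keeps each membership test at constant expected time, so the pass over a document is linear in the number of tokens. Whitespace splitting is a simplification, since Urdu text does not always separate words with spaces.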