Lexical Normalization of Roman Urdu
Mamoona Tasadduq
2022 24th International Multitopic Conference (INMIC), published 2022-10-21
DOI: 10.1109/INMIC56986.2022.9972968
Citations: 0
Abstract
Roman Urdu is an informal way of writing the Urdu language in the Latin script. It is the form that native Urdu speakers most widely use on the internet, on social media, and in text messaging. The problem with Roman Urdu is that different people write it inconsistently: because no standard spelling rules are defined, the same word can appear in many forms, which makes Natural Language Processing (NLP) very difficult. To enable effective analysis, the text must first be normalized. This work therefore provides a Roman Urdu dictionary that serves as a foundation for processing Roman Urdu, and proposes a model for the lexical normalization of Roman Urdu text.
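The abstract does not describe how the proposed model works. As a rough, general illustration of dictionary-based lexical normalization (not the author's method), the following sketch maps spelling variants to a single canonical form. The dictionary entries, the chosen canonical spellings, and the names `LEXICON`, `squeeze`, `normalize_token`, and `normalize` are all hypothetical; the repeated-letter fallback is an assumed heuristic.

```python
import re

# Hypothetical variant -> canonical dictionary (illustrative entries only,
# NOT the dictionary released with the paper).
LEXICON = {
    "acha": "acha", "achha": "acha", "accha": "acha",
    "nahi": "nahi", "nahin": "nahi", "nai": "nahi", "nhi": "nahi",
    "hai": "hai", "hy": "hai", "hae": "hai",
    "kya": "kya", "kia": "kya",
}


def squeeze(word: str) -> str:
    """Collapse runs of a repeated character, e.g. 'achhaaa' -> 'acha'."""
    return re.sub(r"(.)\1+", r"\1", word)


# Secondary index keyed on the squeezed spelling, so elongated or
# doubled-letter variants missing from LEXICON can still be matched.
# (Assumed heuristic; the first entry seen wins on a key collision.)
SQUEEZED = {}
for variant, canonical in LEXICON.items():
    SQUEEZED.setdefault(squeeze(variant), canonical)


def normalize_token(token: str) -> str:
    """Return the canonical form of a token, or the token unchanged."""
    token = token.lower()
    if token in LEXICON:                      # 1. exact dictionary hit
        return LEXICON[token]
    return SQUEEZED.get(squeeze(token), token)  # 2. squeezed fallback


def normalize(text: str) -> str:
    # Naive tokenization: keep alphabetic runs only (punctuation is dropped).
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(normalize_token(t) for t in tokens)


print(normalize("Kia ye khana achhaaa nhi hy?"))
# prints: kya ye khana acha nahi hai
```

Collapsing repeated letters is a cheap way to catch elongated spellings such as *achhaaa*, but on real data it can also merge distinct words, so a practical system would need a much larger dictionary and a more careful matching strategy.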