Automatic topic identification and classification of text messages in the SMSAll system
Fahad Pervaiz, L. Subramanian, U. Saif
ACM DEV '12, 11 March 2012
DOI: 10.1145/2160601.2160626
Citations: 0
Abstract
This paper presents a way to identify topics and classify text messages in the SMSAll system, a Twitter-like service for Pakistan that operates over SMS. One of the main challenges is developing an unsupervised algorithm for messages that mix Urdu and English words written in Roman script. Despite this, with 1-grams the approach achieves true positives of 72%, 53%, and 58% for popular, medium, and rare topics, respectively, and 48% and 40% with 2-grams and 3-grams, respectively.
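The following Python sketch is a rough illustration of unsupervised n-gram topic identification. It is not the method used in the paper. The sketch lowercases and tokenizes Roman-script messages, extracts 1-, 2-, or 3-grams, and skips n-grams made entirely of stop words. It then ranks the remaining n-grams by how many messages contain them. This document-frequency baseline is an assumption, and the stop-word list, function names, and sample messages are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list mixing English and Roman Urdu words;
# a real system would need a far larger, curated list.
STOP_WORDS = {"the", "is", "a", "to", "hai", "ka", "ki", "ke", "aur", "main", "se"}

# Hypothetical sample of mixed Urdu-English messages in Roman script.
SAMPLE_MESSAGES = [
    "Cricket match aaj Lahore main hai",
    "cricket match ka score kya hai?",
    "Load shedding phir se shuru",
    "load shedding aur cricket match",
]


def tokenize(message: str) -> list[str]:
    """Lowercase a Roman-script message and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", message.lower())


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return all contiguous n-grams of the token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_topics(
    messages: list[str], n: int = 1, top_k: int = 5
) -> list[tuple[tuple[str, ...], int]]:
    """Rank n-grams by document frequency (number of messages containing them).

    N-grams consisting only of stop words are ignored. Each message counts
    at most once per n-gram, because its n-grams are collected into a set.
    """
    doc_freq: Counter = Counter()
    for msg in messages:
        grams = {
            g for g in ngrams(tokenize(msg), n)
            if not all(t in STOP_WORDS for t in g)
        }
        doc_freq.update(grams)
    return doc_freq.most_common(top_k)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, candidate_topics(SAMPLE_MESSAGES, n=n, top_k=3))
```

On the sample messages, the top two bigrams are ("cricket", "match") with a count of 3 and ("load", "shedding") with a count of 2. Counting each n-gram once per message keeps a single repetitive message from dominating the ranking. Rare topics remain hard to separate from noise with raw counts alone.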