Phonetic normalization of microtext
R. Khoury
2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2015-08-25
DOI: 10.1145/2808797.2809352 (https://doi.org/10.1145/2808797.2809352)
Citation count: 3
Abstract
Microtext normalization is the task of recovering the standard English words behind the unusually spelled words used in social-media messages and posts. In this paper, we propose a novel method that renders both English and microtext words phonetically, based on their spelling, and matches words whose renderings are similar. We present our algorithm to learn spelling-to-phonetic probabilities and to efficiently search the English language and match words together. Our results demonstrate that our system correctly handles many types of normalization problems.
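
To make the idea in the abstract concrete, the following is a minimal sketch of phonetic matching, not the paper's algorithm. It assumes a small hand-written table of spelling-to-sound rewrite rules (`RULES`) in place of the learned spelling-to-phonetic probabilities, and a linear scan with edit distance in place of the efficient search the paper describes. The names `RULES`, `VOCABULARY`, `phonetic_key`, `edit_distance`, and `normalize` are all hypothetical and chosen for illustration.

```python
import re

# Hypothetical, hand-picked rewrite rules applied in order.
# The paper learns spelling-to-phonetic probabilities; this toy table does not.
RULES = [
    (r"ph", "f"),
    (r"ght", "t"),
    (r"ck", "k"),
    (r"8", "ate"),          # "gr8" -> "grate"
    (r"2", "too"),          # "2morrow" -> "toomorrow"
    (r"4", "for"),
    (r"e$", ""),            # drop a word-final "e"
    (r"[aeiouy]+", "a"),    # collapse every vowel run to one symbol
    (r"(.)\1+", r"\1"),     # collapse repeated characters
]

# Hypothetical sample vocabulary of standard English words.
VOCABULARY = ["great", "late", "night", "phone", "tomorrow"]


def phonetic_key(word):
    """Render a word as a rough phonetic key based only on its spelling."""
    key = word.lower()
    for pattern, replacement in RULES:
        key = re.sub(pattern, replacement, key)
    return key


def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def normalize(token, vocabulary):
    """Return the vocabulary word whose phonetic key is closest to the token's."""
    key = phonetic_key(token)
    return min(vocabulary, key=lambda w: edit_distance(key, phonetic_key(w)))


if __name__ == "__main__":
    for token in ["gr8", "nite", "fone", "2morrow"]:
        print(token, "->", normalize(token, VOCABULARY))
```

Under these toy rules, "gr8" and "great" reduce to the same key, so the match is exact. A realistic system would need a far richer mapping and a search structure that avoids scanning the whole vocabulary for every token.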