Jebran Khan, Kashif Ahmad, Senthil Kumar Jagatheesaperumal, Kyung-Ah Sohn
Artificial Intelligence Review, volume 58, issue 3. Published 2025-01-13. DOI: 10.1007/s10462-024-11071-z. https://link.springer.com/article/10.1007/s10462-024-11071-z
Textual variations in social media text processing applications: challenges, solutions, and trends
Because social media is an informal communication channel, its text is prone to both intentional and unintentional variations. These variations produce out-of-vocabulary (OOV) words, which make social media text harder to process. This work analyses these challenges through a detailed overview of the sources of intentional and unintentional OOV words. We survey the pre-processing techniques proposed in the literature, both traditional and application-specific, for handling such variations, and highlight their pros and cons. The paper also analyses the implications of text normalization (standardization) for different social media text-processing applications. Finally, it reviews recent challenges and trends in handling social media textual variations and is intended to provide a baseline for future research.
About the journal:
Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.