Rachsuda Jiamthapthaksin, Pisal Setthawong, Nitipan Ratanasawetwad
2016 8th International Conference on Knowledge and Smart Technology (KST), published 2016-02-01. DOI: https://doi.org/10.1109/KST.2016.7440478
A system for popular Thai slang extraction from social media content with n-gram based tokenization
With the increased penetration of smart devices and internet connectivity, many Thais engage more readily in social media, online forums, and chat groups. As consumption of social media content increases, consumption shifts away from traditional media such as broadcast and print, in which formal language is used regularly. Social media posts reflect this trend: posts, usually made by younger generations, often involve slang and informal language that is not typically found in formal dictionaries. Because the Thai population likes to follow trends, many Thai social media users follow the latest popular trends in slang and word usage. As slang changes and evolves over time, it is useful to have an online mining tool that can capture trends in emerging and popular slang. This paper proposes an approach that extracts popular Thai slang by comparing social media posts: it uses tokenization with a dictionary-based approach to extract unknown words, then expands them using an n-gram approach to determine which slang words are currently trending and popular.
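The pipeline outlined in the abstract (dictionary-based tokenization, unknown-word extraction, n-gram expansion, popularity ranking) might be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the greedy longest-match tokenizer, the function names `tokenize`, `candidate_ngrams`, and `popular_candidates`, the parameters `max_n` and `min_posts`, and the toy Latin-script data (written without spaces to mimic unsegmented Thai text) are all assumptions introduced here.

```python
from collections import Counter


def tokenize(text, dictionary):
    """Greedy longest-match segmentation against a word list (an assumed
    strategy). Characters that no dictionary word covers are merged into
    a single unknown token. Returns a list of (token, is_known) pairs."""
    max_len = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible dictionary word starting at position i.
        match = next(
            (text[i:i + n]
             for n in range(min(max_len, len(text) - i), 0, -1)
             if text[i:i + n] in dictionary),
            None,
        )
        if match:
            tokens.append((match, True))
            i += len(match)
        elif tokens and not tokens[-1][1]:
            # Extend the current run of unknown characters.
            tokens[-1] = (tokens[-1][0] + text[i], False)
            i += 1
        else:
            tokens.append((text[i], False))
            i += 1
    return tokens


def candidate_ngrams(tokens, max_n):
    """Return the set of token n-grams (1 <= n <= max_n) that contain at
    least one unknown token, joined without spaces."""
    found = set()
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            window = tokens[start:start + n]
            if any(not known for _, known in window):
                found.add("".join(tok for tok, _ in window))
    return found


def popular_candidates(posts, dictionary, max_n=2, min_posts=2):
    """Count in how many posts each candidate occurs and keep those that
    reach the (hypothetical) min_posts threshold, most frequent first."""
    counts = Counter()
    for post in posts:
        counts.update(candidate_ngrams(tokenize(post, dictionary), max_n))
    return [(cand, k) for cand, k in counts.most_common() if k >= min_posts]


# Toy data: "yolo" stands in for a slang word missing from the dictionary.
toy_dictionary = {"i", "am", "so", "happy", "today"}
toy_posts = ["iamsoyolotoday", "yolohappy", "soyolo"]
print(popular_candidates(toy_posts, toy_dictionary))
# [('yolo', 3), ('soyolo', 2)]
```

Counting each candidate once per post, rather than once per occurrence, is a design choice made here so that a single repetitive post cannot make a string look popular; the paper may weigh popularity differently.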