Phat Jotikabukkana, Virach Sornlertlamvanich, Okumura Manabu, C. Haruechaiyasak
Effectiveness of social media text classification by utilizing the online news category
2015 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA), 2015-08-01
DOI: 10.1109/ICAICTA.2015.7335361 (https://doi.org/10.1109/ICAICTA.2015.7335361)
Citations: 5
Abstract
Social media text carries significant information about real-world social conditions and can reveal the direction of social movements in real time. However, it has distinctive characteristics: messages are short, the language is informal, and much of the content is unstructured or written in argot. Such text is hard to classify and difficult to analyze for useful information. In this paper, we propose an effective technique for classifying social media text that uses initial keywords drawn from well-formed data sources, such as online news. The main methods are term frequency-inverse document frequency (TF-IDF) weighting and the Word Article Matrix (WAM). We use the keywords extracted from the well-formed source as the main factor in experiments on Twitter messages. We found that a set of social media keywords can represent the essence of social events and can be used to classify the text effectively.
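The idea of seeding a short-text classifier with keywords from well-formed news text can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each news category is treated as a single document, uses a plain TF-IDF formula (relative term frequency times log(N/df)), and replaces the Word Article Matrix with a simple sum of matched keyword weights. The function names `tfidf_keywords` and `classify`, the `top_k` parameter, and the toy news strings are all hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(docs_by_category, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each category.

    Assumption: all news text of one category is concatenated into one document.
    """
    tokenized = {cat: text.lower().split() for cat, text in docs_by_category.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many categories each term occurs.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    keywords = {}
    for cat, tokens in tokenized.items():
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        keywords[cat] = dict(ranked[:top_k])
    return keywords


def classify(message, keywords):
    """Pick the category whose keyword weights sum highest over the message.

    Returns None when no keyword of any category occurs in the message.
    This keyword-overlap score is a stand-in, not the Word Article Matrix.
    """
    tokens = set(message.lower().split())
    totals = {
        cat: sum(weight for term, weight in kws.items() if term in tokens)
        for cat, kws in keywords.items()
    }
    best = max(totals, key=totals.get)
    return best if totals[best] > 0 else None


# Toy "news" corpus (hypothetical data, for illustration only).
news = {
    "sports": "football match goal team football league goal",
    "politics": "election vote parliament minister election policy",
}
kw = tfidf_keywords(news)
print(classify("what a goal in the football match tonight", kw))
```

In this toy setup, a term that appears in every category gets an IDF of zero and drops out as a keyword, which is the property that makes category-specific news vocabulary usable as a seed for noisy short messages.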