Feature Expansion with Word2Vec for Topic Classification with Gradient Boosted Decision Tree on Twitter
Authors: Dhuhita Trias Maulidia, Erwin Budi Setiawan
Published in: 2022 International Conference on Data Science and Its Applications (ICoDSA), 2022-07-06
DOI: https://doi.org/10.1109/ICoDSA55874.2022.9862907
Citation count: 1
Abstract
Online social networks play an essential role as a source of information, especially during emergencies. One of them is Twitter, a service that lets users send and read messages but limits the number of characters per message. As a result, tweets are very short, do not always follow correct grammar, and use many word variations. These variations increase the likelihood of vocabulary mismatch and make tweets harder to interpret. One way to address this problem is to expand the features of a tweet. Feature expansion adds semantically related terms to the original text so that a short tweet behaves more like a longer document. In this study, Word2Vec is used for feature expansion and the Gradient Boosted Decision Tree method is used for classification. The aim is to reduce word mismatches in the classification of Twitter topics, with performance evaluated using accuracy and F1-measure. The highest accuracy obtained by applying feature expansion using Word2Vec with the Gradient Boosted Decision Tree classifier is 85.44%.
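The abstract does not spell out the expansion procedure, so the following is only a minimal sketch of one common formulation, not the authors' implementation. It assumes binary bag-of-words features and a precomputed similarity table: when a feature word is absent from a tweet, the feature is switched on if one of that word's nearest neighbours appears instead. The table `SIMILAR`, the function names, and the `top_k` parameter are all hypothetical; in practice the neighbour lists would come from a trained Word2Vec model.

```python
from typing import Dict, List

# Hypothetical similarity table standing in for a trained Word2Vec model:
# each feature word maps to its nearest neighbours, most similar first.
SIMILAR: Dict[str, List[str]] = {
    "flood": ["inundation", "deluge"],
    "earthquake": ["quake", "tremor"],
    "traffic": ["congestion", "jam"],
}


def binary_vector(tokens: List[str], vocabulary: List[str]) -> List[int]:
    """Plain binary bag-of-words: 1 if the feature word occurs in the tweet."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


def expand_features(
    tokens: List[str],
    vocabulary: List[str],
    similar: Dict[str, List[str]],
    top_k: int = 2,
) -> List[int]:
    """Set a zero-valued feature to 1 when one of its top_k neighbours occurs."""
    present = set(tokens)
    vector = binary_vector(tokens, vocabulary)
    for i, word in enumerate(vocabulary):
        if vector[i] == 0:
            neighbours = similar.get(word, [])[:top_k]
            if any(neighbour in present for neighbour in neighbours):
                vector[i] = 1
    return vector


vocabulary = ["flood", "earthquake", "traffic"]
tweet = "huge quake and deluge downtown".split()

print(binary_vector(tweet, vocabulary))             # [0, 0, 0]
print(expand_features(tweet, vocabulary, SIMILAR))  # [1, 1, 0]
```

Without expansion the tweet matches no feature word, which is the vocabulary mismatch the abstract describes; with expansion, "quake" and "deluge" activate the "earthquake" and "flood" features. The expanded vectors could then be passed to any gradient boosted tree classifier for topic prediction.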