Topic Classification in Indonesian-language Tweets using Fast-Text Feature Expansion with Support Vector Machine (SVM)
Imaduddin Muhammad Fadhil, Y. Sibaroni
2022 International Conference on Data Science and Its Applications (ICoDSA), 2022-07-06
DOI: 10.1109/ICoDSA55874.2022.9862899
Twitter is a popular social media platform that lets users send text messages of at most 280 characters. This limit encourages heavy use of word variations, which leads to misspelled vocabulary. In addition, tweets now spread in ever greater numbers and so rapidly that they cause information overload. These problems call for a system that can recognize misspelled words and sort tweets into categories. This study therefore aims to build a topic classification system for tweets that can handle misspelled words. Feature expansion with a pretrained FastText model can be used to recognize misspellings because FastText builds word vectors by learning the internal structure of a word; the resulting features are used in a Support Vector Machine. The best result in this study is an accuracy of 76.88%, obtained with feature expansion at top-1, but the application of feature expansion using pretrained [...] classification Support Vector Machine.
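The abstract does not spell out the expansion procedure, so the following is only a minimal sketch of one common reading of top-k feature expansion: a vocabulary feature that is absent from a tweet is switched on when one of its k most similar words occurs in the tweet. To stay dependency-free, the sketch replaces the pretrained FastText vectors with a toy similarity, the Jaccard overlap of character n-grams, which only imitates the subword idea. The function names, the n-gram range, and the sample words are all hypothetical.

```python
# Toy sketch of top-k feature expansion. ASSUMPTION: character n-gram
# Jaccard overlap stands in for cosine similarity between pretrained
# FastText vectors; every name and sample word here is hypothetical.

def char_ngrams(word, n_min=3, n_max=5):
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    padded = f"<{word}>"
    return {
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    }

def similarity(a, b):
    """Jaccard overlap of the two words' character n-gram sets."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0

def build_similar(vocabulary, corpus_words, k=1):
    """Map each feature to its top-k most similar corpus words (score > 0)."""
    similar = {}
    for feature in vocabulary:
        scored = [(similarity(feature, w), w) for w in corpus_words if w != feature]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        similar[feature] = [w for score, w in scored[:k] if score > 0]
    return similar

def expand_features(tokens, vocabulary, similar):
    """Binary bag-of-words vector; a zero entry becomes 1 when one of the
    feature's top-k similar words occurs in the tweet."""
    present = set(tokens)
    return [
        1 if feature in present or any(w in present for w in similar[feature]) else 0
        for feature in vocabulary
    ]

# Hypothetical data: correctly spelled features and noisy tweet tokens.
vocabulary = ["makan", "bola", "politik"]
corpus_words = ["sy", "suka", "makann", "bolaa", "nonton", "politk"]
similar = build_similar(vocabulary, corpus_words, k=1)

tweet = ["sy", "suka", "makann"]
print(expand_features(tweet, vocabulary, similar))
```

Without expansion the sample tweet would map to an all-zero vector, because "makann" never matches the feature "makan" exactly. In a fuller pipeline the expanded vectors would be passed to an SVM classifier, for example scikit-learn's `sklearn.svm.SVC`, and the similarity lists would come from the pretrained FastText model rather than from this toy measure.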