Penerapan Algoritma Support Vector Machine (SVM) dengan TF-IDF N-Gram untuk Text Classification
(Application of the Support Vector Machine (SVM) Algorithm with TF-IDF N-Gram for Text Classification)
Authors: N. Arifin, Ultach Enri, Nina Sulistiyowati
Journal: STRING (Satuan Tulisan Riset dan Inovasi Teknologi)
Published: 2021-12-05
DOI: 10.30998/string.v6i2.10133 (https://doi.org/10.30998/string.v6i2.10133)
Citations: 11
Abstract
Syntax Journal of Informatics is an information system that hosts a collection of scientific articles and is managed by the Informatics Study Program of Singaperbangsa Karawang University. It currently has no feature for categorizing articles by their focus and scope. This research uses text mining, the process of extracting important information from text, to classify articles automatically into the focus-and-scope categories listed on the journal's page. The method is Knowledge Discovery in Databases (KDD), with the stages of data selection, preprocessing, transformation, modeling, and evaluation. The study compares classification results obtained from the article titles. The algorithm is the Support Vector Machine (SVM), tested with four kernels: linear, polynomial, sigmoid, and RBF. The data are split with train_test_split under four scenarios: 60:40, 70:30, 80:20, and 90:10. The models are evaluated by accuracy, precision, recall, and F-measure. The best results, obtained with the linear kernel in the 90:10 scenario, are an accuracy of 70%, a precision of 75%, a recall of 69%, and an F-measure of 71%.
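The abstract names TF-IDF over N-grams and four SVM kernels but gives no formulas or code. The sketch below is only an illustration of those two ideas, not the authors' implementation. It weights word unigrams and bigrams from a few made-up titles and evaluates the standard linear, polynomial, sigmoid, and RBF kernel functions on the resulting sparse vectors. The sample titles, the smoothed idf variant, the N-gram range, and the `gamma`, `coef0`, and `degree` values are all assumptions chosen for the example.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a lowercased title."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf_vectors(titles, n_max=2):
    """Map each title to a sparse {ngram: tf * idf} dictionary."""
    docs = [Counter(word_ngrams(t, n_max)) for t in titles]
    # Document frequency: number of titles in which each n-gram occurs.
    df = Counter(term for doc in docs for term in doc)
    n_docs = len(docs)
    # Assumed idf variant (smoothed): ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def dot(u, v):
    """Dot product of two sparse vectors stored as dictionaries."""
    return sum(w * v[t] for t, w in u.items() if t in v)


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)


def rbf_kernel(u, v, gamma=1.0):
    # Squared Euclidean distance via ||u||^2 - 2<u, v> + ||v||^2,
    # clamped at zero to guard against floating-point rounding.
    sq_dist = max(0.0, dot(u, u) - 2 * dot(u, v) + dot(v, v))
    return math.exp(-gamma * sq_dist)


# Hypothetical article titles, used only to exercise the functions.
titles = [
    "text classification with support vector machine",
    "support vector machine for sentiment analysis",
    "network security audit",
]
vecs = tfidf_vectors(titles)

kernels = [("linear", linear_kernel), ("polynomial", polynomial_kernel),
           ("sigmoid", sigmoid_kernel), ("rbf", rbf_kernel)]
for name, kernel in kernels:
    # Similarity of title 0 to a related title (1) and an unrelated one (2).
    print(name, round(kernel(vecs[0], vecs[1]), 4),
          round(kernel(vecs[0], vecs[2]), 4))
```

Titles that share N-grams such as "support vector" get a positive linear-kernel value, while titles with no N-gram in common get zero. An SVM trained on such features separates the categories using these pairwise similarities, which is why the choice of kernel can change the reported accuracy.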