Penerapan Algoritma Support Vector Machine (SVM) dengan TF-IDF N-Gram untuk Text Classification

N. Arifin, Ultach Enri, Nina Sulistiyowati
Journal: STRING (Satuan Tulisan Riset dan Inovasi Teknologi)
Published: 2021-12-05
DOI: https://doi.org/10.30998/string.v6i2.10133
Citations: 11

Abstract

Syntax Journal of Informatics is an information system that hosts a collection of scientific articles and is managed by the Informatics Study Program of Singaperbangsa Karawang University. Currently, it has no feature for categorizing articles by their focus and scope. This research automatically classifies scientific articles into the focus-and-scope categories listed on the Syntax Journal of Informatics page by using text mining, a process that aims to extract important information from text. The method follows Knowledge Discovery in Database (KDD), with the stages of data selection, preprocessing, transformation, modeling, and evaluation. The study compares classifications based on the article title. The algorithm is the Support Vector Machine (SVM), tested with four kernels: linear, polynomial, sigmoid, and RBF. The data are split into four scenarios using train_test_split, namely 60:40, 70:30, 80:20, and 90:10. The tested models are measured by accuracy, precision, recall, and F-measure. The best results, obtained with the linear kernel in the 90:10 scenario, are an accuracy of 70%, precision of 75%, recall of 69%, and F-measure of 71%.
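The experimental design described in the abstract (TF-IDF n-gram features from titles, four SVM kernels, four train/test splits, four metrics) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it assumes scikit-learn, reads the third split as 80:20, and uses invented toy titles and labels. The unigram-plus-bigram range, stratified splitting, macro-averaged metrics, default SVC hyperparameters, and the names `evaluate`, `SCENARIOS`, and `KERNELS` are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy data (NOT the paper's dataset): 10 titles per category.
titles = [
    "Sentiment analysis of product reviews using naive Bayes",
    "Clustering student grades with k-means",
    "Decision tree classification of loan applicants",
    "Text classification of news articles using SVM",
    "Association rule mining for retail transactions",
    "Predicting rainfall with linear regression",
    "Topic modeling of research abstracts",
    "K-nearest neighbor classification of fruit images",
    "Forecasting sales using time series data mining",
    "Random forest classification of customer churn",
    "Network security analysis using intrusion detection systems",
    "Wireless sensor network routing protocol performance",
    "Design of a campus VPN network",
    "Bandwidth management on a MikroTik router",
    "Firewall configuration for server network security",
    "Quality of service analysis of a VoIP network",
    "Load balancing for web server clusters",
    "Monitoring network traffic with SNMP",
    "IPv6 migration on a local area network",
    "Performance comparison of routing protocols OSPF and EIGRP",
]
labels = ["data_mining"] * 10 + ["networking"] * 10

# Train:test scenarios mapped to the test fraction passed to train_test_split.
SCENARIOS = {"60:40": 0.4, "70:30": 0.3, "80:20": 0.2, "90:10": 0.1}
# scikit-learn names for the linear, polynomial, sigmoid, and RBF kernels.
KERNELS = ["linear", "poly", "sigmoid", "rbf"]


def evaluate(texts, targets, ngram_range=(1, 2), seed=0):
    """Fit one TF-IDF + SVM pipeline per (scenario, kernel) and score it."""
    results = {}
    for scenario, test_size in SCENARIOS.items():
        # Stratification keeps both classes in each split (an assumption).
        x_train, x_test, y_train, y_test = train_test_split(
            texts, targets, test_size=test_size,
            stratify=targets, random_state=seed,
        )
        for kernel in KERNELS:
            # The vectorizer is fitted on the training titles only.
            model = make_pipeline(
                TfidfVectorizer(ngram_range=ngram_range),
                SVC(kernel=kernel),
            )
            model.fit(x_train, y_train)
            predicted = model.predict(x_test)
            precision, recall, f_measure, _ = precision_recall_fscore_support(
                y_test, predicted, average="macro", zero_division=0
            )
            results[(scenario, kernel)] = {
                "accuracy": accuracy_score(y_test, predicted),
                "precision": precision,
                "recall": recall,
                "f_measure": f_measure,
            }
    return results


if __name__ == "__main__":
    for (scenario, kernel), scores in evaluate(titles, labels).items():
        print(scenario, kernel, {k: round(v, 2) for k, v in scores.items()})
```

Wrapping the vectorizer and classifier in one pipeline means the TF-IDF vocabulary is learned from the training portion alone, so the test titles do not leak into feature extraction. Scores on a toy set this small say nothing about the figures reported in the abstract.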