Journal Classification Using Cosine Similarity Method on Title and Abstract with Frequency-Based Stopword Removal 

Piska dwi Nurfadila, A. Wibawa, I. Zaeni, A. Nafalski
International Journal of Artificial Intelligence Research. Published 2019-12-30. DOI: 10.29099/IJAIR.V3I2.99 (https://doi.org/10.29099/IJAIR.V3I2.99)
Citations: 8

Abstract

Economic journal articles have previously been classified using the Vector Space Model (VSM) approach and the Cosine Similarity method. The results of those studies are considered suboptimal because Stopword Removal relied on a dictionary of basic words (tuning), so the omitted words were limited to basic words only. This study shows that frequency-based Stopword Removal improves the accuracy of the Cosine Similarity method, on the assumption that a term with a certain frequency is insignificant and yields less relevant results. The performance of the Cosine Similarity method combined with frequency-based Stopword Removal was tested using K-fold Cross Validation. The method achieved an accuracy of 64.28%, a precision of 64.76%, and a recall of 65.26%. The execution time after pre-processing was 0.05033 seconds.
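The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which frequency counts as "a certain frequency", so the document-frequency cutoff `max_doc_ratio`, the whitespace tokenizer, the nearest-document decision rule, and the toy training texts and labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption, not from the paper).
    return text.lower().split()


def frequent_terms(docs, max_doc_ratio=0.8):
    # Treat a term as a stopword when it occurs in more than
    # max_doc_ratio of the documents. The cutoff is hypothetical.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    limit = max_doc_ratio * len(docs)
    return {term for term, n in doc_freq.items() if n > limit}


def vectorize(text, stopwords):
    # Term-frequency vector with the stopwords dropped.
    return Counter(t for t in tokenize(text) if t not in stopwords)


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(query, labeled_docs, stopwords):
    # Assign the label of the most similar training document.
    q = vectorize(query, stopwords)
    best = max(labeled_docs,
               key=lambda pair: cosine(q, vectorize(pair[0], stopwords)))
    return best[1]


# Toy data, invented for this example only.
train = [
    ("the study of inflation and monetary policy", "macro"),
    ("the analysis of household demand and prices", "micro"),
    ("the effect of interest rates on inflation", "macro"),
]
stopwords = frequent_terms([text for text, _ in train])
label = classify("the inflation of interest rates", train, stopwords)
print(sorted(stopwords), label)
```

Here the stopword list is derived from the corpus itself rather than from a fixed dictionary, which is the contrast the abstract draws. A K-fold evaluation would repeat this with the stopword list rebuilt from each training fold.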