Implementation of Information Retrieval Using Tf-Idf Weighting Method On Detik.Com’s Website

Arfiani Nur Khusna, I. Agustina
DOI: 10.1109/TSSA.2018.8708744
Published in: 2018 12th International Conference on Telecommunication Systems, Services, and Applications (TSSA), 2018-10-01
Citations: 11

Abstract

Information retrieval is the process by which a system finds the information that is needed. News is communicated not only through print media but also through online media, and rapid technological development keeps people up to date with news and current information. Detik.com is an online news website that serves a wide variety of the latest information. In a questionnaire of 30 respondents, 100% stated that online news is important. However, 66.7% reported that visitors to the detik.com website often receive articles that do not match what they searched for, and respondents stated that the search results are not relevant to the keywords entered. This research applies the TF-IDF (Term Frequency-Inverse Document Frequency) weighting method. Computing the relevance weights involves several preprocessing stages: tokenizing, filtering, and stemming, followed by TF-IDF weighting. The weighting produces a relevance weight for each article, and the articles are ordered from highest to lowest weight. The result of this research is a web-based information retrieval application for the detik.com site that uses TF-IDF weighting. Testing yielded a recall of 1, indicating that the system finds the relevant articles, and a precision of 0.50, indicating that the system also returns articles that are not relevant. Both recall and precision reached 1 when the query (keyword) consisted of a single term (word). The low precision indicates that, on average, the search results for the entered keywords include irrelevant articles.
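The pipeline described in the abstract (tokenizing, filtering, stemming, TF-IDF weighting, ranking articles by weight, and evaluating with recall and precision) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The stopword list, the naive suffix-stripping stemmer, the toy English documents, and the scoring rule (summing the tf × idf weights of the query terms, with idf = log10(N/df)) are all assumptions, since the abstract does not specify them. A real system indexing detik.com's Indonesian-language articles would need an Indonesian stopword list and stemmer.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list for illustration only; the paper's actual
# filtering list is not given in the abstract.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}

# Hypothetical suffix list for a naive stemmer standing in for whatever
# stemming algorithm the authors used (not specified in the abstract).
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Tokenizing: lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def filter_stopwords(tokens):
    """Filtering: drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def stem(token):
    """Stemming: strip the first matching suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run tokenizing, filtering and stemming in order."""
    return [stem(t) for t in filter_stopwords(tokenize(text))]


def tfidf_index(docs):
    """Return one {term: tf * idf} dict per document, with idf = log10(N / df)."""
    counts = [Counter(preprocess(d)) for d in docs]
    n = len(docs)
    df = Counter(term for c in counts for term in c)
    return [{t: tf * math.log10(n / df[t]) for t, tf in c.items()} for c in counts]


def rank(query, index):
    """Score each document as the sum of its weights for the query terms and
    return (doc_id, score) pairs with a positive score, highest weight first."""
    terms = set(preprocess(query))
    scores = [(i, sum(w.get(t, 0.0) for t in terms)) for i, w in enumerate(index)]
    return sorted([p for p in scores if p[1] > 0], key=lambda p: p[1], reverse=True)


def precision_recall(retrieved, relevant):
    """Precision = hits / |retrieved|, recall = hits / |relevant|."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy documents invented for this example; not data from the paper.
    docs = [
        "Flood warnings issued in Jakarta",
        "Football match results in Jakarta",
        "Heavy rain causes flooding in Bandung",
    ]
    ranking = rank("Jakarta flood", tfidf_index(docs))
    retrieved = [doc_id for doc_id, _ in ranking]
    print(retrieved)  # [0, 1, 2]
    # Assume documents 0 and 2 (the flood stories) are the relevant ones.
    print(precision_recall(retrieved, relevant=[0, 2]))  # precision 2/3, recall 1.0
```

On this toy collection, document 0 ranks first because it contains both query terms. Recall is 1.0 but precision is only 2/3, because document 1 is retrieved solely for mentioning Jakarta. This shows how a multi-term query can keep recall at 1 while lowering precision, which is the pattern the abstract describes. With idf = log10(N/df), a term that occurs in every document receives weight 0 and does not contribute to any score.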