Online Media as a Price Monitor: Text Analysis using Text Extraction Technique and Jaro-Winkler Similarity Algorithm

Vivine Nurcahyawati, Z. Mustaffa
DOI: 10.1109/ETCCE51779.2020.9350898 (https://doi.org/10.1109/ETCCE51779.2020.9350898)
Published in: 2020 Emerging Technology in Computing, Communication and Electronics (ETCCE)
Publication date: 2020-12-21
Citations: 1

Abstract

Online media has become an essential part of everyday life in modern society. Any person or organization is free to share opinions and feelings about any topic on it, including information or news about commodity price fluctuations. Commodity price data from the National Strategic Price Information Center (NSPIC) website is not real-time, so it is not a sufficient basis for monitoring commodity price fluctuations. Meanwhile, the government needs to collect data and information about these fluctuations quickly so that strategic decisions and policies to stabilize prices can be made immediately. This study explores the potential of online media by extracting and analyzing its text so that the commodity price data sought can be displayed. The commodities used as search keywords were those with the highest consumption level in Indonesia in 2016. The texts analyzed were taken from three online media: Twitter, Liputan6.com, and Detik.com. They were analyzed using text extraction techniques and the Jaro-Winkler algorithm to find commodity prices in the text collection. The results of the text analysis were then compared with commodity prices from the NSPIC website. The experimental data comprised 99,007 items collected over three months. Only 122 items matched the keywords; these were split into 100 training items and 22 testing items. The text analysis shows that text from Detik.com yields commodity prices closest to the NSPIC price data, while Twitter yields the farthest. The accuracy measured with a confusion matrix is 75%. Based on this research, online media texts are a viable source for monitoring commodity price fluctuations.
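
To make the matching step concrete, the following is a minimal sketch of the standard Jaro-Winkler similarity in Python, applied to matching a word from a text against a list of commodity keywords. It is not the authors' implementation. The helper `closest_keyword`, the 0.9 threshold, and the sample keywords ("beras", "gula") are assumptions chosen for illustration and do not come from the paper.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity, a value in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two equal characters count as a match only within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str,
                 prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix chars."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


def closest_keyword(token, keywords, threshold=0.9):
    """Hypothetical helper: best-matching keyword, or None below threshold.

    The 0.9 threshold is an assumption for this sketch, not a value
    reported in the paper.
    """
    token = token.lower()
    best = max(keywords, key=lambda k: jaro_winkler(token, k))
    return best if jaro_winkler(token, best) >= threshold else None


if __name__ == "__main__":
    # Classic textbook pair; the similarity is approximately 0.961.
    print(round(jaro_winkler("MARTHA", "MARHTA"), 3))
    # Illustrative keywords only; a misspelled token still maps to "beras".
    print(closest_keyword("berass", ["beras", "gula"]))
```

Because Jaro-Winkler rewards a shared prefix, it tolerates the trailing typos and suffix variations common in informal posts, which is one plausible reason to prefer it over exact keyword matching for this kind of text.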