Assessing text mining algorithm outcomes

Impact factor 1.7; JCR Q3 (Computer Science, Information Systems)
Triss Ashton, Nicholas E. Evangelopoulos, A. Paswan, V. Prybutok, R. Pavur
Journal of Business Analytics, volume "15 1" (as listed), pp. 107-121
Published: 2020-06-25
Publication type: Journal Article
DOI: https://doi.org/10.1080/2573234x.2020.1785342
Source platform: Semantic Scholar
Citations: 3

Abstract

Assessing text mining algorithm outcomes
There is a surge in the development of decision-oriented analysis tools intended to extract actionable information from text. These tools integrate various text-mining methods whose performance tests were often biased toward the new system. Those tests primarily utilised descriptive measurement criteria and test datasets that are inconsistent with most business corpora. We propose and test a user-oriented judgment approach that allows testing under controlled customer-oriented corpora and generates effect size measures. To illustrate the approach, customer relations data was analysed by latent semantic analysis and latent Dirichlet allocation, with results evaluated by prospective business analysts. Reporting includes comparisons of results with published literature. While the research centres on context-region text-mining systems, literature comparisons include word-embedding methods. The analysis concludes that none of the systems reviewed possesses a repeatable statistical advantage over the others. Instead, distribution attributes, algorithm configuration, and the evaluation task drive results.
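The abstract does not say which effect size measure the authors report. As a minimal sketch, assuming Cohen's d with a pooled standard deviation and two independent groups of analyst ratings, a comparison between two methods could look like the following. The function name `cohens_d`, the 1-7 rating scale, and the rating values are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation.

    Assumption: this is one common effect size measure; the paper may use another.
    """
    n_a, n_b = len(a), len(b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * stdev(a) ** 2 + (n_b - 1) * stdev(b) ** 2) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# Hypothetical analyst ratings (1-7 scale) of topics produced by each method.
lsa_ratings = [5, 6, 4, 5, 6, 5, 4, 6]
lda_ratings = [5, 5, 4, 6, 5, 4, 5, 5]

d = cohens_d(lsa_ratings, lda_ratings)
print(f"Cohen's d (LSA vs. LDA ratings) = {d:.2f}")
```

A value of d near zero would be consistent with the abstract's conclusion that no method shows a repeatable advantage, though a single comparison on made-up ratings says nothing about the paper's actual results.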
Source journal
Journal of Business Analytics (Business, Management and Accounting: Management Information Systems)
CiteScore: 2.50
Self-citation rate: 0.00%
Articles published: 13