Optimizing Text Quantifiers for Multivariate Loss Functions

IF 0.1 · Q4 · Computer Science, Interdisciplinary Applications
ERCIM News · Pub Date: 2015-02-19 · DOI: 10.1145/2700406
Andrea Esuli, F. Sebastiani
{"title":"优化多元损失函数的文本量词","authors":"Andrea Esuli, F. Sebastiani","doi":"10.1145/2700406","DOIUrl":null,"url":null,"abstract":"We address the problem of quantification, a supervised learning task whose goal is, given a class, to estimate the relative frequency (or prevalence) of the class in a dataset of unlabeled items. Quantification has several applications in data and text mining, such as estimating the prevalence of positive reviews in a set of reviews of a given product or estimating the prevalence of a given support issue in a dataset of transcripts of phone calls to tech support. So far, quantification has been addressed by learning a general-purpose classifier, counting the unlabeled items that have been assigned the class, and tuning the obtained counts according to some heuristics. In this article, we depart from the tradition of using general-purpose classifiers and use instead a supervised learning model for structured prediction, capable of generating classifiers directly optimized for the (multivariate and nonlinear) function used for evaluating quantification accuracy. The experiments that we have run on 5,500 binary high-dimensional datasets (averaging more than 14,000 documents each) show that this method is more accurate, more stable, and more efficient than existing state-of-the-art quantification methods.","PeriodicalId":44543,"journal":{"name":"ERCIM News","volume":"2015 1","pages":""},"PeriodicalIF":0.1000,"publicationDate":"2015-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/2700406","citationCount":"75","resultStr":"{\"title\":\"Optimizing Text Quantifiers for Multivariate Loss Functions\",\"authors\":\"Andrea Esuli, F. Sebastiani\",\"doi\":\"10.1145/2700406\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We address the problem of quantification, a supervised learning task whose goal is, given a class, to estimate the relative frequency (or prevalence) of the class in a dataset of unlabeled items. Quantification has several applications in data and text mining, such as estimating the prevalence of positive reviews in a set of reviews of a given product or estimating the prevalence of a given support issue in a dataset of transcripts of phone calls to tech support. So far, quantification has been addressed by learning a general-purpose classifier, counting the unlabeled items that have been assigned the class, and tuning the obtained counts according to some heuristics. In this article, we depart from the tradition of using general-purpose classifiers and use instead a supervised learning model for structured prediction, capable of generating classifiers directly optimized for the (multivariate and nonlinear) function used for evaluating quantification accuracy. 
The experiments that we have run on 5,500 binary high-dimensional datasets (averaging more than 14,000 documents each) show that this method is more accurate, more stable, and more efficient than existing state-of-the-art quantification methods.\",\"PeriodicalId\":44543,\"journal\":{\"name\":\"ERCIM News\",\"volume\":\"2015 1\",\"pages\":\"\"},\"PeriodicalIF\":0.1000,\"publicationDate\":\"2015-02-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1145/2700406\",\"citationCount\":\"75\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ERCIM News\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2700406\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ERCIM News","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2700406","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 75

Abstract

We address the problem of quantification, a supervised learning task whose goal is, given a class, to estimate the relative frequency (or prevalence) of the class in a dataset of unlabeled items. Quantification has several applications in data and text mining, such as estimating the prevalence of positive reviews in a set of reviews of a given product or estimating the prevalence of a given support issue in a dataset of transcripts of phone calls to tech support. So far, quantification has been addressed by learning a general-purpose classifier, counting the unlabeled items that have been assigned the class, and tuning the obtained counts according to some heuristics. In this article, we depart from the tradition of using general-purpose classifiers and use instead a supervised learning model for structured prediction, capable of generating classifiers directly optimized for the (multivariate and nonlinear) function used for evaluating quantification accuracy. The experiments that we have run on 5,500 binary high-dimensional datasets (averaging more than 14,000 documents each) show that this method is more accurate, more stable, and more efficient than existing state-of-the-art quantification methods.
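To make the baseline the abstract refers to concrete, the following is a minimal sketch of "classify and count" plus a standard adjusted correction, i.e. the tradition of training a general-purpose classifier, counting predicted positives, and tuning the count heuristically. This is not the paper's structured-prediction method; the use of scikit-learn, the function name, and all parameters are illustrative assumptions.

```python
# Sketch (assumption, not the paper's method): classify-and-count (CC) and an
# adjusted variant (ACC) for binary prevalence estimation on text.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict


def estimate_prevalence(train_texts, train_labels, unlabeled_texts):
    vec = TfidfVectorizer()
    X_tr = vec.fit_transform(train_texts)
    X_un = vec.transform(unlabeled_texts)
    y_tr = np.asarray(train_labels)

    clf = LogisticRegression(max_iter=1000)

    # Estimate tpr/fpr with cross-validated predictions on the training set,
    # so the correction is not based on overfit in-sample predictions.
    y_cv = cross_val_predict(clf, X_tr, y_tr, cv=5)
    tpr = (y_cv[y_tr == 1] == 1).mean()
    fpr = (y_cv[y_tr == 0] == 1).mean()

    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_un)

    p_cc = pred.mean()                       # classify and count
    if tpr > fpr:
        p_acc = (p_cc - fpr) / (tpr - fpr)   # adjusted count (heuristic tuning)
    else:
        p_acc = p_cc
    return float(p_cc), float(np.clip(p_acc, 0.0, 1.0))
```

The paper's contribution is to replace the generic classifier above with a structured-prediction learner trained directly on the multivariate, nonlinear measure used to evaluate quantification accuracy, so that no post-hoc count adjustment is required.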