使用有限的少数族裔课堂数据进行学习

T. Khoshgoftaar, Chris Seiffert, J. V. Hulse, Amri Napolitano, A. Folleco
{"title":"使用有限的少数族裔课堂数据进行学习","authors":"T. Khoshgoftaar, Chris Seiffert, J. V. Hulse, Amri Napolitano, A. Folleco","doi":"10.1109/ICMLA.2007.76","DOIUrl":null,"url":null,"abstract":"A practical problem in data mining and machine learning is the limited availability of data. For example, in a binary classification problem it is often the case that examples of one class are abundant, while examples of the other class are in short supply. Examples from one class, typically the positive class, can be limited due to the financial cost or time required to collect these examples. This work presents a comprehensive empirical study of learning when examples from one class are extremely rare, but examples of the other class(es) are plentiful. Specifically, we address the issue of how many examples from the abundant class should be used when training a classifier on data where one class is very rare. Nearly one million classifiers were built and evaluated to generate the results presented in this work. Our results demonstrate that the often used 'even distribution' is not optimal when dealing with such rare events.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"103","resultStr":"{\"title\":\"Learning with limited minority class data\",\"authors\":\"T. Khoshgoftaar, Chris Seiffert, J. V. Hulse, Amri Napolitano, A. Folleco\",\"doi\":\"10.1109/ICMLA.2007.76\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A practical problem in data mining and machine learning is the limited availability of data. For example, in a binary classification problem it is often the case that examples of one class are abundant, while examples of the other class are in short supply. Examples from one class, typically the positive class, can be limited due to the financial cost or time required to collect these examples. This work presents a comprehensive empirical study of learning when examples from one class are extremely rare, but examples of the other class(es) are plentiful. Specifically, we address the issue of how many examples from the abundant class should be used when training a classifier on data where one class is very rare. Nearly one million classifiers were built and evaluated to generate the results presented in this work. Our results demonstrate that the often used 'even distribution' is not optimal when dealing with such rare events.\",\"PeriodicalId\":448863,\"journal\":{\"name\":\"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)\",\"volume\":\"62 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2007-12-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"103\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICMLA.2007.76\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA.2007.76","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 103

摘要

数据挖掘和机器学习中的一个实际问题是数据的有限可用性。例如,在二元分类问题中,通常会出现一类的例子很丰富,而另一类的例子却很短缺的情况。由于收集这些示例所需的财务成本或时间,来自一个类别(通常是正面类别)的示例可能会受到限制。这项工作提出了一个全面的学习实证研究,当一个类的例子非常罕见,但另一个类的例子是丰富的。具体来说,我们解决的问题是,当在一个类别非常罕见的数据上训练分类器时,应该使用多少个来自丰富类别的示例。近一百万个分类器被构建和评估,以产生本工作中呈现的结果。我们的结果表明,通常使用的“均匀分布”在处理此类罕见事件时不是最优的。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Learning with limited minority class data
A practical problem in data mining and machine learning is the limited availability of data. For example, in a binary classification problem it is often the case that examples of one class are abundant, while examples of the other class are in short supply. Examples from one class, typically the positive class, can be limited due to the financial cost or time required to collect these examples. This work presents a comprehensive empirical study of learning when examples from one class are extremely rare, but examples of the other class(es) are plentiful. Specifically, we address the issue of how many examples from the abundant class should be used when training a classifier on data where one class is very rare. Nearly one million classifiers were built and evaluated to generate the results presented in this work. Our results demonstrate that the often used 'even distribution' is not optimal when dealing with such rare events.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信