无限不平衡下的线性分类器

P. Glasserman, Mike Li
{"title":"无限不平衡下的线性分类器","authors":"P. Glasserman, Mike Li","doi":"10.2139/ssrn.3863653","DOIUrl":null,"url":null,"abstract":"<br>We study the behavior of linear discriminant functions for binary classification in the infinite-imbalance limit, where the sample size of one class grows without bound while the sample size of the other remains fixed. The coefficients of the classifier minimize an expected loss specified through a weight function. We show that for a broad class of weight functions, the intercept diverges but the rest of the coefficient vector has a finite limit under infinite imbalance, extending prior work on logistic regression. The limit depends on the left tail of the weight function, for which we distinguish three cases: bounded, asymptotically polynomial, and asymptotically exponential. The limiting coefficient vectors reflect robustness or conservatism properties in the sense that they optimize against certain worst-case alternatives. In the bounded and polynomial cases, the limit is equivalent to an implicit choice of upsampling distribution for the minority class. We apply these ideas in a credit risk setting, with particular emphasis on performance in the high-sensitivity and high-specificity regions.","PeriodicalId":251522,"journal":{"name":"Risk Management & Analysis in Financial Institutions eJournal","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Linear Classifiers Under Infinite Imbalance\",\"authors\":\"P. Glasserman, Mike Li\",\"doi\":\"10.2139/ssrn.3863653\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<br>We study the behavior of linear discriminant functions for binary classification in the infinite-imbalance limit, where the sample size of one class grows without bound while the sample size of the other remains fixed. The coefficients of the classifier minimize an expected loss specified through a weight function. We show that for a broad class of weight functions, the intercept diverges but the rest of the coefficient vector has a finite limit under infinite imbalance, extending prior work on logistic regression. The limit depends on the left tail of the weight function, for which we distinguish three cases: bounded, asymptotically polynomial, and asymptotically exponential. The limiting coefficient vectors reflect robustness or conservatism properties in the sense that they optimize against certain worst-case alternatives. In the bounded and polynomial cases, the limit is equivalent to an implicit choice of upsampling distribution for the minority class. We apply these ideas in a credit risk setting, with particular emphasis on performance in the high-sensitivity and high-specificity regions.\",\"PeriodicalId\":251522,\"journal\":{\"name\":\"Risk Management & Analysis in Financial Institutions eJournal\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Risk Management & Analysis in Financial Institutions eJournal\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2139/ssrn.3863653\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Risk Management & Analysis in Financial Institutions eJournal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2139/ssrn.3863653","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

研究二元分类的线性判别函数在无限不平衡极限下的行为,其中一类的样本量无限制地增长,而另一类的样本量保持不变。分类器的系数使通过权重函数指定的期望损失最小化。我们证明了对于一大类权函数,截距发散,但系数向量的其余部分在无限不平衡下有有限极限,扩展了先前的逻辑回归工作。极限依赖于权函数的左尾,为此我们区分了三种情况:有界、渐近多项式和渐近指数。极限系数向量反映了鲁棒性或保守性,因为它们针对某些最坏情况的选择进行了优化。在有界和多项式情况下,极限等价于对少数类的上采样分布的隐式选择。我们将这些想法应用于信用风险设置,特别强调在高灵敏度和高特异性区域的表现。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Linear Classifiers Under Infinite Imbalance

We study the behavior of linear discriminant functions for binary classification in the infinite-imbalance limit, where the sample size of one class grows without bound while the sample size of the other remains fixed. The coefficients of the classifier minimize an expected loss specified through a weight function. We show that for a broad class of weight functions, the intercept diverges but the rest of the coefficient vector has a finite limit under infinite imbalance, extending prior work on logistic regression. The limit depends on the left tail of the weight function, for which we distinguish three cases: bounded, asymptotically polynomial, and asymptotically exponential. The limiting coefficient vectors reflect robustness or conservatism properties in the sense that they optimize against certain worst-case alternatives. In the bounded and polynomial cases, the limit is equivalent to an implicit choice of upsampling distribution for the minority class. We apply these ideas in a credit risk setting, with particular emphasis on performance in the high-sensitivity and high-specificity regions.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信