Effect of covariate shift on multi-class classification of Fermi-LAT sources

Dmitry V Malyshev
RAS Techniques and Instruments, published 2023-11-13. DOI: 10.1093/rasti/rzad053 (https://doi.org/10.1093/rasti/rzad053)

Abstract

Probabilistic classification of unassociated Fermi-LAT sources with machine learning methods implicitly assumes that associated and unassociated sources have the same distribution as a function of source parameters, which is not the case for the Fermi-LAT catalogs. A difference between the distributions of the training and testing (or target) datasets as a function of the input features (covariates) is known as covariate shift. In this paper, we quantitatively estimate, for the first time, the effect of covariate shift on the multi-class classification of Fermi-LAT sources. We introduce sample weights proportional to the ratio of the unassociated to the associated source probability density functions, so that associated sources in regions densely populated with unassociated sources carry more weight than those in regions with few unassociated sources. We find that covariate shift has relatively little effect on the predicted probabilities, i.e. the training can be performed with either weighted or unweighted samples, which is generally expected for covariate shift problems. The main effect of covariate shift is on the estimated performance of the classification. Depending on the class, covariate shift can reduce precision and recall by up to 10–20% compared with estimates that do not take it into account.
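The weighting idea in the abstract, w(x) proportional to p_unassociated(x) / p_associated(x), can be illustrated with a minimal sketch. The abstract does not say how the density ratio is estimated, so the shared-histogram estimator below is an assumption made for illustration only. The function names `density_ratio_weights` and `weighted_precision_recall`, the single scalar feature, and the toy data are likewise hypothetical and are not taken from the paper.

```python
import numpy as np


def density_ratio_weights(x_train, x_target, bins=10):
    """Per-sample weights ~ p_target(x) / p_train(x) for a 1D feature.

    Assumption: both densities are estimated with one shared histogram.
    This is an illustrative estimator, not the method used in the paper.
    """
    x_train = np.asarray(x_train, dtype=float)
    x_target = np.asarray(x_target, dtype=float)
    lo = min(x_train.min(), x_target.min())
    hi = max(x_train.max(), x_target.max())
    edges = np.linspace(lo, hi, bins + 1)
    p_train, _ = np.histogram(x_train, bins=edges, density=True)
    p_target, _ = np.histogram(x_target, bins=edges, density=True)
    # Bin index of every training sample; the clip puts x == hi in the last bin,
    # matching the right-inclusive last bin of np.histogram.
    idx = np.clip(np.digitize(x_train, edges) - 1, 0, bins - 1)
    # p_train[idx] is never zero: each training sample occupies its own bin.
    w = p_target[idx] / p_train[idx]
    return w * len(w) / w.sum()  # normalise so the mean weight is 1


def weighted_precision_recall(y_true, y_pred, cls, weights=None):
    """Precision and recall for one class, optionally with sample weights."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if weights is None:
        w = np.ones(len(y_true))
    else:
        w = np.asarray(weights, dtype=float)
    tp = w[(y_true == cls) & (y_pred == cls)].sum()
    predicted = w[y_pred == cls].sum()
    actual = w[y_true == cls].sum()
    precision = tp / predicted if predicted > 0 else float("nan")
    recall = tp / actual if actual > 0 else float("nan")
    return precision, recall


# Toy data (hypothetical): one feature for associated (training) and
# unassociated (target) sources, plus labels and predictions for the former.
x_assoc = [0.1, 0.2, 0.3, 0.9]
x_unassoc = [0.1, 0.8, 0.9, 1.0]
y_true = [1, 1, 0, 1]
y_pred = [1, 1, 1, 0]

w = density_ratio_weights(x_assoc, x_unassoc, bins=2)
print("weights:", w)
print("unweighted:", weighted_precision_recall(y_true, y_pred, cls=1))
print("weighted:  ", weighted_precision_recall(y_true, y_pred, cls=1, weights=w))
```

In this toy example the weighted recall is lower than the unweighted one, because the single misclassified source lies in the bin where unassociated sources dominate and therefore carries a large weight. This mirrors the abstract's point that the shift mainly changes the estimated performance. The same weights could also be supplied during training to any classifier that accepts per-sample weights.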