Effect of covariate shift on multi-class classification of Fermi-LAT sources

RAS Techniques and Instruments. Pub Date: 2023-11-13. DOI: 10.1093/rasti/rzad053

Dmitry V Malyshev



Abstract

Probabilistic classification of unassociated Fermi-LAT sources with machine learning methods implicitly assumes that associated and unassociated sources follow the same distribution as a function of source parameters, which is not the case for the Fermi-LAT catalogs. A mismatch between the distributions of the training and testing (or target) datasets as a function of the input features (covariates) is known as covariate shift. In this paper, we quantitatively estimate, for the first time, the effect of covariate shift on the multi-class classification of Fermi-LAT sources. We introduce sample weights proportional to the ratio of the unassociated to the associated source probability density functions, so that associated sources in areas densely populated with unassociated sources carry more weight than those in areas with few unassociated sources. We find that covariate shift has relatively little effect on the predicted probabilities, i.e., training can be performed with either weighted or unweighted samples, as is generally expected for covariate shift problems. The main effect of covariate shift is on the estimated performance of the classification: depending on the class, it can reduce precision and recall by up to 10-20% compared to estimates that do not take the covariate shift into account.
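The weighting idea in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the abstract does not say how the probability density functions are estimated, so the histogram estimate, the single feature, the bin edges, the class labels "psr" and "agn", and all numbers below are assumptions chosen only to make the example self-contained. Each associated (training) source gets a weight equal to the ratio of the unassociated to the associated density in its bin, and precision and recall are then computed with and without those weights.

```python
from bisect import bisect_right


def bin_index(x, edges):
    """Index of the histogram bin containing x, clipped to the outer bins."""
    i = bisect_right(edges, x) - 1
    return min(max(i, 0), len(edges) - 2)


def density_ratio_weights(train_x, target_x, edges):
    """Weight each training sample by p_target(x) / p_train(x).

    Both densities are estimated with histograms on the same bin edges
    (an assumption of this sketch, not a method taken from the paper).
    """
    n_bins = len(edges) - 1
    train_counts = [0] * n_bins
    target_counts = [0] * n_bins
    for x in train_x:
        train_counts[bin_index(x, edges)] += 1
    for x in target_x:
        target_counts[bin_index(x, edges)] += 1

    weights = []
    for x in train_x:
        b = bin_index(x, edges)
        p_train = train_counts[b] / len(train_x)  # > 0, since x itself is in bin b
        p_target = target_counts[b] / len(target_x)
        weights.append(p_target / p_train)
    return weights


def weighted_precision_recall(y_true, y_pred, weights, cls):
    """Precision and recall for one class, with per-sample weights."""
    rows = list(zip(y_true, y_pred, weights))
    tp = sum(w for t, p, w in rows if t == cls and p == cls)
    predicted = sum(w for t, p, w in rows if p == cls)
    actual = sum(w for t, p, w in rows if t == cls)
    precision = tp / predicted if predicted > 0 else float("nan")
    recall = tp / actual if actual > 0 else float("nan")
    return precision, recall


# Synthetic one-feature toy data (hypothetical values, not catalog data).
# Associated sources sit mostly in the upper bin, unassociated in the lower one.
assoc_x = [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6]
unassoc_x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 1.5, 1.6]
edges = [0.0, 1.0, 2.0]

# True and predicted classes of the associated sources (hypothetical labels).
y_true = ["psr", "agn", "agn", "psr", "agn", "agn", "agn", "psr", "psr", "agn"]
y_pred = ["agn", "agn", "psr", "psr", "agn", "agn", "agn", "psr", "psr", "agn"]

weights = density_ratio_weights(assoc_x, unassoc_x, edges)
unweighted = weighted_precision_recall(y_true, y_pred, [1.0] * len(y_true), "psr")
weighted = weighted_precision_recall(y_true, y_pred, weights, "psr")

print("unweighted precision, recall:", unweighted)
print("weighted precision, recall:  ", weighted)
```

In this toy setup the classifier makes its mistakes in the lower bin, where unassociated sources are concentrated, so the weighted precision and recall for "psr" (about 0.57) come out lower than the unweighted ones (0.75). The numbers are artifacts of the synthetic data and say nothing about the size of the effect reported in the paper; they only show the mechanism by which reweighting changes the estimated performance.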
