Inference of an explanatory variable from observations in a high-dimensional space: Application to high-resolution spectra of stars

V. Watson, J. Trouilhet, F. Paletou, S. Girard
Published in: 2017 IEEE International Workshop of Electronics, Control, Measurement, Signals and their Application to Mechatronics (ECMSM)
Publication date: 2017-05-24
DOI: 10.1109/ECMSM.2017.7945912 (https://doi.org/10.1109/ECMSM.2017.7945912)
Citations: 3

Abstract

Our aim is to estimate fundamental parameters from the analysis of the electromagnetic spectra of stars. We may use 10^3 to 10^5 spectra, each spectrum being a vector with 10^2 to 10^4 coordinates, so we face the so-called "curse of dimensionality". We look for a method that reduces the size of this data space while keeping only the most relevant information. As a reference method, we use principal component analysis (PCA) to reduce dimensionality. However, PCA is an unsupervised method, and its subspace was not consistent with the parameter. We therefore tested a supervised method based on Sliced Inverse Regression (SIR), which provides a subspace consistent with the parameter. SIR shares analogies with factorial discriminant analysis: it slices the database along the variation of the parameter and builds the subspace that maximizes the inter-slice variance while standardizing the total projected variance of the data. Nevertheless, the performance of SIR was not satisfactory in standard usage, because of the non-monotonicity of the unknown function linking the data to the parameter and because of noise propagation. We show that better performance can be achieved by selecting the most relevant directions for parameter inference. Preliminary tests are performed on synthetic pseudo-line profiles plus noise. Using one direction, we show that, compared to PCA, the error associated with SIR is 50% smaller on a non-linear parameter and 70% smaller on a linear parameter. Moreover, using a selected direction, the error is 80% smaller for a non-linear parameter and 95% smaller for a linear parameter.
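The slicing procedure described in the abstract can be illustrated with a minimal sketch of textbook SIR next to PCA. This is not the authors' implementation: the function names `sir_directions` and `pca_directions`, the equal-count slicing, the small `ridge` term, and the toy Gaussian data (not the paper's pseudo-line profiles) are all assumptions made for illustration, and the direction-selection step mentioned in the abstract is not reproduced here.

```python
import numpy as np


def sir_directions(X, y, n_slices=10, n_directions=1, ridge=1e-8):
    """Textbook Sliced Inverse Regression (illustrative sketch).

    Returns directions b maximizing the inter-slice variance b' Gamma b
    under the constraint b' Sigma b = 1 (standardized projected variance).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)

    # Total covariance, with a tiny ridge (an assumption) for numerical stability.
    sigma = Xc.T @ Xc / n
    sigma = sigma + ridge * (np.trace(sigma) / p) * np.eye(p)

    # Slice the data along the sorted parameter and accumulate the
    # weighted covariance of the slice means (inter-slice covariance).
    gamma = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Xc[idx].mean(axis=0)
        gamma += (len(idx) / n) * np.outer(m, m)

    # Solve Gamma b = lambda Sigma b by whitening with Sigma = L L'.
    L = np.linalg.cholesky(sigma)
    M = np.linalg.solve(L, np.linalg.solve(L, gamma).T)  # L^-1 Gamma L^-T
    M = 0.5 * (M + M.T)  # enforce exact symmetry before eigh
    evals, W = np.linalg.eigh(M)
    top = np.argsort(evals)[::-1][:n_directions]
    B = np.linalg.solve(L.T, W[:, top])  # map back: b = L^-T w
    return B, evals[top]


def pca_directions(X, n_directions=1):
    """Leading eigenvectors of the covariance matrix (unsupervised reference)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    evals, V = np.linalg.eigh(Xc.T @ Xc / len(Xc))
    top = np.argsort(evals)[::-1][:n_directions]
    return V[:, top]


# Toy data (assumed): the parameter depends on coordinate 3,
# while coordinate 0 carries most of the variance.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
X[:, 0] *= 5.0
y = X[:, 3] + 0.1 * rng.normal(size=2000)

b_sir, _ = sir_directions(X, y)
b_pca = pca_directions(X)
target = np.zeros(20)
target[3] = 1.0
print("SIR |cos| with true direction:",
      abs(b_sir[:, 0] @ target) / np.linalg.norm(b_sir[:, 0]))
print("PCA |cos| with true direction:", abs(b_pca[:, 0] @ target))
```

In this toy setting PCA follows the high-variance coordinate regardless of the parameter, whereas SIR uses the parameter to build its slices, which is the sense in which its subspace is "consistent with the parameter". The linear toy relation is deliberately easy; the non-monotonic case discussed in the abstract is harder and is not covered by this sketch.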