Informative Feature Extraction

Medical Applications of Laser Molecular Imaging and Machine Learning
Publication date: 2021-07-26
DOI: 10.1117/3.2599935.CH3

Y. Kistenev, A. Borisov, D. Vrazhnov

Abstract

Laser molecular imaging produces high-dimensional data whose structure depends on the optical modality, laser type, detection method, kind of sample, and so on. Typically, high dimensionality means that the number of measured parameters exceeds the number of hidden independent variables by orders of magnitude; for example, the number of measured absorption coefficients of a complex gas mixture may exceed the number of pure components in the mixture by an order of magnitude or more. Such high-dimensional data are hard to use for building predictive models because of the “curse of dimensionality,” a problem formulated by R. Bellman. Essentially, as the dimension of the feature vector increases, the volume of data needed to train a classifier grows exponentially. This is because distances between random vectors concentrate as their dimension increases: by the central limit theorem, the difference between the nearest and farthest distances becomes small relative to the distances themselves, so the vectors become hard to tell apart. One of the main goals of feature extraction is to overcome this problem. The universal approach is to reduce the data dimension; the specific methods depend on the origin of the data. In particular, 2D and 3D images can be decomposed into small geometric regions with similar properties, called textures, and the texture approach yields a compact description of the original image. Molecular spectra can be considered a degenerate case of molecular imaging data: for a homogeneous medium, a single “point” is enough to describe the whole sample. Feature vector dimension reduction includes feature selection and feature extraction, which differ only in how the result is obtained: selection keeps a subset of the original features, whereas extraction constructs new features from them. This chapter describes these methods in sufficient detail for practical applications. Python code for the most useful analytical methods described in the chapter is provided in the Supplemental Materials.
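The distance-concentration argument behind the curse of dimensionality can be checked numerically. The minimal sketch below assumes points drawn uniformly from the unit hypercube (an illustrative choice, not data from the chapter) and measures the relative contrast (max - min) / min of the distances from a random query point to a point cloud; `relative_contrast` is a hypothetical helper name.

```python
import numpy as np

def relative_contrast(dim, n_points=500, seed=0):
    """(max - min) / min of Euclidean distances from one random query point
    to n_points points drawn uniformly from the unit hypercube [0, 1]^dim.
    Uniform sampling is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

# The contrast shrinks as the dimension grows: all points end up at
# roughly the same distance from the query.
for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

When the contrast approaches zero, "nearest" and "farthest" neighbours are nearly equidistant, which is one way to see why distance-based classifiers need far more training data in high dimensions.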
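The gas-mixture example can be illustrated with simulated data. Assuming Beer-Lambert-style linear mixing and randomly generated pure-component spectra (both hypothetical; no measured spectra from the chapter are used), the singular values of the centered mixture matrix show that 500 measured absorption coefficients per sample span only as many independent directions as there are pure components.

```python
import numpy as np

rng = np.random.default_rng(1)
n_wavelengths = 500   # measured absorption coefficients per spectrum
n_components = 3      # pure gases in the mixture (hidden variables)
n_samples = 40        # number of measured mixtures

# Hypothetical pure-component spectra: random positive profiles.
pure_spectra = rng.random((n_components, n_wavelengths))

# Linear (Beer-Lambert-like) mixing: each mixture spectrum is a
# concentration-weighted sum of pure spectra, plus small measurement noise.
concentrations = rng.random((n_samples, n_components))
mixtures = concentrations @ pure_spectra
mixtures += 1e-3 * rng.standard_normal(mixtures.shape)

# Fraction of total variance carried by each singular direction.
centered = mixtures - mixtures.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
explained = singular_values**2 / np.sum(singular_values**2)
print(np.round(explained[:6], 4))

# Directions above a small (assumed) threshold ~ number of hidden variables.
estimated_rank = int(np.sum(explained > 1e-3))
print("estimated number of hidden components:", estimated_rank)
```

The threshold `1e-3` is an arbitrary choice for this noise level; with real spectra it would have to be tuned or replaced by a more principled criterion.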
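The distinction between feature selection and feature extraction can be made concrete with NumPy. In this sketch, ranking features by variance (for selection) and projecting onto principal axes via SVD (for extraction) are illustrative choices assumed here, not necessarily the methods the chapter recommends; the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 20))   # 100 samples, 20 synthetic features
X[:, :3] *= 5.0                       # make the first 3 features high-variance
k = 3                                 # target dimension

# Feature selection: keep a subset of the ORIGINAL columns
# (here, the k columns with the largest variance).
selected_idx = np.argsort(X.var(axis=0))[::-1][:k]
X_selected = X[:, selected_idx]

# Feature extraction: build k NEW features as linear combinations of
# all columns (here, projections onto the top-k principal axes).
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_extracted = X_centered @ Vt[:k].T

print(sorted(selected_idx.tolist()))          # indices of the kept columns
print(X_selected.shape, X_extracted.shape)    # both reduce 20 -> 3 features
```

Both routes end with a 3-column matrix; the selected columns remain directly interpretable as measured quantities, while the extracted ones are mixtures of all 20 inputs.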
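As one example of a compact texture description, a gray-level co-occurrence matrix (GLCM) counts how often pairs of gray levels occur at a fixed pixel offset, and a few scalar statistics of it summarize the texture. The sketch below assumes images already quantized to `levels` integer gray levels and uses a single offset; the helper names `glcm` and `texture_features` are hypothetical, and the chapter's own texture methods may differ.

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for the offset (dy, dx).
    Assumes `image` holds integers in [0, levels) and dx, dy >= 0."""
    h, w = image.shape
    a = image[:h - dy, :w - dx]        # reference pixels
    b = image[dy:, dx:]                # neighbours at the given offset
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)   # count each gray-level pair
    return m / m.sum()

def texture_features(image, levels):
    """Three classic GLCM statistics: contrast, homogeneity, energy."""
    p = glcm(image, levels)
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    homogeneity = np.sum(p / (1.0 + np.abs(i - j)))
    energy = np.sum(p ** 2)
    return np.array([contrast, homogeneity, energy])

levels = 4
smooth = np.zeros((32, 32), dtype=int)      # uniform patch
stripes = np.tile([0, 3], (32, 16))          # alternating columns 0, 3, 0, 3, ...

print(texture_features(smooth, levels))     # contrast 0, homogeneity 1, energy 1
print(texture_features(stripes, levels))    # contrast 9, homogeneity 0.25
```

Each 32 x 32 patch (1024 pixel values) is reduced to three numbers that still separate the two textures, which is the kind of dimension reduction the texture approach aims for.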
