Informative Feature Extraction
Y. Kistenev, A. Borisov, D. Vrazhnov
In: Medical Applications of Laser Molecular Imaging and Machine Learning
Published: 2021-07-26
DOI: 10.1117/3.2599935.CH3 (https://doi.org/10.1117/3.2599935.CH3)
Citation count: 0
Abstract
Laser molecular imaging produces high-dimensional data whose structure depends on the optical modality, laser type, detection method, kind of sample, etc. In general, high dimensionality means that the number of measured parameters exceeds the number of hidden independent variables by orders of magnitude, e.g., when the number of measured absorption coefficients of a complex gas mixture exceeds the number of pure components in the mixture by an order of magnitude or more. High-dimensional data are hard to use for building predictive models because of the “curse of dimensionality” problem formulated by R. Bellman. Essentially, as the dimension of the feature vector increases, the volume of data needed for classifier training grows exponentially. One reason is that distances between random vectors concentrate as the dimension increases: for vectors with independent components, the central limit theorem implies that the relative difference between the largest and smallest distances tends to zero, so the vectors become hard to tell apart by distance. One of the main goals of feature extraction is to overcome this problem. The universal approach is to reduce the data dimension; concrete methods depend on the origin of the data. In particular, 2D and 3D images can be decomposed into small geometrical parts with similar properties, called textures. The texture approach yields a compact description of the initial image. Molecular spectra can be considered a degenerate case of molecular imaging data: for a homogeneous medium, studying a single “point” is enough to describe the whole sample. Feature vector dimension reduction includes feature selection and feature extraction. They differ only in how the result is obtained: selection keeps a subset of the original features, whereas extraction constructs new features from them. This chapter describes these methods in detail sufficient for practical applications. Python code for the most useful analytical methods described in the chapter is presented in the Supplemental Materials.
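
Two ideas from the abstract can be checked numerically: distances between random vectors lose contrast as the dimension grows, and many measured parameters may be driven by only a few hidden variables. The NumPy sketch below is an illustration only and is not taken from the chapter's Supplemental Materials. The function names `relative_contrast` and `effective_rank`, the synthetic data, the sample sizes, and the linear (Beer-Lambert style) mixing model for the gas mixture are all assumptions made for this example.

```python
import numpy as np


def relative_contrast(dim, n_points=200, seed=0):
    """Return (max - min) / min over the Euclidean distances from one
    random query point to n_points random points, all drawn uniformly
    from the unit hypercube of the given dimension."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()


def effective_rank(data, tol=1e-8):
    """Count singular values larger than tol times the largest one."""
    singular_values = np.linalg.svd(data, compute_uv=False)
    return int(np.sum(singular_values > tol * singular_values[0]))


# Distance concentration: the contrast between the farthest and the
# nearest neighbour shrinks as the dimension grows.
contrasts = {d: relative_contrast(d) for d in (2, 20, 200, 2000)}
for d, c in contrasts.items():
    print(f"dim={d:5d}  relative contrast={c:.3f}")

# Hypothetical gas mixture: 3 pure-component spectra sampled at 500
# wavelengths; each of 40 samples is a linear mix of them (assumed model).
rng = np.random.default_rng(1)
pure_spectra = rng.random((3, 500))      # hidden independent variables
concentrations = rng.random((40, 3))     # per-sample mixing weights
mixtures = concentrations @ pure_spectra  # 40 samples x 500 measured values

rank = effective_rank(mixtures)
print("measured parameters:", mixtures.shape[1], " effective rank:", rank)
```

In this synthetic setting each sample has 500 measured values, but the data matrix has only as many significant singular values as there are pure components. This is the kind of redundancy that dimension reduction exploits. Real spectra contain noise, so the tolerance `tol` would need to be chosen with the noise level in mind.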