Exploratory analysis of hyperspectral imaging data

Impact factor 3.7; Zone 2 (Chemistry); JCR Q2, AUTOMATION & CONTROL SYSTEMS

Chemometrics and Intelligent Laboratory Systems Pub Date : 2024-07-09 DOI:10.1016/j.chemolab.2024.105174

Citations: 0

Abstract

Characterizing sample composition and visualizing the distribution of its chemical compounds is a prominent topic in various research and applied fields. By integrating spatial and spectral information, hyperspectral imaging (HSI) plays a pivotal role in this pursuit. Self-modelling curve resolution techniques, such as multivariate curve resolution - alternating least squares (MCR-ALS), and clustering methods, such as K-means, are widely used for HSI data analysis. However, their effectiveness in complex scenarios, where the structure of the data deviates from the models' assumptions, deserves further investigation. The choice of a data analysis method is most often driven by the research question at hand and by prior knowledge of the sample. Overlooking the structure of the investigated data, i.e., its linearity, geometry, and homogeneity, might nevertheless lead to erroneous or biased results. Here, we propose an exploratory data analysis approach, based on the geometry of the data point cloud, to investigate the structure of HSI datasets and extract their main characteristics, providing insight into the results obtained by the above-mentioned methods. We employ the principle of essential information to extract archetype (most linearly dissimilar) spectra and archetype single-wavelength images. These spectra and images are then discussed and contrasted with MCR-ALS and K-means clustering results. Two datasets with varying characteristics and complexities were investigated: a powder mixture analyzed with Raman spectroscopy and a mineral sample analyzed with Laser Induced Breakdown Spectroscopy (LIBS). We show that the proposed approach makes it possible to summarize the main characteristics of hyperspectral imaging data and provides a more accurate understanding of the results obtained by traditional data modelling methods, thereby guiding the choice of the most suitable one.
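To make the idea of "most linearly dissimilar" spectra concrete, the following is a minimal sketch, not the authors' implementation. It assumes one common geometric reading of the essential-information principle: project the unfolded pixel spectra onto the first principal components, normalize the scores by the first component, and keep the pixels that sit on the convex hull of the resulting point cloud. The synthetic three-component, noise-free mixture, the helper names (`convex_hull_2d`, `essential_pixels`), and the fixed choice of three components are all assumptions made for illustration.

```python
import numpy as np


def convex_hull_2d(points):
    """Indices of the convex-hull vertices of 2-D points (monotone chain)."""
    order = np.lexsort((points[:, 1], points[:, 0]))  # sort by x, then by y

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a counter-clockwise turn
        return ((points[a, 0] - points[o, 0]) * (points[b, 1] - points[o, 1])
                - (points[a, 1] - points[o, 1]) * (points[b, 0] - points[o, 0]))

    def half_hull(indices):
        chain = []
        for i in indices:
            # drop points that would create a clockwise or collinear turn
            while len(chain) >= 2 and cross(chain[-2], chain[-1], i) <= 0:
                chain.pop()
            chain.append(i)
        return chain[:-1]  # the last point starts the other half

    return half_hull(order) + half_hull(order[::-1])


def essential_pixels(D):
    """Hull pixels of the normalized 3-component score cloud (illustrative)."""
    U, s, _ = np.linalg.svd(D, full_matrices=False)
    scores = U[:, :3] * s[:3]
    if scores[:, 0].sum() < 0:          # make the first-component scores positive
        scores[:, 0] *= -1
    normalized = scores[:, 1:] / scores[:, [0]]  # project onto a 2-D simplex
    return sorted(int(i) for i in convex_hull_2d(normalized))


# Synthetic, noise-free data: 3 hypothetical pure spectra, 400 unfolded pixels.
rng = np.random.default_rng(0)
channels = np.arange(100)
S = np.array([np.exp(-0.5 * ((channels - c) / 8.0) ** 2) for c in (30, 50, 70)])
C = 0.05 + 0.85 * rng.dirichlet(np.ones(3), size=400)  # mixed pixels, rows sum to 1
C[:3] = np.eye(3)                                      # pixels 0-2 are pure
D = C @ S                                              # bilinear model D = C S

essential = essential_pixels(D)
print(essential)
```

In this idealized bilinear case every mixed pixel is a convex combination of the pure ones, so only the three pure pixels end up on the hull. With noisy or non-linear data the hull typically contains more points, and its shape is the kind of geometric cue the abstract refers to when it contrasts the data structure with the assumptions of MCR-ALS and K-means.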

Source journal

Chemometrics and Intelligent Laboratory Systems (Engineering & Technology: Analytical Chemistry)

CiteScore: 7.50
Self-citation rate: 7.70%
Articles published: 169
Review time: 3.4 months

Journal description: Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials, and Original Software Publications reporting on the development of novel statistical, mathematical, or computer techniques in chemistry and related disciplines. Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data. The journal deals with the following topics:
1) Development of new statistical, mathematical, and chemometrical methods for chemistry and related fields (environmental chemistry, biochemistry, toxicology, systems biology, -omics, etc.).
2) Novel applications of chemometrics to all branches of chemistry and related fields (typical domains of interest are process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration, etc.). Routine applications of established chemometrical techniques will not be considered.
3) Development of new software that provides novel tools or truly advances the use of chemometrical methods.
4) Well-characterized data sets to test the performance of new methods and software.
The journal complies with the International Committee of Medical Journal Editors' Uniform requirements for manuscripts.