PAM clustering algorithm based on mutual information matrix for ATR-FTIR spectral feature selection and disease diagnosis.

IF 3.4 · Zone 3 (Medicine) · JCR Q1, HEALTH CARE SCIENCES & SERVICES
Francesca Condino, Maria Caterina Crocco, Rita Guzzi
BMC Medical Research Methodology, 25(1):225. Published 2025-10-01. DOI: 10.1186/s12874-025-02667-2 (https://doi.org/10.1186/s12874-025-02667-2). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12487623/pdf/
Citation count: 0

Abstract

ATR-FTIR spectral data are a valuable source of information for a wide range of pathologies, including neurological disorders, and can be used for disease discrimination. This requires identifying potential spectral biomarkers among all possible candidates, but the sheer volume of spectral data and the redundancy among variables can make selecting the most informative features cumbersome. Here, a novel approach is proposed that performs feature selection by exploiting the redundant information among spectral data. In particular, we apply the Partitioning Around Medoids (PAM) algorithm to a dissimilarity matrix derived from the mutual information measure, in order to obtain groups of variables (wavenumbers) with similar patterns of pairwise dependence. An advantage of this algorithm over other, more widely used clustering methods is that it facilitates interpretation of the results, since the centre of each cluster, the so-called medoid, corresponds to an observed data point. Consequently, each medoid can be taken as representative of all the wavenumbers in its cluster and retained in the subsequent statistical methods for disease prediction. Finally, an application to real data shows the ability of the proposed approach to discriminate between patients affected by multiple sclerosis and healthy subjects.
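
The grouping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the equal-width binning, the dissimilarity d(i, j) = 1 - I(Xi; Xj) / H(Xi, Xj), the simplified first-improvement SWAP phase, and the synthetic "wavenumber" columns are all assumptions made for illustration, and the function names (`discretize`, `mi_dissimilarity`, `pam`) are hypothetical.

```python
import math
import random
from collections import Counter

def discretize(values, bins=4):
    """Equal-width binning of one variable (an assumed preprocessing choice)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant variable
    return [min(int((v - lo) / width), bins - 1) for v in values]

def entropy(labels):
    """Plug-in Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def mi_dissimilarity(columns, bins=4):
    """Assumed dissimilarity: d(i, j) = 1 - I(Xi; Xj) / H(Xi, Xj)."""
    binned = [discretize(col, bins) for col in columns]
    h = [entropy(b) for b in binned]
    p = len(columns)
    d = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            h_joint = entropy(list(zip(binned[i], binned[j])))
            mi = h[i] + h[j] - h_joint  # I(X; Y) = H(X) + H(Y) - H(X, Y)
            d[i][j] = d[j][i] = 1.0 - mi / h_joint if h_joint > 0 else 0.0
    return d

def total_cost(d, medoids):
    """Sum over all points of the dissimilarity to the nearest medoid."""
    return sum(min(d[i][m] for m in medoids) for i in range(len(d)))

def pam(d, k):
    """Greedy BUILD, then SWAP until no single swap lowers the total cost."""
    n = len(d)
    medoids = []
    for _ in range(k):  # BUILD: add the point that yields the lowest cost
        best = min((i for i in range(n) if i not in medoids),
                   key=lambda i: total_cost(d, medoids + [i]))
        medoids.append(best)
    improved = True
    while improved:  # SWAP: accept the first swap that improves the cost
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                if total_cost(d, trial) < total_cost(d, medoids) - 1e-12:
                    medoids, improved = trial, True
    labels = [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]
    return medoids, labels

# Synthetic stand-in for spectra: six "wavenumbers", three noisy copies of
# each of two independent latent signals (purely illustrative data).
rng = random.Random(0)
n_samples = 200
latent_a = [rng.random() for _ in range(n_samples)]
latent_b = [rng.random() for _ in range(n_samples)]
columns = [[v + rng.gauss(0, 0.02) for v in latent]
           for latent in (latent_a,) * 3 + (latent_b,) * 3]

d = mi_dissimilarity(columns)
medoids, labels = pam(d, k=2)
print("medoid wavenumber indices:", medoids)
print("cluster label per wavenumber:", labels)
```

In this sketch the medoid indices play the role of the retained features: only those columns would be passed on to a downstream classifier. The number of clusters `k` and the number of bins are free parameters here, and how to choose them is not specified in the abstract.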

Source journal: BMC Medical Research Methodology (Medicine: Health Care)
CiteScore: 6.50 · Self-citation rate: 2.50% · Articles published: 298 · Review time: 3-8 weeks
Journal description: BMC Medical Research Methodology is an open access journal publishing original peer-reviewed research articles in methodological approaches to healthcare research. Articles on the methodology of epidemiological research, clinical trials and meta-analysis/systematic review are particularly encouraged, as are empirical studies of the associations between choice of methodology and study outcomes. BMC Medical Research Methodology does not aim to publish articles describing scientific methods or techniques: these should be directed to the BMC journal covering the relevant biomedical subject area.