Francesca Condino, Maria Caterina Crocco, Rita Guzzi
BMC Medical Research Methodology 25(1):225. Published 2025-10-01. DOI: 10.1186/s12874-025-02667-2 (https://doi.org/10.1186/s12874-025-02667-2). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12487623/pdf/
PAM clustering algorithm based on mutual information matrix for ATR-FTIR spectral feature selection and disease diagnosis.
ATR-FTIR spectral data are a valuable source of information for a wide range of pathologies, including neurological disorders, and can be used for disease discrimination. This requires identifying potential spectral biomarkers among all possible candidates, but the volume of information in a spectral dataset and the redundancy among its variables can make selecting the most informative features cumbersome. Here, a novel approach is proposed to perform feature selection based on redundant information among spectral data. In particular, we consider the Partitioning Around Medoids (PAM) algorithm applied to a dissimilarity matrix obtained from the mutual information measure, in order to obtain groups of variables (wavenumbers) having similar patterns of pairwise dependence. An advantage of this algorithm over other, more widely used clustering methods is that it facilitates interpretation of the results, since the centre of each cluster, the so-called medoid, corresponds to an observed data point. Consequently, each medoid can be taken as representative of all the wavenumbers in its cluster and retained in the subsequent statistical methods for disease prediction. Finally, an application to real data shows the ability of the proposed approach to discriminate between patients affected by multiple sclerosis and healthy subjects.
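The grouping step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the mutual information estimator, how mutual information is turned into a dissimilarity, or the number of clusters. The equal-width binning, the dissimilarity `1 - I(X;Y) / max(H(X), H(Y))`, the first-improvement swap rule, all function names, and the synthetic "spectra" below are hypothetical choices made only for the example.

```python
import math
import random
from collections import Counter

def discretize(values, bins=4):
    # Equal-width binning (an assumption; other MI estimators are possible).
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    return [min(int((v - lo) / width), bins - 1) for v in values]

def entropy(x):
    # Plug-in Shannon entropy (nats) of a discrete sequence.
    n = len(x)
    return -sum((c / n) * math.log(c / n) for c in Counter(x).values())

def mutual_information(x, y):
    # Plug-in mutual information (nats) between two discrete sequences.
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def mi_dissimilarity(columns, bins=4):
    # Assumed dissimilarity: d(i, j) = 1 - I(i;j) / max(H(i), H(j)).
    disc = [discretize(col, bins) for col in columns]
    ent = [entropy(d) for d in disc]
    p = len(disc)
    D = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            denom = max(ent[i], ent[j])
            nmi = mutual_information(disc[i], disc[j]) / denom if denom > 0 else 0.0
            D[i][j] = D[j][i] = 1.0 - nmi
    return D

def total_cost(D, medoids):
    # Sum over all variables of the dissimilarity to the nearest medoid.
    return sum(min(D[i][m] for m in medoids) for i in range(len(D)))

def pam(D, k):
    # Greedy BUILD, then SWAP (first-improvement variant) until no swap helps.
    n = len(D)
    medoids = []
    for _ in range(k):
        best = min((c for c in range(n) if c not in medoids),
                   key=lambda c: total_cost(D, medoids + [c]))
        medoids.append(best)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:pos] + [c] + medoids[pos + 1:]
                if total_cost(D, trial) < total_cost(D, medoids) - 1e-12:
                    medoids, improved = trial, True
    labels = [min(range(k), key=lambda m: D[i][medoids[m]]) for i in range(n)]
    return medoids, labels

# Synthetic stand-in for spectra: 3 independent latent signals, each observed
# at 2 noisy "wavenumbers" (columns), giving 6 redundant variables in total.
rng = random.Random(0)
n_samples = 300
columns = []
for _group in range(3):
    latent = [rng.random() for _ in range(n_samples)]
    for _copy in range(2):
        columns.append([v + rng.gauss(0, 0.05) for v in latent])

D = mi_dissimilarity(columns)
medoids, labels = pam(D, k=3)
print("medoid columns (retained features):", medoids)
print("cluster label of each column:", labels)
```

In this sketch the medoid indices play the role of the retained wavenumbers: only those columns would be passed on to a downstream classifier. Because each medoid is an actual column of the data, the selected features remain directly interpretable as measured wavenumbers, which is the property the abstract highlights.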
About the journal:
BMC Medical Research Methodology is an open access journal publishing original peer-reviewed research articles in methodological approaches to healthcare research. Articles on the methodology of epidemiological research, clinical trials and meta-analysis/systematic review are particularly encouraged, as are empirical studies of the associations between choice of methodology and study outcomes. BMC Medical Research Methodology does not aim to publish articles describing scientific methods or techniques: these should be directed to the BMC journal covering the relevant biomedical subject area.