Discernibility in explanations: Designing more acceptable and meaningful machine learning models for medicine.

Haomiao Wang, Julien Aligon, Julien May, Emmanuel Doumard, Nicolas Labroche, Cyrille Delpierre, Chantal Soulé-Dupuy, Louis Casteilla, Valérie Planat-Benard, Paul Monsarrat

Computational and Structural Biotechnology Journal, volume 27, pages 1800-1808. Published 2025-04-23 (eCollection 2025). DOI: 10.1016/j.csbj.2025.04.021. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12127544/pdf/
Although the benefits of machine learning are undeniable in healthcare, explainability plays a vital role in improving transparency and in understanding the variables that are most decisive and persuasive for a prediction. The challenge is to identify explanations that make sense to the biomedical expert. This work proposes discernibility as a new approach to faithfully reflect human cognition, based on the user's perception of a relationship between explanations and data for a given variable. A total of 50 participants (19 biomedical experts and 31 data scientists) evaluated their perception of the discernibility of explanations from both synthetic and human-based datasets (National Health and Nutrition Examination Survey). The low inter-rater reliability of discernibility (Intraclass Correlation Coefficient < 0.5), with no significant difference between areas of expertise or levels of education, highlights the need for an objective metric of discernibility. Thirteen statistical coefficients were evaluated, using Passing-Bablok regression, for their ability to capture the relationship between a given variable's values and its explanations. Among these, the distance correlation (dcor) proved to be a reliable metric for assessing the discernibility of explanations, effectively capturing the clarity of the relationship between the data and their explanations and providing clues to underlying pathophysiological mechanisms that are not immediately apparent when examining individual predictors. Discernibility can also serve as an evaluation metric for model quality, helping to prevent overfitting and aiding feature selection, ultimately providing medical practitioners with more accurate and persuasive results.
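To make the proposed metric concrete, below is a minimal sketch of how discernibility could be scored in practice: for each variable, compute the distance correlation between its values and its per-instance explanations. This is only an illustration under stated assumptions, not the authors' pipeline: it uses SHAP values as the explanations, the `dcor` Python package for the distance correlation, and a random forest on synthetic health-style features as a stand-in dataset and model.

```python
import numpy as np
import pandas as pd
import shap
from dcor import distance_correlation
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical tabular data standing in for a health-survey extract.
X = pd.DataFrame({
    "age": rng.uniform(20, 80, 500),
    "bmi": rng.normal(27, 4, 500),
    "glucose": rng.normal(100, 15, 500),
})
y = 0.05 * X["age"] + 0.1 * (X["bmi"] - 25) ** 2 + rng.normal(0, 1, 500)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Per-instance, per-feature explanations (here, SHAP values).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)

# Discernibility proxy: distance correlation between a feature's values
# and that feature's explanations. Scores near 1 mean the relationship
# between data and explanations is easy to perceive; near 0, opaque.
for j, col in enumerate(X.columns):
    d = distance_correlation(X[col].to_numpy(), shap_values[:, j])
    print(f"{col}: dcor = {d:.3f}")
```

In this toy setup, a feature whose explanations vary systematically with its own values scores high, while a feature whose explanations look like noise scores near zero, which is the pattern the abstract associates with overfitting and poor feature choices.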
Journal introduction:
Computational and Structural Biotechnology Journal (CSBJ) is an online gold open access journal publishing research articles and reviews after full peer review. All articles are published, without barriers to access, immediately upon acceptance. The journal places a strong emphasis on functional and mechanistic understanding of how molecular components in a biological process work together through the application of computational methods. Structural data may provide such insights, but they are not a prerequisite for publication in the journal. Specific areas of interest include, but are not limited to:
Structure and function of proteins, nucleic acids and other macromolecules
Structure and function of multi-component complexes
Protein folding, processing and degradation
Enzymology
Computational and structural studies of plant systems
Microbial Informatics
Genomics
Proteomics
Metabolomics
Algorithms and Hypotheses in Bioinformatics
Mathematical and Theoretical Biology
Computational Chemistry and Drug Discovery
Microscopy and Molecular Imaging
Nanotechnology
Systems and Synthetic Biology