Discernibility in explanations: Designing more acceptable and meaningful machine learning models for medicine.

Haomiao Wang, Julien Aligon, Julien May, Emmanuel Doumard, Nicolas Labroche, Cyrille Delpierre, Chantal Soulé-Dupuy, Louis Casteilla, Valérie Planat-Benard, Paul Monsarrat

Computational and Structural Biotechnology Journal, volume 27, pages 1800-1808. Published 2025-04-23 (eCollection 2025). DOI: 10.1016/j.csbj.2025.04.021. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12127544/pdf/
Although the benefits of machine learning are undeniable in healthcare, explainability plays a vital role in improving transparency and in understanding the variables that are most decisive and persuasive for a prediction. The challenge is to identify explanations that make sense to the biomedical expert. This work proposes discernibility as a new approach to faithfully reflect human cognition, based on the user's perception of a relationship between explanations and data for a given variable. A total of 50 participants (19 biomedical experts and 31 data scientists) evaluated their perception of the discernibility of explanations from both synthetic and human-based datasets (National Health and Nutrition Examination Survey). The low inter-rater reliability of discernibility (Intraclass Correlation Coefficient < 0.5), with no significant difference between areas of expertise or levels of education, highlights the need for an objective metric of discernibility. Thirteen statistical coefficients were evaluated, using Passing-Bablok regression, for their ability to capture the relationship between a given variable's values and its explanations. Among these, the distance correlation (dcor) proved to be a reliable metric for assessing the discernibility of explanations, effectively capturing the clarity of the relationship between the data and their explanations and providing clues to underlying pathophysiological mechanisms that are not immediately apparent when examining individual predictors. Discernibility can also serve as an evaluation metric for model quality, helping to prevent overfitting and aiding feature selection, ultimately providing medical practitioners with more accurate and persuasive results.
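To make the proposed metric concrete, below is a minimal sketch of how discernibility could be scored in practice: for each variable, compute the distance correlation between its values and its per-instance explanations. This is only an illustration under stated assumptions, not the authors' pipeline: it uses SHAP values as the explanations, the `dcor` Python package for the distance correlation, and a random forest on synthetic health-style features as a stand-in dataset and model.

```python
import numpy as np
import pandas as pd
import shap
from dcor import distance_correlation
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical tabular data standing in for a health-survey extract.
X = pd.DataFrame({
    "age": rng.uniform(20, 80, 500),
    "bmi": rng.normal(27, 4, 500),
    "glucose": rng.normal(100, 15, 500),
})
y = 0.05 * X["age"] + 0.1 * (X["bmi"] - 25) ** 2 + rng.normal(0, 1, 500)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Per-instance, per-feature explanations (here, SHAP values).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)

# Discernibility proxy: distance correlation between a feature's values
# and that feature's explanations. Scores near 1 mean the relationship
# between data and explanations is easy to perceive; near 0, opaque.
for j, col in enumerate(X.columns):
    d = distance_correlation(X[col].to_numpy(), shap_values[:, j])
    print(f"{col}: dcor = {d:.3f}")
```

In this toy setup, a feature whose explanations vary systematically with its own values scores high, while a feature whose explanations look like noise scores near zero, which is the pattern the abstract associates with overfitting and poor feature choices.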
Journal introduction:
Computational and Structural Biotechnology Journal (CSBJ) is an online gold open access journal publishing research articles and reviews after full peer review. All articles are published, without barriers to access, immediately upon acceptance. The journal places a strong emphasis on functional and mechanistic understanding of how molecular components in a biological process work together through the application of computational methods. Structural data may provide such insights, but they are not a prerequisite for publication in the journal. Specific areas of interest include, but are not limited to:
Structure and function of proteins, nucleic acids and other macromolecules
Structure and function of multi-component complexes
Protein folding, processing and degradation
Enzymology
Computational and structural studies of plant systems
Microbial Informatics
Genomics
Proteomics
Metabolomics
Algorithms and Hypotheses in Bioinformatics
Mathematical and Theoretical Biology
Computational Chemistry and Drug Discovery
Microscopy and Molecular Imaging
Nanotechnology
Systems and Synthetic Biology