Calibrated kNN classification via second-layer neighborhood analysis

IF 1.3 · Zone 4 (Computer Science) · Q2 Statistics & Probability
Bastian Pfeifer, Markus Kreuzthaler
Journal: Advances in Data Analysis and Classification, vol. 20, pp. 145-165
Published: 2025-09-26
DOI: 10.1007/s11634-025-00654-5
Article: https://link.springer.com/article/10.1007/s11634-025-00654-5
PDF: https://link.springer.com/content/pdf/10.1007/s11634-025-00654-5.pdf

Abstract

The integration of artificial intelligence (AI) into medical practice has underscored the importance of reliable predictive confidence, as it influences decision-making and the efficacy of AI-driven solutions. This paper examines a non-parametric machine learning method, with a particular focus on computing confidence scores for individual classifications. Unlike parametric approaches, non-parametric techniques such as k-nearest neighbors (kNN) offer flexibility without imposing strict assumptions on the data distribution. Leveraging this flexibility, we propose a novel kNN approach that introduces confidence-awareness through a two-layered neighborhood analysis. The approach is intended to support the classical non-parametric kNN classifier by providing more reliable and trustworthy class probabilities. Experimental evaluations on benchmark datasets and on a de-identified, real-world clinical Electronic Health Records (EHR) data table comprising thousands of unique class labels demonstrate the effectiveness of our approach in enhancing both prediction accuracy and certainty assessment.
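
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea and should not be read as the authors' method. It contrasts classical kNN class probabilities (plain vote fractions) with one hypothetical form of second-layer analysis, in which each first-layer neighbor's vote is weighted by how strongly its own neighbors agree with its label. The function names, the weighting rule, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def nearest(points, query, k, exclude=None):
    """Indices of the k training points closest to `query` (Euclidean distance)."""
    order = sorted(
        (i for i in range(len(points)) if i != exclude),
        key=lambda i: math.dist(points[i], query),
    )
    return order[:k]


def knn_proba(points, labels, query, k):
    """Classical kNN: class probability is the fraction of neighbor votes."""
    votes = Counter(labels[i] for i in nearest(points, query, k))
    return {c: n / k for c, n in votes.items()}


def two_layer_proba(points, labels, query, k):
    """Hypothetical second-layer weighting (an assumption, not the paper's rule).

    Each first-layer neighbor i votes with a weight equal to the fraction of
    its own k nearest training points (the second layer) that share its label.
    Assumes the training set holds at least two points.
    """
    weights = Counter()
    for i in nearest(points, query, k):
        second = nearest(points, points[i], k, exclude=i)
        agree = sum(labels[j] == labels[i] for j in second) / len(second)
        weights[labels[i]] += agree
    total = sum(weights.values())
    if total == 0:
        # No neighbor is supported by its own neighborhood: fall back to votes.
        return knn_proba(points, labels, query, k)
    return {c: w / total for c, w in weights.items()}


# Toy data: a tight "a" cluster with one mislabeled-looking "b" point beside it.
points = [(0.2, 0.2), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (2, 2)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
query = (1.6, 1.6)

plain = knn_proba(points, labels, query, k=3)
layered = two_layer_proba(points, labels, query, k=3)
```

On this toy data the plain vote assigns one third of the probability to "b" because the isolated "b" point is the closest neighbor. The second-layer weighting discounts that point entirely, since none of its own neighbors share its label. Whether such a discount improves calibration on real data is an empirical question that this sketch does not address.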

Source journal
CiteScore: 3.40
Self-citation rate: 6.20%
Articles published: 45
Review time: >12 weeks
About the journal: The international journal Advances in Data Analysis and Classification (ADAC) is designed as a forum for high-standard publications on research and applications concerning the extraction of knowable aspects from many types of data. It publishes articles on such topics as structural, quantitative, or statistical approaches for the analysis of data; advances in classification, clustering, and pattern recognition methods; strategies for modeling complex data and mining large data sets; methods for the extraction of knowledge from data; and applications of advanced methods in specific domains of practice. Articles illustrate how new domain-specific knowledge can be made available from data by skillful use of data analysis methods. The journal also publishes survey papers that outline and illuminate the basic ideas and techniques of special approaches.