[Validation of artificial intelligence algorithms for the surgical practice]
Author: Annika Reinke
Journal: Chirurgie (Heidelberg, Germany)
Publication date: 2025-07-11
Publication type: Journal Article
DOI: 10.1007/s00104-025-02348-2 (https://doi.org/10.1007/s00104-025-02348-2)
Citation count: 0
Abstract
Background: Artificial intelligence (AI) is increasingly being used in surgery; however, the validation of such systems is often methodologically insufficient.
Objective: Which validation issues arise in surgical AI and what requirements can be derived for clinically meaningful validation strategies?
Methods: Metric-related pitfalls reported in the literature were analyzed and combined with insights from the interdisciplinary consensus process "Metrics Reloaded" and its ongoing extension to surgical applications.
Results: Recurring weaknesses were observed at the levels of data, metrics, and reporting. The failure to account for temporal structure and aggregation in video data is particularly critical.
Discussion: Structured, clinically grounded validation is essential for the safe use of surgical AI. The Metrics Reloaded process is currently being adapted to address surgery-specific requirements.
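The aggregation issue named in the Results can be illustrated with a minimal sketch. It is not taken from the article: the toy labels, the video names, and the helper functions `accuracy`, `pooled_accuracy`, and `per_video_accuracy` are all hypothetical, and plain accuracy is used only because it is easy to follow. The sketch shows that the same frame-wise predictions yield different scores depending on whether frames are pooled across all videos or scored per video and then averaged.

```python
from statistics import mean


def accuracy(pred, truth):
    """Fraction of frames whose predicted label matches the reference label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


# Hypothetical toy data: per-frame binary labels for two videos of unequal length.
videos = {
    "long_video": {"truth": [1] * 90 + [0] * 10, "pred": [1] * 100},
    "short_video": {"truth": [0] * 10, "pred": [1] * 10},
}


def pooled_accuracy(videos):
    """Frame-level aggregation: concatenate all frames, then score once."""
    pred = [p for v in videos.values() for p in v["pred"]]
    truth = [t for v in videos.values() for t in v["truth"]]
    return accuracy(pred, truth)


def per_video_accuracy(videos):
    """Video-level aggregation: score each video, then average the scores."""
    return mean(accuracy(v["pred"], v["truth"]) for v in videos.values())


print(f"pooled over frames: {pooled_accuracy(videos):.3f}")
print(f"mean over videos:   {per_video_accuracy(videos):.3f}")
```

In this toy case the pooled score is dominated by the long video, while the per-video mean exposes the complete failure on the short one. Which aggregation is appropriate depends on the clinical question, so a report should state which one was used.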