Author: Annika Reinke
Journal: Chirurgie (Heidelberg, Germany)
Published: 2025-07-11
DOI: 10.1007/s00104-025-02348-2 (https://doi.org/10.1007/s00104-025-02348-2)
[Validation of artificial intelligence algorithms for the surgical practice].
Background: Artificial intelligence (AI) is increasingly being used in surgery; however, the validation of such systems is often methodologically insufficient.
Objective: Which validation issues arise in surgical AI and what requirements can be derived for clinically meaningful validation strategies?
Methods: Metric-related pitfalls reported in the literature were analyzed and combined with insights from the interdisciplinary consensus process "Metrics Reloaded" and its ongoing extension to surgical applications.
Results: Recurring weaknesses are observed at the levels of data, metrics and reporting. The lack of consideration of temporal structures and aggregation in video data is particularly critical.
Discussion: A structured, clinically grounded validation is essential for the safe use of surgical AI. The Metrics Reloaded procedure is currently being adapted to address surgery-specific requirements.
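The aggregation issue named in the results can be illustrated with a minimal sketch. It is not taken from the article: the function names and the toy data are hypothetical, and frame-level accuracy stands in for whatever metric a real study would select. The sketch compares pooling all frames across videos with scoring each video first and then averaging.

```python
from statistics import mean


def frame_accuracy(y_true, y_pred):
    """Fraction of frames whose predicted label matches the reference label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    if not y_true:
        raise ValueError("a video must contain at least one frame")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def pooled_accuracy(videos):
    """Pool the frames of all videos, ignoring which video each frame came from."""
    correct = sum(
        t == p for y_true, y_pred in videos for t, p in zip(y_true, y_pred)
    )
    total = sum(len(y_true) for y_true, _ in videos)
    return correct / total


def per_video_accuracy(videos):
    """Score each video separately, then average, so every video counts equally."""
    return mean(frame_accuracy(y_true, y_pred) for y_true, y_pred in videos)


# Hypothetical toy data (assumption, not from the article):
# one long video predicted well and one short video predicted poorly.
long_video = ([0] * 90, [0] * 90)            # 90 frames, all correct
short_video = ([1] * 10, [0] * 8 + [1] * 2)  # 10 frames, only 2 correct
videos = [long_video, short_video]

print(pooled_accuracy(videos))     # 92 of 100 frames are correct
print(per_video_accuracy(videos))  # average of the two per-video scores
```

In this toy case the pooled score is dominated by the long video and hides the failure on the short one, while the per-video average exposes it. Which aggregation is appropriate depends on the clinical question, so a report should state the choice explicitly.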