Angelo D'Ambrosio, Francesco Baglivo, Luigi De Angelis, Federico Tecchio, Caterina Rizzo
Recenti Progressi in Medicina, vol. 116, no. 10, pp. 603-604. Journal Article. Published 2025-10-01.
DOI: 10.1701/4573.45796 (https://doi.org/10.1701/4573.45796)
Performance of large language models on multiple-choice questions for travel medicine certification.
We benchmarked 40 LLMs on a 40-item travel medicine quiz. Bayesian modelling was used to evaluate accuracy, consistency, parsability, and cost metrics. Accuracy ranged from 27.9% to 97.5%; reasoning-tuned frontier models (OpenAI o3, Perplexity Sonar Reasoning) topped the benchmark, whereas small local models underperformed. Cost-accuracy curves revealed five Pareto-optimal systems, with o3 being the current best. These findings confirm the performance of current LLMs as public health knowledge support systems.
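The notion of a Pareto-optimal system on a cost-accuracy curve can be illustrated with a minimal sketch. This is not the authors' analysis: the `Model` type, the `pareto_front` helper, and all model names and numbers below are hypothetical placeholders chosen for illustration. A system counts as Pareto-optimal here if no other system is at least as cheap and at least as accurate while being strictly better on one of the two.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    cost: float      # hypothetical cost per run, arbitrary units
    accuracy: float  # fraction of correct answers, between 0 and 1


def pareto_front(models: list[Model]) -> list[Model]:
    """Return the models that no other model dominates, sorted by cost."""
    front = []
    for m in models:
        # m is dominated if another model is no worse on both axes
        # and strictly better on at least one of them.
        dominated = any(
            o.cost <= m.cost
            and o.accuracy >= m.accuracy
            and (o.cost < m.cost or o.accuracy > m.accuracy)
            for o in models
        )
        if not dominated:
            front.append(m)
    return sorted(front, key=lambda m: m.cost)


# Made-up placeholder values, NOT results from the study.
models = [
    Model("model_a", cost=0.10, accuracy=0.60),
    Model("model_b", cost=0.50, accuracy=0.80),
    Model("model_c", cost=0.60, accuracy=0.75),  # costlier and less accurate than model_b
    Model("model_d", cost=2.00, accuracy=0.95),
]

front_names = [m.name for m in pareto_front(models)]
print(front_names)
```

In this toy data, `model_c` drops out because `model_b` is both cheaper and more accurate; the remaining systems each represent a different trade-off between cost and accuracy.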
About the journal:
Now in its sixtieth year, Recenti Progressi in Medicina continues to be a reliable point of reference and an essential working tool for broadening the cultural horizons of the Italian physician. Recenti Progressi in Medicina is an internal medicine journal. This means recovering a global, integrated perspective, one suited to avoiding both the narrowness of specialist information and the fragmentation of generalist information.