Edoardo Leo, Francesco Baglivo, Federico Starace, Andrea Romigi, Elena Antelmi, Caterina Rizzo, Ugo Faraguna
Recenti Progressi in Medicina 116(10): 605-606. Journal Article. Published 2025-10-01. DOI: 10.1701/4573.45797 (https://doi.org/10.1701/4573.45797)
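Artificial intelligence and sleep medicine: a comparative evaluation of large language models on the exam of the Accademia Italiana di Medicina del Sonno (AIMS) with retrieval-augmented generation. [Original title: Intelligenza artificiale e medicina del sonno: valutazione comparativa di large language models sull’esame dell’Accademia Italiana di Medicina del Sonno con retrieval-augmented generation.]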
Using sleep medicine guidelines and a textbook, we evaluated four large language models (LLMs) (Llama 3.2 3B, Llama 3.3 70B, GPT-4o mini, Gemini 2.0 Flash) on AIMS certification questions, comparing baseline and retrieval-augmented generation (RAG) performance. RAG improved accuracy in all models (e.g., Llama 3.2 +9.6 points, Gemini 2.0 +4.0 points), highlighting RAG's role in enhancing LLM reliability in specialized medical domains.
About the journal:
Now in its sixtieth year, Recenti Progressi in Medicina continues to be a reliable point of reference and a fundamental working tool for broadening the cultural horizons of Italian physicians. Recenti Progressi in Medicina is a journal of internal medicine. This means recovering a global and integrated perspective, one suited to avoiding both the narrowness of specialist information and the fragmentation of generalist information.