[Evaluating the accuracy of large language models in answering mammography screening questions in Italian and English: a study based on the Eusobi guidelines.]

Q3 Medicine
Manuel Signorini, Silvia Fontani, Paola Minichetti, Silvia Teggi, Alessandra Barusco, Massimo Favat
{"title":"[评估大型语言模型在回答意大利语和英语乳房x光检查问题中的准确性:一项基于尤索比指南的研究。]","authors":"Manuel Signorini, Silvia Fontani, Paola Minichetti, Silvia Teggi, Alessandra Barusco, Massimo Favat","doi":"10.1701/4460.44556","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>Artificial intelligence (AI) is transforming various aspects of everyday life, including healthcare, through large language models (LLMs) like ChatGPT, Gemini, and Copilot. These systems are increasingly used to disseminate medical information, allowing patients to access simplified explanations. This study aims to compare responses to breast imaging-related questions formulated in Italian and English, based on Eusobi guidelines, evaluating the LLMs' ability to provide accurate and complete answers on mammography screening concepts.</p><p><strong>Materials and methods: </strong>Nine questions related to breast cancer screening were developed by five breast radiologists based on Eusobi recommendations. These questions were submitted to ChatGPT, Gemini, and Copilot in both Italian and English. Responses were evaluated by two expert breast radiologists using a Likert scale (1 to 5), with statistical analysis performed to compare the accuracy, average length of responses, use of radiological sources and the agreement among readers.</p><p><strong>Results: </strong>The average scores for responses were similar in both languages, ranging from 3.6 to 4 out of 5. Questions on general mammography concepts received more accurate answers, while more specific questions based on the latest guidelines showed incomplete responses, especially about the definition of dense breast. The sources used, particularly in Italian, were often non-specialized in radiology, highlighting a limitation of LLMs in providing detailed and up-to-date medical answers.</p><p><strong>Conclusions: </strong>The study shows that LLMs are useful tools for medical communication, but they have limitations in delivering accurate answers on highly specialized medical topics. To improve the quality of information, collaboration between AI experts and healthcare professionals is necessary, especially in breast cancer prevention and screening.</p>","PeriodicalId":20887,"journal":{"name":"Recenti progressi in medicina","volume":"116 3","pages":"162-167"},"PeriodicalIF":0.0000,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"[Evaluating the accuracy of large language models in answering mammography screening questions in Italian and English: a study based on the Eusobi guidelines.]\",\"authors\":\"Manuel Signorini, Silvia Fontani, Paola Minichetti, Silvia Teggi, Alessandra Barusco, Massimo Favat\",\"doi\":\"10.1701/4460.44556\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Introduction: </strong>Artificial intelligence (AI) is transforming various aspects of everyday life, including healthcare, through large language models (LLMs) like ChatGPT, Gemini, and Copilot. These systems are increasingly used to disseminate medical information, allowing patients to access simplified explanations. 
This study aims to compare responses to breast imaging-related questions formulated in Italian and English, based on Eusobi guidelines, evaluating the LLMs' ability to provide accurate and complete answers on mammography screening concepts.</p><p><strong>Materials and methods: </strong>Nine questions related to breast cancer screening were developed by five breast radiologists based on Eusobi recommendations. These questions were submitted to ChatGPT, Gemini, and Copilot in both Italian and English. Responses were evaluated by two expert breast radiologists using a Likert scale (1 to 5), with statistical analysis performed to compare the accuracy, average length of responses, use of radiological sources and the agreement among readers.</p><p><strong>Results: </strong>The average scores for responses were similar in both languages, ranging from 3.6 to 4 out of 5. Questions on general mammography concepts received more accurate answers, while more specific questions based on the latest guidelines showed incomplete responses, especially about the definition of dense breast. The sources used, particularly in Italian, were often non-specialized in radiology, highlighting a limitation of LLMs in providing detailed and up-to-date medical answers.</p><p><strong>Conclusions: </strong>The study shows that LLMs are useful tools for medical communication, but they have limitations in delivering accurate answers on highly specialized medical topics. To improve the quality of information, collaboration between AI experts and healthcare professionals is necessary, especially in breast cancer prevention and screening.</p>\",\"PeriodicalId\":20887,\"journal\":{\"name\":\"Recenti progressi in medicina\",\"volume\":\"116 3\",\"pages\":\"162-167\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Recenti progressi in medicina\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1701/4460.44556\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"Medicine\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Recenti progressi in medicina","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1701/4460.44556","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Medicine","Score":null,"Total":0}
Citations: 0

Abstract


Introduction: Artificial intelligence (AI) is transforming various aspects of everyday life, including healthcare, through large language models (LLMs) like ChatGPT, Gemini, and Copilot. These systems are increasingly used to disseminate medical information, allowing patients to access simplified explanations. This study aims to compare responses to breast imaging-related questions formulated in Italian and English, based on Eusobi guidelines, evaluating the LLMs' ability to provide accurate and complete answers on mammography screening concepts.

Materials and methods: Nine questions related to breast cancer screening were developed by five breast radiologists based on Eusobi recommendations. These questions were submitted to ChatGPT, Gemini, and Copilot in both Italian and English. Responses were evaluated by two expert breast radiologists using a Likert scale (1 to 5), with statistical analysis performed to compare the accuracy, average length of responses, use of radiological sources and the agreement among readers.
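As an illustration of the kind of analysis described above, the sketch below computes mean Likert scores per model and language and an inter-reader agreement statistic. The ratings are invented placeholders, and the choice of a quadratically weighted Cohen's kappa is an assumption: the abstract does not state which agreement measure was used.

```python
# Minimal sketch of the evaluation analysis: mean Likert scores per model and
# language, plus inter-reader agreement. Data are hypothetical; the weighted
# Cohen's kappa is an assumed choice, not the paper's stated method.
import pandas as pd
from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings: 9 questions x 3 models x 2 languages, scored 1-5 by two readers.
ratings = pd.DataFrame({
    "model":    ["ChatGPT", "Gemini", "Copilot"] * 6,
    "language": ["it"] * 9 + ["en"] * 9,
    "reader1":  [4, 3, 4, 5, 4, 3, 4, 4, 3, 4, 4, 5, 4, 3, 4, 5, 4, 4],
    "reader2":  [4, 4, 4, 5, 3, 3, 4, 4, 4, 4, 4, 5, 4, 4, 4, 5, 4, 3],
})

# Average Likert score per model and language (averaging the two readers first).
ratings["mean_score"] = ratings[["reader1", "reader2"]].mean(axis=1)
print(ratings.groupby(["language", "model"])["mean_score"].mean().round(2))

# Inter-reader agreement across all responses (quadratically weighted kappa).
kappa = cohen_kappa_score(ratings["reader1"], ratings["reader2"], weights="quadratic")
print(f"Weighted Cohen's kappa: {kappa:.2f}")
```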

Results: The average scores for responses were similar in both languages, ranging from 3.6 to 4 out of 5. Questions on general mammography concepts received more accurate answers, while more specific questions based on the latest guidelines showed incomplete responses, especially about the definition of dense breast. The sources used, particularly in Italian, were often non-specialized in radiology, highlighting a limitation of LLMs in providing detailed and up-to-date medical answers.

Conclusions: The study shows that LLMs are useful tools for medical communication, but they have limitations in delivering accurate answers on highly specialized medical topics. To improve the quality of information, collaboration between AI experts and healthcare professionals is necessary, especially in breast cancer prevention and screening.

Source journal
Recenti progressi in medicina (Medicine, all)
CiteScore: 0.90
Self-citation rate: 0.00%
Annual articles: 143
Journal description: Now in its sixtieth year, Recenti Progressi in Medicina continues to be a reliable point of reference and a fundamental working tool for broadening the cultural horizons of Italian physicians. Recenti Progressi in Medicina is a journal of internal medicine. This means recovering a global and integrated outlook, suited to avoiding both the particularism of specialist information and the fragmentation of generalist information.