{"title":"Meta LLaMa 3.1在胸部成像和诊断中的性能评价","authors":"Golnaz Lotfian, Keyur Parekh, Pokhraj P. Suthar","doi":"10.1002/ird3.70013","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Background</h3>\n \n <p>The integration of artificial intelligence (AI) in radiology has opened new possibilities for diagnostic accuracy, with large language models (LLMs) showing potential for supporting clinical decision-making. While proprietary models like ChatGPT have gained attention, open-source alternatives such as Meta LLaMa 3.1 remain underexplored. This study aims to evaluate the diagnostic accuracy of LLaMa 3.1 in thoracic imaging and to discuss broader implications of open-source versus proprietary AI models in healthcare.</p>\n </section>\n \n <section>\n \n <h3> Methods</h3>\n \n <p>Meta LLaMa 3.1 (8B parameter version) was tested on 126 multiple-choice thoracic imaging questions selected from <i>Thoracic Imaging: A Core Review</i> by Hobbs et al. These questions required no image interpretation. The model’s answers were validated by two board-certified diagnostic radiologists. Accuracy was assessed overall and across subgroups, including intensive care, pathology, and anatomy. Additionally, a narrative review introduces three widely used AI platforms in thoracic imaging: DeepLesion, ChexNet, and 3D Slicer.</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>LLaMa 3.1 achieved an overall accuracy of 61.1%. It performed well in intensive care (90.0%) and terms and signs (83.3%) but showed variability across subgroups, with lower accuracy in normal anatomy and basic imaging (40.0%). Subgroup analysis revealed strengths in infectious pneumonia and pleural disease, but notable weaknesses in lung cancer and vascular pathology.</p>\n </section>\n \n <section>\n \n <h3> Conclusion</h3>\n \n <p>LLaMa 3.1 demonstrates promise as an open-source NLP tool in thoracic diagnostics, though its performance variability highlights the need for refinement and domain-specific training. Open-source models offer transparency and accessibility, while proprietary models deliver consistency. Both hold value, depending on clinical context and resource availability.</p>\n </section>\n </div>","PeriodicalId":73508,"journal":{"name":"iRadiology","volume":"3 4","pages":"279-288"},"PeriodicalIF":0.0000,"publicationDate":"2025-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/ird3.70013","citationCount":"0","resultStr":"{\"title\":\"Performance Review of Meta LLaMa 3.1 in Thoracic Imaging and Diagnostics\",\"authors\":\"Golnaz Lotfian, Keyur Parekh, Pokhraj P. Suthar\",\"doi\":\"10.1002/ird3.70013\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n \\n <section>\\n \\n <h3> Background</h3>\\n \\n <p>The integration of artificial intelligence (AI) in radiology has opened new possibilities for diagnostic accuracy, with large language models (LLMs) showing potential for supporting clinical decision-making. While proprietary models like ChatGPT have gained attention, open-source alternatives such as Meta LLaMa 3.1 remain underexplored. 
This study aims to evaluate the diagnostic accuracy of LLaMa 3.1 in thoracic imaging and to discuss broader implications of open-source versus proprietary AI models in healthcare.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Methods</h3>\\n \\n <p>Meta LLaMa 3.1 (8B parameter version) was tested on 126 multiple-choice thoracic imaging questions selected from <i>Thoracic Imaging: A Core Review</i> by Hobbs et al. These questions required no image interpretation. The model’s answers were validated by two board-certified diagnostic radiologists. Accuracy was assessed overall and across subgroups, including intensive care, pathology, and anatomy. Additionally, a narrative review introduces three widely used AI platforms in thoracic imaging: DeepLesion, ChexNet, and 3D Slicer.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Results</h3>\\n \\n <p>LLaMa 3.1 achieved an overall accuracy of 61.1%. It performed well in intensive care (90.0%) and terms and signs (83.3%) but showed variability across subgroups, with lower accuracy in normal anatomy and basic imaging (40.0%). Subgroup analysis revealed strengths in infectious pneumonia and pleural disease, but notable weaknesses in lung cancer and vascular pathology.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Conclusion</h3>\\n \\n <p>LLaMa 3.1 demonstrates promise as an open-source NLP tool in thoracic diagnostics, though its performance variability highlights the need for refinement and domain-specific training. Open-source models offer transparency and accessibility, while proprietary models deliver consistency. Both hold value, depending on clinical context and resource availability.</p>\\n </section>\\n </div>\",\"PeriodicalId\":73508,\"journal\":{\"name\":\"iRadiology\",\"volume\":\"3 4\",\"pages\":\"279-288\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-05-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1002/ird3.70013\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"iRadiology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/ird3.70013\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"iRadiology","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/ird3.70013","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Performance Review of Meta LLaMa 3.1 in Thoracic Imaging and Diagnostics
Background
The integration of artificial intelligence (AI) into radiology has opened new possibilities for improving diagnostic accuracy, with large language models (LLMs) showing potential to support clinical decision-making. While proprietary models such as ChatGPT have attracted considerable attention, open-source alternatives such as Meta LLaMa 3.1 remain underexplored. This study evaluates the diagnostic accuracy of LLaMa 3.1 on thoracic imaging questions and discusses the broader implications of open-source versus proprietary AI models in healthcare.
Methods
Meta LLaMa 3.1 (8B-parameter version) was tested on 126 multiple-choice thoracic imaging questions selected from Thoracic Imaging: A Core Review by Hobbs et al.; none of the questions required image interpretation. The model's answers were validated by two board-certified diagnostic radiologists. Accuracy was assessed overall and across subgroups, including intensive care, pathology, and anatomy. In addition, a narrative review introduces three AI platforms widely used in thoracic imaging: DeepLesion, CheXNet, and 3D Slicer.
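The abstract does not include the evaluation code; the following minimal Python sketch is purely illustrative, assuming each graded question is stored as a (subgroup, model answer, answer key) tuple, and shows how overall and per-subgroup accuracy figures of the kind reported below could be computed. All record values shown are hypothetical.

```python
from collections import defaultdict

# Hypothetical graded records: (subgroup, model_answer, answer_key) for each
# multiple-choice question. The study's actual data are not published here.
records = [
    ("intensive care", "B", "B"),
    ("normal anatomy and basic imaging", "C", "A"),
    ("terms and signs", "D", "D"),
    # ... remaining graded questions (126 in total) ...
]

correct_total = 0
per_group = defaultdict(lambda: [0, 0])  # subgroup -> [correct, answered]

for subgroup, model_answer, key_answer in records:
    hit = model_answer == key_answer      # True counts as 1 when summed
    correct_total += hit
    per_group[subgroup][0] += hit
    per_group[subgroup][1] += 1

print(f"Overall accuracy: {correct_total / len(records):.1%}")
for subgroup, (hits, total) in sorted(per_group.items()):
    print(f"{subgroup}: {hits / total:.1%} ({hits}/{total})")
```

Run over a full set of 126 graded questions, a loop of this kind would produce overall and subgroup percentages in the form reported in the Results.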
Results
LLaMa 3.1 achieved an overall accuracy of 61.1%. It performed well on intensive care (90.0%) and terms-and-signs (83.3%) questions but showed variability across subgroups, with lower accuracy on normal anatomy and basic imaging (40.0%). Subgroup analysis revealed strengths in infectious pneumonia and pleural disease but notable weaknesses in lung cancer and vascular pathology.
Conclusion
LLaMa 3.1 demonstrates promise as an open-source NLP tool in thoracic diagnostics, though its performance variability highlights the need for refinement and domain-specific training. Open-source models offer transparency and accessibility, while proprietary models deliver consistency. Both hold value, depending on clinical context and resource availability.