Saverio La Bella, Deniz Bayraktar, Annamaria Porreca, Linda C Li, Marina Attanasi, Emil Aliyev, Angela Nyangore Migowa, Christiaan Scott, Darpan R Thakare, Yagmur Bayindir, Alessandro Consolaro, Brian M Feldman, Seza Ozen
Rheumatology. Published 2025-06-12. DOI: https://doi.org/10.1093/rheumatology/keaf329
Global variations in artificial intelligence-generated information on juvenile idiopathic arthritis.
Objectives: We aimed to evaluate similarities and variations in the information provided by Large Language Models (LLMs) across diverse world regions by analyzing responses to validated questions on oligoarticular juvenile idiopathic arthritis (oJIA).
Methods: The ten PICO questions on oJIA treatment from the 2021 American College of Rheumatology recommendations were submitted simultaneously, in English, to ChatGPT 4o from five countries (Canada, India, Italy, Kenya and Türkiye). Readability was assessed with the Flesch Reading Ease Score (FRES), and the distinctiveness of terms with Term Frequency-Inverse Document Frequency (TF-IDF) analysis. Co-occurrence networks (CONs) detailed the relationships between terms. Three experts rated the adherence of responses to the recommendations using a Likert-like scale.
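For readers unfamiliar with the FRES, the following is a minimal Python sketch of the standard Flesch Reading Ease formula, 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words). It is not the authors' tooling: the function names and the vowel-group syllable heuristic are assumptions made for illustration, and dedicated readability software may count syllables differently.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    # Sentences are approximated by splitting on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
```

On the conventional interpretation of the scale, scores of roughly 30 to 50 are read as difficult and scores below 30 as very difficult, which is consistent with how the results describe the responses.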
Results: All responses were difficult or very difficult to read, with a median FRES of 30 [24-34]. Depending on the expert, 52% to 84% of responses were mostly or fully adherent to the recommendations, with similar adherence rates across countries. No response was rated as completely non-adherent. Inter-rater agreement on the adherence of LLM-generated responses was generally weak (Kappa values mostly below 0.40), highlighting the challenges of consistently evaluating AI-generated medical information. The TF-IDF analysis showed that the distinctiveness of terminology in LLM-generated responses varied across countries, with scores ranging from 0.60 to 0.85. CONs detailed a strong focus on intra-articular corticosteroid treatments in Italy and an emphasis on short- and long-term outcomes in Kenya.
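The abstract does not state which kappa variant was used for the three raters. As an illustration only, the sketch below assumes unweighted Cohen's kappa computed for one pair of raters at a time, kappa = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name is hypothetical; a weighted kappa or Fleiss' kappa would be computed differently.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(ratings_a: Sequence[Hashable], ratings_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: share of items given the same category by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal category frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical category throughout; kappa is
        # undefined here, and returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)
```

With three raters, this function would be applied to each of the three rater pairs, giving several kappa values rather than a single one.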
Conclusion: LLM-generated content should be critically evaluated in clinical practice, especially in the context of regional differences.
About the journal:
Rheumatology strives to support research and discovery by publishing the highest quality original scientific papers with a focus on basic, clinical and translational research. The journal’s subject areas cover a wide range of paediatric and adult rheumatological conditions from an international perspective. It is an official journal of the British Society for Rheumatology, published by Oxford University Press.
Rheumatology publishes original articles, reviews, editorials, guidelines, concise reports, meta-analyses, original case reports, clinical vignettes, letters and matters arising from published material. The journal takes pride in serving the global rheumatology community, with a focus on high societal impact in the form of podcasts, videos and extended social media presence, and utilizing metrics such as Altmetric. Keep up to date by following the journal on Twitter @RheumJnl.