Mattia Nigro, MD; Andrea Aliverti, PhD; Alessandra Angelucci, PhD; Fulvio Braido, MD, PhD; Giorgio W. Canonica, MD; Apostolos Bossios, MD, PhD; Hilary Pinnock, MD; Jeanette Boyd, PhD; Pippa Powell, PhD; Stefano Aliberti, MD; and the Artificial Intelligence Responses on Asthma Study Task Force
{"title":"人工智能生成的哮喘患者问题的答案:AIR-Asthma研究。","authors":"Mattia Nigro MD , Andrea Aliverti PhD , Alessandra Angelucci PhD , Fulvio Braido MD, PhD , Giorgio W. Canonica MD , Apostolos Bossios MD, PhD , Hilary Pinnock MD , Jeanette Boyd PhD , Pippa Powell PhD , Stefano Aliberti MD , Artificial Intelligence Responses on Asthma Study Task Force","doi":"10.1016/j.jaip.2025.04.051","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><div>Asthma is a prevalent chronic respiratory disease requiring ongoing patient education and individualized management. The increasing reliance on digital tools, particularly generative artificial intelligence (AI), to answer health-related questions has raised concerns about the accuracy, reliability, and comprehensibility of AI-generated information for people living with asthma.</div></div><div><h3>Objective</h3><div>To evaluate systematically the reliability, accuracy, comprehensiveness, and understandability of responses generated by three widely used artificial intelligence–based chatbots (ChatGPT, Bard, and Copilot) to common questions formulated by people with asthma.</div></div><div><h3>Methods</h3><div>In this cross-sectional study, 15 questions regarding asthma management were formulated by patients and categorized by difficulty. Responses from ChatGPT, Bard, and Copilot were evaluated by international experts for accuracy and comprehensiveness, and by patient representatives for understandability. Reliability was assessed through consistency testing across devices. We conducted a blinded evaluation.</div></div><div><h3>Results</h3><div>A total of 21 experts and 16 patient representatives participated in the evaluation. ChatGPT demonstrated the highest reliability (15 of 15 responses), accuracy (median score, 9.0; interquartile range [IQR], 7.0-9.0), and comprehensiveness (median score, 8.0; IQR, 8.0-9.0) compared with Bard and Copilot (<em>P</em> < .0001). Bard achieved superior scores in understandability (median score, 9.0; IQR, 8.0-10.0; <em>P</em> < .0001). Performance differences were consistent across question difficulty levels.</div></div><div><h3>Conclusions</h3><div>Artificial intelligence–driven chatbots can provide generally accurate and understandable responses to asthma-related questions. Variability in reliability and accuracy underscores the need for caution in clinical contexts. Artificial intelligence tools may complement but cannot replace professional medical advice in asthma management.</div></div>","PeriodicalId":51323,"journal":{"name":"Journal of Allergy and Clinical Immunology-In Practice","volume":"13 9","pages":"Pages 2390-2396"},"PeriodicalIF":6.6000,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Artificial Intelligence–Generated Answers to Patients’ Questions on Asthma: The Artificial Intelligence Responses on Asthma Study\",\"authors\":\"Mattia Nigro MD , Andrea Aliverti PhD , Alessandra Angelucci PhD , Fulvio Braido MD, PhD , Giorgio W. Canonica MD , Apostolos Bossios MD, PhD , Hilary Pinnock MD , Jeanette Boyd PhD , Pippa Powell PhD , Stefano Aliberti MD , Artificial Intelligence Responses on Asthma Study Task Force\",\"doi\":\"10.1016/j.jaip.2025.04.051\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Background</h3><div>Asthma is a prevalent chronic respiratory disease requiring ongoing patient education and individualized management. 
The increasing reliance on digital tools, particularly generative artificial intelligence (AI), to answer health-related questions has raised concerns about the accuracy, reliability, and comprehensibility of AI-generated information for people living with asthma.</div></div><div><h3>Objective</h3><div>To evaluate systematically the reliability, accuracy, comprehensiveness, and understandability of responses generated by three widely used artificial intelligence–based chatbots (ChatGPT, Bard, and Copilot) to common questions formulated by people with asthma.</div></div><div><h3>Methods</h3><div>In this cross-sectional study, 15 questions regarding asthma management were formulated by patients and categorized by difficulty. Responses from ChatGPT, Bard, and Copilot were evaluated by international experts for accuracy and comprehensiveness, and by patient representatives for understandability. Reliability was assessed through consistency testing across devices. We conducted a blinded evaluation.</div></div><div><h3>Results</h3><div>A total of 21 experts and 16 patient representatives participated in the evaluation. ChatGPT demonstrated the highest reliability (15 of 15 responses), accuracy (median score, 9.0; interquartile range [IQR], 7.0-9.0), and comprehensiveness (median score, 8.0; IQR, 8.0-9.0) compared with Bard and Copilot (<em>P</em> < .0001). Bard achieved superior scores in understandability (median score, 9.0; IQR, 8.0-10.0; <em>P</em> < .0001). Performance differences were consistent across question difficulty levels.</div></div><div><h3>Conclusions</h3><div>Artificial intelligence–driven chatbots can provide generally accurate and understandable responses to asthma-related questions. Variability in reliability and accuracy underscores the need for caution in clinical contexts. Artificial intelligence tools may complement but cannot replace professional medical advice in asthma management.</div></div>\",\"PeriodicalId\":51323,\"journal\":{\"name\":\"Journal of Allergy and Clinical Immunology-In Practice\",\"volume\":\"13 9\",\"pages\":\"Pages 2390-2396\"},\"PeriodicalIF\":6.6000,\"publicationDate\":\"2025-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Allergy and Clinical Immunology-In Practice\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2213219825004209\",\"RegionNum\":1,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ALLERGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Allergy and Clinical Immunology-In Practice","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2213219825004209","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ALLERGY","Score":null,"Total":0}
Artificial Intelligence–Generated Answers to Patients’ Questions on Asthma: The Artificial Intelligence Responses on Asthma Study
Background
Asthma is a prevalent chronic respiratory disease requiring ongoing patient education and individualized management. The increasing reliance on digital tools, particularly generative artificial intelligence (AI), to answer health-related questions has raised concerns about the accuracy, reliability, and comprehensibility of AI-generated information for people living with asthma.
Objective
To evaluate systematically the reliability, accuracy, comprehensiveness, and understandability of responses generated by three widely used artificial intelligence–based chatbots (ChatGPT, Bard, and Copilot) to common questions formulated by people with asthma.
Methods
In this cross-sectional study, 15 questions about asthma management were formulated by patients and categorized by difficulty. Responses from ChatGPT, Bard, and Copilot were evaluated by international experts for accuracy and comprehensiveness, and by patient representatives for understandability; all evaluations were blinded. Reliability was assessed through consistency testing of responses across devices.
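For illustration, the sketch below shows how per-question ratings of the kind described above can be summarized as a median with interquartile range, and how cross-device consistency can be screened. It is a minimal sketch only: the 1-10 scale, the rater count, and the exact-string-equality check are assumptions for the example, not the study's actual instruments.

```python
import statistics

# Hypothetical 1-10 accuracy ratings from ten experts for one chatbot's
# answer to a single question (invented values; the real study's scale
# and rater counts may differ).
ratings = [9, 7, 9, 8, 9, 7, 9, 10, 8, 9]

# Quartile cut points; q2 is the median, and q1-q3 is the IQR as
# reported in the abstract.
q1, q2, q3 = statistics.quantiles(ratings, n=4)
print(f"median {q2}, IQR {q1}-{q3}")

# Reliability per the Methods: the same question is posed on different
# devices and the answers are checked for consistency. Exact string
# equality is a crude stand-in for the judgment a human reviewer makes.
answers_by_device = {
    "device_a": "Use your reliever inhaler and seek help if symptoms persist.",
    "device_b": "Use your reliever inhaler and seek help if symptoms persist.",
}
is_reliable = len(set(answers_by_device.values())) == 1
print(f"consistent across devices: {is_reliable}")
```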
Results
A total of 21 experts and 16 patient representatives participated in the evaluation. ChatGPT demonstrated the highest reliability (consistent responses for 15 of 15 questions), accuracy (median score, 9.0; interquartile range [IQR], 7.0-9.0), and comprehensiveness (median score, 8.0; IQR, 8.0-9.0) compared with Bard and Copilot (P < .0001). Bard achieved the highest understandability scores (median score, 9.0; IQR, 8.0-10.0; P < .0001). These performance differences were consistent across question difficulty levels.
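The abstract reports P < .0001 for the between-chatbot comparisons without naming the test. A Kruskal-Wallis test is one standard nonparametric way to compare ordinal per-question scores across three groups; whether the study's authors used this exact test is not stated here, and the score arrays in the sketch below are invented for illustration.

```python
from scipy.stats import kruskal

# Invented per-question scores (one per question, 15 questions) for each
# chatbot; these do not reproduce the study's data.
chatgpt = [9, 8, 9, 7, 9, 8, 9, 9, 7, 8, 9, 9, 8, 9, 7]
bard    = [7, 6, 8, 6, 7, 7, 8, 6, 7, 7, 6, 8, 7, 6, 7]
copilot = [6, 7, 6, 5, 7, 6, 6, 7, 5, 6, 7, 6, 5, 6, 7]

# Kruskal-Wallis H test: a significant result indicates that at least one
# chatbot's score distribution differs from the others.
h_stat, p_value = kruskal(chatgpt, bard, copilot)
print(f"H = {h_stat:.2f}, P = {p_value:.4g}")
```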
Conclusions
Artificial intelligence–driven chatbots can provide generally accurate and understandable responses to asthma-related questions. Variability in reliability and accuracy underscores the need for caution in clinical contexts. Artificial intelligence tools may complement but cannot replace professional medical advice in asthma management.
Journal Introduction
JACI: In Practice is an official publication of the American Academy of Allergy, Asthma & Immunology (AAAAI). It is a companion title to The Journal of Allergy and Clinical Immunology, and it aims to provide timely clinical papers, case reports, and management recommendations to clinical allergists and other physicians dealing with allergic and immunologic diseases in their practice. The mission of JACI: In Practice is to offer valid and impactful information that supports evidence-based clinical decisions in the diagnosis and management of asthma, allergies, immunologic conditions, and related diseases.
This journal publishes articles on various conditions treated by allergist-immunologists, including food allergy, respiratory disorders (such as asthma, rhinitis, nasal polyps, sinusitis, cough, ABPA, and hypersensitivity pneumonitis), drug allergy, insect sting allergy, anaphylaxis, dermatologic disorders (such as atopic dermatitis, contact dermatitis, urticaria, angioedema, and HAE), immunodeficiency, autoinflammatory syndromes, eosinophilic disorders, and mast cell disorders.
The focus of the journal is on providing cutting-edge clinical information that practitioners can use in their everyday practice or to acquire new knowledge and skills for the benefit of their patients. However, mechanistic or translational studies without immediate or near-future clinical relevance, as well as animal studies, are outside the journal's scope.