Małgorzata Pastucha, Anna Ratuszniak, Małgorzata Ganc, Edyta Piłka, Iryna Drohobycka, Henryk Skarżyński, W. Wiktor Jedrzejczak
{"title":"ChatGPT作为一种决策支持工具,可以更好地自我监测听力","authors":"Małgorzata Pastucha , Anna Ratuszniak , Małgorzata Ganc , Edyta Piłka , Iryna Drohobycka , Henryk Skarżyński , W. Wiktor Jedrzejczak","doi":"10.1016/j.amjoto.2025.104711","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><div>The rapid development of large language model chatbots, such as ChatGPT, has created new possibilities for healthcare support. This study investigates the feasibility of integrating self-monitoring of hearing (via a mobile app) with ChatGPT's decision-making capabilities to assess whether specialist consultation is required. In particular, the study evaluated how ChatGPT's accuracy to make a recommendation changed over periods of up to 12 months.</div></div><div><h3>Methods</h3><div>ChatGPT-4o was tested on a dataset of 1000 simulated cases, each containing monthly hearing threshold measurements over periods of up to 12 months. Its recommendations were compared to the opinions of 5 experts using percent agreement and Cohen's Kappa. A multiple-response strategy, selecting the most frequent recommendation from 5 trials, was also analyzed.</div></div><div><h3>Results</h3><div>ChatGPT aligned strongly with the experts' judgments, with agreement scores ranging from 0.80 to 0.84. Accuracy scores improved to 0.87 when the multiple-query strategy was employed. In those cases where all 5 experts unanimously agreed, ChatGPT achieved a near-perfect agreement score of 0.99. It adapted its decision-making criteria with extended observation periods, seemingly accounting for potential random fluctuations in hearing thresholds.</div></div><div><h3>Conclusions</h3><div>ChatGPT has significant potential as a decision-support tool for monitoring hearing, able to match expert recommendations and adapting effectively to time-series data. Existing hearing self-testing apps lack capabilities for tracking and evaluating changes over time; integrating ChatGPT could fill this gap. While not without its limitations, ChatGPT offers a promising complement to self-monitoring. It can enhance decision-making processes and potentially encourage patients to seek clinical expertise when needed.</div></div>","PeriodicalId":7591,"journal":{"name":"American Journal of Otolaryngology","volume":"46 5","pages":"Article 104711"},"PeriodicalIF":1.7000,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ChatGPT as a decision-support tool for better self-monitoring of hearing\",\"authors\":\"Małgorzata Pastucha , Anna Ratuszniak , Małgorzata Ganc , Edyta Piłka , Iryna Drohobycka , Henryk Skarżyński , W. Wiktor Jedrzejczak\",\"doi\":\"10.1016/j.amjoto.2025.104711\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Background</h3><div>The rapid development of large language model chatbots, such as ChatGPT, has created new possibilities for healthcare support. This study investigates the feasibility of integrating self-monitoring of hearing (via a mobile app) with ChatGPT's decision-making capabilities to assess whether specialist consultation is required. In particular, the study evaluated how ChatGPT's accuracy to make a recommendation changed over periods of up to 12 months.</div></div><div><h3>Methods</h3><div>ChatGPT-4o was tested on a dataset of 1000 simulated cases, each containing monthly hearing threshold measurements over periods of up to 12 months. 
Its recommendations were compared to the opinions of 5 experts using percent agreement and Cohen's Kappa. A multiple-response strategy, selecting the most frequent recommendation from 5 trials, was also analyzed.</div></div><div><h3>Results</h3><div>ChatGPT aligned strongly with the experts' judgments, with agreement scores ranging from 0.80 to 0.84. Accuracy scores improved to 0.87 when the multiple-query strategy was employed. In those cases where all 5 experts unanimously agreed, ChatGPT achieved a near-perfect agreement score of 0.99. It adapted its decision-making criteria with extended observation periods, seemingly accounting for potential random fluctuations in hearing thresholds.</div></div><div><h3>Conclusions</h3><div>ChatGPT has significant potential as a decision-support tool for monitoring hearing, able to match expert recommendations and adapting effectively to time-series data. Existing hearing self-testing apps lack capabilities for tracking and evaluating changes over time; integrating ChatGPT could fill this gap. While not without its limitations, ChatGPT offers a promising complement to self-monitoring. It can enhance decision-making processes and potentially encourage patients to seek clinical expertise when needed.</div></div>\",\"PeriodicalId\":7591,\"journal\":{\"name\":\"American Journal of Otolaryngology\",\"volume\":\"46 5\",\"pages\":\"Article 104711\"},\"PeriodicalIF\":1.7000,\"publicationDate\":\"2025-07-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"American Journal of Otolaryngology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0196070925001140\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"OTORHINOLARYNGOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"American Journal of Otolaryngology","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0196070925001140","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"OTORHINOLARYNGOLOGY","Score":null,"Total":0}
ChatGPT as a decision-support tool for better self-monitoring of hearing
Background
The rapid development of large language model chatbots such as ChatGPT has created new possibilities for healthcare support. This study investigates the feasibility of combining self-monitoring of hearing (via a mobile app) with ChatGPT's decision-making capabilities to assess whether specialist consultation is required. In particular, the study evaluated how the accuracy of ChatGPT's recommendations changed over observation periods of up to 12 months.
Methods
ChatGPT-4o was tested on a dataset of 1000 simulated cases, each containing monthly hearing threshold measurements over periods of up to 12 months. Its recommendations were compared with the opinions of 5 experts using percent agreement and Cohen's Kappa. A multiple-response strategy, which selects the most frequent recommendation from 5 trials, was also analyzed.
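The agreement metrics named here are standard and easy to reproduce. The sketch below is a minimal illustration, not the study's actual analysis: the recommendation labels and ratings are hypothetical stand-ins, since the paper's case data are not reproduced here. It computes percent agreement and Cohen's Kappa between model and expert recommendations using scikit-learn.

```python
# Minimal sketch of the agreement metrics described in the Methods.
# The labels below are hypothetical examples, not the study's data:
# 1 = "consult a specialist", 0 = "no consultation needed".
from sklearn.metrics import cohen_kappa_score

chatgpt_recs = [1, 0, 0, 1, 1, 0, 1, 0]
expert_recs  = [1, 0, 1, 1, 1, 0, 1, 0]

# Percent agreement: fraction of cases where both raters gave the same label.
percent_agreement = sum(c == e for c, e in zip(chatgpt_recs, expert_recs)) / len(expert_recs)

# Cohen's Kappa: agreement corrected for chance.
kappa = cohen_kappa_score(chatgpt_recs, expert_recs)

print(f"Percent agreement: {percent_agreement:.2f}")  # 0.88 for this toy data
print(f"Cohen's Kappa:     {kappa:.2f}")              # 0.75 for this toy data
```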
Results
ChatGPT aligned strongly with the experts' judgments, with agreement scores ranging from 0.80 to 0.84. Accuracy improved to 0.87 when the multiple-response strategy was employed. In cases where all 5 experts were unanimous, ChatGPT achieved a near-perfect agreement score of 0.99. ChatGPT also adapted its decision-making criteria as the observation period lengthened, seemingly accounting for potential random fluctuations in hearing thresholds.
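A plausible reading of the multiple-response strategy is a simple majority vote: query the model several times per case and keep the most frequent answer. The sketch below illustrates that idea under this assumption; query_chatgpt is a hypothetical stand-in for whatever API call the authors used, simulated here so the example runs.

```python
# Illustrative majority-vote ("multiple-response") strategy.
# query_chatgpt() is a hypothetical stand-in for the real API call;
# here it simulates a noisy responder purely for demonstration.
import random
from collections import Counter

def query_chatgpt(case_data: str) -> str:
    """Stand-in for a real API call; returns one recommendation per query."""
    return random.choice(["consult a specialist",
                          "consult a specialist",
                          "continue self-monitoring"])  # simulated 2:1 tendency

def consensus_recommendation(case_data: str, n_trials: int = 5) -> str:
    """Query the model n_trials times and return the most frequent answer."""
    answers = [query_chatgpt(case_data) for _ in range(n_trials)]
    return Counter(answers).most_common(1)[0][0]

print(consensus_recommendation("12 months of monthly hearing thresholds"))
```

Taking the mode of several independent responses damps the run-to-run variability of a stochastic model, which is consistent with the accuracy gain (0.84 to 0.87) reported above.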
Conclusions
ChatGPT has significant potential as a decision-support tool for monitoring hearing, matching expert recommendations and adapting effectively to time-series data. Existing hearing self-testing apps lack the ability to track and evaluate changes over time; integrating ChatGPT could fill this gap. While not without limitations, ChatGPT offers a promising complement to self-monitoring: it can enhance decision-making and potentially encourage patients to seek clinical expertise when needed.
About the journal
Be fully informed about developments in otology, neurotology, audiology, rhinology, allergy, laryngology, speech science, bronchoesophagology, facial plastic surgery, and head and neck surgery. Featured sections include original contributions, grand rounds, current reviews, case reports and socioeconomics.