ChatGPT for tinnitus information and support: response accuracy and retest after three months
W. Wiktor Jedrzejczak, Piotr H. Skarzynski, Danuta Raj-Koziak, Milaine Dominici Sanfins, Stavros Hatzopoulos, Krzysztof Kochanek
medRxiv - Otolaryngology, 19 December 2023. DOI: 10.1101/2023.12.19.23300189
Citations: 0
Abstract
Background: ChatGPT, a conversational tool based on artificial intelligence, has recently been tested on a range of topics. However, most of the testing has involved broad domains of knowledge. Here we test ChatGPT's knowledge of tinnitus, an important but specialized aspect of audiology and otolaryngology. Testing involved evaluating ChatGPT's answers to a defined set of 10 questions on tinnitus. Furthermore, given that the technology is advancing quickly, we re-evaluated the responses to the same 10 questions 3 months later.
Materials and methods: ChatGPT (free version 3.5) was asked 10 questions on tinnitus at two points in time: August 2023 and November 2023. The accuracy of the responses was rated by six experts using a Likert scale ranging from 1 to 5. The number of words in each response was also counted, and responses were specifically examined to see whether references were provided and whether consultation with a specialist was suggested.
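The abstract does not describe how the responses were captured or how word counts were obtained; the study used the free ChatGPT 3.5 interface, so the following Python sketch is only an illustration of how a comparable query-and-count protocol could be scripted against the OpenAI API (the question list, model name, and output file name are hypothetical, not taken from the study).

```python
# Hypothetical sketch: collect ChatGPT answers to a fixed question set and
# record their word counts. The study itself used the free ChatGPT 3.5 web
# interface, so this illustrates the protocol, not the authors' actual tooling.
import csv
from datetime import date

from openai import OpenAI  # pip install openai

QUESTIONS = [
    "What is tinnitus?",
    "What are the most common causes of tinnitus?",
    # ...the remaining questions from the study's 10-item set would go here
]

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask(question: str) -> str:
    """Send one question and return the model's full text response."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # stand-in for the free ChatGPT 3.5 used in the study
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content


with open(f"responses_{date.today().isoformat()}.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["question", "word_count", "response"])
    for q in QUESTIONS:
        answer = ask(q)
        # Word count as a simple whitespace split, mirroring the
        # "number of words in each response" measure in the abstract.
        writer.writerow([q, len(answer.split()), answer])
```

Running the same script at the two time points (August and November 2023) would yield paired response sets for the expert raters to score.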
Results: Most of ChatGPT's responses were rated as satisfactory or better. However, we did detect a few instances where the responses were inaccurate and might be considered somewhat misleading. The responses from ChatGPT were quite long (averaging over 400 words) and occasionally strayed off-topic. No solid references to sources of information were ever supplied, and when references were specifically requested, the sources given were fabricated. For most responses, consultation with a specialist was suggested. It is worth noting that after 3 months the responses had generally improved.
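The abstract reports the ratings only in summary form; a minimal sketch of how the six experts' 1-to-5 Likert scores per question could be aggregated and compared between the two waves is shown below (the rating values are placeholders for illustration, not data from the study).

```python
# Hypothetical sketch: average expert Likert ratings (1-5) per question and
# compare the two survey waves. The numbers below are placeholders only.
from statistics import mean

# ratings[wave][question_id] -> list of scores from the six experts
ratings = {
    "2023-08": {"Q1": [4, 5, 4, 3, 4, 5], "Q2": [3, 3, 4, 3, 2, 4]},
    "2023-11": {"Q1": [5, 5, 4, 4, 4, 5], "Q2": [4, 4, 4, 3, 3, 4]},
}

for question in sorted(ratings["2023-08"]):
    aug = mean(ratings["2023-08"][question])
    nov = mean(ratings["2023-11"][question])
    print(f"{question}: Aug {aug:.2f} -> Nov {nov:.2f} (change {nov - aug:+.2f})")
```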
Conclusions: ChatGPT provided surprisingly good responses, given that the questions were quite specific. Although no potentially harmful errors were identified, some mistakes could be seen as somewhat misleading, and no solid references were ever supplied. ChatGPT shows great potential if further developed by experts in specific areas, but it is not yet ready for serious application.