Kyle W. Lawrence, Akram A. Habibi, Spencer A. Ward, Claudette M. Lajam, Ran Schwarzkopf, Joshua C. Rozell
{"title":"人工与人工智能生成的关节成形术文献:对感知交流、质量和作者来源的单盲分析。","authors":"Kyle W. Lawrence, Akram A. Habibi, Spencer A. Ward, Claudette M. Lajam, Ran Schwarzkopf, Joshua C. Rozell","doi":"10.1002/rcs.2621","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Background</h3>\n \n <p>Large language models (LLM) have unknown implications for medical research. This study assessed whether LLM-generated abstracts are distinguishable from human-written abstracts and to compare their perceived quality.</p>\n </section>\n \n <section>\n \n <h3> Methods</h3>\n \n <p>The LLM ChatGPT was used to generate 20 arthroplasty abstracts (AI-generated) based on full-text manuscripts, which were compared to originally published abstracts (human-written). Six blinded orthopaedic surgeons rated abstracts on overall quality, communication, and confidence in the authorship source. Authorship-confidence scores were compared to a test value representing complete inability to discern authorship.</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>Modestly increased confidence in human authorship was observed for human-written abstracts compared with AI-generated abstracts (<i>p</i> = 0.028), though AI-generated abstract authorship-confidence scores were statistically consistent with inability to discern authorship (<i>p</i> = 0.999). Overall abstract quality was higher for human-written abstracts (<i>p</i> = 0.019).</p>\n </section>\n \n <section>\n \n <h3> Conclusions</h3>\n \n <p>AI-generated abstracts' absolute authorship-confidence ratings demonstrated difficulty in discerning authorship but did not achieve the perceived quality of human-written abstracts. Caution is warranted in implementing LLMs into scientific writing.</p>\n </section>\n </div>","PeriodicalId":50311,"journal":{"name":"International Journal of Medical Robotics and Computer Assisted Surgery","volume":"20 1","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2024-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Human versus artificial intelligence-generated arthroplasty literature: A single-blinded analysis of perceived communication, quality, and authorship source\",\"authors\":\"Kyle W. Lawrence, Akram A. Habibi, Spencer A. Ward, Claudette M. Lajam, Ran Schwarzkopf, Joshua C. Rozell\",\"doi\":\"10.1002/rcs.2621\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n \\n <section>\\n \\n <h3> Background</h3>\\n \\n <p>Large language models (LLM) have unknown implications for medical research. This study assessed whether LLM-generated abstracts are distinguishable from human-written abstracts and to compare their perceived quality.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Methods</h3>\\n \\n <p>The LLM ChatGPT was used to generate 20 arthroplasty abstracts (AI-generated) based on full-text manuscripts, which were compared to originally published abstracts (human-written). Six blinded orthopaedic surgeons rated abstracts on overall quality, communication, and confidence in the authorship source. Authorship-confidence scores were compared to a test value representing complete inability to discern authorship.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Results</h3>\\n \\n <p>Modestly increased confidence in human authorship was observed for human-written abstracts compared with AI-generated abstracts (<i>p</i> = 0.028), though AI-generated abstract authorship-confidence scores were statistically consistent with inability to discern authorship (<i>p</i> = 0.999). Overall abstract quality was higher for human-written abstracts (<i>p</i> = 0.019).</p>\\n </section>\\n \\n <section>\\n \\n <h3> Conclusions</h3>\\n \\n <p>AI-generated abstracts' absolute authorship-confidence ratings demonstrated difficulty in discerning authorship but did not achieve the perceived quality of human-written abstracts. Caution is warranted in implementing LLMs into scientific writing.</p>\\n </section>\\n </div>\",\"PeriodicalId\":50311,\"journal\":{\"name\":\"International Journal of Medical Robotics and Computer Assisted Surgery\",\"volume\":\"20 1\",\"pages\":\"\"},\"PeriodicalIF\":2.3000,\"publicationDate\":\"2024-02-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Medical Robotics and Computer Assisted Surgery\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/rcs.2621\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"SURGERY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Medical Robotics and Computer Assisted Surgery","FirstCategoryId":"3","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/rcs.2621","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"SURGERY","Score":null,"Total":0}
Human versus artificial intelligence-generated arthroplasty literature: A single-blinded analysis of perceived communication, quality, and authorship source
Background
Large language models (LLM) have unknown implications for medical research. This study assessed whether LLM-generated abstracts are distinguishable from human-written abstracts and to compare their perceived quality.
Methods
The LLM ChatGPT was used to generate 20 arthroplasty abstracts (AI-generated) based on full-text manuscripts, which were compared to originally published abstracts (human-written). Six blinded orthopaedic surgeons rated abstracts on overall quality, communication, and confidence in the authorship source. Authorship-confidence scores were compared to a test value representing complete inability to discern authorship.
Results
Modestly increased confidence in human authorship was observed for human-written abstracts compared with AI-generated abstracts (p = 0.028), though AI-generated abstract authorship-confidence scores were statistically consistent with inability to discern authorship (p = 0.999). Overall abstract quality was higher for human-written abstracts (p = 0.019).
Conclusions
AI-generated abstracts' absolute authorship-confidence ratings demonstrated difficulty in discerning authorship but did not achieve the perceived quality of human-written abstracts. Caution is warranted in implementing LLMs into scientific writing.
期刊介绍:
The International Journal of Medical Robotics and Computer Assisted Surgery provides a cross-disciplinary platform for presenting the latest developments in robotics and computer assisted technologies for medical applications. The journal publishes cutting-edge papers and expert reviews, complemented by commentaries, correspondence and conference highlights that stimulate discussion and exchange of ideas. Areas of interest include robotic surgery aids and systems, operative planning tools, medical imaging and visualisation, simulation and navigation, virtual reality, intuitive command and control systems, haptics and sensor technologies. In addition to research and surgical planning studies, the journal welcomes papers detailing clinical trials and applications of computer-assisted workflows and robotic systems in neurosurgery, urology, paediatric, orthopaedic, craniofacial, cardiovascular, thoraco-abdominal, musculoskeletal and visceral surgery. Articles providing critical analysis of clinical trials, assessment of the benefits and risks of the application of these technologies, commenting on ease of use, or addressing surgical education and training issues are also encouraged. The journal aims to foster a community that encompasses medical practitioners, researchers, and engineers and computer scientists developing robotic systems and computational tools in academic and commercial environments, with the intention of promoting and developing these exciting areas of medical technology.