Identification and Categorization of the Top 100 Articles and the Future of Large Language Models: Thematic Analysis Using Bibliometric Analysis
Ethan Bernstein, Anya Ramsamooj, Kelsey L Millar, Zachary C Lum
JMIR AI, volume 4, e68603. Published 2025-08-27. Journal Article. DOI: 10.2196/68603 (https://doi.org/10.2196/68603)
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12384689/pdf/
Abstract
Background: Since the release of ChatGPT and other large language models (LLMs), there has been a significant increase in academic publications exploring their capabilities and implications across various fields, such as medicine, education, and technology.
Objective: This study aims to identify the most influential academic works on LLMs published in the past year and to categorize their research types and thematic focuses within different professional fields. The study also evaluates the ability of artificial intelligence (AI) tools, such as ChatGPT, to accurately classify academic research.
Methods: We conducted a bibliometric analysis using Clarivate's Web of Science (WOS) to extract the top 100 most cited papers on LLMs. Papers were manually categorized by field, journal, author, and research type. ChatGPT-4 was used to generate categorizations for the same papers, and its performance was compared to human classifications. We summarized the distribution of research fields and assessed the concordance between AI-generated and manual classifications.
Results: Medicine emerged as the predominant field among the top 100 most cited papers, accounting for 43 papers (43%), followed by education with 26 (26%) and technology with 15 (15%). Medical literature primarily focused on clinical applications of LLMs, limitations of AI in health care, and the role of AI in medical education. In education, research centered on ethical concerns and potential applications of AI for teaching and learning. ChatGPT demonstrated variable concordance with human reviewers, achieving an agreement rating of 47% for research types and 92% for fields of study.
Conclusions: While LLMs such as ChatGPT exhibit considerable potential in aiding research categorization, human oversight remains essential to address issues such as hallucinations, outdated information, and biases in AI-generated outputs. This study highlights the transformative potential of LLMs across multiple sectors and emphasizes the importance of continuous ethical evaluation and iterative improvement of AI systems to maximize their benefits while minimizing risks.
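The abstract reports concordance between ChatGPT and human reviewers as agreement percentages but does not state how they were computed. The following is a minimal sketch, assuming simple percent agreement (the share of papers where both labels match). The function name `percent_agreement` and the toy labels are hypothetical illustrations, not the study's data or code.

```python
from typing import Sequence


def percent_agreement(human: Sequence[str], model: Sequence[str]) -> float:
    """Return the percentage of items on which two label lists agree.

    Assumption: plain percent agreement; the abstract does not specify
    the exact concordance measure the authors used.
    """
    if len(human) != len(model):
        raise ValueError("label lists must have the same length")
    if not human:
        raise ValueError("label lists must not be empty")
    # Count positions where the human label equals the model label.
    matches = sum(h == m for h, m in zip(human, model))
    return 100.0 * matches / len(human)


# Hypothetical toy labels for four papers (not taken from the study).
human_fields = ["medicine", "education", "technology", "medicine"]
model_fields = ["medicine", "education", "medicine", "medicine"]

agreement = percent_agreement(human_fields, model_fields)
print(f"{agreement:.0f}%")  # prints "75%" (3 of 4 labels match)
```

Under this assumption, the reported 92% for fields of study would correspond to matching labels on 92 of the 100 papers, and 47% for research types to 47 of 100.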