Will Yeadon, Elise Agra, Oto-Obong Inyang, Paul Mackay, Arin Mizouri
European Journal of Physics, published 2024-09-02. DOI: 10.1088/1361-6404/ad669d (https://doi.org/10.1088/1361-6404/ad669d)
Evaluating AI and human authorship quality in academic writing through physics essays
This study aims to compare the academic writing quality, and the detectability of authorship, of human-written and AI-generated texts by evaluating n = 300 short-form physics essay submissions, equally divided between student work submitted before the introduction of ChatGPT and essays generated by OpenAI’s GPT-4. In blinded evaluations conducted by five independent markers who were unaware of the origin of the essays, we observed no statistically significant difference in scores between essays authored by humans and those produced by AI (p-value = 0.107, α = 0.05). Additionally, when the markers subsequently attempted to identify the authorship of the essays on a 4-point Likert scale (from ‘Definitely AI’ to ‘Definitely Human’), their performance was only marginally better than random chance. This outcome not only underscores the convergence of AI and human authorship quality but also highlights the difficulty of discerning AI-generated content solely through human judgment. Furthermore, the effectiveness of five commercially available software tools for identifying essay authorship was evaluated. Among these, ZeroGPT was the most accurate, achieving a 98% accuracy rate and a precision score of 1.0 when its classifications were reduced to binary outcomes. This result is a source of potential optimism for maintaining assessment integrity. Finally, we propose that a text be classified as human-authored only if it contains ≤50% AI-generated content, a boundary that accommodates a future of ubiquitous AI assistance whilst still respecting human authorship.
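The abstract reports accuracy and precision for a detector whose graded classifications were reduced to binary outcomes. A minimal Python sketch of that kind of calculation follows. The 0.5 threshold, the choice of "AI" as the positive class, and the names `Confusion`, `tally`, `accuracy` and `precision` are assumptions made for illustration; they are not taken from the paper or from any detection tool's API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Confusion:
    tp: int  # AI essays flagged as AI
    fp: int  # human essays flagged as AI
    tn: int  # human essays flagged as human
    fn: int  # AI essays flagged as human


def tally(scores: Iterable[float], is_ai: Iterable[bool],
          threshold: float = 0.5) -> Confusion:
    """Reduce graded detector scores to binary calls and count outcomes.

    The threshold is a hypothetical cut-off, not one stated in the paper.
    """
    tp = fp = tn = fn = 0
    for score, truth in zip(scores, is_ai):
        flagged = score >= threshold  # binary 'AI' call
        if flagged and truth:
            tp += 1
        elif flagged and not truth:
            fp += 1
        elif not flagged and not truth:
            tn += 1
        else:
            fn += 1
    return Confusion(tp, fp, tn, fn)


def accuracy(c: Confusion) -> float:
    """Fraction of all essays classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def precision(c: Confusion) -> float:
    """Fraction of essays flagged as AI that really were AI."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else float("nan")


# Illustrative counts only (NOT the paper's data): one breakdown that would
# give 98% accuracy and precision 1.0 if all 300 essays were scored.
example = Confusion(tp=144, fp=0, tn=150, fn=6)
print(accuracy(example), precision(example))
```

Under these assumptions, a precision of 1.0 means no human-written essay was flagged as AI, so any errors are AI essays that passed as human.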
About the journal:
European Journal of Physics is a journal of the European Physical Society and its primary mission is to assist in maintaining and improving the standard of taught physics in universities and other institutes of higher education.
Authors submitting articles must indicate the usefulness of their material to physics education and make clear the level of readership (undergraduate or graduate) for which the article is intended. Submissions that omit this information or which, in the publisher's opinion, do not contribute to the above mission will not be considered for publication.
To this end, we welcome articles that provide original insights and aim to enhance learning in one or more areas of physics. They should normally include at least one of the following:
Explanations of how contemporary research can inform the understanding of physics at university level: for example, a survey of a research field at a level accessible to students, explaining how it illustrates some general principles.
Original insights into the derivation of results. These should be of some general interest, consisting of more than corrections to textbooks.
Descriptions of novel laboratory exercises illustrating new techniques of general interest. Those based on relatively inexpensive equipment are especially welcome.
Articles of a scholarly or reflective nature that are aimed to be of interest to, and at a level appropriate for, physics students or recent graduates.
Descriptions of successful and original student projects, experimental, theoretical or computational.
Discussions of the history, philosophy and epistemology of physics, at a level accessible to physics students and teachers.
Reports of new developments in physics curricula and the techniques for teaching physics.
Physics Education Research reports: articles that provide original experimental and/or theoretical research contributions that directly relate to the teaching and learning of university-level physics.