{"title":"Exploring the potential of ChatGPT in assessing L2 writing accuracy for research purposes","authors":"Adam Pfau , Charlene Polio , Yiran Xu","doi":"10.1016/j.rmal.2023.100083","DOIUrl":null,"url":null,"abstract":"<div><p>This study investigates ChatGPT's potential for measuring linguistic accuracy in second language writing for research purposes. We processed 100 L2 essays across five proficiency levels with ChatGPT-4 and manually coded for precision and recall with regard to ChatGPT's identification of errors. Our findings indicate a strong correlation (<em>ρ</em> = 0.97 using one method and .94 using another method) between ChatGPT's error detection and human coding, although this correlation diminishes with lower proficiency levels. While ChatGPT infrequently misidentifies errors, it often underestimates the total error count. The study also highlights ChatGPT's limitations, such as the issue of consistency, and provides guidelines for future research applications.</p></div>","PeriodicalId":101075,"journal":{"name":"Research Methods in Applied Linguistics","volume":"2 3","pages":"Article 100083"},"PeriodicalIF":0.0000,"publicationDate":"2023-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Research Methods in Applied Linguistics","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2772766123000435","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 0
Abstract
This study investigates ChatGPT's potential for measuring linguistic accuracy in second language (L2) writing for research purposes. We processed 100 L2 essays across five proficiency levels with ChatGPT-4 and manually coded ChatGPT's error identifications for precision and recall. Our findings indicate a strong correlation between ChatGPT's error detection and human coding (ρ = 0.97 using one method and 0.94 using another), although this correlation diminishes at lower proficiency levels. While ChatGPT infrequently misidentifies errors, it often underestimates the total error count. The study also highlights ChatGPT's limitations, such as inconsistency, and provides guidelines for future research applications.
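For readers less familiar with the metrics named in the abstract, the following minimal Python sketch shows how precision, recall, and a Spearman rank correlation (ρ) could be computed from error counts. It is an illustration only and not the authors' procedure: the function names and every number in it are hypothetical toy values, and the abstract does not specify the two correlation methods used in the study.

```python
from math import sqrt


def precision_recall(true_pos, false_pos, false_neg):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    flagged = true_pos + false_pos   # everything the tool marked as an error
    actual = true_pos + false_neg    # everything the human coders marked
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data, NOT taken from the study.
# One essay: 8 flagged errors confirmed, 1 flag wrong, 7 real errors missed.
precision, recall = precision_recall(true_pos=8, false_pos=1, false_neg=7)

# Per-essay error counts for five made-up essays.
human_counts = [12, 7, 20, 3, 15]
tool_counts = [9, 5, 16, 6, 11]
rho = spearman_rho(human_counts, tool_counts)

print(f"precision={precision:.2f} recall={recall:.2f} rho={rho:.2f}")
```

In this toy case precision is high and recall is noticeably lower, which is the pattern the abstract describes: few misidentified errors, but an underestimated total error count. A rank correlation can still be strong under those conditions, because it depends only on how essays are ordered by error count, not on the counts matching.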