Josué Godeme, Julia Hill, Stephen P Gaughan, Wade J Hirschbuhl, Amanda J Emerson, Christian Darabos, Carly A Bobak, Karen L Fortuna
Pacific Symposium on Biocomputing 30 (2025): 94-108. DOI: 10.1142/9789819807024_0008.
Artificial Allies: Validation of Synthetic Text for Peer Support Tools through Data Augmentation in NLP Model Development.
This study investigates the potential of using synthetic text to augment training data for Natural Language Processing (NLP) models, specifically within the context of peer support tools. We surveyed 22 participants (13 professional peer supporters and 9 AI-proficient individuals) tasked with distinguishing between AI-generated and human-written sentences. Using signal detection theory and confidence-based metrics, we evaluated the accuracy and confidence levels of both groups. The results show no significant difference in rater agreement between the two groups (p = 0.116), with overall classification accuracy falling below chance levels (mean accuracy = 43.10%, p < 0.001). Both groups exhibited a tendency to misclassify low-fidelity sentences as AI-generated, with peer supporters showing a significant bias (p = 0.007). Further analysis revealed a significant negative correlation between errors and confidence among AI-proficient raters (r = -0.429, p < 0.001), suggesting that as their confidence increased, their error rates decreased. Our findings support the feasibility of using synthetic text to mimic human communication, with important implications for improving the fidelity of peer support interventions through NLP model development.