Comparing human coding to two natural language processing algorithms in aspirations of people affected by Duchenne Muscular Dystrophy

C. Schwartz, Roland B. Stark, Elijah Biletch, Richard B. B. Stuart
Journal: Journal of Methods and Measurement in the Social Sciences
Published: 2022-10-01
DOI: 10.2458/jmmss.5397 (https://doi.org/10.2458/jmmss.5397)
Citations: 0

Abstract

Qualitative methods can enhance our understanding of constructs that have not been well portrayed and enable nuanced depiction of the experience of study participants who have not been broadly studied. However, qualitative data require time and effort to train raters to achieve validity and reliability. This study compares recent advances in Natural Language Processing (NLP) models with human coding.

This web-based study (N=1,253; 3,046 free-text entries, averaging 64 characters per entry) included people with Duchenne Muscular Dystrophy (DMD), their siblings, and a representative comparison group. Human raters (n=6) were trained over multiple sessions in content analysis, following a comprehensive codebook. Three prompts addressed distinct aspects of participants’ aspirations. Unsupervised NLP was implemented using Latent Dirichlet Allocation (LDA), which extracts latent topics across all the free-text entries. Supervised NLP was done using a Bidirectional Encoder Representations from Transformers (BERT) model, which requires training the algorithm to recognize relevant human-coded themes across free-text entries. We compared the human-, LDA-, and BERT-coded themes.

The study sample contained 286 people with DMD, 355 DMD siblings, and 997 comparison participants, aged 8-69. Human coders generated 95 codes across the three prompts and had an average inter-rater reliability (Fleiss’s kappa) of 0.77, with a minimal rater effect (pseudo-R² = 4%). Compared to human coders, LDA did not yield easily interpretable themes. BERT correctly classified only 61-70% of the validation set. LDA and BERT required technical expertise to program and took approximately 1.15 minutes per open-text entry, compared to 1.18 minutes for human raters including training time.

LDA and BERT provide potentially viable approaches to analyzing large-scale qualitative data, but both have limitations. When text entries are short, LDA yields latent topics that are hard to interpret. BERT accurately identified only about two thirds of new statements. Humans provided reliable and cost-effective coding in the web-based context. The upfront training enables BERT to process enormous quantities of text data in future work. Such work should examine NLP’s predictive accuracy given different quantities of training data.
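The abstract reports inter-rater reliability as Fleiss's kappa but does not show how the statistic is computed. The sketch below is a minimal illustration of the standard formula, kappa = (P_bar - P_e) / (1 - P_e), where P_bar is the mean observed agreement per item and P_e is the agreement expected by chance. The function name `fleiss_kappa` and the toy ratings are assumptions made for illustration; they are not the authors' code or the study's data.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss's kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j. Every item must have been
    rated by the same number of raters (at least two)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_items == 0 or n_raters < 2:
        raise ValueError("need at least one item and two raters")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_categories = len(counts[0])

    # Observed agreement for each item: the share of rater pairs that agree.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(per_item) / n_items

    # Chance agreement from the overall share of ratings in each category.
    shares = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(s * s for s in shares)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical toy data: 4 free-text entries, 6 raters, 3 candidate codes.
    toy = [
        [6, 0, 0],
        [0, 5, 1],
        [1, 1, 4],
        [0, 6, 0],
    ]
    print(round(fleiss_kappa(toy), 3))
```

Each row in the table is one free-text entry and each column is one code, so a study with six raters would have rows summing to 6. Values close to 1 indicate agreement well above chance.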