Semi-Automated Coding for Qualitative Research: A User-Centered Inquiry and Initial Prototypes

Megh Marathe, K. Toyama
{"title":"Semi-Automated Coding for Qualitative Research: A User-Centered Inquiry and Initial Prototypes","authors":"Megh Marathe, K. Toyama","doi":"10.1145/3173574.3173922","DOIUrl":null,"url":null,"abstract":"Qualitative researchers perform an important and painstaking data annotation process known as coding. However, much of the process can be tedious and repetitive, becoming prohibitive for large datasets. Could coding be partially automated, and should it be? To answer this question, we interviewed researchers and observed them code interview transcripts. We found that across disciplines, researchers follow several coding practices well-suited to automation. Further, researchers desire automation after having developed a codebook and coded a subset of data, particularly in extending their coding to unseen data. Researchers also require any assistive tool to be transparent about its recommendations. Based on our findings, we built prototypes to partially automate coding using simple natural language processing techniques. Our top-performing system generates coding that matches human coders on inter-rater reliability measures. We discuss implications for interface and algorithm design, meta-issues around automating qualitative research, and suggestions for future work.","PeriodicalId":20512,"journal":{"name":"Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2018-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"46","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3173574.3173922","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 46

Abstract

Qualitative researchers perform an important and painstaking data annotation process known as coding. However, much of the process can be tedious and repetitive, becoming prohibitive for large datasets. Could coding be partially automated, and should it be? To answer this question, we interviewed researchers and observed them code interview transcripts. We found that across disciplines, researchers follow several coding practices well-suited to automation. Further, researchers desire automation after having developed a codebook and coded a subset of data, particularly in extending their coding to unseen data. Researchers also require any assistive tool to be transparent about its recommendations. Based on our findings, we built prototypes to partially automate coding using simple natural language processing techniques. Our top-performing system generates coding that matches human coders on inter-rater reliability measures. We discuss implications for interface and algorithm design, meta-issues around automating qualitative research, and suggestions for future work.
定性研究的半自动编码:以用户为中心的查询和初始原型
定性研究人员执行一个重要而艰苦的数据注释过程,即编码。然而,大部分过程可能是乏味和重复的,对于大型数据集来说是令人望而却步的。编码可以部分自动化吗?应该是这样吗?为了回答这个问题,我们采访了研究人员,并观察了他们对访谈记录的编码。我们发现,跨学科的研究人员遵循一些非常适合自动化的编码实践。此外,研究人员希望在开发出代码本并对数据子集进行编码后实现自动化,特别是将他们的编码扩展到看不见的数据。研究人员还要求任何辅助工具的建议都是透明的。基于我们的发现,我们构建了原型,使用简单的自然语言处理技术来部分自动化编码。我们性能最好的系统生成的代码与人类编码员在内部可靠性指标上相匹配。我们讨论了对界面和算法设计的影响,围绕自动化定性研究的元问题,以及对未来工作的建议。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信