Semi-Automated Coding for Qualitative Research: A User-Centered Inquiry and Initial Prototypes

Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. Published 2018-04-21. DOI: 10.1145/3173574.3173922

Megh Marathe, K. Toyama


Citations: 46

Abstract

Qualitative researchers perform an important and painstaking data annotation process known as coding. However, much of the process can be tedious and repetitive, becoming prohibitive for large datasets. Could coding be partially automated, and should it be? To answer this question, we interviewed researchers and observed them code interview transcripts. We found that across disciplines, researchers follow several coding practices well-suited to automation. Further, researchers desire automation after having developed a codebook and coded a subset of data, particularly in extending their coding to unseen data. Researchers also require any assistive tool to be transparent about its recommendations. Based on our findings, we built prototypes to partially automate coding using simple natural language processing techniques. Our top-performing system generates coding that matches human coders on inter-rater reliability measures. We discuss implications for interface and algorithm design, meta-issues around automating qualitative research, and suggestions for future work.
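The abstract reports that the top-performing system "matches human coders on inter-rater reliability measures" but does not name the measure. As an illustration only, the sketch below computes Cohen's kappa, one widely used agreement statistic, between a human coder and an automated coder. It assumes each transcript segment receives exactly one code; the code labels and data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who each assign one code per segment.

    Assumption: single-label coding. Multi-label schemes are often handled
    by computing kappa separately for each code as a present/absent label.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same, non-empty set of segments")
    n = len(rater_a)
    # Observed agreement: fraction of segments where both raters chose the same code.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, estimated from each rater's marginal code frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical code for every segment; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical example: a human coder vs. an automated coder on six segments.
human = ["barrier", "barrier", "support", "support", "other", "barrier"]
system = ["barrier", "support", "support", "support", "other", "barrier"]
print(round(cohens_kappa(human, system), 3))  # 0.739
```

Here the raters agree on 5 of 6 segments (p_o = 30/36), chance agreement from the marginals is 13/36, and kappa is 17/23. Kappa is used instead of raw percent agreement because it discounts the agreement two coders would reach by chance given how often each uses each code. If scikit-learn is available, `sklearn.metrics.cohen_kappa_score` computes the same statistic.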


