{"title":"Semi-Automated Coding for Qualitative Research: A User-Centered Inquiry and Initial Prototypes","authors":"Megh Marathe, K. Toyama","doi":"10.1145/3173574.3173922","DOIUrl":null,"url":null,"abstract":"Qualitative researchers perform an important and painstaking data annotation process known as coding. However, much of the process can be tedious and repetitive, becoming prohibitive for large datasets. Could coding be partially automated, and should it be? To answer this question, we interviewed researchers and observed them code interview transcripts. We found that across disciplines, researchers follow several coding practices well-suited to automation. Further, researchers desire automation after having developed a codebook and coded a subset of data, particularly in extending their coding to unseen data. Researchers also require any assistive tool to be transparent about its recommendations. Based on our findings, we built prototypes to partially automate coding using simple natural language processing techniques. Our top-performing system generates coding that matches human coders on inter-rater reliability measures. We discuss implications for interface and algorithm design, meta-issues around automating qualitative research, and suggestions for future work.","PeriodicalId":20512,"journal":{"name":"Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2018-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"46","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3173574.3173922","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 46
Abstract
Qualitative researchers perform an important and painstaking data annotation process known as coding. However, much of the process can be tedious and repetitive, becoming prohibitive for large datasets. Could coding be partially automated, and should it be? To answer this question, we interviewed researchers and observed them code interview transcripts. We found that across disciplines, researchers follow several coding practices well-suited to automation. Further, researchers desire automation after having developed a codebook and coded a subset of data, particularly in extending their coding to unseen data. Researchers also require any assistive tool to be transparent about its recommendations. Based on our findings, we built prototypes to partially automate coding using simple natural language processing techniques. Our top-performing system generates coding that matches human coders on inter-rater reliability measures. We discuss implications for interface and algorithm design, meta-issues around automating qualitative research, and suggestions for future work.