Sina Mahdipour Saravani, Sadaf Ghaffari, Yanye Luther, J. Folkestad, Marcia Moraes
Title: Automated Code Extraction from Discussion Board Text Dataset
Venue: International Conference on Quantitative Ethnography
Published: 2022-10-31
DOI: 10.48550/arXiv.2210.17495 (https://doi.org/10.48550/arXiv.2210.17495)
Citations: 1
Abstract
This study introduces and investigates the capabilities of three text mining approaches, namely Latent Semantic Analysis, Latent Dirichlet Allocation, and Clustering Word Vectors, for automating code extraction from a relatively small discussion board dataset. We compare the output of each algorithm with an existing dataset that was manually coded by two human raters. The results show that, even with a relatively small dataset, automated approaches can be an asset to course instructors by extracting some of the discussion codes, which can then be used in Epistemic Network Analysis.
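As a rough illustration of the third approach named in the abstract, clustering word vectors, the sketch below groups words by running a plain k-means over their vectors, so that each cluster can be read as a candidate code. It is not the authors' implementation: the 2-D vectors, the word list, the function names `kmeans_words` and `mean`, and the farthest-point initialisation are all assumptions made for the example, and real word embeddings would have far more dimensions.

```python
from math import dist

# Toy 2-D "embeddings", invented purely for illustration. Real word vectors
# (for example word2vec or GloVe) typically have hundreds of dimensions.
TOY_VECTORS = {
    "exam": (0.9, 0.1),
    "quiz": (0.8, 0.2),
    "grade": (0.85, 0.15),
    "forum": (0.1, 0.9),
    "post": (0.2, 0.8),
    "reply": (0.15, 0.85),
}


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(component) / n for component in zip(*points))


def kmeans_words(vectors, k, iterations=10):
    """Group words into k clusters with a basic k-means (hypothetical helper)."""
    words = list(vectors)

    # Deterministic farthest-point initialisation (an assumption of this
    # sketch): start from the first word, then repeatedly pick the word
    # farthest from the centroids chosen so far.
    centroids = [vectors[words[0]]]
    while len(centroids) < k:
        farthest = max(
            words,
            key=lambda w: min(dist(vectors[w], c) for c in centroids),
        )
        centroids.append(vectors[farthest])

    clusters = []
    for _ in range(iterations):
        # Assignment step: each word joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for w in words:
            nearest = min(range(k), key=lambda i: dist(vectors[w], centroids[i]))
            clusters[nearest].append(w)
        # Update step: move each centroid to the mean of its members,
        # keeping the old centroid if a cluster ended up empty.
        centroids = [
            mean([vectors[w] for w in members]) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return clusters


if __name__ == "__main__":
    for number, members in enumerate(kmeans_words(TOY_VECTORS, k=2), start=1):
        print(f"candidate code {number}: {', '.join(members)}")
```

In a setting like the one the abstract describes, the resulting word groups would still need to be inspected and named by a person before being compared with the manually coded dataset.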