Leveraging AI-generated synthetic data to train natural language processing models for qualitative feedback analysis

IF 3.4 · Zone 2 (Engineering & Technology) · JCR Q1, EDUCATION & EDUCATIONAL RESEARCH

Journal of Engineering Education · Pub Date: 2025-08-31 · DOI: 10.1002/jee.70033

Stephanie Fuchs, Alexandra Werth, Cristóbal Méndez, Jonathan Butcher

Volume 114, Issue 4 · https://onlinelibrary.wiley.com/doi/10.1002/jee.70033

Citations: 0

Abstract

Background

High-quality feedback is crucial for academic success because it drives student motivation and engagement, and research continues to examine how feedback is best delivered and how students interact with it. Advances in artificial intelligence (AI), particularly natural language processing (NLP), offer new methods for analyzing complex qualitative data such as feedback interactions.

Purpose

We developed a framework for training sentence transformers on synthetic data created with generative AI so that they can categorize student-feedback interactions in engineering studios. To explore how generative AI can aid qualitative coding, we compared traditional thematic analysis with modern computational methods, evaluating both the realism of the synthetic datasets and their effectiveness for training NLP models.
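The core idea of coding utterances with sentence embeddings, assigning each utterance the label whose labeled examples it most resembles, can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: the paper trains a sentence transformer, whereas here a toy bag-of-words vector (`embed`) stands in for the encoder so the sketch runs with only the Python standard library. The nearest-centroid rule and the labels `question`, `suggestion`, and `praise` are assumptions chosen for the example.

```python
import math
import re
from collections import Counter, defaultdict


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence-transformer encoder: a bag-of-words count vector.
    (Hypothetical; a real pipeline would encode text with a trained sentence transformer.)"""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_centroids(examples: list[tuple[str, str]]) -> dict[str, Counter]:
    """Sum the embeddings of each label's examples; the sum points in the same
    direction as the mean, so cosine similarity is unaffected."""
    centroids: dict[str, Counter] = defaultdict(Counter)
    for text, label in examples:
        centroids[label].update(embed(text))
    return dict(centroids)


def classify(text: str, centroids: dict[str, Counter]) -> str:
    """Assign the label whose centroid is most similar to the utterance."""
    vec = embed(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical coded utterances (labels invented for illustration).
train = [
    ("What made you choose this material?", "question"),
    ("Why did you put the hinge here?", "question"),
    ("You could try a thinner wall to save weight.", "suggestion"),
    ("Maybe try adding a fillet at that corner.", "suggestion"),
    ("Nice work, this prototype looks great.", "praise"),
    ("Great job on the sketches.", "praise"),
]
centroids = fit_centroids(train)
print(classify("Why did you pick this design?", centroids))  # prints "question"
```

Swapping `embed` for a real sentence-transformer encoder would leave `fit_centroids` and `classify` unchanged, which is the appeal of separating the encoder from the decision rule.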

Methods

We deidentified and transcribed eight audio recordings from engineering studios. We then generated synthetic feedback transcripts with three locally hosted large language models (Llama 3.1, Gemma 2.0, and Mistral NeMo), adjusting generation parameters so that the resulting datasets mimicked the real transcripts. We assessed the quality of the synthetic transcripts using our framework and trained a sentence transformer model on both real and synthetic data to measure how the synthetic data changed its percent accuracy when qualitatively coding feedback interactions.
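The comparison described above, training the same classifier with and without synthetic examples and scoring each version on real transcripts, can be sketched as follows. Everything here is a hypothetical stand-in: a 1-nearest-neighbour classifier using Jaccard token overlap (`predict`) replaces the sentence transformer so the code needs only the standard library, and all utterances and labels are invented, so the resulting percentages say nothing about the paper's findings.

```python
import re


def tokens(text: str) -> set[str]:
    """Lower-cased word set for a single utterance."""
    return set(re.findall(r"[a-z']+", text.lower()))


def predict(text: str, train: list[tuple[str, str]]) -> str:
    """1-nearest-neighbour label by Jaccard token overlap (toy stand-in for a
    trained sentence transformer; ties go to the earliest training example)."""
    query = tokens(text)

    def jaccard(example: tuple[str, str]) -> float:
        other = tokens(example[0])
        union = query | other
        return len(query & other) / len(union) if union else 0.0

    return max(train, key=jaccard)[1]


def percent_accuracy(train: list[tuple[str, str]], test: list[tuple[str, str]]) -> float:
    """Percentage of test utterances whose predicted code matches the human code."""
    correct = sum(predict(text, train) == label for text, label in test)
    return 100.0 * correct / len(test)


# Invented data: real coded utterances, LLM-style synthetic ones,
# and a held-out test set drawn only from real transcripts.
real_train = [
    ("Why did you choose this material?", "question"),
    ("Try adding a rib to this bracket.", "suggestion"),
]
synthetic_train = [
    ("What load does this bracket carry?", "question"),
    ("You could try a thinner wall.", "suggestion"),
]
real_test = [
    ("What load does the bracket see?", "question"),
    ("Why not try a thinner plate?", "suggestion"),
]

print("real only:       ", percent_accuracy(real_train, real_test))
print("real + synthetic:", percent_accuracy(real_train + synthetic_train, real_test))
```

Scoring only on held-out real utterances keeps synthetic text out of the test set, so any gain reflects better generalization to real studio talk rather than the classifier recognizing its own training distribution. Whether the authors evaluated exactly this way is not stated in the abstract.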

Results

Synthetic data improved the NLP model's performance in classifying feedback interactions: with synthetic transcripts from Llama 3.1, average accuracy rose from 68.4% to 81%, a gain of 12.6 percentage points. However, transcripts from all three models occasionally included extraneous details and failed to capture instructor-dominant discourse.
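The finding that synthetic transcripts missed instructor-dominant discourse suggests a simple realism check: measure how much of each transcript the instructor speaks. The sketch below is hypothetical; it assumes a plain `Speaker: utterance` line format and uses word counts as a proxy for dominance, neither of which the paper specifies, and both transcripts are invented.

```python
def speaker_word_share(transcript: str, speaker: str) -> float:
    """Fraction of all words spoken by `speaker`, assuming (hypothetically)
    that each line looks like 'Speaker: utterance'."""
    counts: dict[str, int] = {}
    for line in transcript.strip().splitlines():
        name, sep, utterance = line.partition(":")
        if not sep:
            continue  # skip lines without a speaker tag
        name = name.strip()
        counts[name] = counts.get(name, 0) + len(utterance.split())
    total = sum(counts.values())
    return counts.get(speaker, 0) / total if total else 0.0


# Invented examples: an instructor-led exchange vs. a student-led synthetic one.
real_transcript = """
Instructor: Walk me through why the joint fails under this load and what you tested.
Student: We ran two tests.
Instructor: Then compare those results with your hand calculations before the next review.
"""
synthetic_transcript = """
Instructor: How did testing go?
Student: We ran two load tests on the joint and it failed near the weld, so we redesigned the bracket.
Instructor: Good.
"""

real_share = speaker_word_share(real_transcript, "Instructor")
synth_share = speaker_word_share(synthetic_transcript, "Instructor")
print(f"instructor share, real: {real_share:.2f}, synthetic: {synth_share:.2f}")
```

A large gap between the two shares would flag a synthetic transcript whose turn-taking does not resemble real studio feedback, even if its individual sentences look plausible.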

Conclusions

Synthetic data offers an opportunity to expand qualitative research, particularly where real data for NLP training are limited or hard to obtain; however, transparency about its use is essential to maintaining research integrity.
