LLM-Based Response Generation for Korean Adolescents: A Study Using the NAVER Knowledge iN Q&A Dataset with RAG.

Impact factor: 2.1; JCR quartile: Q3 (Medical Informatics)

Healthcare Informatics Research. Pub Date: 2025-04-01. Epub Date: 2025-04-30. DOI: 10.4258/hir.2025.31.2.136

Junseo Kim, Seok Jun Kim, Junseok Ahn, Suehyun Lee

Volume 31, Issue 2, Pages 136-145. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12086440/pdf/

Citation count: 0

Abstract

Objectives: This research aimed to develop a retrieval-augmented generation (RAG) based large language model (LLM) system that offers personalized and reliable responses to a wide range of concerns raised by Korean adolescents. Our work focuses on building a culturally reflective dataset and on designing and validating the system's effectiveness by comparing the answer quality of RAG-based models with non-RAG models.

Methods: Data were collected from the NAVER Knowledge iN platform, concentrating on posts that featured adolescents' questions and corresponding expert responses during the period 2014-2024. The dataset comprises 3,874 cases, categorized by key negative emotions and the primary sources of worry. The data were processed to remove irrelevant or redundant content and then classified into general and detailed causes. The RAG-based model employed FAISS for similarity-based retrieval of the top three reference cases and used GPT-4o mini for response generation. The responses generated with and without RAG were evaluated using several metrics.
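The retrieve-then-generate flow described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It swaps FAISS and a real embedding model for a toy bag-of-words cosine similarity in pure Python, and it stops at prompt assembly rather than calling GPT-4o mini. The sample cases and the names `CASES`, `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical and are not taken from the study's dataset.

```python
import math
from collections import Counter

# Hypothetical reference cases (question, expert answer); NOT from the study's dataset.
CASES = [
    ("I fight with my parents about grades every day",
     "Try agreeing on a calm time to talk about expectations."),
    ("My friends ignore me at school and I feel lonely",
     "Reach out to one trusted classmate and tell a teacher."),
    ("I am stressed about exams and cannot sleep",
     "Break study into short blocks and keep a fixed bedtime."),
    ("I feel depressed and nothing is fun anymore",
     "Please talk to a school counselor or a trusted adult."),
]


def embed(text):
    """Toy bag-of-words vector; an assumption standing in for a sentence-embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, cases, k=3):
    """Return the k cases whose questions are most similar to the query
    (brute-force search standing in for a FAISS index lookup)."""
    q = embed(query)
    ranked = sorted(cases, key=lambda c: cosine(q, embed(c[0])), reverse=True)
    return ranked[:k]


def build_prompt(query, references):
    """Combine the retrieved cases and the new question into one prompt string."""
    lines = ["Reference cases:"]
    for i, (question, answer) in enumerate(references, start=1):
        lines.append(f"{i}. Q: {question}\n   A: {answer}")
    lines.append(f"New question: {query}")
    lines.append("Answer with empathy, using the reference cases where relevant.")
    return "\n".join(lines)


query = "I am so stressed about exams"
top3 = retrieve(query, CASES, k=3)   # top three reference cases, as in the Methods
prompt = build_prompt(query, top3)   # this string would be sent to the generation model
```

In a non-RAG baseline the model would receive only the new question. The RAG variant differs only in the retrieved cases prepended to the prompt, which is the comparison the study evaluates.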

Results: RAG-based responses outperformed non-RAG responses across all evaluation metrics. Key findings indicate that RAG-based responses delivered more specific, empathetic, and actionable guidance, particularly when addressing complex emotional and situational concerns. The analysis revealed that family relationships, peer interactions, and academic stress are significant factors affecting adolescents' worries, with depression and stress frequently co-occurring.

Conclusions: This study demonstrates the potential of RAG-based LLMs to address the diverse and culture-specific worries of Korean adolescents. By integrating external knowledge and offering personalized support, the proposed system provides a scalable approach to enhancing mental health interventions for adolescents. Future research should concentrate on expanding the dataset and improving multiturn conversational capabilities to deliver even more comprehensive support.

Source journal

Healthcare Informatics Research (Medical Informatics)

CiteScore

4.90

Self-citation rate

6.90%

Number of articles published