Large Language Model Augmented Exercise Retrieval for Personalized Language Learning

Austin Xu, Will Monroe, K. Bicknell
{"title":"大语言模型增强练习检索,促进个性化语言学习","authors":"Austin Xu, Will Monroe, K. Bicknell","doi":"10.1145/3636555.3636883","DOIUrl":null,"url":null,"abstract":"We study the problem of zero-shot exercise retrieval in the context of online language learning, to give learners the ability to explicitly request personalized exercises via natural language. Using real-world data collected from language learners, we observe that vector similarity approaches poorly capture the relationship between exercise content and the language that learners use to express what they want to learn. This semantic gap between queries and content dramatically reduces the effectiveness of general-purpose retrieval models pretrained on large scale information retrieval datasets like MS MARCO. We leverage the generative capabilities of large language models to bridge the gap by synthesizing hypothetical exercises based on the learner's input, which are then used to search for relevant exercises. Our approach, which we call mHyER, overcomes three challenges: (1) lack of relevance labels for training, (2) unrestricted learner input content, and (3) low semantic similarity between input and retrieval candidates. mHyER outperforms several strong baselines on two novel benchmarks created from crowdsourced data and publicly available data.","PeriodicalId":162301,"journal":{"name":"International Conference on Learning Analytics and Knowledge","volume":"108 ","pages":"284-294"},"PeriodicalIF":0.0000,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Large Language Model Augmented Exercise Retrieval for Personalized Language Learning\",\"authors\":\"Austin Xu, Will Monroe, K. Bicknell\",\"doi\":\"10.1145/3636555.3636883\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We study the problem of zero-shot exercise retrieval in the context of online language learning, to give learners the ability to explicitly request personalized exercises via natural language. Using real-world data collected from language learners, we observe that vector similarity approaches poorly capture the relationship between exercise content and the language that learners use to express what they want to learn. This semantic gap between queries and content dramatically reduces the effectiveness of general-purpose retrieval models pretrained on large scale information retrieval datasets like MS MARCO. We leverage the generative capabilities of large language models to bridge the gap by synthesizing hypothetical exercises based on the learner's input, which are then used to search for relevant exercises. Our approach, which we call mHyER, overcomes three challenges: (1) lack of relevance labels for training, (2) unrestricted learner input content, and (3) low semantic similarity between input and retrieval candidates. 
mHyER outperforms several strong baselines on two novel benchmarks created from crowdsourced data and publicly available data.\",\"PeriodicalId\":162301,\"journal\":{\"name\":\"International Conference on Learning Analytics and Knowledge\",\"volume\":\"108 \",\"pages\":\"284-294\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-02-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Conference on Learning Analytics and Knowledge\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3636555.3636883\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Learning Analytics and Knowledge","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3636555.3636883","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

We study the problem of zero-shot exercise retrieval in the context of online language learning, to give learners the ability to explicitly request personalized exercises via natural language. Using real-world data collected from language learners, we observe that vector similarity approaches poorly capture the relationship between exercise content and the language that learners use to express what they want to learn. This semantic gap between queries and content dramatically reduces the effectiveness of general-purpose retrieval models pretrained on large-scale information retrieval datasets like MS MARCO. We leverage the generative capabilities of large language models to bridge the gap by synthesizing hypothetical exercises based on the learner's input, which are then used to search for relevant exercises. Our approach, which we call mHyER, overcomes three challenges: (1) lack of relevance labels for training, (2) unrestricted learner input content, and (3) low semantic similarity between input and retrieval candidates. mHyER outperforms several strong baselines on two novel benchmarks created from crowdsourced data and publicly available data.
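
The abstract describes the approach only at a high level. The sketch below illustrates the general recipe it outlines (synthesize hypothetical exercises with an LLM from the learner's request, then retrieve real exercises by vector similarity), not the authors' actual implementation. The encoder choice, the prompt wording, the max-similarity aggregation, and all function names (e.g. `retrieve_exercises`, `llm_generate`) are assumptions introduced here for illustration.

```python
# Minimal sketch of an mHyER-style retrieval pipeline, assuming a generic
# sentence encoder and an LLM exposed as a plain callable. Hypothetical code,
# not the authors' released implementation.
from typing import Callable, List

import numpy as np
from sentence_transformers import SentenceTransformer

# Any general-purpose text encoder works here; this one is a common default.
encoder = SentenceTransformer("all-MiniLM-L6-v2")


def retrieve_exercises(
    learner_query: str,
    exercise_bank: List[str],
    llm_generate: Callable[[str], List[str]],
    k: int = 5,
    n_hypothetical: int = 3,
) -> List[str]:
    """Rank real exercises by similarity to LLM-synthesized hypothetical ones."""
    # 1. Bridge the semantic gap: instead of embedding the raw learner query,
    #    ask the LLM to write exercises that would satisfy the request.
    prompt = (
        f"A language learner asks: '{learner_query}'. "
        f"Write {n_hypothetical} short practice exercises that fulfill this request."
    )
    hypothetical = llm_generate(prompt)

    # 2. Embed the hypothetical exercises and the real exercise bank.
    hyp_vecs = encoder.encode(hypothetical, normalize_embeddings=True)
    bank_vecs = encoder.encode(exercise_bank, normalize_embeddings=True)

    # 3. Score each real exercise by its best cosine similarity to any
    #    hypothetical exercise (one plausible aggregation), return the top-k.
    scores = (bank_vecs @ hyp_vecs.T).max(axis=1)
    top = np.argsort(-scores)[:k]
    return [exercise_bank[i] for i in top]


if __name__ == "__main__":
    # Stub LLM for demonstration; in practice this would call a real model.
    def fake_llm(prompt: str) -> List[str]:
        return [
            "Translate: 'Where is the train station?'",
            "Fill in the blank: Entschuldigung, wo ist ___ Bahnhof?",
        ]

    bank = [
        "Conjugate 'gehen' in the present tense.",
        "Match the food words to their pictures.",
        "Ask for directions to the station in German.",
    ]
    print(retrieve_exercises("I want to practice asking for directions", bank, fake_llm, k=2))
```

Retrieving via synthesized exercises rather than the raw query sidesteps the query-content mismatch the abstract identifies: the encoder only ever compares exercise-like text against exercise-like text, a regime it handles well.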