PodGPT: an audio-augmented large language model for research and education.

NPJ biomedical innovations. Pub Date: 2025-01-01; Epub Date: 2025-07-07. DOI: 10.1038/s44385-025-00022-0
Shuyue Jia, Subhrangshu Bit, Edward Searls, Meagan V Lauber, Pengrui Fan, William M Wang, Lindsey A Claus, Varuna H Jasodanand, Divya Veerapaneni, Rhoda Au, Vijaya B Kolachalama
Vol. 2(1), 26. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12234354/pdf/

Abstract

The proliferation of scientific podcasts has produced an extensive repository of educational content, rich in specialized terminology, diverse topics, and expert dialogue. Here, we introduce a computational framework that enhances large language models by leveraging the content of publicly accessible audio podcasts across science, technology, engineering, mathematics, and medicine (STEMM). The resulting dataset, comprising over 3700 hours of audio, was transcribed into over 42 million text tokens. Our model, PodGPT, integrates the complex dialogue found in these podcasts to improve its grasp of natural-language nuance and cultural context, as well as its scientific and medical knowledge. PodGPT also employs retrieval-augmented generation (RAG) over a vector database, providing real-time access to emerging scientific literature. Across multiple standard open-source benchmarks, PodGPT achieved an average improvement of 1.82 percentage points, rising to 2.43 percentage points when augmented with evidence from the RAG pipeline. It also showed an average improvement of 1.18 percentage points in zero-shot multilingual transfer, generalizing effectively to different linguistic contexts. By harnessing the untapped potential of podcast content, PodGPT advances natural language processing and conversational AI, offering enhanced capabilities for STEMM research and education.
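The retrieval step of a RAG pipeline (embed documents, find the passages closest to a query, and prepend them as evidence) can be sketched as follows. This is not PodGPT's implementation. The abstract does not name the embedding model or the vector database, so this sketch makes two substitutions. A bag-of-words term-frequency vector stands in for a learned embedding model, and a plain in-memory list stands in for the vector database. The names `embed`, `cosine`, `InMemoryVectorStore`, and `build_prompt`, along with the sample documents, are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase term-frequency vector, standing in for a learned encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical minimal vector store: keeps (doc_id, text, vector) tuples in a list."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, str, Counter]] = []

    def add(self, doc_id: str, text: str) -> None:
        # Index a document by storing its embedding alongside the raw text.
        self._entries.append((doc_id, text, embed(text)))

    def search(self, query: str, k: int = 2) -> list[tuple[str, str, float]]:
        # Score every stored document against the query (exact, brute-force search).
        q = embed(query)
        scored = [(doc_id, text, cosine(q, vec)) for doc_id, text, vec in self._entries]
        # Highest similarity first; the sort is stable, so ties keep insertion order.
        scored.sort(key=lambda item: item[2], reverse=True)
        return scored[:k]


def build_prompt(question: str, store: InMemoryVectorStore, k: int = 2) -> str:
    """Prepend the top-k retrieved passages as evidence ahead of the question."""
    hits = store.search(question, k)
    evidence = "\n".join(f"[{doc_id}] {text}" for doc_id, text, _ in hits)
    return f"Evidence:\n{evidence}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    store = InMemoryVectorStore()
    # Hypothetical abstracts standing in for newly indexed literature.
    store.add("doc1", "Amyloid beta plaques are a hallmark of Alzheimer disease.")
    store.add("doc2", "Transformers use self-attention to model token sequences.")
    store.add("doc3", "CRISPR enables targeted editing of genomic DNA.")
    print(build_prompt("What is a hallmark of Alzheimer disease?", store, k=1))
```

The resulting prompt would then be passed to the language model. Because new documents only need to be embedded and added to the store, the model can draw on recent literature without retraining. A production system would typically replace the brute-force loop with an approximate nearest-neighbour index.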
