PodGPT: an audio-augmented large language model for research and education
Shuyue Jia, Subhrangshu Bit, Edward Searls, Meagan V Lauber, Pengrui Fan, William M Wang, Lindsey A Claus, Varuna H Jasodanand, Divya Veerapaneni, Rhoda Au, Vijaya B Kolachalama
NPJ biomedical innovations 2(1), 26 (2025). Epub 2025-07-07.
DOI: https://doi.org/10.1038/s44385-025-00022-0
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12234354/pdf/
Abstract
The proliferation of scientific podcasts has generated an extensive repository of educational content, rich in specialized terminology, diverse topics, and expert dialogues. Here, we introduce a computational framework designed to enhance large language models by leveraging the informational content of publicly accessible audio podcasts across science, technology, engineering, mathematics, and medicine (STEMM). This dataset, comprising over 3700 hours of audio content, was transcribed to generate over 42 million text tokens. Our model, PodGPT, integrates the complex dialogue found in audio podcasts to improve its understanding of natural language nuances, cultural contexts, and scientific and medical knowledge. PodGPT also employs retrieval-augmented generation (RAG) on a vector database, providing real-time access to emerging scientific literature. Evaluated on multiple benchmarks, PodGPT demonstrated an average improvement of 1.82 percentage points over standard open-source benchmarks and 2.43 percentage points when augmented with evidence from the RAG pipeline. Moreover, it showed an average improvement of 1.18 percentage points in zero-shot multilingual transfer, generalizing effectively to different linguistic contexts. By harnessing the untapped potential of podcast content, PodGPT advances natural language processing and conversational AI, offering enhanced capabilities for STEMM research and education.
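The abstract does not describe how the RAG pipeline is implemented, so the following is only a minimal sketch of the general retrieve-then-prompt idea, not PodGPT's actual code. It assumes a toy bag-of-words vector in place of a learned embedding model and an in-memory list in place of a vector database; the names `embed`, `cosine`, `retrieve`, `build_prompt`, and the sample `DOCS` are all hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words vector (hypothetical stand-in for a learned encoder)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=2):
    """Prepend the retrieved evidence to the question before generation."""
    evidence = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Evidence:\n{evidence}\n\nQuestion: {query}\nAnswer:"

# Illustrative corpus only; not drawn from the paper's data.
DOCS = [
    "Metformin is a first-line therapy for type 2 diabetes.",
    "Transformers use self-attention to model token interactions.",
    "Amyloid plaques are a hallmark of Alzheimer's disease.",
]

if __name__ == "__main__":
    print(build_prompt("What is a hallmark of Alzheimer's disease?", DOCS, k=1))
```

In a real system the prompt returned by `build_prompt` would be passed to the language model, and the corpus would be refreshed as new literature appears, which is what lets retrieval supply evidence that was not available at training time.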