biorecap: an R package for summarizing bioRxiv preprints with a local LLM
Stephen D. Turner
arXiv - QuanBio - Other Quantitative Biology, 2024-08-21
arXiv:2408.11707
Abstract
The establishment of bioRxiv facilitated the rapid adoption of preprints in
the life sciences, accelerating the dissemination of new research findings.
However, the sheer volume of preprints published daily can be overwhelming,
making it challenging for researchers to stay updated on the latest
developments. Here, I introduce biorecap, an R package that retrieves and
summarizes bioRxiv preprints using a large language model (LLM) running locally
on nearly any commodity laptop. biorecap leverages the ollamar package to
interface with the Ollama server and API endpoints, allowing users to prompt
any local LLM available through Ollama. The package follows tidyverse
conventions, enabling users to pipe the output of one function as input to
another. Additionally, biorecap provides a single wrapper function that
generates a timestamped CSV file and HTML report containing short summaries of
recent preprints published in user-configurable subject areas. By combining the
strengths of LLMs with the flexibility and security of local execution,
biorecap represents an advancement in the tools available for managing the
information overload in modern scientific research. The biorecap R package is
available on GitHub at https://github.com/stephenturner/biorecap under an
open-source (MIT) license.
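
The piping convention described in the abstract can be sketched in base R without a running Ollama server. This is a minimal illustration, not the package's implementation: `fetch_preprints_stub()`, `attach_prompt()`, `attach_summary()`, and `stub_summarize()` are hypothetical stand-ins, the rows are placeholder data rather than real preprints, and the summary step simply truncates text where a real pipeline would send each prompt to a local LLM.

```r
# Hypothetical stand-ins for illustration only; these are NOT biorecap functions.

# Step 1: return a data frame of preprints (placeholder rows, not real papers).
fetch_preprints_stub <- function(subject) {
  data.frame(
    subject  = subject,
    title    = paste("Placeholder title for", subject),
    abstract = paste("Placeholder abstract text about", subject,
                     "with several extra words."),
    stringsAsFactors = FALSE
  )
}

# Step 2: add a prompt column built from each row's title and abstract.
attach_prompt <- function(preprints, nsentences = 2L) {
  preprints$prompt <- sprintf(
    "Summarize in %d sentences. Title: %s. Abstract: %s",
    nsentences, preprints$title, preprints$abstract
  )
  preprints
}

# Stand-in for the LLM call: keep the first six words of the abstract.
stub_summarize <- function(prompt, abstract) {
  words <- strsplit(abstract, " ", fixed = TRUE)[[1]]
  paste(head(words, 6), collapse = " ")
}

# Step 3: add a summary column by applying the summarizer row by row.
attach_summary <- function(preprints, summarize = stub_summarize) {
  preprints$summary <- mapply(summarize, preprints$prompt, preprints$abstract,
                              USE.NAMES = FALSE)
  preprints
}

# Each function takes a data frame as its first argument and returns a data
# frame, so the steps chain with the native pipe (R >= 4.1).
recap <- fetch_preprints_stub(c("bioinformatics", "genomics")) |>
  attach_prompt() |>
  attach_summary()

# Write a timestamped CSV, mirroring the wrapper behavior the abstract describes.
outfile <- file.path(
  tempdir(),
  sprintf("recap-%s.csv", format(Sys.time(), "%Y%m%d-%H%M%S"))
)
write.csv(recap, outfile, row.names = FALSE)
```

Because every step accepts and returns the same data frame shape, any stage can be swapped out (for example, replacing `stub_summarize` with a function that queries a local model) without changing the rest of the chain. The function names actually exported by biorecap are documented in its GitHub repository.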