SeqMate: A Novel Large Language Model Pipeline for Automating RNA Sequencing

Devam Mondal, Atharva Inamdar
arXiv - QuanBio - Genomics, published 2024-07-02. DOI: arxiv-2407.03381 (https://doi.org/arxiv-2407.03381)

Abstract

RNA sequencing techniques, such as bulk RNA-seq and single-cell (sc) RNA-seq, are critical tools for biologists who want to analyze the genetic activity (transcriptome) of a tissue or cell during an experiment. Platforms such as Illumina's next-generation sequencing (NGS) produce the raw data for these experiments. Bioinformaticians must then prepare this raw FASTQ data through a complex series of data manipulations. This process currently takes place in an unwieldy text interface such as a terminal/command line, which requires the user to install and import multiple software packages and keeps the untrained biologist from initiating data analysis. Open-source platforms like Galaxy offer a more user-friendly pipeline, yet their visual interface remains cluttered and highly technical, and therefore uninviting for the natural scientist. To address this, SeqMate is a user-friendly tool that enables one-click analytics by using a large language model (LLM) to automate both data preparation and analysis (differential expression, trajectory analysis, etc.). Using generative AI, SeqMate can also analyze these findings and produce written reports on upregulated, downregulated, or user-prompted genes, citing sources from known repositories such as PubMed, PDB, and UniProt.
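The abstract names differential expression and reports on upregulated/downregulated genes without describing how they are computed. As a rough illustration of the underlying idea only, and not SeqMate's actual implementation, the sketch below labels genes by the log2 fold change of mean counts between two conditions. The function names, the pseudocount, the threshold of 1.0, and the count values are all assumptions made for this example; real analyses normally add normalization and statistical testing.

```python
import math


def log2_fold_change(control, treated, pseudocount=1.0):
    """Return log2 of (mean treated / mean control).

    The pseudocount (an assumption of this sketch) avoids division by
    zero and log(0) for genes with no reads in one condition.
    """
    mean_control = sum(control) / len(control)
    mean_treated = sum(treated) / len(treated)
    return math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))


def classify_genes(counts, threshold=1.0):
    """Label each gene as upregulated, downregulated, or unchanged.

    counts maps a gene name to (control replicates, treated replicates).
    threshold is a hypothetical cutoff on the absolute log2 fold change.
    """
    labels = {}
    for gene, (control, treated) in counts.items():
        lfc = log2_fold_change(control, treated)
        if lfc >= threshold:
            labels[gene] = "upregulated"
        elif lfc <= -threshold:
            labels[gene] = "downregulated"
        else:
            labels[gene] = "unchanged"
    return labels


# Made-up read counts, for illustration only.
example_counts = {
    "GENE_A": ([10, 12, 11], [45, 50, 43]),
    "GENE_B": ([100, 95, 105], [20, 25, 15]),
    "GENE_C": ([30, 28, 32], [31, 29, 33]),
}

labels = classify_genes(example_counts)
for gene, label in labels.items():
    print(gene, label)
```

A fold-change cutoff alone ignores variance between replicates, so it is only a first-pass filter; it is shown here because it makes the upregulated/downregulated distinction concrete.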