Lomics: Generation of Pathways and Gene Sets using Large Language Models for Transcriptomic Analysis

Chun-Ka Wong, Ali Choo, Eugene C. C. Cheng, Wing-Chun San, Kelvin Chak-Kong Cheng, Yee-Man Lau, Minqing Lin, Fei Li, Wei-Hao Liang, Song-Yan Liao, Kwong-Man Ng, Ivan Fan-Ngai Hung, Hung-Fat Tse, Jason Wing-Hon Wong
arXiv - QuanBio - Molecular Networks, published 2024-07-12. DOI: arxiv-2407.09089 (https://doi.org/arxiv-2407.09089)

Abstract

Interrogation of biological pathways is an integral part of omics data analysis. Large language models (LLMs) enable the generation of custom pathways and gene sets tailored to specific scientific questions. These targeted sets are significantly smaller than traditional pathway enrichment analysis libraries, reducing the multiple hypothesis testing burden and potentially enhancing statistical power. Lomics (Large Language Models for Omics Studies) v1.0 is a Python-based bioinformatics toolkit that streamlines the generation of pathways and gene sets for transcriptomic analysis. It operates in three steps: 1) deriving relevant pathways based on the researcher's scientific question, 2) generating valid gene sets for each pathway, and 3) outputting the results as .GMX files. Lomics also provides explanations for pathway selections. Consistency and accuracy are ensured through iterative processes, JSON format validation, and HUGO Gene Nomenclature Committee (HGNC) gene symbol verification. Lomics serves as a foundation for integrating LLMs into omics research, potentially improving the specificity and efficiency of pathway analysis.
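
The validation and output steps described in the abstract can be illustrated with a minimal sketch. This is not the Lomics implementation: the function names, the tiny `APPROVED_SYMBOLS` set (a stand-in for the full HGNC approved-symbol list), the pathway names, and the hard-coded `reply` string (a stand-in for an LLM response) are all assumptions made for illustration. The sketch checks that a reply is well-formed JSON of the expected shape, drops symbols not in the approved set, and lays the result out in the tab-delimited GMX format, where each column is one gene set, the first row holds set names, and the second row holds descriptions.

```python
import json
from itertools import zip_longest

# Hypothetical stand-in for the HGNC approved-symbol list; a real run would
# load the complete set from an HGNC download.
APPROVED_SYMBOLS = {"TP53", "BRCA1", "MYC", "NPPA", "NPPB", "MYH7"}


def parse_gene_sets(raw: str) -> dict[str, list[str]]:
    """Check that a reply is a JSON object mapping pathway names to lists of strings."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) if malformed
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object of pathway -> gene list")
    for name, genes in data.items():
        if not isinstance(genes, list) or not all(isinstance(g, str) for g in genes):
            raise ValueError(f"pathway {name!r} must map to a list of strings")
    return data


def keep_approved(gene_sets: dict[str, list[str]], approved: set[str]) -> dict[str, list[str]]:
    """Drop symbols absent from the approved set, removing duplicates but keeping order."""
    cleaned: dict[str, list[str]] = {}
    for name, genes in gene_sets.items():
        kept: list[str] = []
        for gene in genes:
            symbol = gene.strip()
            if symbol in approved and symbol not in kept:
                kept.append(symbol)
        cleaned[name] = kept
    return cleaned


def to_gmx(gene_sets: dict[str, list[str]], description: str = "na") -> str:
    """Render gene sets as GMX text: one column per set, names then descriptions then genes."""
    names = list(gene_sets)
    rows = ["\t".join(names), "\t".join(description for _ in names)]
    # Shorter gene sets are padded with empty cells so every row has one cell per set.
    for row in zip_longest(*(gene_sets[n] for n in names), fillvalue=""):
        rows.append("\t".join(row))
    return "\n".join(rows) + "\n"


# Hard-coded stand-in for an LLM reply; "FAKE1" is a deliberately invalid symbol.
reply = '{"Natriuretic peptide signalling": ["NPPA", "NPPB", "FAKE1"], "Sarcomere": ["MYH7"]}'
sets = keep_approved(parse_gene_sets(reply), APPROVED_SYMBOLS)
gmx = to_gmx(sets)
```

In a fuller pipeline, a reply that fails `parse_gene_sets` would presumably be regenerated rather than discarded, which is one plausible reading of the "iterative processes" the abstract mentions. The `gmx` string can be written to a `.gmx` file for use with enrichment tools that accept that format.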