Pipeline to explore information on genome editing using large language models and genome editing meta-database.

IF 3.6 4区生物学 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Database: The Journal of Biological Databases and Curation Pub Date : 2025-03-08 DOI:10.1093/database/baaf022

Takayuki Suzuki, Hidemasa Bono

{"title":"Pipeline to explore information on genome editing using large language models and genome editing meta-database.","authors":"Takayuki Suzuki, Hidemasa Bono","doi":"10.1093/database/baaf022","DOIUrl":null,"url":null,"abstract":"<p><p>Genome editing (GE) is widely recognized as an effective and valuable technology in life sciences research. However, certain genes are difficult to edit depending on some factors such as the type of species, sequences, and GE tools. Therefore, confirming the presence or absence of GE practices in previous publications is crucial for the effective designing and establishment of research using GE. Although the Genome Editing Meta-database (GEM: https://bonohu.hiroshima-u.ac.jp/gem/) aims to provide as comprehensive GE information as possible, it does not indicate how each registered gene is involved in GE. In this study, we developed a systematic method for extracting essential GE information using large language models from the information based on GEM and GE-related articles. This approach allows for a systematic and efficient investigation of GE information that cannot be achieved using the current GEM alone. In addition, by converting the extracted GE information into metrics, we propose a potential application of this method to prioritize genes for future research. The extracted GE information and novel GE-related scores are expected to facilitate the efficient selection of target genes for GE and support the design of research using GE. Database URLs: https://github.com/szktkyk/extract_geinfo, https://github.com/szktkyk/visualize_geinfo.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.6000,"publicationDate":"2025-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11890094/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Database: The Journal of Biological Databases and Curation","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/database/baaf022","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}

引用次数: 0

Abstract

Genome editing (GE) is widely recognized as an effective and valuable technology in life sciences research. However, certain genes are difficult to edit depending on some factors such as the type of species, sequences, and GE tools. Therefore, confirming the presence or absence of GE practices in previous publications is crucial for the effective designing and establishment of research using GE. Although the Genome Editing Meta-database (GEM: https://bonohu.hiroshima-u.ac.jp/gem/) aims to provide as comprehensive GE information as possible, it does not indicate how each registered gene is involved in GE. In this study, we developed a systematic method for extracting essential GE information using large language models from the information based on GEM and GE-related articles. This approach allows for a systematic and efficient investigation of GE information that cannot be achieved using the current GEM alone. In addition, by converting the extracted GE information into metrics, we propose a potential application of this method to prioritize genes for future research. The extracted GE information and novel GE-related scores are expected to facilitate the efficient selection of target genes for GE and support the design of research using GE. Database URLs: https://github.com/szktkyk/extract_geinfo, https://github.com/szktkyk/visualize_geinfo.

查看原文本刊更多论文

利用大型语言模型和基因组编辑元数据库探索基因组编辑信息的管道。

基因组编辑技术在生命科学研究中被广泛认为是一种有效而有价值的技术。然而，某些基因很难编辑，这取决于一些因素，如物种类型、序列和基因工程工具。因此，确认以前出版物中是否存在通用电气实践对于有效设计和建立使用通用电气的研究至关重要。虽然基因组编辑元数据库（GEM: https://bonohu.hiroshima-u.ac.jp/gem/）旨在提供尽可能全面的基因工程信息，但它并没有表明每个注册的基因是如何参与基因工程的。在这项研究中，我们开发了一种系统的方法，利用大型语言模型从GEM和GE相关文章的信息中提取基本的GE信息。这种方法允许对GE信息进行系统和有效的调查，这是单独使用当前的GEM无法实现的。此外，通过将提取的GE信息转换为指标，我们提出了该方法在未来研究中优先考虑基因的潜在应用。提取的GE信息和新的GE相关评分有望促进GE靶基因的有效选择，并支持使用GE的研究设计。数据库url: https://github.com/szktkyk/extract_geinfo、https://github.com/szktkyk/visualize_geinfo。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Database: The Journal of Biological Databases and Curation MATHEMATICAL & COMPUTATIONAL BIOLOGY-

CiteScore

9.00

自引率

3.40%

发文量

100

审稿时长

>12 weeks

期刊介绍： Huge volumes of primary data are archived in numerous open-access databases, and with new generation technologies becoming more common in laboratories, large datasets will become even more prevalent. The archiving, curation, analysis and interpretation of all of these data are a challenge. Database development and biocuration are at the forefront of the endeavor to make sense of this mounting deluge of data. Database: The Journal of Biological Databases and Curation provides an open access platform for the presentation of novel ideas in database research and biocuration, and aims to help strengthen the bridge between database developers, curators, and users.