Be-dataHIVE: a base editing database.

IF 2.9 3区生物学 Q2 BIOCHEMICAL RESEARCH METHODS

BMC Bioinformatics Pub Date : 2024-10-15 DOI:10.1186/s12859-024-05898-0

Lucas Schneider, Peter Minary

{"title":"Be-dataHIVE: a base editing database.","authors":"Lucas Schneider, Peter Minary","doi":"10.1186/s12859-024-05898-0","DOIUrl":null,"url":null,"abstract":"<p><p>Base editing is an enhanced gene editing approach that enables the precise transformation of single nucleotides and has the potential to cure rare diseases. The design process of base editors is labour-intensive and outcomes are not easily predictable. For any clinical use, base editing has to be accurate and efficient. Thus, any bystander mutations have to be minimized. In recent years, computational models to predict base editing outcomes have been developed. However, the overall robustness and performance of those models is limited. One way to improve the performance is to train models on a diverse, feature-rich, and large dataset, which does not exist for the base editing field. Hence, we develop BE-dataHIVE, a mySQL database that covers over 460,000 gRNA target combinations. The current version of BE-dataHIVE consists of data from five studies and is enriched with melting temperatures and energy terms. Furthermore, multiple different data structures for machine learning were computed and are directly available. The database can be accessed via our website https://be-datahive.com/ or API and is therefore suitable for practitioners and machine learning researchers.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"25 1","pages":"330"},"PeriodicalIF":2.9000,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11476525/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s12859-024-05898-0","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}

引用次数: 0

Abstract

Base editing is an enhanced gene editing approach that enables the precise transformation of single nucleotides and has the potential to cure rare diseases. The design process of base editors is labour-intensive and outcomes are not easily predictable. For any clinical use, base editing has to be accurate and efficient. Thus, any bystander mutations have to be minimized. In recent years, computational models to predict base editing outcomes have been developed. However, the overall robustness and performance of those models is limited. One way to improve the performance is to train models on a diverse, feature-rich, and large dataset, which does not exist for the base editing field. Hence, we develop BE-dataHIVE, a mySQL database that covers over 460,000 gRNA target combinations. The current version of BE-dataHIVE consists of data from five studies and is enriched with melting temperatures and energy terms. Furthermore, multiple different data structures for machine learning were computed and are directly available. The database can be accessed via our website https://be-datahive.com/ or API and is therefore suitable for practitioners and machine learning researchers.

查看原文本刊更多论文

Be-dataHIVE：基础编辑数据库。

碱基编辑是一种增强型基因编辑方法，可实现单个核苷酸的精确转化，具有治疗罕见疾病的潜力。碱基编辑器的设计过程是劳动密集型的，结果也不容易预测。要用于临床，碱基编辑必须准确、高效。因此，必须尽量减少旁观者突变。近年来，预测碱基编辑结果的计算模型已经开发出来。然而，这些模型的整体稳健性和性能有限。提高性能的方法之一是在多样化、特征丰富的大型数据集上训练模型，而碱基编辑领域并不存在这样的数据集。因此，我们开发了一个 MySQL 数据库 BE-dataHIVE，它涵盖了超过 46 万个 gRNA 目标组合。当前版本的 BE-dataHIVE 包含来自五项研究的数据，并丰富了熔化温度和能量项。此外，还为机器学习计算了多种不同的数据结构，并可直接使用。该数据库可通过我们的网站 https://be-datahive.com/ 或 API 访问，因此适合从业人员和机器学习研究人员使用。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

BMC Bioinformatics 生物-生化研究方法

CiteScore

5.70

自引率

3.30%

发文量

506

审稿时长

4.3 months

期刊介绍： BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.