Boolean matrix logic programming for active learning of gene functions in genome-scale metabolic network models

Lun Ai, Stephen H. Muggleton, Shi-Shun Liang, Geoff S. Baldwin
arXiv:2405.06724 (arXiv - QuanBio - Molecular Networks), published 2024-05-10

Abstract

Techniques to autonomously drive research have been prominent in Computational Scientific Discovery, while Synthetic Biology is a field of science that focuses on designing and constructing new biological systems for useful purposes. Here we seek to apply logic-based machine learning techniques to facilitate cellular engineering and drive biological discovery. Comprehensive databases of metabolic processes, called genome-scale metabolic network models (GEMs), are often used to evaluate cellular engineering strategies for optimising the production of target compounds. However, GEMs do not always correctly predict host behaviours, often because of errors in the models. Learning the intricate genetic interactions within GEMs presents both computational and empirical challenges. To address these, we describe a novel approach called Boolean Matrix Logic Programming (BMLP), which leverages Boolean matrices to evaluate large logic programs. We introduce a new system, $BMLP_{active}$, which efficiently explores the genomic hypothesis space by guiding informative experimentation through active learning. In contrast to sub-symbolic methods, $BMLP_{active}$ encodes a state-of-the-art GEM of a widely accepted bacterial host in an interpretable, logical representation using Datalog logic programs. Notably, $BMLP_{active}$ can successfully learn the interaction between a gene pair with fewer training examples than random experimentation, and it remains effective as the experimental design space grows. $BMLP_{active}$ enables rapid optimisation of metabolic models, so that biological systems can be reliably engineered to produce useful compounds. It offers a realistic approach to creating a self-driving lab for microbial engineering.
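The abstract says that BMLP evaluates large logic programs with Boolean matrices, but it gives no further detail. The following is a minimal sketch of the general idea. It assumes the simplest case, a linear recursive Datalog program (the textbook `path`/`edge` reachability example), and does not reproduce the authors' GEM encoding. The least model of such a program can be computed by repeated Boolean matrix multiplication until a fixpoint is reached. The helper names (`bool_matmul`, `transitive_closure`, `from_facts`, `to_facts`) are hypothetical. Each matrix row is stored as a Python integer used as a bitset.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical representation: a Boolean matrix over n constants, stored row-wise.
# Bit j of rows[i] is 1 iff the ground fact rel(i, j) holds.
BoolMatrix = List[int]


def from_facts(n: int, facts: Iterable[Tuple[int, int]]) -> BoolMatrix:
    """Encode binary ground facts rel(i, j) over constants 0..n-1 as a Boolean matrix."""
    rows = [0] * n
    for i, j in facts:
        rows[i] |= 1 << j
    return rows


def to_facts(m: BoolMatrix) -> Set[Tuple[int, int]]:
    """Decode a Boolean matrix back into the set of (i, j) pairs that hold."""
    return {(i, j) for i, row in enumerate(m)
            for j in range(row.bit_length()) if (row >> j) & 1}


def bool_matmul(a: BoolMatrix, b: BoolMatrix) -> BoolMatrix:
    """Boolean product: (a*b)[i][k] = OR over j of (a[i][j] AND b[j][k])."""
    result = []
    for row in a:
        acc = 0
        r = row
        while r:
            low = r & -r              # isolate the lowest set bit of the row
            j = low.bit_length() - 1  # its column index j
            acc |= b[j]               # OR in the whole of row j of b at once
            r ^= low                  # clear that bit and continue
        result.append(acc)
    return result


def transitive_closure(edge: BoolMatrix) -> BoolMatrix:
    """Least fixpoint of the Datalog program
         path(X, Y) :- edge(X, Y).
         path(X, Y) :- edge(X, Z), path(Z, Y).
    computed as path <- path OR (edge * path) until nothing changes."""
    path = list(edge)                 # base rule: every edge is a path
    while True:
        step = bool_matmul(edge, path)                # recursive rule
        new = [p | s for p, s in zip(path, step)]     # union with known facts
        if new == path:                               # fixpoint reached
            return path
        path = new
```

For example, `to_facts(transitive_closure(from_facts(4, [(0, 1), (1, 2), (2, 3)])))` yields a `path` fact for every pair `(i, j)` with `i < j`. Because each row is packed into an integer, a single bitwise OR merges an entire row. Each fixpoint iteration therefore costs word-level operations rather than fact-by-fact joins.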