Title: Advancing genetic engineering with active learning: theory, implementations and potential opportunities
Authors: Qixiu Du, Haochen Wang, Benben Jiang, Xiaowo Wang
Journal: Briefings in Bioinformatics, 26 (4)
Publication date: 2025-07-02
Publication type: Journal Article
DOI: 10.1093/bib/bbaf286 (https://doi.org/10.1093/bib/bbaf286)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12236445/pdf/
Impact factor: 6.8 (JCR Q1, Biochemical Research Methods)
Citations: 0
Abstract
Machine learning (ML) models are increasingly used in genetic engineering to accelerate experimentation and uncover biological mechanisms. However, collecting data that improves both model accuracy and design quality remains challenging, especially when data quality is poor and validation resources are limited. Active learning (AL) addresses this by iteratively identifying promising candidates for testing, thereby reducing experimental effort while improving model performance. This review highlights how AL can assist scientists throughout the design-build-test-learn cycle, explores its practical implementations, and discusses its potential when combined with cross-domain expertise. As data-driven ML models reshape genetic engineering, AL offers an iterative framework for enhancing the functions of biomolecules and uncovering their intrinsic mechanisms while minimizing cost and effort.
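The iterative loop the abstract describes (propose candidates, test them, update the model, repeat) can be illustrated with a toy sketch. This is not the review's method: the simulated assay `measure`, the motif `TARGET`, the nearest-neighbour surrogate `predict`, and the exploration weight `beta` are all hypothetical choices made only to keep the example self-contained.

```python
import itertools
import random

ALPHABET = "ACGT"
TARGET = "ACGT"  # hypothetical optimum, known only to the simulated assay


def measure(seq):
    """Stand-in for a wet-lab assay: number of positions matching TARGET."""
    return sum(a == b for a, b in zip(seq, TARGET))


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def predict(seq, labeled):
    """Nearest-neighbour surrogate.

    Returns the mean label of the closest measured sequences and the
    Hamming distance to them, used here as a crude uncertainty proxy.
    """
    dist = min(hamming(seq, s) for s in labeled)
    nearest = [y for s, y in labeled.items() if hamming(seq, s) == dist]
    return sum(nearest) / len(nearest), dist


def active_learning(rounds=5, batch_size=4, beta=0.5, seed=0):
    """Run a toy design-build-test-learn loop and return all measurements."""
    rng = random.Random(seed)
    pool = ["".join(p) for p in itertools.product(ALPHABET, repeat=len(TARGET))]
    # Initial round: measure a random batch to seed the surrogate.
    labeled = {s: measure(s) for s in rng.sample(pool, batch_size)}
    for _ in range(rounds):
        unlabeled = [s for s in pool if s not in labeled]

        def acquisition(s):
            # Favour high predicted value plus a bonus for unexplored regions.
            mean, dist = predict(s, labeled)
            return mean + beta * dist

        # Design: pick the top-scoring untested candidates.
        batch = sorted(unlabeled, key=acquisition, reverse=True)[:batch_size]
        # Build and test: "measure" them; learn: add results to the training set.
        labeled.update({s: measure(s) for s in batch})
    return labeled
```

Calling `active_learning()` returns a dictionary mapping each tested sequence to its simulated measurement. Each round adds `batch_size` new measurements, so the experimental budget is fixed in advance; in a real setting the surrogate would typically be a trained ML model with a principled uncertainty estimate.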
About the journal:
Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data.
The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.