Data hazards in synthetic biology.

IF 2.6 Q2 BIOCHEMICAL RESEARCH METHODS
Synthetic biology (Oxford, England) Pub Date : 2024-06-21 eCollection Date: 2024-01-01 DOI:10.1093/synbio/ysae010
Natalie R Zelenka, Nina Di Cara, Kieren Sharma, Seeralan Sarvaharman, Jasdeep S Ghataora, Fabio Parmeggiani, Jeff Nivala, Zahraa S Abdallah, Lucia Marucci, Thomas E Gorochowski
{"title":"Data hazards in synthetic biology.","authors":"Natalie R Zelenka, Nina Di Cara, Kieren Sharma, Seeralan Sarvaharman, Jasdeep S Ghataora, Fabio Parmeggiani, Jeff Nivala, Zahraa S Abdallah, Lucia Marucci, Thomas E Gorochowski","doi":"10.1093/synbio/ysae010","DOIUrl":null,"url":null,"abstract":"<p><p>Data science is playing an increasingly important role in the design and analysis of engineered biology. This has been fueled by the development of high-throughput methods like massively parallel reporter assays, data-rich microscopy techniques, computational protein structure prediction and design, and the development of whole-cell models able to generate huge volumes of data. Although the ability to apply data-centric analyses in these contexts is appealing and increasingly simple to do, it comes with potential risks. For example, how might biases in the underlying data affect the validity of a result and what might the environmental impact of large-scale data analyses be? Here, we present a community-developed framework for assessing data hazards to help address these concerns and demonstrate its application to two synthetic biology case studies. We show the diversity of considerations that arise in common types of bioengineering projects and provide some guidelines and mitigating steps. Understanding potential issues and dangers when working with data and proactively addressing them will be essential for ensuring the appropriate use of emerging data-intensive AI methods and help increase the trustworthiness of their applications in synthetic biology.</p>","PeriodicalId":74902,"journal":{"name":"Synthetic biology (Oxford, England)","volume":"9 1","pages":"ysae010"},"PeriodicalIF":2.6000,"publicationDate":"2024-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11227101/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Synthetic biology (Oxford, England)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/synbio/ysae010","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0

Abstract

Data science is playing an increasingly important role in the design and analysis of engineered biology. This has been fueled by the development of high-throughput methods like massively parallel reporter assays, data-rich microscopy techniques, computational protein structure prediction and design, and the development of whole-cell models able to generate huge volumes of data. Although the ability to apply data-centric analyses in these contexts is appealing and increasingly simple to do, it comes with potential risks. For example, how might biases in the underlying data affect the validity of a result and what might the environmental impact of large-scale data analyses be? Here, we present a community-developed framework for assessing data hazards to help address these concerns and demonstrate its application to two synthetic biology case studies. We show the diversity of considerations that arise in common types of bioengineering projects and provide some guidelines and mitigating steps. Understanding potential issues and dangers when working with data and proactively addressing them will be essential for ensuring the appropriate use of emerging data-intensive AI methods and help increase the trustworthiness of their applications in synthetic biology.

合成生物学中的数据危害。
数据科学在工程生物学的设计和分析中发挥着越来越重要的作用。这得益于高通量方法的发展,如大规模并行报告检测、数据丰富的显微镜技术、计算蛋白质结构预测和设计,以及能够生成大量数据的全细胞模型的发展。虽然在这些情况下应用以数据为中心的分析能力很有吸引力,而且越来越容易做到,但它也伴随着潜在的风险。例如,基础数据中的偏差会如何影响结果的有效性?在这里,我们提出了一个社区开发的数据危害评估框架,以帮助解决这些问题,并展示了该框架在两个合成生物学案例研究中的应用。我们展示了常见类型的生物工程项目中出现的各种考虑因素,并提供了一些指导原则和缓解步骤。了解数据工作中的潜在问题和危险并积极主动地加以解决,对于确保适当使用新兴的数据密集型人工智能方法至关重要,并有助于提高其在合成生物学中应用的可信度。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信