Data-Driven Endpoint Selection in Data-Poor Scenarios: Bioassay Design for Shale Gas Flowback and Produced Waters

IF 8.8 · Zone 2 (Environmental Science and Ecology) · JCR Q1 (Engineering, Environmental)
Fei Cheng, Zhimin Zhou, Fan Wu, Huizhen Li, Zhiqiang Yu, Xiangying Zeng, and Jing You*
Environmental Science & Technology Letters 2022, 9 (12), 1074–1080. Published 2022-11-03. DOI: 10.1021/acs.estlett.2c00648
https://pubs.acs.org/doi/10.1021/acs.estlett.2c00648
Citations: 2

Abstract

Big data approaches have greatly improved scientific decision making, but they are highly dependent on the availability of data, impeding their use in data-poor scenarios. In addition to data abundance, enhancing data diversity is likewise a way to access knowledge. Herein, we propose a data-driven method for toxicity endpoint selection when directly relevant data are deficient, and shale gas exploitation sites were used as an example scenario. From the 1173 substances in the U.S. Environmental Protection Agency’s HFList, the most concerning endpoints in zebrafish embryo toxicity tests (FET) were inferred using a newly developed relational database (RDB) strategy that integrated chemical, high-throughput screening (HTS) bioactivity, genome, and FET endpoint information. This RDB strategy based on text mining and data fusion approaches enabled the integration of 255 bioactive contaminants, 955 HTS bioassays with known modes of action (MoAs), 214 gene ontologies, 65 pathways, and 27 phenotypic data and predicted measurement endpoints within 10 MoAs for shale gas pollution. This data-driven approach was further validated using zebrafish FET and transcriptomic sequencing with field-collected samples and achieved 89% and 97% accuracy for the predictive ontologies and pathways, respectively. This highlighted the applicability of RDB-based data-driven strategies for predicting toxicity endpoints from a priori knowledge of contaminants by improving data diversity.
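
The relational linking described in the abstract (chemical to HTS bioassay to gene to FET endpoint) can be illustrated with a toy database. The following is a minimal sketch, not the authors' implementation: the table names, column names, and every row are hypothetical placeholders, and ranking endpoints by the number of distinct active chemicals is an assumed scoring rule chosen only for illustration.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with three hypothetical link tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE activity (chemical TEXT, assay TEXT, active INTEGER);
        CREATE TABLE assay_gene (assay TEXT, gene TEXT);
        CREATE TABLE gene_endpoint (gene TEXT, endpoint TEXT);
        """
    )
    # Placeholder rows: which chemical was active in which HTS assay.
    conn.executemany(
        "INSERT INTO activity VALUES (?, ?, ?)",
        [
            ("chem_A", "assay_1", 1),
            ("chem_B", "assay_1", 1),
            ("chem_B", "assay_2", 1),
            ("chem_C", "assay_2", 0),  # inactive, so it should not count
        ],
    )
    # Placeholder rows: the gene target each assay reports on.
    conn.executemany(
        "INSERT INTO assay_gene VALUES (?, ?)",
        [("assay_1", "gene_x"), ("assay_2", "gene_y")],
    )
    # Placeholder rows: the phenotypic endpoint associated with each gene.
    conn.executemany(
        "INSERT INTO gene_endpoint VALUES (?, ?)",
        [("gene_x", "endpoint_p"), ("gene_y", "endpoint_q")],
    )
    return conn


def rank_endpoints(conn):
    """Rank endpoints by how many distinct active chemicals link to them."""
    query = """
        SELECT e.endpoint, COUNT(DISTINCT a.chemical) AS n_chemicals
        FROM activity AS a
        JOIN assay_gene AS g ON g.assay = a.assay
        JOIN gene_endpoint AS e ON e.gene = g.gene
        WHERE a.active = 1
        GROUP BY e.endpoint
        ORDER BY n_chemicals DESC, e.endpoint ASC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for endpoint, n_chemicals in rank_endpoints(build_toy_db()):
        print(endpoint, n_chemicals)
```

The joins are the point of the sketch: no table records a chemical-to-endpoint relationship directly, yet chaining the three link tables yields a ranked endpoint list. This mirrors, in miniature, the idea of substituting data diversity for missing direct data.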

Source journal
Environmental Science & Technology Letters
Categories: Engineering, Environmental; Environmental Sciences
CiteScore: 17.90
Self-citation rate: 3.70%
Articles published: 163
About the journal: Environmental Science & Technology Letters serves as an international forum for brief communications on experimental or theoretical results of exceptional timeliness in all aspects of environmental science, both pure and applied. Published as soon as accepted, these communications are summarized in monthly issues. Additionally, the journal features short reviews on emerging topics in environmental science and technology.