Data-Driven Endpoint Selection in Data-Poor Scenarios: Bioassay Design for Shale Gas Flowback and Produced Waters

IF 8.8; Zone 2, Environmental Science and Ecology; JCR Q1, ENGINEERING, ENVIRONMENTAL

Environmental Science & Technology Letters, 9 (12), 1074–1080. Pub Date: 2022-11-03. DOI: 10.1021/acs.estlett.2c00648

Fei Cheng, Zhimin Zhou, Fan Wu, Huizhen Li, Zhiqiang Yu, Xiangying Zeng, and Jing You*


Citations: 2

Abstract

Big data approaches have greatly improved scientific decision making, but they are highly dependent on the availability of data, impeding their use in data-poor scenarios. In addition to data abundance, enhancing data diversity is likewise a way to access knowledge. Herein, we propose a data-driven method for toxicity endpoint selection when directly relevant data are deficient, and shale gas exploitation sites were used as an example scenario. From the 1173 substances in the U.S. Environmental Protection Agency’s HFList, the most concerning endpoints in zebrafish embryo toxicity tests (FET) were inferred using a newly developed relational database (RDB) strategy that integrated chemical, high-throughput screening (HTS) bioactivity, genome, and FET endpoint information. This RDB strategy based on text mining and data fusion approaches enabled the integration of 255 bioactive contaminants, 955 HTS bioassays with known modes of action (MoAs), 214 gene ontologies, 65 pathways, and 27 phenotypic data and predicted measurement endpoints within 10 MoAs for shale gas pollution. This data-driven approach was further validated using zebrafish FET and transcriptomic sequencing with field-collected samples and achieved 89% and 97% accuracy for the predictive ontologies and pathways, respectively. This highlighted the applicability of RDB-based data-driven strategies for predicting toxicity endpoints from a priori knowledge of contaminants by improving data diversity.
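As a rough illustration of the relational-database idea described in the abstract, the sketch below links chemicals to active HTS assays, assays to genes and modes of action, and genes to FET endpoints, then ranks endpoints by how many distinct chemicals point to them. This is not the authors' implementation: the table names, columns, the ranking rule, and all rows are hypothetical toy values chosen only to show the join logic.

```python
import sqlite3


def build_toy_rdb():
    """Create an in-memory database with a hypothetical, simplified schema."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE chemical (chem_id TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE assay (assay_id TEXT PRIMARY KEY, gene TEXT, moa TEXT);
        CREATE TABLE assay_hit (chem_id TEXT, assay_id TEXT);
        CREATE TABLE gene_endpoint (gene TEXT, endpoint TEXT);
        """
    )
    # Toy rows only; none of these values come from the paper.
    con.executemany(
        "INSERT INTO chemical VALUES (?, ?)",
        [("C1", "chem A"), ("C2", "chem B"), ("C3", "chem C")],
    )
    con.executemany(
        "INSERT INTO assay VALUES (?, ?, ?)",
        [("A1", "gene_x", "moa_1"), ("A2", "gene_y", "moa_2"), ("A3", "gene_x", "moa_1")],
    )
    con.executemany(
        "INSERT INTO assay_hit VALUES (?, ?)",
        [("C1", "A1"), ("C2", "A1"), ("C2", "A2"), ("C3", "A3")],
    )
    con.executemany(
        "INSERT INTO gene_endpoint VALUES (?, ?)",
        [("gene_x", "endpoint_p"), ("gene_x", "endpoint_q"), ("gene_y", "endpoint_q")],
    )
    return con


def rank_endpoints(con):
    """Rank (MoA, endpoint) pairs by the number of distinct active chemicals."""
    return con.execute(
        """
        SELECT a.moa, ge.endpoint, COUNT(DISTINCT c.chem_id) AS n_chem
        FROM chemical c
        JOIN assay_hit h ON h.chem_id = c.chem_id
        JOIN assay a ON a.assay_id = h.assay_id
        JOIN gene_endpoint ge ON ge.gene = a.gene
        GROUP BY a.moa, ge.endpoint
        ORDER BY n_chem DESC, a.moa, ge.endpoint
        """
    ).fetchall()


if __name__ == "__main__":
    for moa, endpoint, n_chem in rank_endpoints(build_toy_rdb()):
        print(moa, endpoint, n_chem)
```

Counting distinct chemicals per endpoint is only one possible prioritization rule, assumed here for simplicity; the abstract does not state how endpoints were scored.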

Source journal

Environmental Science & Technology Letters (ENGINEERING, ENVIRONMENTAL; ENVIRONMENTAL SCIENCES)

CiteScore: 17.90

Self-citation rate: 3.70%

Articles published: 163

About the journal: Environmental Science & Technology Letters serves as an international forum for brief communications on experimental or theoretical results of exceptional timeliness in all aspects of environmental science, both pure and applied. Published as soon as accepted, these communications are summarized in monthly issues. Additionally, the journal features short reviews on emerging topics in environmental science and technology.