High-Throughput Screening Assay Datasets from the PubChem Database.

Chemical informatics (Wilmington, Del.) Pub Date : 2017-01-01 Epub Date: 2017-04-26
Mariusz Butkiewicz, Yanli Wang, Stephen H Bryant, Edward W Lowe, David C Weaver, Jens Meiler
Volume 3(1). Full-text PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/e1/a7/nihms936862.PMC5962024.pdf

Abstract

The availability of high-throughput screening (HTS) data in the public domain offers great potential to foster the development of ligand-based computer-aided drug discovery (LB-CADD) methods, which are crucial for drug discovery efforts in academia and industry. LB-CADD method development depends on high-quality HTS assay data, i.e., datasets that contain both active and inactive compounds. The active compounds are hits from primary screens that have been tested in concentration-response experiments and whose target specificity has been validated through suitable secondary screening experiments. Publicly available HTS repositories such as PubChem often provide such data in a convoluted way: compounds classified as inactive need to be extracted from the primary screening record. However, compounds classified as active in the primary screening record are not suitable as a set of active compounds for LB-CADD experiments because of the high false-positive rate. A suitable set of actives can be derived by carefully analysing the results of the assays, often five or more, that are used to confirm and classify the activity of compounds. These assays, in part, build on each other. However, often not all hit compounds from the previous screen have been tested. Sometimes a compound is labelled 'active' in an assay even though this means it is 'inactive' on the target of interest, because it is 'active' on a different target protein. Here, a curation process for hierarchically related confirmatory screens is illustrated using two specifically chosen protein use-cases. The subsequent re-upload procedure into PubChem is described for the findings of those two scenarios. Further, we provide nine publicly accessible, high-quality datasets for future LB-CADD method development that give the scientific community a common baseline for comparing future methods. We also provide a protocol that researchers can follow to upload additional datasets for benchmarking.
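The curation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' protocol: the `AssayResult` record, the `curate` function, the compound IDs, and the rule that hits left untested in a later assay are dropped from both sets are all assumptions made for illustration. Inactives are taken from the primary screen, primary hits must be confirmed in every assay of the confirmatory chain, and compounds that are active in a counter screen against a different target are removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssayResult:
    """One compound outcome in one assay (hypothetical record layout)."""
    compound_id: int
    outcome: str  # "active" or "inactive"


def curate(primary, confirmatory_chain, counter_screens=()):
    """Return (curated_actives, curated_inactives) as sets of compound IDs.

    Assumption: a primary hit that was not tested, or not confirmed, in a
    later assay is excluded from both sets rather than relabelled.
    """
    # Inactives come straight from the primary screening record.
    inactives = {r.compound_id for r in primary if r.outcome == "inactive"}

    # Primary hits are only candidates because of the false-positive rate.
    candidates = {r.compound_id for r in primary if r.outcome == "active"}

    # Each confirmatory assay builds on the previous one.
    for assay in confirmatory_chain:
        confirmed = {r.compound_id for r in assay if r.outcome == "active"}
        candidates &= confirmed

    # "Active" in a counter screen means active on a different target,
    # so the compound is not specific to the target of interest.
    for assay in counter_screens:
        off_target = {r.compound_id for r in assay if r.outcome == "active"}
        candidates -= off_target

    return candidates, inactives


# Toy data with made-up compound IDs.
primary = [
    AssayResult(1, "active"),
    AssayResult(2, "active"),
    AssayResult(3, "active"),
    AssayResult(4, "inactive"),
    AssayResult(5, "inactive"),
]
confirmation = [AssayResult(1, "active"), AssayResult(2, "active")]  # 3 untested
counter = [AssayResult(1, "inactive"), AssayResult(2, "active")]

actives, inactives = curate(primary, [confirmation], [counter])
print(sorted(actives), sorted(inactives))
```

In this toy run, compound 3 is dropped because it was never retested, and compound 2 is dropped because it is active in the counter screen. Excluding such ambiguous compounds from both sets is one possible design choice; a real curation would follow the assay descriptions of the specific target.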
