Large-Scale Validation of Hypothesis Generation Systems via Candidate Ranking.

Justin Sybrandt, Michael Shtutman, Ilya Safro
Published in: Proceedings : ... IEEE International Conference on Big Data, pp. 1494-1503
Publication date: 2018-12-01 (Epub 2019-01-24)
DOI: 10.1109/bigdata.2018.8622637 (https://doi.org/10.1109/bigdata.2018.8622637)
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9248026/pdf/nihms-1819102.pdf

Abstract


The first step of many research projects is to define and rank a short list of candidates for study. Given the rapid pace of modern scientific progress, some researchers turn to automated hypothesis generation (HG) systems to aid this process. These systems can identify implicit or overlooked connections within a large scientific corpus, and while their importance grows alongside the pace of science, they lack thorough validation. Without any standard numerical evaluation method, many validate general-purpose HG systems by rediscovering a handful of historical findings, and some wishing to be more thorough run laboratory experiments based on automatic suggestions. These methods are expensive, time-consuming, and do not scale. Thus, we present a numerical evaluation framework for validating HG systems that leverages thousands of validation hypotheses. This method evaluates an HG system by its ability to rank hypotheses by plausibility, a process reminiscent of human candidate selection. Because HG systems, specifically those that produce topic models, do not produce a ranking criterion, we additionally present novel metrics to quantify the plausibility of hypotheses given topic model system output. Finally, we demonstrate that our proposed validation method aligns with real-world research goals by deploying it within MOLIERE, our recent topic-driven HG system, to automatically generate a set of candidate genes related to HIV-associated neurodegenerative disease (HAND). By performing laboratory experiments based on this candidate set, we discover a new connection between HAND and Dead Box RNA Helicase 3 (DDX3).

Reproducibility: code, validation data, and results can be found at sybrandt.com/2018/validation.
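
The idea of evaluating an HG system by how well it ranks hypotheses can be illustrated with a small sketch. This is not the authors' implementation, and the abstract does not say which ranking statistic the paper uses. The sketch assumes that each validation hypothesis carries a plausibility score from the HG system and a label saying whether it is treated as true, and it summarizes ranking quality as the area under the ROC curve (the fraction of positive/negative pairs ordered correctly). The function name `ranking_auc` and the example scores are hypothetical.

```python
from typing import Sequence


def ranking_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Fraction of (positive, negative) hypothesis pairs in which the
    positive hypothesis receives the higher score; ties count as half.
    This equals the area under the ROC curve."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative hypothesis")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # pair ranked correctly
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(positives) * len(negatives))


# Hypothetical plausibility scores (higher = more plausible) and labels.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.1]
labels = [True, True, True, False, False, False]
auc = ranking_auc(scores, labels)
```

A value near 1.0 means the system ranks true hypotheses above false ones almost always, while 0.5 is what random scoring would give. The pairwise loop is quadratic and is meant for clarity; with thousands of validation hypotheses a sort-based computation would be preferable.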
