High-bypass Learning: Automated Detection of Tumor Cells That Significantly Impact Drug Response

IF 65.3 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
J. Wozniak, H. Yoo, J. Mohd-Yusof, Bogdan Nicolae, Nicholson T. Collier, J. Ozik, T. Brettin, Rick L. Stevens
DOI: 10.1109/MLHPCAI4S51975.2020.00012 (https://doi.org/10.1109/MLHPCAI4S51975.2020.00012)
Journal: Foundations and Trends in Machine Learning, volume 82 1, pages 1-10
Publication date: 2020-11-01
Publication type: Journal Article
Citations: 6

Abstract

Machine learning in biomedicine is reliant on the availability of large, high-quality data sets. These corpora are used for training statistical or deep learning-based models that can be validated against other data sets and ultimately used to guide decisions. The quality of these data sets is an essential component of the quality of the models and their decisions. Thus, identifying and inspecting outlier data is critical for evaluating, curating, and using biomedical data sets. Many techniques are available to look for outlier data, but it is not clear how to evaluate the impact on highly complex deep learning methods. In this paper, we use deep learning ensembles and workflows to construct a system for automatically identifying data subsets that have a large impact on the trained models. These effects can be quantified and presented to the user for further inspection, which could improve data quality overall. We then present results from running this method on the near-exascale Summit supercomputer.
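
The idea of quantifying how much a data subset affects a trained model can be illustrated with a small hold-one-group-out experiment. The sketch below is not the authors' system: it swaps their deep learning ensembles for a plain least-squares fit so it runs in seconds, and the names `fit_linear`, `mse`, `subset_impact`, the synthetic data, and the deliberately corrupted group 3 are all assumptions made for illustration. It retrains once per held-out group and reports how the validation error changes.

```python
import numpy as np

def fit_linear(X, y):
    # Least-squares stand-in for "training a model" (assumption: the paper
    # trains deep networks; a linear fit keeps this sketch fast).
    Xb = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def mse(w, X, y):
    # Mean squared error of the fitted weights on (X, y).
    Xb = np.column_stack([X, np.ones(len(X))])
    return float(np.mean((Xb @ w - y) ** 2))

def subset_impact(X, y, groups, X_val, y_val):
    # For each group, retrain without it and record how much the
    # validation error drops relative to training on everything.
    baseline = mse(fit_linear(X, y), X_val, y_val)
    impact = {}
    for g in np.unique(groups):
        keep = groups != g
        held_out_err = mse(fit_linear(X[keep], y[keep]), X_val, y_val)
        # Positive value: removing group g improved validation error.
        impact[int(g)] = baseline - held_out_err
    return impact

# Synthetic data (hypothetical): 5 groups of 40 samples, 3 features.
rng = np.random.default_rng(0)
n_groups, per_group = 5, 40
groups = np.repeat(np.arange(n_groups), per_group)
X = rng.normal(size=(n_groups * per_group, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=len(X))

# Deliberately corrupt the labels of group 3 so it acts as an outlier subset.
y[groups == 3] += 5.0

# Clean validation set drawn from the same underlying relationship.
X_val = rng.normal(size=(200, 3))
y_val = X_val @ true_w + rng.normal(scale=0.1, size=200)

impact = subset_impact(X, y, groups, X_val, y_val)
most_impactful = max(impact, key=lambda g: abs(impact[g]))

for g, delta in sorted(impact.items()):
    print(f"group {g}: change in validation MSE when held out = {delta:+.3f}")
print("most impactful group:", most_impactful)
```

In this toy setting the corrupted group is the one whose removal changes the validation error most, which is the kind of signal that could then be presented to a user for inspection. At the scale described in the abstract, each retraining would be an independent job in a workflow rather than an iteration of a loop.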
Source journal: Foundations and Trends in Machine Learning (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
CiteScore: 108.50
Self-citation rate: 0.00%
Articles published: 5
Journal description: Each issue of Foundations and Trends® in Machine Learning comprises a monograph of at least 50 pages written by research leaders in the field. We aim to publish monographs that provide an in-depth, self-contained treatment of topics where there have been significant new developments. Typically, this means that the monographs we publish will contain a significant level of mathematical detail (to describe the central methods and/or theory for the topic at hand), and will not eschew these details by simply pointing to existing references. Literature surveys and original research papers do not fall within these aims.