On Model Discovery For Hosted Data Science Projects

Hui Miao, Ang Li, L. Davis, A. Deshpande
Proceedings of the 1st Workshop on Data Management for End-to-End Machine Learning, published 2017-05-14.
DOI: 10.1145/3076246.3076252 (https://doi.org/10.1145/3076246.3076252)
Citations: 20

Abstract

Alongside the development of systems for scalable machine learning and collaborative data science, there is a growing trend toward publicly shared data science projects hosted on general-purpose or dedicated services such as GitHub and DataHub. The artifacts of these hosted projects are rich: they include not only text files but also versioned datasets, trained models, project documents, etc. Given the fast pace and expectations of data science work, model discovery, i.e., finding relevant data science projects to reuse, is an important task in data management for end-to-end machine learning. In this paper, we study this task and present ongoing work on ModelHub Discovery, a system for finding relevant models in hosted data science projects. Instead of prescribing a structured data model for data science projects, we take an information retrieval approach and decompose the discovery task into three major steps: project query and matching, model comparison and ranking, and processing and building ensembles with the returned models. We describe the motivation and desiderata, propose techniques, and present opportunities and challenges for model discovery in hosted data science projects.
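
As a rough illustration only, the three steps named in the abstract could be sketched as below. Nothing here is taken from ModelHub Discovery itself: the Project record, the term-overlap matching, the accuracy-based ranking, the majority-vote ensemble, and the toy project data are all assumptions chosen to keep the example short.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Project:
    # Hypothetical record for a hosted project; not a ModelHub schema.
    name: str
    description: str
    accuracy: float                  # assumed self-reported validation accuracy
    predict: Callable[[float], int]  # toy model: one feature in, one label out


def match_score(query: str, project: Project) -> int:
    """Step 1 (query and matching): count distinct query terms found in the description."""
    doc_terms = Counter(project.description.lower().split())
    return sum(doc_terms[term] > 0 for term in set(query.lower().split()))


def discover(query: str, projects: List[Project], k: int = 3) -> List[Project]:
    """Steps 1-2: keep matching projects, then rank by match score and accuracy."""
    hits = [(match_score(query, p), p) for p in projects]
    hits = [(score, p) for score, p in hits if score > 0]
    hits.sort(key=lambda pair: (pair[0], pair[1].accuracy), reverse=True)
    return [p for _, p in hits[:k]]


def ensemble_predict(models: List[Project], x: float) -> int:
    """Step 3 (ensemble): majority vote over the returned models."""
    votes = Counter(m.predict(x) for m in models)
    return votes.most_common(1)[0][0]


# Made-up projects with threshold "models", purely for illustration.
projects = [
    Project("digits-cnn", "convolutional network for handwritten digit classification",
            0.99, lambda x: int(x > 0.5)),
    Project("digits-svm", "svm baseline for digit classification",
            0.95, lambda x: int(x > 0.4)),
    Project("sentiment-lstm", "lstm for movie review sentiment",
            0.88, lambda x: int(x > 0.9)),
    Project("digits-knn", "nearest neighbour digit classification",
            0.97, lambda x: int(x > 0.7)),
]

top = discover("digit classification", projects)
print([p.name for p in top])       # ranked matches
print(ensemble_predict(top, 0.6))  # majority label for one input
```

A real system would presumably replace the term-overlap score with a proper retrieval model and compare models on shared evaluation data rather than self-reported numbers; the sketch only shows how the three stages hand results to one another.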