Scientific Discovery and Rigor with ML

2020 Third International Conference on Multimedia Processing, Communication & Information Technology (MPCIT). Pub Date: 2020-12-11. DOI: 10.1109/MPCIT51588.2020.9350455

Abdul-Gafoor Mohamed, P. Mahanta


Citations: 0

Abstract

Data Management scenarios augmented by scientific discovery and rigor are visibly evolving across the industry, judging by the attention analysts and others have paid to them over the past couple of years. Machine Learning plays a substantial part in simplifying enterprise data landscapes and contributes to many aspects of Data Management. We see value in focusing on the Data Discovery and Data Quality aspects in this context, because enterprises today have complex landscapes: the average enterprise uses more than five Cloud storage services in addition to its on-premises data sources. A growing affinity for enterprise-grade Machine Learning has created a significant pull on system design, leading platforms toward capabilities such as standard APIs for scaled-database queries and integration scenarios. This paper explores the integration of Machine Learning tools and customized libraries with any Cloud Platform to enhance the stakeholders' experience with Analytics. Conceptually, we propose a hypothesis for scaling an existing platform to a community-based approach that enables sharing of experimental iterations, ideally translating into industry-specific solutions that remain highly reusable. The intent is to offer a data model flexible enough to handle diverse data scenarios and to evaluate a confidence score for each of them. It should enable reproducible, shared experiments with consistently evaluated scores, thereby easing the integration process through automated guidance. The paper also touches on the good practices and architectural recommendations to consider for general Machine Learning applications.
