Three Principles of Data Science: Predictability, Stability and Computability

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Pub Date: 2017-08-04. DOI: 10.1145/3097983.3105808

Bin Yu

Citations: 8

Abstract

In this talk, I'd like to discuss the intertwined importance of, and connections among, the three principles of data science named in the title, as they bear on data-driven decisions. Making prediction its central task and embracing computation as its core, machine learning has enabled wide-ranging data-driven successes. Prediction is a useful way to check against reality. Good prediction implicitly assumes stability between past and future. Stability (relative to data and model perturbations) is also a minimum requirement for interpretability and reproducibility of data-driven results (cf. Yu, 2013). It is closely related to uncertainty assessment. Obviously, neither the prediction principle nor the stability principle can be employed without feasible computational algorithms, hence the importance of computability. The three principles will be demonstrated in the context of two neuroscience projects and through analytical connections. In particular, the first project adds stability to the predictive modeling used for reconstruction of movies from fMRI brain signals, in order to obtain interpretable models. The second project uses predictive transfer learning that combines AlexNet, GoogLeNet and VGG with single V4 neuron data for state-of-the-art prediction performance. Our results lend support, to a certain extent, to the resemblance of these CNNs to the brain and at the same time provide stable pattern interpretations of neurons in the difficult primate visual cortex V4.
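
The abstract does not say how stability is measured. As a rough illustration only, one common reading of "stability relative to data perturbations" is to refit a model on bootstrap resamples and look at how much the fitted quantity moves. The sketch below does this for a one-feature least-squares slope. The synthetic data, the function names `fit_slope` and `bootstrap_slopes`, and the choice of the bootstrap as the perturbation are all assumptions made for this example, not the method used in the talk.

```python
import random
import statistics


def fit_slope(xs, ys):
    """Ordinary least-squares slope of y on x (single feature)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slopes(xs, ys, n_boot=200, seed=0):
    """Refit the slope on bootstrap resamples (one kind of data perturbation)."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        # Draw n indices with replacement to form one perturbed dataset.
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(fit_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    return slopes


# Synthetic data (an assumption for illustration): y = 2x + Gaussian noise.
data_rng = random.Random(1)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + data_rng.gauss(0.0, 1.0) for x in xs]

slopes = bootstrap_slopes(xs, ys)
spread = statistics.stdev(slopes)  # small spread = stable under resampling
print(f"mean slope {statistics.fmean(slopes):.2f}, spread {spread:.3f}")
```

A small spread across resamples suggests the fitted slope is stable under this perturbation, and the same spread doubles as a rough uncertainty estimate, which is one way to see the link to uncertainty assessment mentioned above. The cost of the procedure is `n_boot` refits, so it is only usable when a single fit is cheap enough, which is where computability enters.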
