Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference最新文献

Session details: Session 3: Data Science Theory 会议详情:第三部分:数据科学理论

Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference Pub Date : 2020-10-19 DOI: 10.1145/3429735

Y. Ioannidis

引用次数: 0

Ensembles of Bagged TAO Trees Consistently Improve over Random Forests, AdaBoost and Gradient Boosting 在随机森林、AdaBoost和梯度增强的基础上，袋装TAO树的整体性能不断提高

Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference Pub Date : 2020-10-19 DOI: 10.1145/3412815.3416882

M. A. Carreira-Perpiñán, Arman Zharmagambetov

{"title":"Ensembles of Bagged TAO Trees Consistently Improve over Random Forests, AdaBoost and Gradient Boosting","authors":"M. A. Carreira-Perpiñán, Arman Zharmagambetov","doi":"10.1145/3412815.3416882","DOIUrl":"https://doi.org/10.1145/3412815.3416882","url":null,"abstract":"Ensemble methods based on trees, such as Random Forests, AdaBoost and gradient boosting, are widely recognized as among the best off-the-shelf classifiers: they typically achieve state-of-the-art accuracy in many problems with little effort in tuning hyperparameters, and they are often used in applications, possibly combined with other methods such as neural nets. While many variations of forest methods exist, using different diversity mechanisms (such as bagging, feature sampling or boosting), nearly all rely on training individual trees in a highly suboptimal way using greedy top-down tree induction algorithms such as CART or C5.0. We study forests where each tree is trained on a bootstrapped or random sample but using the recently proposed tree alternating optimization (TAO), which is able to learn trees that have both fewer nodes and lower error. The better optimization of individual trees translates into forests that achieve higher accuracy but using fewer, smaller trees with oblique nodes. We demonstrate this in a range of datasets and with a careful study of the complementary effect of optimization and diversity in the construction of the forest. These bagged TAO trees improve consistently and by a considerable margin over Random Forests, AdaBoost, gradient boosting and other forest algorithms in every single dataset we tried.","PeriodicalId":176130,"journal":{"name":"Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115892464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 24

Classification Acceleration via Merging Decision Trees 通过合并决策树加速分类

Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference Pub Date : 2020-10-19 DOI: 10.1145/3412815.3416886

Chenglin Fan, P. Li

引用次数: 6

Dynamical Gaussian Process Latent Variable Model for Representation Learning from Longitudinal Data 纵向数据表示学习的动态高斯过程潜变量模型

Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference Pub Date : 2020-10-19 DOI: 10.1145/3412815.3416894

Thanh Le, Vasant G Honavar

{"title":"Dynamical Gaussian Process Latent Variable Model for Representation Learning from Longitudinal Data","authors":"Thanh Le, Vasant G Honavar","doi":"10.1145/3412815.3416894","DOIUrl":"https://doi.org/10.1145/3412815.3416894","url":null,"abstract":"Many real-world applications involve longitudinal data, consisting of observations of several variables, where different subsets of variables are sampled at irregularly spaced time points. We introduce the Longitudinal Gaussian Process Latent Variable Model (L-GPLVM), a variant of the Gaussian Process Latent Variable Model, for learning compact representations of such data. L-GPLVM overcomes a key limitation of the Dynamic Gaussian Process Latent Variable Model and its variants, which rely on the assumption that the data are fully observed over all of the sampled time points. We describe an effective approach to learning the parameters of L-GPLVM from sparse observations, by coupling the dynamical model with a Multitask Gaussian Process model for sampling of the missing observations at each step of the gradient-based optimization of the variational lower bound. We further show the advantage of the Sparse Process Convolution framework to learn the latent representation of sparsely and irregularly sampled longitudinal data with minimal computational overhead relative to a standard Latent Variable Model. We demonstrated experiments with synthetic data as well as variants of MOCAP data with varying degrees of sparsity of observations that show that L-GPLVM substantially and consistently outperforms the state-of-the-art alternatives in recovering the missing observations even when the available data exhibits a high degree of sparsity. The compact representations of irregularly sampled and sparse longitudinal data can be used to perform a variety of machine learning tasks, including clustering, classification, and regression.","PeriodicalId":176130,"journal":{"name":"Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122208824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 5

Session details: Session 2: Fairness, Privacy, Interpretability 会议详情:会议2:公平性，隐私性，可解释性

Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference Pub Date : 2020-10-19 DOI: 10.1145/3429733

Jeffrey D. Goldsmith

引用次数: 0

Incentives Needed for Low-Cost Fair Lateral Data Reuse 低成本公平横向数据重用所需的激励措施

Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference Pub Date : 2020-10-19 DOI: 10.1145/3412815.3416890

R. Maio, A. Chaintreau

{"title":"Incentives Needed for Low-Cost Fair Lateral Data Reuse","authors":"R. Maio, A. Chaintreau","doi":"10.1145/3412815.3416890","DOIUrl":"https://doi.org/10.1145/3412815.3416890","url":null,"abstract":"A central goal of algorithmic fairness is to build systems with fairness properties that compose gracefully. A major effort and step towards this goal in data science has been the development offair representations which guarantee demographic parity under sequential composition by imposing ademographic secrecy constraint. In this work, we elucidate limitations of demographically secret fair representations and propose a fresh approach to potentially overcome them by incorporating information about parties' incentives into fairness interventions. Specifically, we show that in a stylized model, it is possible to relax demographic secrecy to obtainincentive-compatible representations, where rational parties obtain exponentially greater utilities vis-à-vis any demographically secret representation and satisfy demographic parity. These substantial gains are recovered not from the well-knowncost of fairness, but rather from acost of demographic secrecy which we formalize and quantify for the first time. We further show that the sequential composition property of demographically secret representations is not robust to aggregation. Our results open several new directions for research in fair composition, fair machine learning and algorithmic fairness.","PeriodicalId":176130,"journal":{"name":"Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126369552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Semantic Scholar, NLP, and the Fight against COVID-19 语义学者、NLP和抗击COVID-19

Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference Pub Date : 2020-10-19 DOI: 10.1145/3412815.3416880

Oren Etzioni

引用次数: 0

Session details: Session 1: Methodology 会议详情:第1部分:方法论

Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference Pub Date : 2020-10-19 DOI: 10.1145/3429732

Julia Kempe

引用次数: 0

Session details: Session 4: Foundations in Practice 会议详情:第四部分:实践基础

Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference Pub Date : 2020-10-19 DOI: 10.1145/3429736

S. Ahalt

引用次数: 0

Session details: Keynote Talk I 会议详情:主题演讲1

Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference Pub Date : 2020-10-18 DOI: 10.1145/3429731

D. Madigan

引用次数: 0