Tong Xu, Armeen Taeb, Simge Küçükyavuz, Ali Shojaie
{"title":"从不可识别高斯模型中学习有向无环图的整数规划。","authors":"Tong Xu, Armeen Taeb, Simge Küçükyavuz, Ali Shojaie","doi":"10.1093/biomet/asaf032","DOIUrl":null,"url":null,"abstract":"<p><p>We study the problem of learning directed acyclic graphs from continuous observational data, generated according to a linear Gaussian structural equation model. State-of-the-art structure learning methods for this setting have at least one of the following shortcomings: (i) they cannot provide optimality guarantees and can suffer from learning suboptimal models; (ii) they rely on the stringent assumption that the noise is homoscedastic, and hence the underlying model is fully identifiable. We overcome these shortcomings and develop a computationally efficient mixed-integer programming framework for learning medium-sized problems that accounts for arbitrary heteroscedastic noise. We present an early stopping criterion under which we can terminate the branch-and-bound procedure to achieve an asymptotically optimal solution and establish the consistency of this approximate solution. In addition, we show via numerical experiments that our method outperforms state-of-the-art algorithms and is robust to noise heteroscedasticity, whereas the performance of some competing methods deteriorates under strong violations of the identifiability assumption. The software implementation of our method is available as the Python package micodag.</p>","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"112 3","pages":"asaf032"},"PeriodicalIF":2.8000,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12368277/pdf/","citationCount":"0","resultStr":"{\"title\":\"Integer programming for learning directed acyclic graphs from nonidentifiable Gaussian models.\",\"authors\":\"Tong Xu, Armeen Taeb, Simge Küçükyavuz, Ali Shojaie\",\"doi\":\"10.1093/biomet/asaf032\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>We study the problem of learning directed acyclic graphs from continuous observational data, generated according to a linear Gaussian structural equation model. State-of-the-art structure learning methods for this setting have at least one of the following shortcomings: (i) they cannot provide optimality guarantees and can suffer from learning suboptimal models; (ii) they rely on the stringent assumption that the noise is homoscedastic, and hence the underlying model is fully identifiable. We overcome these shortcomings and develop a computationally efficient mixed-integer programming framework for learning medium-sized problems that accounts for arbitrary heteroscedastic noise. We present an early stopping criterion under which we can terminate the branch-and-bound procedure to achieve an asymptotically optimal solution and establish the consistency of this approximate solution. In addition, we show via numerical experiments that our method outperforms state-of-the-art algorithms and is robust to noise heteroscedasticity, whereas the performance of some competing methods deteriorates under strong violations of the identifiability assumption. The software implementation of our method is available as the Python package micodag.</p>\",\"PeriodicalId\":9001,\"journal\":{\"name\":\"Biometrika\",\"volume\":\"112 3\",\"pages\":\"asaf032\"},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2025-04-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12368277/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Biometrika\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1093/biomet/asaf032\",\"RegionNum\":2,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biometrika","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1093/biomet/asaf032","RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"BIOLOGY","Score":null,"Total":0}
Integer programming for learning directed acyclic graphs from nonidentifiable Gaussian models.
We study the problem of learning directed acyclic graphs from continuous observational data, generated according to a linear Gaussian structural equation model. State-of-the-art structure learning methods for this setting have at least one of the following shortcomings: (i) they cannot provide optimality guarantees and can suffer from learning suboptimal models; (ii) they rely on the stringent assumption that the noise is homoscedastic, and hence the underlying model is fully identifiable. We overcome these shortcomings and develop a computationally efficient mixed-integer programming framework for learning medium-sized problems that accounts for arbitrary heteroscedastic noise. We present an early stopping criterion under which we can terminate the branch-and-bound procedure to achieve an asymptotically optimal solution and establish the consistency of this approximate solution. In addition, we show via numerical experiments that our method outperforms state-of-the-art algorithms and is robust to noise heteroscedasticity, whereas the performance of some competing methods deteriorates under strong violations of the identifiability assumption. The software implementation of our method is available as the Python package micodag.
期刊介绍:
Biometrika is primarily a journal of statistics in which emphasis is placed on papers containing original theoretical contributions of direct or potential value in applications. From time to time, papers in bordering fields are also published.