{"title":"基于拓扑先验的不完全图数据的转移矩阵的贝叶斯推理。","authors":"Vincenzo Perri, Luka V Petrović, Ingo Scholtes","doi":"10.1140/epjds/s13688-023-00416-3","DOIUrl":null,"url":null,"abstract":"<p><p>Many network analysis and graph learning techniques are based on discrete- or continuous-time models of random walks. To apply these methods, it is necessary to infer transition matrices that formalize the underlying stochastic process in an observed graph. For weighted graphs, where weighted edges capture observations of repeated interactions between nodes, it is common to estimate the entries of such transition matrices based on the (relative) weights of edges. However in real-world settings we are often confronted with incomplete data, which turns the construction of the transition matrix based on a weighted graph into an <i>inference problem</i>. Moreover, we often have access to additional information, which capture topological constraints of the system, i.e. which edges in a weighted graph are (theoretically) possible and which are not. Examples include transportation networks, where we may have access to a small sample of passenger trajectories as well as the physical topology of connections, or a limited set of observed social interactions with additional information on the underlying social structure. Combining these two different sources of information to reliably infer transition matrices from incomplete data on repeated interactions is an important open challenge, with severe implications for the reliability of downstream network analysis tasks. Addressing this issue, we show that including knowledge on such topological constraints can considerably improve the inference of transition matrices, especially in situations where we only have a small number of observed interactions. To this end, we derive an analytically tractable Bayesian method that uses repeated interactions and a topological prior to perform data-efficient inference of transition matrices. We compare our approach against commonly used frequentist and Bayesian approaches both in synthetic data and in five real-world datasets, and we find that our method recovers the transition probabilities with higher accuracy. Furthermore, we demonstrate that the method is robust even in cases when the knowledge of the topological constraint is partial. Lastly, we show that this higher accuracy improves the results for downstream network analysis tasks like cluster detection and node ranking, which highlights the practical relevance of our method for interdisciplinary data-driven analyses of networked systems.</p>","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"12 1","pages":"48"},"PeriodicalIF":3.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10567898/pdf/","citationCount":"0","resultStr":"{\"title\":\"Bayesian inference of transition matrices from incomplete graph data with a topological prior.\",\"authors\":\"Vincenzo Perri, Luka V Petrović, Ingo Scholtes\",\"doi\":\"10.1140/epjds/s13688-023-00416-3\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Many network analysis and graph learning techniques are based on discrete- or continuous-time models of random walks. To apply these methods, it is necessary to infer transition matrices that formalize the underlying stochastic process in an observed graph. For weighted graphs, where weighted edges capture observations of repeated interactions between nodes, it is common to estimate the entries of such transition matrices based on the (relative) weights of edges. However in real-world settings we are often confronted with incomplete data, which turns the construction of the transition matrix based on a weighted graph into an <i>inference problem</i>. Moreover, we often have access to additional information, which capture topological constraints of the system, i.e. which edges in a weighted graph are (theoretically) possible and which are not. Examples include transportation networks, where we may have access to a small sample of passenger trajectories as well as the physical topology of connections, or a limited set of observed social interactions with additional information on the underlying social structure. Combining these two different sources of information to reliably infer transition matrices from incomplete data on repeated interactions is an important open challenge, with severe implications for the reliability of downstream network analysis tasks. Addressing this issue, we show that including knowledge on such topological constraints can considerably improve the inference of transition matrices, especially in situations where we only have a small number of observed interactions. To this end, we derive an analytically tractable Bayesian method that uses repeated interactions and a topological prior to perform data-efficient inference of transition matrices. We compare our approach against commonly used frequentist and Bayesian approaches both in synthetic data and in five real-world datasets, and we find that our method recovers the transition probabilities with higher accuracy. Furthermore, we demonstrate that the method is robust even in cases when the knowledge of the topological constraint is partial. Lastly, we show that this higher accuracy improves the results for downstream network analysis tasks like cluster detection and node ranking, which highlights the practical relevance of our method for interdisciplinary data-driven analyses of networked systems.</p>\",\"PeriodicalId\":11887,\"journal\":{\"name\":\"EPJ Data Science\",\"volume\":\"12 1\",\"pages\":\"48\"},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10567898/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"EPJ Data Science\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1140/epjds/s13688-023-00416-3\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2023/10/11 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"MATHEMATICS, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"EPJ Data Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1140/epjds/s13688-023-00416-3","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/10/11 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"MATHEMATICS, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Bayesian inference of transition matrices from incomplete graph data with a topological prior.
Many network analysis and graph learning techniques are based on discrete- or continuous-time models of random walks. To apply these methods, it is necessary to infer transition matrices that formalize the underlying stochastic process in an observed graph. For weighted graphs, where weighted edges capture observations of repeated interactions between nodes, it is common to estimate the entries of such transition matrices based on the (relative) weights of edges. However in real-world settings we are often confronted with incomplete data, which turns the construction of the transition matrix based on a weighted graph into an inference problem. Moreover, we often have access to additional information, which capture topological constraints of the system, i.e. which edges in a weighted graph are (theoretically) possible and which are not. Examples include transportation networks, where we may have access to a small sample of passenger trajectories as well as the physical topology of connections, or a limited set of observed social interactions with additional information on the underlying social structure. Combining these two different sources of information to reliably infer transition matrices from incomplete data on repeated interactions is an important open challenge, with severe implications for the reliability of downstream network analysis tasks. Addressing this issue, we show that including knowledge on such topological constraints can considerably improve the inference of transition matrices, especially in situations where we only have a small number of observed interactions. To this end, we derive an analytically tractable Bayesian method that uses repeated interactions and a topological prior to perform data-efficient inference of transition matrices. We compare our approach against commonly used frequentist and Bayesian approaches both in synthetic data and in five real-world datasets, and we find that our method recovers the transition probabilities with higher accuracy. Furthermore, we demonstrate that the method is robust even in cases when the knowledge of the topological constraint is partial. Lastly, we show that this higher accuracy improves the results for downstream network analysis tasks like cluster detection and node ranking, which highlights the practical relevance of our method for interdisciplinary data-driven analyses of networked systems.
期刊介绍:
EPJ Data Science covers a broad range of research areas and applications and particularly encourages contributions from techno-socio-economic systems, where it comprises those research lines that now regard the digital “tracks” of human beings as first-order objects for scientific investigation. Topics include, but are not limited to, human behavior, social interaction (including animal societies), economic and financial systems, management and business networks, socio-technical infrastructure, health and environmental systems, the science of science, as well as general risk and crisis scenario forecasting up to and including policy advice.