基于拓扑先验的不完全图数据的转移矩阵的贝叶斯推理。

IF 3 2区计算机科学 Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

EPJ Data Science Pub Date : 2023-01-01 Epub Date: 2023-10-11 DOI:10.1140/epjds/s13688-023-00416-3

Vincenzo Perri, Luka V Petrović, Ingo Scholtes

{"title":"基于拓扑先验的不完全图数据的转移矩阵的贝叶斯推理。","authors":"Vincenzo Perri, Luka V Petrović, Ingo Scholtes","doi":"10.1140/epjds/s13688-023-00416-3","DOIUrl":null,"url":null,"abstract":"Many network analysis and graph learning techniques are based on discrete- or continuous-time models of random walks. To apply these methods, it is necessary to infer transition matrices that formalize the underlying stochastic process in an observed graph. For weighted graphs, where weighted edges capture observations of repeated interactions between nodes, it is common to estimate the entries of such transition matrices based on the (relative) weights of edges. However in real-world settings we are often confronted with incomplete data, which turns the construction of the transition matrix based on a weighted graph into an inference problem. Moreover, we often have access to additional information, which capture topological constraints of the system, i.e. which edges in a weighted graph are (theoretically) possible and which are not. Examples include transportation networks, where we may have access to a small sample of passenger trajectories as well as the physical topology of connections, or a limited set of observed social interactions with additional information on the underlying social structure. Combining these two different sources of information to reliably infer transition matrices from incomplete data on repeated interactions is an important open challenge, with severe implications for the reliability of downstream network analysis tasks. Addressing this issue, we show that including knowledge on such topological constraints can considerably improve the inference of transition matrices, especially in situations where we only have a small number of observed interactions. To this end, we derive an analytically tractable Bayesian method that uses repeated interactions and a topological prior to perform data-efficient inference of transition matrices. We compare our approach against commonly used frequentist and Bayesian approaches both in synthetic data and in five real-world datasets, and we find that our method recovers the transition probabilities with higher accuracy. Furthermore, we demonstrate that the method is robust even in cases when the knowledge of the topological constraint is partial. Lastly, we show that this higher accuracy improves the results for downstream network analysis tasks like cluster detection and node ranking, which highlights the practical relevance of our method for interdisciplinary data-driven analyses of networked systems.","PeriodicalId":11887,"journal":{"name":"EPJ Data Science","volume":"12 1","pages":"48"},"PeriodicalIF":3.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10567898/pdf/","citationCount":"0","resultStr":"{\"title\":\"Bayesian inference of transition matrices from incomplete graph data with a topological prior.\",\"authors\":\"Vincenzo Perri, Luka V Petrović, Ingo Scholtes\",\"doi\":\"10.1140/epjds/s13688-023-00416-3\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Many network analysis and graph learning techniques are based on discrete- or continuous-time models of random walks. To apply these methods, it is necessary to infer transition matrices that formalize the underlying stochastic process in an observed graph. For weighted graphs, where weighted edges capture observations of repeated interactions between nodes, it is common to estimate the entries of such transition matrices based on the (relative) weights of edges. However in real-world settings we are often confronted with incomplete data, which turns the construction of the transition matrix based on a weighted graph into an inference problem. Moreover, we often have access to additional information, which capture topological constraints of the system, i.e. which edges in a weighted graph are (theoretically) possible and which are not. Examples include transportation networks, where we may have access to a small sample of passenger trajectories as well as the physical topology of connections, or a limited set of observed social interactions with additional information on the underlying social structure. Combining these two different sources of information to reliably infer transition matrices from incomplete data on repeated interactions is an important open challenge, with severe implications for the reliability of downstream network analysis tasks. Addressing this issue, we show that including knowledge on such topological constraints can considerably improve the inference of transition matrices, especially in situations where we only have a small number of observed interactions. To this end, we derive an analytically tractable Bayesian method that uses repeated interactions and a topological prior to perform data-efficient inference of transition matrices. We compare our approach against commonly used frequentist and Bayesian approaches both in synthetic data and in five real-world datasets, and we find that our method recovers the transition probabilities with higher accuracy. Furthermore, we demonstrate that the method is robust even in cases when the knowledge of the topological constraint is partial. Lastly, we show that this higher accuracy improves the results for downstream network analysis tasks like cluster detection and node ranking, which highlights the practical relevance of our method for interdisciplinary data-driven analyses of networked systems.\",\"PeriodicalId\":11887,\"journal\":{\"name\":\"EPJ Data Science\",\"volume\":\"12 1\",\"pages\":\"48\"},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10567898/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"EPJ Data Science\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1140/epjds/s13688-023-00416-3\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2023/10/11 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"MATHEMATICS, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"EPJ Data Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1140/epjds/s13688-023-00416-3","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/10/11 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"MATHEMATICS, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}

引用次数: 0

摘要

许多网络分析和图学习技术都是基于随机行走的离散或连续时间模型。为了应用这些方法，有必要推断转移矩阵，该矩阵形式化了观测图中潜在的随机过程。对于加权图，其中加权边捕获节点之间重复交互的观察结果，通常基于边的（相对）权重来估计这种转移矩阵的条目。然而，在现实世界中，我们经常遇到不完整的数据，这将基于加权图的转换矩阵的构建变成了一个推理问题。此外，我们经常可以访问额外的信息，这些信息捕捉系统的拓扑约束，即加权图中的哪些边（理论上）是可能的，哪些不可能。例子包括交通网络，在交通网络中，我们可能可以访问一小部分乘客轨迹样本以及连接的物理拓扑，或者一组有限的观察到的社会互动，以及关于潜在社会结构的额外信息。将这两种不同的信息源结合起来，从重复交互的不完整数据中可靠地推断转换矩阵是一个重要的开放挑战，对下游网络分析任务的可靠性有着严重的影响。针对这个问题，我们表明，包括关于这种拓扑约束的知识可以大大改进转换矩阵的推断，特别是在我们只有少量观察到的相互作用的情况下。为此，我们推导了一种可分析处理的贝叶斯方法，该方法使用重复相互作用和拓扑先验来执行转换矩阵的数据高效推理。我们将我们的方法与合成数据和五个真实世界数据集中常用的频率论和贝叶斯方法进行了比较，发现我们的方法以更高的精度恢复了转换概率。此外，我们证明了即使在拓扑约束的知识是部分的情况下，该方法也是鲁棒的。最后，我们表明，这种更高的精度提高了下游网络分析任务（如聚类检测和节点排序）的结果，这突出了我们的方法在网络系统跨学科数据驱动分析中的实际相关性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

Bayesian inference of transition matrices from incomplete graph data with a topological prior.

查看原文本刊更多论文

Bayesian inference of transition matrices from incomplete graph data with a topological prior.

Many network analysis and graph learning techniques are based on discrete- or continuous-time models of random walks. To apply these methods, it is necessary to infer transition matrices that formalize the underlying stochastic process in an observed graph. For weighted graphs, where weighted edges capture observations of repeated interactions between nodes, it is common to estimate the entries of such transition matrices based on the (relative) weights of edges. However in real-world settings we are often confronted with incomplete data, which turns the construction of the transition matrix based on a weighted graph into an inference problem. Moreover, we often have access to additional information, which capture topological constraints of the system, i.e. which edges in a weighted graph are (theoretically) possible and which are not. Examples include transportation networks, where we may have access to a small sample of passenger trajectories as well as the physical topology of connections, or a limited set of observed social interactions with additional information on the underlying social structure. Combining these two different sources of information to reliably infer transition matrices from incomplete data on repeated interactions is an important open challenge, with severe implications for the reliability of downstream network analysis tasks. Addressing this issue, we show that including knowledge on such topological constraints can considerably improve the inference of transition matrices, especially in situations where we only have a small number of observed interactions. To this end, we derive an analytically tractable Bayesian method that uses repeated interactions and a topological prior to perform data-efficient inference of transition matrices. We compare our approach against commonly used frequentist and Bayesian approaches both in synthetic data and in five real-world datasets, and we find that our method recovers the transition probabilities with higher accuracy. Furthermore, we demonstrate that the method is robust even in cases when the knowledge of the topological constraint is partial. Lastly, we show that this higher accuracy improves the results for downstream network analysis tasks like cluster detection and node ranking, which highlights the practical relevance of our method for interdisciplinary data-driven analyses of networked systems.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

EPJ Data Science MATHEMATICS, INTERDISCIPLINARY APPLICATIONS -

CiteScore

6.10

自引率

5.60%

发文量

审稿时长

13 weeks

期刊介绍： EPJ Data Science covers a broad range of research areas and applications and particularly encourages contributions from techno-socio-economic systems, where it comprises those research lines that now regard the digital “tracks” of human beings as first-order objects for scientific investigation. Topics include, but are not limited to, human behavior, social interaction (including animal societies), economic and financial systems, management and business networks, socio-technical infrastructure, health and environmental systems, the science of science, as well as general risk and crisis scenario forecasting up to and including policy advice.