{"title":"Tree decompositions and social graphs","authors":"Aaron B. Adcock, Blair D. Sullivan, Michael W. Mahoney","doi":"10.1080/15427951.2016.1182952","DOIUrl":"https://doi.org/10.1080/15427951.2016.1182952","url":null,"abstract":"Recent work has established that large informatics graphs such as social and information networks have non-trivial tree-like structure when viewed at moderate size scales. Here, we present results from the first detailed empirical evaluation of the use of tree decomposition (TD) heuristics for structure identification and extraction in social graphs. Although TDs have historically been used in structural graph theory and scientific computing, we show that, even with existing TD heuristics developed for those very different areas, TD methods can identify interesting structure in a wide range of realistic informatics graphs. Our main contributions are the following: we show that TD methods can identify structures that correlate strongly with the core-periphery structure of realistic networks, even when using simple greedy heuristics; we show that the peripheral bags of these TDs correlate well with low-conductance communities (when they exist) found using local spectral computations; and we show that several types of large-scale “ground-truth” communities, defined by demographic metadata on the nodes of the network, are well-localized in the large-scale and/or peripheral structures of the TDs. Our other contributions are the following: we provide detailed empirical results for TD heuristics on toy and synthetic networks to establish a baseline for better understanding the behavior of the heuristics on more complex real-world networks; and we prove a theorem providing formal justification for the intuition that the only two impediments to low-distortion hyperbolic embedding are high tree-width and long geodesic cycles. Our results suggest future directions for improved TD heuristics that are more appropriate for realistic social graphs.","PeriodicalId":38105,"journal":{"name":"Internet Mathematics","volume":"12 1","pages":"315 - 361"},"PeriodicalIF":0.0,"publicationDate":"2014-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15427951.2016.1182952","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"59948158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asymptotic degree distribution of a duplication-deletion random graph model","authors":"Erik Thornblad","doi":"10.1080/15427951.2015.1009523","DOIUrl":"https://doi.org/10.1080/15427951.2015.1009523","url":null,"abstract":"We study a discrete-time duplication-deletion random graph model and analyse its asymptotic degree distribution. The random graph consists of disjoint cliques. In each time step, either a new vertex is brought in with probability 0 < p < 1 and attached to an existing clique, chosen with probability proportional to the clique size, or all the edges of a random vertex are deleted with probability 1 − p. We prove almost sure convergence of the asymptotic degree distribution and find its exact values in terms of a hypergeometric integral, expressed in terms of the parameter p. In the regime 0 < p < 1/2 we show that the degree sequence decays exponentially at rate p/(1 − p), whereas it satisfies a power law with exponent p/(2p − 1) if 1/2 < p < 1. At the threshold p = 1/2 the degree sequence lies between a power law and exponential decay.","PeriodicalId":38105,"journal":{"name":"Internet Mathematics","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2014-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15427951.2015.1009523","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"59948233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Degree Distribution and the Number of Edges Between Nodes of given Degrees in Directed Scale-Free Graphs","authors":"E. Grechnikov","doi":"10.1080/15427951.2015.1012609","DOIUrl":"https://doi.org/10.1080/15427951.2015.1012609","url":null,"abstract":"In this article, we introduce our study of some important statistics of the random graph in the directed preferential attachment model introduced by B. Bollobás, C. Borgs, J. Chayes, and O. Riordan. First, we find a new asymptotic formula for the expectation of the number n_in(t, d) of nodes of a given in-degree d in a graph in this model with t edges, which covers all possible degrees. The out-degree distribution in the model is symmetrical to the in-degree distribution. Then we prove tight concentration for n_in(t, d) while d grows up to the moment when n_in(t, d) decreases to ln 2t; if d grows even faster, n_in(t, d) is zero with high probability. Furthermore, we study the average number of edges from a vertex of out-degree d1 to a vertex of in-degree d2. In particular, we prove that it grows proportionally to d1d2/t if and to something between and if , tending to the first expression when d1 is small compared to d2 and to the second one when d1 is large; is such that the main term of n_in(t, d) is proportional to , is symmetrical for out-degrees. We also give exact formulas for intermediate cases.","PeriodicalId":38105,"journal":{"name":"Internet Mathematics","volume":"11 1","pages":"487 - 527"},"PeriodicalIF":0.0,"publicationDate":"2014-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15427951.2015.1012609","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"59948282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Degree-Degree Dependencies in Random Graphs with Heavy-Tailed Degrees","authors":"R. Hofstad, N. Litvak","doi":"10.1080/15427951.2013.850455","DOIUrl":"https://doi.org/10.1080/15427951.2013.850455","url":null,"abstract":"Mixing patterns in large self-organizing networks, such as the Internet, the World Wide Web, social, and biological networks, are often characterized by degree-degree dependencies between neighboring nodes. In assortative networks, the degree-degree dependencies are positive (nodes with similar degrees tend to connect to each other), whereas in disassortative networks, these dependencies are negative. One of the problems with the commonly used Pearson correlation coefficient, also known as the assortativity coefficient, is that its magnitude decreases with the network size in disassortative networks. This makes it impossible to compare mixing patterns, for example, in two web crawls of different sizes. As an alternative, we have recently suggested using rank correlation measures, such as Spearman’s rho. Numerical experiments have confirmed that Spearman’s rho produces consistent values in graphs of different sizes but similar structure, and it is able to reveal strong (positive or negative) dependencies in large graphs. In this study we analytically investigate degree-degree dependencies for scale-free graph sequences. In order to demonstrate the poor behavior of the Pearson correlation coefficient, we first study a simple model of two heavy-tailed, highly correlated, random variables X and Y, and show that the sample correlation coefficient converges in distribution either to a proper random variable on [−1, 1], or to zero, and the limit is nonnegative a.s. if X, Y ≥ 0. We next adapt these results to the degree-degree dependencies in networks as described by the Pearson correlation coefficient, and show that it is nonnegative in the large graph limit when the asymptotic degree distribution has an infinite third moment. Furthermore, we provide examples where the Pearson correlation coefficient converges to zero in a network with strong negative degree-degree dependencies, and another example where this coefficient converges in distribution to a random variable. We suggest an alternative degree-degree dependency measure, based on Spearman’s rho, and prove that this statistical estimator converges to an appropriate limit under quite general conditions. These conditions are proved to be satisfied in common network models, such as the configuration model and the preferential attachment model. We conclude that rank correlations provide a suitable and informative method for uncovering network mixing patterns.","PeriodicalId":38105,"journal":{"name":"Internet Mathematics","volume":"10 1","pages":"287 - 334"},"PeriodicalIF":0.0,"publicationDate":"2014-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15427951.2013.850455","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"59947792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Special Issue on Searching and Mining the Web and Social Networks","authors":"N. Litvak, S. Vigna","doi":"10.1080/15427951.2014.916132","DOIUrl":"https://doi.org/10.1080/15427951.2014.916132","url":null,"abstract":"The past few decades have seen the rise of online social networks as a worldwide phenomenon with a high impact on our society. Beyond the obvious exposure phenomena, with clear implications for security and privacy, people have started to become acquainted (even married!) in online social networks. In parallel, we have seen an enormous growth in the number of published articles in computer science, mathematics and physics that study the organization of such networks. The availability of large free databases of friendships, collaborations and citations has made it possible to study social networks at a scale and with a precision previously unknown. This issue of Internet Mathematics, titled “Searching and Mining the Web and Social Networks,” was born out of the interest of the editors in the problem of searching and analyzing not only the web, but also social networks in a broad sense. In particular, we aimed to publish a collection of articles that take a rigorous mathematical viewpoint on the problems that are most important and common in network applications. The general topics represented in this special issue cover ranking of the nodes, network measurements, and adversarial behavior. Each of these topics has received considerable attention in the literature. We believe, however, that the originality of the articles presented in this volume lies in their high level of mathematical rigor.","PeriodicalId":38105,"journal":{"name":"Internet Mathematics","volume":"10 1","pages":"219 - 221"},"PeriodicalIF":0.0,"publicationDate":"2014-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15427951.2014.916132","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"59947662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding Safe Strategies for Competitive Diffusion on Trees","authors":"J. Janssen, Celeste Vautour","doi":"10.1080/15427951.2014.977407","DOIUrl":"https://doi.org/10.1080/15427951.2014.977407","url":null,"abstract":"We study the two-player safe game of Competitive Diffusion, a game-theoretic model for the diffusion of technologies or influence through a social network. In game theory, safe strategies are mixed strategies with a minimum expected gain against unknown strategies of the opponents. Safe strategies for competitive diffusion lead to maximum spread of influence in the presence of uncertainty about the other players. We study the safe game on two specific classes of trees, spiders and complete trees, and give tight bounds on the minimum expected gain. We then use these results to give an algorithm that suggests a safe strategy for a player on any tree. We test this algorithm on randomly generated trees and show that it finds strategies that are close to optimal.","PeriodicalId":38105,"journal":{"name":"Internet Mathematics","volume":"11 1","pages":"232 - 252"},"PeriodicalIF":0.0,"publicationDate":"2014-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15427951.2014.977407","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"59947512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Classification Power of Web Features","authors":"M. Erdélyi, A. Benczúr, B. Daróczy, A. Garzó, Tamás Kiss, Dávid Siklósi","doi":"10.1080/15427951.2013.850456","DOIUrl":"https://doi.org/10.1080/15427951.2013.850456","url":null,"abstract":"In this article we give a comprehensive overview of features devised for web spam detection and investigate how much various classes, some requiring very high computational effort, add to the classification accuracy. We collect and handle a large number of features based on recent advances in web spam filtering, including temporal ones; in particular, we analyze the strength and sensitivity of linkage change. We propose new, temporal link-similarity-based features and show how to compute them efficiently on large graphs. We show that machine learning techniques, including ensemble selection, LogitBoost, and random forest, significantly improve accuracy. We conclude that, with appropriate learning techniques, a simple and computationally inexpensive feature subset outperforms all previously published results on our dataset and can be further improved only slightly by computationally expensive features. We test our method on three major publicly available datasets: the Web Spam Challenge 2008 dataset WEBSPAM-UK2007, the ECML/PKDD Discovery Challenge dataset DC2010, and the Waterloo Spam Rankings for ClueWeb09. Our classifier ensemble sets the strongest classification benchmark compared to participants of the Web Spam and ECML/PKDD Discovery Challenges as well as the TREC Web track. To foster research in the area, we make several feature sets and source code public (https://datamining.sztaki.hu/en/download/web-spam-resources), including the temporal features of eight .uk crawl snapshots that include WEBSPAM-UK2007 as well as the Web Spam Challenge features for the labeled part of ClueWeb09.","PeriodicalId":38105,"journal":{"name":"Internet Mathematics","volume":"10 1","pages":"421 - 457"},"PeriodicalIF":0.0,"publicationDate":"2014-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15427951.2013.850456","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"59947342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Communities, Random Walks, and Social Sybil Defense","authors":"L. Alvisi, Allen Clement, Alessandro Epasto, Silvio Lattanzi, A. Panconesi","doi":"10.1080/15427951.2013.865685","DOIUrl":"https://doi.org/10.1080/15427951.2013.865685","url":null,"abstract":"Sybil attacks, in which an adversary forges a potentially unbounded number of identities, are a danger to distributed systems and online social networks. The goal of sybil defense is to accurately identify sybil identities. This article surveys the evolution of sybil defense protocols that leverage the structural properties of the social graph underlying a distributed system to identify sybil identities. We make two main contributions. First, we clarify the deep connection between sybil defense and the theory of random walks. This leads us to identify a community detection algorithm that, for the first time, offers provable guarantees in the context of sybil defense. Second, we advocate a new, more limited but practically useful goal for sybil defense: securely white-listing a local region of the graph.","PeriodicalId":38105,"journal":{"name":"Internet Mathematics","volume":"10 1","pages":"360 - 420"},"PeriodicalIF":0.0,"publicationDate":"2014-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15427951.2013.865685","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"59947430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Some Properties of Random Apollonian Networks","authors":"A. Frieze, Charalampos E. Tsourakakis","doi":"10.1080/15427951.2013.796300","DOIUrl":"https://doi.org/10.1080/15427951.2013.796300","url":null,"abstract":"In this work, we analyze fundamental properties of random Apollonian networks [Zhang et al. 06, Zhou et al. 05], a popular random graph model that generates planar graphs with power-law properties. Specifically, we analyze the degree distribution, the k largest degrees, the k largest eigenvalues, and the diameter, where k is a constant.","PeriodicalId":38105,"journal":{"name":"Internet Mathematics","volume":"13 1","pages":"162 - 187"},"PeriodicalIF":0.0,"publicationDate":"2014-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15427951.2013.796300","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"59946919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}