{"title":"用于监督链接预测的基于图的特征","authors":"William J. Cukierski, Benjamin Hamner, Bo Yang","doi":"10.1109/IJCNN.2011.6033365","DOIUrl":null,"url":null,"abstract":"The growing ubiquity of social networks has spurred research in link prediction, which aims to predict new connections based on existing ones in the network. The 2011 IJCNN Social Network challenge asked participants to separate real edges from fake in a set of 8960 edges sampled from an anonymized, directed graph depicting a subset of relationships on Flickr. Our method incorporates 94 distinct graph features, used as input for classification with Random Forests. We present a three-pronged approach to the link prediction task, along with several novel variations on established similarity metrics. We discuss the challenges of processing a graph with more than a million nodes. We found that the best classification results were achieved through the combination of a large number of features that model different aspects of the graph structure. Our method achieved an area under the receiver-operator characteristic (ROC) curve of 0.9695, the 2nd best overall score in the competition and the best score which did not de-anonymize the dataset.","PeriodicalId":415833,"journal":{"name":"The 2011 International Joint Conference on Neural Networks","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"106","resultStr":"{\"title\":\"Graph-based features for supervised link prediction\",\"authors\":\"William J. Cukierski, Benjamin Hamner, Bo Yang\",\"doi\":\"10.1109/IJCNN.2011.6033365\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The growing ubiquity of social networks has spurred research in link prediction, which aims to predict new connections based on existing ones in the network. The 2011 IJCNN Social Network challenge asked participants to separate real edges from fake in a set of 8960 edges sampled from an anonymized, directed graph depicting a subset of relationships on Flickr. Our method incorporates 94 distinct graph features, used as input for classification with Random Forests. We present a three-pronged approach to the link prediction task, along with several novel variations on established similarity metrics. We discuss the challenges of processing a graph with more than a million nodes. We found that the best classification results were achieved through the combination of a large number of features that model different aspects of the graph structure. Our method achieved an area under the receiver-operator characteristic (ROC) curve of 0.9695, the 2nd best overall score in the competition and the best score which did not de-anonymize the dataset.\",\"PeriodicalId\":415833,\"journal\":{\"name\":\"The 2011 International Joint Conference on Neural Networks\",\"volume\":\"29 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-10-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"106\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The 2011 International Joint Conference on Neural Networks\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IJCNN.2011.6033365\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The 2011 International Joint Conference on Neural Networks","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN.2011.6033365","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Graph-based features for supervised link prediction
The growing ubiquity of social networks has spurred research in link prediction, which aims to predict new connections based on existing ones in the network. The 2011 IJCNN Social Network challenge asked participants to separate real edges from fake in a set of 8960 edges sampled from an anonymized, directed graph depicting a subset of relationships on Flickr. Our method incorporates 94 distinct graph features, used as input for classification with Random Forests. We present a three-pronged approach to the link prediction task, along with several novel variations on established similarity metrics. We discuss the challenges of processing a graph with more than a million nodes. We found that the best classification results were achieved through the combination of a large number of features that model different aspects of the graph structure. Our method achieved an area under the receiver-operator characteristic (ROC) curve of 0.9695, the 2nd best overall score in the competition and the best score which did not de-anonymize the dataset.