Crawlers for social networks & structural analysis of Twitter

2011 IEEE 5th International Conference on Internet Multimedia Systems Architecture and Application. Pub Date: 2011-12-01. DOI: 10.1109/IMSAA.2011.6156368

Atul Saroop, A. Karnik

Citations: 12

Abstract

Online social networks are growing rapidly, both through new links between existing nodes and through new nodes joining the network. Because these networks evolve continuously, it is important to keep crawling the overall network for information, and to crawl specific subnetworks when the need arises. Precise information about social networks is important for devising strategies to better disseminate targeted information to a mass audience, for fine-tuning messaging in marketing campaigns, and for measuring the effectiveness of such marketing efforts. With the objective of gathering precise, up-to-date information, we explore designs of fast crawlers for online social networks. Our experiments, carried out on data downloaded from Twitter, indicate that two node discovery strategies, random walk with backtrack and random search, show promise as fast network crawlers. We implement the random search crawler to crawl Twitter for large amounts of information on network structure, user profiles, and Tweet-level data, and we present a summary of the data thus collected. We also attempt to design generative models for Twitter-like networks that can be used in future simulations, removing the need to download large amounts of network data from online social networks.
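The abstract names the two node discovery strategies but does not define them. As a rough illustration only, the sketch below gives one plausible reading of each on an in-memory directed graph. Everything in it is an assumption rather than the authors' algorithm: the function names, the adjacency-dict graph representation, the `budget` parameter (a cap on neighbour-list lookups, standing in for rate-limited API calls), and the rule that querying a node reveals all of its neighbours.

```python
import random
from typing import Dict, List, Set

# Hypothetical representation: node id -> list of nodes it links to.
Graph = Dict[str, List[str]]


def random_walk_with_backtrack(
    graph: Graph, seed: str, budget: int, rng: random.Random
) -> Set[str]:
    """Walk to a random unvisited neighbour; step back along the path when stuck."""
    discovered = {seed}   # every node whose id has been seen so far
    visited = set()       # nodes whose neighbour list has been queried
    path = [seed]         # current walk, used for backtracking
    queries = 0
    while path and queries < budget:
        current = path[-1]
        if current not in visited:
            visited.add(current)
            queries += 1  # one simulated neighbour-list lookup
            discovered.update(graph.get(current, []))
        unvisited = [n for n in graph.get(current, []) if n not in visited]
        if unvisited:
            path.append(rng.choice(unvisited))
        else:
            path.pop()    # dead end: backtrack one step
    return discovered


def random_search(
    graph: Graph, seed: str, budget: int, rng: random.Random
) -> Set[str]:
    """Repeatedly query a uniformly random discovered-but-unqueried node."""
    discovered = {seed}
    unqueried = [seed]
    queries = 0
    while unqueried and queries < budget:
        # Swap a random element to the end so it can be popped in O(1).
        i = rng.randrange(len(unqueried))
        unqueried[i], unqueried[-1] = unqueried[-1], unqueried[i]
        current = unqueried.pop()
        queries += 1      # one simulated neighbour-list lookup
        for n in graph.get(current, []):
            if n not in discovered:
                discovered.add(n)
                unqueried.append(n)
    return discovered
```

Under these assumptions, a crawler's speed could be compared by running both functions with the same `budget` on the same graph and counting how many nodes each one discovers.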
