Effective location identification from microblogs

Guoliang Li, Jun Hu, Jianhua Feng, K. Tan
Published in: 2014 IEEE 30th International Conference on Data Engineering (ICDE), 2014-05-19
DOI: 10.1109/ICDE.2014.6816708 (https://doi.org/10.1109/ICDE.2014.6816708)
Citations: 51

Abstract

The rapid development of social networks has resulted in a proliferation of user-generated content (UGC). When properly analyzed, UGC data can benefit many applications. For example, identifying a user's locations from microblogs is important for effective location-based advertising and recommendation. In this paper, we study the problem of identifying a user's locations from microblogs. The problem is challenging because the location information in a single microblog is incomplete, so an accurate location cannot be obtained from one microblog considered in isolation. To address this challenge, we propose a global location identification method called Glitter. Glitter combines multiple microblogs of a user and uses them to identify the user's locations. It not only improves the quality of the identified user locations but also supplements the location information of each microblog, so that an accurate location can be obtained for the microblog as well. To facilitate location identification, Glitter organizes points of interest (POIs) into a tree structure in which leaf nodes are POIs and non-leaf nodes are segments of POIs, e.g., countries, states, cities, districts, and streets. Using the tree structure, Glitter first extracts candidate locations from each microblog of a user; each candidate corresponds to a tree node. Glitter then aggregates these candidate locations and identifies the top-k locations of the user. Using the identified top-k user locations, Glitter refines the candidate locations and computes the top-k locations of each microblog. To achieve high recall, we enable fuzzy matching between locations and microblogs. We also propose an incremental algorithm to support dynamic updates of microblogs. Experimental results on real-world datasets show that our method achieves high quality and good performance, and scales very well.
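The pipeline the abstract outlines (extract candidates per microblog, aggregate them per user, then refine per microblog) can be illustrated with a small sketch. This is not the paper's algorithm: the toy hierarchy, the use of `difflib.SequenceMatcher` for fuzzy matching, the 0.85 threshold, the choice to aggregate at city level, and the vote-splitting rule are all assumptions made for illustration, and the function names are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher

# Toy hierarchy (hypothetical data): each node is a root-to-node path.
# The last element is the node's name; the deepest paths play the role of POIs.
NODES = [
    ("usa",),
    ("usa", "new york"),
    ("usa", "new york", "times square"),
    ("usa", "new york", "main street"),
    ("usa", "boston"),
    ("usa", "boston", "fenway park"),
    ("usa", "boston", "main street"),
]


def extract_candidates(text, threshold=0.85):
    """Return tree nodes whose name fuzzily matches a word window of the text."""
    words = text.lower().split()
    found = []
    for node in NODES:
        name = node[-1]
        n = len(name.split())
        # Slide a window with as many words as the node name has.
        windows = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
        if any(SequenceMatcher(None, name, w).ratio() >= threshold for w in windows):
            found.append(node)
    return found


def city_of(node):
    """Assumed granularity for user locations: the second level of the tree."""
    return node[:2] if len(node) >= 2 else None


def top_k_user_locations(microblogs, k=1):
    """Aggregate candidates over all microblogs; ambiguous ones split their vote."""
    votes = Counter()
    for text in microblogs:
        cities = {city_of(c) for c in extract_candidates(text)} - {None}
        for city in cities:
            votes[city] += 1 / len(cities)
    return [city for city, _ in votes.most_common(k)]


def refine(text, user_locations):
    """Prefer candidates that fall under one of the user's top-k locations."""
    candidates = extract_candidates(text)
    preferred = [c for c in candidates if city_of(c) in user_locations]
    return preferred or candidates


microblogs = [
    "Great game at fenway prk tonight",  # misspelling, caught by fuzzy matching
    "coffee on main street",             # ambiguous between two cities
    "back in boston again",
]
user_top = top_k_user_locations(microblogs, k=1)
refined = refine("coffee on main street", user_top)
```

In this toy run the ambiguous "main street" microblog is resolved using evidence from the user's other microblogs, which is the intuition behind the global approach; a real system would need a proper scoring model and an index over the POI tree rather than a linear scan.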