From where do tweets originate?: a GIS approach for user location inference

Workshop on Location-based Social Networks. Pub Date: 2014-11-04. DOI: 10.1145/2755492.2755494

Qunying Huang, G. Cao, Caixia Wang

Citations: 38

From where do tweets originate?: a GIS approach for user location inference

A number of natural language processing and text-mining algorithms have been developed to extract geospatial cues (e.g., place names) and infer the locations of content creators from publicly available information, such as text content, online social profiles, and the behaviors or interactions of users on social networks. These studies, however, can infer user locations with reasonable accuracy only at the city level, while much higher resolution is required for meaningful spatiotemporal analysis in geospatial fields. Additionally, the geographical cues exploited by current text-based approaches are hidden in unreliable, unstructured, informal, ungrammatical, and multilingual data, and are therefore hard to extract and interpret correctly. Instead of using such hidden geographic cues, this paper develops a GIS approach that can infer the true origin of tweets down to the zip code level by mining the spatial (geo-tags) and temporal (timestamps of when a message was posted) information recorded in users' digital footprints. Further, an individual's major daily activity zones and mobility can be inferred and predicted. By integrating GIS data and spatiotemporal clustering methods, the proposed approach can infer individual daily physical activity zones at a spatial resolution as fine as 20 m by 20 m, or even finer, depending on the number of digital footprints collected for each social media user. Research results at this level of spatial detail are necessary and useful for applications such as human mobility pattern analysis, business site selection, disease control, and transportation system improvement.
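The abstract does not say which spatiotemporal clustering algorithm the authors use, so the following is only a minimal sketch of the general idea: bin geo-tagged footprints into 20 m grid cells and pick the cell visited most often during a chosen time window. Plain grid counting stands in for the paper's clustering step, and the names `to_cell`, `dominant_zone`, and the `(lat, lon, hour)` footprint layout are hypothetical choices made for illustration.

```python
import math
from collections import Counter

CELL_SIZE_M = 20.0          # grid resolution mentioned in the abstract
EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def to_cell(lat, lon, ref_lat, ref_lon, cell_size=CELL_SIZE_M):
    """Map a geo-tag to the (col, row) index of a square grid cell.

    Assumption: an equirectangular approximation around the reference
    point, which is only reasonable over city-scale extents.
    """
    x = math.radians(lon - ref_lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    y = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def dominant_zone(footprints, hours, cell_size=CELL_SIZE_M):
    """Return (cell, count) for the most visited cell within `hours`.

    `footprints` is an iterable of (lat, lon, hour_of_day) tuples
    (a hypothetical layout). Returns None if no footprint matches.
    Grid counting is a stand-in, not the authors' clustering method.
    """
    selected = [(lat, lon) for lat, lon, hour in footprints if hour in hours]
    if not selected:
        return None
    # Use the first matching footprint as the grid origin.
    ref_lat, ref_lon = selected[0]
    counts = Counter(
        to_cell(lat, lon, ref_lat, ref_lon, cell_size) for lat, lon in selected
    )
    return counts.most_common(1)[0]


# Made-up sample data: three night-time footprints close together,
# one night-time footprint farther away, one daytime footprint.
sample = [
    (43.0700, -89.4000, 23),
    (43.0701, -89.3999, 1),
    (43.07005, -89.39995, 2),
    (43.0800, -89.3900, 14),
    (43.0750, -89.4000, 22),
]
night_hours = {22, 23, 0, 1, 2, 3, 4, 5}
night_zone = dominant_zone(sample, night_hours)
```

Filtering by hour of day is one simple way to use the timestamps: a cell that dominates at night is a plausible candidate for a home zone, while a cell that dominates during working hours may indicate a workplace. A density-based clustering method would avoid the fixed cell boundaries used here, which can split one real-world location across neighbouring cells.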
