A Spatial Insight for UGC Apps: Fast Similarity Search on Keyword-Induced Point Groups
Zhe Li, Yu Li, Man Lung Yiu
2019 20th IEEE International Conference on Mobile Data Management (MDM), published 2019-06-10
DOI: https://doi.org/10.1109/MDM.2019.00-26
In the era of smartphones, massive amounts of data are generated with geographic information attached. A large portion of this data comes from UGC (user-generated content) applications (e.g., Twitter, Instagram), where the content providers are the users themselves. Such applications are highly attractive for targeted marketing and recommendation, both of which have been well studied in recommender systems. In this paper, we approach the topic from a new, spatial perspective using UGC content only. To do this, we first represent each message as a point whose location is given by the message's geographic information, and then group all points by their keywords to form multiple point groups. We formulate a similarity search problem: given a query keyword, find the k keywords whose location distributions are most similar to that of the query. Our case study shows that keywords with similar distributions are highly likely to be semantically connected. However, the performance of existing solutions degrades when different point groups overlap significantly, which happens frequently in UGC content. We propose efficient techniques for processing similarity search on such point groups. Experimental results on Twitter data demonstrate that our solution is up to 6 times faster than the state of the art.
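To make the problem statement concrete, the following is a minimal brute-force sketch of keyword-induced point groups and top-k similarity search. It is not the technique proposed in the paper. The abstract does not name the similarity measure, so the symmetric Hausdorff distance is used here purely as an assumed stand-in, and the function names (`group_points_by_keyword`, `hausdorff`, `top_k_similar`) and the planar `(x, y)` coordinates are hypothetical.

```python
from collections import defaultdict
from math import hypot


def group_points_by_keyword(messages):
    """Build keyword-induced point groups.

    messages: iterable of (keywords, (x, y)) pairs, one per message.
    Returns a dict mapping each keyword to the list of message locations.
    """
    groups = defaultdict(list)
    for keywords, location in messages:
        for kw in set(keywords):  # count each keyword once per message
            groups[kw].append(location)
    return dict(groups)


def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(hypot(p[0] - q[0], p[1] - q[1]) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance (ASSUMED measure, not from the paper)."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def top_k_similar(groups, query, k):
    """Brute-force scan: the k keywords whose point groups are closest to the query's."""
    query_points = groups[query]
    scored = [
        (hausdorff(query_points, points), kw)
        for kw, points in groups.items()
        if kw != query
    ]
    scored.sort()
    return [kw for _, kw in scored[:k]]


if __name__ == "__main__":
    # Toy data: two keyword clusters that are far apart in space.
    messages = [
        (["coffee", "latte"], (0.0, 0.0)),
        (["coffee"], (1.0, 0.0)),
        (["latte"], (1.0, 1.0)),
        (["beach"], (10.0, 10.0)),
        (["beach", "surf"], (11.0, 10.0)),
    ]
    groups = group_points_by_keyword(messages)
    print(top_k_similar(groups, "coffee", 1))  # prints ['latte']
```

This scan compares the query group against every other group, and each comparison is quadratic in the group sizes. A cost like this is the kind that motivates faster search techniques, particularly when many groups overlap in space.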