Predicting semantic annotations on the real-time web

Elham Khabiri, James Caverlee, K. Kamath
DOI: 10.1145/2309996.2310034 (https://doi.org/10.1145/2309996.2310034)
Published in: HT: ACM Conference on Hypertext and Social Media, pp. 219-228
Publication date: 2012-06-25
Citations: 18

Abstract

The explosion of the real-time web has spurred a growing need for new methods to organize, monitor, and distill relevant information from these large-scale social streams. One especially encouraging development is the self-curation of the real-time web via user-driven linking, in which users annotate their own status updates with lightweight semantic annotations, or hashtags. Unfortunately, there is evidence that hashtag growth is not keeping pace with the growth of the overall real-time web. In a random sample of 3 million tweets, we find that only 10.2% contain at least one hashtag. Hence, in this paper we explore the possibility of predicting hashtags for unannotated status updates. Toward this end, we propose and evaluate a graph-based prediction framework. Three distinctive features of the approach are: (i) a path aggregation technique for scoring the closeness of terms and hashtags in the graph; (ii) pivot term selection, for identifying high-value terms in status updates; and (iii) a dynamic sliding window for recommending hashtags that reflect the current state of the real-time web. Experimentally, we find encouraging results in comparison with Bayesian and data-mining-based approaches.
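The abstract names the framework's three components but does not give their formulas. The following is a minimal sketch of how a graph-based hashtag recommender of this general shape could fit together; it is not the authors' implementation, and every detail is an assumption. The class and method names are hypothetical. The graph is a plain co-occurrence graph over terms and `#`-prefixed hashtags. "Path aggregation" is approximated by summing a direct edge weight P(h | t) with damped two-hop paths through intermediate terms. Pivot terms are chosen by IDF within the window as a stand-in criterion. The window is a fixed-size count window rather than the paper's dynamic one.

```python
import math
from collections import Counter, defaultdict, deque
from itertools import combinations


class SlidingWindowHashtagGraph:
    """Hypothetical sketch: co-occurrence graph of terms and '#'-prefixed hashtags
    built only from the most recent `window_size` tweets."""

    def __init__(self, window_size=1000, beta=0.5):
        self.window = deque()             # recent tweets as (terms, hashtags) set pairs
        self.window_size = window_size    # fixed-size count window (assumed, not "dynamic")
        self.beta = beta                  # damping for two-hop paths (arbitrary choice)
        self.freq = Counter()             # node -> number of windowed tweets containing it
        self.cooc = defaultdict(Counter)  # node -> neighbor -> co-occurrence count

    def _update(self, terms, tags, sign):
        """Add (sign=+1) or remove (sign=-1) one tweet's nodes and edges."""
        nodes = set(terms) | set(tags)
        for node in nodes:
            self.freq[node] += sign
            if self.freq[node] <= 0:
                del self.freq[node]
        for a, b in combinations(sorted(nodes), 2):
            for x, y in ((a, b), (b, a)):
                self.cooc[x][y] += sign
                if self.cooc[x][y] <= 0:
                    del self.cooc[x][y]

    def add(self, terms, tags):
        """Insert an annotated tweet; evict the oldest one once the window is full."""
        self.window.append((set(terms), set(tags)))
        self._update(terms, tags, +1)
        if len(self.window) > self.window_size:
            old_terms, old_tags = self.window.popleft()
            self._update(old_terms, old_tags, -1)

    def _p(self, a, b):
        """Edge weight P(b | a): share of windowed tweets containing a that also contain b."""
        return self.cooc[a][b] / self.freq[a] if self.freq[a] else 0.0

    def path_score(self, term, tag):
        """Aggregate the direct path term->tag and damped two-hop paths term->u->tag."""
        score = self._p(term, tag)
        for u in self.cooc[term]:
            if not u.startswith("#"):  # only route through ordinary terms
                score += self.beta * self._p(term, u) * self._p(u, tag)
        return score

    def pivot_terms(self, terms, k=3):
        """Stand-in pivot selection: the k query terms seen in the window with the highest IDF."""
        n = len(self.window)
        known = [t for t in set(terms) if self.freq[t] > 0]
        known.sort(key=lambda t: (-math.log(n / self.freq[t]), t))
        return known[:k]

    def recommend(self, terms, n=3, k=3):
        """Rank windowed hashtags by path scores summed over the pivot terms."""
        pivots = self.pivot_terms(terms, k)
        tags = [h for h in self.freq if h.startswith("#")]
        scores = {h: sum(self.path_score(t, h) for t in pivots) for h in tags}
        ranked = sorted((h for h in tags if scores[h] > 0), key=lambda h: (-scores[h], h))
        return ranked[:n]


if __name__ == "__main__":
    g = SlidingWindowHashtagGraph(window_size=3)
    g.add(["earthquake", "tokyo", "shaking"], ["#earthquake"])
    g.add(["tokyo", "train", "delay"], ["#tokyo"])
    g.add(["shaking", "building", "scary"], ["#earthquake"])
    print(g.recommend(["felt", "shaking", "tokyo"]))  # ['#earthquake', '#tokyo']
```

Summing over paths instead of using only direct co-occurrence lets a term that never appeared alongside a hashtag still vote for it through shared neighbors; the damping factor `beta` (0.5 here, an arbitrary choice) keeps those indirect votes weaker than direct evidence. Because edges are decremented on eviction, recommendations track only what the last `window_size` tweets say.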