CTM - A Model for Large-Scale Multi-View Tweet Topic Classification

Vivek Kulkarni, Kenny Leung, A. Haghighi
Venue: North American Chapter of the Association for Computational Linguistics
Published: 2022-05-03
DOI: 10.48550/arXiv.2205.01603 (https://doi.org/10.48550/arXiv.2205.01603)
Citations: 3

Abstract

Automatically associating social media posts with topics is an important prerequisite for effective search and recommendation on many social media platforms. However, topic classification of such posts is challenging because of (a) a large topic space, (b) short text with weak topical cues, and (c) multiple topic associations per post. In contrast to most prior work, which focuses only on classifying posts into a small number of topics (10-20), we consider the task of large-scale topic classification in the context of Twitter, where the topic space is 10 times larger and each Tweet may be associated with multiple topics. We address the challenges above and propose a novel neural model that (a) supports a large topic space of 300 topics and (b) takes a holistic approach to Tweet content modeling, leveraging multi-modal content, author context, and deeper semantic cues in the Tweet. Our method offers an effective way to classify Tweets into topics at scale, yielding superior performance to other approaches (a relative lift of 20% in median average precision score), and has been successfully deployed in production at Twitter.
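As a rough illustration of what "multiple topic associations per Tweet" over a 300-topic space implies for the output layer, the sketch below scores each topic independently with a sigmoid (rather than a softmax over mutually exclusive classes) on top of concatenated feature views. This is not the authors' architecture: the three views as plain vectors, the single linear layer, the random weights, the 0.5 threshold, and every function name here are assumptions made for illustration only.

```python
import math
import random

NUM_TOPICS = 300  # topic-space size stated in the abstract


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def make_weights(input_dim: int, num_topics: int, seed: int = 0) -> list:
    """Hypothetical stand-in for trained parameters: small random weights."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(input_dim)]
            for _ in range(num_topics)]


def score_topics(text_vec, media_vec, author_vec, weights, bias) -> list:
    """Concatenate the views, then score every topic independently.

    Each topic gets its own sigmoid, so any number of topics can be
    active for one Tweet (multi-label), unlike a softmax output.
    """
    features = list(text_vec) + list(media_vec) + list(author_vec)
    scores = []
    for row, b in zip(weights, bias):
        logit = sum(w * f for w, f in zip(row, features)) + b
        scores.append(sigmoid(logit))
    return scores


def predict_topics(scores, threshold: float = 0.5) -> list:
    """Return the indices of all topics whose score meets the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy usage with made-up feature vectors (8 + 4 + 4 = 16 dimensions).
rng = random.Random(1)
text_vec = [rng.random() for _ in range(8)]
media_vec = [rng.random() for _ in range(4)]
author_vec = [rng.random() for _ in range(4)]
weights = make_weights(16, NUM_TOPICS)
bias = [0.0] * NUM_TOPICS
scores = score_topics(text_vec, media_vec, author_vec, weights, bias)
topics = predict_topics(scores)
```

In a real system the feature vectors would come from learned encoders and the threshold would likely be tuned per topic; the point of the sketch is only that independent per-topic scores allow zero, one, or many topics per Tweet.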