CTM - A Model for Large-Scale Multi-View Tweet Topic Classification

North American Chapter of the Association for Computational Linguistics | Pub Date: 2022-05-03 | DOI: 10.48550/arXiv.2205.01603

Vivek Kulkarni, Kenny Leung, A. Haghighi


Citations: 3

Abstract


Automatically associating social media posts with topics is an important prerequisite for effective search and recommendation on many social media platforms. However, topic classification of such posts is challenging because of (a) a large topic space, (b) short text with weak topical cues, and (c) multiple topic associations per post. In contrast to most prior work, which classifies posts into only a small number of topics (10-20), we consider the task of large-scale topic classification in the context of Twitter, where the topic space is 10 times larger and each Tweet may be associated with multiple topics. We address the challenges above and propose a novel neural model that (a) supports a large topic space of 300 topics and (b) takes a holistic approach to Tweet content modeling, leveraging multi-modal content, author context, and deeper semantic cues in the Tweet. Our method offers an effective way to classify Tweets into topics at scale, yielding superior performance to other approaches (a relative lift of 20% in median average precision score), and has been successfully deployed in production at Twitter.
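The abstract reports its headline result as a relative lift in median average precision but does not define the metric here. As a rough illustration only, the sketch below assumes one common reading: compute average precision separately for each topic over a set of Tweets, take the median across topics, and compare two models by relative lift. The function names, the handling of topics without positive examples, and the toy data are all assumptions made for this example, not details taken from the paper.

```python
from statistics import median


def average_precision(labels, scores):
    """Average precision for one topic.

    labels: 0/1 ground truth per tweet; scores: model score per tweet.
    Computed as the mean of precision@k over the ranks k of the positives.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def median_average_precision(label_matrix, score_matrix):
    """Median of per-topic average precision (assumed definition).

    label_matrix[i][t] is 1 if tweet i carries topic t (multi-label),
    score_matrix[i][t] is the model's score for that pair.
    Topics with no positive example are skipped (an assumption).
    """
    num_topics = len(label_matrix[0])
    per_topic = []
    for t in range(num_topics):
        labels = [row[t] for row in label_matrix]
        scores = [row[t] for row in score_matrix]
        if any(labels):
            per_topic.append(average_precision(labels, scores))
    return median(per_topic)


def relative_lift(new_value, baseline_value):
    """Relative improvement of new_value over baseline_value."""
    return (new_value - baseline_value) / baseline_value


# Toy data: 3 tweets, 2 topics; the third tweet has both topics.
toy_labels = [[1, 0], [0, 1], [1, 1]]
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6]]
toy_metric = median_average_precision(toy_labels, toy_scores)
```

Under this reading, a model scoring 0.60 against a baseline of 0.50 would show a relative lift of 20%; those two numbers are made up to illustrate the arithmetic and are not results from the paper.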

