How do Twitter Conversations Differ based on Geography, Time, and Subject? A Framework and Analysis of Topical Conversations in Microblogging

Psychology of Innovation eJournal. Pub Date: 2013-08-15. DOI: 10.2139/ssrn.2231823

Victoria Lai, W. Rand


Abstract

Automatic discovery of how members of social media are discussing different thoughts on particular topics would provide a unique insight into how people perceive different topics. However, identifying trending terms/words within a topical conversation is a difficult task. We take an information retrieval approach and use tf-idf (term frequency-inverse document frequency) to identify words that are more frequent in a focal conversation compared to other conversations on Twitter. This requires a query set of tweets on a particular topic (used for term frequency) and a control set of conversations to use for comparison (used for inverse document frequency). The terms identified as most important within a topical conversation are greatly affected by the particular control set used. There is no clear metric for whether one control set is better than another, since that is determined by the needs of the user, but we can investigate the stability properties of topics given different control sets. We propose a method for doing this, and show that some topics of conversation are more stable than other topics, and that this stability is also affected by whether only the most frequent terms are of interest (top-50), or if all words (full-vocabulary) are being examined. We end with a set of guidelines for how to build better topic analysis tools based on these results.
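The query-set and control-set setup described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact tf-idf weighting or its stability measure, so everything here is an assumption made for illustration: the whitespace tokenizer, the smoothed idf formula, the toy tweets, the helper names (`tfidf_scores`, `top_terms`, `jaccard`), and the use of Jaccard overlap between top-k term lists as a stand-in for stability.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for this sketch)."""
    return text.lower().split()


def tfidf_scores(query_tweets, control_conversations):
    """Score each query term as tf (in the query set) times idf (over the control set).

    query_tweets: list of tweet strings on the focal topic.
    control_conversations: list of conversations, each a list of tweet strings;
    each conversation is treated as one document for document frequency.
    """
    tf = Counter(tok for tweet in query_tweets for tok in tokenize(tweet))
    n_docs = len(control_conversations)
    df = Counter()
    for conversation in control_conversations:
        # Count each term at most once per control conversation.
        df.update({tok for tweet in conversation for tok in tokenize(tweet)})
    # Smoothed idf (an assumption): the +1 keeps terms absent from the control set finite.
    return {
        term: count * math.log((1 + n_docs) / (1 + df[term]))
        for term, count in tf.items()
    }


def top_terms(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def jaccard(a, b):
    """Set overlap of two term lists; a hypothetical proxy for stability."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


# Toy data, invented for illustration only.
query = ["budget vote today", "senate budget vote delayed", "budget talks stall"]
control_a = [["great game today", "what a game"],
             ["new phone today", "phone review"]]
control_b = [["budget airline deals", "cheap flights today"],
             ["senate hearing today", "vote on the bill"]]

top_a = top_terms(tfidf_scores(query, control_a), 2)
top_b = top_terms(tfidf_scores(query, control_b), 2)
overlap = jaccard(top_a, top_b)
```

With the same query set, the two control sets produce different top-2 lists, because `control_b` itself mentions "budget" and "vote" and so lowers their idf. This is the sensitivity to the control set that the abstract describes. Comparing the overlap at a small k against the overlap over the full vocabulary mirrors the top-50 versus full-vocabulary distinction, although the paper's own procedure may differ.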
