How do Twitter Conversations Differ based on Geography, Time, and Subject? A Framework and Analysis of Topical Conversations in Microblogging
Victoria Lai, W. Rand
Psychology of Innovation eJournal, 2013-08-15
DOI: 10.2139/ssrn.2231823 (https://doi.org/10.2139/ssrn.2231823)
Automatically discovering how social media users discuss particular topics would provide unique insight into how people perceive those topics. However, identifying trending terms within a topical conversation is difficult. We take an information retrieval approach and use tf-idf (term frequency-inverse document frequency) to identify words that are more frequent in a focal conversation than in other conversations on Twitter. This requires a query set of tweets on a particular topic (used for term frequency) and a control set of conversations for comparison (used for inverse document frequency). The terms identified as most important within a topical conversation are greatly affected by the particular control set used. There is no clear metric for whether one control set is better than another, since that depends on the needs of the user, but we can investigate how stable a topic's important terms are across different control sets. We propose a method for doing this and show that some topics of conversation are more stable than others, and that this stability also depends on whether only the most frequent terms (top-50) or all words (full vocabulary) are examined. We end with a set of guidelines for building better topic analysis tools based on these results.
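The query-set and control-set setup described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the whitespace tokenizer, the smoothed idf formula, the toy tweets, and the use of Jaccard overlap between top-k term lists as a stand-in stability measure (the abstract does not state which measure the paper uses).

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def tfidf_scores(query_tweets, control_conversations):
    """Score every term in the query set against one control set.

    tf  = number of times the term occurs across all query tweets
    idf = log((1 + N) / (1 + df)) + 1, where N is the number of control
          conversations and df is how many of them contain the term.
          This smoothed variant is an assumption of this sketch.
    """
    tf = Counter(tok for tweet in query_tweets for tok in tokenize(tweet))
    n = len(control_conversations)
    df = Counter()
    for conversation in control_conversations:
        df.update(set(tokenize(conversation)))  # count each term once per conversation
    return {term: count * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()}


def top_k(scores, k):
    # Highest score first; ties are broken alphabetically so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def jaccard(terms_a, terms_b):
    # Overlap of two term lists, used here as an illustrative stability proxy.
    a, b = set(terms_a), set(terms_b)
    return len(a & b) / len(a | b) if (a | b) else 1.0


# Toy data, invented for this example.
query = ["budget vote today", "the budget vote failed", "senate budget debate"]
control_a = ["the game today was great", "the weather today", "vote for the best movie"]
control_b = ["the senate passed the bill", "the debate was long", "the election vote count"]

scores_a = tfidf_scores(query, control_a)
scores_b = tfidf_scores(query, control_b)

top_a = top_k(scores_a, 3)
top_b = top_k(scores_b, 3)
stability = jaccard(top_a, top_b)

print("top terms vs control A:", top_a)
print("top terms vs control B:", top_b)
print("top-3 overlap:", stability)
```

Changing `k` from a small cutoff to the full vocabulary size mirrors the top-50 versus full-vocabulary comparison in the abstract: the same topic can look more or less stable depending on how many terms are compared.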