Topic Detection and Tracking for Threaded Discussion Communities

Mingliang Zhu, Weiming Hu, Ou Wu
Published in: 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology
Publication date: 2008-12-09
DOI: 10.1109/WIIAT.2008.50 (https://doi.org/10.1109/WIIAT.2008.50)
Citations: 35

Abstract

Threaded discussion communities are one of the most common forms of online community and are becoming increasingly popular among web users. Every day a huge number of new discussions are added to these communities, and they are difficult to summarize and search. In this paper, we propose a topic detection and tracking (TDT) method for discussion threads. Most existing TDT methods deal with news stories, but the language used in discussion data is much more casual, oral and informal than that of news data. To address this problem, we design several extensions to the basic TDT framework that focus on the nature of discussion data. These include a thread/post activity validation step, a term pos-weighting strategy, and a two-level decision framework that considers both content similarity and user activity information. Experimental results show that our proposed method greatly improves on current TDT methods in a real discussion community environment. With the help of TDT, discussion data can be better organized for searching and visualization.
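The abstract does not give the algorithm itself. The code below is therefore only a minimal, hypothetical sketch of a generic single-pass TDT loop with a two-level assignment decision in the spirit described above. It is not the authors' method.

Several parts of the sketch are assumptions rather than details from the paper:
- raw term-frequency cosine similarity between a thread and a topic centroid;
- Jaccard overlap of participating user IDs as the "user activity" signal;
- the thresholds `HIGH_SIM`, `LOW_SIM` and `USER_OVERLAP`;
- the names `detect_topics` and `Topic`.

The activity-validation and pos-weighting steps are omitted.

```python
import math
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical thresholds chosen for illustration; the paper's values are not stated.
HIGH_SIM = 0.5       # content similarity at or above this: join the topic outright
LOW_SIM = 0.2        # content similarity below this: always start a new topic
USER_OVERLAP = 0.3   # in the ambiguous band, required overlap of participating users


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap between two sets of user IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Topic:
    centroid: Counter = field(default_factory=Counter)  # summed term counts
    users: set = field(default_factory=set)             # everyone who posted in the topic
    threads: list = field(default_factory=list)         # thread IDs in arrival order


def detect_topics(threads):
    """Single-pass clustering over (thread_id, tokens, user_ids) tuples in time order."""
    topics = []
    for thread_id, tokens, users in threads:
        vec = Counter(tokens)
        # Find the most content-similar existing topic.
        best, best_sim = None, 0.0
        for topic in topics:
            sim = cosine(vec, topic.centroid)
            if sim > best_sim:
                best, best_sim = topic, sim
        if best is not None and best_sim >= HIGH_SIM:
            chosen = best  # level 1: content similarity alone is decisive
        elif (best is not None and best_sim >= LOW_SIM
              and jaccard(set(users), best.users) >= USER_OVERLAP):
            chosen = best  # level 2: ambiguous content, but shared participants
        else:
            chosen = Topic()  # otherwise treat the thread as a new topic
            topics.append(chosen)
        chosen.centroid.update(vec)
        chosen.users.update(users)
        chosen.threads.append(thread_id)
    return topics


if __name__ == "__main__":
    demo = [
        ("t1", ["gpu", "driver", "crash"], ["alice", "bob"]),
        ("t2", ["gpu", "driver", "update"], ["bob", "carol"]),
        ("t3", ["recipe", "pasta"], ["dave"]),
    ]
    print([t.threads for t in detect_topics(demo)])  # [['t1', 't2'], ['t3']]
```

The second level only matters for threads whose content similarity falls between `LOW_SIM` and `HIGH_SIM`. In that band, shared participants decide whether a weakly similar thread continues an existing topic or starts a new one.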