Automatically Detecting Personal Topics by Clustering Emails

Huijie Yang, Junyong Luo, Meijuan Yin, Yan Liu
Published in: 2010 Second International Workshop on Education Technology and Computer Science
Publication date: 2010-03-06
DOI: 10.1109/ETCS.2010.238 (https://doi.org/10.1109/ETCS.2010.238)
Citations: 7

Abstract

Email plays an important role in daily life. Clustering emails into meaningful groups is recognized as a way to greatly reduce the cognitive load of processing them. Mailbox users are increasingly concerned with how to organize and manage their emails and how to mine meaningful data from them conveniently and effectively. This paper proposes a novel approach to detecting personal topics with a clustering algorithm. First, we preprocess the emails and construct an improved email VSM (vector space model) that represents each email by combining its body and subject in a new way. We then cluster the emails with an advanced k-means algorithm, for which we design a kernel-selection algorithm based on the lowest similarity. Afterwards, we extract appropriate keywords to label the topic of each cluster. Finally, experiments on the 20 Newsgroups email dataset show the validity of our approach, and the experimental results match the human-labeled clustering well.
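The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives neither the subject/body weighting nor the exact kernel-selection rule, so the sketch assumes that subject terms simply count more than body terms (the hypothetical `SUBJECT_WEIGHT`), that a "kernel" is an initial cluster center, and that each new kernel is the email least similar to the kernels already chosen. All function names are hypothetical.

```python
import math
from collections import Counter

SUBJECT_WEIGHT = 2.0  # assumption: the paper's actual subject/body weighting is not given

def tokenize(text):
    # Crude preprocessing: lowercase, keep alphabetic words longer than 2 letters.
    return [w for w in text.lower().split() if w.isalpha() and len(w) > 2]

def normalize(vec):
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)

def build_vectors(emails):
    """emails: list of (subject, body). Returns unit-length TF-IDF dicts."""
    counts = []
    for subject, body in emails:
        tf = Counter(tokenize(body))
        for term in tokenize(subject):
            tf[term] += SUBJECT_WEIGHT  # subject terms weigh more than body terms
        counts.append(tf)
    df = Counter(term for tf in counts for term in tf)
    n = len(emails)
    return [normalize({t: f * (math.log((1 + n) / (1 + df[t])) + 1.0)
                       for t, f in tf.items()}) for tf in counts]

def dot(a, b):
    # Cosine similarity, since all vectors are unit length.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def pick_seeds(vectors, k):
    # Assumed reading of "lowest similarity": start from the first email, then
    # repeatedly add the email whose best match among the seeds is weakest.
    seeds = [0]
    while len(seeds) < k:
        rest = [i for i in range(len(vectors)) if i not in seeds]
        seeds.append(min(rest, key=lambda i: max(dot(vectors[i], vectors[s])
                                                 for s in seeds)))
    return seeds

def kmeans(vectors, k, max_iter=20):
    centers = [dict(vectors[i]) for i in pick_seeds(vectors, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda c: dot(v, centers[c]))
                      for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            total = Counter()
            for v, lab in zip(vectors, labels):
                if lab == c:
                    for t, w in v.items():
                        total[t] += w
            if total:  # keep the old center if a cluster went empty
                centers[c] = normalize(total)
    return labels, centers

def top_keywords(center, n=3):
    # Label a cluster with its highest-weighted centroid terms.
    return [t for t, _ in sorted(center.items(), key=lambda kv: -kv[1])[:n]]

if __name__ == "__main__":
    emails = [
        ("budget meeting", "quarterly budget review meeting finance"),
        ("budget forecast", "finance budget forecast numbers"),
        ("hiking trip", "weekend hiking trip mountain trail"),
        ("trail map", "mountain trail hiking map"),
    ]
    labels, centers = kmeans(build_vectors(emails), k=2)
    for c, center in enumerate(centers):
        print(c, top_keywords(center))
```

Choosing dissimilar starting centers makes the result deterministic and avoids starting two centers inside the same topic, which is a common failure mode of randomly initialized k-means. Whether this matches the paper's kernel-selection algorithm cannot be confirmed from the abstract alone.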