A Novel Approach for Email Clustering Based on Semantics

Binlai He, Zefeng Li, Nan Yang
{"title":"A Novel Approach for Email Clustering Based on Semantics","authors":"Binlai He, Zefeng Li, Nan Yang","doi":"10.1109/WISA.2014.56","DOIUrl":null,"url":null,"abstract":"An increasing interest has been recently devoted to clustering short documents. Short documents don't contain enough text to compute similarities accurately by implementing the most widely used technique called Vector Space Model (VSM). Adding semantics to short documents clustering is one efficient way to solve this problem. However, real life collections are often composed of very short or long documents. For example, the length of email messages for each email user follows a power-law distribution. Long emails and short emails both appear in email corpus. Therefore, both state-of-the-art short documents and long document clustering approaches can't get a high cluster quality or high efficiency in short and long documents clustering. In order to solve this problem, we propose a novel approach for email clustering based on semantics. Empirical validation shows that our method can obtain high cluster quality and high efficiency in real world email datasets.","PeriodicalId":366169,"journal":{"name":"2014 11th Web Information System and Application Conference","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 11th Web Information System and Application Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WISA.2014.56","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

An increasing interest has been recently devoted to clustering short documents. Short documents don't contain enough text to compute similarities accurately by implementing the most widely used technique called Vector Space Model (VSM). Adding semantics to short documents clustering is one efficient way to solve this problem. However, real life collections are often composed of very short or long documents. For example, the length of email messages for each email user follows a power-law distribution. Long emails and short emails both appear in email corpus. Therefore, both state-of-the-art short documents and long document clustering approaches can't get a high cluster quality or high efficiency in short and long documents clustering. In order to solve this problem, we propose a novel approach for email clustering based on semantics. Empirical validation shows that our method can obtain high cluster quality and high efficiency in real world email datasets.
基于语义的电子邮件聚类新方法
最近,人们对短文档的聚类越来越感兴趣。短文档不包含足够的文本,无法通过实现最广泛使用的称为向量空间模型(VSM)的技术来精确计算相似度。在短文档聚类中添加语义是解决这个问题的一种有效方法。然而,现实生活中的集合通常由很短或很长的文档组成。例如,每个电子邮件用户的电子邮件消息长度遵循幂律分布。长邮件和短邮件都出现在邮件语料库中。因此,无论是目前最先进的短文档聚类方法还是长文档聚类方法,都无法在短文档和长文档聚类中获得高的聚类质量和高效率。为了解决这一问题,我们提出了一种基于语义的电子邮件聚类方法。经验验证表明,该方法可以在真实的电子邮件数据集上获得较高的聚类质量和效率。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信