邮件是搜索和数据挖掘的下一个前沿领域吗?

Proceedings of the Ninth ACM International Conference on Web Search and Data Mining Pub Date : 2016-02-08 DOI:10.1145/2835776.2835847

Y. Maarek

{"title":"邮件是搜索和数据挖掘的下一个前沿领域吗?","authors":"Y. Maarek","doi":"10.1145/2835776.2835847","DOIUrl":null,"url":null,"abstract":"The nature of Web mail traffic has significantly evolved in the last two decades, and consequently the behavior of Web mail users has also changed. For instance a recent study conducted by Yahoo Labs showed that today 90% of Web mail traffic is machine-generated. This partly explains why email traffic continues to grow even if a significant amount of personal communications has moved towards social media. Most users today are receiving in their inbox important invoices, receipts, and travel itineraries, together with non-malicious junk mail such as hotel newsletters or shopping promotions that could safely ignore. This is one of the reasons that a majority of messages remain unread, and many are deleted without being read. In that sense, Web mail has become quite similar to traditional snail mail. In spite of this drastic change in nature, many mail features remain unchanged. While 70% of mail users do not define even a single folder, folders are still predominant in the left trail of many Web mail clients. Mail search results are still mostly ranked by date, which makes the retrieving of older messages extremely challenging. This is even more painful to users, as unlike in Web search, they will know when a relevant previously read message has not been returned. In this talk, I present the results of multiple large-scale studies that have been conducted at Yahoo Labs in the last few years. I highlight the inherent challenges associated with such studies, especially around privacy concerns. I will discuss the new nature of consumer Web mail, which is dominated by machine-generated messages of highly heterogeneous forms and value. I will show how the change has not been fully recognized yet by my most email clients. As an example, why should there still be a reply option associated with a message coming from a \"do-not-reply@\" address?. I will introduce some approaches for large-scale mail mining specifically tailored to machine-generated email. I will conclude by discussing possible applications and research directions.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"69 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Is Mail The Next Frontier In Search And Data Mining?\",\"authors\":\"Y. Maarek\",\"doi\":\"10.1145/2835776.2835847\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The nature of Web mail traffic has significantly evolved in the last two decades, and consequently the behavior of Web mail users has also changed. For instance a recent study conducted by Yahoo Labs showed that today 90% of Web mail traffic is machine-generated. This partly explains why email traffic continues to grow even if a significant amount of personal communications has moved towards social media. Most users today are receiving in their inbox important invoices, receipts, and travel itineraries, together with non-malicious junk mail such as hotel newsletters or shopping promotions that could safely ignore. This is one of the reasons that a majority of messages remain unread, and many are deleted without being read. In that sense, Web mail has become quite similar to traditional snail mail. In spite of this drastic change in nature, many mail features remain unchanged. While 70% of mail users do not define even a single folder, folders are still predominant in the left trail of many Web mail clients. Mail search results are still mostly ranked by date, which makes the retrieving of older messages extremely challenging. This is even more painful to users, as unlike in Web search, they will know when a relevant previously read message has not been returned. In this talk, I present the results of multiple large-scale studies that have been conducted at Yahoo Labs in the last few years. I highlight the inherent challenges associated with such studies, especially around privacy concerns. I will discuss the new nature of consumer Web mail, which is dominated by machine-generated messages of highly heterogeneous forms and value. I will show how the change has not been fully recognized yet by my most email clients. As an example, why should there still be a reply option associated with a message coming from a \\\"do-not-reply@\\\" address?. I will introduce some approaches for large-scale mail mining specifically tailored to machine-generated email. I will conclude by discussing possible applications and research directions.\",\"PeriodicalId\":20567,\"journal\":{\"name\":\"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining\",\"volume\":\"69 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-02-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2835776.2835847\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2835776.2835847","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 9

摘要

Web邮件通信的性质在过去二十年中发生了显著的变化，因此Web邮件用户的行为也发生了变化。例如，雅虎实验室最近进行的一项研究表明，今天90%的网络邮件流量是由机器产生的。这在一定程度上解释了为什么即使大量的个人通信已经转向社交媒体，电子邮件流量仍在继续增长。今天，大多数用户在他们的收件箱中接收重要的发票、收据和旅行行程，以及可以安全地忽略的非恶意垃圾邮件，如酒店通讯或购物促销。这就是大多数消息保持未读状态的原因之一，许多消息在未读的情况下被删除了。从这个意义上说，网络邮件与传统的蜗牛邮件非常相似。尽管本质上发生了巨大的变化，但邮件的许多特性仍然保持不变。虽然70%的邮件用户甚至不定义一个文件夹，但文件夹仍然是许多Web邮件客户端的主流。邮件搜索结果仍然主要按日期排序，这使得检索较旧的邮件极具挑战性。这对用户来说甚至更痛苦，因为与Web搜索不同，他们将知道先前阅读的相关消息何时没有返回。在这次演讲中，我将展示雅虎实验室在过去几年中进行的多项大规模研究的结果。我强调了与此类研究相关的固有挑战，特别是在隐私问题方面。我将讨论消费者Web邮件的新性质，即由机器生成的具有高度异质形式和价值的消息所主导。我将展示我的大多数电子邮件客户端如何还没有完全认识到这种变化。举个例子，为什么还应该有一个与来自“不回复@”地址的消息相关联的回复选项?我将介绍一些专门针对机器生成的电子邮件进行大规模邮件挖掘的方法。最后我将讨论可能的应用和研究方向。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Is Mail The Next Frontier In Search And Data Mining?

The nature of Web mail traffic has significantly evolved in the last two decades, and consequently the behavior of Web mail users has also changed. For instance a recent study conducted by Yahoo Labs showed that today 90% of Web mail traffic is machine-generated. This partly explains why email traffic continues to grow even if a significant amount of personal communications has moved towards social media. Most users today are receiving in their inbox important invoices, receipts, and travel itineraries, together with non-malicious junk mail such as hotel newsletters or shopping promotions that could safely ignore. This is one of the reasons that a majority of messages remain unread, and many are deleted without being read. In that sense, Web mail has become quite similar to traditional snail mail. In spite of this drastic change in nature, many mail features remain unchanged. While 70% of mail users do not define even a single folder, folders are still predominant in the left trail of many Web mail clients. Mail search results are still mostly ranked by date, which makes the retrieving of older messages extremely challenging. This is even more painful to users, as unlike in Web search, they will know when a relevant previously read message has not been returned. In this talk, I present the results of multiple large-scale studies that have been conducted at Yahoo Labs in the last few years. I highlight the inherent challenges associated with such studies, especially around privacy concerns. I will discuss the new nature of consumer Web mail, which is dominated by machine-generated messages of highly heterogeneous forms and value. I will show how the change has not been fully recognized yet by my most email clients. As an example, why should there still be a reply option associated with a message coming from a "do-not-reply@" address?. I will introduce some approaches for large-scale mail mining specifically tailored to machine-generated email. I will conclude by discussing possible applications and research directions.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the Ninth ACM International Conference on Web Search and Data Mining

自引率

0.00%

发文量