迁移学习在销售契约电子邮件分类中的应用

2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) Pub Date : 2019-04-19 DOI:10.1109/CCGRID.2019.00069

Yong Liu, Pavel A. Dmitriev, Yifei Huang, Andrew Brooks, Li Dong

{"title":"迁移学习在销售契约电子邮件分类中的应用","authors":"Yong Liu, Pavel A. Dmitriev, Yifei Huang, Andrew Brooks, Li Dong","doi":"10.1109/CCGRID.2019.00069","DOIUrl":null,"url":null,"abstract":"This paper conducts an empirical investigation to evaluate transfer learning for classifying sales engagement emails arising from digital sales engagement platforms. Given the complexity of content and context of sales engagement, lack of standardized large corpora and benchmarks, limited labeled examples and heterogenous context of intent, this real-world use case poses both a challenge and an opportunity for adopting a transfer learning approach. We propose an evaluation framework to assess a high performance transfer learning (HPTL) approach in three key areas in addition to commonly used accuracy metrics: 1) effective embeddings and pretrained language model usage, 2) minimum labeled samples requirement and 3) transfer learning implementation strategies. We use in-house sales engagement email samples as the experiment dataset, which includes over 3000 emails labeled as positive, objection, unsubscribe, or not-sure. We discuss our findings on evaluating BERT, ELMo, Flair and GloVe embeddings with both feature-based and fine-tuning approaches and their scalability on a GPU cluster with increasingly larger labeled samples. Our results show that fine-tuning of the BERT model outperforms with as few as 300 labeled samples, but underperforms with fewer than 300 labeled samples, relative to all the feature-based approaches using different embeddings.","PeriodicalId":234571,"journal":{"name":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"111 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"An Evaluation of Transfer Learning for Classifying Sales Engagement Emails at Large Scale\",\"authors\":\"Yong Liu, Pavel A. Dmitriev, Yifei Huang, Andrew Brooks, Li Dong\",\"doi\":\"10.1109/CCGRID.2019.00069\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper conducts an empirical investigation to evaluate transfer learning for classifying sales engagement emails arising from digital sales engagement platforms. Given the complexity of content and context of sales engagement, lack of standardized large corpora and benchmarks, limited labeled examples and heterogenous context of intent, this real-world use case poses both a challenge and an opportunity for adopting a transfer learning approach. We propose an evaluation framework to assess a high performance transfer learning (HPTL) approach in three key areas in addition to commonly used accuracy metrics: 1) effective embeddings and pretrained language model usage, 2) minimum labeled samples requirement and 3) transfer learning implementation strategies. We use in-house sales engagement email samples as the experiment dataset, which includes over 3000 emails labeled as positive, objection, unsubscribe, or not-sure. We discuss our findings on evaluating BERT, ELMo, Flair and GloVe embeddings with both feature-based and fine-tuning approaches and their scalability on a GPU cluster with increasingly larger labeled samples. Our results show that fine-tuning of the BERT model outperforms with as few as 300 labeled samples, but underperforms with fewer than 300 labeled samples, relative to all the feature-based approaches using different embeddings.\",\"PeriodicalId\":234571,\"journal\":{\"name\":\"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)\",\"volume\":\"111 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-04-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CCGRID.2019.00069\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCGRID.2019.00069","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 4

摘要

本文通过实证调查来评估迁移学习对数字销售参与平台产生的销售参与电子邮件进行分类的效果。考虑到销售参与的内容和上下文的复杂性，缺乏标准化的大型语料库和基准，有限的标记示例和异构的意图上下文，这个现实世界的用例对采用迁移学习方法提出了挑战和机遇。我们提出了一个评估框架来评估高性能迁移学习(HPTL)方法，除了常用的准确性指标之外，还包括三个关键领域:1)有效嵌入和预训练语言模型的使用，2)最小标记样本要求和3)迁移学习实施策略。我们使用内部销售参与电子邮件样本作为实验数据集，其中包括3000多封标记为积极、反对、退订或不确定的电子邮件。我们讨论了基于特征和微调方法评估BERT、ELMo、Flair和GloVe嵌入的发现，以及它们在标记样本越来越大的GPU集群上的可扩展性。我们的结果表明，相对于使用不同嵌入的所有基于特征的方法，BERT模型的微调在少于300个标记样本的情况下表现优异，但在少于300个标记样本的情况下表现不佳。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

An Evaluation of Transfer Learning for Classifying Sales Engagement Emails at Large Scale

This paper conducts an empirical investigation to evaluate transfer learning for classifying sales engagement emails arising from digital sales engagement platforms. Given the complexity of content and context of sales engagement, lack of standardized large corpora and benchmarks, limited labeled examples and heterogenous context of intent, this real-world use case poses both a challenge and an opportunity for adopting a transfer learning approach. We propose an evaluation framework to assess a high performance transfer learning (HPTL) approach in three key areas in addition to commonly used accuracy metrics: 1) effective embeddings and pretrained language model usage, 2) minimum labeled samples requirement and 3) transfer learning implementation strategies. We use in-house sales engagement email samples as the experiment dataset, which includes over 3000 emails labeled as positive, objection, unsubscribe, or not-sure. We discuss our findings on evaluating BERT, ELMo, Flair and GloVe embeddings with both feature-based and fine-tuning approaches and their scalability on a GPU cluster with increasingly larger labeled samples. Our results show that fine-tuning of the BERT model outperforms with as few as 300 labeled samples, but underperforms with fewer than 300 labeled samples, relative to all the feature-based approaches using different embeddings.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)

自引率

0.00%

发文量