Multi-Task Romanian Email Classification in a Business Context

Inf. Comput. | Pub Date: 2023-06-03 | DOI: 10.3390/info14060321

A. Dima, Stefan Ruseti, Denis Iorga, C. Banica, Mihai Dascalu

Citations: 0

Abstract

Email classification systems are essential for handling and organizing the massive flow of communication, especially in a business context. Although many solutions exist, the lack of standardized classification categories limits their applicability. Furthermore, the lack of public, business-oriented datasets in Romanian makes such solutions difficult to develop. To this end, we introduce a versatile automated email classification system based on a novel public dataset of 1447 manually annotated Romanian business-oriented emails. Our corpus is annotated with 5 token-related labels, as well as 5 sequence-related classes. We establish a strong baseline using pre-trained Transformer models for token classification and multi-task classification, achieving F1-scores of 0.752 and 0.764, respectively. We publicly release our code together with the dataset of labeled emails.
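
To make the two annotation levels and the reported metric concrete, the following is a minimal sketch and not the authors' released code. It assumes one class per email and macro-averaged F1; the abstract specifies neither. The label names (`NAME`, `DATE`, `O`, `request`) and the `AnnotatedEmail` type are hypothetical placeholders, since the abstract does not list the actual labels.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedEmail:
    """Hypothetical container for one email with both annotation levels."""
    tokens: list[str]
    token_labels: list[str]  # one label per token (token classification task)
    email_class: str         # one class per email (assumption: single-label)


def f1_for_label(gold: list[str], pred: list[str], label: str) -> float:
    """F1 for one label: harmonic mean of precision and recall."""
    pairs = list(zip(gold, pred))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        return 0.0  # no correct prediction for this label
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: list[str], pred: list[str], labels: list[str]) -> float:
    """Unweighted mean of per-label F1 (assumed averaging scheme)."""
    return sum(f1_for_label(gold, pred, lab) for lab in labels) / len(labels)


# Toy example with made-up tokens and labels.
email = AnnotatedEmail(
    tokens=["Salut", "Ana", "pe", "luni"],
    token_labels=["O", "NAME", "O", "DATE"],
    email_class="request",
)
predicted_token_labels = ["O", "NAME", "DATE", "DATE"]
token_score = macro_f1(email.token_labels, predicted_token_labels,
                       ["NAME", "DATE", "O"])
```

The same `macro_f1` function can score the email-level task by passing one gold class and one predicted class per email instead of per token.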

