Multi-Task Romanian Email Classification in a Business Context

Inf. Comput. Pub Date: 2023-06-03. DOI: 10.3390/info14060321
A. Dima, Stefan Ruseti, Denis Iorga, C. Banica, Mihai Dascalu

Abstract

Email classification systems are essential for handling and organizing the massive flow of communication, especially in a business context. Although many solutions exist, the lack of standardized classification categories limits their applicability. Furthermore, the lack of public, business-oriented datasets in Romanian makes such solutions difficult to develop. To this end, we introduce a versatile automated email classification system based on a novel public dataset of 1447 manually annotated, business-oriented Romanian emails. Our corpus is annotated with 5 token-related labels, as well as 5 sequence-related classes. We establish a strong baseline using pre-trained Transformer models for token classification and multi-task classification, achieving F1-scores of 0.752 and 0.764, respectively. We publicly release our code together with the dataset of labeled emails.
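As an illustration of the reported metric, the following is a minimal sketch of how an F1-score over class labels can be computed. It is not the authors' evaluation code. The abstract does not state which averaging scheme was used, so the macro average here is an assumption, and the function names and the placeholder labels "A" and "B" are hypothetical rather than taken from the dataset.

```python
from collections import Counter


def f1_per_class(gold, pred):
    """Return {label: F1} for two equal-length sequences of class labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true class but was missed
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1 values (assumed averaging scheme)."""
    scores = f1_per_class(gold, pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Placeholder labels, not the dataset's real classes.
    gold = ["A", "A", "B", "B"]
    pred = ["A", "B", "B", "B"]
    print(f1_per_class(gold, pred))
    print(macro_f1(gold, pred))
```

The same function applies to both settings mentioned in the abstract: for sequence-level classes, each list element is one email's label; for token-level labels, each element is one token's label after flattening all emails into a single list.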