A Deep Learning Approach for Minimizing False Negatives in Predicting Receipt Emails

2022 International Conference on Computer and Applications (ICCA), Pub Date: 2022-12-20, DOI: 10.1109/ICCA56443.2022.10039606

C. Hirway, Enda Fallon, Paul Connolly, Kieran Flanagan, D. Yadav


Citations: 0

Abstract

Businesses generate receipts for their customers that include information such as the products purchased, their cost, the date and time of purchase, and the store ID. After an online purchase, a receipt is often emailed to the buyer. For this evaluation, a labeled database of receipt and non-receipt emails was available. Machine Learning (ML) algorithms for determining whether an email is a receipt had previously been applied to this test database, and the Random Forest technique performed better than Naive Bayes and Support Vector Machine. In this paper, a Deep Learning algorithm, Long Short-Term Memory (LSTM), is implemented and its results are compared with the previous implementation. One factor in its success is the ability of this recurrent network to mitigate the vanishing/exploding gradient problem, a challenge when training recurrent or very deep neural networks. LSTM was found to achieve higher accuracy than the previous ML approach, and it also produced fewer false negatives. In the classification of receipt emails, processing an email without receipt data incurs a relatively low cost, whereas failing to detect a receipt email results in the loss of important data. As a result, the system needs to be tuned to minimize false negatives while tolerating more false positives, since false negatives cost substantially more than false positives in this setting.
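The abstract does not say how the classifier was tuned. One common way to accept more false positives in exchange for fewer false negatives is to move the decision threshold on the model's receipt score and keep the threshold that minimizes a weighted error cost. The Python sketch below assumes the LSTM emits a score in [0, 1] for each email; the function names `confusion_counts` and `pick_threshold`, the cost weights, and the six example scores are all hypothetical, not the authors' code or data.

```python
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), flagging an email as a receipt when score >= threshold."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if label == 1 and flagged:
            tp += 1
        elif label == 1:
            fn += 1  # missed receipt: the expensive error
        elif flagged:
            fp += 1  # non-receipt processed anyway: the cheap error
        else:
            tn += 1
    return tp, fp, fn, tn


def pick_threshold(
    y_true: Sequence[int],
    scores: Sequence[float],
    cost_fn: float,
    cost_fp: float,
) -> Tuple[float, float]:
    """Return (threshold, total_cost) minimizing cost_fn * FN + cost_fp * FP.

    Candidates are the distinct scores plus +inf (flag nothing). They are scanned
    in ascending order and only a strictly lower cost replaces the current best,
    so ties resolve toward the lower threshold, which never has more false negatives.
    """
    candidates = sorted(set(scores)) + [float("inf")]
    best_threshold = candidates[0]
    best_cost = float("inf")
    for t in candidates:
        _, fp, fn, _ = confusion_counts(y_true, scores, t)
        cost = cost_fn * fn + cost_fp * fp
        if cost < best_cost:
            best_threshold, best_cost = t, cost
    return best_threshold, best_cost


# Hypothetical model scores for six emails (1 = receipt, 0 = non-receipt);
# these are illustrative values, not results from the paper.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.3, 0.5, 0.4, 0.1]

print(pick_threshold(y_true, scores, cost_fn=1, cost_fp=1))   # (0.7, 1)
print(pick_threshold(y_true, scores, cost_fn=10, cost_fp=1))  # (0.3, 2)
```

With equal costs the search settles on 0.7, which misses one receipt. Weighting a missed receipt ten times as heavily as an unnecessarily processed email pulls the threshold down to 0.3, where no receipts are missed at the price of two false positives, which is the kind of trade-off the abstract argues for.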

