Machine Learning Techniques for the Detection of Unfair Pricing in Supermarkets across Trinidad and Tobago

A. Ramdhanie
Proceedings of the International Conference on Emerging Trends in Engineering & Technology (IConETech-2020)
DOI: https://doi.org/10.47412/gids9258

Abstract

The Ministry of Trade and Industry tracks prices in monitored supermarkets across Trinidad and Tobago. Under this initiative, data are collected every month for 118 grocery items (the "standard basket"). Identifying which supermarkets are non-conforming in their pricing schemes depends on the "total basket price" (the total cost of the 118 items). An outlier is defined as any datapoint that varies significantly from all other observations in a dataset; in this paper, it is any supermarket that exceeds this total basket price by 5%. The aim of this research was twofold. The first goal was to employ feature selection methods to reduce the number of items being collected. The second goal was to create a logistic regression model that can identify whether a supermarket is non-conforming, given its pricing information. The dataset contained 692 datapoints, of which only eight (8) were classified as outliers, making it an imbalanced dataset. Resampling by SMOTE (Synthetic Minority Oversampling Technique) was used to synthetically generate data for the training set, which produced a more balanced dataset. Seven (7) feature selection methods were also investigated, and their results are discussed and analysed. The model was tested and validated on unseen data (the testing set). The metrics indicated that a subset of these features can be collected while still identifying the outlier supermarkets.
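As a rough illustration of two mechanics the abstract mentions, the Python sketch below labels supermarkets by the 5% rule and generates synthetic minority samples in the spirit of SMOTE. It is not the paper's implementation. The function names, the `reference_price` argument (the abstract does not say how the reference total basket price is obtained), the neighbour count `k`, and all toy numbers are assumptions made for illustration only.

```python
import math
import random


def label_outliers(basket_totals, reference_price, threshold=0.05):
    """Return 1 for totals more than `threshold` above reference_price, else 0.

    `reference_price` is an assumption: the abstract does not state how the
    reference total basket price is computed, so the caller supplies it.
    """
    limit = reference_price * (1.0 + threshold)
    return [1 if total > limit else 0 for total in basket_totals]


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by interpolating between neighbours.

    Basic SMOTE idea: pick a minority point, pick one of its k nearest
    minority neighbours, and sample a point on the segment joining them.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Toy total basket prices; not data from the paper.
    totals = [1000.0, 1020.0, 1049.0, 1051.0, 1100.0]
    print(label_outliers(totals, reference_price=1000.0))

    # Toy minority-class feature vectors (e.g. prices of two items).
    minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 5.0]]
    for point in smote(minority, n_synthetic=4, k=2):
        print(point)
```

In keeping with the abstract, oversampling of this kind would be applied only to the training split, so that the testing set still contains only real, unseen observations.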