Machine Learning Techniques for the Detection of Unfair Pricing in Supermarkets across Trinidad and Tobago

Proceedings of the International Conference on Emerging Trends in Engineering & Technology (IConETech-2020). Pub Date: 1900-01-01. DOI: 10.47412/gids9258

A. Ramdhanie


Citations: 0

Abstract

The tracking of prices in monitored supermarkets across Trinidad and Tobago is done by the Ministry of Trade and Industry. This initiative involves data collection every month for 118 grocery items (“standard basket”). The task of identifying which supermarkets are non-conforming in their pricing schemes is linked to the “total basket price” (total cost of the 118 items). An outlier is defined as any datapoint that varies significantly from all other observations in a dataset. In this paper, it is any supermarket that exceeds this total basket price by 5%. The aim of this research was twofold, with the first goal being to employ feature selection methods to reduce the number of items being collected. The second goal was to create a logistic regression learning model that can identify whether supermarkets are non-conforming, given their pricing information. The dataset contained 692 datapoints and out of these, only eight (8) were classified as outliers. This is an imbalanced dataset. Resampling by SMOTE (Synthetic Minority Oversampling Technique) was used to synthetically generate data for the training set. Seven (7) feature selection methods were also investigated and their results discussed and analysed. In doing this, a more balanced dataset was achieved which was tested and validated on the unseen data (testing set). The metrics indicated that a subset of these features can be collected whilst still maintaining the supermarket outliers.
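The labelling rule and the SMOTE resampling step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not state which baseline the 5% threshold is measured against, nor which SMOTE implementation or parameters were used. The function names, the `reference` argument, the neighbour count `k`, and the toy numbers below are all assumptions made for illustration.

```python
import math
import random


def label_outliers(basket_totals, reference, threshold=0.05):
    """Return 1 for each total that exceeds `reference` by more than `threshold`.

    `reference` is a hypothetical baseline basket price; the abstract does
    not say how the baseline is chosen.
    """
    limit = reference * (1 + threshold)
    return [1 if total > limit else 0 for total in basket_totals]


def smote(minority, n_new, k=5, seed=0):
    """Create `n_new` synthetic minority points, SMOTE style.

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (made up): four basket totals against an assumed baseline of 1000.
totals = [1000.0, 1020.0, 990.0, 1100.0]
labels = label_outliers(totals, reference=1000.0)

# Toy minority class in a two-item feature space (made up).
minority_points = [[1.0, 2.0], [1.5, 2.5], [2.0, 3.0]]
new_points = smote(minority_points, n_new=4, k=2)
```

As in the abstract, the synthetic points would be added to the training set only, so that the held-out testing set still reflects the original imbalance of 8 outliers in 692 datapoints.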
