Classification of imbalanced data in E-commerce

2017 Intelligent Systems Conference (IntelliSys), Pub Date: 2017-09-01, DOI: 10.1109/INTELLISYS.2017.8324212

Liliya I. Besaleva, A. Weaver


Citations: 4

Abstract

Applications of machine learning algorithms are beginning to dominate the world of online commerce with their seemingly endless potential for supporting fully customizable shopping experiences. From socially impactful event predictions to smarter ways of shopping online, big, fast data is streaming in and being utilized constantly. Unfortunately, unusual instances of data, called imbalanced data, are still largely ignored because of the inadequacies of analytical methods that are designed to handle homogenized data sets and to "smooth out" outliers. Consequently, rare use cases of significant importance remain neglected and lead to high-cost losses or even tragedies. In the past decade, a myriad of approaches to this problem, ranging from data modifications to alterations of existing algorithms, have appeared with varying success. Yet the majority of them have major drawbacks when applied to different application domains because of the non-uniform nature of the applicable data. Within the vast domain of e-Commerce, we propose a new approach for handling imbalanced data: a hybrid classification method that combines multi-modal data formats with algorithmic adaptations to achieve an optimal balance between prediction accuracy, precision and specificity. Our solution improves data usability and classification accuracy, and lowers the resulting cost of analyzing the massive data sets used in personalizing customer experiences in e-Commerce.
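The abstract names accuracy, precision and specificity as the quantities to balance. The following is a minimal sketch, not the authors' method, of why these metrics must be read together on imbalanced data. The data set (990 ordinary sessions and 10 rare events) and every function name are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # rare class correctly flagged
    fp: int  # common class wrongly flagged
    tn: int  # common class correctly passed
    fn: int  # rare class missed


def confusion(y_true, y_pred):
    """Count outcomes for binary labels, where 1 marks the rare class."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def metrics(c):
    """Standard definitions computed from the four confusion counts."""
    return {
        "accuracy": safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": safe_div(c.tp, c.tp + c.fp),
        "recall": safe_div(c.tp, c.tp + c.fn),
        "specificity": safe_div(c.tn, c.tn + c.fp),
    }


# Hypothetical data: 990 ordinary sessions (0) and 10 rare events (1).
y_true = [0] * 990 + [1] * 10

# A degenerate classifier that always predicts the common class.
always_common = [0] * 1000
baseline = metrics(confusion(y_true, always_common))

for name, value in baseline.items():
    print(f"{name}: {value:.2f}")
```

On this toy data the always-common classifier scores high on accuracy and specificity while never detecting a single rare event, which is why a single headline accuracy figure can hide the failure the abstract describes.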


Source publication

2017 Intelligent Systems Conference (IntelliSys)

Self-citation rate

0.00%

Number of publications