{"title":"Classification of imbalanced data in E-commerce","authors":"Liliya I. Besaleva, A. Weaver","doi":"10.1109/INTELLISYS.2017.8324212","DOIUrl":null,"url":null,"abstract":"Applications for machine learning algorithms are beginning to dominate the world of online commerce with their seemingly endless potential for supplementing fully customizable shopping experiences. From socially impactful event predictions to smarter ways of shopping online, big fast data is streaming in and being utilized constantly. Unfortunately, unusual instances of data, called imbalanced data, are still being largely ignored because of the inadequacies of analytical methods that are designed to handle homogenized data sets and to “smooth out” outliers. Consequently, rare use cases of significant importance remain neglected and lead to high-cost losses or even tragedies. In the past decade, a myriad of approaches handling this problem that range from data modifications to alterations of existing algorithms have appeared with varying success. Yet, the majority of them have major drawbacks when applied to different application domains because of the non-uniform nature of the applicable data. Within the vast domain of e-Commerce, we are proposing a new approach for handling imbalanced data, which is a hybrid classification method that will consist of a mixed solution of multi-modal data formats and algorithmic adaptations for an optimal balance between prediction accuracy, precision and specificity. Our solution improves data usability, classification accuracy and resulting costs of analyzing massive data sets used in personalizing customer experiences in e-Commerce.","PeriodicalId":131825,"journal":{"name":"2017 Intelligent Systems Conference (IntelliSys)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 Intelligent Systems Conference (IntelliSys)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/INTELLISYS.2017.8324212","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Machine learning applications are beginning to dominate online commerce, with seemingly endless potential for supporting fully customizable shopping experiences. From socially impactful event predictions to smarter ways of shopping online, big, fast data is streaming in and being used constantly. Unfortunately, unusual instances of data, the minority cases in what is called imbalanced data, are still largely ignored because the analytical methods in use are designed to handle homogenized data sets and to “smooth out” outliers. Consequently, rare use cases of significant importance remain neglected, leading to high-cost losses or even tragedies. Over the past decade, a myriad of approaches to this problem have appeared, ranging from data modifications to alterations of existing algorithms, with varying success. Yet most of them have major drawbacks when applied across application domains because of the non-uniform nature of the data involved. Within the vast domain of e-Commerce, we propose a new approach for handling imbalanced data: a hybrid classification method that combines multi-modal data formats with algorithmic adaptations to achieve an optimal balance between prediction accuracy, precision and specificity. Our solution improves data usability, classification accuracy and the resulting cost of analyzing the massive data sets used to personalize customer experiences in e-Commerce.
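
The abstract names two families of existing remedies (data modifications and alterations of existing algorithms) and three evaluation measures (accuracy, precision and specificity). The sketch below is a generic illustration of those ideas, not the authors' hybrid method, which the abstract does not describe in detail. The data set (990 ordinary records and 10 rare ones), the function names and the choice of random oversampling and inverse-frequency class weights are all assumptions made for illustration. Recall is reported as well, although the abstract does not mention it, because it makes the failure on the rare class visible.

```python
from collections import Counter
import random


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and specificity from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def random_oversample(rows, labels, seed=0):
    """Data-level remedy: duplicate minority rows at random until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def balanced_class_weights(labels):
    """Algorithm-level remedy: weight each class inversely to its frequency."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


# Hypothetical data: 990 ordinary records (label 0) and 10 rare ones (label 1).
y_true = [0] * 990 + [1] * 10

# A classifier that always predicts the majority class looks accurate
# but never finds a single rare case.
always_majority = [0] * 1000
baseline = metrics(y_true, always_majority)
print(baseline)

# Data-level: after oversampling, both classes have the same number of rows.
rows = [[i] for i in range(1000)]  # placeholder feature vectors
_, balanced_labels = random_oversample(rows, y_true)
print(Counter(balanced_labels))

# Algorithm-level: weights a learner could apply to its loss instead of resampling.
print(balanced_class_weights(y_true))
```

On this toy data the majority-only baseline scores 0.99 accuracy and 1.0 specificity while its precision and recall on the rare class are both 0, which is why accuracy alone is a poor guide on imbalanced data.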