{"title":"双加权 kNN:带有嵌入式特征选择的简单高效变体","authors":"Almudena Moreno-Ribera, Aida Calviño","doi":"10.1057/s41270-024-00302-5","DOIUrl":null,"url":null,"abstract":"<p>Predictive modeling aims at providing estimates of an unknown variable, the target, from a set of known ones, the input. The <i>k</i> Nearest Neighbors (<i>k</i>NN) is one of the best-known predictive algorithms due to its simplicity and well behavior. However, this class of models has some drawbacks, such as the non-robustness to the existence of irrelevant input features or the need to transform qualitative variables into dummies, with the corresponding loss of information for ordinal ones. In this work, a <i>k</i>NN regression variant, easily adaptable for classification purposes, is suggested. The proposal allows dealing with all types of input variables while embedding feature selection in a simple and efficient manner, reducing the tuning phase. More precisely, making use of the weighted Gower distance, we develop a powerful tool to cope with these inconveniences. Finally, to boost the tool predictive power, a second weighting scheme is added to the neighbors. The proposed method is applied to a collection of 20 data sets, different in size, data type, and distribution of the target variable. Moreover, the results are compared with the previously proposed <i>k</i>NN variants, showing its supremacy, particularly when the weighting scheme is based on non-linear association measures.</p>","PeriodicalId":43041,"journal":{"name":"Journal of Marketing Analytics","volume":"52 1","pages":""},"PeriodicalIF":4.0000,"publicationDate":"2024-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Double-weighted kNN: a simple and efficient variant with embedded feature selection\",\"authors\":\"Almudena Moreno-Ribera, Aida Calviño\",\"doi\":\"10.1057/s41270-024-00302-5\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Predictive modeling aims at providing estimates of an unknown variable, the target, from a set of known ones, the input. The <i>k</i> Nearest Neighbors (<i>k</i>NN) is one of the best-known predictive algorithms due to its simplicity and well behavior. However, this class of models has some drawbacks, such as the non-robustness to the existence of irrelevant input features or the need to transform qualitative variables into dummies, with the corresponding loss of information for ordinal ones. In this work, a <i>k</i>NN regression variant, easily adaptable for classification purposes, is suggested. The proposal allows dealing with all types of input variables while embedding feature selection in a simple and efficient manner, reducing the tuning phase. More precisely, making use of the weighted Gower distance, we develop a powerful tool to cope with these inconveniences. Finally, to boost the tool predictive power, a second weighting scheme is added to the neighbors. The proposed method is applied to a collection of 20 data sets, different in size, data type, and distribution of the target variable. Moreover, the results are compared with the previously proposed <i>k</i>NN variants, showing its supremacy, particularly when the weighting scheme is based on non-linear association measures.</p>\",\"PeriodicalId\":43041,\"journal\":{\"name\":\"Journal of Marketing Analytics\",\"volume\":\"52 1\",\"pages\":\"\"},\"PeriodicalIF\":4.0000,\"publicationDate\":\"2024-04-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Marketing Analytics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1057/s41270-024-00302-5\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"BUSINESS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Marketing Analytics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1057/s41270-024-00302-5","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BUSINESS","Score":null,"Total":0}
Double-weighted kNN: a simple and efficient variant with embedded feature selection
Predictive modeling aims at providing estimates of an unknown variable, the target, from a set of known ones, the input. The k Nearest Neighbors (kNN) is one of the best-known predictive algorithms due to its simplicity and well behavior. However, this class of models has some drawbacks, such as the non-robustness to the existence of irrelevant input features or the need to transform qualitative variables into dummies, with the corresponding loss of information for ordinal ones. In this work, a kNN regression variant, easily adaptable for classification purposes, is suggested. The proposal allows dealing with all types of input variables while embedding feature selection in a simple and efficient manner, reducing the tuning phase. More precisely, making use of the weighted Gower distance, we develop a powerful tool to cope with these inconveniences. Finally, to boost the tool predictive power, a second weighting scheme is added to the neighbors. The proposed method is applied to a collection of 20 data sets, different in size, data type, and distribution of the target variable. Moreover, the results are compared with the previously proposed kNN variants, showing its supremacy, particularly when the weighting scheme is based on non-linear association measures.
期刊介绍:
Data has become the new ore in today’s knowledge economy. However, merely storing and reporting are not enough to thrive in today’s increasingly competitive markets. What is called for is the ability to make sense of all these oceans of data, and to apply those insights to the way companies approach their markets, adjust to changing market conditions, and respond to new competitors.
Marketing analytics lies at the heart of this contemporary wave of data driven decision-making. Companies can no longer survive when they rely on gut instinct to make decisions. Strategic leverage of data is one of the few remaining sources of sustainable competitive advantage. New products can be copied faster than ever before. Staff are becoming less loyal as well as more mobile, and business centers themselves are moving across the globe in a world that is getting flatter and flatter.
The Journal of Marketing Analytics brings together applied research and practice papers in this blossoming field. A unique blend of applied academic research, combined with insights from commercial best practices makes the Journal of Marketing Analytics a perfect companion for academics and practitioners alike. Academics can stay in touch with the latest developments in this field. Marketing analytics professionals can read about the latest trends, and cutting edge academic research in this discipline.
The Journal of Marketing Analytics will feature applied research papers on topics like targeting, segmentation, big data, customer loyalty and lifecycle management, cross-selling, CRM, data quality management, multi-channel marketing, and marketing strategy.
The Journal of Marketing Analytics aims to combine the rigor of carefully controlled scientific research methods with applicability of real world case studies. Our double blind review process ensures that papers are selected on their content and merits alone, selecting the best possible papers in this field.