Football and the dark side of cluster analysis

C. Hennig, Serhat Emre Akhanli
Published in: International Federation of Classification Societies, 2015-05-18
DOI: 10.20347/WIAS.REPORT.29 (https://doi.org/10.20347/WIAS.REPORT.29)
Citations: 2

Abstract

In cluster analysis, decisions on data preprocessing, such as how to select, transform, and standardise variables and how to aggregate information from continuous, count and categorical variables, cannot be made in a supervised manner, i.e., based on prediction of a response variable. Statisticians often attempt to automate such decisions anyway by optimising certain objective functions of the data, but this usually ignores the fact that in cluster analysis these decisions determine the meaning of the resulting clustering. We argue that the decisions should be made based on the aim and intended interpretation of the clustering and the meaning of the variables. The rationale is that preprocessing should be done in such a way that the resulting distances, as used by the clustering method, match as well as possible the "interpretative distances" between objects as determined by the meaning of the variables and objects. Such "interpretative distances" are usually not precisely specified and involve a certain amount of subjectivity. We will use ongoing work on clustering football players based on performance data to illustrate how such decisions can be made, how much of an impact they can have, and how the data can still help with them, and to highlight some issues with the approach.
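
To make the point about preprocessing concrete, here is a minimal sketch of how one decision, whether to standardise, can change which objects count as similar. The three players, the two variables (passes and goals per 90 minutes) and all numbers are invented for illustration and are not taken from the paper's data. Scaling to unit variance is only one possible choice and is not presented as the authors' recommendation.

```python
import numpy as np

# Hypothetical per-90-minute figures for three made-up players (not from the paper).
players = ["A", "B", "C"]
X = np.array([
    [50.0, 0.10],   # A: passes, goals
    [55.0, 0.80],   # B
    [58.0, 0.15],   # C
])

def nearest_to_first(data):
    """Return the index of the row closest to row 0 in Euclidean distance."""
    dists = np.linalg.norm(data[1:] - data[0], axis=1)
    return 1 + int(np.argmin(dists))

# Option 1: raw units, so the variable with the larger scale (passes) dominates.
raw_nearest = players[nearest_to_first(X)]

# Option 2: each variable centred and scaled to unit standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
std_nearest = players[nearest_to_first(Z)]

print("raw:", raw_nearest, "standardised:", std_nearest)
```

With these invented numbers, player A's nearest neighbour is B on the raw scale and C after standardisation. Neither answer is right in itself. Which distance is appropriate depends on whether a difference of a few passes should count for as much as a large difference in goals, which is the kind of interpretative judgement the abstract argues cannot be delegated to an objective function.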