Mallows' L2 distance in some multivariate methods and its application to histogram-type data
Katarina Ko, L. Billard
Advances in Methodology and Statistics, published 2012-07-01
DOI: 10.51936/polr7329 (https://doi.org/10.51936/polr7329)
Citations: 7
Abstract
With Mallows' L2 distance, total inertia decomposes into within-group and between-group inertia, in accordance with Huygens' theorem. The distance itself can be decomposed into three terms: a location term, a spread term, and a shape term; a simple and straightforward proof of this decomposition is presented. These properties help in interpreting the results of distance-based methods such as k-means clustering and classical multidimensional scaling. For histogram-type data, Mallows' L2 distance is preferable because it is simple to calculate, even when the histograms differ in the number and length of their subintervals. Its use is illustrated on population pyramids for 14 East European countries in the period 1995–2015. The results give insight into the information that this distance can extract from a complex dataset.
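The calculation for histograms with different subintervals can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the standard quantile-function form of the squared distance, d^2(F, G) = integral over [0, 1] of (F^{-1}(t) - G^{-1}(t))^2 dt, a uniform density within each bin (so each quantile function is piecewise linear), and strictly positive mass in every bin. The function names `quantile_knots` and `mallows_l2_decomposition` are hypothetical. The three terms are computed as (mean_F - mean_G)^2 for location, (sd_F - sd_G)^2 for spread, and 2 * (sd_F * sd_G - cov(F^{-1}, G^{-1})) for shape.

```python
import numpy as np


def quantile_knots(edges, probs):
    """Return (cumulative probabilities, bin edges): the knots of the
    piecewise-linear quantile function of a histogram."""
    edges = np.asarray(edges, dtype=float)
    probs = np.asarray(probs, dtype=float)
    if edges.size != probs.size + 1:
        raise ValueError("need exactly one more edge than bins")
    if np.any(np.diff(edges) <= 0):
        raise ValueError("bin edges must be strictly increasing")
    if np.any(probs <= 0):
        # Assumption of this sketch: no empty bins, so the quantile
        # function is continuous and np.interp can evaluate it.
        raise ValueError("every bin must have positive mass")
    probs = probs / probs.sum()
    cum = np.concatenate(([0.0], np.cumsum(probs)))
    cum[-1] = 1.0  # guard against rounding in the cumulative sum
    return cum, edges


def mallows_l2_decomposition(edges_f, probs_f, edges_g, probs_g):
    """Squared Mallows' L2 distance between two histograms and its
    location, spread and shape terms: (total, location, spread, shape)."""
    cum_f, e_f = quantile_knots(edges_f, probs_f)
    cum_g, e_g = quantile_knots(edges_g, probs_g)

    # On the merged grid of cumulative probabilities, both quantile
    # functions are linear between consecutive points.
    t = np.union1d(cum_f, cum_g)
    qf = np.interp(t, cum_f, e_f)
    qg = np.interp(t, cum_g, e_g)
    dt = np.diff(t)
    f0, f1 = qf[:-1], qf[1:]
    g0, g1 = qg[:-1], qg[1:]

    # Exact integrals of linear functions and their products per segment.
    mean_f = np.sum(dt * (f0 + f1) / 2)
    mean_g = np.sum(dt * (g0 + g1) / 2)
    m2_f = np.sum(dt * (f0**2 + f0 * f1 + f1**2) / 3)
    m2_g = np.sum(dt * (g0**2 + g0 * g1 + g1**2) / 3)
    cross = np.sum(dt * (2 * f0 * g0 + f0 * g1 + f1 * g0 + 2 * f1 * g1) / 6)

    sd_f = np.sqrt(max(m2_f - mean_f**2, 0.0))
    sd_g = np.sqrt(max(m2_g - mean_g**2, 0.0))
    cov = cross - mean_f * mean_g

    # Total distance, integrated directly from the quantile differences.
    d0, d1 = f0 - g0, f1 - g1
    total = np.sum(dt * (d0**2 + d0 * d1 + d1**2) / 3)

    location = (mean_f - mean_g) ** 2
    spread = (sd_f - sd_g) ** 2
    shape = 2 * (sd_f * sd_g - cov)
    return total, location, spread, shape


# Two made-up histograms with different numbers and lengths of bins.
total, location, spread, shape = mallows_l2_decomposition(
    [0, 10, 20, 40], [0.2, 0.5, 0.3],
    [5, 15, 25, 35, 60], [0.1, 0.4, 0.4, 0.1],
)
print(total, location + spread + shape)
```

As a sanity check, for a uniform distribution on [0, 1] against a uniform distribution on [2, 4] (one bin each), the squared distance is 19/3: the location term is 6.25, the spread term is 1/12, and the shape term is 0, since both distributions have the same shape. Merging the two sets of cumulative probabilities is what makes differing subintervals unproblematic: no rebinning onto a common grid is needed.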