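Sum of Euclidean Distance Differences and Sum of Absolute Manhattan Distance Differences: multicriteria decision making tools for small data tables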

IF 6 · Zone 2 (Chemistry) · Q1 CHEMISTRY, ANALYTICAL

Analytica Chimica Acta Pub Date : 2025-09-17 DOI:10.1016/j.aca.2025.344649

Károly Héberger

Citations: 0

Abstract

Background

Despite its advantages, rank transformation inevitably leads to information loss, so it is expedient to develop a new algorithm that overcomes this difficulty. This work presents an extension of the sum of ranking differences (SRD) algorithm to non-ranking environments. The procedure has been developed by analogy with SRD, i.e., pairwise comparisons of (column) vectors, fixing one of them as the gold standard and introducing two validation steps (the randomization and Wilcoxon tests after assigning uncertainties by cross-validation).
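For orientation, the classical SRD step that the new procedure is modelled on can be sketched as follows. This is a minimal illustration, not the author's code: the function names, the example table, and the choice of the row-wise mean as gold standard are all assumptions made here.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srd(column, reference):
    """Sum of absolute rank differences between a column and the reference."""
    col_ranks = average_ranks(column)
    ref_ranks = average_ranks(reference)
    return sum(abs(a - b) for a, b in zip(col_ranks, ref_ranks))


# Hypothetical 5-row table: each list is one column (e.g. one method).
columns = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [5.0, 4.0, 3.0, 2.0, 1.0],
    "C": [1.1, 2.5, 2.4, 4.2, 4.9],
}
n_rows = 5
# Assumed gold standard: the row-wise mean of all columns.
reference = [
    sum(col[i] for col in columns.values()) / len(columns) for i in range(n_rows)
]
scores = {name: srd(col, reference) for name, col in columns.items()}
```

A smaller score means the column orders the rows more like the reference. Only the ordering enters the score, which is the information loss the abstract refers to.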

Results

Two emblematic distance metrics were involved in the development: the most frequently applied Euclidean distance and its robust counterpart, the city block (Manhattan) distance. In this way, two new dissimilarity measures have been defined, the Sum of Euclidean Distance Differences (DnE) and the Sum of Absolute Manhattan Distance Differences (DnM), along with their randomization tests and analysis of variance (ANOVA). Unfortunately, when leaving the safe rank environment, we also leave behind the well-known permutations and the theoretical background (Spearman footrule). This study is limited to a maximum of eight rows in the input matrix, where exact theoretical random distributions are available. Sixteen carefully chosen data sets cover a wide range of scientific disciplines and of numbers of columns and rows in the input matrix: from three to 80 and from five to eight, respectively. Three case studies illustrate the advantages and disadvantages of the new dissimilarity measures and statistical tests.
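The abstract does not give the formulas for DnE and DnM, so the sketch below shows only the two underlying metrics applied to unranked column values. It is not the paper's definition of either measure; the function names and the example vectors are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean (L2) distance between two equally long vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """City block (Manhattan, L1) distance between two equally long vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


# Hypothetical reference column and two columns with the SAME rank order.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
near = [1.0, 2.0, 3.0, 4.0, 6.0]
far = [1.0, 2.0, 3.0, 4.0, 50.0]

# Ranking cannot tell `near` from `far`; the distances on raw values can.
distances = {
    "near": (euclidean(reference, near), manhattan(reference, near)),
    "far": (euclidean(reference, far), manhattan(reference, far)),
}
```

Both example columns have the same ordering as the reference, so a rank-based comparison treats them identically, while either distance separates them clearly.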

Significance

Superior discrimination ability characterizes DnE and DnM; they provide more sophisticated ranking (and grouping) patterns than SRD despite their smaller visualization (applicability) domain. The randomization test loses its sensitivity in the order SRD>DnE>DnM. The latter two produce clustering patterns that differ from SRD and from each other but give (almost) the same ordering. Hence, only one of them is recommended in a ranking environment. Although the random distributions of DnE and DnM are slightly distorted, the probability of an error of the first kind (say 5%) can safely be determined from the cumulated frequencies. A comprehensive enumeration of advantages and disadvantages has been completed for SRD, DnE and DnM as dissimilarity measures, clustering tools, and multicriteria decision making (MCDM) techniques, and for their competitors. While preserving the great advantages of SRD (simplicity, generality, MCDM character and lack of subjective weights), both new techniques are suitable dissimilarity measures, clustering tools and MCDM tools in non-ranking environments. DnE and DnM also provide universal scales for later ANOVA and Wilcoxon tests.
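How a threshold is read from cumulated frequencies can be illustrated for the rank-based case, where the exact random distribution is obtained by enumerating all permutations (feasible for up to eight rows, 8! = 40320). The sketch covers SRD only; the randomization procedure for DnE and DnM is not described in the abstract, and the function names here are assumptions.

```python
from collections import Counter
from itertools import permutations


def srd_null_distribution(n_rows):
    """Exact frequency of each SRD value over all permutations of n_rows ranks."""
    identity = range(1, n_rows + 1)
    counts = Counter()
    for perm in permutations(identity):
        counts[sum(abs(p - r) for p, r in zip(perm, identity))] += 1
    return counts


def critical_value(counts, alpha=0.05):
    """Largest SRD value whose cumulated relative frequency stays within alpha.

    Returns None when even the smallest value already exceeds alpha.
    """
    total = sum(counts.values())
    cumulated = 0
    critical = None
    for value in sorted(counts):
        cumulated += counts[value]
        if cumulated / total > alpha:
            break
        critical = value
    return critical


# Example: exact distribution for a 5-row input (5! = 120 permutations).
counts_5 = srd_null_distribution(5)
threshold_5 = critical_value(counts_5, alpha=0.05)
```

A column whose SRD is at or below the returned threshold orders the rows more like the reference than random rankings would at the chosen error level.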

Source journal

Analytica Chimica Acta (Chemistry: Analytical Chemistry)

CiteScore: 10.40

Self-citation rate: 6.50%

Articles published: 1081

Review time: 38 days

About the journal: Analytica Chimica Acta has an open access mirror journal, Analytica Chimica Acta: X, sharing the same aims and scope, editorial team, submission system and rigorous peer review. Analytica Chimica Acta provides a forum for the rapid publication of original research and of critical, comprehensive reviews dealing with all aspects of fundamental and applied modern analytical chemistry. The journal welcomes the submission of research papers that report studies concerning the development of new and significant analytical methodologies. In determining the suitability of submitted articles for publication, particular scrutiny will be placed on the degree of novelty and impact of the research and the extent to which it adds to the existing body of knowledge in analytical chemistry.