{"title":"Using eigenvalues of distance matrices for outlier detection","authors":"Reza Modarres","doi":"10.3233/ida-230048","DOIUrl":null,"url":null,"abstract":"Distance or dissimilarity matrices are widely used in applications. We study the relationships between the eigenvalues of the distance matrices and outliers and show that outliers affect the pairwise distances and inflate the eigenvalues. We obtain the eigenvalues of a distance matrix that is affected by k outliers and compare them to the eigenvalues of a distance matrix with a constant structure. We show a discrepancy in the sizes of the eigenvalues of a distance matrix that is contaminated with outliers, present an algorithm and offer a new outlier detection method based on the eigenvalues of the distance matrix. We compare the new distance-based outlier technique with several existing methods under five distributions. The methods are applied to a study of public utility companies and gene expression data.","PeriodicalId":50355,"journal":{"name":"Intelligent Data Analysis","volume":"57 2","pages":""},"PeriodicalIF":0.9000,"publicationDate":"2023-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Intelligent Data Analysis","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.3233/ida-230048","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Distance or dissimilarity matrices are widely used in applications. We study the relationships between the eigenvalues of the distance matrices and outliers and show that outliers affect the pairwise distances and inflate the eigenvalues. We obtain the eigenvalues of a distance matrix that is affected by k outliers and compare them to the eigenvalues of a distance matrix with a constant structure. We show a discrepancy in the sizes of the eigenvalues of a distance matrix that is contaminated with outliers, present an algorithm and offer a new outlier detection method based on the eigenvalues of the distance matrix. We compare the new distance-based outlier technique with several existing methods under five distributions. The methods are applied to a study of public utility companies and gene expression data.
距离矩阵或异同矩阵在应用中被广泛使用。我们研究了距离矩阵的特征值与异常值之间的关系,结果表明异常值会影响成对距离并使特征值增大。我们得到了受 k 个异常值影响的距离矩阵的特征值,并将其与结构不变的距离矩阵的特征值进行了比较。我们显示了受离群值污染的距离矩阵特征值大小的差异,提出了一种算法,并提供了一种基于距离矩阵特征值的新离群值检测方法。我们将基于距离的新离群点技术与现有的几种方法在五种分布下进行了比较。这些方法被应用于对公用事业公司和基因表达数据的研究。
期刊介绍:
Intelligent Data Analysis provides a forum for the examination of issues related to the research and applications of Artificial Intelligence techniques in data analysis across a variety of disciplines. These techniques include (but are not limited to): all areas of data visualization, data pre-processing (fusion, editing, transformation, filtering, sampling), data engineering, database mining techniques, tools and applications, use of domain knowledge in data analysis, big data applications, evolutionary algorithms, machine learning, neural nets, fuzzy logic, statistical pattern recognition, knowledge filtering, and post-processing. In particular, papers are preferred that discuss development of new AI related data analysis architectures, methodologies, and techniques and their applications to various domains.