Detection of outliers with respect to a MUSIC geotechnical database
Authors: Jianye Ching, Kok-Kwang Phoon, Pengsheng Huang
Journal: Canadian Geotechnical Journal
Publication date: 2023-09-09
Publication type: Journal Article
DOI: 10.1139/cgj-2023-0188 (https://doi.org/10.1139/cgj-2023-0188)
Impact factor: 3.0
JCR: Q2 (Engineering, Geological)
Citations: 1
Abstract
This paper proposes a novel method for a non-traditional class of outlier detection problems. Most outlier detection methods in the literature aim to detect outliers within a dataset: a record is considered an outlier if it is distinct from the regular records in that dataset. The method proposed here instead detects outlier data groups (a data group may denote a site or a project) with respect to a soil/rock property database. A data group is an outlier group if its characteristics (mean, variance, correlation, or higher-order dependency) are distinct from those of the regular data groups in the database. The paper frames outlier detection as a formal hypothesis testing problem with the null hypothesis “the target data group is identically distributed as the regular groups in the database”. With the hierarchical Bayesian model (HBM) previously developed by the first two authors, the p-value for this test can be estimated rigorously. Numerical and real examples show that the p-value can effectively detect outlier data groups as well as outlier records with respect to a database.
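To make the group-level hypothesis test concrete, the sketch below computes a simple rank-based p-value for the null hypothesis that a target group is distributed like the regular groups. It is not the authors' hierarchical Bayesian model: it compares group means only, uses simulated univariate data, and every function name and parameter value (`group_score`, `empirical_p_value`, `simulate_group`, the site means and standard deviations) is a hypothetical choice made for illustration.

```python
import random
import statistics


def group_score(group, reference_groups):
    """Distance of a group's mean from the reference group means, in units of their spread."""
    ref_means = [statistics.fmean(g) for g in reference_groups]
    center = statistics.fmean(ref_means)
    spread = statistics.stdev(ref_means)
    return abs(statistics.fmean(group) - center) / spread


def empirical_p_value(target, regular_groups):
    """Rank-based p-value for H0: the target group is distributed like the regular groups."""
    target_score = group_score(target, regular_groups)
    # Leave-one-out scores show how far a genuinely regular group typically sits
    # from the remaining groups; they serve as the null reference distribution.
    null_scores = [
        group_score(g, regular_groups[:i] + regular_groups[i + 1:])
        for i, g in enumerate(regular_groups)
    ]
    exceed = sum(s >= target_score for s in null_scores)
    return (1 + exceed) / (1 + len(null_scores))


def simulate_group(rng, site_mean, n_records=15, within_sd=0.3):
    """Simulated records for one site (assumed Gaussian, purely illustrative)."""
    return [rng.gauss(site_mean, within_sd) for _ in range(n_records)]


rng = random.Random(0)
# Hypothetical database: 40 sites whose site means vary around 2.0.
database = [simulate_group(rng, rng.gauss(2.0, 0.2)) for _ in range(40)]
regular_target = simulate_group(rng, 2.0)   # drawn like the regular sites
outlier_target = simulate_group(rng, 4.0)   # site mean shifted far from the database

p_regular = empirical_p_value(regular_target, database)
p_outlier = empirical_p_value(outlier_target, database)
print(f"regular-like group: p = {p_regular:.3f}")
print(f"shifted group:      p = {p_outlier:.3f}")
```

With this rank-based construction the smallest attainable p-value is 1/(G + 1) for G regular groups, so its resolution is limited by the size of the database. A score based on means alone also cannot flag groups that differ only in variance, correlation, or higher-order dependency, which the paper's method is stated to cover.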
About the journal:
The Canadian Geotechnical Journal features articles, notes, reviews, and discussions related to new developments in geotechnical and geoenvironmental engineering, and applied sciences. The topics of papers written by researchers and engineers/scientists active in industry include soil and rock mechanics, material properties and fundamental behaviour, site characterization, foundations, excavations, tunnels, dams and embankments, slopes, landslides, geological and rock engineering, ground improvement, hydrogeology and contaminant hydrogeology, geochemistry, waste management, geosynthetics, offshore engineering, ice, frozen ground and northern engineering, risk and reliability applications, and physical and numerical modelling.
Contributions that have practical relevance are preferred, including case records. Purely theoretical contributions are not generally published unless they are on a topic of special interest (like unsaturated soil mechanics or cold regions geotechnics) or they have direct practical value.