On Modeling Bivariate Lifetime Data in the Presence of Inliers

JCR: Q1, Decision Sciences

Annals of Data Science. Publication date: 2024-01-27. DOI: 10.1007/s40745-023-00511-2

Sumangal Bhattacharya, Ishapathik Das, Muralidharan Kunnummal

Volume 12 (1), pages 1-22. Journal article. Full record: https://link.springer.com/article/10.1007/s40745-023-00511-2

Citations: 0

Abstract



In life-testing experiments, many items fail instantaneously or early, mainly in electronic parts and clinical trials, due to faulty construction, inferior quality, or non-response to treatment. The observed lifetime is then recorded as zero or near zero, and such records are called instantaneous or early failure observations. More generally, some observations may be concentrated around a point while the others follow a continuous distribution. In such data, the observations concentrated around a point are regarded as inliers. Unimodal parametric distributions such as the Weibull, gamma, log-normal, and Pareto are usually fitted to lifetime data to analyze and predict future events. Because data with inliers are multimodal, the usual modelling approach based on unimodal parametric distributions may not provide the expected results. Correlated bivariate observations with inliers also occur frequently in life-testing experiments. Here, we propose a method of modelling bivariate lifetime data with instantaneous and early failure observations. A new bivariate distribution is constructed by combining the bivariate uniform and bivariate Weibull distributions. The bivariate Weibull distribution is obtained using a two-dimensional copula, assuming that each marginal distribution is a two-parameter Weibull distribution. An attempt is also made to derive some properties of the resulting modified bivariate Weibull distribution, namely the joint probability density function, the survival (reliability) function, and the hazard (failure rate) function. The model’s unknown parameters are estimated by combining maximum likelihood estimation with a machine learning clustering algorithm, Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Numerical examples using simulated data illustrate and test the performance of the proposed methodology. The relevant code and computations were developed in R and Python. The proposed method is applied to real data with possible inflation, and the data are found to contain inliers with a probability of 0.57. The study also compares the proposed method with an existing method from the literature and finds that the proposed method provides a significantly better fit than the base model, with a P value less than 0.0001.
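The estimation idea described in the abstract (separate the dense inlier cluster with DBSCAN, then fit the continuous part by maximum likelihood) can be illustrated with a minimal sketch. It is not the authors' code, and the abstract does not give their exact model. Everything below is an assumption made for illustration: the Clayton copula, the inlier square `[0, delta]^2`, all parameter values (0.57 is reused only as a simulation setting), the hand-written `dbscan` helper, and the simplification of fitting only one Weibull margin rather than the full joint likelihood.

```python
import math
import random

def simulate(n, p, delta, shape, scale, theta, rng):
    """With prob. p draw an inlier uniform on [0, delta]^2, else a Clayton-coupled Weibull pair."""
    points, is_inlier = [], []
    for _ in range(n):
        if rng.random() < p:
            points.append((rng.uniform(0, delta), rng.uniform(0, delta)))
            is_inlier.append(True)
            continue
        u = rng.uniform(1e-12, 1 - 1e-12)
        w = rng.uniform(1e-12, 1 - 1e-12)
        # Conditional inversion for the Clayton copula (assumed copula, theta > 0).
        v = ((w ** (-theta / (1 + theta)) - 1) * u ** (-theta) + 1) ** (-1 / theta)
        v = min(v, 1 - 1e-12)
        # Weibull quantile function applied to each uniform margin.
        x = scale * (-math.log1p(-u)) ** (1 / shape)
        y = scale * (-math.log1p(-v)) ** (1 / shape)
        points.append((x, y))
        is_inlier.append(False)
    return points, is_inlier

def dbscan(points, eps, min_samples):
    """Plain O(n^2) DBSCAN; returns a cluster id per point, -1 for noise."""
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        stack = list(neighbors[i])
        while stack:
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster          # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                stack.extend(neighbors[j])   # expand only from core points
    return labels

def cluster_nearest_origin(points, labels):
    """Pick the cluster whose centroid is closest to (0, 0)."""
    best, best_norm = None, math.inf
    for c in set(labels) - {-1}:
        members = [pt for pt, lab in zip(points, labels) if lab == c]
        cx = sum(pt[0] for pt in members) / len(members)
        cy = sum(pt[1] for pt in members) / len(members)
        if math.hypot(cx, cy) < best_norm:
            best, best_norm = c, math.hypot(cx, cy)
    return best

def weibull_mle(xs):
    """Two-parameter Weibull MLE: bisection on the profile score for the shape."""
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(xs)
    def score(k):
        pw = [x ** k for x in xs]
        return sum(a * b for a, b in zip(pw, logs)) / sum(pw) - 1 / k - mean_log
    lo, hi = 0.01, 50.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    k = (lo + hi) / 2
    return k, (sum(x ** k for x in xs) / len(xs)) ** (1 / k)

rng = random.Random(0)
points, truth = simulate(600, 0.57, 0.1, 2.0, 5.0, 2.0, rng)
labels = dbscan(points, eps=0.05, min_samples=5)
inlier_id = cluster_nearest_origin(points, labels)
flagged = [lab == inlier_id for lab in labels]
p_hat = sum(flagged) / len(points)                      # estimated inlier probability
rest_x = [pt[0] for pt, f in zip(points, flagged) if not f]
shape_hat, scale_hat = weibull_mle(rest_x)              # first margin only
print(f"p_hat={p_hat:.3f} shape_hat={shape_hat:.2f} scale_hat={scale_hat:.2f}")
```

With these settings the inliers form one very dense cluster near the origin while the Weibull pairs are sparse enough that DBSCAN labels them as noise, so the flagged fraction serves as the estimate of the inlier probability. The choice of `eps` and `min_samples` drives that separation and would need tuning on real data.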


Source journal

Annals of Data Science (Decision Sciences: Statistics, Probability and Uncertainty)

CiteScore: 6.50

Self-citation rate: 0.00%

About the journal: Annals of Data Science (ADS) publishes cutting-edge research findings, experimental results, and case studies in data science. Although data science is regarded as an interdisciplinary field that uses mathematics, statistics, databases, data mining, high-performance computing, knowledge management, and virtualization to discover knowledge from Big Data, it should have its own scientific content, such as axioms, laws, and rules, which are fundamentally important for experts in different fields exploring their own interests in Big Data. ADS encourages contributors to address such challenging problems on this exchange platform. A current open question is how to discover knowledge from heterogeneous data in a Big Data environment. ADS is a series of volumes edited by either the editorial office or guest editors. Guest editors are responsible for the call for papers and the review process for high-quality contributions in their volumes.