Computational Statistics: latest articles, page 6

Imbalanced data sampling design based on grid boundary domain for big data

IF 1.3, Zone 4 (Mathematics)

Computational Statistics Pub Date : 2024-03-08 DOI: 10.1007/s00180-024-01471-8

Abstract: Data distributions are often associated with an a priori known probability, and events of interest often occur with small probability, so large amounts of imbalanced data appear in sociology, economics, engineering, and many other fields. Over- and under-sampling methods are widely used in imbalanced classification problems, but over-sampling leads to overfitting, and under-sampling discards useful information. We propose a new sampling design algorithm, the neighbor grid of boundary mixed-sampling (NGBM), which focuses on boundary information. The classification boundary information is obtained through grid boundary domain identification, which determines the importance of the samples. The synthetic minority oversampling technique is then applied to the boundary grids, and random under-sampling is applied to the other grids. This mixed sampling strategy extracts more of the important classification boundary information, especially for identifying positive samples. Numerical simulations and real data analysis are used to discuss the parameter-setting strategy of the NGBM and to illustrate its advantages on imbalanced data and in practical applications.
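The mixed sampling idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' NGBM algorithm: the abstract does not define the grid boundary domain, so the sketch assumes that a cell is a boundary cell when it contains both classes, replaces nearest-neighbor SMOTE with interpolation between two minority points of the same cell, and under-samples only the majority class. The names `cell_of`, `mixed_sample`, `width`, and `keep_fraction` are hypothetical.

```python
import random
from collections import defaultdict


def cell_of(point, width):
    """Grid cell index of a point: one integer per coordinate."""
    return tuple(int(coord // width) for coord in point)


def mixed_sample(points, labels, width=1.0, keep_fraction=0.5, seed=0):
    """Oversample the minority class (label 1) in boundary cells and
    undersample the majority class (label 0) in majority-only cells.

    Assumption: a cell counts as a boundary cell when it holds both classes.
    """
    rng = random.Random(seed)
    cells = defaultdict(lambda: {0: [], 1: []})
    for point, label in zip(points, labels):
        cells[cell_of(point, width)][label].append(tuple(point))

    out_points, out_labels = [], []
    for members in cells.values():
        majority, minority = members[0], members[1]
        if majority and minority:
            # Boundary cell: keep everything and add SMOTE-style synthetic
            # minority points, each on the segment between two minority points.
            synthetic = []
            if len(minority) >= 2:
                for i, point in enumerate(minority):
                    j = rng.choice([k for k in range(len(minority)) if k != i])
                    t = rng.random()
                    synthetic.append(
                        tuple(a + t * (b - a) for a, b in zip(point, minority[j]))
                    )
            minority = minority + synthetic
        elif majority:
            # Non-boundary cell: random under-sampling of the majority class.
            n_keep = max(1, round(keep_fraction * len(majority)))
            majority = rng.sample(majority, n_keep)
        out_points += majority + minority
        out_labels += [0] * len(majority) + [1] * len(minority)
    return out_points, out_labels
```

Cells that contain only minority points pass through unchanged. The cell width controls how coarse the boundary region is, which is presumably one of the parameters the paper's parameter-setting discussion addresses; the sketch leaves its choice to the caller.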

Citations: 0

Sparse estimation of linear model via Bayesian method $$^*$$

IF 1.3, Zone 4 (Mathematics)

Computational Statistics Pub Date : 2024-03-04 DOI: 10.1007/s00180-024-01474-5

Citations: 0

Degree selection methods for curve estimation via Bernstein polynomials

IF 1.3, Zone 4 (Mathematics)

Computational Statistics Pub Date : 2024-03-02 DOI: 10.1007/s00180-024-01473-6

Citations: 0

Automatic piecewise linear regression

IF 1.3, Zone 4 (Mathematics)

Computational Statistics Pub Date : 2024-03-01 DOI: 10.1007/s00180-024-01475-4

Mathias von Ottenbreit, Riccardo De Bin

Citations: 0

Variational Bayesian Lasso for spline regression

IF 1.3, Zone 4 (Mathematics)

Computational Statistics Pub Date : 2024-02-24 DOI: 10.1007/s00180-024-01470-9

Larissa C. Alves, Ronaldo Dias, Helio S. Migon

Citations: 0

Bayesian estimation of the number of species from Poisson-Lindley stochastic abundance model using non-informative priors

IF 1.3, Zone 4 (Mathematics)

Computational Statistics Pub Date : 2024-02-23 DOI: 10.1007/s00180-024-01464-7

Anurag Pathak, Manoj Kumar, Sanjay Kumar Singh, Umesh Singh, Sandeep Kumar

Citations: 0

Generation of normal distributions revisited

IF 1.3, Zone 4 (Mathematics)

Computational Statistics Pub Date : 2024-02-23 DOI: 10.1007/s00180-024-01468-3

Takayuki Umeda

Citations: 0

Bayesian regression models in gretl: the BayTool package

IF 1.3, Zone 4 (Mathematics)

Computational Statistics Pub Date : 2024-02-21 DOI: 10.1007/s00180-024-01466-5

Luca Pedini

Citations: 0

Bayesian sequential probability ratio test for vaccine efficacy trials

IF 1.3, Zone 4 (Mathematics)

Computational Statistics Pub Date : 2024-02-20 DOI: 10.1007/s00180-024-01458-5

Erina Paul, Santosh Sutradhar, Jonathan Hartzel, Devan V. Mehrotra

Abstract: Designing vaccine efficacy (VE) trials often requires recruiting large numbers of participants when the diseases of interest have a low incidence. When developing novel vaccines, such as for COVID-19, the plausible range of VE is quite large at the design stage. Thus, the number of events needed to demonstrate efficacy above a pre-defined regulatory threshold can be difficult to predict, and the time needed to accrue the necessary events can often be long. It is therefore advantageous to evaluate efficacy at earlier interim analyses, potentially allowing the trial to stop early for overwhelming VE or futility. In such cases, incorporating interim analyses through the sequential probability ratio test (SPRT) allows multiple analyses while controlling both type-I and type-II errors. In this article, we propose a Bayesian SPRT for designing a vaccine trial that compares a test vaccine with a control, assuming two Poisson incidence rates. We provide guidance on how to choose the prior distribution and how to optimize the number of events for interim analyses to maximize the efficiency of the design. Through simulations, we demonstrate that the proposed Bayesian SPRT performs better than the corresponding frequentist SPRT. An R repository implementing the proposed method is available at https://github.com/Merck/bayesiansprt.
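For orientation, the frequentist baseline mentioned in this abstract can be sketched in Python. This is the classical Wald SPRT, not the Bayesian SPRT the paper proposes, and it does not reproduce the linked R repository. It relies on a standard fact: with two Poisson arms, conditional on an event occurring, the event falls in the vaccine arm with probability r(1 - VE) / (1 + r(1 - VE)), where r is the ratio of vaccine to control follow-up. The function names, the hypothesized VE values, and the error rates are illustrative assumptions.

```python
import math


def case_split_probability(ve, allocation_ratio=1.0):
    """P(an event is in the vaccine arm | an event occurred) for Poisson arms.

    allocation_ratio is vaccine follow-up divided by control follow-up.
    """
    relative_rate = (1.0 - ve) * allocation_ratio
    return relative_rate / (1.0 + relative_rate)


def wald_sprt(events, ve0, ve1, alpha=0.025, beta=0.10):
    """Classical Wald SPRT of H0: VE = ve0 against H1: VE = ve1 (ve1 > ve0).

    events is a sequence of 1 (vaccine-arm case) or 0 (control-arm case),
    in order of occurrence. Returns (decision, number of events used).
    """
    p0 = case_split_probability(ve0)
    p1 = case_split_probability(ve1)
    upper = math.log((1.0 - beta) / alpha)   # cross it: stop for efficacy
    lower = math.log(beta / (1.0 - alpha))   # cross it: stop for futility
    llr = 0.0  # running log-likelihood ratio of H1 versus H0
    n = 0
    for n, in_vaccine_arm in enumerate(events, start=1):
        if in_vaccine_arm:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1.0 - p1) / (1.0 - p0))
        if llr >= upper:
            return "efficacy", n
        if llr <= lower:
            return "futility", n
    return "continue", n
```

Each control-arm case moves the statistic toward the efficacy boundary and each vaccine-arm case moves it toward the futility boundary, so the test can stop as soon as the accumulated evidence is decisive in either direction.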

Citations: 0

Overlapping coefficient in network-based semi-supervised clustering

IF 1.3, Zone 4 (Mathematics)

Computational Statistics Pub Date : 2024-02-19 DOI: 10.1007/s00180-024-01457-6

Claudio Conversano, Luca Frigau, Giulia Contu

Abstract: Network-based Semi-Supervised Clustering (NeSSC) is a semi-supervised approach for clustering in the presence of an outcome variable. It uses a classification or regression model on resampled versions of the original data to produce a proximity matrix that indicates the magnitude of the similarity between pairs of observations, measured with respect to the outcome. This matrix is transformed into a complex network, to which a community detection algorithm is applied to search for underlying community structure, that is, a partition of the instances into highly homogeneous clusters to be evaluated in terms of the outcome. In this paper, we focus on the case in which the outcome variable used in NeSSC is numeric and propose an alternative criterion for selecting the optimal partition, based on a measure of overlap between density curves, as well as a penalization criterion that accounts for the number of clusters in a candidate partition. Next, we consider the performance of the proposed method on some artificial datasets and on 20 real datasets, and compare NeSSC with three other popular methods of semi-supervised clustering with a numeric outcome. Results show that NeSSC with the overlapping criterion works particularly well when a reduced number of clusters are scattered localized.
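The measure of overlap between density curves can be illustrated with the usual overlapping coefficient, OVL = ∫ min(f(x), g(x)) dx, which is 1 for identical densities and 0 for densities with disjoint support. The Python sketch below estimates it for two samples of a numeric outcome using Gaussian kernel density estimates and the trapezoidal rule. It is only an illustration: the paper's exact estimator and its penalization criterion are not given in the abstract, and the fixed `bandwidth`, the `grid_size`, and the function names are assumptions.

```python
import math


def gaussian_kde(sample, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample
        )

    return density


def overlap_coefficient(a, b, bandwidth=0.5, grid_size=2000):
    """Approximate the integral of min(f_a, f_b) with the trapezoidal rule."""
    f = gaussian_kde(a, bandwidth)
    g = gaussian_kde(b, bandwidth)
    # Extend the grid five bandwidths past the data so the tails are covered.
    lo = min(min(a), min(b)) - 5.0 * bandwidth
    hi = max(max(a), max(b)) + 5.0 * bandwidth
    step = (hi - lo) / grid_size
    values = [
        min(f(lo + i * step), g(lo + i * step)) for i in range(grid_size + 1)
    ]
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))
```

Applied to the outcome values of two clusters, a value near 0 indicates clusters that are well separated with respect to the outcome, while a value near 1 indicates clusters that the outcome cannot distinguish.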

Citations: 0