Journal of data science : JDS, latest articles, page 8

Privacy-Preserving Inference on the Ratio of Two Gaussians Using Sums

Journal of data science : JDS Pub Date : 2021-10-28 DOI: 10.6339/22-jds1050

Jingang Miao, Yiming Paul Li

{"title":"Privacy-Preserving Inference on the Ratio of Two Gaussians Using Sums","authors":"Jingang Miao, Yiming Paul Li","doi":"10.6339/22-jds1050","DOIUrl":"https://doi.org/10.6339/22-jds1050","url":null,"abstract":"The ratio of two Gaussians is useful in many contexts of statistical inference. We discuss statistically valid inference of the ratio under Differential Privacy (DP). We use the delta method to derive the asymptotic distribution of the ratio estimator and use the Gaussian mechanism to provide (epsilon, delta)-DP guarantees. Like many statistics, quantities involved in the inference of a ratio can be re-written as functions of sums, and sums are easy to work with for many reasons. In the context of DP, the sensitivity of a sum is easy to calculate. We focus on getting the correct coverage probability of 95% confidence intervals (CIs) of the DP ratio estimator. Our simulations show that the no-correction method, which ignores the DP noise, gives CIs that are too narrow to provide proper coverage for small samples. In our specific simulation scenario, the coverage of 95% CIs can be as low as below 10%. We propose two methods to mitigate the under-coverage issue, one based on Monte Carlo simulation and the other based on analytical correction. We show that the CIs of our methods have much better coverage with reasonable privacy budgets. 
In addition, our methods can handle weighted data, when the weights are fixed and bounded.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42315729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
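
The ingredients named in this abstract (sums with bounded sensitivity, the Gaussian mechanism, and a delta-method interval for a ratio) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the clipping range [0, bound], the even split of the privacy budget across five sums, and the treatment of n as public are all hypothetical choices. The noise scale is the classical Gaussian-mechanism calibration, which is stated for 0 < epsilon < 1.

```python
import math
import random


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale of the classical Gaussian mechanism (stated for 0 < epsilon < 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def ratio_ci_from_sums(n, sx, sy, sxx, syy, sxy, z=1.96):
    """Delta-method CI for mean(x) / mean(y), written purely in terms of sums."""
    mx, my = sx / n, sy / n
    vx = sxx / n - mx ** 2      # variance of x
    vy = syy / n - my ** 2      # variance of y
    cxy = sxy / n - mx * my     # covariance of x and y
    r = mx / my
    var_r = (vx / my ** 2 - 2.0 * mx * cxy / my ** 3 + mx ** 2 * vy / my ** 4) / n
    se = math.sqrt(max(var_r, 0.0))  # noisy sums can push the estimate below zero
    return r, r - z * se, r + z * se


def noisy_ratio_ci(x, y, bound, epsilon, delta, seed=0):
    """Hypothetical helper: clip to [0, bound], add noise to five sums, build a naive CI."""
    rng = random.Random(seed)
    x = [min(max(v, 0.0), bound) for v in x]
    y = [min(max(v, 0.0), bound) for v in y]
    sums = [
        sum(x),
        sum(y),
        sum(v * v for v in x),
        sum(v * v for v in y),
        sum(a * b for a, b in zip(x, y)),
    ]
    # With values in [0, bound], replacing one record moves each sum by at most this much.
    sensitivities = [bound, bound, bound ** 2, bound ** 2, bound ** 2]
    # Assumption: split the budget evenly across the five sums (basic composition).
    eps_i, delta_i = epsilon / 5.0, delta / 5.0
    noisy = [
        s + rng.gauss(0.0, gaussian_sigma(d, eps_i, delta_i))
        for s, d in zip(sums, sensitivities)
    ]
    return ratio_ci_from_sums(len(x), *noisy)
```

The interval returned by `noisy_ratio_ci` plugs the noisy sums into the usual formula and ignores the variance of the added noise, which corresponds to the no-correction approach that the abstract reports as under-covering for small samples. The Monte Carlo and analytical corrections proposed in the paper are not reproduced here.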

Citations: 0

On a Weibull-Distributed Error Component of a Multiplicative Error Model Under Inverse Square Root Transformation

Journal of data science : JDS Pub Date : 2021-10-12 DOI: 10.11648/J.IJDSA.20210704.12

C. U. Onyemachi, S. Onyeagu, Samuel Ademola Phillips, Jamiu Adebowale Oke, Callistus Ezekwe Ugwo

{"title":"On a Weibull-Distributed Error Component of a Multiplicative Error Model Under Inverse Square Root Transformation","authors":"C. U. Onyemachi, S. Onyeagu, Samuel Ademola Phillips, Jamiu Adebowale Oke, Callistus Ezekwe Ugwo","doi":"10.11648/J.IJDSA.20210704.12","DOIUrl":"https://doi.org/10.11648/J.IJDSA.20210704.12","url":null,"abstract":"We first consider the Multiplicative Error Model (MEM) introduced in financial econometrics by Engle (2002) as a general class of time series models for positive-valued random variables, which are decomposed into the product of their conditional mean and a positive-valued error term. Considering the possibility that the error component of a MEM can follow a Weibull distribution, and the need for data transformation as a popular remedial measure to stabilize the variance of a data set prior to statistical modeling, this paper investigates the impact of the inverse square root transformation (ISRT) on the mean and variance of a Weibull-distributed error component of a MEM. The mean and variance of the Weibull distribution and those of the inverse square root transformed distribution are calculated for σ = 6, 7, ..., 99, 100 with the corresponding values of n for which the mean of the untransformed distribution is equal to one. 
The paper concludes that the inverse square root would yield better results when using MEM with a Weibull-distributed error component and where data transformation is deemed necessary to stabilize the variance of the data set.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"7 1","pages":"109"},"PeriodicalIF":0.0,"publicationDate":"2021-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42564890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
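
The calculation described in this abstract can be sketched with the standard Weibull moment formula E[X^r] = scale^r · Γ(1 + r/shape), which holds for r > −shape. This is a minimal sketch, not the authors' code: the abstract does not define σ and n, so reading σ as the shape parameter and n as the scale parameter is an assumption, and the function names are hypothetical.

```python
import math


def weibull_raw_moment(r, shape, scale):
    """E[X^r] for X ~ Weibull(shape, scale); finite only when r > -shape."""
    if r <= -shape:
        raise ValueError("moment of order r exists only for r > -shape")
    return scale ** r * math.gamma(1.0 + r / shape)


def unit_mean_scale(shape):
    """Scale that makes the untransformed Weibull mean equal to one."""
    return 1.0 / math.gamma(1.0 + 1.0 / shape)


def moments_before_and_after_isrt(shape):
    """Mean and variance of X and of Y = X ** (-1/2) under the unit-mean constraint."""
    scale = unit_mean_scale(shape)
    mean_x = weibull_raw_moment(1.0, shape, scale)
    var_x = weibull_raw_moment(2.0, shape, scale) - mean_x ** 2
    mean_y = weibull_raw_moment(-0.5, shape, scale)   # E[X^(-1/2)]
    var_y = weibull_raw_moment(-1.0, shape, scale) - mean_y ** 2
    return mean_x, var_x, mean_y, var_y


if __name__ == "__main__":
    # Assumed grid: shape values 6, 7, ..., 100, mirroring the range quoted in the abstract.
    for shape in range(6, 101):
        print(shape, moments_before_and_after_isrt(float(shape)))
```

Under this reading, the variance of the transformed variable requires shape > 1, which every value on the quoted grid satisfies.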

Citations: 0

Additive-Multiplicative Rates Model for Recurrent Event Data with Intermittently Observed Time-Dependent Covariates.

Journal of data science : JDS Pub Date : 2021-10-01 Epub Date: 2021-11-04 DOI: 10.6339/21-jds1027

Tianmeng Lyu, Xianghua Luo, Yifei Sun

{"title":"Additive-Multiplicative Rates Model for Recurrent Event Data with Intermittently Observed Time-Dependent Covariates.","authors":"Tianmeng Lyu, Xianghua Luo, Yifei Sun","doi":"10.6339/21-jds1027","DOIUrl":"https://doi.org/10.6339/21-jds1027","url":null,"abstract":"Regression methods, including the proportional rates model and additive rates model, have been proposed to evaluate the effect of covariates on the risk of recurrent events. These two models have different assumptions on the form of the covariate effects. A more flexible model, the additive-multiplicative rates model, is considered to allow the covariates to have both additive and multiplicative effects on the marginal rate of recurrent event process. However, its use is limited to the cases where the time-dependent covariates are monitored continuously throughout the follow-up time. In practice, time-dependent covariates are often only measured intermittently, which renders the current estimation method for the additive-multiplicative rates model inapplicable. In this paper, we propose a semiparametric estimator for the regression coefficients of the additive-multiplicative rates model to allow intermittently observed time-dependent covariates. We present the simulation results for the comparison between the proposed method and the simple methods, including last covariate carried forward and linear interpolation, and apply the proposed method to an epidemiologic study aiming to evaluate the effect of time-varying streptococcal infections on the risk of pharyngitis among school children. 
The R package implementing the proposed method is available at www.github.com/TianmengL/rectime.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"19 4","pages":"615-633"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9232183/pdf/nihms-1761398.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40398395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 1

Sequence Mutations of Genes Pertaining to Malignancy in Cancer

Journal of data science : JDS Pub Date : 2021-07-30 DOI: 10.6339/jds.201310_11(4).0004

Nardnisa Sintupisut, Chen-Hsiang Yeang

Citations: 0

Derivation of Sample Size Formula for Cluster Randomized Trials with Binary Responses Using a General Continuity Correction Factor and Identification of Optimal Settings for Small Event Rates

Journal of data science : JDS Pub Date : 2021-07-30 DOI: 10.6339/JDS.2013.11(1).1089

M. John

{"title":"Derivation of Sample Size Formula for Cluster Randomized Trials with Binary Responses Using a General Continuity Correction Factor and Identification of Optimal Settings for Small Event Rates","authors":"M. John","doi":"10.6339/JDS.2013.11(1).1089","DOIUrl":"https://doi.org/10.6339/JDS.2013.11(1).1089","url":null,"abstract":"Trials for comparing interventions where clusters of subjects, rather than individuals, are randomized are commonly called cluster randomized trials (CRTs). For comparison of binary outcomes in a CRT, although there are a few published formulations for sample size computation, the most commonly used is the one developed by Donner, Birkett, and Buck (Am J Epidemiol, 1981), probably due to its incorporation in the textbook by Fleiss, Levin, and Paik (Wiley, 2003). In this paper, we derive a new χ² approximation formula with a general continuity correction factor (c) and show that, especially for the scenarios of small event rates (< 0.01), the new formulation recommends a lower number of clusters than the Donner et al. formulation, thereby providing better efficiency. All known formulations can be shown to be special cases at specific values of the general correction factor (e.g., the Donner formulation is equivalent to the new formulation for c = 1). Statistical simulation is presented with data on the comparative efficacy of the available methods, identifying correction factors that are optimal for rare event rates. A table of sample size recommendations for a variety of rare event rates, along with code in the \"R\" language for easy computation of sample size in other settings, is also provided. 
Sample size calculations for a published CRT (the \"Pathways to Health\" study, which evaluates the value of intervention for smoking cessation) are computed for various correction factors to illustrate that, with an optimal choice of the correction factor, the study could have maintained the same power with a 20% smaller sample size.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"52 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71323940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 1

The Exponentiated Generalized Class of Distributions

Journal of data science : JDS Pub Date : 2021-07-30 DOI: 10.6339/JDS.2013.11(1).1086

G. Cordeiro, E. Ortega, Daniel C. C. da Cunha

Citations: 263

Weighted Clayton Copulas and their Characterizations: Application to Probable Modeling of the Hydrology Data

Journal of data science : JDS Pub Date : 2021-07-30 DOI: 10.6339/jds.201304_11(2).0006

H. Bekrizadeh, G. Parham

Citations: 3

A Folded Normal Slash Distribution and Its Applications to Non-negative Measurements

Journal of data science : JDS Pub Date : 2021-07-30 DOI: 10.6339/JDS.2013.11(2).1142

Wenhao Gui, Pei-Hua Chen, Haiyan Wu

Citations: 5

Variable Selection in the Chlamydia Pneumoniae Lung Infection Study

Journal of data science : JDS Pub Date : 2021-07-30 DOI: 10.6339/JDS.2013.11(2).1073

Yuan Kang, N. Billor

{"title":"Variable Selection in the Chlamydia Pneumoniae Lung Infection Study","authors":"Yuan Kang, N. Billor","doi":"10.6339/JDS.2013.11(2).1073","DOIUrl":"https://doi.org/10.6339/JDS.2013.11(2).1073","url":null,"abstract":"In this study, data based on nucleic acid amplification techniques (polymerase chain reaction), consisting of 23 different transcript variables used to investigate the genetic mechanisms regulating chlamydial infection disease by measuring two different outcomes of murine C. pneumonia lung infection (disease expressed as lung weight increase and C. pneumonia load in the lung), have been analyzed. A model with fewer transcript variables of interest at the early infection stage has been obtained by using some traditional variable selection methods (stepwise regression, partial least squares regression (PLS)) and modern ones (least absolute shrinkage and selection operator (LASSO), forward stagewise regression and least angle regression (LARS)). Through these variable selection methods, the variables of interest are selected to investigate the genetic mechanisms that determine the outcomes of chlamydial lung infection. The transcript variables Tim3, GATA3, Lacf, Arg2 (X4, X5, X8 and X13) are detected as the main variables of interest for studying the C. pneumonia disease (lung weight increase) or C. pneumonia lung load outcomes. 
Models including these key variables may provide possible answers to the problem of molecular mechanisms of chlamydial pathogenesis.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42140481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 1

A Bayesian Adjustment of the HP Law via a Switching Nonlinear Regression Model

Journal of data science : JDS Pub Date : 2021-07-30 DOI: 10.6339/JDS.2013.11(1).1118

Dilli Bhatta, B. Nandram

Citations: 3