Journal of data science : JDS Latest Articles

Privacy-Preserving Inference on the Ratio of Two Gaussians Using Sums
Journal of data science : JDS Pub Date : 2021-10-28 DOI: 10.6339/22-jds1050
Jingang Miao, Yiming Paul Li
Abstract: The ratio of two Gaussians is useful in many contexts of statistical inference. We discuss statistically valid inference of the ratio under Differential Privacy (DP). We use the delta method to derive the asymptotic distribution of the ratio estimator and use the Gaussian mechanism to provide (epsilon, delta)-DP guarantees. Like many statistics, quantities involved in the inference of a ratio can be re-written as functions of sums, and sums are easy to work with for many reasons. In the context of DP, the sensitivity of a sum is easy to calculate. We focus on getting the correct coverage probability of 95% confidence intervals (CIs) of the DP ratio estimator. Our simulations show that the no-correction method, which ignores the DP noise, gives CIs that are too narrow to provide proper coverage for small samples. In our specific simulation scenario, the coverage of 95% CIs can fall below 10%. We propose two methods to mitigate the under-coverage issue, one based on Monte Carlo simulation and the other based on analytical correction. We show that the CIs of our methods have much better coverage with reasonable privacy budgets. In addition, our methods can handle weighted data when the weights are fixed and bounded.
Citations: 0
On a Weibull-Distributed Error Component of a Multiplicative Error Model Under Inverse Square Root Transformation
Journal of data science : JDS Pub Date : 2021-10-12 DOI: 10.11648/J.IJDSA.20210704.12
C. U. Onyemachi, S. Onyeagu, Samuel Ademola Phillips, Jamiu Adebowale Oke, Callistus Ezekwe Ugwo
Abstract: We first consider the Multiplicative Error Model (MEM) introduced in financial econometrics by Engle (2002) as a general class of time series model for positive-valued random variables, which are decomposed into the product of their conditional mean and a positive-valued error term. Considering the possibility that the error component of a MEM can follow a Weibull distribution and the need for data transformation as a popular remedial measure to stabilize the variance of a data set prior to statistical modeling, this paper investigates the impact of the inverse square root transformation (ISRT) on the mean and variance of a Weibull-distributed error component of a MEM. The mean and variance of the Weibull distribution and those of the inverse square root transformed distribution are calculated for σ = 6, 7, ..., 99, 100 with the corresponding values of n for which the mean of the untransformed distribution is equal to one. The paper concludes that the inverse square root would yield better results when using MEM with a Weibull-distributed error component and where data transformation is deemed necessary to stabilize the variance of the data set.
Volume 7, Issue 1, p. 109.
Citations: 0
Additive-Multiplicative Rates Model for Recurrent Event Data with Intermittently Observed Time-Dependent Covariates.
Journal of data science : JDS Pub Date : 2021-10-01 Epub Date: 2021-11-04 DOI: 10.6339/21-jds1027
Tianmeng Lyu, Xianghua Luo, Yifei Sun
Abstract: Regression methods, including the proportional rates model and additive rates model, have been proposed to evaluate the effect of covariates on the risk of recurrent events. These two models have different assumptions on the form of the covariate effects. A more flexible model, the additive-multiplicative rates model, is considered to allow the covariates to have both additive and multiplicative effects on the marginal rate of recurrent event process. However, its use is limited to the cases where the time-dependent covariates are monitored continuously throughout the follow-up time. In practice, time-dependent covariates are often only measured intermittently, which renders the current estimation method for the additive-multiplicative rates model inapplicable. In this paper, we propose a semiparametric estimator for the regression coefficients of the additive-multiplicative rates model to allow intermittently observed time-dependent covariates. We present the simulation results for the comparison between the proposed method and the simple methods, including last covariate carried forward and linear interpolation, and apply the proposed method to an epidemiologic study aiming to evaluate the effect of time-varying streptococcal infections on the risk of pharyngitis among school children. The R package implementing the proposed method is available at www.github.com/TianmengL/rectime.
Volume 19, Issue 4, pp. 615-633. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9232183/pdf/nihms-1761398.pdf
Citations: 1
Sequence Mutations of Genes Pertaining to Malignancy in Cancer
Journal of data science : JDS Pub Date : 2021-07-30 DOI: 10.6339/jds.201310_11(4).0004
Nardnisa Sintupisut, Chen-Hsiang Yeang
Abstract: Cancer is a complex disease where various types of molecular aberrations drive the development and progression of malignancies. Among the diverse molecular aberrations, inherited and somatic mutations on DNA sequences are considered as major drivers for oncogenesis. The complexity of somatic alterations is revealed from large-scale investigations of cancer genomes and robust methods for inferring the function of genes. In this review, we will describe sequence mutations of several cancer-related genes and discuss their functional implications in cancer. In addition, we will introduce the on-line resources for accessing and analyzing sequence mutations in cancer. We will also provide an overview of the statistical and computational approaches and future prospects to conduct comprehensive analyses of the somatic alterations in cancer genomes.
Citations: 0
Derivation of Sample Size Formula for Cluster Randomized Trials with Binary Responses Using a General Continuity Correction Factor and Identification of Optimal Settings for Small Event Rates
Journal of data science : JDS Pub Date : 2021-07-30 DOI: 10.6339/JDS.2013.11(1).1089
M. John
Abstract: Trials for comparing interventions where clusters of subjects, rather than individuals, are randomized are commonly called cluster randomized trials (CRTs). For comparison of binary outcomes in a CRT, although there are a few published formulations for sample size computation, the most commonly used is the one developed by Donner, Birkett, and Buck (Am J Epidemiol, 1981), probably due to its incorporation in the textbook by Fleiss, Levin, and Paik (Wiley, 2003). In this paper, we derive a new χ² approximation formula with a general continuity correction factor (c) and show that, especially for the scenarios of small event rates (< 0.01), the new formulation recommends a lower number of clusters than the Donner et al. formulation, thereby providing better efficiency. All known formulations can be shown to be special cases at specific values of the general correction factor (e.g., the Donner formulation is equivalent to the new formulation for c = 1). Statistical simulation is presented with data on the comparative efficacy of the available methods, identifying correction factors that are optimal for rare event rates. A table of sample size recommendations for a variety of rare event rates, along with code in the "R" language for easy computation of sample size in other settings, is also provided. Sample size calculations for a published CRT (the "Pathways to Health" study, which evaluates the value of intervention for smoking cessation) are computed for various correction factors to illustrate that, with an optimal choice of the correction factor, the study could have maintained the same power with a 20% smaller sample size.
Volume 52, Issue 1.
Citations: 1
The Exponentiated Generalized Class of Distributions
Journal of data science : JDS Pub Date : 2021-07-30 DOI: 10.6339/JDS.2013.11(1).1086
G. Cordeiro, E. Ortega, Daniel C. C. da Cunha
Abstract: We propose a new method of adding two parameters to a continuous distribution that extends the idea first introduced by Lehmann (1953) and studied by Nadarajah and Kotz (2006). This method leads to a new class of exponentiated generalized distributions that can be interpreted as a double construction of Lehmann alternatives. Some special models are discussed. We derive some mathematical properties of this class including the ordinary moments, generating function, mean deviations and order statistics. Maximum likelihood estimation is investigated and four applications to real data are presented.
Citations: 263
Weighted Clayton Copulas and their Characterizations: Application to Probable Modeling of the Hydrology Data
Journal of data science : JDS Pub Date : 2021-07-30 DOI: 10.6339/jds.201304_11(2).0006
H. Bekrizadeh, G. Parham
Abstract: Copulas have recently emerged as practical methods for multivariate modeling. To our knowledge, only a limited amount of work has been done to apply copula-based modeling in context analysis. In this study, we generalize the Clayton copula under an appropriate weight function. In some examples, bivariate distributions are generalized by using the weighted Clayton copula. The properties of the generalized Clayton copula are also provided. The Clayton copula and the weighted Clayton model cannot be used for negative dependence. They have been used to study left tail dependence. This property is stronger in the weighted Clayton model than in the ordinary Clayton copula. It will also be shown that the generalized Clayton copula is suitable for the probable modeling of the hydrology data.
Citations: 3
A Folded Normal Slash Distribution and Its Applications to Non-negative Measurements
Journal of data science : JDS Pub Date : 2021-07-30 DOI: 10.6339/JDS.2013.11(2).1142
Wenhao Gui, Pei-Hua Chen, Haiyan Wu
Abstract: We introduce a new class of the slash distribution using the folded normal distribution. The proposed model, defined on non-negative measurements, extends the slashed half normal distribution and has higher kurtosis than the ordinary half normal distribution. We study the characterization and properties involving moments and some measures based on moments of this distribution. Finally, we illustrate the proposed model with a simulation study and a real application.
Citations: 5
Variable Selection in the Chlamydia Pneumoniae Lung Infection Study
Journal of data science : JDS Pub Date : 2021-07-30 DOI: 10.6339/JDS.2013.11(2).1073
Yuan Kang, N. Billor
Abstract: In this study, we analyze data obtained with nucleic acid amplification techniques (polymerase chain reaction) consisting of 23 different transcript variables, collected to investigate the genetic mechanisms regulating chlamydial infection disease by measuring two different outcomes of murine C. pneumoniae lung infection (disease expressed as lung weight increase, and C. pneumoniae load in the lung). A model with a reduced set of transcript variables of interest at the early infection stage has been obtained by using some of the traditional variable selection methods (stepwise regression, partial least squares regression (PLS)) and modern ones (least absolute shrinkage and selection operator (LASSO), forward stagewise regression and least angle regression (LARS)). Through these variable selection methods, the variables of interest are selected to investigate the genetic mechanisms that determine the outcomes of chlamydial lung infection. The transcript variables Tim3, GATA3, Lacf, Arg2 (X4, X5, X8 and X13) are detected as the main variables of interest for studying the C. pneumoniae disease (lung weight increase) or C. pneumoniae lung load outcomes. Models including these key variables may provide possible answers to the problem of molecular mechanisms of chlamydial pathogenesis.
Citations: 1
A Bayesian Adjustment of the HP Law via a Switching Nonlinear Regression Model
Journal of data science : JDS Pub Date : 2021-07-30 DOI: 10.6339/JDS.2013.11(1).1118
Dilli Bhatta, B. Nandram
Abstract: For many years actuaries and demographers have been doing curve fitting of age-specific mortality data. We use the eight-parameter Heligman-Pollard (HP) empirical law to fit the mortality curve. It consists of three nonlinear curves: child mortality, mid-life mortality and adult mortality. It is now well-known that the eight unknown parameters in the HP law are difficult to estimate because numerical algorithms generally do not converge when model fitting is done. We consider a novel idea to fit the three curves (nonlinear splines) separately, and then connect them smoothly at the two knots. To connect the curves smoothly, we express uncertainty about the knots because these curves do not have turning points. We have important prior information about the location of the knots, and this helps in the estimation convergence problem. Thus, the Bayesian paradigm is particularly attractive. We show the theory, method and application of our approach. We discuss estimation of the curve for English and Welsh mortality data. We also make comparisons with the recent Bayesian method.
Citations: 3