{"title":"Additive-Multiplicative Rates Model for Recurrent Event Data with Intermittently Observed Time-Dependent Covariates.","authors":"Tianmeng Lyu, Xianghua Luo, Yifei Sun","doi":"10.6339/21-jds1027","DOIUrl":"https://doi.org/10.6339/21-jds1027","url":null,"abstract":"Regression methods, including the proportional rates model and additive rates model, have been proposed to evaluate the effect of covariates on the risk of recurrent events. These two models have different assumptions on the form of the covariate effects. A more flexible model, the additive-multiplicative rates model, is considered to allow the covariates to have both additive and multiplicative effects on the marginal rate of the recurrent event process. However, its use is limited to cases where the time-dependent covariates are monitored continuously throughout the follow-up time. In practice, time-dependent covariates are often only measured intermittently, which renders the current estimation method for the additive-multiplicative rates model inapplicable. In this paper, we propose a semiparametric estimator for the regression coefficients of the additive-multiplicative rates model that allows for intermittently observed time-dependent covariates. We present simulation results comparing the proposed method with simple methods, including last covariate carried forward and linear interpolation, and apply the proposed method to an epidemiologic study aiming to evaluate the effect of time-varying streptococcal infections on the risk of pharyngitis among school children. The R package implementing the proposed method is available at www.github.com/TianmengL/rectime.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"19 4","pages":"615-633"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9232183/pdf/nihms-1761398.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40398395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sequence Mutations of Genes Pertaining to Malignancy in Cancer","authors":"Nardnisa Sintupisut, Chen-Hsiang Yeang","doi":"10.6339/jds.201310_11(4).0004","DOIUrl":"https://doi.org/10.6339/jds.201310_11(4).0004","url":null,"abstract":"Cancer is a complex disease where various types of molecular aberrations drive the development and progression of malignancies. Among the diverse molecular aberrations, inherited and somatic mutations on DNA sequences are considered major drivers of oncogenesis. The complexity of somatic alterations is revealed by large-scale investigations of cancer genomes and robust methods for inferring the function of genes. In this review, we will describe sequence mutations of several cancer-related genes and discuss their functional implications in cancer. In addition, we will introduce the on-line resources for accessing and analyzing sequence mutations in cancer. We will also provide an overview of the statistical and computational approaches and future prospects to conduct comprehensive analyses of the somatic alterations in cancer genomes.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43260710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Derivation of Sample Size Formula for Cluster Randomized Trials with Binary Responses Using a General Continuity Correction Factor and Identification of Optimal Settings for Small Event Rates","authors":"M. John","doi":"10.6339/JDS.2013.11(1).1089","DOIUrl":"https://doi.org/10.6339/JDS.2013.11(1).1089","url":null,"abstract":"Trials for comparing interventions where clusters of subjects, rather than individuals, are randomized are commonly called cluster randomized trials (CRTs). For comparison of binary outcomes in a CRT, although there are a few published formulations for sample size computation, the most commonly used is the one developed by Donner, Birkett, and Buck (Am J Epidemiol, 1981), probably due to its incorporation in the textbook by Fleiss, Levin, and Paik (Wiley, 2003). In this paper, we derive a new χ² approximation formula with a general continuity correction factor (c) and show that, especially for scenarios with small event rates (< 0.01), the new formulation recommends a lower number of clusters than the Donner et al. formulation, thereby providing better efficiency. All known formulations can be shown to be special cases at specific values of the general correction factor (e.g., the Donner formulation is equivalent to the new formulation for c = 1). A statistical simulation is presented with data on the comparative efficacy of the available methods, identifying correction factors that are optimal for rare event rates. A table of sample size recommendations for a variety of rare event rates, along with code in the R language for easy computation of sample size in other settings, is also provided. Sample size calculations for a published CRT (the \"Pathways to Health\" study, which evaluates the value of an intervention for smoking cessation) are computed for various correction factors to illustrate that, with an optimal choice of the correction factor, the study could have maintained the same power with a 20% smaller sample size.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"52 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71323940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Exponentiated Generalized Class of Distributions","authors":"G. Cordeiro, E. Ortega, Daniel C. C. da Cunha","doi":"10.6339/JDS.2013.11(1).1086","DOIUrl":"https://doi.org/10.6339/JDS.2013.11(1).1086","url":null,"abstract":"We propose a new method of adding two parameters to a continuous distribution that extends the idea first introduced by Lehmann (1953) and studied by Nadarajah and Kotz (2006). This method leads to a new class of exponentiated generalized distributions that can be interpreted as a double construction of Lehmann alternatives. Some special models are discussed. We derive some mathematical properties of this class including the ordinary moments, generating function, mean deviations and order statistics. Maximum likelihood estimation is investigated and four applications to real data are presented.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49551431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weighted Clayton Copulas and their Characterizations: Application to Probable Modeling of the Hydrology Data","authors":"H. Bekrizadeh, G. Parham","doi":"10.6339/jds.201304_11(2).0006","DOIUrl":"https://doi.org/10.6339/jds.201304_11(2).0006","url":null,"abstract":"Copulas have recently emerged as practical methods for multivariate modeling. To our knowledge, only a limited amount of work has been done to apply copula-based modeling in context analysis. In this study, we generalized the Clayton copula under an appropriate weighted function. In some examples, bivariate distributions are generalized using the weighted Clayton copula. The properties of the generalized Clayton copula are also provided. The Clayton copula and the weighted Clayton model cannot be used for negative dependence. These have been used to study left tail dependence. This property is stronger in the weighted Clayton model than in the ordinary Clayton copula. It will also be shown that the generalized Clayton copula is suitable for the probable modeling of the hydrology data.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43361480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Folded Normal Slash Distribution and Its Applications to Non-negative Measurements","authors":"Wenhao Gui, Pei-Hua Chen, Haiyan Wu","doi":"10.6339/JDS.2013.11(2).1142","DOIUrl":"https://doi.org/10.6339/JDS.2013.11(2).1142","url":null,"abstract":"We introduce a new class of the slash distribution using folded normal distribution. The proposed model defined on non-negative measurements extends the slashed half normal distribution and has higher kurtosis than the ordinary half normal distribution. We study the characterization and properties involving moments and some measures based on moments of this distribution. Finally, we illustrate the proposed model with a simulation study and a real application.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48524074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variable Selection in the Chlamydia Pneumoniae Lung Infection Study","authors":"Yuan Kang, N. Billor","doi":"10.6339/JDS.2013.11(2).1073","DOIUrl":"https://doi.org/10.6339/JDS.2013.11(2).1073","url":null,"abstract":"In this study, we analyze data obtained with nucleic acid amplification techniques (polymerase chain reaction) on 23 different transcript variables, collected to investigate the genetic mechanisms regulating chlamydial infection disease by measuring two different outcomes of murine C. pneumoniae lung infection (disease expressed as lung weight increase, and C. pneumoniae load in the lung). A reduced model with fewer transcript variables of interest at the early infection stage has been obtained by using some of the traditional (stepwise regression, partial least squares regression (PLS)) and modern variable selection methods (least absolute shrinkage and selection operator (LASSO), forward stagewise regression and least angle regression (LARS)). Through these variable selection methods, the variables of interest are selected to investigate the genetic mechanisms that determine the outcomes of chlamydial lung infection. The transcript variables Tim3, GATA3, Lacf, Arg2 (X4, X5, X8 and X13) are detected as the main variables of interest for studying the C. pneumoniae disease (lung weight increase) or C. pneumoniae lung load outcomes. Models including these key variables may provide possible answers to the problem of molecular mechanisms of chlamydial pathogenesis.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42140481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Bayesian Adjustment of the HP Law via a Switching Nonlinear Regression Model","authors":"Dilli Bhatta, B. Nandram","doi":"10.6339/JDS.2013.11(1).1118","DOIUrl":"https://doi.org/10.6339/JDS.2013.11(1).1118","url":null,"abstract":"For many years actuaries and demographers have been doing curve fitting of age-specific mortality data. We use the eight-parameter Heligman-Pollard (HP) empirical law to fit the mortality curve. It consists of three nonlinear curves, child mortality, mid-life mortality and adult mortality. It is now well-known that the eight unknown parameters in the HP law are difficult to estimate because numerical algorithms generally do not converge when model fitting is done. We consider a novel idea to fit the three curves (nonlinear splines) separately, and then connect them smoothly at the two knots. To connect the curves smoothly, we express uncertainty about the knots because these curves do not have turning points. We have important prior information about the location of the knots, and this helps in the estimation convergence problem. Thus, the Bayesian paradigm is particularly attractive. We show the theory, method and application of our approach. We discuss estimation of the curve for English and Welsh mortality data. We also make comparisons with the recent Bayesian method.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45787741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bonus-Malus System in Iran: An Empirical Evaluation","authors":"R. Mahmoudvand, A. Edalati, Farhad Shokoohi","doi":"10.6339/JDS.2013.11(1).1098","DOIUrl":"https://doi.org/10.6339/JDS.2013.11(1).1098","url":null,"abstract":"The aim of this paper is to present the Bonus-Malus System (BMS) of Iran, which is a mandatory scheme based on Insurance act number 56. We examine the current Iranian BMS, using various criteria such as elasticity and time of convergence to steady state with respect to the claim frequency as well as financial balance. We also find the closed form of the stationary distribution of the Iranian BMS, which plays a key role in the study of BMSs. Moreover, we compare the results with the German and Japanese BMSs. Finally, we give some hints that can be used to improve the performance of the current Iranian BMS.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44922936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Notes on Entropy for Concomitants of Record Values in Farlie-Gumbel-Morgenstern (FGM) Family","authors":"S. Tahmasebi","doi":"10.6339/JDS.2013.11(1).1104","DOIUrl":"https://doi.org/10.6339/JDS.2013.11(1).1104","url":null,"abstract":"Let {(X_i, Y_i), i ≥ 1} be a sequence of bivariate random variables from a continuous distribution. If {R_n, n ≥ 1} is the sequence of record values in the sequence of X's, then the Y which corresponds with the nth record will be called the concomitant of the nth record, denoted by R_[n]. In the FGM family, we determine the amount of information contained in R_[n] and compare it with the amount of information given in R_n. Also, we show that the Kullback-Leibler distance among the concomitants of record values is distribution-free. Finally, we provide some numerical results of mutual information and Pearson correlation coefficient for measuring the amount of dependency between R_n and R_[n] in the copula model of the FGM family.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42626849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}