{"title":"Non-parametric calibration of multiple related radiocarbon determinations and their calendar age summarisation","authors":"Timothy J. Heaton","doi":"10.1111/rssc.12599","DOIUrl":"10.1111/rssc.12599","url":null,"abstract":"<p>Due to fluctuations in past radiocarbon (<math>\u0000 <semantics>\u0000 <mrow>\u0000 <msup>\u0000 <mrow></mrow>\u0000 <mrow>\u0000 <mn>14</mn>\u0000 </mrow>\u0000 </msup>\u0000 </mrow>\u0000 <annotation>$$ {}^{14} $$</annotation>\u0000 </semantics></math>C) levels, calibration is required to convert <math>\u0000 <semantics>\u0000 <mrow>\u0000 <msup>\u0000 <mrow></mrow>\u0000 <mrow>\u0000 <mn>14</mn>\u0000 </mrow>\u0000 </msup>\u0000 </mrow>\u0000 <annotation>$$ {}^{14} $$</annotation>\u0000 </semantics></math>C determinations <math>\u0000 <semantics>\u0000 <mrow>\u0000 <msub>\u0000 <mrow>\u0000 <mi>X</mi>\u0000 </mrow>\u0000 <mrow>\u0000 <mi>i</mi>\u0000 </mrow>\u0000 </msub>\u0000 </mrow>\u0000 <annotation>$$ {X}_i $$</annotation>\u0000 </semantics></math> into calendar ages <math>\u0000 <semantics>\u0000 <mrow>\u0000 <msub>\u0000 <mrow>\u0000 <mi>θ</mi>\u0000 </mrow>\u0000 <mrow>\u0000 <mi>i</mi>\u0000 </mrow>\u0000 </msub>\u0000 </mrow>\u0000 <annotation>$$ {theta}_i $$</annotation>\u0000 </semantics></math>. In many studies, we wish to calibrate a set of related samples taken from the same site or context, which have calendar ages drawn from the same shared, but unknown, density <math>\u0000 <semantics>\u0000 <mrow>\u0000 <mi>f</mi>\u0000 <mo>(</mo>\u0000 <mi>θ</mi>\u0000 <mo>)</mo>\u0000 </mrow>\u0000 <annotation>$$ fleft(theta right) $$</annotation>\u0000 </semantics></math>. 
Calibration of <math>\u0000 <semantics>\u0000 <mrow>\u0000 <msub>\u0000 <mrow>\u0000 <mi>X</mi>\u0000 </mrow>\u0000 <mrow>\u0000 <mn>1</mn>\u0000 </mrow>\u0000 </msub>\u0000 <mo>,</mo>\u0000 <mi>…</mi>\u0000 <mo>,</mo>\u0000 <msub>\u0000 <mrow>\u0000 <mi>X</mi>\u0000 </mrow>\u0000 <mrow>\u0000 <mi>n</mi>\u0000 </mrow>\u0000 </msub>\u0000 </mrow>\u0000 <annotation>$$ {X}_1,dots, {X}_n","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/rssc.12599","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80137362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal approximate choice designs for a two-step coffee choice, taste and choice again experiment","authors":"Nedka Dechkova Nikiforova, Rossella Berni, Jesús Fernando López-Fidalgo","doi":"10.1111/rssc.12601","DOIUrl":"10.1111/rssc.12601","url":null,"abstract":"<p>This work deals with consumers' preferences about coffee. Firstly, a choice experiment is performed on a sample of potential consumers. Following this, a sensory test involving the tasting of two varieties of coffee is carried out with the respondents, after which the same choice experiment is supplied to them again. An innovative approach for building heterogeneous choice designs is specifically developed for the case-study, based on approximate design theory and compound design criterion. Panel Mixed Logit models are used, thereby allowing for the inclusion of correlation among consumers' responses; choice-sets are supplied to a proportion of respondents according to optimal weights. The estimation results of the Panel Mixed Logit model are satisfactory, confirming the validity of the proposed approach.</p>","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2022-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/rssc.12601","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76942285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flexible domain prediction using mixed effects random forests","authors":"Patrick Krennmair, Timo Schmid","doi":"10.1111/rssc.12600","DOIUrl":"10.1111/rssc.12600","url":null,"abstract":"<p>This paper promotes the use of random forests as versatile tools for estimating spatially disaggregated indicators in the presence of small area-specific sample sizes. Small area estimators are predominantly conceptualised within the regression-setting and rely on linear mixed models to account for the hierarchical structure of the survey data. In contrast, machine learning methods offer non-linear and non-parametric alternatives, combining excellent predictive performance and a reduced risk of model-misspecification. Mixed effects random forests combine advantages of regression forests with the ability to model hierarchical dependencies. This paper provides a coherent framework based on mixed effects random forests for estimating small area averages and proposes a non-parametric bootstrap estimator for assessing the uncertainty of the estimates. We illustrate advantages of our proposed methodology using Mexican income-data from the state Nuevo León. 
Finally, the methodology is evaluated in model-based and design-based simulations comparing the proposed methodology to traditional regression-based approaches for estimating small area averages.</p>","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2022-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/rssc.12600","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117390797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Bayesian model for estimating Sustainable Development Goal indicator 4.1.2: School completion rates","authors":"Ameer Dharamshi, Bilal Barakat, Leontine Alkema, Manos Antoninis","doi":"10.1111/rssc.12595","DOIUrl":"10.1111/rssc.12595","url":null,"abstract":"<p>Estimating school completion is crucial for monitoring Sustainable Development Goal (SDG) 4 on education. The recently introduced SDG indicator 4.1.2, defined as the percentage of children aged 3–5 years above the expected completion age of a given level of education that have completed the respective level, differs from enrolment indicators in that it relies primarily on household surveys. This introduces a number of challenges including gaps between survey waves, conflicting estimates, age misreporting and delayed completion. We introduce the Adjusted Bayesian Completion Rates (ABCR) model to address these challenges and produce the first complete and consistent time series for SDG indicator 4.1.2, by school level and sex, for 164 countries. Validation exercises indicate that the model appears well-calibrated and offers a meaningful improvement over simpler approaches in predictive performance. The ABCR model is now used by the United Nations to monitor completion rates for all countries with available survey data.</p>","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2022-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/rssc.12595","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72417219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient estimation of the marginal mean of recurrent events","authors":"Giuliana Cortese, Thomas H. Scheike","doi":"10.1111/rssc.12586","DOIUrl":"10.1111/rssc.12586","url":null,"abstract":"<p>Recurrent events are often encountered in clinical and epidemiological studies where a terminal event is also observed. With recurrent events data it is of great interest to estimate the marginal mean of the cumulative number of recurrent events experienced prior to the terminal event. The standard nonparametric estimator was suggested in Cook and Lawless and further developed in Ghosh and Lin. We here investigate the efficiency of this estimator that, surprisingly, has not been studied before. We rewrite the standard estimator as an inverse probability of censoring weighted estimator. From this representation we derive an efficient augmented estimator using efficient estimation theory for right-censored data. We show that the standard estimator is efficient in settings with no heterogeneity. In other settings with different sources of heterogeneity, we show theoretically and by simulations that the efficiency can be greatly improved when an efficient augmented estimator based on dynamic predictions is employed, at no extra cost to robustness. 
The estimators are applied and compared to study the mean number of catheter-related bloodstream infections in heterogeneous patients with chronic intestinal failure who can possibly die, and the efficiency gain is highlighted in the resulting point-wise confidence intervals.</p>","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2022-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/rssc.12586","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79223958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contour models for physical boundaries enclosing star-shaped and approximately star-shaped polygons","authors":"Hannah M. Director, Adrian E. Raftery","doi":"10.1111/rssc.12592","DOIUrl":"10.1111/rssc.12592","url":null,"abstract":"<p>Boundaries on spatial fields divide regions with particular features from surrounding background areas. Methods to identify boundary lines from interpolated spatial fields are well established. Less attention has been paid to how to model sequences of connected spatial points. Such models are needed for physical boundaries. For example, in the Arctic ocean, large contiguous areas are covered by sea ice, or frozen ocean water. We define the ice edge contour as the ordered sequences of spatial points that connect to form a line around set(s) of contiguous grid boxes with sea ice present. Polar scientists need to describe how this contiguous area behaves in present and historical data and under future climate change scenarios. We introduce the Gaussian Star-shaped Contour Model (GSCM) for modelling boundaries represented as connected sequences of spatial points such as the sea ice edge. GSCMs generate sequences of spatial points via generating sets of distances in various directions from a fixed starting point. The GSCM can be applied to contours that enclose regions that are star-shaped polygons or approximately star-shaped polygons. Metrics are introduced to assess the extent to which a polygon deviates from star-shapedness. 
Simulation studies illustrate the performance of the GSCM in different situations.</p>","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89579451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sequential one-step estimator by sub-sampling for customer churn analysis with massive data sets","authors":"Feifei Wang, Danyang Huang, Tianchen Gao, Shuyuan Wu, Hansheng Wang","doi":"10.1111/rssc.12597","DOIUrl":"10.1111/rssc.12597","url":null,"abstract":"<p>Customer churn is one of the most important concerns for large companies. Currently, massive data are often encountered in customer churn analysis, which bring new challenges for model computation. To cope with these concerns, sub-sampling methods are often used to accomplish data analysis tasks of large scale. To cover more informative samples in one sampling round, classic sub-sampling methods need to compute <i>non-uniform</i> sampling probabilities for all data points. However, this method creates a huge computational burden for data sets of large scale and therefore, is not applicable in practice. In this study, we propose a sequential one-step (SOS) estimation method based on repeated sub-sampling data sets. In the SOS method, data points need to be sampled only with <i>uniform</i> probabilities, and the sampling step is conducted repeatedly. In each sampling step, a new estimate is computed via one-step updating based on the newly sampled data points. This leads to a sequence of estimates, of which the final SOS estimate is their average. We theoretically show that both the bias and the standard error of the SOS estimator can decrease with increasing sub-sampling sizes or sub-sampling times. The finite sample SOS performances are assessed through simulations. Finally, we apply this SOS method to analyse a real large-scale customer churn data set in a securities company. 
The results show that the SOS method has good interpretability and prediction power in this real application.</p>","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88578893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The saturated pairwise interaction Gibbs point process as a joint species distribution model","authors":"Ian Flint, Nick Golding, Peter Vesk, Yan Wang, Aihua Xia","doi":"10.1111/rssc.12596","DOIUrl":"10.1111/rssc.12596","url":null,"abstract":"<p>In an effort to effectively model observed patterns in the spatial configuration of individuals of multiple species in nature, we introduce the saturated pairwise interaction Gibbs point process. Its main strength lies in its ability to model both attraction and repulsion within and between species, over different scales. As such, it is particularly well-suited to the study of associations in complex ecosystems. Based on the existing literature, we provide an easy to implement fitting procedure as well as a technique to make inference for the model parameters. We also prove that under certain hypotheses the point process is locally stable, which allows us to use the well-known ‘coupling from the past’ algorithm to draw samples from the model. Different numerical experiments show the robustness of the model. We study three different ecological data sets, demonstrating in each one that our model helps disentangle competing ecological effects on species' distribution.</p>","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/rssc.12596","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89252881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Score test for assessing the conditional dependence in latent class models and its application to record linkage","authors":"Huiping Xu, Xiaochun Li, Zuoyi Zhang, Shaun Grannis","doi":"10.1111/rssc.12590","DOIUrl":"10.1111/rssc.12590","url":null,"abstract":"<p>The Fellegi–Sunter model has been widely used in probabilistic record linkage despite its often invalid conditional independence assumption. Prior research has demonstrated that conditional dependence latent class models yield improved match performance when using the correct conditional dependence structure. With a misspecified conditional dependence structure, these models can yield worse performance. It is, therefore, critically important to correctly identify the conditional dependence structure. Existing methods for identifying the conditional dependence structure include the correlation residual plot, the log-odds ratio check, and the bivariate residual, all of which have been shown to perform inadequately. Bootstrap bivariate residual approach and score test have also been proposed and found to have better performance, with the score test having greater power and lower computational burden. In this paper, we extend the score-test-based approach to account for different conditional dependence structures. Through a simulation study, we develop practical recommendations on the utilisation of the score test and assess the match performance with conditional dependence identified by the proposed method. Performance of the proposed method is further evaluated using a real-world record linkage example. 
Findings show that the proposed method leads to improved matching accuracy relative to the Fellegi–Sunter model.</p>","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2022-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82870632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging network structure to improve pooled testing efficiency","authors":"Daniel K. Sewell","doi":"10.1111/rssc.12594","DOIUrl":"10.1111/rssc.12594","url":null,"abstract":"<p>Screening is a powerful tool for infection control, allowing for infectious individuals, whether they be symptomatic or asymptomatic, to be identified and isolated. The resource burden of regular and comprehensive screening can often be prohibitive, however. One such measure to address this is pooled testing, whereby groups of individuals are each given a composite test; should a group receive a positive diagnostic test result, those comprising the group are then tested individually. Infectious disease is spread through a transmission network, and this paper shows how assigning individuals to pools based on this underlying network can improve the efficiency of the pooled testing strategy, thereby reducing the resource burden. We designed a simulated annealing algorithm to improve the pooled testing efficiency as measured by the ratio of the expected number of correct classifications to the expected number of tests performed. We then evaluated our approach using an agent-based model designed to simulate the spread of SARS-CoV-2 in a school setting. 
Our results suggest that our approach can decrease the number of tests required to regularly screen the student body, and that these reductions are quite robust to assigning pools based on partially observed or noisy versions of the network.</p>","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2022-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/0b/29/RSSC-71-1648.PMC9826453.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10257743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}