{"title":"The Concept of Statistical Evidence","authors":"Michael Evans","doi":"10.11159/icsta21.002","DOIUrl":"https://doi.org/10.11159/icsta21.002","url":null,"abstract":"The concept of statistical evidence has proven to be somewhat elusive in the development of the discipline of Statistics. Still there is a conviction that appropriately collected data contains evidence concerning the answers to questions of scientific interest. We discuss some of the attempts at making the concept of evidence precise and, in particular, present an approach based upon measuring how beliefs change from a priori to a posteriori. Of necessity this is Bayesian in nature as a proper prior is required that reflects beliefs about where the truth lies before the data is observed. Bayesian inference is often criticized for its subjective nature. It is possible, however, to deal with this subjectivity in a scientifically sound manner. In part, this is done by assessing and controlling the bias the prior and model induce into inferences and this depends intrinsically on being clear about statistical evidence. In addition, the model and the prior are falsifiable through model checking and checking for prior-data conflict. Both the assessment of bias and the falsification steps are essentially frequentist in nature so this provides a degree of unity between sometimes conflicting philosophies. This approach to statistical reasoning can be seen as dealing with the inevitable subjectivity required in the choice of ingredients to an analysis so that a statistical analysis can approach the goal of objectivity that is central to scientific work.","PeriodicalId":403959,"journal":{"name":"Proceedings of the 3rd International Conference on Statistics: Theory and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134238665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robustness of Gaussian Mixture Reduction for Split-and-Conquer Learning of Finite Gaussian Mixtures","authors":"Qiong Zhang, Jiahua Chen","doi":"10.11159/icsta21.135","DOIUrl":"https://doi.org/10.11159/icsta21.135","url":null,"abstract":"In the era of big data, there is an increasing demand for split-and-conquer learning of finite mixture models. Recent work [1] proposes several split-and-conquer approaches for learning finite Gaussian mixtures and they are found to be both statistically and computationally efficient when the order of the mixture is correctly specified. Due to the nature of mixture models, correctly specifying the order of mixture on local machines can be an unrealistic assumption. In this paper, we evaluate the performance of several split-andconquer learning approaches, both when the order is correct and when it is over-specified on the local machines, based on simulations. We find that there is a trade-off between robustness and computational efficiency: the computationally intensive approach is robust against over-specification, while the two computationally friendly approaches have compromised statistical performance when the order is over-specified. The results suggest that the information in the data about the true distribution is not lost in the split step of the learning, and aggregation strategies must be developed in a computationally and statistically efficient way.","PeriodicalId":403959,"journal":{"name":"Proceedings of the 3rd International Conference on Statistics: Theory and Applications","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133420761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identification of Underlying Partial Differential Equations from Noisy Data with Splines","authors":"X. Huo","doi":"10.11159/icsta21.005","DOIUrl":"https://doi.org/10.11159/icsta21.005","url":null,"abstract":"","PeriodicalId":403959,"journal":{"name":"Proceedings of the 3rd International Conference on Statistics: Theory and Applications","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131696588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction Model for the Result of Percutaneous Coronary Intervention in Coronary Chronic Total Occlusions","authors":"Maria Ganopoulou, G. Sianos, I. Kangelidis, L. Angelis","doi":"10.11159/icsta21.129","DOIUrl":"https://doi.org/10.11159/icsta21.129","url":null,"abstract":"Coronary chronic total occlusions (CTOs) are very common in patients undergoing coronary angiography. There has been an increasing acceptance of the percutaneous coronary interventions (PCI) in CTOs. The success rate of PCI has been boosted over the last few years by, among else, operator experience and advances in technology, even achieving levels of approximately 90%. This study proposes a prediction model for the classification of the cases in successful and unsuccessful operations and addresses the problem of class imbalance in the response variable (operation result). It is based on the EuroCTO Registry, which is the largest database available worldwide consisting of 29,995 cases for the period 2008-2018. Binary logistic regression analysis and down-sampling were applied within a customized step-algorithm and standard statistical accuracy measures were employed for the assessment of the prediction model, such as sensitivity, specificity and the value of the area under the ROC (AUROC) curve. The analysis revealed new predictive factors, validating at the same time the impact of well-known predictors. A brief comparison has been performed with other models from the literature, which showed that the proposed model performs similarly or better than its contemporary competitors.","PeriodicalId":403959,"journal":{"name":"Proceedings of the 3rd International Conference on Statistics: Theory and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129432646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Impact of Entity Resolution on Observed Social Network Structure","authors":"Abby M. Smith","doi":"10.11159/icsta21.136","DOIUrl":"https://doi.org/10.11159/icsta21.136","url":null,"abstract":"Extended Abstract Deduplication, also referred to as \"entity resolution\", is a common and crucial pre-processing step in the construction of social networks [1]. Citation network studies have indicated that false “splitting” and “lumping” of nodes can have dramatic downstream network impacts, and choices in deduplication methods are important for network analysis [2] [3]. Traditional deduplication methods compare the attributes (such as name and age) of potential matching pairs to estimate a match probability for a pair. Fellegi and Sunter (1969) [4] introduced an optimal decision threshold where above a certain matching score, pairs are declared a match, and below that threshold, pairs are considered a non-match. Recently research has used clustering techniques for entity resolution, where each cluster represents a unique underlying entity. Collective clustering techniques, pioneered by Bhattacharya and Getoor (2007) [5], relax unrealistic assumptions made by earlier probabilistic entity resolution techniques and allow matching decisions to be made dependent on each other. In social network datasets, we can also use relational information (e.g., a person’s network ties) in deduplication as further evidence for matching status of pair. Entity resolution is inherently an imperfect process and is an outcome of existing measurement error, particularly when there is a lack of a manually-reviewed, \"ground-truth\" dataset to rely on for parameter tuning in a chosen technique [6]. I focus on two tuning parameters: the match decision threshold (t) in Felligi-Sunter, and the alpha trade-off parameter between attributional and relational similarity in Bhattacarya-Getoor. My work is focused on methods for evaluating entity resolution in a network setting, measuring the sensitivity of entity resolution results to choices in tuning parameters (alpha and t), and the downstream impacts these parameter choices can have on network metrics and topologies such as degree, closeness, and connectivity. I apply the evaluation methods to two real-world ego-centric network studies, (i) Care2Hope, a respondentdriven sample of rural people who use drugs (PWUD) in Appalachian Kentucky [1], and (ii) RADAR, a longitudinal network study of young men in Chicago who have sex with men. I consider evaluation scenarios in both the presence [7] and absence [8] of “ground truth” data . I discuss implications these findings could have for drug use and HIV policy, and make reporting recommendations for network analysts.","PeriodicalId":403959,"journal":{"name":"Proceedings of the 3rd International Conference on Statistics: Theory and Applications","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130193482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Challenges for Studying Replication","authors":"J. Schauer","doi":"10.11159/icsta21.006","DOIUrl":"https://doi.org/10.11159/icsta21.006","url":null,"abstract":"Recent empirical research has questioned the replicability of scientific findings in various fields, including medicine, economics, and psychology. This research has also revealed that there is no clear-cut definition or standard analysis methods for replication. As a result, there has been substantial ambiguity over the proper way to design and analyze replication studies. This talk describes statistical considerations for studying replication, and examines their implications. It identifies some surprising statistical strengths and limitations of previous research, including the use of statistical methods with surprisingly high error rates. It then argues that such issues can be avoided in future efforts by taking into account key statistical considerations in the planning and analysis of replication studies.","PeriodicalId":403959,"journal":{"name":"Proceedings of the 3rd International Conference on Statistics: Theory and Applications","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123615267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stochastic Version of EM Algorithm for Nonlinear Random ChangePoint Models","authors":"Hongbin Zhang, Binod Manandhar","doi":"10.11159/icsta21.119","DOIUrl":"https://doi.org/10.11159/icsta21.119","url":null,"abstract":"Random effect change-point models are commonly used to infer individual-specific time of event that induces trend change of longitudinal data. Linear models are often employed before and after the change point. However, in applications such as HIV studies, a mechanistic nonlinear model can be derived for the process based on the underlying data-generation mechanisms and such nonlinear model may provide better ``predictions\". In this article, we propose a random change-point model in which we model the longitudinal data by segmented nonlinear mixed effect models. Inference wise, we propose a maximum likelihood solution where we use the Stochastic Expectation-Maximization (StEM) algorithm coupled with independent multivariate rejection sampling through Gibbs’s sampler. We evaluate the method with simulations to gain insights.","PeriodicalId":403959,"journal":{"name":"Proceedings of the 3rd International Conference on Statistics: Theory and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115779844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sufficient Dimension Reduction with Deep Neural Networks for Phenotype Prediction","authors":"Siqi Liang, Wei-Heng Huang, F. Liang","doi":"10.11159/icsta21.134","DOIUrl":"https://doi.org/10.11159/icsta21.134","url":null,"abstract":"Phenotype prediction with genome-wide SNPs or biomarkers is a difficult problem in biomedical research due to many issues, such as nonlinearity of the underlying genetic mapping, high-dimensionality of SNP data, and insufficiency of training samples. To tackle this difficulty, we propose a split-and-merge deep neural network (SM-DNN) method, which employs the split-and-merge technique on deep neural networks to obtain nonlinear sufficient dimension reduction of the input data and then learn a deep neural network on the dimension reduced data. We show that the DNN-based dimension reduction is sufficient, which retains all information on response contained in the explanatory data. Our numerical experiments indicate that the SM-DNN method can lead to significant improvement in phenotype prediction for a variety of real data examples. In particular, with only rare variants, we achieved a remarkable prediction accuracy of over 74% for the Early-Onset Myocardial Infarction (EOMI) exome sequence data.","PeriodicalId":403959,"journal":{"name":"Proceedings of the 3rd International Conference on Statistics: Theory and Applications","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123928236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Analytic Power of Divide & Recombine (D&R)","authors":"W. Cleveland","doi":"10.11159/icsta21.003","DOIUrl":"https://doi.org/10.11159/icsta21.003","url":null,"abstract":"In D&R (aka Split & Conquer), the data are divided into subsets. The division serves as a base for analysis of big data and for data visualization. Different analytic processes are applied to the subsets that constitute a recombination of the information in the data. For big data there are three scenarios. (1) The division is based on the subject matter, e.g., financial data for 100 banks; the division is by bank, and the 100 outputs of analytic methods are further analyzed. (2) An analytic method is applied to each subset, and the outputs are recombined with a recombination method applied to get one result for all of the data. This can provide, for all if the data, estimates of parameters or more complex information such as a likelihood function. D&R research consists of finding division and recombination methods that maximize statistical accuracy. Parallel distributed environments like Hadoop and Spark provide high computational performance for (1) and (2). (3) Similarly, an analytic method is applied to all subsets, but an iterative MM algorithm is used for optimization, e.g., maximum likelihood, that among other nice properties can avoid very large matrix inversion, turn a non-differentiable problem into a smooth problem, etc. For visualization, subsets are created by conditioning on one more variables of the analysis to create subsets of the other variables in the analysis. The subsets are displayed using the Trellis Display framework of multi-panel display. This provides a very powerful mechanism for exploratory study of multi-dimensional datasets, modeling the data, and understanding the results of analysis.","PeriodicalId":403959,"journal":{"name":"Proceedings of the 3rd International Conference on Statistics: Theory and Applications","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116683058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Analysis of Measurements in Exact and Inexact Sciences: An Open Problem","authors":"L. Q. Amaral","doi":"10.11159/icsta21.126","DOIUrl":"https://doi.org/10.11159/icsta21.126","url":null,"abstract":"Differences between statistical analysis of measurements in exact and inexact sciences are the focus of this work. The early and independent beginning of Probability and Statistics had a theoretical synthesis, with an initial development based in Physics and Astronomy. This lead to Error Theory, used in Statistics of Measurements in Exact sciences, with defined criteria of validity. This direction of Mathematical Physics resulted in the progresses and achievements in Classical Physics, and also on established ways of treating measurements of physical properties. It is discussed that Exact Sciences treat only Inanimate Matter, and things that can be defined and measured, in terms of only seven fundamental physical quantities, with the definition of the International System of Units (SI). On the other hand a direction of Mathematical Statistics emerged later on, based on “Sampling”, to study properties of a population, with criteria of significance, within validity intervals, which depend on the size and characteristics of the studied sample, and on the inferences to be made in the research. These are two very different approaches, but both use probability density functions related to hypothesis about data. The modern inferential sampling statistics can be applied to all practical problems, in particular in Biology and Humanities, where there are “models”, but not Theories as in Physics. The word “theory” is many times used in a mistaken way. Life and Human Sciences use this modern type of Statistics. This paper discusses a particular case, in which the same ensemble of experimental results in samples of biological origin (hairs from hominoids) can be analyzed with the two different statistical approaches, in a proposal for Human Evolution, and the conditions for inference of accurate conclusions are discussed. A philosophical discussion between subjective and objective criteria of the researcher is made, and also of the concept of knowledge.","PeriodicalId":403959,"journal":{"name":"Proceedings of the 3rd International Conference on Statistics: Theory and Applications","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134472512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}