Stats, Pub Date: 2023-05-15, DOI: 10.3390/stats6020040
Jiecheng Song, Merry H. Ma
{"title":"Climate Change: Linear and Nonlinear Causality Analysis","authors":"Jiecheng Song, Merry H. Ma","doi":"10.3390/stats6020040","DOIUrl":"https://doi.org/10.3390/stats6020040","url":null,"abstract":"The goal of this study is to detect linear and nonlinear causal pathways toward climate change as measured by changes in global mean surface temperature and global mean sea level over time using a data-based approach in contrast to the traditional physics-based models. Monthly data on potential climate change causal factors, including greenhouse gas concentrations, sunspot numbers, humidity, ice sheets mass, and sea ice coverage, from January 2003 to December 2021, have been utilized in the analysis. We first applied the vector autoregressive model (VAR) and Granger causality test to gauge the linear Granger causal relationships among climate factors. We then adopted the vector error correction model (VECM) as well as the autoregressive distributed lag model (ARDL) to quantify the linear long-run equilibrium and the linear short-term dynamics. Cointegration analysis has also been adopted to examine the dual directional Granger causalities. Furthermore, in this work, we have presented a novel pipeline based on the artificial neural network (ANN) and the VAR and ARDL models to detect nonlinear causal relationships embedded in the data. The results in this study indicate that the global sea level rise is affected by changes in ice sheet mass (both linearly and nonlinearly), global mean temperature (nonlinearly), and the extent of sea ice coverage (nonlinearly and weakly); whereas the global mean temperature is affected by the global surface mean specific humidity (both linearly and nonlinearly), greenhouse gas concentration as measured by the global warming potential (both linearly and nonlinearly) and the sunspot number (only nonlinearly and weakly). 
Furthermore, the nonlinear neural network models tend to fit the data closer than the linear models as expected due to the increased parameter dimension of the neural network models. Given that the information criteria are not generally applicable to the comparison of neural network models and statistical time series models, our next step is to examine the robustness and compare the forecast accuracy of these two models using the soon-available 2022 monthly data.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43183566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats, Pub Date: 2023-05-05, DOI: 10.3390/stats6020038
Elena Barzizza, Nicolò Biasetton, R. Ceccato, L. Salmaso
{"title":"Big Data Analytics and Machine Learning in Supply Chain 4.0: A Literature Review","authors":"Elena Barzizza, Nicolò Biasetton, R. Ceccato, L. Salmaso","doi":"10.3390/stats6020038","DOIUrl":"https://doi.org/10.3390/stats6020038","url":null,"abstract":"Owing to the development of the technologies of Industry 4.0, recent years have witnessed the emergence of a new concept of supply chain management, namely Supply Chain 4.0 (SC 4.0). Huge investments in information technology have enabled manufacturers to trace the intangible flow of information, but instruments are required to take advantage of the available data sources: big data analytics (BDA) and machine learning (ML) represent important tools for this task. Use of advanced technologies can improve supply chain performances and support reaching strategic goals, but their implementation is challenging in supply chain management. The aim of this study was to understand the main benefits, challenges, and areas of application of BDA and ML in SC 4.0 as well as to understand the BDA and ML techniques most commonly used in the field, with a particular focus on nonparametric techniques. To this end, we carried out a literature review. From our analysis, we identified three main gaps, namely, the need for appropriate analytical tools to manage challenging data configurations; the need for a more reliable link with practice; the need for instruments to select the most suitable BDA or ML techniques. 
To address these gaps, we suggest and comment on two viable solutions: nonparametric statistics, and sentiment analysis and clustering.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43796258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats, Pub Date: 2023-05-04, DOI: 10.3390/stats6020037
G. Ougolnitsky, A. Korolev
{"title":"Game-Theoretic Models of Coopetition in Cournot Oligopoly","authors":"G. Ougolnitsky, A. Korolev","doi":"10.3390/stats6020037","DOIUrl":"https://doi.org/10.3390/stats6020037","url":null,"abstract":"Coopetition means that, in economic interactions, competition and cooperation are present at the same time. We built and investigated, analytically and numerically, game-theoretic models of coopetition in normal form and in characteristic function form. The basic model in normal form reflects competition between firms in Cournot oligopoly and their cooperation in mutually profitable activities such as marketing, R&D, and environmental protection. Each firm divides its resource between competition and cooperation. In the normal-form model, we study Nash and Stackelberg settings and compare the results. In the cooperative setting, we consider Neumann–Morgenstern, Petrosyan–Zaccour, and Gromova–Petrosyan versions of characteristic functions and calculate the respective Shapley values. The payoffs in all cases are compared, and conclusions are drawn about the relative efficiency of the different forms of organization for individual agents and for society as a whole.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43979754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
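For readers unfamiliar with the Cournot baseline mentioned in the abstract above, the following is a minimal sketch of the textbook symmetric case only. It assumes linear inverse demand P = a - b*Q and a constant unit cost c, which are illustrative assumptions and not the coopetition model of the paper (there is no resource split between competition and cooperation here). The function names and parameter values are hypothetical.

```python
def cournot_nash_quantity(a, b, c, n):
    """Per-firm Nash output with n identical firms, demand P = a - b*Q, unit cost c."""
    return (a - c) / (b * (n + 1))


def best_response(a, b, c, rivals_total):
    """Profit-maximizing output of one firm, given the total output of its rivals."""
    return max(0.0, (a - c - b * rivals_total) / (2 * b))


# Illustrative (assumed) parameter values.
a, b, c, n = 100.0, 1.0, 10.0, 3

q_star = cournot_nash_quantity(a, b, c, n)

# Nash condition: each firm's best response to the others' equilibrium
# output is the equilibrium output itself.
assert abs(best_response(a, b, c, (n - 1) * q_star) - q_star) < 1e-12

price = a - b * n * q_star
profit = (price - c) * q_star
print(q_star, price, profit)
```

The closed form is checked against the best-response function rather than found by iteration, because simultaneous best-response iteration need not converge when there are more than two firms.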
Stats, Pub Date: 2023-04-28, DOI: 10.3390/stats6020036
Yiming Chen, P. Smith, Mei-Ling Ting Lee
{"title":"Causal Inference in Threshold Regression and the Neural Network Extension (TRNN)","authors":"Yiming Chen, P. Smith, Mei-Ling Ting Lee","doi":"10.3390/stats6020036","DOIUrl":"https://doi.org/10.3390/stats6020036","url":null,"abstract":"The first-hitting-time based model conceptualizes a random process for subjects’ latent health status. The time-to-event outcome is modeled as the first hitting time of the random process to a pre-specified threshold. Threshold regression with linear predictors has numerous benefits in causal survival analysis, such as the estimators’ collapsibility. We propose a neural network extension of the first-hitting-time based threshold regression model. With the flexibility of neural networks, the extended threshold regression model can efficiently capture complex relationships among predictors and underlying health processes while providing clinically meaningful interpretations, and also tackle the challenge of high-dimensional inputs. The proposed neural network extended threshold regression model can further be applied in causal survival analysis, such as performing as the Q-model in G-computation. More efficient causal estimations are expected given the algorithm’s robustness. Simulations were conducted to validate estimator collapsibility and threshold regression G-computation. 
The performance of the neural network extended threshold regression model is also illustrated by using simulated and real high-dimensional data from an observational study.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44349710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats, Pub Date: 2023-04-25, DOI: 10.3390/stats6020035
E. Verykouki, Chris Nakas
{"title":"Adaptations on the Use of p-Values for Statistical Inference: An Interpretation of Messages from Recent Public Discussions","authors":"E. Verykouki, Chris Nakas","doi":"10.3390/stats6020035","DOIUrl":"https://doi.org/10.3390/stats6020035","url":null,"abstract":"P-values have played a central role in the advancement of research in virtually all scientific fields; however, there has been significant controversy over their use. “The ASA president’s task force statement on statistical significance and replicability” has provided a solid basis for resolving the quarrel, but although the significance part is clearly dealt with, the replicability part raises further discussions. Given the clear statement regarding significance, in this article, we consider the validity of p-value use for statistical inference as de facto. We briefly review the bibliography regarding the relevant controversy in recent years and illustrate how already proposed approaches, or slight adaptations thereof, can be readily implemented to address both significance and reproducibility, adding credibility to empirical study findings. The definitions used for the notions of replicability and reproducibility are also clearly described. We argue that any p-value must be reported along with its corresponding s-value followed by (1−α)% confidence intervals and the rejection replication index.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44422807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
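The abstract above recommends reporting an s-value next to each p-value. A minimal sketch, assuming the s-value is the Shannon surprisal s = -log2(p), measured in bits (the usual definition in the literature; the abstract itself does not spell it out). The function name is hypothetical, and the confidence intervals and rejection replication index mentioned in the abstract are not covered here.

```python
import math


def s_value(p):
    """Surprisal of a p-value in bits, assuming s = -log2(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log2(p)


for p in (0.5, 0.25, 0.05, 0.005):
    # Each halving of p adds one bit of information against the tested hypothesis.
    print(f"p = {p:<6} s = {s_value(p):.2f} bits")
```

Under this definition, p = 0.05 carries a little more than 4 bits, roughly as surprising as four heads in a row from a fair coin.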
Stats, Pub Date: 2023-04-21, DOI: 10.3390/stats6020033
T. Hill, Rosalind Arden
{"title":"Recurring Errors in Studies of Gender Differences in Variability","authors":"T. Hill, Rosalind Arden","doi":"10.3390/stats6020033","DOIUrl":"https://doi.org/10.3390/stats6020033","url":null,"abstract":"The past quarter century has seen a resurgence of research on the controversial topic of gender differences in variability, in part because of its potential implications for the issue of under- and over-representation of various subpopulations of our society, with respect to different traits. Unfortunately, several basic statistical, inferential, and logical errors are being propagated in studies on this highly publicized topic. These errors include conflicting interpretations of the numerical significance of actual variance ratio values; a mistaken claim about variance ratios in mixtures of distributions; incorrect inferences from variance ratio values regarding the relative roles of sociocultural and biological factors; and faulty experimental designs. Most importantly, without knowledge of the underlying distributions, the standard variance ratio test statistic is shown to have no implications for tail ratios. The main aim of this note is to correct the scientific record and to illuminate several of these key errors in order to reduce their further propagation. For concreteness, the arguments will focus on one highly influential paper.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47022726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
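The claim in the abstract above, that a variance ratio alone says nothing about tail ratios, can be illustrated with a small sketch. The two distributions below (a standard normal and a uniform distribution scaled to the same variance) are assumptions chosen for illustration and are not taken from the paper. The function names are hypothetical.

```python
import math


def normal_upper_tail(t, sd=1.0):
    """P(X > t) for X ~ Normal(0, sd^2)."""
    return 0.5 * math.erfc(t / (sd * math.sqrt(2.0)))


def uniform_upper_tail(t, sd=1.0):
    """P(X > t) for X uniform on [-h, h], with h chosen so that the variance is sd^2."""
    half_width = sd * math.sqrt(3.0)  # a uniform on [-h, h] has variance h^2 / 3
    if t >= half_width:
        return 0.0
    if t <= -half_width:
        return 1.0
    return (half_width - t) / (2.0 * half_width)


# Both distributions have variance 1, so the variance ratio is exactly 1.
for cutoff in (1.0, 1.5, 2.0):
    print(cutoff, normal_upper_tail(cutoff), uniform_upper_tail(cutoff))
```

At a cutoff of 1 the uniform distribution has the heavier upper tail, while at a cutoff of 2 it has no mass at all, so the same variance ratio is compatible with tail ratios both below and far above 1.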
Stats, Pub Date: 2023-04-15, DOI: 10.3390/stats6020032
Lucio Palazzo, Riccardo Ievoli
{"title":"Detecting Regional Differences in Italian Health Services during Five COVID-19 Waves","authors":"Lucio Palazzo, Riccardo Ievoli","doi":"10.3390/stats6020032","DOIUrl":"https://doi.org/10.3390/stats6020032","url":null,"abstract":"During the waves of the COVID-19 pandemic, both national and/or territorial healthcare systems have been severely stressed in many countries. The availability (and complexity) of data requires proper comparisons for understanding differences in the performance of health services. With this aim, we propose a methodological approach to compare the performance of the Italian healthcare system at the territorial level, i.e., considering NUTS 2 regions. Our approach consists of three steps: the choice of a distance measure between available time series, the application of weighted multidimensional scaling (wMDS) based on this distance, and, finally, a cluster analysis on the MDS coordinates. We separately consider daily time series regarding the deceased, intensive care units, and ordinary hospitalizations of patients affected by COVID-19. The proposed procedure identifies four clusters apart from two outlier regions. Changes between the waves at a regional level emerge from the main results, allowing the pressure on territorial health services to be mapped between 2020 and 2022.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42503535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats, Pub Date: 2023-04-11, DOI: 10.3390/stats6020031
Keiji Takai, Kenichi Hayashi
{"title":"Model Selection with Missing Data Embedded in Missing-at-Random Data","authors":"Keiji Takai, Kenichi Hayashi","doi":"10.3390/stats6020031","DOIUrl":"https://doi.org/10.3390/stats6020031","url":null,"abstract":"When models are built with missing data, an information criterion is needed to select the best model among the various candidates. Using a conventional information criterion for missing data may lead to the selection of the wrong model when data are not missing at random. Conventional information criteria implicitly assume that any subset of missing-at-random data is also missing at random, and thus the maximum likelihood estimator is assumed to be consistent; that is, it is assumed that the estimator will converge to the true value. However, this assumption may not be practical. In this paper, we develop an information criterion that works even for not-missing-at-random data, so long as the largest missing data set is missing at random. Simulations are performed to show the superiority of the proposed information criterion over conventional criteria.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43087422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats, Pub Date: 2023-03-24, DOI: 10.3390/stats6020030
G. Modanese
{"title":"The Network Bass Model with Behavioral Compartments","authors":"G. Modanese","doi":"10.3390/stats6020030","DOIUrl":"https://doi.org/10.3390/stats6020030","url":null,"abstract":"A Bass diffusion model is defined on an arbitrary network, with the additional introduction of behavioral compartments, such that nodes can have different probabilities of receiving the information/innovation from the source and transmitting it to other nodes. The dynamics are described by a large system of non-linear ordinary differential equations, whose numerical solutions can be analyzed in dependence on diffusion parameters, network parameters, and relations between the compartments. For example, in a simple case with two compartments (Enthusiasts and Sceptics about the innovation), we consider cases in which the “publicity” and imitation terms act differently on the compartments, and individuals from one compartment do not imitate those of the other, thus increasing the polarization of the system and creating sectors of the population where adoption becomes very slow. For some categories of scale-free networks, we also investigate the dependence on the features of the networks of the diffusion peak time and of the time at which adoptions reach 90% of the population.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43460600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
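The two quantities tracked in the abstract above, the diffusion peak time and the time at which adoption reaches 90%, have closed forms in the classic homogeneous Bass model dF/dt = (p + q*F)(1 - F). The sketch below covers that simplified case only. It is not the network model with behavioral compartments from the paper, and the function names and the values of the publicity coefficient p and imitation coefficient q are assumptions for illustration.

```python
import math


def bass_cumulative(t, p, q):
    """Cumulative adopter fraction F(t) of the homogeneous Bass model, with F(0) = 0."""
    e = math.exp(-(p + q) * t)
    return (1.0 - e) / (1.0 + (q / p) * e)


def bass_rate(t, p, q):
    """Adoption rate dF/dt = (p + q*F) * (1 - F)."""
    f = bass_cumulative(t, p, q)
    return (p + q * f) * (1.0 - f)


def bass_peak_time(p, q):
    """Time of maximum adoption rate (requires q > p)."""
    return math.log(q / p) / (p + q)


def bass_time_to_fraction(frac, p, q):
    """Time at which F(t) reaches frac, obtained by inverting the closed form."""
    e = (1.0 - frac) / (1.0 + frac * q / p)
    return -math.log(e) / (p + q)


p, q = 0.03, 0.38  # illustrative (assumed) coefficients
print("peak time:", bass_peak_time(p, q))
print("time to 90% adoption:", bass_time_to_fraction(0.9, p, q))
```

On a network, as in the paper, these times would have to be read off numerical ODE solutions instead, since the closed forms rely on every individual seeing the same population-wide adoption fraction.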
Stats, Pub Date: 2023-03-19, DOI: 10.3390/stats6010029
Zheng Xu, Song Yan, Shuai Yuan, Cong Wu, Sixia Chen, Zifang Guo, Yun Li
{"title":"Efficient Two-Stage Analysis for Complex Trait Association with Arbitrary Depth Sequencing Data","authors":"Zheng Xu, Song Yan, Shuai Yuan, Cong Wu, Sixia Chen, Zifang Guo, Yun Li","doi":"10.3390/stats6010029","DOIUrl":"https://doi.org/10.3390/stats6010029","url":null,"abstract":"Sequencing-based genetic association analysis is typically performed by first generating genotype calls from sequence data and then performing association tests on the called genotypes. Standard approaches require accurate genotype calling (GC), which can be achieved either with high sequencing depth (typically available in a small number of individuals) or via computationally intensive multi-sample linkage disequilibrium (LD)-aware methods. We propose a computationally efficient two-stage combination approach for association analysis, in which single-nucleotide polymorphisms (SNPs) are screened in the first stage via a rapid maximum likelihood (ML)-based method on sequence data directly (without first calling genotypes), and then the selected SNPs are evaluated in the second stage by performing association tests on genotypes from multi-sample LD-aware calling. Extensive simulation- and real data-based studies show that the proposed two-stage approaches can save 80% of the computational costs and still obtain more than 90% of the power of the classical method to genotype all markers at various depths d≥2.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42241283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}