Stats | Pub Date: 2025-03-01 | Epub Date: 2025-01-14 | DOI: 10.3390/stats8010007
Wen-Shan Liu, Tong Si, Aldas Kriauciunas, Marcus Snell, Haijun Gong
{"title":"Bidirectional f-Divergence-Based Deep Generative Method for Imputing Missing Values in Time-Series Data.","authors":"Wen-Shan Liu, Tong Si, Aldas Kriauciunas, Marcus Snell, Haijun Gong","doi":"10.3390/stats8010007","DOIUrl":"10.3390/stats8010007","url":null,"abstract":"Imputing missing values in high-dimensional time-series data remains a significant challenge in statistics and machine learning. Although various methods have been proposed in recent years, many struggle with limitations and reduced accuracy, particularly when the missing rate is high. In this work, we present a novel f-divergence-based bidirectional generative adversarial imputation network, tf-BiGAIN, designed to address these challenges in time-series data imputation. Unlike traditional imputation methods, tf-BiGAIN employs a generative model to synthesize missing values without relying on distributional assumptions. The imputation process is achieved by training two neural networks, implemented using bidirectional modified gated recurrent units, with f-divergence serving as the objective function to guide optimization. Compared to existing deep learning-based methods, tf-BiGAIN introduces two key innovations. First, the use of f-divergence provides a flexible and adaptable framework for optimizing the model across diverse imputation tasks, enhancing its versatility. Second, the use of bidirectional gated recurrent units allows the model to leverage both forward and backward temporal information. This bidirectional approach enables the model to effectively capture dependencies from both past and future observations, enhancing its imputation accuracy and robustness. We applied tf-BiGAIN to analyze two real-world time-series datasets, demonstrating its superior performance in imputing missing values and outperforming existing methods in terms of accuracy and robustness.","PeriodicalId":93142,"journal":{"name":"Stats","volume":"8 1","pages":""},"PeriodicalIF":0.9,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11793919/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143257500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
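The tf-BiGAIN abstract uses an f-divergence as its training objective but does not say which one. For reference, a minimal sketch of the definition for discrete distributions, D_f(P||Q) = sum_x q(x) f(p(x)/q(x)) with f convex and f(1) = 0, evaluated for a few standard choices of f. This is not the paper's adversarial objective (GAN-style training typically optimizes a variational bound on D_f rather than computing it directly), and the names `f_divergence` and `F_GENERATORS` are illustrative.

```python
import math

# Convex generator functions f with f(1) = 0 for a few standard f-divergences.
F_GENERATORS = {
    "kl": lambda t: t * math.log(t),                   # KL(P || Q)
    "reverse_kl": lambda t: -math.log(t),              # KL(Q || P)
    "hellinger": lambda t: (math.sqrt(t) - 1.0) ** 2,  # sum of (sqrt p - sqrt q)^2
    "total_variation": lambda t: 0.5 * abs(t - 1.0),   # (1/2) sum |p - q|
}


def f_divergence(p, q, f):
    """Compute D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)) for discrete P and Q.

    Assumes p and q are probability vectors over the same support with
    strictly positive entries, so every ratio p(x) / q(x) is finite and > 0.
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same support")
    if any(pi <= 0 for pi in p) or any(qi <= 0 for qi in q):
        raise ValueError("this sketch requires strictly positive probabilities")
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    p, q = [0.5, 0.5], [0.9, 0.1]
    for name, f in F_GENERATORS.items():
        print(name, round(f_divergence(p, q, f), 4))
```

With p = [0.5, 0.5] and q = [0.9, 0.1], the KL choice gives log(5/3), about 0.511, and total variation gives 0.4. Swapping f changes how mismatches in low-probability regions are penalized, which is one way the choice of f changes the optimization target.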
{"title":"Investigating Risk Factors for Racial Disparity in E-Cigarette Use with PATH Study.","authors":"Amy Liu, Kennedy Dorsey, Almetra Granger, Ty-Runet Bryant, Tung-Sung Tseng, Michael Celestin, Qingzhao Yu","doi":"10.3390/stats7030037","DOIUrl":"10.3390/stats7030037","url":null,"abstract":"Background: Previous research has identified differences in e-cigarette use and socioeconomic factors between different racial groups. However, there is little research examining specific risk factors contributing to the racial differences. Objective: This study sought to identify racial disparities in e-cigarette use and to determine risk factors that help explain these differences. Methods: We used Wave 5 (2018-2019) of the Adult Population Assessment of Tobacco and Health (PATH) Study. First, we computed descriptive statistics of e-cigarette use across our risk factor variables. Next, we used multiple logistic regression to check the risk effects, adjusting for all covariates. Finally, we conducted a mediation analysis to determine whether identified factors showed evidence of influencing the association between race and e-cigarette use. All analyses were performed in R or SAS. The R package mma was used for the mediation analysis. Results: Between Hispanic and non-Hispanic White populations, our potential risk factors collectively explain 17.5% of the racial difference; former cigarette smoking explains 7.6%, receiving e-cigarette advertising 2.6%, and perception of e-cigarette harm 27.8%. Between non-Hispanic Black and non-Hispanic White populations, former cigarette smoking, receiving e-cigarette advertising, and perception of e-cigarette harm explain 5.2%, 1.8%, and 6.8% of the racial difference, respectively. E-cigarette use is more prevalent in the non-Hispanic White population than in the non-Hispanic Black and Hispanic populations, which may be explained by former cigarette smoking, exposure to e-cigarette advertising, and e-cigarette harm perception. Conclusions: These findings suggest that racial differences in e-cigarette use may be reduced by increasing knowledge of the dangers associated with e-cigarette use and reducing exposure to e-cigarette advertisements. This comprehensive analysis of risk factors can be used to significantly guide smoking cessation efforts and address potential health burden disparities arising from differences in e-cigarette usage.","PeriodicalId":93142,"journal":{"name":"Stats","volume":"7 3","pages":"613-626"},"PeriodicalIF":0.9,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11756910/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143030447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats | Pub Date: 2024-01-10 | DOI: 10.3390/stats7010003
Nathaniel E. Helwig
{"title":"Precise Tensor Product Smoothing via Spectral Splines","authors":"Nathaniel E. Helwig","doi":"10.3390/stats7010003","DOIUrl":"https://doi.org/10.3390/stats7010003","url":null,"abstract":"Tensor product smoothers are frequently used to include interaction effects in multiple nonparametric regression models. Current implementations of tensor product smoothers either require using approximate penalties, such as those typically used in generalized additive models, or costly parameterizations, such as those used in smoothing spline analysis of variance models. In this paper, I propose a computationally efficient and theoretically precise approach for tensor product smoothing. Specifically, I propose a spectral representation of a univariate smoothing spline basis, and I develop an efficient approach for building tensor product smooths from marginal spectral spline representations. The developed theory suggests that current tensor product smoothing methods could be improved by incorporating the proposed tensor product spectral smoothers. Simulation results demonstrate that the proposed approach can outperform popular tensor product smoothing implementations, which supports the theoretical results developed in the paper.","PeriodicalId":93142,"journal":{"name":"Stats","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139440934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
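The spectral spline basis itself is not described in the abstract. As background, a minimal sketch of the generic step that tensor product smoothers share: given marginal basis matrices for two predictors evaluated at the same n observations, the tensor product design matrix is their row-wise Kronecker product. The polynomial basis here is a hypothetical stand-in for a marginal (spectral) spline basis, and the penalty construction, which the abstract identifies as the point of difference between implementations, is not shown.

```python
def poly_basis(x, degree):
    """Hypothetical marginal basis: columns 1, x, ..., x^degree (stand-in for a spline basis)."""
    return [[xi ** d for d in range(degree + 1)] for xi in x]


def row_kron(A, B):
    """Row-wise Kronecker product of two n-row basis matrices.

    Row i of the result is kron(A[i], B[i]), so an n x p and an n x q basis give an
    n x (p * q) tensor product design matrix whose columns are all pairwise products
    of the marginal basis functions.
    """
    if len(A) != len(B):
        raise ValueError("both marginal bases must be evaluated at the same n points")
    return [[a * b for a in row_a for b in row_b] for row_a, row_b in zip(A, B)]


if __name__ == "__main__":
    x1 = [0.0, 1.0, 2.0]
    x2 = [1.0, 2.0, 3.0]
    design = row_kron(poly_basis(x1, 1), poly_basis(x2, 1))
    for row in design:
        print(row)  # columns: 1, x2, x1, x1*x2
```

With linear marginal bases [1, x1] and [1, x2], the columns come out as 1, x2, x1, x1*x2, so the product basis contains the interaction term alongside both main effects.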
Stats | Pub Date: 2024-01-08 | DOI: 10.3390/stats7010002
Mulubrhan G. Haile, Lingling Zhang, David J. Olive
{"title":"Predicting Random Walks and a Data-Splitting Prediction Region","authors":"Mulubrhan G. Haile, Lingling Zhang, David J. Olive","doi":"10.3390/stats7010002","DOIUrl":"https://doi.org/10.3390/stats7010002","url":null,"abstract":"Perhaps the first nonparametric, asymptotically optimal prediction intervals are provided for univariate random walks, with applications to renewal processes. Perhaps the first nonparametric prediction regions are introduced for vector-valued random walks. This paper further derives nonparametric data-splitting prediction regions, which are underpinned by very simple theory. Some of the prediction regions can be used when the data distribution does not have first moments, and some can be used for high-dimensional data, where the number of predictors is larger than the sample size. The prediction regions can make use of many estimators of multivariate location and dispersion.","PeriodicalId":93142,"journal":{"name":"Stats","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139447266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
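The abstract does not give the construction of its intervals. As a rough illustration only, a minimal sketch assuming Y_t = Y_{t-1} + e_t with i.i.d. increments: shift the last observation Y_n by empirical quantiles of the observed h-step changes Y_{t+h} - Y_t. This is a simple heuristic, not the authors' asymptotically optimal interval; for h > 1 the overlapping changes are dependent, and with few observations the order-statistic quantiles give only approximate coverage. `random_walk_pi` and `empirical_quantile` are hypothetical names.

```python
import math


def empirical_quantile(sorted_vals, prob):
    """Order-statistic quantile: the ceil(prob * m)-th smallest value, clamped to [1, m]."""
    m = len(sorted_vals)
    k = min(max(math.ceil(prob * m), 1), m)
    return sorted_vals[k - 1]


def random_walk_pi(y, h=1, coverage=0.9):
    """Heuristic nonparametric interval for Y_{n+h} of a random walk Y_t = Y_{t-1} + e_t.

    Assumes i.i.d. increments. Shifts the last observation by empirical quantiles of
    the observed h-step changes Y_{t+h} - Y_t (overlapping, hence dependent, for h > 1).
    """
    if h < 1 or len(y) <= h:
        raise ValueError("need h >= 1 and more than h observations")
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie in (0, 1)")
    changes = sorted(y[t + h] - y[t] for t in range(len(y) - h))
    alpha = 1.0 - coverage
    lower = empirical_quantile(changes, alpha / 2)
    upper = empirical_quantile(changes, 1.0 - alpha / 2)
    return y[-1] + lower, y[-1] + upper


if __name__ == "__main__":
    import random

    rng = random.Random(42)
    walk = [0.0]
    for _ in range(500):
        walk.append(walk[-1] + rng.gauss(0.0, 1.0))  # simulated N(0, 1) increments
    print(random_walk_pi(walk, h=1, coverage=0.9))
    print(random_walk_pi(walk, h=5, coverage=0.9))
```

Because only observed changes are used, the sketch needs no moment assumptions on the increments, although its interval widths inherit the sampling noise of the order statistics.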
Stats | Pub Date: 2023-12-30 | DOI: 10.3390/stats7010001
Shieh-Liang Chen, Kuo-Liang Chen
{"title":"The Mediating Impact of Innovation Types in the Relationship between Innovation Use Theory and Market Performance","authors":"Shieh-Liang Chen, Kuo-Liang Chen","doi":"10.3390/stats7010001","DOIUrl":"https://doi.org/10.3390/stats7010001","url":null,"abstract":"The ultimate goal of innovation is to improve performance. But if people’s needs and uses are ignored, innovation will only be a formality. In the past, research on innovation mostly focused on technology, processes, business models, services, and organizations. The measurement of innovation focuses on capabilities, processes, results, and methods, but there has always been a lack of pre-innovation measurements and tools. This study is the first to use the innovation use theory proposed by Christensen et al. combined with innovation types, and it uses measurements taken at the early stage of innovation to predict post-innovation performance. This study collected 590 valid samples and used SPSS and the four-step BK method to conduct regression analysis and mediation tests. The empirical results yielded the following: (1) a confirmed model and scale of the innovation use theory; (2) that three constructs of innovation use theory have an impact on market performance; and (3) that innovation types acting as mediators will improve market performance. This study establishes an academic model of the innovation use theory to provide a clear scale tool for subsequent research. In practice, it can first measure the direction of innovation and predict performance, providing managers with a reference when developing new products and applying market strategies.","PeriodicalId":93142,"journal":{"name":"Stats","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139138519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
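The abstract cites the four-step BK (Baron and Kenny) mediation procedure. A minimal sketch of the point estimates behind those steps, using ordinary least squares in plain Python: c from regressing Y on X, a from regressing M on X, and b and c' from regressing Y on both X and M. The significance tests that the four-step procedure relies on, and the authors' SPSS workflow, are omitted; the function names are hypothetical.

```python
def simple_slope(x, y):
    """OLS slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def two_predictor_slopes(x, m, y):
    """OLS slopes for y ~ 1 + x + m, solving the centered normal equations by Cramer's rule."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    smm = sum((b - mm) ** 2 for b in m)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    smy = sum((b - mm) * (c - my) for b, c in zip(m, y))
    det = sxx * smm - sxm ** 2
    if det == 0:
        raise ValueError("x and m are perfectly collinear")
    beta_x = (smm * sxy - sxm * smy) / det
    beta_m = (sxx * smy - sxm * sxy) / det
    return beta_x, beta_m


def baron_kenny(x, m, y):
    """Point estimates for the four BK steps (no significance tests)."""
    c = simple_slope(x, y)                       # step 1: total effect of X on Y
    a = simple_slope(x, m)                       # step 2: effect of X on mediator M
    c_prime, b = two_predictor_slopes(x, m, y)   # step 3: M -> Y given X; step 4: direct effect c'
    return {"c": c, "a": a, "b": b, "c_prime": c_prime, "indirect": a * b}


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    m = [2.0, 1.0, 4.0, 3.0, 6.0]
    y = [1.0 + 0.5 * xi + 2.0 * mi for xi, mi in zip(x, m)]  # synthetic, noise-free
    print(baron_kenny(x, m, y))
```

For OLS fits on the same sample, the total effect decomposes exactly as c = c' + a*b, so the indirect effect a*b equals the drop from c to c'. Under the BK logic, mediation is indicated when a and b are significant and c' is smaller in magnitude than c.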
Stats | Pub Date: 2023-12-12 | DOI: 10.3390/stats6040082
Julien Chevallier, Bilel Sanhaji
{"title":"Jump-Robust Realized-GARCH-MIDAS-X Estimators for Bitcoin and Ethereum Volatility Indices","authors":"Julien Chevallier, Bilel Sanhaji","doi":"10.3390/stats6040082","DOIUrl":"https://doi.org/10.3390/stats6040082","url":null,"abstract":"In this paper, we conducted an empirical investigation of the realized volatility of cryptocurrencies using an econometric approach. This work’s two main characteristics are: (i) the realized volatility to be forecast filters jumps, and (ii) the benefit of using various historical/implied volatility indices from brokers as exogenous variables was explicitly considered. We feature a jump-robust extension of the REGARCH-MIDAS-X model incorporating realized beta GARCH processes and MIDAS filters with monthly, daily, and hourly components. First, we estimated six jump-robust estimators of realized volatility for Bitcoin and Ethereum that were retained as the dependent variable. Second, we inserted ten Bitcoin and Ethereum volatility indices gathered from various exchanges as an exogenous variable, one at a time. Third, we explored their forecasting ability based on the MSE and QLIKE statistics. Our sample spanned the period from May 2018 to January 2023. The main result is that the best predictors among the volatility indices for Bitcoin and Ethereum are those derived from 30-day implied volatility. The significance of the findings could mostly be attributable to the ability of our new model to incorporate financial and technological variables directly into the specification of the Bitcoin and Ethereum volatility dynamics.","PeriodicalId":93142,"journal":{"name":"Stats","volume":"6 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139007733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
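The abstract ranks forecasts with the MSE and QLIKE statistics. A minimal sketch of both loss functions, assuming the commonly used normalized QLIKE, L(s, h) = s/h - log(s/h) - 1, where s is the realized-volatility proxy and h the forecast. The paper's exact normalization is not stated in the abstract; other variants (for example log h + s/h) differ only by a term that does not depend on the forecast, so they rank forecasts identically.

```python
import math


def mse_loss(proxy, forecast):
    """Mean squared error between a volatility proxy (e.g. realized variance) and forecasts."""
    if len(proxy) != len(forecast) or not proxy:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((s - h) ** 2 for s, h in zip(proxy, forecast)) / len(proxy)


def qlike_loss(proxy, forecast):
    """Normalized QLIKE: mean of s/h - log(s/h) - 1, which is zero when h == s."""
    if len(proxy) != len(forecast) or not proxy:
        raise ValueError("need equal-length, non-empty sequences")
    if any(s <= 0 for s in proxy) or any(h <= 0 for h in forecast):
        raise ValueError("QLIKE needs strictly positive proxies and forecasts")
    return sum(s / h - math.log(s / h) - 1.0 for s, h in zip(proxy, forecast)) / len(proxy)


if __name__ == "__main__":
    realized = [1.0, 2.0, 1.5]
    model_a = [1.1, 1.8, 1.5]
    model_b = [0.8, 2.5, 1.2]
    for name, fc in [("A", model_a), ("B", model_b)]:
        print(name, mse_loss(realized, fc), qlike_loss(realized, fc))
```

Both losses are zero for a perfect forecast, but QLIKE is asymmetric: under-predicting volatility by half (proxy 2, forecast 1) costs 1 - log 2, about 0.307, while over-predicting by the same factor (proxy 1, forecast 2) costs log 2 - 0.5, about 0.193.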
Stats | Pub Date: 2023-12-05 | DOI: 10.3390/stats6040081
Aris Spanos
{"title":"Revisiting the Large n (Sample Size) Problem: How to Avert Spurious Significance Results","authors":"Aris Spanos","doi":"10.3390/stats6040081","DOIUrl":"https://doi.org/10.3390/stats6040081","url":null,"abstract":"Although large data sets are generally viewed as advantageous for their ability to provide more precise and reliable evidence, it is often overlooked that these benefits are contingent upon certain conditions being met. The primary condition is the approximate validity (statistical adequacy) of the probabilistic assumptions comprising the statistical model Mθ(x) applied to the data. In the case of a statistically adequate Mθ(x) and a given significance level α, as n increases, the power of a test increases, and the p-value decreases due to the inherent trade-off between type I and type II error probabilities in frequentist testing. This trade-off raises concerns about the reliability of declaring ‘statistical significance’ based on conventional significance levels when n is exceptionally large. To address this issue, the author proposes that a principled approach, in the form of post-data severity (SEV) evaluation, be employed. The SEV evaluation represents a post-data error probability that converts unduly data-specific ‘accept/reject H0 results’ into evidence either supporting or contradicting inferential claims regarding the parameters of interest. This approach offers a more nuanced and robust perspective in navigating the challenges posed by the large n problem.","PeriodicalId":93142,"journal":{"name":"Stats","volume":"6 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138598495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
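The abstract does not fix a model, so the following assumes the textbook case commonly used to illustrate severity: X_1, ..., X_n i.i.d. Normal(mu, sigma^2) with sigma known, testing H0: mu <= mu0 against H1: mu > mu0 with d(X) = sqrt(n)(Xbar - mu0)/sigma. After H0 is rejected, the severity of the claim mu > mu1 is SEV(mu > mu1) = P(d(X) <= d(x0); mu = mu1) = Phi(sqrt(n)(xbar - mu1)/sigma); after H0 is not rejected, SEV(mu <= mu1) = 1 - Phi(sqrt(n)(xbar - mu1)/sigma). A minimal sketch under those assumptions, using Python's `statistics.NormalDist`:

```python
import math
from statistics import NormalDist


def d_statistic(xbar, mu0, sigma, n):
    """Test statistic d(x0) = sqrt(n) * (xbar - mu0) / sigma for H0: mu <= mu0."""
    return math.sqrt(n) * (xbar - mu0) / sigma


def severity(xbar, mu1, sigma, n, claim="greater"):
    """Post-data severity for the one-sided Normal-mean test (sigma assumed known).

    claim="greater":    SEV(mu > mu1), relevant after H0 is rejected.
    claim="less_equal": SEV(mu <= mu1), relevant after H0 is not rejected.
    """
    z = math.sqrt(n) * (xbar - mu1) / sigma
    phi = NormalDist().cdf(z)
    if claim == "greater":
        return phi
    if claim == "less_equal":
        return 1.0 - phi
    raise ValueError("claim must be 'greater' or 'less_equal'")


if __name__ == "__main__":
    n, sigma, mu0, xbar = 10**6, 1.0, 0.0, 0.02
    print("d(x0) =", d_statistic(xbar, mu0, sigma, n))
    for mu1 in (0.01, 0.015, 0.02, 0.025, 0.03):
        print(mu1, round(severity(xbar, mu1, sigma, n), 4))
```

With n = 10^6, sigma = 1, mu0 = 0 and xbar = 0.02, d(x0) = 20 and H0 is rejected at any conventional significance level, yet SEV(mu > 0.01) is essentially 1 while SEV(mu > 0.03) is essentially 0: the data warrant only a very small discrepancy from mu0, which is the large-n concern the abstract raises.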
{"title":"Process Monitoring Using Truncated Gamma Distribution","authors":"Sajid Ali, Shayaan Rajput, Ismail Shah, Hassan Houmani","doi":"10.3390/stats6040080","DOIUrl":"https://doi.org/10.3390/stats6040080","url":null,"abstract":"The time-between-events idea is commonly used for monitoring high-quality processes. This study aims to monitor the increase and/or decrease in the process mean rapidly using a one-sided exponentially weighted moving average (EWMA) chart for the detection of upward or downward mean shifts using a truncated gamma distribution. The use of the truncation method helps to enhance and improve the sensitivity of the proposed chart. The performance of the proposed chart with known and estimated parameters is analyzed by using the run length properties, including the average run length (ARL) and standard deviation run length (SDRL), through extensive Monte Carlo simulation. The numerical results show that the proposed scheme is more sensitive than the existing ones. Finally, the chart is implemented in real-world situations to highlight the significance of the proposed chart.","PeriodicalId":93142,"journal":{"name":"Stats","volume":"6 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138613377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
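The abstract does not give the chart statistic, truncation points, or control limits, so everything below is an assumption for illustration. A minimal sketch of an upper one-sided EWMA chart for time-between-events data drawn from a gamma distribution truncated to a hypothetical interval [low, high], using the reset form Z_t = max(z0, lam*X_t + (1 - lam)*Z_{t-1}) and a hypothetical control limit `ucl`. ARL and SDRL are estimated by Monte Carlo simulation, as in the abstract. Truncated draws come from simple rejection sampling with Python's `random.gammavariate(alpha, beta)`, where beta is the scale.

```python
import random


def truncated_gamma(rng, shape, scale, low, high):
    """Draw from Gamma(shape, scale) conditioned on low <= X <= high (rejection sampling)."""
    while True:
        x = rng.gammavariate(shape, scale)  # untruncated mean = shape * scale
        if low <= x <= high:
            return x


def run_length(rng, shape, scale, low, high, lam, z0, ucl, max_steps=100_000):
    """Simulate observations until the upper one-sided EWMA exceeds ucl; return the count.

    Z_t = max(z0, lam * X_t + (1 - lam) * Z_{t-1}), started at Z_0 = z0, so the
    statistic is reset to the target whenever it drifts below it.
    """
    z = z0
    for t in range(1, max_steps + 1):
        x = truncated_gamma(rng, shape, scale, low, high)
        z = max(z0, lam * x + (1.0 - lam) * z)
        if z > ucl:
            return t
    return max_steps  # censored: no signal within max_steps


def average_run_length(n_runs, seed=0, **chart):
    """Monte Carlo estimates of ARL and SDRL for the chart settings in `chart`."""
    if n_runs < 2:
        raise ValueError("need at least two runs to estimate SDRL")
    rng = random.Random(seed)
    lengths = [run_length(rng, **chart) for _ in range(n_runs)]
    arl = sum(lengths) / n_runs
    sdrl = (sum((r - arl) ** 2 for r in lengths) / (n_runs - 1)) ** 0.5
    return arl, sdrl


if __name__ == "__main__":
    # Hypothetical settings: in-control Gamma(2, 1) truncated to [0.1, 10].
    in_control = dict(shape=2.0, scale=1.0, low=0.1, high=10.0, lam=0.1, z0=2.0, ucl=2.6)
    shifted = dict(in_control, scale=1.5)  # upward shift in the mean time between events
    print("in-control ARL, SDRL:", average_run_length(1000, seed=1, **in_control))
    print("shifted ARL, SDRL:", average_run_length(1000, seed=1, **shifted))
```

Comparing the in-control settings with a shifted scale shows the ARL dropping after an upward mean shift; in practice `ucl` would be tuned so that the in-control ARL reaches a chosen target, and a mirrored lower chart would handle downward shifts.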
Stats | Pub Date: 2023-11-29 | DOI: 10.3390/stats6040079
A. Adebanji, Franz Aschl, Ednah Chepkemoi Chumo, Emmanuel Odame Owiredu, Johannes Müller, Tukae Mbegalo
{"title":"Social Response and Measles Dynamics","authors":"A. Adebanji, Franz Aschl, Ednah Chepkemoi Chumo, Emmanuel Odame Owiredu, Johannes Müller, Tukae Mbegalo","doi":"10.3390/stats6040079","DOIUrl":"https://doi.org/10.3390/stats6040079","url":null,"abstract":"Measles remains one of the leading causes of death among young children globally, even though a safe and cost-effective vaccine is available. Vaccine hesitancy and social response to vaccination continue to undermine efforts to eradicate measles. In this study, we consider data about measles vaccination and measles prevalence in Germany for the years 2008–2012 in 345 districts. In the first part of the paper, we show that the probability of a local outbreak does not significantly depend on the vaccination coverage, but—if an outbreak does take place—the scale of the outbreak depends significantly on the vaccination coverage. Additionally, we show that the willingness to be vaccinated is significantly increased by local outbreaks, with a delay of about one year. In the second part of the paper, we consider a deterministic delay model to investigate the consequences of the statistical findings on the dynamics of the infection. Here, we find that the delay might induce oscillations if the vaccination coverage is rather low and the social response to an outbreak is sufficiently strong. The relevance of our findings is discussed at the end of the paper.","PeriodicalId":93142,"journal":{"name":"Stats","volume":"6 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139212369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats | Pub Date: 2023-11-21 | DOI: 10.3390/stats6040078
R. Guerra, Fernando A. Peña-Ramírez, G. Cordeiro
{"title":"The Logistic Burr XII Distribution: Properties and Applications to Income Data","authors":"R. Guerra, Fernando A. Peña-Ramírez, G. Cordeiro","doi":"10.3390/stats6040078","DOIUrl":"https://doi.org/10.3390/stats6040078","url":null,"abstract":"We define and study the four-parameter logistic Burr XII distribution. It is obtained by inserting the three-parameter Burr XII distribution as the baseline in the logistic-X family and may be a useful alternative method to model income distribution and could be applied to other areas. We illustrate that the new distribution can have decreasing and upside-down-bathtub hazard functions and that its density function is an infinite linear combination of Burr XII densities. Some mathematical properties of the proposed model are determined, such as the quantile function, ordinary and incomplete moments, and generating function. We also obtain the maximum likelihood estimators of the model parameters and perform a Monte Carlo simulation study. Further, we present a parametric regression model based on the introduced distribution as an alternative to the location-scale regression model. The potentiality of the new distribution is illustrated by means of two applications to income data sets.","PeriodicalId":93142,"journal":{"name":"Stats","volume":"6 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139252783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
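The abstract does not reproduce the distribution's formulas, so this sketch rests on two stated assumptions: the Burr XII baseline G(x) = 1 - [1 + (x/s)^c]^(-k) for x > 0 (shapes c, k and scale s), and a logistic-X generator of the form F(x) = 1 - {1 + [-log(1 - G(x))]^lam}^(-1). Under these assumptions -log(1 - G(x)) = k*log(1 + (x/s)^c), which yields a closed-form CDF and quantile function and therefore inverse-transform sampling, consistent with the abstract's mention of a quantile function and of four parameters (lam, c, k, s). The function names are hypothetical.

```python
import math
import random


def burr_xii_cdf(x, c, k, s):
    """Assumed Burr XII baseline: G(x) = 1 - (1 + (x/s)^c)^(-k) for x > 0."""
    if x <= 0:
        return 0.0
    return 1.0 - (1.0 + (x / s) ** c) ** (-k)


def logistic_burr_xii_cdf(x, lam, c, k, s):
    """F(x) = 1 - 1 / (1 + [-log(1 - G(x))]^lam) under the assumed logistic-X generator."""
    if x <= 0:
        return 0.0
    t = k * math.log1p((x / s) ** c)  # equals -log(1 - G(x)) for the Burr XII baseline
    return 1.0 - 1.0 / (1.0 + t ** lam)


def logistic_burr_xii_quantile(u, lam, c, k, s):
    """Closed-form inverse of logistic_burr_xii_cdf for 0 < u < 1."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    t = (u / (1.0 - u)) ** (1.0 / lam)   # solves t^lam = u / (1 - u)
    return s * math.expm1(t / k) ** (1.0 / c)


def sample_logistic_burr_xii(n, lam, c, k, s, seed=0):
    """Inverse-transform sampling via the quantile function."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()  # in [0, 1); skip the (rare) exact zero
        if u > 0.0:
            draws.append(logistic_burr_xii_quantile(u, lam, c, k, s))
    return draws


if __name__ == "__main__":
    params = dict(lam=1.5, c=2.0, k=0.8, s=1.5)  # hypothetical parameter values
    xs = sample_logistic_burr_xii(5, **params)
    print([round(x, 3) for x in xs])
    print([round(logistic_burr_xii_cdf(x, **params), 3) for x in xs])
```

Working through log1p and expm1 keeps the CDF and quantile accurate for small x and small u, where computing 1 - G(x) directly would lose precision.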