{"title":"Directional False Discovery Rate Control in Large-Scale Multiple Testing Under Data Dependence","authors":"Wendong Li, Jianqing Shi, Yi Wang, Dongdong Xiang","doi":"10.1002/asmb.70041","DOIUrl":"https://doi.org/10.1002/asmb.70041","url":null,"abstract":"<div><p>Detecting directional signals in multiple testing is crucial for taking targeted and effective measures. In this article, we consider the problem of directional multiple testing under dependence within a three-group model. Under the assumption that the observed data are generated by an underlying three-state hidden Markov model, we develop oracle and data-driven procedures that maximize the expected number of true discoveries (ETD) while controlling the false discovery rates (FDRs) of both alternative states at their nominal levels. We show theoretically that the proposed directional multiple testing procedures are valid and have certain optimality properties for directional FDR control. An extensive numerical study shows that our procedures are significantly more powerful than their competitors because they accommodate the dependence structure among hypotheses. The proposed procedures are also highly flexible in allowing different nominal levels for the two alternative states, which is appealing when false discoveries in the two alternative states are not equally important. 
As a demonstration, the proposed data-driven procedure is applied to learn the transcriptomic characteristics of bronchoalveolar lavage fluid in COVID-19 patients.</p></div>","PeriodicalId":55495,"journal":{"name":"Applied Stochastic Models in Business and Industry","volume":"41 5","pages":""},"PeriodicalIF":1.5,"publicationDate":"2025-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144923789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Analysis of Shared Frailty Models for Repairable Systems Subject to Imperfect Repair","authors":"Éder S. Brito, Vera L. D. Tomazella, Paulo H. Ferreira, Francisco Louzada","doi":"10.1002/asmb.70039","DOIUrl":"https://doi.org/10.1002/asmb.70039","url":null,"abstract":"<div><p>Repairable systems, crucial in reliability studies, are characterized by recurrent failure times modeled as counting processes with intensity functions. This paper explores models for these failure times incorporating imperfect repairs, addressing unobserved heterogeneity via shared frailty models. In this context, our approach involves scenarios with general imperfect repairs, which offer a more realistic perspective compared to the minimal or perfect repair assumptions commonly employed in the reliability literature. We propose hierarchical Bayesian methods to estimate parameters, leveraging the Power-Law Process for initial intensities and gamma distributions for frailty terms. Bayesian methods are highly flexible and can accommodate complex shared frailty models that include random effects and dependencies between units. Applying Bayesian inference with gamma and beta distribution priors, coupled with Monte Carlo simulations, provides a robust methodology for estimating unknown parameters and deriving posterior distributions. This flexibility is crucial for capturing the underlying structure of the data in repairable systems with imperfect repairs. Our hierarchical Bayesian framework accommodates multiple systems, providing insights into failure processes and supporting enhanced maintenance strategies. 
We demonstrate our approach using a real failure-time dataset and evaluate its performance through simulation studies, showcasing its applicability and relevance in practical settings.</p></div>","PeriodicalId":55495,"journal":{"name":"Applied Stochastic Models in Business and Industry","volume":"41 5","pages":""},"PeriodicalIF":1.5,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144915101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inference for Simple Step Stress Accelerated Life Test Model Under Progressively Censored Gompertz Data","authors":"Rajat Das, Yogesh Mani Tripathi, Liang Wang, Shuo-Jye Wu","doi":"10.1002/asmb.70037","DOIUrl":"https://doi.org/10.1002/asmb.70037","url":null,"abstract":"<div><p>In this article, the analysis of a simple step-stress accelerated life test is considered under progressive type-II censoring. A cumulative exposure model is considered when the latent lifetimes of test units follow the Gompertz distribution with different shape parameters and a common scale parameter. All unknown model parameters are estimated using maximum likelihood and Bayesian methods. Interval estimates are then derived from the observed Fisher information matrix. Bayesian estimates are obtained under squared error and linear exponential loss functions, and highest posterior density intervals are also constructed. We examine the efficiency of all estimators through simulation studies. Finally, we provide a real-life example in support of the considered model.</p></div>","PeriodicalId":55495,"journal":{"name":"Applied Stochastic Models in Business and Industry","volume":"41 5","pages":""},"PeriodicalIF":1.5,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144897464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Causal Forests for Discovering Diagnostic Language in Electronic Health Records","authors":"Alessandro Albano, Chiara Di Maria, Mariangela Sciandra, Antonella Plaia","doi":"10.1002/asmb.70038","DOIUrl":"https://doi.org/10.1002/asmb.70038","url":null,"abstract":"<p>Textual analysis has gained significant interest in medical research, particularly for automated patient diagnosis based on clinical narratives. While traditional approaches often focus on associational methods, this paper explores the application of causal forests to analyze textual data from electronic health records (EHRs), aiming to identify causal relationships between specific words and the likelihood of receiving certain medical diagnoses. Utilizing the MIMIC-III dataset, we assess how linguistic factors influence diagnosis probabilities for three conditions: diabetes, hypothyroidism, and adrenal gland disorders. Our findings reveal significant causal links between certain clinical terms and diagnosis probabilities, emphasizing the potential of causal inference techniques to improve the analysis of language in clinical narratives. Additionally, we uncover heterogeneity in treatment effects, demonstrating that specific words can identify high-risk patient subgroups. 
This study highlights the importance of integrating causal inference in natural language processing within healthcare settings.</p>","PeriodicalId":55495,"journal":{"name":"Applied Stochastic Models in Business and Industry","volume":"41 5","pages":""},"PeriodicalIF":1.5,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/asmb.70038","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144897465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reliability Inference in GLFP Models Based on EM Algorithm With Related Application","authors":"Chih-Ying Tai, Tsai-Hung Fan","doi":"10.1002/asmb.70030","DOIUrl":"https://doi.org/10.1002/asmb.70030","url":null,"abstract":"<div><p>During the manufacturing of integrated circuit (IC) products, defective units may not be screened out by quality inspections. Defective units often lead to infant mortality failure in the early stages of operation, while non-defective units eventually fail due to wear-out. The general limited failure population (GLFP) model can describe this phenomenon: failures of defective units are affected by both failure mechanisms, whereas failures of non-defective units are due only to wear-out. Moreover, when a failure occurs, it is not known whether the unit is defective or which failure mode caused the failure. This article proposes an EM algorithm, together with the missing information principle, for GLFP models under multiply censored Weibull distributions to simplify maximum likelihood (ML) inference. It resolves the computational instability and provides more accurate reliability inference. With the embedded latent variables, failure mode detection and defect identification can consequently also be carried out for masked data. Furthermore, the proposed method can be extended to GLFP models for interval data. The simulation study shows that the proposed method provides more accurate results. 
Two illustrative examples highlight the feasibility and advantages of the proposed approach.</p></div>","PeriodicalId":55495,"journal":{"name":"Applied Stochastic Models in Business and Industry","volume":"41 4","pages":""},"PeriodicalIF":1.5,"publicationDate":"2025-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144767709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rejoinder to Next Generation Models for Subsequent Sports Injuries by Wu et al.","authors":"Paul Pao-Yen Wu, Yu Yi Yu, Liam A. Toohey, Michael Drew, Scott A. Sisson, Clara Grazian, Kerrie Mengersen","doi":"10.1002/asmb.70035","DOIUrl":"https://doi.org/10.1002/asmb.70035","url":null,"abstract":"<p>We greatly appreciate the commentary and positive feedback of discussants Prof. Jialiang Li and Dr. Rhythm Grover to enrich our paper and its context.</p><p>As noted by Prof. Li, survival models are highly applicable to the subsequent sports injury problem given the temporal dimension of injury data. In the sporting context, censoring can arise, for example, from finite surveillance windows associated with a sporting season, athletes joining and leaving a team, or even extended absence due to injury [<span>1, 2</span>]. However, given the complex systems nature of individual athletes and potentially changing dynamics and susceptibility to injury over time, it is also important to capture the changing state of the athlete explicitly [<span>3</span>]. For example, increasing strength with training over a season could reduce injury risk; however, a serious injury such as an ACL injury could lead to increased susceptibility to subsequent injuries.</p><p>Our paper presented a pragmatic approach, as noted by Dr. Grover, to tackle the challenges of modeling subsequent injury, reducing dimensionality through a time-varying Cox Proportional Hazards (PH) model, and using a discrete-time HMM to capture changes in susceptibility and covariate effects over time. Both Prof. Li and Dr. Grover note the potential computational challenge associated with Hidden Markov Models (HMMs) especially in the presence of large-scale and high-dimensional datasets. Hence, the need for dimension reduction, which was undertaken using survival modeling to explicitly cater for the time-to-event nature of injury data and censoring. 
The appropriateness of using the survival model was supported by checks of the assumptions of the PH model (e.g., proportional hazards, Schoenfeld residuals) and validation results (concordance index) as reported in our paper.</p><p>In addition to computational complexity, however, there is the related challenge of model convergence. Greater model complexity, such as more HMM states or more model covariates, can lead to challenges with model identifiability, estimation, computation, and thus model convergence [<span>4</span>]. This is a current research challenge when faced with limited data, as in our subsequent injury application, which is limited to 33 players and 2523 training and competition sessions over one season. Computationally, the proposed discrete-time HMM fitted with Expectation Maximization (EM) took approximately 155 s to converge for the entire team of players over one season, compared to less than a second for the Cox PH model. However, model convergence with more than two states could not be achieved with this limited dataset. Therefore, although the computational cost is feasible in this case study, the data available can limit the level of model complexity that can be achieved. 
Hence, it highlights the utility of the proposed combination of dimension reduction and state space modelling as a more generalizable approach, and th","PeriodicalId":55495,"journal":{"name":"Applied Stochastic Models in Business and Industry","volume":"41 4","pages":""},"PeriodicalIF":1.5,"publicationDate":"2025-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/asmb.70035","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144725648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Credit Risk Management Through Integration of Multiple Imputation Methodology and Long-Term Survival Modelling","authors":"Jacob Majakwara, Patrick L. Mthisi, Honest W. Chipoyera","doi":"10.1002/asmb.70027","DOIUrl":"https://doi.org/10.1002/asmb.70027","url":null,"abstract":"<p>Credit risk management plays a crucial role in financial institutions by identifying, assessing and controlling the credit risks arising from lending activities. However, missing data pose a common problem in credit risk modelling, leading to biased estimates and a loss of statistical power. To address this issue and improve predictive accuracy, multiple imputation methods are increasingly employed. This study evaluates the performance of the Multivariate Imputation by Chained Equations (MICE) method in identifying factors associated with time to default, using the publicly available Prosper personal loan data. The analysis is conducted within the framework of mixture cure rate models based on the generalised gamma family of distributions. This research is the first of its kind to integrate the MICE approach into mixture cure rate modelling. The flexibility of the generalised gamma distribution was utilised to select the optimal mixture cure rate model. The estimated cure rate using complete cases (CC) was higher than that obtained using MICE imputation. 
This highlights the potential pitfalls of solely relying on CC analysis in survival analysis.</p>","PeriodicalId":55495,"journal":{"name":"Applied Stochastic Models in Business and Industry","volume":"41 4","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/asmb.70027","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144673029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Forecasting Inflation From Disaggregated Data","authors":"Wilmer Martínez-Rivera, Eliana González-Molano, Edgar Caicedo-Garcia","doi":"10.1002/asmb.70023","DOIUrl":"https://doi.org/10.1002/asmb.70023","url":null,"abstract":"<div><p>We forecast inflation aggregates for the United States, the United Kingdom, and Colombia in two ways: by aggregating forecasts of the disaggregated components and by forecasting the aggregate directly. We implement models that are useful with many predictors, such as dimension reduction, shrinkage methods, and machine learning models, along with traditional time-series models (ARIMA and TAR). We evaluate out-of-sample forecasts for the period before COVID-19 and the period afterward. We find that the aggregation of forecasts performs as well as forecasting the aggregate directly. In some cases, the disaggregate analysis reduces the forecast error.</p></div>","PeriodicalId":55495,"journal":{"name":"Applied Stochastic Models in Business and Industry","volume":"41 4","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144573576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Quality: What if Deming Were Born Today?","authors":"Dennis K. J. Lin, Nicholas Rios","doi":"10.1002/asmb.70025","DOIUrl":"https://doi.org/10.1002/asmb.70025","url":null,"abstract":"<p>If Francis Bacon were born today, he might have said “data is power” instead of his original saying, “knowledge is power.” In modern society, data is everywhere. In memory of Deming (a guru in quality), this paper attempts to address the fundamental issue of data quality and how Deming would handle it. Specifically, we attempt to explain what data quality really means, and the critical impact that it has on data science. Statisticians, who understand how to collect high quality data, have much more to contribute to both the intellectual vitality and the practical utility of data science. At the same time, data science challenges statisticians to move out of some familiar habits to engage less structured problems, to become more comfortable with ambiguity, and to engage more scientists in a fruitful discussion on what various parties can bring to this new mode of investigation. Some potential avenues for future research in the collection of high-quality data will be proposed.</p>","PeriodicalId":55495,"journal":{"name":"Applied Stochastic Models in Business and Industry","volume":"41 4","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/asmb.70025","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144514730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Topic-Sentiment Hybrid Networks for Explainable Document Clustering: A Probabilistic Multi-Dimensional Similarity Analysis","authors":"Marco Ortu","doi":"10.1002/asmb.70024","DOIUrl":"https://doi.org/10.1002/asmb.70024","url":null,"abstract":"<p>This study introduces a statistical methodology for document clustering that integrates multiple dimensions of textual similarity through network topology analysis. The proposed methodology, which we call Multi-dimensional Similarity Network Analysis (MSNA), extends traditional document-clustering approaches by combining semantic embeddings, topic probability distributions, and emotional probability distribution into a unified similarity measure. We formalize this through a weighted combination of Jensen-Shannon divergences across different probability spaces, creating a comprehensive similarity network. The clustering is achieved through a community detection algorithm that optimizes a multi-objective modularity function, accounting for the different similarity dimensions. We prove the statistical consistency of our approach and derive bounds for the clustering performance under mild regularity conditions. The methodology is validated on a large-scale data set of Airbnb reviews $(n = 114{,}000)$ from Sardinia, Italy, containing text content, topic distributions, and emotional features. Results show significant improvements in both clustering quality (average silhouette score increased) and interpretability compared to traditional single-dimension approaches. 
From an empirical perspective, the synthetic data validation demonstrates robust performance with topic strength in the range $[0.4, 1.0]$ and emotion strength in $[0.2, 1.0]$, achieving mean Adjusted Rand Index scores of 0.44. The application to real-world data identifies five distinct clusters through PROCSIMA (PRObabilistic Clustering SIMilarity A","PeriodicalId":55495,"journal":{"name":"Applied Stochastic Models in Business and Industry","volume":"41 4","pages":""},"PeriodicalIF":1.3,"publicationDate":"2025-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/asmb.70024","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144339419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}