{"title":"A Joint Cognitive Latent Variable Model for Binary Decision-making Tasks and Reaction Time Outcomes","authors":"Mahdi Mollakazemiha, Ehsan Bahrami Samani","doi":"10.1007/s40745-024-00519-2","DOIUrl":"10.1007/s40745-024-00519-2","url":null,"abstract":"<div><p>Traditionally, cognitive modeling of binary decision-making tasks has relied on stochastic differential equations, particularly the family of diffusion decision models. These models suffer from difficulties in parameter estimation and forecasting because the differential equations have no analytical solutions. In this paper, we introduce a joint latent variable model for binary decision-making tasks and reaction time outcomes. Additionally, accelerated failure time models can be used for the analysis of reaction time to estimate the effects of covariates on acceleration/deceleration of the survival time. A full likelihood-based approach is used to obtain maximum likelihood estimates of the parameters of the model. To illustrate the utility of the proposed models, a simulation study and real data are analyzed.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 2","pages":"499 - 516"},"PeriodicalIF":0.0,"publicationDate":"2024-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140424420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supervised Feature Selection via Quadratic Surface Regression with \\(l_{2,1}\\)-Norm Regularization","authors":"Changlin Wang, Zhixia Yang, Junyou Ye, Xue Yang, Manchen Ding","doi":"10.1007/s40745-024-00518-3","DOIUrl":"10.1007/s40745-024-00518-3","url":null,"abstract":"<div><p>This paper proposes a supervised kernel-free quadratic surface regression method for feature selection (QSR-FS). The method finds a quadratic function for each class and incorporates it into the least squares loss function. The <span>\\(l_{2,1}\\)</span>-norm regularization term is introduced to obtain a sparse solution, and a feature weight vector is constructed from the coefficients of the quadratic functions in all classes to explain the importance of each feature. An alternating iteration algorithm is designed to solve the optimization problem of this model. The computational complexity of the algorithm is provided, and the iterative formula is reformulated to further accelerate computation. In the experimental part, feature selection and its downstream classification tasks are performed on eight datasets from different domains, and the experimental results are analyzed using relevant evaluation indices. Furthermore, feature selection interpretability and parameter sensitivity analysis are provided. The experimental results demonstrate the feasibility and effectiveness of our method.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 2","pages":"647 - 675"},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139836443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Hyperbolic Tangent Family of Distributions: Properties and Applications","authors":"Shahid Mohammad, Isabel Mendoza","doi":"10.1007/s40745-024-00516-5","DOIUrl":"10.1007/s40745-024-00516-5","url":null,"abstract":"<div><p>This paper introduces a new family of distributions called the hyperbolic tangent (HT) family. The cumulative distribution function of this model is defined using the standard hyperbolic tangent function. The fundamental properties of the distribution are thoroughly examined and presented. Additionally, an inverse exponential distribution is employed as a sub-model within the HT family, and its properties are also derived. The parameters of the HT family are estimated using the maximum likelihood method, and the performance of these estimators is assessed using a simulation approach. To demonstrate the significance and flexibility of the newly introduced family of distributions, two real data sets are utilized. These data sets serve as practical examples that showcase the applicability and usefulness of the HT family in real-world scenarios. By introducing the HT family, exploring its properties, employing the maximum likelihood estimation, and conducting simulations and real data analyses, this paper contributes to the advancement of statistical modeling and distribution theory.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 2","pages":"457 - 480"},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139835200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing the Risk of Bitcoin Futures Market: New Evidence","authors":"Anupam Dutta","doi":"10.1007/s40745-024-00517-4","DOIUrl":"10.1007/s40745-024-00517-4","url":null,"abstract":"<div><p>The main objective of this paper is to forecast the realized volatility (RV) of the Bitcoin futures (BTCF) market. To this end, we propose an augmented heterogeneous autoregressive (HAR) model that incorporates information on time-varying jumps observed in BTCF returns. Specifically, we estimate the jump-induced volatility using the GARCH-jump process and then include this information in the HAR model. Both the in-sample and out-of-sample analyses show that jumps offer added information which is not provided by the existing HAR models. In addition, a novel finding is that the jump-induced volatility offers incremental information relative to the Bitcoin implied volatility index. In sum, our results indicate that the HAR-RV process comprising the leverage effects and jump volatility would predict the RV more precisely than the standard HAR-type models. These findings have important implications for cryptocurrency investors.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 2","pages":"481 - 497"},"PeriodicalIF":0.0,"publicationDate":"2024-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s40745-024-00517-4.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139778431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Innovative Technique for Generating Probability Distributions: A Study on Lomax Distribution with Applications in Medical and Engineering Fields","authors":"Shamshad Ur Rasool, M. A. Lone, S. P. Ahmad","doi":"10.1007/s40745-024-00515-6","DOIUrl":"10.1007/s40745-024-00515-6","url":null,"abstract":"<div><p>In this paper, we propose and investigate a novel approach for generating probability distributions, known as the SMP transformation technique. Using the SMP transformation technique, we develop a new model of the Lomax distribution known as the SMP Lomax (SMPL) distribution. The SMPL distribution, which is compared with the Sine Power Lomax distribution, the Power Length Biased Weighted Lomax distribution, the Exponentiated Lomax distribution and the Lomax distribution, has the desirable attribute of offering superiority and flexibility over other well-known existing models. Furthermore, the article examines various aspects of the SMPL distribution, including its statistical properties and the maximum likelihood estimation procedure used to estimate the parameters. An extensive simulation study is carried out to illustrate the behaviour of the MLEs on the basis of Mean Square Errors. To evaluate the effectiveness and flexibility of the proposed distribution, two real-life data sets are employed, and it is observed that SMPL outperforms the base Lomax distribution as well as the other competing models mentioned, based on the Akaike Information Criterion, the corrected Akaike Information Criterion, the Hannan–Quinn information criterion and other goodness-of-fit measures.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 2","pages":"439 - 455"},"PeriodicalIF":0.0,"publicationDate":"2024-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139782016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parameter Estimation for Geometric Lévy Processes with Constant Volatility","authors":"Sher Chhetri, Hongwei Long, Cory Ball","doi":"10.1007/s40745-024-00513-8","DOIUrl":"10.1007/s40745-024-00513-8","url":null,"abstract":"<div><p>In finance, various stochastic models have been used to describe price movements of financial instruments. Following the seminal work of Robert Merton, several jump-diffusion models have been proposed for option pricing and risk management. In this study, we augment the process related to the dynamics of log returns in the Black–Scholes model by incorporating alpha-stable Lévy motion with constant volatility. We employ the sample characteristic function approach to investigate parameter estimation for discretely observed stochastic differential equations driven by Lévy noises. Furthermore, we discuss the consistency and asymptotic properties of the proposed estimators and establish a Central Limit Theorem. To further demonstrate the validity of the estimators, we present simulation results for the model. The utility of the proposed model is demonstrated using the Dow Jones Industrial Average data, and all parameters involved in the model are estimated. In addition, we delve into the broader implications of our work, discussing the relevance of our methods to big data-driven research, particularly in the fields of financial data modeling and climate models. We also highlight the importance of optimization and data mining in these contexts, referencing key works in the field. This study thus contributes to the specific area of finance and beyond to the wider scientific community engaged in data science research and analysis.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 1","pages":"63 - 93"},"PeriodicalIF":0.0,"publicationDate":"2024-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140485101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Road Traffic Speed Prediction Using Data Augmentation: A Deep Generative Models-based Approach","authors":"Redouane Benabdallah Benarmas, Kadda Beghdad Bey","doi":"10.1007/s40745-023-00508-x","DOIUrl":"10.1007/s40745-023-00508-x","url":null,"abstract":"<div><p>Deep learning prediction models have emerged as the most widely used for the development of intelligent transportation systems (ITS), and their success is strongly reliant on the volume and quality of training data. However, traffic datasets are often small due to the limitations of the resources used to collect and store traffic flow data. Data Augmentation (DA) is a key method to increase the size of the training dataset before applying a prediction model. In this paper, we demonstrate the effectiveness of data augmentation for predicting traffic speed by using a Deep Generative Model-based approach (DGM). We empirically evaluate the ability of time series-appropriate architectures to improve traffic prediction over a Train on Synthetic Test on Real (TSTR) process. A Time Series-based Generative Adversarial Network model is used to transform an original road traffic dataset into a synthetic dataset to improve traffic prediction. Experiments were carried out using the 6th Beijing and PeMS datasets to show that the transformation improves the prediction model’s accuracy using both parametric and non-parametric methods. Original datasets are compared with the generated ones using statistical analysis methods to measure the fidelity and behavior of the produced data.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 6","pages":"2199 - 2216"},"PeriodicalIF":0.0,"publicationDate":"2024-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140485444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Modeling Bivariate Lifetime Data in the Presence of Inliers","authors":"Sumangal Bhattacharya, Ishapathik Das, Muralidharan Kunnummal","doi":"10.1007/s40745-023-00511-2","DOIUrl":"10.1007/s40745-023-00511-2","url":null,"abstract":"<div><p>Many items fail instantaneously or early in life-testing experiments, mainly in electronic parts and clinical trials, due to faulty construction, inferior quality, or non-response to treatments. We record the observed lifetime as zero or near zero, defined as instantaneous or early failure observations. In general, some observations may be concentrated around a point, and others follow some continuous distribution. In data, these kinds of observations are regarded as inliers. Some unimodal parametric distributions, such as Weibull, gamma, log-normal, and Pareto, are usually used to fit the data for analyzing and predicting future events concerning lifetime observations. The usual modelling approach based on unimodal parametric distributions may not provide the expected results for data with inliers because of the multimodal nature of the data. Correlated bivariate observations with inliers also occur frequently in life-testing experiments. Here, we propose a method of modelling bivariate lifetime data with instantaneous and early failure observations. A new bivariate distribution is constructed by combining the bivariate uniform and bivariate Weibull distributions. The bivariate Weibull distribution has been obtained by using a 2-dimensional copula, assuming that the marginal distribution is a two-parameter Weibull distribution. An attempt has also been made to derive some properties (viz. joint probability density function, survival (reliability) function, and hazard (failure rate) function) of the modified bivariate Weibull distribution so obtained. The model’s unknown parameters have been estimated using a combination of the Maximum Likelihood Estimation technique and a machine learning clustering algorithm, viz. Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Numerical examples are provided using simulated data to illustrate and test the performance of the proposed methodologies. Relevant codes and necessary computations have been developed using the R and Python languages. The proposed method has been applied to real data with possible inflation. It has been observed that the data contain inliers with a probability of 0.57. The study also compares the proposed method with the existing method in the literature and finds that the proposed method provides a significantly better fit than the base model (in the literature), with a <i>P</i> value less than 0.0001.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 1","pages":"1 - 22"},"PeriodicalIF":0.0,"publicationDate":"2024-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139592282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Estimation of Stress Strength Modeling Using MCMC Method Based on Outliers","authors":"Amal S. Hassan, E. A. Elsherpieny, Rokaya E. Mohamed","doi":"10.1007/s40745-023-00512-1","DOIUrl":"10.1007/s40745-023-00512-1","url":null,"abstract":"<div><p>In reliability literature and engineering applications, stress-strength (SS) models are particularly important. This paper aims to estimate the SS reliability for an inverse Weibull distribution having the same shape parameters but different scale parameters when the strength (<i>X</i>) and stress (<i>Y</i>) random variables are independent. In the presence of outliers and in a homogeneous situation, the maximum likelihood reliability estimator is computed. With independent gamma priors, a Bayesian estimation approach for SS reliability is also proposed. The symmetric and asymmetric loss functions are used to derive the Bayesian estimators of SS reliability. Some sophisticated calculations are carried out using Markov chain Monte Carlo methods. Simulations are used to investigate the precision of Bayesian and non-Bayesian estimates for SS reliability. Further, a comparative study among the Bayesian estimates in the case of uniform and gamma priors is carried out utilizing a simulation methodology. Finally, the proposed methodology is applied to real head-neck cancer data using the discussed model. According to the results of the study, larger sample sizes resulted in better reliability estimates for both techniques. Generally, as the number of outliers increased, the precision measures from both methods decreased. In all circumstances, the Bayesian estimates under the precautionary loss function outperformed the observed estimates under alternative loss functions. The real data analysis confirmed the theoretical and simulation results.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 1","pages":"23 - 62"},"PeriodicalIF":0.0,"publicationDate":"2024-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139598586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Structural Time Series Models for Predicting the \\({\\textrm{CO}}_2\\) Emissions in Afghanistan","authors":"Sayed Rahmi Khuda Haqbin, Athar Ali Khan","doi":"10.1007/s40745-023-00510-3","DOIUrl":"10.1007/s40745-023-00510-3","url":null,"abstract":"<div><p>There are numerous forecasting methods, but these approaches take only the data, analyse it to produce a prediction, ignore prior information, and do not take into account variations that occur over time. The Bayesian structural time series (BSTS) models are the best way to forecast <span>\\({\\textrm{CO}}_2\\)</span> emissions and are updated. Because <span>\\({\\textrm{CO}}_2\\)</span> emissions play an essential part in climate change, forecasting future <span>\\({\\textrm{CO}}_2\\)</span> emissions is critical for all countries where global warming is a hazard to the planet. This study models and forecasts <span>\\({\\textrm{CO}}_2\\)</span> emissions in Afghanistan from 1990 to 2019 using BSTS models fitted with the <b>bsts</b> function from the <span>bsts R package</span>. We ran a diagnostic test of the normality of the residuals produced by the <span>bsts R package</span>. According to the 12-year-ahead forecasts, <span>\\({\\textrm{CO}}_2\\)</span> emissions will rise by 2031 under all models. The study’s findings indicate that <span>\\({\\textrm{CO}}_2\\)</span> emissions in Afghanistan are projected to rise, exposing the country to climate-related concerns.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 6","pages":"2235 - 2252"},"PeriodicalIF":0.0,"publicationDate":"2024-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139602244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}