{"title":"Parameter Estimation for Geometric Lévy Processes with Constant Volatility","authors":"Sher Chhetri, Hongwei Long, Cory Ball","doi":"10.1007/s40745-024-00513-8","DOIUrl":"10.1007/s40745-024-00513-8","url":null,"abstract":"<div><p>In finance, various stochastic models have been used to describe price movements of financial instruments. Following the seminal work of Robert Merton, several jump-diffusion models have been proposed for option pricing and risk management. In this study, we augment the process related to the dynamics of log returns in the Black–Scholes model by incorporating alpha-stable Lévy motion with constant volatility. We employ the sample characteristic function approach to investigate parameter estimation for discretely observed stochastic differential equations driven by Lévy noises. Furthermore, we discuss the consistency and asymptotic properties of the proposed estimators and establish a Central Limit Theorem. To further demonstrate the validity of the estimators, we present simulation results for the model. The utility of the proposed model is demonstrated using the Dow Jones Industrial Average data, and all parameters involved in the model are estimated. In addition, we delved into the broader implications of our work, discussing the relevance of our methods to big data-driven research, particularly in the fields of financial data modeling and climate models. We also highlight the importance of optimization and data mining in these contexts, referencing key works in the field. 
This study thus contributes to the specific area of finance and beyond to the wider scientific community engaged in data science research and analysis.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 1","pages":"63 - 93"},"PeriodicalIF":0.0,"publicationDate":"2024-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140485101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Road Traffic Speed Prediction Using Data Augmentation: A Deep Generative Models-based Approach","authors":"Redouane Benabdallah Benarmas, Kadda Beghdad Bey","doi":"10.1007/s40745-023-00508-x","DOIUrl":"10.1007/s40745-023-00508-x","url":null,"abstract":"<div><p>Deep learning prediction models have emerged as the most widely used for the development of intelligent transportation systems (ITS), and their success is strongly reliant on the volume and quality of training data. However, traffic datasets are often small due to the limitations of the resources used to collect and store traffic flow data. Data Augmentation (DA) is a key method to improve the amount of the training dataset before applying a prediction model. In this paper, we demonstrate the effectiveness of data augmentation for predicting traffic speed by using a Deep Generative Model-based approach (DGM). We empirically evaluate the ability of time series-appropriate architectures to improve traffic prediction over a Train on Synthetic Test on Real(TSTR) process. A Time Series-based Generative Adversarial Network model is used to transform an original road traffic dataset into a synthetic dataset to improve traffic prediction. Experiments were carried out using the 6th Beijing and PeMS datasets to show that the transformation improves the prediction model’s accuracy using both parametric and non-parametric methods. 
Original datasets are compared with the generated ones using statistical analysis methods to measure the fidelity and behavior of the produced data.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 6","pages":"2199 - 2216"},"PeriodicalIF":0.0,"publicationDate":"2024-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140485444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Modeling Bivariate Lifetime Data in the Presence of Inliers","authors":"Sumangal Bhattacharya, Ishapathik Das, Muralidharan Kunnummal","doi":"10.1007/s40745-023-00511-2","DOIUrl":"10.1007/s40745-023-00511-2","url":null,"abstract":"<div><p>Many items fail instantaneously or early in life-testing experiments, mainly in electronic parts and clinical trials, due to faulty construction, inferior quality, or non-response to treatments. We record the observed lifetime as zero or near zero, defined as instantaneous or early failure observations. In general, some observations may be concentrated around a point, and others follow some continuous distribution. In data, these kinds of observations are regarded as inliers. Some unimodal parametric distributions, such as Weibull, gamma, log-normal, and Pareto, are usually used to fit the data for analyzing and predicting future events concerning lifetime observations. The usual modelling approach based on uni-modal parametric distributions may not provide the expected results for data with inliers because of the multi-modal nature of the data. The correlated bivariate observations with inliers also frequently occur in life-testing experiments. Here, we propose a method of modelling bivariate lifetime data with instantaneous and early failure observations. A new bivariate distribution is constructed by combining the bivariate uniform and bivariate Weibull distributions. The bivariate Weibull distribution has been obtained by using a 2-dimensional copula, assuming that the marginal distribution is a two-parametric Weibull distribution. An attempt has also been made to derive some properties (viz. joint probability density function, survival (reliability) function, and hazard (failure rate) function) of the modified bivariate Weibull distribution so obtained. 
The model’s unknown parameters have been estimated using a combination of the Maximum Likelihood Estimation technique and machine learning clustering algorithm, viz. Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Numerical examples are provided using simulated data to illustrate and test the performance of the proposed methodologies. Relevant codes and necessary computations have been developed using R and Python languages. The proposed method has been applied to real data with possible inflation. It has been observed that the data contain inliers with a probability of 0.57. The study also does a comparison test with the proposed method and the existing method in the literature, wherein it was found that the proposed method provides a significantly better fit than the base model (in literature) with a <i>P</i> value less than 0.0001.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 1","pages":"1 - 22"},"PeriodicalIF":0.0,"publicationDate":"2024-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139592282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Estimation of Stress Strength Modeling Using MCMC Method Based on Outliers","authors":"Amal S. Hassan, E. A. Elsherpieny, Rokaya E. Mohamed","doi":"10.1007/s40745-023-00512-1","DOIUrl":"10.1007/s40745-023-00512-1","url":null,"abstract":"<div><p>In reliability literature and engineering applications, stress-strength (SS) models are particularly important. This paper aims to estimate the SS reliability for an inverse Weibull distribution having the same shape parameters but different scale parameters when the strength (<i>X</i>) and stress (<i>Y</i>) random variables are independent. In the presence of outliers and in a homogeneous situation, the maximum likelihood reliability estimator is computed. With independent gamma priors, a Bayesian estimation approach for SS reliability is also proposed. The symmetric and asymmetric loss functions are used to derive the Bayesian estimators of SS reliability. Some sophisticated calculations are carried out using Markov chain Monte Carlo methods. Simulations are used to investigate the precision of Bayesian and non-Bayesian estimates for SS reliability. Further, a comparative study among the Bayesian estimates in the case of uniform and gamma priors is carried out utilizing a simulation methodology. The provided methodology is ultimately applied to the actual data using the discussed model and data from head-neck cancer. According to the results of a study, larger sample sizes resulted in better reliability estimates for both techniques. Generally, as the number of outliers increased, the precision measures from both methods decreased. In all circumstances, the Bayesian estimates under the precautionary loss function outperformed the observed estimates under alternative loss functions. 
The actual data analysis assured the theoretical and simulated studies.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 1","pages":"23 - 62"},"PeriodicalIF":0.0,"publicationDate":"2024-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139598586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Structural Time Series Models for Predicting the \\({\\textrm{CO}}_2\\) Emissions in Afghanistan","authors":"Sayed Rahmi Khuda Haqbin, Athar Ali Khan","doi":"10.1007/s40745-023-00510-3","DOIUrl":"10.1007/s40745-023-00510-3","url":null,"abstract":"<div><p>There are numerous forecasting methods, and these approaches take only data, analyse it, produce a prediction by analysing, ignore the prior information side, and do not take into account the variations that occur over time. The Bayesian structural time series (BSTS) models are the best way to forecast <span>\\({\\textrm{CO}}_2\\)</span> emissions and is updated. Because <span>\\({\\textrm{CO}}_2\\)</span> emissions play an essential part in climate change, forecasting future <span>\\({\\textrm{CO}}_2\\)</span> emissions is critical for all countries where global warming is a hazard to the planet. This study models and forecasts <span>\\({\\textrm{CO}}_2\\)</span> emissions in Afghanistan from 1990 to 2019 using the BSTS models, <b>bsts </b>function from the <span>bsts R package</span> statistical tool. We did a diagnostics test of the normality of the residuals out of the <span>bsts R package</span>. According to the findings for 12 years ahead, <span>\\({\\textrm{CO}}_2\\)</span> emissions will rise by 2031 in all models findings. 
The study’s findings indicate that <span>\\({\\textrm{CO}}_2\\)</span> emissions in Afghanistan are projected to rise, exposing the country to climate-related concerns.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 6","pages":"2235 - 2252"},"PeriodicalIF":0.0,"publicationDate":"2024-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139602244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Improved Quantum Inspired Particle Swarm Optimization for Forest Cover Prediction","authors":"Parul Agarwal, Anita Sahoo, Divyanshi Garg","doi":"10.1007/s40745-023-00509-w","DOIUrl":"10.1007/s40745-023-00509-w","url":null,"abstract":"<div><p>Forest cover prediction plays a crucial role in assessing and managing natural resources, biodiversity, and environmental sustainability. Traditional optimization algorithms have been employed for this task, but their effectiveness and efficiency in handling complex forest cover prediction problems are limited. This paper presents a novel approach, Annealing Lévy Quantum Inspired Particle Swarm Optimization (ALQPSO), that combines principles from quantum computing, particle swarm optimization, annealing, and Lévy distribution to enhance the accuracy and efficiency of forest cover prediction models by significant feature selection. The proposed algorithm utilizes quantum-inspired operators, such as quantum rotation gate, superposition, and entanglement, to explore the search space effectively and efficiently. By leveraging the principle of Lévy distribution and annealing, ALQPSO facilitates the exploration of multiple potential solutions simultaneously, leading to improved convergence speed and enhanced solution quality. To evaluate the performance of ALQPSO for forest cover prediction, experiments are conducted on the forest cover dataset. Initially, exploratory data analysis is performed to determine the nature of features. Thereafter, feature selection is performed through the proposed ALQPSO algorithm and compared with Quantum-based PSO (QPSO) and its variants. The experiments are conducted on all potential fields to identify the best among them. 
The experimental analysis demonstrates that ALQPSO outperforms traditional algorithms in terms of prediction accuracy, convergence speed, and solution quality (in terms of a number of features), highlighting its efficacy in addressing complex forest cover prediction problems.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 6","pages":"2217 - 2233"},"PeriodicalIF":0.0,"publicationDate":"2024-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139599440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Role of Artificial Intelligence and Deep Learning in Skin Disease Prediction: A Systematic Review and Meta-analysis","authors":"V. Auxilia Osvin Nancy, P. Prabhavathy, Meenakshi S. Arya","doi":"10.1007/s40745-023-00503-2","DOIUrl":"10.1007/s40745-023-00503-2","url":null,"abstract":"<div><p>Skin is a most essential and extraordinary part of the human structure. Exposure to chemicals such as nitrates, sunlight, arsenic, and UV rays due to pollution and depletion of the ozone layer is causing various skin diseases to spread rapidly. Digital healthcare offers many opportunities to reduce time, and human error, and improve clinical outcomes. However, the automatic recognition of skin disease is a major challenge due to high visual similarity between different skin diseases, low contrast, and large inter variation. Early detection of skin cancer can prevent death. Thus, Artificial intelligence (AI) and Machine Learning (ML) help physicians to improve clinical judgment or change manual perception. For skin cancer diagnostics, the ML/AI algorithm can outperform or match professional dermatologists in multiple studies. Different pre-trained architectures such as ResNet152, AlexNet, VGGNet, etc. are used for fusing different skin disease features such as texture, color, etc. and they are also utilized for conducting segmentation tasks. The variations in reflection, lesion size, shape, illumination, etc. often make automatic skin disease classification a complex task. ISIC 2019 and HAM 10000 are the widely used public datasets for skin disease prediction. Several technical papers on skin cancer diagnosis are compared in this study. This report examines the majority of technical papers published between 2018 and October 2022 in order to appreciate current trends in the field of skin cancer prediction. A study that combined clinical patient data with deep learning models (DL) increased the accuracy of predicting skin cancer. 
This article presents a visually attractive and well-organized summary of the current study findings.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 6","pages":"2109 - 2139"},"PeriodicalIF":0.0,"publicationDate":"2024-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s40745-023-00503-2.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139524140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"COVID-19 Fake News Detection using Deep Learning Model","authors":"Mahabuba Akhter, Syed Md. Minhaz Hossain, Rizma Sijana Nigar, Srabanti Paul, Khaleque Md. Aashiq Kamal, Anik Sen, Iqbal H. Sarker","doi":"10.1007/s40745-023-00507-y","DOIUrl":"10.1007/s40745-023-00507-y","url":null,"abstract":"<div><p>People may now receive and share information more quickly and easily than ever due to the widespread use of mobile networked devices. However, this can occasionally lead to the spread of false information. Such information is being disseminated widely, which may cause people to make incorrect decisions about potentially crucial topics. This occurred in 2020, the year of the fatal and extremely contagious Coronavirus Disease (COVID-19) outbreak. The spread of false information about COVID-19 on social media has already been labeled as an “infodemic” by the World Health Organization (WHO), causing serious difficulties for governments attempting to control the pandemic. Consequently, it is crucial to have a model for detecting fake news related to COVID-19. In this paper, we present an effective Convolutional Neural Network (CNN)-based deep learning model using word embedding. For selecting the best CNN architecture, we take into account the optimal values of model hyper-parameters using grid search. Further, for measuring the effectiveness of our proposed CNN model, various state-of-the-art machine learning algorithms are conducted for COVID-19 fake news detection. 
Among them, CNN outperforms with 96.19% mean accuracy, 95% mean F1-score, and 0.985 area under ROC curve (AUC).</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 6","pages":"2167 - 2198"},"PeriodicalIF":0.0,"publicationDate":"2024-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139525256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective Fake News Classification Based on Lightweight RNN with NLP","authors":"Chinta Someswara Rao, Chitri Raminaidu, K. Butchi Raju, B. Sujatha","doi":"10.1007/s40745-023-00506-z","DOIUrl":"10.1007/s40745-023-00506-z","url":null,"abstract":"<div><p>Data is the most essential thing in the current world. By the year 2024, we will be able to generate 1.9 gigabytes of data per second. The creation of massive amounts of data has led to the birth of a wide range of technologies, which in turn is changing the world. Social media has brought the world to the tip of our fingers. It enables a person to access news from anywhere and at any time, but this has its cons too. It is leading to the spread of fake news and false information, and it is having a negative impact on society. Fake news is manipulated information that is disseminated via social media with the intent of causing harm to a person, agency, or organization. Keeping this view in mind, one must necessarily determine whether or not the news being spread is true before drawing conclusions. This will help avoid confusion among social media users, which is critical for ensuring positive social development. Detecting fake news has become one of the most difficult tasks a person can undertake. 
To get started with fake news detection, this paper will present a solution for detecting fake news based on recurrent neural networks.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 6","pages":"2141 - 2165"},"PeriodicalIF":0.0,"publicationDate":"2024-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139525455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeepKPred: Prediction and Functional Analysis of Lysine 2-Hydroxyisobutyrylation Sites Based on Deep Learning","authors":"Shiqi Fan, Yan Xu","doi":"10.1007/s40745-023-00504-1","DOIUrl":"10.1007/s40745-023-00504-1","url":null,"abstract":"<div><p>Protein 2-hydroxyisobutyrylation (Khib), a newly identified post-translational modification, plays a role in various cellular processes. To gain a comprehensive understanding of its regulatory mechanisms, it is crucial to identify the sites of 2-hydroxyisobutyrylation. Therefore, we developed a novel ensemble method, DeepKPred, for predicting species-specific 2-hydroxyisobutyrylation sites. We employed one-hot and AAindex encoding schemes to construct features from protein sequences and integrated two densely convolutional neural networks and two long short-term memory networks to build the model. In the 5-fold cross-validation dataset, DeepKPred achieved AUC values of 0.859, 0.804, 0.821, and 0.819 for Human, <i>Candida albicans</i>, Rice, Wheat, and <i>Physcomitrella patens</i>. Additionally, function analysis further indicated that different organisms tend to engage in distinct biological processes and pathways. Detailed analysis can help us learn more about the mechanism of 2-hydroxyisobutyrylation and provide insights for associated experimental verification.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 2","pages":"693 - 707"},"PeriodicalIF":0.0,"publicationDate":"2023-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138964498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}