{"title":"Supervised Feature Selection via Quadratic Surface Regression with \\(l_{2,1}\\)-Norm Regularization","authors":"Changlin Wang, Zhixia Yang, Junyou Ye, Xue Yang, Manchen Ding","doi":"10.1007/s40745-024-00518-3","DOIUrl":"10.1007/s40745-024-00518-3","url":null,"abstract":"<div><p>This paper proposes a supervised kernel-free quadratic surface regression method for feature selection (QSR-FS). The method finds a quadratic function for each class and incorporates it into the least squares loss function. The <span>\\(l_{2,1}\\)</span>-norm regularization term is introduced to obtain a sparse solution, and a feature weight vector is constructed from the coefficients of the quadratic functions in all classes to explain the importance of each feature. An alternating iteration algorithm is designed to solve the optimization problem of this model. The computational complexity of the algorithm is provided, and the iterative formula is reformulated to further accelerate computation. In the experimental part, feature selection and its downstream classification tasks are performed on eight datasets from different domains, and the experimental results are analyzed using relevant evaluation indices. Furthermore, feature selection interpretability and parameter sensitivity analysis are provided. The experimental results demonstrate the feasibility and effectiveness of our method.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139836443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Road Traffic Speed Prediction Using Data Augmentation: A Deep Generative Models-based Approach","authors":"Redouane Benabdallah Benarmas, Kadda Beghdad Bey","doi":"10.1007/s40745-023-00508-x","DOIUrl":"10.1007/s40745-023-00508-x","url":null,"abstract":"<div><p>Deep learning prediction models have emerged as the most widely used for the development of intelligent transportation systems (ITS), and their success is strongly reliant on the volume and quality of training data. However, traffic datasets are often small due to the limitations of the resources used to collect and store traffic flow data. Data Augmentation (DA) is a key method to increase the size of the training dataset before applying a prediction model. In this paper, we demonstrate the effectiveness of data augmentation for predicting traffic speed by using a Deep Generative Model-based approach (DGM). We empirically evaluate the ability of time series-appropriate architectures to improve traffic prediction over a Train on Synthetic Test on Real (TSTR) process. A Time Series-based Generative Adversarial Network model is used to transform an original road traffic dataset into a synthetic dataset to improve traffic prediction. Experiments were carried out using the 6th Beijing and PeMS datasets to show that the transformation improves the prediction model’s accuracy using both parametric and non-parametric methods. Original datasets are compared with the generated ones using statistical analysis methods to measure the fidelity and behavior of the produced data.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140485444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Structural Time Series Models for Predicting the \\({\\textrm{CO}}_2\\) Emissions in Afghanistan","authors":"Sayed Rahmi Khuda Haqbin, Athar Ali Khan","doi":"10.1007/s40745-023-00510-3","DOIUrl":"10.1007/s40745-023-00510-3","url":null,"abstract":"<div><p>There are numerous forecasting methods, but these approaches rely only on the data, ignore prior information, and do not take into account the variations that occur over time. The Bayesian structural time series (BSTS) models are the best way to forecast <span>\\({\\textrm{CO}}_2\\)</span> emissions and is updated. Because <span>\\({\\textrm{CO}}_2\\)</span> emissions play an essential part in climate change, forecasting future <span>\\({\\textrm{CO}}_2\\)</span> emissions is critical for all countries where global warming is a hazard to the planet. This study models and forecasts <span>\\({\\textrm{CO}}_2\\)</span> emissions in Afghanistan from 1990 to 2019 using BSTS models via the <b>bsts</b> function from the <span>bsts R package</span> statistical tool. We performed a diagnostic test of the normality of the residuals from the <span>bsts R package</span>. According to the 12-year-ahead forecasts, <span>\\({\\textrm{CO}}_2\\)</span> emissions will rise by 2031 under all models. The study’s findings indicate that <span>\\({\\textrm{CO}}_2\\)</span> emissions in Afghanistan are projected to rise, exposing the country to climate-related concerns.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139602244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Improved Quantum Inspired Particle Swarm Optimization for Forest Cover Prediction","authors":"Parul Agarwal, Anita Sahoo, Divyanshi Garg","doi":"10.1007/s40745-023-00509-w","DOIUrl":"10.1007/s40745-023-00509-w","url":null,"abstract":"<div><p>Forest cover prediction plays a crucial role in assessing and managing natural resources, biodiversity, and environmental sustainability. Traditional optimization algorithms have been employed for this task, but their effectiveness and efficiency in handling complex forest cover prediction problems are limited. This paper presents a novel approach, Annealing Lévy Quantum Inspired Particle Swarm Optimization (ALQPSO), that combines principles from quantum computing, particle swarm optimization, annealing, and Lévy distribution to enhance the accuracy and efficiency of forest cover prediction models by significant feature selection. The proposed algorithm utilizes quantum-inspired operators, such as quantum rotation gate, superposition, and entanglement, to explore the search space effectively and efficiently. By leveraging the principle of Lévy distribution and annealing, ALQPSO facilitates the exploration of multiple potential solutions simultaneously, leading to improved convergence speed and enhanced solution quality. To evaluate the performance of ALQPSO for forest cover prediction, experiments are conducted on the forest cover dataset. Initially, exploratory data analysis is performed to determine the nature of features. Thereafter, feature selection is performed through the proposed ALQPSO algorithm and compared with Quantum-based PSO (QPSO) and its variants. The experiments are conducted on all potential fields to identify the best among them. The experimental analysis demonstrates that ALQPSO outperforms traditional algorithms in terms of prediction accuracy, convergence speed, and solution quality (in terms of the number of features), highlighting its efficacy in addressing complex forest cover prediction problems.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139599440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Role of Artificial Intelligence and Deep Learning in Skin Disease Prediction: A Systematic Review and Meta-analysis","authors":"V. Auxilia Osvin Nancy, P. Prabhavathy, Meenakshi S. Arya","doi":"10.1007/s40745-023-00503-2","DOIUrl":"10.1007/s40745-023-00503-2","url":null,"abstract":"<div><p>Skin is a most essential and extraordinary part of the human structure. Exposure to chemicals such as nitrates, sunlight, arsenic, and UV rays due to pollution and depletion of the ozone layer is causing various skin diseases to spread rapidly. Digital healthcare offers many opportunities to reduce time and human error and to improve clinical outcomes. However, the automatic recognition of skin disease is a major challenge due to high visual similarity between different skin diseases, low contrast, and large inter variation. Early detection of skin cancer can prevent death. Thus, Artificial Intelligence (AI) and Machine Learning (ML) help physicians to improve clinical judgment or change manual perception. For skin cancer diagnostics, the ML/AI algorithm can outperform or match professional dermatologists in multiple studies. Different pre-trained architectures such as ResNet152, AlexNet, VGGNet, etc. are used for fusing different skin disease features such as texture, color, etc. and they are also utilized for conducting segmentation tasks. The variations in reflection, lesion size, shape, illumination, etc. often make automatic skin disease classification a complex task. ISIC 2019 and HAM 10000 are the widely used public datasets for skin disease prediction. More technical papers on skin cancer diagnosis are compared in this study. This report examines the majority of technical papers published between 2018 and October 2022 in order to appreciate current trends in the disciplines of skin cancer prediction. A study that combined clinical patient data with deep learning models (DL) increased the accuracy of predicting skin cancer. This article presents a visually attractive and well-organized summary of the current study findings.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s40745-023-00503-2.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139524140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"COVID-19 Fake News Detection using Deep Learning Model","authors":"Mahabuba Akhter, Syed Md. Minhaz Hossain, Rizma Sijana Nigar, Srabanti Paul, Khaleque Md. Aashiq Kamal, Anik Sen, Iqbal H. Sarker","doi":"10.1007/s40745-023-00507-y","DOIUrl":"10.1007/s40745-023-00507-y","url":null,"abstract":"<div><p>People may now receive and share information more quickly and easily than ever due to the widespread use of mobile networked devices. However, this can occasionally lead to the spread of false information. Such information is being disseminated widely, which may cause people to make incorrect decisions about potentially crucial topics. This occurred in 2020, the year of the fatal and extremely contagious Coronavirus Disease (COVID-19) outbreak. The spread of false information about COVID-19 on social media has already been labeled as an “infodemic” by the World Health Organization (WHO), causing serious difficulties for governments attempting to control the pandemic. Consequently, it is crucial to have a model for detecting fake news related to COVID-19. In this paper, we present an effective Convolutional Neural Network (CNN)-based deep learning model using word embedding. For selecting the best CNN architecture, we take into account the optimal values of model hyper-parameters using grid search. Further, for measuring the effectiveness of our proposed CNN model, various state-of-the-art machine learning algorithms are applied to COVID-19 fake news detection. Among them, the CNN performs best, with 96.19% mean accuracy, 95% mean F1-score, and 0.985 area under ROC curve (AUC).</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139525256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective Fake News Classification Based on Lightweight RNN with NLP","authors":"Chinta Someswara Rao, Chitri Raminaidu, K. Butchi Raju, B. Sujatha","doi":"10.1007/s40745-023-00506-z","DOIUrl":"10.1007/s40745-023-00506-z","url":null,"abstract":"<div><p>Data is the most essential thing in the current world. By the year 2024, we will be able to generate 1.9 gigabytes of data per second. The creation of massive amounts of data has led to the birth of a wide range of technologies, which in turn is changing the world. Social media has brought the world to the tip of our fingers. It enables a person to access news from anywhere and at any time, but this has its cons too. It is leading to the spread of fake news and false information, and it is having a negative impact on society. Fake news is manipulated information that is disseminated via social media with the intent of causing harm to a person, agency, or organization. Keeping this view in mind, one must necessarily determine whether or not the news being spread is true before drawing conclusions. This will help avoid confusion among social media users, which is critical for ensuring positive social development. Detecting fake news has become one of the most difficult tasks a person can undertake. To get started with fake news detection, this paper will present a solution for detecting fake news based on recurrent neural networks.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139525455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeepKPred: Prediction and Functional Analysis of Lysine 2-Hydroxyisobutyrylation Sites Based on Deep Learning","authors":"Shiqi Fan, Yan Xu","doi":"10.1007/s40745-023-00504-1","DOIUrl":"10.1007/s40745-023-00504-1","url":null,"abstract":"<div><p>Protein 2-hydroxyisobutyrylation (Khib), a newly identified post-translational modification, plays a role in various cellular processes. To gain a comprehensive understanding of its regulatory mechanisms, it is crucial to identify the sites of 2-hydroxyisobutyrylation. Therefore, we developed a novel ensemble method, DeepKPred, for predicting species-specific 2-hydroxyisobutyrylation sites. We employed one-hot and AAindex encoding schemes to construct features from protein sequences and integrated two densely convolutional neural networks and two long short-term memory networks to build the model. In the 5-fold cross-validation dataset, DeepKPred achieved AUC values of 0.859, 0.804, 0.821, and 0.819 for Human, <i>Candida albicans</i>, Rice, Wheat, and <i>Physcomitrella patens</i>. Additionally, function analysis further indicated that different organisms tend to engage in distinct biological processes and pathways. Detailed analysis can help us learn more about the mechanism of 2-hydroxyisobutyrylation and provide insights for associated experimental verification.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138964498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blockchain Adoption in Operations Management: A Systematic Literature Review of 14 Years of Research","authors":"Mansoureh Beheshti Nejad, Seyed Mahmoud Zanjirchi, Seyed Mojtaba Hosseini Bamakan, Negar Jalilian","doi":"10.1007/s40745-023-00505-0","DOIUrl":"10.1007/s40745-023-00505-0","url":null,"abstract":"<div><p>Blockchain technology has ushered in significant technological disruptions within the operational management sphere, fostering value creation within operational management networks. In recent years, researchers have increasingly explored the potential applications of blockchain across diverse facets of operational management. Recognizing the pivotal role of comprehending prior research endeavors within any scientific domain for the development of a robust theoretical framework and a nuanced understanding of research progression in both the scientific realm and its practical applications, this study aims to identify areas where blockchain can be effectively employed. This objective is accomplished through an exhaustive systematic review of existing research on blockchain applications in the field of operations management. In pursuit of this goal, a comprehensive dataset comprising 9188 papers published up to the year 2020 is amassed and subjected to analysis employing life cycle analysis, bibliometrics, and textual analysis. The outcomes of this research elucidate the emergence of five distinctive clusters within the landscape of blockchain applications in operational management: Decentralized Finance, Traceability, Trust, Sustainability, and Information Sharing. These findings underscore the dynamic and evolving nature of blockchain’s impact in this domain.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138995498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On a New Mixed Pareto–Weibull Distribution: Its Parametric Regression Model with an Insurance Applications","authors":"Deepesh Bhati, Buddepu Pavan, Girish Aradhye","doi":"10.1007/s40745-023-00502-3","DOIUrl":"10.1007/s40745-023-00502-3","url":null,"abstract":"<div><p>This article introduces a new probability distribution suitable for modeling heavy-tailed and right-skewed data sets. The proposed distribution is derived from the continuous mixture of the scale parameter of the Pareto family with the Weibull distribution. Analytical expressions for various distributional properties and actuarial risk measures of the proposed model are derived. The applicability of the proposed model is assessed using two real-world insurance data sets, and its performance is compared with the existing class of heavy-tailed models. The proposed model is assumed for the response variable in parametric regression modeling to account for the heterogeneity of individual policyholders. The Expectation-Maximization (EM) Algorithm is included to expedite the process of finding maximum likelihood (ML) estimates for the parameters of the proposed models. Real-world data application demonstrates that the proposed distribution performs well compared to its competitor models. The regression model utilizing the mixed Pareto–Weibull response distribution, characterized by regression structures for both the mean and dispersion parameters, demonstrates superior performance when compared to the Pareto–Weibull regression model, where the dispersion parameter depends on covariates.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138967097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}