{"title":"A New Kernel Density Estimation-Based Entropic Isometric Feature Mapping for Unsupervised Metric Learning","authors":"Alaor Cervati Neto, Alexandre Luís Magalhães Levada, Michel Ferreira Cardia Haddad","doi":"10.1007/s40745-024-00548-x","DOIUrl":"10.1007/s40745-024-00548-x","url":null,"abstract":"<div><p>Metric learning consists of designing adaptive distance functions that are well-suited to a specific dataset. Such tailored distance functions aim to deliver superior results compared to standard distance measures while performing machine learning tasks. In particular, the widely adopted Euclidean distance may be severely influenced due to noisy data and outliers, leading to suboptimal performance. In the present work, it is introduced a nonparametric isometric feature mapping (ISOMAP) method. The new algorithm is based on the kernel density estimation, exploring the relative entropy between probability density functions calculated in patches of the neighbourhood graph. The entropic neighbourhood network is built, where edges are weighted by a function of the relative entropies of the neighbouring patches instead of the Euclidean distance. A variety of datasets is considered in the analysis. 
The results indicate a superior performance compared to cutting edge manifold learning algorithms, such as the ISOMAP, unified manifold approximation and projection, and <i>t</i>-distributed stochastic neighbour embedding (<i>t</i>-SNE).</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 3","pages":"929 - 945"},"PeriodicalIF":0.0,"publicationDate":"2024-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141672260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Power Evaluation of Some Tests for Inverse Rayleigh Distribution","authors":"Vahideh Ahrari, Parisa Hasanalipour","doi":"10.1007/s40745-024-00536-1","DOIUrl":"10.1007/s40745-024-00536-1","url":null,"abstract":"<div><p>The Inverse Rayleigh distribution has many applications in the area of reliability studies. It is regarded as a model for a lifetime random variable. It is essential to develop an efficient goodness-of-fit test for this distribution. In this paper, the problem of the goodness-of-fit test for the Inverse Rayleigh distribution based on different statistics is studied. Each method is described, and the corresponding test statistics are constructed. The critical values and power comparisons are also obtained using Monte Carlo computations. The results are discussed and interpreted separately.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 2","pages":"739 - 755"},"PeriodicalIF":0.0,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141675008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visual Question Answer System for Skeletal Image Using Radiology Images in the Healthcare Domain Based on Visual and Textual Feature Extraction Techniques","authors":"Jinesh Melvin Y.I., Mukesh Shrimali, Sushopti Gawade","doi":"10.1007/s40745-024-00553-0","DOIUrl":"10.1007/s40745-024-00553-0","url":null,"abstract":"<div><p>The Medical Imaging Query Response System is among the most challenging concepts in the medical field. It requires a significant amount of effort to organize and comprehend the various representations of the human body. Additionally, the system needs to be verified by users in the healthcare industry. With the aid of various images, including MRI scans, CT scans, ultrasounds, X-rays, PET-CT scans, and more, it may be possible to identify human health issues. It is anticipated to encourage patient participation and support clinical decision-making. As a result of the use of a number of characteristics that are inadequately matched to medical images and questions, technically, the VQA system in the healthcare domain is more complicated than in the common domain. The challenges were caused by the datasets, approaches, and models used for both visual and textual aspects. This can sometimes make it harder for clinical assistance to provide relevant answers. The proposed system will analyze current models and diagnose the problem in order to improve the medical visual question-answering system for recent datasets. The models that were compared to the model were convolutional neural networks (CNN), deep belief networks (DBN), recurrent neural networks (RNN), long short-term memory networks (LSTM), and bidirectional long short-term memory (BiLSTM). 
To assess the effectiveness of each model, the following measures should be used: Classification Accuracy, F-Classification, F-Measure, C-False Negative Rate (FNR), C-Positive Predictive Value, C-Precision, C-Recall, C-Sensitivity, and C-True Positive Rate (CTPR) With the objective of improving the performance of any dataset with accuracy and measures for both visual and textual features to get the right answers for given questions, the proposed system helps to recognize how ideal the existing models are and generates new models using the B12 FASTER Recurrent Neural Network (RNN) and Kai-Bi-LSTM. With questions and appropriate answers, the suggested model will assist in extracting the features of imported images and text.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 3","pages":"969 - 990"},"PeriodicalIF":0.0,"publicationDate":"2024-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145170882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining LASSO-type Methods with a Smooth Transition Random Forest","authors":"Alexandre L. D. Gandini, Flavio A. Ziegelmann","doi":"10.1007/s40745-024-00541-4","DOIUrl":"10.1007/s40745-024-00541-4","url":null,"abstract":"<div><p>In this work, we propose a novel hybrid method for the estimation of regression models, which is based on a combination of LASSO-type methods and smooth transition (STR) random forests. Tree-based regression models are known for their flexibility and skills to learn even very nonlinear patterns. The STR-Tree model introduces smoothness into traditional splitting nodes, leading to a non-binary labeling, which can be interpreted as a group membership degree for each observation. Our approach involves two steps. First, we fit a penalized linear regression using LASSO-type methods. Then, we estimate an STR random forest on the residuals from the first step, using the original covariates. This dual-step process allows us to capture any significant linear relationships in the data generating process through a parametric approach, and then addresses nonlinearities with a flexible model. We conducted numerical studies with both simulated and real data to demonstrate our method’s effectiveness. 
Our findings indicate that our proposal offers superior predictive power, particularly in datasets with both linear and nonlinear characteristics, when compared to traditional benchmarks.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 3","pages":"899 - 928"},"PeriodicalIF":0.0,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145168479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comprehensive Survey of Image Generation Models Based on Deep Learning","authors":"Jun Li, Chenyang Zhang, Wei Zhu, Yawei Ren","doi":"10.1007/s40745-024-00544-1","DOIUrl":"10.1007/s40745-024-00544-1","url":null,"abstract":"<div><p>In recent years, generative artificial intelligence has been developing rapidly. In the image domain, image generation models based on deep learning have made remarkable achievements. Early frameworks for image generation models were dominated by generative adversarial networks (GANs) and variational autoencoders (VAEs). Nowadays, large-scale generative models based on diffusion models have become mainstream, and the quality of their generated images is significantly improved. We will review the research and development of image generation models and delve into the significant progress made in the field in recent years. Initially, we revisit the development of traditional image generation models like GANs and VAEs, emphasizing their contributions and challenges. We also introduce diffusion models, which have received much attention in the field of image generation due to their unique generative process and excellent generative performance. Subsequently, we emphasized the large vision models with SAM as the focal point. We also pay special attention to large-scale generative models like Stable Diffusion, which have demonstrated unprecedented capabilities in high-quality image generation tasks. 
Additionally, we explore target models and respective fine-tuning methods for domain-oriented image generation tasks, predicts future directions in image generation, and proposes potential research focuses and challenges.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 1","pages":"141 - 170"},"PeriodicalIF":0.0,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143521779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification of Privacy Preserved Medical Data with Fractional Tuna Sailfish Optimization Based Deep Residual Network in Cloud","authors":"Shabanam K. Shikalgar, N. V. S. Pavan Kumar, Gavendra Singh, Faizur Rashid","doi":"10.1007/s40745-024-00538-z","DOIUrl":"10.1007/s40745-024-00538-z","url":null,"abstract":"<div><p>Nowadays, with the growth of emerging technologies, increased attention has been paid to the classification of privacy-preserved medical data and development of various privacy-preserving models for the promotion of online medical pre-diagnosis systems. Medical data is highly sensitive and it is essential to ensure privacy of medical records from third-party users to increase service quality, satisfy patients and earn trust. The classification of medical preserved data is helpful to build a clinical decision system by classifying patients based on their disease and symptoms. In this article, a hybrid optimization-based deep learning model named Fractional Tuna Sailfish Optimization–Deep Residual Network (FractionalTSFO-DRN) is designed to precisely classify the privacy preserved medical data. A privacy utility coefficient matrix is used to ensure the privacy of medical data by generating a key matrix using Tuna Sailfish Optimization (TSFO) algorithmic technique. The privacy-preserved medical data is allowed for the classification process using DRN and the introduced Fractional TSFO is used to optimize and enhance the classification in DRN. 
The assessment followed by using heart disease prediction databases proved that the employed classification technique recorded an accuracy of 94.67%, a True Positive Rate of 93.56%, and a True Negative Rate of 89.68% respectively.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 3","pages":"829 - 854"},"PeriodicalIF":0.0,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145166413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research on Pricing of Data Based on Bi-level Programming Model","authors":"Yurong Ding, Yingjie Tian","doi":"10.1007/s40745-024-00549-w","DOIUrl":"10.1007/s40745-024-00549-w","url":null,"abstract":"<div><p>Effective value measurement and pricing methods can greatly promote the healthy development of data sharing, exchange and reuse. However, the uncertainty of data value and neglect of interactivity lead to information asymmetry in the transaction process. A perfect pricing system and well-designed data trading market (hereafter called data market) can widely promote data transactions. We take the three-agents data market as an example to construct a sound data trading process. The data owner who provides data records, the model buyer who is interested in buying machine learning (ML) model instances, and the data broker who interacts between the data owner and the model buyer. Based on the characteristics of data market, like truthfulness, revenue maximization, version control, fairness and non-arbitrage, we propose a data pricing methods based on different model versions. Firstly, we utilize market research and construct a revenue maximization (RM) problem to price the different versions of ML models and solve it with the RM-ILP process. However, the RM model based on market research has two major problems: one is that the model buyer has no incentive to tell the truth, that is, the model buyer will lie in the market research to obtain a lower model price; the other is that it asks the data broker to release version menu in advance, resulting in an inefficient operation of the data market. In view of the defects of the RM transaction model, we propose a model buyers behavior analysis, establish the revenue maximization function based on different data versions to establish a bi-level linear programming model. 
We further add the incentive compatibility constraint and the individual rationality constraint, taking the utility of the model buyer and the revenue of the data broker into account. This reflects the consumer driven model in the data transaction mode. Finally, the RM-BLP process is proposed to transform RM problem into an equivalent single-level integer programming problem and we solve it with the “Gurobi” solver. The validity of the model is verified by experiments.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 4","pages":"1391 - 1419"},"PeriodicalIF":0.0,"publicationDate":"2024-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142412038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Two-Stage Analysis of Interaction Between Stock and Exchange Rate Markets: Evidence from Turkey","authors":"Muhammad Ali Faisal, Murat Donduran","doi":"10.1007/s40745-024-00547-y","DOIUrl":"10.1007/s40745-024-00547-y","url":null,"abstract":"<div><p>In this study, we use a novel approach to explore possible connections between foreign exchange and stock returns using Turkish financial data from 2005 to 2022. Our method involves a two-stage technique. The first stage begins by decomposing individual time series signals into separate intrinsic mode functions (IMFs) with a complete ensemble empirical mode decomposition with added noise algorithm. Extracted IMFs are then used to construct high and low-frequency components through a fine-to-coarse algorithm. In the second phase, we utilized a cross-quantilogram technique to analyze the dependence in quantiles of the original return series along with frequency components obtained in the previous stage. Results revealed several important insights. Firstly, a relatively higher effect ran from stock returns to exchange rate returns for the pertinent period. Secondly, tail dependence is apparent, as returns are discernibly linked. Thirdly, the tail dependence in the returns is more profound in the high-frequency composition than in the low-frequency component. 
Lastly, the structure of dependence has stayed mostly constant throughout the sample period analyzed.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 1","pages":"171 - 198"},"PeriodicalIF":0.0,"publicationDate":"2024-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141359846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Dementia Prediction Using Ensemble Majority Voting Classifier","authors":"K. P. Muhammed Niyas, P. Thiyagarajan","doi":"10.1007/s40745-024-00550-3","DOIUrl":"10.1007/s40745-024-00550-3","url":null,"abstract":"<div><p>Early detection of dementia patients in advance is a great concern for the physicians. That is why physicians make use of multi modal data to accomplish this. The baseline visit data of the patients are mainly utilized for this task. Modern Machine Learning techniques provide empirical evidence based approach to physicians for predicting the diagnosis status of the patients. This paper proposes an ensemble majority voting classifier approach for improving the detection of dementia using baseline visit data. The ensemble model consists of Logistic Regression, Random Forest, and Naive Bayes Classifiers. The proposed ensemble classifier reported with a BCA, F1-score of 92%, 0.92 for classifying demented and non-demented patients. Our results suggest that the prediction using the ensemble majority voting classifier improves the Balanced Classification Accuracy, F1-score for predicting dementia on the multi modal data of Open Access Series Imaging Dataset. The results using ensemble models are promising and highlight the importance of using ensemble models for dementia detection using multimodal data.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 3","pages":"947 - 967"},"PeriodicalIF":0.0,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141369288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comprehensive Study and Research Perception towards Secured Data Sharing for Lung Cancer Detection with Blockchain Technology","authors":"Hari Krishna Kalidindi, N. Srinivasu","doi":"10.1007/s40745-024-00537-0","DOIUrl":"10.1007/s40745-024-00537-0","url":null,"abstract":"<div><p>Modernization in the healthcare industry is happening with the support of artificial intelligence and blockchain technologies. Collecting healthcare data is done through any Google survey from different governing bodies and data available on the Web of Sciences. However, the researchers continually suffered on developing effective classification approaches. In the recently developed models, deep learning is used for better generalization and training performance using a massive amount of data. A better learning model is built by sharing the data from organizations like research centers, testing labs, hospitals, etc. Each healthcare institution requires proper data privacy, and thus, these industries desire to use efficient and accurate learning systems for different applications. Among various diseases in the world, lung cancer is one of a hazardous diseases. Thus, early identification of lung cancer and followed by the appropriate treatment can save a life. Hence, the Computer Aided Diagnosis (CAD) model is essential for supporting healthcare applications. Therefore, an automated lung cancer detection models are developed to identify cancer from the different modalities of medical images. As a result, the privacy concern in clinical data restricts data sharing between various organizations based on legal and ethical problems. Hence, for these security reasons, the blockchain comes into focus. Here, there is a need to get access to the blockchain by healthcare professionals for displaying the clinical records of the patient, which ensures the security of the patient’s data. 
For this purpose, artificial intelligence utilizes numerous techniques, large quantities of data, and decision-making capability. Thus, the medical system must have democratized healthcare, reduced costs, and enhanced service efficiency by combining technological advancement. Therefore, this paper aims to review several lung cancer detection approaches in data sharing to help future research. Here, the systematic review of lung cancer detection models is done based on ML and DL algorithms. In recent years, the fundamental well-performed techniques have been discussed by categorizing them. Furthermore, the simulation platforms, dataset utilized, and performance measures are evaluated as an extended review. This survey explores the challenges and research findings for supporting future works. This work will produce many suggestions for future professionals and researchers for enhancing the secure data transmission of medical data.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 2","pages":"757 - 797"},"PeriodicalIF":0.0,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141368507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}