{"title":"Research on Pricing of Data Based on Bi-level Programming Model","authors":"Yurong Ding, Yingjie Tian","doi":"10.1007/s40745-024-00549-w","DOIUrl":"10.1007/s40745-024-00549-w","url":null,"abstract":"<div><p>Effective value measurement and pricing methods can greatly promote the healthy development of data sharing, exchange and reuse. However, the uncertainty of data value and neglect of interactivity lead to information asymmetry in the transaction process. A perfect pricing system and well-designed data trading market (hereafter called data market) can widely promote data transactions. We take the three-agent data market as an example to construct a sound data trading process. The three agents are the data owner, who provides data records; the model buyer, who is interested in buying machine learning (ML) model instances; and the data broker, who interacts with both the data owner and the model buyer. Based on the characteristics of the data market, such as truthfulness, revenue maximization, version control, fairness and non-arbitrage, we propose a data pricing method based on different model versions. Firstly, we utilize market research and construct a revenue maximization (RM) problem to price the different versions of ML models and solve it with the RM-ILP process. However, the RM model based on market research has two major problems: one is that the model buyer has no incentive to tell the truth, that is, the model buyer will lie in the market research to obtain a lower model price; the other is that it asks the data broker to release the version menu in advance, resulting in an inefficient operation of the data market. In view of the defects of the RM transaction model, we analyze model buyers' behavior and establish a revenue maximization function based on different data versions, which yields a bi-level linear programming model. 
We further add the incentive compatibility constraint and the individual rationality constraint, taking the utility of the model buyer and the revenue of the data broker into account. This reflects the consumer driven model in the data transaction mode. Finally, the RM-BLP process is proposed to transform RM problem into an equivalent single-level integer programming problem and we solve it with the “Gurobi” solver. The validity of the model is verified by experiments.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 4","pages":"1391 - 1419"},"PeriodicalIF":0.0,"publicationDate":"2024-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142412038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Two-Stage Analysis of Interaction Between Stock and Exchange Rate Markets: Evidence from Turkey","authors":"Muhammad Ali Faisal, Murat Donduran","doi":"10.1007/s40745-024-00547-y","DOIUrl":"10.1007/s40745-024-00547-y","url":null,"abstract":"<div><p>In this study, we use a novel approach to explore possible connections between foreign exchange and stock returns using Turkish financial data from 2005 to 2022. Our method involves a two-stage technique. The first stage begins by decomposing individual time series signals into separate intrinsic mode functions (IMFs) with a complete ensemble empirical mode decomposition with added noise algorithm. Extracted IMFs are then used to construct high and low-frequency components through a fine-to-coarse algorithm. In the second phase, we utilized a cross-quantilogram technique to analyze the dependence in quantiles of the original return series along with frequency components obtained in the previous stage. Results revealed several important insights. Firstly, a relatively higher effect ran from stock returns to exchange rate returns for the pertinent period. Secondly, tail dependence is apparent, as returns are discernibly linked. Thirdly, the tail dependence in the returns is more profound in the high-frequency composition than in the low-frequency component. 
Lastly, the structure of dependence has stayed mostly constant throughout the sample period analyzed.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 1","pages":"171 - 198"},"PeriodicalIF":0.0,"publicationDate":"2024-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141359846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comprehensive Study and Research Perception towards Secured Data Sharing for Lung Cancer Detection with Blockchain Technology","authors":"Hari Krishna Kalidindi, N. Srinivasu","doi":"10.1007/s40745-024-00537-0","DOIUrl":"10.1007/s40745-024-00537-0","url":null,"abstract":"<div><p>Modernization in the healthcare industry is happening with the support of artificial intelligence and blockchain technologies. Collecting healthcare data is done through any Google survey from different governing bodies and data available on the Web of Sciences. However, researchers have continually struggled to develop effective classification approaches. In the recently developed models, deep learning is used for better generalization and training performance using a massive amount of data. A better learning model is built by sharing the data from organizations like research centers, testing labs, hospitals, etc. Each healthcare institution requires proper data privacy, and thus, these industries desire to use efficient and accurate learning systems for different applications. Among various diseases in the world, lung cancer is one of the hazardous diseases. Thus, early identification of lung cancer followed by appropriate treatment can save a life. Hence, the Computer Aided Diagnosis (CAD) model is essential for supporting healthcare applications. Therefore, automated lung cancer detection models are developed to identify cancer from the different modalities of medical images. As a result, the privacy concern in clinical data restricts data sharing between various organizations based on legal and ethical problems. Hence, for these security reasons, the blockchain comes into focus. Here, healthcare professionals need access to the blockchain to display the clinical records of the patient, which ensures the security of the patient’s data. 
For this purpose, artificial intelligence utilizes numerous techniques, large quantities of data, and decision-making capability. Thus, the medical system must have democratized healthcare, reduced costs, and enhanced service efficiency by combining technological advancement. Therefore, this paper aims to review several lung cancer detection approaches in data sharing to help future research. Here, the systematic review of lung cancer detection models is done based on ML and DL algorithms. In recent years, the fundamental well-performed techniques have been discussed by categorizing them. Furthermore, the simulation platforms, dataset utilized, and performance measures are evaluated as an extended review. This survey explores the challenges and research findings for supporting future works. This work will produce many suggestions for future professionals and researchers for enhancing the secure data transmission of medical data.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 2","pages":"757 - 797"},"PeriodicalIF":0.0,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141368507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of the HIV/AIDS Data Using Joint Modeling of Longitudinal (k,l)-Inflated Count and Time to Event Data in Clinical Trials","authors":"Mojtaba Zeinali Najafabadi, Ehsan Bahrami Samani","doi":"10.1007/s40745-024-00532-5","DOIUrl":"10.1007/s40745-024-00532-5","url":null,"abstract":"<div><p>Generalized linear mixed effect models (GLMEMs) are widely applied for the analysis of correlated non-Gaussian data such as those found in longitudinal studies. On the other hand, the Cox (proportional hazards, PHs) and the accelerated failure time (AFT) regression models are two well-known approaches in survival analysis to modeling time to event (TTE) data. In this article, we develop joint modeling of longitudinal count (LC) and TTE data and consider extensions with fixed effects and parametric random effects in our proposed joint models. The LC response is inflated in two points k and l (k < l) and we use some members of (k, l)-inflated power series distribution (PSD) as the distribution of this response. Also, for modeling of TTE process, the PHs model of Cox and the AFT model, based on a flexible hazard function, are separately proposed. One of the goals of the present paper is to evaluate and compare the performance of joint models of (k, l)-inflated LC and TTE data under two mentioned approaches via extensive simulations. The estimation is through the penalized likelihood method, and our concentration is on efficient computation and effective parameter selection. To assist efficient computation, the joint likelihoods of the observations and the latent variables of the random effects are used instead of the marginal likelihood of the observations. 
Finally, a real AIDS data example is presented to illustrate the potential applications of our joint models.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 2","pages":"695 - 719"},"PeriodicalIF":0.0,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143809249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UAV-YOLOv5: A Swin-Transformer-Enabled Small Object Detection Model for Long-Range UAV Images","authors":"Jun Li, Chong Xie, Sizheng Wu, Yawei Ren","doi":"10.1007/s40745-024-00546-z","DOIUrl":"10.1007/s40745-024-00546-z","url":null,"abstract":"<div><p>This paper tackles the challenges associated with low recognition accuracy and the detection of occlusions when identifying long-range and diminutive targets (such as UAVs). We introduce a sophisticated detection framework named UAV-YOLOv5, which amalgamates the strengths of Swin Transformer V2 and YOLOv5. First, we introduce Focal-EIOU, a refinement of the K-means algorithm tailored to generate anchor boxes better suited for the current dataset, thereby improving detection performance. Second, the convolutional and pooling layers in the network with step size greater than 1 are replaced to prevent information loss during feature extraction. Then, the Swin Transformer V2 module is introduced in the Neck to improve the accuracy of the model, and the BiFormer module is introduced to improve the ability of the model to acquire global and local feature information at the same time. In addition, BiFPN is introduced to replace the original FPN structure so that the network can acquire richer semantic information and fuse features across scales more effectively. Lastly, a small target detection head is appended to the existing architecture, augmenting the model’s proficiency in detecting smaller targets with heightened precision. Furthermore, various experiments are conducted on the comprehensive dataset to verify the effectiveness of UAV-YOLOv5, achieving an average accuracy of 87%. 
Compared with YOLOv5, the mAP of UAV-YOLOv5 is improved by 8.5%, which verifies that it has high-precision long-range small-target UAV optoelectronic detection capability.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 4","pages":"1109 - 1138"},"PeriodicalIF":0.0,"publicationDate":"2024-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142413758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Survey of Artificial Intelligence for Industrial Detection","authors":"Jun Li, YiFei Hai, SongJia Yin","doi":"10.1007/s40745-024-00545-0","DOIUrl":"10.1007/s40745-024-00545-0","url":null,"abstract":"<div><p>In the past decade, deep learning has greatly increased the complexity of industrial production intelligence by virtue of its powerful learning capability. At the same time, it has also brought security challenges to the field of industrial production information networks, mainly in two aspects: production safety and network information security. The former is mainly focused on ensuring the safety of personnel behavior in the production environment, including two different categories: detection of dangerous targets and identification of dangerous behaviors. The latter focuses on the safety of industrial information systems, especially networks. In recent years, deep learning-based detection techniques have made great strides in addressing these dual problems. Therefore, this paper presents an exhaustive study on the development of deep learning-based detection methods for industrial production safety analysis and information network security problem detection. The paper presents a comprehensive taxonomy for classifying production environments and production network information, classifying and clustering prevalent industrial security challenges, with a special emphasis on the role of deep learning in insecure behavior identification and information security risk detection. We provide an in-depth analysis of the advantages, limitations, and suitable application scenarios of these two approaches. 
In addition, the paper provides insights into contemporary challenges and future trends in this field and concludes with a discussion of prospects for future research.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 2","pages":"799 - 827"},"PeriodicalIF":0.0,"publicationDate":"2024-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141103821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining Nonlinear Features of EEG and MRI to Diagnose Alzheimer’s Disease","authors":"Elias Mazrooei Rad, Mahdi Azarnoosh, Majid Ghoshuni, Mohammad Mahdi Khalilzadeh","doi":"10.1007/s40745-024-00533-4","DOIUrl":"10.1007/s40745-024-00533-4","url":null,"abstract":"<div><p>In this article, a new method for the diagnosis of Alzheimer’s disease in the mild stage is presented, based on combining the characteristics of EEG signals and MRI images. The brain signal is recorded in four modes (closed-eye, open-eye, reminder, and stimulation) from three channels, Pz, Cz, and Fz, of 90 participants in three groups: healthy subjects, mild Alzheimer’s disease (AD) patients, and severe AD patients. In addition, MRI images are taken with at least 3 Tesla and a thickness of 3 mm so that the senile plaques and neurofibrillary tangles can be examined. Proper image segmentation, mask, and sharp filters are used for preprocessing. Then suitable features of the brain signals, such as the Lyapunov exponent, correlation dimension, and entropy, are extracted according to the nonlinear and chaotic nature of the brain. Results: These features are combined with brain MRI image properties, including Medial Temporal Lobe Atrophy (MTA), Cerebral Spinal Fluid (CSF), Gray Matter (GM), Index Asymmetry (IA) and White Matter (WM), to diagnose the disease. Then two classifiers, the support vector machine and the Elman neural network, are used with the optimal combined features extracted by analysis of variance. Results showed that among the three brain signals and among the four modes of evaluation, the accuracy of the Pz channel and excitation mode was higher than the others. Conclusions: Finally, because of the nonlinear properties studied and the nonlinear dynamics of the EEG signal, the Elman neural network is used. However, it is important to note that, by analyzing medical images, we can determine the most effective channel for recording brain signals. 
3D segmentation of MRI images further helps researchers diagnose Alzheimer’s disease and obtain important information. The accuracy of the results in Elman neural network with the combination of brain signal features and medical images is 94.4% and in the case without combining the signal and image features, the accuracy of the results is 92.2%. The use of nonlinear classifiers is more appropriate than other classification methods due to the nonlinear dynamics of the brain signal. The accuracy of the results in the support vector machine with RBF core with the combination of brain signal features and medical images is 75.5% and in the case without combining the signal and image features, the accuracy of the results is 76.8%.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 1","pages":"95 - 116"},"PeriodicalIF":0.0,"publicationDate":"2024-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141115548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial Data Analysis for Robust Classification of Network Topology Through Synthetic Combinatorics","authors":"Samrat Hore, Stabak Roy, Malabika Boruah, Saptarshi Mitra","doi":"10.1007/s40745-024-00523-6","DOIUrl":"10.1007/s40745-024-00523-6","url":null,"abstract":"<div><p>The measurement of network topology through various spatial topological indices like Alpha, Beta and Gamma is widely used for spatial data analysis. However, explaining the classification of the network topology of a city based on Alpha, Beta and Gamma indices is not conclusive, as the results of the individual indices differ. To address an efficient classification of network topology, a Modified Synthetic Indicator (MSI) has been proposed and criticised over existing synthetic indicators based on the Composite Weighted Connectivity Index (CWCI), the linear combination of Alpha, Beta and Gamma indices. Application of the proposed MSI in micro-level (ward level) classification of network topology, i.e., road network connectivity, has been verified in Agartala City and calibrates the efficiency of CWCI over Alpha, Beta and Gamma indices. The study reveals that the proposed CWCI is more robust than any individual graph-theoretic measure.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 4","pages":"1341 - 1359"},"PeriodicalIF":0.0,"publicationDate":"2024-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141122125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating the Performance of Machine Learning Algorithm for Classification of Safer Sexual Negotiation among Married Women in Bangladesh","authors":"Md. Mizanur Rahman, Deluar J. Moloy, Mashfiqul Huq Chowdhury, Arzo Ahmed, Taksina Kabir","doi":"10.1007/s40745-024-00535-2","DOIUrl":"10.1007/s40745-024-00535-2","url":null,"abstract":"<div><p>Safer sexual practice is essential for improving women’s reproductive and sexual health outcomes. The goal of this study is to identify the contributing factors influencing safer sexual negotiations (SSN) through the application of machine learning algorithms. The algorithms include logistic regression (LR), random forest, Naïve Bayes, linear discriminant analysis, classification and regression trees, support vector machines (SVM), and K-nearest neighbors. This study utilized data from the 2017-18 Bangladesh Demographic and Health Survey, encompassing 19,457 married women within the ages of 15–49 years. The analysis reveals that the SVM algorithm achieved the highest classification accuracy (99.66%), along with high sensitivity (99.98%) and the lowest specificity. Conversely, the LR model produced the highest area under the curve statistics (0.6699), indicating good performance in distinguishing SSN among married women. The outcome illustrated that women’s autonomy, engagement with financial institutions, educational attainment, and their partner’s education play a significant role in SSN with their partners. The findings highlight the significance of empowering women, enhancing reproductive health awareness, and improving socio-economic conditions and education to encourage SSN. 
The government needs to consider all these risk factors to promote greater SSN for preventing sexually transmitted diseases among women in Bangladesh.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 2","pages":"721 - 737"},"PeriodicalIF":0.0,"publicationDate":"2024-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141122786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unified Image Harmonization with Region Augmented Attention Normalization","authors":"Junjie Hou, Yuqi Zhang, Duo Su","doi":"10.1007/s40745-024-00531-6","DOIUrl":"10.1007/s40745-024-00531-6","url":null,"abstract":"<div><p>The image harmonization task endeavors to adjust foreground information within an image synthesis process to achieve visual consistency by leveraging background information. In academic research, this task conventionally involves the utilization of simple synthesized images and matching masks as inputs. However, obtaining precise masks for image harmonization in practical applications poses a significant challenge, thereby creating a notable disparity between research findings and real-world applicability. To mitigate this disparity, we propose a redefinition of the image harmonization task as “Unified Image Harmonization,” where the input comprises only a single image, thereby enhancing its applicability in real-world scenarios. To address this challenge, we have developed a novel framework. Within this framework, we initially employ inharmonious region localization to detect the mask, which is subsequently utilized for harmonization tasks. The pivotal aspect of the harmonization process lies in normalization, which is accountable for information transfer. Nonetheless, the current background-to-foreground information transfer and guidance mechanisms are limited by single-layer guidance, thereby constraining their effectiveness. To overcome this limitation, we introduce Region Augmented Attention Normalization (RA2N), which enhances the attention mechanism for foreground feature alignment, consequently leading to improved alignment and transfer capabilities. 
Through qualitative and quantitative comparisons on the iHarmony4 dataset, our model exhibits exceptional performance not only in unified image harmonization but also in conventional image harmonization tasks.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 5","pages":"1865 - 1886"},"PeriodicalIF":0.0,"publicationDate":"2024-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140989549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}