Annals of Data Science: Latest Articles

The Modified Lindley Distribution Through Convex Combination with Applications in Engineering
Annals of Data Science Pub Date : 2024-08-22 DOI: 10.1007/s40745-024-00569-6
Afaq Ahmad, A. A. Bhat, S. P. Ahmad, Raheela Jan
Abstract: This paper introduces a Modified Lindley distribution constructed as a convex combination of exponential and gamma distributions. The fundamental properties of the proposed distribution are derived: the shapes of the distribution, moments, mean, variance, reliability, hazard rate, moment generating function, stochastic ordering, and the distribution of order statistics. The proposed distribution is observed to be heavy-tailed and can also model data whose hazard rate function has an upside-down bathtub shape. The maximum likelihood estimators of the unknown parameters are obtained. Two numerical examples on real data sets demonstrate the applicability of the proposed distribution; on both, it models heavy-tailed data better than many other models.
Annals of Data Science, volume 12, issue 5, pages 1463-1478.
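The abstract does not state the density of the Modified Lindley distribution, so the following is only a sketch of the underlying construction, not the paper's parameterization. It assumes a two-component mixture p * Exponential(theta) + (1 - p) * Gamma(shape, theta); the function names, the default shape of 2, and the weight p are illustrative choices. As a sanity check, the weight p = theta / (1 + theta) recovers the classical Lindley density, which is a well-known convex combination of this kind.

```python
import math


def exp_pdf(x, theta):
    """Density of an exponential distribution with rate theta."""
    return theta * math.exp(-theta * x)


def gamma_pdf(x, shape, rate):
    """Density of a gamma distribution with the given shape and rate."""
    return rate ** shape * x ** (shape - 1) * math.exp(-rate * x) / math.gamma(shape)


def convex_mixture_pdf(x, p, theta, shape=2.0):
    """Convex combination p * Exp(theta) + (1 - p) * Gamma(shape, theta).

    The weight p and the default shape are assumptions for illustration only.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return p * exp_pdf(x, theta) + (1.0 - p) * gamma_pdf(x, shape, theta)


def lindley_pdf(x, theta):
    """Classical Lindley density: theta^2 / (1 + theta) * (1 + x) * exp(-theta * x)."""
    return theta ** 2 / (1.0 + theta) * (1.0 + x) * math.exp(-theta * x)


if __name__ == "__main__":
    theta = 1.5
    p = theta / (1.0 + theta)  # weight that reproduces the classical Lindley case
    for x in (0.0, 0.5, 2.0):
        print(x, convex_mixture_pdf(x, p, theta), lindley_pdf(x, theta))
```

Other choices of p or shape give other members of the same mixture family; which of these corresponds to the paper's proposal cannot be determined from the abstract alone.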
Citations: 0
Gated Graph Attention-based Crossover Snake (GGA-CS) Algorithm for Hyperspectral Image Classification
Annals of Data Science Pub Date : 2024-08-20 DOI: 10.1007/s40745-024-00567-8
R. Ablin, G. Prabin
Abstract: Hyperspectral image classification involves assigning pixels or regions within a hyperspectral image to specific classes or categories based on the spectral information captured across multiple bands. Traditional methods face several challenges, such as high dimensionality, scalability, spectral variability, and limited contextual information. To address these issues, a novel Gated Graph Attention-based Crossover Snake (GGA-CS) algorithm is proposed for classifying hyperspectral images. In this work, a Graph Neural Network (GNN) is employed to capture both spectral and spatial relationships between pixels, and a gated attention mechanism is used to enhance specific spectral bands. After the training process, a crossover-based snake optimization is applied to tune the parameters, obtain the classification output of the GNN, and adjust the pixels to enhance the performance of the GGA-CS method. The study is validated on diverse datasets, namely the Indian Pines dataset, the University of Pavia dataset, and the Salinas dataset. The GGA-CS method's performance is evaluated using key metrics. Comparisons with state-of-the-art methods are conducted to gauge its efficacy in hyperspectral image classification, as demonstrated by experimental results.
Annals of Data Science, volume 12, issue 1, pages 281-305.
Citations: 0
Kernel-free Reduced Quadratic Surface Support Vector Machine with 0-1 Loss Function and \(L_p\)-norm Regularization
Annals of Data Science Pub Date : 2024-08-19 DOI: 10.1007/s40745-024-00573-w
Mingyang Wu, Zhixia Yang
Abstract: This paper presents a novel nonlinear binary classification method, namely the kernel-free reduced quadratic surface support vector machine with 0-1 loss function and \(L_p\)-norm regularization (\(L_p\)-RQSSVM\(_{0/1}\)). It uses a kernel-free approach aimed at finding a reduced quadratic surface to separate samples, without considering the cross terms in the quadratic form. This saves computational cost and provides better interpretability than methods using kernel functions. In addition, combining the 0-1 loss function with \(L_p\)-norm regularization to construct \(L_p\)-RQSSVM\(_{0/1}\) enables both sample sparsity and feature sparsity. The support vector (SV) of \(L_p\)-RQSSVM\(_{0/1}\) is defined, and it is shown that all SVs fall on the support hypersurfaces. Moreover, the optimality condition is explored theoretically, and a new iterative algorithm based on the alternating direction method of multipliers (ADMM) framework is used to solve \(L_p\)-RQSSVM\(_{0/1}\) on the selected working set. The computational complexity and convergence of the algorithm are discussed. Furthermore, numerical experiments demonstrate that \(L_p\)-RQSSVM\(_{0/1}\) achieves better classification accuracy, fewer SVs, and higher computational efficiency than other methods on most datasets. It also has feature sparsity under certain conditions.
Annals of Data Science, volume 12, issue 1, pages 381-412.
Citations: 0
An Empirical Study of Nature-Inspired Algorithms for Feature Selection in Medical Applications
Annals of Data Science Pub Date : 2024-08-14 DOI: 10.1007/s40745-024-00571-y
Varun Arora, Parul Agarwal
Abstract: Nature-inspired algorithms (NIAs) have proven to be effective tools for solving intricate optimization problems and aid in the development of better computational techniques. In recent years, these algorithms have attracted considerable interest for optimizing feature selection problems. In the literature, NIAs have been used to select relevant features among the available features in the diagnosis of many chronic diseases. In this paper, a comprehensive review of existing nature-inspired feature selection techniques is presented, along with the fundamental definitions of feature selection and the use of NIAs to optimize it. We review NIA applications for selecting feature subsets in the domain of medical applications, analyzing numerous relevant papers from 2008 to 2022 on NIA-based feature selection in biomedical applications. Moreover, to find the best optimization algorithm for feature selection, we conducted experiments with four well-known nature-inspired algorithms on ten benchmark biomedical datasets for classification. We report results on various state-of-the-art evaluation measures and present convergence graphs for analysis. Based on the average rank of fitness values, Particle Swarm Optimization is found to be better than Harris Hawk Optimization, Grey Wolf Optimization, and Whale Optimization. We also present some open challenges of this research area to guide researchers and computational intelligence experts in future work. The paper will help future researchers understand the use and implementation of nature-inspired algorithms for feature selection in the medical domain.
Annals of Data Science, volume 12, issue 5, pages 1479-1524.
Citations: 0
Comparative Analysis of Machine Learning Techniques for Imbalanced Genetic Data
Annals of Data Science Pub Date : 2024-08-13 DOI: 10.1007/s40745-024-00575-8
Arshmeet Kaur, Morteza Sarmadi
Abstract: Advancements in genome sequencing technologies have significantly increased the availability of genomic data. The use of machine learning models to predict the pathogenicity or clinical significance of genetic mutations is crucial. However, genetic datasets often feature imbalanced target variables and high-cardinality, skewed predictor variables. These attributes complicate machine learning modeling processes. This study addresses these challenges in both regression and classification tasks. We systematically explored the impact of various data preprocessing techniques, feature selection methods, and model choices on the performance of machine learning models trained on imbalanced genetic data. We evaluated the performance metrics using fivefold cross-validation. Our key findings demonstrate that the regression models are robust to outliers and skew in predictor and target variables. Similarly, in classification tasks, class-imbalanced target variables and skewed predictors minimally impact model performance. Among the models tested, random forest was the most effective for both imbalanced regression and classification tasks. Our key contributions are as follows: we address a significant research gap by focusing on imbalanced regression, a problem that is sparsely explored compared to class-imbalanced classification. We identify the techniques that improve prediction performance and provide practical insights into handling genetic data. Additionally, we provide a foundation for future research to further optimize machine learning approaches in genomics. This study uses a genetic dataset as a case study, but our findings are applicable to imbalanced data in other fields.
Annals of Data Science, volume 12, issue 5, pages 1553-1575.
Citations: 0
Non-negative Sparse Matrix Factorization for Soft Clustering of Territory Risk Analysis
Annals of Data Science Pub Date : 2024-08-10 DOI: 10.1007/s40745-024-00570-z
Shengkun Xie, Chong Gan, Anna T. Lawniczak
Abstract: Developing effective methodologies for territory design and relativity estimation is crucial in auto insurance rate filings and reviews. This study introduces a novel approach utilizing fuzzy clustering to enhance the design process of territories for auto insurance rate-making and regulation. By adopting a soft clustering method, we aim to overcome the limitations of traditional hard clustering techniques and improve the assessment of territory risk. Furthermore, we employ non-negative sparse matrix approximation techniques to refine the estimates of risk relativities for basic rating units. This method promotes sparsity in the fuzzy membership matrix by eliminating small membership values, leading to more robust and interpretable results. We also compare the outcomes with those obtained using non-negative sparse principal component analysis, a technique explored in our previous research. Integrating fuzzy clustering with non-negative sparse matrix decomposition offers a promising approach for auto insurance rate filings. The combined methodology enhances decision-making and provides sparse estimates, which can be advantageous in various data science applications where fuzzy clustering is relevant.
Annals of Data Science, volume 12, issue 1, pages 307-340.
Citations: 0
The Effect of Company Size, Profitability, Leverage, Media Exposure, and Liquidity on Carbon Emissions Disclosure
Annals of Data Science Pub Date : 2024-08-07 DOI: 10.1007/s40745-024-00564-x
Eva Yulianti, Stephanus Remond Waworuntu
Abstract: Carbon emissions disclosure (CED) has become a pivotal aspect of corporate sustainability efforts, reflecting a company's commitment to environmental responsibility and accountability. This study examines the complex connection between CED and corporate attributes. It aims to assess carbon emission (CE) transparency within organizations and to support environmental sustainability. Data were collected from 420 professionals from diverse industry sectors (IS) and analyzed with the SPSS program. Board diversity (BD) and IS influence carbon disclosure (CD) behaviors and elucidate the symbiotic relationship between financial performance (FP) and sustainability commitments. The results show that strong FP and governance structures (GS) are positively correlated with transparent CED. The results also indicate that companies engage in more transparent CED to maintain stakeholder trust and attract socially responsible investors. The study's novelty lies in promoting transparency and accountability on CED within corporate governance (CG) principles, as well as transformative corporate environmental sustainability practices in a dynamic business landscape. The study provides valuable insights for companies exhibiting higher levels of CED and operating with an environmental conscience. It further offers actionable insights for policymakers, practitioners, and investors seeking to navigate the evolving landscape of corporate sustainability. GSs are fostered to drive positive environmental outcomes and to promote ecological performance and stakeholder trust. The study demonstrates that diverse companies across industries disclose CE practices under industry-specific pressures and regulatory expectations.
Annals of Data Science, volume 12, issue 4, pages 1285-1313.
Citations: 0
Partial Label Learning with Noisy Labels
Annals of Data Science Pub Date : 2024-07-31 DOI: 10.1007/s40745-024-00552-1
Pan Zhao, Long Tang, Zhigeng Pan
Abstract: Partial label learning (PLL) is a particular problem setting within weakly supervised learning. In PLL, each sample corresponds to a candidate label set in which only one label is true. However, in some practical application scenarios, label noise can cause some candidate sets to lose their true labels, leading to a decline in model performance. In this work, a robust training strategy for PLL, derived from joint training with co-regularization (JoCoR), is proposed to address this issue. Specifically, the proposed approach constructs two separate PLL models and a joint loss. The joint loss consists of not only two PLL losses but also a co-regularization term measuring the disagreement of the two models. By automatically selecting samples with small joint loss and using them to update the two models, the proposed approach is able to filter out more and more suspected samples with noisy candidate label sets. Gradually, the robustness of the PLL models to label noise strengthens due to the reduced disagreement of the two models. Experiments are conducted on two state-of-the-art PLL models using benchmark datasets under various noise levels. The results show that the proposed method can effectively stabilize the training process and reduce the model's overfitting to noisy candidate label sets.
Annals of Data Science, volume 12, issue 1, pages 199-212.
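The selection step described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the PLL loss used here (negative log of the probability mass on the candidate set), the symmetric KL divergence as the co-regularization term, the weight `lam`, and every function name are hypothetical choices, since the abstract does not specify them.

```python
import math


def candidate_loss(probs, candidates):
    """Assumed PLL loss: negative log of the probability mass on the candidate set."""
    mass = sum(probs[k] for k in candidates)
    return -math.log(max(mass, 1e-12))


def symmetric_kl(p, q, eps=1e-12):
    """Disagreement between two predictive distributions: KL(p||q) + KL(q||p)."""
    kl_pq = sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))
    kl_qp = sum(b * math.log((b + eps) / (a + eps)) for a, b in zip(p, q))
    return kl_pq + kl_qp


def joint_loss(p1, p2, candidates, lam=0.5):
    """Two PLL losses plus a co-regularization term, weighted by lam (assumed form)."""
    supervised = candidate_loss(p1, candidates) + candidate_loss(p2, candidates)
    return (1.0 - lam) * supervised + lam * symmetric_kl(p1, p2)


def select_small_loss(batch, keep_ratio, lam=0.5):
    """Return the indices of the keep_ratio fraction of samples with the smallest joint loss.

    batch is a list of (p1, p2, candidates) triples, where p1 and p2 are the
    class-probability vectors predicted by the two models for one sample.
    """
    losses = [joint_loss(p1, p2, cand, lam) for p1, p2, cand in batch]
    n_keep = max(1, int(round(keep_ratio * len(batch))))
    order = sorted(range(len(batch)), key=lambda i: losses[i])
    return sorted(order[:n_keep])


if __name__ == "__main__":
    batch = [
        ([0.8, 0.1, 0.1], [0.7, 0.2, 0.1], {0, 1}),      # models agree, mass on candidates
        ([0.1, 0.1, 0.8], [0.2, 0.1, 0.7], {2}),         # models agree, mass on candidates
        ([0.9, 0.05, 0.05], [0.05, 0.9, 0.05], {2}),     # models disagree, candidates unlikely
    ]
    print(select_small_loss(batch, keep_ratio=0.67))
```

In a training loop, only the selected samples would contribute to the parameter update of both models; how the keep ratio is scheduled over epochs is not stated in the abstract.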
Citations: 0
Kernel Method for Estimating Matusita Overlapping Coefficient Using Numerical Approximations
Annals of Data Science Pub Date : 2024-07-27 DOI: 10.1007/s40745-024-00563-y
Omar M. Eidous, Enas A. Ananbeh
Abstract: In this paper, a nonparametric kernel method is introduced to estimate the well-known Matusita overlapping coefficient \(\rho(X,Y)\) between two random variables \(X\) and \(Y\). Because a closed-form expression for this coefficient is difficult to obtain when kernel estimators are used, we suggest using numerical integration to approximate its integral as a first step. The kernel estimators are then combined with the new approximation to formulate the proposed estimators. Two numerical integration rules, the trapezoidal rule and Simpson's rule, are used to approximate the integral of interest. The proposed technique produces two new estimators for \(\rho(X,Y)\). The resulting estimators are studied and compared with the existing estimator developed by Eidous and Al-Talafheh (Commun Stat Simul Comput 51(9):5139-5156, 2022. https://doi.org/10.1080/03610918.2020.1757711) via Monte Carlo simulation. The simulation results demonstrate the usefulness and effectiveness of the new technique for estimating \(\rho(X,Y)\).
Annals of Data Science, volume 12, issue 4, pages 1265-1283.
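A minimal sketch of the general idea follows, using the standard definition of the Matusita coefficient as the integral of sqrt(f(x) g(x)). It plugs Gaussian kernel density estimates into composite trapezoidal and Simpson rules. The Gaussian kernel, the rule-of-thumb bandwidth, the grid padding, and all function names are assumptions made for illustration; the paper's actual estimators may differ in each of these choices.

```python
import math
import statistics


def gaussian_kde(sample, h):
    """Return a Gaussian kernel density estimate with bandwidth h as a callable."""
    norm = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))

    def density(t):
        return norm * sum(math.exp(-0.5 * ((t - s) / h) ** 2) for s in sample)

    return density


def rule_of_thumb_bandwidth(sample):
    """Rule-of-thumb bandwidth 1.06 * sd * n^(-1/5) (an assumed choice)."""
    return 1.06 * statistics.stdev(sample) * len(sample) ** (-0.2)


def trapezoid(values, step):
    """Composite trapezoidal rule on equally spaced function values."""
    return step * (0.5 * values[0] + sum(values[1:-1]) + 0.5 * values[-1])


def simpson(values, step):
    """Composite Simpson rule; needs an odd number of equally spaced values."""
    if len(values) < 3 or len(values) % 2 == 0:
        raise ValueError("Simpson's rule needs an odd number of points (>= 3)")
    odd = sum(values[1:-1:2])   # interior points with odd index, weight 4
    even = sum(values[2:-1:2])  # interior points with even index, weight 2
    return step / 3.0 * (values[0] + values[-1] + 4.0 * odd + 2.0 * even)


def matusita(x, y, rule=simpson, intervals=400):
    """Approximate rho = integral of sqrt(f * g) with kernel estimates of f and g."""
    hx, hy = rule_of_thumb_bandwidth(x), rule_of_thumb_bandwidth(y)
    f, g = gaussian_kde(x, hx), gaussian_kde(y, hy)
    pad = 5.0 * max(hx, hy)  # extend the grid so kernel tails are covered
    lo = min(min(x), min(y)) - pad
    hi = max(max(x), max(y)) + pad
    step = (hi - lo) / intervals
    values = [math.sqrt(f(lo + i * step) * g(lo + i * step)) for i in range(intervals + 1)]
    return rule(values, step)


if __name__ == "__main__":
    x = [0.1, 0.5, 0.9, 1.3, 2.0, 2.4]
    y = [0.8, 1.1, 1.9, 2.2, 2.9, 3.5]
    print("Simpson:  ", matusita(x, y))
    print("Trapezoid:", matusita(x, y, rule=trapezoid))
```

Passing the same sample twice gives a value close to 1, and well-separated samples give a value close to 0, which matches the interpretation of the coefficient as a measure of overlap.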
Citations: 0
Maximum Likelihood Estimation for Generalized Inflated Power Series Distributions
Annals of Data Science Pub Date : 2024-07-23 DOI: 10.1007/s40745-024-00560-1
Robert L. Paige
Abstract: In this paper we first define the class of Generalized Inflated Power Series Distributions (GIPSDs), which contains the inflated discrete distributions most often seen in practice as special cases. We describe the hitherto unknown exponential family structure of GIPSDs and use it to derive closed-form, easy-to-program conditional and unconditional maximum likelihood estimators for essentially any number of parameters. We also show how the GIPSD exponential family can be extended to model deflated mass points. Our results provide easy access to likelihood-based inference and automated model selection procedures for GIPSDs that only involve one-dimensional numerical root-finding problems, which are easily solved with simple routines. We consider four real-data examples that illustrate the utility and scope of our results.
Annals of Data Science, volume 12, issue 4, pages 1189-1209.
Citations: 0