Annals of Data Science: Latest Articles

Identifying the Intents Behind Website Visits by Employing Unsupervised Machine Learning Models
Annals of Data Science Pub Date : 2025-01-09 DOI: 10.1007/s40745-024-00586-5
Judah Soobramoney, Retius Chifurira, Temesgen Zewotir, Knowledge Chinhamu
Abstract: With digitisation globally on the rise, corporates are compelled to better understand the usage of their websites. In doing so, corporates will be empowered to better understand consumers and make the adjustments needed to improve the corporate's stance in the competitive global landscape of this modern age. However, online website visit data has proven to be highly complex, large in volume, and highly transactional, with users expressing unique behaviours. Thus, extracting insight can be a complex problem to solve. This study aimed to employ unsupervised machine learning models to identify the intentions behind the visits on the observed website. The data studied was sourced from the Google Analytics tracking tool that was deployed on a corporate informative website. The study employed k-means, hierarchical, and DBSCAN unsupervised machine learning models to understand the intents behind visits to the studied website. All three models detected five major intents that were expressed within the observed data. The intents identified were labelled as "accidentals", "drop-offs", "engrossed", "get-in-touch" and "seekers". On the observed data, all three unsupervised machine learning methods performed well. However, in the context of the study, which investigated the intents that drove online visits, the hierarchical clustering method yielded superior results by maintaining the best balance between cluster homogeneity (stronger silhouette coefficients) and cluster size.
Annals of Data Science, vol. 12, no. 1, pp. 413-437. PDF: https://link.springer.com/content/pdf/10.1007/s40745-024-00586-5.pdf
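The abstract compares clusterings by their silhouette coefficients. As a minimal sketch of that metric (not the authors' code; the function name `silhouette_mean` and the toy points are assumptions made for illustration), the mean silhouette can be computed in plain Python as follows. For each point, `a` is the mean distance to the other members of its own cluster and `b` is the smallest mean distance to any other cluster; the score is `(b - a) / max(a, b)`.

```python
import math

def silhouette_mean(points, labels):
    """Mean silhouette coefficient; assumes at least two distinct labels."""
    def dist(p, q):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

    # Group point indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)

# Hypothetical toy data: two well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_mean(points, [0, 0, 1, 1])  # groups nearby points together
bad = silhouette_mean(points, [0, 1, 0, 1])   # mixes the two groups
```

A labelling that keeps nearby points together scores close to 1, while one that mixes distant points scores below 0, which is the sense in which a higher silhouette indicates more homogeneous clusters.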
Citations: 0
A Novel Finite Mixture Model Based on the Generalized t Distributions with Two-Sided Censored Data
Annals of Data Science Pub Date : 2024-09-25 DOI: 10.1007/s40745-024-00572-x
Ruijie Guan, Yaohua Rong, Weihu Cheng, Zhenyu Xin
Abstract: In light of the rapid technological advancements witnessed in recent decades, numerous disciplines have been inundated with voluminous datasets characterized by multimodality, heavy-tailed distributions, and prevalent missing information. Consequently, the task of effectively modeling such intricate data poses a formidable yet indispensable challenge. This paper endeavors to address this challenge by introducing a novel finite mixture model predicated upon the generalized t distribution, tailored specifically to accommodate two-sided censored observations, thereby establishing a foundational framework for modeling this complex data structure. To facilitate parameter estimation within this model, we devise a variant of the EM-type algorithm, amalgamating the profile likelihood approach with the classical Expectation Conditional Maximization algorithm. Notably, this hybridized methodology affords analytical expressions in the E-step and a tractable M-step, thereby substantially enhancing computational expediency and efficiency. Furthermore, we furnish closed-form expressions delineating the observed information matrix, pivotal for approximating the asymptotic covariance matrix of the MLEs within this mixture model. To empirically evaluate the efficacy of the proposed algorithm, a series of simulation studies are conducted, demonstrating promising performance across various artificial datasets. Additionally, the practical applicability of the proposed methodology is elucidated through its deployment on two real-world datasets, thereby underscoring its feasibility and utility in practical settings.
Annals of Data Science, vol. 12, no. 1, pp. 341-379.
Citations: 0
Gated Graph Attention-based Crossover Snake (GGA-CS) Algorithm for Hyperspectral Image Classification
Annals of Data Science Pub Date : 2024-08-20 DOI: 10.1007/s40745-024-00567-8
R. Ablin, G. Prabin
Abstract: Hyperspectral image classification involves assigning pixels or regions within a hyperspectral image to specific classes or categories based on the spectral information captured across multiple bands. Traditional methods face several challenges, such as high dimensionality, scalability, spectral variability, and limited contextual information. To address these issues, a novel Gated Graph Attention-based Crossover Snake (GGA-CS) algorithm is proposed for classifying hyperspectral images. In this work, a Graph Neural Network (GNN) is employed to capture both spectral and spatial relationships between pixels, and a gated attention mechanism is utilized to enhance specific spectral bands. After the training process, a crossover-based snake optimization is applied to tune the parameters, obtain the classification output of the GNN, and adjust the pixels to enhance the performance of the GGA-CS method. The study is validated on diverse datasets, namely the Indian Pines dataset, the University of Pavia dataset, and the Salinas dataset. The evaluation of the GGA-CS method's performance includes assessing its effectiveness using key metrics. Comparisons with state-of-the-art methods are conducted to gauge its efficacy in hyperspectral image classification, as demonstrated by experimental results.
Annals of Data Science, vol. 12, no. 1, pp. 281-305.
Citations: 0
Kernel-free Reduced Quadratic Surface Support Vector Machine with 0-1 Loss Function and $L_p$-norm Regularization
Annals of Data Science Pub Date : 2024-08-19 DOI: 10.1007/s40745-024-00573-w
Mingyang Wu, Zhixia Yang
Abstract: This paper presents a novel nonlinear binary classification method, namely the kernel-free reduced quadratic surface support vector machine with 0-1 loss function and $L_p$-norm regularization ($L_p$-RQSSVM$_{0/1}$). It uses a kernel-free technique that aims to find a reduced quadratic surface to separate samples, without considering the cross terms in the quadratic form. This saves computational costs and provides better interpretability than methods using kernel functions. In addition, adding the 0-1 loss function and $L_p$-norm regularization to construct $L_p$-RQSSVM$_{0/1}$ enables sample sparsity and feature sparsity. The support vector (SV) of $L_p$-RQSSVM$_{0/1}$ is defined, and it is derived that all SVs fall on the support hypersurfaces. Moreover, the optimality condition is explored theoretically, and a new iterative algorithm based on the alternating direction method of multipliers (ADMM) framework is used to solve $L_p$-RQSSVM$_{0/1}$ on the selected working set. The computational complexity and convergence of the algorithm are discussed. Furthermore, numerical experiments demonstrate that $L_p$-RQSSVM$_{0/1}$ achieves better classification accuracy, fewer SVs, and higher computational efficiency than other methods on most datasets. It also has feature sparsity under certain conditions.
Annals of Data Science, vol. 12, no. 1, pp. 381-412.
Citations: 0
Non-negative Sparse Matrix Factorization for Soft Clustering of Territory Risk Analysis
Annals of Data Science Pub Date : 2024-08-10 DOI: 10.1007/s40745-024-00570-z
Shengkun Xie, Chong Gan, Anna T. Lawniczak
Abstract: Developing effective methodologies for territory design and relativity estimation is crucial in auto insurance rate filings and reviews. This study introduces a novel approach utilizing fuzzy clustering to enhance the design process of territories for auto insurance rate-making and regulation. By adopting a soft clustering method, we aim to overcome the limitations of traditional hard clustering techniques and improve the assessment of territory risk. Furthermore, we employ non-negative sparse matrix approximation techniques to refine the estimates of risk relativities for basic rating units. This method promotes sparsity in the fuzzy membership matrix by eliminating small membership values, leading to more robust and interpretable results. We also compare the outcomes with those obtained using non-negative sparse principal component analysis, a technique explored in our previous research. Integrating fuzzy clustering with non-negative sparse matrix decomposition offers a promising approach for auto insurance rate filings. The combined methodology enhances decision-making and provides sparse estimates, which can be advantageous in various data science applications where fuzzy clustering is relevant.
Annals of Data Science, vol. 12, no. 1, pp. 307-340.
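The abstract says sparsity is promoted by eliminating small values in the fuzzy membership matrix. The sketch below only illustrates that effect with a simple threshold. It is not the paper's non-negative sparse matrix approximation, and the function name `sparsify_memberships`, the threshold value, and the renormalisation step are all assumptions.

```python
def sparsify_memberships(U, threshold=0.1):
    """Zero out memberships below `threshold`, then rescale each row to sum to 1.

    U is a list of rows; each row holds one unit's non-negative memberships
    across the clusters. Thresholding plus renormalisation is an illustrative
    stand-in, not the method used in the paper.
    """
    result = []
    for row in U:
        kept = [u if u >= threshold else 0.0 for u in row]
        total = sum(kept)
        if total == 0.0:
            # Every entry fell below the threshold: keep only the largest one.
            best = max(range(len(row)), key=row.__getitem__)
            kept = [1.0 if k == best else 0.0 for k in range(len(row))]
            total = 1.0
        result.append([u / total for u in kept])
    return result

# Hypothetical memberships of two rating units across three territories.
U = [[0.85, 0.10, 0.05],
     [0.48, 0.47, 0.05]]
sparse_U = sparsify_memberships(U, threshold=0.2)
```

With a threshold of 0.2 the first unit collapses to a single territory, while the second keeps its two dominant memberships, rescaled to sum to 1.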
Citations: 0
Partial Label Learning with Noisy Labels
Annals of Data Science Pub Date : 2024-07-31 DOI: 10.1007/s40745-024-00552-1
Pan Zhao, Long Tang, Zhigeng Pan
Abstract: Partial label learning (PLL) is a particular problem setting within weakly supervised learning. In PLL, each sample corresponds to a candidate label set in which only one label is true. However, in some practical application scenarios, the emergence of label noise can make some candidate sets lose their true labels, leading to a decline in model performance. In this work, a robust training strategy for PLL, derived from joint training with co-regularization (JoCoR), is proposed to address this issue. Specifically, the proposed approach constructs two separate PLL models and a joint loss. The joint loss consists of not only two PLL losses but also a co-regularization term measuring the disagreement of the two models. By automatically selecting samples with small joint loss and using them to update the two models, the proposed approach is able to filter out progressively more suspected samples with noisy candidate label sets. Gradually, the robustness of the PLL models to label noise strengthens due to the reduced disagreement of the two models. Experiments are conducted on two state-of-the-art PLL models using benchmark datasets under various noise levels. The results show that the proposed method can effectively stabilize the training process and reduce the model's overfitting to noisy candidate label sets.
Annals of Data Science, vol. 12, no. 1, pp. 199-212.
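The joint loss and small-loss selection described in the abstract can be sketched as follows. This is a sketch under stated assumptions rather than the paper's implementation: the PLL loss here is a simple stand-in (negative log of the probability mass on the candidate set), the disagreement term is a symmetric KL divergence, and the weight `lam`, the `keep_ratio` parameter, and all function names are hypothetical.

```python
import math

def candidate_loss(probs, candidates):
    # Stand-in PLL loss (assumption): negative log of the probability mass
    # that the model places on the candidate label set.
    mass = sum(probs[k] for k in candidates)
    return -math.log(max(mass, 1e-12))

def sym_kl(p, q, eps=1e-12):
    # Symmetric KL divergence, used here as the disagreement measure.
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) + qi * math.log((qi + eps) / (pi + eps))
        for pi, qi in zip(p, q)
    )

def joint_loss(p1, p2, candidates, lam=0.5):
    # Two PLL losses plus a co-regularization term on model disagreement.
    supervised = candidate_loss(p1, candidates) + candidate_loss(p2, candidates)
    return (1 - lam) * supervised + lam * sym_kl(p1, p2)

def select_small_loss(batch, keep_ratio, lam=0.5):
    # Keep the fraction of samples with the smallest joint loss; only these
    # would be used to update the two models.
    losses = [joint_loss(p1, p2, c, lam) for p1, p2, c in batch]
    n_keep = max(1, int(keep_ratio * len(batch)))
    order = sorted(range(len(batch)), key=losses.__getitem__)
    return sorted(order[:n_keep])

# Hypothetical batch: (model-1 probabilities, model-2 probabilities, candidate set).
batch = [
    ([0.90, 0.05, 0.05], [0.85, 0.10, 0.05], {0, 1}),  # models agree, mass on candidates
    ([0.10, 0.10, 0.80], [0.70, 0.20, 0.10], {0, 1}),  # models disagree strongly
    ([0.20, 0.70, 0.10], [0.25, 0.65, 0.10], {1}),     # models agree, mass on candidate
    ([0.05, 0.05, 0.90], [0.05, 0.10, 0.85], {0, 1}),  # both models reject the candidates
]
kept = select_small_loss(batch, keep_ratio=0.5)
```

Samples on which the two models disagree, or on which both place little probability on the candidate set, receive a large joint loss and are left out of the update, which is how suspected noisy candidate sets get filtered.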
Citations: 0
Kernel Method for Estimating Matusita Overlapping Coefficient Using Numerical Approximations
Annals of Data Science Pub Date : 2024-07-27 DOI: 10.1007/s40745-024-00563-y
Omar M. Eidous, Enas A. Ananbeh
Citations: 0
Maximum Likelihood Estimation for Generalized Inflated Power Series Distributions
Annals of Data Science Pub Date : 2024-07-23 DOI: 10.1007/s40745-024-00560-1
Robert L. Paige
Citations: 0
Farm-Level Smart Crop Recommendation Framework Using Machine Learning
Annals of Data Science Pub Date : 2024-07-20 DOI: 10.1007/s40745-024-00534-3
Amit Bhola, Prabhat Kumar
Abstract: Agriculture is the primary source of food, fuel, and raw materials and is vital to any country's economy. Farmers, the backbone of agriculture, primarily rely on instinct to determine what crops to plant in any given season. They are comfortable following customary farming practices and standards and are oblivious to the fact that crop yield is highly dependent on current environmental and soil conditions. Crop recommendations involve multifaceted factors such as weather, soil quality, crop production, market demand, and prices, making it crucial for farmers to make well-informed decisions. An improper or imprudent crop recommendation can affect them, their families, and the entire agricultural sector. Modern technologies like artificial intelligence, machine learning, and data science have emerged as efficient solutions to combat issues like declining crop production and lower profits. This research proposes a Smart Crop Recommendation framework that leverages machine learning to empower farmers to make informed decisions about optimal crop selection. The framework consists of two phases: crop filtration and yield prediction. Crops are filtered in the first phase using an artificial neural network based on local input parameters. The second phase estimates yield for filtered crops, considering the season, farm area, and location data. The final recommendation provides farmers with crops aimed at maximizing profit. Experiments demonstrate an accuracy of 99.10% for the artificial neural network and an $R^2$ of 0.99 for the random forest. The uniqueness of this framework lies in its distinctive focus on the farm level and its consideration of the challenges and various agricultural features that change over time. The experimental results affirm the effectiveness of the framework, and its lightweight nature enhances its practicality, making it an efficient real-time recommendation solution.
Annals of Data Science, vol. 12, no. 1, pp. 117-140.
Citations: 0
Reaction Function for Financial Market Reacting to Events or Information
Annals of Data Science Pub Date : 2024-07-17 DOI: 10.1007/s40745-024-00565-w
Bo Li, Guangle Du
Abstract: Observations indicate that the distributions of stock returns in financial markets usually do not conform to normal distributions, but rather exhibit characteristics of high peaks, fat tails and biases. In this work, we assume that the effects of events or information on prices obey a normal distribution, while financial markets often overreact or underreact to events or information, resulting in non-normal distributions of stock returns. Based on these assumptions, we propose, for the first time, a reaction function for a financial market reacting to events or information, and a model based on it to describe the distribution of real stock returns. Our analysis of the returns of the China Securities Index 300 (CSI 300), the Standard & Poor's 500 Index (SPX or S&P 500) and the Nikkei 225 Index (N225) at different time scales shows that financial markets often underreact to events or information with minor impacts, overreact to events or information with relatively significant impacts, and react slightly more strongly to positive events or information than to negative ones. In addition, differences in financial markets and time scales of returns can also affect the shapes of the reaction functions.
Annals of Data Science, vol. 11, no. 4, pp. 1265-1290.
Citations: 0