Big Data Research: Latest Articles

Attentive Implicit Relation Embedding for Event Recommendation in Event-Based Social Network
IF 3.3 Zone 3 Computer Science
Big Data Research Pub Date : 2024-02-05 DOI: 10.1016/j.bdr.2024.100426
Yuan Liang
Volume 36, Article 100426
Abstract: The event-based social network (EBSN) is a new type of social network that combines online and offline networks, and its primary goal is to recommend appropriate events to users. Most studies do not model event recommendation on the EBSN platform as graph representation learning, nor do they consider the implicit relationships between events, resulting in recommendations that users do not accept. Thus, we study graph representation learning, which integrates implicit relationships between social networks and events. First, we propose an algorithm that integrates implicit relationships between social networks and events based on a multiple attention model. The graph structure that integrates implicit relationships between social networks and events is divided into user modeling and event modeling: user modeling covers the interaction information between users and events, user social relationships, and implicit relationships between users; event modeling covers user information and implicit relationships between events; both deeply mine high-level transfer relationships between users and events. Then, the user and event models are fused using a multi-attention joint learning mechanism to capture the different impacts of social and implicit relationships on user preferences, improving the recommendation quality of the recommendation system. Finally, the effectiveness of the proposed algorithm is verified on real datasets.
Citations: 0
Chlorophyll-a concentration variations in Bohai sea: Impacts of environmental complexity and human activities based on remote sensing technologies
IF 3.3 Zone 3 Computer Science
Big Data Research Pub Date : 2024-02-03 DOI: 10.1016/j.bdr.2024.100440
Yong Du, Xiaoyu Zhang, Shuchang Ma, Nan Yao
Volume 36, Article 100440
Abstract: This study extensively explores the intricate dynamics of the Bohai Sea ecosystem, a semi-closed marginal sea in China, influenced by both environmental complexity and human activities. By utilizing chlorophyll-a as an indicator, we closely examine how phytoplankton responds to coastal environmental conditions and stressors. The temporal analysis conducted over the 23-year period from 1998 to 2020 reveals a distinctive "bell-shaped" variation in chlorophyll-a concentration. Spatially, a declining trend is observed from coastal to central regions, characterized by widespread low-value areas. Employing M-K and slope trend analyses, we observe a 42.13% decline in the northern Bohai Sea, contrasting with a significant 57.87% increase in the central and southern regions. The innovative aspects of this research lie in identifying the complex interplay between chlorophyll-a concentration, human pollution controls, and nutrient inputs. Factors contributing to chlorophyll-a concentration, ranked by significance, include sea surface temperature, photosynthetically available radiation (PAR), and wind speed. Remarkably, the negligible impact of the "2015 Tianjin explosion" underscores the robustness of the Bohai Sea's chlorophyll-a dynamics. Furthermore, the positive correlation between phosphorus input and chlorophyll classifies Bohai Bay as a phosphorus-limited aquatic ecosystem. In conclusion, this study provides crucial insights for the preservation of the Bohai Sea ecosystem, emphasizing the necessity for ongoing monitoring and management strategies in the face of evolving environmental and anthropogenic influences.
Citations: 0
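The abstract above mentions "M-K and slope trend analyses" without giving details. As a rough illustration only, the following sketch assumes the standard Mann-Kendall test (without tie correction) and takes the "slope" analysis to be Sen's slope; both are assumptions, and the function names and the sample series are hypothetical, not taken from the paper.

```python
import math


def mann_kendall(series):
    """Return (S, Z) for the Mann-Kendall trend test (no tie correction)."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each later-minus-earlier pair
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z


def sen_slope(series):
    """Median of all pairwise slopes, assuming unit spacing between samples."""
    slopes = sorted(
        (series[j] - series[i]) / (j - i)
        for i in range(len(series) - 1)
        for j in range(i + 1, len(series))
    )
    mid = len(slopes) // 2
    if len(slopes) % 2:
        return slopes[mid]
    return 0.5 * (slopes[mid - 1] + slopes[mid])


# Hypothetical yearly values for one pixel (not data from the paper).
values = [1.0, 2.0, 3.0, 4.0, 5.0]
print(mann_kendall(values), sen_slope(values))
```

A positive Z with a positive Sen's slope would indicate an increasing trend at that location; applying the pair of functions per pixel is one way the regional percentages of increase and decline could be tallied.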
Tropical cyclone trajectory based on satellite remote sensing prediction and time attention mechanism ConvLSTM model
IF 3.3 Zone 3 Computer Science
Big Data Research Pub Date : 2024-02-03 DOI: 10.1016/j.bdr.2024.100439
Tongfei Li, Mingzheng Lai, Shixian Nie, Haifeng Liu, Zhiyao Liang, Wei Lv
Volume 36, Article 100439
Abstract: The accurate and timely prediction of tropical cyclones is of paramount importance in mitigating the impact of these catastrophic meteorological events. Presently, methods for predicting tropical cyclones based on satellite remote sensing images encounter notable challenges, including the inadequate extraction of three-dimensional spatial features and limitations in long-term forecasting. As a response to these challenges, this study introduces the Temporal Attention Mechanism ConvLSTM (TAM-CL) model, designed to conduct thorough spatiotemporal feature extraction on three-dimensional atmospheric reanalysis data of tropical cyclones. By leveraging ConvLSTM with three-dimensional convolution kernels, our model enhances the extraction of three-dimensional spatiotemporal features. Furthermore, an attention mechanism is integrated to bolster long-term prediction accuracy by emphasizing crucial temporal nodes. In the evaluation of tropical cyclone track and intensity forecasts across 24, 48, and 72 h, TAM-CL demonstrates a notable reduction in prediction errors, thereby underscoring its efficacy in forecasting both cyclone tracks and intensities. This contributes to an effective exploration of the application of deep networks in conjunction with atmospheric reanalysis data.
Citations: 0
Graph Spatial-Temporal Transformer Network for Traffic Prediction
IF 3.3 Zone 3 Computer Science
Big Data Research Pub Date : 2024-01-26 DOI: 10.1016/j.bdr.2024.100427
Zhenzhen Zhao, Guojiang Shen, Lei Wang, Xiangjie Kong
Volume 36, Article 100427
Abstract: Traffic information can reflect the operating status of a city, and accurate traffic forecasting is critical in intelligent transportation systems (ITS) and urban planning. However, traffic information has complex nonlinearity and dynamic spatial-temporal dependencies due to human mobility, bringing new traffic forecasting challenges. This paper proposes a graph spatial-temporal transformer network for traffic prediction (GSTTN) to cope with the above problems. Specifically, the proposed framework explores spatial characteristics of the across-road network of traffic information hidden in human behavior patterns via a multi-view graph convolutional network (GCN). Furthermore, the transformer network with a multi-head attention mechanism is adopted to capture the random disturbance in the time series characteristics of traffic information. As a result, these two components can be used to model spatial relations and temporal trends. Finally, we examine real-world datasets, and the experiments show that the proposed framework outperforms the current state-of-the-art baselines.
Citations: 0
Airspace situation analysis of terminal area traffic flow prediction based on big data and machine learning methods
IF 3.3 Zone 3 Computer Science
Big Data Research Pub Date : 2024-01-18 DOI: 10.1016/j.bdr.2024.100425
Yandong Li, Bo Jiang, Weilong Liu, Chenglong Li, Yunfan Zhou
Volume 35, Article 100425
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S2214579624000017/pdfft?md5=399453e55e15e7b2fc74c8ad5fce66dc&pid=1-s2.0-S2214579624000017-main.pdf
Abstract: Real-time and accurate prediction of terminal area arrival traffic flow is a key issue for terminal area traffic management. In this paper, we first study the advantages and disadvantages of traditional dynamics-based prediction methods and time-series-based prediction methods. Taking the advantages of the two types of methods, a terminal area arrival flow prediction framework based on airspace situation is proposed. In our method, the airspace situation is used as the machine learning feature to estimate the number of arrival aircraft. In addition, a correction stage, also based on a machine learning approach, is added to the algorithm to improve the accuracy of the prediction. ADS-B data collected from the terminal area of Chengdu is used to study the prediction accuracy based on different machine learning algorithms in the proposed framework. Experimental results show that the proposed method can predict the air traffic flow accurately. The average absolute error is only 0.35 aircraft/15 min, the root mean square error is 0.67 aircraft/15 min, and the maximum absolute error is 2 aircraft/15 min. Compared with the AOL method, our proposed method improves the accuracy of prediction by a margin of 90% and 60% according to the evaluation metrics of MAE and MAXAE, respectively.
Citations: 0
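The three error measures quoted in the abstract above (average absolute error, root mean square error, maximum absolute error) follow their usual definitions. A minimal sketch, with hypothetical function names and made-up counts per 15-minute window rather than the paper's ADS-B data:

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def max_ae(actual, predicted):
    """Maximum absolute error."""
    return max(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical arrival counts per 15-minute window (not data from the paper).
actual = [3, 5, 2, 4]
predicted = [3, 4, 2, 6]
print(mae(actual, predicted), rmse(actual, predicted), max_ae(actual, predicted))
```

All three values carry the unit of the predicted quantity, here aircraft per 15 minutes, which is why the abstract reports them in that unit.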
The Predictability of Stock Price: Empirical Study on Tick Data in Chinese Stock Market
IF 3.3 Zone 3 Computer Science
Big Data Research Pub Date : 2023-11-17 DOI: 10.1016/j.bdr.2023.100414
Yueshan Chen, Xingyu Xu, Tian Lan, Sihai Zhang
Volume 35, Article 100414
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S2214579623000473/pdfft?md5=df49b0edd2f0330b446f4870f4a82ce5&pid=1-s2.0-S2214579623000473-main.pdf
Abstract: Whether or not stocks are predictable has been a topic of concern for decades. The efficient market hypothesis (EMH) says that it is difficult for investors to make extra profits by predicting stock prices, but this may not be true, especially for the Chinese stock market. Therefore, we explore the predictability of the Chinese stock market based on tick data, a widely studied type of high-frequency data. We obtain the predictability of 3,834 Chinese stocks by adopting the concept of true entropy, which is calculated by the Lempel-Ziv data compression method. The Markov chain model and the diffusion kernel model are used to compare the upper bounds on predictability, and it is concluded that there is still a significant performance gap between the forecasting models used and the theoretical upper bounds. Our work shows that more than 73% of stocks have prediction accuracy greater than 70% and RMSE less than 2 CNY under different quantification intervals with different models. We further use Spearman's correlation to reveal that the average stock price and price volatility may have a negative impact on prediction accuracy, which may be helpful for stock investors.
Citations: 0
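The abstract above refers to an entropy computed with a Lempel-Ziv method and to upper bounds on predictability, but does not spell out the formulas. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes the commonly used Lempel-Ziv match-length estimator, S ≈ n·log2(n) / ΣΛ_i, where Λ_i is the length of the shortest substring starting at position i that has not appeared before position i, and it assumes the bound is obtained from Fano's inequality. The function names, the end-of-sequence convention, and the symbol strings are hypothetical.

```python
import math
import random


def lz_entropy(symbols):
    """Lempel-Ziv match-length entropy estimate (bits per symbol).

    `symbols` is a string in which each character is one quantized price state.
    """
    n = len(symbols)
    total = 0
    for i in range(n):
        k = 1
        # Grow the substring until it no longer occurs in the prefix symbols[:i].
        # Convention assumed here: if the end is reached first, k stops at n - i + 1.
        while i + k <= n and symbols[i:i + k] in symbols[:i]:
            k += 1
        total += k
    return n * math.log2(n) / total


def max_predictability(entropy, n_symbols, tol=1e-9):
    """Solve Fano's relation S = H(p) + (1 - p) * log2(N - 1) for p by bisection."""
    if n_symbols < 2:
        return 1.0

    def gap(p):
        h = 0.0
        for q in (p, 1.0 - p):
            if q > 0.0:
                h -= q * math.log2(q)
        return h + (1.0 - p) * math.log2(n_symbols - 1) - entropy

    lo, hi = 1.0 / n_symbols, 1.0
    if gap(lo) <= 0.0:  # entropy at or above log2(N): no better than chance
        return lo
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical quantized sequences (not tick data from the paper).
rng = random.Random(0)
noisy = "".join(rng.choice("abcd") for _ in range(64))
flat = "a" * 64
print(lz_entropy(flat), lz_entropy(noisy))
print(max_predictability(lz_entropy(noisy), 4))
```

A lower entropy estimate yields a higher predictability bound, which is the sense in which a forecasting model's accuracy can be compared against a theoretical ceiling.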
Cost optimization model design of fresh food cold chain system in the context of big data
IF 3.3 Zone 3 Computer Science
Big Data Research Pub Date : 2023-11-11 DOI: 10.1016/j.bdr.2023.100417
Lei Wang, Guangjun Liu, Ibrar Ahmad
Volume 35, Article 100417
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S2214579623000503/pdfft?md5=0db9cf3ef6ea7d1e1fd34d6a3e87e1ee&pid=1-s2.0-S2214579623000503-main.pdf
Abstract: The assessment of cold chain logistics for fresh products can be more precise with high-dimensional information data, providing valuable insights for the optimization of associated costs. Nonetheless, traditional data processing techniques fail to meet the processing efficiency required for such high-dimensional cold chain logistics data. Therefore, this paper proposes a spectral clustering algorithm based on the local standard deviation and optimized initial center, which comprehensively analyzes the fixed, transportation, refrigeration, and cargo damage costs of cold chain logistics. Additionally, this algorithm includes a variation operator based on clustering and introduces a large neighborhood search mechanism for optimizing the individual connectivity gene layer after selecting the gene layer site for variation. Simulation results demonstrate that the proposed algorithm exhibits better convergence in 15 iterations, reduces error rates, and significantly cuts down on the clustering process time. This ultimately leads to a reduction in the total cost of cold chain calculation.
Citations: 0
A methodology to assess and evaluate sites with high potential for stormwater harvesting in Dehradun, India
IF 3.3 Zone 3 Computer Science
Big Data Research Pub Date : 2023-11-10 DOI: 10.1016/j.bdr.2023.100415
Shray Pathak, Shreya Sharma, Abhishek Banerjee, Sanjeev Kumar
Volume 35, Article 100415
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S2214579623000485/pdfft?md5=1736971c2f1584138324cb67603cb69a&pid=1-s2.0-S2214579623000485-main.pdf
Abstract: The urgency to protect natural water resources in a sustainable manner has risen as water scarcity and global climate change continue to worsen. Among various methods of collecting water, stormwater harvesting (SWH) is regarded as the most environmentally friendly approach to alleviating the strain on freshwater resources. The study introduces a robust approach to evaluating the potential for SWH, considering both technical and socioeconomic aspects. This method effectively identifies and assesses suitable areas, referred to as hotspots, for implementing SWH. Multiple criteria are established to quickly evaluate and analyze the suitability of these sites for stormwater harvesting. Moreover, the input from water experts is incorporated into the decision-making process. Initially, potential locations are chosen, and hotspots are identified based on the concept of accumulated catchments. Subsequently, a more detailed analysis is carried out on the shortlisted sites, utilizing multiple screening criteria such as demand, inverse weighted distance, and the runoff-to-demand ratio. A standardized method is then employed to rank the sites and determine the most suitable one for stormwater harvesting. The study identifies eight locations that are appropriate for SWH, with two of them being particularly suitable locations. Further, the radius of influence is added to encompass these sites in order to pinpoint the areas conducive to fulfilling water requirements and availability. This approach empowers water planners to make well-informed decisions in a more streamlined manner. Consequently, the methodology emphasizes the benefits of these tools for water experts who are actively seeking sustainable solutions to mitigate the pressure on freshwater resources.
Citations: 0
Wetland identification through remote sensing: Insights into wetness, greenness, turbidity, temperature, and changing landscapes
IF 3.3 Zone 3 Computer Science
Big Data Research Pub Date : 2023-11-09 DOI: 10.1016/j.bdr.2023.100416
Rana Waqar Aslam, Hong Shu, Kanwal Javid, Shazia Pervaiz, Farhan Mustafa, Danish Raza, Bilal Ahmed, Abdul Quddoos, Saad Al-Ahmadi, Wesam Atef Hatamleh
Volume 35, Article 100416
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S2214579623000497/pdfft?md5=6c2fd850b51a67adc45a9dc630b4afe6&pid=1-s2.0-S2214579623000497-main.pdf
Abstract: Wetlands are important in many ways, including hydrological cycles, ecosystem diversity, climate change, and economic activity. Despite the Ramsar Convention's awareness programmes, the importance of wetlands is frequently disregarded in underdeveloped countries. The Ramsar Convention recognises 2491 wetlands worldwide, 19 of which are in Pakistan. The goal of this study is to use satellite sensor technology to identify neglected wetlands in Pakistan. The key goals of this research are to analyse water quality, monitor ecological changes, and comprehend the impact of climate change on the aforementioned wetlands. We used approaches like supervised classification and TCW to identify wetlands. To detect climate-induced changes, a change detection index was applied to QuickBird imagery. TCG and the NDTI were also employed to examine the water quality and ecological changes in these wetlands. Sentinel-2 data between 2016 and 2019 were used in the analysis. Furthermore, watershed analysis was carried out using ASTER DEM data. MODIS data was used to calculate the LST (°C) of the selected wetlands, while rainfall (mm) data was collected from ANN databases. According to the study's findings, in 2016, Borith, Phander, Upper Kachura, Satpara, and Rama Lake held 22.73%, 20.79%, 23.01%, 24.63%, and 23.03% water, respectively. In 2019, the water ratios for these lakes were 23.40%, 22.10%, 22.43%, 25.01%, and 24.56%. These findings emphasise the need for preventative actions to protect these wetlands in order to improve ecosystem dynamics in the future. As a result, it is critical that the relevant authorities implement the necessary conservation measures.
Citations: 0
ML-aVAT: A Novel 2-Stage Machine-Learning Approach for Automatic Clustering Tendency Assessment
IF 3.3 Zone 3 Computer Science
Big Data Research Pub Date : 2023-10-31 DOI: 10.1016/j.bdr.2023.100413
Harshal Mittal, Jagarlamudi Sai Laxman, Dheeraj Kumar
Volume 34, Article 100413
Abstract: Clustering tendency assessment, which aims to deduce if a dataset contains any cluster structure, and, if it does, how many clusters it has, is a critical problem in exploratory data analysis. The VAT family of algorithms provides a “visual” means to assess the clustering tendency for various datasets. The VAT algorithm operates by reordering the pairwise distance matrix of the input data. When viewed as a monochrome image, this reordered dissimilarity matrix is called a reordered dissimilarity image (RDI), showing possible data clusters by dark blocks along the diagonal. This process, however, requires human intervention to interpret an RDI. Moreover, for datasets having complex cluster structure or noise, dark blocks along the diagonal of the RDI are not easily distinguishable, making it difficult to count them accurately, and different individuals can report different numbers of dark blocks. Only a handful of approaches have been proposed in the literature to automatically (algorithmically) infer the cluster structure from a VAT-type RDI without requiring human input. However, these approaches do not perform well for several data types and have impractically high run-time. This paper proposes and develops ML-aVAT: a novel two-stage machine-learning-based approach for automatic clustering tendency assessment from VAT-type RDI. Besides estimating the number of clusters, ML-aVAT can also infer the clustering hierarchy, i.e., sub-clusters within each group, something none of the previously proposed algorithms could do. Numerical experiments performed on various synthetic and real-life labeled and unlabeled datasets prove the effectiveness of ML-aVAT in estimating clustering tendency and cluster hierarchy.
Citations: 0
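The reordering step that the abstract above attributes to VAT can be sketched as follows. This covers only the classic VAT ordering of a pairwise dissimilarity matrix (a Prim-style nearest-neighbour chaining that starts from one end of the largest dissimilarity); it does not attempt the machine-learning stages of ML-aVAT, which the abstract does not describe in detail. The function names and the toy data are hypothetical.

```python
def vat_order(dissimilarity):
    """Return the VAT ordering of object indices for a square dissimilarity matrix."""
    n = len(dissimilarity)
    # Start from an object that is one endpoint of the largest dissimilarity.
    start = max(range(n), key=lambda i: max(dissimilarity[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    # Smallest dissimilarity from each unplaced object to any placed object.
    nearest = {j: dissimilarity[start][j] for j in remaining}
    while remaining:
        nxt = min(remaining, key=lambda j: (nearest[j], j))
        order.append(nxt)
        remaining.remove(nxt)
        for j in remaining:
            if dissimilarity[nxt][j] < nearest[j]:
                nearest[j] = dissimilarity[nxt][j]
    return order


def reorder(dissimilarity, order):
    """Permute rows and columns; rendered as an image, this is the RDI."""
    return [[dissimilarity[a][b] for b in order] for a in order]


# Hypothetical 1-D points forming two interleaved groups (not data from the paper).
points = [0.0, 10.0, 0.1, 10.1, 0.2, 10.2]
matrix = [[abs(p - q) for q in points] for p in points]
order = vat_order(matrix)
print(order)
for row in reorder(matrix, order):
    print(["%.1f" % v for v in row])
```

In the reordered matrix, small values cluster in square blocks along the diagonal, which correspond to the dark blocks a human reader would otherwise have to count by eye.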