Big Data Research: Latest Articles

NoSQL data warehouse optimizing models: A comparative study of column-oriented approaches
IF 3.5, Zone 3, Computer Science
Big Data Research, Pub Date: 2025-03-20, DOI: 10.1016/j.bdr.2025.100523
Mohamed Mouhiha, Abdelfettah Mabrouk
Abstract: Building an efficient Big Data Warehouse (DW) from a traditional data warehouse that was used to handle large datasets is a great challenge. Several proposed solutions concentrate on converting a standard DW to a columnar model, especially for direct and traditional data sources. Although many successful algorithms apply data clustering methods, these approaches also come with their share of limitations. This paper provides a comprehensive review of the existing methods, both tuned and out-of-the-box, exposing their strengths and weaknesses. Further, a comparative study of the different options is conducted to assess them.
Volume 40, Article 100523
Citations: 0
Multi-dimensional feature learning for visible-infrared person re-identification
IF 3.5, Zone 3, Computer Science
Big Data Research, Pub Date: 2025-03-17, DOI: 10.1016/j.bdr.2025.100522
Zhenzhen Yang, Xinyi Wu, Yongpeng Yang
Abstract: Visible-infrared person re-identification (VI-ReID) is a challenging task due to significant differences between modalities and feature representation of visible and infrared images. The primary goal of current VI-ReID is to reduce discrepancies between modalities. However, existing research primarily focuses on learning modality-invariant features. Due to significant modality differences, it is challenging to learn an effective common feature space. Moreover, the intra-modality differences have not been well addressed. Therefore, a novel multi-dimensional feature learning network (MFLNet) is proposed in this paper to tackle the inherent challenges of intra-modality and inter-modality differences in VI-ReID. Specifically, to effectively address intra-modality variations, we employ the random local shear (RLS) augmentation, which accurately simulates viewpoint and posture changes. This augmentation can be seamlessly incorporated into other methods without modifying the network or parameters. Additionally, we integrate the multi-dimensional information mining (MIM) module to extract discriminative features and bridge the gap between modalities. Moreover, the cyclical smoothing focal (CSF) loss is introduced to prioritize challenging samples during training, thereby enhancing the ReID performance. Finally, the experimental results indicate that the proposed MFLNet outperforms other VI-ReID approaches on the SYSU-MM01, RegDB and LLCM datasets.
Volume 40, Article 100522
Citations: 0
Deep attention dynamic representation learning networks for recommender system review modeling
IF 3.5, Zone 3, Computer Science
Big Data Research, Pub Date: 2025-03-15, DOI: 10.1016/j.bdr.2025.100521
Shivangi Gheewala, Shuxiang Xu, Soonja Yeom
Abstract: Despite considerable research on utilizing deep learning technology and textual reviews in recommender systems, improving system performance is a contentious matter. This is primarily due to issues faced in learning user-item representations. One issue is the limited ability of networks to model dynamic user-item representations from reviews. Particularly, in sequence-to-sequence learning models, there is a substantial likelihood of losing semantic knowledge of previous review sequences as it is overridden by the next. Another issue lies in effectively integrating global-level and topical-level representations to extract informative content and enhance user-item representations. Existing methods struggle to maintain contextual consistency during this integration process, resulting in suboptimal representation learning, especially when attempting to capture finer details. To address these issues, we propose a novel recommendation model called Deep Attention Dynamic Representation Learning (DADRL). Specifically, we employ Latent Dirichlet Allocation and dynamic modulator-based Long Short-Term Memory to extract topical and dynamic global representations. Then, we introduce an attentional fusion methodology to integrate these representations in a contextually consistent manner and construct informative attentional user-item representations. We feed these representations into the factorization machines layer to predict the final scores. Experimental results on Amazon categories, Yelp, and LibraryThing show that our model exhibits superior performance compared to several state-of-the-art methods. We further examine the DADRL architecture under various conditions to provide insights on the model's employed components.
Volume 40, Article 100521
Citations: 0
Complex data in tourism analysis: A stochastic approach to price competition
IF 3.5, Zone 3, Computer Science
Big Data Research, Pub Date: 2025-03-13, DOI: 10.1016/j.bdr.2025.100520
Giovanni Angelini, Michele Costa, Andrea Guizzardi
Abstract: This study examines pricing strategies and decision-making processes in the hospitality industry by analyzing "ask" prices on online travel agencies (i.e., the rates at which hoteliers are willing to sell their rooms). We face the challenge of modeling a continuous flow of big data organized as "time series of time series," where daily seasonality and advance bookings intersect. Our research combines insights from tourism, quantitative methods, and big data to improve pricing strategies, contributing to both theory and practice in revenue management. Focusing on Venice, we analyze price competition as a multivariate stochastic process using a Structural Vector Autoregressive (SVAR) approach, aligning with modern dynamic pricing algorithms.
The findings show that time-based pricing strategies, which adjust based on the day of arrival and booking, are more important than room features in setting hotel prices. We also find that price changes have a non-linear and decreasing effect as the booking date approaches. These insights suggest that hotels could create more advanced pricing strategies, and policymakers should consider these factors when addressing the challenges related to overtourism.
We study the complex competitive relationships among heterogeneous service providers with an approach applicable to any market where consumption is delayed relative to purchase time. However, we highlight that the quality and accessibility of information in the tourism sector are key aspects to be considered when using big data in this industry.
Volume 40, Article 100520
Citations: 0
ImDMI: Improved Distributed M-Invariance model to achieve privacy continuous big data publishing using Apache Spark
IF 3.5, Zone 3, Computer Science
Big Data Research, Pub Date: 2025-03-07, DOI: 10.1016/j.bdr.2025.100519
Salheddine Kabou, Laid Gasmi, Abdelbaset Kabou, Sidi Mohammed Benslimane
Abstract: One of the critical challenges in big data analytics is individual privacy. Data anonymization models including k-anonymity and l-diversity are used to guarantee the tradeoff between privacy and data utility while publishing the data. However, these models focus only on the single release of datasets and produce a certain level of privacy. In practical big data applications, data publishing is more complicated: the data is published continuously as new data is collected, and privacy should be achieved across the different releases. In this research, we propose a new distributed bottom-up approach on Apache Spark to achieve the m-invariance privacy model in the continuous big data context. The proposed approach, which is the first study that deals with dynamic big data publishing, is based on the insertion and the split process. In the first process, the data records collected from different workers are inserted into an improved bottom-up R-tree generalization in order to minimize the information loss. The second process concentrates on splitting the overflowed node with respect to the m-invariance model requirement by minimizing the overlap between the resulting partitions. The experimental results show significant improvement in terms of data utility, execution time and counterfeit data records as compared to existing techniques in the literature.
Volume 40, Article 100519
Citations: 0
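As background for the ImDMI entry above: k-anonymity, one of the single-release models the abstract names, requires that every combination of quasi-identifier values be shared by at least k records. The check below is a minimal sketch of that textbook definition only; it is not the paper's m-invariance method, and the function name, attribute names and sample records are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def is_k_anonymous(records: List[Dict[str, str]],
                   quasi_identifiers: Sequence[str],
                   k: int) -> bool:
    """Return True if every quasi-identifier group holds at least k records."""
    # Group records by their tuple of quasi-identifier values.
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers)
        for record in records
    )
    # The release is k-anonymous when no group is smaller than k.
    return all(size >= k for size in groups.values())


# Hypothetical, already-generalized records (illustration only).
sample = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "cold"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
]
```

With these records, the group ("40-49", "148**") contains a single record, so the sample satisfies 1-anonymity but not 2-anonymity.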
Predicting option prices: From the Black-Scholes model to machine learning methods
IF 3.5, Zone 3, Computer Science
Big Data Research, Pub Date: 2025-02-26, DOI: 10.1016/j.bdr.2025.100518
Angela Maria D'Uggento, Marta Biancardi, Domenico Ciriello
Abstract: In the ever-changing landscape of financial markets, accurate option pricing remains critical for investors, traders and financial institutions. Traditionally, the Black-Scholes (B&S) model has been the cornerstone for option pricing, providing a solid framework based on mathematical and physical principles. Nevertheless, the B&S model has some limitations, such as the restriction to European options, the absence of dividends, constant volatility, etc. Studies and academic literature on the application of machine learning models in the financial sector are rapidly increasing. The main objective of this paper is to provide a comprehensive comparative analysis between the traditional B&S model and the most commonly used machine learning algorithms such as Artificial Neural Networks (ANNs). The rationale is twofold. First, to examine the assumptions of the B&S model, such as constant volatility and a perfectly efficient market, in light of the complexity of the real world, even though it is recognized that the model has been known as a pillar for decades. Second, to emphasize that the proliferation of big data and advances in computing power have fuelled the rise of machine learning techniques in finance. These algorithms have remarkable capabilities in discovering non-linear patterns and extracting information from large data sets, providing a compelling alternative to traditional quantitative methods. Machine learning offers a new way to capture and model such complex financial dynamics, which can lead to more accurate pricing models. By comparing the B&S model and some machine learning approaches, this paper aims to shed light on their respective strengths, weaknesses and applicability in the context of options pricing using real data. Through rigorous empirical analyses and performance metrics, our results demonstrate the importance of using machine learning techniques that can outperform or complement the established B&S model in predicting option prices by achieving higher prediction accuracy.
Volume 40, Article 100518
Citations: 0
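As a point of reference for the option-pricing entry above, the Black-Scholes baseline can be written in a few lines. This is a sketch of the textbook closed form for a European call under the assumptions the abstract lists (European exercise, no dividends, constant volatility); the function names and example inputs are illustrative and are not taken from the paper.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot: float, strike: float, rate: float,
                       sigma: float, maturity: float) -> float:
    """Price of a European call on a non-dividend-paying asset.

    spot: current asset price; strike: exercise price;
    rate: continuously compounded risk-free rate;
    sigma: constant annualized volatility; maturity: time to expiry in years.
    """
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike)
          + (rate + 0.5 * sigma ** 2) * maturity) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    # Discounted expected payoff under the risk-neutral measure.
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


# Illustrative inputs: at-the-money call, 5% rate, 20% volatility, one year.
example_price = black_scholes_call(100.0, 100.0, 0.05, 0.20, 1.0)
```

For these inputs the price is approximately 10.45. Because sigma enters as a single constant, the formula cannot reproduce strike-dependent or time-varying volatility, which is one of the limitations the abstract mentions.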
Modeling meaningful volatility events to classify monetary policy announcements
IF 3.5, Zone 3, Computer Science
Big Data Research, Pub Date: 2025-02-26, DOI: 10.1016/j.bdr.2025.100517
Giampiero M. Gallo, Demetrio Lacava, Edoardo Otranto
Abstract: Central Bank monetary policy interventions frequently have direct implications for financial market volatility. In this paper, we introduce an intradaily Asymmetric Multiplicative Error Model with Meaningful Volatility (MV) events (AMEM-MV), which decomposes realized variance into a base component and an MV component. A novel model-based classification of monetary announcements is developed based on their impact on the MV component of the variance. By focusing on the 30-minute window following each Federal Reserve communication, we isolate the specific impact of monetary announcements on the volatility of seven US tickers.
Volume 40, Article 100517
Citations: 0
Efficient training: Federated learning cost analysis
IF 3.5, Zone 3, Computer Science
Big Data Research, Pub Date: 2025-02-20, DOI: 10.1016/j.bdr.2025.100510
Rafael Teixeira, Leonardo Almeida, Mário Antunes, Diogo Gomes, Rui L. Aguiar
Abstract: With the rapid development of 6G, Artificial Intelligence (AI) is expected to play a pivotal role in network management, resource optimization, and intrusion detection. However, deploying AI models in 6G networks faces several challenges, such as the lack of dedicated hardware for AI tasks and the need to protect user privacy. To address these challenges, Federated Learning (FL) emerges as a promising solution for distributed AI training without the need to move data from users' devices. This paper investigates the performance and costs of different FL approaches regarding training time, communication overhead, and energy consumption. The results show that FL can significantly accelerate the training process while reducing the data transferred across the network. However, the effectiveness of FL depends on the specific FL approach and the network conditions.
Volume 40, Article 100510
Citations: 0
Improved Tesseract optical character recognition performance on Thai document datasets
IF 3.5, Zone 3, Computer Science
Big Data Research, Pub Date: 2025-02-08, DOI: 10.1016/j.bdr.2025.100508
Noppol Anakpluek, Watcharakorn Pasanta, Latthawan Chantharasukha, Pattanawong Chokratansombat, Pajaya Kanjanakaew, Thitirat Siriborvornratanakul
Abstract: This research aims to improve the accuracy and efficiency of Optical Character Recognition (OCR) technology for the Thai language, specifically in the context of Thai government documents. OCR enables the conversion of text from images into machine-readable format, facilitating document storage and further processing. However, applying OCR to the Thai language presents unique challenges due to its complexity. This study focuses on enhancing the performance of the Tesseract OCR engine, a widely used free OCR technology, by implementing various image preprocessing techniques such as masking, adaptive thresholds, median filtering, Canny edge detection, and morphological operators. A dataset of Thai documents is utilized, and the OCR system's output is evaluated using word error rate (WER) and character error rate (CER) metrics. To improve text extraction accuracy, the research employs the original U-Net architecture [19] for image segmentation. Furthermore, the Tesseract OCR engine is fine-tuned, and image preprocessing is performed to optimize OCR system accuracy. The developed tools automate workflow processes, alleviate constraints on model training, and enable the effective utilization of information from official Thai documents for various purposes.
Volume 39, Article 100508
Citations: 0
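The WER and CER metrics named in the Thai OCR entry above are conventionally defined as the Levenshtein edit distance between the OCR output and the reference transcription, divided by the reference length (in words or characters). The following is a minimal sketch of that standard definition, not the paper's evaluation code; all function names are illustrative. Splitting on whitespace is an assumption made here for simplicity: written Thai does not separate words with spaces, so a real WER computation would need a Thai word segmenter first.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits per reference character."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate, using whitespace tokenization (an assumption)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

Both rates can exceed 1.0 when the hypothesis contains many insertions, which is why they are error rates rather than accuracies.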
A novel approach for job matching and skill recommendation using transformers and the O*NET database
IF 3.5, Zone 3, Computer Science
Big Data Research, Pub Date: 2025-02-07, DOI: 10.1016/j.bdr.2025.100509
Rubén Alonso, Danilo Dessí, Antonello Meloni, Diego Reforgiato Recupero
Abstract: Today, a vast amount of information about job supply and demand is posted on the web every day, which has heavily affected the job market. The online enrolling process has thus become efficient for applicants, as it allows them to present their resumes over the Internet and, as such, simultaneously to numerous organizations. Online systems such as Monster.com, OfferZen, and LinkedIn contain millions of job offers and resumes of potential candidates, leaving companies with the hard task of managing an enormous amount of data to select the most suitable applicant. The task of assessing the resumes of candidates and providing automatic recommendations on which one suits a particular position best has, therefore, become essential to speed up the hiring process. Similarly, it is important to help applicants quickly find a job appropriate to their skills and to provide recommendations about what they need to master to become eligible for certain jobs. Our approach lies in this context and proposes a new method to identify skills from candidates' resumes and match resumes with job descriptions. We employed the O*NET database entities related to different skills and abilities required by different jobs; moreover, we leveraged deep learning technologies to compute the semantic similarity between O*NET entities and parts of the text extracted from candidates' resumes. The ultimate goal is to identify the most suitable job for a certain resume according to the information contained therein. We have defined two scenarios: i) given a resume, identify the top O*NET occupations with the highest match with the resume, ii) given a candidate's resume and a set of job descriptions, identify which one of the input jobs is the most suitable for the candidate. The evaluation that has been carried out indicates that the proposed approach outperforms the baselines in the two scenarios. Finally, we provide a use case for candidates where it is possible to recommend courses with the goal of filling certain skill gaps and making them qualified for a certain job.
Volume 39, Article 100509
Citations: 0