Big Data Research最新文献_第4页

NoSQL data warehouse optimizing models: A comparative study of column-oriented approaches NoSQL数据仓库优化模型：面向列方法的比较研究

IF 3.5 3区计算机科学

Big Data Research Pub Date : 2025-03-20 DOI: 10.1016/j.bdr.2025.100523

Mohamed Mouhiha, Abdelfettah Mabrouk

引用次数: 0

Multi-dimensional feature learning for visible-infrared person re-identification 基于多维特征学习的可见-红外人物再识别

IF 3.5 3区计算机科学

Big Data Research Pub Date : 2025-03-17 DOI: 10.1016/j.bdr.2025.100522

Zhenzhen Yang, Xinyi Wu, Yongpeng Yang

{"title":"Multi-dimensional feature learning for visible-infrared person re-identification","authors":"Zhenzhen Yang, Xinyi Wu, Yongpeng Yang","doi":"10.1016/j.bdr.2025.100522","DOIUrl":"10.1016/j.bdr.2025.100522","url":null,"abstract":"<div><div>Visible-infrared person re-identification (VI-ReID) is a challenging task due to significant differences between modalities and feature representation of visible and infrared images. The primary goal of current VI-ReID is to reduce discrepancies between modalities. However, existing research primarily focuses on learning modality-invariant features. Due to significant modality differences, it is challenging to learn an effectively common feature space. Moreover, the intra-modality differences have not been well addressed. Therefore, a novel multi-dimensional feature learning network (MFLNet) is proposed in this paper to tackle the inherent challenges of intra-modality and inter-modality differences in VI-ReID. Specifically, to effectively address intra-modality variations, we employ the random local shear (RLS) augmentation, which accurately simulates viewpoint and posture changes. This augmentation can be seamlessly incorporated into other methods without modifying the network or parameters. Additionally, we integrate the multi-dimensional information mining (MIM) module to extract discriminative features and bridge the gap between modalities. Moreover, the cyclical smoothing focal (CSF) loss is introduced to prioritize challenging samples during training, thereby enhancing the ReID performance. Finally, the experimental results indicate that the proposed MFLNet outperforms other VI-ReID approaches on the SYSU-MM01, RegDB and LLCM datasets.</div></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"40 ","pages":"Article 100522"},"PeriodicalIF":3.5,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143654669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Deep attention dynamic representation learning networks for recommender system review modeling 基于深度关注动态表征学习网络的推荐系统评审建模

IF 3.5 3区计算机科学

Big Data Research Pub Date : 2025-03-15 DOI: 10.1016/j.bdr.2025.100521

Shivangi Gheewala , Shuxiang Xu , Soonja Yeom

{"title":"Deep attention dynamic representation learning networks for recommender system review modeling","authors":"Shivangi Gheewala , Shuxiang Xu , Soonja Yeom","doi":"10.1016/j.bdr.2025.100521","DOIUrl":"10.1016/j.bdr.2025.100521","url":null,"abstract":"<div><div>Despite considerable research of utilizing deep learning technology and textual reviews in recommender systems, improving system performance is a contentious matter. This is primarily due to issues faced in learning user-item representations. One issue is the limited ability of networks to model dynamic user-item representations from reviews. Particularly, in sequence-to-sequence learning models, there appears a substantial likelihood of losing semantic knowledge of previous review sequences, as overridden by the next. Another issue lies in effectively integrating global-level and topical-level representations to extract informative content and enhance user-item representations. Existing methods struggle to maintain contextual consistency during this integration process, resulting in suboptimal representation learning, especially attempting to capture finer details. To address these issues, we propose a novel recommendation model called Deep Attention Dynamic Representation Learning (DADRL). Specifically, we employ Latent Dirichlet Allocation and dynamic modulator-based Long Short-Term Memory to extract topical and dynamic global representations. Then, we introduce an attentional fusion methodology to integrate these representations in a contextually consistent manner and construct informative attentional user-item representations. We use these representations into the factorization machines layer to predict the final scores. Experimental results on Amazon categories, Yelp, and LibraryThing show that our model exhibits superior performance compared to several state-of-the-arts. We further examine the DADRL architecture under various conditions to provide insights on the model's employed components.</div></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"40 ","pages":"Article 100521"},"PeriodicalIF":3.5,"publicationDate":"2025-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143681952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Complex data in tourism analysis: A stochastic approach to price competition 旅游分析中的复杂数据：价格竞争的随机方法

IF 3.5 3区计算机科学

Big Data Research Pub Date : 2025-03-13 DOI: 10.1016/j.bdr.2025.100520

Giovanni Angelini , Michele Costa , Andrea Guizzardi

{"title":"Complex data in tourism analysis: A stochastic approach to price competition","authors":"Giovanni Angelini , Michele Costa , Andrea Guizzardi","doi":"10.1016/j.bdr.2025.100520","DOIUrl":"10.1016/j.bdr.2025.100520","url":null,"abstract":"<div><div>This study examines pricing strategies and decision-making processes in the hospitality industry by analyzing “ask” prices on online travel agencies (i.e., the rates at which hoteliers are willing to sell their rooms). We face the challenge of modeling a continuous flow of big data organized as “time series of time series,” where daily seasonality and advance bookings intersect. Our research combines insights from tourism, quantitative methods, and big data to improve pricing strategies, contributing to both theory and practice in revenue management. Focusing on Venice, we analyze price competition as a multivariate stochastic process using a Structural Vector Autoregressive (SVAR) approach, aligning with modern dynamic pricing algorithms.</div><div>The findings show that time-based pricing strategies, which adjust based on the day of arrival and booking, are more important than room features in setting hotel prices. We also find that price changes have a non-linear and decreasing effect as the booking date approaches. These insights suggest that hotels could create more advanced pricing strategies, and policymakers should consider these factors when addressing the challenges related to overtourism.</div><div>We study the complex competitive relationships among heterogeneous service providers with an approach applicable to any market where consumption is delayed relative to purchase time. However, we highlight that the quality and accessibility of information in the tourism sector are key aspects to be considered when using big data in this industry.</div></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"40 ","pages":"Article 100520"},"PeriodicalIF":3.5,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143641669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

ImDMI: Improved Distributed M-Invariance model to achieve privacy continuous big data publishing using Apache Spark ImDMI：改进的分布式m -不变性模型，使用Apache Spark实现隐私连续大数据发布

IF 3.5 3区计算机科学

Big Data Research Pub Date : 2025-03-07 DOI: 10.1016/j.bdr.2025.100519

Salheddine Kabou , Laid Gasmi , Abdelbaset Kabou , Sidi Mohammed Benslimane

{"title":"ImDMI: Improved Distributed M-Invariance model to achieve privacy continuous big data publishing using Apache Spark","authors":"Salheddine Kabou , Laid Gasmi , Abdelbaset Kabou , Sidi Mohammed Benslimane","doi":"10.1016/j.bdr.2025.100519","DOIUrl":"10.1016/j.bdr.2025.100519","url":null,"abstract":"<div><div>One of the critical challenges in the big data analytics is the individual's privacy issues. Data anonymization models including k-anonymity and l-diversity are used to guarantee the tradeoff between privacy and data utility while publishing the data. However, these models focus only on the single release of datasets and produce a certain level of privacy. In practical big data applications, data publishing is more complicated where the data is published continuously as new data is collected, and the privacy should be achieved for different releases. In this research, we propose a new distributed bottom up approach on Apache Spark for achievement of the m-invariance privacy model in the continuous big data context. The proposed approach, which is the first study that deals with dynamic big data publishing, is based on the insertion and the split process. In the first process, the data records collected from different workers are inserted into an improved bottom up R-tree generalization in order to minimizing the information loss. The second process concentrates on splitting the overflowed node with respect to the m-invariance model requirement by minimizing the overlap between the resulting partitions. The experimental results show significant improvement in term of data utility, execution time and counterfeit data records as compared to existing techniques in the literature.</div></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"40 ","pages":"Article 100519"},"PeriodicalIF":3.5,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143609162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Predicting option prices: From the Black-Scholes model to machine learning methods 预测期权价格：从布莱克-斯科尔斯模型到机器学习方法

IF 3.5 3区计算机科学

Big Data Research Pub Date : 2025-02-26 DOI: 10.1016/j.bdr.2025.100518

Angela Maria D'Uggento, Marta Biancardi, Domenico Ciriello

{"title":"Predicting option prices: From the Black-Scholes model to machine learning methods","authors":"Angela Maria D'Uggento, Marta Biancardi, Domenico Ciriello","doi":"10.1016/j.bdr.2025.100518","DOIUrl":"10.1016/j.bdr.2025.100518","url":null,"abstract":"<div><div>In the ever-changing landscape of financial markets, accurate option pricing remains critical for investors, traders and financial institutions. Traditionally, the Black-Scholes (B&S) model has been the cornerstone for option pricing, providing a solid framework based on mathematical and physical principles. Nevertheless, the B&S model has some limitations, such as the restriction to European options, the absence of dividends, constant volatility, etc. Studies and academic literature on the application of machine learning models in the financial sector are rapidly increasing. The main objective of this paper is to provide a comprehensive comparative analysis between the traditional B&S model and the most commonly used machine learning algorithms such as Artificial Neural Networks (ANNs). The rationale is twofold. First, to examine the assumptions of the B&S model, such as constant volatility and a perfectly efficient market, in light of the complexity of the real world, even though it is recognized that the model has been known as a pillar for decades. Secondly, to emphasize that the proliferation of big data and advances in computing power have fuelled the rise of machine learning techniques in finance. These algorithms have remarkable capabilities in discovering non-linear patterns and extracting information from large data sets, providing a compelling alternative to traditional quantitative methods. Machine learning offers a new way to capture and model such complex financial dynamics, which can lead to more accurate pricing models. By comparing the B&S model and some machine learning approaches, this paper aims to shed light on their respective strengths, weaknesses and applicability in the context of options pricing using real data. Through rigorous empirical analyses and performance metrics, our results demonstrate the importance of using machine learning techniques that can outperform or complement the established B&S model in predicting option prices by achieving higher prediction accuracy.</div></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"40 ","pages":"Article 100518"},"PeriodicalIF":3.5,"publicationDate":"2025-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143520058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Modeling meaningful volatility events to classify monetary policy announcements 建立有意义的波动事件模型，对货币政策公告进行分类

IF 3.5 3区计算机科学

Big Data Research Pub Date : 2025-02-26 DOI: 10.1016/j.bdr.2025.100517

Giampiero M. Gallo , Demetrio Lacava , Edoardo Otranto

引用次数: 0

Efficient training: Federated learning cost analysis 高效训练：联邦学习成本分析

IF 3.5 3区计算机科学

Big Data Research Pub Date : 2025-02-20 DOI: 10.1016/j.bdr.2025.100510

Rafael Teixeira , Leonardo Almeida , Mário Antunes , Diogo Gomes , Rui L. Aguiar

引用次数: 0

Improved Tesseract optical character recognition performance on Thai document datasets 改进泰语文档数据集上的Tesseract光学字符识别性能

IF 3.5 3区计算机科学

Big Data Research Pub Date : 2025-02-08 DOI: 10.1016/j.bdr.2025.100508

Noppol Anakpluek, Watcharakorn Pasanta, Latthawan Chantharasukha, Pattanawong Chokratansombat, Pajaya Kanjanakaew, Thitirat Siriborvornratanakul

{"title":"Improved Tesseract optical character recognition performance on Thai document datasets","authors":"Noppol Anakpluek, Watcharakorn Pasanta, Latthawan Chantharasukha, Pattanawong Chokratansombat, Pajaya Kanjanakaew, Thitirat Siriborvornratanakul","doi":"10.1016/j.bdr.2025.100508","DOIUrl":"10.1016/j.bdr.2025.100508","url":null,"abstract":"<div><div>This research aims to improve the accuracy and efficiency of Optical Character Recognition (OCR) technology for the Thai language, specifically in the context of Thai government documents. OCR enables the conversion of text from images into machine-readable format, facilitating document storage and further processing. However, applying OCR to the Thai language presents unique challenges due to its complexity. This study focuses on enhancing the performance of the Tesseract OCR engine, a widely used free OCR technology, by implementing various image preprocessing techniques such as masking, adaptive thresholds, median filtering, Canny edge detection, and morphological operators. A dataset of Thai documents is utilized, and the OCR system's output is evaluated using word error rate (WER) and character error rate (CER) metrics. To improve text extraction accuracy, the research employs the original U-Net architecture [<span><span>19</span></span>] for image segmentation. Furthermore, the Tesseract OCR engine is finetuned, and image preprocessing is performed to optimize OCR system accuracy. The developed tools automate workflow processes, alleviate constraints on model training, and enable the effective utilization of information from official Thai documents for various purposes.</div></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"39 ","pages":"Article 100508"},"PeriodicalIF":3.5,"publicationDate":"2025-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143396246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A novel approach for job matching and skill recommendation using transformers and the O*NET database 一种利用变压器和O*NET数据库进行工作匹配和技能推荐的新方法

IF 3.5 3区计算机科学

Big Data Research Pub Date : 2025-02-07 DOI: 10.1016/j.bdr.2025.100509

Rubén Alonso , Danilo Dessí , Antonello Meloni , Diego Reforgiato Recupero

{"title":"A novel approach for job matching and skill recommendation using transformers and the O*NET database","authors":"Rubén Alonso , Danilo Dessí , Antonello Meloni , Diego Reforgiato Recupero","doi":"10.1016/j.bdr.2025.100509","DOIUrl":"10.1016/j.bdr.2025.100509","url":null,"abstract":"<div><div>Today we have tons of information posted on the web every day regarding job supply and demand which has heavily affected the job market. The online enrolling process has thus become efficient for applicants as it allows them to present their resumes using the Internet and, as such, simultaneously to numerous organizations. Online systems such as Monster.com, OfferZen, and LinkedIn contain millions of job offers and resumes of potential candidates leaving to companies with the hard task to face an enormous amount of data to manage to select the most suitable applicant. The task of assessing the resumes of candidates and providing automatic recommendations on which one suits a particular position best has, therefore, become essential to speed up the hiring process. Similarly, it is important to help applicants to quickly find a job appropriate to their skills and provide recommendations about what they need to master to become eligible for certain jobs. Our approach lies in this context and proposes a new method to identify skills from candidates' resumes and match resumes with job descriptions. We employed the O*NET database entities related to different skills and abilities required by different jobs; moreover, we leveraged deep learning technologies to compute the semantic similarity between O*NET entities and part of text extracted from candidates' resumes. The ultimate goal is to identify the most suitable job for a certain resume according to the information there contained. We have defined two scenarios: i) given a resume, identify the top O*NET occupations with the highest match with the resume, ii) given a candidate's resume and a set of job descriptions, identify which one of the input jobs is the most suitable for the candidate. The evaluation that has been carried out indicates that the proposed approach outperforms the baselines in the two scenarios. Finally, we provide a use case for candidates where it is possible to recommend courses with the goal to fill certain skills and make them qualified for a certain job.</div></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"39 ","pages":"Article 100509"},"PeriodicalIF":3.5,"publicationDate":"2025-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143377085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0