Big Data Research: Latest Articles

Predicting option prices: From the Black-Scholes model to machine learning methods
IF 3.5, Zone 3, Computer Science
Big Data Research, Volume 40, Article 100518. Pub Date: 2025-02-26. DOI: 10.1016/j.bdr.2025.100518
Angela Maria D'Uggento, Marta Biancardi, Domenico Ciriello
Abstract: In the ever-changing landscape of financial markets, accurate option pricing remains critical for investors, traders and financial institutions. Traditionally, the Black-Scholes (B&S) model has been the cornerstone of option pricing, providing a solid framework based on mathematical and physical principles. Nevertheless, the B&S model has some limitations, such as the restriction to European options, the absence of dividends, and constant volatility. Studies and academic literature on the application of machine learning models in the financial sector are rapidly increasing. The main objective of this paper is to provide a comprehensive comparative analysis between the traditional B&S model and the most commonly used machine learning algorithms, such as Artificial Neural Networks (ANNs). The rationale is twofold. First, to examine the assumptions of the B&S model, such as constant volatility and a perfectly efficient market, in light of the complexity of the real world, while recognizing that the model has been a pillar of the field for decades. Second, to emphasize that the proliferation of big data and advances in computing power have fuelled the rise of machine learning techniques in finance. These algorithms have remarkable capabilities in discovering non-linear patterns and extracting information from large data sets, providing a compelling alternative to traditional quantitative methods. Machine learning offers a new way to capture and model such complex financial dynamics, which can lead to more accurate pricing models. By comparing the B&S model and some machine learning approaches, this paper aims to shed light on their respective strengths, weaknesses and applicability in the context of option pricing using real data. Through rigorous empirical analyses and performance metrics, our results demonstrate the importance of using machine learning techniques that can outperform or complement the established B&S model in predicting option prices by achieving higher prediction accuracy.
Citations: 0
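For readers unfamiliar with the baseline named in the abstract above, the following is a minimal sketch of the standard closed-form Black-Scholes price for a European option on a non-dividend-paying asset. It is not taken from the paper; the function and parameter names (`black_scholes_call`, `spot`, `strike`, `rate`, `sigma`, `maturity`) are illustrative assumptions, and the put price is obtained here through put-call parity.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def black_scholes_call(spot: float, strike: float, rate: float,
                       sigma: float, maturity: float) -> float:
    """European call price under Black-Scholes (no dividends, constant volatility).

    spot: current asset price; strike: exercise price;
    rate: continuously compounded risk-free rate;
    sigma: annualized volatility; maturity: time to expiry in years.
    """
    if spot <= 0 or strike <= 0 or sigma <= 0 or maturity <= 0:
        raise ValueError("spot, strike, sigma and maturity must be positive")
    d1 = (log(spot / strike) + (rate + 0.5 * sigma ** 2) * maturity) / (
        sigma * sqrt(maturity))
    d2 = d1 - sigma * sqrt(maturity)
    return spot * norm_cdf(d1) - strike * exp(-rate * maturity) * norm_cdf(d2)


def black_scholes_put(spot: float, strike: float, rate: float,
                      sigma: float, maturity: float) -> float:
    """European put price derived from the call price via put-call parity."""
    call = black_scholes_call(spot, strike, rate, sigma, maturity)
    return call - spot + strike * exp(-rate * maturity)
```

The constant `sigma` argument makes the constant-volatility assumption mentioned in the abstract explicit: a single number stands in for the volatility over the whole life of the option.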
Modeling meaningful volatility events to classify monetary policy announcements
IF 3.5, Zone 3, Computer Science
Big Data Research, Volume 40, Article 100517. Pub Date: 2025-02-26. DOI: 10.1016/j.bdr.2025.100517
Giampiero M. Gallo, Demetrio Lacava, Edoardo Otranto
Abstract: Central Bank monetary policy interventions frequently have direct implications for financial market volatility. In this paper, we introduce an intradaily Asymmetric Multiplicative Error Model with Meaningful Volatility (MV) events (AMEM-MV), which decomposes realized variance into a base component and an MV component. A novel model-based classification of monetary announcements is developed based on their impact on the MV component of the variance. By focusing on the 30-minute window following each Federal Reserve communication, we isolate the specific impact of monetary announcements on the volatility of seven US tickers.
Citations: 0
Efficient training: Federated learning cost analysis
IF 3.5, Zone 3, Computer Science
Big Data Research, Volume 40, Article 100510. Pub Date: 2025-02-20. DOI: 10.1016/j.bdr.2025.100510
Rafael Teixeira, Leonardo Almeida, Mário Antunes, Diogo Gomes, Rui L. Aguiar
Abstract: With the rapid development of 6G, Artificial Intelligence (AI) is expected to play a pivotal role in network management, resource optimization, and intrusion detection. However, deploying AI models in 6G networks faces several challenges, such as the lack of dedicated hardware for AI tasks and the need to protect user privacy. To address these challenges, Federated Learning (FL) emerges as a promising solution for distributed AI training without the need to move data from users' devices. This paper investigates the performance and costs of different FL approaches regarding training time, communication overhead, and energy consumption. The results show that FL can significantly accelerate the training process while reducing the data transferred across the network. However, the effectiveness of FL depends on the specific FL approach and the network conditions.
Citations: 0
Improved Tesseract optical character recognition performance on Thai document datasets
IF 3.5, Zone 3, Computer Science
Big Data Research, Volume 39, Article 100508. Pub Date: 2025-02-08. DOI: 10.1016/j.bdr.2025.100508
Noppol Anakpluek, Watcharakorn Pasanta, Latthawan Chantharasukha, Pattanawong Chokratansombat, Pajaya Kanjanakaew, Thitirat Siriborvornratanakul
Abstract: This research aims to improve the accuracy and efficiency of Optical Character Recognition (OCR) technology for the Thai language, specifically in the context of Thai government documents. OCR enables the conversion of text from images into machine-readable format, facilitating document storage and further processing. However, applying OCR to the Thai language presents unique challenges due to its complexity. This study focuses on enhancing the performance of the Tesseract OCR engine, a widely used free OCR technology, by implementing various image preprocessing techniques such as masking, adaptive thresholds, median filtering, Canny edge detection, and morphological operators. A dataset of Thai documents is utilized, and the OCR system's output is evaluated using word error rate (WER) and character error rate (CER) metrics. To improve text extraction accuracy, the research employs the original U-Net architecture [19] for image segmentation. Furthermore, the Tesseract OCR engine is fine-tuned, and image preprocessing is performed to optimize OCR system accuracy. The developed tools automate workflow processes, alleviate constraints on model training, and enable the effective utilization of information from official Thai documents for various purposes.
Citations: 0
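The WER and CER metrics mentioned in the abstract above are both edit distances normalized by the reference length. The sketch below shows one common way to compute them; it is not the authors' evaluation code, and the names `edit_distance`, `word_error_rate` and `character_error_rate` are illustrative. Splitting on whitespace is an assumption made for simplicity: Thai is written without spaces between words, so a real evaluation would need a Thai word segmenter to produce the word lists.

```python
from typing import Sequence


def edit_distance(reference: Sequence, hypothesis: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (0 if ref_item == hyp_item else 1)
            deletion = previous[j] + 1      # drop an item from the reference
            insertion = current[j - 1] + 1  # extra item in the hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER: word-level edit distance divided by the number of reference words.

    Assumes whitespace-separated words (pre-segmented text for Thai).
    """
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER: character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Both rates can exceed 1.0 when the OCR output contains many more tokens than the reference, which is why they are reported as error rates and not as accuracies.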
A novel approach for job matching and skill recommendation using transformers and the O*NET database
IF 3.5, Zone 3, Computer Science
Big Data Research, Volume 39, Article 100509. Pub Date: 2025-02-07. DOI: 10.1016/j.bdr.2025.100509
Rubén Alonso, Danilo Dessí, Antonello Meloni, Diego Reforgiato Recupero
Abstract: Today, vast amounts of information about job supply and demand are posted on the web every day, which has heavily affected the job market. The online application process has thus become efficient for applicants, as it allows them to submit their resumes over the Internet and, as such, simultaneously to numerous organizations. Online systems such as Monster.com, OfferZen, and LinkedIn contain millions of job offers and resumes of potential candidates, leaving companies with the hard task of managing an enormous amount of data to select the most suitable applicant. The task of assessing the resumes of candidates and providing automatic recommendations on which one best suits a particular position has, therefore, become essential to speed up the hiring process. Similarly, it is important to help applicants quickly find a job appropriate to their skills and to provide recommendations about what they need to master to become eligible for certain jobs. Our approach lies in this context and proposes a new method to identify skills from candidates' resumes and match resumes with job descriptions. We employed the O*NET database entities related to different skills and abilities required by different jobs; moreover, we leveraged deep learning technologies to compute the semantic similarity between O*NET entities and parts of text extracted from candidates' resumes. The ultimate goal is to identify the most suitable job for a certain resume according to the information it contains. We have defined two scenarios: i) given a resume, identify the top O*NET occupations with the highest match with the resume; ii) given a candidate's resume and a set of job descriptions, identify which one of the input jobs is the most suitable for the candidate. The evaluation that has been carried out indicates that the proposed approach outperforms the baselines in the two scenarios. Finally, we provide a use case for candidates in which courses can be recommended with the goal of filling skill gaps and qualifying them for a certain job.
Citations: 0
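The first scenario in the abstract above (ranking occupations by how well they match a resume) can be illustrated with a small similarity-ranking sketch. This is not the authors' pipeline: the abstract does not say which similarity measure or embedding model is used, so cosine similarity over precomputed embedding vectors is an assumption here, and `cosine_similarity`, `top_matches` and the toy vectors are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(resume_vector: Sequence[float],
                occupation_vectors: Dict[str, Sequence[float]],
                k: int = 3) -> List[Tuple[str, float]]:
    """Return the k occupations whose vectors are most similar to the resume vector.

    In practice the vectors would come from a sentence-embedding model applied
    to resume text and occupation descriptions; here they are plain lists.
    """
    scored = [(name, cosine_similarity(resume_vector, vector))
              for name, vector in occupation_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A call such as `top_matches(resume_vec, {"Data Scientist": v1, "Accountant": v2}, k=1)` would return the single best-scoring occupation with its similarity score; the occupation names in this call are placeholders.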
Has machine paraphrasing skills approached humans? Detecting automatically and manually generated paraphrased cases
IF 3.5, Zone 3, Computer Science
Big Data Research, Volume 39, Article 100507. Pub Date: 2025-01-22. DOI: 10.1016/j.bdr.2025.100507
Iqra Muneer, Aysha Shehzadi, Muhammad Adnan Ashraf, Rao Muhammad Adeel Nawab
Abstract: In recent years, automatic text rewriting (or paraphrasing) tools have become readily and publicly available. These tools have made text paraphrasing an exceptionally straightforward approach that encourages trouble-free plagiarism and text reuse. In the literature, the majority of efforts have focused on detecting real cases (manual/human paraphrasing) of paraphrasing, mainly in the domain of journalism. However, the problem of paraphrase detection has not been thoroughly explored for artificial cases (machine paraphrased), mainly due to the lack of standard resources for its evaluation. To fill this gap, this study proposes three benchmark corpora for artificial cases of paraphrases at sentence level, and one real corpus containing examples from daily life activities. Three popular and widely used automatic text rewriting online tools, i.e., paraphrasing-tools, articlerewritetool and rewritertools, have been used to develop the artificial case corpora. Further, we used two real case corpora: the Microsoft Paraphrase Corpus (MSRP), from the domain of journalism, and a proposed real corpus that combines carefully extracted Quora question pairs and MSRP (Q-MSRP). Both real case and artificial case paraphrases were evaluated using classical machine learning, transfer learning, large language models and a proposed model, to investigate which of the two types of paraphrasing is more difficult to detect. The results show that our proposed model outperforms all the other approaches for both artificial and real case paraphrase detection. A thorough analysis of the results suggests that, by far, manual paraphrasing is still harder to detect, but certain machine paraphrased texts are equally difficult to detect. All proposed corpora are freely available to promote research on artificial case paraphrase detection.
Citations: 0
Multi-granularity enhanced graph convolutional network for aspect sentiment triplet extraction
IF 3.5, Zone 3, Computer Science
Big Data Research, Volume 39, Article 100506. Pub Date: 2025-01-17. DOI: 10.1016/j.bdr.2025.100506
Mingwei Tang, Kun Yang, Linping Tao, Mingfeng Zhao, Wei Zhou
Abstract: Aspect Sentiment Triple Extraction (ASTE) is an emerging sentiment analysis task, which describes both aspect terms and their sentiment polarity, as well as opinion terms that represent sentiment polarity. Some models have been presented to analyze sentence sentiment more accurately. Nonetheless, previous models have had problems, like inconsistent sentiment predictions for one-to-many, many-to-one, and sequence annotation. In addition, part-of-speech and contextual semantic information are ignored, resulting in the inability to identify complete multi-word aspect terms and opinion terms. To address these problems, we propose a Multi-granularity Enhanced Graph Convolutional Network (MGEGCN) to solve the problem of inaccurate multi-word term recognition. First, we propose a dual-channel enhanced graph convolutional network, which simultaneously analyzes syntactic structure and part-of-speech information and uses the combined effect of the two to enhance the deep semantic information of aspect terms and opinion terms. Second, we also design a multi-scale attention, which combines self-attention with deep separable convolution to enhance attention to aspect terms and opinion terms. In addition, a convolutional decoding strategy is used in the decoding stage to extract triples by directly detecting and classifying the relational regions in the table. In the experimental part, we conduct analysis on two public datasets (ASTE-DATA-v1 and ASTE-DATA-v2) to prove that the model improves the performance of ASTE tasks. On the four subsets (14res, 14lap, 15res, and 16res), the F1 scores of the MGEGCN method are 75.65%, 61.62%, 67.62%, and 74.12% and 74.69%, 62.10%, 68.18%, and 74.00% on the two datasets, respectively.
Citations: 0
Positional-attention based bidirectional deep stacked AutoEncoder for aspect based sentimental analysis
IF 3.5, Zone 3, Computer Science
Big Data Research, Volume 39, Article 100505. Pub Date: 2024-12-16. DOI: 10.1016/j.bdr.2024.100505
S. Anjali Devi, M. Sitha Ram, Pulugu Dileep, Sasibhushana Rao Pappu, T. Subha Mastan Rao, Mula Malyadri
Abstract: With the rapid growth of Internet technology and social networks, the generation of text-based information on the web has increased. To ease Natural Language Processing (NLP) tasks, analyzing the sentiments behind the provided input text is highly important. To effectively analyze the polarities of sentiments (positive, negative and neutral), categorizing the aspects in the text is an essential task. Several existing studies have attempted to accurately classify aspects based on sentiments in text inputs. However, the existing methods attained limited performance because of reduced aspect coverage, inefficiency in handling ambiguous language, inappropriate feature extraction, lack of contextual understanding and overfitting issues. Thus, the proposed study intends to develop an effective word embedding scheme with a novel hybrid deep learning technique for performing aspect-based sentiment analysis on social media text. Initially, the collected raw input text data are pre-processed to reduce undesirable data through tokenization, stemming, lemmatization, duplicate removal, stop word removal, empty set removal and empty row removal. The required information from the pre-processed text is extracted using three different word-level embedding methods: Scored-Lexicon based Word2Vec, GloVe modelling and Extended Bidirectional Encoder Representation from Transformers (E-BERT). After extracting sufficient features, the aspects are analyzed, and the exact sentiment polarities are classified through a novel Positional-Attention-based Bidirectional Deep Stacked AutoEncoder (PA_BiDSAE) model. In this proposed classification, the BiLSTM network is hybridized with a deep stacked autoencoder (DSAE) model to categorize sentiment. The experimental analysis is done using Python, and the proposed model is simulated with three publicly available datasets: SemEval Challenge 2014 (Restaurant), SemEval Challenge 2014 (Laptop) and SemEval Challenge 2015 (Restaurant). The performance analysis shows that the proposed hybrid deep learning model obtains improved classification performance in accuracy, precision, recall, specificity, F1 score and kappa measure.
Citations: 0
Principal component analysis of multivariate spatial functional data
IF 3.5, Zone 3, Computer Science
Big Data Research, Volume 39, Article 100504. Pub Date: 2024-12-16. DOI: 10.1016/j.bdr.2024.100504
Idris Si-ahmed, Leila Hamdad, Christelle Judith Agonkoui, Yoba Kande, Sophie Dabo-Niang
Abstract: This paper is devoted to the study of dimension reduction techniques for multivariate, spatially indexed functional data defined on different domains. We present a method called Spatial Multivariate Functional Principal Component Analysis (SMFPCA), which performs principal component analysis for multivariate spatial functional data. In contrast to the Multivariate Karhunen-Loève approach for independent data, SMFPCA is notably adept at capturing spatial dependencies among multiple functions. SMFPCA applies spectral functional component analysis to multivariate functional spatial data, focusing on data points arranged on a regular grid. The methodological framework and algorithm of SMFPCA have been developed to tackle the challenges arising from the lack of appropriate methods for managing this type of data. The performance of the proposed method has been verified through finite sample properties using simulated datasets and a sea-surface temperature dataset. Additionally, we conducted comparative studies of SMFPCA against some existing methods, providing valuable insights into the properties of multivariate spatial functional data within a finite sample.
Citations: 0
Incomplete data classification via positive approximation based rough subspaces ensemble
IF 3.5, Zone 3, Computer Science
Big Data Research, Volume 38, Article 100496. Pub Date: 2024-11-14. DOI: 10.1016/j.bdr.2024.100496
Yuanting Yan, Meili Yang, Zhong Zheng, Hao Ge, Yiwen Zhang, Yanping Zhang
Abstract: Classifying incomplete data using ensemble techniques is a prevalent method for addressing missing values, where multiple classifiers are trained on diverse subsets of features. However, current ensemble-based methods overlook the redundancy within feature subsets, presenting challenges for training robust prediction models, because the redundant features can hinder the learning of the underlying rules in the data. In this paper, we propose a Reduct-Missing Pattern Fusion (RMPF) method to address this limitation. It leverages both the advantages of rough set theory and the effectiveness of missing patterns in classifying incomplete data. RMPF employs a heuristic algorithm to generate a set of positive approximation-based attribute reducts. Subsequently, it integrates the missing patterns with these reducts through a fusion strategy to minimize data redundancy. Finally, the optimized subsets are utilized to train a group of base classifiers, and a selective prediction procedure is applied to produce the ensembled prediction results. Experimental results show that our method is superior to the compared state-of-the-art methods in both performance and robustness. In particular, our method is significantly better on data with high missing rates.
Citations: 0