Big Data Research: latest articles (page 5)

Has machine paraphrasing skills approached humans? Detecting automatically and manually generated paraphrased cases

IF 3.5, Zone 3 (Computer Science)

Big Data Research, Volume 39, Article 100507 | Pub Date: 2025-01-22 | DOI: 10.1016/j.bdr.2025.100507

Iqra Muneer, Aysha Shehzadi, Muhammad Adnan Ashraf, Rao Muhammad Adeel Nawab

Abstract: In recent years, automatic text rewriting (or paraphrasing) tools have become readily and publicly available. These tools have made text paraphrasing exceptionally straightforward, which encourages trouble-free plagiarism and text reuse. In the literature, the majority of efforts have focused on detecting real cases (manual/human paraphrasing) of paraphrasing, mainly in the domain of journalism. However, the problem of paraphrase detection has not been thoroughly explored for artificial cases (machine paraphrased), mainly due to the lack of standard resources for its evaluation. To fill this gap, this study proposes three benchmark corpora for artificial cases of paraphrases at the sentence level, and one real corpus containing examples from daily life activities. Three popular and widely used online automatic text rewriting tools, i.e., paraphrasing-tools, articlerewritetool and rewritertools, were used to develop the artificial case corpora. Further, we used two real case corpora: the Microsoft Paraphrase Corpus (MSRP) (from the domain of journalism) and a proposed real corpus which is a combination of carefully extracted Quora question pairs and MSRP (Q-MSRP). Both real case and artificial case paraphrases were evaluated using classical machine learning, transfer learning, large language models and a proposed model, to investigate which of the two types of paraphrasing is more difficult to detect. The results show that our proposed model outperforms all the other approaches for both artificial and real case paraphrase detection. A thorough analysis of the results suggests that, by far, manual paraphrasing is still harder to detect but certain machine paraphrased texts are equally difficult to detect. All proposed corpora are freely available to promote research on artificial case paraphrase detection.

Citations: 0

Multi-granularity enhanced graph convolutional network for aspect sentiment triplet extraction

IF 3.5, Zone 3 (Computer Science)

Big Data Research, Volume 39, Article 100506 | Pub Date: 2025-01-17 | DOI: 10.1016/j.bdr.2025.100506

Mingwei Tang, Kun Yang, Linping Tao, Mingfeng Zhao, Wei Zhou

Abstract: Aspect Sentiment Triple Extraction (ASTE) is an emerging sentiment analysis task, which describes both aspect terms and their sentiment polarity, as well as opinion terms that represent sentiment polarity. Some models have been presented to analyze sentence sentiment more accurately. Nonetheless, previous models have had problems, such as inconsistent sentiment predictions for one-to-many, many-to-one, and sequence annotation. In addition, part-of-speech and contextual semantic information are ignored, resulting in the inability to identify complete multi-word aspect terms and opinion terms. To address these problems, we propose a Multi-granularity Enhanced Graph Convolutional Network (MGEGCN) to solve the problem of inaccurate multi-word term recognition. First, we propose a dual-channel enhanced graph convolutional network, which simultaneously analyzes syntactic structure and part-of-speech information and uses the combined effect of the two to enhance the deep semantic information of aspect terms and opinion terms. Second, we design a multi-scale attention mechanism, which combines self-attention with deep separable convolution to enhance attention to aspect terms and opinion terms. In addition, a convolutional decoding strategy is used in the decoding stage to extract triples by directly detecting and classifying the relational regions in the table. In the experimental part, we conduct analysis on two public datasets (ASTE-DATA-v1 and ASTE-DATA-v2) to show that the model improves the performance of ASTE tasks. In four subsets (14res, 14lap, 15res, and 16res), the F1 scores of the MGEGCN method are 75.65%, 61.62%, 67.62%, 74.12% and 74.69%, 62.10%, 68.18%, 74.00%, respectively.

Citations: 0
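
Illustrative note on the MGEGCN entry above: F1 scores for ASTE are commonly computed over exact-match triplets, where a predicted (aspect, opinion, polarity) triplet counts as correct only if all three parts match a gold triplet. The sketch below shows that convention only. The function name `triplet_f1` and the toy triplets are hypothetical, and the paper's own evaluation script may differ.

```python
def triplet_f1(gold, pred):
    """Precision, recall and F1 over exact-match (aspect, opinion, polarity) triplets."""
    gold_set, pred_set = set(gold), set(pred)
    correct = len(gold_set & pred_set)  # a triplet counts only if all three parts match
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data, not taken from ASTE-DATA-v1 or ASTE-DATA-v2.
gold = [("battery life", "great", "POS"), ("screen", "dim", "NEG")]
pred = [
    ("battery life", "great", "POS"),
    ("screen", "dim", "NEU"),      # wrong polarity, so not counted as correct
    ("price", "fair", "POS"),      # spurious triplet
]
precision, recall, f1 = triplet_f1(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Here one of three predictions is correct and one of two gold triplets is recovered, so the harmonic mean of 1/3 and 1/2 is reported as F1.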

Positional-attention based bidirectional deep stacked AutoEncoder for aspect based sentimental analysis

IF 3.5, Zone 3 (Computer Science)

Big Data Research, Volume 39, Article 100505 | Pub Date: 2024-12-16 | DOI: 10.1016/j.bdr.2024.100505

S. Anjali Devi, M. Sitha Ram, Pulugu Dileep, Sasibhushana Rao Pappu, T. Subha Mastan Rao, Mula Malyadri

Abstract: With the rapid growth of Internet technology and social networks, the generation of text-based information on the web has increased. To ease Natural Language Processing (NLP) tasks, analyzing the sentiments behind the provided input text is highly important. To effectively analyze the polarities of sentiments (positive, negative and neutral), categorizing the aspects in the text is an essential task. Several existing studies have attempted to accurately classify aspects based on sentiments in text inputs. However, the existing methods attained limited performance because of reduced aspect coverage, inefficiency in handling ambiguous language, inappropriate feature extraction, lack of contextual understanding and overfitting issues. Thus, the proposed study intends to develop an effective word embedding scheme with a novel hybrid deep learning technique for performing aspect-based sentiment analysis on social media text. Initially, the collected raw input text data are pre-processed to reduce undesirable data through tokenization, stemming, lemmatization, duplicate removal, stop word removal, empty set removal and empty row removal. The required information from the pre-processed text is extracted using three different word-level embedding methods: Scored-Lexicon based Word2Vec, GloVe modelling and Extended Bidirectional Encoder Representation from Transformers (E-BERT). After extracting sufficient features, the aspects are analyzed, and the exact sentiment polarities are classified through a novel Positional-Attention-based Bidirectional Deep Stacked AutoEncoder (PA_BiDSAE) model. In this proposed classification, the BiLSTM network is hybridized with a deep stacked autoencoder (DSAE) model to categorize sentiment. The experimental analysis is done using Python, and the proposed model is simulated with three publicly available datasets: SemEval Challenge 2014 (Restaurant), SemEval Challenge 2014 (Laptop) and SemEval Challenge 2015 (Restaurant). The performance analysis shows that the proposed hybrid deep learning model obtains improved classification performance in accuracy, precision, recall, specificity, F1 score and kappa measure.

Citations: 0
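
Illustrative note on the PA_BiDSAE entry above: the pre-processing steps named in the abstract (tokenization, stop word removal, duplicate removal, empty row removal) can be sketched with the Python standard library. This is a minimal sketch under stated assumptions, not the authors' pipeline: the stop word list, the regular expression and the function name `preprocess` are hypothetical, and stemming and lemmatization are omitted because they would need an NLP library.

```python
import re

# Hypothetical, deliberately tiny stop word list; real lists are much longer.
STOP_WORDS = {"the", "is", "a", "an", "and", "but", "of", "was"}


def preprocess(rows):
    """Tokenize each row, drop stop words, then drop empty and duplicate rows."""
    seen = set()
    cleaned = []
    for text in rows:
        tokens = re.findall(r"[a-z0-9']+", text.lower())  # simple lowercase tokenization
        tokens = [t for t in tokens if t not in STOP_WORDS]
        key = tuple(tokens)
        if not tokens or key in seen:  # skip empty rows and exact duplicates
            continue
        seen.add(key)
        cleaned.append(tokens)
    return cleaned


rows = ["The food was great!", "", "the food was GREAT", "Service is slow but friendly."]
cleaned = preprocess(rows)
print(cleaned)
```

Duplicates are detected after normalization, so the first and third rows collapse into a single token list.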

Principal component analysis of multivariate spatial functional data

IF 3.5, Zone 3 (Computer Science)

Big Data Research, Volume 39, Article 100504 | Pub Date: 2024-12-16 | DOI: 10.1016/j.bdr.2024.100504

Idris Si-ahmed, Leila Hamdad, Christelle Judith Agonkoui, Yoba Kande, Sophie Dabo-Niang

Abstract: This paper is devoted to the study of dimension reduction techniques for multivariate, spatially indexed functional data defined on different domains. We present a method called Spatial Multivariate Functional Principal Component Analysis (SMFPCA), which performs principal component analysis for multivariate spatial functional data. In contrast to the Multivariate Karhunen-Loève approach for independent data, SMFPCA is notably adept at capturing spatial dependencies among multiple functions. SMFPCA applies spectral functional component analysis to multivariate functional spatial data, focusing on data points arranged on a regular grid. The methodological framework and algorithm of SMFPCA have been developed to tackle the challenges arising from the lack of appropriate methods for managing this type of data. The performance of the proposed method has been verified through finite sample properties using simulated datasets and a sea-surface temperature dataset. Additionally, we conducted comparative studies of SMFPCA against some existing methods, providing valuable insights into the properties of multivariate spatial functional data within a finite sample.

Citations: 0

Incomplete data classification via positive approximation based rough subspaces ensemble

IF 3.5, Zone 3 (Computer Science)

Big Data Research, Volume 38, Article 100496 | Pub Date: 2024-11-14 | DOI: 10.1016/j.bdr.2024.100496

Yuanting Yan, Meili Yang, Zhong Zheng, Hao Ge, Yiwen Zhang, Yanping Zhang

Abstract: Classifying incomplete data using ensemble techniques is a prevalent method for addressing missing values, where multiple classifiers are trained on diverse subsets of features. However, current ensemble-based methods overlook the redundancy within feature subsets, presenting challenges for training robust prediction models, because the redundant features can hinder the learning of the underlying rules in the data. In this paper, we propose a Reduct-Missing Pattern Fusion (RMPF) method to address this limitation. It leverages both the advantages of rough set theory and the effectiveness of missing patterns in classifying incomplete data. RMPF employs a heuristic algorithm to generate a set of positive approximation-based attribute reducts. Subsequently, it integrates the missing patterns with these reducts through a fusion strategy to minimize data redundancy. Finally, the optimized subsets are used to train a group of base classifiers, and a selective prediction procedure is applied to produce the ensembled prediction results. Experimental results show that our method is superior to the compared state-of-the-art methods in both performance and robustness. In particular, our method is significantly superior on data with high missing rates.

Citations: 0
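
Illustrative note on the RMPF entry above: a "missing pattern" is the set of features that are absent in a sample, and samples that share a pattern can be handled over the same observed feature subset. The sketch below only shows how such patterns can be extracted and grouped. It is not the RMPF algorithm; the helper names, the use of `None` for a missing value and the toy rows are assumptions made for illustration.

```python
from collections import defaultdict


def missing_pattern(row):
    """Return a tuple of booleans: True where the feature value is missing (None)."""
    return tuple(value is None for value in row)


def group_by_pattern(rows):
    """Map each distinct missing pattern to the indices of the rows that share it."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[missing_pattern(row)].append(index)
    return dict(groups)


# Hypothetical toy samples with three features each; None marks a missing value.
rows = [
    (5.1, None, 1.4),
    (4.9, 3.0, 1.4),
    (6.2, None, 4.5),
    (None, 2.9, 4.3),
]
groups = group_by_pattern(rows)
for pattern, indices in groups.items():
    print(pattern, indices)
```

Rows 0 and 2 share the pattern in which only the second feature is missing, so a classifier restricted to the first and third features could serve both.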

Joint embedding in hierarchical distance and semantic representation learning for link prediction

IF 3.5, Zone 3 (Computer Science)

Big Data Research, Volume 38, Article 100495 | Pub Date: 2024-11-13 | DOI: 10.1016/j.bdr.2024.100495

Jin Liu, Jianye Chen, Chongfeng Fan, Fengyu Zhou

Abstract: The link prediction task aims to predict missing entities or relations in the knowledge graph and is essential for downstream applications. Existing well-known models deal with this task mainly by representing knowledge graph triplets in the distance space or the semantic space. However, they cannot fully capture the information of head and tail entities, nor make good use of hierarchical level information. Thus, in this paper, we propose a novel knowledge graph embedding model for the link prediction task, namely HIE, which models each triplet (h, r, t) in the distance measurement space and the semantic measurement space simultaneously. Moreover, HIE is introduced into a hierarchical-aware space to leverage rich hierarchical information of entities and relations for better representation learning. Specifically, we apply a distance transformation operation on the head entity in distance space to obtain the tail entity, instead of translation-based or rotation-based approaches. Experimental results on four real-world datasets show that HIE outperforms several existing state-of-the-art knowledge graph embedding methods on the link prediction task and deals with complex relations accurately.

Citations: 0
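
Illustrative note on the HIE entry above: the abstract contrasts HIE with translation-based approaches. As background, the sketch below shows a TransE-style translation score, in which a triplet (h, r, t) is plausible when h + r lies close to t. This is the baseline family the abstract mentions, not HIE itself, and the two-dimensional embeddings, entity names and `rank_tails` helper are hypothetical values chosen for illustration.

```python
import math


def transe_score(head, relation, tail):
    """Negative Euclidean distance between head + relation and tail (higher is better)."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


# Hypothetical hand-picked 2-D embeddings; real models learn these from data.
entities = {
    "paris": (1.0, 0.0),
    "france": (1.0, 1.0),
    "berlin": (0.0, 0.0),
    "germany": (0.0, 1.0),
}
capital_of = (0.0, 1.0)


def rank_tails(head_name):
    """Rank all other entities as candidate tails for (head, capital_of, ?)."""
    head = entities[head_name]
    candidates = [name for name in entities if name != head_name]
    return sorted(
        candidates,
        key=lambda name: transe_score(head, capital_of, entities[name]),
        reverse=True,
    )


ranking = rank_tails("paris")
print(ranking)
```

Link prediction benchmarks typically report where the true tail lands in such a ranking, for example through mean reciprocal rank or Hits@k.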

Deep semantics-preserving cross-modal hashing

IF 3.5, Zone 3 (Computer Science)

Big Data Research, Volume 38, Article 100494 | Pub Date: 2024-11-07 | DOI: 10.1016/j.bdr.2024.100494

Zhihui Lai, Xiaomei Fang, Heng Kong

Abstract: Cross-modal hashing has received widespread attention in recent years due to its outstanding performance in cross-modal data retrieval. Cross-modal hashing can be decomposed into two steps, i.e., feature learning and binarization. However, most existing cross-modal hashing methods do not take the supervisory information of the data into consideration during binary quantization, and thus often fail to adequately preserve semantic information. To solve these problems, this paper proposes a novel deep cross-modal hashing method called deep semantics-preserving cross-modal hashing (DSCMH), which makes full use of intra- and inter-modal semantic information to improve the model's performance. Moreover, by designing a label network for semantic alignment during the binarization process, DSCMH's performance can be further improved. To verify the performance of the proposed method, extensive experiments were conducted on four large datasets. The results show that the proposed method is better than most of the existing cross-modal hashing methods. In addition, the ablation experiment shows that the proposed new regularization terms all have positive effects on the model's performance in cross-modal retrieval. The code of this paper can be downloaded from http://www.scholat.com/laizhihui.

Citations: 0
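
Illustrative note on the DSCMH entry above: the abstract splits cross-modal hashing into feature learning and binarization. The sketch below illustrates only the second step and the retrieval it enables, using plain sign thresholding and Hamming distance. It is not the DSCMH method; the feature vectors stand in for the outputs of learned image and text networks and are hypothetical.

```python
def binarize(features):
    """Turn real-valued features into a binary code by thresholding at zero."""
    return [1 if value >= 0 else 0 for value in features]


def hamming(code_a, code_b):
    """Count the positions where two equal-length binary codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


# Hypothetical features, assumed to come from separately learned image and text encoders.
image_features = {
    "img_cat": [0.9, -0.2, 0.4, -0.7],
    "img_car": [-0.8, 0.6, -0.1, 0.3],
}
text_query = [0.7, -0.5, 0.1, -0.3]  # e.g. features of the query text "a cat"

query_code = binarize(text_query)
ranked = sorted(
    image_features,
    key=lambda name: hamming(query_code, binarize(image_features[name])),
)
print(query_code, ranked)
```

Because both modalities map to codes of the same length, a text query can be compared against image codes with cheap bit-level operations, which is what makes hashing attractive for large-scale retrieval.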

Research on the characteristics of information propagation dynamic on the weighted multiplex Weibo networks

IF 3.5, Zone 3 (Computer Science)

Big Data Research, Volume 38, Article 100493 | Pub Date: 2024-09-27 | DOI: 10.1016/j.bdr.2024.100493

Yinuo Qian, Fuzhong Nian

Abstract: In order to simulate the forwarding of different categories of Weibo posts and discover interesting propagation phenomena in different layers of Weibo networks, this paper proposes retweeting weighted multiplex networks and a propagation model coupled with multi-class Weibo. Firstly, the weighted multiplex social network is constructed by processing Weibo network data. Secondly, a new information propagation model is established by using the weight and interlayer information of the Weibo multiplex network combined with the coupling factors in the propagation. Finally, the information propagation simulated by the propagation model is compared with the real data, so as to summarize different information propagation phenomena in multiplex social networks. At the same time, by comparing the structure of the forwarding weighted multiplex networks constructed from short-time and long-time data, we find the self-similarity of the forwarding weighted multiplex network, which supports the generalization of the experiment. Through this research, the Weibo social network has been explored in depth, and a new perspective has been opened up for the study of social media information propagation.

Citations: 0

Leveraging social computing for epidemic surveillance: A case study

IF 3.5, Zone 3 (Computer Science)

Big Data Research, Volume 38, Article 100483 | Pub Date: 2024-08-08 | DOI: 10.1016/j.bdr.2024.100483

Bilal Tahir, Muhammad Amir Mehmood

Abstract: Social media platforms have become a popular source of information for real-time monitoring of events and user behavior. In particular, Twitter provides invaluable information related to diseases and public health for building real-time disease surveillance systems. Effective use of such social media platforms for public health surveillance requires data-driven AI models, which are hindered by the difficult, expensive, and time-consuming task of collecting high-quality and large-scale datasets. In this paper, we build and analyze the Epidemic TweetBank (EpiBank) dataset containing 271 million English tweets related to six epidemic-prone diseases: COVID19, Flu, Hepatitis, Dengue, Malaria, and HIV/AIDS. For this purpose, we develop a tool, ESS-T (Epidemic Surveillance Study via Twitter), which collects tweets according to provided input parameters and keywords. Also, our tool assigns locations to tweets with 95% accuracy and performs analysis of collected tweets focusing on temporal distribution, spatial patterns, users, entities, sentiment, and misinformation. Leveraging ESS-T, we build two geo-tagged datasets, EpiBank-global and EpiBank-Pak, containing 86 million tweets from 190 countries and 2.6 million tweets from Pakistan, respectively. Our spatial analysis of EpiBank-global for COVID19, Malaria, and Dengue indicates that our framework correctly identifies high-risk epidemic-prone countries according to World Health Organization (WHO) statistics.

Citations: 0

Anomaly detection based on system text logs of virtual network functions

IF 3.5, Zone 3 (Computer Science)

Big Data Research | Pub Date: 2024-08-02 | DOI: 10.1016/j.bdr.2024.100485

Daniela N. Rim, DongNyeong Heo, Chungjun Lee, Sukhyun Nam, Jae-Hyoung Yoo, James Won-Ki Hong, Heeyoul Choi

Citations: 0