Big Data最新文献_第4页

Special Issue: Big Scientific Data and Machine Learning in Science and Engineering. 特刊：科学与工程中的大科学数据和机器学习。

IF 2.6 4区计算机科学

Big Data Pub Date : 2024-08-01 Epub Date: 2024-07-31 DOI: 10.1089/big.2024.59218.kpa

Farhad Pourkamali-Anaraki

引用次数: 0

A Unified Training Process for Fake News Detection Based on Finetuned Bidirectional Encoder Representation from Transformers Model. 基于变压器模型微调双向编码器表示的假新闻检测统一训练流程

IF 2.6 4区计算机科学

Big Data Pub Date : 2024-08-01 Epub Date: 2023-03-22 DOI: 10.1089/big.2022.0050

Vijay Srinivas Tida, Sonya Hsu, Xiali Hei

{"title":"A Unified Training Process for Fake News Detection Based on Finetuned Bidirectional Encoder Representation from Transformers Model.","authors":"Vijay Srinivas Tida, Sonya Hsu, Xiali Hei","doi":"10.1089/big.2022.0050","DOIUrl":"10.1089/big.2022.0050","url":null,"abstract":"An efficient fake news detector becomes essential as the accessibility of social media platforms increases rapidly. Previous studies mainly focused on designing the models solely based on individual data sets and might suffer from degradable performance. Therefore, developing a robust model for a combined data set with diverse knowledge becomes crucial. However, designing the model with a combined data set requires extensive training time and sequential workload to obtain optimal performance without having some prior knowledge about the model's parameters. The presented study here will help solve these issues by introducing the unified training strategy to have a base structure for the classifier and all hyperparameters from individual models using a pretrained transformer model. The performance of the proposed model is noted using three publicly available data sets, namely ISOT and others from the Kaggle website. The results indicate that the proposed unified training strategy surpassed the existing models such as Random Forests, convolutional neural networks, and long short-term memory, with 97% accuracy and achieved the F1 score of 0.97. Furthermore, there was a significant reduction in training time by almost 1.5 to 1.8 × by removing words lower than three letters from the input samples. We also did extensive performance analysis by varying the number of encoder blocks to build compact models and trained on the combined data set. We justify that reducing encoder blocks resulted in lower performance from the obtained results.","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"331-342"},"PeriodicalIF":2.6,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9150389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A New Filter Approach Based on Effective Ranges for Classification of Gene Expression Data. 基于有效范围的基因表达数据分类过滤新方法

IF 2.6 4区计算机科学

Big Data Pub Date : 2024-08-01 Epub Date: 2023-09-04 DOI: 10.1089/big.2022.0086

Derya Turfan, Bulent Altunkaynak, Özgür Yeniay

{"title":"A New Filter Approach Based on Effective Ranges for Classification of Gene Expression Data.","authors":"Derya Turfan, Bulent Altunkaynak, Özgür Yeniay","doi":"10.1089/big.2022.0086","DOIUrl":"10.1089/big.2022.0086","url":null,"abstract":"Over the years, many studies have been carried out to reduce and eliminate the effects of diseases on human health. Gene expression data sets play a critical role in diagnosing and treating diseases. These data sets consist of thousands of genes and a small number of sample sizes. This situation creates the curse of dimensionality and it becomes problematic to analyze such data sets. One of the most effective strategies to solve this problem is feature selection methods. Feature selection is a preprocessing step to improve classification performance by selecting the most relevant and informative features while increasing the accuracy of classification. In this article, we propose a new statistically based filter method for the feature selection approach named Effective Range-based Feature Selection Algorithm (FSAER). As an extension of the previous Effective Range based Gene Selection (ERGS) and Improved Feature Selection based on Effective Range (IFSER) algorithms, our novel method includes the advantages of both methods while taking into account the disjoint area. To illustrate the efficacy of the proposed algorithm, the experiments have been conducted on six benchmark gene expression data sets. The results of the FSAER and the other filter methods have been compared in terms of classification accuracies to demonstrate the effectiveness of the proposed method. For classification methods, support vector machines, naive Bayes classifier, and k-nearest neighbor algorithms have been used.","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"312-330"},"PeriodicalIF":2.6,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10211345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Hybrid Generalized Regularized Extreme Learning Machine Through Gradient-Based Optimizer Model for Self-Cleansing Nondeposition with Clean Bed Mode of Sediment Transport. 基于梯度优化器的混合广义正则化极限学习机模型，用于自清洁非沉积与清洁床模式的沉积物输送。

IF 2.6 4区计算机科学

Big Data Pub Date : 2024-08-01 Epub Date: 2023-03-07 DOI: 10.1089/big.2022.0120

Enes Gul, Mir Jafar Sadegh Safari

{"title":"Hybrid Generalized Regularized Extreme Learning Machine Through Gradient-Based Optimizer Model for Self-Cleansing Nondeposition with Clean Bed Mode of Sediment Transport.","authors":"Enes Gul, Mir Jafar Sadegh Safari","doi":"10.1089/big.2022.0120","DOIUrl":"10.1089/big.2022.0120","url":null,"abstract":"Sediment transport modeling is an important problem to minimize sedimentation in open channels that could lead to unexpected operation expenses. From an engineering perspective, the development of accurate models based on effective variables involved for flow velocity computation could provide a reliable solution in channel design. Furthermore, validity of sediment transport models is linked to the range of data used for the model development. Existing design models were established on the limited data ranges. Thus, the present study aimed to utilize all experimental data available in the literature, including recently published datasets that covered an extensive range of hydraulic properties. Extreme learning machine (ELM) algorithm and generalized regularized extreme learning machine (GRELM) were implemented for the modeling, and then, particle swarm optimization (PSO) and gradient-based optimizer (GBO) were utilized for the hybridization of ELM and GRELM. GRELM-PSO and GRELM-GBO findings were compared to the standalone ELM, GRELM, and existing regression models to determine their accurate computations. The analysis of the models demonstrated the robustness of the models that incorporate channel parameter. The poor results of some existing regression models seem to be linked to the disregarding of the channel parameter. Statistical analysis of the model outcomes illustrated the outperformance of GRELM-GBO in contrast to the ELM, GRELM, GRELM-PSO, and regression models, although GRELM-GBO performed slightly better when compared to the GRELM-PSO counterpart. It was found that the mean accuracy of GRELM-GBO was 18.5% better when compared to the best regression model. The promising findings of the current study not only may encourage the use of recommended algorithms for channel design in practice but also may further the application of novel ELM-based methods in alternative environmental problems.","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"282-298"},"PeriodicalIF":2.6,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10861174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Vertical and Horizontal Water Penetration Velocity Modeling in Nonhomogenous Soil Using Fast Multi-Output Relevance Vector Regression. 利用快速多输出相关性矢量回归建立非同质土壤的垂直和水平透水速度模型

IF 2.6 4区计算机科学

Big Data Pub Date : 2024-08-01 Epub Date: 2023-03-14 DOI: 10.1089/big.2022.0125

Babak Vaheddoost, Shervin Rahimzadeh Arashloo, Mir Jafar Sadegh Safari

{"title":"Vertical and Horizontal Water Penetration Velocity Modeling in Nonhomogenous Soil Using Fast Multi-Output Relevance Vector Regression.","authors":"Babak Vaheddoost, Shervin Rahimzadeh Arashloo, Mir Jafar Sadegh Safari","doi":"10.1089/big.2022.0125","DOIUrl":"10.1089/big.2022.0125","url":null,"abstract":"A joint determination of horizontal and vertical movement of water through porous medium is addressed in this study through fast multi-output relevance vector regression (FMRVR). To do this, an experimental data set conducted in a sand box with 300 × 300 × 150 mm dimensions made of Plexiglas is used. A random mixture of sand having size of 0.5-1 mm is used to simulate the porous medium. Within the experiments, 2, 3, 7, and 12 cm walls are used together with different injection locations as 130.7, 91.3, and 51.8 mm measured from the cutoff wall at the upstream. Then, the Cartesian coordinated of the tracer, time interval, length of the wall in each setup, and two dummy variables for determination of the initial point are considered as independent variables for joint estimation of horizontal and vertical velocity of water movement in the porous medium. Alternatively, the multi-linear regression, random forest, and the support vector regression approaches are used to alternate the results obtained by the FMRVR method. It was concluded that the FMRVR outperforms the other models, while the uncertainty in estimation of horizontal penetration is larger than the vertical one.","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"299-311"},"PeriodicalIF":2.6,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9105192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Kriging, Polynomial Chaos Expansion, and Low-Rank Approximations in Material Science and Big Data Analytics. 材料科学和大数据分析中的克里金法、多项式混沌展开和低域近似。

IF 2.6 4区计算机科学

Big Data Pub Date : 2024-08-01 Epub Date: 2023-04-24 DOI: 10.1089/big.2022.0124

Golsa Mahdavi, Mohammad Amin Hariri-Ardebili

引用次数: 0

Research on the Influence of Information Iterative Propagation on Complex Network Structure. 信息迭代传播对复杂网络结构的影响研究。

IF 2.6 4区计算机科学

Big Data Pub Date : 2024-07-27 DOI: 10.1089/big.2023.0016

Yinuo Qian, Fuzhong Nian, Zheming Wang, Yabing Yao

{"title":"Research on the Influence of Information Iterative Propagation on Complex Network Structure.","authors":"Yinuo Qian, Fuzhong Nian, Zheming Wang, Yabing Yao","doi":"10.1089/big.2023.0016","DOIUrl":"https://doi.org/10.1089/big.2023.0016","url":null,"abstract":"Dynamic propagation will affect the change of network structure. Different networks are affected by the iterative propagation of information to different degrees. The iterative propagation of information in the network changes the connection strength of the chain edge between nodes. Most studies on temporal networks build networks based on time characteristics, and the iterative propagation of information in the network can also reflect the time characteristics of network evolution. The change of network structure is a macromanifestation of time characteristics, whereas the dynamics in the network is a micromanifestation of time characteristics. How to concretely visualize the change of network structure influenced by the characteristics of propagation dynamics has become the focus of this article. The appearance of chain edge is the micro change of network structure, and the division of community is the macro change of network structure. Based on this, the node participation is proposed to quantify the influence of different users on the information propagation in the network, and it is simulated in different types of networks. By analyzing the iterative propagation of information, the weighted network of different networks based on the iterative propagation of information is constructed. Finally, the chain edge and community division in the network are analyzed to achieve the purpose of quantifying the influence of network propagation on complex network structure.","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":""},"PeriodicalIF":2.6,"publicationDate":"2024-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141789804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A Fast Survival Support Vector Regression Approach to Large Scale Credit Scoring via Safe Screening. 通过安全筛选进行大规模信用评分的快速生存支持向量回归方法。

IF 2.6 4区计算机科学

Big Data Pub Date : 2024-07-23 DOI: 10.1089/big.2023.0033

Hong Wang, Ling Hong

引用次数: 0

Content-Aware Human Mobility Pattern Extraction. 内容感知的人类移动模式提取。

IF 2.6 4区计算机科学

Big Data Pub Date : 2024-07-10 DOI: 10.1089/big.2022.0281

Shengwen Li, Chaofan Fan, Tianci Li, Renyao Chen, Qingyuan Liu, Junfang Gong

{"title":"Content-Aware Human Mobility Pattern Extraction.","authors":"Shengwen Li, Chaofan Fan, Tianci Li, Renyao Chen, Qingyuan Liu, Junfang Gong","doi":"10.1089/big.2022.0281","DOIUrl":"https://doi.org/10.1089/big.2022.0281","url":null,"abstract":"Extracting meaningful patterns of human mobility from accumulating trajectories is essential for understanding human behavior. However, previous works identify human mobility patterns based on the spatial co-occurrence of trajectories, which ignores the effect of activity content, leaving challenges in effectively extracting and understanding patterns. To bridge this gap, this study incorporates the activity content of trajectories to extract human mobility patterns, and proposes acontent-aware mobility pattern model. The model first embeds the activity content in distributed continuous vector space by taking point-of-interest as an agent and then extracts representative and interpretable mobility patterns from human trajectory sets using a derived topic model. To investigate the performance of the proposed model, several evaluation metrics are developed, including pattern coherence, pattern similarity, and manual scoring. A real-world case study is conducted, and its experimental results show that the proposed model improves interpretability and helps to understand mobility patterns. This study provides not only a novel solution and several evaluation metrics for human mobility patterns but also a method reference for fusing content semantics of human activities for trajectory analysis and mining.","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":""},"PeriodicalIF":2.6,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141565068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Preemptive Epidemic Information Transmission Model Using Nonreplication Edge Node Connectivity in Health Care Networks. 在医疗网络中使用无复制边缘节点连接的抢先式流行病信息传输模型

IF 2.6 4区计算机科学

Big Data Pub Date : 2024-04-01 Epub Date: 2023-04-19 DOI: 10.1089/big.2022.0278

Chandu Thota, Constandinos X Mavromoustakis, George Mastorakis

{"title":"Preemptive Epidemic Information Transmission Model Using Nonreplication Edge Node Connectivity in Health Care Networks.","authors":"Chandu Thota, Constandinos X Mavromoustakis, George Mastorakis","doi":"10.1089/big.2022.0278","DOIUrl":"10.1089/big.2022.0278","url":null,"abstract":"The reliability in medical data organization and transmission is eased with the inheritance of information and communication technologies in recent years. The growth of digital communication and sharing medium imposes the necessity for optimizing the accessibility and transmission of sensitive medical data to the end-users. In this article, the Preemptive Information Transmission Model (PITM) is introduced for improving the promptness in medical data delivery. This transmission model is designed to acquire the least communication in an epidemic region for seamless information availability. The proposed model makes use of a noncyclic connection procedure and preemptive forwarding inside and outside the epidemic region. The first is responsible for replication-less connection maximization ensuring better availability of the edge nodes. The connection replications are reduced using the pruning tree classifiers based on the communication time and delivery balancing factor. The later process is responsible for the reliable forwarding of the acquired data using a conditional selection of the infrastructure units. Both the processes of PITM are accountable for improving the delivery of observed medical data, over better transmissions, communication time, and achieving fewer delays.","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"141-154"},"PeriodicalIF":2.6,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9440721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0