Journal of Big Data: Latest Articles, Page 5

New custom rating for improving recommendation system performance

IF 8.1, Zone 2 (Computer Science)

Journal of Big Data, Pub Date: 2024-07-02, DOI: 10.1186/s40537-024-00952-3

Tora Fahrudin, Dedy Rahman Wijaya

Abstract: Recommendation systems are currently attracting considerable research interest. Many new businesses have emerged with the rise of online marketing (e-commerce) in response to the Covid-19 pandemic. These businesses recommend items through Collaborative Filtering (CF), which aims to improve the shopping experience of users. Typically, the effectiveness of CF relies on the precise identification of users with similar profiles by similarity algorithms. Traditional similarity measures are based on the user-item rating matrix. In this work, four custom ratings (CR) were used along with a new rating formula, termed New Custom Rating (NCR), derived from the popularity of users and items in addition to the original rating. Specifically, NCR optimizes recommendation system performance by using the popularity of users and items to determine new rating values, rather than relying solely on the original rating. The formulas also improve the representativeness of the new rating values and the accuracy of similarity calculations, which in turn increases recommendation accuracy. NCR was examined across four CR algorithms and a recommendation system using five public datasets. The experimental results showed that NCR significantly increased recommendation system accuracy: RMSE, MSE, and MAE were reduced by up to 62.10%, 53.62%, and 65.97%, respectively, while FCP and Hit Rate increased by up to 11.89% and 31.42%, respectively.

Citations: 0
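
The abstract of the New Custom Rating paper above reports its gains as reductions in MSE, RMSE, and MAE. As a reminder of what those metrics measure, here is a minimal sketch using their standard definitions. The function names are hypothetical, and reading the reported percentages as relative reductions against a baseline is an assumption, not something the abstract states.

```python
import math


def error_metrics(actual, predicted):
    """Return (mse, rmse, mae) for two equal-length rating sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / len(errors)   # mean squared error
    mae = sum(abs(e) for e in errors) / len(errors)  # mean absolute error
    return mse, math.sqrt(mse), mae


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline` (assumed reading)."""
    return 100.0 * (baseline - improved) / baseline
```

Under this reading, a baseline RMSE of 1.0 that drops to 0.5 would be reported as a 50% reduction.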

Optimization-based convolutional neural model for the classification of white blood cells

IF 8.1, Zone 2 (Computer Science)

Journal of Big Data, Pub Date: 2024-06-26, DOI: 10.1186/s40537-024-00949-y

Tulasi Gayatri Devi, Nagamma Patil

Abstract: White blood cells (WBCs) are one of the most significant parts of the human immune system, and they play a crucial role in diagnosing the characteristics of pathologists and blood-related diseases. The characteristics of WBCs are well-defined based on the morphological behavior of their nuclei, and the number and types of WBCs can often determine the presence of diseases or illnesses. Generally, there are different types of WBCs, and the accurate classification of WBCs helps in proper diagnosis and treatment. Although various classification models were developed in the past, they face issues such as low classification accuracy, high error rates, and long execution times. Hence, a novel classification strategy named the African Buffalo-based Convolutional Neural Model (ABCNM) is proposed to classify the types of WBCs accurately. The proposed strategy commences with collecting WBC sample databases, which are preprocessed and used to train the system for classification. The preprocessing phase removes noise and training flaws, which helps improve the dataset's quality and consistency. Further, feature extraction is performed to segment the WBCs, and African Buffalo fitness is updated in the classification layer for the correct classification of WBCs. The proposed framework is modeled in Python, and the experimental analysis shows that it achieved 99.12% accuracy, 98.16% precision, 99% sensitivity, 99.04% specificity, and 99.02% F-measure. Furthermore, a comparative assessment with existing techniques validated that the proposed strategy performed better than the conventional models.

Citations: 0

Feature reduction for hepatocellular carcinoma prediction using machine learning algorithms

IF 8.1, Zone 2 (Computer Science)

Journal of Big Data, Pub Date: 2024-06-18, DOI: 10.1186/s40537-024-00944-3

Ghada Mostafa, Hamdi Mahmoud, Tarek Abd El-Hafeez, Mohamed E. ElAraby

Abstract: Hepatocellular carcinoma (HCC) is a highly prevalent form of liver cancer that necessitates accurate prediction models for early diagnosis and effective treatment. Machine learning algorithms have demonstrated promising results in various medical domains, including cancer prediction. In this study, we propose a comprehensive approach for HCC prediction by comparing the performance of different machine learning algorithms before and after applying feature reduction methods. We employ popular feature reduction techniques, such as weighting features, hidden features correlation, feature selection, and optimized selection, to extract a reduced feature subset that captures the most relevant information related to HCC. Subsequently, we apply multiple algorithms, including Naive Bayes, support vector machines (SVM), Neural Networks, Decision Tree, and K nearest neighbors (KNN), to both the original high-dimensional dataset and the reduced feature set. By comparing the predictive accuracy, precision, F Score, recall, and execution time of each algorithm, we assess the effectiveness of feature reduction in enhancing the performance of HCC prediction models. Our experimental results, obtained using a comprehensive dataset comprising clinical features of HCC patients, demonstrate that feature reduction significantly improves the performance of all examined algorithms. Notably, the reduced feature set consistently outperforms the original high-dimensional dataset in terms of prediction accuracy and execution time. After applying feature reduction techniques, the employed algorithms, namely decision trees, Naive Bayes, KNN, neural networks, and SVM, achieved accuracies of 96.00%, 97.33%, 94.67%, 96.00%, and 96.00%, respectively.

Citations: 0

Advanced RIME architecture for global optimization and feature selection

IF 8.1, Zone 2 (Computer Science)

Journal of Big Data, Pub Date: 2024-06-18, DOI: 10.1186/s40537-024-00931-8

Ruba Abu Khurma, Malik Braik, Abdullah Alzaqebah, Krishna Gopal Dhal, Robertas Damaševičius, Bilal Abu-Salih

Citations: 0

PoLYTC: a novel BERT-based classifier to detect political leaning of YouTube videos based on their titles

IF 8.1, Zone 2 (Computer Science)

Journal of Big Data, Pub Date: 2024-06-05, DOI: 10.1186/s40537-024-00946-1

Nouar AlDahoul, Talal Rahwan, Yasir Zaki

Abstract: Over two-thirds of the U.S. population uses YouTube, and a quarter of U.S. adults regularly receive their news from it. Despite the massive political content available on the platform, to date, no classifier has been proposed to classify the political leaning of YouTube videos. The only exception is a classifier that requires extensive information about each video (rather than just the title) and classifies the videos into just three classes (rather than the widely-used categorization into six classes). To fill this gap, "PoLYTC" (Political Leaning YouTube Classifier) is proposed to classify YouTube videos based on their titles into six political classes. PoLYTC utilizes a large language model, namely BERT, and is fine-tuned on a public dataset of 11.5 million YouTube videos. Experiments reveal that the proposed solution achieves high accuracy (75%) and high F1-score (77%), thereby outperforming the state of the art. To further validate the solution's classification performance, several videos were collected from numerous prominent news agencies' YouTube channels, such as Fox News and The New York Times, which have widely known political leanings. These videos were classified based on their titles, and the results have shown that, in the vast majority of cases, the predicted political leaning matches that of the news agency. PoLYTC can help YouTube users make informed decisions about which videos to watch and can help researchers analyze the political content on YouTube.

Citations: 0

GB-AFS: graph-based automatic feature selection for multi-class classification via Mean Simplified Silhouette

IF 8.1, Zone 2 (Computer Science)

Journal of Big Data, Pub Date: 2024-05-31, DOI: 10.1186/s40537-024-00934-5

David Levin, Gonen Singer

Abstract: This paper introduces a novel graph-based filter method for automatic feature selection (abbreviated as GB-AFS) for multi-class classification tasks. The method determines the minimum combination of features required to sustain prediction performance while maintaining complementary discriminating abilities between different classes. It does not require any user-defined parameters such as the number of features to select. The minimum number of features is selected using our newly developed Mean Simplified Silhouette (abbreviated as MSS) index, designed to evaluate the clustering results for the feature selection task. To illustrate the effectiveness and generality of the method, we applied the GB-AFS method using various combinations of statistical measures and dimensionality reduction techniques. The experimental results demonstrate the superior performance of the proposed GB-AFS over other filter-based techniques and automatic feature selection approaches, and demonstrate that the GB-AFS method is independent of the statistical measure or the dimensionality reduction technique chosen by the user. Moreover, the proposed method maintained the accuracy achieved when utilizing all features while using only 7–30% of the original features. This resulted in an average time saving ranging from 15% for the smallest dataset to 70% for the largest. Our code is available at https://github.com/davidlevinwork/gbfs/.

Citations: 0
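
The GB-AFS abstract above relies on a Mean Simplified Silhouette (MSS) index to score clustering results. The abstract does not define MSS, so the sketch below shows only the generic simplified silhouette from the clustering literature, in which each point is compared against cluster centroids rather than against every other point. The function name and the Euclidean distance are assumptions, and the authors' MSS may differ in its details.

```python
import math


def simplified_silhouette(points, labels):
    """Mean simplified silhouette over all points (centroid-based distances)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    # Centroid of each cluster: coordinate-wise mean of its members.
    centroids = {
        lab: tuple(sum(coord) / len(members) for coord in zip(*members))
        for lab, members in clusters.items()
    }
    scores = []
    for p, lab in zip(points, labels):
        a = math.dist(p, centroids[lab])  # distance to own centroid
        b = min(math.dist(p, c) for other, c in centroids.items() if other != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

Values close to 1 indicate compact, well-separated clusters. Because only centroids are consulted, the cost grows with the number of points times the number of clusters rather than with the square of the number of points.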

Integration of feature enhancement technique in Google inception network for breast cancer detection and classification

IF 8.1, Zone 2 (Computer Science)

Journal of Big Data, Pub Date: 2024-05-28, DOI: 10.1186/s40537-024-00936-3

Wasyihun Sema Admass, Yirga Yayeh Munaye, Ayodeji Olalekan Salau

Abstract: Breast cancer is a major public health concern, and early detection and classification are essential for improving patient outcomes. However, breast tumors can be difficult to distinguish from benign tumors, leading to high false positive rates in screening. The reason is that both benign and malignant tumors have no consistent shape, are found at the same position, have variable sizes, and have high correlations. The ambiguity of the correlation challenges the computer-aided system, and the inconsistency of morphology challenges an expert in identifying and classifying what is positive and what is negative. As a result, breast cancer screening is often prone to false positives. This research paper introduces a feature enhancement method into the Google inception network for breast cancer detection and classification. The proposed model preserves both local and global information, which is important for addressing the variability of breast tumor morphology and their complex correlations. A locally preserving projection transformation function is introduced to retain local information that might be lost in the intermediate output of the inception model. Additionally, transfer learning is used to improve the performance of the proposed model on limited datasets. The proposed model is evaluated on a dataset of ultrasound images and achieves an accuracy of 99.81%, recall of 96.48%, and sensitivity of 93.0%. These results demonstrate the effectiveness of the proposed method for breast cancer detection and classification.

Citations: 0

Efficiently approaching vertical federated learning by combining data reduction and conditional computation techniques

IF 8.1, Zone 2 (Computer Science)

Journal of Big Data, Pub Date: 2024-05-28, DOI: 10.1186/s40537-024-00933-6

Francesco Folino, Gianluigi Folino, Francesco Sergio Pisani, Luigi Pontieri, Pietro Sabatino

Abstract: In this paper, a framework based on a sparse Mixture of Experts (MoE) architecture is proposed for the federated learning and application of a distributed classification model in domains (like cybersecurity and healthcare) where different parties of the federation store different subsets of features for a number of data instances. The framework is designed to limit the risk of information leakage and computation/communication costs in both model training (through data sampling) and application (leveraging the conditional-computation abilities of sparse MoEs). Experiments on real data have shown the proposed approach to ensure a better balance between efficiency and model accuracy, compared to other VFL-based solutions. Notably, in a real-life cybersecurity case study focused on malware classification (the KronoDroid dataset), the proposed method surpasses competitors even though it utilizes only 50% and 75% of the training set, which is fully utilized by the other approaches in the competition. This method achieves reductions in the rate of false positives by 16.9% and 18.2%, respectively, and also delivers satisfactory results on the other evaluation metrics. These results showcase our framework's potential to significantly enhance cybersecurity threat detection and prevention in a collaborative yet secure manner.

Citations: 0
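
The vertical federated learning abstract above leans on the conditional-computation property of sparse Mixture of Experts models: for each input, only a few experts are evaluated. The sketch below illustrates that general idea with top-k gating. It is not the authors' framework; the function names are hypothetical, the experts are plain Python callables, and the gate scores are assumed to be given rather than learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of gate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_moe(x, gate_scores, experts, k=1):
    """Evaluate only the k experts with the highest gate weight for input x."""
    weights = softmax(gate_scores)
    # Indices of the k largest gate weights, highest first.
    top = sorted(range(len(experts)), key=lambda i: weights[i], reverse=True)[:k]
    norm = sum(weights[i] for i in top)  # renormalize over the chosen experts
    output = sum(weights[i] / norm * experts[i](x) for i in top)
    return output, top
```

Experts outside the top k are never called, which is where the computation savings come from. In a federated setting this could also mean that fewer parties need to be contacted per prediction, though how the paper maps experts to parties is not described in the abstract.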

15 years of Big Data: a systematic literature review

IF 8.1, Zone 2 (Computer Science)

Journal of Big Data, Pub Date: 2024-05-14, DOI: 10.1186/s40537-024-00914-9

Davide Tosi, Redon Kokaj, Marco Roccetti

Abstract: Big Data is still gaining attention as a fundamental building block of the Artificial Intelligence and Machine Learning world. Therefore, a lot of effort has been pushed into Big Data research in the last 15 years. The objective of this Systematic Literature Review is to summarize the current state of the art of the previous 15 years of research about Big Data by providing answers to a set of research questions related to the main application domains for Big Data analytics; the significant challenges and limitations researchers have encountered in Big Data analysis, and emerging research trends and future directions in Big Data. The review follows a predefined procedure that automatically searches five well-known digital libraries. After applying the selection criteria to the results, 189 primary studies were identified as relevant, of which 32 were Systematic Literature Reviews. Required information was extracted from the 32 studies and summarized. Our Systematic Literature Review sketched the picture of 15 years of research in Big Data, identifying application domains, challenges, and future directions in this research field. We believe that a substantial amount of work remains to be done to align and seamlessly integrate Big Data into data-driven advanced software solutions of the future.

Citations: 0

Skyline query under multidimensional incomplete data based on classification tree

IF 8.1, Zone 2 (Computer Science)

Journal of Big Data, Pub Date: 2024-05-12, DOI: 10.1186/s40537-024-00923-8

Dengke Yuan, Liping Zhang, Song Li, Guanglu Sun

Abstract: A method for skyline query of multidimensional incomplete data based on a classification tree has been proposed to address the problem of a large amount of useless data in existing skyline queries with multidimensional incomplete data, which leads to low query efficiency and algorithm performance. This method consists of two main parts. The first part is the proposed incomplete data weighted classification tree algorithm. In the first part, an incomplete data weighted classification tree is proposed, and the incomplete data set is classified using this tree. The data classified in the first part serves as the basis for the second step of the query. The second part proposes a skyline query algorithm for multidimensional incomplete data. The concept of optimal virtual points has been recently introduced, effectively reducing the number of comparisons of a large amount of data, thereby improving the query efficiency for incomplete data. Theoretical research and experimental analysis have shown that the proposed method can perform skyline queries for multidimensional incomplete data well, with high query efficiency and accuracy of the algorithm.

Citations: 0
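
The skyline-query abstract above concerns data with missing attribute values. To make the underlying problem concrete, the sketch below shows a commonly used dominance relation for incomplete data, in which two points are compared only on the dimensions that both have observed, together with a naive all-pairs skyline. This is a baseline illustration, not the authors' weighted classification tree or their optimal virtual points; encoding missing values as `None` and treating smaller values as better are assumptions.

```python
def dominates(p, q):
    """True if p dominates q on the dimensions observed in both (smaller is better)."""
    common = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
    if not common:
        return False  # no shared dimension, so the points are incomparable
    return all(a <= b for a, b in common) and any(a < b for a, b in common)


def skyline(points):
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [
        p
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]
```

With missing values this dominance relation is not transitive, so a dominated point cannot simply be discarded early without risking wrong results. The naive version therefore compares every pair, which is the kind of cost that motivates methods for reducing the number of comparisons.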