Journal of Big Data: latest articles

Skyline query under multidimensional incomplete data based on classification tree
IF 8.1, Zone 2, Computer Science
Journal of Big Data Pub Date : 2024-05-12 DOI: 10.1186/s40537-024-00923-8
Dengke Yuan, Liping Zhang, Song Li, Guanglu Sun
Abstract: A classification-tree-based method for skyline queries over multidimensional incomplete data is proposed. It addresses the large amount of useless data that existing skyline queries over multidimensional incomplete data must process, which leads to low query efficiency and poor algorithm performance. The method consists of two main parts. The first part proposes an incomplete data weighted classification tree and uses it to classify the incomplete data set; the classified data serve as the basis for the second part. The second part proposes a skyline query algorithm for multidimensional incomplete data. It introduces the concept of optimal virtual points, which effectively reduces the number of comparisons over large amounts of data and thereby improves query efficiency for incomplete data. Theoretical research and experimental analysis show that the proposed method performs skyline queries over multidimensional incomplete data well, with high query efficiency and accuracy.
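The dominance test that a skyline query rests on can be sketched in a few lines. The Python below is a minimal brute-force baseline, not the paper's classification-tree or optimal-virtual-point method. It assumes a commonly used convention for incomplete data: two points are compared only on dimensions that both have observed, missing values are encoded as None, and smaller values are better. The names dominates and naive_skyline are hypothetical.

```python
from typing import List, Optional, Sequence

Point = Sequence[Optional[float]]  # None marks a missing dimension (assumption)


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q on the dimensions both points observe.

    p must be no worse than q on every shared dimension and strictly
    better on at least one (smaller is better in this sketch).
    """
    strictly_better = False
    for a, b in zip(p, q):
        if a is None or b is None:
            continue  # skip dimensions that either point is missing
        if a > b:
            return False
        if a < b:
            strictly_better = True
    return strictly_better


def naive_skyline(points: List[Point]) -> List[Point]:
    """Keep every point that no other point dominates (pairwise comparison)."""
    return [
        p for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Illustrative data only, not taken from the paper.
points = [(1, None, 3), (2, 2, 4), (None, 1, 5), (3, 3, None)]
skyline = naive_skyline(points)
```

Every point is compared against every other point here, which is the comparison cost that the classification and virtual-point steps described in the abstract aim to reduce.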
Citations: 0
A proposed hybrid framework to improve the accuracy of customer churn prediction in telecom industry
IF 8.1, Zone 2, Computer Science
Journal of Big Data Pub Date : 2024-05-09 DOI: 10.1186/s40537-024-00922-9
Shimaa Ouf, Kholoud T. Mahmoud, Manal A. Abdel-Fattah
Abstract: In the telecom sector, predicting customer churn has grown in importance in recent years. Developing a robust and accurate churn prediction model takes time, but it is crucial: early churn prediction avoids revenue loss and improves customer retention, so telecom companies must identify at-risk customers before they leave. Researchers have applied a variety of machine-learning approaches to reveal the hidden relationships between different features. A key aspect of churn prediction is the accuracy level, which affects the learning model's performance. This study aims to clarify several aspects of customer churn prediction accuracy and to investigate the performance of state-of-the-art techniques. However, no previous research has investigated performance using a hybrid framework that combines suitable data preprocessing, ensemble learning, and resampling techniques. The study introduces a hybrid framework that improves the accuracy of customer churn prediction in the telecom industry. The framework integrates the XGBoost classifier with the hybrid resampling method SMOTE-ENN and applies effective data preprocessing techniques. It is used in two experiments with three telecom datasets. The study determines which features are most crucial in influencing customer churn, examines the impact of data balancing, compares classifier performance before and after data balancing, and examines the speed-accuracy trade-off in hybrid classifiers. Several metrics, including accuracy, precision, recall, F1-score, and ROC curve, are used to analyze the results, and all evaluation criteria are used to identify the most effective experiment. In terms of accuracy, the hybrid framework applied to balanced data outperformed the classifier applied alone to imbalanced data. In addition, the results of the proposed hybrid framework are compared with previous studies on the same datasets; on all three datasets, the proposed framework outperformed these recent works.
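The metrics named in the abstract can be computed directly from a confusion matrix, which also shows why class imbalance matters for churn data. The sketch below is illustrative only: the function name churn_metrics and the counts are hypothetical and do not come from the paper's datasets.

```python
def churn_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    Churners are treated as the positive class. Ratios with a zero
    denominator are reported as 0.0 (a convention chosen for this sketch).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced case: 50 churners among 1000 customers, and a
# classifier that predicts "no churn" for everyone.
baseline = churn_metrics(tp=0, fp=0, fn=50, tn=950)
```

In this hypothetical case accuracy is high while recall and F1 are zero, which is one reason to report several metrics and to consider resampling when classes are imbalanced.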
Citations: 0
DEMFFA: a multi-strategy modified Fennec Fox algorithm with mixed improved differential evolutionary variation strategies
IF 8.1, Zone 2, Computer Science
Journal of Big Data Pub Date : 2024-05-08 DOI: 10.1186/s40537-024-00917-6
Gang Hu, Keke Song, Xiuxiu Li, Yi Wang
Abstract: The Fennec Fox algorithm (FFA) is a new meta-heuristic algorithm primarily inspired by the Fennec fox's ability to dig and to escape from wild predators. Compared with other classical algorithms, FFA shows strong competitiveness. However, the “No free lunch” theorem implies that an algorithm performs differently on different problems: when solving high-dimensional or more complex applications, FFA tends to fall into local optima and converges slowly. To address this, this paper proposes an improved Fennec Fox algorithm, DEMFFA, which adds sin chaotic mapping, formula factor adjustment, Cauchy operator mutation, and differential evolution mutation strategies. First, a sin chaotic mapping strategy is added in the initialization stage to make the population distribution more uniform, thus speeding up convergence. Second, to further speed up convergence, the factors of the first-stage position update formula are adjusted. Finally, to prevent the algorithm from falling into local optima too early and to expand the search space of the population, the Cauchy operator mutation strategy and the differential evolution mutation strategy are added after the first and second update stages of the original algorithm. To verify the performance of DEMFFA, qualitative analysis is carried out on different test sets, and the proposed algorithm is compared with the original FFA, other classical algorithms, improved algorithms, and newly proposed algorithms on three different test sets. A qualitative analysis on CEC2020 is also carried out. In addition, DEMFFA is applied to 10 practical engineering design problems and a complex 24-bar truss topology optimization problem, and the results show that DEMFFA has the potential to solve complex problems.
Citations: 0
Evaluation is key: a survey on evaluation measures for synthetic time series
IF 8.1, Zone 2, Computer Science
Journal of Big Data Pub Date : 2024-05-07 DOI: 10.1186/s40537-024-00924-7
Michael Stenger, Robert Leppich, Ian Foster, Samuel Kounev, André Bauer
Abstract: Synthetic data generation describes the process of learning the underlying distribution of a given real dataset in a model, which is, in turn, sampled to produce new data objects still adhering to the original distribution. This approach often finds application where circumstances limit the availability or usability of real-world datasets, for instance, in health care due to privacy concerns. While image synthesis has received much attention in the past, time series are key for many practical (e.g., industrial) applications. To date, numerous different generative models and measures to evaluate time series syntheses have been proposed. However, regarding the defining features of high-quality synthetic time series and how to quantify quality, no consensus has yet been reached among researchers. Hence, we propose a comprehensive survey on evaluation measures for time series generation to assist users in evaluating synthetic time series. For one, we provide brief descriptions or, where applicable, precise definitions. Further, we order the measures in a taxonomy and examine applicability and usage. To assist in the selection of the most appropriate measures, we provide a concise guide for fast lookup. Notably, our findings reveal a lack of a universally accepted approach for an evaluation procedure, including the selection of appropriate measures. We believe this situation hinders progress and may even erode evaluation standards to a “do as you like” approach to synthetic data evaluation. Therefore, this survey is a preliminary step to advance the field of synthetic data evaluation.
Citations: 0
Amharic spoken digits recognition using convolutional neural network
IF 8.1, Zone 2, Computer Science
Journal of Big Data Pub Date : 2024-05-04 DOI: 10.1186/s40537-024-00910-z
Tewodros Alemu Ayall, Changjun Zhou, Huawen Liu, Getnet Mezgebu Brhanemeskel, Solomon Teferra Abate, Michael Adjeisah
Abstract: Spoken digits recognition (SDR) is a type of supervised automatic speech recognition that is required in various human–machine interaction applications. It is utilized in phone-based services like dialing systems, certain bank operations, airline reservation systems, and price extraction. However, the design of SDR is a challenging task that requires the development of labeled audio data, the proper choice of feature extraction method, and the development of the best performing model. Although several works exist for various languages, such as English, Arabic, and Urdu, there is no Amharic spoken digits dataset (AmSDD) with which to build an Amharic spoken digits recognition (AmSDR) model for the Amharic language, which is the official working language of the government of Ethiopia. Therefore, in this study, we developed a new AmSDD that contains 12,000 utterances of the digits 0 (Zaero) to 9 (zet’enyi), recorded from 120 volunteer speakers of different age groups, genders, and dialects who repeated each digit ten times. Mel frequency cepstral coefficients (MFCCs) and Mel-Spectrogram feature extraction methods were used to extract trainable features from the speech signal. We conducted different experiments on the development of the AmSDR model using the AmSDD and classical supervised learning algorithms such as Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Random Forest (RF) as the baseline. To further improve the recognition performance of AmSDR, we propose a three-layer Convolutional Neural Network (CNN) architecture with batch normalization. The results of our experiments show that the proposed CNN model outperforms the baseline algorithms and scores an accuracy of 99% and 98% using MFCCs and Mel-Spectrogram features, respectively.
Citations: 0
Xai-driven knowledge distillation of large language models for efficient deployment on low-resource devices
IF 8.1, Zone 2, Computer Science
Journal of Big Data Pub Date : 2024-05-04 DOI: 10.1186/s40537-024-00928-3
Riccardo Cantini, Alessio Orsino, Domenico Talia
Abstract: Large Language Models (LLMs) are characterized by their inherent memory inefficiency and compute-intensive nature, making them impractical to run on low-resource devices and hindering their applicability in edge AI contexts. To address this issue, Knowledge Distillation approaches have been adopted to transfer knowledge from a complex model, referred to as the teacher, to a more compact, computationally efficient one, known as the student. The aim is to retain the performance of the original model while substantially reducing computational requirements. However, traditional knowledge distillation methods may struggle to effectively transfer crucial explainable knowledge from an LLM teacher to the student, potentially leading to explanation inconsistencies and decreased performance. This paper presents DiXtill, a method based on a novel approach to distilling knowledge from LLMs into lightweight neural architectures. The main idea is to leverage local explanations provided by an eXplainable Artificial Intelligence (XAI) method to guide the cross-architecture distillation of a teacher LLM into a self-explainable student, specifically a bi-directional LSTM network. Experimental results show that our XAI-driven distillation method allows the teacher explanations to be effectively transferred to the student, resulting in better agreement compared to classical distillation methods, thus enhancing the student interpretability. Furthermore, it enables the student to achieve comparable performance to the teacher LLM while also delivering a significantly higher compression ratio and speedup compared to other techniques such as post-training quantization and pruning, which paves the way for more efficient and sustainable edge AI applications.
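The teacher-to-student transfer that classical knowledge distillation performs can be sketched as a soft-label loss. The Python below shows only that classical component, under stated assumptions: it is not DiXtill, whose explanation-guided term is not specified in this abstract, and the function names, the temperature value, and the logits are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: List[float],
                      student_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL divergence from the student's softened distribution to the teacher's.

    The T^2 factor is the usual scaling that keeps gradient magnitudes
    comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # teacher soft labels
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical logits for a three-class problem.
loss_match = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
loss_off = distillation_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
```

A student that reproduces the teacher's logits incurs zero loss, and the loss grows as the two distributions diverge. In practice this term is usually combined with a standard supervised loss on the true labels.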
Citations: 0
An improved deep hashing model for image retrieval with binary code similarities
IF 8.1, Zone 2, Computer Science
Journal of Big Data Pub Date : 2024-04-18 DOI: 10.1186/s40537-024-00919-4
Huawen Liu, Zongda Wu, Minghao Yin, Donghua Yu, Xinzhong Zhu, Jungang Lou
Abstract: The exponential growth of data raises an unprecedented challenge in data analysis: how to retrieve interesting information from such large-scale data. Hash learning is a promising solution to address this challenge, because it may bring many potential advantages, such as extremely high efficiency and low storage cost, after projecting high-dimensional data to compact binary codes. However, traditional hash learning algorithms often suffer from the problem of semantic inconsistency, where images with similar semantic features may have different binary codes. In this paper, we propose a novel end-to-end deep hashing method based on the similarities of binary codes, dubbed CSDH (Code Similarity-based Deep Hashing), for image retrieval. Specifically, it extracts deep features from images to capture semantic information using a pre-trained deep convolutional neural network. Additionally, a hidden and fully connected layer is attached at the end of the deep network to derive hash bits by virtue of an activation function. To preserve the semantic consistency of images, a loss function has been introduced. It takes the label similarities, as well as the Hamming embedding distances, into consideration. By doing so, CSDH can learn more compact and powerful hash codes, which not only can preserve semantic similarity but also have small Hamming distances between similar images. To verify the effectiveness of CSDH, we evaluate CSDH on two public benchmark image collections, i.e., CIFAR-10 and NUS-WIDE, with five classic shallow hashing models and six popular deep hashing ones. The experimental results show that CSDH can achieve competitive performance to the popular deep hashing algorithms.
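Retrieval over binary hash codes comes down to ranking database items by Hamming distance to the query code. The sketch below illustrates only that retrieval step, assuming codes are stored as Python integers; it does not reproduce CSDH's network or loss function, and the function names and 4-bit codes are hypothetical.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two binary codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int,
                    database: List[Tuple[str, int]]) -> List[str]:
    """Return item names ordered from closest to farthest code."""
    ordered = sorted(database, key=lambda item: hamming(query, item[1]))
    return [name for name, _ in ordered]


# Hypothetical 4-bit codes; real hash codes are typically much longer.
database = [("img_a", 0b1010), ("img_b", 0b0100), ("img_c", 0b1011)]
ranking = rank_by_hamming(0b1011, database)
```

XOR followed by a bit count is the reason binary codes are cheap to compare, which is the efficiency advantage the abstract attributes to hash learning.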
Citations: 0
Revisiting the potential value of vital signs in the real-time prediction of mortality risk in intensive care unit patients
IF 8.1, Zone 2, Computer Science
Journal of Big Data Pub Date : 2024-04-18 DOI: 10.1186/s40537-024-00896-8
Pan Pan, Yue Wang, Chang Liu, Yanhui Tu, Haibo Cheng, Qingyun Yang, Fei Xie, Yuan Li, Lixin Xie, Yuhong Liu
Abstract:
Background: Predicting patient mortality risk facilitates early intervention in intensive care unit (ICU) patients at greater risk of disease progression. This study applies machine learning methods to multidimensional clinical data to dynamically predict mortality risk in ICU patients.
Methods: A total of 33,798 patients were collected from the MIMIC-III database. An integrated model, NIMRF (Network Integrating Memory Module and Random Forest), based on multidimensional variables such as vital sign variables and laboratory variables, was developed to predict the risk of death for ICU patients in four nonoverlapping time windows of 0–1 h, 1–3 h, 3–6 h, and 6–12 h. Mortality risk in four nonoverlapping time windows of 12 h was externally validated on data from 889 patients in the respiratory critical care unit of the Chinese PLA General Hospital and compared with LSTM, random forest, and time-dependent Cox regression (survival analysis) methods. We also interpret the developed model to obtain important factors for predicting mortality risk across time windows. The code can be found at https://github.com/wyuexiao/NIMRF.
Results: The NIMRF model developed in this study could predict the risk of death in four nonoverlapping time windows (0–1 h, 1–3 h, 3–6 h, 6–12 h) after any time point in ICU patients. Internal data validation suggests that the model is more accurate than LSTM, random forest prediction, and the time-dependent Cox regression model (area under the receiver operating characteristic curve, or AUC, 0–1 h: 0.8015 [95% CI 0.7725–0.8304] vs. 0.7144 [95% CI 0.6824–0.7464] vs. 0.7606 [95% CI 0.7300–0.7913] vs. 0.3867 [95% CI 0.3573–0.4161]; 1–3 h: 0.7100 [95% CI 0.6777–0.7423] vs. 0.6389 [95% CI 0.6055–0.6723] vs. 0.6992 [95% CI 0.6667–0.7318] vs. 0.3854 [95% CI 0.3559–0.4150]; 3–6 h: 0.6760 [95% CI 0.6425–0.7097] vs. 0.5964 [95% CI 0.5622–0.6306] vs. 0.6760 [95% CI 0.6427–0.7099] vs. 0.3967 [95% CI 0.3662–0.4271]; 6–12 h: 0.6380 [0.6031–0.6729] vs. 0.6032 [0.5705–0.6406] vs. 0.6055 [0.5682–0.6383] vs. 0.4023 [95% CI 0.3709–0.4337]). External validation was performed on the data of patients in the respiratory critical care unit of the Chinese PLA General Hospital. Compared with LSTM, random forest, and the time-dependent Cox regression model, the NIMRF model was still the best, with an AUC of 0.9366 [95% CI 0.9157–0.9575] for predicting death risk in 0–1 h. The corresponding AUCs of LSTM, random forest, and the time-dependent Cox regression model were 0.9263 [95% CI 0.9039–0.9486], 0.7437 [95% CI 0.7083–0.7791], and 0.2447 [95% CI 0.2202–0.2692], respectively. Interpretation of the model revealed that vital signs (systolic blood pressure, heart rate, diastolic blood pressure, respiratory rate, and body temperature) were highly correlated with events of death.
Citations: 0
Enhancing academic performance prediction with temporal graph networks for massive open online courses
IF 8.1, Zone 2, Computer Science
Journal of Big Data Pub Date : 2024-04-13 DOI: 10.1186/s40537-024-00918-5
Qionghao Huang, Jili Chen
Abstract: Educational big data significantly impacts education, and Massive Open Online Courses (MOOCs), a crucial learning approach, have evolved to be more intelligent with these technologies. Deep neural networks have significantly advanced a crucial task within MOOCs: predicting student academic performance. However, most deep learning-based methods ignore the temporal information and interaction behaviors that arise during learning activities, even though these can effectively enhance a model’s predictive accuracy. To tackle this, we formulate the learning processes of e-learning students as dynamic temporal graphs that encode the temporal information and interaction behaviors during their studying. We propose a novel academic performance prediction model (APP-TGN) based on temporal graph neural networks. Specifically, in APP-TGN, a dynamic graph is constructed from online learning activity logs. A temporal graph network with low-high filters learns potential academic performance variations encoded in dynamic graphs. Furthermore, a global sampling module is developed to mitigate the problem of false correlations in deep learning-based models. Finally, multi-head attention is utilized for predicting academic outcomes. Extensive experiments are conducted on a well-known public dataset. The experimental results indicate that APP-TGN significantly surpasses existing methods and demonstrates excellent potential in automated feedback and personalized learning.
Citations: 0
The differences in gastric cancer epidemiological data between SEER and GBD: a joinpoint and age-period-cohort analysis
IF 8.1, Zone 2, Computer Science
Journal of Big Data Pub Date : 2024-04-13 DOI: 10.1186/s40537-024-00907-8
Zenghong Wu, Kun Zhang, Weijun Wang, Mengke Fan, Rong Lin
Abstract:
Background: The worldwide burden of gastric cancer (GC) should be further clarified to help us understand the current situation of GC.
Methods: In the present study, we estimated disability-adjusted life-years (DALYs) and mortality rates attributable to several major GC risk factors, including smoking, dietary risk, and behavioral risk. In addition, we evaluated the incidence rate and trends of incidence-based mortality (IBM) due to GC in the United States (US) during 1992–2018.
Results: Globally, GC incidences increased from 883,395 in 1990 to 1,269,805 in 2019, while GC-associated mortality increased from 788,316 in 1990 to 957,185 in 2019. In 2019, the age-standardized rate (ASR) of GC varied around the world, with Mongolia having the highest observed ASR (43.7 per 100,000), followed by Bolivia (34 per 100,000) and China (30.6 per 100,000). A negative association was found between estimated annual percentage change (EAPC) and ASR (age-standardized incidence rate (ASIR): r = −0.28, p < 0.001; age-standardized death rate (ASDR): r = −0.19, p = 0.005). There were 74,966 incidences of GC and 69,374 GC-related deaths recorded between 1992 and 2018. The significant decrease in GC incidences as well as decreasing trends in IBM of GC were first detected in 1994. The GC IBM significantly increased at a rate of 35%/y from 1992 to 1994 (95% CI 21.2% to 50.4%/y), and then began to decrease at a rate of −1.4%/y from 1994 to 2018 (95% CI −1.6% to −1.2%/y).
Conclusion: These findings mirror the global disease burden of GC and are important for development of targeted prevention strategies.
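The estimated annual percentage change (EAPC) mentioned in the results is conventionally obtained by regressing the natural logarithm of the rate on calendar year and converting the slope b to a percentage, EAPC = 100 × (exp(b) − 1). The sketch below assumes that standard definition; the abstract does not state the exact procedure used, the function name eapc is hypothetical, and the series is synthetic rather than SEER or GBD data.

```python
import math
from typing import List


def eapc(years: List[int], rates: List[float]) -> float:
    """EAPC in percent from a least-squares fit of ln(rate) = a + b * year."""
    n = len(years)
    logs = [math.log(r) for r in rates]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx  # b: average yearly change in ln(rate)
    return 100.0 * (math.exp(slope) - 1.0)


# Synthetic series that falls by exactly 2% per year (illustration only).
years = list(range(2000, 2006))
rates = [20.0 * 0.98 ** k for k in range(len(years))]
trend = eapc(years, rates)
```

A negative EAPC indicates a declining rate. Joinpoint analysis extends this idea by fitting separate log-linear segments between estimated change points, such as the 1994 change point reported in the results.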
Citations: 0