PeerJ Computer Science: Latest Articles

Siamese meta-learning network for social disputes based on multi-head attention.
IF 3.5 · Zone 4 · Computer Science
PeerJ Computer Science Pub Date : 2025-06-04 eCollection Date: 2025-01-01 DOI: 10.7717/peerj-cs.2910
Jing Wang, Rui Zhang, Huijian Han, Yuxiang Liu, Zhaoxing Peng
Abstract: Few-shot learning is widely used where labeled data is scarce, and meta-learning-based few-shot classifiers such as the Siamese network are a common choice. Although the Siamese network has achieved good results in some applications, several problems remain: (1) when prototype vectors are computed with external knowledge of class labels, the result depends on the quality and correctness of those labels; (2) the Siamese network does not adequately capture long-distance dependencies in the data; (3) when the data is complex or the samples are imbalanced, the Siamese network does not achieve its best performance. This article therefore proposes a multi-head attention Siamese meta-learning network (MASM). Specifically, it uses synonym substitution to reduce the excessive dependence of prototype-vector computation on class labels. In addition, it uses the multi-head attention mechanism, exploiting its global perception capability, to capture long-distance dependencies, which further improves model performance. Experiments on four benchmark datasets all achieved good performance. The model was also applied for the first time to the field of social disputes, and experiments on a privately built dispute dataset likewise achieved good results.
Volume 11, article e2910. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12192778/pdf/
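To illustrate the "global perception" that the abstract attributes to multi-head attention, here is a minimal pure-Python sketch of scaled dot-product self-attention split across heads. It is not the MASM architecture: the learned query/key/value projections are omitted, and the function names and the head-splitting scheme are assumptions made for this example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention; every query attends to every key."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def multi_head_attention(x, num_heads):
    """Split the feature dimension into heads, attend per head, concatenate.

    Assumption: no learned projections, so each head simply sees a slice of x.
    """
    d = len(x[0])
    assert d % num_heads == 0, "feature size must be divisible by num_heads"
    size = d // num_heads
    heads = []
    for h in range(num_heads):
        part = [row[h * size:(h + 1) * size] for row in x]
        heads.append(attention(part, part, part))
    # Concatenate the per-head outputs for each position.
    return [sum((heads[h][i] for h in range(num_heads)), []) for i in range(len(x))]


tokens = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
out = multi_head_attention(tokens, num_heads=2)
print(len(out), len(out[0]))  # 3 4
```

Because the attention weights for each position are computed against all positions at once, distance in the sequence plays no role in which tokens can influence each other.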
Citations: 0
Cloud-based real-time enhancement for disease prediction using Confluent Cloud, Apache Kafka, feature optimization, and explainable artificial intelligence.
IF 3.5 · Zone 4 · Computer Science
PeerJ Computer Science Pub Date : 2025-06-04 eCollection Date: 2025-01-01 DOI: 10.7717/peerj-cs.2899
Abdulaziz AlMohimeed
Abstract: In recent years, Internet of Things (IoT)-based technologies have advanced healthcare by facilitating the development of monitoring systems, which in turn generate an exponentially growing amount of streaming data. This streaming data can be preprocessed and analyzed using technologies that integrate ensemble models, explainable artificial intelligence (XAI), feature selection (FS) methods, and big data stream-processing platforms to develop predictive real-time systems. This integration adds new value to healthcare, helping organizations enhance clinical decision-making, improve patient care, and elevate the overall quality of healthcare. This article presents a real-time system for the early detection and treatment of chronic kidney disease (CKD) using a real-world simulation application. The system is developed in two phases. The first phase proposes a stacking model, applies a genetic algorithm (GA) and particle swarm optimization (PSO) for feature selection, and examines the stacking model with the best features using XAI. The best model with the best-optimized features is then used in the second phase. The results showed that the stacking model with GA achieved the highest performance, with 100% accuracy, 100% precision, 100% recall, and a 100% F1-score. The second phase is built on Confluent Cloud, which offers several benefits for creating a real-time streaming system based on Apache Kafka and provides multiple APIs, including the Producer API and the Consumer API for data producers and consumers, respectively. Python scripts are developed to pipeline the streaming data. The first script generates streaming health attributes and pushes them into a Kafka topic. The second script consumes health attributes from a Kafka topic and applies the stacking model to predict CKD in real time. The results showed that the stacking model with features selected by GA recorded the best performance, with 100% accuracy. The pipeline's streaming steps validated the approach's effectiveness in real time, leveraging Confluent Cloud and Apache Kafka.
Volume 11, article e2899. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12192947/pdf/
Citations: 0
Hybrid decision support system disaster management: application of lattice ordered q-rung linear Diophantine fuzzy hypersoft sets.
IF 3.5 · Zone 4 · Computer Science
PeerJ Computer Science Pub Date : 2025-06-03 eCollection Date: 2025-01-01 DOI: 10.7717/peerj-cs.2927
J Vimala, A N Surya, Nasreen Kausar, Dragan Pamucar, Seifedine Kadry, Jungeun Kim
Abstract: The lattice-ordered q-rung linear Diophantine fuzzy hypersoft set is a significant extension of fuzzy set theory. This study describes many of its fundamental algebraic operations, such as restricted union, extended union, restricted intersection, the OR operation, and the AND operation, along with examples. An algorithm based on the proposed operations is then presented for handling multi-attribute decision-making problems, together with an illustrative example in disaster management that helps choose the most appropriate plan for tackling a known natural disaster by considering a larger number of attributes together. Finally, the method's contribution to the disaster management field is presented through a comparative analysis that covers computational efficiency and scalability and compares the proposed method with existing decision-making methods to show its advantages.
Volume 11, article e2927. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12193505/pdf/
Citations: 0
Multi-objective federated learning traffic prediction in vehicular network for intelligent transportation system.
IF 3.5 · Zone 4 · Computer Science
PeerJ Computer Science Pub Date : 2025-06-03 eCollection Date: 2025-01-01 DOI: 10.7717/peerj-cs.2922
Arulmurgan Aalavanthar, Famila S, Shanmugam Sundaramurthy, Stefano Cirillo, Giandomenico Solimando, Giuseppe Polese
Abstract: The spatial-temporal data of future freight traffic speed in a metropolitan region must be properly understood in order to develop freight-related traffic management strategies. This work introduces a new approach to traffic prediction using multi-objective federated learning. Instead of relying on a centralized cloud server for data processing, collaborative training is carried out among several participants. The proposed method draws on the advantages of reinforcement learning in dynamic decision-making scenarios and the expressive capabilities of graphical models to identify traffic intensity. Furthermore, a new methodology integrates federated learning concepts with multi-objective optimization to forecast traffic patterns accurately. The proposed approach outperforms existing methods for estimating traffic speed. It achieves a communication delay of 23.4%, a packet delivery ratio (PDR) of 92.45%, a packet loss rate of 12.34%, a prediction accuracy of 97.45%, and a resource utilization of 89.56%. The visualization findings demonstrate that the new approach successfully captures the interconnections of metropolitan areas in different neighboring cities.
Volume 11, article e2922. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12192921/pdf/
Citations: 0
RA-QoS: a robust autoencoder-based QoS predictor for highly accurate web service QoS prediction.
IF 3.5 · Zone 4 · Computer Science
PeerJ Computer Science Pub Date : 2025-06-02 eCollection Date: 2025-01-01 DOI: 10.7717/peerj-cs.2928
Shun Fu, Junnan Li, Lufeng Wang
Abstract: Web services are fundamental to online service-oriented applications, where accurately predicting quality of service (QoS) is critical for recommending the optimal service among multiple candidates. QoS data often contains noise stemming from factors such as remote user or service locations, so current deep neural network (DNN)-based QoS predictors, which generally rely on L2-norm loss functions, have limited robustness because of their sensitivity to outliers. To address this issue, we propose a novel robust autoencoder-based QoS predictor (RA-QoS) that leverages a hybrid loss function combining bias, training bias, L1-norm, and L2-norm to build a robust autoencoder. This hybrid approach allows RA-QoS to better handle noisy data, minimizing the impact of outliers and biases on prediction accuracy. The RA-QoS model further incorporates preprocessing and training biases, improving its adaptability to real-world QoS data. To evaluate the proposed predictor, extensive experiments are conducted on two real-world QoS datasets. The results demonstrate that RA-QoS is more robust to outliers and more accurate in QoS prediction than related state-of-the-art models.
Volume 11, article e2928. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12193467/pdf/
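As a rough illustration of why blending L1 and L2 terms can dampen the influence of outliers, the sketch below compares a pure L2 loss with a weighted L1/L2 mix on toy data. The function names and the mixing weight `alpha` are assumptions made for this example; the abstract does not say how RA-QoS weights its loss terms or how the bias terms enter.

```python
def l2_loss(pred, target):
    """Mean squared error: grows quadratically with each residual."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def l1_loss(pred, target):
    """Mean absolute error: grows only linearly with each residual."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def hybrid_loss(pred, target, alpha=0.5):
    """Weighted blend of L1 and L2; alpha is a hypothetical mixing weight."""
    return alpha * l1_loss(pred, target) + (1 - alpha) * l2_loss(pred, target)


pred = [1.0, 2.0, 3.0, 4.0]
noisy = [1.0, 2.0, 3.0, 14.0]  # toy observations; the last one is an outlier

print(l2_loss(pred, noisy))      # 25.0
print(hybrid_loss(pred, noisy))  # 13.75
```

The single outlier contributes 25.0 to the pure L2 loss but only 13.75 to the blend, so gradient updates driven by the hybrid loss are pulled less strongly toward the noisy point.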
Citations: 0
Enhanced futures price-spread forecasting based on an attention-driven optimized LSTM network: integrating an improved grey wolf optimizer algorithm for enhanced accuracy.
IF 3.5 · Zone 4 · Computer Science
PeerJ Computer Science Pub Date : 2025-06-02 eCollection Date: 2025-01-01 DOI: 10.7717/peerj-cs.2865
Yongli Tang, Zhenlun Gao, Zhongqi Cai, Jinxia Yu, Panke Qin
Abstract: Financial market prediction faces significant challenges due to the complex temporal dependencies and heterogeneous data relationships inherent in futures price-spread data. Traditional machine learning methods struggle to mine these patterns effectively, while conventional long short-term memory (LSTM) models lack focused feature prioritization and suffer from suboptimal hyperparameter selection. This article proposes the Improved Grey Wolf Optimizer with Multi-headed Self-attention and LSTM (IGML) model, which integrates a multi-head self-attention mechanism to enhance feature interaction and introduces an improved grey wolf optimizer (IGWO) with four strategic enhancements for automated hyperparameter tuning. Benchmark tests on optimization problems validate IGWO's superior convergence efficiency. Evaluated on real futures price-spread datasets, IGML reduces root mean square error (RMSE) and mean absolute error (MAE) by up to 88% and 85%, respectively, compared to baseline models, demonstrating its practical efficacy in capturing intricate financial market dynamics.
Volume 11, article e2865. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12192826/pdf/
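For readers unfamiliar with how RMSE and MAE reductions of this kind are expressed, the following sketch computes both metrics and the relative reduction between a baseline and an improved forecast. All numbers are toy values chosen for easy arithmetic, not data from the paper, and `relative_reduction` is a helper name assumed for this example.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


y_true = [10.0, 12.0, 14.0, 16.0]
baseline_pred = [12.0, 14.0, 16.0, 18.0]   # every forecast is off by 2.0
improved_pred = [10.5, 12.5, 14.5, 16.5]   # every forecast is off by 0.5

print(rmse(y_true, baseline_pred), rmse(y_true, improved_pred))  # 2.0 0.5
print(relative_reduction(rmse(y_true, baseline_pred),
                         rmse(y_true, improved_pred)))           # 0.75
```

Here the improved forecast cuts RMSE by 75% relative to the baseline; the same helper applies to MAE.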
Citations: 0
VBM-YOLO: an enhanced YOLO model with reduced information loss for vehicle body markers detection.
IF 3.5 · Zone 4 · Computer Science
PeerJ Computer Science Pub Date : 2025-06-02 eCollection Date: 2025-01-01 DOI: 10.7717/peerj-cs.2932
Bin Wang, Chao Li, Chao Zhou, Jun Sun
Abstract: In vehicle safety detection, the accurate identification of body markers on medium and large vehicles plays a critical role in ensuring safe road travel. To address the loss of feature and gradient information in previous You Only Look Once (YOLO) series models, a novel Vehicle Body Markers YOLO (VBM-YOLO) model has been designed. First, the model integrates the cross-spatial-channel attention (CSCA) mechanism proposed in this study. CSCA uses cross-dimensional information to address interaction issues during the fusion of spatial and channel dimensions, significantly enhancing the model's representational capacity. Second, we propose a multi-scale selective feature pyramid network (MSSFPN). Through progressive fusion and multi-scale feature selection learning, MSSFPN alleviates the feature loss and target-layer information confusion caused by traditional top-down and bottom-up feature pyramids. Finally, an auxiliary gradient branch (AGB) is proposed. During training, AGB incorporates feature information from different target layers to help the current layer retain complete gradient information. The AGB does not participate in model inference, so it adds no inference overhead. Experimental results demonstrate that VBM-YOLO improves mean average precision (mAP) by 2.3% and 4.3% at intersection over union (IoU) thresholds of 0.5 and 0.5:0.95, respectively, compared to YOLOv8s on the vehicle body markers dataset. VBM-YOLO also achieves a better balance between accuracy and computational resources than other mainstream models and generalizes well to public datasets such as PASCAL VOC and D-Fire.
Volume 11, article e2932. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12193416/pdf/
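The IoU thresholds quoted in the abstract decide when a predicted box counts as a correct detection. A minimal sketch of the IoU computation follows, assuming axis-aligned boxes given as `(x1, y1, x2, y2)`; the box coordinates are toy values, and the threshold grid of 0.50 to 0.95 in steps of 0.05 follows the common COCO-style reading of "0.5:0.95".

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Thresholds 0.50, 0.55, ..., 0.95 (rounded to avoid float drift).
thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]

ground_truth = (0, 0, 2, 2)
prediction = (0, 0, 2, 1)       # covers the top half of the ground truth

score = iou(ground_truth, prediction)
print(score)                                   # 0.5
print(sum(score >= t for t in thresholds))     # 1
```

A prediction with IoU 0.5 is a match only at the loosest threshold, which is why scores averaged over 0.5:0.95 are stricter than scores at 0.5 alone.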
Citations: 0
Anime popularity prediction before huge investments: a multimodal approach using deep learning.
IF 3.5 · Zone 4 · Computer Science
PeerJ Computer Science Pub Date : 2025-06-02 eCollection Date: 2025-01-01 DOI: 10.7717/peerj-cs.2715
Jesús Armenta-Segura, Grigori Sidorov
Abstract: In the Japanese anime industry, predicting whether an upcoming product will be popular is crucial. This article introduces one of the most comprehensive free datasets for predicting anime popularity using only features accessible before huge investments, relying solely on freely available internet data and adhering to rigorous standards based on real-life experiences. To explore this dataset and its potential, a deep neural network architecture incorporating GPT-2 and ResNet-50 is proposed. The model achieved a best mean squared error (MSE) of 0.012, significantly better than the 0.415 of a benchmark using traditional methods, and a best R-squared (R²) score of 0.142, outperforming the benchmark's -37.591. The aim of this study is to explore the scope and impact of features available before huge investments in relation to anime popularity. For that reason, and to complement the MSE and R² metrics, Pearson and Spearman correlation coefficients are used. The best results, with Pearson at 0.382 and Spearman at 0.362, along with well-fitted learning curves, suggest that while these features are relevant, they are not decisive for determining anime popularity and likely interact with additional features that become accessible after further investment. This is one of the first multimodal approaches to this kind of task, aiming to support the entertainment industry by helping to avoid financial failures and guide successful production strategies.
Volume 11, article e2715. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12190294/pdf/
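The abstract reports R², Pearson, and Spearman side by side, and a strongly negative R² such as the benchmark's can look puzzling next to positive correlations. The sketch below, using toy numbers unrelated to the paper, shows how a prediction can be perfectly correlated with the target and still have a very negative R² when it is badly offset. The `ranks` helper assumes no tied values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; assumes all values are distinct (no tie handling)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def r_squared(y_true, y_pred):
    """Coefficient of determination; negative when worse than predicting the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0]
y_pred = [11.0, 12.0, 13.0]      # right ordering and spacing, wrong offset

print(pearson(y_true, y_pred))    # 1.0
print(r_squared(y_true, y_pred))  # -149.0
print(spearman([1, 2, 3, 4], [1, 4, 9, 100]))  # 1.0
```

Correlation measures whether predictions move with the target, while R² also penalizes scale and offset errors, so the two families of metrics answer different questions.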
Citations: 0
TARGE: large language model-powered explainable hate speech detection.
IF 3.5 · Zone 4 · Computer Science
PeerJ Computer Science Pub Date : 2025-05-30 eCollection Date: 2025-01-01 DOI: 10.7717/peerj-cs.2911
Muhammad Haseeb Hashir, Memoona, Sung Won Kim
Abstract: The proliferation of user-generated content on social networking sites has intensified the challenge of accurately and efficiently detecting inflammatory and discriminatory speech at scale. Traditional manual moderation methods are impractical due to the sheer volume and complexity of online discourse, necessitating automated solutions. However, existing deep learning models for hate speech detection typically function as black-box systems, providing binary classifications without interpretable insights into their decision-making processes. This opacity significantly limits their practical utility, particularly in nuanced content moderation tasks. To address this challenge, our research explores leveraging the advanced reasoning and knowledge integration capabilities of state-of-the-art language models, specifically Mistral-7B, to develop transparent hate speech detection systems. We introduce a novel framework wherein large language models (LLMs) generate explicit rationales by identifying and analyzing critical textual features indicative of hate speech. These rationales are subsequently integrated into specialized classifiers designed to perform explainable content moderation. We rigorously evaluate our methodology on multiple benchmark English-language social media datasets. Results demonstrate that incorporating LLM-generated explanations significantly enhances both the interpretability and accuracy of hate speech detection. This approach not only identifies problematic content effectively but also clearly articulates the analytical rationale behind each decision, fulfilling the critical demand for transparency in automated content moderation.
Volume 11, article e2911. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12192871/pdf/
Citations: 0
Evaluation of unsupervised static topic models' emergence detection ability.
IF 3.5 · Zone 4 · Computer Science
PeerJ Computer Science Pub Date : 2025-05-30 eCollection Date: 2025-01-01 DOI: 10.7717/peerj-cs.2875
Xue Li, Ciro D Esposito, Paul Groth, Jonathan Sitruk, Balazs Szatmari, Nachoem Wijnberg
Abstract: Detecting emerging topics is crucial for understanding research trends, technological advancements, and shifts in public discourse. While unsupervised topic modeling techniques such as latent Dirichlet allocation (LDA), BERTopic, and CoWords clustering are widely used for topic extraction, their ability to retrospectively detect emerging topics without relying on ground truth labels has not been systematically compared. This gap largely stems from the lack of a dedicated evaluation metric for measuring emergence detection. In this study, we introduce a quantitative evaluation metric to assess the effectiveness of topic models in detecting emerging topics. We evaluate three topic modeling approaches using both qualitative analysis and our proposed emergence detection metric. Our results indicate that, qualitatively, CoWords identifies emerging topics earlier than LDA and BERTopic. Quantitatively, our evaluation metric demonstrates that LDA achieves an average F1 score of 80.6% in emergence detection, outperforming BERTopic by 24.0%. These findings highlight the strengths and limitations of different topic models for emergence detection, while our proposed metric provides a robust framework for future benchmarking in this area.
Volume 11, article e2875. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12192802/pdf/
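The F1 scores quoted above combine precision and recall. The sketch below computes the standard set-based F1 for a list of detected topics against a reference list; it is not the paper's proposed emergence detection metric, whose definition the abstract does not give, and the topic names are hypothetical placeholders.

```python
def precision_recall_f1(predicted, actual):
    """Standard precision, recall, and F1 over two collections of labels."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical topics flagged as emerging by a model vs. a reference list.
detected = {"topic_a", "topic_b", "topic_c", "topic_d"}
reference = {"topic_a", "topic_b", "topic_c", "topic_e", "topic_f"}

p, r, f1 = precision_recall_f1(detected, reference)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.75 0.6 0.667
```

Three of the four detected topics are correct (precision 0.75) and three of the five reference topics are found (recall 0.6), giving an F1 of about 0.667.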
Citations: 0