{"title":"Enhanced analysis of large-scale news text data using the bidirectional-Kmeans-LSTM-CNN model","authors":"Qingxiang Zeng","doi":"10.7717/peerj-cs.2213","DOIUrl":"https://doi.org/10.7717/peerj-cs.2213","url":null,"abstract":"Traditional methods may be inefficient when processing large-scale data in the field of text mining, often struggling to identify and cluster relevant information accurately and efficiently. Additionally, capturing nuanced sentiment and emotional context within news text is challenging with conventional techniques. To address these issues, this article introduces an improved bidirectional-Kmeans-long short-term memory network-convolutional neural network (BiK-LSTM-CNN) model that incorporates emotional semantic analysis for high-dimensional news text visual extraction and media hotspot mining. The BiK-LSTM-CNN model comprises four modules: news text preprocessing, news text clustering, sentiment semantic analysis, and the BiK-LSTM-CNN model itself. By combining these components, the model effectively identifies common features within the input data, clusters similar news articles, and accurately analyzes the emotional semantics of the text. This comprehensive approach enhances both the accuracy and efficiency of visual extraction and hotspot mining. Experimental results demonstrate that compared to models such as Transformer, AdvLSTM, and NewRNN, BiK-LSTM-CNN achieves improvements in macro accuracy by 0.50%, 0.91%, and 1.34%, respectively. Similarly, macro recall rates increase by 0.51%, 1.24%, and 1.26%, while macro F1 scores improve by 0.52%, 1.23%, and 1.92%. Additionally, the BiK-LSTM-CNN model shows significant improvements in time efficiency, further establishing its potential as a more effective approach for processing and analyzing large-scale text data","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141880736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating international Chinese visualization teaching and vocational skills training: leveraging attention-connectionist temporal classification models","authors":"Yuan Yao, Zhujun Dai, Muhammad Shahbaz","doi":"10.7717/peerj-cs.2223","DOIUrl":"https://doi.org/10.7717/peerj-cs.2223","url":null,"abstract":"The teaching of Chinese as a second language has become increasingly crucial for promoting cross-cultural exchange and mutual learning worldwide. However, traditional approaches to international Chinese language teaching have limitations that hinder their effectiveness, such as outdated teaching materials, lack of qualified instructors, and limited access to learning facilities. To overcome these challenges, it is imperative to develop intelligent and visually engaging methods for teaching international Chinese language learners. In this article, we propose leveraging speech recognition technology within artificial intelligence to create an oral assistance platform that provides visualized pinyin-formatted feedback to learners. Additionally, this system can identify accent errors and provide vocational skills training to improve learners’ communication abilities. To achieve this, we propose the Attention-Connectionist Temporal Classification (CTC) model, which utilizes a specific temporal convolutional neural network to capture the location information necessary for accurate speech recognition. Our experimental results demonstrate that this model outperforms similar approaches, with significant reductions in error rates for both validation and test sets, compared with the original Attention model, Claim, Evidence, Reasoning (CER) is reduced by 0.67%. Overall, our proposed approach has significant potential for enhancing the efficiency and effectiveness of vocational skills training for international Chinese language learners.","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141867365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Wireless sensor localization based on distance optimization and assistance by mobile anchor nodes: a novel algorithm","authors":"Hui Yang","doi":"10.7717/peerj-cs.2179","DOIUrl":"https://doi.org/10.7717/peerj-cs.2179","url":null,"abstract":"Wireless sensor networks (WSNs) have wide applications in healthcare, environmental monitoring, and target tracking, relying on sensor nodes that are joined cooperatively. The research investigates localization algorithms for both target and node in WSNs to enhance accuracy. An innovative localization algorithm characterized as an asynchronous time-of-arrival (TOA) target is proposed by implementing a differential evolution algorithm. Unlike available approaches, the proposed algorithm employs the least squares criterion to represent signal-sending time as a function of the target position. The target node’s coordinates are estimated by utilizing a differential evolution algorithm with reverse learning and adaptive redirection. A hybrid received signal strength (RSS)-TOA target localization algorithm is introduced, addressing the challenge of unknown transmission parameters. This algorithm simultaneously estimates transmitted power, path loss index, and target position by employing the RSS and TOA measurements. These proposed algorithms improve the accuracy and efficiency of wireless sensor localization, boosting performance in various WSN applications.","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141867354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kangning Du, Zhen Wang, Lin Cao, Yanan Guo, Shu Tian, Fan Zhang
{"title":"HCGAN: hierarchical contrast generative adversarial network for unpaired sketch face synthesis","authors":"Kangning Du, Zhen Wang, Lin Cao, Yanan Guo, Shu Tian, Fan Zhang","doi":"10.7717/peerj-cs.2184","DOIUrl":"https://doi.org/10.7717/peerj-cs.2184","url":null,"abstract":"Transforming optical facial images into sketches while preserving realism and facial features poses a significant challenge. The current methods that rely on paired training data are costly and resource-intensive. Furthermore, they often fail to capture the intricate features of faces, resulting in substandard sketch generation. To address these challenges, we propose the novel hierarchical contrast generative adversarial network (HCGAN). Firstly, HCGAN consists of a global sketch synthesis module that generates sketches with well-defined global features and a local sketch refinement module that enhances the ability to extract features in critical areas. Secondly, we introduce local refinement loss based on the local sketch refinement module, refining sketches at a granular level. Finally, we propose an association strategy called “warmup-epoch” and local consistency loss between the two modules to ensure HCGAN is effectively optimized. Evaluations of the CUFS and SKSF-A datasets demonstrate that our method produces high-quality sketches and outperforms existing state-of-the-art methods in terms of fidelity and realism. Compared to the current state-of-the-art methods, HCGAN reduces FID by 12.6941, 4.9124, and 9.0316 on three datasets of CUFS, respectively, and by 7.4679 on the SKSF-A dataset. Additionally, it obtained optimal scores for content fidelity (CF), global effects (GE), and local patterns (LP). The proposed HCGAN model provides a promising solution for realistic sketch synthesis under unpaired data training.","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141867353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dong Wei, Wu Sun, Xiaofeng Zou, Dan Ma, Huarong Xu, Panfeng Chen, Chaoshu Yang, Mei Chen, Hui Li
{"title":"An anomaly detection model for multivariate time series with anomaly perception","authors":"Dong Wei, Wu Sun, Xiaofeng Zou, Dan Ma, Huarong Xu, Panfeng Chen, Chaoshu Yang, Mei Chen, Hui Li","doi":"10.7717/peerj-cs.2172","DOIUrl":"https://doi.org/10.7717/peerj-cs.2172","url":null,"abstract":"Multivariate time series anomaly detection is a crucial data mining technique with a wide range of applications in areas such as IT applications. Currently, the majority of anomaly detection methods for time series data rely on unsupervised approaches due to the rarity of anomaly labels. However, in real-world scenarios, obtaining a limited number of anomaly labels is feasible and affordable. Effective usage of these labels can offer valuable insights into the temporal characteristics of anomalies and play a pivotal role in guiding anomaly detection efforts. To improve the performance of multivariate time series anomaly detection, we proposed a novel deep learning model named EDD (Encoder-Decoder-Discriminator) that leverages limited anomaly samples. The EDD model innovatively integrates a graph attention network with long short term memory (LSTM) to extract spatial and temporal features from multivariate time series data. This integrated approach enables the model to capture complex patterns and dependencies within the data. Additionally, the model skillfully maps series data into a latent space, utilizing a carefully crafted loss function to cluster normal data tightly in the latent space while dispersing abnormal data randomly. This innovative design results in distinct probability distributions for normal and abnormal data in the latent space, enabling precise identification of anomalous data. To evaluate the performance of our EDD model, we conducted extensive experimental validation across three diverse datasets. The results demonstrate the significant superiority of our model in multivariate time series anomaly detection. Specifically, the average F1-Score of our model outperformed the second-best method by 2.7% and 73.4% in both evaluation approaches, respectively, highlighting its superior detection capabilities. These findings validate the effectiveness of our proposed EDD model in leveraging limited anomaly samples for accurate and robust anomaly detection in multivariate time series data.","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141867370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
María Novo-Lourés, Reyes Pavón, Rosalía Laza, José R. Méndez, David Ruano-Ordás
{"title":"An enhanced algorithm for semantic-based feature reduction in spam filtering","authors":"María Novo-Lourés, Reyes Pavón, Rosalía Laza, José R. Méndez, David Ruano-Ordás","doi":"10.7717/peerj-cs.2206","DOIUrl":"https://doi.org/10.7717/peerj-cs.2206","url":null,"abstract":"With the advent and improvement of ontological dictionaries (WordNet, Babelnet), the use of synsets-based text representations is gaining popularity in classification tasks. More recently, ontological dictionaries were used for reducing dimensionality in this kind of representation (e.g., Semantic Dimensionality Reduction System (SDRS) (Vélez de Mendizabal et al., 2020)). These approaches are based on the combination of semantically related columns by taking advantage of semantic information extracted from ontological dictionaries. Their main advantage is that they not only eliminate features but can also combine them, minimizing (low-loss) or avoiding (lossless) the loss of information. The most recent (and accurate) techniques included in this group are based on using evolutionary algorithms to find how many features can be grouped to reduce false positive (FP) and false negative (FN) errors obtained. The main limitation of these evolutionary-based schemes is the computational requirements derived from the use of optimization algorithms. The contribution of this study is a new lossless feature reduction scheme exploiting information from ontological dictionaries, which achieves slightly better accuracy (specially in FP errors) than optimization-based approaches but using far fewer computational resources. Instead of using computationally expensive evolutionary algorithms, our proposal determines whether two columns (synsets) can be combined by observing whether the instances included in a dataset (e.g., training dataset) containing these synsets are mostly of the same class. The study includes experiments using three datasets and a detailed comparison with two previous optimization-based approaches.","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141867286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A knowledge graph algorithm enabled deep recommendation system","authors":"Yan Wang, Xiao Feng Ma, Miao Zhu","doi":"10.7717/peerj-cs.2010","DOIUrl":"https://doi.org/10.7717/peerj-cs.2010","url":null,"abstract":"Personalized learning resource recommendations may help resolve the difficulties of online education that include learning mazes and information overload. However, existing personalized learning resource recommendation algorithms have shortcomings such as low accuracy and low efficiency. This study proposes a deep recommendation system algorithm based on a knowledge graph (D-KGR) that includes four data processing units. These units are the recommendation unit (RS unit), the knowledge graph feature representation unit (KGE unit), the cross compression unit (CC unit), and the feature extraction unit (FE unit). This model integrates technologies including the knowledge graph, deep learning, neural network, and data mining. It introduces cross compression in the feature learning process of the knowledge graph and predicts user attributes. Multimodal technology is used to optimize the process of project attribute processing; text type attributes, multivalued type attributes, and other type attributes are processed separately to reconstruct the knowledge graph. A convolutional neural network algorithm is introduced in the reconstruction process to optimize the data feature qualities. Experimental analysis was conducted from two aspects of algorithm efficiency and accuracy, and the particle swarm optimization, neural network, and knowledge graph algorithms were compared. Several tests showed that the deep recommendation system algorithm had obvious advantages when the number of learning resources and users exceeded 1,000. It has the ability to integrate systems such as the particle swarm optimization iterative classification, neural network intelligent simulation, and low resource consumption. It can quickly process massive amounts of information data, reduce algorithm complexity and requires less time and had lower costs. Our algorithm also has better efficiency and accuracy.","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141873064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time infectious disease endurance indicator system for scientific decisions using machine learning and rapid data processing","authors":"Shivendra Dubey, Dinesh Kumar Verma, Mahesh Kumar","doi":"10.7717/peerj-cs.2062","DOIUrl":"https://doi.org/10.7717/peerj-cs.2062","url":null,"abstract":"The SARS-CoV-2 virus, which induces an acute respiratory illness commonly referred to as COVID-19, had been designated as a pandemic by the World Health Organization due to its highly infectious nature and the associated public health risks it poses globally. Identifying the critical factors for predicting mortality is essential for improving patient therapy. Unlike other data types, such as computed tomography scans, x-radiation, and ultrasounds, basic blood test results are widely accessible and can aid in predicting mortality. The present research advocates the utilization of machine learning (ML) methodologies for predicting the likelihood of infectious disease like COVID-19 mortality by leveraging blood test data. Age, LDH (lactate dehydrogenase), lymphocytes, neutrophils, and hs-CRP (high-sensitivity C-reactive protein) are five extremely potent characteristics that, when combined, can accurately predict mortality in 96% of cases. By combining XGBoost feature importance with neural network classification, the optimal approach can predict mortality with exceptional accuracy from infectious disease, along with achieving a precision rate of 90% up to 16 days before the event. The studies suggested model’s excellent predictive performance and practicality were confirmed through testing with three instances that depended on the days to the outcome. By carefully analyzing and identifying patterns in these significant biomarkers insightful information has been obtained for simple application. This study offers potential remedies that could accelerate decision-making for targeted medical treatments within healthcare systems, utilizing a timely, accurate, and reliable method.","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141867402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yan Dong, Yundong Liu, Yuhua Cheng, Guangshuai Gao, Kai Chen, Chunlei Li
{"title":"Adaptive adjacent context negotiation network for object detection in remote sensing imagery","authors":"Yan Dong, Yundong Liu, Yuhua Cheng, Guangshuai Gao, Kai Chen, Chunlei Li","doi":"10.7717/peerj-cs.2199","DOIUrl":"https://doi.org/10.7717/peerj-cs.2199","url":null,"abstract":"Accurate localization of objects of interest in remote sensing images (RSIs) is of great significance for object identification, resource management, decision-making and disaster relief response. However, many difficulties, like complex backgrounds, dense target quantities, large-scale variations, and small-scale objects, which make the detection accuracy unsatisfactory. To improve the detection accuracy, we propose an Adaptive Adjacent Context Negotiation Network (A2CN-Net). Firstly, the composite fast Fourier convolution (CFFC) module is given to reduce the information loss of small objects, which is inserted into the backbone network to obtain spectral global context information. Then, the Global Context Information Enhancement (GCIE) module is given to capture and aggregate global spatial features, which is beneficial for locating objects of different scales. Furthermore, to alleviate the aliasing effect caused by the fusion of adjacent feature layers, a novel Adaptive Adjacent Context Negotiation network (A2CN) is given to adaptive integration of multi-level features, which consists of local and adjacent branches, with the local branch adaptively highlighting feature information and the adjacent branch introducing global information at the adjacent level to enhance feature representation. In the meantime, considering the variability in the focus of feature layers in different dimensions, learnable weights are applied to the local and adjacent branches for adaptive feature fusion. Finally, extensive experiments are performed in several available public datasets, including DIOR and DOTA-v1.0. Experimental studies show that A2CN-Net can significantly boost detection performance, with mAP increasing to 74.2% and 79.2%, respectively.","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141867367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient hybrid differential evolution-golden jackal optimization algorithm for multilevel thresholding image segmentation","authors":"Xianmeng Meng, Linglong Tan, Yueqin Wang","doi":"10.7717/peerj-cs.2121","DOIUrl":"https://doi.org/10.7717/peerj-cs.2121","url":null,"abstract":"Image segmentation is a crucial process in the field of image processing. Multilevel threshold segmentation is an effective image segmentation method, where an image is segmented into different regions based on multilevel thresholds for information analysis. However, the complexity of multilevel thresholding increases dramatically as the number of thresholds increases. To address this challenge, this article proposes a novel hybrid algorithm, termed differential evolution-golden jackal optimizer (DEGJO), for multilevel thresholding image segmentation using the minimum cross-entropy (MCE) as a fitness function. The DE algorithm is combined with the GJO algorithm for iterative updating of position, which enhances the search capacity of the GJO algorithm. The performance of the DEGJO algorithm is assessed on the CEC2021 benchmark function and compared with state-of-the-art optimization algorithms. Additionally, the efficacy of the proposed algorithm is evaluated by performing multilevel segmentation experiments on benchmark images. The experimental results demonstrate that the DEGJO algorithm achieves superior performance in terms of fitness values compared to other metaheuristic algorithms. Moreover, it also yields good results in quantitative performance metrics such as peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and feature similarity index (FSIM) measurements.","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":null,"pages":null},"PeriodicalIF":3.8,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141867384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}