PeerJ Computer Science. Pub Date: 2024-11-19. eCollection Date: 2024-01-01. DOI: 10.7717/peerj-cs.2465
Dávid Držík, Frantisek Forgac
{"title":"Slovak morphological tokenizer using the Byte-Pair Encoding algorithm.","authors":"Dávid Držík, Frantisek Forgac","doi":"10.7717/peerj-cs.2465","DOIUrl":"https://doi.org/10.7717/peerj-cs.2465","url":null,"abstract":"<p><p>This study introduces a new approach to text tokenization, SlovaK Morphological Tokenizer (SKMT), which integrates the morphology of the Slovak language into the training process using the Byte-Pair Encoding (BPE) algorithm. Unlike conventional tokenizers, SKMT focuses on preserving the integrity of word roots in individual tokens, crucial for maintaining lexical meaning. The methodology involves segmenting and extracting word roots from morphological dictionaries and databases, followed by <i>corpus</i> preprocessing and training SKMT alongside a traditional BPE tokenizer. Comparative evaluation against existing tokenizers demonstrates SKMT's outstanding ability to maintain root integrity, achieving 99.7% root integrity compared to SlovakBERT (90.5%) and a pureBPE tokenizer (93.1%). Further validation involved fine-tuning models on a sentiment classification NLP task, where models trained with SKMT achieved an F1-score improvement of 3.5% over those trained with conventional BPE tokenization, followed by a focus on the Semantic Textual Similarity (STS) task. 
These findings suggest that training language models on the SKMT tokenizer significantly enhances model performance and quality.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"10 ","pages":"e2465"},"PeriodicalIF":3.5,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11622830/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142803383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
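The root-integrity percentages in the abstract above can be read as the share of words whose root survives inside a single token. A minimal sketch of such a measure follows. The `root_integrity` helper, the data layout, and the toy splits are all assumptions made for illustration; they are not the authors' evaluation code, and the roots are not taken from a morphological dictionary.

```python
def root_integrity(samples):
    """Return the fraction of words whose root lies wholly inside one token.

    `samples` is a list of (tokens, root) pairs, where `tokens` is the
    subword split of one word and `root` is the root assumed for that word.
    This definition is an illustrative assumption, not the paper's metric.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    intact = sum(
        1 for tokens, root in samples
        if any(root in token for token in tokens)
    )
    return intact / len(samples)


# Toy splits chosen only to exercise the function (hypothetical data).
samples = [
    (["uči", "teľ"], "uči"),         # root kept whole in the first token
    (["uč", "iteľ"], "uči"),         # root split across two tokens
    (["škol", "ský"], "škol"),       # root kept whole
    (["pra", "cov", "ať"], "prac"),  # root split across tokens
]
score = root_integrity(samples)  # two of the four roots are intact
```

A substring test is the simplest possible check; a real evaluation would likely also need to align the root with its position in the word.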
PeerJ Computer Science. Pub Date: 2024-11-19. eCollection Date: 2024-01-01. DOI: 10.7717/peerj-cs.2493
Yuanzhi Huo, Mengjie Jin, Sicong You
{"title":"A study of hybrid deep learning model for stock asset management.","authors":"Yuanzhi Huo, Mengjie Jin, Sicong You","doi":"10.7717/peerj-cs.2493","DOIUrl":"10.7717/peerj-cs.2493","url":null,"abstract":"<p><p>Crafting a lucrative stock trading strategy is pivotal in the realm of investments. However, devising such a strategy is challenging because of the intricate and ever-changing nature of the stock market. In recent years, with the development of artificial intelligence (AI), some AI technologies have been successfully applied to stock price prediction and asset management. For example, long short-term memory (LSTM) networks can be used to predict stock price variation, and reinforcement learning (RL) can be used to control stock trading; however, they are generally used separately and cannot achieve simultaneous prediction and trading. In this study, we propose a hybrid deep learning model to predict stock prices and control stock trading to manage assets. LSTM is responsible for predicting stock prices, while RL is responsible for stock trading based on the predicted price trends. Meanwhile, to reduce uncertainty in the stock market and maximize stock assets, the proposed LSTM model can predict the average directional index (ADX) to comprehend stock trends in advance, and we also propose several constraints to assist asset management, thereby reducing risk and maximizing stock assets. In our results, the hybrid model yields an average <i>R</i> <sup>2</sup> value of 0.94 when predicting price variations. 
Moreover, with the proposed approach, which integrates ADX and constraints, the hybrid model grows stock assets to 1.05 times the initial assets.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"10 ","pages":"e2493"},"PeriodicalIF":3.5,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639306/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142830722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
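The R² value reported in the abstract above is the coefficient of determination, which compares the model's squared prediction error with the variance of the observed values. A minimal sketch of the textbook definition follows. The function name `r_squared` and the toy price series are assumptions for illustration, not the authors' code or data.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical price series: a perfect forecast and a mean-only forecast.
prices = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(prices, prices)                # no residual error
baseline = r_squared(prices, [2.5, 2.5, 2.5, 2.5])  # predicts the mean only
```

A perfect forecast scores 1 and a forecast that always predicts the mean scores 0, so a value of 0.94 means the predictions account for most of the variance in the observed series.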
PeerJ Computer Science. Pub Date: 2024-11-19. eCollection Date: 2024-01-01. DOI: 10.7717/peerj-cs.2453
Ying Hu, Xiongyan Liu, Hao Chen
{"title":"Optimal tuning of multi-PID controller using improved CMOCSO algorithm.","authors":"Ying Hu, Xiongyan Liu, Hao Chen","doi":"10.7717/peerj-cs.2453","DOIUrl":"10.7717/peerj-cs.2453","url":null,"abstract":"<p><p>To mitigate synchronization errors within a multi-PID controller system and enhance its resistance to interference, an improved competitive and cooperative swarm optimizer for constrained multi-objective optimization (CMOCSO) algorithm is employed to optimize the parameters of the multi-PID controller. Initially, a mathematical model representing the constrained multi-objective problem associated with the multi-PID controller is formulated. In this model, the parameters are designated as decision variables, the performance index serves as the objective function, and the stability constraints of the system are incorporated. Subsequently, an improved CMOCSO algorithm is introduced, which bifurcates the evolutionary process into two distinct stages using a central point-moving strategy; each stage employs different evolutionary techniques to accelerate convergence rates, and a novel grouping strategy is implemented to increase the learning efficiency of the population. The efficacy of the algorithm is evaluated through testing on 16 standard functions, demonstrating its effectiveness in addressing constrained multi-objective problems. Ultimately, the algorithm is applied to optimize the parameters of the multi-PID controller. 
The simulation results indicate that the proposed method yields superior control performance, reduced synchronization errors, and notable interference resistance capacity.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"10 ","pages":"e2453"},"PeriodicalIF":3.5,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11622856/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142803397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
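The abstract above treats controller gains as decision variables and a performance index as the objective. A minimal sketch of how one candidate gain set could be scored follows, under several assumptions: a single PID loop rather than a multi-PID system, a hypothetical first-order plant dx/dt = -x + u, and the integral of absolute error (IAE) as a stand-in performance index. None of this is the authors' model or the CMOCSO algorithm; an optimizer would call such a function repeatedly.

```python
def simulate_iae(kp, ki, kd, setpoint=1.0, dt=0.01, steps=2000):
    """Run a PID loop on the toy plant dx/dt = -x + u.

    Returns (IAE, final plant output). The plant, step size, and horizon
    are illustrative assumptions.
    """
    x = 0.0
    integral = 0.0
    prev_error = setpoint - x
    iae = 0.0
    for _ in range(steps):
        error = setpoint - x
        integral += error * dt
        derivative = (error - prev_error) / dt
        u = kp * error + ki * integral + kd * derivative
        x += (-x + u) * dt        # forward-Euler step of the plant
        iae += abs(error) * dt    # accumulate the performance index
        prev_error = error
    return iae, x


# Two candidate gain sets: proportional-only versus proportional-integral.
p_only = simulate_iae(kp=2.0, ki=0.0, kd=0.0)
pi_ctrl = simulate_iae(kp=2.0, ki=1.0, kd=0.0)
```

On this toy plant the proportional-only controller settles with a steady-state error, while the integral term removes it, so the PI candidate gets the lower IAE.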
PeerJ Computer Science. Pub Date: 2024-11-19. eCollection Date: 2024-01-01. DOI: 10.7717/peerj-cs.2484
Aasma Akram, Fatima Anjum, Sajid Latif, Muhammad Imran Zulfiqar, Mohsin Nazir
{"title":"Honey bee inspired resource allocation scheme for IoT-driven smart healthcare applications in fog-cloud paradigm.","authors":"Aasma Akram, Fatima Anjum, Sajid Latif, Muhammad Imran Zulfiqar, Mohsin Nazir","doi":"10.7717/peerj-cs.2484","DOIUrl":"10.7717/peerj-cs.2484","url":null,"abstract":"<p><p>The Internet of Things (IoT) paradigm is a foundational and integral factor in the development of smart applications in different sectors. These applications comprise a set of interconnected modules that exchange data and realize the distributed data flow (DDF) model. Executing these modules in a distant cloud data center is prone to quality of service (QoS) degradation. This is where the fog computing philosophy comes in, bridging this gap and bringing computation closer to the IoT devices. However, resource management in the fog and optimal allocation of fog devices to application modules are critical for better resource utilization and for achieving QoS. A significant challenge in this regard is to manage the fog network dynamically to determine cost-effective placement of application modules on resources. In this study, we propose an optimal placement strategy for smart healthcare application modules on fog resources. The objective of this strategy is to ensure optimal execution in terms of latency, bandwidth and earliest completion time compared with a few baseline techniques. A honey-bee-inspired strategy is proposed for allocating and utilizing resources for application module processing. 
To model the application and measure the effectiveness of our strategy, iFogSim Java-based simulation classes were extended, and the experiments conducted demonstrate satisfactory results.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"10 ","pages":"e2484"},"PeriodicalIF":3.5,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11623239/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142803362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PeerJ Computer Science. Pub Date: 2024-11-19. eCollection Date: 2024-01-01. DOI: 10.7717/peerj-cs.2514
Oğuz Mısır
{"title":"Drivable path detection for a mobile robot with differential drive using a deep Learning based segmentation method for indoor navigation.","authors":"Oğuz Mısır","doi":"10.7717/peerj-cs.2514","DOIUrl":"10.7717/peerj-cs.2514","url":null,"abstract":"<p><p>The integration of artificial intelligence into the field of robotics enables robots to perform their tasks more meaningfully. In particular, deep-learning methods contribute significantly to robots becoming intelligent cybernetic systems. The effective use of deep learning in mobile cyber-physical systems has enabled mobile robots to become more intelligent. This effective use of deep learning can also help mobile robots determine a safe path. The drivable pathfinding problem involves a mobile robot finding the path to a target in a challenging environment with obstacles. In this paper, a semantic-segmentation-based drivable path detection method is presented for use in the indoor navigation of mobile robots. The proposed method uses a perspective transformation strategy based on transforming high-accuracy segmented images into real-world space. This transformation enables the motion space to be divided into grids, based on the image perceived in a real-world space. A grid-based RRT* navigation strategy was developed that uses images divided into grids to enable the mobile robot to avoid obstacles and meet the optimal path requirements. Smoothing was performed to improve the path planning of the grid-based RRT* and avoid unnecessary turning angles of the mobile robot. Thus, the mobile robot could reach the target in an optimum manner in the drivable area determined by segmentation. DeepLabv3+ and a ResNet50 backbone architecture with superior segmentation ability are proposed for accurate determination of the drivable path. A Gaussian filter was used to reduce the noise caused by segmentation. In addition, multi-Otsu thresholding was used to improve the masked images in multiple classes. 
The segmentation model and backbone architecture were compared in terms of their performance using different methods. DeepLabv3+ and ResNet50 backbone architectures outperformed the other compared methods by 0.21%-4.18% on many metrics. In addition, a mobile robot design is presented to test the proposed drivable path determination method. This design validates the proposed method by using different scenarios in an indoor environment.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"10 ","pages":"e2514"},"PeriodicalIF":3.5,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639217/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142830806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PeerJ Computer Science. Pub Date: 2024-11-18. eCollection Date: 2024-01-01. DOI: 10.7717/peerj-cs.2445
Helen L Smith, Patrick J Biggs, Nigel P French, Adam N H Smith, Jonathan C Marshall
{"title":"Out of (the) bag-encoding categorical predictors impacts out-of-bag samples.","authors":"Helen L Smith, Patrick J Biggs, Nigel P French, Adam N H Smith, Jonathan C Marshall","doi":"10.7717/peerj-cs.2445","DOIUrl":"10.7717/peerj-cs.2445","url":null,"abstract":"<p><p>Performance of random forest classification models is often assessed and interpreted using out-of-bag (OOB) samples. Observations which are OOB when a tree is trained may serve as a test set for that tree and predictions from the OOB observations used to calculate OOB error and variable importance measures (VIM). OOB errors are popular because they are fast to compute and, for large samples, are a good estimate of the true prediction error. In this study, we investigate how target-based <i>vs</i>. target-agnostic encoding of categorical predictor variables for random forest can bias performance measures based on OOB samples. We show that, when categorical variables are encoded using a target-based encoding method, and when the encoding takes place prior to bagging, the OOB sample can underestimate the true misclassification rate, and overestimate variable importance. We recommend using a separate test data set when evaluating variable importance and/or predictive performance of tree based methods that utilise a target-based encoding method.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"10 ","pages":"e2445"},"PeriodicalIF":3.5,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11623134/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142802804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
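The bias described in the abstract above arises because a target-based encoding computed before bagging has already seen the labels of observations that later end up out-of-bag. A minimal sketch of that leakage follows, using mean target encoding on a toy data set. The helper `target_encode`, the hand-picked bootstrap indices, and the one-category-per-row data are assumptions chosen to make the effect extreme; this is not the authors' experiment.

```python
def target_encode(categories, labels):
    """Map each category to the mean label observed for it."""
    totals = {}
    for cat, y in zip(categories, labels):
        s, n = totals.get(cat, (0, 0))
        totals[cat] = (s + y, n + 1)
    return {cat: s / n for cat, (s, n) in totals.items()}


# Toy data: every row has its own category (extreme high cardinality).
categories = ["a", "b", "c", "d", "e", "f"]
labels = [1, 0, 1, 0, 1, 0]

# A hand-picked bootstrap sample; rows 1 and 4 are out-of-bag for this tree.
in_bag = [0, 0, 2, 3, 3, 5]
oob = [i for i in range(len(labels)) if i not in in_bag]

# Encoding BEFORE bagging uses all labels, including those of OOB rows.
pre_bag = target_encode(categories, labels)
leaked = [pre_bag[categories[i]] == labels[i] for i in oob]

# Encoding from in-bag rows only never sees the OOB labels.
post_bag = target_encode(
    [categories[i] for i in in_bag], [labels[i] for i in in_bag]
)
unseen = [categories[i] not in post_bag for i in oob]
```

In this extreme case the pre-bagging encoding of every out-of-bag row equals its own label, so a tree can predict those rows perfectly without having learned anything; encoding from the in-bag rows alone leaves those categories unseen.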
PeerJ Computer Science. Pub Date: 2024-11-18. eCollection Date: 2024-01-01. DOI: 10.7717/peerj-cs.2473
Dan Yang, Xiaoling Miao
{"title":"Predicting hotel booking cancellations using tree-based neural network.","authors":"Dan Yang, Xiaoling Miao","doi":"10.7717/peerj-cs.2473","DOIUrl":"10.7717/peerj-cs.2473","url":null,"abstract":"<p><p>In the hospitality business, cancellations negatively affect the precise estimation of revenue management. With today's powerful computational advances, it is feasible to develop a model to predict cancellations to reduce the risks for business owners. Although these models have not yet been tested in real-world conditions, several prototypes were developed and deployed in two hotels. Their main goal was to study how these models could be incorporated into a decision support system and to assess their influence on demand-management decisions. In our study, we introduce a tree-based neural network (TNN) that combines a tree-based learning algorithm with a feed-forward neural network as a computational method for predicting hotel booking cancellation. Experimental results indicated that the TNN model significantly improved the predictive power on two benchmark datasets compared to tree-based models and baseline artificial neural networks alone. Also, the preliminary success of our study confirmed that tree-based neural networks are promising in dealing with tabular data.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"10 ","pages":"e2473"},"PeriodicalIF":3.5,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11623061/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142803402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PeerJ Computer Science. Pub Date: 2024-11-18. eCollection Date: 2024-01-01. DOI: 10.7717/peerj-cs.2501
Büşra Çalmaz, Belgin Ergenç Bostanoğlu
{"title":"k-Clique counting on large scale-graphs: a survey.","authors":"Büşra Çalmaz, Belgin Ergenç Bostanoğlu","doi":"10.7717/peerj-cs.2501","DOIUrl":"10.7717/peerj-cs.2501","url":null,"abstract":"<p><p>Clique counting is a crucial task in graph mining, as the count of cliques provides different insights across various domains, including social and biological network analysis, community detection, recommendation systems, and fraud detection. Counting cliques is algorithmically challenging due to combinatorial explosion, especially for large datasets and larger clique sizes. There are comprehensive surveys and reviews on algorithms for counting subgraphs and triangles (three-clique), but there is a notable lack of reviews addressing k-clique counting algorithms for k > 3. This paper addresses this gap by reviewing clique counting algorithms designed to overcome this challenge. Also, a systematic analysis and comparison of exact and approximation techniques are provided by highlighting their advantages, disadvantages, and suitability for different contexts. It also presents a taxonomy of clique counting methodologies, covering approximate and exact methods and parallelization strategies. The paper aims to enhance understanding of this specific domain and guide future research of k-clique counting in large-scale graphs.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"10 ","pages":"e2501"},"PeriodicalIF":3.5,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11622928/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142803084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
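The combinatorial explosion mentioned in the abstract above is easy to see in a brute-force exact counter, which tests every k-subset of vertices. The sketch below is a naive baseline written for illustration; the function name `count_k_cliques` is an assumption, and this is not one of the algorithms reviewed in the survey.

```python
from itertools import combinations


def count_k_cliques(edges, k):
    """Count k-cliques by checking every k-subset of vertices (k >= 2).

    Cost grows as C(n, k) subsets, so this only suits very small graphs.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    nodes = sorted(adjacency)
    return sum(
        1
        for group in combinations(nodes, k)
        if all(b in adjacency[a] for a, b in combinations(group, 2))
    )


# The complete graph on 5 vertices contains C(5, k) cliques of size k.
k5 = list(combinations(range(5), 2))
triangles = count_k_cliques(k5, 3)
four_cliques = count_k_cliques(k5, 4)
```

Even at this scale the count of candidate subsets rises quickly with k and with the number of vertices, which is why both exact and approximate methods for large graphs avoid enumerating subsets directly.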
PeerJ Computer Science. Pub Date: 2024-11-13. eCollection Date: 2024-01-01. DOI: 10.7717/peerj-cs.2385
Heng Guo
{"title":"Design of judicial public opinion supervision and intelligent decision-making model based on Bi-LSTM.","authors":"Heng Guo","doi":"10.7717/peerj-cs.2385","DOIUrl":"10.7717/peerj-cs.2385","url":null,"abstract":"<p><p>Fuzzy preference modeling in intelligent decision support systems aims to improve the efficiency and accuracy of decision-making processes by incorporating fuzzy logic and preference modeling techniques. While network public opinion (NPO) has the potential to drive judicial reform and progress, it also poses challenges to the independence of the judiciary due to the negative impact of malicious public opinion. To tackle this issue within the context of intelligent decision support systems, this study provides an insightful overview of current NPO monitoring technologies. Recognizing the complexities associated with handling large-scale NPO data and mitigating significant interference, a novel judicial domain NPO monitoring model is proposed, which centers around semantic feature analysis. This model takes into account time series characteristics, binary semantic fitting, and public sentiment intensity. Notably, it leverages a bidirectional long short-term memory (Bi-LSTM) network (S-Bi-LSTM) to construct a judicial domain semantic similarity calculation model. The semantic similarity values between sentences are obtained through the utilization of a fully connected layer. Empirical evaluations demonstrate the remarkable performance of the proposed model, achieving an accuracy rate of 85.9% and an F1 value of 87.1 on the test set, surpassing existing sentence semantic similarity models. 
Ultimately, the proposed model significantly enhances the monitoring capabilities of judicial authorities over NPO, thereby alleviating the burden on public relations faced by judicial institutions and fostering a more equitable execution of judicial power.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"10 ","pages":"e2385"},"PeriodicalIF":3.5,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11623130/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142803309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PeerJ Computer Science. Pub Date: 2024-11-13. eCollection Date: 2024-01-01. DOI: 10.7717/peerj-cs.2417
Ming Xu, Jinwei Cui, Xiaoyu Ma, Zhiyi Zou, Zhisheng Xin, Muhammad Bilal
{"title":"Image enhancement with art design: a visual feature approach with a CNN-transformer fusion model.","authors":"Ming Xu, Jinwei Cui, Xiaoyu Ma, Zhiyi Zou, Zhisheng Xin, Muhammad Bilal","doi":"10.7717/peerj-cs.2417","DOIUrl":"10.7717/peerj-cs.2417","url":null,"abstract":"<p><p>Graphic design, as a product of the burgeoning new media era, has seen its users' requirements for images continuously evolve. However, external factors such as light and noise often cause graphic design images to become distorted during acquisition. To enhance the definition of these images, this paper introduces a novel image enhancement model based on visual features. Initially, a histogram equalization (HE) algorithm is applied to enhance the graphic design images. Subsequently, image feature extraction is performed using a dual-flow network comprising convolutional neural network (CNN) and Transformer architectures. The CNN employs a residual dense block (RDB) to embed spatial local structure information with varying receptive fields. An improved attention mechanism module, attention feature fusion (AFF), is then introduced to integrate the image features extracted from the dual-flow network. Finally, through image perception quality guided adversarial learning, the model adjusts the initial enhanced image's color and recovers more details. Experimental results demonstrate that the proposed algorithm model achieves enhancement effects exceeding 90% on two large image datasets, which represents a 5%-10% improvement over other models. Furthermore, the algorithm exhibits superior performance in terms of peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) image quality evaluation metrics. 
Our findings indicate that the fusion model significantly enhances image quality, thereby advancing the field of graphic design and showcasing its potential in cultural and creative product design.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"10 ","pages":"e2417"},"PeriodicalIF":3.5,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11623052/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142803313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
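The first stage named in the abstract above is histogram equalization (HE). A minimal sketch of textbook global HE on a flat list of grayscale values follows. The function name `equalize`, the 256-level assumption, and the toy pixel list are illustrative; the abstract does not specify which HE variant the authors use.

```python
def equalize(pixels, levels=256):
    """Global histogram equalization of integer gray levels in [0, levels)."""
    n = len(pixels)
    if n == 0:
        return []
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution of gray levels.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:
        return list(pixels)  # constant image: nothing to spread out
    scale = (levels - 1) / (n - cdf_min)
    # Remap each pixel so the cumulative distribution becomes roughly linear.
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]


# A low-contrast toy image: values crowded between 10 and 30.
stretched = equalize([10, 10, 20, 30, 30, 30])
```

The remapping spreads the crowded values over the full 0 to 255 range, which is the contrast gain that the later network stages would then refine.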