PeerJ Computer Science: Latest Articles

Improving machine learning detection of Alzheimer disease using enhanced manta ray gene selection of Alzheimer gene expression datasets.
IF 2.5, Zone 4, Computer Science
PeerJ Computer Science Pub Date : 2025-08-14 eCollection Date: 2025-01-01 DOI: 10.7717/peerj-cs.3064
Zahraa Ahmed, Mesut Çevik
Abstract: One of the most prominent neurodegenerative diseases globally is Alzheimer's disease (AD). The early diagnosis of AD is a challenging task due to complex pathophysiology caused by the presence and accumulation of neurofibrillary tangles and amyloid plaques. Recent advances in data mining analysis methods, machine learning, and microarray technologies have enriched the understanding of the genetic underpinnings of AD. However, the "curse of dimensionality" caused by high-dimensional microarray datasets impairs accurate prediction of the disease through overfitting, bias, and high computational demands. To alleviate this effect, this study proposes a gene selection approach based on the parameter-free and large-scale manta ray foraging optimization (MRFO) algorithm. Across six investigated datasets with differing dimensionality and statistical relationship distributions, and four evaluated machine learning classifiers, the proposed Sign Random Mutation and Best Rank enhancements substantially improved MRFO's exploration and exploitation, contributing to efficient identification of relevant genes and to improved machine learning prediction accuracy.
Volume 11, article e3064. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453835/pdf/
Citations: 0
Multi-step partitioning combined with SOM neural network-based clustering technique effectively improves SAT solver performance.
IF 2.5, Zone 4, Computer Science
PeerJ Computer Science Pub Date : 2025-08-14 eCollection Date: 2025-01-01 DOI: 10.7717/peerj-cs.3076
Siyu Yun, Xinsheng Wang
Abstract: As the core engine of electronic design automation (EDA) tools, the Boolean satisfiability (SAT) solver largely determines, through its efficiency, the cycle of integrated circuit research and development. The effectiveness of SAT solvers has steadily turned into the key bottleneck of the circuit design cycle due to the dramatically increased scale of integrated circuits. The primary issue for SAT solvers now is the divergence between SAT as used in industry and research on pure solving algorithms. We propose a strategy that partitions the SAT problem based on its structural information and then solves it. By effectively extracting the structure information from the original SAT problem, the self-organizing map (SOM) neural network deployed in the division section can speed up the sub-thread solver's processing while avoiding cumbersome parameter adjustments. The experimental results demonstrate the stability and scalability of our technique, which can drastically shorten the time required to solve industrial benchmarks from various sources.
Volume 11, article e3076. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453816/pdf/
Citations: 0
A literature survey of shapelet quality measures for time series classification.
IF 2.5, Zone 4, Computer Science
PeerJ Computer Science Pub Date : 2025-08-14 eCollection Date: 2025-01-01 DOI: 10.7717/peerj-cs.3115
Teng Li, Xiaodong Guo, Cun Ji
Abstract: With the rapid development of the Internet of Things, time series classification (TSC) has gained significant attention from researchers due to its applications in various real-world fields, including electroencephalogram/electrocardiogram classification, emotion recognition, and error message detection. To improve classification performance, numerous TSC methods have been proposed in recent years. Among these, shapelet-based TSC methods are particularly notable for their intuitive interpretability. A critical task within these methods is evaluating the quality of candidate shapelets. This paper provides a comprehensive survey of the state-of-the-art measures for assessing shapelet quality. To present a structured overview, we begin by proposing a taxonomy of these measures, followed by a detailed description of each one. We then discuss these measures, highlighting the challenges faced by current research and offering suggestions for future directions. Finally, we summarize the findings of this survey. We hope that this work will serve as a valuable resource for researchers in the field.
Volume 11, article e3115. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453792/pdf/
Citations: 0
Comparing variable neighbourhood search algorithms for the direct aperture optimisation in radiotherapy.
IF 2.5, Zone 4, Computer Science
PeerJ Computer Science Pub Date : 2025-08-14 eCollection Date: 2025-01-01 DOI: 10.7717/peerj-cs.3094
Mauricio Moyano, Keiny Meza-Vasquez, Gonzalo Tello-Valenzuela, Nicolle Ojeda-Ortega, Carolina Lagos, Guillermo Cabrera-Guerrero
Abstract: Intensity modulated radiation therapy (IMRT) is a prevalent approach for administering radiation therapy in cancer treatment. The primary objective of IMRT is to devise a treatment strategy that eradicates cancer cells from the tumour while minimising damage to the surrounding organs at risk. Conventional IMRT planning entails a sequential procedure: optimising beam intensity for a certain set of angles, followed by sequencing. Unfortunately, treatment plans obtained in the optimisation stage are severely impaired after the sequencing stage due to physical and delivery constraints that are not considered during the optimisation stage. One method that tackles these issues is the direct aperture optimisation (DAO) technique. The DAO problem seeks to generate a set of deliverable aperture configurations and a corresponding set of radiation intensities. This method accounts for physical and delivery time limitations, facilitating the creation of clinically appropriate treatment plans. In this article, we propose and compare two variable neighbourhood search (VNS) based algorithms, called variable neighbourhood descent (VND) and reduced variable neighbourhood search (rVNS). The VND algorithm is a deterministic variant of VNS that systematically explores different neighbourhood structures. This approach allows for a more thorough exploration of the solution space while maintaining computational efficiency. The rVNS, unlike traditional VNS algorithms, does not require any transition rule, as it integrates a set of predefined neighbourhood moves at each iteration. We apply our proposed algorithms to prostate cancer cases, achieving highly competitive results for both algorithms. In particular, the proposed rVNS required 62.75% fewer apertures and achieved a 63.93% reduction in beam-on time compared to the sequential approach's best case, which means treatment plans that can be delivered in less time. Additionally, we evaluate the clinical quality of the treatment plans using established dosimetric indicators, comparing our results against those produced by matRad's tool for DAO to assess target coverage and organ-at-risk sparing.
Volume 11, article e3094. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453873/pdf/
Citations: 0
Data trace as the scientific foundation for trusted metrological data: a review for future metrology direction.
IF 2.5, Zone 4, Computer Science
PeerJ Computer Science Pub Date : 2025-08-14 eCollection Date: 2025-01-01 DOI: 10.7717/peerj-cs.3106
Zhanshuo Cao, Boyong Gao, Zilong Liu, Xingchuang Xiong, Bin Wang, Chenbo Pei
Abstract: In the context of the digital transformation of metrology, ensuring the trustworthiness and integrity of measurement data during its generation, transmission, and storage (i.e., trustworthy detection of measurement data) has become a critical challenge. Data traces are residual marks left during data processing, which help identify malicious activities targeting measurement data. These traces are especially important when the trust and integrity of potential data evidence are under threat. To this end, this article systematically reviews relevant core techniques and analyzes various detection methods across the different stages of the data lifecycle, evaluating their applicability and limitations in identifying data tampering, unauthorized access, and anomalous operations. The findings suggest that trace detection technologies can enhance the traceability and transparency of metrological data, thereby providing technical support for building a trustworthy digital metrology system. This review lays the theoretical foundation for future research on developing automated anomaly detection models, improving forensic techniques for data tampering in measurement devices, and constructing multi-modal, full-lifecycle traceability frameworks for measurement data. Subsequent studies should focus on aligning these technologies with metrological standards and verifying their deployment in real-world measurement instruments.
Volume 11, article e3106. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453846/pdf/
Citations: 0
A comprehensive review of ball detection techniques in sports.
IF 2.5, Zone 4, Computer Science
PeerJ Computer Science Pub Date : 2025-08-12 eCollection Date: 2025-01-01 DOI: 10.7717/peerj-cs.3079
Cristiano Moreira, Lino Ferreira, Paulo Jorge Coelho
Abstract: Detecting balls in sports plays a pivotal role in enhancing game analysis, providing real-time data for spectators, and improving decision-making and strategic thinking for referees and coaches. This is a highly debated and researched topic, but most works focus on one sport. Effective generalization of a single method or algorithm to different sports is much harder to achieve. This article reviews methodologies and advancements in object detection tailored to ball detection across various sports. Traditional computer vision techniques and modern deep learning methods are reviewed, emphasizing their strengths, limitations, and adaptability to diverse game scenarios. The challenges of occlusion, dynamic backgrounds, varying ball sizes, and high-speed movements are identified and discussed. This review aims to consolidate existing knowledge, compare state-of-the-art detection models, highlight pivotal challenges and possible solutions, and propose future research directions. The article underscores the importance of optimizations for accurate and efficient ball detection, setting the foundation for next-generation sports analytics systems.
Volume 11, article e3079. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453710/pdf/
Citations: 0
An enhanced BERT model with improved local feature extraction and long-range dependency capture in promoter prediction for hearing loss.
IF 2.5, Zone 4, Computer Science
PeerJ Computer Science Pub Date : 2025-08-12 eCollection Date: 2025-01-01 DOI: 10.7717/peerj-cs.3104
Jing Sun, Yangfan Huang, Jiale Fu, Li Teng, Xiao Liu, Xiaohua Luo
Abstract: Promoter prediction has a key role in helping to understand gene regulation and in developing gene therapies for complex diseases such as hearing loss (HL). While traditional Bidirectional Encoder Representations from Transformers (BERT) models excel in capturing contextual information, they often have limitations in simultaneously extracting local sequence features and long-range dependencies inherent in genomic data. To address this challenge, we propose DNABERT-CBL (DNABERT-2_CNN_BiLSTM), an enhanced BERT-based architecture that fuses a convolutional neural network (CNN) and a bidirectional long short-term memory (BiLSTM) layer. The CNN module captures local regulatory features, while the BiLSTM module models long-distance dependencies, enabling efficient integration of global and local features of promoter sequences. The models are optimized using three strategies: individual learning, cross-disease training and global training, and the performance of each module is verified by constructing comparison models with different combinations. The experimental results show that DNABERT-CBL outperforms the baseline DNABERT-2_BASE model in hearing loss promoter prediction, with a 20% reduction in loss, a 3.3% improvement in the area under the receiver operating characteristic curve (AUC), and a 5.8% improvement in accuracy at a sequence length of 600 base pairs. In addition, DNABERT-CBL consistently outperforms other state-of-the-art BERT-based genome models on several evaluation metrics, highlighting its superior generalization ability. Overall, DNABERT-CBL provides an effective framework for accurate promoter prediction, offers valuable insights into gene regulatory mechanisms, and supports the development of gene therapies for hearing loss and related diseases.
Volume 11, article e3104. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453759/pdf/
Citations: 0
Design of artwork resource management system based on block classification coding and bit plane rearrangement.
IF 2.5, Zone 4, Computer Science
PeerJ Computer Science Pub Date : 2025-08-12 eCollection Date: 2025-01-01 DOI: 10.7717/peerj-cs.3092
Xiaomeng Xia
Abstract: With the vigorous development of the art market, the management of art resources is confronted with increasingly difficult challenges, such as copyright protection, authenticity verification, and efficient storage. Currently, the digital watermarking and compression schemes applied to artworks struggle to achieve an effective balance among robustness, image quality preservation, and watermark capacity. Moreover, they lack sufficient scalability when dealing with large-scale datasets. To address these issues, this article proposes an innovative algorithm that integrates watermarking and compression for artwork images, namely the Block Classification Coding-Bit Plane Rearrangement-Integrated Compression and Watermark Embedding (BCC-BPR-ICWE) algorithm. By employing refined block classification coding (RS-BCC) and optimized bit plane rearrangement (BPR) techniques, this algorithm significantly enhances the watermark embedding capacity and robustness while ensuring image quality. Experimental results demonstrate that, compared to existing classical algorithms, the proposed method excels in terms of watermarked image quality (PSNR > 57 dB, SSIM = 0.9993), watermark capacity (0.5 bpp), and tampering recovery performance (PSNR = 41.17 dB, SSIM = 0.9993). The research in this article provides strong support for its practical application in large-scale art resource management systems. The proposed technique not only promotes the application of digital watermarking and compression technologies in the field of art management but also offers new ideas and directions for the future development of related technologies.
Volume 11, article e3092. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453755/pdf/
Citations: 0
A novel deep learning approach for predicting stone-free rates post-ESWL on uncontrasted CT.
IF 2.5, Zone 4, Computer Science
PeerJ Computer Science Pub Date : 2025-08-11 eCollection Date: 2025-01-01 DOI: 10.7717/peerj-cs.3111
Ozgur Efiloglu, Muhammed Yildirim, Kadir Yildirim, Harun Bingol, Mustafa Kaan Akalin, Meftun Culpan, Bilal Alatas, Asif Yildirim
Abstract: Extracorporeal shock wave lithotripsy (ESWL) is one of the most frequently used treatments for managing kidney stones. In our work, we sought to assess the efficacy of an artificial intelligence model developed using non-contrast computed tomography (CT) images in predicting stone-free rates for ESWL. The main difference between this study and other studies is that it proposes an artificial intelligence-based model that predicts the success of ESWL treatment. Data from 910 patients who underwent ESWL between January 2016 and June 2021 were analyzed retrospectively. Since the local binary pattern (LBP) and histogram of oriented gradients (HOG) feature extraction methods gave more successful results than other methods, the features obtained using these methods were combined, and a new feature map was obtained using the neighborhood component analysis (NCA) dimension reduction method. The reduced feature map was then passed to classifiers. In conclusion, we analyzed the effect of ESWL treatment using different artificial intelligence methods and found that the prediction accuracy was 94% on average. Results were obtained from seven different convolutional neural networks (CNNs) and two textural-based models in the study. Since textural-based models achieved the highest success among these models, these models were used as the base in the proposed model. The proposed model achieved better results than nine different models used in the study. The results obtained from the proposed hybrid model for ESWL prediction suggest that it can guide experts in the treatment of the disease.
Volume 11, article e3111. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453815/pdf/
Citations: 0
Morphological and structural complexity analysis of low-resource English-Turkish language pair using neural machine translation models.
IF 2.5, Zone 4, Computer Science
PeerJ Computer Science Pub Date : 2025-08-11 eCollection Date: 2025-01-01 DOI: 10.7717/peerj-cs.3072
Mehmet Acı, Nisa Vuran Sarı, Çiğdem İnan Acı
Abstract: Neural machine translation (NMT) has achieved remarkable success in high-resource language pairs; however, its effectiveness for morphologically rich and low-resource languages like Turkish remains underexplored. As a highly agglutinative and morphologically complex language with limited high-quality parallel data, Turkish serves as a representative case for evaluating NMT systems in low-resource and linguistically challenging settings. Its structural divergence from English makes it a critical testbed for assessing tokenization strategies, attention mechanisms, and model generalizability in neural translation. This study investigates the comparative performance of two prominent NMT paradigms, the Transformer architecture and recurrent sequence-to-sequence (Seq2Seq) models with attention, for both English-to-Turkish and Turkish-to-English translation. The models are evaluated under various configurations, including different tokenization strategies (Byte Pair Encoding (BPE) vs. Word Tokenization), attention mechanisms (Bahdanau and an exploratory hybrid mechanism combining Bahdanau and Scaled Dot-Product attention), and architectural depths (layer count and attention head number). Extensive experiments using automatic metrics such as BiLingual Evaluation Understudy (BLEU), Metric for Evaluation of Translation with Explicit ORdering (METEOR), and Translation Error Rate (TER) reveal that the Transformer model with three layers, eight attention heads, and BPE tokenization achieved the best performance, obtaining a BLEU score of 47.85 and METEOR score of 44.62 in the English-to-Turkish direction. Similar performance trends were observed in the reverse direction, indicating the model's generalizability. These findings highlight the potential of carefully optimized Transformer-based NMT systems in handling the complexities of morphologically rich, low-resource languages like Turkish in both translation directions.
Volume 11, article e3072. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453858/pdf/
Citations: 0