Engineering Applications of Artificial Intelligence: Latest Articles, Page 8

A multimodal multi-scale fusion network for leak detection in marine piping systems

IF 8, Zone 2 (Computer Science)

Engineering Applications of Artificial Intelligence Pub Date : 2025-09-29 DOI: 10.1016/j.engappai.2025.112545

Peng Zhang, Chaozhe Li, Shitao Peng, Bomu Tian, Si Luo, Yuewen Zhang, Taili Du

Volume 162, Article 112545

Abstract: Marine system monitoring data inherently exhibit multimodal characteristics, making artificial intelligence-driven correlation and fusion essential for improving fault feature recognition. However, existing intelligent diagnosis methods mostly focus on feature fusion within homogeneous data types, such as fusing multiple time-series signals or multiple image sets, while joint representation learning across heterogeneous dimensions remains under-explored. This limitation constrains the recognition capability for complex failure modes. Meanwhile, the inherent differences in physical meanings and representations of multimodal data pose significant challenges in constructing effective correlations, often limiting the performance of mainstream machine-learning-based fault diagnosis approaches. The proposed method enhances the fault diagnosis capability of mainstream approaches through the fusion of multi-sensor data and visual data, with its core innovation residing in a multimodal fusion framework leveraging attention mechanisms to effectively integrate cross-dimensional representations of multivariate time-series data and imaging data. Compared to existing multimodal transformer techniques, this dual-strategy architecture enables the model to simultaneously capture shared systemic behaviors and modality-unique signatures, substantially elevating diagnosis precision. Experimental validation on real-world leak detection datasets demonstrates that the proposed model achieves F1-scores consistently surpassing 90% across diverse marine monitoring scenarios, with quantitative evaluations further confirming its superior performance over conventional multivariate time-series diagnosis methods in establishing multimodal correlations, validating both technical excellence and engineering practicability.

Citations: 0

An intrusion detection system for critical infrastructures: Modbus approach

IF 8, Zone 2 (Computer Science)

Engineering Applications of Artificial Intelligence Pub Date : 2025-09-29 DOI: 10.1016/j.engappai.2025.112410

Murat Varol, Murat İskefiyeli

Volume 162, Article 112410

Abstract: This study aims to develop an Intrusion Detection System (IDS) using deep learning and machine learning algorithms to detect cyber attacks in the network traffic of critical infrastructures using an artificial intelligence-based approach. The research investigates various machine learning algorithms, datasets, and performance evaluations to detect the security vulnerabilities commonly found in industrial networks. Implemented in Python, the system has been tested on a hybrid dataset, demonstrating the performance of different algorithms in terms of accuracy, precision, and other metrics. From an artificial intelligence perspective, this study contributes to machine learning and deep learning in cybersecurity, showing how standalone and ensemble models can effectively detect complex threats using fewer but more relevant features. The research employs supervised learning techniques, leveraging labeled datasets to train models that can accurately classify network traffic as either normal or attack, ensuring high detection accuracy. From an engineering standpoint, the system’s Python implementation addresses the practical challenges of real-world deployment in industrial control systems (ICS) and facilitates integration with existing infrastructures. Additionally, the custom dataset and post-dissector code contribute to the field of industrial cybersecurity, providing engineers with tools for testing, validating, and optimizing IDS solutions. As cyber–physical systems are increasingly integrated into ICS, the proposed IDS provides a crucial layer of defense against cyber threats, safeguarding both the digital and physical components of critical infrastructure. The findings reveal that the proposed system exhibits high performance in terms of detection accuracy. The results show that the system provides an effective and reliable detection mechanism using artificial intelligence techniques.

Citations: 0

Fusing spatial–temporal information into deep learning via wind propagation theory to enhance wind power prediction

IF 8, Zone 2 (Computer Science)

Engineering Applications of Artificial Intelligence Pub Date : 2025-09-28 DOI: 10.1016/j.engappai.2025.112494

Maolin He, Jujie Wang

Volume 162, Article 112494

Abstract: Accurately predicting wind power poses significant challenges because of the inherent randomness and intermittency of wind speed, thereby impeding effective wind power scheduling. This study proposes an improved deep learning model which leverages wind propagation theory to uncover spatial–temporal relationships among wind turbines to enhance the performance of wind power prediction. In addition, comprehensive theoretical and empirical analyses are conducted to justify the effectiveness of leveraging wind propagation theory for capturing spatio-temporal relationships among wind turbines. Moreover, spatio-temporal dependencies are modeled through a dual mechanism: multi-channel independent modeling for per-turbine temporal dynamics and wind propagation-based matrix computations for inter-turbine spatial relationships, which together significantly reduce computational complexity while preserving predictive performance. Data from 134 wind turbines and six comparison models were employed to validate the robustness and effectiveness of the proposed model. Empirical results indicate that the proposed model outperforms the baseline models, achieving an average improvement of 6.19% in Root Mean Square Error and 7.05% in Mean Absolute Error.
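The two error metrics quoted in this abstract, and a percentage improvement over a baseline, can be sketched as follows. This is only an illustration: the toy series are hypothetical, and the abstract does not say how its "average improvement" is aggregated, so the relative-reduction formula below is an assumption.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Square Error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean Absolute Error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def relative_improvement(baseline_error, model_error):
    """Percentage reduction in error relative to a baseline (assumed definition)."""
    return 100.0 * (baseline_error - model_error) / baseline_error

# Hypothetical toy data, not taken from the paper.
actual = [10.0, 12.0, 9.0, 11.0]
baseline = [12.0, 10.0, 11.0, 9.0]   # every baseline forecast is off by 2
proposed = [11.0, 11.0, 10.0, 10.0]  # every proposed forecast is off by 1

rmse_gain = relative_improvement(rmse(actual, baseline), rmse(actual, proposed))
mae_gain = relative_improvement(mae(actual, baseline), mae(actual, proposed))
print(f"RMSE improvement: {rmse_gain:.2f}%, MAE improvement: {mae_gain:.2f}%")
```

Because RMSE squares each residual, it penalizes large misses more heavily than MAE, which is one reason the two improvements reported in an abstract like this one generally differ.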

Citations: 0

Inter-graph and Intra-graph: Utilizing global financial markets and constituent stocks for stock index prediction

IF 8, Zone 2 (Computer Science)

Engineering Applications of Artificial Intelligence Pub Date : 2025-09-28 DOI: 10.1016/j.engappai.2025.112437

Yong Shi, Yunong Wang, Jie Wu

Volume 162, Article 112437

Abstract: Stock index prediction is a significant yet difficult undertaking due to its incorporation of complex and diverse information. Following the implementation of Graph Neural Networks in financial data analysis, numerous researchers have focused on the node-level task of forecasting individual stock movements by analyzing the relationships between stocks. However, two key challenges remain: first, realizing different speeds of feature propagation among nodes in graph representation learning; second, predicting stock indices by extracting and aggregating fluctuations from constituent stocks through graph-level tasks remains unaddressed. To tackle these challenges, this paper proposes a novel spatio-temporal prediction framework combining both node-level and graph-level tasks. The framework includes two types of graphs: inter-graph and intra-graph, which combine information from the micro, meso, and macro dimensions. For the inter-graph at the node level, we introduce the Granger causality test as an innovative node filtering method, which realizes the propagation of features between nodes with different strengths and speeds in the process of graph representation learning. For the intra-graph at the graph level, we examine various graph pooling methods and pooling proportions of stock index constituents to enhance the interpretability of the results and to provide new theoretical insights for stock index prediction. In conclusion, we develop the Graph Representation Learning-based Long Short-Term Memory (GRL-LSTM) model for forecasting stock index movements, and demonstrate the superiority of our approach on four major Chinese stock markets.

Citations: 0

A hyperparameter-fusion neural networks for deposition prediction

IF 8, Zone 2 (Computer Science)

Engineering Applications of Artificial Intelligence Pub Date : 2025-09-28 DOI: 10.1016/j.engappai.2025.112434

Li Ding, Kun Pang, Junjie Li, Hua Shao, Nan Liu, Rui Chen, Zhiqiang Li, Zhenjie Yao, Ling Li

Volume 162, Article 112434

Abstract: As integrated circuit manufacturing processes develop into the nanometer scale, precise control and prediction of the deposition process have become crucial. Nanoscale manufacturing imposes unprecedentedly high demands on film quality, uniformity, and consistency, presenting significant challenges to traditional control and prediction methodologies. This study proposes a novel approach that, for the first time, formulates the thin-film deposition process as a video prediction task, enabling the use of deep learning for morphological forecasting under varying process conditions, and introduces a novel hyperparameter-fusion neural network, referred to as DepositionNet (DepoNet). Unlike conventional video prediction models, DepoNet specifically accounts for the influence of deposition parameters on the entire simulation process. We have incorporated a novel Hyper Projector that allows the model to flexibly adapt to varying deposition conditions and material characteristics. Through comprehensive comparative experimental analyses, we demonstrate that DepoNet significantly outperforms existing deep-learning models and achieves a mean squared error of 17.34, representing a 3.67% improvement over the second-best model and a 1,435× speedup over physics-based methods, thereby validating its exceptional generalization capability. Extensive experiments reveal that the model maintains high performance even under conditions of limited training data, for instance, achieving a peak signal-to-noise ratio (PSNR) of 41.516 decibels (dB) when trained with only 20% of the available data.
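For readers unfamiliar with the PSNR figure quoted above, the standard definition is PSNR = 10 · log10(MAX² / MSE). The sketch below computes it for a pair of flattened frames; the 0–255 pixel range and the toy frames are assumptions for illustration and are not taken from the paper.

```python
import math

def mse(frame_a, frame_b):
    """Mean squared error between two flattened frames of equal length."""
    return sum((a - b) ** 2 for a, b in zip(frame_a, frame_b)) / len(frame_a)

def psnr(frame_a, frame_b, max_value=255.0):
    """Peak signal-to-noise ratio in dB; max_value is the assumed peak pixel value."""
    err = mse(frame_a, frame_b)
    if err == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / err)

# Hypothetical 4-pixel frames; every predicted pixel is off by 5, so MSE = 25.
frame_true = [0, 50, 100, 150]
frame_pred = [5, 45, 105, 145]
print(f"MSE = {mse(frame_true, frame_pred):.1f}, PSNR = {psnr(frame_true, frame_pred):.2f} dB")
```

Since PSNR depends on the assumed peak value, scores are only comparable between models evaluated with the same pixel scaling.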

Citations: 0

Multilayer perceptron-based offspring prediction model for constrained multi-objective optimization

IF 8, Zone 2 (Computer Science)

Engineering Applications of Artificial Intelligence Pub Date : 2025-09-27 DOI: 10.1016/j.engappai.2025.112428

Qianlong Dang, Ruihuan Luo, Linlin Xie, Xiaochuan Gao, Weiting Bai

Volume 162, Article 112428

Abstract: Constrained multi-objective optimization problems generally involve both multiple constraint violations and conflicting objective functions. Some of them not only have sparse feasible regions but are also hard to converge on. For these problems, the evolutionary operators used in traditional constrained multi-objective evolutionary algorithms (CMOEAs) struggle to generate solutions of ideal quality. Therefore, this paper proposes a multilayer perceptron-based offspring prediction model for constrained multi-objective optimization (MOPCMO). Specifically, an evolutionary direction guidance strategy is designed that utilizes historical populations as training data to train a multilayer perceptron, which guides the evolution of the population by predicting and generating offspring, thereby improving the overall evolutionary efficiency of the algorithm. In addition, as the population iterates, the evolutionary direction guidance strategy adaptively transforms the training data of the multilayer perceptron. Finally, the multilayer perceptron is intermittently updated and uses the evolutionary direction guidance strategy to generate promising offspring, guiding the algorithm to achieve efficient search. Compared with seven state-of-the-art CMOEAs on 33 benchmark test problems and 8 engineering application problems, MOPCMO achieves excellent performance.

Citations: 0

Development of data-driven predictive model and enhanced multiobjective optimization to improve the excavation performance of large-diameter slurry shields

IF 8, Zone 2 (Computer Science)

Engineering Applications of Artificial Intelligence Pub Date : 2025-09-27 DOI: 10.1016/j.engappai.2025.112402

Feiming Su, Xianguo Wu, Tiejun Li, Yang Liu

Volume 162, Article 112402

Abstract: Safety, efficiency, and energy consumption are important aspects for evaluating the performance of large-diameter slurry shields, and improving shield performance is crucial for safe and efficient excavation. To this end, a data-driven hybrid method is developed to improve the excavation performance of large-diameter slurry shields by intelligently regulating shield parameters. This method combines Bayesian Optimization with categorical boosting (BO-CatBoost) and an enhanced multiobjective evolutionary algorithm based on decomposition (EMOEA/D). The method uses surface settlement, penetration, and specific energy as output targets and employs expert knowledge to select the input parameters. Subsequently, the trained BO-CatBoost model is employed to fit the input-output relationship. On this basis, the multiobjective optimization process is performed using EMOEA/D, with the important parameters determined by Shapley Additive exPlanations as decision variables and the nonlinear relationship fitted by BO-CatBoost as the objective function. Finally, the technique for order preference by similarity to ideal solution (TOPSIS) is applied to obtain optimal operational parameters, thereby enhancing the excavation performance of large-diameter slurry shields. The proposed method is applied to a Wuhan rail transit line to verify its effectiveness, and the results show that: (1) The method can accurately predict the three targets, with goodness of fit ranging from 0.938 to 0.988. (2) The proposed method can effectively improve the excavation performance of the large-diameter slurry shield, with improvements of 13.88%, 5.21%, and 10.88%, respectively. (3) An adaptive decision-making system for setting operational parameters is constructed, which is valuable for formulating operational control strategies for large-diameter slurry shields.
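The final selection step mentioned in this abstract, the technique for order preference by similarity to ideal solution (TOPSIS), can be sketched as below. This is a textbook version under stated assumptions, not the authors' implementation: the equal weights, the three candidate settings, and their numeric values are hypothetical, and treating settlement and specific energy as costs and penetration as a benefit is an illustrative reading of the three targets.

```python
import math

def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] for each alternative (row).

    matrix  : rows are alternatives, columns are criteria
    weights : one weight per criterion
    benefit : True if larger is better for that criterion, False if smaller is better
    """
    n_crit = len(weights)
    # Vector-normalize each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    columns = list(zip(*weighted))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    anti = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in weighted:
        d_plus = math.dist(row, ideal)   # distance to the ideal solution
        d_minus = math.dist(row, anti)   # distance to the anti-ideal solution
        scores.append(d_minus / (d_plus + d_minus))
    return scores

# Hypothetical candidates: [surface settlement, penetration, specific energy].
candidates = [
    [10.0, 30.0, 50.0],
    [8.0, 35.0, 45.0],
    [12.0, 25.0, 55.0],
]
scores = topsis(candidates, weights=[1 / 3, 1 / 3, 1 / 3], benefit=[False, True, False])
best = max(range(len(scores)), key=scores.__getitem__)
print("closeness scores:", [round(s, 3) for s in scores], "best candidate index:", best)
```

The toy values are chosen so that the second candidate is best on every criterion, which makes the ranking easy to check by eye; on a real Pareto front no candidate dominates the others, and the weights then determine the trade-off.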

Citations: 0

Beyond trial-and-error: Predicting user abandonment after a moderation intervention

IF 8, Zone 2 (Computer Science)

Engineering Applications of Artificial Intelligence Pub Date : 2025-09-27 DOI: 10.1016/j.engappai.2025.112375

Benedetta Tessa, Lorenzo Cima, Amaury Trujillo, Marco Avvenuti, Stefano Cresci

Volume 162, Article 112375

Abstract: Current content moderation follows a reactive, trial-and-error approach, where interventions are applied and their effects are only measured post-hoc. In contrast, we introduce a proactive, predictive approach that enables moderators to anticipate the impact of their actions before implementation. We propose and tackle the new task of predicting user abandonment following a moderation intervention. We study the reactions of 16,540 users to a massive ban of online communities on Reddit, training a set of binary classifiers to identify those users who would abandon the platform after the intervention, a problem of great practical relevance. We leverage a dataset of 13.8 million posts to compute a large and diverse set of 142 features, which convey information about the activity, toxicity, relations, and writing style of the users. We obtain promising results, with the best-performing model achieving micro F1-score = 0.914. Our model shows robust generalizability when applied to users from previously unseen communities. Furthermore, we identify activity features as the most informative predictors, followed by relational and toxicity features, while writing style features exhibit limited utility. Theoretically, our results demonstrate the feasibility of adopting a predictive machine learning approach to estimate the effects of moderation interventions. Practically, this work marks a fundamental shift from reactive to predictive moderation, equipping platform administrators with intelligent tools to strategically plan interventions, minimize unintended consequences, and optimize user engagement.
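The micro F1-score reported in this abstract pools true positives, false positives, and false negatives over all classes before computing F1. A minimal sketch follows; the label vectors are hypothetical (1 standing for "abandons", 0 for "stays"), and nothing here reproduces the paper's classifiers or data.

```python
def micro_f1(y_true, y_pred, labels=(0, 1)):
    """Micro-averaged F1: pool TP/FP/FN over every class, then compute F1 once."""
    tp = fp = fn = 0
    for label in labels:
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label and t != label:
                fp += 1
            elif p != label and t == label:
                fn += 1
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical ground truth and predictions for eight users.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"micro F1 = {micro_f1(y_true, y_pred):.3f}, accuracy = {accuracy:.3f}")
```

When each sample carries exactly one label and every class is included, micro F1 coincides with plain accuracy, as the toy example shows.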

Citations: 0

A three-stage segmentation framework for lung cancer lesion isolation in three-dimensional positron emission tomography images

IF 8, Zone 2 (Computer Science)

Engineering Applications of Artificial Intelligence Pub Date : 2025-09-27 DOI: 10.1016/j.engappai.2025.112507

Yusheng Wu, Qiang Lin, Jingjun Wei, Yongchun Cao, Zhengxing Man, Xiaodi Huang

Volume 162, Article 112507

Abstract:

Background: Positron emission tomography (PET) is a critical functional medical imaging modality for the early detection and diagnosis of cancers. PET imaging faces several challenges that hinder accurate interpretation including its inherently low spatial resolution, substantial variability in cancer lesions’ appearance, and difficulties distinguishing between the image background and benign lesions.

Methods: We propose a novel three-stage image segmentation framework to enhance the accuracy of lung cancer lesion identification and extraction from three-dimensional (3D) PET images. The first stage conducts a coarse segmentation using an encoder-decoder structure network to roughly position lesions. The second stage employs a multi-layer feature extraction network to learn the detailed characteristics of coarse segmentation results, mitigating false positives caused by localization inaccuracy. The last stage further refines the extracted features via dividing a sub-region of the lesion into foreground and background branches, reducing false positives caused by over-segmentation of edges. A novel lesion count loss function is introduced to guide the model to generate predictions during the training, ensuring that the predicted lesion counts align with the ground truth labels.

Results: The proposed method was evaluated on clinical 3D PET image datasets. Experimental results demonstrated a Dice Similarity Coefficient (DSC) of 85.35%, Accuracy of 83.97%, and Recall of 86.83%. Compared to existing models applied to the same datasets, our method consistently achieved superior performance.

Conclusion: The proposed method significantly improves the segmentation performance of lung cancer lesions, implying that our method holds substantial potential for broader clinical application, even in low-resolution images.
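Two of the metrics reported in the Results, the Dice Similarity Coefficient and Recall, can be computed from binary masks as sketched here. The masks are hypothetical flattened voxel lists (1 for lesion, 0 for background); this shows only the standard metric definitions, not the paper's evaluation code or its lesion count loss.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |pred AND truth| / (|pred| + |truth|) for flattened binary masks."""
    intersection = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total

def recall(pred, truth):
    """Fraction of true lesion voxels that the prediction recovers."""
    true_positive = sum(p and t for p, t in zip(pred, truth))
    positives = sum(truth)
    return 1.0 if positives == 0 else true_positive / positives

# Hypothetical 6-voxel masks: 2 voxels overlap, each mask marks 3 voxels.
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]
print(f"DSC = {dice_coefficient(pred_mask, true_mask):.4f}, "
      f"Recall = {recall(pred_mask, true_mask):.4f}")
```

Dice penalizes false positives and false negatives symmetrically, whereas recall ignores false positives, so a model that over-segments can score high on recall while its Dice drops.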

Citations: 0

A streaming variable neural speech codec

IF 8, Zone 2 (Computer Science)

Engineering Applications of Artificial Intelligence Pub Date : 2025-09-27 DOI: 10.1016/j.engappai.2025.112418

Huaifeng Zhang, Pengfei Wu, Guigeng Li, Yuan An, Hao Zhang

Volume 162, Article 112418

Abstract: This paper presents a variable bit rate streaming neural speech codec designed for ultra-low bit rate scenarios, based on the SoundStream network framework. The codec employs the vector quantized variational auto-encoder (VQ-VAE) algorithm to capture the temporal structure and spectral characteristics of the speech signal, and constructs a latent space codebook to facilitate the effective mapping of feature vectors to discrete vectors. Based on the harmonic characteristics of speech signals and the inherent defects of single-scale discriminators, we introduce multi-period discriminators and multi-scale discriminators. The training process uses a balanced training strategy to ensure the balance between codebook utilization and training weights, and utilizes the Short-Time Fourier Transform (STFT) spectrum that can provide more accurate time–frequency resolution to compute the reconstruction loss. We introduce codebook loss to improve the utilization rate of the codebook and accelerate the convergence of the model. In the inference process, we use a quantizer selection strategy to achieve adaptive adjustment of variable bitrate. Objective and subjective experiments demonstrate that our proposed new neural speech codec outperforms traditional classical speech codecs and existing neural speech codecs in terms of reconstructed speech naturalness and quality while maintaining the low latency characteristic of neural speech codecs. With a multi-stimulus test with hidden reference and anchor (MUSHRA) score of 87, it is highly suitable for ultra-low bit rate speech compression applications such as satellite speech communication and narrowband instant messaging. The demo has been publicly released at https://svcodec.github.io/.
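The abstract does not detail its quantizer selection strategy. One common way a SoundStream-style codec varies its bitrate is residual vector quantization, where each extra quantizer stage encodes the residual left by the previous one, so using fewer stages lowers the bitrate. The sketch below illustrates that general idea only; the codebooks, the latent vector, and the function names are hypothetical and are not taken from the paper.

```python
import math

def nearest(codebook, vector):
    """Index of the codebook entry closest to `vector` (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vector)))

def rvq_encode(vector, codebooks, n_quantizers):
    """Quantize `vector` with the first `n_quantizers` residual stages."""
    residual = list(vector)
    indices = []
    for codebook in codebooks[:n_quantizers]:
        idx = nearest(codebook, residual)
        indices.append(idx)
        # The next stage only sees what this stage failed to capture.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices

def rvq_decode(indices, codebooks):
    """Sum the selected entries of each stage to reconstruct the vector."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

def bits_per_frame(codebooks, n_quantizers):
    """Each stage costs log2(codebook size) bits per encoded frame."""
    return sum(math.log2(len(cb)) for cb in codebooks[:n_quantizers])

# Hypothetical two-stage codebooks over a 2-dimensional latent.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]],   # coarse stage
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, 0.0]],  # fine stage
]
latent = [1.2, 1.0]
for n in (1, 2):
    codes = rvq_encode(latent, codebooks, n)
    print(n, "stage(s):", codes, rvq_decode(codes, codebooks),
          bits_per_frame(codebooks, n), "bits")
```

In this toy setup, one stage reconstructs the latent as [1.0, 1.0] at 2 bits per frame, and adding the second stage refines it to [1.25, 1.0] at 4 bits, which is the quality-versus-bitrate trade-off that a selection strategy would navigate.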

Citations: 0