Big Data | Pub Date: 2026-04-01 | Epub Date: 2026-02-09 | DOI: 10.1177/2167647X251411174
Qurat Ul Ain, Hammad Afzal, Fazli Subhan, Mazliham Mohd Suud, Younhyun Jung
{"title":"Advancing Dysarthric Speech-to-Text Recognition with LATTE: A Low-Latency Acoustic Modeling Approach for Real-Time Communication.","authors":"Qurat Ul Ain, Hammad Afzal, Fazli Subhan, Mazliham Mohd Suud, Younhyun Jung","doi":"10.1177/2167647X251411174","DOIUrl":"10.1177/2167647X251411174","url":null,"abstract":"<p><p>Dysarthria, a motor speech disorder characterized by slurred and often unintelligible speech, presents substantial challenges for effective communication. Conventional automatic speech recognition systems frequently underperform on dysarthric speech, particularly in severe cases. To address this gap, we introduce low-latency acoustic transcription and textual encoding (LATTE), an advanced framework designed for real-time dysarthric speech recognition. LATTE integrates preprocessing, acoustic processing, and transcription mapping into a unified pipeline, with its core powered by a hybrid architecture that combines convolutional layers for acoustic feature extraction with bidirectional temporal layers for modeling temporal dependencies. Evaluated on the UA-Speech dataset, LATTE achieves a word error rate of 12.5%, phoneme error rate of 8.3%, and a character error rate of 1%. 
By enabling accurate, low-latency transcription of impaired speech, LATTE provides a robust foundation for enhancing communication and accessibility in both digital applications and real-time interactive environments.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"122-136"},"PeriodicalIF":2.6,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146143844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
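The word and character error rates quoted in the abstract above are conventionally computed as an edit distance normalized by the reference length. The following is a minimal sketch of that standard definition, not the LATTE evaluation code; the function names are illustrative assumptions, and a phoneme error rate would follow by passing phoneme sequences to `error_rate`.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance divided by the number of reference tokens."""
    if not ref_tokens:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(reference, hypothesis):
    """Word error rate: tokens are whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate: tokens are individual characters."""
    return error_rate(list(reference), list(hypothesis))
```

For example, `wer("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words.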
Big Data | Pub Date: 2026-04-01 | Epub Date: 2026-02-06 | DOI: 10.1177/2167647X251409135
Pir Noman Ahmad, Muhammad Shahid Anwar, Saleha Masood, Atta Ur Rehman, Muhammad Zubair
{"title":"Real-Time Named Entity Recognition from Textual Electronic Clinical Records in Cancer Therapy Using Low-Latency Neural Networks.","authors":"Pir Noman Ahmad, Muhammad Shahid Anwar, Saleha Masood, Atta Ur Rehman, Muhammad Zubair","doi":"10.1177/2167647X251409135","DOIUrl":"10.1177/2167647X251409135","url":null,"abstract":"<p><p>Named entity recognition (NER) is a core task in natural language processing that identifies and classifies entities, such as people, organizations, and locations within text. It has traditionally been applied in areas like text summarization, machine translation, and question answering. In recent years, NER has gained growing importance in health care, where electronic clinical records and online platforms generate large amounts of unstructured medical data. However, applying NER in clinical contexts introduces unique challenges due to the complexity of medical terminology and the need for high accuracy. In this study, we focused on the development of a real-time, low-latency NER system designed for cross-lingual speech-to-text applications, with a particular emphasis on cancer therapy-related clinical records and traditional Chinese medicine (TCM). We explored the integration of deep learning (DL) architectures optimized for low-latency neural processing to extract structured information from multilingual spoken content in medical settings, particularly in multimodal environments. We evaluate DL-based methods and propose a semi-supervised approach that combines TCM-specific corpora with biomedical resources to improve recognition accuracy. 
The findings provide both a systematic review of current methods and practical insights for building real-time clinical applications that support decision-making and information management in health care.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"137-154"},"PeriodicalIF":2.6,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146127360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data | Pub Date: 2026-03-28 | DOI: 10.1177/2167647X261423120
Minquan Zhao, Can Du, Qing Xu, Hong Luo, Nengpan Wang
{"title":"Digital and Intelligent Optimization Mechanism of the Entire Supply Chain Link Based on Generative Artificial Intelligence and Neural Semantic Analysis.","authors":"Minquan Zhao, Can Du, Qing Xu, Hong Luo, Nengpan Wang","doi":"10.1177/2167647X261423120","DOIUrl":"https://doi.org/10.1177/2167647X261423120","url":null,"abstract":"<p><p>In supply chain optimization, data are multi-source, heterogeneous, and unstructured, so traditional optimization methods that rely on structured data and statistical analysis fall short. This article therefore studies a digital and intelligent optimization mechanism for the entire supply chain based on generative artificial intelligence. The supply chain is divided into five links: product design and process, raw material procurement, production and manufacturing management, product delivery, and product retirement and recycling. For each link, the ChatGPT large language model applies a neural semantic analysis method based on an encoding-decoding architecture. For the multi-source heterogeneous general knowledge within the supply chain, general corpus training, expert annotation, and special corpus training are carried out, and semantic analysis of that knowledge is achieved through in-depth mining and understanding. Based on the semantic analysis results, a generative adversarial network is used to predict complex patterns or solutions, such as product design, transportation routes, and sales methods, in each link, making the predictions more accurate and better aligned with actual supply chain business. 
The experimental results show that this mechanism can accurately analyze the semantics of the general knowledge of the entire supply chain link, improve the accuracy of the prediction of each function of the entire supply chain link, and significantly improve the economic benefits of enterprises.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"2167647X261423120"},"PeriodicalIF":2.6,"publicationDate":"2026-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147576185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data | Pub Date: 2026-03-28 | DOI: 10.1177/2167647X261431683
Yueyan Liu, Hyung Jong Na, Jiang Xue
{"title":"AI-Based Forecasting of National Tourism Revenues: Integrating Economic, Fiscal, Political, and Environmental Determinants Through Regression-Oriented Hybrid Models.","authors":"Yueyan Liu, Hyung Jong Na, Jiang Xue","doi":"10.1177/2167647X261431683","DOIUrl":"https://doi.org/10.1177/2167647X261431683","url":null,"abstract":"<p><p>This study proposes an advanced framework for forecasting national tourism revenues by systematically comparing machine learning (ML), deep learning (DL), and hybrid architectures on a country-year panel. Baseline models using only trade and economic indicators have limited explanatory power, whereas adding fiscal, political, and environmental variables substantially improves accuracy. Among ML methods, LightGBM performs best; among DL models, the Transformer excels by capturing nonlinear interactions and temporal dependencies. Building on these results, we introduce a hybrid residual boosting model that integrates the Transformer's predictive strength with LightGBM's structural interpretability. The hybrid model outperforms single models across mean absolute error, root mean square error, mean absolute percentage error, and <i>R</i><sup>2</sup>, simultaneously minimizing errors and maximizing explanatory power. Methodologically and theoretically, the framework advances tourism economics while offering policymakers actionable guidance on fiscal planning, political stability, and environmental sustainability. Importantly, the empirical results are correlational and reflect predictive associations; they should not be interpreted as causal effects of policy interventions. 
The methodological novelty lies in a regression-oriented, two-stage residual-boosting design that (1) learns a Transformer as the primary forecaster on the country-year panel, (2) fits LightGBM to the Transformer residuals to correct systematic errors under distributional heterogeneity, and (3) yields a decomposed forecast (base + residual correction) that facilitates transparent error attribution beyond prior DL-DL stacking hybrids.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"2167647X261431683"},"PeriodicalIF":2.6,"publicationDate":"2026-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147576253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
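The two-stage residual-boosting design described in the abstract above (fit a base forecaster, fit a second model to its residuals, report base + correction) can be sketched in a few lines. This is only an illustration of the pattern under loud assumptions: a least-squares line stands in for the Transformer, a one-split regression stump stands in for LightGBM, and there is a single feature. None of the names below come from the paper.

```python
import math


def fit_line(xs, ys):
    """Least-squares line y = a + b*x (stand-in for the base forecaster)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_stump(xs, rs):
    """One-split regression stump on residuals (stand-in for the corrector).

    Requires at least two distinct x values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - ml) ** 2 for r in left)
               + sum((r - mr) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr


def fit_hybrid(xs, ys):
    """Stage 1: base model. Stage 2: corrector fitted to base residuals."""
    base = fit_line(xs, ys)
    residuals = [y - base(x) for x, y in zip(xs, ys)]
    corrector = fit_stump(xs, residuals)

    def predict(x):
        b, c = base(x), corrector(x)
        # Decomposed forecast: base prediction plus residual correction.
        return {"base": b, "correction": c, "forecast": b + c}

    return predict


def rmse(ys, preds):
    """Root mean square error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
```

Because the returned forecast keeps the base and correction terms separate, the size of the correction for any input can be inspected directly, which is the kind of error attribution the abstract refers to.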
Big Data | Pub Date: 2026-03-28 | DOI: 10.1177/2167647X261432584
Wen Shao
{"title":"Explainable Agentic AI for Big Data-Driven Evaluation and Visual Analytics of Digital Literacy in Higher Vocational Teacher Education.","authors":"Wen Shao","doi":"10.1177/2167647X261432584","DOIUrl":"https://doi.org/10.1177/2167647X261432584","url":null,"abstract":"<p><p>Large-scale, diverse data produced by higher vocational teacher colleges' digital transformation challenges traditional methods for evaluating digital literacy. The reliability of current analytics and black-box artificial intelligence (AI) models for educational decision-making is limited by their frequent lack of autonomy and transparency. In order to assess digital literacy at higher vocational teacher colleges using big data and visual analytics, this study suggests an Explainable Agentic AI framework. In order to facilitate adaptive data exploration, competency evaluation, and insight generation across multimodal educational data, such as learning behavior logs, assessment records, and digital engagement indicators, the framework combines autonomous agentic intelligence with explainable AI (XAI). While XAI methods offer clear explanations of literacy aspects, decision rationale, and uncertainty, agentic components dynamically handle data processing, feature reasoning, and model selection. Effective human-AI collaboration is made possible by an interactive visual analytics layer that allows for layered investigation of learner patterns, temporal dynamics, and cohort heterogeneity. When compared with traditional machine learning techniques, experimental results on large-scale datasets from higher vocational teacher colleges show better assessment accuracy, robustness, and interpretability. 
This work demonstrates the promise of agentic AI for explainable big data exploration and promotes reliable instructional intelligence by combining agentic autonomy, explainability, and visual analytics within a scalable big data paradigm.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"2167647X261432584"},"PeriodicalIF":2.6,"publicationDate":"2026-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147576227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data | Pub Date: 2026-03-18 | DOI: 10.1177/2167647X261428016
Tao Zhang, Yu Zhu
{"title":"MuTemAPR: Enhance Multilocation Patches with Template-Based Neural Program Repair.","authors":"Tao Zhang, Yu Zhu","doi":"10.1177/2167647X261428016","DOIUrl":"https://doi.org/10.1177/2167647X261428016","url":null,"abstract":"<p><p>Automated program repair (APR) has been studied extensively in recent years. Existing approaches mainly generate single-position patches that fail to address multilocation faults effectively. While existing multistep repair approaches can iteratively generate patches for each fault position sequentially, their data augmentation methodologies lack rationality and deviate from real-world scenarios. Furthermore, they overlook the interdependencies between faulty statements, leading to patches learned from erroneous contextual patterns. In this article, we propose MuTemAPR, an APR approach that iteratively generates multilocation patches. MuTemAPR incorporates templates with neural machine translation. Specifically, our method introduces three key innovations. First, we design a template-based data augmentation framework that transforms single-line faulty code into multilocation faulty code through 35 mutation templates. It simulates a real-world environment by establishing variable-type mapping tables for more accurate repair augmentation. Second, we propose a reinforced faulty context training method that employs progressive annotation to incrementally learn repair processes from top to bottom in multifault code. Third, we implement a semantic constraint mechanism during training that enforces syntactic and semantic rules through differential analysis between templates, input code, and generated patches. We evaluate MuTemAPR on the widely used Defects4j benchmark. 
Experimental results demonstrate that our approach can effectively repair multilocation faults, successfully fixing five additional bugs compared with state-of-the-art methods on Defects4j v1.2 and v2.0.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"2167647X261428016"},"PeriodicalIF":2.6,"publicationDate":"2026-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147476419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data | Pub Date: 2026-03-07 | DOI: 10.1177/2167647X251406211
Victor Chang, Péter Kacsuk, Gary Wills, Reinhold Behringer
{"title":"Editorial Summary of Selected Articles.","authors":"Victor Chang, Péter Kacsuk, Gary Wills, Reinhold Behringer","doi":"10.1177/2167647X251406211","DOIUrl":"10.1177/2167647X251406211","url":null,"abstract":"","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"2167647X251406211"},"PeriodicalIF":2.6,"publicationDate":"2026-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145835397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data | Pub Date: 2026-02-28 | DOI: 10.1177/2167647X261423127
Pedro Herrero-Vidal, You-Lin Chen, Cris Liu, Bin Xu, Prithviraj Sen, Lichao Wang
{"title":"Unified AI Approach Using Encoding and Generative Large Language Models for Variant Product Matching in e-Commerce.","authors":"Pedro Herrero-Vidal, You-Lin Chen, Cris Liu, Bin Xu, Prithviraj Sen, Lichao Wang","doi":"10.1177/2167647X261423127","DOIUrl":"https://doi.org/10.1177/2167647X261423127","url":null,"abstract":"<p><p>We introduce VARM, <i>va</i>riant <i>r</i>elationship <i>m</i>atcher strategy, to identify pairs of variant products in e-commerce catalogs. Traditional definitions of entity resolution are concerned with whether product mentions refer to the same underlying product. However, this fails to capture product relationships that are critical for e-commerce applications, such as similar, but not identical, products being listed on the same webpage or sharing reviews. Here, we formulate a new type of entity resolution in <i>variant product</i> relationships to capture these similar e-commerce product links. In contrast with the traditional definition, the new definition requires identifying both whether two products are variant matches of each other <i>and</i> which attributes vary between them. To satisfy these two requirements, we developed a strategy that leverages the strengths of both encoding and generative AI models. First, we construct a dataset that captures webpage product links, and therefore variant product relationships, to train an encoding large language model (LLM) to predict variant matches for any given pair of products. Second, we use retrieval-augmented generation-prompted generative LLMs to extract variation and common attributes amongst groups of variant products. To validate our strategy, we evaluated model performance using real data from one of the world's leading e-commerce retailers. 
The results showed that our strategy outperforms alternative solutions and paves the way to exploiting these new types of product relationships.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"2167647X261423127"},"PeriodicalIF":2.6,"publicationDate":"2026-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147318904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data | Pub Date: 2026-02-01 | Epub Date: 2026-02-09 | DOI: 10.1177/2167647X261423109
Xianfeng Gong, Mingyang Mao
{"title":"Perceived Usefulness, Trust, and Behavioral Intention: A Study on College Student User Adoption Behaviors of Artificial Intelligence Generated News Based on Technology Acceptance Model.","authors":"Xianfeng Gong, Mingyang Mao","doi":"10.1177/2167647X261423109","DOIUrl":"10.1177/2167647X261423109","url":null,"abstract":"<p><p>This study aims to identify the critical factors that shape college students' adoption of AI-generated news, with a specific focus on integrating Big Data methodologies into the Technology Acceptance Model (TAM) framework. Building on TAM, the research incorporates \"trust\" as a core variable to develop a dual-path theoretical model that combines technological cognition (e.g., perceived usefulness, perceived ease of use) and psychological emotions. Unlike traditional TAM-based studies that rely solely on questionnaire data, this research enriches its data sources with Big Data techniques, including the collection and analysis of college students' real-time behavioral data (e.g., AI news reading duration, sharing frequency, source verification clicks) and unstructured text data (e.g., sentiment orientation in comment sections), to complement the survey data from 300 college students. Structural equation modeling of the questionnaire data shows that trust has the strongest direct positive impact on the willingness to use (β = 0.49, <i>p</i> < 0.001), and its influence is significantly greater than that of perceived usefulness (β = 0.35, <i>p</i> < 0.001). Meanwhile, although perceived ease of use does not directly affect the willingness to use, it has significant indirect effects by enhancing trust and perceived usefulness. The results show that in the AI news context, where perceived risk is high, trust is a more crucial psychological mechanism than traditional technological cognitive factors. 
These findings extend the explanatory boundaries of TAM in new technology fields and provide empirical evidence and practical guidance for AI developers to optimize system credibility and for educators to conduct algorithmic literacy training.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"56-61"},"PeriodicalIF":2.6,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146144007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data | Pub Date: 2026-02-01 | Epub Date: 2026-03-23 | DOI: 10.1177/2167647X261429851
Zhijun Gao, Yishuai Yang, Jinhuan Wang, Xin Yue
{"title":"TL-TransUNet: An Improved Lightweight Semantic Segmentation Model of Macular Edema Lesions in Retinal OCT Images.","authors":"Zhijun Gao, Yishuai Yang, Jinhuan Wang, Xin Yue","doi":"10.1177/2167647X261429851","DOIUrl":"https://doi.org/10.1177/2167647X261429851","url":null,"abstract":"<p><p>Optical coherence tomography (OCT) offers noncontact operation, high resolution, and real-time imaging, which makes it well suited to acquiring human retinal images. It plays a crucial role in diagnosing and monitoring retinal diseases such as diabetic macular edema (DME), where it provides high-resolution visualization of retinal layers and fluid accumulations. However, retinal fluid segmentation faces several challenges, including variations in fluid size, location, and shape, as well as complex irregular boundaries. To address these issues, we propose TL-TransUNet, a novel lightweight segmentation model based on TransUNet. The model incorporates a hybrid self-attention mechanism that combines linear self-attention with residual filtered multilayer perceptron modules, reducing both parameter size and computational complexity while capturing global relationships and local details to improve segmentation performance for small lesions. Furthermore, the decoder employs wavelet convolution, which uses the wavelet transform to extract multi-scale features from low- to high-frequency components, enhancing the model's multi-scale learning capability. 
Experimental results on a public DME dataset show that the proposed method outperforms several mainstream segmentation approaches.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":"14 1","pages":"29-41"},"PeriodicalIF":2.6,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147500742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
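The abstract above says the decoder uses a wavelet transform to separate low- and high-frequency components at several scales. The sketch below shows only that decomposition idea, on a one-dimensional signal with the Haar wavelet; the abstract does not name the wavelet family or describe the convolution layer, so both choices here are assumptions made for illustration.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length list.

    Returns (approx, detail): pairwise scaled sums (low-frequency part)
    and pairwise scaled differences (high-frequency part).
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_multiscale(signal, levels):
    """Repeatedly split the low-frequency part to obtain multi-scale features.

    Returns the coarsest approximation and the detail coefficients of each
    level, ordered from finest to coarsest.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details
```

Each level halves the resolution, so the detail lists capture progressively coarser structure while the final approximation keeps the lowest-frequency content.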