Big Data Pub Date : 2026-05-08 DOI: 10.1177/2167647X261447812
Boting Geng, Hongxia Wang, Pengliang Zhang, Jin Xue
{"title":"DS<sup>2</sup>PT: A Deep Two-Stage Patent Text Segmentation Framework Informed by Low-Latency Neural Network Characteristics.","authors":"Boting Geng, Hongxia Wang, Pengliang Zhang, Jin Xue","doi":"10.1177/2167647X261447812","DOIUrl":"https://doi.org/10.1177/2167647X261447812","url":null,"abstract":"<p><p>Patent text segmentation is a fundamental task in patent data mining, enabling applications such as patent analysis and search. The objective is to decompose structurally complex, lengthy sentences into grammatically complete, semantically equivalent short sentences to facilitate downstream processing. Traditional approaches rely on manually defined rules or feature-based machine learning methods, which are labor-intensive, domain-specific, and exhibit limited generalizability. To overcome these limitations, this study proposes a Deep Segmentation Model for Patent Text (DS<sup>2</sup>PT), a two-stage fine-grained segmentation framework based on ALBERT. The first stage employs a conditional random field model to perform coarse segmentation of patent paragraphs into shorter clauses based on structural cues. The second stage utilizes the ALBERT model to perform deep, context-aware segmentation of complex clauses into syntactically independent and semantically complete sentences. Compared to conventional methods, DS<sup>2</sup>PT effectively captures hierarchical contextual information across two stages, significantly improving segmentation accuracy without semantic loss. Furthermore, this research draws inspiration from advancements in cross-lingual speech-to-text systems with low-latency neural networks for real-time applications. While the domains differ, the core technical challenges are analogous: both require models to process sequential, information-dense input (audio streams or long sentences) into structured, meaningful units (transcribed text or segmented clauses) with high accuracy and efficiency. 
The principles of low-latency neural networks, such as efficient context modeling, parallelizable architectures, and real-time incremental processing, inform the design of our segmentation pipeline to enhance its scalability and potential for integration into real-time patent analysis systems. Similarly, the cross-lingual capability highlights the importance of model generalization, which aligns with our goal of developing a domain-adaptive segmentation tool for diverse patent corpora.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"2167647X261447812"},"PeriodicalIF":2.6,"publicationDate":"2026-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147845876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Pub Date : 2026-05-06 DOI: 10.1177/2167647X261438100
Kashish Ara Shakil, Mudasir Ahmad Wani, Faiz Ullah, Younhyun Jung, Shakir Khan, Gufran Ahmad Ansari
{"title":"ThinkAI: A Natural Language Processing-based Intelligent framework for Mental Health.","authors":"Kashish Ara Shakil, Mudasir Ahmad Wani, Faiz Ullah, Younhyun Jung, Shakir Khan, Gufran Ahmad Ansari","doi":"10.1177/2167647X261438100","DOIUrl":"https://doi.org/10.1177/2167647X261438100","url":null,"abstract":"<p><p>In the current digital era, emotional and mental health challenges are becoming very common. Therefore, it is essential to find new and effective ways to support emotional well-being. In this work, we propose ThinkAI, a platform that helps individuals better understand and manage their mental health. The platform offers a secure, explainable, and private space where users can express their thoughts and feelings through journaling. ThinkAI then uses natural language processing (NLP)-based algorithms to analyze these writings and identify emotional patterns. It tracks changes in an individual's mental well-being over time. It also offers visual insights that help users see how their emotional states have evolved over that period, encouraging reflection and self-awareness. The proposed system also generates alerts in case of adverse situations. ThinkAI comprises two main components: one focused on detecting early signs of depression, and the other on analyzing emotions. The system uses a combination of classical machine learning methods and modern transformer-based models to achieve accurate and reliable results. Both kinds of models were tested. For traditional models, the Support Vector Machine model reported the highest accuracy (0.920) and Receiver Operating Characteristic - Area Under the Curve (ROC-AUC) (0.97) for depression detection, while Naive Bayes had the best recall (0.947). 
For emotion analysis, Bidirectional Encoder Representations from Transformers (BERT) performed best with an accuracy of 0.945 and an F1-score of 0.9446, closely followed by the Robustly Optimized BERT Approach (RoBERTa) and the Distilled Robustly Optimized BERT Approach (DistilRoBERTa). Furthermore, to provide a more comprehensive evaluation of the proposed models, we analyzed the training and validation loss of all models for up to five epochs, in addition to reporting accuracy. The results highlight how combining classic algorithms with modern transformer-based deep learning models can create powerful tools for understanding emotional and mental health. Thus, ThinkAI offers a promising step toward real-time monitoring of mental health. This work contributes a technically validated and ethically grounded framework. It can be used for real-time monitoring of an individual's mental well-being, digital therapeutics, and large-scale psychological data analysis.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"2167647X261438100"},"PeriodicalIF":2.6,"publicationDate":"2026-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147845868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Pub Date : 2026-04-28 DOI: 10.1177/2167647X261439005
Liyang Zheng, Baisuo Jin
{"title":"TS-PET: A Novel Framework for Fine-Tuning Pretrained Time-Series Models.","authors":"Liyang Zheng, Baisuo Jin","doi":"10.1177/2167647X261439005","DOIUrl":"https://doi.org/10.1177/2167647X261439005","url":null,"abstract":"<p><p>Time series foundation models offer powerful zero-shot capabilities but face significant adaptation challenges: (1) inefficient fine-tuning due to computational constraints and unreliable sensitivity scores in existing variants of the low-rank adaptation (LoRA) method, and (2) parameter-heavy prediction heads that cause overfitting. We propose time series parameter-efficient transformer (TS-PET), a novel fine-tuning framework featuring (1) a lightweight prediction module that reduces parameters by >80%, mitigating overfitting while maintaining performance; and (2) specialized pruned LoRA, which introduces robust rank allocation by identifying synergistic parameter interactions, enabling stochastic approximation for efficiency. Extensive experiments on eight benchmarks (Electricity Transformer Temperature, Exchange, Weather, etc.) show that TS-PET achieves state-of-the-art accuracy, outperforming MOMENT (linear probing), PatchTST, and adaptive LoRA variants, while demonstrating superior parameter efficiency. Our solution enables scalable adaptation of time series foundation models without compromising performance.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"2167647X261439005"},"PeriodicalIF":2.6,"publicationDate":"2026-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147788653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Pub Date : 2026-04-17 DOI: 10.1177/2167647X261430300
Jie Li, Yucheng Zhao, Xiaoyu Yang, Yi Ding, Yidan Zou, Jun Jin
{"title":"KSG: A Symbolic Semantics Graph Generation Method of Smart Contract Based on the K Framework.","authors":"Jie Li, Yucheng Zhao, Xiaoyu Yang, Yi Ding, Yidan Zou, Jun Jin","doi":"10.1177/2167647X261430300","DOIUrl":"https://doi.org/10.1177/2167647X261430300","url":null,"abstract":"<p><p>The formal semantics of blockchain smart contracts are the foundation of formal verification. They can be used to establish formal models to verify the security of contracts and help developers understand the specific execution rules of contracts. However, the mathematical logic involved in such modeling poses a high barrier to entry and cannot be directly integrated with other program analysis methods. This article proposes a semantic graph generation approach, KSG, for blockchain smart contracts. First, the semantic rules of the contract language are formally defined, and a semantic interpreter and prover are constructed to automatically transform smart contract code into a scalable semantic graph. This graph incorporates semantic control flow information, semantic data flow information, execution rules, and verification constraints. Next, the generated semantic graph can be utilized for vulnerability detection and symbolic execution and supports iterative optimization based on the analysis results. 
Finally, the detailed process of semantic graph generation and analysis is demonstrated through the verification of the reentrancy contract and the honeypot contract.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"2167647X261430300"},"PeriodicalIF":2.6,"publicationDate":"2026-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147700699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Pub Date : 2026-04-08 DOI: 10.1177/2167647X261435867
Yu-Chung Hsiao
{"title":"E-Commerce Network Search System Based on Target Webpage Positioning and Sentiment Analysis Recommendation.","authors":"Yu-Chung Hsiao","doi":"10.1177/2167647X261435867","DOIUrl":"https://doi.org/10.1177/2167647X261435867","url":null,"abstract":"<p><p>With the rise of e-commerce network search systems, product search efficiency and user satisfaction have become increasingly important. To address the low accuracy of consumer sentiment analysis in existing product recommendation scenarios, a webpage localization and sentiment analysis recommendation model is proposed that combines an improved web search algorithm with a bidirectional long short-term memory network and an attention mechanism. An e-commerce network search system is then designed around this model. Experimental results show that the sentiment analysis recommendation model achieves an accuracy of 98.88% and an average mean squared error of 1.027, outperforming all comparison models. The average root-mean-square error is 0.476, recall is 98.92%, the F1 score is 97.78%, and the recognition accuracy for each of the four emotional tendencies exceeds 95%. In addition, the integrated system delivers an average search time of 87.6 ms, a central processing unit occupancy of 44.68%, a missed-search rate of 1.42%, and a user satisfaction of 99.34%, all superior to the comparison systems. 
The system offers a ready-to-deploy solution for sentiment-aware product search and provides a theoretical basis for future research in e-commerce search systems.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"2167647X261435867"},"PeriodicalIF":2.6,"publicationDate":"2026-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147635011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Pub Date : 2026-04-01 Epub Date: 2026-03-19 DOI: 10.1177/2167647X261430664
Jawad Khan, Muhammad Hameed Siddiqi, Tariq Rahim, Shah Khalid
{"title":"Cross-Lingual Speech-to-Text Systems with Low-Latency Neural Networks for Real-Time Applications.","authors":"Jawad Khan, Muhammad Hameed Siddiqi, Tariq Rahim, Shah Khalid","doi":"10.1177/2167647X261430664","DOIUrl":"10.1177/2167647X261430664","url":null,"abstract":"","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"65-66"},"PeriodicalIF":2.6,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147482355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Pub Date : 2026-04-01 Epub Date: 2025-12-04 DOI: 10.1177/2167647X251399606
Suhas Alalasandra Ramakrishnaiah, Yasir Abdullah Rabi, Ananth John Patrick, Mohammad Shabaz, Surbhi B Khan, Rijwan Khan, Ahlam Almusharraf
{"title":"Hybrid DeepSentX Framework for AI-Driven Requirements Insight and Risk Prediction in Multilingual Sports Using Natural Language Processing.","authors":"Suhas Alalasandra Ramakrishnaiah, Yasir Abdullah Rabi, Ananth John Patrick, Mohammad Shabaz, Surbhi B Khan, Rijwan Khan, Ahlam Almusharraf","doi":"10.1177/2167647X251399606","DOIUrl":"10.1177/2167647X251399606","url":null,"abstract":"<p><p>Engineering teams need timely signals about evolving requirements and release risk, yet multilingual fan discourse around live sports is noisy, code-switched, and saturated with sarcasm and event-driven drift. We present Hybrid DeepSentX, an AI-driven framework that converts crowd commentary into actionable requirements insight and sprint-level risk scores. The pipeline couples multilingual transformer encoders with an inductive GraphSAGE conversation graph to inject relational context across posts, and adds a reinforcement learner whose reward is shaped to prioritize correct decisions on sarcasm-heavy items and rapidly shifting events. We assembled a million-plus posts from X, Reddit, and sports forums and evaluated the framework against strong baselines, including BERT, long short-term memory, support-vector machines, and recent hybrid models, with significance tests, calibration analysis, ablations, and efficiency profiling. DeepSentX achieved higher macro-averaged accuracy and F1 on code-switched and sarcastic subsets, reduced missed risk flags, and produced developer-facing artefacts that directly support backlog grooming and defect triage. 
Relative to prior hybrids that combine transformers with either graph reasoning or reinforcement alone, our contributions are fourfold: (i) a unified multilingual design that integrates transformer, graph, and reinforcement components for sarcasm and drift robustness, (ii) an annotated multi-platform corpus with explicit code switching and sarcasm labels and per platform language balance, (iii) a rigorous comparative study reporting accuracy, calibration, latency, memory, and parameter count, and (iv) deployment artefacts that turn model outputs into requirement clusters and sprint risk scores suitable for continuous planning.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"67-86"},"PeriodicalIF":2.6,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145702853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Pub Date : 2026-04-01 Epub Date: 2025-12-12 DOI: 10.1177/2167647X251403895
Zhaodi Yu, Zhenxiang Xu, Jiangang Qi
{"title":"The Two Worlds of Emergency Law: A Comparative Study of International and Chinese Scholarship Through Knowledge Domain Mapping.","authors":"Zhaodi Yu, Zhenxiang Xu, Jiangang Qi","doi":"10.1177/2167647X251403895","DOIUrl":"10.1177/2167647X251403895","url":null,"abstract":"<p><p>In the context of a global risk society, emergency law has become a critical field for balancing the expansion of state power with the protection of civil rights during crises. Despite its growing importance, a systematic, quantitative comparison of the knowledge landscapes of international and Chinese emergency law scholarship has been notably absent. This study employs bibliometric and knowledge mapping analysis, utilizing CiteSpace software. A total of 274 publications were retrieved from the Web of Science Core Collection and 391 from the China National Knowledge Infrastructure database. These data were used to systematically map and compare the research status, collaborative networks, and core themes of the two academic communities. The findings indicate that while both international and Chinese research are crisis-driven, with publication surges corresponding to major events such as the 9/11 attacks, SARS, and the COVID-19 pandemic, they function as two academically isolated communities with no author-level collaboration. A fundamental divergence in research paradigms was identified. International scholarship follows a \"limitation-oriented\" paradigm, rooted in liberal constitutionalism, focusing on the tension between emergency powers and human rights, and the risks of a state of exception. In contrast, Chinese research adopts a \"construction-oriented\" paradigm aimed at building an efficient, state-centric crisis response system, dominated by concepts such as emergency management and the \"one plan and three sub-systems\" framework. This study concludes that there are two worlds of emergency law. 
The international paradigm primarily treats emergency law as a mechanism to constrain state authority and protect individual rights from government overreach. In contrast, the Chinese paradigm views law as an instrument to enhance state capacity and ensure effective crisis management. This fundamental divergence in normative goals and theoretical foundations identified in this study presents significant theoretical and practical challenges for global emergency governance and offers a clear direction for future comparative legal studies.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"87-121"},"PeriodicalIF":2.6,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145835330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging Transformer-GNN Integration for Multilingual News Speech-to-Text Similarity Modeling.","authors":"Jaishree Jain, Saroj Kushwah, Updesh Kumar Jaiswal, Navin Kumar Agrawal, Jabir Ali, Ankit Vidyarthi","doi":"10.1177/2167647X261427458","DOIUrl":"10.1177/2167647X261427458","url":null,"abstract":"<p><p>The increasing volume of multilingual news broadcasts highlights the need for advanced systems capable of transforming speech into semantically comparable text across languages. Traditional speech-to-text and textual similarity methods often fall short in handling linguistic diversity, contextual ambiguity, and cross-lingual semantic alignment. To overcome these limitations, we introduce a Transformer-Graph Neural Network (GNN) integrated framework for multilingual news speech-to-text similarity modeling. This article presents an approach that leverages a Transformer encoder to extract deep contextual embeddings from speech inputs, capturing sequential and contextual nuances. These embeddings are then structured into graphs that represent semantic relations among words, phrases, and sentences. A GNN refines these graph-based representations by modeling relational dependencies across languages. Finally, a cross-lingual semantic alignment module produces similarity scores, enabling accurate transformation of multilingual speech into comparable text. Experiments conducted on benchmark multilingual news video datasets in English, Hindi, Marathi, and Tamil show that our framework consistently outperforms baseline models, including standalone Transformers and GNNs. The model achieved significant gains, with improvements of 7.8% in semantic similarity accuracy, 6.1% in BLEU score, and 8.4% in cross-lingual alignment efficiency. Furthermore, it demonstrated robustness to noisy input, code-switching, and low-resource language scenarios, making it suitable for practical multilingual news applications. 
The proposed approach achieved a relative improvement of 4.8% in semantic similarity and a 3.1% reduction in word error rate compared with the baseline models. Future directions include extending the framework for real-time deployment, expanding support to underrepresented languages, and incorporating multimodal news data for enriched global media analysis.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"155-165"},"PeriodicalIF":2.6,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147610334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Pub Date : 2026-04-01 Epub Date: 2026-04-02 DOI: 10.1177/2167647X261435874
Jiajia Hu
{"title":"A Cross-Lingual Real-Time E-Commerce Recommendation Method Based on Siamese Graph Convolutional Network and Bilinear Attention.","authors":"Jiajia Hu","doi":"10.1177/2167647X261435874","DOIUrl":"10.1177/2167647X261435874","url":null,"abstract":"<p><p>To balance low latency and high accuracy in cross-lingual real-time recommendation, we propose a two-stage method combining offline high-precision entity representation with online low-latency feature interaction. First, the cross-language scenario is abstracted as a heterogeneous graph. Then, a Siamese Graph Convolutional Network is used for entity representation learning. Finally, an efficient bilinear attention mechanism is employed for deep feature interaction to output predictions. Experiments on the cross-border e-commerce dataset showed that the model performed well in entity representation learning. When the recommendation list length was 30, the normalized discounted cumulative gain of the Siamese Graph Convolutional Network remained above 7.8%, more than 20% higher than that of the other models. Regarding feature interaction, the bilinear attention mechanism showed superior convergence. Its mean average value reached 12.7% in the 100th round, 1.9 percentage points higher than the bilinear mechanism. In the scenario of increasing the sales rate of \"long-tail products,\" the hit rate of the proposed recommendation method reached 46.5%. 
In summary, the proposed method demonstrates excellent accuracy and efficiency, proving its potential for real-time cross-language applications.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"166-178"},"PeriodicalIF":2.6,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147596219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}