{"title":"Robust block tensor PCA with F-norm projection framework","authors":"Xiaomin Zhang, Xiaofeng Wang, Zhenzhong Liu, Jianen Chen","doi":"10.1016/j.knosys.2024.112712","DOIUrl":"10.1016/j.knosys.2024.112712","url":null,"abstract":"<div><div>Tensor principal component analysis (TPCA), also known as Tucker decomposition, ensures that the extracted “core tensor” maximizes the variance of the sample projections. Nevertheless, this method is particularly susceptible to noise and outliers. This is due to the utilization of the squared <em>F</em>-norm as the distance metric. In addition, it lacks constraints on the discrepancies between the original tensors and the projected tensors. To address these issues, a novel tensor-based trigonometric projection framework is proposed using <em>F</em>-norm to measure projection distances. Tensor data are first processed utilizing a blocking recombination technique prior to projection, thus enhancing the representation of the data at a local spatio-temporal level. Then, we present a block TPCA with the <em>F</em>-norm metric (BTPCA-F) and develop an iterative greedy algorithm for solving BTPCA-F. Subsequently, regarding the <em>F</em>-norm projection relation as the “Pythagorean Theorem”, we provide three different objective functions, namely, the tangent, cosine and sine models. These three functions directly or indirectly achieve the two objectives of maximizing projection distances and minimizing reconstruction errors. The corresponding tangent, cosine and sine solution algorithms based on BTPCA-F (called <em>tan</em>-BTPCA-F, <em>cos</em>-BTPCA-F and <em>sin</em>-BTPCA-F) are presented to optimize the objective functions, respectively. The convergence and rotation invariance of these algorithms are rigorously proved theoretically and discussed in detail. Lastly, extensive experimental results illustrate that the proposed methods significantly outperform the existing TPCA and the related 2DPCA algorithms.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"306 ","pages":"Article 112712"},"PeriodicalIF":7.2,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142705184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel automated labelling algorithm for deep learning-based built-up areas extraction using nighttime lighting data","authors":"Baoling Gui, Anshuman Bhardwaj, Lydia Sam","doi":"10.1016/j.knosys.2024.112702","DOIUrl":"10.1016/j.knosys.2024.112702","url":null,"abstract":"<div><div>The use of remote sensing imagery and cutting-edge deep learning techniques can produce impressive results when it comes to built-up areas extraction (BUAE). However, reducing the manual labelling set production process while ensuring high accuracy is currently the main research topic. This study pioneers the exploitation of nighttime lighting data (NLD) for automatically generating deep learning label sets, assessing the feasibility, and identifying limitations of using varied intensity ranges of lighting data directly for this purpose. We provide a novel method for generating fine-grained labels through an optimisation technique that eliminates the necessity for human involvement. This approach employs deep learning segmentation algorithms and has been tested in eight cities across seven countries. The results indicate that segmentation performs well in most cities, with the combination of iso clustering and NLD allowing for more precise extraction of urban building districts. The overall accuracy exceeds 90% in most cities. The results based on manual and historical data (∼0.7) as labels are significantly lower than those based on NLD. At the same time, the segmentation effect of deep learning has a more significant advantage over traditional machine learning classification algorithms (∼0.8). DeeplabV3 and U-Net exhibit different strengths in segmentation and extraction: DeeplabV3 has a stronger ability to eliminate errors, while U-Net retains the capability to handle less labelled information, making them mutually advantageous depending on the specific requirements of the task. It proposes a strategy to automatically extract built-up areas with minimal human involvement.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"306 ","pages":"Article 112702"},"PeriodicalIF":7.2,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142705014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AutoQuo: An Adaptive plan optimizer with reinforcement learning for query plan selection","authors":"Xiaoqiao Xiong , Jiong Yu , Zhenzhen He","doi":"10.1016/j.knosys.2024.112664","DOIUrl":"10.1016/j.knosys.2024.112664","url":null,"abstract":"<div><div>An efficient execution plan generation is crucial for optimizing database queries. In exploring large table spaces to identify the most optimal table join orders, traditional cost-based optimizers may encounter challenges when confronted with complicated queries. Thus, learning-based optimizers have recently been proposed to leverage past experience and generate high-quality execution plans. However, these optimizers demonstrate limited generalization capabilities for workloads with diverse distributions.</div><div>In this study, an adaptive plan selector based on reinforcement learning is proposed to address these issues. However, three challenges remain: (1) How to generate optimal multi-table join orders? We adopt an exploration–exploitation strategy to traverse the vast search space composed of candidate tables, thereby evaluating the significance of each table. Long short-term memory (LSTM) networks are subsequently used to predict the performance of join orders and generate high-quality candidate plans. (2) How to automatically learn new features in novel datasets? We employ the Actor–Critic strategy, which involves jointly cross-training the policy and value networks. By adjusting the parameters based on real feedback obtained from the database, the new datasets are automatically learnt. (3) How to automatically select the best plan? We introduce a constraint-aware optimal plan selection model that captures the relationship between constraints and plans. This model guides the selection of the best plan under constraints of execution time, cardinality, cost, and mean-squared error (MSE). The experimental results on real datasets demonstrated the superiority of the proposed approach over state-of-the-art baselines. Compared with PostgreSQL, we observed a reduction of 29.73% in total latency and 28.36% in tail latency.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"306 ","pages":"Article 112664"},"PeriodicalIF":7.2,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142705007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MSSTGNN: Multi-scaled Spatio-temporal graph neural networks for short- and long-term traffic prediction","authors":"Yuanhai Qu, Xingli Jia, Junheng Guo, Haoran Zhu, Wenbin Wu","doi":"10.1016/j.knosys.2024.112716","DOIUrl":"10.1016/j.knosys.2024.112716","url":null,"abstract":"<div><div>Accurate traffic prediction plays a crucial role in ensuring traffic safety and minimizing property damage. The utilization of STGNNs in traffic prediction has gained significant attention from researchers aiming to capture the intricate time-varying relationships within traffic data. While existing STGNNs commonly rely on Euclidean distance to assess the similarity between nodes, which may fall short in reflecting POI or regional functions. The traffic network exhibits static from a macro perspective, whereas undergoes dynamic changes in the micro perspective. Previous researchers incorporating self-attention to capture time-varying features for constructing dynamic graphs have faced challenges in overlooking the connections between nodes due to the <em>Softmax</em> polarization effect, which tends to amplify extreme value differences, and fails to accurately represent the true relationships between nodes. To solve this problem, we introduce the <strong>M</strong>ulti-<strong>S</strong>caled <strong>S</strong>patio-<strong>T</strong>emporal <strong>G</strong>raph <strong>N</strong>eural <strong>N</strong>etworks (MSSTGNN), which aims to comprehensively capture characteristics within traffic from multiscale viewpoints to construct multi-perspective graphs. We employ a trainable matrix to enhance the predefined adjacency matrices and to construct an optimal dynamic graph based on both trend and period. Additionally, a graph aggregate technique is proposed to effectively merge trend and periodic dynamic graphs. TCN is developed to model nonstationary traffic data, and we leverage the skip and residual connections to increase the model depth. A two-stage learning approach and a novel MSELoss function is designed to enhance the model's performance. The experimental results demonstrate that the MSSTGNN model outperforms the existing methods, achieving state-of-the-art performances across multiple real-world datasets.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"306 ","pages":"Article 112716"},"PeriodicalIF":7.2,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142705016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Collaborative association networks with cross-level attention for session-based recommendation","authors":"Tingting Dai , Qiao Liu , Yue Zeng , Yang Xie , Xujiang Liu , Haoran Hu , Xu Luo","doi":"10.1016/j.knosys.2024.112693","DOIUrl":"10.1016/j.knosys.2024.112693","url":null,"abstract":"<div><div>Session-based recommendation aims to predict the next interacted item based on the anonymous user’s behavior sequence. The main challenge lies in how to perceive user preference within limited interactions. Recent advances demonstrate the advantage of utilizing intent represented by combining consecutive items in understanding complex user behavior. However, these methods concentrate on the diverse expression of intents enriched by considering consecutive items with different lengths, ignoring the exploration of complex transitions between intents. This limitation makes intent transfer unclear in the user behavior with dynamic change, resulting in sub-optimal performance. To solve this problem, we propose novel collaborative association networks with cross-level attention for session-based recommendation (denoted as CAN4Rec), which simultaneously models intra- and inter-level transitions within hierarchical user intents. Specifically, we first construct two levels of intent, including individual-level and aggregated-level intent, and each level of intent is obtained based on sequential transitions. Then, the cross-level attention mechanism is designed to extract inter-transitions between different levels of intent. The captured inter-transitions are bi-directional, containing from individual-level to aggregated-level intents and from aggregated-level to individual-level intents. Finally, we generate directional session representations and combine them to realize the prediction of the next item. Experimental results on three public benchmark datasets demonstrate that the proposed model achieves state-of-the-art performance.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"306 ","pages":"Article 112693"},"PeriodicalIF":7.2,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142706015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CIDGMed: Causal Inference-Driven Medication Recommendation with Enhanced Dual-Granularity Learning","authors":"Shunpan Liang , Xiang Li , Shi Mu , Chen Li , Yu Lei , Yulei Hou , Tengfei Ma","doi":"10.1016/j.knosys.2024.112685","DOIUrl":"10.1016/j.knosys.2024.112685","url":null,"abstract":"<div><div>Medication recommendation aims to integrate patients’ long-term health records to provide accurate and safe medication combinations for specific health states. Existing methods often fail to deeply explore the true causal relationships between diseases/procedures and medications, resulting in biased recommendations. Additionally, in medication representation learning, the relationships between information at different granularities of medications—coarse-grained (medication itself) and fine-grained (molecular level)—are not effectively integrated, leading to biases in representation learning. To address these limitations, we propose the Causal Inference-driven Dual-Granularity Medication Recommendation method (CIDGMed). Our approach leverages causal inference to uncover the relationships between diseases/procedures and medications, thereby enhancing the rationality and interpretability of recommendations. By integrating coarse-grained medication effects with fine-grained molecular structure information, CIDGMed provides a comprehensive representation of medications. Additionally, we employ a bias correction model during the prediction phase to further refine recommendations, ensuring both accuracy and safety. Through extensive experiments, CIDGMed significantly outperforms current state-of-the-art models across multiple metrics, achieving a 2.54% increase in accuracy, a 3.65% reduction in side effects, and a 39.42% improvement in time efficiency. Additionally, we demonstrate the rationale of CIDGMed through a case study.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"309 ","pages":"Article 112685"},"PeriodicalIF":7.2,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142748707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ALDANER: Active Learning based Data Augmentation for Named Entity Recognition","authors":"Vincenzo Moscato, Marco Postiglione, Giancarlo Sperlì, Andrea Vignali","doi":"10.1016/j.knosys.2024.112682","DOIUrl":"10.1016/j.knosys.2024.112682","url":null,"abstract":"<div><div>Training Named Entity Recognition (NER) models typically necessitates the use of extensively annotated datasets. This requirement presents a significant challenge due to the labor-intensive and costly nature of manual annotation, especially in specialized domains such as medicine and finance. To address data scarcity, two strategies have emerged as effective: (1) Active Learning (AL), which autonomously identifies samples that would most enhance model performance if annotated, and (2) data augmentation, which automatically generates new samples. However, while AL reduces human effort, it does not eliminate it entirely, and data augmentation often leads to incomplete and noisy annotations, presenting new hurdles in NER model training. In this study, we integrate AL principles into a data augmentation framework, named Active Learning-based Data Augmentation for NER (ALDANER), to prioritize the selection of informative samples from an augmented pool and mitigate the impact of noisy annotations. Our experiments across various benchmark datasets and few-shot scenarios demonstrate that our approach surpasses several data augmentation baselines, offering insights into promising avenues for future research.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"305 ","pages":"Article 112682"},"PeriodicalIF":7.2,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142594098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Local Metric NER: A new paradigm for named entity recognition from a multi-label perspective","authors":"Zaifeng Hua, Yifei Chen","doi":"10.1016/j.knosys.2024.112686","DOIUrl":"10.1016/j.knosys.2024.112686","url":null,"abstract":"<div><div>As the field of Nested Named Entity Recognition (NNER) advances, it is marked by a growing complexity due to the increasing number of multi-label entity instances. How to more effectively identify multi-label entities and explore the correlation between labels is the focus of our work. Unlike previous models that are modeled in single-label multi-classification problems, we propose a novel multi-label local metric NER model to rethink Nested Entity Recognition from a multi-label perspective. Simultaneously, to address the significant sample imbalance problem commonly encountered in multi-label scenarios, we introduce a parts-of-speech-based strategy that significantly improves the model’s performance on imbalanced datasets. Experiments on nested, multi-label, and flat datasets verify the generalization and superiority of our model, with results surpassing the existing state-of-the-art (SOTA) on several multi-label and flat benchmarks. After a series of experimental analyses, we highlight the persistent challenges in the multi-label NER. We are hopeful that the insights derived from our work will not only provide new perspectives on the nested NER landscape but also contribute to the ongoing momentum necessary for advancing research in the field of multi-label NER.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"305 ","pages":"Article 112686"},"PeriodicalIF":7.2,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142594091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CRATI: Contrastive representation-based multimodal sound event localization and detection","authors":"Shichao Wu , Yongru Wang , Yushan Jiang , Qianyi Zhang , Jingtai Liu","doi":"10.1016/j.knosys.2024.112692","DOIUrl":"10.1016/j.knosys.2024.112692","url":null,"abstract":"<div><div>Sound event localization and detection (SELD) refers to classifying sound categories and locating their locations with acoustic models on the same multichannel audio. Recently, SELD has been rapidly evolving by leveraging advanced approaches from other research areas, and the benchmark SELD datasets have become increasingly realistic with simultaneously captured videos provided. Vibration produces sound, we usually associate visual objects with their sound, i.e., we hear footsteps from a walking person, and hear a jangle from one running bell. It comes naturally to think about using multimodal information (image–audio–text vs audio merely), to strengthen sound event detection (SED) accuracies and decrease sound source localization (SSL) errors. In this paper, we propose one contrastive representation-based multimodal acoustic model (CRATI) for SELD, which is designed to learn contrastive audio representations from audio, text, and image in an end-to-end manner. Experiments on the real dataset of STARSS23 and the synthesized dataset of TAU-NIGENS Spatial Sound Events 2021 both show that our CRATI model can learn more effective audio features with additional constraints to minimize the difference among audio and text (SED and SSL annotations in this work). Image input is not conducive to improving SELD performance, as only minor visual changes can be observed from consecutive frames. Compared to the baseline system, our model increases the SED F-score by 11% and decreases the SSL error by 31.02<span><math><mo>°</mo></math></span> on the STARSS23 dataset, respectively.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"305 ","pages":"Article 112692"},"PeriodicalIF":7.2,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142594097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revisiting representation learning of color information: Color medical image segmentation incorporating quaternion","authors":"Bicheng Xia , Bangcheng Zhan , Mingkui Shen , Hejun Yang","doi":"10.1016/j.knosys.2024.112707","DOIUrl":"10.1016/j.knosys.2024.112707","url":null,"abstract":"<div><div>Currently, color medical image segmentation methods commonly extract color and texture features mixed together by default, however, the distribution of color information and texture information is different: color information is represented differently in different color channels of a color image, while the distribution of texture information remains the same. Such a simple and brute-force feature extraction pattern will inevitably result in a partial bias in the model's semantics understanding. In this paper, we decouple the representation learning for color-texture information, and propose a novel network for color medical image segmentation, named CTNet. Specifically, CTNet introduces the Quaternion CNN (QCNN) module to capture the correlation among different color channels of color medical images to generate color features, and uses a designed local-global texture feature integrator (LoG) to mine the textural features from local to global. Moreover, a multi-stage features interaction strategy is proposed to minimize the semantic understanding gap of color and texture features in CTNet, so that they can be subsequently fused to generate a unified and robust feature representation. Comparative experiments on four different color medical image segmentation benchmark datasets show that CTNet strikes an optimal trade-off between segmentation accuracy and computational overhead when compared to current state-of-the-art methods. We also conduct extensive ablation experiments to verify the effectiveness of the proposed components. Our code will be available at <span><span>https://github.com/Notmezhan/CTNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"306 ","pages":"Article 112707"},"PeriodicalIF":7.2,"publicationDate":"2024-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142705185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}