Hui Sun, Ziyan Zhang, Lili Huang, Bo Jiang, Bin Luo
{"title":"Category-Aware Siamese Learning Network for Few-Shot Segmentation","authors":"Hui Sun, Ziyan Zhang, Lili Huang, Bo Jiang, Bin Luo","doi":"10.1007/s12559-024-10273-5","DOIUrl":"https://doi.org/10.1007/s12559-024-10273-5","url":null,"abstract":"<p>Few-shot segmentation (FS), which aims to segment an unseen query image based on a few annotated support samples, is an active problem in computer vision and multimedia. The core issue of FS is how to leverage the annotated information from the support images to guide query image segmentation. Existing methods mainly adopt a Siamese Convolutional Neural Network (SCNN), which first encodes both support and query images and then uses masked Global Average Pooling (GAP) to facilitate pixel-level representation and segmentation of the query image. However, this pipeline generally fails to fully exploit the category/class-coherent information shared between support and query images. <i>In the FS task, both support and query images share the same category information</i>. This inherent property provides an important cue for the FS task, yet previous methods generally fail to fully exploit it. To overcome this limitation, we propose a novel Category-aware Siamese Learning Network (CaSLNet) to encode both support and query images. The proposed CaSLNet conducts <i>Category Consistent Learning (CCL)</i> for both support and query images and thus achieves more sufficient information communication between them. Comprehensive experimental results on several public datasets demonstrate the advantage of the proposed CaSLNet. 
Our code is publicly available at https://github.com/HuiSun123/CaSLN.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":null,"pages":null},"PeriodicalIF":5.4,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140935614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sunder Ali Khowaja, Parus Khuwaja, Kapal Dev, Weizheng Wang, Lewis Nkenyereye
{"title":"ChatGPT Needs SPADE (Sustainability, PrivAcy, Digital divide, and Ethics) Evaluation: A Review","authors":"Sunder Ali Khowaja, Parus Khuwaja, Kapal Dev, Weizheng Wang, Lewis Nkenyereye","doi":"10.1007/s12559-024-10285-1","DOIUrl":"https://doi.org/10.1007/s12559-024-10285-1","url":null,"abstract":"<p>ChatGPT is another large language model (LLM) widely available to consumers on their devices, but owing to its performance and ability to converse effectively, it has gained huge popularity in both the research and industrial communities. Recently, many studies have been published on the effectiveness, efficiency, integration, and sentiments of ChatGPT and other LLMs. In contrast, this study focuses on important aspects that are mostly overlooked, i.e., sustainability, privacy, digital divide, and ethics, and suggests that not only ChatGPT but every subsequent entry in the category of conversational bots should undergo Sustainability, PrivAcy, Digital divide, and Ethics (SPADE) evaluation. This paper discusses in detail the issues and concerns raised over ChatGPT in line with the aforementioned characteristics. We also briefly discuss the recent EU AI Act in light of the SPADE evaluation. We support our hypothesis with some preliminary data collection and visualizations along with hypothesized facts. We also suggest mitigations and recommendations for each of the concerns. 
Furthermore, we suggest some policies and recommendations for the EU AI Act concerning ethics, the digital divide, and sustainability.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":null,"pages":null},"PeriodicalIF":5.4,"publicationDate":"2024-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140889349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CIL-Net: Densely Connected Context Information Learning Network for Boosting Thyroid Nodule Segmentation Using Ultrasound Images","authors":"Haider Ali, Mingzhao Wang, Juanying Xie","doi":"10.1007/s12559-024-10289-x","DOIUrl":"https://doi.org/10.1007/s12559-024-10289-x","url":null,"abstract":"<p>Thyroid nodule (TYN) is a life-threatening disease that is commonly observed among adults globally. The applications of deep learning in computer-aided diagnosis systems (CADs) for diagnosing thyroid nodules have attracted attention among clinical professionals due to their significant potential role in reducing the occurrence of missed diagnoses. However, most techniques for segmenting thyroid nodules rely on U-Net structures or deep convolutional neural networks, which have limitations in obtaining different context information due to the diverse shapes and sizes, ambiguous boundaries, and heterostructure of thyroid nodules. To resolve these challenges, we present an encoder-decoder-based architecture (referred to as CIL-Net) for boosting TYN segmentation. The proposed CIL-Net makes three contributions. First, the encoder is established using dense connectivity for efficient feature extraction and the triplet attention block (TAB) for highlighting essential feature maps. Second, we design a feature improvement block (FIB) using dilated convolutions and attention mechanisms to capture the global context information and also build up robust feature maps between the encoder-decoder branches. Third, we introduce the residual context block (RCB), which leverages residual units (ResUnits) to accumulate the context information from the multiple blocks of decoders in the decoder branch. We assess the segmentation quality of our proposed method using six different evaluation metrics on two standard datasets (DDTI and TN3K) of TYN and demonstrate competitive performance against state-of-the-art methods. 
We consider that the proposed method advances the performance of TYN region localization and segmentation, which heavily rely on an accurate assessment of different context information. This advancement is primarily attributed to the comprehensive incorporation of dense connectivity, TAB, FIB, and RCB, which effectively capture both extensive and intricate contextual details. We anticipate that this approach's reliability and visual explainability make it a valuable tool with the potential to significantly enhance clinical practices by offering reliable predictions to facilitate cognitive and healthcare decision-making.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":null,"pages":null},"PeriodicalIF":5.4,"publicationDate":"2024-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140889411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ConceptGlassbox: Guided Concept-Based Explanation for Deep Neural Networks","authors":"Radwa El Shawi","doi":"10.1007/s12559-024-10262-8","DOIUrl":"https://doi.org/10.1007/s12559-024-10262-8","url":null,"abstract":"<p>Various industries and fields have utilized machine learning models, particularly those that demand a significant degree of accountability and transparency. With the introduction of the General Data Protection Regulation (GDPR), it has become imperative for machine learning model predictions to be both plausible and verifiable. One approach to explaining these predictions involves assigning an importance score to each input element. Another category aims to quantify the importance of human-understandable concepts to explain global and local model behaviours. The way concepts are constructed in such concept-based explanation techniques lacks inherent interpretability. Additionally, the magnitude and diversity of the discovered concepts make it difficult for machine learning practitioners to comprehend and make sense of the concept space. To this end, we introduce ConceptGlassbox, a novel local explanation framework that seeks to learn high-level transparent concept definitions. Our approach leverages human knowledge and feedback to facilitate the acquisition of concepts with minimal human labelling effort. The ConceptGlassbox learns concepts consistent with the user’s understanding of a concept’s meaning. It then dissects the evidence for the prediction by identifying the key concepts the black-box model uses to arrive at its decision regarding the instance being explained. Additionally, ConceptGlassbox produces counterfactual explanations, proposing the smallest changes to the instance’s concept-based explanation that would result in a counterfactual decision as specified by the user. 
Our systematic experiments confirm that ConceptGlassbox successfully discovers relevant and comprehensible concepts that are important for neural network predictions.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":null,"pages":null},"PeriodicalIF":5.4,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140889274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Document-Level Relation Extraction with Attention-Convolutional Hybrid Networks and Evidence Extraction","authors":"Feiyu Zhang, Ruiming Hu, Guiduo Duan, Tianxi Huang","doi":"10.1007/s12559-024-10269-1","DOIUrl":"https://doi.org/10.1007/s12559-024-10269-1","url":null,"abstract":"<p>Document-level relation extraction aims at extracting relations between entities in a document. In contrast to its sentence-level counterpart, document-level relation extraction requires reasoning over multiple sentences to extract complex relational triples. Recent work has found that adding auxiliary evidence extraction tasks and using the extracted evidence sentences to aid prediction can improve the performance of document-level relation extraction. However, these approaches still face the problem of inadequate modeling of the interactions between entity pairs. In this paper, based on a review of human cognitive processes, we propose a hybrid network, HIMAC, applied to the entity pair feature matrix, in which the multi-head attention sub-module can fuse global entity-pair information on a specific inference path, while the convolution sub-module is able to obtain local information of adjacent entity pairs. Then we incorporate the contextual interaction information learned by the entity pairs into the relation prediction and evidence extraction tasks. Finally, the extracted evidence sentences are used to further correct the relation extraction results. We conduct extensive experiments on two document-level relation extraction benchmark datasets (DocRED/Re-DocRED), and the experimental results demonstrate that our method achieves state-of-the-art performance (62.84/75.89 F1). 
Experiments demonstrate the effectiveness of the proposed method.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":null,"pages":null},"PeriodicalIF":5.4,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140889344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Md. Easin Arafat, Md. Wakil Ahmad, S. M. Shovan, Towhid Ul Haq, Nazrul Islam, Mufti Mahmud, M. Shamim Kaiser
{"title":"Accurate Prediction of Lysine Methylation Sites Using Evolutionary and Structural-Based Information","authors":"Md. Easin Arafat, Md. Wakil Ahmad, S. M. Shovan, Towhid Ul Haq, Nazrul Islam, Mufti Mahmud, M. Shamim Kaiser","doi":"10.1007/s12559-024-10268-2","DOIUrl":"https://doi.org/10.1007/s12559-024-10268-2","url":null,"abstract":"<p>Methylation is considered one of the most important post-translational modifications (PTMs) of proteins. Plasticity and cellular dynamics are among the many traits that are regulated by methylation. Currently, methylation sites are identified using experimental approaches. However, these methods are time-consuming and expensive. With the use of computer modelling, methylation sites can be identified quickly and accurately, providing valuable information for further trial and investigation. In this study, we propose a new machine-learning model called MeSEP that incorporates both evolutionary and structural-based information to predict methylation sites. To build this model, we first extract evolutionary and structural features from the PSSM and SPD2 profiles, respectively. We then employ Extreme Gradient Boosting (XGBoost) as the classification model to predict methylation sites. To address the issue of imbalanced data and bias towards negative samples, we use the SMOTETomek-based hybrid sampling method. MeSEP was validated on an independent test set (ITS) and with 10-fold cross-validation (TCV) using lysine methylation sites. The method achieved an accuracy of 82.9% in ITS and 84.6% in TCV; precision of 0.92 in ITS and 0.94 in TCV; area under the curve values of 0.90 in ITS and 0.92 in TCV; F1 score of 0.81 in ITS and 0.83 in TCV; and MCC of 0.67 in ITS and 0.70 in TCV. MeSEP significantly outperformed previous studies found in the literature. 
MeSEP as a standalone toolkit and all its source code are publicly available at https://github.com/arafatro/MeSEP.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":null,"pages":null},"PeriodicalIF":5.4,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140889346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Duo of Visual Servoing and Deep Learning-Based Methods for Situation-Aware Disaster Management: A Comprehensive Review","authors":"Senthil Kumar Jagatheesaperumal, Mohammad Mehedi Hassan, Md. Rafiul Hassan, Giancarlo Fortino","doi":"10.1007/s12559-024-10290-4","DOIUrl":"https://doi.org/10.1007/s12559-024-10290-4","url":null,"abstract":"<p>Unmanned aerial vehicles (UAVs) have become essential in disaster management due to their ability to provide real-time situational awareness and support decision-making processes. Visual servoing, a technique that uses visual feedback to control the motion of a robotic system, has been used to improve the precision and accuracy of UAVs in disaster scenarios. The study integrates visual servoing to enhance UAV precision while exploring recent advancements in deep learning. This integration enhances the precision and efficiency of disaster response by enabling UAVs to navigate complex environments, identify critical areas for intervention, and provide actionable insights to decision-makers in real time. It discusses disaster management aspects like search and rescue, damage assessment, and situational awareness, while also analyzing the challenges associated with integrating visual servoing and deep learning into UAVs. This review article provides a comprehensive analysis to offer real-time situational awareness and decision support in disaster management. It highlights that deep learning along with visual servoing enhances precision and accuracy in disaster scenarios. The analysis also summarizes the challenges and the need for high computational power, data processing, and communication capabilities. UAVs, especially when combined with visual servoing and deep learning, play a crucial role in disaster management. 
The review underscores the potential benefits and challenges of integrating these technologies, emphasizing their significance in improving disaster response and recovery through enhanced situational awareness and decision-making.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":null,"pages":null},"PeriodicalIF":5.4,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140828458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Efficient Recurrent Architectures: A Deep LSTM Neural Network Applied to Speech Enhancement and Recognition","authors":"Jing Wang, Nasir Saleem, Teddy Surya Gunawan","doi":"10.1007/s12559-024-10288-y","DOIUrl":"https://doi.org/10.1007/s12559-024-10288-y","url":null,"abstract":"<p>Long short-term memory (LSTM) has proven effective in modeling sequential data. However, it may encounter challenges in accurately capturing long-term temporal dependencies. LSTM plays a central role in speech enhancement by effectively modeling and capturing temporal dependencies in speech signals. This paper introduces a variable-neurons-based LSTM designed for capturing long-term temporal dependencies by reducing neuron representation in layers with no loss of data. A skip connection between nonadjacent layers is added to prevent gradient vanishing. An attention mechanism in these connections highlights important features and spectral components. Our LSTM is inherently causal, making it well-suited for real-time processing without relying on future information. Training involves utilizing combined acoustic feature sets for improved performance, and the models estimate two time–frequency masks—the ideal ratio mask (IRM) and the ideal binary mask (IBM). Comprehensive evaluation using perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) showed that the proposed LSTM architecture demonstrates enhanced speech intelligibility and perceptual quality. Composite measures further substantiated performance, considering residual noise distortion (Cbak) and speech distortion (Csig). The proposed model showed a 16.21% improvement in STOI and a 0.69 improvement in PESQ on the TIMIT database. Similarly, with the LibriSpeech database, the STOI and PESQ showed improvements of 16.41% and 0.71 over noisy mixtures. 
The proposed LSTM architecture outperforms deep neural networks (DNNs) in different stationary and nonstationary background noise conditions. The Kaldi toolkit is used to train an automatic speech recognition (ASR) system on the enhanced speech and to evaluate word error rate (WER). The proposed LSTM at the front-end notably reduced WERs, achieving a 15.13% WER across different noisy backgrounds.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":null,"pages":null},"PeriodicalIF":5.4,"publicationDate":"2024-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140828347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DFootNet: A Domain Adaptive Classification Framework for Diabetic Foot Ulcers Using Dense Neural Network Architecture","authors":"Nishu Bansal, Ankit Vidyarthi","doi":"10.1007/s12559-024-10282-4","DOIUrl":"https://doi.org/10.1007/s12559-024-10282-4","url":null,"abstract":"<p>Diabetic foot ulcers (DFUs) are a prevalent and serious complication of diabetes, often leading to severe morbidity and even amputations if not diagnosed and managed in time. The increasing prevalence of DFUs poses a significant challenge to healthcare systems worldwide. Accurate and timely classification of DFUs is crucial for effective treatment and prevention of complications. In this paper, we present “DFootNet”, an innovative and comprehensive classification framework for the accurate assessment of diabetic foot ulcers using a dense neural network architecture. Our proposed approach leverages the power of deep learning to automatically extract relevant features from diverse clinical DFU images. The proposed model comprises a multi-layered dense neural network designed to handle the intricate patterns and variations present in different stages and types of DFUs. The network architecture integrates convolutional and fully connected layers, allowing for hierarchical feature extraction and robust feature representation. To evaluate the efficacy of DFootNet, we conducted experiments on a large and diverse dataset of diabetic foot ulcers. Our results demonstrate that DFootNet achieves an accuracy of 98.87%, a precision of 99.01%, a recall of 98.73%, an F1-score of 98.86%, and an AUC-ROC of 98.13%, outperforming existing methods in distinguishing between ulcer and non-ulcer images. Moreover, our framework provides insights into the decision-making process, offering transparency and interpretability through attention mechanisms that highlight important regions within ulcer images. 
We also present a comparative analysis of DFootNet’s performance against other popular deep learning models, showcasing its robustness and adaptability across various scenarios.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":null,"pages":null},"PeriodicalIF":5.4,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140828536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Method for Human-Vehicle Recognition Based on Wireless Sensing and Deep Learning Technologies","authors":"Liangliang Lou, Ruyin Cai, Mingan Lu, Mingmin Wang, Guang Chen","doi":"10.1007/s12559-024-10276-2","DOIUrl":"https://doi.org/10.1007/s12559-024-10276-2","url":null,"abstract":"<p>Human-vehicle recognition (HVR) methods are currently applied in road monitoring, congestion control, and safety protection. However, traditional vision-based HVR methods suffer from problems such as high construction cost and low robustness in scenarios with insufficient lighting. For this reason, it is necessary to develop a low-cost and highly robust HVR method for intelligent street light systems (ISLSs). A well-designed HVR method can aid the brightness adjustment in ISLSs that operate exclusively at night, facilitating lower power consumption and carbon emissions. The paper proposes a novel wireless sensing-based human-vehicle recognition (WsHVR) method based on deep learning technologies, which can be applied in ISLSs that are equipped with a wireless sensor network (WSN). To solve the problem of limited recognition ability of wireless sensing technology, a deep feature extraction model that combines multi-scale convolution and an attention mechanism is proposed, in which the received signal strength (RSS) features of road users are extracted by multi-scale convolution. WsHVR integrates an adaptive registration convolutional attention mechanism (ARCAM) for further feature extraction and classification. The final normalized classification result is obtained by the SoftMax function. Experiments show that the proposed WsHVR outperforms existing methods with an accuracy of 99.07%. The dataset and source code related to the paper have been published at https://github.com/TZ-mx/WiParam and https://github.com/TZ-mx/WsHVR, respectively. 
The proposed WsHVR method has high performance in the field of human-vehicle recognition, potentially providing valuable guidance for the design of intelligent streetlight systems in intelligent transportation systems.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":null,"pages":null},"PeriodicalIF":5.4,"publicationDate":"2024-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140626025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}