Title: Graph Structure Guided Transformer for Semantic Segmentation
Authors: Luyang Qian, Canlong Zhang, Zhixin Li, Zhiwen Wang
In: 2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI), October 2022
DOI: 10.1109/ICTAI56018.2022.00140 (https://doi.org/10.1109/ICTAI56018.2022.00140)
Abstract: Segmentation is an essential operation in image processing, and exploiting long-range context is key for pixel-wise prediction tasks such as semantic segmentation. Convolutional Neural Networks (CNNs) are good at modeling local relationships through convolution, but they are often inefficient at capturing global relationships between distant regions and require stacking many convolutional layers. Exploiting the transformer's strength in modeling long-range dependencies, this paper proposes a novel Graph Structure Guided Transformer (GSGT) for semantic segmentation. Unlike previous methods that hard-divide the image into a regular grid, our graph projection maps the two-dimensional feature map into a graph structure according to semantic relevance, so that the data matches the structure the transformer requires. To fully exploit the graph structure information, we also propose a graph embedding attention module, which uses the local topology of the graph to complement the transformer's global context. Moreover, GSGT is easy to incorporate with various CNN backbones and transformer variants to significantly improve segmentation accuracy and convergence speed. Experiments on the Cityscapes, VOC and ADE20K datasets demonstrate that the proposed method performs well on the semantic segmentation task.

{"title":"Language Driven Image Editing via Transformers","authors":"Rodrigo Santos, A. Branco, J. Silva","doi":"10.1109/ICTAI56018.2022.00139","DOIUrl":"https://doi.org/10.1109/ICTAI56018.2022.00139","url":null,"abstract":"With the emergence of specifically tailored neural architectures that cope with both modalities, cross-modal language and image processing has attracted increasing attention. A major motivation has been the search for a quantum leap in language understanding supported by visual grounding, which has been oriented mostly to solve tasks where language descriptions of images are to be provided, and vice-versa, where images are to be generated on the basis of keywords. Adopting a distinct angle of inquiry, this paper addresses rather the cross-modal challenge of language driven image design, focusing on the task of editing an image on the basis of language instructions to modify it. And adopting as well a distinct research path, which dispenses with specifically tailored architectures, the approach proposed here resorts rather to a general purpose, suitably instantiated neural architecture of the Transformer class. Experimentation with this approach delivered very encouraging results, empirically demonstrating that this is an effective methodology for language driven image design and the basis for further advances in cross-modal processing and its applications with affordable compute and data.","PeriodicalId":354314,"journal":{"name":"2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121120303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Towards building reliable deep learning based driver identification systems
Authors: Li Zeng, Mohammad Al-Rifai, Michael Nolting, W. Nejdl
In: 2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI), October 2022
DOI: 10.1109/ICTAI56018.2022.00118 (https://doi.org/10.1109/ICTAI56018.2022.00118)
Abstract: Recent studies have shown the potential of leveraging neural networks to re-identify drivers with high accuracy by learning latent features from vehicular sensor data. However, deploying such networks in real-world applications (such as theft detection or fleet management) requires re-training them with new data to transfer what was learned on the initial dataset to the target drivers. In this paper, we highlight the importance of evaluating such networks in both phases, initial training and transfer learning. Our evaluation shows that the performance of existing solutions drops significantly when they are applied to new drivers that were not seen during the initial training phase. Moreover, we propose a deep neural network that outperforms state-of-the-art solutions in both phases. For the evaluation of the transfer learning phase, we use a dataset from a real-world ride-sharing service that was not used in the initial training.

{"title":"Device Behavior Identification in Encrypted Home Security Camera Traffic","authors":"Shu Liu, Xiaolin Xu, Zhefeng Nan","doi":"10.1109/ICTAI56018.2022.00135","DOIUrl":"https://doi.org/10.1109/ICTAI56018.2022.00135","url":null,"abstract":"Home security cameras have become one of the most popular IoT devices due to rigid demand and low cost. However, these devices have become a disaster area where security issues such as cyberattacks and privacy breaches often occur. Researchers and intruders often employ traffic behavior analyzing methods to mine vulnerabilities. Nevertheless, the content transmitted by the HSC device contains a lot of dynamic interference video traffic, so it is hard to mine the behavior information of the HSC device from it. In contrast, the HSC device's non-TLS one-way response packets carry more efficient behavior information. Therefore, we propose an approach to identify device behavior based on the features of one-way response packets in non-TLS traffic. Based on the functional characteristics of the HSC device, we have a more fine-grained type division of behaviors, including eight behaviors and five states. In addition, we propose an automatic labeling approach based on countercurrent and operation logs for the problem of tedious and inaccurate manual labeling. Based on the features of three attributes, we compared the recognition effects of nine classifiers on two datasets, the real-world dataset and the IMC 2019 payload public dataset. Finally, the CNN-based classifier can achieve the most desirable identification effect with an accuracy rate of 97.47%, a recall rate of 97.42%, and an F1 score of 97.4%. The results show that the proposed approach can accurately identify the behavior and state of HSC at a fine-grained level. Moreover, this work has a significant reference value for device anomalous behavior detection and threat awareness.","PeriodicalId":354314,"journal":{"name":"2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126115227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Drug Side Effects Prediction via Heterogeneous Multi-Relational Graph Convolutional Networks","authors":"Yike Wang, Huifang Ma, Ruoyi Zhang, Zihao Gao","doi":"10.1109/ICTAI56018.2022.00167","DOIUrl":"https://doi.org/10.1109/ICTAI56018.2022.00167","url":null,"abstract":"Numerous clinical trials have revealed that a serious consequence of polypharmacy is that patients are at high risk of adverse side effects. However, designing clinical trials to determine the frequency of side effects from polypharmacy is both time-consuming and costly. Therefore, the computer-aided prediction of drug side effects is becoming an attractive proposition. Existing methods of drug side effects prediction introduce the target protein of a drug without screening. Although this alleviates the sparsity of the original data to some extent, the blind introduction of proteins as auxiliary information allows a large amount of noisy information to be added, which degrades the model efficiency and acheive sub-opitmal predicition results. To this end, we propose a novel method called DEP-GCN (Drug Side Effects Prediction via Heterogeneous Multi-Relational Graph Convolutional Networks). Specifically, we design two protein auxiliary pathways directly related to drugs and combine these two auxiliary pathways with a multi-relational graph of drug side effects, which both alleviate the sparsity of data and filter out noisy data. Then, to produce accurate drug representations, we distinguish the impact from different drug neighbors and introduce a query-aware attention mechanism to fine-grained determine how much messaging is delivered. Finally, in contrast to approaches limited to predicting the existence or associations of drug side effects, we output the exact frequency of drug side effects occurring via a tensor factorization decoder. Extensive experimental results demonstrate that DEP-GCN significantly outperforms all baseline methods. The further examination provides literature evidence for highly ranked predictions.","PeriodicalId":354314,"journal":{"name":"2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI)","volume":"127 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125681680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning Assisted Mouth-Esophagus Passage Time Estimation During Gastroscopy","authors":"Zinan Xiong, Qilei Chen, Chenxi Zhang, Yu Cao, Benyuan Liu, Yuehua Wu, Yu Peng, Xiaowei Liu","doi":"10.1109/ICTAI56018.2022.00169","DOIUrl":"https://doi.org/10.1109/ICTAI56018.2022.00169","url":null,"abstract":"A gastroscopy involves examining the upper digestive system using a flexible tube equipped with a small camera. Generally, it is performed to determine the cause of digestive symptoms, such as vomiting blood, stomach pains, and difficulty swallowing. Though this procedure has been performed since the mid-19th century, and various measures have been implemented to make it easier and less invasive, it is still not risk-free. One of the major complications is esophagus perforation, and most of them happen during the insertion of the gastroscopy. Therefore, it is necessary to develop an effective method for evaluating the performance of the operator. One appropriate metric is the time interval between the mouth and esophagus during the intubation. In this paper, we propose a gastroscopy video processing system based on deep learning to automatically evaluate the mouth-esophagus passage time. In this system, a Convolutional Neural Network (CNN) based model is adopted to detect the mouth and esophagus, track the timestamps of the last appearance of the mouth and the first appearance of the esophagus, and calculate the interval between those appearances. Our system is capable of dealing with abnormal circumstances that can occur during a procedure, as well as reporting accurate results. Experiment results show that our best model achieves an accuracy of 88.92% on image dataset, and an accuracy of 99.86% on videos for the mouth-esophagus passage time.","PeriodicalId":354314,"journal":{"name":"2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115110771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DRL-TA: A Type-aware Task Scheduling and Load Balancing Method based on Deep Reinforcement Learning in Heterogeneous Computing Environment","authors":"Changyong Sun, Tan Yang, Youxun Lei","doi":"10.1109/ICTAI56018.2022.00181","DOIUrl":"https://doi.org/10.1109/ICTAI56018.2022.00181","url":null,"abstract":"Task scheduling and load balancing in heterogeneous computing environments has been a challenge for long, especially when dealing with multiple types of task input batches. In this scenario, existing methods cannot take into account both the high efficiency of task processing and the full utilization of cluster resources. However, the rise of artificial intelligence methods provides a new way to solve this problem. In this paper, we design a type-aware task scheduling method based on deep reinforcement learning to tackle multiple types of tasks in heterogeneous computing environment. First, we adopt prioritized dueling double deep q-learning network to make action decisions for each batch of input tasks. Then we build a task type prediction neural network to predict the task type of the input task, and then use the Monte Carlo algorithm based on reward value to realize the load balancing of the scheduled cluster. To verify the effectiveness of our proposed method, we use a widely used dataset Alibaba cluster trace dataset for our experiments. Experimental results show that our proposed algorithm can significantly shorten the average makespan of task batches and achieve better load balancing effect compared with other existing solutions.","PeriodicalId":354314,"journal":{"name":"2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116538357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Private and Shared Feature Extractors Based on Hierarchical Neighbor Encoder for Adaptive Few-Shot Knowledge Graph Completion","authors":"Canqun Yang, Weiwen Zhang","doi":"10.1109/ICTAI56018.2022.00067","DOIUrl":"https://doi.org/10.1109/ICTAI56018.2022.00067","url":null,"abstract":"While Knowledge Graphs (KGs) have been applied in many AI tasks, KGs are known for being incomplete with many missing facts. Previous works rely on a large number of training data for KG completion. However, there are often few entity pairs available for most relations in KGs. In this paper, we propose a Few-shot Knowledge Graph Completion (FKGC) model, named Private and Shared feature extractors based on Hierarchical neighbor encoder for Adaptive few-shot knowledge graph completion (PSHA). In the PSHA model, we first exploit the hierarchical attention mechanism to extract the inherent and valuable hidden information of the neighborhood surrounding the entity. Following that, we adopt a private feature extractor to extract the private features of relation information of the entity pairs, and then a shared feature extractor is used to extract the shared features of the entity pairs of the support set. In addition, an adaptive aggregator aggregates entity pairs of the support set about the query. We conduct experiments on the 2-shot and 5-shot of the NELL-One and CoDEx-S-One dataset. The experimental results show that the PSHA outperforms the existing FKGC models in both scenarios.","PeriodicalId":354314,"journal":{"name":"2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122066913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiscale Approach in Deep Convolutional Networks for Minutia Extraction from Contactless Fingerprint Images","authors":"Anderson Nogueira Cotrim, H. Pedrini","doi":"10.1109/ICTAI56018.2022.00142","DOIUrl":"https://doi.org/10.1109/ICTAI56018.2022.00142","url":null,"abstract":"Biometric identification by contactless fingerprinting has been a trend in recent years, reinforced by the pandemic of the new coronavirus (COVID-19). Contactless acquisition tends to be a more hygienic acquisition category with greater user acceptance because it is less invasive and does not require the use of a surface touched by other people as traditional acquisition does. However, this area presents some challenging tasks. Contact-based sensors still generally provide greater biometric effectiveness since the minutiae are more pronounced due to the high contrast between ridges and valleys. On the other hand, contactless images typically have low contrast, so the methods fail with spurious or undetectable details, demonstrating the need for further studies in this area. In this work, we propose and analyze a robust scaled deep learning model for extracting minutiae in contactless fingerprint images. The results, evaluated on three datasets, show that the proposed method is competitive against other minutia extraction algorithms and commercial software.","PeriodicalId":354314,"journal":{"name":"2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122073933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Implicit Discourse Relation Classification by Perceiving External Semantics and Convolving Internal Semantics","authors":"Zujun Dou, Yu Hong, Yu Sun, Xiao Li, Guodong Zhou","doi":"10.1109/ICTAI56018.2022.00080","DOIUrl":"https://doi.org/10.1109/ICTAI56018.2022.00080","url":null,"abstract":"Implicit discourse relation classification refers to a task of automatically determining relationships between arguments. It has been widely proven that, in a neural classification architecture, decoding discourse relations heavily relies on the reliable semantic representations of arguments. In addition, our previous survey shows that, for a target argument, the external semantic information hidden in the accompanying argument benefits the encoding of the target, either wholly or partially. Moreover, dependency structure appears as the crucial feature for synthesizing word senses of the entire words in arguments. Accordingly, we propose a novel method to enhance the current representation learning of pairwise arguments, which takes into consideration both external semantic information and internal dependency structure. In particular, we inject external semantic information into the Long-Short Term Memory (LSTM) unit of Recurrent Neural Network (RNN) through the input and forget gates. Different from the existing one-off interactive learning models, our method allows the neuronal memory of internal argument semantics to be affected by external information at each encoding step. On the basis, we apply the parser-based Graph Convolutional Networks (GCN) over the semantic presentations of words, so as to accumulate the closely-related semantic information in terms of dependency structures. We conduct experiments on Penn Discourse TreeBank Corpus of version 2.0 (PDTB 2.0). The test results illustrate that the proposed method enhances the baseline significantly, and it obtains comparable performance compared to the state of the art.","PeriodicalId":354314,"journal":{"name":"2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122122174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}