{"title":"SiamORPN: Enabling Orthogonality between Object and Background in Siamese Object Tracking","authors":"Kai Huang, Chaolin Pan, Jun Chu, L. Leng, Jun Miao, Junjiang Wu, Lingfeng Wang","doi":"10.1109/ICTAI56018.2022.00100","DOIUrl":"https://doi.org/10.1109/ICTAI56018.2022.00100","url":null,"abstract":"Siamese-based trackers are currently the dominant tracking paradigm because they balance speed and performance. However, they are prone to drift and tracking failure when the environment is complex and similar objects interfere. When Siamese-based trackers perform the correlation operation, the responses of the target object and the background appear in different channels, i.e., the feature spaces of the target object and the background have some orthogonality. However, under background clutter and interference from similar objects, this orthogonality becomes weaker, and the wrong classification contribution of the object and the background reduces the stability of the learned similarity function, leading to many misclassified pixels in the heatmaps. In this work, we propose SiamORPN to solve the above issues. It consists of two components: an Orthogonal Region Proposal Network (ORPN) and an Adaptive Pixel-wise Aggregation (APA) module. Specifically, in ORPN, the orthogonality between the object and the background maximizes the inter-class inertia, and an orthogonal module is introduced to enhance this orthogonality. APA introduces two lightweight networks to predict the weights of all pixels in different heatmaps and the weights of all pixels in different regression offsets. Experiments on challenging benchmarks, including OTB2015, VOT2016, VOT2018, GOT-10k test set, UAV123, LaSOT, and TrackingNet, demonstrate that the proposed SiamORPN outperforms many SOTA trackers and achieves leading performance. 
The inference speed at GTX1080Ti can reach about 32 FPS, meeting the real-time requirements.","PeriodicalId":354314,"journal":{"name":"2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI)","volume":"149 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130229044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extending a Refinement Acting Engine for Fleet Management: Concurrency and Resources","authors":"Jérémy Turi, Arthur Bit-Monnot","doi":"10.1109/ICTAI56018.2022.00216","DOIUrl":"https://doi.org/10.1109/ICTAI56018.2022.00216","url":null,"abstract":"Recent years have seen an important increase in the complexity of deployed robotic systems, both in terms of the number of robots involved, and scale of the tackled problems. The key challenge in this context is to allow the design of fleet control systems that, on the one hand, allow flexible and reactive operation of individual robots and, on the other hand, enable the system to optimize the global behavior of the fleet in order to increase its effectiveness and efficiency. To approach this problem, we propose to extend the Refinement Acting Engine (RAE) that has been used to program the behavior of autonomous agents through a hierarchical decomposition of high-level tasks into primitive commands, and is the subject of active research in order to guide its decisions with planning and scheduling techniques. The core of our proposal is to provide first-hand support for concurrency in the RAE procedure, allowing a natural representation for concurrent systems by reasoning on resource allocation. The resulting acting engine exploits a custom language that is designed to ease its integration with planning engines, both through its simple and orthogonal core constructs as well as in the explicit identification of decision points in the system operation. 
We provide an initial validation of the system in simulation on a logistic problem involving a fleet of robots.","PeriodicalId":354314,"journal":{"name":"2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130449024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SIA-Net: Scalable Interaction-Aware Network for Vehicle Trajectory Prediction Based on Self-Attention","authors":"Junan Huang, Zhiqiu Huang, Guohua Shen, Heng Xu, Gaoyang Hua","doi":"10.1109/ICTAI56018.2022.00120","DOIUrl":"https://doi.org/10.1109/ICTAI56018.2022.00120","url":null,"abstract":"In order to navigate through different scenarios safely and efficiently, self-driving vehicles must predict the future trajectories of other vehicles, which is a challenging task due to the implicit vehicle interactions in the driving scenario. Because there is no predefined number of surrounding vehicles, the model must be scalable to cope with scenarios containing different numbers of vehicles, with high accuracy and low computation cost, when jointly predicting the future trajectories of all vehicles. However, previous methods mainly focus on predicting a single trajectory of the target vehicle, which limits their accuracy and computation speed. In this paper, we propose SIA-Net, which predicts the future trajectories of all vehicles in the scenario independent of the number of vehicles. SIA-Net learns the implicit interactions of all vehicles by self-attention social pooling and generates each trajectory through one forward propagation with an attentional decoder. Experiments demonstrate the improvement of our model in prediction accuracy on the publicly available NGSIM and INTERACTION datasets while keeping the computation cost low. 
We also present qualitative analysis to study the mechanism of our model.","PeriodicalId":354314,"journal":{"name":"2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123203685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Network New Word Discovery Framework Based on Sentence Semantic Vector Similarity","authors":"GanFeng Yu, Yue Feng Ma, Yang Song","doi":"10.1109/ICTAI56018.2022.00052","DOIUrl":"https://doi.org/10.1109/ICTAI56018.2022.00052","url":null,"abstract":"New word discovery is a key problem in text information retrieval technology. Methods in new word discovery are often closely related to words. Because their target is words, the findings are obtained by designing methods to analyze words. With the popularity of social networks, individual netizens and online self-media have generated various network texts for the convenience of online life, including network new words that are far from standard Chinese expression. How to detect network new words is one of the important goals in the field of new word discovery today. In this paper, we integrate the word embedding model and clustering methods to propose a network new word discovery framework based on sentence semantic similarity (S3-N2WD) to detect network new words effectively from network texts. This framework constructs sentence semantic vectors through a distributed representation model, uses the similarity of sentence semantic vectors to determine the semantic relationship between sentences, and finally realizes network new word discovery by means of semantic replacement between sentences. 
The experiment verifies that the framework not only discovers network new words rapidly but also identifies the standard-language meaning of the discovered words, which reflects the effectiveness of our work.","PeriodicalId":354314,"journal":{"name":"2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125825663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video Face Recognition Using Neural Aggregation Networks with Mutual Relational Learning","authors":"Kangli Zeng, Zhongyuan Wang, Tao Lu, Jianyu Chen","doi":"10.1109/ICTAI56018.2022.00104","DOIUrl":"https://doi.org/10.1109/ICTAI56018.2022.00104","url":null,"abstract":"Video face recognition benefits profoundly from deep convolutional neural networks (CNNs), which learn robust feature embeddings. However, due to their fixed geometric structures, CNNs are inherently limited in modeling the significant variations from the angle, pose, occlusion and other factors of face images. In this paper, a neural aggregation network based on mutual relation learning is proposed for video face recognition. First, Intra-frame Relational Learning network (Intra-Net) is introduced, which models the interdependencies between the regional components of individual features and develops relevance between fine-grained features. Such processing can determine the region of interest adaptively according to the quality of the input face image to achieve the extraction of valuable information. Secondly, we introduce Inter-frame Relational Learning Network (Inter-Net), which considers the most significant appearance representation in the overall structure of the face image to correlate the complementarity of features between frames. Finally, information aggregation is performed by combining Inter-Net and Intra-Net. Joint optimization of the two branches allows our model to effectively exploit the complementary information between them to improve the aggregation capability. 
We validate the effectiveness of our model for video face recognition, proving its superiority over state-of-the-art methods on two benchmark datasets.","PeriodicalId":354314,"journal":{"name":"2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126663073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time Series Augmentation with Time-Scale Modifications and Piecewise Aggregate Approximation for Human Action Recognition","authors":"Mariusz Oszust, Dawid Warchoł","doi":"10.1109/ICTAI56018.2022.00108","DOIUrl":"https://doi.org/10.1109/ICTAI56018.2022.00108","url":null,"abstract":"In this paper, a method for time series augmentation, aiming at the improvement of human action recognition accuracy of a deep learning classifier, is proposed. The approach performs time-scale modifications of the input time series and transforms them into compact sequences of time segments using Piecewise Aggregate Approximation (PAA) to facilitate the training of a neural network. The approach is compared against related methods on six representative datasets using Bidirectional Long Short-Term Memory (BiLSTM) classifier. It is shown that the resulting artificial time series lead to a better performance of the deep learning model than augmented data samples generated by popular approaches. The source code of the method is available at https://marosz.kia.prz.edu.pl/Adder.html.","PeriodicalId":354314,"journal":{"name":"2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI)","volume":"94 1-2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126870508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
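As an illustration of the Piecewise Aggregate Approximation (PAA) step named in the abstract above, the following is a minimal sketch of textbook PAA, not the authors' implementation (their code is at the URL given in the record). The function name `paa` and the integer-boundary handling of lengths that are not divisible by the segment count are assumptions made for this example; the time-scale modification step of the paper is not shown.

```python
def paa(series, n_segments):
    """Reduce a sequence to n_segments values by averaging contiguous chunks.

    Assumption for this sketch: chunk boundaries are rounded down to integer
    indices, so chunks differ in length by at most one sample when
    len(series) is not divisible by n_segments.
    """
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    reduced = []
    for i in range(n_segments):
        start = i * n // n_segments        # first index of chunk i
        end = (i + 1) * n // n_segments    # one past the last index of chunk i
        chunk = series[start:end]
        reduced.append(sum(chunk) / len(chunk))
    return reduced


print(paa([1, 2, 3, 4, 5, 6], 3))  # three chunks of two samples each
```

Shorter PAA sequences shrink the input a recurrent classifier has to process, which is the role the abstract assigns to this step.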
{"title":"Author Name Disambiguation via Heterogeneous Network Embedding from Structural and Semantic Perspectives","authors":"Wenjin Xie, Siyuan Liu, Xiaomeng Wang, Tao Jia","doi":"10.1109/ICTAI56018.2022.00043","DOIUrl":"https://doi.org/10.1109/ICTAI56018.2022.00043","url":null,"abstract":"Name ambiguity is common in academic digital libraries, such as multiple authors having the same name. This creates challenges for academic data management and analysis, thus name disambiguation becomes necessary. The procedure of name disambiguation is to divide publications with the same name into different groups, each group belonging to a unique author. A large amount of attribute information in publications makes traditional methods fall into the quagmire of feature selection. These methods always select attributes artificially and equally, which usually causes a negative impact on accuracy. The proposed method is mainly based on representation learning for heterogeneous networks and clustering and exploits the self-attention technology to solve the problem. The presentation of publications is a synthesis of structural and semantic representations. The structural representation is obtained by meta-path-based sampling and a skip-gram-based embedding method, and meta-path level attention is introduced to automatically learn the weight of each feature. The semantic representation is generated using NLP tools. Our proposal performs better in terms of name disambiguation accuracy compared with baselines and the ablation experiments demonstrate the improvement by feature selection and the meta-path level attention in our method. 
The experimental results show the superiority of our new method for capturing the most attributes from publications and reducing the impact of redundant information.","PeriodicalId":354314,"journal":{"name":"2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116232761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Heterogeneous Feature Representation for Document Layout Understanding","authors":"Guosheng Feng, Danqing Huang, Chin-Yew Lin, Damjan Dakic, Milos Milunovic, Tamara Stankovic, Igor Ilic","doi":"10.1109/ICTAI56018.2022.00046","DOIUrl":"https://doi.org/10.1109/ICTAI56018.2022.00046","url":null,"abstract":"There is increasing interest in document layout representation learning and understanding. The Transformer, with its great power, has become the mainstream model architecture and achieved promising results in this area. As elements in a document layout consist of multi-modal and multi-dimensional features such as position, size, and text content, prior works represent each element by summing all feature embeddings into one unified vector in the input layer, which is then fed into the self-attention for element-wise interaction. However, this simple summation could introduce mixed correlations among heterogeneous features and bring noise to the representation learning. In this paper, we propose a novel two-step disentangled attention mechanism to allow more flexible feature interactions in the self-attention. Furthermore, inspired by the principles of document design (e.g., contrast, proximity), we propose an unsupervised learning objective to constrain the layout representations. We verify our approach on two layout understanding tasks, namely element role labeling and image captioning. Experimental results show that our approach achieves state-of-the-art performance. 
Moreover, we conduct extensive studies and observe better interpretability using our approach.","PeriodicalId":354314,"journal":{"name":"2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI)","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115648408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-scale Intermediate Flow Estimation for Video Frame Interpolation","authors":"Zehua Fan, Feng Zhu, Lei Li, Xiaoyang Tan","doi":"10.1109/ICTAI56018.2022.00137","DOIUrl":"https://doi.org/10.1109/ICTAI56018.2022.00137","url":null,"abstract":"Video frame interpolation is one of the most challenging tasks in video processing, which aims to synthesize intermediate frames between consecutive frames. In this work, we propose a flow-based approach called Multi-scale Intermediate Flow Estimation (MIFE) to balance the fineness and estimation range of the flows. MIFE consists of two main modules. Specifically, (1) Refined Flow Estimation uses a shifted window to estimate low-resolution intermediate flows at three levels. The refined full-resolution flow of each level is a weighted combination of nearby low-resolution flows, where the weights are determined by the similarity scores of the input frames and the reliability scores of the flows. (2) Multi-scale Flow Fusion generates fusion masks based on the estimable flow range and the estimated flow size. It fuses three levels of flows and refines the results. Experimental results show that the proposed method achieves good performance on various datasets. The source code is available at https://github.com/fzh169/MIFE.","PeriodicalId":354314,"journal":{"name":"2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121639211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reshaping the Semantic Logits for Proposal-free Panoptic Segmentation","authors":"Tianqi Lu, Chenyue Zhu","doi":"10.1109/ICTAI56018.2022.00136","DOIUrl":"https://doi.org/10.1109/ICTAI56018.2022.00136","url":null,"abstract":"We propose to enable general semantic segmentation frameworks to separate instances so that such frameworks can be used for the panoptic segmentation task. In semantic segmentation frameworks, the logits, which are output by the neural network and normalized by the subsequent softmax function, can only distinguish classes but not instances. In this work, we find that simple regularization on the logits can help to single out the instances, which is modeled by an energy-based representation, the energy surface. Several regularization approaches are discussed and a novel persistent homology-based instance extraction method is proposed to obtain the instances. Finally, we demonstrate the generality of the logit regularization on different base semantic segmentation frameworks and evaluate them on Cityscapes, Mapillary Vistas, and COCO. High-quality semantic segmentation frameworks such as DeepLabV3+ and HRNet-OCR can achieve performance competitive with the state-of-the-art proposal-free panoptic segmentation solver. Codes and trained models will be made public.","PeriodicalId":354314,"journal":{"name":"2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125287179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}