11th International Conference of Pattern Recognition Systems (ICPRS 2021)

Title: Sparse LiDAR and Stereo Fusion (SLS-Fusion) for Depth Estimation and 3D Object Detection
Authors: Nguyen Anh Minh Mai, Pierre Duthon, L. Khoudour, Alain Crouzil, S. Velastín
DOI: 10.1049/icp.2021.1442
Published: 2021-03-05
Abstract: The ability to accurately detect and localize objects is recognized as the most important requirement for the perception of self-driving cars. In moving from 2D to 3D object detection, the most difficult step is determining the distance from the ego-vehicle to objects. Expensive technology such as LiDAR can provide precise and accurate depth information, so most studies have focused on this sensor, showing a performance gap between LiDAR-based and camera-based methods. Although many authors have investigated how to fuse LiDAR with RGB cameras, as far as we know there are no studies that fuse LiDAR and stereo in a deep neural network for the 3D object detection task. This paper presents SLS-Fusion, a new approach that fuses data from a 4-beam LiDAR and a stereo camera via a neural network for depth estimation, producing denser depth maps and thereby improving 3D object detection performance. Since a 4-beam LiDAR is cheaper than the well-known 64-beam LiDAR, this approach is also classified as a low-cost-sensor-based method. Evaluation on the KITTI benchmark shows that the proposed method significantly improves depth estimation performance compared to a baseline method, and when applied to 3D object detection it achieves a new state of the art among low-cost-sensor-based methods.
{"title":"A Comparison of Different Embedding Methods on Session Based Recommendation with Graph Neural Networks","authors":"M. Aker, C. E. Yıldız, Y. Yaslan","doi":"10.1049/icp.2021.1436","DOIUrl":"https://doi.org/10.1049/icp.2021.1436","url":null,"abstract":"Predicting users' next behavior based on their previous actions is one of the most valuable but also difficult task in the e-commerce and e-marketing fields. Recommendation systems that build upon session-based data try to bring a solution to that desire. The ultimate goal of this type of recommendation system is trying to make the best predictions about the succeeding item. Sequential order of the items within a session is also kept in mind in such systems. Recently proposed SR-GNN (Session Based Recommendation with Graph Neural Networks) has benefited from graph theory and proven its adequacy about being the state-of-art session-based recommendation model. Furthermore, there are some parts exist that can improve the overall performance. The current model uses primitive embedding type which is the simplest way of representing the items, attributes, and their relationships between each other. This study brings the SR-GNN recommendation model with di erent types of graph embedding techniques which are widely used in a variety of research areas. Aim of this research is investigating the the e ect of the embedding types to the SR-GNN. The proposed variety of embedding techniques that applied to SR-GNN show similar but slightly worse results compared to the original SR-GNN embedding. The experimental results obtained on two real datasets show that the performance of the SR-GNN model is not a ected by the embedding models and the power of the model comes from the gated graph neural network model.","PeriodicalId":431144,"journal":{"name":"11th International Conference of Pattern Recognition Systems (ICPRS 2021)","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116393084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regional Delineation Based on A Modularity Maximization Approach","authors":"Qinghe Liu, Zhicheng Liu, Yinfei Xu, Weiting Xiong, Junyan Yang, Qiao Wang","doi":"10.1049/icp.2021.1461","DOIUrl":"https://doi.org/10.1049/icp.2021.1461","url":null,"abstract":"Regional delineation is critical to urban policy formulation and infrastructure construction. For the convenience of regional management, the population flow in the same region should be as dense as possible, and that between different regions should be as little as possible. We consider the population flow as a kind of correlation between urban plots, and construct graph by using unit plots as nodes and population flow as the edges. By combining strategies of hierarchical aggregation and node movement, a novel community detection algorithm based on Modularity maximization is proposed. The efficacy of the proposed algorithm on Modularity optimization is verified through experiments using real world data set. Our method outperforms baselines on objective optimization with an acceptable execution time. Moreover, a case study in Nanjing China is presented, and the result demonstrates the rationality of the regional delineation from our proposal.","PeriodicalId":431144,"journal":{"name":"11th International Conference of Pattern Recognition Systems (ICPRS 2021)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128243737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Image-Text Integration Using a Multimodal Fusion Network Module for Movie Genre Classification
Authors: Leodécio Braz, Vinicius Teixeira, H. Pedrini, Z. Dias
DOI: 10.1049/icp.2021.1456
Abstract: Multimodal models have received increasing attention from researchers for using the complementarity of data to obtain better inference on a dataset, and have been applied to several deep learning tasks, such as emotion recognition, video classification and audio-visual speech enhancement. In this paper, we propose a multimodal method with two branches, one for text classification and another for image classification. In the image classification branch, we use the Class Activation Mapping (CAM) method as an attention module to identify the relevant regions of the images. To validate our method, we used the MM-IMDB dataset, which consists of 25959 movies with their respective plot outlines, posters and genres. Our method averaged 0.6749 in F1-Weight, 0.6734 in F1-Samples, 0.6750 in F1-Micro and 0.6159 in F1-Macro, surpassing the state of the art on the F1-Weight and F1-Macro metrics and ranking second best on the F1-Samples and F1-Micro metrics.
{"title":"Enforced Isolation Deep Network For Anomaly Detection In Images","authors":"Demetris Lappas, Vasileios Argyriou, Dimitrios Makris","doi":"10.1049/icp.2021.1441","DOIUrl":"https://doi.org/10.1049/icp.2021.1441","url":null,"abstract":"Challenges in anomaly detection include the implicit definition of anomaly, benchmarking against human intuition and scarcity of anomalous examples. We introduce a novel approach designed to enforce separation of normal and abnormal samples in an embedded space using a refined Triple Loss Function, within the paradigm of Deep Networks. Training is based on randomly sampled triplets to manage datasets with small proportion of anomalous data. Results for a range of proportions between normal and anomalous data are presented on the MNIST, CIFAR10 and Concrete Cracks datasets and compared against the current state of the art.","PeriodicalId":431144,"journal":{"name":"11th International Conference of Pattern Recognition Systems (ICPRS 2021)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134028744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Multiple Object Tracking for Robust Quantitative Analysis of Passenger Motion While Boarding and Alighting a Metropolitan Train
Authors: José Sebastián Gómez Meza, J. Delpiano, S. Velastín, R. Fernández, Sebastián Seriani Awad
DOI: 10.1049/icp.2021.1468
Abstract: Achieving significant improvements in public transport requires an autonomous system that locates and counts passengers in real time in scenarios with a high level of occlusion, providing tools to efficiently address problems such as the reduction and stabilization of travel times, greater fluidity, better fleet control and less congestion. A deep learning method based on transfer learning is used to accomplish this: the You Only Look Once (YOLO) version 3 and Faster R-CNN Inception version 2 architectures are fine-tuned on the PAMELA-UANDES dataset, which contains annotated overhead images of passengers boarding and alighting at a subway platform. The locations given by the detector are passed to a multiple object tracking system based on a Markov decision process, which associates subjects in consecutive frames and assigns identities by considering the overlap between past detections and the positions predicted by a Kalman filter.
{"title":"Improved Cloud-NARX Estimation Algorithm for Uncertainty Analysis of Air Pollution Prediction","authors":"Y. Gu, B. Li, Q. Meng, P. Shang","doi":"10.1049/icp.2021.1440","DOIUrl":"https://doi.org/10.1049/icp.2021.1440","url":null,"abstract":"Air pollution causes significant negative impacts on climate, environment and human health. Monitoring and forecasting the PM2.5, one of the most dangerous pollutant, are crucial. However, the strong prediction uncertainty in peak periods can potentially increase the prediction error and decrease the model reliability. To overcome this problem, prediction intervals are needed to quantify the uncertainty and provide information of the confidence in the prediction. In this article, an improved cloud-NARX estimation algorithm is developed to quantify the uncertainty and produce prediction intervals. The proposed method integrates a new recursive estimation procedure and two new criteria, which significantly improve the training speed and prediction interval accuracy. The proposed method is applied to predict PM2.5 for one hour ahead. From our results, the proposed method achieves higher accuracies of both average predictions and prediction intervals than other methods. This study provides a novel framework for quantifying the uncertainty of time series prediction, and to improve the model robustness.","PeriodicalId":431144,"journal":{"name":"11th International Conference of Pattern Recognition Systems (ICPRS 2021)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114410471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Emotion recognition using multimodal matchmap fusion and multi-task learning","authors":"Ricardo Pizarro, Juan Bekios-Calfa","doi":"10.1049/icp.2021.1454","DOIUrl":"https://doi.org/10.1049/icp.2021.1454","url":null,"abstract":"Emotion recognition is a complex task due to the great intraclass and inter-class variability that exists implicitly in the problem. From the point of view of the intra-class, an emotion can be expressed by different people, which generates different representations of it. For the inter-class case, there are some kinds of emotions that are alike. Traditionally, the problem has been approached in different ways, highlighting the analysis of images to determine the facial expression of a person to extrapolate it to a type of emotion, also, the use of audio sequences to estimate the emotion of the speaker. The present work seeks to solve this problem using multimodal techniques, multitask and Deep Learning. To help with these problems, the use of a fusion method based on the similarity between audio and video modalities will be investigated and applied to the emotion classification problem. The use of this method allows the use of auxiliary tasks that enhance the learned relationships between the emotions shown in video frames and audio frames belonging to the same emotion label and punish those that are different. The results show that when using the fusion method based on the similarity of modalities together with the use of multiple tasks, the classification is improved by 7% with respect to the classification obtained in the baseline model that uses concatenation of the characteristics of each modality, the experiments are performed on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database.","PeriodicalId":431144,"journal":{"name":"11th International Conference of Pattern Recognition Systems (ICPRS 2021)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116993082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Labeling Consecutive Search Query Pairs Using Siamese Networks","authors":"N. Ateş, Y. Yaslan","doi":"10.1049/icp.2021.1464","DOIUrl":"https://doi.org/10.1049/icp.2021.1464","url":null,"abstract":"As internet users interact with search engines to meet their information needs, a huge amount of search queries are stored. Proper analysis of such query data enhances prediction and understanding of user tasks. User tasks can be used to increase the performance of search engines and recommendations. Query segmentation is an initial step that is commonly performed while analyzing user queries. It determines whether two consecutive query expressions belong to the same sub-task. Any deficits in query segmentation process is likely to affect all other advanced query based steps and activities like task identification and query suggestion. Recently, some researchers focused on application of algorithms including Recurrent Neural Networks (RNN) to seek for the semantics of queries, and attention based neural networks, but such methodologies are not task-specific. In this paper, we propose a Siamese Convolutional Neural Network (CNN) that models input queries into a more task-specific embedding and a decider network that does the labelling. The proposed method is compared with Context Attention Based Long Short Term Memory (CA-LSTM) and Bi-RNN Gated Retified Unit (GRU) models on Webis Search Mission Corpus 2012 (WSMC12) and Cross-Session Task Extraction (CSTE) datasets. The proposed model performs 95%, implying a 1% improvement over the already existing models and an accuracy of 81% on CSTE dataset implying an improvement classification accuracy of 6% over the previous best results.","PeriodicalId":431144,"journal":{"name":"11th International Conference of Pattern Recognition Systems (ICPRS 2021)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130564059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating the use of multiple languages for crisp and fuzzy speaker identification","authors":"T. Aguiar de Lima, M. Da Costa-Abreu","doi":"10.1049/icp.2021.1431","DOIUrl":"https://doi.org/10.1049/icp.2021.1431","url":null,"abstract":"The use of speech for system identification is an important and relevant topic. There are several ways of doing it, but most are dependent on the language the user speaks. However, if the idea is to create an all-inclusive and reliable system that uses speech as its input, we must take into account that people can and will speak different languages and have different accents. Thus, this research evaluates speaker identification systems on a multilingual setup. Our experiments are performed using three widely spoken languages which are Portuguese, English, and Chinese. Initial tests indicated the systems have certain robustness on multiple languages. Results with more languages decreases our accuracy, but our investigation suggests these impacts are related to the number of classes.","PeriodicalId":431144,"journal":{"name":"11th International Conference of Pattern Recognition Systems (ICPRS 2021)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131728451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}