{"title":"Track Initiation Method Based on Deep Learning and Logic Method","authors":"Xiangdong Zhang, Futai Liang, Xin Chen, Min Cheng, Qiao-lin Hu, Song He","doi":"10.1109/CMVIT57620.2023.00020","DOIUrl":"https://doi.org/10.1109/CMVIT57620.2023.00020","url":null,"abstract":"This paper introduces a vehicle millimeter-wave radar track initiation method based on deep learning. In the complex and rapidly changing road environment faced by automotive radar, fast and correct track initiation is the key to multi-target tracking. This paper makes two improvements to the classical logic method. The first is to use YOLOv5 instead of CFAR to detect targets in the range-Doppler image, improving target detection. The second is to relax the track initiation condition of the logic method using the idea of probabilistic events, improving initiation speed. The m/n logic in the improved method means that the track is initiated successfully if the probability of no less than n plots in the sliding window is greater than 90%, or the probability of at least n plots being true is not less than 80%. Several representative cases demonstrate the superiority of the proposed track initiation method based on deep learning and the logic method.","PeriodicalId":191655,"journal":{"name":"2023 7th International Conference on Machine Vision and Information Technology (CMVIT)","volume":"147 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127254935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
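The classical m/n sliding-window rule that this abstract builds on can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the function name and the boolean-per-scan interface are assumptions, and the probabilistic 90%/80% variant described above is not reproduced.

```python
# Sketch of the classical m/n sliding-window rule for radar track initiation:
# a candidate track is confirmed once at least m of the last n scans contain
# a detection associated with it. Names here are illustrative assumptions.
from collections import deque

def mn_logic(detections, m, n):
    """Return True as soon as >= m of the last n scans hold a detection.

    `detections` is an iterable of booleans, one per radar scan.
    """
    window = deque(maxlen=n)          # keeps only the most recent n scans
    for hit in detections:
        window.append(hit)
        if len(window) == n and sum(window) >= m:
            return True               # initiation condition satisfied
    return False
```

For example, a 3/4 rule confirms the sequence hit-miss-hit-hit, while a 2/4 rule rejects a lone hit followed by three misses.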
{"title":"Max-Pooling Based Self-Attention with Transformer for Speaker Verification","authors":"Ran Shen, Qingshun She, Gang Sun, Hao Shen, Yiling Li, Weihao Jiang","doi":"10.1109/CMVIT57620.2023.00012","DOIUrl":"https://doi.org/10.1109/CMVIT57620.2023.00012","url":null,"abstract":"Transformer has become predominant in many natural language processing (NLP) tasks for its powerful long-term sequence processing ability. Its central idea, the self-attention mechanism, was originally proposed to model global information in textual sequences. However, discriminating between acoustic feature sequences from different speakers relies mostly on local information, which makes Transformer less competitive in the speaker verification task. We alleviate this limitation with a max-pooling based self-attention mechanism that enlarges the receptive field of the attention heads to better capture local information. We also introduce and compare position-based and content-based variants of self-attention, and explore different frame-level pooling methods for speaker embeddings. Experiments conducted on the AISHELL-1 and LibriSpeech datasets demonstrate that the proposed method with statistic attentive pooling (SAP) achieves the best performance compared with the original Transformer baseline systems.","PeriodicalId":191655,"journal":{"name":"2023 7th International Conference on Machine Vision and Information Technology (CMVIT)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130335863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
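One way max-pooling can enlarge the receptive field of an attention head is to pool the keys and values along the time axis before computing scaled dot-product attention, so each attended position summarizes a local span of frames. The single-head NumPy sketch below illustrates that idea under stated assumptions; the pooling placement and function names are not taken from the paper.

```python
# Single-head scaled dot-product self-attention with max-pooled keys/values.
# Pooling the K/V sequence is one assumed way to widen each head's receptive
# field; this is a sketch, not the paper's exact formulation.
import numpy as np

def max_pool_1d(x, k):
    """Non-overlapping max-pooling along the time axis of (T, d) features."""
    T, d = x.shape
    T_out = T // k
    return x[:T_out * k].reshape(T_out, k, d).max(axis=1)

def pooled_self_attention(x, k=2):
    """Attention where queries keep full resolution but K/V are pooled."""
    q = x                          # (T, d): one query per input frame
    kv = max_pool_1d(x, k)         # (T//k, d): coarser keys/values
    scores = q @ kv.T / np.sqrt(x.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ kv            # (T, d) attended output
```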
{"title":"CRA: Text to Image Retrieval for Architecture Images by Chinese CLIP","authors":"Siyuan Wang, Yuyao Yan, Xi Yang, Kaizhu Huang","doi":"10.1109/cmvit57620.2023.00015","DOIUrl":"https://doi.org/10.1109/cmvit57620.2023.00015","url":null,"abstract":"Text-to-image retrieval has been revolutionized since the Contrastive Language-Image Pre-training (CLIP) model was proposed. Most existing methods learn a latent representation of the text and align its embedding with the corresponding image’s embedding from an image encoder. Recently, several Chinese CLIP models have provided good representations of paired image-text sets. However, adapting a pre-trained retrieval model to a professional domain remains a challenge, mainly due to the large domain gap between professional and general text-image sets. In this paper, we introduce a novel contrastive tuning model, named CRA, that uses Chinese texts to retrieve architecture-related images by fine-tuning a pre-trained Chinese CLIP. Instead of fine-tuning the whole CLIP model, we adopt the Locked-image Text tuning (LiT) strategy to adapt to the architecture-terminology sets by tuning the text encoder while freezing the pre-trained large-scale image encoder. We further propose a text-image dataset of architectural design. On the text-to-image retrieval task, our CRA model improves R@20 on the test set from 44.92% with the original Chinese CLIP model to 74.61%.","PeriodicalId":191655,"journal":{"name":"2023 7th International Conference on Machine Vision and Information Technology (CMVIT)","volume":"225 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131968418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
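The LiT strategy this abstract relies on amounts to a simple parameter-partitioning decision: the image tower is locked and only the text tower is handed to the optimizer. The schematic sketch below shows just that partition; the `Encoder` class and function names are hypothetical stand-ins, not the paper's or CLIP's API.

```python
# Schematic sketch of Locked-image Text tuning (LiT): freeze the image
# encoder, optimize only the text encoder's parameters. The classes here
# are hypothetical stand-ins for real encoder modules.
class Encoder:
    def __init__(self, name, n_params):
        self.name = name
        self.params = [0.0] * n_params   # placeholder "weights"
        self.trainable = True

def lit_partition(image_encoder, text_encoder):
    """Mark the image tower frozen and return only trainable parameters."""
    image_encoder.trainable = False      # locked image tower
    text_encoder.trainable = True        # tuned text tower
    return [p
            for enc in (image_encoder, text_encoder) if enc.trainable
            for p in enc.params]
```

In a real framework the same effect is achieved by disabling gradients on the image tower and passing only the text tower's parameters to the optimizer.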
{"title":"SPSQL: Step-by-step Parsing Based Framework for Text-to-SQL Generation","authors":"Ran Shen, Gang Sun, Hao Shen, Yiling Li, Liangfeng Jin, Han Jiang","doi":"10.1109/CMVIT57620.2023.00030","DOIUrl":"https://doi.org/10.1109/CMVIT57620.2023.00030","url":null,"abstract":"Converting text into structured query language (Text2SQL) is a research hotspot in the field of natural language processing (NLP) with broad application prospects. In the era of big data, databases have penetrated all walks of life; the collected data is large in scale, diverse in variety, and wide in scope, making data queries cumbersome and inefficient and placing higher requirements on Text2SQL models. In practical applications, current mainstream end-to-end Text2SQL models are not only difficult to build, owing to their complex structure and demanding training-data requirements, but also difficult to tune because of their massive number of parameters, and their accuracy often falls short of the desired level. Accordingly, this paper proposes a pipelined Text2SQL method, SPSQL. The method decomposes the Text2SQL task into four subtasks: table selection, column selection, SQL generation, and value filling, which are cast as a text classification problem, a sequence labeling problem, and two text generation problems, respectively. We then construct data formats for the different subtasks from existing data and improve the accuracy of the overall model by improving the accuracy of each submodel. We also use a named entity recognition module and data augmentation to optimize the overall model. We construct the dataset from the marketing business data of the State Grid Corporation of China. Experiments demonstrate that our proposed method achieves the best performance compared with the end-to-end method and other pipeline methods.","PeriodicalId":191655,"journal":{"name":"2023 7th International Conference on Machine Vision and Information Technology (CMVIT)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128564791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
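The four-stage decomposition described in this abstract can be made concrete with a toy pipeline in which each stage is a trivial rule-based stand-in for the paper's learned submodels. All function names, the schema format, and the string-matching heuristics are illustrative assumptions only.

```python
# Toy sketch of a four-stage SPSQL-style pipeline:
# table selection -> column selection -> SQL generation -> value filling.
# Each stage is a trivial keyword-matching stand-in for a learned submodel.
def select_table(question, schema):
    """Stage 1 (text classification stand-in): pick the mentioned table."""
    return next(t for t in schema if t in question)

def select_columns(question, schema, table):
    """Stage 2 (sequence labeling stand-in): pick the mentioned columns."""
    return [c for c in schema[table] if c in question]

def generate_sql(table, columns):
    """Stage 3 (text generation stand-in): emit a templated SQL skeleton."""
    return f"SELECT {', '.join(columns)} FROM {table} WHERE {columns[-1]} = ?"

def fill_values(sql, value):
    """Stage 4 (text generation stand-in): fill the placeholder value."""
    return sql.replace("?", repr(value))

def text2sql(question, schema, value):
    """Compose the four stages into one Text2SQL call."""
    table = select_table(question, schema)
    cols = select_columns(question, schema, table)
    return fill_values(generate_sql(table, cols), value)
```

The point of the decomposition is that each stage can be trained, evaluated, and improved independently, which is the accuracy argument the abstract makes.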
{"title":"Basic Research on Machine Vision Underpinned by Image Frame Algebra (IFA) and Visual Semantic Algebra (VSA)","authors":"Guoyin Wang","doi":"10.1109/CMVIT57620.2023.00009","DOIUrl":"https://doi.org/10.1109/CMVIT57620.2023.00009","url":null,"abstract":"Computer vision [1], [2], [3], [4], [5] studies the properties of machine vision, its semantic understanding, and general manipulations by Intelligent Mathematics (IM) [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17]. Computer vision has been studied from various aspects such as algorithmic methods, analysis methods, pattern recognition, and neural-network-regression (AI) technologies [2], [3]. However, there is a lack of fundamental theories enabling autonomous image recognition and processing by machines. Basic research on contemporary IM has revealed that formal manipulations of visual objects by intelligent machines may be rigorously implemented by Image Frame Algebra (IFA) [8], [18] in the front end and Visual Semantic Algebra (VSA) [19] in the back end. IFA formally manipulates visual images as general 2D matrices by a set of algebraic operators for modeling, analysis, synthesis, feature elicitation, and pattern recognition [4], [5], [18]. Its counterpart, VSA, transforms the geometric relations of visual objects into their semantic interpretations by algebraic analyses and compositions. The coherent theory of IFA and VSA provides a formal methodology for machine-enabled image processing and comprehension. This keynote presents a theoretical framework of machine vision underpinned by IFA and VSA for the structural denotation of visual objects and functional manipulation of visual mechanisms [3], [8], [9]. It demonstrates how persistent challenges to machine vision may be rigorously and efficiently solved by the IFA/VSA methodology. Case studies applying IFA/VSA to rigorous visual pattern detection, recognition, analysis, and composition in the real world will be demonstrated [5], [18], [20]. As two coherent paradigms of IM, among others [21]-[30], IFA and VSA have been applied not only in robot visual and spatial reasoning, but also in computational intelligence and AI for rigorously representing and manipulating visual objects and patterns by machine recognition and cognition [31]-[76].","PeriodicalId":191655,"journal":{"name":"2023 7th International Conference on Machine Vision and Information Technology (CMVIT)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126997702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regional Transformer for Image Super-Resolution","authors":"Sen Yang, Jiahong Yang, Dahong Xu, Xi Li","doi":"10.1109/cmvit57620.2023.00011","DOIUrl":"https://doi.org/10.1109/cmvit57620.2023.00011","url":null,"abstract":"In image super-resolution models, a large receptive field can provide more valuable features, so the Transformer, with its strong information-interaction ability, has achieved excellent results in image super-resolution. However, once the receptive field reaches a certain critical size, restoration performance also plateaus, which indicates that unconditionally enlarging the receptive field will not keep improving restoration quality. At the same time, the larger the receptive field, the more data the model must process, which sharply increases the computational complexity of the algorithm. To exchange information over a wider range more effectively, this paper designs a new Transformer-based super-resolution network, the Regional Transformer. The key element of the new network structure is the Region Block (RB) with a Boundary Restriction (BR) mechanism. In addition, the paper designs a Boundary Restriction based on a coarse-to-fine pipeline. Extensive experiments on multiple datasets show that the proposed network structure achieves a significant performance improvement.","PeriodicalId":191655,"journal":{"name":"2023 7th International Conference on Machine Vision and Information Technology (CMVIT)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129420130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Automatic Solution in Security Inspection","authors":"Hui Zhang, Xiaoli Zhang","doi":"10.1109/CMVIT57620.2023.00032","DOIUrl":"https://doi.org/10.1109/CMVIT57620.2023.00032","url":null,"abstract":"In this paper, we present a brand-new dataset named Cellphone Battery Defects in X-ray (CBDx). CBDx consists of 300 X-ray images, 250 of which are anomaly-free; we label them ‘good’. The others have defects in the battery area; we label them ‘anomaly’, as in Fig. 1. The dataset raises a new task of detecting anomalous defects in cellphone batteries. The challenge is how to distinguish anomalies when training only on the ‘good’ samples. We define this as an anomaly detection task and propose an approach that treats it as an unbalanced classification problem. Specifically, we propose a data augmentation strategy that creates anomaly samples mimicking real defects. This helps the classifier learn self-supervised deep representations, on top of which we build a one-class classifier. The classifier is carefully designed to discriminate defect samples from good ones. We evaluate different data augmentation strategies on CBDx. Our approach is particularly valuable in this scenario with no defect training samples and could be applied in real-world security inspection. It can also be used for industrial texture anomaly detection, such as on MVTec_AD.","PeriodicalId":191655,"journal":{"name":"2023 7th International Conference on Machine Vision and Information Technology (CMVIT)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131510609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
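The augmentation idea in this abstract, synthesizing an "anomaly" from a defect-free image so a classifier can be trained on good samples only, can be illustrated with a CutPaste-style patch swap. This is an assumption about the kind of strategy meant, not the paper's recipe; source and target locations are fixed here for determinism, where a real pipeline would randomize them.

```python
# Sketch of synthesizing a local "defect" from a defect-free image by
# pasting one patch over another region (CutPaste-style). An assumed,
# deterministic illustration of the abstract's augmentation idea.
import numpy as np

def make_synthetic_anomaly(image, patch=8):
    """Paste the top-left patch into the image centre to mimic a local defect.

    `image` is a 2D array; a real pipeline would randomize source/target.
    """
    h, w = image.shape
    out = image.copy()                       # never modify the 'good' sample
    yd, xd = (h - patch) // 2, (w - patch) // 2
    out[yd:yd + patch, xd:xd + patch] = image[:patch, :patch]
    return out
```

Training then treats the original image as the 'good' class and the synthesized one as the 'anomaly' class, giving the classifier a self-supervised signal without any real defect samples.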
{"title":"AI or IA (Intelligence Augmentation)- Future Trends","authors":"S. Latifi","doi":"10.1109/cmvit57620.2023.00010","DOIUrl":"https://doi.org/10.1109/cmvit57620.2023.00010","url":null,"abstract":"Artificial Intelligence (AI), and at its heart Machine Learning (ML), is arguably the biggest export from computer science to other disciplines. AI/ML has had a profound impact on our lives and will continue to do so at an accelerating rate. At the same time, worries about social issues associated with AI, such as privacy, security, and controllability, continue to grow. Some scientists fear that AI will take over mankind once it achieves general intelligence, whereas others believe AI will augment human intelligence and can be kept under control. What does the future hold, and what are the pitfalls as AI expands its influence in the world? In this talk, I briefly review the history of AI and discuss the present status of AI/ML. I also address the barriers along the way to achieving general intelligence, and discuss some problems the research community must address as AI/ML technology advances.","PeriodicalId":191655,"journal":{"name":"2023 7th International Conference on Machine Vision and Information Technology (CMVIT)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125359061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Skin lesion segmentation combining feature refinement and context guide","authors":"Heng Jie, Yuling Chen","doi":"10.1109/cmvit57620.2023.00038","DOIUrl":"https://doi.org/10.1109/cmvit57620.2023.00038","url":null,"abstract":"Aiming at the problem of high-precision segmentation of skin lesions, a skin lesion segmentation network combining feature refinement and context guidance is proposed. First, a dual-layer feature refinement module is designed to mine the difference information and common information between adjacent feature layers and generate weight vectors that guide the encoder feature maps to be gradually refined, enhancing the network's feature-expression ability. Second, a dense residual pyramid context guidance module is designed at the highest level of the network; it expands the receptive field through cascaded dilated convolutions and integrates features of different scales via hierarchical residual connections to achieve dense aggregation of spatial information, and then combines global and local attention to establish a multi-scale, multi-dimensional context prior that guides the network to focus on the target area and reduces noise interference. Finally, the cross-entropy loss and a weighted boundary loss are combined to supervise the lesion shape during training and improve the accuracy of boundary prediction.","PeriodicalId":191655,"journal":{"name":"2023 7th International Conference on Machine Vision and Information Technology (CMVIT)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121711632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
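The combined supervision in this abstract, pixelwise cross-entropy plus a weighted boundary term, can be sketched in NumPy. The boundary weighting below (up-weighting pixels where the label changes between neighbours) is an assumed, simplified variant, not the paper's exact loss; all names are illustrative.

```python
# Sketch of a combined segmentation loss: pixelwise binary cross-entropy
# plus a boundary-weighted term that emphasizes pixels near the lesion
# contour. The weighting scheme is an assumed simplification.
import numpy as np

def binary_cross_entropy(pred, target, eps=1e-7):
    """Per-pixel BCE between predicted probabilities and a 0/1 mask."""
    pred = np.clip(pred, eps, 1 - eps)
    return -(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def boundary_weights(target):
    """Weight 1 everywhere, higher where the mask changes between neighbours."""
    edge = np.zeros_like(target, dtype=float)
    edge[:-1, :] += np.abs(np.diff(target, axis=0))   # vertical label changes
    edge[:, :-1] += np.abs(np.diff(target, axis=1))   # horizontal label changes
    return 1.0 + edge

def combined_loss(pred, target, lam=0.5):
    """Mean BCE plus lam times the boundary-weighted mean BCE."""
    bce = binary_cross_entropy(pred, target)
    return float(np.mean(bce) + lam * np.mean(boundary_weights(target) * bce))
```

Because boundary pixels carry a weight above 1, errors along the lesion contour cost more than errors in the interior, which is the mechanism the abstract appeals to for sharper boundary prediction.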
{"title":"Message from Program Chair","authors":"","doi":"10.1109/cmvit57620.2023.00006","DOIUrl":"https://doi.org/10.1109/cmvit57620.2023.00006","url":null,"abstract":"","PeriodicalId":191655,"journal":{"name":"2023 7th International Conference on Machine Vision and Information Technology (CMVIT)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127282154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}