2023 7th International Conference on Machine Vision and Information Technology (CMVIT)最新文献_第2页

Track Initiation Method Based on Deep Learning and Logic Method 基于深度学习和逻辑方法的航迹起始方法

2023 7th International Conference on Machine Vision and Information Technology (CMVIT) Pub Date : 2023-03-01 DOI: 10.1109/CMVIT57620.2023.00020

Xiangdong Zhang, Futai Liang, Xin Chen, Min Cheng, Qiao-lin Hu, Song He

引用次数: 0

Max-Pooling Based Self-Attention with Transformer for Speaker Verification 基于最大池化的变压器自关注说话人验证

2023 7th International Conference on Machine Vision and Information Technology (CMVIT) Pub Date : 2023-03-01 DOI: 10.1109/CMVIT57620.2023.00012

Ran Shen, Qingshun She, Gang Sun, Hao Shen, Yiling Li, Weihao Jiang

引用次数: 0

CRA: Text to Image Retrieval for Architecture Images by Chinese CLIP CRA:基于中文CLIP的建筑图像文本到图像检索

2023 7th International Conference on Machine Vision and Information Technology (CMVIT) Pub Date : 2023-03-01 DOI: 10.1109/cmvit57620.2023.00015

Siyuan Wang, Yuyao Yan, Xi Yang, Kaizhu Huang

{"title":"CRA: Text to Image Retrieval for Architecture Images by Chinese CLIP","authors":"Siyuan Wang, Yuyao Yan, Xi Yang, Kaizhu Huang","doi":"10.1109/cmvit57620.2023.00015","DOIUrl":"https://doi.org/10.1109/cmvit57620.2023.00015","url":null,"abstract":"Text-to-image retrieval is revolutionized since the Contrastive Language-Image Pre-training model was proposed. Most existing methods learn a latent representation of text and then align its embedding with the corresponding image’s embedding from an image encoder. Recently, several Chinese CLIP models have supported a good representation of paired image-text sets. However, adapting the pre-trained retrieval model to a professional domain still remains a challenge, mainly due to the large domain gap between the professional and general text-image sets. In this paper, we introduce a novel contrastive tuning model, named CRA, using Chinese texts to retrieve architecture-related images by fine-tuning the pre-trained Chinese CLIP. Instead of fine-tuning the whole CLIP model, we engage the Locked-image Text tuning (LiT) strategy to adapt the architecture-terminology sets by tuning the text encoder and freezing the pre-trained large-scale image encoder. We further propose a text-image dataset of architectural design. On the text-to-image retrieval task, we improve the metric of R@20 from 44.92% by the original Chinese CLIP model to 74.61% by our CRA model in the test set.","PeriodicalId":191655,"journal":{"name":"2023 7th International Conference on Machine Vision and Information Technology (CMVIT)","volume":"225 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131968418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

SPSQL: Step-by-step Parsing Based Framework for Text-to-SQL Generation 基于一步一步解析的文本到sql生成框架

2023 7th International Conference on Machine Vision and Information Technology (CMVIT) Pub Date : 2023-03-01 DOI: 10.1109/CMVIT57620.2023.00030

Ran Shen, Gang Sun, Hao Shen, Yiling Li, Liangfeng Jin, Han Jiang

{"title":"SPSQL: Step-by-step Parsing Based Framework for Text-to-SQL Generation","authors":"Ran Shen, Gang Sun, Hao Shen, Yiling Li, Liangfeng Jin, Han Jiang","doi":"10.1109/CMVIT57620.2023.00030","DOIUrl":"https://doi.org/10.1109/CMVIT57620.2023.00030","url":null,"abstract":"Converting text into the structured query language (Text2SQL) is a research hotspot in the field of natural language processing (NLP), which has broad application prospects. In the era of big data, the use of databases has penetrated all walks of life, in which the collected data is large in scale, diverse in variety, and wide in scope, making the data query cumbersome and inefficient, and putting forward higher requirements for the Text2SQL model. In practical applications, the current mainstream end-to-end Text2SQL model is not only difficult to build due to its complex structure and high requirements for training data, but also difficult to adjust due to massive parameters. In addition, the accuracy of the model is hard to achieve the desired result. Based on this, this paper proposes a pipelined Text2SQL method: SPSQL. This method disassembles the Text2SQL task into four subtasks——table selection, column selection, SQL generation, and value filling, which can be converted into a text classification problem, a sequence labeling problem, and two text generation problems, respectively. Then, we construct data formats of different subtasks based on existing data and improve the accuracy of the overall model by improving the accuracy of each submodel. We also use the named entity recognition module and data augmentation to optimize the overall model. We construct the dataset based on the marketing business data of the State Grid Corporation of China. Experiments demonstrate our proposed method achieves the best performance compared with the end-to-end method and other pipeline methods.","PeriodicalId":191655,"journal":{"name":"2023 7th International Conference on Machine Vision and Information Technology (CMVIT)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128564791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Basic Research on Machine Vision Underpinned by Image Frame Algebra (VFA) and Visual Semantic Algebra (VSA) 基于图像帧代数和视觉语义代数的机器视觉基础研究

2023 7th International Conference on Machine Vision and Information Technology (CMVIT) Pub Date : 2023-03-01 DOI: 10.1109/CMVIT57620.2023.00009

Guoyin Wang

{"title":"Basic Research on Machine Vision Underpinned by Image Frame Algebra (VFA) and Visual Semantic Algebra (VSA)","authors":"Guoyin Wang","doi":"10.1109/CMVIT57620.2023.00009","DOIUrl":"https://doi.org/10.1109/CMVIT57620.2023.00009","url":null,"abstract":"Computer vison [1], [2], [3], [4], [5] studies properties of machine vision, its semantic understanding, and general manipulations by Intelligent Mathematics (IM) [6], [7], [8], [9], [10] [11], [12], [13], [14], [15] [16], [17]. Computer vison has been studies from various aspects such as algorithmic methods, analysis methods, pattern recognitions, and neural-network-regression (AI) technologies [2], [3]. However, there is a lack of fundamental theories for enabling autonomous image recognition and processing by machines. Basic research on contemporary IM has revealed that formal manipulations of visual objects by intelligent machines may be rigorously implemented by Image Frame Algebra (IFA) [8], [18] in the front-end and Visual Semantic Algebra (VSA) [19] in the backend. IFA formally manipulates visual images as general 2D matrixes by a set of algebraic operators such as modeling, analyses, syntheses, feature elicitation, and pattern recognition [4], [5], [18]. Then, its counterpart, VSA, transforms the geographic relations of visual objects to their semantic interpretations by algebraic analyses and compositions. The coherent theory of IFA and VSA provides a formal methodology for machine-enabled image processing and comprehension. This keynote presents a theoretical framework of machine vision underpinned by IFA and VSA for the structural denotations of visual objects and functional manipulations of visual mechanisms [3], [8], [9]. It demonstrates how the persistent challenges to machine vision may be rigorously and efficiently solved by the IFA/VSA methodology. Case studies on applying IFA/VSA for rigorous visual pattern detection, recognition, analysis, and composition in real world will be demonstrated [5], [18], [20]. As two coherent paradigms of IM, among others [21], [22], [23], [24], [25] [26], [27], [28], [29], [30], IFA and VSA have been applied not only in robot visual and spatial reasoning, but also in computational intelligence and AI for rigorously representing and manipulating of visual objects and patterns by machine recognition and cognition [31], [32], [33], [34], [35] [36], [37], [38], [39], [40], [41], [42], [43], [44], [45] [46], [47], [48], [49], [50], [51], [52], [53], [54], [55], [56], [57], [58], [59], [60], [61], [62], [63], [64], [65] [66], [67], [68], [69], [70], [71], [72], [73], [74], [75] [76].","PeriodicalId":191655,"journal":{"name":"2023 7th International Conference on Machine Vision and Information Technology (CMVIT)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126997702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Regional Transformer for Image Super-Resolution 图像超分辨率区域转换器

2023 7th International Conference on Machine Vision and Information Technology (CMVIT) Pub Date : 2023-03-01 DOI: 10.1109/cmvit57620.2023.00011

Sen Yang, Jiahong Yang, Dahong Xu, Xi Li

{"title":"Regional Transformer for Image Super-Resolution","authors":"Sen Yang, Jiahong Yang, Dahong Xu, Xi Li","doi":"10.1109/cmvit57620.2023.00011","DOIUrl":"https://doi.org/10.1109/cmvit57620.2023.00011","url":null,"abstract":"In the image super-resolution algorithm model, a large receptive field can provide more valuable features, so the Transformer with strong information interaction ability has achieved excellent results in image super-resolution processing applications. However, when the range of the receptive field reaches a certain critical value, the restoration performance of the super-resolution algorithm also reaches a certain critical value, which indicates that unconditionally increasing the receptive field will not continue to promote the improvement of the restoration performance. At the same time, the larger the receptive field range, the more data the model needs to process, which also seriously increases the computational complexity of the algorithm. In order to exchange information in a wider range more effectively, in this paper, a new type of super-resolution network based on Transformer, namely Regional Transformer, is designed. The key element in the newly designed network structure is the Region Block (RB) with the Boundary Restriction (BR) mechanism. In addition, the paper designs a Boundary Restriction based on coarse-to-fine pipes. This paper conducts a large number of experiments on multiple datasets, and the experiments show that the network structure designed in this paper has a significant improvement in performance.","PeriodicalId":191655,"journal":{"name":"2023 7th International Conference on Machine Vision and Information Technology (CMVIT)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129420130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

An Automatic Solution in Security Inspection 安全检查的自动化解决方案

2023 7th International Conference on Machine Vision and Information Technology (CMVIT) Pub Date : 2023-03-01 DOI: 10.1109/CMVIT57620.2023.00032

Hui Zhang, Xiaoli Zhang

{"title":"An Automatic Solution in Security Inspection","authors":"Hui Zhang, Xiaoli Zhang","doi":"10.1109/CMVIT57620.2023.00032","DOIUrl":"https://doi.org/10.1109/CMVIT57620.2023.00032","url":null,"abstract":"In this paper, we present a brand new dataset named cellphone buttery defects in X-ray(CBDx). CBDx consists of 300 X-ray images and 250 of them are anomaly free. We name them ‘good’. Others have some defects in the area of buttery. We name them ‘anomaly’, as Fig. 1. It raises a new task of detecting anomaly defects of cellphone butteries. But the challenge is how to distinguish the anomaly in the case of only training the ‘good’ cellphone. We define this task as an anomaly detection task. We propose an approach to deal with the task from the perspective of unbalanced classification. Specifically, we propose a data augmentation strategy that creates an anomaly sample mimic to the real defects. It helps the classifier to learn self-supervised deep representations and then make it an one-class classifier based on the representations. The classifier is well designed to discriminate the defect samples from the good ones. We evaluate different data augmentation strategies on CBDx. Our approach is more significant in this scenario with no defect training samples, which can be applied in real-world security inspection someday. Also, it can be used on the industrial texture anomaly detection, such as MVTec_AD.","PeriodicalId":191655,"journal":{"name":"2023 7th International Conference on Machine Vision and Information Technology (CMVIT)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131510609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

AI or IA (Intelligence Augmentation)- Future Trends AI或IA(智能增强)——未来趋势

2023 7th International Conference on Machine Vision and Information Technology (CMVIT) Pub Date : 2023-03-01 DOI: 10.1109/cmvit57620.2023.00010

S. Latifi

引用次数: 0

Skin lesion segmentation combining feature refinement and context guide 结合特征细化和上下文引导的皮肤病变分割

2023 7th International Conference on Machine Vision and Information Technology (CMVIT) Pub Date : 2023-03-01 DOI: 10.1109/cmvit57620.2023.00038

Heng Jie, Yuling Chen

{"title":"Skin lesion segmentation combining feature refinement and context guide","authors":"Heng Jie, Yuling Chen","doi":"10.1109/cmvit57620.2023.00038","DOIUrl":"https://doi.org/10.1109/cmvit57620.2023.00038","url":null,"abstract":"Aiming at the problem of high-precision segmentation of skin lesions, a skin lesion segmentation network combining feature refinement and context guide is proposed. Firstly, a dual-layer feature thinning module is designed to mine the difference information and common information between adjacent feature layers, and generate weight vectors to guide the encoder feature map to gradually refine, so as to enhance the ability of feature expression. Secondly, a dense residual pyramid context guide module is designed at the highest level of the network to expand the network’s receptive field through cascading expansion convolution, and integrate features of different scales in a hierarchical residual connection method to achieve dense aggregation of spatial information, and then combine global and local attention establish a multi-scale and multi-dimensional context prior to guiding the network to pay more attention to the target area and reduce noise interference. Finally, the cross-entropy loss and weighted boundary loss are combined to supervise the shape of the lesion in the model training process to improve the accuracy of boundary prediction.","PeriodicalId":191655,"journal":{"name":"2023 7th International Conference on Machine Vision and Information Technology (CMVIT)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121711632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Message from Program Chair 项目主席致辞

2023 7th International Conference on Machine Vision and Information Technology (CMVIT) Pub Date : 2023-03-01 DOI: 10.1109/cmvit57620.2023.00006

引用次数: 0