Tag Information Recognition Approaches and Algorithms for Cross-Border Products Checking

Proceedings of the 4th International Conference on Crowd Science and Engineering. Pub Date: 2019-10-18. DOI: 10.1145/3371238.3371248

Dunsheng Chen, Yinsheng Li, X. Liang


Citations: 0

Abstract

Images with fixed layouts, such as ID cards, driving licenses, and invoices, can be recognized using prior knowledge [1]-[7]. However, for images without a fixed layout, such as product labels at ports, extracting structured data is very difficult because tag formats and contents vary widely across countries and products [8]. The process is complex and the error rate is high. Drawing on the characteristics of cross-border product labels, namely a complex overall format but a simple local structure (top-to-bottom and left-to-right), this paper proposes a method for recognizing and structuring port commodity label information. The method first builds a template library of keyword and data-unit information for commodity labels, organized by port commodity classification. It then separates keywords from data in the multi-line text, with accurate location information, that the OCR engine recognizes. Finally, keywords and data are paired according to the local layout pattern between them, yielding structured cross-border product information.
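
The abstract does not give implementation details, so the following is only a minimal sketch of the three steps it names: a template library, keyword/data separation, and layout-based pairing. Everything in it is an assumption made for illustration, not the authors' algorithm: the `TextBox` type, the `TEMPLATES` contents, the function names, and the pairing rule (prefer data to the right on the same row, otherwise the nearest data directly below). Data-unit information is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    """One OCR text line with its bounding box (hypothetical structure)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


# Hypothetical template library: commodity category -> known label keywords.
TEMPLATES = {
    "food": ["Net Weight", "Origin", "Best Before"],
}


def split_keyword(text, keywords):
    """Return (keyword, trailing data) if the line starts with a known keyword."""
    lowered = text.lower()
    # Try longer keywords first so a short keyword cannot shadow a longer one.
    for kw in sorted(keywords, key=len, reverse=True):
        if lowered.startswith(kw.lower()):
            return kw, text[len(kw):].lstrip(" :").strip()
    return None, text.strip()


def structure_label(boxes, category):
    """Pair each keyword with its data using a simple local layout rule."""
    keywords = TEMPLATES[category]
    keyed, data = [], []
    for box in boxes:
        kw, rest = split_keyword(box.text, keywords)
        if kw is None:
            data.append(box)
        else:
            keyed.append((kw, rest, box))

    result, used = {}, set()
    for kw, rest, kbox in keyed:
        if rest:  # keyword and data were recognized on the same OCR line
            result[kw] = rest
            continue
        best = None
        for i, d in enumerate(data):
            if i in used:
                continue
            same_row = abs(d.y - kbox.y) < 0.5 * kbox.h and d.x >= kbox.x + kbox.w
            below = (d.y >= kbox.y + kbox.h
                     and d.x < kbox.x + kbox.w and d.x + d.w > kbox.x)
            if same_row:    # left-to-right pattern, ranked first
                score = (0, d.x - (kbox.x + kbox.w))
            elif below:     # top-to-bottom pattern, ranked second
                score = (1, d.y - (kbox.y + kbox.h))
            else:
                continue
            if best is None or score < best[0]:
                best = (score, i)
        if best is not None:
            used.add(best[1])
            result[kw] = data[best[1]].text.strip()
    return result


boxes = [
    TextBox("Net Weight: 500 g", 10, 10, 120, 12),
    TextBox("Origin", 10, 30, 40, 12),
    TextBox("France", 60, 31, 45, 12),
    TextBox("Best Before", 10, 50, 70, 12),
    TextBox("2020-05-01", 12, 64, 70, 12),
]
structured = structure_label(boxes, "food")
```

In this toy input, one keyword shares an OCR line with its data, one has its data to the right, and one has its data below, which covers the left-to-right and top-to-bottom patterns the abstract mentions. A real label would need tolerance tuning and a much larger template library.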

