{"title":"An efficient scene text detection neural network","authors":"Yifan Su","doi":"10.1145/3522749.3523074","journal":{"name":"Proceedings of the 6th International Conference on Control Engineering and Artificial Intelligence"},"publicationDate":"2022-03-11","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"0","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1145/3522749.3523074"}
Abstract: We introduce a new type of text detection neural network that accurately locates text in a variety of complex environments and outputs the best-fitting rectangle around it. The network has three parts. The first is a backbone built from a residual network, which refines the feature map. The second is a sequence module built from a transformer, which processes the feature map line by line, treating each row as a unit, to mine the context between characters in the image. The third is a multi-scale detection module, which detects the best target box from feature maps of different sizes. The residual backbone helps prevent gradient explosion during backpropagation. Because information is consistent across grid cells in the same line, the transformer module pays more attention to the text line. The detection module uses multiple anchors in the vertical direction simultaneously, which achieves good results in both speed and accuracy. On ICDAR 2015, a dataset commonly used in text detection, our experiments achieve an F-score of 0.7.
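To make two ideas from the abstract concrete, here is a minimal Python sketch of how "multiple anchors in the vertical direction" might be laid out on one feature-map grid, together with the standard F-score formula. It is not the author's implementation. The abstract gives no anchor sizes, strides, or precision/recall figures, so the function names `vertical_anchors` and `f_score`, the one-cell-wide anchor shape, the heights, and the stride are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in input-image pixels
Box = Tuple[float, float, float, float]


def vertical_anchors(grid_w: int, grid_h: int, stride: int,
                     heights: List[float]) -> List[Box]:
    """Return fixed-width anchors with several heights for every grid cell.

    Assumption (not stated in the abstract): each anchor is exactly one
    cell wide (``stride`` pixels) and is centred vertically on its cell,
    so only the height varies between anchors of the same cell.
    """
    boxes: List[Box] = []
    for row in range(grid_h):
        for col in range(grid_w):
            x_min = col * stride
            x_max = x_min + stride
            y_center = (row + 0.5) * stride
            for h in heights:
                boxes.append((x_min, y_center - h / 2, x_max, y_center + h / 2))
    return boxes


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical 4x2 grid at stride 16 with three anchor heights per cell.
    anchors = vertical_anchors(grid_w=4, grid_h=2, stride=16, heights=[8, 16, 32])
    print(len(anchors))
    print(anchors[0])
    # Hypothetical precision and recall; the abstract reports only the F-score.
    print(f_score(0.7, 0.7))
```

For a multi-scale detector, one plausible use is to call `vertical_anchors` once per feature map, with a larger `stride` and taller `heights` for the coarser maps, and then concatenate the resulting lists before matching them against ground-truth boxes.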