Volkmar Frinken, Yutaro Iwakiri, R. Ishida, Kensho Fujisaki, S. Uchida
2014 22nd International Conference on Pattern Recognition, 2014-12-04. DOI: 10.1109/ICPR.2014.512
Improving Point of View Scene Recognition by Considering Textual Data
At the current rate of technological advancement and social acceptance thereof, it will not be long before wearable devices that constantly record the user's field of view become common. We introduce a new database of image sequences of realistic, everyday scenes, taken with a first-person-view camera. As a distinguishing feature, we manually transcribed the scene text of each image. This way, sophisticated OCR algorithms can be simulated that can help in recognizing the location and the activity. To test this hypothesis, we performed a set of experiments using visual features, textual features, and a combination of both. We demonstrate that, although not very powerful when considered alone, the textual information improves the overall recognition rates.
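The abstract's idea of combining visual and textual features can be illustrated with a minimal sketch. This is not the authors' implementation; the feature values, vocabulary, scene labels, and the concatenation-plus-nearest-neighbour scheme are all assumptions made for illustration.

```python
# Illustrative sketch (not the paper's method): fuse a visual feature vector
# with a bag-of-words vector built from transcribed scene text, then classify
# by cosine similarity to labelled reference scenes. All data is invented.
import math

def bag_of_words(text, vocab):
    """Hypothetical textual feature: counts of vocabulary words in scene text."""
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

def combine(visual, textual, weight=0.5):
    """Concatenate the two modalities, scaling the textual part by `weight`."""
    return visual + [weight * t for t in textual]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def classify(query, references):
    """Return the label of the most similar reference feature vector."""
    return max(references, key=lambda r: cosine(query, r[1]))[0]

# Invented example: two reference scenes with toy visual descriptors and text.
vocab = ["exit", "platform", "menu", "coffee"]
refs = [
    ("station", combine([0.9, 0.1], bag_of_words("exit platform platform", vocab))),
    ("cafe",    combine([0.2, 0.8], bag_of_words("menu coffee", vocab))),
]
query = combine([0.5, 0.5], bag_of_words("coffee menu menu", vocab))
print(classify(query, refs))  # prints "cafe": the scene text tips the decision
```

As in the paper's finding, the textual vector alone is sparse and weak, but concatenated with the visual descriptor it can disambiguate scenes whose visual features are similar.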