{"title":"图书馆的未来:整合胡椒和计算机视觉的智能辅助","authors":"Claire Trinquet, Deepti Mishra , Akshara Pande","doi":"10.1016/j.array.2025.100469","DOIUrl":null,"url":null,"abstract":"<div><div>In recent decades, the utilization of social robots in our daily lives has increased, but they are different from robots designed for libraries. On the one hand, library robots cannot establish social interactions, while social robots lack the necessary sensors to identify books. The present study aims to integrate the social robot Pepper's camera with computer vision techniques to enable Pepper to read the titles of books in front of it. This involves two main steps. The first step is to detect objects, i.e. books, from the scene. Thereafter, the titles of the books need to be read from the previous step. To achieve the first objective, two object detection models, YOLOv4 and YOLOv9, were employed. To accomplish the second goal, three OCR models —EasyOCR, Pytesseract, and Keras-OCR — were used. The results indicate that with the YOLOv9 model, all books were detected, whereas with the YOLOv4 model, 94 % books were identified. The findings of the present study suggest that when the YOLOv4 model and YOLOv9 were applied, EasyOCR performed well at a distance of 50 cm with a resolution of 3. Although the results of the OCR do not match perfectly with the written text on the books, the error rate is quite low for recognition by humans and computers. Therefore, there is a need to employ more advanced object detection and OCR technologies in future work.</div></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"27 ","pages":"Article 100469"},"PeriodicalIF":4.5000,"publicationDate":"2025-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The future of libraries: Integrating pepper and computer vision for smart assistance\",\"authors\":\"Claire Trinquet, Deepti Mishra , Akshara Pande\",\"doi\":\"10.1016/j.array.2025.100469\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>In recent decades, the utilization of social robots in our daily lives has increased, but they are different from robots designed for libraries. On the one hand, library robots cannot establish social interactions, while social robots lack the necessary sensors to identify books. The present study aims to integrate the social robot Pepper's camera with computer vision techniques to enable Pepper to read the titles of books in front of it. This involves two main steps. The first step is to detect objects, i.e. books, from the scene. Thereafter, the titles of the books need to be read from the previous step. To achieve the first objective, two object detection models, YOLOv4 and YOLOv9, were employed. To accomplish the second goal, three OCR models —EasyOCR, Pytesseract, and Keras-OCR — were used. The results indicate that with the YOLOv9 model, all books were detected, whereas with the YOLOv4 model, 94 % books were identified. The findings of the present study suggest that when the YOLOv4 model and YOLOv9 were applied, EasyOCR performed well at a distance of 50 cm with a resolution of 3. Although the results of the OCR do not match perfectly with the written text on the books, the error rate is quite low for recognition by humans and computers. 
Therefore, there is a need to employ more advanced object detection and OCR technologies in future work.</div></div>\",\"PeriodicalId\":8417,\"journal\":{\"name\":\"Array\",\"volume\":\"27 \",\"pages\":\"Article 100469\"},\"PeriodicalIF\":4.5000,\"publicationDate\":\"2025-07-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Array\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2590005625000967\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Array","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2590005625000967","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
The future of libraries: Integrating Pepper and computer vision for smart assistance
In recent decades, the use of social robots in daily life has increased, but such robots differ from robots designed for libraries: library robots cannot establish social interactions, while social robots lack the sensors needed to identify books. The present study integrates the camera of the social robot Pepper with computer vision techniques so that Pepper can read the titles of books placed in front of it. This involves two main steps: first, detecting the objects of interest, i.e. books, in the scene; then, reading the titles of the detected books. For the first step, two object detection models, YOLOv4 and YOLOv9, were employed. For the second, three OCR models (EasyOCR, Pytesseract, and Keras-OCR) were used. The results indicate that YOLOv9 detected all of the books, whereas YOLOv4 identified 94 % of them. The findings further suggest that, with both the YOLOv4 and YOLOv9 models, EasyOCR performed well at a distance of 50 cm with a resolution setting of 3. Although the OCR output does not match the text on the books perfectly, the error rate is low enough for recognition by both humans and computers; nevertheless, more advanced object detection and OCR technologies should be employed in future work.
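The abstract outlines a two-stage pipeline: grab a frame from Pepper's camera, detect books with a YOLO model, and run OCR on each detected region. Below is a minimal Python sketch of that pipeline, assuming the NAOqi `qi` SDK for Pepper, the `ultralytics` YOLO wrapper, and `easyocr`; the weights file `yolov9_books.pt`, the robot address, and the confidence threshold are illustrative assumptions, not the authors' configuration. In NAOqi's camera API, resolution index 3 corresponds to k4VGA (1280 x 960), which is presumably the "resolution of 3" reported above.

```python
# Minimal sketch of the pipeline described above: capture a frame from
# Pepper's camera, detect books with YOLO, and read each title with EasyOCR.
# Library choices and all parameter values are illustrative assumptions.

import numpy as np
import qi                     # NAOqi Python SDK (Pepper's middleware)
import easyocr
from ultralytics import YOLO  # assumed YOLO wrapper; the paper does not name one


def capture_frame(session, resolution=3, fps=10):
    """Grab one frame from Pepper's top camera as a BGR numpy array.

    In NAOqi, resolution index 3 is k4VGA (1280x960); colour space 13 is BGR,
    the convention that OpenCV-based libraries such as EasyOCR expect.
    """
    video = session.service("ALVideoDevice")
    handle = video.subscribeCamera("book_reader", 0, resolution, 13, fps)
    try:
        img = video.getImageRemote(handle)  # ALValue: [0]=width, [1]=height, [6]=pixels
        width, height, raw = img[0], img[1], img[6]
        return np.frombuffer(raw, dtype=np.uint8).reshape(height, width, 3)
    finally:
        video.unsubscribe(handle)


def read_book_titles(frame, weights="yolov9_books.pt"):
    """Detect books in the frame, then run OCR on each cropped detection."""
    detector = YOLO(weights)  # hypothetical weights fine-tuned to detect books
    reader = easyocr.Reader(["en"])
    titles = []
    for box in detector(frame)[0].boxes.xyxy:  # (x1, y1, x2, y2) per detection
        x1, y1, x2, y2 = map(int, box.tolist())
        crop = frame[y1:y2, x1:x2]
        # readtext yields (bbox, text, confidence) triples; keep confident words
        words = [text for _, text, conf in reader.readtext(crop) if conf > 0.3]
        titles.append(" ".join(words))
    return titles


if __name__ == "__main__":
    session = qi.Session()
    session.connect("tcp://pepper.local:9559")  # hypothetical robot address
    for title in read_book_titles(capture_frame(session)):
        print(title)
```

Swapping in Pytesseract (`pytesseract.image_to_string(crop)`) or Keras-OCR in place of EasyOCR changes only the OCR call; the capture, detection, and cropping stages stay the same.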