{"title":"图书馆的未来:整合胡椒和计算机视觉的智能辅助","authors":"Claire Trinquet, Deepti Mishra , Akshara Pande","doi":"10.1016/j.array.2025.100469","DOIUrl":null,"url":null,"abstract":"<div><div>In recent decades, the utilization of social robots in our daily lives has increased, but they are different from robots designed for libraries. On the one hand, library robots cannot establish social interactions, while social robots lack the necessary sensors to identify books. The present study aims to integrate the social robot Pepper's camera with computer vision techniques to enable Pepper to read the titles of books in front of it. This involves two main steps. The first step is to detect objects, i.e. books, from the scene. Thereafter, the titles of the books need to be read from the previous step. To achieve the first objective, two object detection models, YOLOv4 and YOLOv9, were employed. To accomplish the second goal, three OCR models —EasyOCR, Pytesseract, and Keras-OCR — were used. The results indicate that with the YOLOv9 model, all books were detected, whereas with the YOLOv4 model, 94 % books were identified. The findings of the present study suggest that when the YOLOv4 model and YOLOv9 were applied, EasyOCR performed well at a distance of 50 cm with a resolution of 3. Although the results of the OCR do not match perfectly with the written text on the books, the error rate is quite low for recognition by humans and computers. Therefore, there is a need to employ more advanced object detection and OCR technologies in future work.</div></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"27 ","pages":"Article 100469"},"PeriodicalIF":4.5000,"publicationDate":"2025-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The future of libraries: Integrating pepper and computer vision for smart assistance\",\"authors\":\"Claire Trinquet, Deepti Mishra , Akshara Pande\",\"doi\":\"10.1016/j.array.2025.100469\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>In recent decades, the utilization of social robots in our daily lives has increased, but they are different from robots designed for libraries. On the one hand, library robots cannot establish social interactions, while social robots lack the necessary sensors to identify books. The present study aims to integrate the social robot Pepper's camera with computer vision techniques to enable Pepper to read the titles of books in front of it. This involves two main steps. The first step is to detect objects, i.e. books, from the scene. Thereafter, the titles of the books need to be read from the previous step. To achieve the first objective, two object detection models, YOLOv4 and YOLOv9, were employed. To accomplish the second goal, three OCR models —EasyOCR, Pytesseract, and Keras-OCR — were used. The results indicate that with the YOLOv9 model, all books were detected, whereas with the YOLOv4 model, 94 % books were identified. The findings of the present study suggest that when the YOLOv4 model and YOLOv9 were applied, EasyOCR performed well at a distance of 50 cm with a resolution of 3. Although the results of the OCR do not match perfectly with the written text on the books, the error rate is quite low for recognition by humans and computers. 
Therefore, there is a need to employ more advanced object detection and OCR technologies in future work.</div></div>\",\"PeriodicalId\":8417,\"journal\":{\"name\":\"Array\",\"volume\":\"27 \",\"pages\":\"Article 100469\"},\"PeriodicalIF\":4.5000,\"publicationDate\":\"2025-07-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Array\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2590005625000967\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Array","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2590005625000967","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
The future of libraries: Integrating Pepper and computer vision for smart assistance
In recent decades, the use of social robots in daily life has increased, but such robots differ from robots designed for libraries: library robots cannot establish social interactions, while social robots lack the sensors needed to identify books. The present study integrates the camera of the social robot Pepper with computer vision techniques so that Pepper can read the titles of books placed in front of it. This involves two main steps: first, detecting the objects of interest, i.e. books, in the scene; then, reading the titles of the detected books. For the first step, two object detection models, YOLOv4 and YOLOv9, were employed. For the second, three OCR models (EasyOCR, Pytesseract, and Keras-OCR) were used. The results indicate that YOLOv9 detected all of the books, whereas YOLOv4 identified 94 % of them. The findings further suggest that, with both the YOLOv4 and YOLOv9 models, EasyOCR performed well at a distance of 50 cm with a resolution setting of 3. Although the OCR output does not match the text on the books perfectly, the error rate is low enough for recognition by both humans and computers; nevertheless, more advanced object detection and OCR technologies should be employed in future work.
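The abstract outlines a two-stage pipeline: grab a frame from Pepper's camera, detect books with a YOLO model, and run OCR on each detected region. Below is a minimal Python sketch of that pipeline, assuming the NAOqi `qi` SDK for Pepper, the `ultralytics` YOLO wrapper, and `easyocr`; the weights file `yolov9_books.pt`, the robot address, and the confidence threshold are illustrative assumptions, not the authors' configuration. In NAOqi's camera API, resolution index 3 corresponds to k4VGA (1280 x 960), which is presumably the "resolution of 3" reported above.

```python
# Minimal sketch of the pipeline described above: capture a frame from
# Pepper's camera, detect books with YOLO, and read each title with EasyOCR.
# Library choices and all parameter values are illustrative assumptions.

import numpy as np
import qi                     # NAOqi Python SDK (Pepper's middleware)
import easyocr
from ultralytics import YOLO  # assumed YOLO wrapper; the paper does not name one


def capture_frame(session, resolution=3, fps=10):
    """Grab one frame from Pepper's top camera as a BGR numpy array.

    In NAOqi, resolution index 3 is k4VGA (1280x960); colour space 13 is BGR,
    the convention that OpenCV-based libraries such as EasyOCR expect.
    """
    video = session.service("ALVideoDevice")
    handle = video.subscribeCamera("book_reader", 0, resolution, 13, fps)
    try:
        img = video.getImageRemote(handle)  # ALValue: [0]=width, [1]=height, [6]=pixels
        width, height, raw = img[0], img[1], img[6]
        return np.frombuffer(raw, dtype=np.uint8).reshape(height, width, 3)
    finally:
        video.unsubscribe(handle)


def read_book_titles(frame, weights="yolov9_books.pt"):
    """Detect books in the frame, then run OCR on each cropped detection."""
    detector = YOLO(weights)  # hypothetical weights fine-tuned to detect books
    reader = easyocr.Reader(["en"])
    titles = []
    for box in detector(frame)[0].boxes.xyxy:  # (x1, y1, x2, y2) per detection
        x1, y1, x2, y2 = map(int, box.tolist())
        crop = frame[y1:y2, x1:x2]
        # readtext yields (bbox, text, confidence) triples; keep confident words
        words = [text for _, text, conf in reader.readtext(crop) if conf > 0.3]
        titles.append(" ".join(words))
    return titles


if __name__ == "__main__":
    session = qi.Session()
    session.connect("tcp://pepper.local:9559")  # hypothetical robot address
    for title in read_book_titles(capture_frame(session)):
        print(title)
```

Swapping in Pytesseract (`pytesseract.image_to_string(crop)`) or Keras-OCR in place of EasyOCR changes only the OCR call; the capture, detection, and cropping stages stay the same.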