Juan Miguel A. Mendoza, China Marie G. Lao, Antolin J. Alipio, Dan Michael A. Cortez, Anne Camille M. Maupay, Charito M. Molina, C. Centeno, Jonathan C. Morano
Proceedings of the 2022 6th International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence. Published 2022-04-09. DOI: 10.1145/3533050.3533067 (https://doi.org/10.1145/3533050.3533067)
SemanTV: A Content-Based Video Retrieval Framework
With the increased adoption of CCTV for surveillance, the challenge of retrieving footage has recently gained attention. Most surveillance video systems can only retrieve footage based on its metadata (date, time, camera location, etc.), which limits the range of meaningful footage a user can retrieve. To solve this, a content-based video retrieval framework was proposed that retrieves relevant videos by matching their content to the user's query. The framework consists of two (2) methods: a method for Video Content Extraction, which uses Google's Video Intelligence API for Optical Character Recognition and Label Detection, and a method for Video Retrieval. Various setups for the Video Retrieval method were explored, including the use of SBERT and Okapi BM25. Each setup was tested on text queries paired with their expected video results, drawn from the MSVD dataset. To measure each setup's performance in terms of relevance, Recall at K, Precision at K, Median Rank, and Mean Rank were used. It was concluded that the setup combining the Video Intelligence API with SBERT alone returned videos relevant to the user's text query more accurately than the other setups.
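The abstract gives no implementation details, so the following is only a minimal sketch of how one of the named retrieval setups (Okapi BM25) and the named relevance measures could be wired together. It is not the authors' code. The toy `video_text` strings stand in for the labels and OCR text that a content-extraction step might produce, and they, the function names, the parameters `k1` and `b`, and the choice of IDF variant are all assumptions. Rank is taken here to mean the 1-based position of the first relevant video, which is a common convention but is also an assumption. The SBERT setup is not shown.

```python
import math
from collections import Counter
from statistics import mean, median

# Hypothetical extracted text per video (stand-in for labels and OCR output).
video_text = {
    "v1": "dog running grass park",
    "v2": "man playing guitar stage",
    "v3": "cat sleeping sofa",
    "v4": "woman cooking kitchen pan",
}


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the tokenized query (Okapi BM25)."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # One common IDF variant (adds 1 inside the log so it stays positive).
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank_videos(query, texts):
    """Return video ids ordered by descending BM25 score (ties keep input order)."""
    ids = list(texts)
    docs = [texts[v].lower().split() for v in ids]
    scores = bm25_scores(query.lower().split(), docs)
    return [v for _, v in sorted(zip(scores, ids), key=lambda pair: -pair[0])]


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant videos that appear in the top k results."""
    return len(set(ranked[:k]) & relevant) / len(relevant)


def precision_at_k(ranked, relevant, k):
    """Fraction of the top k results that are relevant."""
    return len(set(ranked[:k]) & relevant) / k


def first_relevant_rank(ranked, relevant):
    """1-based position of the first relevant video in the ranking."""
    return next(i for i, v in enumerate(ranked, start=1) if v in relevant)


if __name__ == "__main__":
    # Hypothetical queries mapped to the ids of their relevant videos.
    queries = {
        "man playing guitar": {"v2"},
        "dog running in the park": {"v1"},
        "cat on a sofa": {"v3"},
    }
    ranks = []
    for text, relevant in queries.items():
        ranked = rank_videos(text, video_text)
        ranks.append(first_relevant_rank(ranked, relevant))
        print(text, ranked, recall_at_k(ranked, relevant, 1),
              precision_at_k(ranked, relevant, 1))
    print("mean rank:", mean(ranks), "median rank:", median(ranks))
```

A semantic setup would keep the same evaluation functions and swap `rank_videos` for one that orders videos by the similarity between sentence embeddings of the query and of each video's extracted text.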