OVALYTICS: Enhancing Offensive Video Detection with YouTube Transcriptions and Advanced Language Models
Sneha Chinivar, Roopa M.S., Arunalatha J.S., Venugopal K.R.
Natural Language Processing Journal, Vol. 11, Article 100147. Published 2025-04-21. DOI: 10.1016/j.nlp.2025.100147
https://www.sciencedirect.com/science/article/pii/S2949719125000238
The exponential growth of offensive content online underscores the need for robust content moderation. In response, this work presents OVALYTICS (Offensive Video Analysis Leveraging YouTube Transcriptions with Intelligent Classification System), a framework that detects offensive videos by analyzing their spoken content. Unlike existing approaches, OVALYTICS combines OpenAI's Whisper for audio-to-text transcription with pretrained transformer language models such as BERT, ALBERT, XLM-R, MPNet, and T5 for semantic analysis of the transcripts. The framework also includes a newly curated dataset tailored for fine-grained evaluation, and it achieves significant improvements in accuracy and F1-score over traditional methods, advancing the state of automated content moderation.
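The abstract describes a two-stage design: speech is transcribed to text, and the transcript is then classified. The following is a minimal sketch of that structure, not the authors' implementation. It assumes a binary label (1 = offensive, 0 = not offensive), and the names `TranscriptPipeline` and `accuracy_and_f1` are hypothetical. Both stages are injected as plain callables so the sketch runs without model weights. In practice, `transcribe` might wrap the open-source `openai-whisper` package (for example, `whisper.load_model(...).transcribe(path)["text"]`), and `classify` might wrap a fine-tuned transformer such as BERT. The metric helper computes the two figures the abstract reports, treating the offensive class as positive. This is an assumption: the paper may use a different label scheme or a macro-averaged F1.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical stage signatures (assumptions, not the paper's API):
# a transcriber maps a video/audio path to text; a classifier maps a
# transcript to a label (1 = offensive, 0 = not offensive).
Transcriber = Callable[[str], str]
Classifier = Callable[[str], int]


@dataclass
class TranscriptPipeline:
    """Two-stage detector: speech-to-text, then text classification."""
    transcribe: Transcriber
    classify: Classifier

    def predict(self, video_path: str) -> int:
        transcript = self.transcribe(video_path)  # stage 1: audio -> text
        return self.classify(transcript)          # stage 2: text -> label


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, binary F1) with the offensive class (1) as positive."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # offensive, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # offensive, missed
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Hypothetical stand-ins for illustration only: canned transcripts
    # instead of Whisper, and a keyword rule instead of a fine-tuned model.
    fake_transcripts = {
        "clip_a.mp4": "you are an idiot",
        "clip_b.mp4": "welcome to my cooking channel",
    }
    pipeline = TranscriptPipeline(
        transcribe=lambda path: fake_transcripts[path],
        classify=lambda text: int("idiot" in text.lower()),
    )
    preds = [pipeline.predict(p) for p in ["clip_a.mp4", "clip_b.mp4"]]
    print(preds, accuracy_and_f1([1, 0], preds))
```

Keeping the stages separate means each video is transcribed once, and the same transcripts can be reused to compare several candidate classifiers under one evaluation loop.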