YouTube Universe of Comments: A Machine Learning approach for systematic classification of YouTube Comments on custom prepared dataset
Authors: Sankalp Naik, Ashay Katre
Published in: 2023 World Conference on Communication & Computing (WCONF), 2023-07-14
DOI: 10.1109/WCONF58270.2023.10235049 (https://doi.org/10.1109/WCONF58270.2023.10235049)
YouTube can today be regarded as a cloud service, given the volume of data it adds every second and the enormous amount of data it stores in its data farms. It does not delete old content, and it relies on redundant storage. The platform could be more sustainable and cost-efficient if it discarded redundant data, a major portion of which consists of spam comments and comments that are offensive or abusive. In this paper, several machine learning models are used to reduce such comments and eventually move towards a more efficient storage model. We first address dataset preparation by designing a comprehensive annotation scheme that considers dimensions such as sentiment, topic, toxicity, and engagement. Using this annotated dataset, we develop a machine learning framework that combines state-of-the-art natural language processing techniques with advanced classification algorithms. Our methodology involves several stages, including preprocessing, feature extraction, and model training. We employ sentiment analysis and toxicity detection to capture the sentiment and the abusive nature of comments, respectively. We also introduce the notion of gravity for comments, which acts as a reward mechanism. To evaluate our approach, we conduct extensive experiments on a large-scale YouTube comments dataset. We compare the effectiveness of various classification algorithms, including support vector machines, random forests, and deep learning models, at accurately categorizing comments according to our predefined annotation scheme. Additionally, we assess the generalizability of our model through cross-domain experiments on different genres of YouTube videos. Overall, our work contributes to the understanding and management of the YouTube comment ecosystem and shows how machine learning techniques can systematically classify and analyze comments on this popular platform.
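The three stages named in the abstract (preprocessing, feature extraction, model training) can be illustrated with a minimal sketch. It is not the authors' implementation: a small multinomial naive Bayes classifier stands in for the support vector machines, random forests, and deep learning models that the paper compares, and the function names, the `urltoken` placeholder, the two labels, and the toy comments are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def preprocess(comment):
    """Preprocessing: lowercase, replace URLs with a placeholder, tokenize."""
    text = re.sub(r"https?://\S+", " urltoken ", comment.lower())
    return re.findall(r"[a-z']+", text)


def train(samples):
    """Feature extraction and training: per-label bag-of-words counts."""
    word_counts = defaultdict(Counter)  # label -> token frequencies
    label_counts = Counter()            # label -> number of comments
    for comment, label in samples:
        label_counts[label] += 1
        word_counts[label].update(preprocess(comment))
    vocab = {tok for counts in word_counts.values() for tok in counts}
    return word_counts, label_counts, vocab


def predict(model, comment):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    # Tokens never seen in training carry no information and are skipped.
    tokens = [t for t in preprocess(comment) if t in vocab]
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            # Laplace (add-one) smoothing avoids log(0) for unseen pairs.
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data; the paper's annotated dataset is not reproduced here.
TOY_DATA = [
    ("Check out my channel http://example.com free gift cards", "spam"),
    ("Subscribe to my channel for free followers", "spam"),
    ("Click here to win free prizes http://example.com", "spam"),
    ("Great explanation, the second example really helped", "ok"),
    ("Thanks for the tutorial, very clear", "ok"),
    ("I love this song, great video", "ok"),
]

model = train(TOY_DATA)
print(predict(model, "free gift cards at http://example.com"))
print(predict(model, "great tutorial, thanks"))
```

In a setting like the one the abstract describes, the same structure would presumably be repeated per annotation dimension (sentiment, topic, toxicity, engagement), with a stronger classifier in place of naive Bayes and a held-out split for evaluation.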