Varad Patwardhan, Gauri Takawane, Nirmayi Kelkar, Omkar Gaikwad, Rutwik Saraf, S. Sonawane
2023 International Conference on Emerging Smart Computing and Informatics (ESCI). Published 2023-03-01. DOI: 10.1109/ESCI56872.2023.10100304 (https://doi.org/10.1109/ESCI56872.2023.10100304)
Analysing The Sentiments Of Marathi-English Code-Mixed Social Media Data Using Machine Learning Techniques
Social media platforms generate a vast amount of data every day, and a range of techniques and methodologies is used to put its different forms to use. One such form is textual data, produced as chats, comments, and tweets. The term “code-mixed data” describes data that combines components of different languages or linguistic subgroups, such as text written in several languages or speech that shifts between languages. With rising social media use and worldwide communication, many people use multiple languages in their daily communication, which makes this type of data increasingly important. Machine translation, speech recognition, and text categorization are just a few of the natural language processing tasks that can be performed on code-mixed data. Research on code-mixed data can also aid the understanding of multilingual communication. In this paper, we present an empirical study of word-level language identification and text normalisation for Marathi-English code-mixed text. We have created a new dataset of 1009 sentences that exhibit code-mixing of Marathi (Romanised) and English. The data was collected from WhatsApp chats and YouTube comments.
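To make the two tasks named in the abstract concrete, the following is a minimal sketch of word-level language identification preceded by a simple text normalisation step. It is not the method used in the paper, which the abstract does not describe. The word lists, the spelling-variant table, the label names (`mr`, `en`, `unk`), and the function names are all hypothetical and chosen only for illustration; none of them come from the authors' dataset.

```python
import re

# Hypothetical toy resources, for illustration only.
# They are NOT taken from the paper's 1009-sentence dataset.
MARATHI_ROMANISED = {"mi", "tu", "aahe", "nahi", "kay", "khup", "ani", "mala"}
ENGLISH = {"the", "movie", "was", "good", "is", "and", "very", "today"}

# Assumed spelling variants of Romanised Marathi words mapped to one form.
SPELLING_VARIANTS = {"ahe": "aahe", "nai": "nahi"}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def normalise(token):
    """Map a known spelling variant to its canonical form, else keep it."""
    return SPELLING_VARIANTS.get(token, token)


def tag_words(sentence):
    """Return (normalised_token, label) pairs, one per word.

    Labels: 'mr' for Romanised Marathi, 'en' for English,
    'unk' for words found in neither toy lexicon.
    """
    tagged = []
    for token in tokenize(sentence):
        token = normalise(token)
        if token in MARATHI_ROMANISED:
            label = "mr"
        elif token in ENGLISH:
            label = "en"
        else:
            label = "unk"
        tagged.append((token, label))
    return tagged


if __name__ == "__main__":
    for word, label in tag_words("Movie khup good ahe"):
        print(word, label)
```

A plain lexicon lookup is used here only because it is easy to read. A realistic system would more likely learn the labels from annotated data, for example with character n-gram features, since Romanised Marathi has no standard spelling and many words fall outside any fixed list.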