MIKA: A tagged corpus for modern standard Arabic and colloquial sentiment analysis
Hossam S. Ibrahim, Sherif M. Abdou, M. Gheith
2015 IEEE 2nd International Conference on Recent Trends in Information Systems (ReTIS), 9 July 2015
DOI: 10.1109/ReTIS.2015.7232904
Sentiment analysis (SA) and opinion mining (OM) have drawn growing research attention over the last decade, driven by the rising volume of Internet documents (especially online reviews and comments) on social media such as blogs and social networks. Many efforts have been made to build SA corpora, because such resources are considered a key component of SA and OM systems. However, the need for these resources remains, especially for morphologically rich languages (MRLs) such as Arabic. In this paper, we present MIKA, a multi-genre tagged corpus of Modern Standard Arabic (MSA) and colloquial Arabic. MIKA was manually collected and annotated at the sentence level for semantic orientation (positive, negative, or neutral). The annotation process uses a rich set of linguistically motivated features (contextual intensifiers, contextual shifters, and negation handling), syntactic features for conflicting phrases, and others. Our data focus on MSA and Egyptian dialectal Arabic. We report on the effort of manually building and annotating our sentiment corpus from different types of data, such as tweets and Arabic microblogs (hotel reservations, product reviews, and TV program comments).
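To show how the features named above (negation, intensifiers, and conflicting phrases) can change a sentence-level label, here is a minimal sketch. It is not the MIKA annotation scheme. The lexicon, the particle lists, the function names `clause_score` and `label_sentence`, and the "last clause after لكن wins" rule are all hypothetical illustrations. The sketch also assumes whitespace tokenization, so it ignores attached Arabic clitics such as the conjunction prefix in "ولكن".

```python
# Toy sentence-level polarity tagger (hypothetical; NOT the MIKA guidelines).
# It shows how negation, intensifiers and a contrastive connective can change
# the positive / negative / neutral label of a sentence.

POLARITY = {          # hypothetical prior-polarity lexicon
    "جميل": 1.0,      # beautiful
    "ممتاز": 1.0,     # excellent
    "سيء": -1.0,      # bad (MSA)
    "وحش": -1.0,      # bad (Egyptian)
}
NEGATORS = {"لا", "ليس", "مش"}            # MSA and Egyptian negation particles
INTENSIFIERS = {"جدا": 2.0, "أوي": 2.0}   # "very" in MSA and Egyptian
CONTRAST = "لكن"                          # "but": assumed to mark conflicting phrases


def clause_score(tokens: list[str]) -> float:
    """Sum prior polarities, flipping after a negator and scaling by a following intensifier."""
    score = 0.0
    negate = False
    for i, tok in enumerate(tokens):
        if tok in NEGATORS:
            negate = True  # applies to the next sentiment word only
            continue
        if tok in POLARITY:
            value = POLARITY[tok]
            # Intensifiers follow the word they modify, as in "جميل جدا".
            if i + 1 < len(tokens) and tokens[i + 1] in INTENSIFIERS:
                value *= INTENSIFIERS[tokens[i + 1]]
            if negate:
                value = -value
                negate = False
            score += value
    return score


def label_sentence(sentence: str) -> str:
    """Return 'positive', 'negative' or 'neutral' for one whitespace-tokenized sentence."""
    tokens = sentence.split()
    # Conflicting phrases (assumed rule): keep only the clause after the last "لكن".
    if CONTRAST in tokens:
        last = len(tokens) - 1 - tokens[::-1].index(CONTRAST)
        tokens = tokens[last + 1:]
    score = clause_score(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, "الفندق جميل جدا" ("the hotel is very nice") is labeled positive, "البرنامج مش جميل" ("the program is not nice") is labeled negative, and "الأكل جميل لكن المكان وحش" ("the food is nice but the place is bad") is labeled negative because only the clause after "لكن" is scored. Real annotators weigh context far more flexibly than these fixed rules.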