Multilingual author profiling using word embedding averages and SVMs
R. Bayot, Teresa Gonçalves
2016 10th International Conference on Software, Knowledge, Information Management & Applications (SKIMA). DOI: 10.1109/SKIMA.2016.7916251
This paper describes an experiment on author profiling of English and Spanish tweets, with a particular focus on cross-genre evaluation. Profiling consists of two classification tasks: age and gender. The training sets consist of tweets, while the evaluation sets come from other genres: blogs, hotel reviews, tweets collected at a different time, and other social media. Using a Support Vector Machine (SVM) classifier, the paper compares a tf-idf baseline against the average of word vectors (word embeddings). Results show that averaged word vectors outperform tf-idf on most of the cross-genre problems, for both age and gender.
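To make the compared representation concrete, here is a minimal Python sketch of averaged word vectors feeding a linear SVM. Every concrete detail in it is an assumption for illustration and not taken from the paper. This includes the 2-dimensional toy embeddings, the whitespace tokenizer, the toy labels, and the hypothetical helpers `average_vector`, `featurize`, `train_linear_svm` and `predict`. The SVM is a small Pegasos-style stochastic subgradient solver written from scratch, not whatever SVM implementation the authors used. A tf-idf baseline would differ only in the feature step: `featurize` would return a tf-idf weighted term vector instead of the mean embedding.

```python
import random

# Hypothetical 2-d toy embeddings. A real system would load pre-trained
# vectors; the embeddings used in the paper are not specified here.
EMBEDDINGS = {
    "exam": [1.0, 0.0],
    "homework": [0.9, 0.1],
    "party": [0.8, 0.2],
    "today": [0.5, 0.5],
    "mortgage": [0.0, 1.0],
    "retirement": [0.1, 0.9],
    "grandkids": [0.2, 0.8],
}
DIM = 2


def average_vector(tokens, embeddings=EMBEDDINGS, dim=DIM):
    """Document vector = mean of the embeddings of in-vocabulary tokens.

    Out-of-vocabulary tokens are skipped; a document with no known tokens
    maps to the zero vector.
    """
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def featurize(text):
    # Naive whitespace tokenisation (an assumption), plus a constant 1.0
    # feature so the bias is learned as an ordinary weight.
    return average_vector(text.lower().split()) + [1.0]


def train_linear_svm(X, y, lam=0.1, epochs=500, seed=0):
    """Linear SVM trained with Pegasos-style stochastic subgradient descent
    on the L2-regularised hinge loss. Labels must be -1 or +1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size 1/(lambda * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Subgradient of the regulariser: shrink w towards zero.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move w towards the correct side.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, text):
    score = sum(wj * xj for wj, xj in zip(w, featurize(text)))
    return 1 if score >= 0 else -1


# Toy training set: +1 / -1 stand for two hypothetical profile classes
# (e.g. two age groups); real training data would be labelled tweets.
TRAIN = [
    ("exam today", 1),
    ("homework party", 1),
    ("mortgage today", -1),
    ("retirement grandkids", -1),
]
X = [featurize(text) for text, _ in TRAIN]
y = [label for _, label in TRAIN]
w = train_linear_svm(X, y)
```

For example, `predict(w, "party exam")` averages to `[0.9, 0.1]` and lands on the +1 side, while `predict(w, "grandkids mortgage")` lands on the -1 side. One design point: averaging yields a dense vector whose length is the embedding dimension regardless of vocabulary. As a result, words that appear only in the evaluation genre still contribute as long as they have embeddings. A tf-idf vocabulary fitted on training tweets has no dimension for such words. This is offered as one plausible intuition for the cross-genre setting, not as a finding stated in the abstract.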