A study on the common words found in different literary Romanian corpora
A. Mitrea, A. Vlad, Octavian Hodea, R. Dragomir
2014 10th International Conference on Communications (COMM), 2014-07-28
DOI: https://doi.org/10.1109/ICCOMM.2014.6866729
The experimental analysis focused on a corpus of literary works, novels and short stories (107 books), comprising about 12.7 million words and obtained by bringing together two different collections of writings of similar length. The main objective was to identify the common words, that is, the words occurring in every book, at three levels: in the corpus as a whole, in each of the sub-corpora organized per author, and in each sub-collection of texts.
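One way to picture the task described in the abstract is as a set intersection over per-book vocabularies. The sketch below is only an illustration under stated assumptions, not the authors' procedure: the tokenizer (lowercased runs of letters, no lemmatization), the function names, and the toy sentences standing in for books are all hypothetical.

```python
import re
import unicodedata

# Assumption: a "word" is a maximal run of Unicode letters, compared in lowercase.
# The paper's actual tokenization rules are not given in the abstract.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def vocabulary(text):
    """Return the set of distinct lowercase word forms in one text."""
    # NFC keeps Romanian diacritics as single precomposed letters.
    normalized = unicodedata.normalize("NFC", text)
    return {m.group(0).lower() for m in TOKEN_RE.finditer(normalized)}


def common_words(texts):
    """Words that occur in every text of the collection (set intersection)."""
    vocabularies = [vocabulary(t) for t in texts]
    if not vocabularies:
        return set()
    return set.intersection(*vocabularies)


def common_words_per_group(groups):
    """Apply common_words to each group (e.g. the books of one author)."""
    return {label: common_words(texts) for label, texts in groups.items()}


# Toy stand-ins for books, grouped by hypothetical author labels.
books_by_author = {
    "author_a": ["Și el a venit în sat.", "El nu a venit și a plecat."],
    "author_b": ["Ea a plecat și nu s-a întors.", "Și ea a venit."],
}

per_author = common_words_per_group(books_by_author)
all_books = [t for texts in books_by_author.values() for t in texts]
whole_corpus = common_words(all_books)

print(sorted(whole_corpus))
print({author: sorted(words) for author, words in per_author.items()})
```

The same `common_words` function serves all three levels mentioned in the abstract: pass it every book for the whole corpus, the books of one author for a per-author sub-corpus, or the books of one collection for a sub-collection. Because adding books can only shrink an intersection, the whole-corpus set is always a subset of each per-author set.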