Text-Type Variation in Vietnamese: Corpus Mining for Linguistic Features in Narrative and Non-Narrative Genres
Nhu Vo Diep, T. Bui, D. Dinh
2020 7th NAFOSTED Conference on Information and Computer Science (NICS), published 2020-11-26
DOI: 10.1109/NICS51282.2020.9335835 (https://doi.org/10.1109/NICS51282.2020.9335835)
In this study, we exploit two Vietnamese corpora: a narrative corpus and a non-narrative corpus. Each corpus contains 24 million words collected from documents of its genre published between 2000 and 2020. Every word is annotated with word boundaries and a part-of-speech tag. To examine how linguistic features are used across genres, we perform statistical analyses of word frequency, parts of speech, linguistic features, and the correlations among these features. The results show that the pronoun “I” and exclamation words occur significantly more often in narrative texts than in non-narrative texts. Moreover, adjectives are not correlated with any other feature in the narrative genre. In the non-narrative genre, by contrast, they are most likely to co-occur with third-person pronouns.
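The feature-frequency and correlation analysis summarized above can be illustrated with a minimal Python sketch. It is a sketch under stated assumptions, not the authors' documented procedure. It assumes that each document has already been word-segmented and POS-tagged into (word, tag) pairs. The tag codes ("A" for adjective, "I" for interjection) are assumptions. So are the use of "tôi" as the Vietnamese form of “I” and the small third-person pronoun list. The per-1,000-token normalization and the choice of Pearson correlation are also illustrative choices.

```python
from math import sqrt

# Hypothetical input format: each document is a list of (word, tag) pairs from a
# word segmenter and POS tagger. Tag codes below are assumptions, not the paper's.
Doc = list[tuple[str, str]]

# Illustrative, non-exhaustive third-person pronouns (segmented forms use "_").
THIRD_PERSON = {"họ", "nó", "anh_ấy", "cô_ấy"}


def feature_rates(doc: Doc) -> dict[str, float]:
    """Return the count of each feature per 1,000 tokens in one document."""
    n = len(doc) or 1  # avoid division by zero on empty documents
    counts = {
        "first_person_toi": sum(1 for w, _ in doc if w.lower() == "tôi"),
        "exclamation": sum(1 for _, t in doc if t == "I"),  # assumed interjection tag
        "adjective": sum(1 for _, t in doc if t == "A"),    # assumed adjective tag
        "third_person": sum(1 for w, _ in doc if w.lower() in THIRD_PERSON),
    }
    return {k: 1000 * v / n for k, v in counts.items()}


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def feature_correlation(corpus: list[Doc], a: str, b: str) -> float:
    """Correlate the per-document rates of two features across one corpus."""
    rates = [feature_rates(d) for d in corpus]
    return pearson([r[a] for r in rates], [r[b] for r in rates])


if __name__ == "__main__":
    # Toy, hand-made corpus (not data from the study).
    toy = [
        [("tôi", "P"), ("vui", "A"), ("quá", "R"), ("ôi", "I")],
        [("họ", "P"), ("đẹp", "A"), ("lắm", "R"), ("đẹp", "A")],
        [("nó", "P"), ("chạy", "V")],
    ]
    print(feature_correlation(toy, "adjective", "third_person"))  # -0.5
```

Running the same functions separately on a narrative corpus and a non-narrative corpus would let the two genres' correlation patterns be compared side by side. Normalizing by document length keeps long and short documents comparable. A real study would also need a significance test for the frequency differences, which this sketch omits.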