Measuring and Comparing the Productivity of Mandarin Chinese Suffixes
Eiji Nishimoto
Int. J. Comput. Linguistics Chin. Lang. Process., published 2003-02-01
DOI: https://doi.org/10.30019/IJCLCLP.200302.0003
The present study attempts to measure and compare the morphological productivity of five Mandarin Chinese suffixes: the verbal suffix -hua, the plural suffix -men, and the nominal suffixes -r, -zi, and -tou. These suffixes are predicted to differ in their degree of productivity: -hua and -men appear to be productive, systematically forming words from a variety of base words, whereas -zi and -tou (and perhaps also -r) may be limited in productivity. Baayen [1989, 1992] proposes the use of corpus data in measuring productivity in word formation. Based on word-token frequencies in a large corpus of texts, his token-based measure expresses productivity as the probability that a new word form of an affix will be encountered in a corpus. We first use the token-based measure to examine the productivity of the Mandarin suffixes. The study then proposes a type-based measure of productivity that employs the deleted estimation method [Jelinek & Mercer, 1985] to define the unseen words of a corpus and expresses productivity as the ratio of unseen word types to all word types. The proposed type-based measure yields the productivity ranking “-men, -hua, -r, -zi, -tou,” where -men is the most productive and -tou is the least productive. The effects of corpus-data variability on a productivity measure are also examined. The proposed measure is found to yield a consistent productivity ranking despite variability in corpus data.
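The two kinds of measure described above can be illustrated with a minimal Python sketch. The token-based function follows the common formulation of Baayen's measure, P = n1 / N, where n1 is the number of word types with the suffix that occur exactly once and N is the total number of tokens with the suffix. The type-based function is only one possible reading of "ratio of unseen word types to all word types": it assumes the corpus is split into two parts and treats a type as unseen if it occurs in one part but not in the other, averaging over both directions. The abstract does not give the exact definition, so the function names, the two-way split, and the toy pinyin word lists are all assumptions made for illustration.

```python
from collections import Counter


def token_based_productivity(tokens):
    """Token-based measure P = n1 / N for the tokens of one suffix.

    n1 = number of types occurring exactly once (hapax legomena),
    N  = total number of tokens carrying the suffix.
    """
    counts = Counter(tokens)
    n_tokens = sum(counts.values())
    if n_tokens == 0:
        return 0.0
    n_hapax = sum(1 for c in counts.values() if c == 1)
    return n_hapax / n_tokens


def type_based_productivity(part_a, part_b):
    """Assumed type-based measure: unseen types / all types.

    A type counts as unseen if it occurs in one part of the corpus
    but not in the other; the two directions are averaged.
    This is an illustrative assumption, not the paper's definition.
    """
    types_a, types_b = set(part_a), set(part_b)
    all_types = types_a | types_b
    if not all_types:
        return 0.0
    unseen_given_a = len(types_b - types_a)  # new in B relative to A
    unseen_given_b = len(types_a - types_b)  # new in A relative to B
    return (unseen_given_a + unseen_given_b) / (2 * len(all_types))


# Hypothetical toy data: words with the plural suffix -men.
men_tokens = ["women", "tamen", "women", "renmen", "haizimen"]
men_part_a = ["women", "tamen", "women"]
men_part_b = ["renmen", "women", "haizimen"]

p_token = token_based_productivity(men_tokens)
p_type = type_based_productivity(men_part_a, men_part_b)
print(p_token, p_type)
```

In this sketch a suffix would be ranked as more productive the higher either value is; comparing suffixes then amounts to computing the same quantity for each suffix's word list drawn from the same corpus.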