The Role of Utterance Boundaries and Word Frequencies for Part-of-speech Learning in Brazilian Portuguese Through Distributional Analysis
Author: Pablo Picasso Feliciano de Faria
Venue: Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
DOI: 10.18653/v1/W19-2917 (https://doi.org/10.18653/v1/W19-2917)
Citations: 2
Abstract
In this study, we address the problem of part-of-speech (or syntactic category) learning during language acquisition through distributional analysis of utterances. A model based on Redington et al.’s (1998) distributional learner is used to investigate the informativeness of distributional information in Brazilian Portuguese (BP). The data provided to the learner come from two publicly available corpora of child-directed speech. We present preliminary results from two experiments. The first investigates the effects of different assumptions about utterance boundaries when presenting the input data to the learner. The second compares the learner’s performance when counting contextual words’ frequencies versus merely registering their co-occurrence with a given target word. In general, our results indicate that explicit boundaries are more informative, that frequencies are important, and that distributional information is useful to the child as a source of categorial information. These results are in accordance with Redington et al.’s findings for English.
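To make the two experimental contrasts concrete, the following is a minimal Python sketch of how context vectors for target words might be built. It is an illustration only, not the author's implementation or Redington et al.'s exact procedure: the function name `context_vectors`, the `<utt>` boundary token, the one-word window on each side, and the toy utterances are all assumptions introduced here. The `explicit_boundaries` flag stands in for the first experiment (utterances kept separate and padded with a boundary token versus concatenated into one stream), and the `binary` flag for the second (frequency counts versus mere co-occurrence).

```python
from collections import Counter

BOUNDARY = "<utt>"  # hypothetical boundary token, chosen for this sketch


def context_vectors(utterances, targets, context_words,
                    explicit_boundaries=True, binary=False):
    """Return one context vector per target word.

    Each vector has two slots per context word: how often that word
    immediately precedes the target, and how often it immediately follows.
    """
    if explicit_boundaries:
        # Each utterance is processed on its own, padded with boundary tokens.
        streams = [[BOUNDARY] + list(u) + [BOUNDARY] for u in utterances]
    else:
        # Utterances are concatenated, so contexts can cross boundaries.
        streams = [[w for u in utterances for w in u]]

    counts = {t: Counter() for t in targets}
    for tokens in streams:
        for i, word in enumerate(tokens):
            if word not in counts:
                continue
            if i > 0:
                counts[word][("prev", tokens[i - 1])] += 1
            if i + 1 < len(tokens):
                counts[word][("next", tokens[i + 1])] += 1

    dims = [(pos, c) for pos in ("prev", "next") for c in context_words]
    vectors = {}
    for t in targets:
        row = [counts[t][d] for d in dims]
        if binary:
            # Only record whether the co-occurrence was ever observed.
            row = [1 if v > 0 else 0 for v in row]
        vectors[t] = row
    return vectors


# Toy child-directed utterances (invented for illustration).
utterances = [["o", "gato", "dorme"],
              ["a", "menina", "dorme"],
              ["o", "gato", "come"]]

freq = context_vectors(utterances, ["gato", "menina"],
                       ["o", "a", "dorme", "come"])
occ = context_vectors(utterances, ["gato", "menina"],
                      ["o", "a", "dorme", "come"], binary=True)
```

In a learner of this kind, vectors such as these would then be compared with a similarity measure and clustered, so that words with similar contexts end up in the same candidate category. The sketch stops before that step, since the abstract does not specify the measure or clustering method used.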