{"title":"走向更自然的人工语言","authors":"Mark Hopkins","doi":"10.18653/v1/2022.conll-1.7","DOIUrl":null,"url":null,"abstract":"A number of papers have recently argued in favor of using artificially generated languages to investigate the inductive biases of linguistic models, or to develop models for low-resource languages with underrepresented typologies. But the promise of artificial languages comes with a caveat: if these artificial languages are not sufficiently reflective of natural language, then using them as a proxy may lead to inaccurate conclusions. In this paper, we take a step towards increasing the realism of artificial language by introducing a variant of indexed grammars that draw their weights from hierarchical Pitman-Yor processes. We show that this framework generates languages that emulate the statistics of natural language corpora better than the current approach of directly formulating weighted context-free grammars.","PeriodicalId":221345,"journal":{"name":"Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL)","volume":"85 2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Towards More Natural Artificial Languages\",\"authors\":\"Mark Hopkins\",\"doi\":\"10.18653/v1/2022.conll-1.7\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A number of papers have recently argued in favor of using artificially generated languages to investigate the inductive biases of linguistic models, or to develop models for low-resource languages with underrepresented typologies. But the promise of artificial languages comes with a caveat: if these artificial languages are not sufficiently reflective of natural language, then using them as a proxy may lead to inaccurate conclusions. In this paper, we take a step towards increasing the realism of artificial language by introducing a variant of indexed grammars that draw their weights from hierarchical Pitman-Yor processes. We show that this framework generates languages that emulate the statistics of natural language corpora better than the current approach of directly formulating weighted context-free grammars.\",\"PeriodicalId\":221345,\"journal\":{\"name\":\"Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL)\",\"volume\":\"85 2 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18653/v1/2022.conll-1.7\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2022.conll-1.7","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A number of papers have recently argued in favor of using artificially generated languages to investigate the inductive biases of linguistic models, or to develop models for low-resource languages with underrepresented typologies. But the promise of artificial languages comes with a caveat: if these artificial languages are not sufficiently reflective of natural language, then using them as a proxy may lead to inaccurate conclusions. In this paper, we take a step towards increasing the realism of artificial language by introducing a variant of indexed grammars that draw their weights from hierarchical Pitman-Yor processes. We show that this framework generates languages that emulate the statistics of natural language corpora better than the current approach of directly formulating weighted context-free grammars.