On the Construction of Knowledge-Communication-Oriented Computer Spoken Corpus for Endangered Languages
Zihui Xiao, Junjun Fan, Wei-Nan Gao
Proceedings of the 6th International Conference on Digital Technology in Education, 2022-09-16
https://doi.org/10.1145/3568739.3568801
A corpus (plural: corpora) is a computer database that stores language materials. The world's corpora are dominated by lingua franca corpora. However, these corpora have limited samples and struggle to meet the needs of machine artificial intelligence. This paper proposes building a knowledge-communication-oriented spoken corpus for endangered languages to enrich the content of corpus research. The proposed corpus is divided into three sub-corpora: the word sub-corpus, the sentence sub-corpus, and the narrative discourse sub-corpus. The paper mainly introduces methods for constructing a spoken corpus of endangered languages in terms of corpus collection, corpus arrangement, and corpus annotation.
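The three-part division described above could be represented in code roughly as follows. This is only an illustrative sketch, not the authors' design: the class names (`SubCorpus`, `Entry`, `SpokenCorpus`) and the fields (`audio_path`, `transcription`, `annotations`) are assumptions introduced here, since the abstract does not specify a schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class SubCorpus(Enum):
    """The three sub-corpora named in the abstract."""
    WORD = "word"
    SENTENCE = "sentence"
    NARRATIVE_DISCOURSE = "narrative_discourse"


@dataclass
class Entry:
    # All fields are hypothetical; the abstract does not define a record layout.
    audio_path: str                      # location of the collected recording
    transcription: str                   # text produced during corpus arrangement
    annotations: dict = field(default_factory=dict)  # labels added during annotation


@dataclass
class SpokenCorpus:
    language: str
    # One list of entries per sub-corpus, created empty for every new corpus.
    entries: dict = field(default_factory=lambda: {s: [] for s in SubCorpus})

    def add(self, sub: SubCorpus, entry: Entry) -> None:
        """Store an entry in the given sub-corpus."""
        self.entries[sub].append(entry)

    def size(self, sub: SubCorpus) -> int:
        """Return how many entries the given sub-corpus holds."""
        return len(self.entries[sub])
```

Keying the entries by an enumeration keeps the three sub-corpora separate while letting collection, arrangement, and annotation tools share one record type.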