The NCCU Corpus of Spoken Chinese: Mandarin, Hakka, and Southern Min
Kawai Chui, Huei-ling Lai
Taiwan Journal of Linguistics, vol. 6, pp. 119-144, December 2008. DOI: 10.6519/TJL.2008.6(2).5
In Taiwan, most people speak Mandarin, Southern Min, or Hakka. These three Chinese dialects are undergoing linguistic change, and the number of Southern Min and Hakka speakers is also declining. The NCCU Corpus of Spoken Chinese is therefore a language documentation project that provides open online access to Mandarin, Hakka, and Southern Min data for non-profit research. As a documentation project, it focuses on collecting and archiving various types of spoken data. It consists of three sub-corpora: the Corpus of Spoken Mandarin, the Corpus of Spoken Hakka, and the Corpus of Spoken Southern Min. All three share a common scheme for collecting spoken data, most of which takes the form of spontaneous face-to-face conversation. The corpus infrastructure is designed to be simple yet user-friendly, so that data can be processed efficiently in the database and users can browse the spoken data directly on the web. We hope that our work will encourage more people to build spoken corpora from different perspectives and for different purposes.
About the journal:
Taiwan Journal of Linguistics is an international journal dedicated to publishing research papers in linguistics and welcomes contributions in all areas of the scientific study of language. Submissions are accepted from all countries throughout the year. The language of publication is English. Regular submissions are not otherwise restricted; however, manuscripts submitted simultaneously to other publications cannot be accepted. TJL adheres to a strict double-blind review standard to minimize biases that might arise from knowledge of the author's gender, culture, or standing within the professional community. Once the editor's initial screening determines that a manuscript is potentially suitable for the journal, all information that might identify the author is removed, and copies are sent to at least two qualified reviewers. Reviewers are selected purely on professional grounds, and TJL keeps their identities strictly confidential. All reviewer feedback, except comments addressed specifically to the editor, is faithfully relayed to the authors to help them improve their work, regardless of whether the paper is accepted, accepted with minor revisions, returned for revision and resubmission, or rejected.