Int. J. Comput. Linguistics Chin. Lang. Process.: Latest Articles

Measuring Relationship among Dialects: DOC and Related Resources
Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 1997-02-01 DOI: 10.30019/IJCLCLP.199702.0002
Chin-Chuan Cheng
Abstract: This paper is a synthesis of past studies on the measurement of dialect relationships. The phonological data of 17 Chinese dialects that were computerized in the late 1960s have been utilized for measurements of dialect distance. In addition, a file of over 6,400 lexical variants in 18 dialects was used to quantify dialect affinity. The paper first explains the nature, the organization, and the coding of these files. A series of steps illustrates how the phonological file was processed to derive the information needed for calculating correlation coefficients. The coefficients are treated as indices of dialect affinity. The dialects are then grouped by the average linking method of cluster analysis of the coefficients. The appropriateness of the correlation method for the data is then discussed. Recent work on the calculation of dialect mutual intelligibility is presented to indicate the future direction of research.
Citations: 33
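The procedure summarized in this abstract (correlation coefficients as affinity indices, followed by average-linkage clustering) can be sketched roughly as below. This is only a minimal illustration under stated assumptions: the dialect labels `A` to `D` and their feature counts are invented toy values, not data from the paper, and the function names `pearson` and `average_linkage` are hypothetical helpers rather than the author's implementation.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_linkage(names, sim):
    """Agglomerative clustering on similarities.

    sim maps frozenset({a, b}) to the similarity of items a and b.
    At each step the two clusters with the highest average pairwise
    similarity are merged. Returns the merges in order.
    """
    clusters = [(name,) for name in names]
    merges = []

    def avg_sim(c1, c2):
        total = sum(sim[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda p: avg_sim(*p))
        merges.append((c1, c2, avg_sim(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Toy feature counts per dialect (invented for illustration only).
profiles = {
    "A": [5, 3, 0, 1],
    "B": [4, 3, 1, 1],
    "C": [0, 1, 5, 4],
    "D": [1, 0, 4, 5],
}

# Correlation coefficient for every pair of dialects, used as an affinity index.
sim = {
    frozenset((a, b)): pearson(profiles[a], profiles[b])
    for a, b in combinations(profiles, 2)
}

merges = average_linkage(list(profiles), sim)
for left, right, score in merges:
    print(left, right, round(score, 3))
```

With these toy values, `A` and `B` merge first and `C` and `D` second, since each pair has similar profiles; the last merge joins the two groups at a negative average correlation.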
MAT - A Project to Collect Mandarin Speech Data Through Telephone Networks in Taiwan
Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 1997-02-01 DOI: 10.30019/IJCLCLP.199702.0003
Hsiao-Chuan Wang
Abstract: A cooperative project called "Polyphone" was initiated by the Coordinating Committee on Speech Databases and Speech I/O Systems Assessment (COCOSDA) in 1992. Accordingly, a project to collect Mandarin speech data across Taiwan (MAT) was conducted by a group of researchers from several universities and research organizations in Taiwan. The purpose was to generate a speech corpus for the development of Mandarin-based speech technology and products. The speech data were collected at eight recording stations through telephone networks. The speakers were chosen to reflect the population of Taiwan in terms of gender, dialect, educational level, and residence. A preliminary Mandarin speech database of 800 speakers has been produced. The final goal is to generate a speech database of at least 5,000 speakers.
Citations: 48
A Model for Robust Chinese Parser
Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 1996-08-01 DOI: 10.30019/IJCLCLP.199608.0006
Keh-Jiann Chen
Abstract: The Chinese language has many special characteristics which are substantially different from western languages, causing conventional methods of language processing to fail on Chinese. For example, Chinese sentences are composed of strings of characters without spaces marking word boundaries. Therefore, word segmentation and unknown word identification techniques must be used in order to identify words in Chinese. In addition, Chinese has very few inflectional or grammatical markers, making purely syntactic approaches to parsing almost impossible. Hence, a unified approach which involves both syntactic and semantic information must be used. A lexical feature-based grammar formalism, called Information-based Case Grammar, is therefore adopted for the parsing model proposed here. This grammar formalism stipulates that a lexical entry for a word contains both semantic and syntactic feature structures. By relaxing the constraints on lexical feature structures, even ill-formed input can be accepted, broadening the coverage of the grammar. A model of a priority-controlled chart parser is proposed which, in conjunction with a mechanism of dynamic grammar extension, addresses the problems of: (1) syntactic ambiguities, (2) under-specification and limited coverage of grammars, and (3) ill-formed sentences. The model does this without causing inefficient parsing of sentences that do not require relaxation of constraints or dynamic extension of the grammar.
Citations: 16
An Overview of Corpus-Based Statistics-Oriented (CBSO) Techniques for Natural Language Processing
Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 1996-08-01 DOI: 10.30019/IJCLCLP.199608.0004
Keh-Yih Su, Tung-Hui Chiang, Jing-Shin Chang
Abstract: A Corpus-Based Statistics-Oriented (CBSO) methodology, which is an attempt to avoid the drawbacks of traditional rule-based approaches and purely statistical approaches, is introduced in this paper. Rule-based approaches, with rules induced by human experts, had been the dominant paradigm in the natural language processing community. Such approaches, however, suffer from serious difficulties in knowledge acquisition in terms of cost and consistency. Therefore, it is very difficult for such systems to be scaled up. Statistical methods, with the capability of automatically acquiring knowledge from corpora, are becoming more and more popular, in part to amend the shortcomings of rule-based approaches. However, most simple statistical models, which adopt almost nothing from existing linguistic knowledge, often result in a large parameter space and thus require an unaffordably large training corpus even for well-justified linguistic phenomena. The CBSO approach is a compromise between the two extremes of the spectrum for knowledge acquisition. It emphasizes the use of well-justified linguistic knowledge in developing the underlying language model and the application of statistical optimization techniques on top of high-level constructs, such as annotated syntax trees, rather than on surface strings, so that only a training corpus of reasonable size is needed and long-distance dependencies between constituents can be handled. In this paper, corpus-based statistics-oriented techniques are reviewed. General techniques applicable to CBSO approaches are introduced. In particular, we address the following important issues: (1) general tasks in developing an NLP system; (2) why CBSO is the preferred choice among different strategies; (3) how to achieve good performance systematically using a CBSO approach; and (4) frequently used CBSO techniques. Several examples are also reviewed.
Citations: 20
A Hybrid Approach to Machine Translation System Design
Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 1996-08-01 DOI: 10.30019/IJCLCLP.199608.0005
Kuang-hua Chen, Hsin-Hsi Chen
Abstract: It is difficult for pure statistics-based machine translation systems to process long sentences. In addition, domain dependence is a key issue under such a framework. Pure rule-based machine translation systems incur high human costs in formulating rules and introduce inconsistencies as the number of rules increases. Integration of these two approaches reduces the difficulties associated with both. In this paper, an integrated model for machine translation systems is proposed. A partial parsing method is adopted, and the translation process is performed chunk by chunk. In the synthesis module, the word order is locally rearranged within chunks via the Markov model. Since a chunk is much shorter than a sentence, the disadvantage of the Markov model in dealing with long-distance phenomena is greatly reduced. Structural transfer is carried out using a set of rules; in contrast, lexical transfer is resolved using bilingual constraints. Qualitative and quantitative knowledge are employed in an interleaved and cooperative manner, so that the advantages of both approaches can be retained.
Citations: 9
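The idea of rearranging word order locally within a chunk via a Markov model can be illustrated with the rough sketch below. It assumes a first-order (bigram) model and an exhaustive search over permutations, which is affordable only because chunks are short; the bigram table, the `floor` penalty, and the function name `best_order` are hypothetical and are not taken from the paper.

```python
from itertools import permutations
from math import log


def best_order(words, bigram_logprob, floor=-10.0):
    """Return the permutation of `words` with the highest bigram score.

    bigram_logprob maps (previous_word, word) to a log-probability;
    unseen pairs receive the `floor` penalty. "<s>" and "</s>" mark
    the chunk boundaries.
    """

    def score(seq):
        padded = ("<s>",) + seq + ("</s>",)
        return sum(
            bigram_logprob.get(pair, floor)
            for pair in zip(padded, padded[1:])
        )

    return max(permutations(words), key=score)


# Toy bigram log-probabilities (invented for illustration only).
toy_bigrams = {
    ("<s>", "the"): log(0.5),
    ("the", "red"): log(0.4),
    ("red", "car"): log(0.6),
    ("car", "</s>"): log(0.7),
}

ordered = best_order(["car", "the", "red"], toy_bigrams)
print(" ".join(ordered))
```

Because the search enumerates every ordering, its cost grows factorially with chunk length, which is one way to see why restricting the Markov model to short chunks keeps the approach practical.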
A Survey on Automatic Speech Recognition with an Illustrative Example on Continuous Speech Recognition of Mandarin
Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 1996-08-01 DOI: 10.30019/IJCLCLP.199608.0001
Chin-Hui Lee, B. Juang
Abstract: For the past two decades, research in speech recognition has been intensively carried out worldwide, spurred on by advances in signal processing, algorithms, architectures, and hardware. Speech recognition systems have been developed for a wide variety of applications, ranging from small-vocabulary keyword recognition over dial-up telephone lines, to medium-size-vocabulary voice-interactive command and control systems on personal computers, to large-vocabulary speech dictation, spontaneous speech understanding, and limited-domain speech translation. In this paper we review some of the key advances in several areas of automatic speech recognition. We also illustrate, by examples, how these key advances can be used for continuous speech recognition of Mandarin. Finally, we elaborate on the requirements for designing successful real-world applications and address technical challenges that need to be overcome in order to reach the ultimate goal of providing an easy-to-use, natural, and flexible voice interface between people and machines.
Citations: 16
Important Issues on Chinese Information Retrieval
Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 1996-08-01 DOI: 10.30019/IJCLCLP.199608.0007
Lee-Feng Chien, H. Pu
Abstract: In this paper, we emphasize the significance of Chinese information retrieval in the age of the Internet and raise several important research issues which are fundamental and require further investigation. At the same time, we point out some problems and requirements which have often been neglected in designing general Chinese IR systems. Furthermore, experiences obtained from the design of the Csmart system are also described.
Citations: 24