Int. J. Comput. Linguistics Chin. Lang. Process.最新文献_第9页

What Can Near Synonyms Tell Us 近义词能告诉我们什么

Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 2000-02-01 DOI: 10.30019/IJCLCLP.200002.0003

Lian-Cheng Chief, Chu-Ren Huang, Keh-Jiann Chen, Mei-Chih Tsai, Li-Li Chang

引用次数: 33

The Module-Attribute Representation of Verbal Semantics: From Semantic to Argument Structure 语言语义的模块-属性表示:从语义到参数结构

Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 2000-02-01 DOI: 10.30019/IJCLCLP.200002.0002

Chu-Ren Huang, K. Ahrens, Li-Li Chang, Keh-Jiann Chen, Mei-Chun Liu, Mei-Chih Tsai

引用次数: 45

A Model for Word Sense Disambiguation 一种词义消歧模型

Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 1999-08-01 DOI: 10.30019/IJCLCLP.199908.0001

Juan-Zi Li, C. Huang

引用次数: 6

Statistical Analysis of Mandarin Acoustic Units and Automatic Extraction of Phonetically Rich Sentences Based Upon a very Large Chinese Text Corpus 基于大型汉语语料库的汉语语音单元统计分析及语音丰富句子自动提取

Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 1998-08-01 DOI: 10.30019/IJCLCLP.199808.0005

H. Wang

{"title":"Statistical Analysis of Mandarin Acoustic Units and Automatic Extraction of Phonetically Rich Sentences Based Upon a very Large Chinese Text Corpus","authors":"H. Wang","doi":"10.30019/IJCLCLP.199808.0005","DOIUrl":"https://doi.org/10.30019/IJCLCLP.199808.0005","url":null,"abstract":"Automatic speech recognition by computers can provide humans with the most convenient method to communicate with computers. Because the Chinese language is not alphabetic and input of Chinese characters into computers is very difficult, Mandarin speech recognition is very highly desired. Recently, high performance speech recognition systems have begun to emerge from research institutes. However, it is believed that an adequate speech database for training acoustic models and evaluating performance is certainly critical for successful deployment of such systems in realistic operating environments. Thus, designing a set of phonetically rich sentences to be used in efficiently training and evaluating a speech recognition system has become very important. This paper first presents statistical analysis of various Mandarin acoustic units based upon a very large Chinese text corpus collected from daily newspapers and then presents an algorithm to automatically extract phonetically rich sentences from the text corpus to be used in training and evaluating a Mandarin speech recognition system.","PeriodicalId":436300,"journal":{"name":"Int. J. Comput. Linguistics Chin. Lang. Process.","volume":"235 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127876508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 9

White Page Construction from Web Pages for Finding People on the Internet 在互联网上寻找人的网页的白页建设

Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 1998-02-01 DOI: 10.30019/IJCLCLP.199802.0005

Hsin-Hsi Chen, Guo-Wei Bian

{"title":"White Page Construction from Web Pages for Finding People on the Internet","authors":"Hsin-Hsi Chen, Guo-Wei Bian","doi":"10.30019/IJCLCLP.199802.0005","DOIUrl":"https://doi.org/10.30019/IJCLCLP.199802.0005","url":null,"abstract":"This paper proposes a method to extract proper names and their associated information from web pages for Internet/Intranet users automatically. The information extracted from World Wide Web documents includes proper nouns, E-mail addresses and home page URLs. Natural language processing techniques are employed to identify and classify proper nouns, which are usually unknown words. The information (i.e., home pages' URLs or e-mail addresses) for those proper nouns appearing in the anchor parts can be easily extracted using the associated anchor tags. For those proper nouns in the non-anchor pan of a web page, different kinds of clues, such as the spelling method, adjacency principle and HTML tags, are used to relate proper nouns to their corresponding E-mail addresses and/or URLs. Based on the semantics of content and HTML tags, the extracted information is more accurate than the results obtained using traditional search engines. The results can be used to construct white pages for Internet/Intranet users or to build databases for finding people and organizations on the Internet. Such searching services are very useful for human communication and dissemination of information.","PeriodicalId":436300,"journal":{"name":"Int. J. Comput. Linguistics Chin. Lang. Process.","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114466844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 9

Building a Bracketed Corpus Using Φ2 Statistics 使用Φ2 Statistics构建括号语料库

Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 1997-08-01 DOI: 10.30019/IJCLCLP.199708.0001

Yue-Shi Lee, Hsin-Hsi Chen

引用次数: 0

Towards a Representation of Verbal Semantics – An Approach Based on Near-Synonyms 一种基于近义词的语言语义表达方法

Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 1997-08-01 DOI: 10.30019/IJCLCLP.199802.0004

Mei-Chih Tsai, Chu-Ren Huang, Keh-Jiann Chen, K. Ahrens

引用次数: 36

An Unsupervised Iterative Method for Chinese New Lexicon Extraction 中文新词典抽取的无监督迭代方法

Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 1997-08-01 DOI: 10.30019/IJCLCLP.199708.0005

Jing-Shin Chang, Keh-Yih Su

{"title":"An Unsupervised Iterative Method for Chinese New Lexicon Extraction","authors":"Jing-Shin Chang, Keh-Yih Su","doi":"10.30019/IJCLCLP.199708.0005","DOIUrl":"https://doi.org/10.30019/IJCLCLP.199708.0005","url":null,"abstract":"An unsupervised iterative approach for extracting a new lexicon (or unknown words) from a Chinese text corpus is proposed in this paper. Instead of using a non-iterative segmentation-merging-filtering-and-disambiguation approach, the proposed method iteratively integrates the contextual constraints (among word candidates) and a joint character association metric to progressively improve the segmentation results of the input corpus (and thus the new word list.) An augmented dictionary, which includes potential unknown words (in addition to known words), is used to segment the input corpus, unlike traditional approaches which use only known words for segmentation. In the segmentation process, the augmented dictionary is used to impose contextual constraints over known words and potential unknown words within input sentences; an unsupervised Viterbi Training process is then applied to ensure that the selected potential unknown words (and known words) maximize the likelihood of the input corpus. On the other hand, the joint character association metric (which reflects the global character association characteristics across the corpus) is derived by integrating several commonly used word association metrics, such as mutual information and entropy, with a joint Gaussian mixture density function; such integration allows the filter to use multiple features simultaneously to evaluate character association, unlike traditional filters which apply multiple features independently. The proposed method then allows the contextual constraints and the joint character association metric to enhance each other; this is achieved by iteratively applying the joint association metric to truncate unlikely unknown words in the augmented dictionary and using the segmentation result to improve the estimation of the joint association metric. The refined augmented dictionary and improved estimation are then used in the next iteration to acquire better segmentation and carry out more reliable filtering. Experiments show that both the precision and recall rates are improved almost monotonically, in contrast to non-iterative segmentation-merging-filtering-and-disambiguation approaches, which often sacrifice precision for recall or vice versa. With a corpus of 311,591 sentences, the performance is 76% (bigram), 54% (trigram), and 70% (quadragram) in F-measure, which is significantly better than using the non-iterative approach with F-measures of 74% (bigram), 46% (trigram), and 58% (quadragram).","PeriodicalId":436300,"journal":{"name":"Int. J. Comput. Linguistics Chin. Lang. Process.","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117249587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 61

Computational Tools and Resources for Linguistic Studies 语言学研究的计算工具和资源

Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 1997-02-01 DOI: 10.30019/IJCLCLP.199702.0001

Y. Hsu, Jing-Shin Chang, Keh-Yih Su

{"title":"Computational Tools and Resources for Linguistic Studies","authors":"Y. Hsu, Jing-Shin Chang, Keh-Yih Su","doi":"10.30019/IJCLCLP.199702.0001","DOIUrl":"https://doi.org/10.30019/IJCLCLP.199702.0001","url":null,"abstract":"This paper presents several useful computational tools and available resources to facilitate linguistic studies. For each computational tool, we demonstrate why it is useful and how can it be used for research. In addition, linguistic examples are given for illustration. First, a very useful searching engine, Key Word in Context (KWIC), is introduced. This tool can automatically extract linguistically significant patterns from large corpora and help linguists discover syntagmatic generalizations. Second, Dynamic Clustering and Hierarchical Clustering are introduced for identifying natural clusters of words or phrases in distribution. Third, statistical measures which could be used to measure the degree of cohesion and correlation among linguistic units are presented. These tools can help linguists identify the boundaries of lexical units. Fourth, alignment tools for aligning parallel texts at the word, sentence and structure levels are presented for linguists who do comparative studies of different languages. Fifth, we introduce Sequential Forward Selection (SFS) and Classification and Regression Tree (CART) for automatic rule ordering. Finally, some available electronic Chinese resources are described to provide reference purposes for those who are interested.","PeriodicalId":436300,"journal":{"name":"Int. J. Comput. Linguistics Chin. Lang. Process.","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114555418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A Synchronous Chinese Language Corpus from Different Speech Communities: Construction and Applications 不同语音群体同步汉语语料库的构建与应用

Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 1997-02-01 DOI: 10.30019/IJCLCLP.199702.0004

B. K. T'sou, Hing-lung Lin, Godfrey Liu, Terence Y. W. Chan, Jerome Hu, Ching-hai Chew, John K. P. Tse

{"title":"A Synchronous Chinese Language Corpus from Different Speech Communities: Construction and Applications","authors":"B. K. T'sou, Hing-lung Lin, Godfrey Liu, Terence Y. W. Chan, Jerome Hu, Ching-hai Chew, John K. P. Tse","doi":"10.30019/IJCLCLP.199702.0004","DOIUrl":"https://doi.org/10.30019/IJCLCLP.199702.0004","url":null,"abstract":"Similar to other languages such as English, Spanish and Arabic, Chinese is used by a large number of speakers in distinct speech communities which, despite sharing the unity of language, vary in interesting ways, and a systematic study of such linguistic variation is invaluable to appreciate the diversity and richness of the underlying cultures. This paper describes Project LIVAC (Linguistic Variation in Chinese Communities), which focuses on the development of a Chinese corpus, based on data taken concurrently at regular intervals from multiple Chinese speech communities. The resulting database and computerized concordance from the approximately 20 million word corpus with uniform time reference points extending across two years enable linguists and social scientists to undertake meaningful qualitative and quantitative comparative analysis of the development of linguistic and cultural variation. To facilitate these studies, a framework for integrating the corpus with specific corpus analysis applications is proposed. Based on this framework, a prototype retrieval system, which supports longitudinal studies on word and concept distribution, as well as lexical and other linguistic variation, is designed and implemented.","PeriodicalId":436300,"journal":{"name":"Int. J. Comput. Linguistics Chin. Lang. Process.","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117023005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 13