Int. J. Comput. Linguistics Chin. Lang. Process.: latest articles

What Can Near Synonyms Tell Us
Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 2000-02-01 DOI: 10.30019/IJCLCLP.200002.0003
Lian-Cheng Chief, Chu-Ren Huang, Keh-Jiann Chen, Mei-Chih Tsai, Li-Li Chang
{"title":"What Can Near Synonyms Tell Us","authors":"Lian-Cheng Chief, Chu-Ren Huang, Keh-Jiann Chen, Mei-Chih Tsai, Li-Li Chang","doi":"10.30019/IJCLCLP.200002.0003","DOIUrl":"https://doi.org/10.30019/IJCLCLP.200002.0003","url":null,"abstract":"This study examines a near synonym pair fangbian and bianli, 'to be convenient,' and extracts the contrasts that dictate their semantic and associated syntactic behaviors. Corpus data reveal important but opaque distributional differences between these synonyms that are not readily apparent based on native speaker intuition. In particular, we argue that this synonym pair can be accounted for with a lexical conceptual profile. This study demonstrates how corpus data can serve as a useful tool for probing the interaction between syntax and semantics.","PeriodicalId":436300,"journal":{"name":"Int. J. Comput. Linguistics Chin. Lang. Process.","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121784399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 33
The Module-Attribute Representation of Verbal Semantics: From Semantic to Argument Structure
Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 2000-02-01 DOI: 10.30019/IJCLCLP.200002.0002
Chu-Ren Huang, K. Ahrens, Li-Li Chang, Keh-Jiann Chen, Mei-Chun Liu, Mei-Chih Tsai
{"title":"The Module-Attribute Representation of Verbal Semantics: From Semantic to Argument Structure","authors":"Chu-Ren Huang, K. Ahrens, Li-Li Chang, Keh-Jiann Chen, Mei-Chun Liu, Mei-Chih Tsai","doi":"10.30019/IJCLCLP.200002.0002","DOIUrl":"https://doi.org/10.30019/IJCLCLP.200002.0002","url":null,"abstract":"In this paper, we set forth a theory of lexical knowledge. We propose two types of modules: event structure modules and role modules, as well as two sets of attributes: event-internal attributes and role-internal attributes, which are linked to the event structure module and role module, respectively. These module-attribute semantic representations have associated grammatical consequences. Our data is drawn from a comprehensive corpus-based study of Mandarin Chinese verbal semantics, and four particular case studies are presented.","PeriodicalId":436300,"journal":{"name":"Int. J. Comput. Linguistics Chin. Lang. Process.","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129884684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 45
A Model for Word Sense Disambiguation
Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 1999-08-01 DOI: 10.30019/IJCLCLP.199908.0001
Juan-Zi Li, C. Huang
{"title":"A Model for Word Sense Disambiguation","authors":"Juan-Zi Li, C. Huang","doi":"10.30019/IJCLCLP.199908.0001","DOIUrl":"https://doi.org/10.30019/IJCLCLP.199908.0001","url":null,"abstract":"Word sense disambiguation is one of the most difficult problems in natural language processing. This paper puts forward a model for mapping a structural semantic space from a thesaurus into a multi-dimensional, real-valued vector space and gives a word sense disambiguation method based on this mapping. The model, which uses an unsupervised learning method to acquire the disambiguation knowledge, not only saves extensive manual work, but also realizes the sense tagging of a large number of content words. Firstly, a Chinese thesaurus Cilin and a very large-scale corpus are used to construct the structure of the semantic space. Then, a dynamic disambiguation model is developed to disambiguate an ambiguous word according to the vectors of monosemous words in each of its possible categories. In order to resolve the problem of data sparseness, a method is proposed to make the model more robust. Testing results show that the model has relatively good performance and can also be used for other languages.","PeriodicalId":436300,"journal":{"name":"Int. J. Comput. Linguistics Chin. Lang. Process.","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129172151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Statistical Analysis of Mandarin Acoustic Units and Automatic Extraction of Phonetically Rich Sentences Based Upon a very Large Chinese Text Corpus
Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 1998-08-01 DOI: 10.30019/IJCLCLP.199808.0005
H. Wang
{"title":"Statistical Analysis of Mandarin Acoustic Units and Automatic Extraction of Phonetically Rich Sentences Based Upon a very Large Chinese Text Corpus","authors":"H. Wang","doi":"10.30019/IJCLCLP.199808.0005","DOIUrl":"https://doi.org/10.30019/IJCLCLP.199808.0005","url":null,"abstract":"Automatic speech recognition by computers can provide humans with the most convenient method to communicate with computers. Because the Chinese language is not alphabetic and input of Chinese characters into computers is very difficult, Mandarin speech recognition is very highly desired. Recently, high performance speech recognition systems have begun to emerge from research institutes. However, it is believed that an adequate speech database for training acoustic models and evaluating performance is certainly critical for successful deployment of such systems in realistic operating environments. Thus, designing a set of phonetically rich sentences to be used in efficiently training and evaluating a speech recognition system has become very important. This paper first presents statistical analysis of various Mandarin acoustic units based upon a very large Chinese text corpus collected from daily newspapers and then presents an algorithm to automatically extract phonetically rich sentences from the text corpus to be used in training and evaluating a Mandarin speech recognition system.","PeriodicalId":436300,"journal":{"name":"Int. J. Comput. Linguistics Chin. Lang. Process.","volume":"235 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127876508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
White Page Construction from Web Pages for Finding People on the Internet
Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 1998-02-01 DOI: 10.30019/IJCLCLP.199802.0005
Hsin-Hsi Chen, Guo-Wei Bian
{"title":"White Page Construction from Web Pages for Finding People on the Internet","authors":"Hsin-Hsi Chen, Guo-Wei Bian","doi":"10.30019/IJCLCLP.199802.0005","DOIUrl":"https://doi.org/10.30019/IJCLCLP.199802.0005","url":null,"abstract":"This paper proposes a method to extract proper names and their associated information from web pages for Internet/Intranet users automatically. The information extracted from World Wide Web documents includes proper nouns, E-mail addresses and home page URLs. Natural language processing techniques are employed to identify and classify proper nouns, which are usually unknown words. The information (i.e., home pages' URLs or e-mail addresses) for those proper nouns appearing in the anchor parts can be easily extracted using the associated anchor tags. For those proper nouns in the non-anchor part of a web page, different kinds of clues, such as the spelling method, adjacency principle and HTML tags, are used to relate proper nouns to their corresponding E-mail addresses and/or URLs. Based on the semantics of content and HTML tags, the extracted information is more accurate than the results obtained using traditional search engines. The results can be used to construct white pages for Internet/Intranet users or to build databases for finding people and organizations on the Internet. Such searching services are very useful for human communication and dissemination of information.","PeriodicalId":436300,"journal":{"name":"Int. J. Comput. Linguistics Chin. Lang. Process.","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114466844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Building a Bracketed Corpus Using Φ2 Statistics
Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 1997-08-01 DOI: 10.30019/IJCLCLP.199708.0001
Yue-Shi Lee, Hsin-Hsi Chen
{"title":"Building a Bracketed Corpus Using Φ2 Statistics","authors":"Yue-Shi Lee, Hsin-Hsi Chen","doi":"10.30019/IJCLCLP.199708.0001","DOIUrl":"https://doi.org/10.30019/IJCLCLP.199708.0001","url":null,"abstract":"Research based on treebanks is ongoing for many natural language applications. However, the work involved in building a large-scale treebank is laborious and time-consuming. Thus, speeding up the process of building a treebank has become an important task. This paper proposes two versions of probabilistic chunkers to aid the development of a bracketed corpus. The basic version partitions part-of-speech sequences into chunk sequences, which form a partially bracketed corpus. Applying the chunking action recursively, the recursive version generates a fully bracketed corpus. Rather than using a treebank as a training corpus, a corpus, which is tagged with part-of-speech information only, is used. The experimental results show that the probabilistic chunker has a correct rate of more than 94% in producing a partially bracketed corpus and also gives very encouraging results in generating a fully bracketed corpus. These two versions of chunkers are simple but effective and can also be applied to many natural language applications.","PeriodicalId":436300,"journal":{"name":"Int. J. Comput. Linguistics Chin. Lang. Process.","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129055422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Towards a Representation of Verbal Semantics – An Approach Based on Near-Synonyms
Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 1997-08-01 DOI: 10.30019/IJCLCLP.199802.0004
Mei-Chih Tsai, Chu-Ren Huang, Keh-Jiann Chen, K. Ahrens
{"title":"Towards a Representation of Verbal Semantics – An Approach Based on Near-Synonyms","authors":"Mei-Chih Tsai, Chu-Ren Huang, Keh-Jiann Chen, K. Ahrens","doi":"10.30019/IJCLCLP.199802.0004","DOIUrl":"https://doi.org/10.30019/IJCLCLP.199802.0004","url":null,"abstract":"In this paper we propose using the distributional differences in the syntactic patterns of near-synonyms to deduce the relevant components of verb meaning. Our method involves determining the distributional differences in syntactic patterns, deducing the semantic features from the syntactic phenomena, and testing the semantic features in new syntactic frames. We determine the distributional differences in syntactic patterns through the following five steps: First, we search for all instances of the verb in the corpus. Second, we classify each of these instances into its type of syntactic function. Third, we classify each of these instances into its argument structure type. Fourth, we determine the aspectual type that is associated with each verb. Lastly, we determine each verb's sentential type. Once the distributional differences have been determined, then the relevant semantic features are postulated. Our goal is to tease out the lexical semantic features as the explanation, and as the motivation of the syntactic contrasts.","PeriodicalId":436300,"journal":{"name":"Int. J. Comput. Linguistics Chin. Lang. Process.","volume":"49 17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116998070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 36
An Unsupervised Iterative Method for Chinese New Lexicon Extraction
Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 1997-08-01 DOI: 10.30019/IJCLCLP.199708.0005
Jing-Shin Chang, Keh-Yih Su
{"title":"An Unsupervised Iterative Method for Chinese New Lexicon Extraction","authors":"Jing-Shin Chang, Keh-Yih Su","doi":"10.30019/IJCLCLP.199708.0005","DOIUrl":"https://doi.org/10.30019/IJCLCLP.199708.0005","url":null,"abstract":"An unsupervised iterative approach for extracting a new lexicon (or unknown words) from a Chinese text corpus is proposed in this paper. Instead of using a non-iterative segmentation-merging-filtering-and-disambiguation approach, the proposed method iteratively integrates the contextual constraints (among word candidates) and a joint character association metric to progressively improve the segmentation results of the input corpus (and thus the new word list.) An augmented dictionary, which includes potential unknown words (in addition to known words), is used to segment the input corpus, unlike traditional approaches which use only known words for segmentation. In the segmentation process, the augmented dictionary is used to impose contextual constraints over known words and potential unknown words within input sentences; an unsupervised Viterbi Training process is then applied to ensure that the selected potential unknown words (and known words) maximize the likelihood of the input corpus. On the other hand, the joint character association metric (which reflects the global character association characteristics across the corpus) is derived by integrating several commonly used word association metrics, such as mutual information and entropy, with a joint Gaussian mixture density function; such integration allows the filter to use multiple features simultaneously to evaluate character association, unlike traditional filters which apply multiple features independently. The proposed method then allows the contextual constraints and the joint character association metric to enhance each other; this is achieved by iteratively applying the joint association metric to truncate unlikely unknown words in the augmented dictionary and using the segmentation result to improve the estimation of the joint association metric. The refined augmented dictionary and improved estimation are then used in the next iteration to acquire better segmentation and carry out more reliable filtering. Experiments show that both the precision and recall rates are improved almost monotonically, in contrast to non-iterative segmentation-merging-filtering-and-disambiguation approaches, which often sacrifice precision for recall or vice versa. With a corpus of 311,591 sentences, the performance is 76% (bigram), 54% (trigram), and 70% (quadragram) in F-measure, which is significantly better than using the non-iterative approach with F-measures of 74% (bigram), 46% (trigram), and 58% (quadragram).","PeriodicalId":436300,"journal":{"name":"Int. J. Comput. Linguistics Chin. Lang. Process.","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117249587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 61
Computational Tools and Resources for Linguistic Studies
Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 1997-02-01 DOI: 10.30019/IJCLCLP.199702.0001
Y. Hsu, Jing-Shin Chang, Keh-Yih Su
{"title":"Computational Tools and Resources for Linguistic Studies","authors":"Y. Hsu, Jing-Shin Chang, Keh-Yih Su","doi":"10.30019/IJCLCLP.199702.0001","DOIUrl":"https://doi.org/10.30019/IJCLCLP.199702.0001","url":null,"abstract":"This paper presents several useful computational tools and available resources to facilitate linguistic studies. For each computational tool, we demonstrate why it is useful and how can it be used for research. In addition, linguistic examples are given for illustration. First, a very useful searching engine, Key Word in Context (KWIC), is introduced. This tool can automatically extract linguistically significant patterns from large corpora and help linguists discover syntagmatic generalizations. Second, Dynamic Clustering and Hierarchical Clustering are introduced for identifying natural clusters of words or phrases in distribution. Third, statistical measures which could be used to measure the degree of cohesion and correlation among linguistic units are presented. These tools can help linguists identify the boundaries of lexical units. Fourth, alignment tools for aligning parallel texts at the word, sentence and structure levels are presented for linguists who do comparative studies of different languages. Fifth, we introduce Sequential Forward Selection (SFS) and Classification and Regression Tree (CART) for automatic rule ordering. Finally, some available electronic Chinese resources are described to provide reference purposes for those who are interested.","PeriodicalId":436300,"journal":{"name":"Int. J. Comput. Linguistics Chin. Lang. Process.","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114555418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Synchronous Chinese Language Corpus from Different Speech Communities: Construction and Applications
Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 1997-02-01 DOI: 10.30019/IJCLCLP.199702.0004
B. K. T'sou, Hing-lung Lin, Godfrey Liu, Terence Y. W. Chan, Jerome Hu, Ching-hai Chew, John K. P. Tse
{"title":"A Synchronous Chinese Language Corpus from Different Speech Communities: Construction and Applications","authors":"B. K. T'sou, Hing-lung Lin, Godfrey Liu, Terence Y. W. Chan, Jerome Hu, Ching-hai Chew, John K. P. Tse","doi":"10.30019/IJCLCLP.199702.0004","DOIUrl":"https://doi.org/10.30019/IJCLCLP.199702.0004","url":null,"abstract":"Similar to other languages such as English, Spanish and Arabic, Chinese is used by a large number of speakers in distinct speech communities which, despite sharing the unity of language, vary in interesting ways, and a systematic study of such linguistic variation is invaluable to appreciate the diversity and richness of the underlying cultures. This paper describes Project LIVAC (Linguistic Variation in Chinese Communities), which focuses on the development of a Chinese corpus, based on data taken concurrently at regular intervals from multiple Chinese speech communities. The resulting database and computerized concordance from the approximately 20 million word corpus with uniform time reference points extending across two years enable linguists and social scientists to undertake meaningful qualitative and quantitative comparative analysis of the development of linguistic and cultural variation. To facilitate these studies, a framework for integrating the corpus with specific corpus analysis applications is proposed. Based on this framework, a prototype retrieval system, which supports longitudinal studies on word and concept distribution, as well as lexical and other linguistic variation, is designed and implemented.","PeriodicalId":436300,"journal":{"name":"Int. J. Comput. Linguistics Chin. Lang. Process.","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117023005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13