NEWS@IJCNLP: Latest Papers

Czech Named Entity Corpus and SVM-based Recognizer
NEWS@IJCNLP Pub Date : 2009-08-07 DOI: 10.3115/1699705.1699748
Jana Kravalova, Z. Žabokrtský
Abstract: This paper deals with recognition of named entities in Czech texts. We present a recently released corpus of Czech sentences with manually annotated named entities, in which a rich two-level classification scheme was used. There are around 6000 sentences in the corpus with roughly 33000 marked named entity instances. We use the data for training and evaluating a named entity recognizer based on the Support Vector Machine classification technique. The presented recognizer outperforms the results previously reported for NE recognition in Czech.
Citations: 44
Named Entity Transcription with Pair n-Gram Models
NEWS@IJCNLP Pub Date : 2009-08-07 DOI: 10.3115/1699705.1699713
Martin Jansche, R. Sproat
Abstract: We submitted results for each of the eight shared tasks. Except for Japanese name kanji restoration, which uses a noisy channel model, our Standard Run submissions were produced by generative long-range pair n-gram models, which we mostly augmented with publicly available data (either from LDC datasets or mined from Wikipedia) for the Non-Standard Runs.
Citations: 14
A Hybrid Model for Urdu Hindi Transliteration
NEWS@IJCNLP Pub Date : 2009-08-07 DOI: 10.3115/1699705.1699746
M. G. A. Malik, L. Besacier, C. Boitet, P. Bhattacharyya
Abstract: We report in this paper a novel hybrid approach for Urdu to Hindi transliteration that combines finite-state machine (FSM) based techniques with a statistical word language model based approach. The output from the FSM is filtered with the word language model to produce the correct Hindi output. The main problem handled is the omission of diacritical marks from the input Urdu text. Our system produces the correct Hindi output even when the crucial information in the form of diacritical marks is absent. The approach improves the accuracy of the transducer-only approach from 50.7% to 79.1%. The results reported show that performance can be improved using a word language model to disambiguate the output produced by the transducer-only approach, especially when diacritical marks are not present in the Urdu input.
Citations: 41
Language Independent Transliteration System Using Phrase-based SMT Approach on Substrings
NEWS@IJCNLP Pub Date : 2009-08-07 DOI: 10.3115/1699705.1699734
Sara Noeman
Abstract: Every day the newswire introduces events from all over the world, highlighting new names of persons, locations and organizations with different origins. These names appear as Out of Vocabulary (OOV) words for machine translation, cross-lingual information retrieval, and many other NLP applications. One way to deal with OOV words is to transliterate the unknown words, that is, to render them in the orthography of the second language. We introduce a statistical approach for transliteration using only the bilingual resources released in the shared task and without any previous knowledge of the target languages. Mapping the transliteration problem to the machine translation problem, we make use of the phrase-based SMT approach and apply it to substrings of names. In the English to Russian task, we report ACC (accuracy in top-1) of 0.545, Mean F-score of 0.917, and MRR (Mean Reciprocal Rank) of 0.596. Due to time constraints, we ran a single experiment in the English to Chinese task, reporting ACC, Mean F-score, and MRR of 0.411, 0.737, and 0.464 respectively. Finally, it is worth mentioning that the system is language independent, since the author does not know either of the languages used in the experiments.
Citations: 15
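The entry above reports ACC (top-1 accuracy) and MRR (Mean Reciprocal Rank). As a rough illustration of how these two figures are commonly computed from ranked candidate lists, here is a minimal sketch. The function names, the dictionary layout, and the toy data are assumptions made for this example; they are not taken from the paper or from the shared-task scoring script, and the Mean F-score is not covered.

```python
from typing import Dict, List, Set


def top1_accuracy(candidates: Dict[str, List[str]],
                  references: Dict[str, Set[str]]) -> float:
    """Fraction of source names whose first-ranked candidate is an accepted reference."""
    hits = sum(
        1
        for name, ranked in candidates.items()
        if ranked and ranked[0] in references[name]
    )
    return hits / len(candidates)


def mean_reciprocal_rank(candidates: Dict[str, List[str]],
                         references: Dict[str, Set[str]]) -> float:
    """Average of 1/rank of the first correct candidate (0 when none is correct)."""
    total = 0.0
    for name, ranked in candidates.items():
        for rank, candidate in enumerate(ranked, start=1):
            if candidate in references[name]:
                total += 1.0 / rank
                break
    return total / len(candidates)


# Hypothetical toy data: source name -> ranked system outputs / accepted references.
toy_candidates = {"a": ["x", "y"], "b": ["p", "q"], "c": ["m", "n"]}
toy_references = {"a": {"x"}, "b": {"q"}, "c": {"z"}}

acc = top1_accuracy(toy_candidates, toy_references)         # only "a" is right at rank 1
mrr = mean_reciprocal_rank(toy_candidates, toy_references)  # (1 + 1/2 + 0) / 3
```

MRR is never lower than ACC on the same output, because every top-1 hit contributes a full 1 to the MRR sum while lower-ranked hits add partial credit. That matches the pattern in the reported figures (0.596 against 0.545, and 0.464 against 0.411).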
Analysis and Robust Extraction of Changing Named Entities
NEWS@IJCNLP Pub Date : 2009-08-07 DOI: 10.3115/1699705.1699743
Masatoshi Tsuchiya, Shoko Endo, S. Nakagawa
Abstract: This paper focuses on the change of named entities over time and its influence on the performance of the named entity tagger. First, we analyze Japanese named entities which appear in Mainichi Newspaper articles published in 1995, 1996, 1997, 1998 and 2005. This analysis reveals that the number of named entity types and the number of named entity tokens are almost steady over time and that 70 to 80% of named entity types in a certain year occur in the articles published either in its succeeding year or in its preceding year. These facts imply that 20 to 30% of named entity types are replaced with new ones every year. The experiment on these texts shows that our proposed semi-supervised method, which combines a small annotated corpus and a large unannotated corpus for training, works robustly, whereas the traditional supervised method is fragile against changes in the named entity distribution.
Citations: 5
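The 70 to 80% figure in the entry above is a set-overlap statistic: the share of one year's named entity types that also appear in the preceding or succeeding year. The sketch below shows one way such a ratio could be computed. The function name, the data layout, and the toy entity sets are hypothetical and are not taken from the paper.

```python
from typing import Dict, Set


def adjacent_year_overlap(types_by_year: Dict[int, Set[str]], year: int) -> float:
    """Share of entity types in `year` that also occur in year - 1 or year + 1."""
    current = types_by_year[year]
    neighbours = types_by_year.get(year - 1, set()) | types_by_year.get(year + 1, set())
    return len(current & neighbours) / len(current)


# Hypothetical toy data: year -> set of distinct named entity types seen that year.
toy_types = {
    1995: {"A", "B", "C"},
    1996: {"A", "B", "D", "E"},
    1997: {"B", "E", "F"},
}

overlap_1996 = adjacent_year_overlap(toy_types, 1996)  # A, B and E recur; D does not
turnover_1996 = 1.0 - overlap_1996                     # share of types seen only in 1996
```

Under this reading, the complement of the overlap is the share of types that appear in neither neighbouring year, which is how an overlap of 70 to 80% leads to a turnover of 20 to 30%.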
Transliteration by Bidirectional Statistical Machine Translation
NEWS@IJCNLP Pub Date : 2009-08-07 DOI: 10.3115/1699705.1699719
A. Finch, E. Sumita
Abstract: The system presented in this paper uses phrase-based statistical machine translation (SMT) techniques to directly transliterate between all language pairs in this shared task. The technique makes no language-specific assumptions and uses no dictionaries or explicit phonetic information. The translation process transforms sequences of tokens in the source language directly into sequences of tokens in the target. All language pairs were transliterated by applying this technique in a single unified manner. The machine translation system used was a system comprised of two phrase-based SMT decoders. The first generated the target from the first token to the last. The second generated the target from last to first. Our results show that if only one of these decoding strategies is to be chosen, the optimal choice depends on the languages involved, and that in general a combination of the two approaches is able to outperform either approach.
Citations: 16
Learning Multi Character Alignment Rules and Classification of Training Data for Transliteration
NEWS@IJCNLP Pub Date : 2009-08-07 DOI: 10.3115/1699705.1699721
Dipankar Bose, S. Sarkar
Abstract: We address the issues of transliteration between Indian languages and English, especially for named entities. We use an EM algorithm to learn the alignment between the languages. We find that there are many ambiguities in the rules mapping the characters in the source language to the corresponding characters in the target language. Some of these ambiguities can be handled by capturing context through learning multi-character alignments and using character n-gram models. We observed that a word in the source script may actually have originated from different languages. Instead of learning one model for the language pair, we propose that one may use multiple models and a classifier to decide which model to use. A contribution of this work is that the models and classifiers are learned in a completely unsupervised manner. Using our system we were able to get quite accurate transliteration models.
Citations: 3
Name Transliteration with Bidirectional Perceptron Edit Models
NEWS@IJCNLP Pub Date : 2009-08-07 DOI: 10.3115/1699705.1699739
Dayne Freitag, Zhiqiang Wang
Abstract: We report on our efforts as part of the NEWS 2009 Machine Transliteration Shared Task. We applied an orthographic perceptron character edit model that we have used previously for name transliteration, enhancing it in two ways: by ranking possible transliterations according to the sum of their scores under two models, one trained to generate left-to-right and one right-to-left; and by constraining generated strings to be consistent with character bigrams observed in the respective language's training data. Our poor showing in the official evaluation was due to a bug in the script used to produce competition-compliant output. Subsequent evaluation shows that our approach yielded comparatively strong performance on all alphabetic language pairs we attempted.
Citations: 2
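The entry above mentions constraining generated strings to be consistent with character bigrams observed in the target language's training data. The following is a minimal sketch of one way such a filter could look. It is not the authors' implementation: the function names, the `^` and `$` boundary markers, and the toy training names are all assumptions made for illustration.

```python
from typing import Iterable, Set


def observed_bigrams(training_names: Iterable[str]) -> Set[str]:
    """Collect every character bigram seen in the training names.

    "^" and "$" are hypothetical start and end markers, added so that
    word-initial and word-final characters are constrained as well.
    """
    bigrams: Set[str] = set()
    for name in training_names:
        padded = "^" + name + "$"
        bigrams.update(padded[i:i + 2] for i in range(len(padded) - 1))
    return bigrams


def is_consistent(candidate: str, bigrams: Set[str]) -> bool:
    """True when every bigram of the candidate was observed in training."""
    padded = "^" + candidate + "$"
    return all(padded[i:i + 2] in bigrams for i in range(len(padded) - 1))


# Hypothetical toy training data for the target language.
toy_bigrams = observed_bigrams(["anna", "nina"])

kept = [c for c in ["nana", "ana", "nia"] if is_consistent(c, toy_bigrams)]
# "nia" is dropped because the bigram "ia" never occurs in the toy training names.
```

A hard filter like this discards any candidate containing an unseen bigram, so with sparse training data it can reject valid outputs. Whether the paper applies the constraint as a hard filter or in some softer form is not stated in the abstract.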
Maximum n-Gram HMM-based Name Transliteration: Experiment in NEWS 2009 on English-Chinese Corpus
NEWS@IJCNLP Pub Date : 2009-08-07 DOI: 10.3115/1699705.1699738
Yilu Zhou
Abstract: We propose an English-Chinese name transliteration system using a maximum N-gram Hidden Markov Model. To handle the special challenges of an alphabet-based and character-based language pair, we apply a two-phase transliteration model by building two HMM models, one between English and Chinese Pinyin and another between Chinese Pinyin and Chinese characters. Our model improves on the traditional HMM by assigning the longest prior translation sequence of syllables the largest weight. In our non-standard runs, we use a Web-mining module to boost the performance by adding online popularity information of candidate translations. The entire model does not rely on any dictionaries, and the probability tables are derived solely from the training corpus. In the NEWS 2009 experiment, our model achieved 0.462 Top-1 accuracy and 0.764 Mean F-score.
Citations: 3
Graphemic Approximation of Phonological Context for English-Chinese Transliteration
NEWS@IJCNLP Pub Date : 2009-08-07 DOI: 10.3115/1699705.1699747
O. Kwong
Abstract: Although direct orthographic mapping has been shown to outperform phoneme-based methods in English-to-Chinese (E2C) transliteration, it is observed that phonological context plays an important role in resolving graphemic ambiguity. In this paper, we investigate the use of surface graphemic features to approximate local phonological context for E2C. In the absence of an explicit phonemic representation of the English source names, experiments show that the previous and next character of a given English segment could effectively capture the local context affecting its expected pronunciation, and thus its rendition in Chinese.
Citations: 5