NEWS@IJCNLPPub Date : 2011-11-01DOI: 10.18653/v1/W16-2709
Xiangyu Duan, Rafael E. Banchs, Min Zhang, Haizhou Li, A. Kumaran
{"title":"Report of NEWS 2016 Machine Transliteration Shared Task","authors":"Xiangyu Duan, Rafael E. Banchs, Min Zhang, Haizhou Li, A. Kumaran","doi":"10.18653/v1/W16-2709","DOIUrl":"https://doi.org/10.18653/v1/W16-2709","url":null,"abstract":"This report documents the Machine Transliteration Shared Task conducted as a part of the Named Entities Workshop (NEWS 2011), an IJCNLP 2011 workshop. The shared task features machine transliteration of proper names from English to 11 languages and from 3 languages to English. In total, 14 tasks are provided. 10 teams from 7 different countries participated in the evaluations. Finally, 73 standard and 4 non-standard runs are submitted, where diverse transliteration methodologies are explored and reported on the evaluation data. We report the results with 4 performance metrics. We believe that the shared task has successfully achieved its objective by providing a common benchmarking platform for the research community to evaluate the state-of-the-art technologies that benefit the future research and development.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"156 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122158499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@IJCNLPPub Date : 2009-08-07DOI: 10.3115/1699705.1699731
Peter Nabende
{"title":"Transliteration System Using Pair HMM with Weighted FSTs","authors":"Peter Nabende","doi":"10.3115/1699705.1699731","DOIUrl":"https://doi.org/10.3115/1699705.1699731","url":null,"abstract":"This paper presents a transliteration system based on pair Hidden Markov Model (pair HMM) training and Weighted Finite State Transducer (WFST) techniques. Parameters used by WFSTs for transliteration generation are learned from a pair HMM. Parameters from pair-HMM training on English-Russian data sets are found to give better transliteration quality than parameters trained for WFSTs for corresponding structures. Training a pair HMM on English vowel bigrams and standard bigrams for Cyrillic Romanization, and using a few transformation rules on generated Russian transliterations to test for context improves the system's transliteration quality.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124750956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@IJCNLPPub Date : 2009-08-07DOI: 10.3115/1699705.1699740
Davide Picca, A. Gliozzo, S. Campora
{"title":"Bridging Languages by SuperSense Entity Tagging","authors":"Davide Picca, A. Gliozzo, S. Campora","doi":"10.3115/1699705.1699740","DOIUrl":"https://doi.org/10.3115/1699705.1699740","url":null,"abstract":"This paper explores a very basic linguistic phenomenon in multilingualism: the lexicalizations of entities are very often identical within different languages while concepts are usually lexicalized differently. Since entities are commonly referred to by proper names in natural language, we measured their distribution in the lexical overlap of the terminologies extracted from comparable corpora. Results show that the lexical overlap is mostly composed by unambiguous words, which can be regarded as anchors to bridge languages: most of terms having the same spelling refer exactly to the same entities. Thanks to this important feature of Named Entities, we developed a multilingual super sense tagging system capable to distinguish between concepts and individuals. Individuals adopted for training have been extracted both by YAGO and by a heuristic procedure. The general F1 of the English tagger is over 76%, which is in line with the state of the art on super sense tagging while augmenting the number of classes. Performances for Italian are slightly lower, while ensuring a reasonable accuracy level which is capable to show effective results for knowledge acquisition.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130231223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@IJCNLPPub Date : 2009-08-07DOI: 10.3115/1699705.1699749
Asif Ekbal, Sivaji Bandyopadhyay
{"title":"Voted NER System using Appropriate Unlabeled Data","authors":"Asif Ekbal, Sivaji Bandyopadhyay","doi":"10.3115/1699705.1699749","DOIUrl":"https://doi.org/10.3115/1699705.1699749","url":null,"abstract":"This paper reports a voted Named Entity Recognition (NER) system with the use of appropriate unlabeled data. The proposed method is based on the classifiers such as Maximum Entropy (ME), Conditional Random Field (CRF) and Support Vector Machine (SVM) and has been tested for Bengali. The system makes use of the language independent features in the form of different contextual and orthographic word level features along with the language dependent features extracted from the Part of Speech (POS) tagger and gazetteers. Context patterns generated from the unlabeled data using an active learning method have been used as the features in each of the classifiers. A semi-supervised method has been used to describe the measures to automatically select effective documents and sentences from unlabeled data. Finally, the models have been combined together into a final system by weighted voting technique. Experimental results show the effectiveness of the proposed approach with the overall Recall, Precision, and F-Score values of 93.81%, 92.18% and 92.98%, respectively. We have shown how the language dependent features can improve the system performance.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"284 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122087723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@IJCNLPPub Date : 2009-08-07DOI: 10.3115/1699705.1699741
Feiliang Ren, Muhua Zhu, Huizhen Wang, Jingbo Zhu
{"title":"Chinese-English Organization Name Translation Based on Correlative Expansion","authors":"Feiliang Ren, Muhua Zhu, Huizhen Wang, Jingbo Zhu","doi":"10.3115/1699705.1699741","DOIUrl":"https://doi.org/10.3115/1699705.1699741","url":null,"abstract":"This paper presents an approach to translating Chinese organization names into English based on correlative expansion. Firstly, some candidate translations are generated by using statistical translation method. And several correlative named entities for the input are retrieved from a correlative named entity list. Secondly, three kinds of expansion methods are used to generate some expanded queries. Finally, these queries are submitted to a search engine, and the refined translation results are mined and re-ranked by using the returned web pages. Experimental results show that this approach outperforms the compared system in overall translation accuracy.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127991607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@IJCNLPPub Date : 2009-08-07DOI: 10.3115/1699705.1699737
Taraka Rama, Karthik Gali
{"title":"Modeling Machine Transliteration as a Phrase Based Statistical Machine Translation Problem","authors":"Taraka Rama, Karthik Gali","doi":"10.3115/1699705.1699737","DOIUrl":"https://doi.org/10.3115/1699705.1699737","url":null,"abstract":"In this paper we use the popular phrase-based SMT techniques for the task of machine transliteration, for English-Hindi language pair. Minimum error rate training has been used to learn the model weights. We have achieved an accuracy of 46.3% on the test set. Our results show these techniques can be successfully used for the task of machine transliteration.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"6 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121009063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@IJCNLPPub Date : 2009-08-07DOI: 10.3115/1699705.1699720
Yan Song, C. Kit, Xiao Chen
{"title":"Transliteration of Name Entity via Improved Statistical Translation on Character Sequences","authors":"Yan Song, C. Kit, Xiao Chen","doi":"10.3115/1699705.1699720","DOIUrl":"https://doi.org/10.3115/1699705.1699720","url":null,"abstract":"Transliteration of given parallel name entities can be formulated as a phrase-based statistical machine translation (SMT) process, via its routine procedure comprising training, optimization and decoding. In this paper, we present our approach to transliterating name entities using the loglinear phrase-based SMT on character sequences. Our proposed work improves the translation by using bidirectional models, plus some heuristic guidance integrated in the decoding process. Our evaluated results indicate that this approach performs well in all standard runs in the NEWS2009 Machine Transliteration Shared Task.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133028668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@IJCNLPPub Date : 2009-08-07DOI: 10.3115/1699705.1699736
Balakrishnan Varadarajan, D. Rao
{"title":"epsilon-extension Hidden Markov Models and Weighted Transducers for Machine Transliteration","authors":"Balakrishnan Varadarajan, D. Rao","doi":"10.3115/1699705.1699736","DOIUrl":"https://doi.org/10.3115/1699705.1699736","url":null,"abstract":"We describe in detail a method for transliterating an English string to a foreign language string evaluated on five different languages, including Tamil, Hindi, Russian, Chinese, and Kannada. Our method involves deriving substring alignments from the training data and learning a weighted finite state transducer from these alignments. We define an e-extension Hidden Markov Model to derive alignments between training pairs and a heuristic to extract the substring alignments. Our method involves only two tunable parameters that can be optimized on held-out data.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128853159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@IJCNLPPub Date : 2009-08-07DOI: 10.3115/1699705.1699715
Praneeth Shishtla, S. Veeravalli, Sethuramalingam Subramaniam, Vasudeva Varma
{"title":"A Language-Independent Transliteration Schema Using Character Aligned Models at NEWS 2009","authors":"Praneeth Shishtla, S. Veeravalli, Sethuramalingam Subramaniam, Vasudeva Varma","doi":"10.3115/1699705.1699715","DOIUrl":"https://doi.org/10.3115/1699705.1699715","url":null,"abstract":"In this paper we present a statistical transliteration technique that is language independent. This technique uses statistical alignment models and Conditional Random Fields (CRF). Statistical alignment models maximizes the probability of the observed (source, target) word pairs using the expectation maximization algorithm and then the character level alignments are set to maximum posterior predictions of the model. CRF has efficient training and decoding processes which is conditioned on both source and target languages and produces globally optimal solution.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132750960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"English to Hindi Machine Transliteration System at NEWS 2009","authors":"Amitava Das, Asif Ekbal, Tapabrata Mondal, Sivaji Bandyopadhyay","doi":"10.3115/1699705.1699726","DOIUrl":"https://doi.org/10.3115/1699705.1699726","url":null,"abstract":"This paper reports about our work in the NEWS 2009 Machine Transliteration Shared Task held as part of ACL-IJCNLP 2009. We submitted one standard run and two non-standard runs for English to Hindi transliteration. The modified joint source-channel model has been used along with a number of alternatives. The system has been trained on the NEWS 2009 Machine Transliteration Shared Task datasets. For standard run, the system demonstrated an accuracy of 0.471 and the mean F-Score of 0.861. The non-standard runs yielded the accuracy and mean F-scores of 0.389 and 0.831 respectively in the first one and 0.384 and 0.828 respectively in the second one. The non-standard runs resulted in substantially worse performance than the standard run. The reasons for this are the ranking algorithm used for the output and the types of tokens present in the test set.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133847673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}