NEWS@ACM: Latest Publications

Regulating Orthography-Phonology Relationship for English to Thai Transliteration
NEWS@ACM Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-2712
Binh Minh Nguyen, H. Ngo, Nancy F. Chen
{"title":"Regulating Orthography-Phonology Relationship for English to Thai Transliteration","authors":"Binh Minh Nguyen, H. Ngo, Nancy F. Chen","doi":"10.18653/v1/W16-2712","DOIUrl":"https://doi.org/10.18653/v1/W16-2712","url":null,"abstract":"In this paper, we discuss our endeavors for the Named Entities Workshop (NEWS) 2016 transliteration shared task, where we focus on English to Thai transliteration. The alignment between Thai orthography and phonology is not always monotonous, but few transliteration systems take this into account. In our proposed system, we exploit phonological knowledge to resolve problematic instances where the monotonous alignment assumption breaks down. We achieve a 29% relative improvement over the baseline system for the NEWS 2016 transliteration shared task.","PeriodicalId":254249,"journal":{"name":"NEWS@ACM","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126351959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Moses-based official baseline for NEWS 2016
NEWS@ACM Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-2713
M. Costa-jussà
{"title":"Moses-based official baseline for NEWS 2016","authors":"M. Costa-jussà","doi":"10.18653/v1/W16-2713","DOIUrl":"https://doi.org/10.18653/v1/W16-2713","url":null,"abstract":"Transliteration is the phonetic translation between two different languages. There are many works that approach transliteration using machine translation methods. This paper describes the official baseline system for the NEWS 2016 workshop shared task. This baseline is based on a standard phrase-based machine translation system using Moses. Results are between the range of best and worst from last year’s workshops providing a nice starting point for participants this year.","PeriodicalId":254249,"journal":{"name":"NEWS@ACM","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133696949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Multi-source named entity typing for social media
NEWS@ACM Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-2702
R. Vexler, Einat Minkov
{"title":"Multi-source named entity typing for social media","authors":"R. Vexler, Einat Minkov","doi":"10.18653/v1/W16-2702","DOIUrl":"https://doi.org/10.18653/v1/W16-2702","url":null,"abstract":"Typed lexicons that encode knowledge about the semantic types of an entity name, e.g., that ‘Paris’ denotes a geolocation, product, or person, have proven useful for many text processing tasks. While lexicons may be derived from large-scale knowledge bases (KBs), KBs are inherently imperfect, in particular they lack coverage with respect to long tail entity names. We infer the types of a given entity name using multi-source learning, considering information obtained by alignment to the Freebase knowledge base, Web-scale distributional patterns, and global semi-structured contexts retrieved by means of Web search. Evaluation in the challenging domain of social media shows that multi-source learning improves performance compared with rule-based KB lookups, boosting typing results for some semantic categories.","PeriodicalId":254249,"journal":{"name":"NEWS@ACM","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125066131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Target-Bidirectional Neural Models for Machine Transliteration
NEWS@ACM Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-2711
A. Finch, Lemao Liu, Xiaolin Wang, E. Sumita
{"title":"Target-Bidirectional Neural Models for Machine Transliteration","authors":"A. Finch, Lemao Liu, Xiaolin Wang, E. Sumita","doi":"10.18653/v1/W16-2711","DOIUrl":"https://doi.org/10.18653/v1/W16-2711","url":null,"abstract":"Our purely neural network-based system represents a paradigm shift away from the techniques based on phrase-based statistical machine translation we have used in the past. The approach exploits the agreement between a pair of target-bidirectional LSTMs, in order to generate balanced targets with both good suffixes and good prefixes. The evaluation results show that the method is able to match and even surpass the current state-of-the-art on most language pairs, but also exposes weaknesses on some tasks motivating further study. The Janus toolkit that was used to build the systems used in the evaluation is publicly available at https://github.com/lemaoliu/Agtarbidir.","PeriodicalId":254249,"journal":{"name":"NEWS@ACM","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134450799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34
Spanish NER with Word Representations and Conditional Random Fields
NEWS@ACM Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-2705
J. Copara, J. Ochoa, Camilo Thorne, Goran Glavas
{"title":"Spanish NER with Word Representations and Conditional Random Fields","authors":"J. Copara, J. Ochoa, Camilo Thorne, Goran Glavas","doi":"10.18653/v1/W16-2705","DOIUrl":"https://doi.org/10.18653/v1/W16-2705","url":null,"abstract":"Word Representations such as word embeddings have been shown to significantly improve (semi-)supervised NER for the English language. In this work we investigate whether word representations can also boost (semi-)supervised NER in Spanish. To do so, we use word representations as additional features in a linear chain Conditional Random Field (CRF) classifier. Experimental results (82.44 Fscore on the CoNLL-2002 corpus) show that our approach is comparable to some state-of-the-art Deep Learning approaches for Spanish, in particular when using","PeriodicalId":254249,"journal":{"name":"NEWS@ACM","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128676337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 18
Applying Neural Networks to English-Chinese Named Entity Transliteration
NEWS@ACM Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-2710
Yan Shao, Joakim Nivre
{"title":"Applying Neural Networks to English-Chinese Named Entity Transliteration","authors":"Yan Shao, Joakim Nivre","doi":"10.18653/v1/W16-2710","DOIUrl":"https://doi.org/10.18653/v1/W16-2710","url":null,"abstract":"This paper presents the machine transliteration systems that we employ for our participation in the NEWS 2016 machine transliteration shared task. Based on the prevalent deep learning models developed for general sequence processing tasks, we use convolutional neural networks to extract character level information from the transliteration units and stack a simple recurrent neural network on top for sequence processing. The systems are applied to the standard runs for both English to Chinese and Chinese to English transliteration tasks. Our systems achieve competitive results according to the official evaluation.","PeriodicalId":254249,"journal":{"name":"NEWS@ACM","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126203145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
German NER with a Multilingual Rule Based Information Extraction System: Analysis and Issues
NEWS@ACM Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-2704
Anna Druzhkina, A. Leontyev, M. Stepanova
{"title":"German NER with a Multilingual Rule Based Information Extraction System: Analysis and Issues","authors":"Anna Druzhkina, A. Leontyev, M. Stepanova","doi":"10.18653/v1/W16-2704","DOIUrl":"https://doi.org/10.18653/v1/W16-2704","url":null,"abstract":"This paper presents a rule-based approach to Named Entity Recognition for the German language. The approach rests upon deep linguistic parsing and has already been applied to English and Russian. In this paper we present the first results of our system, ABBYY InfoExtractor, on GermEval 2014 Shared Task corpus. We focus on the main challenges of German NER that we have encountered when adapting our system to German and possible solutions for them.","PeriodicalId":254249,"journal":{"name":"NEWS@ACM","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130255591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Linguistic Issues in the Machine Transliteration of Chinese, Japanese and Arabic Names
NEWS@ACM Pub Date : 1900-01-01 DOI: 10.18653/v1/W16-2707
Jack Halpern
{"title":"Linguistic Issues in the Machine Transliteration of Chinese, Japanese and Arabic Names","authors":"Jack Halpern","doi":"10.18653/v1/W16-2707","DOIUrl":"https://doi.org/10.18653/v1/W16-2707","url":null,"abstract":"The romanization of non-Latin scripts is a complex computational task that is highly language dependent. This presentation will focus on three of the most challenging non-Latin scripts: Chinese, Japanese, and Arabic (CJA). Much progress has been made in personal name machine-transliteration methodologies, as documented in the various NEWS reports over the last several years. Such techniques as phrase-based SMT, RNN-based LM and CRF have emerged, leading to gradual improvements in accuracy scores. But methodology is only one aspect of the problem. Equally important is the high level of ambiguity of the CJA scripts, which poses special challenges to named entity extraction and machine transliteration. These difficulties are exacerbated by the lack of comprehensive proper noun dictionaries, the multiplicity of ambiguous transcription schemes, and orthographic variation. This presentation will clear up the differences between three basic concepts (transliteration, transcription, and romanization) that are a source of much confusion, even among computational linguists, and will focus on (1) the major linguistic issues, that is, the special characteristics of the CJA scripts that impact machine transliteration, and (2) the important role played by lexical resources such as personal name dictionaries. A major issue in romanizing Simplified Chinese (SC) is the one-to-many ambiguity of many characters (polyphones), such as /le/ and /yue/ for 乐. To disambiguate accurately, the names must be looked up in word-level (not character-level) name mapping tables. This is complicated by (1) the presence of orthographic variants in Traditional Chinese (TC), and (2) the need for cross-script conversion between SC and TC. Transcription into Chinese is even more ambiguous, since some phonemes can correspond to dozens of characters. A major characteristic of Japanese, a highly agglutinative language, is the presence of countless orthographic variants. The four Japanese scripts interact in a complex way, resulting in okurigana variants (取り扱い, 取扱い, 取扱 etc. for /toriatsukai/), cross-script variants (猫, ねこ, ネコ for /neko/), kanji variants (大幅 and 大巾 for /oohaba/), kana variants (ユーザー and ユーザ for /yuuza(a)/), and more. Another issue is the numerous kun and nanori readings (some kanji have dozens) and the various romanization systems in current use, such as the Hepburn, Kunrei and hybrid systems.","PeriodicalId":254249,"journal":{"name":"NEWS@ACM","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121578512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Constructing a Japanese Basic Named Entity Corpus of Various Genres
NEWS@ACM Pub Date : 1900-01-01 DOI: 10.18653/v1/W16-2706
Tomoya Iwakura, Kanako Komiya, R. Tachibana
{"title":"Constructing a Japanese Basic Named Entity Corpus of Various Genres","authors":"Tomoya Iwakura, Kanako Komiya, R. Tachibana","doi":"10.18653/v1/W16-2706","DOIUrl":"https://doi.org/10.18653/v1/W16-2706","url":null,"abstract":"This paper introduces a Japanese Named Entity (NE) corpus of various genres. We annotated 136 documents in the Balanced Corpus of Contemporary Written Japanese (BCCWJ) with the eight types of NE tags defined by Information Retrieval and Extraction Exercise. The NE corpus consists of six types of genres of documents such as blogs, magazines, white papers, and so on, and the corpus contains 2,464 NE tags in total. The corpus can be reproduced with BCCWJ corpus and the tagging information obtained from https://sites.google.com/site/projectnextnlpne/en/.","PeriodicalId":254249,"journal":{"name":"NEWS@ACM","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124865601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Leveraging Entity Linking and Related Language Projection to Improve Name Transliteration
NEWS@ACM Pub Date : 1900-01-01 DOI: 10.18653/v1/W16-2701
Ying Lin, Xiaoman Pan, Aliya Deri, Heng Ji, Kevin Knight
{"title":"Leveraging Entity Linking and Related Language Projection to Improve Name Transliteration","authors":"Ying Lin, Xiaoman Pan, Aliya Deri, Heng Ji, Kevin Knight","doi":"10.18653/v1/W16-2701","DOIUrl":"https://doi.org/10.18653/v1/W16-2701","url":null,"abstract":"Traditional name transliteration methods largely ignore source context information and inter-dependency among entities for entity disambiguation. We propose a novel approach to leverage state-of-the-art Entity Linking (EL) techniques to automatically correct name transliteration results, using collective inference from source contexts and additional evidence from knowledge base. Experiments on transliterating names from seven languages to English demonstrate that our approach achieves 2.6% to 15.7% absolute gain over the baseline model, and significantly advances state-of-the-art. When contextual information exists, our approach can achieve further gains (24.2%) by collectively transliterating and disambiguating multiple related entities. We also prove that combining Entity Linking and projecting resources from related languages obtained comparable performance as the method using the same amount of training pairs in the original languages without Entity Linking.","PeriodicalId":254249,"journal":{"name":"NEWS@ACM","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133488124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13