NEWS@ACM Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-2712

Binh Minh Nguyen, H. Ngo, Nancy F. Chen

Citations: 3

Moses-based official baseline for NEWS 2016

NEWS@ACM Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-2713

M. Costa-jussà

Citations: 7

Multi-source named entity typing for social media

NEWS@ACM Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-2702

R. Vexler, Einat Minkov

Citations: 2

Target-Bidirectional Neural Models for Machine Transliteration

NEWS@ACM Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-2711

A. Finch, Lemao Liu, Xiaolin Wang, E. Sumita

Citations: 34

Spanish NER with Word Representations and Conditional Random Fields

NEWS@ACM Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-2705

J. Copara, J. Ochoa, Camilo Thorne, Goran Glavas

Citations: 18

Applying Neural Networks to English-Chinese Named Entity Transliteration

NEWS@ACM Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-2710

Yan Shao, Joakim Nivre

Citations: 16

German NER with a Multilingual Rule Based Information Extraction System: Analysis and Issues

NEWS@ACM Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-2704

Anna Druzhkina, A. Leontyev, M. Stepanova

Citations: 0

Linguistic Issues in the Machine Transliteration of Chinese, Japanese and Arabic Names

NEWS@ACM Pub Date : 1900-01-01 DOI: 10.18653/v1/W16-2707

Jack Halpern

Abstract: The romanization of non-Latin scripts is a complex computational task that is highly language dependent. This presentation will focus on three of the most challenging non-Latin scripts: Chinese, Japanese, and Arabic (CJA). Much progress has been made in personal name machine-transliteration methodologies, as documented in the various NEWS reports over the last several years. Such techniques as phrase-based SMT, RNN-based LM and CRF have emerged, leading to gradual improvements in accuracy scores. But methodology is only one aspect of the problem. Equally important is the high level of ambiguity of the CJA scripts, which poses special challenges to named entity extraction and machine transliteration. These difficulties are exacerbated by the lack of comprehensive proper noun dictionaries, the multiplicity of ambiguous transcription schemes, and orthographic variation. This presentation will clear up the differences between three basic concepts (transliteration, transcription, and romanization) that are a source of much confusion, even among computational linguists, and will focus on (1) the major linguistic issues, that is, the special characteristics of the CJA scripts that impact machine transliteration, and (2) the important role played by lexical resources such as personal name dictionaries.

A major issue in romanizing Simplified Chinese (SC) is the one-to-many ambiguity of many characters (polyphones), such as /le/ and /yue/ for 乐. To disambiguate accurately, the names must be looked up in word-level (not character-level) name mapping tables. This is complicated by (1) the presence of orthographic variants in Traditional Chinese (TC), and (2) the need for cross-script conversion between SC and TC. Transcription into Chinese is even more ambiguous, since some phonemes can correspond to dozens of characters.

A major characteristic of Japanese, a highly agglutinative language, is the presence of countless orthographic variants. The four Japanese scripts interact in a complex way, resulting in okurigana variants (取り扱い, 取扱い, 取扱 etc. for /toriatsukai/), cross-script variants (猫, ねこ, ネコ for /neko/), kanji variants (大幅 and 大巾 for /oohaba/), kana variants (ユーザー and ユーザ for /yuuza(a)/), and more. Another issue is the numerous kun and nanori readings (some kanji have dozens) and the various romanization systems in current use, such as the Hepburn, Kunrei and hybrid systems.
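The abstract's point about word-level versus character-level lookup for polyphones such as 乐 can be illustrated with a minimal sketch. The two tables, their entries, and the function names below are hypothetical and chosen only for illustration; they are not the author's resources or method, and a real system would rely on large curated name dictionaries.

```python
# Hypothetical lookup tables for illustration only; the abstract does not
# specify the format or content of the actual name mapping tables.

# Character-level table: a single default reading per character.
CHAR_READINGS = {"乐": "le", "山": "shan", "毅": "yi"}

# Word-level table: whole names mapped to their reading.
NAME_READINGS = {
    "乐山": "leshan",  # place name, 乐 read as /le/
    "乐毅": "yueyi",   # personal name, 乐 read as /yue/
}


def romanize_char_level(name: str) -> str:
    """Join default per-character readings; unknown characters become '?'."""
    return "".join(CHAR_READINGS.get(ch, "?") for ch in name)


def romanize_word_level(name: str) -> str:
    """Prefer a whole-name entry, falling back to character-level readings."""
    return NAME_READINGS.get(name, romanize_char_level(name))


if __name__ == "__main__":
    print(romanize_char_level("乐毅"))  # leyi (wrong reading of 乐)
    print(romanize_word_level("乐毅"))  # yueyi
```

The character-level function always picks the same reading for 乐, so it cannot distinguish the two names, whereas the word-level lookup resolves each name as a unit.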

Citations: 0

Constructing a Japanese Basic Named Entity Corpus of Various Genres

NEWS@ACM Pub Date : 1900-01-01 DOI: 10.18653/v1/W16-2706

Tomoya Iwakura, Kanako Komiya, R. Tachibana

Citations: 7

Leveraging Entity Linking and Related Language Projection to Improve Name Transliteration

NEWS@ACM Pub Date : 1900-01-01 DOI: 10.18653/v1/W16-2701

Ying Lin, Xiaoman Pan, Aliya Deri, Heng Ji, Kevin Knight

Citations: 13

Latest articles from NEWS@ACM