Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)最新文献_第8页

MT on and for the Web MT上和为网络

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010) Pub Date : 2010-09-30 DOI: 10.1109/NLPKE.2010.5587865

C. Boitet, H. Blanchon, Mark Seligman, Valérie Bellynck

{"title":"MT on and for the Web","authors":"C. Boitet, H. Blanchon, Mark Seligman, Valérie Bellynck","doi":"10.1109/NLPKE.2010.5587865","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587865","url":null,"abstract":"A Systran MT server became available on the minitel network in 1984, and on Internet in 1994. Since then we have come to a better understanding of the nature of MT systems by separately analyzing their linguistic, computational, and operational architectures. Also, thanks to the CxAxQ metatheorem, the systems' inherent limits have been clarified, and design choices can now be made in an informed manner according to the translation situations. MT evaluation has also matured: tools based on reference translations are useful for measuring progress; those based on subjective judgments for estimating future usage quality; and task-related objective measures (such as post-editing distances) for measuring operational quality. Moreover, the same technological advances that have led to “Web 2.0” have brought several futuristic predictions to fruition. Free Web MT services have democratized assimilation MT beyond belief. Speech translation research has given rise to usable systems for restricted tasks running on PDAs or on mobile phones connected to servers. New man-machine interface techniques have made interactive disambiguation usable in large-coverage multimodal MT. Increases in computing power have made statistical methods workable, and have led to the possibility of building low-linguistic-quality but still useful MT systems by machine learning from aligned bilingual corpora (SMT, EBMT). In parallel, progress has been made in developing interlingua-based MT systems, using hybrid methods. Unfortunately, many misconceptions about MT have spread among the public, and even among MT researchers, because of ignorance of the past and present of MT R&D. A compensating factor is the willingness of end users to freely contribute to building essential parts of the linguistic knowledge needed to construct MT systems, whether corpus-related or lexical. Finally, some developments we anticipated fifteen years ago have not yet materialized, such as online writing tools equipped with interactive disambiguation, and as a corollary the possibility of transforming source documents into self-explaining documents (SEDs) and of producing corresponding SEDs fully automatically in several target languages. These visions should now be realized, thanks to the evolution of Web programming and multilingual NLP techniques, leading towards a true Semantic Web, “Web 3.0”, which will support ubilingual (ubiquitous multilingual) computing.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120994756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 7

Anusaaraka: An expert system based machine translation system Anusaaraka:基于专家系统的机器翻译系统

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010) Pub Date : 2010-09-30 DOI: 10.1109/NLPKE.2010.5587789

Sriram Chaudhury, A. Rao, D. Sharma

{"title":"Anusaaraka: An expert system based machine translation system","authors":"Sriram Chaudhury, A. Rao, D. Sharma","doi":"10.1109/NLPKE.2010.5587789","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587789","url":null,"abstract":"Most research in Machine translation is about having the computers completely bear the load of translating one human language into another. This paper looks at the machine translation problem afresh and observes that there is a need to share the load between man and machine, distinguish reliable knowledge from the heuristics, provide a spectrum of outputs to serve different strata of people, and finally make use of existing resources instead of reinventing the wheel. This paper describes a unique approach to develop machine translation system based on the insights of information dynamics from Paninian Grammar Formalism. Anusaaraka is a Language Accessor cum Machine Translation system based on the fundamental premise of sharing the load producing good enough results according to the needs of the reader. The system promises to give faithful representation of the translated text, no loss of information while translating and graceful degradation (robustness) in case of failure. The layered output provides an access to all the stages of translation making the whole process transparent. Thus, Anusaaraka differs from the Machine Translation systems in two respects: (1) its commitment to faithfulness and thereby providing a layer of 100% faithful output so that a user with some training can “access the source text” faithfully. (2) The system is so designed that a user can contribute to it and participate in improving its quality. Further Anusaaraka provides an eclectic combination of the Apertium architecture with the forward chaining expert system, allowing use of both the deep parser and shallow parser outputs to analyze the SL text. Existing language resources (parsers, taggers, chunkers) available under GPL are used instead of rewriting it again. Language data and linguistic rules are independent from the core programme, making it easy for linguists to modify and experiment with different language phenomena to improve the system. Users can become contributors by contributing new word sense disambiguation (WSD) rules of the ambiguous words through a web-interface available over internet. The system uses forward chaining of expert system to infer new language facts from the existing language data. It helps to solve the complex behavior of language translation by applying specific knowledge rather than specific technique creating a vast language knowledge base in electronic form. Or in other words, the expert system facilitates the transformation of subject matter expert's (SME) knowledge available with humans into a computer processable knowledge base.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129896254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 30

Information retrieval by text summarization for an Indian regional language 一种印度地区语言的文本摘要信息检索

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010) Pub Date : 2010-09-30 DOI: 10.1109/NLPKE.2010.5587764

Jagadish S. Kallimani, K. Srinivasa, B. E. Reddy

引用次数: 23

Chinese base phrases chunking based on latent semi-CRF model 基于潜在半crf模型的汉语基本短语分块

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010) Pub Date : 2010-09-30 DOI: 10.1109/NLPKE.2010.5587802

Xiao Sun, Xiaoli Nan

{"title":"Chinese base phrases chunking based on latent semi-CRF model","authors":"Xiao Sun, Xiaoli Nan","doi":"10.1109/NLPKE.2010.5587802","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587802","url":null,"abstract":"In the fields of Chinese natural language processing, recognizing simple and non-recursive base phrases is an important task for natural language processing applications, such as information processing and machine translation. Instead of rule-based model, we adopt the statistical machine learning method, newly proposed Latent semi-CRF model to solve the Chinese base phrase chunking problem. The Chinese base phrases could be treated as the sequence labeling problem, which involve the prediction of a class label for each frame in an unsegmented sequence. The Chinese base phrases have sub-structures which could not be observed in training data. We propose a latent discriminative model called Latent semi-CRF(Latent Semi Conditional Random Fields), which incorporates the advantages of LDCRF(Latent Dynamic Conditional Random Fields) and semi-CRF that model the sub-structure of a class sequence and learn dynamics between class labels, in detecting the Chinese base phrases. Our results demonstrate that the latent dynamic discriminative model compares favorably to Support Vector Machines, Maximum Entropy Model, and Conditional Random Fields(including LDCRF and semi-CRF) on Chinese base phrases chunking.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133124987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 13

iTree - Automating the construction of the narration tree of Hadiths (Prophetic Traditions) iTree -自动构建圣训(先知传统)的叙述树

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010) Pub Date : 2010-09-30 DOI: 10.1109/NLPKE.2010.5587810

Aqil M. Azmi, Nawaf Bin Badia

{"title":"iTree - Automating the construction of the narration tree of Hadiths (Prophetic Traditions)","authors":"Aqil M. Azmi, Nawaf Bin Badia","doi":"10.1109/NLPKE.2010.5587810","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587810","url":null,"abstract":"The two fundamental sources of Islamic legislation are Qur'an and the Hadith. The Hadiths, or Prophetic Traditions, are narrations originating from the sayings and conducts of Prophet Muhammad. Each Hadith starts with a list of narrators involved in transmitting it followed by the transmitted text. The Hadith corpus is extremely huge and runs into hundreds of volumes. Due to its legislative importance, Hadiths have been carefully scrutinized by hadith scholars. One way a scholar may grade a Hadith is by its narration chain and the individual narrators in the chain. In this paper we report on a system that automatically generates the transmission chains of a Hadith and graphically display it. Computationally, this is a challenging problem. The text of Hadith is in Arabic, a morphologically rich language; and each Hadith has its own peculiar way of listing narrators. Our solution involves parsing and annotating the Hadith text and identifying the narrators' names. We use shallow parsing along with a domain specific grammar to parse the Hadith content. Experiments on sample Hadiths show our approach to have a very good success rate.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"120 3‐4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132908081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 30

Chinese semantic role labeling based on semantic knowledge 基于语义知识的汉语语义角色标注

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010) Pub Date : 2010-09-30 DOI: 10.1109/NLPKE.2010.5587821

Yanqiu Shao, Zhifang Sui, Ning Mao

引用次数: 2

Improving Chinese-English patent machine translation using sentence segmentation 基于分句的汉英专利机器翻译改进

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010) Pub Date : 2010-09-30 DOI: 10.1109/NLPKE.2010.5587855

Yaohong Jin, Zhiying Liu

引用次数: 15

Texture image retrieval based on gray-primitive co-occurrence matrix 基于灰度-原元共现矩阵的纹理图像检索

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010) Pub Date : 2010-09-30 DOI: 10.1109/NLPKE.2010.5587830

Wei Wang, Motoyuki Suzuki, F. Ren

引用次数: 2

Bottom up: Exploring word emotions for Chinese sentence chief sentiment classification 自下而上:探索汉语句子主情感分类的词语情感

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010) Pub Date : 2010-09-30 DOI: 10.1109/NLPKE.2010.5587793

Xin Kang, F. Ren, Yunong Wu

引用次数: 14

A new method for solving context ambiguities using field association knowledge 一种利用领域关联知识解决上下文歧义的新方法

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010) Pub Date : 2010-09-30 DOI: 10.1109/NLPKE.2010.5587858

Li Wang, E. Atlam, M. Fuketa, K. Morita, J. Aoe

引用次数: 0