{"title":"Improving phrase-based SMT model with Flattened Bilingual Parse Tree","authors":"Dakun Zhang, Le Sun, Wenbo Li","doi":"10.1109/NLPKE.2010.5587836","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587836","url":null,"abstract":"Phrase orders influence much on translation quality. However, general phrase based methods take only the source side information for phrase orderings. We instead propose a bilingual parse structure, Flattened Bilingual Parse Tree (FBPT), for better describing the inner structure of bilingual sentences and then for better translations. The main idea is to extract phrase pairs with orientation features under the help of FBPT structure. Such features can help maintain better sentence generations during translation. Furthermore, the FBPT structure can be learned automatically from parallel corpus with lower costs without the need of complex linguistic parsing. Evaluations on MT08 translation task indicate that 7% relative improvement on BLEU can be achieved compared to distortion based method (like Pharaoh).","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127236065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Patterns of syntactic trees for parsing arabic texts","authors":"Fériel Ben Fraj Trabelsi, C. Zribi, M. Ahmed","doi":"10.1109/NLPKE.2010.5587791","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587791","url":null,"abstract":"In order to parse Arabic texts, we have chosen to use a machine learning approach. It learns from an Arabic Treebank. The knowledge enclosed in this Treebank is structured as patterns of syntactic trees. These patterns are representative models of syntactic components of the Arabic language. They are not only layered but also both structurally and contextually rich. They serve as an informational source for guiding the parsing process. Our parser is progressive given that it proceeds by treating a sentence into a number of stages, equal to the number of its words. At each step, the parser affects the target word with the most likely patterns to represent it in the context where it is put. Then, it joins the selected patterns with those collected in the previous steps so as to construct the representative syntactic tree(s) of the whole sentence. Preliminary tests have yielded to obtain accuracy and f-score which are respectively equal to 84.78% and 77.52%.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125334773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel dependency based word-level reordering model for phrased-based translation","authors":"Shui Liu, Sheng Li, T. Zhao, Shiqi Li","doi":"10.1109/NLPKE.2010.5587829","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587829","url":null,"abstract":"Phrase based statistic MT (SMT) is an important milestone in MT. However, the translation model in the phrase based SMT is structure free which limits its reordering capacity to some extent. In order to enhance the reordering capacity of phrase based SMT, in this paper we propose a head-modifier relation based reordering model, which exploits the way how to utilize the structured linguistic analysis information in source language. Within very small size of reordering model, we enhance the performance of the phrase based SMT significantly.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117142532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dashboard: An integration and testing platform based on backboard architecture for NLP applications","authors":"Pawan Kumar, Arun Kumar Rathaur, R. Ahmad, M. K. Sinha, R. Sangal","doi":"10.1109/NLPKE.2010.5587779","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587779","url":null,"abstract":"The paper presents a software integration, testing and visualization tool, called Dashboard, which is based on pipe-lined backboard architecture for family of natural language processing (NLP) application. The Dashboard helps in testing of a module in isolation, facilitating the training and tuning of a module, integration and testing of a set of heterogeneous modules, and building and testing of complete integrated system as well. It is also equipped with a user-friendly visualization tool to build, test, and integrate a system (or a subsystem) and view its component-wise performance, and step-wise processing as well. The Dashboard is being successfully used by a consortium of eleven academic institutions to develop a suite of bi-directional machine translation (MT) system for nine pairs of Indic languages, and six MT systems have already been deployed on web. The MT systems are being developed by reusing / re-engineering previously developed NLP modules, by different institutions, in different programming languages, using Dashboard as the testing and integration tool. The paper also discusses the experiences of developing MT products in consortium mode, using Dashboard as its integrating and testing platform, and its proposed enhancements.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114139261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transitivity in semantic relation learning","authors":"F. Fallucchi, Fabio Massimo Zanzotto","doi":"10.1109/NLPKE.2010.5587773","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587773","url":null,"abstract":"Text understanding models exploit semantic networks of words as basic components. Automatically enriching and expanding these resources is then an important challenge for NLP. Existing models for enriching semantic resources based on lexical-syntactic patterns make little use of structural properties of target semantic relations. In this paper, we propose a novel approach to include transitivity in probabilistic models for expanding semantic resources. We directly include transitivity in the formulation of probabilistic models. Experiments demonstrate that these models are an effective way for exploiting structural properties of relations in learning semantic networks.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114453951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An improved method of keywords extraction based on short technology text","authors":"Jun Wang, Lei Li, F. Ren","doi":"10.1109/NLPKE.2010.5587797","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587797","url":null,"abstract":"Keywords are the critical resources of information management and retrieval, automatic text classification and clustering. The keywords extraction plays an important role in the process of constructing structured text. Current algorithms of keywords extraction have matured in some ways. However the errors of word segmentation which caused by unknown words have been affected the performance of Chinese keywords extraction, particularly in the field of technological text. In order to solve the problem, this paper proposes an improved method of keywords extraction based on the relationship among words. Experiments show that the proposed method can effectively correct the errors caused by segmentation and improve the performance of keywords extraction, and it can also extend to other areas.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"451 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124486207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph-based text representation model and its realization","authors":"Faguo Zhou, Fan Zhang, Bingru Yang","doi":"10.1109/NLPKE.2010.5587861","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587861","url":null,"abstract":"In this paper, on the foundation of summarizing several common used text representation models, such as Boolean model, probability model, vector space model and so on, mainly according to the defects of the vector space model, the word semantic space is proposed. And in the word semantic space, a graph-based text representation model is designed. Some properties of this type of text representation model have been given, and this model can describe the words semantic constraints in the text. At the same time, this model can also solve the defects of vector space model, such as the order or words, the boundary between sentences and phrases, etc. And at last the method of computing the text similarity is put forward.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133626865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An ontological model for representing computational lexicons a componential based approach","authors":"M. Al-Yahya, Hend Suliman Al-Khalifa, Alia Bahanshal, Iman Alodah, Nawal Al-Helwah","doi":"10.1109/NLPKE.2010.5587768","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587768","url":null,"abstract":"In the last decades the computational linguistics community has developed important and widely used lexical resources. Although they are very popular among the Natural Language Processing (NLP) community, they do not address two important characteristics of language. The first is that the meaning of a word in a language is a collective effort defined by the people who use the language. The second is that language is a dynamic entity (some words change their meaning, others become obsolete, new words are born). A computational model which aims to represent this real world entity should be structured in a way that allows for expansion, facilitates collaboration, and provides transparent meaning representation. This paper addresses these two issues and provides a solution based on Semantic Web technologies. The solution is based on an ontological model for representing computational lexicons using the field theory of semantics and componential analysis. The model has been implemented on the “Time” semantic field vocabulary of the Arabic language and the results of a preliminary evaluation are presented.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131570277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data selection for statistical machine translation","authors":"Peng Liu, Yu Zhou, Chengqing Zong","doi":"10.1109/NLPKE.2010.5587827","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587827","url":null,"abstract":"The bilingual language corpus has a great effect on the performance of a statistical machine translation system. More data will lead to better performance. However, more data also increase the computational load. In this paper, we propose methods to estimate the sentence weight and select more informative sentences from the training corpus and the development corpus based on the sentence weight. The translation system is built and tuned on the compact corpus. The experimental results show that we can obtain a competitive performance with much less data.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"392 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115992325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research on sentiment classification of Blog based on PMI-IR","authors":"Xiuting Duan, Tingting He, Le Song","doi":"10.1109/NLPKE.2010.5587849","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587849","url":null,"abstract":"Development of Blog texts information on the internet has brought new challenge to Chinese text classification. Aim to solving the semantics deficiency problem in traditional methods for Chinese text classification, this paper implements a text classification method on classifying a blog as joy, angry, sad or fear using a simple unsupervised learning algorithm. The classification of a blog text is predicted by the max semantic orientation (SO) of the phrases in the blog text that contains adjectives or adverbs. In this paper, the SO of a phrase is calculated as the mutual information between the given phrase and the polar words. Then the SO of the given blog text is determined by the max mutual information value. A blog text is classified as joy if the SO of its phrases is joy. Two different corpora are adopted to test our method, one is the Blog corpus collected by Monitor and Research Center for National Language Resource Network Multimedia Sub-branch Center, and the other is Chinese dataset provided by COAE2008 task. Based on the two datasets, the method respectively achieves a high improvement compared to the traditional methods.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116011509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}