{"title":"Designing effective web mining-based techniques for OOV translation","authors":"Haitao Yu, F. Ren, Degen Huang, Lishuang Li","doi":"10.1109/NLPKE.2010.5587807","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587807","url":null,"abstract":"Due to the limited coverage of existing bilingual dictionaries, it is often difficult to translate Out-Of-Vocabulary (OOV) terms in many natural language processing tasks. In this paper, we propose a general three-step cascade mining technique that leverages the OOV category to optimize the effectiveness of each step. An OOV category based expansion policy is suggested to get more relevant mixed-language documents. An OOV category based hybrid extraction approach is suggested to perform a robust extraction. A more flexible model combination based on OOV category is also suggested. Moreover, we conducted experiments to evaluate the effectiveness of each step and the overall performance of the mining technique. The experimental results show a significant performance improvement over the existing methods.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115674388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Document summarization based on improved features and clustering","authors":"Ying Xiong, Hongyan Liu, Lei Li","doi":"10.1109/NLPKE.2010.5587834","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587834","url":null,"abstract":"Multi-Document summarization is an emerging technique for understanding the main purpose of many documents about the same topic. This paper proposes a new feature selection method to improve the summarization result. When calculating similarity, we use a modified TFIDF formula which achieves a better result. We adopt two ways for exactly extracting keywords. Experimental results demonstrate that our improved method performs better than the traditional one.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128687187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bottom up: Exploring word emotions for Chinese sentence chief sentiment classification","authors":"Xin Kang, F. Ren, Yunong Wu","doi":"10.1109/NLPKE.2010.5587793","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587793","url":null,"abstract":"In this paper we demonstrate the effectiveness of employing basic sentiment components for analyzing the chief sentiment of Chinese sentence among nine categories of sentiments (including “No emotion”). Compared to traditional lexicon based methods, our research explores emotion intensities of words and phrases in an eight dimensional sentiment space as features. An emotion matrix kernel is designed to evaluate inner product of these sentiment features for SVM classification with O(n) time complexity. Experimental result shows our method significantly improves performance of sentiment classification.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"18 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130283550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Chinese-English patent machine translation using sentence segmentation","authors":"Yaohong Jin, Zhiying Liu","doi":"10.1109/NLPKE.2010.5587855","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587855","url":null,"abstract":"This paper presents a method using sentence segmentation to improve the performance of Chinese-English patent machine translation. In this method, a long Chinese sentence is segmented into separate short sentences using features from the Hierarchical Network of Concepts theory (HNC theory). Some semantic features are introduced, including main verb of CSC (Eg), main verb of CSP (Egp), long NPs and conjunctions. The main purpose of the segmentation algorithm is to detect whether a CSC can be a separate sentence. The segmentation method was integrated with a rule-based MT system. The sequence of these short translations was adjusted, and the different ways of expression in Chinese and English were also taken into consideration. The experimental results show that the performance of Chinese-English patent translation was improved effectively. Our method has been integrated into an online patent MT system running in SIPO.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"124 20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130009419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Texture image retrieval based on gray-primitive co-occurrence matrix","authors":"Wei Wang, Motoyuki Suzuki, F. Ren","doi":"10.1109/NLPKE.2010.5587830","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587830","url":null,"abstract":"Research on texture similarity is a very important component of content-based image retrieval systems. In this paper, the rotation invariance of the gray-primitive co-occurrence matrix is first proved, and then a new texture image retrieval technique based on the gray-primitive co-occurrence matrix is presented. The experimental results indicate that the proposed algorithm has low computational complexity and a certain noise-resisting ability.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130347415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic filtration of multiword units","authors":"Y. Liu, Zheng Tie","doi":"10.1109/NLPKE.2010.5587783","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587783","url":null,"abstract":"This paper studies how to filtrate multiword units. We use normalized expectation (NE) to extract multiword unit candidates from patent corpus. Then the multiword unit candidates are filtrated using stop words, frequency, first stop words, last stop words, and contextual entropy. The experimental result shows that the precision rate of multiword units is improved by 8.7% after filtration.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"261 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131807931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Needs and challenges of care robots in nursing care setting: A literature review","authors":"Yuko Nagai, T. Tanioka, Shoko Fuji, Yuko Yasuhara, Sakiko Sakamaki, Narimi Taoka, R. Locsin, Fuji Ren, Kazuyuki Matsumoto","doi":"10.1109/NLPKE.2010.5587815","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587815","url":null,"abstract":"This study aims to identify needs and challenges of care robot in nursing care setting through an extensive search of the literature. As the result shows, there exists a shortage of information about results of the introduction of care robots, the needs of recipients and care providers, and relevant ethical problems. To advance our research and to introduce care robots into setting, there are so many things to do; consider the application of natural language processing technology by collaborating with researchers in the robotics field, carry out an investigation, extract the needs, clarify ethical problems and seek solutions, conduct the on-site experiment study, and so on.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132952033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new cascade algorithm based on CRFs for recognizing Chinese verb-object collocation","authors":"Guiping Zhang, Zhichao Liu, Qiaoli Zhou, Dongfeng Cai, Jiao Cheng","doi":"10.1109/NLPKE.2010.5587828","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587828","url":null,"abstract":"This paper proposes a new cascade algorithm based on conditional random fields. The algorithm is applied to automatic recognition of Chinese verb-object collocation, and combined with a new sequence labeling of “ONIY”. Experiments compare identified results under two segmentations and part-of-speech tag sets. The comprehensive experimental results show that the best performance is 90.65 % in F-score over Tsinghua Treebank, and 82.00 % in F-score over the segmentation and part-of-speech tagging scheme of Peking University. Our experiments show that the proposed algorithm can greatly improve recognition accuracy of multi-nested collocation, and play a positive role on long distance collocation.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114551334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Negation disambiguation using the maximum entropy model","authors":"Chunliang Zhang, Xiaoxu Fei, Jingbo Zhu","doi":"10.1109/NLPKE.2010.5587857","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587857","url":null,"abstract":"Handling negation issue is of great significance for sentiment analysis. Most previous studies adopted a simple heuristic rule for sentiment negation disambiguation within a fixed context window. In this paper we present a supervised method to disambiguate which sentiment word is attached to the negator such as “(not)” in an opinionated sentence. Experimental results show that our method can achieve better performance than traditional methods.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117237956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed training for Conditional Random Fields","authors":"Xiaojun Lin, Liang Zhao, Dianhai Yu, Xihong Wu","doi":"10.1109/NLPKE.2010.5587803","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587803","url":null,"abstract":"This paper proposes a novel distributed training method of Conditional Random Fields (CRFs) by utilizing the clusters built from commodity computers. The method employs Message Passing Interface (MPI) to deal with large-scale data in two steps. Firstly, the entire training data is divided into several small pieces, each of which can be handled by one node. Secondly, instead of adopting a root node to collect all features, a new criterion is used to split the whole feature set into non-overlapping subsets and ensure that each node maintains the global information of one feature subset. Experiments are carried out on the task of Chinese word segmentation (WS) with large scale data, and we observed significant reduction on both training time and space, while preserving the performance.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123421571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}