Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010): Latest Papers

Event-event relation identification: A CRF based approach
A. Kolya, Asif Ekbal, Sivaji Bandyopadhyay
{"title":"Event-event relation identification: A CRF based approach","authors":"A. Kolya, Asif Ekbal, Sivaji Bandyopadhyay","doi":"10.1109/NLPKE.2010.5587774","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587774","url":null,"abstract":"Temporal information extraction is a popular and interesting research field in the area of Natural Language Processing (NLP). The main tasks involve the identification of event-time, event-document creation time and event-event relations in a text. In this paper, we take up Task C that involves identification of relations between the events in adjacent sentences under the TimeML framework. We use a supervised machine learning technique, namely Conditional Random Field (CRF). Initially, a baseline system is developed by considering the most frequent temporal relation in the task's training data. For CRF, we consider only those features that are already available in the TempEval-2007 training set. Evaluation results on the Task C test set yield precision, recall and F-score values of 55.1%, 55.1% and 55.1%, respectively under the strict evaluation scheme and 56.9%, 56.9% and 56.9%, respectively under the relaxed evaluation scheme. Results also show that the proposed system performs better than the baseline system.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127400745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
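The abstract above mentions a baseline that always predicts the most frequent temporal relation in the training data. A minimal sketch of such a baseline follows; it is not the authors' code, and the label lists are invented toy data. It also illustrates one plausible reason the reported precision, recall and F-score coincide: when a system labels every instance, the number of attempted and gold instances is the same, so the three measures are equal.

```python
from collections import Counter

def most_frequent_label(train_labels):
    """Return the relation label that occurs most often in the training data."""
    return Counter(train_labels).most_common(1)[0][0]

def precision_recall_f1(gold, predicted):
    """Score a system that may leave some instances unlabeled (None)."""
    correct = sum(1 for g, p in zip(gold, predicted) if p is not None and g == p)
    attempted = sum(1 for p in predicted if p is not None)
    precision = correct / attempted if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Toy data (hypothetical, not taken from the TempEval-2007 corpus).
train_labels = ["OVERLAP", "BEFORE", "OVERLAP", "AFTER", "OVERLAP"]
gold = ["OVERLAP", "BEFORE", "OVERLAP", "AFTER"]

baseline = most_frequent_label(train_labels)
predicted = [baseline] * len(gold)  # the baseline labels every instance
scores = precision_recall_f1(gold, predicted)
print(baseline, scores)
```

Because the baseline never abstains, `attempted` equals `len(gold)` and the three scores are identical by construction.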
iSentenizer: An incremental sentence boundary classifier
F. Wong, S. Chao
{"title":"iSentenizer: An incremental sentence boundary classifier","authors":"F. Wong, S. Chao","doi":"10.1109/NLPKE.2010.5587856","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587856","url":null,"abstract":"In this paper, we revisited the topic of sentence boundary detection, and proposed an incremental approach to tackle the problem. The boundary classifier is revised on the fly to adapt to the text of high variety of sources and genres. We applied i+Learning, an incremental algorithm, for constructing the sentence boundary detection model using different features based on local context. Although the model can be easily trained on any genre of text and on any alphabet language, we emphasize the ability that the classifier is adaptable to text with domain and topic shifts without retraining the whole model from scratch. Empirical results indicate that the performance of proposed system is comparable to that of similar systems.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124881840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
The impact of parsing accuracy on syntax-based SMT
Haotong Zhang, Huizhen Wang, Tong Xiao, Jingbo Zhu
{"title":"The impact of parsing accuracy on syntax-based SMT","authors":"Haotong Zhang, Huizhen Wang, Tong Xiao, Jingbo Zhu","doi":"10.1109/NLPKE.2010.5587845","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587845","url":null,"abstract":"In statistical machine translation (SMT), syntax-based models generally rely on the syntactic information provided by syntactic parsers in source language, target language or both of them. However, whether or how parsers impact the performance of syntax-based systems is still an open issue in the MT field. In this paper, we make an attempt to explore answers to this issue, and empirically investigate the impact of parsing accuracy on MT performance in a state-of-the-art syntax-based system. Our study shows that syntax-based system is not very sensitive to the parsing accuracy of parsers used in building MT systems.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123543037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Detecting duplicates with shallow and parser-based methods
Sven Hartrumpf, Tim vor der Brück, Christian Eichhorn
{"title":"Detecting duplicates with shallow and parser-based methods","authors":"Sven Hartrumpf, Tim vor der Brück, Christian Eichhorn","doi":"10.1109/NLPKE.2010.5587838","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587838","url":null,"abstract":"Identifying duplicate texts is important in many areas like plagiarism detection, information retrieval, text summarization, and question answering. Current approaches are mostly surface-oriented (or use only shallow syntactic representations) and see each text only as a token list. In this work however, we describe a deep, semantically oriented method based on semantic networks which are derived by a syntactico-semantic parser. Semantically identical or similar semantic networks for each sentence of a given base text are efficiently retrieved by using a specialized semantic network index. In order to detect many kinds of paraphrases the current base semantic network is varied by applying inferences: lexico-semantic relations, relation axioms, and meaning postulates. Some important phenomena occurring in difficult-to-detect duplicates are discussed. The deep approach profits from background knowledge, whose acquisition from corpora like Wikipedia is explained briefly. This deep duplicate recognizer is combined with two shallow duplicate recognizers in order to guarantee high recall for texts which are not fully parsable. The evaluation shows that the combined approach preserves recall and increases precision considerably, in comparison to traditional shallow methods. For the evaluation, a standard corpus of German plagiarisms was extended by four diverse components with an emphasis on duplicates (and not just plagiarisms), e.g., news feed articles from different web sources and two translations of the same short story.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125553538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
A method of mining bilingual resources from Web Based on Maximum Frequent Sequential Pattern
Guiping Zhang, Yang Luo, D. Ji
{"title":"A method of mining bilingual resources from Web Based on Maximum Frequent Sequential Pattern","authors":"Guiping Zhang, Yang Luo, D. Ji","doi":"10.1109/NLPKE.2010.5587831","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587831","url":null,"abstract":"Bilingual resources are indispensable and vital resources in NLP fields such as machine translation. A large amount of electronic information is embedded in the Internet, which can be used as a potential information source of large-scale multi-language corpus, so it is a potential and feasible way to mine a great capacity of true bilingual resources from the Web. This paper proposes a method of mining bilingual resources from the Web based on Maximum Frequent Sequential Pattern. The method uses a heuristic approach to search and filter the candidate bilingual web pages, then mines patterns using maximum frequent sequential patterns, and uses a machine learning method for extending the pattern base and verifying bilingual resources in accordance with the Japanese to Chinese word proportion. The experimental results indicate that the method could extract bilingual resources efficiently, with the precision rate over 90%.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131784929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A novel Chinese-English on translation method using mix-language web pages
Feiliang Ren, Jingbo Zhu, Huizhen Wang
{"title":"A novel Chinese-English on translation method using mix-language web pages","authors":"Feiliang Ren, Jingbo Zhu, Huizhen Wang","doi":"10.1109/NLPKE.2010.5587832","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587832","url":null,"abstract":"In this paper, we propose a novel Chinese-English organization name translation method with the assistance of mix-language web resources. Firstly, all the implicit out-of-vocabulary terms in the input Chinese organization name are recognized by a CRFs model. Then the input Chinese organization name is translated without considering these recognized out-of-vocabulary terms. Secondly, we construct some efficient queries to find the mix-language web pages that contain both the original input organization name and its correct translation. At last, a similarity matching and limited expansion based translation identification approach is proposed to identify the correct translation from the returned web pages. Experimental results show that our method is effective for Chinese organization name translation and can improve performance of Chinese organization name translation significantly.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132852500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Optimizations for item-based Collaborative Filtering algorithm
Shuang Xia, Yang Zhao, Yong Zhang, Chunxiao Xing, Scott Roepnack, Shihong Huang
{"title":"Optimizations for item-based Collaborative Filtering algorithm","authors":"Shuang Xia, Yang Zhao, Yong Zhang, Chunxiao Xing, Scott Roepnack, Shihong Huang","doi":"10.1109/NLPKE.2010.5587833","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587833","url":null,"abstract":"Collaborative Filtering (CF) is widely used in the Internet for recommender systems to find items that fit users' interest by exploring users' opinion expressed on other items. However there are two challenges for CF algorithm, which are recommendation accuracy and data sparsity. In this paper, we try to address the accuracy problem with an approach of deviation adjustment in item-based CF. Its main idea is to add a constant value to every prediction on each user or each item to modify the uniform error between prediction and actual rating of one user or one item. Our deviation adjustment approach can be also used in other kinds of CF algorithms. For data sparsity, we improve similarity computation by filling some blank rating with a user's average rating to help decrease the sparsity of data. We run experiments with our optimization of similarity computation and deviation adjustment by using MovieLens data set. The result shows these methods can generate better prediction compared with the baseline CF algorithm.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133656657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
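The abstract above describes two ideas: filling blank ratings with a user's average rating, and adding a constant to each prediction to correct a uniform error for one user or item. The sketch below illustrates both under stated assumptions. The abstract does not say how the constant is computed, so the mean signed error used here is an assumption, and all function names and ratings are hypothetical.

```python
def user_mean(ratings):
    """Mean of a user's known ratings, given as a dict of item -> rating."""
    return sum(ratings.values()) / len(ratings)

def fill_with_user_mean(ratings, all_items):
    """Replace each missing rating with the user's own average rating."""
    mean = user_mean(ratings)
    return {item: ratings.get(item, mean) for item in all_items}

def deviation_offset(actual, predicted):
    """Assumed constant: mean signed error over items with both values."""
    errors = [actual[i] - predicted[i] for i in actual if i in predicted]
    return sum(errors) / len(errors) if errors else 0.0

def adjust(prediction, offset):
    """Shift a raw prediction by the per-user (or per-item) constant."""
    return prediction + offset

# Toy ratings for one user (hypothetical values).
known = {"a": 4.0, "b": 2.0}
filled = fill_with_user_mean(known, ["a", "b", "c"])

# Raw predictions from some item-based CF model (hypothetical values).
raw = {"a": 3.5, "b": 1.5}
offset = deviation_offset(known, raw)
print(filled, offset, adjust(3.0, offset))
```

Here the model under-predicts this user by 0.5 on both known items, so every new prediction for the user is shifted up by the same amount.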
A morphology-based Chinese word segmentation method
Xiaojun Lin, Liang Zhao, Meng Zhang, Xihong Wu
{"title":"A morphology-based Chinese word segmentation method","authors":"Xiaojun Lin, Liang Zhao, Meng Zhang, Xihong Wu","doi":"10.1109/NLPKE.2010.5587786","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587786","url":null,"abstract":"This paper proposes a novel method of Chinese word segmentation utilizing morphology information. The method introduces morphology into a statistical model to capture structural relationships within words. It improves the conventional Conditional Random Fields (CRFs) models on the ability of representing the structure information. Firstly, a word-segmented Chinese corpus is annotated with morphology tags by a semi-automatic method. The resulting structure-related tags are integrated into the CRFs model. Secondly, a joint CRFs model is trained, which generates both morphology tags and word boundaries. Experiments are carried out on several SIGHAN Bakeoff corpora and show that the morphology information can improve the performance of Chinese word segmentation significantly, especially for the segmentation of out-of-vocabulary words.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132122277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Feature selection for Chinese Text Categorization based on improved particle swarm optimization
Yaohong Jin, Wen Xiong, Cong Wang
{"title":"Feature selection for Chinese Text Categorization based on improved particle swarm optimization","authors":"Yaohong Jin, Wen Xiong, Cong Wang","doi":"10.1109/NLPKE.2010.5587844","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587844","url":null,"abstract":"Feature selection is an important preprocessing step of Chinese Text Categorization, which reduces the high dimension and keeps the reduced results comprehensible compared to feature extraction. A novel criterion to filter features coarsely is proposed, which integrates the superiorities of term frequency-inverse document frequency as the inner-class measure and CHI-square as the inter-class measure. A new feature selection method for Chinese text categorization based on swarm intelligence is also presented, which uses improved particle swarm optimization to finely select features from the results of coarse-grained filtering, utilizes a support vector machine to evaluate feature subsets, and takes the evaluations as the fitness of particles. The experiments on Fudan University Chinese Text Classification Corpus show a higher classification accuracy obtained by using the new criterion for features filtering and an effective feature reduction ratio attained by utilizing the novel FS method for Chinese text categorization.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128081712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
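The abstract above uses CHI-square as its inter-class measure. For reference, the sketch below computes the standard chi-square statistic between a term and a class from a 2x2 document-count table. It shows only this textbook component; how the paper combines it with TF-IDF is not stated in the abstract and is not reproduced here. The parameter names and counts are illustrative.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a term and a class.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term or class has no variation
    return n * (a * d - c * b) ** 2 / denom

# Toy counts (hypothetical): a term that perfectly marks the class,
# and a term that is spread evenly across both classes.
perfect = chi_square(10, 0, 0, 10)
independent = chi_square(5, 5, 5, 5)
print(perfect, independent)
```

A higher value indicates a stronger dependence between the term and the class, which is why the statistic is commonly used to rank candidate features before a finer selection step.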
Boosting performance of gene mention tagging system by classifiers ensemble
Lishuang Li, Jing Sun, Degen Huang
{"title":"Boosting performance of gene mention tagging system by classifiers ensemble","authors":"Lishuang Li, Jing Sun, Degen Huang","doi":"10.1109/NLPKE.2010.5587822","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587822","url":null,"abstract":"To further improve the tagging performance of single classifiers, a classifiers ensemble experimental framework is presented for gene mention tagging. In the framework, six classifiers are constructed by four toolkits (CRF++, YamCha, Maximum Entropy (ME) and MALLET) with different training methods and feature sets and then combined with a two-layer stacking algorithm. The recognition results of different classifiers are regarded as input feature vectors to be incorporated, and then a high-powered model is obtained. Experiments carried out on the corpus of BioCreative II GM task show that the classifiers ensemble method is effective and our best combination method achieves an F-score of 88.09%, which outperforms most of the top-ranked Bio-NER systems in the BioCreAtIvE II GM challenge.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121128249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
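The abstract above describes stacking: the outputs of several base classifiers become the input features of a second-layer model. The sketch below is a deliberately small illustration of that idea, not the authors' system. The second layer here is a simple lookup table from tuples of base tags to the most frequent gold tag, which is an assumption made to keep the example dependency-free; the tags and data are invented.

```python
from collections import Counter, defaultdict

def train_meta(base_outputs, gold):
    """Learn the most frequent gold tag for each tuple of base-classifier tags."""
    table = defaultdict(Counter)
    for outputs, tag in zip(base_outputs, gold):
        table[tuple(outputs)][tag] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in table.items()}

def predict_meta(meta, outputs):
    """Use the learned table; fall back to a majority vote for unseen tuples."""
    key = tuple(outputs)
    if key in meta:
        return meta[key]
    return Counter(outputs).most_common(1)[0][0]

# Toy token-level tags from three hypothetical base taggers (B = gene, O = other).
base_outputs = [("B", "B", "O"), ("O", "O", "O"), ("B", "O", "O"), ("B", "O", "O")]
gold = ["B", "O", "B", "B"]

meta = train_meta(base_outputs, gold)
seen = predict_meta(meta, ("B", "O", "O"))    # learned: trust the first tagger here
unseen = predict_meta(meta, ("O", "B", "B"))  # unseen tuple: majority vote
print(seen, unseen)
```

Note that a plain majority vote would label `("B", "O", "O")` as `O`, whereas the second layer learns from the training data that this pattern usually means `B`. In practice the base outputs used to train the second layer are normally produced on held-out folds so that the meta-model does not overfit to base classifiers that have already seen the gold tags.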