Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010): Latest Articles, Page 4

Event-event relation identification: A CRF based approach

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010) Pub Date : 2010-09-30 DOI: 10.1109/NLPKE.2010.5587774

A. Kolya, Asif Ekbal, Sivaji Bandyopadhyay

Citations: 8

iSentenizer: An incremental sentence boundary classifier

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010) Pub Date : 2010-09-30 DOI: 10.1109/NLPKE.2010.5587856

F. Wong, S. Chao

Citations: 9

The impact of parsing accuracy on syntax-based SMT

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010) Pub Date : 2010-09-30 DOI: 10.1109/NLPKE.2010.5587845

Haotong Zhang, Huizhen Wang, Tong Xiao, Jingbo Zhu

Citations: 3

Detecting duplicates with shallow and parser-based methods

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010) Pub Date : 2010-09-30 DOI: 10.1109/NLPKE.2010.5587838

Sven Hartrumpf, Tim vor der Brück, Christian Eichhorn

Abstract: Identifying duplicate texts is important in many areas like plagiarism detection, information retrieval, text summarization, and question answering. Current approaches are mostly surface-oriented (or use only shallow syntactic representations) and see each text only as a token list. In this work however, we describe a deep, semantically oriented method based on semantic networks which are derived by a syntactico-semantic parser. Semantically identical or similar semantic networks for each sentence of a given base text are efficiently retrieved by using a specialized semantic network index. In order to detect many kinds of paraphrases the current base semantic network is varied by applying inferences: lexico-semantic relations, relation axioms, and meaning postulates. Some important phenomena occurring in difficult-to-detect duplicates are discussed. The deep approach profits from background knowledge, whose acquisition from corpora like Wikipedia is explained briefly. This deep duplicate recognizer is combined with two shallow duplicate recognizers in order to guarantee high recall for texts which are not fully parsable. The evaluation shows that the combined approach preserves recall and increases precision considerably, in comparison to traditional shallow methods. For the evaluation, a standard corpus of German plagiarisms was extended by four diverse components with an emphasis on duplicates (and not just plagiarisms), e.g., news feed articles from different web sources and two translations of the same short story.

Citations: 4

A method of mining bilingual resources from Web Based on Maximum Frequent Sequential Pattern

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010) Pub Date : 2010-09-30 DOI: 10.1109/NLPKE.2010.5587831

Guiping Zhang, Yang Luo, D. Ji

Citations: 0

A novel Chinese-English on translation method using mix-language web pages

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010) Pub Date : 2010-09-30 DOI: 10.1109/NLPKE.2010.5587832

Feiliang Ren, Jingbo Zhu, Huizhen Wang

Citations: 0

Optimizations for item-based Collaborative Filtering algorithm

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010) Pub Date : 2010-09-30 DOI: 10.1109/NLPKE.2010.5587833

Shuang Xia, Yang Zhao, Yong Zhang, Chunxiao Xing, Scott Roepnack, Shihong Huang

Citations: 8

A morphology-based Chinese word segmentation method

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010) Pub Date : 2010-09-30 DOI: 10.1109/NLPKE.2010.5587786

Xiaojun Lin, Liang Zhao, Meng Zhang, Xihong Wu

Citations: 1

Feature selection for Chinese Text Categorization based on improved particle swarm optimization

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010) Pub Date : 2010-09-30 DOI: 10.1109/NLPKE.2010.5587844

Yaohong Jin, Wen Xiong, Cong Wang

Citations: 20

Boosting performance of gene mention tagging system by classifiers ensemble

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010) Pub Date : 2010-09-30 DOI: 10.1109/NLPKE.2010.5587822

Lishuang Li, Jing Sun, Degen Huang

Citations: 5