{"title":"Are we waves or are we particles? A new insight into deep semantics in natural language processing","authors":"Svetlana Machova, J. Klecková","doi":"10.1109/NLPKE.2010.5587805","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587805","url":null,"abstract":"This paper brings conceptually new, empirically based scientific approach to a deeper understanding of human mind cognition, language acquisition, modularity of language and language origin itself. The research presented provides an interactive multilingual associative experiment as an attempt to map the Cognitive Semantic Space: (CSSES) and its basic frames of the Essential Self in the Czech language, collects and compares it to the CSSES of conceptual language view in Czech, Russian, English and potentially in other languages. We attempt to merge cognitive metaphor theory with psycholinguistics and psychoanalysis applying associative experiment methodology on the Essential Self metaphors. The research has two main goals: the first is to build an Essential Self multilingual WordNet, which serves as the basic lexical resource for Artificial Intelligence describes the core of the human nature. The second is to create a multilingual 3D semantic network.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129526111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Shui nationality characters stroke shape input method","authors":"Hanyue Yang, Xiaorong Chen","doi":"10.1109/NLPKE.2010.5587840","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587840","url":null,"abstract":"Shape of Shui nationality characters is similar to that of Oracle and Jinwen. In order to work out the problems of how to code hieroglyph, a coding method based on stroke shape for Shui Nationality characters is proposed. The shapes of 467 Shui Nationality characters in the Common Shui Script Dictionary are analyzed, and seven basic strokes are extracted to consist of main Shui characters. Through the statistical comparison, 21 kinds of stroke shape can be got by subdividing the seven basic strokes. A Shui Nationality character is coded by an ordered sequence composed by three strokes taken from the corner of the character according to the coding rules. Finally, the users who can not read the Shui character can input it easily and quickly.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130918935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chinese patent retrieval based on the pragmatic information","authors":"Liping Wu, Song Liu, F. Ren","doi":"10.1109/NLPKE.2010.5587776","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587776","url":null,"abstract":"In this paper, we propose a novel information retrieval approach based on the pragmatic information for Chinese patents. At present, patent retrieval is becoming more and more important. Not only because patents are always can an important resource in all kinds of field, but patent retrieval save a great deal of time and funds for corporations and researchers. However, with available methods the precision of retrieval results for patents is not very high. What's more, through analyzed the patent documentations we found that except the literal meanings, there are deeper meanings which can be concluded from the patents. Here we call the deeper meanings as pragmatic information. Therefore we established a patent retrieval system to integrate the pragmatic information with classical information retrieval technique to improve the retrieval accuracy. Some experiments using the proposed method have carried out, and the results show that the precision of patent retrieval based on the pragmatic information is higher than the one without using it.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125546479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Part-of-speech tagging for Chinese unknown words in a domain-specific small corpus using morphological and contextual rules","authors":"Tao-Hsing Chang, Fu-Yuan Hsu, Chia-Hoang Lee, Hahn-Ming Lee","doi":"10.1109/NLPKE.2010.5587771","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587771","url":null,"abstract":"Many studies have tried to search useful information on the Internet by meaningful terms or words. The performance of these approaches is often affected by the accuracy of unknown word extraction and POS tagging, while the accuracy is affected by the size of training corpora and the characteristics of language. This work proposes and develops a method that concentrates on tagging the POS of Chinese unknown words for the domain of our interest, based on the integration of morphological, contextual rules and a statistics-based method. Experimental results indicate that the proposed method can overcome the difficulties resulting from small corpora in oriental languages, and can accurately tags unknown words with POS in domain-specific small corpora.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125283724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical parsing based on Maximal Noun Phrase pre-processing","authors":"Qiaoli Zhou, Yue Gu, Xin Liu, Wenjing Lang, Dongfeng Cai","doi":"10.1109/NLPKE.2010.5587850","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587850","url":null,"abstract":"According to the characteristics of Chinese language, this paper proposes a statistical parsing method based on Maximal Noun Phrase(MNP) per-processing. MNP parsing is preferable to be separated from parsing of the full sentence. Firstly, MNP in a sentence are identified; next, MNP can be represented by the head of MNP, and then the sentence is parsed with the head of the MNP. Therefore, the original sentence is divided into two parts, which can be parsed separately. The first part is MNP parsing; the second part is parsing of the sentence in which the MNP are replaced by their head words. Finally, the paper takes Conditional Random Fields (CRFs) as the statistical recognition model of each level in syntactic parsing process.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127018013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A reranking method for syntactic parsing with heterogeneous treebanks","authors":"Haibo Ding, Muhua Zhu, Jingbo Zhu","doi":"10.1109/NLPKE.2010.5587842","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587842","url":null,"abstract":"In the field of natural language processing (NLP), there often exist multiple corpora with different annotation standards for the same task. In this paper, we take syntactic parsing as a case study and propose a reranking method which is able to make direct use of disparate treebanks simultaneously without using techniques such as treebank conversion. The method proceeds in three steps: 1) build parsers on individual treebanks; 2) use parsers independently to generate n-best lists for each sentence in test set; 3) rerank individual n-best lists which correspond to the same sentence by using consensus information exchanged among these n-best lists. Experimental results on two open Chinese treebanks show that our method significantly outperforms the baseline system by 0.84% and 0.53% respectively.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123574465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flexible English writing support based on negative-positive conversion method","authors":"Yasushi Katsura, Kazuyuki Matsumoto, F. Ren","doi":"10.1109/NLPKE.2010.5587778","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587778","url":null,"abstract":"With development of the recent globalization, the chance to exchange in English increased in the business field. In particular, it's necessary to write a thesis and a charter handwriting in English. Because many Japanese are not used to making English sentence, it is a great burden to write appropriate sentence in English without any support for creating English sentence. In this study we have developed an English composition support system. By this system, it's to search for the interlinear translation example to refer to by database and generate a new sentence by replacing a noun in the example sentence. In this paper, based on the technique of Super-Function, we propose a method to convert an affirmative sentence into negative sentence and vice versa to realize more flexible and extensive text conversion.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121698126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Context-based term identification and extraction for ontology construction","authors":"Hui-Ngo Goh, Ching Kiu","doi":"10.1109/NLPKE.2010.5587801","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587801","url":null,"abstract":"Ontology construction often requires a domain specific corpus in conceptualizing the domain knowledge; specifically, it is an association of terms, relation between terms and related instances. It is a vital task to identify a list of significant term for constructing a practical ontology. In this paper, we present the use of a context-based term identification and extraction methodology for ontology construction from text document. The methodology is using a taxonomy and Wikipedia to support automatic term identification and extraction from structured documents with an assumption of candidate terms for a topic are often associated with its topic-specific keywords. A hierarchical relationship of super-topics and sub-topics is defined by a taxonomy, meanwhile, Wikipedia is used to provide context and background knowledge for topics that defined in the taxonomy to guide the term identification and extraction. The experimental results have shown the context-based term identification and extraction methodology is viable in defining topic concepts and its sub-concepts for constructing ontology. 
The experimental results have also proven its viability to be applied in a small corpus / text size environment in supporting ontology construction.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126280720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A pragmatic model for new Chinese word extraction","authors":"Haijun Zhang, Heyan Huang, Chao-Yong Zhu, Shumin Shi","doi":"10.1109/NLPKE.2010.5587846","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587846","url":null,"abstract":"This paper proposed a pragmatic model for repeat-based Chinese New Word Extraction (NWE). It contains two innovations. The first is a formal description for the process of NWE, which gives instructions on feature selection in theory. On the basis of this, the Conditional Random Fields model (CRF) is selected as statistical framework to solve the formal description. The second is an improved algorithm for left (right) entropy to improve the efficiency of NWE. By comparing with baseline algorithm, the improved algorithm can enhance the computational speed of entropy remarkably. On the whole, experiments show that the model this paper proposed is very effective, and the F score is 49.72% in open test and 69.83% in word extraction respectively, which is an evident improvement over previous similar works.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122616272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bagging to find better expansion words","authors":"Bingqing Wang, Yaqian Zhou, Xipeng Qiu, Qi Zhang, Xuanjing Huang","doi":"10.1109/NLPKE.2010.5587826","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587826","url":null,"abstract":"The supervised learning has been applied into the query expansion techniques, which trains a model to predict the “goodness” or “utility” of the expanded term to the retrieval system. There are many features to measure the relatedness between the expanded word and the query, which can be incorporated in the supervised learning to select the expanded terms. The training data set is generated automatically by a tricky method. However, this method can be affected by many aspects. A severe problem is that the distribution of the features is query-dependent, which has not been discussed in previous work. With a different distribution on the features, it is questionable to merge these training instances together and use the whole data set to train one single model. In this paper, we first investigate the statistical distribution of the auto-generated training data and show the problems in the training data set. Based on our analysis, we proposed to use the bagging method to ensemble several regression models in order to get a better supervised model to make prediction on the expanded terms. We conducted the experiments on the TREC benchmark test collections. Our analysis on the training data reveals some interesting phenomena about the query expansion techniques. 
The experiment results also show that the bagging approach can achieve the state-of-art retrieval performance on the standard TREC data set.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125257444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}