{"title":"一个基于语料库的评估领域特定文本的词法成分到知识映射原型","authors":"R. Shams, A. Elsayed","doi":"10.1109/ICCITECHN.2008.4803005","DOIUrl":null,"url":null,"abstract":"The aim of this paper is to evaluate the lexical components of a text to knowledge mapping (TKM) prototype. The prototype is domain-specific, the purpose of which is to map instructional text onto a knowledge domain. The context of the knowledge domain of the prototype is physics, specifically DC electrical circuits. During development, the prototype has been tested with a limited data set from the domain. The prototype now reached a stage where it needs to be evaluated with a representative linguistic data set called corpus. A corpus is a collection of text drawn from typical sources which can be used as a test data set to evaluate NLP systems. As there is no available corpus for the domain, we developed a representative corpus and annotated it with linguistic information. The evaluation of the prototype considers one of its two main components-lexical knowledge base. With the corpus, the evaluation enriches the lexical knowledge resources like vocabulary and grammar structure. This leads the prototype to parse a reasonable amount of sentences in the corpus.","PeriodicalId":335795,"journal":{"name":"2008 11th International Conference on Computer and Information Technology","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"A Corpus-based evaluation of lexical components of a domain-specific text to Knowledge Mapping prototype\",\"authors\":\"R. Shams, A. Elsayed\",\"doi\":\"10.1109/ICCITECHN.2008.4803005\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The aim of this paper is to evaluate the lexical components of a text to knowledge mapping (TKM) prototype. The prototype is domain-specific, the purpose of which is to map instructional text onto a knowledge domain. The context of the knowledge domain of the prototype is physics, specifically DC electrical circuits. During development, the prototype has been tested with a limited data set from the domain. The prototype now reached a stage where it needs to be evaluated with a representative linguistic data set called corpus. A corpus is a collection of text drawn from typical sources which can be used as a test data set to evaluate NLP systems. As there is no available corpus for the domain, we developed a representative corpus and annotated it with linguistic information. The evaluation of the prototype considers one of its two main components-lexical knowledge base. With the corpus, the evaluation enriches the lexical knowledge resources like vocabulary and grammar structure. This leads the prototype to parse a reasonable amount of sentences in the corpus.\",\"PeriodicalId\":335795,\"journal\":{\"name\":\"2008 11th International Conference on Computer and Information Technology\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2008-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2008 11th International Conference on Computer and Information Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCITECHN.2008.4803005\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 11th International Conference on Computer and Information Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCITECHN.2008.4803005","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Corpus-based evaluation of lexical components of a domain-specific text to Knowledge Mapping prototype
The aim of this paper is to evaluate the lexical components of a text to knowledge mapping (TKM) prototype. The prototype is domain-specific, the purpose of which is to map instructional text onto a knowledge domain. The context of the knowledge domain of the prototype is physics, specifically DC electrical circuits. During development, the prototype has been tested with a limited data set from the domain. The prototype now reached a stage where it needs to be evaluated with a representative linguistic data set called corpus. A corpus is a collection of text drawn from typical sources which can be used as a test data set to evaluate NLP systems. As there is no available corpus for the domain, we developed a representative corpus and annotated it with linguistic information. The evaluation of the prototype considers one of its two main components-lexical knowledge base. With the corpus, the evaluation enriches the lexical knowledge resources like vocabulary and grammar structure. This leads the prototype to parse a reasonable amount of sentences in the corpus.