A Corpus-based evaluation of lexical components of a domain-specific text to Knowledge Mapping prototype

2008 11th International Conference on Computer and Information Technology. Pub Date: 2008-12-01. DOI: 10.1109/ICCITECHN.2008.4803005

R. Shams, A. Elsayed


Citations: 6

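Abstract

The aim of this paper is to evaluate the lexical components of a text to knowledge mapping (TKM) prototype. The prototype is domain-specific: its purpose is to map instructional text onto a knowledge domain. The knowledge domain of the prototype is physics, specifically DC electrical circuits. During development, the prototype was tested with a limited data set from the domain. It has now reached a stage where it needs to be evaluated with a representative linguistic data set, called a corpus. A corpus is a collection of text drawn from typical sources that can be used as a test data set to evaluate NLP systems. As no corpus is available for the domain, we developed a representative corpus and annotated it with linguistic information. The evaluation considers one of the prototype's two main components, the lexical knowledge base. Using the corpus, the evaluation enriches lexical knowledge resources such as vocabulary and grammar structure. This enables the prototype to parse a reasonable number of sentences in the corpus.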
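The abstract does not give the evaluation procedure, so the following is only a minimal sketch of one way a corpus can be used to check and enrich a vocabulary. It assumes a plain-text corpus and a lexicon stored as a set of words; the function names, the toy sentences, and the toy lexicon are all hypothetical and are not taken from the paper. It covers vocabulary only, not the grammar structures the abstract also mentions.

```python
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def lexical_coverage(corpus, lexicon):
    """Return (token coverage, sentence coverage, out-of-vocabulary counts).

    Token coverage is the share of tokens found in the lexicon.
    Sentence coverage is the share of sentences with no unknown token.
    """
    total_tokens = 0
    known_tokens = 0
    covered_sentences = 0
    oov = Counter()
    for sentence in corpus:
        tokens = tokenize(sentence)
        unknown = [t for t in tokens if t not in lexicon]
        total_tokens += len(tokens)
        known_tokens += len(tokens) - len(unknown)
        if tokens and not unknown:
            covered_sentences += 1
        oov.update(unknown)
    token_cov = known_tokens / total_tokens if total_tokens else 0.0
    sent_cov = covered_sentences / len(corpus) if corpus else 0.0
    return token_cov, sent_cov, oov


# Hypothetical toy corpus and lexicon for the DC-circuit domain.
corpus = [
    "A battery supplies current to the circuit.",
    "The resistor limits the current.",
]
lexicon = {"a", "the", "battery", "current", "circuit", "resistor", "to"}

# Measure coverage before enrichment.
token_cov, sent_cov, oov = lexical_coverage(corpus, lexicon)

# Enrich the lexicon with the unknown words, then measure again.
enriched = lexicon | set(oov)
token_cov_after, sent_cov_after, oov_after = lexical_coverage(corpus, enriched)
```

In practice the out-of-vocabulary words would be reviewed and annotated before being added to the lexicon; adding them wholesale, as done here, is a simplification for illustration.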


