X-tract: structure extraction from botanical textual descriptions

6th International Symposium on String Processing and Information Retrieval. 5th International Workshop on Groupware (Cat. No.PR00268). Pub Date: 1999-09-21. DOI: 10.1109/SPIRE.1999.796571

Rocío Abascal-Mena, J. A. Sánchez


Citations: 16

Abstract

Most information available today, whether in printed books or digital repositories, takes the form of free-format text. Retrieving information from these ever-growing repositories has become a challenge for information retrieval (IR) researchers. In some fields, such as botany and taxonomy, textual descriptions follow a set of rules and use a relatively limited vocabulary. This makes botanical textual descriptions an interesting area in which to explore IR techniques for finding structure and facilitating semantic analysis. This paper presents X-tract, a solution to the problem of text analysis and structure extraction in a specific application domain: floristic morphologic descriptions. The solution demonstrates the potential of using a grammar to determine information structure in a botanical digital library. We have developed a prototype based on this approach: given an HTML or plain-text document, X-tract analyzes it and presents the results to the user, who can verify the proposed structure before the database is updated. This transformation is also useful for storing morphologic descriptions in a database with a pre-established format. The solution is implemented in the context of the Floristic Digital Library (FDL), a large digital library project comprising a wide variety of botanical documents, formats and services.
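The abstract does not give X-tract's grammar or implementation, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes a toy grammar (`description := clause (';' clause)*`, `clause := ORGAN property (',' property)*`), a tiny hypothetical organ vocabulary, and hypothetical function names (`strip_html`, `parse_description`). None of these come from the paper. Clauses that do not match the grammar are returned separately, so that a user could review them before anything is stored.

```python
import re
from html import unescape

# Hypothetical organ vocabulary; a real lexicon would be far larger.
ORGANS = {"stem", "stems", "leaf", "leaves", "flower", "flowers", "fruit", "fruits"}

TAG_RE = re.compile(r"<[^>]+>")


def strip_html(text):
    """Drop tags and decode entities so HTML and plain text are handled alike."""
    return unescape(TAG_RE.sub(" ", text))


def parse_description(text):
    """Apply the toy grammar and return (structure, unparsed).

    structure maps each recognized organ to its list of properties;
    unparsed holds clauses the grammar could not match, for manual review.
    """
    structure = {}
    unparsed = []
    cleaned = strip_html(text).strip().rstrip(".")
    for clause in cleaned.split(";"):
        clause = clause.strip()
        if not clause:
            continue
        # The first word must be a known organ; the rest is a comma-separated
        # list of properties.
        parts = clause.split(None, 1)
        head = parts[0].lower()
        if head in ORGANS:
            rest = parts[1] if len(parts) > 1 else ""
            props = [p.strip() for p in rest.split(",") if p.strip()]
            structure.setdefault(head, []).extend(props)
        else:
            unparsed.append(clause)
    return structure, unparsed


if __name__ == "__main__":
    sample = "<p>Leaves alternate, ovate, 3-5 cm long; flowers white, solitary; habit unknown.</p>"
    structure, unparsed = parse_description(sample)
    print(structure)
    print(unparsed)
```

Keeping the unmatched clauses in a separate list, rather than discarding them, mirrors the verification step the abstract describes: the proposed structure and anything the grammar missed can both be shown to the user before a database update.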

