Sung-Geun Han, Jeong-Han Son, Jae-Woo Chang, Z. Zhoo
Title: Design and implementation of a structured information retrieval system for SGML documents
Published in: Proceedings. 6th International Conference on Advanced Systems for Advanced Applications, 1999-04-19
DOI: 10.1109/DASFAA.1999.765739
Citations: 10
Abstract
SGML has become very popular as the source language of structured documents because its usefulness has already been verified by the CALS (Commerce At Light Speed) project. Traditional information retrieval systems (IRSs), however, cannot support retrieval based on the logical structure of SGML documents. In this paper, we design a structured IRS (SIRS) supporting both content-based retrieval and structure-based retrieval, and implement it using the O2 Store storage system with the standard C language under the UNIX operating system environment. In order to make it easy to write a user query for SIRS and to obtain SGML documents that are relevant to the query, we also implement a WWW-based user interface by using CGI. Finally, we evaluate the performance of SIRS in terms of the record insertion time, deletion time, retrieval time and storage overhead.
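The difference between content-based and structure-based retrieval can be illustrated with a minimal sketch. This is not the SIRS implementation (which the abstract says is written in C on O2 Store); the `Element` class, the function names, and the sample document are all hypothetical and exist only to show that a structural query restricts keyword matches to a given logical element.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One node of a toy document tree (a stand-in for a parsed SGML element)."""
    tag: str
    text: str = ""
    children: list = field(default_factory=list)


def walk(element, path=()):
    """Yield (path, element) pairs for the element and all of its descendants."""
    here = path + (element.tag,)
    yield here, element
    for child in element.children:
        yield from walk(child, here)


def content_search(doc, keyword):
    """Content-based retrieval: elements whose own text contains the keyword."""
    kw = keyword.lower()
    return [(p, e) for p, e in walk(doc) if kw in e.text.lower()]


def structure_search(doc, keyword, tag):
    """Structure-based retrieval: keyword matches limited to elements with this tag."""
    return [(p, e) for p, e in content_search(doc, keyword) if e.tag == tag]


# Hypothetical sample document, invented for illustration only.
doc = Element("article", children=[
    Element("title", "Structured retrieval for SGML"),
    Element("section", children=[
        Element("title", "Storage overhead"),
        Element("para", "Retrieval time depends on the index."),
    ]),
])

hits_anywhere = content_search(doc, "retrieval")
hits_in_titles = structure_search(doc, "retrieval", "title")
```

Here `hits_anywhere` matches both the article title and the paragraph, while `hits_in_titles` keeps only the match inside a `title` element. A traditional IRS offers only the first kind of query.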