{"title":"Evaluating and enhancing meta-search performance in digital libraries","authors":"Bethina Schmitt, Sven Oberländer","doi":"10.1109/WISE.2002.1181647","DOIUrl":"https://doi.org/10.1109/WISE.2002.1181647","url":null,"abstract":"Applying meta search systems is a suitable method for supporting the user if there are many different retrieval services available on the Web. Due to information splitting strategies of literature services existing meta search systems either provide minimal integration of results or slow response times. We present an approach that combines techniques of personalization and query processing in order to satisfy the user's demand for both fast and comprehensive results. In order to evaluate and compare different query processing strategies and additional influencing parameters we developed a simulation tool called SIMPSON. Thereby, we can observe the performance of query processing within the context of different response times of the underlying digital library services in the Web, with different kinds of user queries, and with different sizes of query results. To evaluate and compare the performance of different query processing and duplicate detection strategies we developed metrics, particularly with regard to user satisfaction. We present results from our first experiments with SIMPSON, focusing on duplicate detection, query specification, and Web server performance of the underlying digital library services.","PeriodicalId":392999,"journal":{"name":"Proceedings of the Third International Conference on Web Information Systems Engineering, 2002. WISE 2002.","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114253650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Path locks for XML document collaboration","authors":"Stijn Dekeyser, J. Hidders","doi":"10.1109/WISE.2002.1181648","DOIUrl":"https://doi.org/10.1109/WISE.2002.1181648","url":null,"abstract":"The hierarchical and semistructured nature of XML data can cause complicated update-behavior. The updates are not limited to entire document trees, but can involve subtrees and even individual elements. These document parts correspond to, e.g. sections in text documents or sub-diagrams in vector graphics files. Providing suitable locking mechanisms for semi-structured data can significantly improve collaboration systems that store their data as XML documents. We show that concurrency control mechanisms in CVS, relational, and object oriented database systems are inadequate for collaboration systems based on semistructured data. We therefore propose a new locking scheme of fine granularity based on path locks. We also show that our proposed mechanism avoids conflicts by ensuring serializability, supports both top-down and bottom-up query evaluation, and is relatively efficient.","PeriodicalId":392999,"journal":{"name":"Proceedings of the Third International Conference on Web Information Systems Engineering, 2002. WISE 2002.","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121598197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UTML: Unified Transaction Modeling Language","authors":"N. Gioldasis, S. Christodoulakis","doi":"10.1109/WISE.2002.1181649","DOIUrl":"https://doi.org/10.1109/WISE.2002.1181649","url":null,"abstract":"We propose UTML as a high level transaction modeling language to facilitate the complex Web transaction design process. Web transactions may be complex, composed of several sub-transactions and they may access resources with diverse behavior and interfaces like legacy systems and databases. They may also have complex semantics. Thus, transaction design methodologies and tools need to be very flexible, allowing for designing Web applications from scratch (top-down design), as well as using existing systems or services to compose new applications which offer added-value services (bottom-up design) to the user. UTML is based on a transaction meta-model which can describe, in a flexible and extensible manner, most of the known transaction models as well as new ones according to the application's requirements. It provides modeling for transactions that incorporate different behavioral patterns, and it is capable of describing activities with weaker transactional semantics that do not have all the ACID properties. Unlike other models, it can be used to synthesize new transactions from pre-existing transaction systems (like legacy systems), with diverse transactional semantics. UTML provides a rich notation to visualize the transaction design process. This notation has been built on top of UML using its extension mechanisms.","PeriodicalId":392999,"journal":{"name":"Proceedings of the Third International Conference on Web Information Systems Engineering, 2002. WISE 2002.","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127417312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cluster-based delta compression of a collection of files","authors":"Z. Ouyang, N. Memon, Torsten Suel, Dimitre Trendafilov","doi":"10.1109/WISE.2002.1181662","DOIUrl":"https://doi.org/10.1109/WISE.2002.1181662","url":null,"abstract":"Delta compression techniques are commonly used to succinctly represent an updated version of a file with respect to an earlier one. We study the use of delta compression in a somewhat different scenario, where we wish to compress a large collection of (more or less) related files by performing a sequence of pairwise delta compressions. The problem of finding an optimal delta encoding for a collection of files by taking pairwise deltas can be reduced to the problem of computing a branching of maximum weight in a weighted directed graph, but this solution is inefficient and thus does not scale to larger file collections. This motivates us to propose a framework for cluster-based delta compression that uses text clustering techniques to prune the graph of possible pairwise delta encodings. To demonstrate the efficacy of our approach, we present experimental results on collections of Web pages. Our experiments show that cluster-based delta compression of collections provides significant improvements in compression ratio as compared to individually compressing each file or using tar+gzip, at a moderate cost in efficiency.","PeriodicalId":392999,"journal":{"name":"Proceedings of the Third International Conference on Web Information Systems Engineering, 2002. WISE 2002.","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128129038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Web services and data integration","authors":"S. Abiteboul, O. Benjelloun, T. Milo","doi":"10.1109/WISE.2002.1181637","DOIUrl":"https://doi.org/10.1109/WISE.2002.1181637","url":null,"abstract":"The developments of XML [4] and Web services [3] are changing radically the art of data integration. We briefly describe Web services and consider some of their impact on data integration. We argue that XML and Web services provide the proper infrastructure for data integration at the Web scale. This is illustrated by some work going on at INRIA on Active XML, that is XML extended by allowing the embedding of calls to Web services.","PeriodicalId":392999,"journal":{"name":"Proceedings of the Third International Conference on Web Information Systems Engineering, 2002. WISE 2002.","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121504696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Storing and maintaining semistructured data efficiently in an object-relational database","authors":"Yuanying Mo, T. Ling","doi":"10.1109/WISE.2002.1181661","DOIUrl":"https://doi.org/10.1109/WISE.2002.1181661","url":null,"abstract":"We propose to use object-relational database management systems to store and manage semi-structured data. ORA-SS (Object-Relationship-Attribute model for Semi-Structured data) (Dobbie et al., 2000) is used as the data model. It not only reflects the nested structure of semi-structured data, but also distinguishes between object classes and relationship types, and between attributes of object classes and attributes of relationship types. ORA-SS can specify the degree of n-ary relationship types and indicate if an attribute is an attribute of a relationship type or an attribute of an object class. Existing semi-structured data models cannot specify such information. We use this information to translate XML Schemas/DTD to ORA-SS schemas, then to object-relational databases correctly and without avoidable redundancy. The existing techniques have a lot of redundancy in storage and introduce node IDs of the tree instance which are not needed in our approach.","PeriodicalId":392999,"journal":{"name":"Proceedings of the Third International Conference on Web Information Systems Engineering, 2002. WISE 2002.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130707310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BINGO!: bookmark-induced gathering of information","authors":"Sergej Sizov, M. Theobald, Stefan Siersdorfer, G. Weikum","doi":"10.1109/WISE.2002.1181668","DOIUrl":"https://doi.org/10.1109/WISE.2002.1181668","url":null,"abstract":"Focused (thematic) crawling is a relatively new, promising approach to improving the recall of expert search on the Web. It involves the automatic classification of visited documents into a user- or community-specific topic hierarchy (ontology). The quality of training data for the classifier is the most critical issue and a potential bottleneck for the effectivity and scale of a focused crawler. This paper presents the BINGO! approach to focused crawling that aims to overcome the limitations of initial training data. To this end, BINGO! identifies, among the crawled and positively classified documents of a topic, characteristic \"archetypes\" and uses them for periodically re-training the classifier; this way the crawler is dynamically adapted based on the most significant documents seen so far. Two kinds of archetypes are considered: good authorities as determined by employing Kleinberg's (1999) link analysis algorithm, and documents that have been automatically classified with high confidence using a linear SVM classifier. Our approach is fully implemented in the BINGO! system, and our experiments indicate that the dynamic enhancement of training data based on archetypes extends the \"knowledge base\" of the classifier by a substantial margin without loss of classification accuracy.","PeriodicalId":392999,"journal":{"name":"Proceedings of the Third International Conference on Web Information Systems Engineering, 2002. WISE 2002.","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129135545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data-rich section extraction from HTML pages","authors":"Jiying Wang, F. Lochovsky","doi":"10.1109/WISE.2002.1181667","DOIUrl":"https://doi.org/10.1109/WISE.2002.1181667","url":null,"abstract":"We propose a novel algorithm, DSE (data-rich subtree extraction) to recognize and extract the data-rich section of an HTML page. We apply the DSE algorithm as a pre-processing \"clean-up\" step for two typical Web information retrieval problems: topic distillation and Web information extraction. Our experiments show that, for the test data sets used, the DSE algorithm can correctly identify data-rich sections of HTML pages with 100% accuracy. Therefore, it can effectively reduce the root set size for the topic distillation problem thereby improving the precision and accuracy of the IETS algorithm. Furthermore, when applied to the Web information extraction problem using the IEPAD algorithm, it can decrease the number of patterns discovered by this algorithm, thus shortening its time cost to generalize a wrapper for HTML pages.","PeriodicalId":392999,"journal":{"name":"Proceedings of the Third International Conference on Web Information Systems Engineering, 2002. WISE 2002.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129332508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A RDF-based model for expressing spatio-temporal relations between Web sites","authors":"S. Buraga, Gabriel Ciobanu","doi":"10.1109/WISE.2002.1181671","DOIUrl":"https://doi.org/10.1109/WISE.2002.1181671","url":null,"abstract":"The paper proposes a high-level model to represent spatio-temporal relations between Web sites or fragments of Web sites in order to facilitate resource discovery, cataloging or describe information. This model is based on the Resource Description Framework (RDF) recommendation of the World-Wide Web Consortium, a general purpose technology that enables the description of resources on the Web. The representation of the spatio-temporal relationships is expressed by an XML-based language.","PeriodicalId":392999,"journal":{"name":"Proceedings of the Third International Conference on Web Information Systems Engineering, 2002. WISE 2002.","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121149628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating and selecting Web sources as external information resources of a data warehouse","authors":"Yan Zhu, A. Buchmann","doi":"10.1109/WISE.2002.1181652","DOIUrl":"https://doi.org/10.1109/WISE.2002.1181652","url":null,"abstract":"A company's local data is often insufficient for analyzing market trends and making reasonable business plans. Decision making must also be based on information from suppliers, partners and competitors. Systematically integrating suitable external data from the Web into a data warehouse is a meaningful solution and will benefit the enterprise. However, the autonomy and dynamics of the Web make the task of selecting relevant and qualified external data from the Web challenging. We develop a set of criteria for evaluating and selecting Web resources as external data sources of a data warehouse and discuss how to screen Web data sources using multi-criteria decision making (MCDM) methods. The final decision with respect to selecting Web sources is sensitive to critical factors, i.e., the criterion weight and performance score of alternatives in terms of each criterion. We analyzed the sensitivity of the final rank of alternatives in terms of critical factors in order to gain an insight into the stability of our final decision. The comparison of several MCDM approaches for Web source screening is also presented.","PeriodicalId":392999,"journal":{"name":"Proceedings of the Third International Conference on Web Information Systems Engineering, 2002. WISE 2002.","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122020937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}