{"title":"Downloading textual hidden web content through keyword queries","authors":"A. Ntoulas, P. Zerfos, Junghoo Cho","doi":"10.1145/1065385.1065407","DOIUrl":"https://doi.org/10.1145/1065385.1065407","url":null,"abstract":"An ever-increasing amount of information on the Web today is available only through search interfaces: the users have to type in a set of keywords in a search form in order to access the pages from certain Web sites. These pages are often referred to as the hidden Web or the deep Web. Since there are no static links to the hidden Web pages, search engines cannot discover and index such pages and thus do not return them in the results. However, according to recent studies, the content provided by many hidden Web sites is often of very high quality and can be extremely valuable to many users. In this paper, we study how we can build an effective hidden Web crawler that can autonomously discover and download pages from the hidden Web. Since the only \"entry point\" to a hidden Web site is a query interface, the main challenge that a hidden Web crawler has to face is how to automatically generate meaningful queries to issue to the site. We provide a theoretical framework to investigate the query generation problem for the hidden Web and we propose effective policies for generating queries automatically. Our policies proceed iteratively, issuing a different query in every iteration. We experimentally evaluate the effectiveness of these policies on 4 real hidden Web sites and our results are very promising. For instance, in one experiment, one of our policies downloaded more than 90% of a hidden Web site (that contains 14 million documents) after issuing fewer than 100 queries","PeriodicalId":248721,"journal":{"name":"Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129736370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What's there and what's not?: focused crawling for missing documents in digital libraries","authors":"Ziming Zhuang, R. Wagle, C. Lee Giles","doi":"10.1145/1065385.1065455","DOIUrl":"https://doi.org/10.1145/1065385.1065455","url":null,"abstract":"Some large scale topical digital libraries, such as CiteSeer, harvest online academic documents by crawling open-access archives, university and author homepages, and authors' self-submissions. While these approaches have so far built reasonable size libraries, they can suffer from having only a portion of the documents from specific publishing venues. We propose to use alternative online resources and techniques that maximally exploit other resources to build the complete document collection of any given publication venue. We investigate the feasibility of using publication metadata to guide the crawler towards authors' homepages to harvest what is missing from a digital library collection. We collect a real-world dataset from two Computer Science publishing venues, involving a total of 593 unique authors over a time frame of 1998 to 2004. We then identify the missing papers that are not indexed by CiteSeer. Using a fully automatic heuristic-based system that has the capability of locating authors' homepages and then using focused crawling to download the desired papers, we demonstrate that it is practical to harvest using a focused crawler academic papers that are missing from our digital library. Our harvester achieves a performance with an average recall level of 0.82 overall and 0.75 for those missing documents. Evaluation of the crawler's performance based on the harvest rate shows definite advantages over other crawling approaches and consistently outperforms a defined baseline crawler on a number of measures","PeriodicalId":248721,"journal":{"name":"Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122649659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Icon abacus: positional display of document attributes","authors":"E. Bier, Adam Perer","doi":"10.1145/1065385.1065452","DOIUrl":"https://doi.org/10.1145/1065385.1065452","url":null,"abstract":"This paper presents icon abacus, a space-efficient technique for displaying document attributes by automatic positioning of document icons. It displays the value of an attribute by using position on a single axis, allowing the other axis to display different metadata simultaneously. The layout is stable enough to support navigation using spatial memory","PeriodicalId":248721,"journal":{"name":"Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121445480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BEN collaborative poster","authors":"L. Akli, Cal T. Collins, Y. George","doi":"10.1145/1065385.1065467","DOIUrl":"https://doi.org/10.1145/1065385.1065467","url":null,"abstract":"Summary form only given. In 1999, the American Association for the Advancement of Science (AAAS) Directorate for Education and Human Resources (EHR) Programs and Science's Signal Transduction Knowledge Environment (STKE) - with 11 other professional societies and coalitions for biological sciences - established the BiosciEdNet (BEN) Collaborative. Since its inception, BEN has grown from its original 11 to 24 Collaborators. Currently, the digital library collections of BEN Collaborators provide a rich array of materials for undergraduate biological sciences educators, including ones that prepare K-12 teachers. BEN Collaborators are building digital collections that are inclusive of all educators and students. In summary, BEN has already developed tools and services that can be shared to leverage technologies across societies, coalitions, and collections affiliates. BEN has a documentation site (http://www.biosciednet.org/project_site/) that includes all its technical standards and specifications. For long-term sustainability the BEN Collaborative goal is to design and develop digital library collections that are valued by the members of professional societies, thereby eventually ensuring inclusion in the ongoing operating budgets of societies. Some societies are already providing in-kind support for staff and dollars for development","PeriodicalId":248721,"journal":{"name":"Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133797804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large introductory science courses & digital libraries","authors":"L. Bartolo, Cathy S. Lowe, D. Sadoway, Patrick E. Trapa","doi":"10.1145/1065385.1065469","DOIUrl":"https://doi.org/10.1145/1065385.1065469","url":null,"abstract":"Student self-assessment survey results indicate that a virtual lab experience improved understanding of many key laboratory learning objectives and that the Materials Digital Library (MatDL) has potential value in supporting a virtual lab","PeriodicalId":248721,"journal":{"name":"Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131660443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The climate change collection: a case study on digital library collection review and the integration of research, education and evaluation","authors":"M. McCaffrey, T. Weston","doi":"10.1145/1065385.1065495","DOIUrl":"https://doi.org/10.1145/1065385.1065495","url":null,"abstract":"Validating the scientific quality and potential of digital resources for use in classroom settings has become a major focus of recent digital library efforts such as the Digital Library for Earth System Education (DLESE). The Climate Change Collection is a thematic collection of digital resources relating to the topic of global climate change and natural climate variability designed as a pilot project for reviewing the scientific quality and pedagogical potential of selected digital resources using a focused and streamlined approach. The collection offers a case study in integrating research and education through the collaborative efforts of an interdisciplinary review team made up of professionals from the fields of climate research, geoscience education, cognitive psychology, and evaluation. Each participant received a stipend for their involvement in the process. Designed as an experiment in streamlined collection development, it is anticipated that the experience of the Climate Change Collection effort will help inform future digital library review and collection-building efforts","PeriodicalId":248721,"journal":{"name":"Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116361097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time genre classification for music digital libraries","authors":"J. S. Downie, Andreas F. Ehmann, D. Tcheng","doi":"10.1145/1065385.1065480","DOIUrl":"https://doi.org/10.1145/1065385.1065480","url":null,"abstract":"This paper describes a real-time audio-based automatic music genre classifier for use in organizing, browsing, and searching musical digital libraries. A decision tree classifier trained on a 40-dimension feature space is used to categorize music into one of 14 different genres with the results being displayed to a continuously updating user interface","PeriodicalId":248721,"journal":{"name":"Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124619224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating G-portal for geography learning and teaching","authors":"C. Chang, J. Hedberg, Y. Theng, Ee-Peng Lim, T. Teh, D. Goh","doi":"10.1145/1065385.1065390","DOIUrl":"https://doi.org/10.1145/1065385.1065390","url":null,"abstract":"This paper describes G-Portal, a geospatial digital library of geographical assets, providing an interactive platform to engage students in active manipulation and analysis of information resources and collaborative learning activities. Using a G-Portal application in which students conducted a field study of an environmental problem of beach erosion and sea level rise, we describe a pilot study to evaluate usefulness and usability issues to support the learning of geographical concepts, and in turn teaching","PeriodicalId":248721,"journal":{"name":"Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05)","volume":"132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114665631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Link prediction approach to collaborative filtering","authors":"Zan Huang, Xin Li, Hsinchun Chen","doi":"10.1145/1065385.1065415","DOIUrl":"https://doi.org/10.1145/1065385.1065415","url":null,"abstract":"Recommender systems can provide valuable services in a digital library environment, as demonstrated by their commercial success in the book, movie, and music industries. One of the most commonly used and successful recommendation algorithms is collaborative filtering, which explores the correlations within user-item interactions to infer user interests and preferences. However, the recommendation quality of collaborative filtering approaches is greatly limited by the data sparsity problem. To alleviate this problem we have previously proposed graph-based algorithms to explore transitive user-item associations. In this paper, we extend the idea of analyzing user-item interactions as graphs and employ link prediction approaches proposed in the recent network modeling literature for making collaborative filtering recommendations. We have adapted a wide range of linkage measures for making recommendations. Our preliminary experimental results based on a book recommendation dataset show that some of these measures achieved significantly better performance than standard collaborative filtering algorithms","PeriodicalId":248721,"journal":{"name":"Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121920766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"International scientific data, standards, & digital libraries","authors":"L. Bartolo, J. Rumble","doi":"10.1145/1065385.1065534","DOIUrl":"https://doi.org/10.1145/1065385.1065534","url":null,"abstract":"This workshop explores the various models used successfully to develop international standards for languages and tools, as well as scientific & technical information for use of data on the emerging Semantic Web. The advantages and disadvantages of the models will be highlighted in a manner that allows emerging standards to benefit from existing experience.","PeriodicalId":248721,"journal":{"name":"Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122084594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}