{"title":"Assembling and enriching digital library collections","authors":"D. Bainbridge, John Thompson, I. Witten","doi":"10.1109/JCDL.2003.1204885","DOIUrl":"https://doi.org/10.1109/JCDL.2003.1204885","url":null,"abstract":"People who create digital libraries need to gather together the raw material, add metadata as necessary, and design and build new collections. We set out the requirements for these tasks and describe a new tool that supports them interactively, making it easy for users to create their own collections from electronic files of all types. The process involves selecting documents for inclusion, coming up with a suitable metadata set, assigning metadata to each document or group of documents, designing the form of the collection in terms of document formats, searchable indexes, and browsing facilities, building the necessary indexes and data structures, and putting the collection in place for others to use. Moreover, different situations require different workflows, and the system must be flexible enough to cope with these demands. Although the tool is specific to the Greenstone digital library software, the underlying ideas should prove useful in more general contexts.","PeriodicalId":248854,"journal":{"name":"2003 Joint Conference on Digital Libraries, 2003. Proceedings.","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127896091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CephSchool: a pedagogic portal for teaching biological principles with cephalopod molluscs","authors":"J. Wood, Caitlin M. H. Shaw","doi":"10.1109/JCDL.2003.1204920","DOIUrl":"https://doi.org/10.1109/JCDL.2003.1204920","url":null,"abstract":"CephSchool is based on CephBase and takes the information present in CephBase's digital libraries and redirects it towards students and teachers. CephSchool is organized into eight arms and contains information about cephalopods, discussion topics, teacher support, and student assessment techniques. These provide an accurate and inquiry base-learning environment for students to learn basic biological concepts using cephalopods as the subject organism by giving them a dynamic Web page that is updated, as new information is made available.","PeriodicalId":248854,"journal":{"name":"2003 Joint Conference on Digital Libraries, 2003. Proceedings.","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127596015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging a common representation for personalized search and summarization in a medical digital library","authors":"K. McKeown, Noémie Elhadad, V. Hatzivassiloglou","doi":"10.1109/JCDL.2003.1204856","DOIUrl":"https://doi.org/10.1109/JCDL.2003.1204856","url":null,"abstract":"Despite the large amount of online medical literature, it can be difficult for clinicians to find relevant information at the point of patient care. We present techniques to personalize the results of search, making use of the online patient record as a sophisticated, preexisting user model. Our work in PERSIVAL, a medical digital library, includes methods for reranking the results of search to prioritize those that better match the patient record. It also generates summaries of the reranked results, which highlight information that is relevant to the patient under the physician's care. We focus on the use of a common representation for the articles returned by search and the patient record, which facilitates both the reranking and the summarization tasks. This common approach to both tasks has a strong positive effect on the ability to personalize information.","PeriodicalId":248854,"journal":{"name":"2003 Joint Conference on Digital Libraries, 2003. Proceedings.","volume":"421 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133638941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MetaTest: evaluation of metadata from generation to use","authors":"E. Liddy, Eileen Allen, Christina M. Finneran, Geri Gay, H. Hembrooke, Laura A. Granka","doi":"10.1109/JCDL.2003.1204917","DOIUrl":"https://doi.org/10.1109/JCDL.2003.1204917","url":null,"abstract":"We are studying metadata from its initial generation to its use in accessing desired educational resources. With a testbed of lesson plans and activities, we are comparing the manually and automatically generated metadata for their retrieval effectiveness (i.e. ability to retrieve the most relevant resources); conducting a subjective evaluation of manually and automatically generated metadata as representations of the resource as judged by subject matter experts, and; conducting studies of users' search and navigation behavior when accessing the digital library. These evaluations successfully combine what we believe are necessary foci on how and whether metadata affects the user and system performance.","PeriodicalId":248854,"journal":{"name":"2003 Joint Conference on Digital Libraries, 2003. Proceedings.","volume":"141 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131762553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A system for building expandable digital libraries","authors":"D. Castelli, P. Pagano","doi":"10.1109/JCDL.2003.1204886","DOIUrl":"https://doi.org/10.1109/JCDL.2003.1204886","url":null,"abstract":"Expandability is one of the main requirements of future digital libraries. We introduce a digital library service system, OpenDLib, that has been designed to be highly expandable in terms of content, services and usage. We illustrate the mechanisms that enable expandability and discuss their impact on the development of the system architecture.","PeriodicalId":248854,"journal":{"name":"2003 Joint Conference on Digital Libraries, 2003. Proceedings.","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133975659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correcting broken characters in the recognition of historical printed documents","authors":"M. Droettboom","doi":"10.1109/JCDL.2003.1204889","DOIUrl":"https://doi.org/10.1109/JCDL.2003.1204889","url":null,"abstract":"We present a new technique for dealing with broken characters, one of the major challenges in the optical character recognition (OCR) of degraded historical printed documents. A technique based on graph combinatorics is used to rejoin the appropriate connected components. It has been applied to real data with successful results.","PeriodicalId":248854,"journal":{"name":"2003 Joint Conference on Digital Libraries, 2003. Proceedings.","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121155043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Protein association discovery in biomedical literature","authors":"Yueyu Fu, Javed Mostafa, Kazuhiro Seki","doi":"10.1109/JCDL.2003.1204848","DOIUrl":"https://doi.org/10.1109/JCDL.2003.1204848","url":null,"abstract":"Protein association discovery can directly contribute toward developing protein pathways; hence it is a significant problem in bioinformatics. LUCAS (Library of User-Oriented Concepts for Access Services) was designed to automatically extract and determine associations among proteins from biomedical literature. Such a tool has notable potential to automate database construction in biomedicine, instead of relying on experts' analysis. We report on the mechanisms for automatically generating clusters of proteins. A formal evaluation of the system, based on a subset of 2000 MEDLINE titles and abstracts, has been conducted against Swiss-Prot database in which the associations among concepts are entered by experts manually.","PeriodicalId":248854,"journal":{"name":"2003 Joint Conference on Digital Libraries, 2003. Proceedings.","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125876027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An XQuery engine for digital library systems","authors":"Ji-Hoon Kang, Chul-Soo Kim, Eun-Jeong Ko","doi":"10.1109/JCDL.2003.1204919","DOIUrl":"https://doi.org/10.1109/JCDL.2003.1204919","url":null,"abstract":"A standard query language is very helpful for interoperability among digital library systems over the Internet. We propose an XQuery engine that can be used as an XQuery processing module in a digital library system that supports XML documents. We assume generic digital library system architecture. It consists of four modules: a user interface, an XQuery engine, an information retrieval engine, and an XML repository. The XQuery engine parses an input XQuery and constructs a syntax tree for the query.","PeriodicalId":248854,"journal":{"name":"2003 Joint Conference on Digital Libraries, 2003. Proceedings.","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124981310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic document metadata extraction using support vector machines","authors":"Hui Han, C. Lee Giles, Eren Manavoglu, H. Zha, Zhenyue Zhang, E. Fox","doi":"10.1109/JCDL.2003.1204842","DOIUrl":"https://doi.org/10.1109/JCDL.2003.1204842","url":null,"abstract":"Automatic metadata generation provides scalability and usability for digital libraries and their collections. Machine learning methods offer robust and adaptable automatic metadata extraction. We describe a support vector machine classification-based method for metadata extraction from header part of research papers and show that it outperforms other machine learning methods on the same task. The method first classifies each line of the header into one or more of 15 classes. An iterative convergence procedure is then used to improve the line classification by using the predicted class labels of its neighbor lines in the previous round. Further metadata extraction is done by seeking the best chunk boundaries of each line. We found that discovery and use of the structural patterns of the data and domain based word clustering can improve the metadata extraction performance. An appropriate feature normalization also greatly improves the classification performance. Our metadata extraction method was originally designed to improve the metadata extraction quality of the digital libraries Citeseer [S. Lawrence et al., (1999)] and EbizSearch [Y. Petinot et al., (2003)]. We believe it can be generalized to other digital libraries.","PeriodicalId":248854,"journal":{"name":"2003 Joint Conference on Digital Libraries, 2003. Proceedings.","volume":"125 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120853091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How to turn the page [digital libraries]","authors":"Yi-Chun Chu, I. Witten, R. Lobb, D. Bainbridge","doi":"10.1109/JCDL.2003.1204862","DOIUrl":"https://doi.org/10.1109/JCDL.2003.1204862","url":null,"abstract":"Can digital libraries provide a reading experience that more closely resembles a real book than a scrolled or paginated electronic display? We describe a prototype page turning system that realistically animates full three-dimensional page-turns. The dynamic behavior is generated by a mass-spring model defined on a rectangular grid of particles. The prototype takes a PDF or e-book file, renders it into a sequence of PNG images representing individual pages, and animates the page-turns under user control. The simulation behaves fairly naturally, although more computer graphics work is required to perfect it.","PeriodicalId":248854,"journal":{"name":"2003 Joint Conference on Digital Libraries, 2003. Proceedings.","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129930433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}