{"title":"5th International workshop on mining scientific publications (WOSP 2016)","authors":"Petr Knoth, Lucas Anastasiou, Drahomira Herrmannova, Nancy Pontika","doi":"10.1145/2910896.2926737","DOIUrl":"https://doi.org/10.1145/2910896.2926737","url":null,"abstract":"Digital libraries that store scientific publications are becoming increasingly central to the research process. They are not only used for traditional tasks, such as finding and storing research outputs, but also as a source for discovering new research trends or evaluating research excellence. With the current growth of scientific publications deposited in digital libraries, it is no longer sufficient to provide only access to content. To aid research, it is especially important to leverage the potential of text and data mining technologies to improve the process of how research is being done.","PeriodicalId":109613,"journal":{"name":"2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132165192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards identifying potential research collaborations from scientific research networks using scholarly data","authors":"Y. Garay, Monika Akbar, A. Gates","doi":"10.1145/2910896.2925439","DOIUrl":"https://doi.org/10.1145/2910896.2925439","url":null,"abstract":"Identifying research areas of researchers is a difficult task because of the various levels of abstraction in which information may be stored; however, such a task is essential for detecting potential research collaborations within an institution. This work describes an approach to create a scientific research network with topics identified from the researchers' scholarly data and relations between topics by analyzing data harvested from digital libraries and queries to domain ontologies. The relations are used to connect the researchers. Such networks have the potential for revealing the synergy between different topics and researchers within an institution. It will also show less explored research areas that can be targeted for further study. The poster will describe the approach and how it was applied to a biomedical domain at the university.","PeriodicalId":109613,"journal":{"name":"2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132565671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting medical subject headings based on abstract similarity and citations to MEDLINE records","authors":"Adam K. Kehoe, Vetle I. Torvik","doi":"10.1145/2910896.2910920","DOIUrl":"https://doi.org/10.1145/2910896.2910920","url":null,"abstract":"We describe a classifier-enhanced nearest neighbor approach to assigning Medical Subject Headings (MeSH®) to unlabeled documents using a combination of abstract similarities and direct citations to labeled MEDLINE records. The approach frames the classification problem by decomposing it into sets of siblings in the MeSH hierarchy (e.g., training a classifier for predicting “Heterocyclic Compounds, 2-Ring” vs. other “Heterocyclic Compounds”). Preliminary experiments using a small but diverse set of MeSH terms show the highest performance when using both abstracts and citations compared to each alone, and coupled with a non-naive classifier: 90+% precision and recall with 10-fold cross-validation. NLM's Medical Text Indexer (MTI) tool achieves similar overall performance but varies more across the terms tested. For example, MTI performs better on “Heterocyclic Compounds, 2-Ring”, while our approach performs better on Alzheimer Disease and Neuroimaging. Our approach can be applied broadly to documents with abstracts that are similar to (or cite) MEDLINE abstracts, which would help linking and searching across bibliographic databases beyond MEDLINE.","PeriodicalId":109613,"journal":{"name":"2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130081096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data curation with a focus on reuse","authors":"M. Esteva, Sandra Sweat, R. McLay, Weijia Xu, Sivakumar Kulasekaran","doi":"10.1145/2910896.2910906","DOIUrl":"https://doi.org/10.1145/2910896.2910906","url":null,"abstract":"A dataset from the field of High Performance Computing (HPC) was curated with the focus on facilitating its reuse and to appeal to a broader audience beyond HPC specialists. At an early stage in the research project, the curators gathered requirements from prospective users of the dataset, focusing on how and for which research projects they would reuse the data. Users' needs informed which curation tasks to conduct, which included: adding more information elements to the dataset to expand its content scope; removing personal information; and, packaging the data in a size, a format, and at a frequency of delivery that are convenient for access and analysis purposes. The curation tasks are embedded in the software that produces the data, and are implemented as an automated workflow that spans various HPC resources, in which the dataset is generated, processed and stored, and the Texas ScholarWorks institutional repository, through which the data is published. Within this distributed architecture, the integrated data creation and curation workflow complies with long-term preservation requirements, and is the first one implemented as a collaboration between the supercomputing center where the data is created on an ongoing basis, and the University Libraries at UT Austin where it is published. The targeted curation strategy included the design of proof of concept data analyses to evaluate if the curated data met the reuse scenarios proposed by users. The results suggest that the dataset is understandable, and that researchers can use it to answer some of the research questions they posed. Results also pointed to specific elements of the curation strategy that had to be improved and disclosed the difficulties involved in breaking data to new users.","PeriodicalId":109613,"journal":{"name":"2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116188398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Panel: Preserving born-digital news","authors":"Edward McCain, Martin Klein, Matthew S. Weber","doi":"10.1145/2910896.2926739","DOIUrl":"https://doi.org/10.1145/2910896.2926739","url":null,"abstract":"This panel examines the need for digital libraries to capture and preserve journalistic content in digital formats, especially online news.","PeriodicalId":109613,"journal":{"name":"2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114603987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"WADL 2016: Third international workshop on Web archiving and digital libraries","authors":"E. Fox, Zhiwu Xie, Martin Klein","doi":"10.1145/2910896.2926735","DOIUrl":"https://doi.org/10.1145/2910896.2926735","url":null,"abstract":"This workshop will explore integration of Web archiving and digital libraries, so the complete life cycle involved is covered: creation/authoring, uploading/publishing in the Web (2.0), (focused) crawling, indexing, exploration (searching, browsing), archiving (of events), etc. It will include particular coverage of current topics of interest, like: big data, mobile web archiving, and systems (e.g., Memento, SiteStory, Hadoop processing).","PeriodicalId":109613,"journal":{"name":"2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128577527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Digital history meets Wikipedia: Analyzing historical persons in Wikipedia","authors":"A. Jatowt, Daisuke Kawai, Katsumi Tanaka","doi":"10.1145/2910896.2910911","DOIUrl":"https://doi.org/10.1145/2910896.2910911","url":null,"abstract":"Wikipedia is the result of a collaborative effort aiming to represent human knowledge and to make it accessible for everyone. As such it contains lots of contemporary as well as history-related information. This research looks into historical data available in Wikipedia to explore its various time-related characteristics. In particular, we study Wikipedia articles on historical persons. Our analysis sheds new light on the characteristics of information about historical persons in Wikipedia and quantifies user interest in such data. We use signals derived from the hyperlink structure of Wikipedia as well as from article view logs and we overlay them over the temporal dimension to understand relations between time, link structure and article popularity. In the latter part of the paper, we also demonstrate different ways for estimating person importance based on the temporal aspects of the link structure.","PeriodicalId":109613,"journal":{"name":"2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127895348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Increasing datasets discoverability in an engineering data platform using keyword extraction","authors":"Parthasarathy Gopavarapu, Line C. Pouchard, S. Pujol","doi":"10.1145/2910896.2925443","DOIUrl":"https://doi.org/10.1145/2910896.2925443","url":null,"abstract":"In this paper we describe the use of keyword extraction in a data management platform for the storage, publication, and sharing of scientific and engineering datasets primarily related to the stress of concrete structures under earthquake conditions. To improve discoverability of datasets and assist scientists who upload data, we designed an automated keyword extraction system that will propose keywords for uploaded datasets.","PeriodicalId":109613,"journal":{"name":"2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)","volume":"47 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121012686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unraveling K-12 standard alignment; Report on a new attempt","authors":"Byron Marshall, R. Reitsma, Carleigh C. Samson","doi":"10.1145/2910896.2910919","DOIUrl":"https://doi.org/10.1145/2910896.2910919","url":null,"abstract":"We present the results of an experiment which indicate that automated alignment of electronic learning objects to educational standards may be more feasible than previously implied. We highlight some important deficiencies in existing alignment systems and formulate suggestions for improved future ones. We consider how the changing substance of newer educational standards, a multi-faceted view of standard alignment, and a more nuanced view of the 'alignment' concept may bring the long-sought goal of automated standard alignment closer. We explore how lexical similarity of documents, a World+Method representation of semantics, and network-based analysis can yield promising results. We furthermore investigate the nature of false positives to better understand how validity of match is evaluated so as to better focus future alignment system development.","PeriodicalId":109613,"journal":{"name":"2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122427583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"User activity characterization in a cultural heritage digital library system","authors":"Cyrille Suire, Axel Jean-Caurant, V. Courboulay, J. Burie, P. Estraillier","doi":"10.1145/2910896.2925459","DOIUrl":"https://doi.org/10.1145/2910896.2925459","url":null,"abstract":"Digital access to large amounts of heterogeneous data can create methodological biases regarding the discovery and exploitation of resources, particularly when it comes to Social Sciences. In order to provide relevant adaptivity for social scientists, it is important to fully consider their research practice diversity. To do so, we consider an activity-based approach for researchers' information search behavior. We have also conducted an experiment in a Cultural Heritage use case. The main result shows us that social scientists have the same research behaviors as those observed in exact Sciences.","PeriodicalId":109613,"journal":{"name":"2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122521942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}