J. Choi, H. Abbasi, D. Pugmire, N. Podhorszki, S. Klasky, Cristian Capdevila, M. Parashar, M. Wolf, J. Qiu, G. Fox
{"title":"Mining hidden mixture context with ADIOS-P to improve predictive pre-fetcher accuracy","authors":"J. Choi, H. Abbasi, D. Pugmire, N. Podhorszki, S. Klasky, Cristian Capdevila, M. Parashar, M. Wolf, J. Qiu, G. Fox","doi":"10.1109/eScience.2012.6404418","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404418","url":null,"abstract":"A predictive pre-fetcher, which predicts future data access events and loads the data before users request it, has been widely studied, especially in file systems and web content servers, to reduce data load latency. In scientific data visualization in particular, pre-fetching can reduce the IO waiting time. To increase accuracy, we apply a data mining technique to extract hidden information. More specifically, we apply a data mining technique to discover the hidden contexts in data access patterns and make predictions based on the inferred context to boost accuracy. In particular, we use Probabilistic Latent Semantic Analysis (PLSA), a mixture-model-based algorithm popular in the text mining area, to mine hidden contexts from the collected user access patterns, and then run a predictor within the discovered context. We further improve PLSA by applying the Deterministic Annealing (DA) method to overcome the local optimum problem. In this paper we demonstrate how PLSA and DA optimization can be applied to mine hidden contexts from users' data access patterns and improve predictive pre-fetcher performance.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89001899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Djorgovski, A. Mahabal, C. Donalek, M. Graham, A. Drake, B. Moghaddam, M. Turmon
{"title":"Flashes in a star stream: Automated classification of astronomical transient events","authors":"S. Djorgovski, A. Mahabal, C. Donalek, M. Graham, A. Drake, B. Moghaddam, M. Turmon","doi":"10.1109/eScience.2012.6404437","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404437","url":null,"abstract":"An automated, rapid classification of transient events detected in the modern synoptic sky surveys is essential for their scientific utility and effective follow-up using scarce resources. This presents some unusual challenges: the data are sparse, heterogeneous and incomplete; evolving in time; and most of the relevant information comes not from the data stream itself, but from a variety of archival data and contextual information (spatial, temporal, and multi-wavelength). We are exploring a variety of novel techniques, mostly Bayesian, to respond to these challenges, using the ongoing CRTS sky survey as a testbed. The current surveys are already overwhelming our ability to effectively follow all of the potentially interesting events, and these challenges will grow by orders of magnitude over the next decade as the more ambitious sky surveys get under way. While we focus on an application in a specific domain (astrophysics), these challenges are more broadly relevant for event or anomaly detection and knowledge discovery in massive data streams.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88801844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
André Luckow, M. Santcroos, André Merzky, Ole Weidner, P. Mantha, S. Jha
{"title":"P∗: A model of pilot-abstractions","authors":"André Luckow, M. Santcroos, André Merzky, Ole Weidner, P. Mantha, S. Jha","doi":"10.1109/eScience.2012.6404423","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404423","url":null,"abstract":"Pilot-Jobs support effective distributed resource utilization, and are arguably one of the most widely-used distributed computing abstractions - as measured by the number and types of applications that use them, as well as the number of production distributed cyberinfrastructures that support them. In spite of broad uptake, there does not exist a well-defined, unifying conceptual model of Pilot-Jobs which can be used to define, compare and contrast different implementations. Often Pilot-Job implementations are strongly coupled to the distributed cyber-infrastructure they were originally designed for. These factors present a barrier to extensibility and interoperability. This paper is an attempt to (i) provide a minimal but complete model (P*) of Pilot-Jobs, (ii) establish the generality of the P* Model by mapping various existing and well known Pilot-Job frameworks such as Condor and DIANE to P*, (iii) derive an interoperable and extensible API for the P* Model (Pilot-API), (iv) validate the implementation of the Pilot-API by concurrently using multiple distinct Pilot-Job frameworks on distinct production distributed cyberinfrastructures, and (v) apply the P* Model to Pilot-Data.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89978938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reverse Engineering Europe's PSI Re-use Rules -- Towards an Integrated Conceptual Framework for PSI Re-use","authors":"M. D. Vries","doi":"10.1109/ESCIENCEW.2010.29","DOIUrl":"https://doi.org/10.1109/ESCIENCEW.2010.29","url":null,"abstract":"Despite various studies evincing the huge potential locked up in public sector information (PSI), this potential is far from being fully exploited. To a large extent, this failure is caused by the immensely complex legal labyrinth surrounding PSI re-use. This complexity works in two ways: public sector bodies do not comply with the regulatory framework and reusers do not avail themselves of the legal instruments offered, resulting in an unexploited economic potential. What makes the legal framework so complex is the transcending nature of PSI re-use, as it blends four areas of law – freedom of information law, ICT law, intellectual property law and competition law – that, throughout the years, have been regulated at a European, national and even sectoral level, but in isolation. The fundamental impact that ICT developments have on our society, subsequently also rocking the legal rules and underlying principles and axioms, makes the picture even more complicated. In this article, these legal frameworks are reverse engineered, demonstrating their interaction, culminating in a conceptual framework that allows public sector bodies and re-users (and courts where necessary) to apply and rely on the rules.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78067932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Educating the Humanities for e-Science","authors":"S. Strömqvist","doi":"10.1109/E-SCIENCE.2006.56","DOIUrl":"https://doi.org/10.1109/E-SCIENCE.2006.56","url":null,"abstract":"The first part of the present paper discusses why the Humanities is lagging behind in terms of making use of e-science and what might be done to remedy that situation. The diversity of ontologies in the Humanities, hampering consensus over metadata, is one problem. Another problem is the lack of education in e-science tailored to the needs of researchers in the Humanities and the lack of efforts to try to integrate elements of e-science with the standard repertoire of research and education in the Humanities. Drawing on experiences from the project European Cultural Heritage Online (ECHO) and from the on-going project Distributed Access Management of Language Resources (DAM-LR), the Centre for Languages and Literature at Lund University is trying to implement new elements of e-science at the local Faculty of Humanities. The second part of the paper briefly describes the process as well as some of its added values.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2006-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73438186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Grid Environment for Data Integration of Scientific Databases","authors":"H. Matsuda","doi":"10.1109/E-SCIENCE.2005.5","DOIUrl":"https://doi.org/10.1109/E-SCIENCE.2005.5","url":null,"abstract":"Effective integration of heterogeneous data sources has been studied as the most pressing challenge in various fields, such as high energy physics, astronomy, and life sciences. In this talk, we present a data integration system built on the Globus Toolkit with OGSA-DAI. To associate related data among many databases, we have introduced metadata based on their domain ontologies. Using the system, one can build a database access flow that describes a set of queries as a workflow, and can query across the databases without being aware of their locations and schemas.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2005-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72894344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Service-Oriented Science: Scaling the Application and Impact of eResearch","authors":"Ian T. Foster","doi":"10.1109/E-SCIENCE.2005.75","DOIUrl":"https://doi.org/10.1109/E-SCIENCE.2005.75","url":null,"abstract":"The importance of service-oriented architecture for science is widely recognized. Increasingly, scientific communities are making information tools accessible as services that clients can access over the network, without knowledge of their internal workings. In this way, tools formerly accessible only to the specialist can be made available to all. Equally importantly, new value-added services can be constructed that integrate other services to automate useful tasks. The value of such service-oriented science has been demonstrated in disciplines as diverse as astronomy, biology, and fusion science. The mechanisms required to achieve these goals are provided, in part, by grid infrastructure. I review the mechanisms that have been developed to date for grid infrastructure and experience gained implementing these mechanisms, for example within the open source Globus Toolkit version 4. I present a range of dynamic service deployment scenarios, in which for example the TeraGrid and Open Science Grid are used to host services for science communities. I discuss how these scenarios demonstrate the potential for scaling service-oriented science","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2005-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86490895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reprocessing D0 Data with SAMGrid","authors":"F. Villeneuve-Séguier","doi":"10.1109/E-SCIENCE.2005.70","DOIUrl":"https://doi.org/10.1109/E-SCIENCE.2005.70","url":null,"abstract":"The DØ experiment studies proton-antiproton collisions at the Tevatron collider based at Fermilab. Reprocessing, managing and distributing the large amount of real data coming from the detector as well as generating sufficient Monte Carlo data are some of the challenges faced by the DØ collaboration. SAMGrid combines the SAM data handling system with the necessary job and information management allowing us to use the distributed computing resources in the various worldwide computing centers. This is one of the first large scale grid applications in high energy physics (in particular as we are using real data). After successful Monte Carlo production and a limited data reprocessing in the winter of 2003/04, the next milestone will be the reprocessing of the full current data set by this autumn/winter. It consists of ~500 TB of data, encompassing one billion events","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2005-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86820111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Surridge, Steve Taylor, D. D. Roure, E. Zaluska
{"title":"Experiences with GRIA - Industrial Applications on a Web Services Grid","authors":"M. Surridge, Steve Taylor, D. D. Roure, E. Zaluska","doi":"10.1109/E-SCIENCE.2005.38","DOIUrl":"https://doi.org/10.1109/E-SCIENCE.2005.38","url":null,"abstract":"The GRIA project set out to make the grid usable by industry. The GRIA middleware is based on Web services, and designed to meet the needs of industry for security and business-to-business (B2B) service procurement and operation. It provides well-defined B2B models for accounting and QoS agreement, and proxy-free delegation to support account management and service federation. The GRIA v3 software is now being used by industry. By taking a business-oriented approach independent of the evolving Open Grid Services Architecture proposals from the Global Grid Forum, GRIA has demonstrated the need for a wider understanding of virtual organizations (VOs). Traditional academic VOs are persistent, resourceful and have logically centralized, membership-oriented management structures. In contrast, the GRIA experience has been that business VOs are likely to be project-focused and have distributed process-oriented management structures","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2005-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73098054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Putting Semantics into e-Science and Grids","authors":"C. Goble","doi":"10.1109/E-SCIENCE.2005.68","DOIUrl":"https://doi.org/10.1109/E-SCIENCE.2005.68","url":null,"abstract":"What is the semantic grid? How can e-Science benefit from the technologies of the semantic grid? Can we build a semantic Web for e-Science? Would that differ from a semantic grid? Given our past experiences with scientists, grid developers and semantic Web researchers, what are the prospects, and pitfalls, of putting semantics into e-Science applications and grid infrastructure?","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2005-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90304424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}