2017 IEEE 13th International Conference on e-Science (e-Science)最新文献_第2页

Provenance-Based Scientific Workflow Search 基于出处的科学工作流搜索

2017 IEEE 13th International Conference on e-Science (e-Science) Pub Date : 2017-10-01 DOI: 10.1109/eScience.2017.24

A. A. Jabal, E. Bertino, Geeth de Mel

引用次数: 1

Energy-Efficient Dynamic Scheduling of Deadline-Constrained MapReduce Workflows 限期约束MapReduce工作流的节能动态调度

2017 IEEE 13th International Conference on e-Science (e-Science) Pub Date : 2017-10-01 DOI: 10.1109/eScience.2017.18

Tong Shu, C. Wu

{"title":"Energy-Efficient Dynamic Scheduling of Deadline-Constrained MapReduce Workflows","authors":"Tong Shu, C. Wu","doi":"10.1109/eScience.2017.18","DOIUrl":"https://doi.org/10.1109/eScience.2017.18","url":null,"abstract":"Big data workflows comprised of moldable parallel MapReduce programs running on a large number of processors have become a main consumer of energy at data centers. The degree of parallelism of each moldable job in such workflows has a significant impact on the energy efficiency of parallel computing systems, which remains largely unexplored. In this paper, we validate with experimental results the moldable parallel computing model where the dynamic energy consumption of a moldable job increases with the number of parallel tasks. Based on our validation, we construct rigorous cost models and formulate a dynamic scheduling problem of deadline-constrained MapReduce workflows to minimize energy consumption in Hadoop systems. We propose a semi-dynamic online scheduling algorithm based on adaptive task partitioning to reduce dynamic energy consumption while meeting performance requirements from a global perspective, and also design the corresponding system modules for algorithm implementation in Hadoop architecture. The performance superiority of the proposed algorithm in terms of dynamic energy saving and deadline violation is illustrated by extensive simulation results in Hadoop/YARN in comparison with existing algorithms, and the core module of adaptive task partitioning is further validated through real-life workflow implementation and experimental results using the Oozie workflow engine in Hadoop/YARN systems.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"975 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134057232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

WorkflowHunt: Combining Keyword and Semantic Search in Scientific Workflow Repositories WorkflowHunt:在科学工作流存储库中结合关键字和语义搜索

2017 IEEE 13th International Conference on e-Science (e-Science) Pub Date : 2017-10-01 DOI: 10.1109/eScience.2017.26

J. B. Diaz, C. B. Medeiros

{"title":"WorkflowHunt: Combining Keyword and Semantic Search in Scientific Workflow Repositories","authors":"J. B. Diaz, C. B. Medeiros","doi":"10.1109/eScience.2017.26","DOIUrl":"https://doi.org/10.1109/eScience.2017.26","url":null,"abstract":"Scientific datasets and the experiments that analyze them are growing in size and complexity, and scientists are facing difficulties to share such resources. Some initiatives have emerged to try to solve this problem. One of them involves the use of scientific workflows to represent and enact experiment execution. There is an increasing number of workflows that are potentially relevant for more than one scientific domain. However, it is hard to find workflows suitable for reuse given an experiment. Creating a workflow takes time and resources, and their reuse helps scientists to build new workflows faster and in a more reliable way. Search mechanisms in workflow repositories should provide different options for workflow discovery, but it is difficult for generic repositories to provide multiple mechanisms. This paper presents WorkflowHunt, a hybrid architecture for workflow search and discovery for generic repositories, which combines keyword and semantic search to allow finding relevant workflows using different search methods. We validated our architecture creating a prototype that uses real workflows and metadata from myExperiment, and compare search results via WorkflowHunt and via myExperiment's search interface.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125609903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

Science Gateways Incubator: Software Sustainability Meets Community Needs 科学门户孵化器:软件可持续性满足社区需求

2017 IEEE 13th International Conference on e-Science (e-Science) Pub Date : 2017-10-01 DOI: 10.1109/eScience.2017.77

S. Gesing, M. Zentner, Juliana Casavan, Betsy Hillery, Mihaela Vorvoreanu, R. Heiland, S. Marru, M. Pierce, Nayiri Mullinix, Nancy Maron

{"title":"Science Gateways Incubator: Software Sustainability Meets Community Needs","authors":"S. Gesing, M. Zentner, Juliana Casavan, Betsy Hillery, Mihaela Vorvoreanu, R. Heiland, S. Marru, M. Pierce, Nayiri Mullinix, Nancy Maron","doi":"10.1109/eScience.2017.77","DOIUrl":"https://doi.org/10.1109/eScience.2017.77","url":null,"abstract":"The main goal of the US Science Gateways Community Institute (SGCI) is to serve science gateways to achieve sustainability and growth. Science gateways allow science and engineering communities to access shared data, software, computing services, instruments, educational materials, and other resources specific to their disciplines. Thus, science gateways are a subgroup of scientific software and the means for addressing software sustainability are also suitable for science gateways and vice versa, e.g., best practices for software engineering. Since science gateways are tailored to specific communities, understanding users' requirements is critical for sustainability. SGCI consists of five service areas that closely interact with each other. The Incubator acknowledges the value of business strategy to inform well-designed science gateways and offers two main types of services: individualized consultancy, tailored to specific challenges a gateway faces, and the Science Gateways Bootcamp. The cornerstone of the Bootcamp is a one-week onsite intensive workshop where participants create their own roadmap for a sustainable science gateway via sessions with experts, hands-on exercises, and group work. This paper offers an overview of the work of the Incubator and shares lessons learned from the inaugural session of the Bootcamp in April 2017.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132116197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

Adaptive Lossy Compression of Complex Environmental Indices Using Seasonal Auto-Regressive Integrated Moving Average Models 基于季节自回归综合移动平均模型的复杂环境指数自适应有损压缩

2017 IEEE 13th International Conference on e-Science (e-Science) Pub Date : 2017-10-01 DOI: 10.1109/eScience.2017.45

Ugur Çayoglu, P. Braesicke, T. Kerzenmacher, Jörg Meyer, A. Streit

{"title":"Adaptive Lossy Compression of Complex Environmental Indices Using Seasonal Auto-Regressive Integrated Moving Average Models","authors":"Ugur Çayoglu, P. Braesicke, T. Kerzenmacher, Jörg Meyer, A. Streit","doi":"10.1109/eScience.2017.45","DOIUrl":"https://doi.org/10.1109/eScience.2017.45","url":null,"abstract":"Significant increases in computational resources have enabled the development of more complex and spatially better resolved weather and climate models. As a result the amount of output generated by data assimilation systems and by weather and climate simulations is rapidly increasing e.g. due to higher spatial resolution, more realisations and higher frequency data. However, while compute performance has increased significantly because of better scaling program code and increasing number of cores the storage capacity is only increasing slowly. One way to tackle the data storage problem is data compression. Here, we build the groundwork for an environmental data compressor by improving compression for established weather and climate indices like El Ni~no Southern Oscillation (ENSO), North Atlantic Oscillation (NAO) and Quasi-Biennial Oscillation (QBO). We investigate options for compressing these indices by using a statistical method based on the Auto Regressive Integrated Moving Average (ARIMA) model. The introduced adaptive approach shows that it is possible to improve accuracy of lossily compressed data by applying an adaptive compression method which preserves selected data with higher precision. Our analysis reveals no potential for lossless compression of these indices. However, as the ARIMA model is able to capture all relevant temporal variability, lossless compression is not necessary and lossy compression is acceptable. The reconstruction based on the lossily compressed data can reproduce the chosen indices to such a high degree that statistically relevant information needed for describing climate dynamics is preserved. The performance of the (seasonal) ARIMA model was tested with daily and monthly indices.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116565079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

ScienceDB: A Public Multidisciplinary Research Data Repository for eScience ScienceDB:面向科学的公共多学科研究数据存储库

2017 IEEE 13th International Conference on e-Science (e-Science) Pub Date : 2017-10-01 DOI: 10.1109/ESCIENCE.2017.38

Chengzan Li, Yanfei Hou, Jianhui Li, Z. Lili

{"title":"ScienceDB: A Public Multidisciplinary Research Data Repository for eScience","authors":"Chengzan Li, Yanfei Hou, Jianhui Li, Z. Lili","doi":"10.1109/ESCIENCE.2017.38","DOIUrl":"https://doi.org/10.1109/ESCIENCE.2017.38","url":null,"abstract":"Research data repositories are necessary infrastructures that ensure the data generated for research are accessible, stable, reliable, and reusable. Based on years of accumulated data work experience, the Computer Network Information Center of the Chinese Academy of Sciences has built a multi-disciplinary data repository ScienceDB for research users and teams using its big data storage, analysis and computing environments. This paper firstly introduces the motivation to develop ScienceDB and gives a profile to it. Then the overall technical framework of ScienceDB is introduced, and the key technologies such as the support for multidiscipline extensibility, data collaboration and data recommendation are analyzed deeply. And then this paper presents the functions and features of ScienceDB's current version and discusses some issues such as its data policy, data quality assurance measures, and current application status. Finally, it summarizes and puts forward that it needs to carry out more in-depth research and practice of ScienceDB in order to meet the higher requirements of eScience in terms of thorough data association and fusion, data analysis and mining, data evaluation, and so on.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116987225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

An Investigation into Acoustic Analysis Methods for Endangered Species Monitoring: A Case of Monitoring the Critically Endangered White-Bellied Heron in Bhutan 濒危物种监测声学分析方法的研究——以不丹濒危物种白鹭监测为例

2017 IEEE 13th International Conference on e-Science (e-Science) Pub Date : 2017-10-01 DOI: 10.1109/eScience.2017.30

Tshering Dema, L. Zhang, M. Towsey, A. Truskinger, S. Sherub, Kinley, Jinglan Zhang, M. Brereton, P. Roe

{"title":"An Investigation into Acoustic Analysis Methods for Endangered Species Monitoring: A Case of Monitoring the Critically Endangered White-Bellied Heron in Bhutan","authors":"Tshering Dema, L. Zhang, M. Towsey, A. Truskinger, S. Sherub, Kinley, Jinglan Zhang, M. Brereton, P. Roe","doi":"10.1109/eScience.2017.30","DOIUrl":"https://doi.org/10.1109/eScience.2017.30","url":null,"abstract":"Passive acoustic recording has great potential for monitoring soniferous endangered and cryptic species. However, this approach requires analysis of long duration environmental acoustic recordings that span months or years. There is a variety of approaches to analysing acoustic data. However, it is unclear which approaches are best suited for monitoring of endangered species in the wild. Specifically, this study is undertaking acoustic monitoring of the critically endangered White-bellied Heron (Ardea insignis) in Bhutan. Four different acoustic analysis methods are investigated in terms of their detection accuracy, involvement of human experts, and overall utility to ecologists for target species monitoring work. Our experimental results show that human pattern detection using a visualization technique has detection performance on par with a cluster-based recogniser, while a machine learning classifier implemented using the same acoustic features suffers from very low precision. Further, specific cases of false positives and false negatives by the different methods are investigated and discussed in terms of their overall utility for ecological monitoring. Based on our experimental results, we demonstrate how an integrated semi-automated approach of human visual pattern analysis with a recogniser is a robust system for acoustic monitoring of target species.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130460670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 5

Accelerating Genome Sequence Alignment on Hadoop on Lustre Environment 在Lustre环境下加速Hadoop基因组序列比对

2017 IEEE 13th International Conference on e-Science (e-Science) Pub Date : 2017-10-01 DOI: 10.1109/eScience.2017.59

Eun-Kyu Byun, Junehawk Lee, S. Yu, J. Kwak, Soonwook Hwang

引用次数: 1

Hunting Data Rogues at Scale: Data Quality Control for Observational Data in Research Infrastructures 大规模搜寻数据盗贼:研究基础设施中观测数据的数据质量控制

2017 IEEE 13th International Conference on e-Science (e-Science) Pub Date : 2017-10-01 DOI: 10.1109/ESCIENCE.2017.64

G. Pastorello, D. Gunter, H. Chu, D. Christianson, C. Trotta, E. Canfora, B. Faybishenko, Y. Cheah, N. Beekwilder, S. Chan, S. Dengel, T. Keenan, F. O'Brien, Abdelrahman Elbashandy, C. Poindexter, M. Humphrey, D. Papale, D. Agarwal

{"title":"Hunting Data Rogues at Scale: Data Quality Control for Observational Data in Research Infrastructures","authors":"G. Pastorello, D. Gunter, H. Chu, D. Christianson, C. Trotta, E. Canfora, B. Faybishenko, Y. Cheah, N. Beekwilder, S. Chan, S. Dengel, T. Keenan, F. O'Brien, Abdelrahman Elbashandy, C. Poindexter, M. Humphrey, D. Papale, D. Agarwal","doi":"10.1109/ESCIENCE.2017.64","DOIUrl":"https://doi.org/10.1109/ESCIENCE.2017.64","url":null,"abstract":"Data quality control is one of the most time consuming activities within Research Infrastructures (RIs), especially when involving observational data and multiple data providers. In this work we report on our ongoing development of data rogues, a scalable approach to manage data quality issues for observational data within RIs. The motivation for this work started with the creation of the FLUXNET2015 dataset, which includes carbon, water, and energy fluxes plus micrometeorological and ancillary data measured in over 200 sites around the world. To create an uniform dataset, including derived data products, extensive work on data quality control was needed. The unpredictable nature of observational data quality issues makes the automation of data quality control inherently difficult. Developed based on this experience, the data rogues methodology allows for increased automation of quality control activities by systematically identifying, cataloging, and documenting implementations of solutions to data issues. We believe this methodology can be extended and applied to others domains and types of data, making the automation of data quality control a more tractable problem.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"218 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113998795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

Semantic Technologies and Text Analysis in Support of Scientific Knowledge Reuse 支持科学知识重用的语义技术和文本分析

2017 IEEE 13th International Conference on e-Science (e-Science) Pub Date : 2017-10-01 DOI: 10.1109/eScience.2017.68

Andres Garcia-Silva, Raúl Palma, José Manuél Gómez-Pérez

引用次数: 1