2018 IEEE 14th International Conference on e-Science (e-Science): Latest Papers, Page 10

Linking Text and Knowledge Using the INCEpTION Annotation Platform

2018 IEEE 14th International Conference on e-Science (e-Science) Pub Date : 2018-10-01 DOI: 10.1109/ESCIENCE.2018.00077

Richard Eckart de Castilho, Jan-Christoph Klie, Naveen Kumar, Beto Boullosa, Iryna Gurevych

Citations: 9

Increasing Parallelism in Climate Models Via Additional Component Concurrency

2018 IEEE 14th International Conference on e-Science (e-Science) Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00044

Jorg Behrens, J. Biercamp, H. Bockelmann, P. Neumann

Citations: 3

Democratizing Ancient Mesopotamian Research through Digital Scholarship

2018 IEEE 14th International Conference on e-Science (e-Science) Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00074

Raquel Alegre, Anastasis Georgoulas, S. Grieve, E. Robson

Pages: 322

Abstract: Since the 19th century, historians and archaeologists have compiled transliterations and translations of surviving cuneiform texts from the Middle East area, documenting the ancient history of the region, c. 3000 BC–75 AD. The Open Richly Annotated Cuneiform Corpus (Oracc, http://oracc.org) is an international collaborative effort to gather and digitise a complete collection of cuneiform texts and their translations, with the goal of making them available to researchers and students worldwide. Oracc was developed ten years ago around the core value of ensuring accessibility to a broad audience, rather than a select group of experts. This principle presented new technological challenges, but has equally offered important benefits. Initial transliteration of cuneiform tablets into the ASCII Transliteration Format (ATF) was performed using an Emacs plugin, the use of which was challenging for novice and experienced users alike. This precipitated the development of Nammu [1], a dedicated editor for files written in ATF, to provide a consistent environment for users to contribute to Oracc projects. This is an important step in the democratization of this research as it lowers the technological expertise required to join the platform, and reduces the amount of time needed to train new users, which was previously a large drain on Principal Investigators’ time and resources. Nammu in turn takes advantage of pyORACC [2], a bespoke library developed for parsing ATF files and a key enabler of automation in the project.

Separately to the editing considerations, the Oracc website hosts the body of information, editions and translations that researchers from different groups have accumulated during their work. An important aspect of this is the search capability it offers, allowing a user to retrieve information about a subject or term of their choice. A new version of this functionality is being developed, using the ElasticSearch platform to index and efficiently search large bodies of text. Users can choose to query the compiled glossaries, looking for words with a particular meaning, or for the meaning and appearances of a transliterated cuneiform term. Alternatively, they will be able to search through the information pages for a topic of their choice, effectively using the website as a domain-specific search engine. This dual functionality has been chosen so as to make the search of interest to both domain experts and the general public.

Early versions of Nammu focused on the transliteration and translation of cuneiform into English and other European languages. Meanwhile, decades of war and political instability across the Middle East have prevented researchers from Iraq, Syria and neighbouring countries from contributing to the ancient history of their re…

Funding: Programming work on Oracc is funded by UCL’s School of Social and Historical Sciences, and through the Nahrein Network’s grant from the UK Arts and Humanities Council’s Global Challenges Research Fund.

Citations: 0
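The two glossary query modes described in the Oracc abstract (look up terms by meaning, or look up the meaning and appearances of a transliterated term) can be illustrated with a minimal sketch. This is not the Oracc implementation and does not use ElasticSearch: plain in-memory dictionaries are swapped in so the example is self-contained. The function names, the sample entries, and the text IDs are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample glossary: (transliterated term, English meaning, text IDs).
# Entries and text IDs are illustrative only and are not taken from Oracc.
GLOSSARY = [
    ("lugal", "king", ["text-001", "text-007"]),
    ("šarru", "king", ["text-003"]),
    ("e2", "house", ["text-001"]),
]


def build_indexes(glossary):
    """Build two lookup tables: meaning -> terms, and term -> (meaning, texts)."""
    by_meaning = defaultdict(list)
    by_term = {}
    for term, meaning, texts in glossary:
        by_meaning[meaning.lower()].append(term)
        by_term[term] = (meaning, list(texts))
    return by_meaning, by_term


def search_meaning(by_meaning, meaning):
    """Return all terms glossed with the given meaning (case-insensitive)."""
    return sorted(by_meaning.get(meaning.lower(), []))


def search_term(by_term, term):
    """Return (meaning, appearances) for a transliterated term, or None."""
    return by_term.get(term)


if __name__ == "__main__":
    by_meaning, by_term = build_indexes(GLOSSARY)
    print(search_meaning(by_meaning, "king"))
    print(search_term(by_term, "e2"))
```

A full-text engine would add tokenization, ranking, and scaling that this sketch leaves out; the point here is only the separation into a meaning-keyed and a term-keyed index.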

Cookery: A Framework for Creating Data Processing Pipeline Using Online Services

2018 IEEE 14th International Conference on e-Science (e-Science) Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00102

Mikolaj Baranowski, A. Belloum, R. Cushing, O. Valkering

Citations: 5

Message from the eScience 2018 Program Committee Chairs for the Focused Session on Exascale Computing for High-Energy Physics

2018 IEEE 14th International Conference on e-Science (e-Science) Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00079

J. Templon, Y. Dzigan

Citations: 0

Automated Parallel Calculation of Collaborative Statistical Models in RooFit

2018 IEEE 14th International Conference on e-Science (e-Science) Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00089

E. G. P. Bos, I. Pelupessy, V. Croft, W. Verkerke, C. Burgard

Pages: 345-346

Abstract: RooFit [4], [6] is the statistical modeling and fitting package used in many big particle physics experiments to extract physical parameters from reduced particle collision data, e.g. the Higgs boson experiments at the LHC [1], [2]. RooFit aims to separate particle physics model building and fitting (the users’ goals) from their technical implementation and optimization in the back-end. In this paper, we outline our efforts to further optimize the back-end by automatically running major parts of user models in parallel on multi-core machines. A major challenge is that RooFit allows users to define many different types of models, with different types of computational bottlenecks. Our automatic parallelization framework must then be flexible, while still reducing run-time by at least an order of magnitude, preferably more. We have performed extensive benchmarks and identified at least three bottlenecks that will benefit from parallelization. To tackle these and possible future bottlenecks, we designed a parallelization layer that allows us to parallelize existing classes with minimal effort, but with high performance and retaining as much of the existing class’s interface as possible. The high-level parallelization model is a task-stealing approach. The implementation is currently based on a multi-process approach using a bi-directional memory mapped pipe for communication, which is both easy to use and highly performant. Preliminary results show speed-ups of factor 2 to 20, depending on the exact model and parallelization strategy.

Citations: 1
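As a rough illustration of why likelihood fits of the kind mentioned in the RooFit abstract lend themselves to parallel evaluation: for independent events, the negative log-likelihood is a sum of per-event terms, so it can be computed in chunks and the partial sums added. The sketch below uses static partitioning over a thread pool, which is a simpler technique than the task-stealing, multi-process design the abstract describes, and it does not use RooFit at all. The function names and the Gaussian model are assumptions chosen for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def gaussian_nll(events, mu, sigma):
    """Negative log-likelihood of the events under a Gaussian(mu, sigma)."""
    norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(0.5 * ((x - mu) / sigma) ** 2 + norm for x in events)


def split(events, n_chunks):
    """Split events into at most n_chunks contiguous slices of near-equal size."""
    size = max(1, math.ceil(len(events) / n_chunks))
    return [events[i:i + size] for i in range(0, len(events), size)]


def parallel_nll(events, mu, sigma, n_workers=4):
    """Evaluate the NLL chunk by chunk and add up the partial sums."""
    chunks = split(events, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(lambda chunk: gaussian_nll(chunk, mu, sigma), chunks)
        return sum(partials)


if __name__ == "__main__":
    data = [0.1 * i for i in range(1000)]
    print(gaussian_nll(data, 50.0, 20.0))
    print(parallel_nll(data, 50.0, 20.0))
```

In CPython, threads will not speed up this pure-Python arithmetic because of the global interpreter lock; the sketch shows only the decomposition, which is the part that carries over to a multi-process design.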

Search for Computational Workflow Synergies in Reproducible Research Data Analyses in Particle Physics and Life Sciences

2018 IEEE 14th International Conference on e-Science (e-Science) Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00123

T. Simko, K. Cranmer, M. Crusoe, L. Heinrich, A. Khodak, Dinos Kousidis, D. Rodríguez

Citations: 5

Catching Toad Calls in the Cloud: Commodity Edge Computing for Flexible Analysis of Big Sound Data

2018 IEEE 14th International Conference on e-Science (e-Science) Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00022

P. Roe, Meriem Ferroudj, M. Towsey, L. Schwarzkopf

Pages: 67-74

Abstract: Passive acoustic recording has great potential for monitoring both endangered and pest species. However, the automatic analysis of natural sound recordings is challenging due to geographic variation in background sounds in habitats and species calls. We have designed and deployed an acoustic sensor network constituting an early warning system for a vocal invasive species, in particular cane toads. The challenging nature of recognising toad calls and the big data arising from sound recording gave rise to a novel edge computing system which permits both effective monitoring and flexible experimentation. This is achieved through a multi-stage analysis system in which calls are detected and progressively filtered, to both reduce data communication needs and to improve detection accuracy. The filtering occurs across different stages of the cloud system. This permits flexible experimentation, for example when a new call or false positive is received. Furthermore, to balance the loss of data from aggressive filtering (call recognition), novel overview techniques are employed to provide data summaries. In this way an end user can receive alerts that a toad call is present, the system can be tuned on the fly, and the user can view summary data to have confidence that the system is functioning correctly. The system has been deployed and is in day-to-day use. The novel approaches taken are applicable to other edge computing systems which analyse large data streams looking for infrequent events, and the system has application for monitoring other vocal species.

Citations: 9
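The multi-stage idea in the toad-call abstract (detect cheaply near the sensor, filter more strictly downstream, and keep a summary of everything so that aggressive filtering does not hide what was recorded) can be sketched as follows. This is a hypothetical illustration rather than the deployed system: the `Clip` fields, both thresholds, and the summary contents are assumptions, and real stages would operate on audio features instead of precomputed numbers.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    clip_id: str
    energy: float      # hypothetical band-energy feature computed on the sensor
    call_score: float  # hypothetical classifier score computed downstream


def edge_stage(clips, energy_threshold=0.3):
    """Cheap first pass near the sensor: drop clips that are clearly quiet."""
    return [c for c in clips if c.energy >= energy_threshold]


def cloud_stage(clips, score_threshold=0.8):
    """Stricter second pass: keep only clips likely to contain a call."""
    return [c for c in clips if c.call_score >= score_threshold]


def summarize(clips):
    """Overview of all recorded clips, including those later filtered out."""
    n = len(clips)
    mean_energy = sum(c.energy for c in clips) / n if n else 0.0
    return {"clips": n, "mean_energy": mean_energy}


if __name__ == "__main__":
    recorded = [
        Clip("a", energy=0.1, call_score=0.9),
        Clip("b", energy=0.5, call_score=0.4),
        Clip("c", energy=0.7, call_score=0.95),
    ]
    sent_upstream = edge_stage(recorded)
    alerts = cloud_stage(sent_upstream)
    print([c.clip_id for c in alerts], summarize(recorded))
```

Keeping the thresholds as plain parameters mirrors the abstract's point about tuning on the fly: a stage can be re-run with a different threshold when a false positive or a new call type turns up.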

Understanding the Performance of a Prototype of a WLCG Data Lake for HL-LHC

2018 IEEE 14th International Conference on e-Science (e-Science) Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00080

J. Schovancová, S. Campana, X. E. Curull, M. Girone, I. Kadochnikov, G. McCance

Pages: 332-333

Abstract: Storage is identified as one of the main challenges for WLCG in the next decade in the computing strategy document for HL-LHC [1]. Extrapolating today's computing models, the ATLAS and CMS experiments alone would need one order of magnitude more storage resources than what could be provided by the funding agencies. Organization and consolidation of storage and evolution of the compute facilities will be central in addressing the possible resource shortage. In this contribution we describe the architecture of a prototype of a WLCG data lake for HL-LHC. A WLCG data lake aims to provide a geographically distributed storage service, distributed across large data centres interconnected by fast networks with low latency. We present the methodology used to measure and understand the performance of a WLCG data lake prototype, in order to compare event throughput at the same cost at the same compute facilities backed by the traditional storage services and backed by the WLCG data lake. We will discuss various possible data processing models w.r.t. network latency, available storage media, and data caching approaches. We will present benchmarks for storage service and compute performance.

Citations: 0

eWaterCycle II

2018 IEEE 14th International Conference on e-Science (e-Science) Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00108

R. Hut, N. Drost, W. V. Hage, N. Giesen

Pages: 379

Abstract: From a hydrological point of view, every field, every street, every part of the world, is different. We understand quite well how water moves through plants and soils at small scales but the medium is never the same from one spot to the next. This is the curse of locality. It is difficult to capture such processes with a single global model. In the last two decades, hydrology has slowly moved into two related fields: global hydrology and catchment hydrology. In global hydrology, making use of new computational resources, scientists use uniform global models at ever increasing spatial and temporal resolutions, forced with satellite data or climate model output to make claims on the global state of the hydrological cycle [1], [2]. Parallel to this development, researchers in catchment hydrology have focussed on deriving, for each catchment that is studied, the best hydrological models for that specific catchment. This is nicely summarized in the overview paper of the last hydrological decade [3]. While global hydrologists realize that hydrological processes are locally very different and human influence even more so [4], incorporating the body of local hydrological knowledge is not easy. Catchment hydrologists realize the importance of their work to the global water cycle but often lack the (computational) resources and tools to upscale from their catchment to the global picture. The eWaterCycle II project will build and maintain an e-infrastructure that allows for quick and safe inclusion of submodels and model concepts into global hydrological models, leading to a better understanding of the hydrological cycle.

The foreseen e-infrastructure will have a number of unique advantages, including an ability for knowledge gap discovery, machine-assisted model curation, and easily changeable model parts. In this work we will present how we will achieve the goals of the recently started eWaterCycle II project over its three-year runtime. We will show a demo of a first prototype environment where scientists can run, compare and alter different hydrological models that focus on the same region and use the same input data sources. This will work even if the underlying hydrological models are written in different programming languages, without exposing the hydrologists doing the comparison to these technical intricacies. Although the eWaterCycle II project focusses on the hydrological setting, the underlying framework will be suitable outside of hydrology, wherever a collaborative environment is required. eScience aspects such as large scale data assimilation (DA) techniques, generic multi-model multi-scale environments, FAIR data as well as FAIR software, will all benefit from research done in this project.

Citations: 1
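The eWaterCycle II abstract describes running and comparing different hydrological models on the same input data without exposing users to how each model is implemented. One common way to get that property is to hide every model behind a shared interface. The sketch below is a hypothetical illustration of that design, not the project's API: `HydroModel`, both toy bucket models, and `run_all` are invented here, and in a real setting each class would be an adapter around a model written in another language.

```python
from abc import ABC, abstractmethod


class HydroModel(ABC):
    """Hypothetical common interface; not the eWaterCycle II API."""

    @abstractmethod
    def update(self, precipitation: float) -> None:
        """Advance the model one time step with the given forcing."""

    @abstractmethod
    def discharge(self) -> float:
        """Return the discharge produced in the latest time step."""


class LinearBucket(HydroModel):
    """Toy model: a fixed fraction k of the stored water drains each step."""

    def __init__(self, k: float) -> None:
        self.k = k
        self.storage = 0.0
        self._q = 0.0

    def update(self, precipitation: float) -> None:
        self.storage += precipitation
        self._q = self.k * self.storage
        self.storage -= self._q

    def discharge(self) -> float:
        return self._q


class ThresholdBucket(HydroModel):
    """Toy model: water above a fixed capacity spills as discharge."""

    def __init__(self, capacity: float) -> None:
        self.capacity = capacity
        self.storage = 0.0
        self._q = 0.0

    def update(self, precipitation: float) -> None:
        self.storage += precipitation
        self._q = max(0.0, self.storage - self.capacity)
        self.storage -= self._q

    def discharge(self) -> float:
        return self._q


def run_all(models, forcing):
    """Drive every model with the same forcing series and collect discharge."""
    results = {}
    for name, model in models.items():
        series = []
        for precipitation in forcing:
            model.update(precipitation)
            series.append(model.discharge())
        results[name] = series
    return results


if __name__ == "__main__":
    models = {"linear": LinearBucket(0.5), "threshold": ThresholdBucket(8.0)}
    print(run_all(models, [10.0, 0.0, 5.0]))
```

Because `run_all` only calls `update` and `discharge`, swapping in another model means writing one more adapter class and leaving the comparison code untouched.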