{"title":"Message from the eScience 2018 Program Committee Chairs for the Focused Session on Exascale Computing for High-Energy Physics","authors":"J. Templon, Y. Dzigan","doi":"10.1109/eScience.2018.00079","DOIUrl":"https://doi.org/10.1109/eScience.2018.00079","url":null,"abstract":"n/a","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"93 1","pages":"331-331"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80494631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jorg Behrens, J. Biercamp, H. Bockelmann, P. Neumann
{"title":"Increasing Parallelism in Climate Models Via Additional Component Concurrency","authors":"Jorg Behrens, J. Biercamp, H. Bockelmann, P. Neumann","doi":"10.1109/eScience.2018.00044","DOIUrl":"https://doi.org/10.1109/eScience.2018.00044","url":null,"abstract":"n/a","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"21 1","pages":"271-271"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79193025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Simko, K. Cranmer, M. Crusoe, L. Heinrich, A. Khodak, Dinos Kousidis, D. Rodríguez
{"title":"Search for Computational Workflow Synergies in Reproducible Research Data Analyses in Particle Physics and Life Sciences","authors":"T. Simko, K. Cranmer, M. Crusoe, L. Heinrich, A. Khodak, Dinos Kousidis, D. Rodríguez","doi":"10.1109/eScience.2018.00123","DOIUrl":"https://doi.org/10.1109/eScience.2018.00123","url":null,"abstract":"We describe the REANA reusable and reproducible research data analysis platform that originated in the domain of particle physics. We integrated support for running Common Workflow Language (CWL) workflows that originated in the domain of life sciences. This integration allowed us to study the applicability of CWL to particle physics analyses and look for synergies in computational practices in the two communities.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"76 1","pages":"403-404"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82533057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
E. G. P. Bos, I. Pelupessy, V. Croft, W. Verkerke, C. Burgard
{"title":"Automated Parallel Calculation of Collaborative Statistical Models in RooFit","authors":"E. G. P. Bos, I. Pelupessy, V. Croft, W. Verkerke, C. Burgard","doi":"10.1109/eScience.2018.00089","DOIUrl":"https://doi.org/10.1109/eScience.2018.00089","url":null,"abstract":"RooFit [4], [6] is the statistical modeling and fitting package used in many big particle physics experiments to extract physical parameters from reduced particle collision data, e.g. the Higgs boson experiments at the LHC [1], [2]. RooFit aims to separate particle physics model building and fitting (the users’ goals) from their technical implementation and optimization in the back-end. In this paper, we outline our efforts to further optimize the back-end by automatically running major parts of user models in parallel on multi-core machines. A major challenge is that RooFit allows users to define many different types of models, with different types of computational bottlenecks. Our automatic parallelization framework must then be flexible, while still reducing run-time by at least an order of magnitude, preferably more. We have performed extensive benchmarks and identified at least three bottlenecks that will benefit from parallelization. To tackle these and possible future bottlenecks, we designed a parallelization layer that allows us to parallelize existing classes with minimal effort, but with high performance and retaining as much of the existing class’s interface as possible. The high-level parallelization model is a task-stealing approach. The implementation is currently based on a multi-process approach using a bi-directional memory mapped pipe for communication, which is both easy to use and highly performant. 
Preliminary results show speed-ups of a factor of 2 to 20, depending on the exact model and parallelization strategy.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"1 1","pages":"345-346"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82578311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Results and Challenges of Using Administrative Health Data Within a Natural Experimental Evaluation of the Abolition of Prescription Fees in Scotland","authors":"A. Williams, W. Henley, J. Frank","doi":"10.1109/eScience.2018.00128","DOIUrl":"https://doi.org/10.1109/eScience.2018.00128","url":null,"abstract":"n/a","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"57 1","pages":"412-412"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76929937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Richard Eckart de Castilho, Jan-Christoph Klie, Naveen Kumar, Beto Boullosa, Iryna Gurevych
{"title":"Linking Text and Knowledge Using the INCEpTION Annotation Platform","authors":"Richard Eckart de Castilho, Jan-Christoph Klie, Naveen Kumar, Beto Boullosa, Iryna Gurevych","doi":"10.1109/ESCIENCE.2018.00077","DOIUrl":"https://doi.org/10.1109/ESCIENCE.2018.00077","url":null,"abstract":"In the Digital Humanities (DH), linking text collections to general or domain-specific knowledge bases (KBs) or authority files is important to enable a contextualised analysis. Automatic named entity recognition and entity linking tools require training data or domain-specific methods. Interactive annotation tools often do not support the tasks of entity linking, fact-linking, cross-document reference resolution, etc. We aim to address this gap with the INCEpTION annotation platform, which not only provides these capabilities in the context of a generic annotation tool, but also combines them with machine learning methods to improve annotation efficiency.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"65 1","pages":"327-328"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75974362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mikolaj Baranowski, A. Belloum, R. Cushing, O. Valkering
{"title":"Cookery: A Framework for Creating Data Processing Pipeline Using Online Services","authors":"Mikolaj Baranowski, A. Belloum, R. Cushing, O. Valkering","doi":"10.1109/eScience.2018.00102","DOIUrl":"https://doi.org/10.1109/eScience.2018.00102","url":null,"abstract":"With the increasing amount of data, the importance of data analysis has grown. A large amount of this data has shifted to cloud-based storage. The cloud offers storage and computation power. The Cookery framework is a tool developed to help scientists without a complete understanding of programming build applications in the cloud. In this paper we present the Cookery system and show how it can be used to authenticate with and use standard third-party online services to easily create data analytics pipelines. The Cookery framework is not limited to standard web services; it can also integrate and work with the emerging AWS Lambda. The combination of AWS Lambda and Cookery makes it possible for people who do not have any programming experience to create data processing pipelines using cloud services in a short time.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"38 1","pages":"368-369"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80916449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Raquel Alegre, Anastasis Georgoulas, S. Grieve, E. Robson
{"title":"Democratizing Ancient Mesopotamian Research through Digital Scholarship","authors":"Raquel Alegre, Anastasis Georgoulas, S. Grieve, E. Robson","doi":"10.1109/eScience.2018.00074","DOIUrl":"https://doi.org/10.1109/eScience.2018.00074","url":null,"abstract":"Since the 19th century, historians and archaeologists have compiled transliterations and translations of surviving cuneiform texts from the Middle East area, documenting the ancient history of the region, c. 3000 BC–75 AD. The Open Richly Annotated Cuneiform Corpus (Oracc, http://oracc.org) is an international collaborative effort to gather and digitise a complete collection of cuneiform texts and their translations, with the goal of making them available to researchers and students worldwide. Oracc was developed ten years ago around the core value of ensuring accessibility to a broad audience, rather than a select group of experts. This principle presented new technological challenges, but has equally offered important benefits. Initial transliteration of cuneiform tablets into the ASCII Transliteration Format (ATF) was performed using an Emacs plugin, the use of which was challenging for novice and experienced users alike. This precipitated the development of Nammu [1], a dedicated editor for files written in ATF, to provide a consistent environment for users to contribute to Oracc projects. This is an important step in the democratization of this research as it lowers the technological expertise required to join the platform, and reduces the amount of time needed to train new users, which was previously a large drain on Principal Investigators’ time and resources. Nammu in turn takes advantage of pyORACC [2], a bespoke library developed for parsing ATF files and a key enabler of automation in the project. Separately from the editing considerations, the Oracc website hosts the body of information editions and translations that researchers from different groups have accumulated during their work.
An important aspect of this is the search capability it offers, allowing a user to retrieve information about a subject or term of their choice. A new version of this functionality is being developed, using the ElasticSearch platform to index and efficiently search large bodies of text. Users can choose to query the compiled glossaries, looking for words with a particular meaning, or for the meaning and appearances of a transliterated cuneiform term. Alternatively, they will be able to search through the information pages for a topic of their choice, effectively using the website as a domain-specific search engine. This dual functionality has been chosen so as to make the search of interest to both domain experts and the general public. Early versions of Nammu focused on the transliteration and translation of cuneiform into English and other European languages. Meanwhile, decades of war and political instability across the Middle East have prevented researchers from Iraq, Syria and neighbouring countries from contributing to the ancient history of their re… (Programming work on Oracc is funded by UCL’s School of Social and Historical Sciences, and through the Nahrein Network’s grant from the UK Arts and Humanities Council’s Global Challenges Research Fund.)","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"37 1","pages":"322-322"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80937159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"eWaterCycle II","authors":"R. Hut, N. Drost, W. V. Hage, N. Giesen","doi":"10.1109/eScience.2018.00108","DOIUrl":"https://doi.org/10.1109/eScience.2018.00108","url":null,"abstract":"From a hydrological point of view, every field, every street, every part of the world, is different. We understand quite well how water moves through plants and soils at small scales but the medium is never the same from one spot to the next. This is the curse of locality. It is difficult to capture such processes with a single global model. In the last two decades, hydrology has slowly moved into two related fields: global hydrology and catchment hydrology. In global hydrology, making use of new computational resources, scientists use uniform global models at ever increasing spatial and temporal resolutions, forced with satellite data or climate model output to make claims on the global state of the hydrological cycle [1], [2]. Parallel to this development, researchers in catchment hydrology have focussed on deriving, for each catchment that is studied, the best hydrological models for that specific catchment. This is nicely summarized in the overview paper of the last hydrological decade [3]. While global hydrologists realize that hydrological processes are locally very different and human influence even more so [4], incorporating the body of local hydrological knowledge is not easy. Catchment hydrologists realize the importance of their work to the global water cycle but often lack the (computational) resources and tools to upscale from their catchment to the global picture. The eWaterCycle II project will build and maintain an e-Infrastructure that allows for quick and safe inclusion of submodels and model concepts into global hydrological models, leading to a better understanding of the hydrological cycle. 
The foreseen e-infrastructure will have a number of unique advantages, including an ability for knowledge gap discovery, machine-assisted model curation, and easily changeable model parts. In this work we will present how we will achieve the goals of the recently started eWaterCycle II project over its three-year runtime. We will show a demo of a first prototype environment where scientists can run, compare and alter different hydrological models that focus on the same region and use the same input data sources. This will work even if the underlying hydrological models are written in different programming languages, without exposing the hydrologists doing the comparison to these technical intricacies. Although the eWaterCycle II project focusses on the hydrological setting, the underlying framework will be suitable outside of hydrology, wherever a collaborative environment is required. eScience aspects such as large-scale data assimilation (DA) techniques, generic multi-model multi-scale environments, FAIR data as well as FAIR software, will all benefit from research done in this project.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"55 1","pages":"379-379"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76244675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Benjamin, P. Calafiura, T. Childers, K. De, A. Girolamo, E. Fullana, W. Guan, T. Maeno, Nicolò Magini, P. Nilsson, D. Oleynik, Shaojun Sun, V. Tsulaia, P. Gemmeren, T. Wenaus, W. Yang
{"title":"Fine-Grained Processing Towards HL-LHC Computing in ATLAS","authors":"D. Benjamin, P. Calafiura, T. Childers, K. De, A. Girolamo, E. Fullana, W. Guan, T. Maeno, Nicolò Magini, P. Nilsson, D. Oleynik, Shaojun Sun, V. Tsulaia, P. Gemmeren, T. Wenaus, W. Yang","doi":"10.1109/eScience.2018.00083","DOIUrl":"https://doi.org/10.1109/eScience.2018.00083","url":null,"abstract":"During LHC's Run-2, ATLAS has been developing and evaluating new fine-grained approaches to workflows and dataflows that make better use of computing resources in terms of storage, processing and networks. The compute-limited physics of ATLAS has driven the collaboration to aggressively harvest opportunistic cycles from what are often transiently available resources, including HPCs, clouds, volunteer computing, and grid resources in transitional states. Fine-grained processing (with typically a few minutes' granularity, corresponding to one event for the present ATLAS full simulation) enables agile workflows with a light footprint on the resource such that cycles can be more fully and efficiently utilized than with conventional workflows processing O(GB) files per job.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"1 1","pages":"338-338"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79131688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}