2012 IEEE 8th International Conference on E-Science: Latest Papers (Page 7)

Digitization and search: A non-traditional use of HPC

2012 IEEE 8th International Conference on E-Science Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404445

Liana Diesendruck, Luigi Marini, R. Kooper, M. Kejriwal, Kenton McHenry

Abstract: Automated search of handwritten content is a highly interesting and practical subject, made especially important today by the public availability of large digitized document collections. We describe our efforts with the National Archives (NARA) to provide searchable access to the 1940 Census data and discuss the HPC resources needed to implement the suggested framework. Instead of trying to recognize the handwritten text, which is still a very difficult task, we use a content-based image retrieval technique known as Word Spotting. Under this paradigm, the system is queried with images of handwritten text instead of ASCII text, and ranked groups of similar-looking images are presented to the user. A significant amount of computing power is needed to pre-process the data so as to make this search capability available on an archive. The required preprocessing steps and the open source framework developed are discussed, focusing specifically on HPC considerations that are relevant when preparing to provide searchable access to sizeable collections, such as the US Census. Having processed the state of North Carolina from the 1930 Census using 98,000 SUs, we estimate that processing the entire country for 1940 could require up to 2.5 million SUs. The proposed framework can be used to provide an alternative to costly manual transcriptions for a variety of digitized paper archives.
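The query-by-image idea in this abstract can be illustrated with a minimal sketch. It assumes each word image has already been reduced to a fixed-length feature vector during preprocessing. The feature values, the identifiers such as `cell_001`, and the use of Euclidean distance are all hypothetical choices made for illustration; the abstract does not specify the features or the similarity measure the authors use.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_word_images(query_features, index, top_k=3):
    """Return the top_k (image_id, distance) pairs closest to the query.

    `index` maps a word-image id to its precomputed feature vector.
    Building such an index is the costly preprocessing step; ranking
    against it is comparatively cheap.
    """
    scored = [
        (image_id, euclidean(query_features, features))
        for image_id, features in index.items()
    ]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:top_k]


# Hypothetical, hand-made feature vectors (not real Census data).
index = {
    "cell_001": [0.10, 0.80, 0.30],
    "cell_002": [0.90, 0.20, 0.50],
    "cell_003": [0.12, 0.79, 0.33],
}
query = [0.11, 0.80, 0.31]  # features of the handwritten query image

results = rank_word_images(query, index, top_k=2)
for image_id, distance in results:
    print(image_id, round(distance, 4))
```

The user never types text here: the query is itself a feature vector derived from an image, and the result is a ranked list of similar-looking images rather than a transcription.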

Citations: 2

eResearch environment for remote instrumentation: VBL, RLI, VisLabl & 2

2012 IEEE 8th International Conference on E-Science Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404465

C. Myers, Michael D'Silva

Citations: 0

Partial replica selection for spatial datasets

2012 IEEE 8th International Conference on E-Science Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404473

Yun Tian, P. J. Rhodes

Abstract: The implementation of partial or incomplete replicas, which represent only a subset of a larger dataset, has been an active topic of research. Partial Spatial Replicas extend this functionality to spatial data, allowing us to distribute a spatial dataset in pieces over several locations. Accessing only a subset of a spatial replica usually results in a large number of relatively small read requests made to the underlying storage device. For this reason, an accurate model of disk access is important when working with spatial subsets. We make two primary contributions in this paper. First, we describe a model for disk access performance that takes filesystem prefetching into account and is sufficiently accurate for spatial replica selection. Second, making a few simplifying assumptions, we propose a fast replica selection algorithm for partial spatial replicas. The algorithm uses a greedy approach that attempts to maximize performance by choosing a collection of replica subsets that allow fast data retrieval by a client machine. Experiments show that the performance of the solution found by our algorithm is, on average, at least 91% and 93.4% of the performance of the optimal solution in 4-node and 8-node tests, respectively.
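To make the greedy idea concrete, here is a small sketch under heavy simplifying assumptions. It is not the authors' algorithm: the dataset is reduced to integer block ids, and each replica is given a hypothetical fixed cost plus a per-block cost, standing in for the disk access model the abstract mentions. At each step the sketch picks the replica with the lowest estimated cost per still-needed block.

```python
def greedy_select(needed, replicas):
    """Greedily assign requested blocks to partial replicas.

    needed:   set of block ids the client wants to read
    replicas: dict name -> (blocks_held, fixed_cost, per_block_cost)
              (the cost figures are hypothetical stand-ins for a disk model)
    Returns a dict name -> set of blocks to read from that replica.
    """
    remaining = set(needed)
    plan = {}
    while remaining:
        best = None  # (cost_per_block, name, useful_blocks)
        for name, (held, fixed, per_block) in replicas.items():
            useful = held & remaining
            if not useful:
                continue
            rate = (fixed + per_block * len(useful)) / len(useful)
            if best is None or rate < best[0]:
                best = (rate, name, useful)
        if best is None:
            raise ValueError(f"no replica holds blocks {sorted(remaining)}")
        _, name, useful = best
        plan[name] = useful
        remaining -= useful
    return plan


# Hypothetical replicas: (blocks held, fixed cost, cost per block).
replicas = {
    "A": ({0, 1, 2, 3}, 4.0, 1.0),
    "B": ({2, 3, 4, 5}, 2.0, 2.0),
    "C": ({4, 5}, 1.0, 1.0),
}
plan = greedy_select({0, 1, 2, 3, 4, 5}, replicas)
print(plan)
```

A greedy choice like this is fast but not guaranteed optimal, which is consistent with the abstract reporting performance relative to the optimal solution rather than claiming optimality.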

Citations: 6

A system for management of Computational Fluid Dynamics simulations for civil engineering

2012 IEEE 8th International Conference on E-Science Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404433

Peter Sempolinski, D. Thain, Daniel Wei, A. Kareem

Citations: 4

Temporal representation for scientific data provenance

2012 IEEE 8th International Conference on E-Science Pub Date : 2012-10-01 DOI: 10.1109/eScience.2012.6404477

Peng Chen, Beth Plale, M. Aktaş

Citations: 29

A data-driven urban research environment for Australia

2012 IEEE 8th International Conference on E-Science Pub Date : 2012-10-01 DOI: 10.1109/eScience.2012.6404481

R. Sinnott, Christopher Bayliss, G. Galang, Phillip Greenwood, George Koetsier, D. Mannix, L. Morandini, Marcos Nino-Ruiz, C. Pettit, Martin Tomko, M. Sarwar, R. Stimson, W. Voorsluys, I. Widjaja

Citations: 19

High-performance computing without commitment: SC2IT: A cloud computing interface that makes computational science available to non-specialists

2012 IEEE 8th International Conference on E-Science Pub Date : 2012-10-01 DOI: 10.1109/eScience.2012.6404441

K. Jorissen, W. Johnson, F. Vila, J. Rehr

Abstract: Computational work is a vital part of many scientific studies. In materials science research in particular, theoretical models are often needed to understand measurements. There is currently a double barrier that keeps a broad class of researchers from using state-of-the-art materials science codes: the software typically lacks user-friendliness, and the hardware requirements can demand a significant investment, e.g., the purchase of a Beowulf cluster. Scientific Cloud Computing has the potential to remove this barrier and make computational science accessible to a wider class of scientists who are not computational specialists. We present a set of interface tools, SC2IT, that enables seamless control of virtual compute clusters in the Amazon EC2 cloud and is designed to be embedded in user-friendly Java GUIs. We present applications of our Scientific Cloud Computing method to the materials science codes FEFF9, WIEN2k, and MEEP-mpi. SC2IT and the paradigm described here are applicable to other fields of research outside materials science within current Cloud Computing capability.

Citations: 7

Discovering drug targets for neglected diseases using a pharmacophylogenomic cloud workflow

2012 IEEE 8th International Conference on E-Science Pub Date : 2012-10-01 DOI: 10.1109/eScience.2012.6404431

Kary A. C. S. Ocaña, Daniel de Oliveira, Jonas Dias, Eduardo S. Ogasawara, M. Mattoso

Abstract: Illnesses caused by parasitic protozoa are a research priority. A representative group of these illnesses is commonly known as Neglected Tropical Diseases (NTDs). NTDs especially affect populations of low socioeconomic status around the world; new anti-protozoan inhibitors are needed, and several drug discovery projects focus on researching new drug targets. Pharmacophylogenomics is a novel bioinformatics field that aims at reducing the time and the financial cost of the drug discovery process. Pharmacophylogenomic analyses are applied mainly in the early stages of the research phase in drug discovery. A pharmacophylogenomic analysis executes several bioinformatics programs in a coherent flow to identify homologous sequences, construct phylogenetic trees, and execute evolutionary and structural experiments. It can therefore be modeled as a scientific workflow. Pharmacophylogenomic analysis workflows are complex and compute- and data-intensive, and may run for weeks, so they benefit from parallel execution. We propose SciPPGx, a scientific workflow that aims at providing thorough inference support for pharmacophylogenomic hypotheses. SciPPGx is executed in parallel in a cloud using the SciCumulus workflow engine. Experiments show that SciPPGx reduces the total execution time by up to 97.1% when compared to a sequential execution. We also present representative biological results that take advantage of the inference, covering several related bioinformatics overviews.
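As a reading aid for the 97.1% figure, a reduction in execution time can be converted into a speedup factor with simple arithmetic. The helper below is an illustrative sketch, not part of SciPPGx or SciCumulus, and it assumes the percentage refers to wall-clock time relative to the sequential run.

```python
def speedup_from_reduction(reduction_fraction):
    """Convert a fractional reduction in run time into a speedup factor.

    If the parallel run takes (1 - r) of the sequential time,
    the speedup is 1 / (1 - r).
    """
    if not 0 <= reduction_fraction < 1:
        raise ValueError("reduction must be in the range [0, 1)")
    return 1.0 / (1.0 - reduction_fraction)


speedup = speedup_from_reduction(0.971)
print(round(speedup, 1))  # about 34.5
```

Under that assumption, the best case reported corresponds to the parallel run finishing roughly 34 times faster than the sequential one.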

Citations: 13

BIGS: A framework for large-scale image processing and analysis over distributed and heterogeneous computing resources

2012 IEEE 8th International Conference on E-Science Pub Date : 2012-10-01 DOI: 10.1109/eScience.2012.6404424

R. Ramos-Pollán, F. González, Juan C. Caicedo, Angel Cruz-Roa, Jorge E. Camargo, Jorge A. Vanegas, Santiago A. Pérez-Rubiano, J. Bermeo, Juan Sebastian Otálora Montenegro, Paola K. Rozo, John Arevalo

Abstract: This paper presents BIGS, the Big Image Data Analysis Toolkit, a software framework for large-scale image processing and analysis over heterogeneous computing resources, such as those available in clouds, grids, computer clusters, or scattered computer resources (desktops, labs) used in an opportunistic manner. Through BIGS, eScience for image processing and analysis is conceived to exploit coarse-grained parallelism based on data partitioning and parameter sweeps, avoiding the need for inter-process communication and, therefore, enabling loosely coupled computing nodes (BIGS workers). It adopts an uncommitted resource allocation model where (1) experimenters define their image processing pipelines in a simple configuration file, (2) a schedule of jobs is generated, and (3) workers, as they become available, take over pending jobs as long as their dependencies on other jobs are fulfilled. BIGS workers act autonomously, querying the job schedule to determine which job to take over. This removes the need for a central scheduling node, requiring only that all workers have access to a shared information source. Furthermore, BIGS workers are encapsulated within different technologies to enable their agile deployment over the available computing resources. Currently they can be launched through the Amazon EC2 service over its cloud resources, through Java Web Start from any desktop computer, and through regular scripting or SSH commands. This suits different kinds of research environments well, both when accessing dedicated computing clusters or clouds with committed computing capacity and when using opportunistic computing resources whose access is infrequent or cannot be provisioned in advance. We also adopt a NoSQL storage model to ensure the scalability of the shared information sources required by all workers; BIGS includes support for HBase and Amazon's DynamoDB service. Overall, BIGS now enables researchers to run large-scale image processing pipelines in an easy, affordable, and unplanned manner, with the capability to take over computing resources as they become available at run time. This is shown in this paper by using BIGS in different experimental setups in the Amazon cloud and in an opportunistic manner, demonstrating its configurability, adaptability, and scalability.
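The worker model in this abstract (no central scheduler; each worker queries a shared schedule and takes a pending job whose dependencies are fulfilled) can be sketched as follows. This is an illustrative toy, not BIGS code: the schedule is an in-memory dictionary standing in for the shared NoSQL store, and the job names, field names, and function names are all hypothetical. A real shared store would also need an atomic claim step so that two workers cannot take the same job, which this single-process sketch does not model.

```python
def next_runnable(schedule):
    """Return the id of a pending job whose dependencies are all done, or None."""
    for job_id, job in schedule.items():
        deps_done = all(schedule[dep]["status"] == "done" for dep in job["deps"])
        if job["status"] == "pending" and deps_done:
            return job_id
    return None


def worker_step(schedule, worker_name):
    """One autonomous worker step: claim a runnable job, run it, mark it done."""
    job_id = next_runnable(schedule)
    if job_id is None:
        return None  # nothing runnable right now
    schedule[job_id]["status"] = "running"
    schedule[job_id]["worker"] = worker_name
    # The actual image-processing work would happen here.
    schedule[job_id]["status"] = "done"
    return job_id


# Hypothetical pipeline: partition the data, extract features from two
# partitions independently, then merge the results.
schedule = {
    "partition": {"deps": [], "status": "pending"},
    "features_a": {"deps": ["partition"], "status": "pending"},
    "features_b": {"deps": ["partition"], "status": "pending"},
    "merge": {"deps": ["features_a", "features_b"], "status": "pending"},
}

workers = ["w1", "w2"]
order = []
turn = 0
while True:
    # Workers take turns here; in practice they would poll independently.
    job = worker_step(schedule, workers[turn % len(workers)])
    if job is None:
        break
    order.append(job)
    turn += 1

print(order)
```

Because each worker only reads and updates the shared schedule, workers can join or leave at any time, which matches the opportunistic use of resources the abstract describes.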

Citations: 12

IRMIS: The care and feeding of a generalized relatively relational database for accelerator components with a connection to the real time EPICS Input output controllers

2012 IEEE 8th International Conference on E-Science Pub Date : 2012-10-01 DOI: 10.1109/eScience.2012.6404469

R. Farnsworth, S. Benes

Citations: 0