{"title":"Consensus Sigma-70 Promoter Prediction Using Hadoop","authors":"J. Hogan, W. Kelly, Felicity Newell","doi":"10.1109/eScience.2013.42","DOIUrl":"https://doi.org/10.1109/eScience.2013.42","url":null,"abstract":"MapReduce frameworks such as Hadoop are well suited to handling large sets of data which can be processed separately and independently, with canonical applications in information retrieval and sales record analysis. Rapid advances in sequencing technology have ensured an explosion in the availability of genomic data, with a consequent rise in the importance of large scale comparative genomics, often involving operations and data relationships which deviate from the classical Map Reduce structure. This work examines the application of Hadoop to patterns of this nature, using as our focus a well established workflow for identifying promoters - binding sites for regulatory proteins - across multiple gene regions and organisms, coupled with the unifying step of assembling these results into a consensus sequence. Our approach demonstrates the utility of Hadoop for problems of this nature, showing how the tyranny of the \"dominant decomposition\" can be at least partially overcome. It also demonstrates how load balance and the granularity of parallelism can be optimized by pre-processing that splits and reorganizes input files, allowing a wide range of related problems to be brought under the same computational umbrella.","PeriodicalId":325272,"journal":{"name":"2013 IEEE 9th International Conference on e-Science","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132350870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cabinet: Managing Data Efficiently in the Global Federated File System","authors":"Avinash Kalyanaraman, A. Grimshaw","doi":"10.1109/eScience.2013.36","DOIUrl":"https://doi.org/10.1109/eScience.2013.36","url":null,"abstract":"With ever expanding datasets, efficient data management in grids becomes important. This paper describes Cabinet, which employs two techniques for efficiently managing data in grids: a caching system and a new file staging approach called coordinated staging. The caching system is designed based on the characteristics of grid applications. Coordinated staging is based on the BitTorrent Protocol model and is specifically designed for High Throughput Computing (HTC) applications, a common use-case for grids. In coordinated staging, each site that is assigned to execute an individual job of the HTC application treats other execution sites as potential replica-stores. In our evaluation, we show that coordinated staging lowered the download time of a file by 3.85x, and increased the throughput of the download by 2.86x over the conventional approach of file transfer from a single source.","PeriodicalId":325272,"journal":{"name":"2013 IEEE 9th International Conference on e-Science","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126211396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bird-SDPS: A Migratory Birds' Spatial Distribution Prediction System","authors":"Yuanchun Zhou, Jing Shao, Xuezhi Wang, Ze Luo, Jianhui Li, Baoping Yan","doi":"10.1109/eScience.2013.12","DOIUrl":"https://doi.org/10.1109/eScience.2013.12","url":null,"abstract":"Species distribution modeling is an important ecological research task that has received a great deal of interest. There are several single model packages and applications available for species distribution analysis. This paper introduces Bird-SDPS, a Prediction System for Migratory Birds' Spatial Distribution, which is an extensible system for birds' spatial distribution prediction. Bird-SDPS uses birds' GPS tracking data and remote sensing data as input to build multiple distribution models, which are implemented in different programming languages. The system also provides online access and visualization functions. To store large remote sensing datasets, we design a hybrid storage structure based on HBase. We extensively evaluate our system using a real-world GPS dataset collected from 90 wild birds over 3 years. We show that the system can conduct birds' distribution prediction based on multiple models, and that our hybrid data storage modes can outperform traditional file-based storage modes.","PeriodicalId":325272,"journal":{"name":"2013 IEEE 9th International Conference on e-Science","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126758593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Wildlife@Home: Combining Crowd Sourcing and Volunteer Computing to Analyze Avian Nesting Video","authors":"Travis Desell, Robert Bergman, K. Goehner, R. Marsh, Rebecca VanderClute, Susan N. Ellis‐Felege","doi":"10.1109/ESCIENCE.2013.50","DOIUrl":"https://doi.org/10.1109/ESCIENCE.2013.50","url":null,"abstract":"New camera technology is allowing avian ecologists to perform detailed studies of avian behavior, nesting strategies and predation in areas where it was previously impossible to gather data. Unfortunately, studies have shown mechanical triggers and a variety of sensors to be inadequate in capturing footage of small predators (e.g., snakes, rodents) or events in dense vegetation. Because of this, continuous camera recording is currently the most robust solution for avian monitoring, especially in ground nesting species. However, continuous video footage results in a data deluge, as monitoring enough nests to make biologically significant inferences results in massive amounts of data which is unclassifiable by humans alone. In the summer of 2012, Dr. Ellis-Felege gathered video footage from 63 sharp-tailed grouse (Tympanuchus phasianellus) nests, as well as preliminary interior least tern (Sternula antillarum) and piping plover (Charadrius melodus) nests, resulting in over 20,000 hours of video footage. In order to effectively analyze this video, a project combining both crowd sourcing and volunteer computing was developed, where volunteers can stream nesting video and report their observations, as well as have their computers download video for analysis by computer vision techniques. This provides a robust way to analyze the video, as user observations are validated by multiple views as well as the results of the computer vision techniques. This work provides initial results analyzing the effectiveness of the crowd sourced observations and computer vision techniques.","PeriodicalId":325272,"journal":{"name":"2013 IEEE 9th International Conference on e-Science","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131368829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rapid Scanning of Spectrograms for Efficient Identification of Bioacoustic Events in Big Data","authors":"A. Truskinger, Mark Cottman-Fields, Daniel M. Johnson, P. Roe","doi":"10.1109/eScience.2013.25","DOIUrl":"https://doi.org/10.1109/eScience.2013.25","url":null,"abstract":"Acoustic sensing is a promising approach to scaling faunal biodiversity monitoring. Scaling the analysis of audio collected by acoustic sensors is a big data problem. Standard approaches for dealing with big acoustic data include automated recognition and crowd based analysis. Automatic methods are fast at processing but hard to rigorously design, whilst manual methods are accurate but slow at processing. In particular, manual methods of acoustic data analysis are constrained by a 1:1 time relationship between the data and its analysts. This constraint is the inherent need to listen to the audio data. This paper demonstrates how the efficiency of crowd sourced sound analysis can be increased by an order of magnitude through the visual inspection of audio visualized as spectrograms. Experimental data suggests that an analysis speedup of 12× is obtainable for suitable types of acoustic analysis, given that only spectrograms are shown.","PeriodicalId":325272,"journal":{"name":"2013 IEEE 9th International Conference on e-Science","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121365963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}