Zhiyuan Lin, Minsuk Kahng, Kaeser Md Sabrin, Duen Horng Polo Chau, Ho Lee, U Kang
{"title":"MMap: Fast Billion-Scale Graph Computation on a PC via Memory Mapping.","authors":"Zhiyuan Lin, Minsuk Kahng, Kaeser Md Sabrin, Duen Horng Polo Chau, Ho Lee, U Kang","doi":"10.1109/BigData.2014.7004226","DOIUrl":"https://doi.org/10.1109/BigData.2014.7004226","url":null,"abstract":"<p><p>Graph computation approaches such as GraphChi and TurboGraph recently demonstrated that a single PC can perform efficient computation on billion-node graphs. To achieve high speed and scalability, they often need sophisticated data structures and memory management strategies. We propose a minimalist approach that forgoes such requirements, by leveraging the fundamental <i>memory mapping</i> (MMap) capability found on operating systems. We contribute: (1) a new insight that MMap is a viable technique for creating fast and scalable graph algorithms that surpasses some of the best techniques; (2) the design and implementation of popular graph algorithms for billion-scale graphs with little code, thanks to memory mapping; (3) extensive experiments on real graphs, including the 6.6 billion edge YahooWeb graph, showing that this new approach is significantly faster than or comparable to the highly-optimized methods (e.g., 9.5× faster than GraphChi for computing PageRank on the 1.47B edge Twitter graph). We believe our work provides a new direction in the design and development of scalable algorithms. Our packaged code is available at http://poloclub.gatech.edu/mmap/.</p>","PeriodicalId":74501,"journal":{"name":"Proceedings : ... IEEE International Conference on Big Data. IEEE International Conference on Big Data","volume":"2014 ","pages":"159-164"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/BigData.2014.7004226","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33212248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Epidemiological Modeling of Bovine Brucellosis in India.","authors":"Gloria J Kang, L Gunaseelan, Kaja M Abbas","doi":"10.1109/BigData.2014.7004420","DOIUrl":"10.1109/BigData.2014.7004420","url":null,"abstract":"<p><p>The study objective is to develop an epidemiological model of brucellosis transmission dynamics among cattle in India and to estimate the impact of different prevention and control strategies. The prevention and control strategies are test-and-slaughter, transmission rate reduction, and mass vaccination. We developed a mathematical model based on the susceptible-infectious-recovered epidemic model to simulate brucellosis transmission dynamics, calibrated to the endemically stable levels of bovine brucellosis prevalence of cattle in India. We analyzed the epidemiological benefit of different rates of reduced transmission and vaccination. Test-and-slaughter is an effective strategy for elimination and eradication of brucellosis, but socio-cultural constraints forbid culling of cattle in India. Reducing transmission rates lowered the endemically stable levels of brucellosis prevalence correspondingly. One-time vaccination lowered prevalence initially, but prevalence increased again with the influx of new susceptible births. While this epidemiological model is a basic representation of brucellosis transmission dynamics in India and constrained by limitations in surveillance data, this study illustrates the comparative epidemiological impact of different bovine brucellosis prevention and control strategies.</p>","PeriodicalId":74501,"journal":{"name":"Proceedings : ... IEEE International Conference on Big Data. IEEE International Conference on Big Data","volume":"2014 ","pages":"6-10"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4537291/pdf/nihms714156.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33925828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MaPLE: A MapReduce Pipeline for Lattice-based Evaluation and Its Application to SNOMED CT.","authors":"Guo-Qiang Zhang, Wei Zhu, Mengmeng Sun, Shiqiang Tao, Olivier Bodenreider, Licong Cui","doi":"10.1109/BigData.2014.7004301","DOIUrl":"10.1109/BigData.2014.7004301","url":null,"abstract":"<p><p>Non-lattice fragments are often indicative of structural anomalies in ontological systems and, as such, represent possible areas of focus for subsequent quality assurance work. However, extracting the non-lattice fragments in large ontological systems is computationally expensive if not prohibitive, using a traditional sequential approach. In this paper we present a general MapReduce pipeline, called MaPLE (MapReduce Pipeline for Lattice-based Evaluation), for extracting non-lattice fragments in large partially ordered sets and demonstrate its applicability in ontology quality assurance. Using MaPLE in a 30-node Hadoop local cloud, we systematically extracted non-lattice fragments in 8 SNOMED CT versions from 2009 to 2014 (each containing over 300k concepts), with an average total computing time of less than 3 hours per version. With dramatically reduced time, MaPLE makes it feasible not only to perform exhaustive structural analysis of large ontological hierarchies, but also to systematically track structural changes between versions. Our change analysis showed that the average change rates on the non-lattice pairs are up to 38.6 times higher than the change rates of the background structure (concept nodes). This demonstrates that fragments around non-lattice pairs exhibit significantly higher rates of change in the process of ontological evolution.</p>","PeriodicalId":74501,"journal":{"name":"Proceedings : ... IEEE International Conference on Big Data. IEEE International Conference on Big Data","volume":"2014 ","pages":"754-759"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4334137/pdf/nihms-654706.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33075407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maryam Panahiazar, Vahid Taslimitehrani, Ashutosh Jadhav, Jyotishman Pathak
{"title":"Empowering Personalized Medicine with Big Data and Semantic Web Technology: Promises, Challenges, and Use Cases.","authors":"Maryam Panahiazar, Vahid Taslimitehrani, Ashutosh Jadhav, Jyotishman Pathak","doi":"10.1109/BigData.2014.7004307","DOIUrl":"https://doi.org/10.1109/BigData.2014.7004307","url":null,"abstract":"<p><p>In healthcare, big data tools and technologies have the potential to create significant value by improving outcomes while lowering costs for each individual patient. Diagnostic images, genetic test results and biometric information are increasingly generated and stored in electronic health records, presenting us with challenges in data that is by nature high volume, variety and velocity, thereby necessitating novel ways to store, manage and process big data. This presents an urgent need to develop new, scalable and expandable big data infrastructure and analytical methods that can enable healthcare providers to access knowledge for the individual patient, yielding better decisions and outcomes. In this paper, we briefly discuss the nature of big data and the role of semantic web and data analysis for generating \"smart data\" which offer actionable information that supports better decisions for personalized medicine. In our view, the biggest challenge is to create a system that makes big data robust and smart for healthcare providers and patients and that can lead to more effective clinical decision-making, improved health outcomes, and ultimately, better management of healthcare costs. We highlight some of the challenges in using big data and propose the need for a semantic data-driven environment to address them. We illustrate our vision with practical use cases, and discuss a path for empowering personalized medicine using big data and semantic web technology.</p>","PeriodicalId":74501,"journal":{"name":"Proceedings : ... IEEE International Conference on Big Data. IEEE International Conference on Big Data","volume":"2014 ","pages":"790-795"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/BigData.2014.7004307","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33075408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robert Pienta, Acar Tamersoy, Hanghang Tong, Duen Horng Chau
{"title":"MAGE: Matching Approximate Patterns in Richly-Attributed Graphs.","authors":"Robert Pienta, Acar Tamersoy, Hanghang Tong, Duen Horng Chau","doi":"10.1109/BigData.2014.7004278","DOIUrl":"10.1109/BigData.2014.7004278","url":null,"abstract":"<p><p>Given a large graph with millions of nodes and edges, say a social network where both its nodes and edges have multiple attributes (e.g., job titles, tie strengths), how to quickly find subgraphs of interest (e.g., a ring of businessmen with strong ties)? We present MAGE, a scalable, multicore subgraph matching approach that supports expressive queries over large, richly-attributed graphs. Our major contributions include: (1) MAGE supports graphs with both node and edge attributes (most existing approaches handle either one, but not both); (2) it supports expressive queries, allowing multiple attributes on an edge, wildcards as attribute values (i.e., match <i>any</i> permissible values), and attributes with continuous values; and (3) it is scalable, supporting graphs with several hundred million edges. We demonstrate MAGE's effectiveness and scalability via extensive experiments on large real and synthetic graphs, such as a Google+ social network with <i>460 million</i> edges.</p>","PeriodicalId":74501,"journal":{"name":"Proceedings : ... IEEE International Conference on Big Data. IEEE International Conference on Big Data","volume":"2014 ","pages":"585-590"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4388251/pdf/nihms675787.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33204892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yiqi Chen, Zhiyuan Lin, Robert Pienta, Minsuk Kahng, Duen Horng Chau
{"title":"Towards Scalable Graph Computation on Mobile Devices.","authors":"Yiqi Chen, Zhiyuan Lin, Robert Pienta, Minsuk Kahng, Duen Horng Chau","doi":"10.1109/BigData.2014.7004353","DOIUrl":"10.1109/BigData.2014.7004353","url":null,"abstract":"<p><p>Mobile devices have become increasingly central to our everyday activities, due to their portability, multi-touch capabilities, and ever-improving computational power. Such attractive features have spurred research interest in leveraging mobile devices for computation. We explore a novel approach that aims to use a <i>single</i> mobile device to perform scalable graph computation on large graphs that do not fit in the device's limited main memory, opening up the possibility of performing on-device analysis of large datasets, without relying on the cloud. Based on the familiar <i>memory mapping</i> capability provided by today's mobile operating systems, our approach to scale up computation is powerful and intentionally kept simple to maximize its applicability across the iOS and Android platforms. Our experiments demonstrate that an iPad mini can perform fast computation on large real graphs with as many as <i>272 million</i> edges (Google+ social graph), at a speed that is only a few times slower than a 13″ Macbook Pro. Through creating a real world iOS app with this technique, we demonstrate the strong potential application for scalable graph computation on a single mobile device using our approach.</p>","PeriodicalId":74501,"journal":{"name":"Proceedings : ... IEEE International Conference on Big Data. IEEE International Conference on Big Data","volume":"2014 ","pages":"29-35"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4388237/pdf/nihms675767.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33203395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daphne Lopez, M Gunasekaran, B Senthil Murugan, Harpreet Kaur, Kaja M Abbas
{"title":"Spatial Big Data Analytics of Influenza Epidemic in Vellore, India.","authors":"Daphne Lopez, M Gunasekaran, B Senthil Murugan, Harpreet Kaur, Kaja M Abbas","doi":"10.1109/BigData.2014.7004422","DOIUrl":"https://doi.org/10.1109/BigData.2014.7004422","url":null,"abstract":"<p><p>The study objective is to develop a big spatial data model to predict the epidemiological impact of influenza in Vellore, India. Large repositories of geospatial and health data provide vital statistics on surveillance and epidemiological metrics, and valuable insight into the spatiotemporal determinants of disease and health. The integration of these big data sources and analytics to assess risk factors and geospatial vulnerability can help develop effective prevention and control strategies for influenza epidemics and optimize allocation of limited public health resources. We used the spatial epidemiology data of the H1N1 epidemic collected at the National Informatics Center during 2009-2010 in Vellore. We developed an ecological niche model based on geographically weighted regression for predicting influenza epidemics in Vellore, India during 2013-2014. Data on rainfall, temperature, wind speed, humidity and population are included in the geographically weighted regression analysis. We inferred positive correlations for H1N1 influenza prevalence with rainfall and wind speed, and negative correlations for H1N1 influenza prevalence with temperature and humidity. We evaluated the results of the geographically weighted regression model in predicting the spatial distribution of the influenza epidemic during 2013-2014.</p>","PeriodicalId":74501,"journal":{"name":"Proceedings : ... IEEE International Conference on Big Data. IEEE International Conference on Big Data","volume":"2014 ","pages":"19-24"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/BigData.2014.7004422","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33862020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
E Thomas Ewing, Samah Gad, Naren Ramakrishnan, Jeffrey S Reznick
{"title":"Understanding the Role of Medical Experts during a Public Health Crisis: Digital Tools and Library Resources for Research on the 1918 Spanish Influenza.","authors":"E Thomas Ewing, Samah Gad, Naren Ramakrishnan, Jeffrey S Reznick","doi":"10.1109/BigData.2014.7004451","DOIUrl":"10.1109/BigData.2014.7004451","url":null,"abstract":"<p><p>Humanities scholars, particularly historians of health and disease, can benefit from digitized library collections and tools such as topic modeling. Using a case study from the 1918 Spanish Flu epidemic, this paper explores the application of a big humanities approach to understanding the impact of a public health official on the course of the disease and the response of the public, as documented through digitized newspapers and medical periodicals.</p>","PeriodicalId":74501,"journal":{"name":"Proceedings : ... IEEE International Conference on Big Data. IEEE International Conference on Big Data","volume":"2014 ","pages":"39-46"},"PeriodicalIF":0.0,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4582797/pdf/nihms717146.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34040099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}