Applying Lessons from e-Discovery to Process Big Data using HPC
Authors: Sukrit Sondhi, R. Arora
Published in: Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.), pages 8:1-8:2
Publication date: 2014-07-13
DOI: 10.1145/2616498.2616525 (https://doi.org/10.1145/2616498.2616525)
Citations: 6
Abstract
The term 'Big Data' refers to datasets so large that they are difficult to use and manage with conventional software tools. Legal Electronic Discovery (e-Discovery) is a business domain that consumes Big Data on a massive scale: electronic records such as e-mail, documents, databases and social media postings are processed to discover evidence that may be pertinent to legal or compliance needs, litigation or other investigations. Numerous vendors provide organizations with services such as data collection, digital forensics and electronic discovery. High-end instrumentation and modern information technologies are creating data at an ever-increasing rate. The challenges of managing these large datasets relate to the capture, storage, search, sharing, analysis and visualization of the data. Big Data also offers unprecedented opportunities in other fields, ranging from astronomy and biology to marketing and e-commerce. This paper presents lessons learnt from the legal e-Discovery domain that can be adapted to process Big Data effectively on HPC resources, thereby benefiting the various disciplines of science, engineering and business that are grappling with a deluge of Big Data challenges and opportunities.