2015 IEEE/ACM 1st International Workshop on Big Data Software Engineering: Latest Publications

Research Opportunities for the Big Data Era of Software Engineering

2015 IEEE/ACM 1st International Workshop on Big Data Software Engineering Pub Date : 2015-05-16 DOI: 10.1109/BIGDSE.2015.13

R. Deline

Citations: 17

Big Data System Development: An Embedded Case Study with a Global Outsourcing Firm

2015 IEEE/ACM 1st International Workshop on Big Data Software Engineering Pub Date : 2015-05-16 DOI: 10.1109/BIGDSE.2015.15

Hong-Mei Chen, R. Kazman, Serge Haziyev, Olha Hrytsay

Abstract: Big data system development is dramatically different from small (traditional, structured) data system development. At the end of 2014, big data deployment is still scarce and failures abound. Outsourcing has become a main strategy for many enterprises. We therefore selected for our study an outsourcing company that has successfully deployed big data projects. Our research results from analyzing 10 outsourced big data projects provide a glimpse into early adopters of big data, illuminate the challenges for system development that stem from the 5Vs of big data, and crystallize the importance of architecture design choices and technology selection. We followed a collaborative practice research (CPR) method to develop and validate a new method, called BDD. BDD is the first attempt to systematically combine architecture design with data modeling approaches to address big data system development challenges. The use of reference architectures and a technology catalog are advancements to architecture design methods and are proving to be well-suited for big data system architecture design and system development.

Citations: 38

On the Cost of Mining Very Large Open Source Repositories

2015 IEEE/ACM 1st International Workshop on Big Data Software Engineering Pub Date : 2015-05-16 DOI: 10.1109/BIGDSE.2015.16

Sean Banerjee, B. Cukic

Abstract: Open source bug tracking systems provide a rich information suite that is actively used by software engineering researchers to design solutions to triaging, duplicate classification and developer assignment problems. Today, open repositories often contain in excess of 100,000 reports, and in the cases of RedHat and Mozilla, over a million. Obtaining and analyzing the contents of such datasets are both time- and resource-consuming. By summarizing the related work we demonstrate that researchers often focus on smaller subsets of the data, and seldom embrace the “big-dataism”. With the emergence of cloud-based computation systems such as Amazon EC2, one expects it to be easier to perform large-scale analyses. However, our detailed time and cost analysis indicates that significant challenges still remain. Acquiring the open source data can be time intensive, and prone to being misinterpreted as Denial of Service attacks. Generating similarity scores for all prior reports, for example, is a polynomial time problem. In this paper, we present actual costs that we incurred when analyzing the complete repositories from Eclipse, Firefox and Open Office. In our approach, we relied on computing clusters to process the data in an attempt to reduce the cost of analyzing large datasets on the cloud. We present estimated costs for a researcher attempting to analyze complete datasets from Eclipse, Mozilla, Novell and RedHat using the best possible resources. In an ideal situation, with no bottlenecks, a researcher investing just over $40,000 and 2 weeks of nonstop computing time would be able to measure similarity of problem reports within all four datasets.

Citations: 9

Industrial Big Data Analytics: Lessons from the Trenches

2015 IEEE/ACM 1st International Workshop on Big Data Software Engineering Pub Date : 2015-05-16 DOI: 10.1109/BIGDSE.2015.8

Flavio Villanustre

Citations: 6

Mining Big Data for Detecting, Extracting and Recommending Architectural Design Concepts

2015 IEEE/ACM 1st International Workshop on Big Data Software Engineering Pub Date : 2015-05-16 DOI: 10.1109/BIGDSE.2015.11

Mehdi Mirakhorli, Hong-Mei Chen, R. Kazman

Citations: 8

Safely Managing Data Variety in Big Data Software Development

2015 IEEE/ACM 1st International Workshop on Big Data Software Engineering Pub Date : 2015-05-16 DOI: 10.5555/2819289.2819293

Thomas Cerqueus, E. Almeida, Stefanie Scherzinger

Abstract: We consider the task of building Big Data software systems, offered as software-as-a-service. These applications are commonly backed by NoSQL data stores that address the proverbial Vs of Big Data processing: NoSQL data stores can handle large volumes of data and many systems do not enforce a global schema, to account for structural variety in data. Thus, software engineers can design the data model on the go, a flexibility that is particularly crucial in agile software development. However, NoSQL data stores commonly do not yet account for the veracity of changes when it comes to changes in the structure of persisted data. Yet this is an inevitable consequence of agile software development. In most NoSQL-based application stacks, schema evolution is completely handled within the application code, usually involving object mapper libraries. Yet simple code refactorings, such as renaming a class attribute at the source code level, can cause data loss or runtime errors once the application has been deployed to production. We address this pain point by contributing type checking rules that we have implemented within an IDE plug-in. Our plug-in ControVol statically type-checks the object mapper class declarations against the code release history. ControVol is thus capable of detecting common yet risky cases of mismatched data and schema, and can even suggest automatic fixes.

Citations: 16

Big Picture of Big Data Software Engineering: With Example Research Challenges

2015 IEEE/ACM 1st International Workshop on Big Data Software Engineering Pub Date : 2015-05-16 DOI: 10.1109/BIGDSE.2015.10

N. Madhavji, A. Miranskyy, K. Kontogiannis

Citations: 49

Software Analytics to Software Practice: A Systematic Literature Review

2015 IEEE/ACM 1st International Workshop on Big Data Software Engineering Pub Date : 2015-05-16 DOI: 10.1109/BIGDSE.2015.14

T. Abdellatif, Luiz Fernando Capretz, D. Ho

Abstract: Software Analytics (SA) is a new branch of big data analytics that has recently emerged (2011). What distinguishes SA from direct software analysis is that it links data mined from many different software artifacts to obtain valuable insights. These insights are useful for the decision-making process throughout the different phases of the software lifecycle. Since SA is currently a hot and promising topic, we have conducted a systematic literature review, presented in this paper, to identify gaps in knowledge and open research areas in SA. Because many researchers are still confused about the true potential of SA, we had to filter out available research papers to obtain the most SA-relevant work for our review. This filtration yielded 19 studies out of 135. We have based our systematic review on four main factors: which software practitioners SA targets, which domains are covered by SA, which artifacts are extracted by SA, and whether these artifacts are linked or not. The results of our review have shown that much of the available SA research only serves the needs of developers. Also, much of the available research uses only one artifact which, in turn, means fewer links between artifacts and fewer insights. This shows that the available SA research work is still embryonic, leaving plenty of room for future research in the SA field.

Citations: 17

Embrace the Challenges: Software Engineering in a Big Data World

2015 IEEE/ACM 1st International Workshop on Big Data Software Engineering Pub Date : 2015-05-16 DOI: 10.1109/BIGDSE.2015.12

K. Anderson

Citations: 40