2015 IEEE/ACM 1st International Workshop on Big Data Software Engineering: Latest Papers

Research Opportunities for the Big Data Era of Software Engineering
2015 IEEE/ACM 1st International Workshop on Big Data Software Engineering Pub Date : 2015-05-16 DOI: 10.1109/BIGDSE.2015.13
R. Deline
Abstract: Big Data Analysis is becoming a widespread practice on many software development projects, and statisticians and data analysts are working alongside developers, testers and program managers. Because data science is still an emerging discipline in software projects, there are many opportunities where software engineering researchers can help improve practice. In terms of productivity, data scientists need support for exploratory analysis of large datasets, relief from clerical tasks like data cleaning, and easier paths for live deployment of new analyses. In terms of correctness, data scientists need help in preserving data meaning and provenance, and non-experts need help avoiding analysis errors. In terms of communication and coordination, teams need more approachable ways to discuss uncertainty and risk, and support for data-driven decision making needs to become available to all roles. This position paper describes these open problems and points to ongoing research beginning to tackle them.
Citations: 17
Big Data System Development: An Embedded Case Study with a Global Outsourcing Firm
2015 IEEE/ACM 1st International Workshop on Big Data Software Engineering Pub Date : 2015-05-16 DOI: 10.1109/BIGDSE.2015.15
Hong-Mei Chen, R. Kazman, Serge Haziyev, Olha Hrytsay
Abstract: Big data system development is dramatically different from small (traditional, structured) data system development. At the end of 2014, big data deployment is still scarce and failures abound. Outsourcing has become a main strategy for many enterprises. We therefore selected an outsourcing company that has successfully deployed big data projects for our study. Our research results from analyzing 10 outsourced big data projects provide a glimpse into early adopters of big data, illuminate the challenges for system development that stem from the 5Vs of big data, and crystallize the importance of architecture design choices and technology selection. We followed a collaborative practice research (CPR) method to develop and validate a new method, called BDD. BDD is the first attempt to systematically combine architecture design with data modeling approaches to address big data system development challenges. The use of reference architectures and a technology catalog are advancements to architecture design methods and are proving to be well-suited for big data system architecture design and system development.
Citations: 38
On the Cost of Mining Very Large Open Source Repositories
2015 IEEE/ACM 1st International Workshop on Big Data Software Engineering Pub Date : 2015-05-16 DOI: 10.1109/BIGDSE.2015.16
Sean Banerjee, B. Cukic
Abstract: Open source bug tracking systems provide a rich information suite that is actively used by software engineering researchers to design solutions to triaging, duplicate classification and developer assignment problems. Today, open repositories often contain in excess of 100,000 reports, and in the cases of RedHat and Mozilla, over a million. Obtaining and analyzing the contents of such datasets is both time- and resource-consuming. By summarizing the related work, we demonstrate that researchers often focus on smaller subsets of the data and seldom embrace "big-dataism". With the emergence of cloud-based computation systems such as Amazon EC2, one expects it to be easier to perform large-scale analyses. However, our detailed time and cost analysis indicates that significant challenges still remain. Acquiring the open source data can be time-intensive, and the downloads are prone to being misinterpreted as Denial of Service attacks. Generating similarity scores for all prior reports, for example, is a polynomial-time problem. In this paper, we present actual costs that we incurred when analyzing the complete repositories from Eclipse, Firefox and Open Office. In our approach, we relied on computing clusters to process the data in an attempt to reduce the cost of analyzing large datasets on the cloud. We present estimated costs for a researcher attempting to analyze complete datasets from Eclipse, Mozilla, Novell and RedHat using the best possible resources. In an ideal situation, with no bottlenecks, a researcher investing just over $40,000 and 2 weeks of non-stop computing time would be able to measure similarity of problem reports within all four datasets.
Citations: 9
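To see why scoring every report against all prior reports gets expensive at this scale, the sketch below counts the comparisons: report k is compared with the k reports filed before it, so n reports need 0 + 1 + ... + (n-1) = n(n-1)/2 comparisons in total, which grows quadratically. The Jaccard token-overlap measure and the function names are illustrative assumptions only, not the similarity measure or tooling used by Banerjee and Cukic.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple


def pairwise_comparisons(n_reports: int) -> int:
    """Comparisons needed if each report is scored against every earlier one."""
    # Report k (0-based) is compared with the k reports before it: 0 + 1 + ... + (n-1).
    return n_reports * (n_reports - 1) // 2


def jaccard(a: str, b: str) -> float:
    """Illustrative similarity: token-set overlap (an assumption, not the paper's measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def score_all_pairs(
    reports: List[str], similarity: Callable[[str, str], float]
) -> Dict[Tuple[int, int], float]:
    """Naively score every unordered pair (i, j) with i < j."""
    return {
        (i, j): similarity(a, b)
        for (i, a), (j, b) in combinations(enumerate(reports), 2)
    }


if __name__ == "__main__":
    # Repository sizes taken from the abstract: 100,000 reports and "over a million".
    for n in (100_000, 1_000_000):
        print(f"{n:>9,} reports -> {pairwise_comparisons(n):,} comparisons")
```

For example, `pairwise_comparisons(100_000)` is 4,999,950,000, and one million reports need roughly 5 × 10^11 comparisons, which helps explain why full-repository similarity analysis is costly even before data acquisition is considered.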
Industrial Big Data Analytics: Lessons from the Trenches
2015 IEEE/ACM 1st International Workshop on Big Data Software Engineering Pub Date : 2015-05-16 DOI: 10.1109/BIGDSE.2015.8
Flavio Villanustre
Abstract: Big Data Analytics in particular and Data Science in general have become key disciplines in the last decade. The convergence of Information Technology, Statistics and Mathematics to explore and extract information from Big Data has challenged the way many industries used to operate, shifting the decision-making process in many organizations. A new breed of Big Data platforms has appeared to fulfill the need to process data that is large, complex, variable and rapidly generated. The author describes the experience in this field of a company that provides Big Data analytics as its core business.
Citations: 6
Mining Big Data for Detecting, Extracting and Recommending Architectural Design Concepts
2015 IEEE/ACM 1st International Workshop on Big Data Software Engineering Pub Date : 2015-05-16 DOI: 10.1109/BIGDSE.2015.11
Mehdi Mirakhorli, Hong-Mei Chen, R. Kazman
Abstract: An architecture recommender system can help programmers make better design choices to address their architectural quality attribute concerns while doing their daily programming tasks. We mine big data to detect and extract a large set of architectural design concepts, such as design patterns, design tactics and architecture styles, to be used in our architecture recommender system, called ARS. However, mining big data poses many practical challenges for system implementation. Like all other big data systems, ours requires careful planning because of the volume, velocity and variety of its data set. The first challenge is to select appropriate technologies from the large number of available products for our system implementation. Building on these technologies, our greatest challenge is to custom-fit our algorithms to the parallel processing platform we have selected for ARS in order to meet our performance goals.
Citations: 8
Safely Managing Data Variety in Big Data Software Development
2015 IEEE/ACM 1st International Workshop on Big Data Software Engineering Pub Date : 2015-05-16 DOI: 10.5555/2819289.2819293
Thomas Cerqueus, E. Almeida, Stefanie Scherzinger
Abstract: We consider the task of building Big Data software systems offered as software-as-a-service. These applications are commonly backed by NoSQL data stores that address the proverbial Vs of Big Data processing: NoSQL data stores can handle large volumes of data, and many systems do not enforce a global schema, to account for structural variety in data. Thus, software engineers can design the data model on the go, a flexibility that is particularly crucial in agile software development. However, NoSQL data stores commonly do not yet account for the veracity of changes to the structure of persisted data. Yet such changes are an inevitable consequence of agile software development. In most NoSQL-based application stacks, schema evolution is handled entirely within the application code, usually involving object mapper libraries. Yet simple code refactorings, such as renaming a class attribute at the source code level, can cause data loss or runtime errors once the application has been deployed to production. We address this pain point by contributing type checking rules that we have implemented within an IDE plug-in. Our plug-in, ControVol, statically type checks the object mapper class declarations against the code release history. ControVol is thus capable of detecting common yet risky cases of mismatched data and schema, and can even suggest automatic fixes.
Citations: 16
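The rename hazard described above can be illustrated with a far simpler check than ControVol's type checking rules. The hypothetical sketch below models each release's object-mapper class as a mapping from attribute name to declared type name and compares the current declaration against every earlier release: an attribute that has disappeared suggests a rename that would orphan stored values, and a changed type suggests old entities may fail to load. The data model, the `check_against_history` function and its messages are assumptions for illustration, not ControVol's actual rules or API.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical model: attribute name -> declared type name for one release
# of an object-mapper entity class (not ControVol's real representation).
ClassDecl = Dict[str, str]


def check_against_history(history: List[ClassDecl], current: ClassDecl) -> List[str]:
    """Flag attributes whose persisted data may no longer match the current class."""
    warnings: List[str] = []
    seen: Set[Tuple[str, str]] = set()  # report each (attribute, problem) only once
    for release, old in enumerate(history):
        for attr, old_type in old.items():
            if attr not in current:
                # Entities written by this release still store `attr`; a mapper that no
                # longer declares it may ignore the value, and a later save may drop it.
                key = (attr, "missing")
                msg = f"release {release}: '{attr}' removed or renamed; stored values may be lost"
            elif current[attr] != old_type:
                # Old entities hold a value of the previous type, so loading may fail.
                key = (attr, f"type:{old_type}")
                msg = (f"release {release}: '{attr}' changed type "
                       f"{old_type} -> {current[attr]}; loading old data may fail")
            else:
                continue  # unchanged attribute: nothing to report
            if key not in seen:
                seen.add(key)
                warnings.append(msg)
    return warnings


if __name__ == "__main__":
    history = [{"name": "str", "age": "int"}]
    for w in check_against_history(history, {"fullName": "str", "age": "str"}):
        print(w)
```

Newly added attributes are deliberately not flagged here, on the assumption that the mapper fills missing fields with defaults; whether that holds depends on the object mapper in use.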
Big Picture of Big Data Software Engineering: With Example Research Challenges
2015 IEEE/ACM 1st International Workshop on Big Data Software Engineering Pub Date : 2015-05-16 DOI: 10.1109/BIGDSE.2015.10
N. Madhavji, A. Miranskyy, K. Kontogiannis
Abstract: In the rapidly growing field of Big Data, we note that a disproportionately large amount of effort is being invested in infrastructure development and data analytics in comparison to applications software development, at approximately an 80:20 ratio. This prompted us to create a context model of Big Data Software Engineering (BDSE) containing various elements (such as development practice, Big Data systems, corporate decision-making, and research) and their relationships. The model puts into perspective where various types of stakeholders fit in. From the research perspective, we describe example challenges in BDSE, specifically in requirements, architectures, and testing and maintenance.
Citations: 49
Software Analytics to Software Practice: A Systematic Literature Review
2015 IEEE/ACM 1st International Workshop on Big Data Software Engineering Pub Date : 2015-05-16 DOI: 10.1109/BIGDSE.2015.14
T. Abdellatif, Luiz Fernando Capretz, D. Ho
Abstract: Software Analytics (SA) is a new branch of big data analytics that emerged recently (2011). What distinguishes SA from direct software analysis is that it links data mined from many different software artifacts to obtain valuable insights. These insights are useful for the decision-making process throughout the different phases of the software lifecycle. Since SA is currently a hot and promising topic, we have conducted a systematic literature review, presented in this paper, to identify gaps in knowledge and open research areas in SA. Because many researchers are still confused about the true potential of SA, we had to filter the available research papers to obtain the most SA-relevant work for our review. This filtration yielded 19 studies out of 135. We have based our systematic review on four main factors: which software practitioners SA targets, which domains are covered by SA, which artifacts are extracted by SA, and whether these artifacts are linked or not. The results of our review show that much of the available SA research serves only the needs of developers. Also, much of the available research uses only one artifact, which in turn means fewer links between artifacts and fewer insights. This shows that the available SA research work is still embryonic, leaving plenty of room for future research in the SA field.
Citations: 17
Embrace the Challenges: Software Engineering in a Big Data World
2015 IEEE/ACM 1st International Workshop on Big Data Software Engineering Pub Date : 2015-05-16 DOI: 10.1109/BIGDSE.2015.12
K. Anderson
Abstract: The design and development of data-intensive software systems (systems that generate, collect, store, process, analyze, query, and visualize large sets of data) is fraught with significant challenges both technical and social. Project EPIC has been designing and developing data-intensive systems in support of crisis informatics research since Fall 2009. Our experience working on Project EPIC has provided insight into these challenges. In this paper, we share our experience working in this design space and describe the choices we made in tackling these challenges and their attendant trade-offs. We highlight the lack of developer support tools for data-intensive systems, the importance of multidisciplinary teams, the use of highly-iterative life cycles, the need for deep understanding of the frameworks and technologies used in data-intensive systems, how simple operations transform into significant challenges at scale, and the paramount significance of data modeling in producing systems that are scalable, robust, and efficient.
Citations: 40