{"title":"Mining positive and negative association rules in Hadoop's MapReduce environment","authors":"S. Bagui, Probal Chandra Dhar","doi":"10.1145/3190645.3190701","DOIUrl":"https://doi.org/10.1145/3190645.3190701","url":null,"abstract":"In this paper, we mine positive and negative rules from Big Data in Hadoop's MapReduce environment. Positive association rule mining finds items that are positively correlated, whereas negative association rule mining finds items that are negatively correlated. Positive association rule mining has traditionally been used to mine association rules, but negative association rule mining also has many applications, including the building of efficient decision support systems, crime data analysis [2], the health care sector [1], etc. In this paper, we mine positive and negative association rules using the Apriori algorithm in Hadoop's MapReduce environment. Positive association rules are of the form X → Y, which has support s in a transaction set D if s% of the transactions in D contain X U Y. A negative association rule is of the form X → ┐ Y or ┐ X → Y or ┐ X → ┐ Y, where X ∩ Y = Ø. X → ┐ Y refers to X occurring in the absence of Y; ┐ X → Y refers to Y occurring in the absence of X; ┐ X → ┐ Y means not X and not Y. For positive association rules: Support (X → Y) refers to the percentage of transactions where itemsets X and Y co-exist in a dataset. Confidence (X → Y) is taken to be the conditional probability P(Y|X), that is, the percentage of transactions containing X that also contain Y. Support of the negative association rules will be of the form: Supp(X → ┐ Y) > min_supp; Supp(┐ X → Y) > min_supp; Supp(┐ X → ┐ Y) > min_supp. Confidence of the negative association rules will be of the form: Conf(X → ┐ Y) > min_conf; Conf(┐ X → Y) > min_conf; Conf(┐ X → ┐ Y) > min_conf. 
In MapReduce, we scan the dataset and create 1-itemsets in one MapReduce job and then use these 1-itemsets to create 2-itemsets in another MapReduce job. In the last map job, the positive and negative association rules are calculated, along with their support, confidence, and lift. Therefore, in essence, we use three map and two reduce jobs. The main contribution of this work is in presenting how the Apriori algorithm can be used to extract negative association rules from Big Data and how it can be executed efficiently on MapReduce.","PeriodicalId":403177,"journal":{"name":"Proceedings of the ACMSE 2018 Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130261128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast and accurate volume data curvature determination using GPGPU computation","authors":"Jacob D. Hauenstein, Timothy S Newman","doi":"10.1145/3190645.3190681","DOIUrl":"https://doi.org/10.1145/3190645.3190681","url":null,"abstract":"A methodology for fast determination of a key shape feature in volume datasets using a GPU is described. The shape feature, surface curvature, which is a valuable descriptor for structure classification and dataset registration applications, can be time-consuming to determine reliably by conventional serial computing. The techniques here use parallel processing on a commodity GPU to achieve 100-fold (and above) improvements (for moderate-sized datasets) over conventional serial processing for curvature determination. Techniques for one class of curvature determination methods are detailed, including methods well-suited to datasets acquired by medical scanners.","PeriodicalId":403177,"journal":{"name":"Proceedings of the ACMSE 2018 Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128888360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D pose reconstruction method for CG designers","authors":"Kazumoto Tanaka","doi":"10.1145/3190645.3190703","DOIUrl":"https://doi.org/10.1145/3190645.3190703","url":null,"abstract":"This paper proposes a method for reconstructing the 3D poses of drawing-dolls from their images on photographs.","PeriodicalId":403177,"journal":{"name":"Proceedings of the ACMSE 2018 Conference","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124226109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting NFRs in the early stages of agile software engineering","authors":"Richard R. Maiti, A. Krasnov","doi":"10.1145/3190645.3190716","DOIUrl":"https://doi.org/10.1145/3190645.3190716","url":null,"abstract":"Non-functional requirements (NFRs) are often overlooked, whereas functional requirements (FRs) take center stage in developing agile software. Research has shown that ignoring NFRs can have negative impacts on the software and could potentially cost more to fix at later stages. This research extends the Capture Elicit Prioritize (CEP) methodology to predict NFRs in the early stages of agile software development. Research in other fields, such as medicine, has shown that historical data can be beneficial in the long run; in the medical field, for example, historical data has been found to help determine patient treatments. The CEP methodology extended the NERV and NORMAP methodologies of previous research. The CEP methodology identified 56 out of 57 requirement sentences and was successful in eliciting 98.24% of the baseline, an improvement of 10.53% over the NORMAP methodology and 1.75% over the NERV methodology. The NFR count for the CEP methodology was 86 out of 88 NFRs, an improvement of 12.49% over the NORMAP methodology and 4.55% over the NERV methodology. The CEP methodology was applied to the EU eProcurement requirements document. In the capture step, potential NFRs are gathered using OCR from requirements images. The elicit step uses NFR Locator Plus, which takes sentences from documents and places them in distinct categories. The NFR categories are defined from Chung's NFR framework, using a set of keywords for training to locate NFRs. The e αβγ-framework was used to prioritize the NFRs. This work utilizes the data from previous research on the CEP methodology and extends the CEP methodology to include a decision tree to predict future NFRs. 
A simple decision tree was used to make predictions from past NFR data. If a certain NFR appears more than three times in the requirements document, it is most likely to appear in the next iteration of the software requirements specification; if it appears exactly three times, it is likely to appear; and if it appears only once or twice, it is not likely to appear in future iterations. A path can be traced from the root of the decision tree to a leaf (yes or no) that determines whether the NFR will appear in future iterations. This research showed that the available historical metadata can be beneficial for the next iteration of software development: a decision tree over NFRs that appear multiple times in a set of EU eProcurement documents can predict the NFRs of the next iteration. The NFRs Availability, Compliance, Confidentiality, Documentation, Performance, Security, and Usability were found, and these NFRs are most likely to appear in the next iteration of the EU eProcurement software.","PeriodicalId":403177,"journal":{"name":"Proceedings of the ACMSE 2018 Conference","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115877293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using software birthmarks and clustering to identify similar classes and major functionalities","authors":"Matt Beck, J. Walden","doi":"10.1145/3190645.3190677","DOIUrl":"https://doi.org/10.1145/3190645.3190677","url":null,"abstract":"Software birthmarks are a class of software metrics designed to identify copies of software. An article published in 2006 examined additional applications of software birthmarks. The article described an experiment using software birthmarks to identify similar classes and major functionalities in software applications. This study replicates and extends that experiment, using a modern software birthmark tool and a larger dataset, while improving the precision of the research questions and methodologies used in the original article. We found that one of the conclusions of the original article could be replicated while the other could not. While software birthmarks provide an effective method for identifying similar class files, they do not offer a reliable, objective, and generalizable method for finding major functionalities in a software release.","PeriodicalId":403177,"journal":{"name":"Proceedings of the ACMSE 2018 Conference","volume":"441 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115266280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterization of differentially private logistic regression","authors":"S. Suthaharan","doi":"10.1145/3190645.3190682","DOIUrl":"https://doi.org/10.1145/3190645.3190682","url":null,"abstract":"The purpose of this paper is to present an approach that can help data owners select suitable values for the privacy parameter of a differentially private logistic regression (DPLR) model, whose main intention is to achieve a balance between privacy strength and classification accuracy. The proposed approach implements a supervised learning technique and a feature extraction technique to address this challenging problem and generate solutions. The supervised learning technique selects subspaces from a training data set and generates DPLR classifiers for a range of values of the privacy parameter. The feature extraction technique transforms an original subspace to a differentially private subspace by querying the original subspace multiple times using the DPLR model and the privacy parameter values that were selected by the supervised learning module. The proposed approach then employs a signal processing measure, the signal-to-interference ratio, to quantify the privacy level of the differentially private subspaces; hence, it allows the data owner to learn the privacy level that the DPLR models can provide for a given subspace and a given classification accuracy.","PeriodicalId":403177,"journal":{"name":"Proceedings of the ACMSE 2018 Conference","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122619047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prioritized task scheduling in fog computing","authors":"Tejas Choudhari, M. Moh, Teng-Sheng Moh","doi":"10.1145/3190645.3190699","DOIUrl":"https://doi.org/10.1145/3190645.3190699","url":null,"abstract":"Fog computing, similar to edge computing, has been proposed as a model to introduce a virtualized layer between the end users and the back-end cloud data centers. Fog computing has attracted much attention due to the recent rapid deployment of smart devices and Internet-of-Things (IoT) systems, which often require real-time services with stringent delay requirements. The fog layer placed between the client and cloud layers aims to reduce the delay in terms of transmission and processing times, as well as the overall cost. To support the increasing number of IoT and smart devices, and to improve performance and reduce cost, this paper proposes a task scheduling algorithm in the fog layer based on priority levels. The proposed architecture, queueing and priority models, priority assignment module, and priority-based task scheduling algorithms are carefully described. Performance evaluation shows that, compared with existing task scheduling algorithms, the proposed algorithm reduces the overall response time and notably decreases the total cost. 
We believe that this work is significant to the emerging fog computing technology, and the priority-based algorithm is useful to a wide range of application domains.","PeriodicalId":403177,"journal":{"name":"Proceedings of the ACMSE 2018 Conference","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126780035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-source data analysis and evaluation of machine learning techniques for SQL injection detection","authors":"Kevin Ross, M. Moh, Teng-Sheng Moh, Jason Yao","doi":"10.1145/3190645.3190670","DOIUrl":"https://doi.org/10.1145/3190645.3190670","url":null,"abstract":"SQL Injection continues to be one of the most damaging security exploits in terms of personal information exposure as well as monetary loss. Injection attacks are the number one vulnerability in the most recent OWASP Top 10 report, and the number of these attacks continues to increase. Traditional defense strategies often involve static, signature-based IDS (Intrusion Detection System) rules which are mostly effective only against previously observed attacks but not unknown, or zero-day, attacks. Much current research involves the use of machine learning techniques, which are able to detect unknown attacks, but depending on the algorithm can be costly in terms of performance. In addition, most current intrusion detection strategies involve collection of traffic coming into the web application either from a network device or from the web application host, while other strategies collect data from the database server logs. In this project, we are collecting traffic from two points: at the web application host, and at a Datiphy appliance node located between the webapp host and the associated MySQL database server. 
In our analysis of these two datasets, and another dataset that is correlated between the two, we have been able to demonstrate that accuracy obtained with the correlated dataset using algorithms such as rule-based and decision tree are nearly the same as those with a neural network algorithm, but with greatly improved performance.","PeriodicalId":403177,"journal":{"name":"Proceedings of the ACMSE 2018 Conference","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126857622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mitigating IoT insecurity with inoculation epidemics","authors":"James A. Jerkins, Jillian Stupiansky","doi":"10.1145/3190645.3190678","DOIUrl":"https://doi.org/10.1145/3190645.3190678","url":null,"abstract":"Compromising IoT devices to build botnets and disrupt critical infrastructure is an existential threat. Refrigerators, washing machines, DVRs, security cameras, and other consumer goods are high value targets for attackers due to inherent security weaknesses, a lack of consumer security awareness, and an absence of market forces or regulatory requirements to motivate IoT security. As a result of the deficiencies, attackers have quickly assembled large scale botnets of IoT devices to disable Internet infrastructure and deny access to dominant web properties with near impunity. IoT malware is often transmitted from host to host similar to how biological viruses spread in populations. Both biological viruses and computer malware may exhibit epidemic characteristics when spreading in populations of vulnerable hosts. Vaccines are used to stimulate resistance to biological viruses by inoculating a sufficient number of hosts in the vulnerable population to limit the spread of the biological virus and prevent epidemics. Inoculation programs may be viewed as a human instigated epidemic that spreads a vaccine in order to mitigate the damage from a biological virus. 
In this paper we propose a technique to create an inoculation epidemic for IoT devices using a novel variation of an SIS epidemic model and show experimental results that indicate the utility of the approach.","PeriodicalId":403177,"journal":{"name":"Proceedings of the ACMSE 2018 Conference","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126982400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comprehension and application of design patterns by novice software engineers: an empirical study of undergraduate software engineering and computer science students","authors":"Jonathan W. Lartigue, Richard O. Chapman","doi":"10.1145/3190645.3190686","DOIUrl":"https://doi.org/10.1145/3190645.3190686","url":null,"abstract":"Although there has been a large body of work cataloguing design patterns since their introduction, there is a limited amount of detailed, empirical evidence on pattern use and application. Those studies that have collected experimental data generally focus on experienced, professional software engineers or graduate-level computer science and software engineering students. Although the value of design patterns in general is still widely debated, many experts have concluded that the use of design patterns is beneficial for experienced software engineers and architects. But it is still unclear if the benefits of design patterns translate equally to young, inexperienced software engineers. To assess this, we conducted a controlled experiment to evaluate the comparative performance in targeted tasks of novice software engineers, represented by undergraduate students about to earn a bachelor's degree in an ABET-accredited computer science or software engineering program. We assessed the ability of subjects to recognize, comprehend, and refactor software containing a number of design patterns. We also collected subjective data measuring the subjects' preferences for or against pattern use. Although experiment results are mixed, depending on the complexity of the pattern involved, we observe that novice software engineers can recognize and understand software containing some design patterns, but that the benefits of pattern use, in terms of refactoring time, are dependent on the complexity of the pattern. 
We conclude that, while simpler patterns show benefits, more complex design patterns may be an impediment for novice developers.","PeriodicalId":403177,"journal":{"name":"Proceedings of the ACMSE 2018 Conference","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131457607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}