{"title":"Session details: Authentication and Intrusion Detection","authors":"Sam Bretheim","doi":"10.1145/3252888","DOIUrl":"https://doi.org/10.1145/3252888","url":null,"abstract":"","PeriodicalId":318259,"journal":{"name":"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security","volume":"127 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128028295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Malware Analysis of Imaged Binary Samples by Convolutional Neural Network with Attention Mechanism","authors":"Hiromu Yakura, S. Shinozaki, R. Nishimura, Y. Oyama, Jun Sakuma","doi":"10.1145/3128572.3140457","DOIUrl":"https://doi.org/10.1145/3128572.3140457","url":null,"abstract":"This paper presents a method to extract important byte sequences in malware samples by application of convolutional neural network (CNN) to images converted from binary data. This method, by combining a technique called the attention mechanism into CNN, enables calculation of an \"attention map,\" which shows regions having higher importance for classification in the image. The extracted region with higher importance can provide useful information for human analysts who investigate the functionalities of unknown malware samples. Results of our evaluation experiment using malware dataset show that the proposed method provides higher classification accuracy than a conventional method. Furthermore, analysis of malware samples based on the calculated attention map confirmed that the extracted sequences provide useful information for manual analysis.","PeriodicalId":318259,"journal":{"name":"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115432580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Malware Classification and Class Imbalance via Stochastic Hashed LZJD","authors":"Edward Raff, Charles K. Nicholas","doi":"10.1145/3128572.3140446","DOIUrl":"https://doi.org/10.1145/3128572.3140446","url":null,"abstract":"There are currently few methods that can be applied to malware classification problems which don't require domain knowledge to apply. In this work, we develop our new SHWeL feature vector representation, by extending the recently proposed Lempel-Ziv Jaccard Distance. These SHWeL vectors improve upon LZJD's accuracy, outperform byte n-grams, and allow us to build efficient algorithms for both training (a weakness of byte n-grams) and inference (a weakness of LZJD). Furthermore, our new SHWeL method also allows us to directly tackle the class imbalance problem, which is common for malware-related tasks. Compared to existing methods like SMOTE, SHWeL provides significantly improved accuracy while reducing algorithmic complexity to O(N). Because our approach is developed without the use of domain knowledge, it can be easily re-applied to any new domain where there is a need to classify byte sequences.","PeriodicalId":318259,"journal":{"name":"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133994936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Early Warning System for Suspicious Accounts","authors":"Hassan Halawa, M. Ripeanu, K. Beznosov, Baris Coskun, Meizhu Liu","doi":"10.1145/3128572.3140455","DOIUrl":"https://doi.org/10.1145/3128572.3140455","url":null,"abstract":"In the face of large-scale automated cyber-attacks to large online services, fast detection and remediation of compromised accounts are crucial to limit the spread of new attacks and to mitigate the overall damage to users, companies, and the public at large. We advocate a fully automated approach based on machine learning to enable large-scale online service providers to quickly identify potentially compromised accounts. We develop an early warning system for the detection of suspicious account activity with the goal of quick identification and remediation of compromised accounts. We demonstrate the feasibility and applicability of our proposed system in a four month experiment at a large-scale online service provider using real-world production data encompassing hundreds of millions of users. We show that - even using only login data, features with low computational cost, and a basic model selection approach - around one out of five accounts later flagged as suspicious are correctly predicted a month in advance based on one week's worth of their login activity.","PeriodicalId":318259,"journal":{"name":"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121211586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating Look-alike Names For Security Challenges","authors":"Shuchu Han, Yifan Hu, S. Skiena, Baris Coskun, Meizhu Liu, Hong Qin, Jaime Perez","doi":"10.1145/3128572.3140441","DOIUrl":"https://doi.org/10.1145/3128572.3140441","url":null,"abstract":"Motivated by the need to automatically generate behavior-based security challenges to improve user authentication for web services, we consider the problem of large-scale construction of realistic-looking names to serve as aliases for real individuals. We aim to use these names to construct security challenges, where users are asked to identify their real contacts among a presented pool of names. We seek these look-alike names to preserve name characteristics like gender, ethnicity, and popularity, while being unlinkable back to the source individual, thereby making the real contacts not easily guessable by attackers. To achive this, we introduce the technique of distributed name embeddings, representing names in a high-dimensional space such that distance between name components reflects the degree of cultural similarity between these strings. We present different approaches to construct name embeddings from contact lists observed at a large web-mail provider, and evaluate their cultural coherence. We demonstrate that name embeddings strongly encode gender and ethnicity, as well as name popularity. We applied this algorithm to generate imitation names in email contact list challenge. Our controlled user study verified that the proposed technique reduced the attacker's success rate to 26.08%, indistinguishable from random guessing, compared to a success rate of 62.16% from previous name generation algorithms. Finally, we use these embeddings to produce an open synthetic name resource of 1 million names for security applications, constructed to respect both cultural coherence and U.S. census name frequencies.","PeriodicalId":318259,"journal":{"name":"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116347279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Linear Regression Against Training Data Poisoning","authors":"Chang Liu, Bo Li, Yevgeniy Vorobeychik, Alina Oprea","doi":"10.1145/3128572.3140447","DOIUrl":"https://doi.org/10.1145/3128572.3140447","url":null,"abstract":"The effectiveness of supervised learning techniques has made them ubiquitous in research and practice. In high-dimensional settings, supervised learning commonly relies on dimensionality reduction to improve performance and identify the most important factors in predicting outcomes. However, the economic importance of learning has made it a natural target for adversarial manipulation of training data, which we term poisoning attacks. Prior approaches to dealing with robust supervised learning rely on strong assumptions about the nature of the feature matrix, such as feature independence and sub-Gaussian noise with low variance. We propose an integrated method for robust regression that relaxes these assumptions, assuming only that the feature matrix can be well approximated by a low-rank matrix. Our techniques integrate improved robust low-rank matrix approximation and robust principle component regression, and yield strong performance guarantees. Moreover, we experimentally show that our methods significantly outperform state of the art both in running time and prediction error.","PeriodicalId":318259,"journal":{"name":"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134431218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Beyond Big Data: What Can We Learn from AI Models?: Invited Keynote","authors":"Aylin Caliskan","doi":"10.1145/3128572.3140452","DOIUrl":"https://doi.org/10.1145/3128572.3140452","url":null,"abstract":"My research involves the heavy use of machine learning and natural language processing in novel ways to interpret big data, develop privacy and security attacks, and gain insights about humans and society through these methods. I do not use machine learning only as a tool but I also analyze machine learning models? internal representations to investigate how the artificial intelligence perceives the world. This work [3] has been recently featured in Science where I showed that societal bias exists at the construct level of machine learning models, namely semantic space word embeddings which are dictionaries for machines to understand language. When I use machine learning as a tool to uncover privacy and security problems, I characterize and quantify human behavior in language, including programming languages, by coming up with a linguistic fingerprint for each individual. By extracting linguistic features from natural language or programming language texts of humans, I show that humans have unique linguistic fingerprints since they all learn language on an individual basis. Based on this finding, I can de-anonymize humans that have written certain text, source code, or even executable binaries of compiled code [2, 4, 5]. This is a serious privacy threat for individuals that would like to remain anonymous, such as activists, programmers in oppressed regimes, or malware authors. Nevertheless, being able to identify authors of malicious code enhances security. On the other hand, identifying authors can be used to resolve copyright disputes or detect plagiarism. The methods in this realm [1] have been used to identify so called doppelgängers to link the accounts that belong to the same identities across platforms, especially underground forums that are business platforms for cyber criminals. By analyzing machine learning models? internal representation and linguistic human fingerprints, I am able to uncover facts about the world, society, and the use of language, which have implications for privacy, security, and fairness in machine learning.","PeriodicalId":318259,"journal":{"name":"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security","volume":"408 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129254182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Lightning Round","authors":"David Mandell Freeman","doi":"10.1145/3252887","DOIUrl":"https://doi.org/10.1145/3252887","url":null,"abstract":"","PeriodicalId":318259,"journal":{"name":"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114555657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Defense against Poisoning","authors":"Luis Mu?oz-Gonz?lez","doi":"10.1145/3252889","DOIUrl":"https://doi.org/10.1145/3252889","url":null,"abstract":"","PeriodicalId":318259,"journal":{"name":"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129200830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Differentially Private Noisy Search with Applications to Anomaly Detection (Abstract)","authors":"D. M. Bittner, A. Sarwate, R. Wright","doi":"10.1145/3128572.3140456","DOIUrl":"https://doi.org/10.1145/3128572.3140456","url":null,"abstract":"We consider the problem of privacy-sensitive anomaly detection - screening to detect individuals, behaviors, areas, or data samples of high interest. What defines an anomaly is context-specific; for example, a spoofed rather than genuine user attempting to log in to a web site, a fraudulent credit card transaction, or a suspicious traveler in an airport. The unifying assumption is that the number of anomalous points is quite small with respect to the population, so that deep screening of all individual data points would potentially be time-intensive, costly, and unnecessarily invasive of privacy. Such privacy violations can raise concerns due sensitive nature of data being used, raise fears about violations of data use agreements, and make people uncomfortable with anomaly detection methods. Anomaly detection is well studied, but methods to provide anomaly detection along with privacy are less well studied. Our overall goal in this research is to provide a framework for identifying anomalous data while guaranteeing quantifiable privacy in a rigorous sense. Once identified, such anomalies could warrant further data collection and investigation, depending on the context and relevant policies. In this research, we focus on privacy protection during the deployment of anomaly detection. Our main contribution is a differentially private access mechanism for finding anomalies using a search algorithm based on adaptive noisy group testing. To achieve this, we take as our starting point the notion of group testing [1], which was most famously used to screen US military draftees for syphilis during World War II. In group testing, individuals are tested in groups to limit the number of tests. Using multiple rounds of screenings, a small number of positive individuals can be detected very efficiently. Group testing has the added benefit of providing privacy to individuals through plausible deniability - since the group tests use aggregate data, individual contributions to the test are masked by the group. We follow on these concepts by demonstrating a search model utilizing adaptive queries on aggregated group data. Our work takes the first steps toward strengthening and formalizing these privacy concepts by achieving differential privacy [2]. Differential privacy is a statistical measure of disclosure risk that captures the intuition that an individual's privacy is protected if the results of a computation have at most a very small and quantifiable dependence on that individual's data. In the last decade, there hpractical adoption underway by high-profile companies such as Apple, Google, and Uber. In order to make differential privacy meaningful in the context of a task that seeks to specifically identify some (anomalous) individuals, we introduce the notion of anomaly-restricted differential privacy. 
Using ideas from information theory, we show that noise can be added to group query results in a way that provides differential privacy for non-anomalous indi","PeriodicalId":318259,"journal":{"name":"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133481622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
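The two ingredients of the mechanism - Laplace noise on group aggregates for differential privacy, and adaptive halving over groups for search - can be sketched as follows. Per-round epsilon accounting and the paper's anomaly-restricted privacy definition are elided; the function names and parameters are assumptions for illustration.

```python
# Sketch: differentially private noisy group testing via binary search.
import numpy as np

rng = np.random.default_rng(3)

def noisy_count(flags: np.ndarray, epsilon: float) -> float:
    """Laplace mechanism on a counting query (sensitivity 1)."""
    return flags.sum() + rng.laplace(scale=1.0 / epsilon)

def find_anomaly(flags: np.ndarray, epsilon: float) -> int:
    """Search for a single anomalous index using only noisy group counts."""
    lo, hi = 0, len(flags)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if noisy_count(flags[lo:mid], epsilon) > 0.5:
            hi = mid  # noisy evidence the anomaly lies in the left half
        else:
            lo = mid
    return lo

population = np.zeros(1024); population[417] = 1  # one anomalous individual
# Single noisy queries can mislead; a real system would repeat queries or
# calibrate noise across rounds to control the failure probability.
guess = find_anomaly(population, epsilon=1.0)
```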