Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining最新文献

筛选
英文 中文
Connecting the dots between news articles 将新闻文章之间的点联系起来
Dafna Shahaf, Carlos Guestrin
{"title":"Connecting the dots between news articles","authors":"Dafna Shahaf, Carlos Guestrin","doi":"10.1145/1835804.1835884","DOIUrl":"https://doi.org/10.1145/1835804.1835884","url":null,"abstract":"The process of extracting useful knowledge from large datasets has become one of the most pressing problems in today's society. The problem spans entire sectors, from scientists to intelligence analysts and web users, all of whom are constantly struggling to keep up with the larger and larger amounts of content published every day. With this much data, it is often easy to miss the big picture. In this paper, we investigate methods for automatically connecting the dots -- providing a structured, easy way to navigate within a new topic and discover hidden connections. We focus on the news domain: given two news articles, our system automatically finds a coherent chain linking them together. For example, it can recover the chain of events starting with the decline of home prices (January 2007), and ending with the ongoing health-care debate. We formalize the characteristics of a good chain and provide an efficient algorithm (with theoretical guarantees) to connect two fixed endpoints. We incorporate user feedback into our framework, allowing the stories to be refined and personalized. Finally, we evaluate our algorithm over real news data. Our user studies demonstrate the algorithm's effectiveness in helping users understanding the news.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"149 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86122605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 211
Recommender problems for web applications web应用程序的推荐问题
Bee-Chung Chen, D. Agarwal
{"title":"Recommender problems for web applications","authors":"Bee-Chung Chen, D. Agarwal","doi":"10.1145/1835804.1866294","DOIUrl":"https://doi.org/10.1145/1835804.1866294","url":null,"abstract":"","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"55 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82173353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Using data mining techniques to address critical information exchange needs in disaster affected public-private networks 使用数据挖掘技术解决受灾害影响的公私网络中的关键信息交换需求
Li Zheng, Chao Shen, L. Tang, Tao Li, Steven Luis, Shu‐Ching Chen, Vagelis Hristidis
{"title":"Using data mining techniques to address critical information exchange needs in disaster affected public-private networks","authors":"Li Zheng, Chao Shen, L. Tang, Tao Li, Steven Luis, Shu‐Ching Chen, Vagelis Hristidis","doi":"10.1145/1835804.1835823","DOIUrl":"https://doi.org/10.1145/1835804.1835823","url":null,"abstract":"Crisis Management and Disaster Recovery have gained immense importance in the wake of recent man and nature inflicted calamities. A critical problem in a crisis situation is how to efficiently discover, collect, organize, search and disseminate real-time disaster information. In this paper, we address several key problems which inhibit better information sharing and collaboration between both private and public sector participants for disaster management and recovery. We design and implement a web based prototype implementation of a Business Continuity Information Network (BCIN) system utilizing the latest advances in data mining technologies to create a user-friendly, Internet-based, information-rich service and acting as a vital part of a company's business continuity process. Specifically, information extraction is used to integrate the input data from different sources; the content recommendation engine and the report summarization module provide users personalized and brief views of the disaster information; the community generation module develops spatial clustering techniques to help users build dynamic community in disasters. Currently, BCIN has been exercised at Miami-Dade County Emergency Management.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"35 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79091435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 44
An efficient causal discovery algorithm for linear models 线性模型的一种有效的因果发现算法
Zhenxing Wang, L. Chan
{"title":"An efficient causal discovery algorithm for linear models","authors":"Zhenxing Wang, L. Chan","doi":"10.1145/1835804.1835944","DOIUrl":"https://doi.org/10.1145/1835804.1835944","url":null,"abstract":"Bayesian network learning algorithms have been widely used for causal discovery since the pioneer work [13,18]. Among all existing algorithms, three-phase dependency analysis algorithm (TPDA) [5] is the most efficient one in the sense that it has polynomial-time complexity. However, there are still some limitations to be improved. First, TPDA depends on mutual information-based conditional independence (CI) tests, and so is not easy to be applied to continuous data. In addition, TPDA uses two phases to get approximate skeletons of Bayesian networks, which is not efficient in practice. In this paper, we propose a two-phase algorithm with partial correlation-based CI tests: the first phase of the algorithm constructs a Markov random field from data, which provides a close approximation to the structure of the true Bayesian network; at the second phase, the algorithm removes redundant edges according to CI tests to get the true Bayesian network. We show that two-phase algorithm with partial correlation-based CI tests can deal with continuous data following arbitrary distributions rather than only Gaussian distribution.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77113562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 13
Beyond heuristics: learning to classify vulnerabilities and predict exploits 超越启发式:学习对漏洞进行分类并预测攻击
M. Bozorgi, L. Saul, S. Savage, G. Voelker
{"title":"Beyond heuristics: learning to classify vulnerabilities and predict exploits","authors":"M. Bozorgi, L. Saul, S. Savage, G. Voelker","doi":"10.1145/1835804.1835821","DOIUrl":"https://doi.org/10.1145/1835804.1835821","url":null,"abstract":"The security demands on modern system administration are enormous and getting worse. Chief among these demands, administrators must monitor the continual ongoing disclosure of software vulnerabilities that have the potential to compromise their systems in some way. Such vulnerabilities include buffer overflow errors, improperly validated inputs, and other unanticipated attack modalities. In 2008, over 7,400 new vulnerabilities were disclosed--well over 100 per week. While no enterprise is affected by all of these disclosures, administrators commonly face many outstanding vulnerabilities across the software systems they manage. Vulnerabilities can be addressed by patches, reconfigurations, and other workarounds; however, these actions may incur down-time or unforeseen side-effects. Thus, a key question for systems administrators is which vulnerabilities to prioritize. From publicly available databases that document past vulnerabilities, we show how to train classifiers that predict whether and how soon a vulnerability is likely to be exploited. As input, our classifiers operate on high dimensional feature vectors that we extract from the text fields, time stamps, cross references, and other entries in existing vulnerability disclosure reports. Compared to current industry-standard heuristics based on expert knowledge and static formulas, our classifiers predict much more accurately whether and how soon individual vulnerabilities are likely to be exploited.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"06 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86307034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 224
Session details: IG track 2: business processes 会议细节:IG专题2:业务流程
T. Senator
{"title":"Session details: IG track 2: business processes","authors":"T. Senator","doi":"10.1145/3248778","DOIUrl":"https://doi.org/10.1145/3248778","url":null,"abstract":"","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87418776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Multiple kernel learning for heterogeneous anomaly detection: algorithm and aviation safety case study 异构异常检测的多核学习:算法与航空安全案例研究
Santanu Das, B. Matthews, A. Srivastava, N. Oza
{"title":"Multiple kernel learning for heterogeneous anomaly detection: algorithm and aviation safety case study","authors":"Santanu Das, B. Matthews, A. Srivastava, N. Oza","doi":"10.1145/1835804.1835813","DOIUrl":"https://doi.org/10.1145/1835804.1835813","url":null,"abstract":"The world-wide aviation system is one of the most complex dynamical systems ever developed and is generating data at an extremely rapid rate. Most modern commercial aircraft record several hundred flight parameters including information from the guidance, navigation, and control systems, the avionics and propulsion systems, and the pilot inputs into the aircraft. These parameters may be continuous measurements or binary or categorical measurements recorded in one second intervals for the duration of the flight. Currently, most approaches to aviation safety are reactive, meaning that they are designed to react to an aviation safety incident or accident. In this paper, we discuss a novel approach based on the theory of multiple kernel learning to detect potential safety anomalies in very large data bases of discrete and continuous data from world-wide operations of commercial fleets. We pose a general anomaly detection problem which includes both discrete and continuous data streams, where we assume that the discrete streams have a causal influence on the continuous streams. We also assume that atypical sequences of events in the discrete streams can lead to off-nominal system performance. We discuss the application domain, novel algorithms, and also discuss results on real-world data sets. Our algorithm uncovers operationally significant events in high dimensional data streams in the aviation industry which are not detectable using state of the art methods.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87444302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 202
New developments in the theory of clustering 聚类理论的新进展
Suresh Venkatasubramanian, Sergei Vassilvitskii
{"title":"New developments in the theory of clustering","authors":"Suresh Venkatasubramanian, Sergei Vassilvitskii","doi":"10.1145/1835804.1866288","DOIUrl":"https://doi.org/10.1145/1835804.1866288","url":null,"abstract":"","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87876524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Data mining in the online services industry 在线服务行业的数据挖掘
Qi Lu
{"title":"Data mining in the online services industry","authors":"Qi Lu","doi":"10.1145/1835804.1835805","DOIUrl":"https://doi.org/10.1145/1835804.1835805","url":null,"abstract":"The online services industry is a rapidly growing industry with a worldwide online ad market projected to grow from $48 billion in 2011 to $67 billion in 2013, of which 47% will come from display advertising and 53% from search advertising. Online Services Division (OSD) within Microsoft is a leader in the consumer cloud space today with a strong portfolio of a set of 3 mutually reinforcing businesses: Search, Portal, Advertising. They are supported by a shared foundational asset of Intent & Knowledge Stores and a shared technology platform supporting large scale data and high performance systems. MSN (Portal) and Bing (Search) generate the content, traffic and data, that make for an exciting fertile environment for large scale data mining practice and system development. Our advertisers are thus given more valuable targeting opportunities and better ROI, which in turn, provide better economics, usability data, and allows for a higher quality services for our advertisers and experience for our users. The ability to transform data into meaningful, actionable insight is an important source of competitive advantage for OSD. The data mining initiatives within the division continue to strive for excellence around the following goals: actionable insights through deep data analysis, data mining and data modeling at scale and with speed, increased productivity from deployed large scale data systems and tools, improved product and service development and decision making gained from effective measurement and experimentation, and a mature data culture in product teams that made the above possible. With many technical and data challenges ahead of us, we are committed to utilizing our huge data asset well to understand the need, intent, and behavior of our users for the purpose of serving them better.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"46 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90219617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Fast online learning through offline initialization for time-sensitive recommendation 通过离线初始化快速在线学习,用于时间敏感的推荐
D. Agarwal, Bee-Chung Chen, P. Elango
{"title":"Fast online learning through offline initialization for time-sensitive recommendation","authors":"D. Agarwal, Bee-Chung Chen, P. Elango","doi":"10.1145/1835804.1835894","DOIUrl":"https://doi.org/10.1145/1835804.1835894","url":null,"abstract":"Recommender problems with large and dynamic item pools are ubiquitous in web applications like content optimization, online advertising and web search. Despite the availability of rich item meta-data, excess heterogeneity at the item level often requires inclusion of item-specific \"factors\" (or weights) in the model. However, since estimating item factors is computationally intensive, it poses a challenge for time-sensitive recommender problems where it is important to rapidly learn factors for new items (e.g., news articles, event updates, tweets) in an online fashion. In this paper, we propose a novel method called FOBFM (Fast Online Bilinear Factor Model) to learn item-specific factors quickly through online regression. The online regression for each item can be performed independently and hence the procedure is fast, scalable and easily parallelizable. However, the convergence of these independent regressions can be slow due to high dimensionality. The central idea of our approach is to use a large amount of historical data to initialize the online models based on offline features and learn linear projections that can effectively reduce the dimensionality. We estimate the rank of our linear projections by taking recourse to online model selection based on optimizing predictive likelihood. Through extensive experiments, we show that our method significantly and uniformly outperforms other competitive methods and obtains relative lifts that are in the range of 10-15% in terms of predictive log-likelihood, 200-300% for a rank correlation metric on a proprietary My Yahoo! dataset; it obtains 9% reduction in root mean squared error over the previously best method on a benchmark MovieLens dataset using a time-based train/test data split.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77304015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 72
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信