Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining最新文献

筛选
英文 中文
Session details: Research track 9: clustering 会议细节:研究专题9:集群
Murat Dundar
{"title":"Session details: Research track 9: clustering","authors":"Murat Dundar","doi":"10.1145/3248789","DOIUrl":"https://doi.org/10.1145/3248789","url":null,"abstract":"","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76928007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Semi-supervised sparse metric learning using alternating linearization optimization 使用交替线性化优化的半监督稀疏度量学习
Wei Liu, Shiqian Ma, D. Tao, Jianzhuang Liu, Peng Liu
{"title":"Semi-supervised sparse metric learning using alternating linearization optimization","authors":"Wei Liu, Shiqian Ma, D. Tao, Jianzhuang Liu, Peng Liu","doi":"10.1145/1835804.1835947","DOIUrl":"https://doi.org/10.1145/1835804.1835947","url":null,"abstract":"In plenty of scenarios, data can be represented as vectors and then mathematically abstracted as points in a Euclidean space. Because a great number of machine learning and data mining applications need proximity measures over data, a simple and universal distance metric is desirable, and metric learning methods have been explored to produce sensible distance measures consistent with data relationship. However, most existing methods suffer from limited labeled data and expensive training. In this paper, we address these two issues through employing abundant unlabeled data and pursuing sparsity of metrics, resulting in a novel metric learning approach called semi-supervised sparse metric learning. Two important contributions of our approach are: 1) it propagates scarce prior affinities between data to the global scope and incorporates the full affinities into the metric learning; and 2) it uses an efficient alternating linearization method to directly optimize the sparse metric. Compared with conventional methods, ours can effectively take advantage of semi-supervision and automatically discover the sparse metric structure underlying input data patterns. We demonstrate the efficacy of the proposed approach with extensive experiments carried out on six datasets, obtaining clear performance gains over the state-of-the-arts.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79473463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 61
Text mining to fast-track deserving disability applicants 文本挖掘,以快速跟踪应得的残疾申请人
J. Elder
{"title":"Text mining to fast-track deserving disability applicants","authors":"J. Elder","doi":"10.1145/1866814.1866819","DOIUrl":"https://doi.org/10.1145/1866814.1866819","url":null,"abstract":"","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80625389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Session details: Research track 22: transfer and multi-task learning 议题22:迁移和多任务学习
P. Yu
{"title":"Session details: Research track 22: transfer and multi-task learning","authors":"P. Yu","doi":"10.1145/3248802","DOIUrl":"https://doi.org/10.1145/3248802","url":null,"abstract":"","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79600023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Modeling relational events via latent classes 通过潜在类对关系事件建模
Christopher DuBois, Padhraic Smyth
{"title":"Modeling relational events via latent classes","authors":"Christopher DuBois, Padhraic Smyth","doi":"10.1145/1835804.1835906","DOIUrl":"https://doi.org/10.1145/1835804.1835906","url":null,"abstract":"Many social networks can be characterized by a sequence of dyadic interactions between individuals. Techniques for analyzing such events are of increasing interest. In this paper, we describe a generative model for dyadic events, where each event arises from one of C latent classes, and the properties of the event (sender, recipient, and type) are chosen from distributions over these entities conditioned on the chosen class. We present two algorithms for inference in this model: an expectation-maximization algorithm as well as a Markov chain Monte Carlo procedure based on collapsed Gibbs sampling. To analyze the model's predictive accuracy, the algorithms are applied to multiple real-world data sets involving email communication, international political events, and animal behavior data.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83238041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 33
The next generation of transportation systems,greenhouse emissions, and data mining 下一代交通系统、温室气体排放和数据挖掘
H. Kargupta, João Gama, W. Fan
{"title":"The next generation of transportation systems,greenhouse emissions, and data mining","authors":"H. Kargupta, João Gama, W. Fan","doi":"10.1145/1835804.1835956","DOIUrl":"https://doi.org/10.1145/1835804.1835956","url":null,"abstract":"Controling Greenhouse gas (GHG) emissions for minimizing the impact on the environment is one of the major challenges in front of the human civilization. Although future concentrations, damages and costs are unknown, it is widely recognized that major emissions reduction efforts are needed. In 1997, the Kyoto Protocol promoted by the United Nations Framework Convention on Climate Change, aimed at fighting global warming. The main goal is “stabilization of greenhouse gas concentrations in the atmosphere at a level that would prevent dangerous anthropogenic interference with the climate system” [9]. According to the International Energy Agency [1], energy efficient in buildings, industrial processes and transportation could reduce the world’s energy needs in 2050 by one third, and help controlling global emissions of greenhouse gases. The report [1] describes a series of scenarios showing how key energy technologies can reduce emissions of carbon dioxide, the greenhouse gas which is most responsible for climate change. Of the four primary GHG under scrutiny, carbon dioxide (CO2), and the need to lower carbon emissions in general, is of paramount concern. It is estimated that transportation activities are responsible for approximately 25% to 30% of total U.S. GHG emissions, with the on-highway commercial truck market accounting for over 45% of transportation GHG. However, the transportation sector emissions remain almost entirely unaddressed with respect to GHG and CO2 reduction. The Intergovernmental Panel on Climate Change (IPCC) provided guidelines for calculating carbon emission offer estimations only for certain common types of fuels; even the","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85842927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 33
The topic-perspective model for social tagging systems 社会标签系统的主题视角模型
Caimei Lu, Xiaohua Hu, Xin Chen, Jung-ran Park, Tingting He, Zhoujun Li
{"title":"The topic-perspective model for social tagging systems","authors":"Caimei Lu, Xiaohua Hu, Xin Chen, Jung-ran Park, Tingting He, Zhoujun Li","doi":"10.1145/1835804.1835891","DOIUrl":"https://doi.org/10.1145/1835804.1835891","url":null,"abstract":"In this paper, we propose a new probabilistic generative model, called Topic-Perspective Model, for simulating the generation process of social annotations. Different from other generative models, in our model, the tag generation process is separated from the content term generation process. While content terms are only generated from resource topics, social tags are generated by resource topics and user perspectives together. The proposed probabilistic model can produce more useful information than any other models proposed before. The parameters learned from this model include: (1) the topical distribution of each document, (2) the perspective distribution of each user, (3) the word distribution of each topic, (4) the tag distribution of each topic, (5) the tag distribution of each user perspective, (6) and the probabilistic of each tag being generated from resource topics or user perspectives. Experimental results show that the proposed model has better generalization performance or tag prediction ability than other two models proposed in previous research.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83826387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 36
Session details: Research track 21: KDD methodology 会议细节:研究专场21:KDD方法论
Gregory Piatetsky
{"title":"Session details: Research track 21: KDD methodology","authors":"Gregory Piatetsky","doi":"10.1145/3248801","DOIUrl":"https://doi.org/10.1145/3248801","url":null,"abstract":"","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78934991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Scalable influence maximization for prevalent viral marketing in large-scale social networks 大规模社交网络中流行的病毒式营销的可扩展影响最大化
Wei Chen, Chi Wang, Yajun Wang
{"title":"Scalable influence maximization for prevalent viral marketing in large-scale social networks","authors":"Wei Chen, Chi Wang, Yajun Wang","doi":"10.1145/1835804.1835934","DOIUrl":"https://doi.org/10.1145/1835804.1835934","url":null,"abstract":"Influence maximization, defined by Kempe, Kleinberg, and Tardos (2003), is the problem of finding a small set of seed nodes in a social network that maximizes the spread of influence under certain influence cascade models. The scalability of influence maximization is a key factor for enabling prevalent viral marketing in large-scale online social networks. Prior solutions, such as the greedy algorithm of Kempe et al. (2003) and its improvements are slow and not scalable, while other heuristic algorithms do not provide consistently good performance on influence spreads. In this paper, we design a new heuristic algorithm that is easily scalable to millions of nodes and edges in our experiments. Our algorithm has a simple tunable parameter for users to control the balance between the running time and the influence spread of the algorithm. Our results from extensive simulations on several real-world and synthetic networks demonstrate that our algorithm is currently the best scalable solution to the influence maximization problem: (a) our algorithm scales beyond million-sized graphs where the greedy algorithm becomes infeasible, and (b) in all size ranges, our algorithm performs consistently well in influence spread --- it is always among the best algorithms, and in most cases it significantly outperforms all other scalable heuristics to as much as 100%--260% increase in influence spread.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83696353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1668
Active learning for biomedical citation screening 生物医学引文筛选的主动学习
Byron C. Wallace, Kevin Small, C. Brodley, T. Trikalinos
{"title":"Active learning for biomedical citation screening","authors":"Byron C. Wallace, Kevin Small, C. Brodley, T. Trikalinos","doi":"10.1145/1835804.1835829","DOIUrl":"https://doi.org/10.1145/1835804.1835829","url":null,"abstract":"Active learning (AL) is an increasingly popular strategy for mitigating the amount of labeled data required to train classifiers, thereby reducing annotator effort. We describe a real-world, deployed application of AL to the problem of biomedical citation screening for systematic reviews at the Tufts Medical Center's Evidence-based Practice Center. We propose a novel active learning strategy that exploits a priori domain knowledge provided by the expert (specifically, labeled features)and extend this model via a Linear Programming algorithm for situations where the expert can provide ranked labeled features. Our methods outperform existing AL strategies on three real-world systematic review datasets. We argue that evaluation must be specific to the scenario under consideration. To this end, we propose a new evaluation framework for finite-pool scenarios, wherein the primary aim is to label a fixed set of examples rather than to simply induce a good predictive model. We use a method from medical decision theory for eliciting the relative costs of false positives and false negatives from the domain expert, constructing a utility measure of classification performance that integrates the expert preferences. Our findings suggest that the expert can, and should, provide more information than instance labels alone. In addition to achieving strong empirical results on the citation screening problem, this work outlines many important steps for moving away from simulated active learning and toward deploying AL for real-world applications.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90373427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 124
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信