Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining: latest articles, page 9

Session details: Research track 9: clustering

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/3248789

Murat Dundar

Citations: 0

Semi-supervised sparse metric learning using alternating linearization optimization

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/1835804.1835947

Wei Liu, Shiqian Ma, D. Tao, Jianzhuang Liu, Peng Liu

Abstract: In plenty of scenarios, data can be represented as vectors and then mathematically abstracted as points in a Euclidean space. Because a great number of machine learning and data mining applications need proximity measures over data, a simple and universal distance metric is desirable, and metric learning methods have been explored to produce sensible distance measures consistent with data relationships. However, most existing methods suffer from limited labeled data and expensive training. In this paper, we address these two issues by employing abundant unlabeled data and pursuing sparsity of metrics, resulting in a novel metric learning approach called semi-supervised sparse metric learning. Two important contributions of our approach are: 1) it propagates scarce prior affinities between data to the global scope and incorporates the full affinities into the metric learning; and 2) it uses an efficient alternating linearization method to directly optimize the sparse metric. Compared with conventional methods, ours can effectively take advantage of semi-supervision and automatically discover the sparse metric structure underlying input data patterns. We demonstrate the efficacy of the proposed approach with extensive experiments carried out on six datasets, obtaining clear performance gains over the state of the art.

Citations: 61
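
To make the notion of a sparse distance metric in the abstract above more concrete, the following is a minimal sketch of a Mahalanobis-style distance, sqrt((x - y)^T M (x - y)), computed under a fixed metric matrix M. It assumes, as is common in metric learning but not stated in this listing, that the metric takes this form. The function name `mahalanobis_distance` and the matrix `M_sparse` are hypothetical and chosen only for illustration; the sketch evaluates a given metric and does not implement the paper's learning procedure.

```python
import math


def mahalanobis_distance(x, y, M):
    """Return sqrt((x - y)^T M (x - y)) for a metric matrix M.

    M is assumed to be symmetric positive semidefinite; this is not checked.
    """
    diff = [a - b for a, b in zip(x, y)]
    # Row-by-row product M @ diff, skipping the zero entries of a sparse M.
    m_diff = [
        sum(m_ij * d for m_ij, d in zip(row, diff) if m_ij != 0.0)
        for row in M
    ]
    squared = sum(d * v for d, v in zip(diff, m_diff))
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(squared, 0.0))


# Hypothetical sparse metric: feature 0 is weighted heavily, feature 2 is ignored.
M_sparse = [
    [4.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
]
identity = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

x = [1.0, 0.0, 7.0]
y = [0.0, 0.0, 0.0]
euclidean = mahalanobis_distance(x, y, identity)   # identity metric = Euclidean distance
weighted = mahalanobis_distance(x, y, M_sparse)    # feature 2 no longer contributes
```

With the identity matrix the function reduces to the ordinary Euclidean distance; zero rows and columns in M remove the corresponding features from the comparison, which is one way to read "sparsity of metrics".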

Text mining to fast-track deserving disability applicants

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/1866814.1866819

J. Elder

Citations: 0

Session details: Research track 22: transfer and multi-task learning

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/3248802

P. Yu

Citations: 0

Modeling relational events via latent classes

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/1835804.1835906

Christopher DuBois, Padhraic Smyth

Citations: 33

The next generation of transportation systems, greenhouse emissions, and data mining

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/1835804.1835956

H. Kargupta, João Gama, W. Fan

Abstract: Controlling greenhouse gas (GHG) emissions to minimize the impact on the environment is one of the major challenges facing human civilization. Although future concentrations, damages and costs are unknown, it is widely recognized that major emissions reduction efforts are needed. The 1997 Kyoto Protocol, promoted by the United Nations Framework Convention on Climate Change, aimed at fighting global warming. The main goal is “stabilization of greenhouse gas concentrations in the atmosphere at a level that would prevent dangerous anthropogenic interference with the climate system” [9]. According to the International Energy Agency [1], energy efficiency in buildings, industrial processes and transportation could reduce the world’s energy needs in 2050 by one third, and help control global emissions of greenhouse gases. The report [1] describes a series of scenarios showing how key energy technologies can reduce emissions of carbon dioxide, the greenhouse gas most responsible for climate change. Of the four primary GHGs under scrutiny, carbon dioxide (CO2), and the need to lower carbon emissions in general, is of paramount concern. It is estimated that transportation activities are responsible for approximately 25% to 30% of total U.S. GHG emissions, with the on-highway commercial truck market accounting for over 45% of transportation GHG. However, transportation sector emissions remain almost entirely unaddressed with respect to GHG and CO2 reduction. The guidelines provided by the Intergovernmental Panel on Climate Change (IPCC) for calculating carbon emissions offer estimates only for certain common types of fuels; even the

Citations: 33

The topic-perspective model for social tagging systems

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/1835804.1835891

Caimei Lu, Xiaohua Hu, Xin Chen, Jung-ran Park, Tingting He, Zhoujun Li

Citations: 36

Session details: Research track 21: KDD methodology

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/3248801

Gregory Piatetsky

Citations: 0

Scalable influence maximization for prevalent viral marketing in large-scale social networks

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/1835804.1835934

Wei Chen, Chi Wang, Yajun Wang

Abstract: Influence maximization, defined by Kempe, Kleinberg, and Tardos (2003), is the problem of finding a small set of seed nodes in a social network that maximizes the spread of influence under certain influence cascade models. The scalability of influence maximization is a key factor for enabling prevalent viral marketing in large-scale online social networks. Prior solutions, such as the greedy algorithm of Kempe et al. (2003) and its improvements, are slow and not scalable, while other heuristic algorithms do not provide consistently good performance on influence spread. In this paper, we design a new heuristic algorithm that is easily scalable to millions of nodes and edges in our experiments. Our algorithm has a simple tunable parameter for users to control the balance between the running time and the influence spread of the algorithm. Our results from extensive simulations on several real-world and synthetic networks demonstrate that our algorithm is currently the best scalable solution to the influence maximization problem: (a) our algorithm scales beyond million-sized graphs where the greedy algorithm becomes infeasible, and (b) in all size ranges, our algorithm performs consistently well in influence spread: it is always among the best algorithms, and in most cases it significantly outperforms all other scalable heuristics, with increases in influence spread of as much as 100% to 260%.

Citations: 1668
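
The influence maximization problem described in the abstract above can be illustrated with a small sketch of the greedy baseline that the abstract attributes to Kempe et al. (2003) and calls slow. This is not the scalable heuristic proposed by Chen, Wang, and Wang. The sketch assumes the independent cascade model with spread estimated by Monte Carlo simulation; the abstract only says "certain influence cascade models". The function names, the adjacency-list graph format, and the toy graph are all hypothetical.

```python
import random


def simulate_cascade(graph, seeds, rng):
    """Run one independent-cascade simulation and return the number of active nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor, prob in graph.get(node, []):
                # A newly active node gets one chance to activate each inactive neighbor.
                if neighbor not in active and rng.random() < prob:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return len(active)


def estimate_spread(graph, seeds, runs, rng):
    """Average the cascade size over repeated simulations."""
    return sum(simulate_cascade(graph, seeds, rng) for _ in range(runs)) / runs


def greedy_seeds(graph, k, runs=200, seed=0):
    """Pick up to k seeds, each time adding the node with the best estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph)
    for edges in graph.values():
        nodes.update(neighbor for neighbor, _ in edges)
    chosen = []
    for _ in range(k):
        best_node, best_spread = None, -1.0
        for node in sorted(nodes - set(chosen)):
            spread = estimate_spread(graph, chosen + [node], runs, rng)
            if spread > best_spread:
                best_node, best_spread = node, spread
        if best_node is None:  # fewer than k nodes in the graph
            break
        chosen.append(best_node)
    return chosen


# Hypothetical toy graph: node -> list of (neighbor, activation probability).
toy_graph = {
    "a": [("b", 1.0), ("c", 1.0)],
    "d": [("e", 1.0)],
}
selected = greedy_seeds(toy_graph, k=2, runs=5)
```

Each greedy round re-simulates the cascade for every remaining candidate, so the cost grows with the number of nodes times the number of simulation runs per round. That cost is the scalability problem the abstract refers to.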

Active learning for biomedical citation screening

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI: 10.1145/1835804.1835829

Byron C. Wallace, Kevin Small, C. Brodley, T. Trikalinos

Abstract: Active learning (AL) is an increasingly popular strategy for mitigating the amount of labeled data required to train classifiers, thereby reducing annotator effort. We describe a real-world, deployed application of AL to the problem of biomedical citation screening for systematic reviews at the Tufts Medical Center's Evidence-based Practice Center. We propose a novel active learning strategy that exploits a priori domain knowledge provided by the expert (specifically, labeled features) and extend this model via a linear programming algorithm for situations where the expert can provide ranked labeled features. Our methods outperform existing AL strategies on three real-world systematic review datasets. We argue that evaluation must be specific to the scenario under consideration. To this end, we propose a new evaluation framework for finite-pool scenarios, wherein the primary aim is to label a fixed set of examples rather than to simply induce a good predictive model. We use a method from medical decision theory for eliciting the relative costs of false positives and false negatives from the domain expert, constructing a utility measure of classification performance that integrates the expert preferences. Our findings suggest that the expert can, and should, provide more information than instance labels alone. In addition to achieving strong empirical results on the citation screening problem, this work outlines many important steps for moving away from simulated active learning and toward deploying AL for real-world applications.

Citations: 124
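
As a rough illustration of the pool-based active learning setting in the abstract above, the sketch below shows plain uncertainty sampling: from a pool of unlabeled citations, request labels for those whose predicted probability of relevance is closest to 0.5. This is a generic baseline used here as an assumption for illustration; it is not the labeled-feature strategy the authors propose. The function name `most_uncertain` and the probability values are hypothetical.

```python
def most_uncertain(probabilities, batch_size=1):
    """Return indices of pool items whose predicted P(relevant) is closest to 0.5.

    Ties are broken by the lower index so the selection is deterministic.
    """
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: (abs(probabilities[i] - 0.5), i),
    )
    return ranked[:batch_size]


# Hypothetical classifier outputs for five unlabeled citations.
pool_probabilities = [0.95, 0.52, 0.10, 0.40, 0.70]
to_label = most_uncertain(pool_probabilities, batch_size=2)
```

In a finite-pool scenario such as citation screening, the selected items would be sent to the expert, the classifier retrained on the enlarged labeled set, and the selection repeated on what remains of the pool.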