Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining最新文献_第3页

Weakly Supervised Spatial Deep Learning based on Imperfect Vector Labels with Registration Errors 基于配准错误的不完全向量标签的弱监督空间深度学习

Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining Pub Date : 2021-08-14 DOI: 10.1145/3447548.3467301

Zhe Jiang, Wenchong He, M. Kirby, S. Asiri, Dan Yan

{"title":"Weakly Supervised Spatial Deep Learning based on Imperfect Vector Labels with Registration Errors","authors":"Zhe Jiang, Wenchong He, M. Kirby, S. Asiri, Dan Yan","doi":"10.1145/3447548.3467301","DOIUrl":"https://doi.org/10.1145/3447548.3467301","url":null,"abstract":"This paper studies weakly supervised learning on spatial raster data based on imperfect vector training labels. Given raster feature imagery and imperfect (weak) vector labels with location registration errors, our goal is to learn a deep learning model for pixel classification and refine vector labels simultaneously. The problem is important in many geoscience applications such as streamline delineation and road mapping from earth imagery, where annotating imperfect coarse vector labels is far more efficient than drawing precise labels. But the problem is challenging due to the misalignment of vector labels with raster feature pixels and the need to infer true vector label location while learning neural network parameters. Existing works on weakly supervised learning often focus on noise and errors in label semantics, assuming label locations to be either correct or irrelevant (e.g., identical and independently distributed). A few works exist on label registration errors, but these methods often focus on label misalignment on object segment boundaries at the pixel level without guaranteeing vector continuity. To fill the gap, this paper proposes a spatial learning framework based on Expectation-Maximization that iteratively updates deep neural network parameters while inferring true vector label locations. Specifically, inference of true vector locations is based on both the current pixel class predictions and the geometric properties of vectors. Evaluations on real-world high-resolution remote sensing datasets in National Hydrography Dataset (NHD) refinement show that the proposed framework outperforms baseline methods in classification accuracy and refined vector quality.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121750373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

Overview of the 1st Workshop on City Brain Research 第一届城市大脑研究研讨会综述

Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining Pub Date : 2021-08-14 DOI: 10.1145/3447548.3469481

Guanjie Zheng, P. Jenkins, Yanyan Xu, Dongyao Chen

引用次数: 0

Unpaired Generative Molecule-to-Molecule Translation for Lead Optimization 导联优化的未配对生成分子到分子转化

Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining Pub Date : 2021-08-14 DOI: 10.1145/3447548.3467120

Guy Barshatski, Kira Radinsky

{"title":"Unpaired Generative Molecule-to-Molecule Translation for Lead Optimization","authors":"Guy Barshatski, Kira Radinsky","doi":"10.1145/3447548.3467120","DOIUrl":"https://doi.org/10.1145/3447548.3467120","url":null,"abstract":"Molecular lead optimization is an important task of drug discovery focusing on generating novel molecules similar to a drug candidate but with enhanced properties. Prior works focused on supervised models requiring datasets of pairs of a molecule and an enhanced molecule. These approaches require large amounts of data and are limited by the bias of the specific examples of enhanced molecules. In this work, we present an unsupervised generative approach with a molecule-embedding component that maps a discrete representation of a molecule to a continuous space. The components are then coupled with a unique training architecture leveraging molecule fingerprints and applying double cycle constraints to enable both chemical resemblance to the original molecular lead while generating novel molecules with enhanced properties. We evaluate our method on multiple common molecular optimization tasks, including dopamine receptor (DRD2) and drug likeness (QED), and show our method outperforms previous state-of-the-art baselines. Moreover, we conduct thorough ablation experiments to show the effect and necessity of important components in our model. Furthermore, we demonstrate our method's ability to generate FDA-approved drugs it has never encountered before, such as Perazine and Clozapine, which are used to treat psychotic disorders, like Schizophrenia. The system is currently being deployed for use in the Targeted Drug Delivery and Personalized Medicine laboratories generating treatments using nanoparticle-based technology.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128048407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 9

MLHat: Deployable Machine Learning for Security Defense MLHat:安全防御的可部署机器学习

Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining Pub Date : 2021-08-14 DOI: 10.1145/3447548.3469463

Gang Wang, A. Ciptadi, Aliakbar Ahmadzadeh

引用次数: 0

Towards Computing a Near-Maximum Weighted Independent Set on Massive Graphs 计算海量图上的近极大加权独立集

Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining Pub Date : 2021-08-14 DOI: 10.1145/3447548.3467232

Jiewei Gu, Weiguo Zheng, Yuzheng Cai, Peng Peng

引用次数: 5

DI-2021: The Second Document Intelligence Workshop DI-2021:第二届文件情报研讨会

Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining Pub Date : 2021-08-14 DOI: 10.1145/3447548.3469454

Benjamin Han, D. Burdick, David Bruce Lewis, Yijuan Lu, Hamid Motahari, Sandeep Tata

引用次数: 1

Creating Recommender Systems Datasets in Scientific Fields 在科学领域创建推荐系统数据集

Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining Pub Date : 2021-08-14 DOI: 10.1145/3447548.3470805

Márcia Barros, Francisco M. Couto, Matilde Pato, Pedro Ruas

{"title":"Creating Recommender Systems Datasets in Scientific Fields","authors":"Márcia Barros, Francisco M. Couto, Matilde Pato, Pedro Ruas","doi":"10.1145/3447548.3470805","DOIUrl":"https://doi.org/10.1145/3447548.3470805","url":null,"abstract":"Recommender systems (RS) have been successfully explored in a vast number of domains, e.g. movies and tv shows, music, or e-commerce. In these domains we have a large number of datasets freely available for testing and evaluating new recommender algorithms. For example, Movielens and Netflix datasets for movies, Spotify for music, and Amazon for e-commerce, which translates into a large number of algorithms applied to these fields. In scientific fields, such as Health and Chemistry, standard and open access datasets with the information about the preferences of the users are scarce. First, it is important to understand the application domain, i.e. \"what the recommended item is\". Second, who are the end users: researchers, pharmacists, clinicians or policy makers. Third, the availability of data. Thus, if we wish to develop an algorithm for recommending scientific items, we do not have access to datasets with information about the past preferences of a group of users. Given this limitation, we developed a methodology, called LIBRETTI - LIterature Based RecommEndaTion of scienTific Items, whose goal is the creation of datasets, related with scientific fields. These datasets are created based on the major resource of knowledge that Science has: scientific literature. We consider the users as the authors of the publications, the items as the scientific entities (for example chemical compounds or diseases), and the ratings as the number of publications an author wrote about an entity. In this tutorial we will approach state-of-the-art recommender systems in scientific fields, explain what is Named Entity Recognition/Linking (NER/NEL) in research literature, and to demonstrate how to create a dataset for recommending drugs and diseases through research literature related to COVID-19. Our goal is to spread the use of LIBRETTI methodology in order to help in the development of recommender algorithms in scientific fields. These datasets are created based on the major resource of knowledge that Science has: scientific literature. We consider the users as the authors of the publications, the items as the scientific entities (for example chemical compounds or diseases), and the ratings as the number of publications an author wrote about an entity. In this tutorial we will approach state-of-the-art recommender systems in scientific fields, explain what is Named Entity Recognition/Linking (NER/NEL) in research literature, and to demonstrate how to create a dataset for recommending drugs and diseases through research literature related to COVID-19. Our goal is to spread the use of LIBRETTI methodology in order to help in the development of recommender algorithms in scientific fields. More info about the tutorial at https://lasigebiotm.github.io/RecSys.Scifi/.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130452990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Purify and Generate: Learning Faithful Item-to-Item Graph from Noisy User-Item Interaction Behaviors 净化和生成:从嘈杂的用户-项目交互行为中学习忠实的项目-项目图

Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining Pub Date : 2021-08-14 DOI: 10.1145/3447548.3467205

Yue He, Yancheng Dong, Peng Cui, Yuhang Jiao, Xiaowei Wang, Ji Liu, Philip S. Yu

{"title":"Purify and Generate: Learning Faithful Item-to-Item Graph from Noisy User-Item Interaction Behaviors","authors":"Yue He, Yancheng Dong, Peng Cui, Yuhang Jiao, Xiaowei Wang, Ji Liu, Philip S. Yu","doi":"10.1145/3447548.3467205","DOIUrl":"https://doi.org/10.1145/3447548.3467205","url":null,"abstract":"Matching is almost the first and most fundamental step in recommender systems, that is to quickly select hundreds or thousands of related entities from the whole commodity pool. Among all the matching methods, item-to-item (I2I) graph based matching is a handy and highly effective approach and is widely used in most applications, owing to the essential relationships of entities described in a powerful I2I graph. Yet, the I2I graph is not a ready-made product in a data source. To obtain it from users' behaviors, a common practice in the industry is to construct the graph based on the similarity of item embeddings or co-occurrence frequency directly. However, these methods tend to lose the complicated correlations (high-ordered or nonlinear) inside decision-making actions and cannot achieve the global optimal solution. Moreover, the correlations between items are usually contained in users' short-term actions, which are full of noise information (e.g. spurious association, missing connection). It is vitally important to filter out noise while generating the graph. In this paper, we propose a novel framework called Purified Graph Generation (PGG) dedicated to learn faithful I2I graph from sparse and noisy behavior data. We capture the 'confidence value' between user and item to get rid of exception action during decision making, and leverage it to re-sample purified sets that are fed into an unsupervised I2I graph structure learning framework called GPBG. Extensive experimental results from both simulation and real data demonstrate that our method could significantly benefit the performance of I2I graph compared to the typical baselines.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134131775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 5

Deep Generative Models for Spatial Networks 空间网络的深度生成模型

Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining Pub Date : 2021-08-14 DOI: 10.1145/3447548.3467394

Xiaojie Guo, Yuanqi Du, Liang Zhao

{"title":"Deep Generative Models for Spatial Networks","authors":"Xiaojie Guo, Yuanqi Du, Liang Zhao","doi":"10.1145/3447548.3467394","DOIUrl":"https://doi.org/10.1145/3447548.3467394","url":null,"abstract":"Spatial networks represent crucial data structures where the nodes and edges are embedded in a geometric space. Nowadays, spatial network data is becoming increasingly popular and important, ranging from microscale (e.g., protein structures), to middle-scale (e.g., biological neural networks), to macro-scale (e.g., mobility networks). Although, modeling and understanding the generative process of spatial networks are very important, they remain largely under-explored due to the significant challenges in automatically modeling and distinguishing the independency and correlation among various spatial and network factors. To address these challenges, we first propose a novel objective for joint spatial-network disentanglement from the perspective of information bottleneck as well as a novel optimization algorithm to optimize the intractable objective. Based on this, a spatial-network variational autoencoder (SND-VAE) with a new spatial-network message passing neural network (S-MPNN) is proposed to discover the independent and dependent latent factors of spatial and networks. Qualitative and quantitative experiments on both synthetic and real-world datasets demonstrate the superiority of the proposed model over the state-of-the-arts by up to 66.9% for graph generation and 37.3% for interpretability.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133915628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 22

ODD: Outlier Detection and Description ODD:异常值检测和描述

Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining Pub Date : 2021-08-14 DOI: 10.1145/3447548.3469483

Siddharth Bhatia, Bryan Hooi, L. Akoglu, Sourav Chatterjee, Xiaodong Jiang, Manish Gupta

引用次数: 1