Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management: Latest Articles
{"title":"iMIRACLE: an Iterative Multi-View Graph Neural Network to Model Intercellular Gene Regulation from Spatial Transcriptomic Data.","authors":"Ziheng Duan, Siwei Xu, Cheyu Lee, Dylan Riffle, Jing Zhang","doi":"10.1145/3627673.3679574","DOIUrl":"10.1145/3627673.3679574","url":null,"abstract":"<p><p>Spatial transcriptomics has transformed genomic research by measuring spatially resolved gene expressions, allowing us to investigate how cells adapt to their microenvironment via modulating their expressed genes. This essential process usually starts from cell-cell communication (CCC) via ligand-receptor (LR) interaction, leading to regulatory changes within the receiver cell. However, few methods were developed to connect them to provide biological insights into intercellular regulation. To fill this gap, we propose iMiracle, an iterative multi-view graph neural network that models each cell's intercellular regulation with three key features. Firstly, iMiracle integrates inter- and intra-cellular networks to jointly estimate <i>cell-type</i>- and <i>micro-environment</i>-driven gene expressions. Optionally, it allows prior knowledge of intra-cellular networks as pre-structured masks to maintain biological relevance. Secondly, iMiracle employs iterative learning to overcome the sparsity of spatial transcriptomic data and gradually fill in the missing edges in the CCC network. Thirdly, iMiracle infers a cell-specific ligand-gene regulatory score based on the contributions of different LR pairs to interpret inter-cellular regulation. We applied iMiracle to nine simulated and eight real datasets from three sequencing platforms and demonstrated that iMiracle consistently outperformed ten methods in gene expression imputation and four methods in regulatory score inference. 
Lastly, we developed iMiracle as an open-source software and anticipate that it can be a powerful tool in decoding the complexities of inter-cellular transcriptional regulation.</p>","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"2024 ","pages":"538-548"},"PeriodicalIF":0.0,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639074/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142830917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"scACT: Accurate Cross-modality Translation via Cycle-consistent Training from Unpaired Single-cell Data.","authors":"Siwei Xu, Junhao Liu, Jing Zhang","doi":"10.1145/3627673.3679576","DOIUrl":"10.1145/3627673.3679576","url":null,"abstract":"<p><p>Single-cell sequencing technologies have revolutionized genomics by enabling the simultaneous profiling of various molecular modalities within individual cells. Their integration, especially cross-modality translation, offers deep insights into cellular regulatory mechanisms. Many methods have been developed for cross-modality translation, but their reliance on scarce high-quality co-assay data limits their applicability. Addressing this, we introduce scACT, a deep generative model designed to extract cross-modality biological insights from unpaired single-cell data. scACT tackles three major challenges: aligning unpaired multi-modal data via adversarial training, facilitating cross-modality translation without prior knowledge via cycle-consistent training, and enabling interpretable regulatory interconnections explorations via in-silico perturbations. To test its performance, we applied scACT on diverse single-cell datasets and found it outperformed existing methods in all three tasks. Finally, we have developed scACT as an individual open-source software package to advance single-cell omics data processing and analysis within the research community.</p>","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. 
ACM International Conference on Information and Knowledge Management","volume":"2024 ","pages":"2722-2731"},"PeriodicalIF":0.0,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11611688/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142775547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HypMix: Hyperbolic Representation Learning for Graphs with Mixed Hierarchical and Non-hierarchical Structures.","authors":"Eric W Lee, Bo Xiong, Carl Yang, Joyce C Ho","doi":"10.1145/3627673.3679940","DOIUrl":"10.1145/3627673.3679940","url":null,"abstract":"<p><p>Heterogeneous networks contain multiple types of nodes and links, with some link types encapsulating hierarchical structure over entities. Hierarchical relationships can codify information such as subcategories or one entity being subsumed by another and are often used for organizing conceptual knowledge into a tree-structured graph. Hyperbolic embedding models learn node representations in a hyperbolic space suitable for preserving the hierarchical structure. Unfortunately, current hyperbolic embedding models only implicitly capture the hierarchical structure, failing to distinguish between node types, and they only assume a single tree. In practice, many networks contain a mixture of hierarchical and non-hierarchical structures, and the hierarchical relations may be represented as multiple trees with complex structures, such as sharing certain entities. In this work, we propose a new hyperbolic representation learning model that can handle complex hierarchical structures and also learn the representation of both hierarchical and non-hierarchic structures. We evaluate our model on several datasets, including identifying relevant articles for a systematic review, which is an essential tool for evidence-driven medicine and node classification.</p>","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. 
ACM International Conference on Information and Knowledge Management","volume":"2024 ","pages":"3852-3856"},"PeriodicalIF":0.0,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11867734/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143525478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Causality-Aware Spatiotemporal Graph Neural Networks for Spatiotemporal Time Series Imputation.","authors":"Baoyu Jing, Dawei Zhou, Kan Ren, Carl Yang","doi":"10.1145/3627673.3679642","DOIUrl":"10.1145/3627673.3679642","url":null,"abstract":"<p><p>Spatiotemporal time series are usually collected via monitoring sensors placed at different locations, which usually contain missing values due to various failures, such as mechanical damages and Internet outages. Imputing the missing values is crucial for analyzing time series. When recovering a specific data point, most existing methods consider all the information relevant to that point regardless of the cause-and-effect relationship. During data collection, it is inevitable that some unknown confounders are included, e.g., background noise in time series and non-causal shortcut edges in the constructed sensor network. These confounders could open backdoor paths and establish non-causal correlations between the input and output. Over-exploiting these non-causal correlations could cause overfitting. In this paper, we first revisit spatiotemporal time series imputation from a causal perspective and show how to block the confounders via the frontdoor adjustment. Based on the results of frontdoor adjustment, we introduce a novel Causality-Aware Spatiotemporal Graph Neural Network (Casper), which contains a novel Prompt Based Decoder (PBD) and a Spatiotemporal Causal Attention (SCA). PBD could reduce the impact of confounders and SCA could discover the sparse causal relationships among embeddings. Theoretical analysis reveals that SCA discovers causal relationships based on the values of gradients. We evaluate Casper on three real-world datasets, and the experimental results show that Casper could outperform the baselines and could effectively discover the causal relationships.</p>","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. 
ACM International Conference on Information and Knowledge Management","volume":"2024 ","pages":"1027-1037"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11876796/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143560258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Federated Node Classification over Distributed Ego-Networks with Secure Contrastive Embedding Sharing.","authors":"Han Xie, Li Xiong, Carl Yang","doi":"10.1145/3627673.3679834","DOIUrl":"https://doi.org/10.1145/3627673.3679834","url":null,"abstract":"<p><p>Federated learning on graphs (a.k.a., federated graph learning- FGL) has recently received increasing attention due to its capacity to enable collaborative learning over distributed graph datasets without compromising local clients' data privacy. In previous works, clients of FGL typically represent institutes or organizations that possess sets of entire graphs (e.g., molecule graphs in biochemical research) or parts of a larger graph (e.g., sub-user networks of e-commerce platforms). However, another natural paradigm exists where clients act as remote devices retaining the graph structures of local neighborhoods centered around the device owners (i.e., ego-networks), which can be modeled for specific graph applications such as user profiling on social ego-networks and infection prediction on contact ego-networks. FGL in such novel yet realistic ego-network settings faces the unique challenge of incomplete neighborhood information for non-ego local nodes since they likely appear and have different sets of neighbors in multiple ego-networks. To address this challenge, we propose an FGL method for distributed ego-networks in which clients obtain complete neighborhood information of local nodes through sharing node embeddings with other clients. A contrastive learning mechanism is proposed to bridge the gap between local and global node embeddings and stabilize the local training of graph neural network models, while a secure embedding sharing protocol is employed to protect individual node identity and embedding privacy against the server and other clients. 
Comprehensive experiments on various distributed ego-network datasets successfully demonstrate the effectiveness of our proposed embedding sharing method on top of different federated model sharing frameworks, and we also provide discussions on the potential efficiency and privacy drawbacks of the method as well as their future mitigation.</p>","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"2024 ","pages":"2607-2617"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11606401/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142775542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling Health Data Sharing with Fine-Grained Privacy.","authors":"Luca Bonomi, Sepand Gousheh, Liyue Fan","doi":"10.1145/3583780.3614864","DOIUrl":"10.1145/3583780.3614864","url":null,"abstract":"<p><p>Sharing health data is vital in advancing medical research and transforming knowledge into clinical practice. Meanwhile, protecting the privacy of data contributors is of paramount importance. To that end, several privacy approaches have been proposed to protect individual data contributors in data sharing, including data anonymization and data synthesis techniques. These approaches have shown promising results in providing privacy protection at the dataset level. In this work, we study the privacy challenges in enabling fine-grained privacy in health data sharing. Our work is motivated by recent research findings, in which patients and healthcare providers may have different privacy preferences and policies that need to be addressed. Specifically, we propose a novel and effective privacy solution that enables data curators (e.g., healthcare providers) to protect sensitive data elements while preserving data usefulness. Our solution builds on randomized techniques to provide rigorous privacy protection for sensitive elements and leverages graphical models to mitigate privacy leakage due to dependent elements. To enhance the usefulness of the shared data, our randomized mechanism incorporates domain knowledge to preserve semantic similarity and adopts a block-structured design to minimize utility loss. Evaluations with real-world health data demonstrate the effectiveness of our approach and the usefulness of the shared data for health applications.</p>","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. 
ACM International Conference on Information and Knowledge Management","volume":"2023 ","pages":"131-141"},"PeriodicalIF":0.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10601092/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71429999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MedCV: An Interactive Visualization System for Patient Cohort Identification from Medical Claim Data.","authors":"Ashis Kumar Chanda, Tian Bai, Brian L Egleston, Slobodan Vucetic","doi":"10.1145/3511808.3557157","DOIUrl":"10.1145/3511808.3557157","url":null,"abstract":"<p><p>Healthcare providers generate a medical claim after every patient visit. A medical claim consists of a list of medical codes describing the diagnosis and any treatment provided during the visit. Medical claims have been popular in medical research as a data source for retrospective cohort studies. This paper introduces a medical claim visualization system (MedCV) that supports cohort selection from medical claim data. MedCV was developed as part of a design study in collaboration with clinical researchers and statisticians. It helps a researcher to define inclusion rules for cohort selection by revealing relationships between medical codes and visualizing medical claims and patient timelines. Evaluation of our system through a user study indicates that MedCV enables domain experts to define high-quality inclusion rules in a time-efficient manner.</p>","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"2022 ","pages":"4828-4832"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9830554/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9098325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PubMed Author-assigned Keyword Extraction (PubMedAKE) Benchmark.","authors":"Jiasheng Sheng, Zelalem Gero, Joyce C Ho","doi":"10.1145/3511808.3557675","DOIUrl":"https://doi.org/10.1145/3511808.3557675","url":null,"abstract":"<p><p>With the ever-increasing abundance of biomedical articles, improving the accuracy of keyword search results becomes crucial for ensuring reproducible research. However, keyword extraction for biomedical articles is hard due to the existence of obscure keywords and the lack of a comprehensive benchmark. PubMedAKE is an author-assigned keyword extraction dataset that contains the title, abstract, and keywords of over 843,269 articles from the PubMed open access subset database. This dataset, publicly available on Zenodo, is the largest keyword extraction benchmark with sufficient samples to train neural networks. Experimental results using state-of-the-art baseline methods illustrate the need for developing automatic keyword extraction methods for biomedical literature.</p>","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":" ","pages":"4470-4474"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9652778/pdf/nihms-1846241.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40687330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From Product Searches to Conversational Agents for E-Commerce","authors":"G. D. Fabbrizio","doi":"10.1145/3511808.3557514","DOIUrl":"https://doi.org/10.1145/3511808.3557514","url":null,"abstract":"","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"129 1","pages":"5085"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73665054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Non-Visual Accessibility Assessment of Videos.","authors":"Ali Selman Aydin, Yu-Jung Ko, Utku Uckun, I V Ramakrishnan, Vikas Ashok","doi":"10.1145/3459637.3482457","DOIUrl":"https://doi.org/10.1145/3459637.3482457","url":null,"abstract":"<p><p>Video accessibility is crucial for blind screen-reader users as online videos are increasingly playing an essential role in education, employment, and entertainment. While there exist quite a few techniques and guidelines that focus on creating accessible videos, there is a dearth of research that attempts to characterize the accessibility of existing videos. Therefore in this paper, we define and investigate a diverse set of video and audio-based accessibility features in an effort to characterize accessible and inaccessible videos. As a ground truth for our investigation, we built a custom dataset of 600 videos, in which each video was assigned an accessibility <i>score</i> based on the number of its wins in a Swiss-system tournament, where human annotators performed pairwise accessibility comparisons of videos. In contrast to existing accessibility research where the assessments are typically done by blind users, we recruited sighted users for our effort, since videos comprise a special case where sight could be required to better judge if any particular scene in a video is presently accessible or not. Subsequently, by examining the extent of association between the accessibility features and the accessibility scores, we could determine the features that significantly (positively or negatively) impact video accessibility and therefore serve as good indicators for assessing the accessibility of videos. Using the custom dataset, we also trained machine learning models that leveraged our handcrafted features to either classify an arbitrary video as accessible/inaccessible or predict an accessibility score for the video. 
Evaluation of our models yielded an <i>F</i> <sub>1</sub> score of 0.675 for binary classification and a mean absolute error of 0.53 for score prediction, thereby demonstrating their potential in video accessibility assessment while also illuminating their current limitations and the need for further research in this area.</p>","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"2021 ","pages":"58-67"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8845074/pdf/nihms-1777380.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39931156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}