Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management: Latest Publications
Ali Selman Aydin, Yu-Jung Ko, Utku Uckun, I V Ramakrishnan, Vikas Ashok
{"title":"Non-Visual Accessibility Assessment of Videos.","authors":"Ali Selman Aydin, Yu-Jung Ko, Utku Uckun, I V Ramakrishnan, Vikas Ashok","doi":"10.1145/3459637.3482457","DOIUrl":"https://doi.org/10.1145/3459637.3482457","url":null,"abstract":"Video accessibility is crucial for blind screen-reader users as online videos are increasingly playing an essential role in education, employment, and entertainment. While there exist quite a few techniques and guidelines that focus on creating accessible videos, there is a dearth of research that attempts to characterize the accessibility of existing videos. Therefore, in this paper, we define and investigate a diverse set of video and audio-based accessibility features in an effort to characterize accessible and inaccessible videos. As a ground truth for our investigation, we built a custom dataset of 600 videos, in which each video was assigned an accessibility score based on the number of its wins in a Swiss-system tournament, where human annotators performed pairwise accessibility comparisons of videos. In contrast to existing accessibility research where the assessments are typically done by blind users, we recruited sighted users for our effort, since videos comprise a special case where sight could be required to better judge if any particular scene in a video is presently accessible or not. Subsequently, by examining the extent of association between the accessibility features and the accessibility scores, we could determine the features that significantly (positively or negatively) impact video accessibility and therefore serve as good indicators for assessing the accessibility of videos. Using the custom dataset, we also trained machine learning models that leveraged our handcrafted features to either classify an arbitrary video as accessible/inaccessible or predict an accessibility score for the video. Evaluation of our models yielded an F1 score of 0.675 for binary classification and a mean absolute error of 0.53 for score prediction, thereby demonstrating their potential in video accessibility assessment while also illuminating their current limitations and the need for further research in this area.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"2021 ","pages":"58-67"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8845074/pdf/nihms-1777380.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39931156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
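The accessibility score in the record above is the number of wins a video collects in a Swiss-system tournament of pairwise comparisons. A minimal sketch of that scoring idea follows, assuming a hypothetical `prefer(a, b)` comparator that stands in for the human annotator, and a simplified pairing rule (sort by current wins, pair neighbours) without the rematch avoidance that real Swiss pairings usually add. It is an illustration, not the authors' tournament procedure.

```python
from typing import Callable, Dict, List


def swiss_scores(items: List[str], rounds: int,
                 prefer: Callable[[str, str], bool]) -> Dict[str, int]:
    """Score items by their wins in a simplified Swiss-system tournament.

    `prefer(a, b)` is a hypothetical stand-in for a human pairwise judgment:
    it returns True when `a` is judged more accessible than `b`.
    """
    wins = {item: 0 for item in items}
    for _ in range(rounds):
        # Pair items with similar records: sort by current wins (stable sort,
        # so ties keep their input order) and pair adjacent entries.
        standings = sorted(items, key=lambda it: -wins[it])
        for a, b in zip(standings[0::2], standings[1::2]):
            winner = a if prefer(a, b) else b
            wins[winner] += 1
        # With an odd number of items, the last one gets a bye this round.
    return wins


# Hypothetical "true" accessibility values used only to drive the comparator.
quality = {"v1": 0.9, "v2": 0.2, "v3": 0.6, "v4": 0.4}
scores = swiss_scores(list(quality), rounds=2,
                      prefer=lambda a, b: quality[a] > quality[b])
```

Here `scores` is `{"v1": 2, "v2": 0, "v3": 1, "v4": 1}`: after round one the two winners meet, so repeated rounds separate items with similar records using far fewer comparisons than a full round-robin.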
Jing Ma, Qiuchen Zhang, Jian Lou, Li Xiong, Joyce C Ho
{"title":"Temporal Network Embedding via Tensor Factorization.","authors":"Jing Ma, Qiuchen Zhang, Jian Lou, Li Xiong, Joyce C Ho","doi":"10.1145/3459637.3482200","DOIUrl":"https://doi.org/10.1145/3459637.3482200","url":null,"abstract":"Representation learning on static graph-structured data has shown a significant impact on many real-world applications. However, less attention has been paid to the evolving nature of temporal networks, in which the edges are often changing over time. The embeddings of such temporal networks should encode both graph-structured information and the temporally evolving pattern. Existing approaches in learning temporally evolving network representations fail to capture the temporal interdependence. In this paper, we propose Toffee, a novel approach for temporal network representation learning based on tensor decomposition. Our method exploits the tensor-tensor product operator to encode the cross-time information, so that the periodic changes in the evolving networks can be captured. Experimental results demonstrate that Toffee outperforms existing methods on multiple real-world temporal networks in generating effective embeddings for the link prediction tasks.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":" ","pages":"3313-3317"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9652776/pdf/nihms-1846391.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40704234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
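The Toffee abstract relies on the tensor-tensor product (t-product) to mix information across time. The sketch below implements only that operator, in its common block-circulant form, on third-order tensors stored as lists of frontal slices. It assumes time is the third mode; how Toffee builds and factorizes its tensor is not described in the abstract, so nothing beyond the operator is shown, and the helper names are illustrative.

```python
from typing import List

Matrix = List[List[float]]
Tensor = List[Matrix]  # list of n3 frontal slices, each an n1 x n2 matrix


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matadd(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def t_product(A: Tensor, B: Tensor) -> Tensor:
    """t-product C = A * B of third-order tensors.

    Frontal slice k of C is sum_j A[(k - j) mod n3] @ B[j], which equals
    fold(bcirc(A) @ unfold(B)). Every output slice therefore mixes all
    input slices, coupling information across the third (time) mode.
    """
    n3 = len(A)
    if len(B) != n3:
        raise ValueError("A and B must have the same number of frontal slices")
    C = []
    for k in range(n3):
        acc = matmul(A[k], B[0])  # j = 0 term: A[(k - 0) mod n3] = A[k]
        for j in range(1, n3):
            acc = matadd(acc, matmul(A[(k - j) % n3], B[j]))
        C.append(acc)
    return C
```

With a single frontal slice the t-product reduces to an ordinary matrix product; with 1 x 1 slices it becomes a circular convolution along the third mode, which is one way to see why it can capture periodic temporal structure.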
{"title":"Subsampled Randomized Hadamard Transform for Regression of Dynamic Graphs","authors":"M. H. Chehreghani","doi":"10.1145/3340531.3412158","DOIUrl":"https://doi.org/10.1145/3340531.3412158","url":null,"abstract":"A well-known problem in data science and machine learning is linear regression, which has recently been extended to dynamic graphs. Existing exact algorithms for updating solutions of dynamic graph regression require at least linear time (in terms of n, the number of nodes of the graph). However, this time complexity might be intractable in practice. In this paper, we utilize the subsampled randomized Hadamard transform to propose a randomized algorithm for dynamic graphs. Suppose that we are given an n×m matrix embedding M of the graph, where m ≤ n. Let r be the number of samples required for a guaranteed approximation error, which is a sublinear function of n. After an edge insertion or an edge deletion in the graph, our algorithm updates the approximate solution in O(rm) time.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"10 1","pages":"2045-2048"},"PeriodicalIF":0.0,"publicationDate":"2020-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78563697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
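The record above builds on the subsampled randomized Hadamard transform (SRHT), the sketch S = sqrt(n/r) · P · H · D, where D is a random ±1 diagonal, H the normalized Walsh-Hadamard matrix, and P a uniform sample of r rows. A minimal pure-Python sketch of the transform alone follows, assuming n is a power of two and rows are sampled without replacement. The paper's O(rm) update after an edge insertion or deletion is not reproduced, and `fwht`/`srht` are illustrative names.

```python
import math
import random
from typing import List


def fwht(x: List[float]) -> List[float]:
    """Normalized fast Walsh-Hadamard transform; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = list(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b  # butterfly step
        h *= 2
    scale = 1 / math.sqrt(n)  # makes H orthogonal (and its own inverse)
    return [v * scale for v in y]


def srht(columns: List[List[float]], r: int, seed: int = 0) -> List[List[float]]:
    """Apply one SRHT sketch S = sqrt(n/r) * P * H * D to each column.

    `columns` holds length-n vectors, e.g. the m columns of an n x m matrix.
    The same signs D and row sample P are reused for every column, so S is
    a single r x n linear map applied column by column.
    """
    n = len(columns[0])
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # diagonal of D
    rows = rng.sample(range(n), r)                       # P: r distinct rows
    factor = math.sqrt(n / r)
    sketched = []
    for col in columns:
        mixed = fwht([s * v for s, v in zip(signs, col)])   # H D col
        sketched.append([factor * mixed[i] for i in rows])  # sqrt(n/r) P H D col
    return sketched
```

The random signs spread the mass of any vector evenly across coordinates before sampling, which is why a small uniform row sample can preserve norms approximately. With r = n the sketch is an exact rotation, a handy sanity check.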
{"title":"Hierarchical Active Learning with Overlapping Regions.","authors":"Zhipeng Luo, Milos Hauskrecht","doi":"10.1145/3340531.3412022","DOIUrl":"https://doi.org/10.1145/3340531.3412022","url":null,"abstract":"Learning classification models from real-world data often requires substantial human effort devoted to instance annotation. As this process can be very time-consuming and costly, finding effective ways to reduce the annotation cost becomes critical for building such models. To address this problem, we explore a new type of human feedback: region-based feedback. Briefly, a region is defined as a hypercubic subspace of the input data space and represents a subpopulation of data instances; the region's label is a human assessment of the class proportion of the data subpopulation. By using learning from label proportions algorithms, one can learn instance-based classifiers from such labeled regions. In general, the key challenge is that there can be infinitely many regions one can define and query in a given data space. To minimize the number and complexity of region-based queries, we propose and develop a hierarchical active learning solution that aims at incrementally building a concise hierarchy of regions. Furthermore, to avoid building a possibly class-irrelevant region hierarchy, we propose to grow multiple different hierarchies in parallel and expand the more informative ones. Through experiments on numerous data sets, we demonstrate that methods using region-based feedback can learn very good classifiers from very few and simple queries, and hence are highly effective in reducing the human annotation effort needed for building classification models.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"2020 ","pages":"1045-1054"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3340531.3412022","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38632888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Antonio Mallia, Michal Siedlaczek, Torsten Suel, M. Zahran
{"title":"GPU-Accelerated Decoding of Integer Lists","authors":"Antonio Mallia, Michal Siedlaczek, Torsten Suel, M. Zahran","doi":"10.1145/3357384.3358067","DOIUrl":"https://doi.org/10.1145/3357384.3358067","url":null,"abstract":"An inverted index is the basic data structure used in most current large-scale information retrieval systems. It can be modeled as a collection of sorted sequences of integers. Many compression techniques for inverted indexes have been studied in the past, with some of them reaching tremendous decompression speeds through the use of SIMD instructions available on modern CPUs. While there has been some work on query processing algorithms for Graphics Processing Units (GPUs), little of it has focused on how to efficiently access compressed index structures, and we see some potential for significant improvements in decompression speed. In this paper, we describe and implement two encoding schemes for index decompression on GPU architectures. Their format and decoding algorithm are adapted from existing CPU-based compression methods to exploit the execution model and memory hierarchy offered by GPUs. We show that our solutions, GPU-BP and GPU-VByte, achieve significant speedups over their already carefully optimized CPU counterparts.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"19 30 1","pages":"2193-2196"},"PeriodicalIF":0.0,"publicationDate":"2019-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78160482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
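GPU-VByte is described as an adaptation of an existing CPU scheme. For orientation, here is a minimal sketch of classic CPU-style variable-byte (VByte) coding of a sorted posting list with delta (d-gap) encoding. The byte convention (7 payload bits per byte, high bit set on every byte except the last one of a value) is one common choice and an assumption here; this is not the GPU-VByte layout from the paper.

```python
from typing import List


def vbyte_encode(sorted_ids: List[int]) -> bytes:
    """Delta-encode a sorted list, then write each gap in variable-byte form.

    Each byte carries 7 payload bits (least significant group first); the
    high bit is set on every byte except the last byte of a value.
    """
    out = bytearray()
    prev = 0
    for doc_id in sorted_ids:
        gap = doc_id - prev
        if gap < 0:
            raise ValueError("input must be sorted in non-decreasing order")
        prev = doc_id
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # more bytes of this gap follow
            gap >>= 7
        out.append(gap)                      # final byte: high bit clear
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Inverse of vbyte_encode: rebuild each gap, then prefix-sum gaps into IDs."""
    ids: List[int] = []
    value, shift, prev = 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += value
            ids.append(prev)
            value, shift = 0, 0
    return ids


postings = [3, 7, 200, 20000]
encoded = vbyte_encode(postings)  # gaps 3, 4, 193, 19800 -> 1 + 1 + 2 + 3 bytes
```

The byte-at-a-time, data-dependent loop in `vbyte_decode` is exactly what makes VByte hard to parallelize naively, which is the kind of format and decoding issue a GPU adaptation has to address.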
Jing Ma, Qiuchen Zhang, Jian Lou, Joyce C Ho, Li Xiong, Xiaoqian Jiang
{"title":"Privacy-Preserving Tensor Factorization for Collaborative Health Data Analysis.","authors":"Jing Ma, Qiuchen Zhang, Jian Lou, Joyce C Ho, Li Xiong, Xiaoqian Jiang","doi":"10.1145/3357384.3357878","DOIUrl":"https://doi.org/10.1145/3357384.3357878","url":null,"abstract":"Tensor factorization has been demonstrated as an efficient approach for computational phenotyping, where massive electronic health records (EHRs) are converted to concise and meaningful clinical concepts. While distributing the tensor factorization tasks to local sites can avoid direct data sharing, it still requires the exchange of intermediary results which could reveal sensitive patient information. Therefore, the challenge is how to jointly decompose the tensor under rigorous and principled privacy constraints while still supporting the model's interpretability. We propose DPFact, a privacy-preserving collaborative tensor factorization method for computational phenotyping using EHR. It embeds advanced privacy-preserving mechanisms with collaborative learning. Hospitals can keep their EHR database private but also collaboratively learn meaningful clinical concepts by sharing differentially private intermediary results. Moreover, DPFact handles heterogeneous patient populations using a structured sparsity term. In our framework, each hospital decomposes its local tensors and sends the updated intermediary results with output perturbation every several iterations to a semi-trusted server which generates the phenotypes. The evaluation on both real-world and synthetic datasets demonstrated that under strict privacy constraints, our method is more accurate and communication-efficient than state-of-the-art baseline methods.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"2019 ","pages":"1291-1300"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6940039/pdf/nihms-1052726.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37508089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ardavan Afshar, Ioakeim Perros, Evangelos E Papalexakis, Elizabeth Searles, Joyce Ho, Jimeng Sun
{"title":"COPA: Constrained PARAFAC2 for Sparse & Large Datasets.","authors":"Ardavan Afshar, Ioakeim Perros, Evangelos E Papalexakis, Elizabeth Searles, Joyce Ho, Jimeng Sun","doi":"10.1145/3269206.3271775","DOIUrl":"https://doi.org/10.1145/3269206.3271775","url":null,"abstract":"PARAFAC2 has demonstrated success in modeling irregular tensors, where the tensor dimensions vary across one of the modes. An example scenario is modeling treatments across a set of patients with varying numbers of medical encounters over time. Despite recent improvements on unconstrained PARAFAC2, its model factors are usually dense and sensitive to noise, which limits their interpretability. As a result, the following open challenges remain: a) various modeling constraints, such as temporal smoothness, sparsity and non-negativity, need to be imposed for interpretable temporal modeling and b) a scalable approach is required to support those constraints efficiently for large datasets. To tackle these challenges, we propose a COnstrained PARAFAC2 (COPA) method, which carefully incorporates optimization constraints such as temporal smoothness, sparsity, and non-negativity in the resulting factors. To efficiently support all those constraints, COPA adopts a hybrid optimization framework using alternating optimization and the alternating direction method of multipliers (AO-ADMM). As evaluated on large electronic health record (EHR) datasets with hundreds of thousands of patients, COPA achieves significant speedups (up to 36× faster) over prior PARAFAC2 approaches that only attempt to handle a subset of the constraints that COPA enables. Overall, our method outperforms all the baselines attempting to handle a subset of the constraints in terms of speed, while achieving the same level of accuracy. Through a case study on temporal phenotyping of medically complex children, we demonstrate how the constraints imposed by COPA reveal concise phenotypes and meaningful temporal profiles of patients. The clinical interpretation of both the phenotypes and the temporal profiles was confirmed by a medical expert.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"2018 ","pages":"793-802"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7472553/pdf/nihms-1619557.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38361347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Graph Embedding for Ranking Optimization in E-commerce.","authors":"Chen Chu, Zhao Li, Beibei Xin, Fengchao Peng, Chuanren Liu, Remo Rohs, Qiong Luo, Jingren Zhou","doi":"10.1145/3269206.3272028","DOIUrl":"https://doi.org/10.1145/3269206.3272028","url":null,"abstract":"Matching buyers with the most suitable sellers providing relevant items (e.g., products) is essential for e-commerce platforms to guarantee customer experience. This matching process is usually achieved through modeling inter-group (buyer-seller) proximity by e-commerce ranking systems. However, current ranking systems often match buyers with sellers of various qualities, and the mismatch is detrimental to not only buyers' level of satisfaction but also the platforms' return on investment (ROI). In this paper, we address this problem by incorporating intra-group structural information (e.g., buyer-buyer proximity implied by buyer attributes) into the ranking systems. Specifically, we propose Deep Graph Embedding (DEGREE), a deep learning based method, to exploit both inter-group and intra-group proximities jointly for structural learning. With a sparse filtering technique, DEGREE can significantly improve the matching performance while using fewer computation resources than alternative deep learning based methods. Experimental results demonstrate that DEGREE outperforms state-of-the-art graph embedding methods on real-world e-commerce datasets. In particular, our solution boosts the average unit price in purchases during an online A/B test by up to 11.93%, leading to better operational efficiency and shopping experience.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"2018 ","pages":"2007-2015"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3269206.3272028","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36867253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Personalized Predictive Framework for Multivariate Clinical Time Series via Adaptive Model Selection.","authors":"Zitao Liu, Milos Hauskrecht","doi":"10.1145/3132847.3132859","DOIUrl":"https://doi.org/10.1145/3132847.3132859","url":null,"abstract":"Building an accurate predictive model of clinical time series for a patient is critical for understanding the patient's condition, its dynamics, and optimal patient management. Unfortunately, this process is not straightforward. First, patient-specific variations are typically large, and population-based models derived or learned from many different patients are often unable to support accurate predictions for each individual patient. Moreover, the time series observed for one patient at any point in time may be too short to learn a high-quality patient-specific model from the patient's own data alone. To address these problems, we propose, develop and experiment with a new adaptive forecasting framework for building multivariate clinical time series models for a patient and for supporting patient-specific predictions. The framework relies on an adaptive model switching approach that at any point in time selects the most promising time series model out of a pool of many possible models, and consequently combines the advantages of the population, patient-specific and short-term individualized predictive models. We demonstrate that the adaptive model switching framework is a very promising approach to support personalized time series prediction, and that it is able to outperform predictions based on pure population and patient-specific models, as well as other patient-specific model adaptation strategies.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"2017 ","pages":"1169-1177"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3132847.3132859","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35704480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
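The framework above switches, at each point in time, to the most promising model from a pool. A minimal sketch of that switching idea follows, assuming the selection criterion is the lowest mean absolute one-step error over the last few observations of the patient's own (here univariate) series. The criterion, the window size, and the toy forecasters are hypothetical illustrations, not the paper's method.

```python
from typing import Callable, Dict, Sequence

Forecaster = Callable[[Sequence[float]], float]  # history -> next-value forecast


def select_model(models: Dict[str, Forecaster], history: Sequence[float],
                 window: int = 3) -> str:
    """Return the model with the lowest mean absolute one-step error over the
    last `window` observations (an assumed selection criterion)."""
    if len(history) < 2:
        return next(iter(models))  # nothing to score yet: use the first model
    start = max(1, len(history) - window)
    best_name, best_err = next(iter(models)), float("inf")
    for name, model in models.items():
        # Replay the recent past: forecast history[t] from history[:t].
        errors = [abs(model(history[:t]) - history[t])
                  for t in range(start, len(history))]
        err = sum(errors) / len(errors)
        if err < best_err:  # strict '<': ties keep the earlier model
            best_name, best_err = name, err
    return best_name


def switching_forecast(models: Dict[str, Forecaster], history: Sequence[float],
                       window: int = 3) -> float:
    """Forecast the next value with whichever model currently looks best."""
    return models[select_model(models, history, window)](history)


# Hypothetical pool: a fixed "population" level versus a last-value model.
pool: Dict[str, Forecaster] = {
    "population": lambda h: 5.0,
    "last_value": lambda h: h[-1],
}
series = [1.0, 2.0, 3.0, 4.0]
```

On `series`, the population model's recent errors average 2.0 while the last-value model's average 1.0, so the switch picks `"last_value"` and forecasts 4.0. Re-running the selection as new observations arrive is what lets such a scheme move between population-level and patient-specific behaviour.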
Gaurav Singh, Iain J Marshall, James Thomas, John Shawe-Taylor, Byron C Wallace
{"title":"A Neural Candidate-Selector Architecture for Automatic Structured Clinical Text Annotation.","authors":"Gaurav Singh, Iain J Marshall, James Thomas, John Shawe-Taylor, Byron C Wallace","doi":"10.1145/3132847.3132989","DOIUrl":"https://doi.org/10.1145/3132847.3132989","url":null,"abstract":"We consider the task of automatically annotating free texts describing clinical trials with concepts from a controlled, structured medical vocabulary. Specifically, we aim to build a model to infer distinct sets of (ontological) concepts describing complementary clinically salient aspects of the underlying trials: the populations enrolled, the interventions administered and the outcomes measured, i.e., the PICO elements. This important practical problem poses a few key challenges. One issue is that the output space is vast, because the vocabulary comprises many unique concepts. Compounding this problem, annotated data in this domain is expensive to collect and hence sparse. Furthermore, the outputs (sets of concepts for each PICO element) are correlated: specific populations (e.g., diabetics) will render certain intervention concepts likely (insulin therapy) while effectively precluding others (radiation therapy). Such correlations should be exploited. We propose a novel neural model that addresses these challenges. We introduce a Candidate-Selector architecture in which the model considers sets of candidate concepts for PICO elements, and assesses their plausibility conditioned on the input text to be annotated. This relies on a 'candidate set' generator, which may be learned or may rely on heuristics. A conditional discriminative neural model then jointly selects candidate concepts, given the input text. We compare the predictive performance of our approach to strong baselines, and show that it outperforms them. Finally, we perform a qualitative evaluation of the generated annotations by asking domain experts to assess their quality.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"2017 ","pages":"1519-1528"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5752318/pdf/nihms927025.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35714383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}