Proceedings of the 30th ACM International Conference on Information & Knowledge Management最新文献

筛选
英文 中文
Jura
Zhengqi Xu, Yixuan Cao, Rongyu Cao, Guoxiang Li, Xuanqiang Liu, Yan Pang, Yangbin Wang, Jianfei Zhang, Allie Cheung, Matthew Tam, Lukas Petrikas, Ping Luo
{"title":"Jura","authors":"Zhengqi Xu, Yixuan Cao, Rongyu Cao, Guoxiang Li, Xuanqiang Liu, Yan Pang, Yangbin Wang, Jianfei Zhang, Allie Cheung, Matthew Tam, Lukas Petrikas, Ping Luo","doi":"10.1145/3459637.3481929","DOIUrl":"https://doi.org/10.1145/3459637.3481929","url":null,"abstract":"The initial public offering (IPO) market in Hong Kong is consistently one of the largest in the world. As part of its regulatory responsibilities, Hong Kong Exchanges and Clearing Limited (HKEX) reviews annual reports published by listed companies (issuers). The number of issuers has grown at a fast pace, reaching 2,538 as the end of 2020. This poses a challenge for manually reviewing these annual reports against the many diverse regulatory obligations (listing rules). We propose a system named Jura to improve the efficiency of annual report reviewing with the help of machine learning methods. This system checks the compliance of an issuer's published information against listing rules in four steps: panoptic document recognition, relevant passage location, fine-grained information extraction, and compliance assessment. This paper introduces in detail the passage location step, how it is critical for speeding up compliance assessment, and the various challenges faced. We argue that although a passage is a relatively independent unit, it needs to be combined with document structure and contextual information to accurately locate the relevant passages. With the help of Jura, HKEX reports saving 80% of the time on reviewing issuers' annual reports.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115061674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Scalable Contrast Pattern Mining over Data Streams 数据流上的可伸缩对比模式挖掘
E. Chavary, S. Erfani, C. Leckie
{"title":"Scalable Contrast Pattern Mining over Data Streams","authors":"E. Chavary, S. Erfani, C. Leckie","doi":"10.1145/3459637.3482174","DOIUrl":"https://doi.org/10.1145/3459637.3482174","url":null,"abstract":"Incremental contrast pattern mining (CPM) is an important task in various fields such as network traffic analysis, medical diagnosis, and customer behavior analysis. Due to increases in the speed and dimension of data streams, a major challenge for CPM is to deal with the huge number of generated candidate patterns. While there are some works on incremental CPM, their approaches are not scalable in dense and high dimensional data streams, and the problem of CPM over an evolving dataset is an open challenge. In this work we focus on extracting the most specific set of contrast patterns (CPs) to discover significant changes between two data streams. We devise a novel algorithm to extract CPs using previously mined patterns instead of generating all patterns in each window from scratch. Our experimental results on a wide variety of datasets demonstrate the advantages of our approach over the state of the art in terms of efficiency.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115089271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A Study of Explainability Features to Scrutinize Faceted Filtering Results 面过滤结果的可解释性特征研究
Jiaming Qu, Jaime Arguello, Yue Wang
{"title":"A Study of Explainability Features to Scrutinize Faceted Filtering Results","authors":"Jiaming Qu, Jaime Arguello, Yue Wang","doi":"10.1145/3459637.3482409","DOIUrl":"https://doi.org/10.1145/3459637.3482409","url":null,"abstract":"Faceted search systems enable users to filter results by selecting values along different dimensions or facets. Traditionally, facets have corresponded to properties of information items that are part of the document metadata. Recently, faceted search systems have begun to use machine learning to automatically associate documents with facet-values that are more subjective and abstract. Examples include search systems that support topic-based filtering of research articles, concept-based filtering of medical documents, and tag-based filtering of images. While machine learning can be used to infer facet-values when the collection is too large for manual annotation, machine-learned classifiers make mistakes. In such cases, it is desirable to have a scrutable system that explains why a filtered result is relevant to a facet-value. Such explanations are missing from current systems. In this paper, we investigate how explainability features can help users interpret results filtered using machine-learned facets. We consider two explainability features: (1) showing prediction confidence values and (2) highlighting rationale sentences that played an influential role in predicting a facet-value. We report on a crowdsourced study involving 200 participants. Participants were asked to scrutinize movie plot summaries predicted to satisfy multiple genres and indicate their agreement or disagreement with the system. Participants were exposed to four interface conditions. We found that both explainability features had a positive impact on participants' perceptions and performance. While both features helped, the sentence-highlighting feature played a more instrumental role in enabling participants to reject false positive cases. We discuss implications for designing tools to help users scrutinize automatically assigned facet-values.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"603 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116451393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
VidLife: A Dataset for Life Event Extraction from Videos VidLife:从视频中提取生活事件的数据集
Tai-Te Chu, An-Zi Yen, Wei-Hong Ang, Hen-Hsen Huang, Hsin-Hsi Chen
{"title":"VidLife: A Dataset for Life Event Extraction from Videos","authors":"Tai-Te Chu, An-Zi Yen, Wei-Hong Ang, Hen-Hsen Huang, Hsin-Hsi Chen","doi":"10.1145/3459637.3482022","DOIUrl":"https://doi.org/10.1145/3459637.3482022","url":null,"abstract":"Filming video blogs, which is shortened to vlog, becomes a popular way for people to record their life experiences in recent years. In this work, we present a novel task that is aimed at extracting life events from videos and constructing personal knowledge bases of individuals. In contrast to most existing researches in the field of computer vision that focus on identifying low-level script-like activities such as moving boxes, our goal is to extract life events where high-level activities like moving into a new house are recorded. The challenges to be tackled include: (1) identifying which objects in a given scene related to the life events of the protagonist we concern, and (2) determining the association between an extracted visual concept and a more high-level description of a video clip. To address the research issues, we construct a video life event extraction dataset VidLife by exploiting videos from the TV series The Big Bang Theory, in which the plot is around the daily lives of several characters. A pilot multitask learning model is proposed to extract life events given video clips and subtitles for storing in the personal knowledge base.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122323925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Seq2Bubbles
Qitian Wu, Chenxiao Yang, Shuodian Yu, Xiaofeng Gao, Guihai Chen
{"title":"Seq2Bubbles","authors":"Qitian Wu, Chenxiao Yang, Shuodian Yu, Xiaofeng Gao, Guihai Chen","doi":"10.1145/3459637.3482296","DOIUrl":"https://doi.org/10.1145/3459637.3482296","url":null,"abstract":"User behavior sequences contain rich information about user interests and are exploited to predict user's future clicking in sequential recommendation. Existing approaches, especially recently proposed deep learning models, often embed a sequence of clicked items into a single vector, i.e., a point in vector space, which suffer from limited expressiveness for complex distributions of user interests with multi-modality and heterogeneous concentration. In this paper, we propose a new representation model, named as Seq2Bubbles, for sequential user behaviors via embedding an input sequence into a set of bubbles each of which is represented by a center vector and a radius vector in embedding space. The bubble embedding can effectively identify and accommodate multi-modal user interests and diverse concentration levels. Furthermore, we design an efficient scheme to compute distance between a target item and the bubble embedding of a user sequence to achieve next-item recommendation. We also develop a self-supervised contrastive loss based on our bubble embeddings as an effective regularization approach. Extensive experiments on four benchmark datasets demonstrate that our bubble embedding can consistently outperform state-of-the-art sequential recommendation models.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122900019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
Adversarial Kernel Sampling on Class-imbalanced Data Streams 类不平衡数据流的对抗性核采样
Peng Yang, Ping Li
{"title":"Adversarial Kernel Sampling on Class-imbalanced Data Streams","authors":"Peng Yang, Ping Li","doi":"10.1145/3459637.3482227","DOIUrl":"https://doi.org/10.1145/3459637.3482227","url":null,"abstract":"This paper investigates online active learning in the setting of class-imbalanced data streams, where labels are allowed to be queried of with limited budgets. In this setup, conventional learning would be biased towards majority classes and consequently harm the performance. To address this issue, imbalance learning technique adopts both asymmetric losses and asymmetric queries to tackle the imbalance. Although this approach is effective, it may not guarantee the performance in an adversarial setting where the actual labels are unknown, and they may be chosen by the adversary To learn a promising hypothesis in class-imbalanced and adversarial environment, we propose an asymmetric min-max optimization framework for online classification. The derived algorithm can track the imbalance and bound the choices of an adversary simultaneously. Despite the promising result, this algorithm assumes that the label is provided for every input, while label is scare and labeling is expensive in real-world application. To this end, we design a confidence-based sampling strategy to query the informative labels within a budget. We theoretically analyze this algorithm in terms of mistake bound, and two asymmetric measures. Empirically, we evaluate the algorithms on multiple real-world imbalanced tasks. Promising results could be achieved on various application domains.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122814549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Learning to Augment Imbalanced Data for Re-ranking Models 学习为重新排序模型增加不平衡数据
Zimeng Qiu, Yingchun Jian, Qingguo Chen, Lijun Zhang
{"title":"Learning to Augment Imbalanced Data for Re-ranking Models","authors":"Zimeng Qiu, Yingchun Jian, Qingguo Chen, Lijun Zhang","doi":"10.1145/3459637.3482364","DOIUrl":"https://doi.org/10.1145/3459637.3482364","url":null,"abstract":"The conventional solution to learning to rank problems ranks individual documents by prediction scores greedily. Recent emerged re-ranking models, which take as input initial lists, aim to capture document interdependencies and directly generate the optimal ordered lists. Typically, a re-ranking model is learned from a set of labeled data, which can achieve favorable performance on average. However, it can be suboptimal for individual queries because the available training data is usually highly imbalanced. This problem is challenging due to the absence of informative data for some queries and furthermore, the lack of a good data augmentation policy. In this paper, we propose a novel method named Learning to Augment (LTA), which mitigates the imbalance issue through learning to augment the initial lists for re-ranking models. Specifically, we first design a data generation model based on Gaussian Mixture Variational Autoencoder (GMVAE) for generating informative data. GMVAE imposes a mixture of Gaussians on the latent space, which allows it to cluster queries in an unsupervised manner and then generate new data with different query types using the learned components. Then, to obtain a good augmentation strategy (instead of heuristics), we design a teacher model that consists of two intelligent agents to determine how to generate new data for a given list and how to rank both the raw data and generated data to produce augmented lists, respectively. The teacher model leverages the feedback from the re-ranking model to optimize its augmentation policy by means of reinforcement learning. Our method offers a general learning paradigm that is applicable to both supervised and reinforced re-ranking models. Experimental results on benchmark learning to rank datasets show that our proposed method can significantly improve the performance of re-ranking models.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114251621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
FairER
Vasilis Efthymiou, K. Stefanidis, E. Pitoura, V. Christophides
{"title":"FairER","authors":"Vasilis Efthymiou, K. Stefanidis, E. Pitoura, V. Christophides","doi":"10.1145/3459637.3482105","DOIUrl":"https://doi.org/10.1145/3459637.3482105","url":null,"abstract":"There is an urgent call to detect and prevent \"biased data\" at the earliest possible stage of the data pipelines used to build automated decision-making systems. In this paper, we are focusing on controlling the data bias in entity resolution (ER) tasks aiming to discover and unify records/descriptions from different data sources that refer to the same real-world entity. We formally define the ER problem with fairness constraints ensuring that all groups of entities have similar chances to be resolved. Then, we introduce FairER, a greedy algorithm for solving this problem for fairness criteria based on equal matching decisions. Our experiments show that FairER achieves similar or higher accuracy against two baseline methods over 7 datasets, while guaranteeing minimal bias.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"35 12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114276155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Will Sorafenib Help?: Treatment-aware Reranking in Precision Medicine Search 索拉非尼有帮助吗?:精准医学搜索中的治疗意识重排序
Maciej Rybiński, Sarvnaz Karimi
{"title":"Will Sorafenib Help?: Treatment-aware Reranking in Precision Medicine Search","authors":"Maciej Rybiński, Sarvnaz Karimi","doi":"10.1145/3459637.3482220","DOIUrl":"https://doi.org/10.1145/3459637.3482220","url":null,"abstract":"High-quality evidence from the biomedical literature is crucial for decision making of oncologists who treat cancer patients. Search for evidence on a specific treatment for a patient is the challenge set by the precision medicine track of TREC in 2020. To address this challenge, we propose a two-step method to incorporate treatment into the query formulation and ranking. Training of such ranking function uses a zero-shot setup to incorporate the novel focus on treatments which did not exist in any of the previous TREC tracks. Our treatment-aware neural reranking approach, FAT, achieves state-of-the-art effectiveness for TREC Precision Medicine 2020. Our analysis indicates that the BERT-based rerankers automatically learn to score documents through identifying concepts relevant to precision medicine, similar to hand-crafted heuristics successful in the earlier studies.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116738306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Misbeliefs and Biases in Health-Related Searches 健康相关搜索中的错误观念和偏见
Alexander Bondarenko, Ekaterina Shirshakova, M. Driker, Matthias Hagen, Pavel Braslavski
{"title":"Misbeliefs and Biases in Health-Related Searches","authors":"Alexander Bondarenko, Ekaterina Shirshakova, M. Driker, Matthias Hagen, Pavel Braslavski","doi":"10.1145/3459637.3482141","DOIUrl":"https://doi.org/10.1145/3459637.3482141","url":null,"abstract":"Quality of search engine results returned to health-related questions is very critical, since a searcher may directly trust any suggestion in the top results. We analyze search questions that mention diseases / symptoms and remedies that are potential health-related misbeliefs. Using lists of medical and alternative medicine terms, we extract health-related search questions from 1.5~billion questions submitted to Yandex. As an initial study, we sample 30 frequent questions that contain a disease--remedy pair like \"Can hepatitis be cured with milk thistle?\". For each question, we carefully identify a ground truth answer in the medical literature and annotate the top-10 Yandex search result snippets as confirming the belief, rejecting it, or giving no answer. Our analysis shows that about 44%~of the snippets (that users may simply interpret as definitive answers!) confirm some untrue beliefs and are wrong, and only few include health risk warnings about using toxic plants.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129564256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信