{"title":"Progressive Semantic Reasoning for Image Inpainting","authors":"J. Jin, Xinrong Hu, Kai He, Tao Peng, Junping Liu, Jie Yang","doi":"10.1145/3442442.3451142","DOIUrl":"https://doi.org/10.1145/3442442.3451142","url":null,"abstract":"Image inpainting aims to reconstruct the missing or unknown region for a given image. As one of the most important topics from image processing, this task has attracted increasing research interest over the past few decades. Learning-based methods have been employed to solve this task, and achieved superior performance. Nevertheless, existing methods often produce artificial traces, due to the lack of constraints on image characterization under different semantics. To accommodate this issue, we propose a novel artistic Progressive Semantic Reasoning (PSR) network in this paper, which is composed of three shared parameters from the generation network superposition. More precisely, the proposed PSR algorithm follows a typical end-to-end training procedure, that learns low-level semantic features and further transfers them to a high-level semantic network for inpainting purposes. Furthermore, a simple but effective Cross Feature Reconstruction (CFR) strategy is proposed to tradeoff semantic information from different levels. Empirically, the proposed approach is evaluated via intensive experiments using a variety of real-world datasets. The results confirm the effectiveness of our algorithm compared with other state-of-the-art methods. 
The source code can be found at https://github.com/sfwyly/PSR-Net.","PeriodicalId":129420,"journal":{"name":"Companion Proceedings of the Web Conference 2021","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130853493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How to See Smells: Extracting Olfactory References from Artworks","authors":"Mathias Zinnen","doi":"10.1145/3442442.3453710","DOIUrl":"https://doi.org/10.1145/3442442.3453710","url":null,"abstract":"Although being an essential part of how we experience the world, smell is severely undervalued in the context of cultural heritage. The Odeuropa project aims at preserving and recreating the olfactory heritage of Europe. State-of-the-art methods of artificial intelligence are applied to large corpora of visual and textual data ranging from the 16th to 20th century of European history to extract olfactory references. Creating an ontology of smells, this information is stored in the “European Olfactory Knowledge Graph (EOKG)” following standards of the semantic web. My Ph.D. addresses the visual extraction part of the project. We will create a taxonomy of visual smell references and acquire a large corpus of artworks from various early modern European digital collections. Using computer vision techniques, we will implement a pipeline for the combined recognition of olfactory objects, poses, and iconographies and annotate the images from our image corpus accordingly. Following these steps, we will address the following research questions: (i) What visual representations of smell exist in European 16th to 20th century works of art and how can these be represented in the EOKG as an ontology shared with the other work packages of the Odeuropa project? (ii) Which machine-learning techniques exist for the automated extraction of olfactory references in the visual arts? Particularly, which techniques are suited to cope with the domain shift problem when applying computer vision techniques to our field of research? (iii) How do the identified techniques perform in terms of established evaluation metrics? Which ones work best for the extraction of olfactory references?
Both the preservation of olfactory heritage [3] and the application of machine learning (ML) to cultural heritage [1] have been addressed before. However, in most cases machine learning algorithms are treated as “black boxes” and their application does not contribute back to ML [4]. Computer vision techniques like object detection and pose estimation have successfully been applied to the domain of visual arts ([8], [2]) but have not achieved performance comparable to their application in the photographic domain. One reason for the success of computer vision on photographs is the availability of huge labeled datasets like ImageNet [10]. Datasets containing artworks","PeriodicalId":129420,"journal":{"name":"Companion Proceedings of the Web Conference 2021","volume":"459 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132941490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EUDETECTOR: Leveraging Language Model to Identify EU-Related News","authors":"Koustav Rudra, Danny Tran, M. Shaltev","doi":"10.1145/3442442.3452324","DOIUrl":"https://doi.org/10.1145/3442442.3452324","url":null,"abstract":"News media reflects the present state of a country or region to its audiences. Media outlets of a region post different kinds of news for their local and global audiences. In this paper, we focus on Europe (precisely EU) and propose a method to identify news that has an impact on Europe from any aspect such as financial, business, crime, politics, etc. Predicting the location of the news is itself a challenging task. Most of the approaches restrict themselves towards named entities or handcrafted features. In this paper, we try to overcome that limitation i.e., instead of focusing only on the named entities (Europe location, politicians etc.) and some hand-crafted rules, we also explore the context of news articles with the help of pre-trained language model BERT. The auto-regressive language model based European news detector shows about 9-19% improvement in terms of F-score over baseline models. Interestingly, we observe that such models automatically capture named entities, their origin, etc; hence, no separate information is required. We also evaluate the role of such entities in the prediction and explore the tokens that BERT really looks at for deciding the news category. 
Entities such as person, location, organization turn out to be good rationale tokens for the prediction.","PeriodicalId":129420,"journal":{"name":"Companion Proceedings of the Web Conference 2021","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131859041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PolyU-CBS at the FinSim-2 Task: Combining Distributional, String-Based and Transformers-Based Features for Hypernymy Detection in the Financial Domain","authors":"Emmanuele Chersoni, Chu-Ren Huang","doi":"10.1145/3442442.3451387","DOIUrl":"https://doi.org/10.1145/3442442.3451387","url":null,"abstract":"In this contribution, we describe the systems presented by the PolyU CBS Team at the second Shared Task on Learning Semantic Similarities for the Financial Domain (FinSim-2), where participating teams had to identify the right hypernyms for a list of target terms from the financial domain. For this task, we ran our classification experiments with several distributional, string-based, and Transformer features. Our results show that a simple logistic regression classifier, when trained on a combination of word embeddings, semantic and string similarity metrics and BERT-derived probabilities, achieves a strong performance (above 90%) in financial hypernymy detection.","PeriodicalId":129420,"journal":{"name":"Companion Proceedings of the Web Conference 2021","volume":"97 7-8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133722488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FinSBD-2021: The 3rd Shared Task on Structure Boundary Detection in Unstructured Text in the Financial Domain","authors":"Willy Au, Abderrahim Ait-Azzi, Juyeon Kang","doi":"10.1145/3442442.3451378","DOIUrl":"https://doi.org/10.1145/3442442.3451378","url":null,"abstract":"Document processing is a foundational pre-processing task in natural language applications in the financial domain. In this paper, we present the results of FinSBD-3, the 3rd shared task on Structure Boundary Detection in unstructured text in the financial domain. The shared task is organized as part of the 1st Workshop on Financial Technology on the Web. Participants were asked to create systems that detect the boundaries of elements in unstructured text extracted from financial PDFs. This edition extends the previous shared tasks by adding boundaries of visual elements such as tables, figures, page headers and page footers, on top of the sentences, lists and list items that were already present in previous editions of the shared task.","PeriodicalId":129420,"journal":{"name":"Companion Proceedings of the Web Conference 2021","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114604947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling Text Data Over Time - Example on Job Postings","authors":"Jakob Jelencic","doi":"10.1145/3442442.3453707","DOIUrl":"https://doi.org/10.1145/3442442.3453707","url":null,"abstract":"Modelling multilingual text data over time is a challenging task. This PhD is focused on semantic representation of domain specific short to mid length time stamped textual data. The proposed method is evaluated on the example of job postings, where we are modeling demand on IT jobs. More specifically, we address the following three problems: unifying the representation of multilingual text data; clustering similar textual data; using the proposed semantic representation to model and predict future demand of jobs. This work starts with a problem statement, followed by a description of the proposed approach and methodology and is concluded with an overview of the first results and summary of the ongoing research.","PeriodicalId":129420,"journal":{"name":"Companion Proceedings of the Web Conference 2021","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114672476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HierClasSArt: Knowledge-Aware Hierarchical Classification of Scholarly Articles","authors":"Mehwish Alam, Russa Biswas, Yiyi Chen, D. Dessí, Genet Asefa Gesese, Fabian Hoppe, Harald Sack","doi":"10.1145/3442442.3451365","DOIUrl":"https://doi.org/10.1145/3442442.3451365","url":null,"abstract":"The huge number of scholarly articles published every day in different domains makes it hard for experts to organize and stay updated with the new research in a particular domain. This study gives an overview of a new approach, HierClasSArt, for knowledge-aware hierarchical classification of scholarly articles in mathematics into a predefined taxonomy. The method uses a combination of neural networks and Knowledge Graphs for better document representation along with the meta-data information. This position paper further discusses the open problems about incorporation of new articles and evolving hierarchies in the pipeline. The mathematics domain has been used as a use case.","PeriodicalId":129420,"journal":{"name":"Companion Proceedings of the Web Conference 2021","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114988027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inferring Sociodemographic Attributes of Wikipedia Editors: State-of-the-art and Implications for Editor Privacy","authors":"S. Brückner, F. Lemmerich, M. Strohmaier","doi":"10.1145/3442442.3452350","DOIUrl":"https://doi.org/10.1145/3442442.3452350","url":null,"abstract":"In this paper, we investigate the state-of-the-art of machine learning models to infer sociodemographic attributes of Wikipedia editors based on their public profile pages and corresponding implications for editor privacy. To build models for inferring sociodemographic attributes, ground truth labels are obtained via different strategies, using publicly disclosed information from editor profile pages. Different embedding techniques are used to derive features from editors’ profile texts. In comparative evaluations of different machine learning models, we show that the highest prediction accuracy can be obtained for the attribute gender, with precision values of 82% to 91% for women and men respectively, as well as an averaged F1-score of 0.78. For other attributes like age group, education, and religion, the utilized classifiers exhibit F1-scores in the range of 0.32 to 0.74, depending on the model class. By merely using publicly disclosed information of Wikipedia editors, we highlight issues surrounding editor privacy on Wikipedia and discuss ways to mitigate this problem. 
We believe our work can help start a conversation about carefully weighing the potential benefits and harms that come with the existence of information-rich, pre-labeled profile pages of Wikipedia editors.","PeriodicalId":129420,"journal":{"name":"Companion Proceedings of the Web Conference 2021","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117026841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explainable Demand Forecasting: A Data Mining Goldmine","authors":"Jože M. Rožanec","doi":"10.1145/3442442.3453708","DOIUrl":"https://doi.org/10.1145/3442442.3453708","url":null,"abstract":"Demand forecasting is a crucial component of demand management. Value is provided to the organization through accurate forecasts and insights into the reasons driving the forecasts to increase confidence and assist decision-making. In this Ph.D., we aim to develop state-of-the-art demand forecasting models for irregular demand, develop explainability mechanisms to avoid exposing models fine-grained information regarding the model features, create a recommender system to assist users in decision-making and develop mechanisms to enrich knowledge graphs with feedback provided by the users through artificial intelligence-powered feedback modules. We have already developed models for accurate forecasts regarding steady and irregular demand and an architecture to provide forecast explanations that preserve sensitive information regarding model features. These explanations highlight real-world events that provide insights on the general context captured through the dataset features while highlighting actionable items and suggesting datasets for future data enrichment.","PeriodicalId":129420,"journal":{"name":"Companion Proceedings of the Web Conference 2021","volume":"61 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114010392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automating Fairness Configurations for Machine Learning","authors":"Haipei Sun, Yiding Yang, Yanying Li, Huihui Liu, Xinchao Wang, Wendy Hui Wang","doi":"10.1145/3442442.3452301","DOIUrl":"https://doi.org/10.1145/3442442.3452301","url":null,"abstract":"Recent years have witnessed substantial efforts devoted to ensuring algorithmic fairness for machine learning (ML), spanning from formalizing fairness metrics to designing fairness-enhancing methods. These efforts lead to numerous possible choices in terms of fairness definitions and fairness-enhancing algorithms. However, finding the best fairness configuration (including both fairness definition and fairness-enhancing algorithms) for a specific ML task is extremely challenging in practice. The large design space of fairness configurations combined with the tremendous cost required for fairness deployment poses a major obstacle to this endeavor. This raises an important issue: can we enable automated fairness configurations for a new ML task on a potentially unseen dataset? To this point, we design Auto-Fair, a system that provides recommendations of fairness configurations by ranking all fairness configuration candidates based on their evaluations on prior ML tasks. At the core of Auto-Fair lies a meta-learning model that ranks all fairness configuration candidates by utilizing: (1) a set of meta-features that are derived from both datasets and fairness configurations that were used in prior evaluations; and (2) the knowledge accumulated from previous evaluations of fairness configurations on related ML tasks and datasets. 
The experimental results on 350 different fairness configurations and 1,500 data samples demonstrate the effectiveness of Auto-Fair.","PeriodicalId":129420,"journal":{"name":"Companion Proceedings of the Web Conference 2021","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124708042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}