Companion Proceedings of the Web Conference 2021: Latest Articles

Characterizing Opinion Dynamics and Group Decision Making in Wikipedia Content Discussions
Companion Proceedings of the Web Conference 2021 Pub Date: 2021-04-19 DOI: 10.1145/3442442.3452354
Khandaker Tasnim Huq
Abstract: Wikipedia, the online encyclopedia, is a trusted source of knowledge for millions of individuals worldwide. As everyone can start a new article, it is often necessary to decide whether certain entries meet the standards for inclusion set forth by the community. These decisions (which are known as “Article for Deletion”, or AfD) are taken by groups of editors in a deliberative fashion, and are known for displaying a number of common biases associated to group decision making. Here, we present an analysis of 1,967,768 AfD discussions between 2005 and 2018. We perform a signed network analysis to capture the dynamics of agreement and disagreement among editors. We measure the preference of each editor for voting toward either inclusion or deletion. We further describe the evolution of individual editors and their voting preferences over time, finding four major opinion groups. Finally, we develop a predictive model of discussion outcomes based on latent factors. Our results shed light on an important, yet overlooked, aspect of curation dynamics in peer production communities, and could inform the design of improved processes of collective deliberation on the web.
Citations: 0
Parallelizing DNN Training on GPUs: Challenges and Opportunities
Companion Proceedings of the Web Conference 2021 Pub Date: 2021-04-19 DOI: 10.1145/3442442.3452055
Weizheng Xu, Youtao Zhang, Xulong Tang
Abstract: In recent years, Deep Neural Networks (DNNs) have emerged as a widely adopted approach in many application domains. Training DNN models is also becoming a significant fraction of the datacenter workload. Recent evidence has demonstrated that modern DNNs are becoming more complex and the size of DNN parameters (i.e., weights) is also increasing. In addition, a large amount of input data is required to train the DNN models to reach target accuracy. As a result, the training performance becomes one of the major challenges that limit DNN adoption in real-world applications. Recent works have explored different parallelism strategies (i.e., data parallelism and model parallelism) and used multi-GPUs in datacenters to accelerate the training process. However, naively adopting data parallelism and model parallelism across multiple GPUs can lead to sub-optimal executions. The major reasons are i) the large amount of data movement that prevents the system from feeding the GPUs with the required data in a timely manner (for data parallelism); and ii) low GPU utilization caused by data dependency between layers that placed on different devices (for model parallelism). In this paper, we identify the main challenges in adopting data parallelism and model parallelism on multi-GPU platforms. Then, we conduct a survey including recent research works targeting these challenges. We also provide an overview of our work-in-progress project on optimizing DNN training on GPUs. Our results demonstrate that simple-yet-effective system optimizations can further improve the training scalability compared to prior works.
Citations: 7
GOAT at the FinSim-2 task: Learning Word Representations of Financial Data with Customized Corpus
Companion Proceedings of the Web Conference 2021 Pub Date: 2021-04-19 DOI: 10.1145/3442442.3451385
Yulong Pei, Qian Zhang
Abstract: In this paper, we present our approaches for the FinSim 2021 Shared Task on Learning Semantic Similarities for the Financial Domain. The aim of the FinSim shared task is to automatically classify a given list of terms from the financial domain into the most relevant hypernym (or top-level) concept in an external ontology. Two different word representations have been compared in our study, i.e., customized word2vec provided by the shared task and FinBERT. We first create a customized corpus from the given prospectuses and relevant articles from Investopedia. Then we train the domain-specific word2vec embeddings using the customized data with customized word2vec and FinBERT as the initialized embeddings respectively. Our experimental results demonstrate that these customized word embeddings can effectively improve the classification performance and achieve better results than the direct utilization of the provided word embeddings. The class imbalance issue of the given data is also explored. We empirically study the classification performance by employing several different strategies for imbalanced classification problems. Our system ranks 2nd on both Average Accuracy and Mean Rank metrics.
Citations: 4
Do I Trust this Stranger? Generalized Trust and the Governance of Online Communities
Companion Proceedings of the Web Conference 2021 Pub Date: 2021-04-19 DOI: 10.1145/3442442.3452338
Jérôme Hergueux, Y. Algan, Y. Benkler, M. F. Morell
Abstract: Online peer production communities such as Wikipedia typically rely on a distinct class of users, called administrators, to enforce cooperation when good faith collaboration fails. Assessing one’s intentions is a complex task, however, especially when operating under time-pressure with a limited number of (costly to collect) cues. In such situations, individuals typically rely on simplifying heuristics to make decisions, at the cost of precision. In this paper, we hypothesize that administrators’ community governance policy might be influenced by general trust attitudes acquired mostly out of the Wikipedia context. We use a decontextualized online experiment to elicit levels of trust in strangers in a sample of 58 English Wikipedia administrators. We show that low-trusting admins exercise their policing rights significantly more (e.g., block about 81% more users than high trusting types on average). We conclude that efficiency gains might be reaped from the further development of tools aimed at inferring users’ intentions from digital trace data.
Citations: 1
EMR: Scalable Clustering of Big HR Data using Evolutionary MapReduce
Companion Proceedings of the Web Conference 2021 Pub Date: 2021-04-19 DOI: 10.1145/3442442.3453543
M. Bohlouli, Zhonghua He
Abstract: Nowadays, the volume and variety of generated data, how to process it and accordingly create value through scalable analytics are main challenges to industries and real-world practices such as talent analytics. For instance, large enterprises and job centres have to progress data intensive matching of job seekers to various job positions at the same time. In other words, it should result in the large scale assignment of best-fit (right) talents (Person) with right expertise (Profession) to the right job (Position) at the right time (Period). We call this definition as a 4P rule in this paper. All enterprises should consider 4P rule in their daily recruitment processes towards efficient workforce development strategies. Such consideration demands integrating large volumes of disparate data from various sources and strongly needs the use of scalable algorithms and analytics. The diversity of the data in human resource management requires speeding up analytical processes. The main challenge here is not only how and where to store the data, but also the analysing it towards creating value (knowledge discovery). In this paper, we propose a generic Career Knowledge Representation (CKR) model in order to be able to model most competences that exist in a wide variety of careers. A regenerated job qualification data of 15 million employees with 84 dimensions (competences) from real HRM data has been used in test and evaluation of proposed Evolutionary MapReduce K-Means method in this research. This proposed EMR method shows faster and more accurate experimental results in comparison to similar approaches and has been tested with real large scale datasets and achieved results are already discussed.
Citations: 0
Assessing the quality of health-related Wikipedia articles with generic and specific metrics
Companion Proceedings of the Web Conference 2021 Pub Date: 2021-04-19 DOI: 10.1145/3442442.3452355
Luis Couto, C. Lopes
Abstract: Wikipedia is an online, free, multi-language, and collaborative encyclopedia, currently one of the most significant information sources on the web. The open nature of Wikipedia contributions raises concerns about the quality of its information. Previous studies have addressed this issue using manual evaluations and proposing generic measures for quality assessment. In this work, we focus on the quality of health-related content. For this purpose, we use general and health-specific features from Wikipedia articles to propose health-specific metrics. We evaluate these metrics using a set of Wikipedia articles previously assessed by WikiProject Medicine. We conclude that it is possible to combine generic and specific metrics to determine health-related content’s information quality. These metrics are computed automatically and can be used by curators to identify quality issues. Along with the explored features, these metrics can also be used in approaches that automatically classify the quality of Wikipedia health-related articles.
Citations: 3
DL Inference and Training Optimization Towards Speed and Scale
Companion Proceedings of the Web Conference 2021 Pub Date: 2021-04-19 DOI: 10.1145/3442442.3452297
Minjia Zhang
Abstract: The application of deep learning models presents significant improvement to many services and products in Microsoft. However, it is challenging to provide efficient computation and memory capabilities for both DNN workload inference and training given that the model size and complexities keep increasing. From the serving aspect, many DL models suffer from long inference latency and high cost, preventing their deployment in production. On the training side, large-scale model training often requires complex refactoring of models and access to prohibitively expensive GPU clusters, which are not always accessible to many practitioners. We want to deliver solid solutions and systems while exploring the cutting-edge techniques to address these issues. In this talk, I will introduce our experience and lessons from designing and implementing optimizations for both DNN serving and training at large scale with remarkable compute and memory efficiency improvement and infrastructure cost reduction.
Citations: 0
FAIR Linked Data - Towards a Linked Data Backbone for Users and Machines
Companion Proceedings of the Web Conference 2021 Pub Date: 2021-04-19 DOI: 10.1145/3442442.3451364
Johannes Frey, Sebastian Hellmann
Abstract: Although many FAIR principles could be fulfilled by 5-star Linked Open Data, the successful realization of FAIR poses a multitude of challenges. FAIR publishing and retrieval of Linked Data is still rather a FAIRytale than reality, for users and machines. In this paper, we give an overview on four major approaches that tackle individual challenges of FAIR data and present our vision of a FAIR Linked Data backbone. We propose 1) DBpedia Databus - a flexible, heavily automatable dataset management and publishing platform based on DataID metadata; that is extended by 2) the novel Databus Mods architecture which allows for flexible, unified, community-specific metadata extensions and (search/annotation) overlay systems; 3) DBpedia Archivo an archiving solution for unified handling and improvement of FAIRness for ontologies on publisher and consumer side; as well as 4) the DBpedia Global ID management and lookup services to cluster and discover equivalent entities and properties.
Citations: 11
taka at the FinSBD-3 task: Tables and Figures Extraction using Object Detection Techniques
Companion Proceedings of the Web Conference 2021 Pub Date: 2021-04-19 DOI: 10.1145/3442442.3451379
Tien-Dung Le
Abstract: FinSBD-3 is a shared task organized in the context of the 1st workshop on Financial Technology on the Web. The task focuses on extracting the entire structure of noisy PDF financial documents that include 1) sentences, lists, items, and organization of lists and items; 2) figures and tables; 3) headers and footers. This paper describes the approach that allows us to extract the figures and tables using their visual cues. We applied the object segmentation techniques in image processing to detect the location of figures and tables in the PDF files. A post-processing method is then executed in order to find exact content. The result shows the potential of this approach.
Citations: 1
Towards Open-domain Vision and Language Understanding with Wikimedia
Companion Proceedings of the Web Conference 2021 Pub Date: 2021-04-19 DOI: 10.1145/3442442.3452346
David Semedo
Abstract: Current state-of-the-art task-agnostic visio-linguistic approaches, such as ViLBERT [2], are limited to domains in which texts have a visual materialization (e.g. a person running). This work describes a project towards achieving the next generation of models, that can deal with open-domain media, and learn visio-linguistic representations that reflect data’s context, by jointly reasoning over media, a domain knowledge-graph and temporal context. This ambition will be leveraged by a Wikimedia data framework, comprised by comprehensive and high-quality data, covering a wide range of social, cultural, political and other type of events. Towards this goal, we propose a research setup comprised by an open-domain data framework and a set of novel independent research tasks.
Citations: 0