Proceedings of the International Workshop on Semantic Big Data: Latest Articles

What is the schema of your knowledge graph?: leveraging knowledge graph embeddings and clustering for expressive taxonomy learning
Proceedings of the International Workshop on Semantic Big Data Pub Date : 2020-06-14 DOI: 10.1145/3391274.3393637
A. Zouaq, Félix Martel
{"title":"What is the schema of your knowledge graph?: leveraging knowledge graph embeddings and clustering for expressive taxonomy learning","authors":"A. Zouaq, Félix Martel","doi":"10.1145/3391274.3393637","DOIUrl":"https://doi.org/10.1145/3391274.3393637","url":null,"abstract":"Large-scale knowledge graphs have become prevalent on the Web and have demonstrated their usefulness for several tasks. One challenge associated to knowledge graphs is the necessity to keep a knowledge graph schema (which is generally manually defined) that accurately reflects the knowledge graph content. In this paper, we present an approach that extracts an expressive taxonomy based on knowledge graph embeddings, linked data statistics and clustering. Our results show that the learned taxonomy is not only able to retain original classes but also identifies new classes, thus giving an up-to-date view of the knowledge graph.","PeriodicalId":210506,"journal":{"name":"Proceedings of the International Workshop on Semantic Big Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123660850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Relaxing global-as-view in mediated data integration from linked data
Proceedings of the International Workshop on Semantic Big Data Pub Date : 2020-06-14 DOI: 10.1145/3391274.3393635
A. Adamou, M. d’Aquin
{"title":"Relaxing global-as-view in mediated data integration from linked data","authors":"A. Adamou, M. d’Aquin","doi":"10.1145/3391274.3393635","DOIUrl":"https://doi.org/10.1145/3391274.3393635","url":null,"abstract":"In scenarios where many different, independent and dynamic data sources need to be brought together, mediated data integration at runtime is rapidly gaining interest. In a global-as-view approach, schema mappings express how to get data from each data source according to the global schema of the mediator. Key issues include the effort required to include and map new data sources, and the very need of data sources for the global schema to be expressed. It has been argued that the principles of Linked Data can be used to spread the cost of adding new sources in a pay-as-you-go model. We contribute by describing a data integration framework able to mitigate these issues, by relating data sources under a global schema which is implicit and only partly known at the time a new data source joins. Mappings over a data source only require partial knowledge of it and of the part of the global schema that it will affect. Pay-as-you-go can then be employed to guarantee eventual schema compliance. This approach was adopted in a large-scale data integration system for Smart Cities, where it allowed short time-to-publish for new data and iterative schema refinements.","PeriodicalId":210506,"journal":{"name":"Proceedings of the International Workshop on Semantic Big Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134313783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Towards making sense of Spark-SQL performance for processing vast distributed RDF datasets
Proceedings of the International Workshop on Semantic Big Data Pub Date : 2020-06-14 DOI: 10.1145/3391274.3393632
Mohamed Ragab, Riccardo Tommasini, Sadiq Eyvazov, S. Sakr
{"title":"Towards making sense of Spark-SQL performance for processing vast distributed RDF datasets","authors":"Mohamed Ragab, Riccardo Tommasini, Sadiq Eyvazov, S. Sakr","doi":"10.1145/3391274.3393632","DOIUrl":"https://doi.org/10.1145/3391274.3393632","url":null,"abstract":"Recently, a wide range of Web applications (e.g. DBPedia, Uniprot, and Probase) are built on top of vast RDF knowledge bases and using the SPARQL query language. The continuous growth of these knowledge bases led to the investigation of new paradigms and technologies for storing, accessing, and querying RDF data. In practice, modern big data systems (e.g., Hadoop, Spark) can handle vast relational repositories, however, their application in the Semantic Web context is still limited. One possible reason is that such frameworks rely on distributed systems, which are good for relational data, however, their performance on dealing with graph data models like RDF has not been well-studied yet. In this paper, we present a systematic evaluation of the performance of SparkSQL engine for processing SPARQL queries. We stated it using three relevant RDF relational schemas, and two different storage backends, namely, Hive, and HDFS. In addition, we show the impact of using three different RDF-based partitioning techniques with our relational scenario. Additionally, we discuss the results of our experiments: (i) we present insights about the trade-offs that characterize different experimental configurations, and (ii) we identify the best and the worst ones for the SP2Bench's benchmark scenario.","PeriodicalId":210506,"journal":{"name":"Proceedings of the International Workshop on Semantic Big Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123491857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Ten ways of leveraging ontologies for natural language processing and its enterprise applications
Proceedings of the International Workshop on Semantic Big Data Pub Date : 2020-06-14 DOI: 10.1145/3391274.3393639
T. Erekhinskaya, D. Strebkov, Sujal Patel, Mithun Balakrishna, M. Tatu, D. Moldovan
{"title":"Ten ways of leveraging ontologies for natural language processing and its enterprise applications","authors":"T. Erekhinskaya, D. Strebkov, Sujal Patel, Mithun Balakrishna, M. Tatu, D. Moldovan","doi":"10.1145/3391274.3393639","DOIUrl":"https://doi.org/10.1145/3391274.3393639","url":null,"abstract":"In the last years, Artificial Intelligence and Deep Learning have matured from a fascinating research area to real-world applications across multiple domains. Enterprises adopt data-driven approaches for various use cases. With the increased adoption, such issues as governance of the models, deployment, scalability, reusability and maintenance are widely addressed on the engineering side, but not so much on the knowledge side. In this paper, we demonstrate 10 ways of leveraging ontology for Natural Language Processing. Specifically, we explore the usage of ontologies and related standards for labeling schema, configuration, providing lexical data, powering rule engine and automated generation of rules, as well as providing a standard output format. Additionally, we discuss three NLP-based applications: semantic search, question answering and natural language querying and show how they can benefit from ontology usage. The paper summarizes our experience of using ontology in a number of projects for medical, enterprise, financial, legal and security domains.","PeriodicalId":210506,"journal":{"name":"Proceedings of the International Workshop on Semantic Big Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134251831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Triag, a framework based on triangles of RDF triples
Proceedings of the International Workshop on Semantic Big Data Pub Date : 2020-06-14 DOI: 10.1145/3391274.3393634
Hubert Naacke, Olivier Curé
{"title":"Triag, a framework based on triangles of RDF triples","authors":"Hubert Naacke, Olivier Curé","doi":"10.1145/3391274.3393634","DOIUrl":"https://doi.org/10.1145/3391274.3393634","url":null,"abstract":"The success of RDF-based enterprise Knowledge Graphs partly depends on the efficiency to serve SPARQL queries over large datasets. This usually requires the optimization of a large number of joins between a query's triple patterns. A common solution to this problem is to index triples in several orders and to provide adapted query processing optimizations. In this paper, we extend this approach by proposing a framework that tackles a frequently encountered basic graph pattern: triangles. We present appropriate data structures to store these triangles, provide distributed algorithms to discover and materialize them (including inferred triangles), and detail query optimization techniques. Experimental results conducted over an Apache Spark implementation on two real-world RDF datasets emphasize the performance boost obtained with our approach.","PeriodicalId":210506,"journal":{"name":"Proceedings of the International Workshop on Semantic Big Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121702062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Automated ontology-based annotation of scientific literature using deep learning
Proceedings of the International Workshop on Semantic Big Data Pub Date : 2020-05-25 DOI: 10.1145/3391274.3393636
Prashanti Manda, S. SayedAhmed, S. Mohanty
{"title":"Automated ontology-based annotation of scientific literature using deep learning","authors":"Prashanti Manda, S. SayedAhmed, S. Mohanty","doi":"10.1145/3391274.3393636","DOIUrl":"https://doi.org/10.1145/3391274.3393636","url":null,"abstract":"Representing scientific knowledge using ontologies enables data integration, consistent machine-readable data representation, and allows for large-scale computational analyses. Text mining approaches that can automatically process and annotate scientific literature with ontology concepts are necessary to keep up with the rapid pace of scientific publishing. Here, we present deep learning models (Gated Recurrent Units (GRU) and Long Short Term Memory (LSTM)) combined with different input encoding formats for automated Named Entity Recognition (NER) of ontology concepts from text. The Colorado Richly Annotated Full Text (CRAFT) gold standard corpus was used to train and test our models. Precision, Recall, F-1, and Jaccard semantic similarity were used to evaluate the performance of the models. We found that GRU-based models outperform LSTM models across all evaluation metrics. Surprisingly, considering the top two probabilistic predictions of the model for each instance instead of the top one resulted in a substantial increase in accuracy. Inclusion of ontology semantics via subsumption reasoning yielded modest performance improvement.","PeriodicalId":210506,"journal":{"name":"Proceedings of the International Workshop on Semantic Big Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116621820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
QSGG
Proceedings of the International Workshop on Semantic Big Data Pub Date : 2020-05-25 DOI: 10.1145/3391274.3393638
S. Böttcher, Rita Hartel, S. Peeters
{"title":"QSGG","authors":"S. Böttcher, Rita Hartel, S. Peeters","doi":"10.1145/3391274.3393638","DOIUrl":"https://doi.org/10.1145/3391274.3393638","url":null,"abstract":"Like [1], we present QSGG, an algorithm to compute the simulation of a query pattern in a graph of labeled nodes and unlabeled edges. However, our algorithm, QSGG, works on a compressed graph grammar, instead of on the original graph. The speed-up of QSGG compared to a previous algorithm [1] grows with the size of the graph and with the compression strength of the grammar.","PeriodicalId":210506,"journal":{"name":"Proceedings of the International Workshop on Semantic Big Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126218929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SustainOnt: an ontology for defining an index of neighborhood sustainability across domains
Proceedings of the International Workshop on Semantic Big Data Pub Date : 2020-05-25 DOI: 10.1145/3391274.3393640
Vatricia Edgar, Cecilia La Place, Julia Schmidt, A. Bansal, S. Bansal
{"title":"SustainOnt: an ontology for defining an index of neighborhood sustainability across domains","authors":"Vatricia Edgar, Cecilia La Place, Julia Schmidt, A. Bansal, S. Bansal","doi":"10.1145/3391274.3393640","DOIUrl":"https://doi.org/10.1145/3391274.3393640","url":null,"abstract":"Massive amounts of data, both structured and unstructured, are available to be harvested for competitive business advantage, sound government policies, and new insights in a broad array of applications. This paper specifically focuses on extraction, integration, and querying of open data available about environmental sustainability. The global trend toward urbanization has created a need for residents of urban neighborhoods to better understand the factors impacting the social, environmental, and economic sustainability of an area. To date, there is no concise representation of all aspects of sustainability. This paper aims to fill this gap. A version of sustainability resting on economic, societal, and environmental development as the three main indicators was chosen to inform an ontology called SustainOnt used to organize and analyze relevant data from various sources. The newly-linked data is made available through a dual-platform application aimed at reaching a wide array of audiences. An initial prototype has been designed, using data for a small region, to provide a sustainability index of each city and/or neighborhood area that can be more accessible to people without the means to directly analyze the available data.","PeriodicalId":210506,"journal":{"name":"Proceedings of the International Workshop on Semantic Big Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121477117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Data placement strategies that speed-up distributed graph query processing
Proceedings of the International Workshop on Semantic Big Data Pub Date : 2020-04-05 DOI: 10.1145/3391274.3393633
Daniel Janke, Steffen Staab, Martin Leinberger
{"title":"Data placement strategies that speed-up distributed graph query processing","authors":"Daniel Janke, Steffen Staab, Martin Leinberger","doi":"10.1145/3391274.3393633","DOIUrl":"https://doi.org/10.1145/3391274.3393633","url":null,"abstract":"We consider the problem how to optimize the data distribution to improve the query performance in distributed RDF stores running on compute node clusters. When hash-based data distribution strategies are used, the query workload tends to be equally balanced among all compute nodes whereas graph-clustering-based approaches reduce the number of transferred intermediate results. Our hypothesis is that data distribution strategies that collocate entities in small sets of closely connected data items may be able to combine the advantages of both strategies. To investigate this hypothesis, we analyze two such data distribution strategies: 1. Overpartitioned minimal edge-cut cover. 2. Our novel molecule hash cover. Our analysis substantiates our hypothesis by explaining the causes for their good performance. Both strategies reduce query execution time on our set of test queries (between 5% and 98%). While overpartitioned minimal edge-cut cover fares best, when it can be computed, it may lack scalability for large datasets. Our novel molecule hash cover combines scalability and major improvements of query execution time against various baseline strategies.","PeriodicalId":210506,"journal":{"name":"Proceedings of the International Workshop on Semantic Big Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133749048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Proceedings of the International Workshop on Semantic Big Data
Proceedings of the International Workshop on Semantic Big Data Pub Date : 2018-06-10 DOI: 10.1145/3208352
{"title":"Proceedings of the International Workshop on Semantic Big Data","authors":"","doi":"10.1145/3208352","DOIUrl":"https://doi.org/10.1145/3208352","url":null,"abstract":"","PeriodicalId":210506,"journal":{"name":"Proceedings of the International Workshop on Semantic Big Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124479659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0