Proceedings of the International Workshop on Semantic Big Data: Latest Articles

SPARTI
Proceedings of the International Workshop on Semantic Big Data Pub Date : 2018-06-10 DOI: 10.1145/3208352.3208356
Amgad Madkour, Walid G. Aref, Ahmed M. Aly
Abstract: Semantic data is an integral component for search engines that provide answers beyond simple keyword-based matches. Resource Description Framework (RDF) provides a standardized and flexible graph model for representing semantic data. The astronomical growth of RDF data raises the need for scalable RDF management strategies. Although cloud-based systems provide a rich platform for managing large-scale RDF data, the shared storage provided by these systems introduces several performance challenges, e.g., disk I/O and network shuffling overhead. This paper investigates SPARTI, a scalable RDF data management system. In SPARTI, the partitioning of the data is based on the join patterns found in the query workload. Initially, SPARTI vertically partitions the RDF data, and then incrementally updates the partitioning according to the workload, which improves the query performance of frequent join patterns. SPARTI utilizes a partitioning schema, termed SemVP, that enables the system to read a reduced set of rows instead of entire partitions. SPARTI proposes a budgeting mechanism with a cost model to determine the worthiness of partitioning. Using real and synthetic datasets, SPARTI is compared against a Spark-based state-of-the-art system and is shown to execute queries around half the time over all query shapes while maintaining around an order of magnitude enhancement in storage requirements.
Citations: 1
Extending Apache Spark with a Mediation Layer
Proceedings of the International Workshop on Semantic Big Data Pub Date : 2018-06-10 DOI: 10.1145/3208352.3208354
Dimitris Stripelis, Chrysovalantis Anastasiou, J. Ambite
Abstract: With the recent growth of data volumes in many disciplines of both industry and academia many new Big Data Management systems have emerged to provide scalable tools for efficient data storing, processing and analysis. However, most of these systems offer little support for efficiently integrating multiple external sources under a uniform schema and a single query access point, which greatly simplifies further analytics. In this work, we present Spark Mediator, a system that extends the logical data integration capabilities of Apache Spark. As a use case, we show the application of Spark Mediator to the integration of schizophrenia neuroimaging data and compare with previous data integration systems.
Citations: 3
Timestamp-based Integrity Proofs for Linked Data
Proceedings of the International Workshop on Semantic Big Data Pub Date : 2018-06-10 DOI: 10.1145/3208352.3208353
Andrew Sutton, Reza Samavi
Abstract: In this paper, we first investigate the state-of-the-art methods of generating cryptographic hashes that can be used as an integrity proof for RDF datasets. We then propose an efficient method of computing integrity proofs for Linked Data that constructs a sorted Merkle tree for growing RDF datasets based on timestamps (as a key) that are semantically extractable from the RDF dataset. We evaluate our method by comparing it to existing methods and investigating its resistance to common security threats.
Citations: 2
Stream WatDiv: A Streaming RDF Benchmark
Proceedings of the International Workshop on Semantic Big Data Pub Date : 2018-06-10 DOI: 10.1145/3208352.3208355
Libo Gao, Lukasz Golab, M. Tamer Özsu, Günes Aluç
Abstract: We present Stream WatDiv -- an open-source benchmark for streaming RDF data management systems. The proposed benchmark extends the existing WatDiv benchmark, and includes a streaming data generator, a query generator that can produce a diverse set of SPARQL queries, and a testbed to monitor correctness and latency. We use Stream WatDiv to evaluate two popular streaming RDF engines: C-SPARQL and CQUELS. With the diverse set of queries that can be generated by Stream WatDiv, we demonstrate new insights into the behaviour and performance of these systems.
Citations: 10
TrueWeb
Proceedings of the International Workshop on Semantic Big Data Pub Date : 2018-06-10 DOI: 10.1145/3208352.3208357
Amgad Madkour, Walid G. Aref, Sunil Prabhakar, Mohamed S. Ali, Siarhei Bykau
Abstract: We envision a responsible web environment, termed TrueWeb, where a user should be able to find out whether any sentence he or she encounters in the web is true or false. The user should be able to track the provenance of any sentence or paragraph in the web. The target of TrueWeb is to compose factual knowledge from Internet resources about any subject of interest and present the collected knowledge in chronological order and distribute facts spatially and temporally as well as assign some belief factor for each fact. Another important target of TrueWeb is to be able to identify whether a statement in the Internet is true or false. The aim is to create an Internet infrastructure that, for each piece of published information, will be able to identify the truthfulness (or the degree of truthfulness) of that piece of information.
Citations: 2
Using semantic web technologies to power LungMAP, a molecular data repository
Proceedings of the International Workshop on Semantic Big Data Pub Date : 2017-05-19 DOI: 10.1145/3066911.3066916
Michelle C. Krzyzanowski, Josh Levy, G. Page, N. Gaddis, R. Clark
Abstract: As scientific research evolves, data continue to grow at an exponential rate. This growth calls for a need for more data repositories to store the data, and the creation of additional centralized repositories to provide standards for researchers. Common data repositories allow for collaboration and easier sharing of data, critical for further advancement of scientific understanding of a variety of topics. LungMAP (the Molecular Atlas of Lung Development) is an open-access reference resource that provides a comprehensive molecular atlas of the normal developing lung in humans and mice and provides data and reagents to the research community. The database utilizes RDF, SPARQL, and OWL. LungMAP exemplifies the use of semantic web technologies to provide a collaborative and open access data application for the scientific research community.
Citations: 2
Extracting linked data from statistic spreadsheets
Proceedings of the International Workshop on Semantic Big Data Pub Date : 2017-05-19 DOI: 10.1145/3066911.3066914
Tien-Duc Cao, I. Manolescu, Xavier Tannier
Abstract: Statistic data is an important sub-category of open data; it is interesting for many applications, including but not limited to data journalism, as such data is typically of high quality, and reflects (under an aggregated form) important aspects of a society's life such as births, immigration, economy etc. However, such open data is often not published as Linked Open Data (LOD) limiting its usability. We provide a conceptual model for the open data comprised in statistics published by INSEE, the national French economic and societal statistics institute. Then, we describe a novel method for extracting RDF LOD, to populate an instance of this model. We used our method to produce RDF data out of 20k+ Excel spreadsheets, and our validation indicates a 91% rate of successful extraction.
Citations: 17
On data placement strategies in distributed RDF stores
Proceedings of the International Workshop on Semantic Big Data Pub Date : 2017-05-19 DOI: 10.1145/3066911.3066915
Daniel Janke, Steffen Staab, Matthias Thimm
Abstract: In the last years, scalable RDF stores in the cloud have been developed, where graph data is distributed over compute and storage nodes for scaling efforts of query processing and memory needs. One main challenge in these RDF stores is the data placement strategy that can be formalized in terms of graph covers. These graph covers determine whether (a) different query results may be computed on several compute nodes in parallel (vertical parallelization) and (b) individual query results can be produced only from triples assigned to few --- ideally one --- storage node (horizontal containment). We analyse the impact of three most commonly used graph cover strategies in these terms and found out that balancing query workload reduces the query execution time more than reducing data transfer over network. To this end, we present our novel benchmark and open source evaluation platform.
Citations: 7
Evolution of anatomical concept usage over time: mining 200 years of biodiversity literature
Proceedings of the International Workshop on Semantic Big Data Pub Date : 2017-05-19 DOI: 10.1145/3066911.3066919
Prashanti Manda, T. Vision
Abstract: The scientific literature contains an historic record of the changing ways in which we describe the world. Shifts in understanding of scientific concepts are reflected in the introduction of new terms and the changing usage and context of existing ones. We conducted an ontology-based temporal data mining analysis of biodiversity literature from the 1700s to 2000s to quantitatively measure how the context of usage for vertebrate anatomical concepts has changed over time. The corpus of literature was divided into nine non-overlapping time periods with comparable amounts of data and context vectors of anatomical concepts were compared to measure the magnitude of concept drift both between adjacent time periods and cumulatively relative to the initial state. Surprisingly, we found that while anatomical concept drift between adjacent time periods was substantial (55% to 68%), it was of the same magnitude as cumulative concept drift across multiple time periods. Such a process, bound by an overall mean drift, fits the expectations of a mean-reverting process.
Citations: 0
A distributed graph approach for pre-processing linked RDF data using supercomputers
Proceedings of the International Workshop on Semantic Big Data Pub Date : 2017-05-19 DOI: 10.1145/3066911.3066913
M. Lewis, G. Thiruvathukal, V. Vishwanath, M. Papka, Andrew E. Johnson
Abstract: Efficient RDF, graph based queries are becoming more pertinent based on the increased interest in data analytics and its intersection with large, unstructured but connected data. Many commercial systems have adopted distributed RDF graph systems in order to handle increasing dataset sizes and complex queries. This paper introduces a distributed graph approach to pre-processing linked data. Instead of traversing the memory graph, our system indexes pre-processed join elements that are organized in a graph structure. We analyze the Dbpedia data-set (derived from the Wikipedia corpus) and compare our access method to the graph traversal access approach which we also devise. Results show from our experiments that the distributed, pre-processed graph approach to accessing linked data is faster than the traversal approach over a specific range of linked queries.
Citations: 1