PhD '12 Latest Papers

Linking records in dynamic world
PhD '12 Pub Date: 2012-05-20 DOI: 10.1145/2213598.2213612
Pei Li
Abstract: In the real world, entities change dynamically, and the changes are captured in two dimensions: time and space. For data sets that contain temporal records, where each record is associated with a time stamp and describes some aspects of a real-world entity at that particular time, we often wish to identify records that describe the same entity over time, enabling interesting longitudinal data analysis. For data sets that contain geographically referenced data describing real-world entities at different locations (i.e., location entities), we wish to link those entities that belong to the same organization or network. However, existing record linkage techniques ignore the additional evidence in temporal and spatial data and can fall short in these cases.
This proposal studies linking temporal and spatial records. For temporal record linkage, we apply time decay to capture the effect of elapsed time on the evolution of entity values, and propose clustering methods that consider the time order of records. For linking location records, we distinguish between strong and weak evidence; for the former, we study core generation in the presence of erroneous data, and then leverage the discovered strong evidence to make the remaining decisions.
Citations: 2
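The time decay idea in the abstract above can be illustrated with a small sketch. It assumes an exponential decay with a hypothetical `half_life` parameter and a precomputed value similarity in [0, 1]; the proposal does not specify its decay function, so `disagreement_decay` and `decayed_similarity` are illustrative names, not the author's method. The point is that a disagreement between two records counts for less when a lot of time separates them, because the entity may simply have changed.

```python
import math


def disagreement_decay(delta_t: float, half_life: float) -> float:
    """Assumed probability that an entity has changed a value within delta_t.

    The exponential form 1 - 2**(-delta_t / half_life) and the half_life
    parameter are illustrative assumptions, not the proposal's decay model.
    """
    if delta_t < 0 or half_life <= 0:
        raise ValueError("delta_t must be >= 0 and half_life must be > 0")
    return 1.0 - 2.0 ** (-delta_t / half_life)


def decayed_similarity(value_sim: float, delta_t: float, half_life: float) -> float:
    """Soften the disagreement between two records' values by elapsed time.

    value_sim is a similarity in [0, 1]. The penalty for disagreeing
    (1 - value_sim) is scaled down as a value change becomes more likely,
    so records far apart in time with different values are not ruled out.
    """
    decay = disagreement_decay(delta_t, half_life)
    penalty = (1.0 - value_sim) * (1.0 - decay)
    return 1.0 - penalty


# The same mismatching value (similarity 0.2) one year vs. ten years apart.
print(decayed_similarity(0.2, delta_t=1.0, half_life=2.0))
print(decayed_similarity(0.2, delta_t=10.0, half_life=2.0))
```

Under these assumptions, a mismatch between records ten years apart is penalized far less than the same mismatch one year apart, which is what allows an evolving entity to still be linked.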
Holistic indexing: offline, online and adaptive indexing in the same kernel
PhD '12 Pub Date: 2012-05-20 DOI: 10.1145/2213598.2213604
E. Petraki
Abstract: Proper physical design is a critical issue for the performance of modern database systems and applications. Nowadays, a growing number of applications, e.g., social networks, scientific databases and multimedia databases, require the execution of dynamic and exploratory workloads with unpredictable characteristics that change over time. In addition, as most modern applications move into the big data era, investing time and resources in building the wrong set of indexes over large collections of data can severely affect performance.
Offline, online and adaptive indexing are three distinct approaches to automating physical design choices. Offline indexing is best in static environments with stable workloads. Online indexing is best in relatively dynamic environments where the query workload can be monitored. Adaptive indexing is best in fully dynamic environments where no idle time or workload knowledge may be assumed. We observe that these three approaches are complementary, yet none of them can satisfy the needs of modern applications in isolation.
We envision a new index selection approach, holistic indexing, that surpasses its predecessors by combining the best features of offline, online and adaptive indexing while overcoming their weaknesses. The main goal is a database kernel that autonomously creates partial indexes which are continuously refined during query processing, as in adaptive indexing, while the system also continuously detects opportunities to improve the physical design offline: whenever idle time occurs, it exploits knowledge gathered during query processing to refine existing indexes further or create new ones. We sketch the research space and the new challenges such a direction brings.
Citations: 1
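Adaptive indexing, one of the three approaches that holistic indexing combines, is commonly realized by database cracking (the abstract does not name a specific technique). The sketch below is a minimal single-column cracking example under simplifying assumptions: integer values, half-open range queries, and no updates. `CrackedColumn` is a hypothetical name, and this is not the holistic indexing kernel itself. Each query partitions only the pieces of the column that contain its bounds, so the column becomes progressively more ordered as queries arrive, with no upfront index build.

```python
import bisect
from typing import Iterable, List


class CrackedColumn:
    """Database-cracking sketch: a column copy that each range query
    physically reorganizes a little more, building an index as a side effect."""

    def __init__(self, values: Iterable[int]) -> None:
        self.data: List[int] = list(values)
        # Parallel sorted lists: for each pivot p at position k,
        # every value in data[:k] is < p and every value in data[k:] is >= p.
        self.pivots: List[int] = []
        self.positions: List[int] = []

    def _crack(self, pivot: int) -> int:
        """Partition only the piece that contains `pivot`; return its boundary."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]  # already cracked on this value
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.positions) else len(self.data)
        left, right = lo, hi - 1
        while left <= right:  # in-place two-pointer partition of data[lo:hi]
            if self.data[left] < pivot:
                left += 1
            else:
                self.data[left], self.data[right] = self.data[right], self.data[left]
                right -= 1
        self.pivots.insert(i, pivot)
        self.positions.insert(i, left)
        return left

    def range_query(self, low: int, high: int) -> List[int]:
        """Return all values v with low <= v < high (assumes low <= high)."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]


column = CrackedColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3, 14, 11, 8, 6])
print(sorted(column.range_query(5, 10)))  # → [6, 7, 8, 9]
print(column.pivots)  # → [5, 10]
```

In a holistic design as described in the abstract, idle time could be spent issuing such cracks proactively instead of waiting for queries; that policy is not shown here.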
Clustering techniques for open relation extraction
PhD '12 Pub Date: 2012-05-20 DOI: 10.1145/2213598.2213607
F. Mesquita
Abstract: This work investigates clustering techniques for Relation Extraction (RE), the task of extracting relationships among named entities (e.g., people, organizations and geo-political entities) from natural language text. We are particularly interested in the open RE scenario, where the number of target relations is too large or even unknown. Our contributions address two aspects of the clustering process: (1) extraction and weighting of features and (2) scalability. To evaluate our techniques at large scale, we propose an automatic evaluation method based on pointwise mutual information. Our preliminary results show that both our clustering techniques and our evaluation method are promising.
Citations: 11
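The abstract above mentions an automatic evaluation based on pointwise mutual information (PMI) but gives no formula. The sketch uses the standard definition, PMI(x, y) = log2(p(x, y) / (p(x) p(y))), estimated from raw co-occurrence counts. `cluster_label_score`, which averages the PMI between a cluster's label and its member entity pairs, is a hypothetical illustration of how PMI could score a cluster, not the paper's evaluation method, and the counts in the example are invented.

```python
import math
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def pmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information, log2(p(x, y) / (p(x) * p(y))), from raw counts.

    Positive values mean x and y co-occur more often than chance; 0 means
    independence; -inf is returned when they never co-occur.
    """
    if min(count_x, count_y, total) <= 0:
        raise ValueError("marginal counts and total must be positive")
    if count_xy == 0:
        return float("-inf")
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


def cluster_label_score(
    label: str,
    pairs: List[Pair],
    pair_label_counts: Dict[Tuple[Pair, str], int],
    pair_counts: Dict[Pair, int],
    label_counts: Dict[str, int],
    total: int,
) -> float:
    """Hypothetical cluster score: mean PMI between the label and each member pair."""
    scores = [
        pmi(pair_label_counts.get((pair, label), 0), pair_counts[pair], label_counts[label], total)
        for pair in pairs
    ]
    return sum(scores) / len(scores)


# Invented co-occurrence counts from a hypothetical corpus of 10,000 sentences.
pair_counts = {("Gates", "Microsoft"): 100, ("Jobs", "Apple"): 80}
label_counts = {"founder": 200}
pair_label_counts = {
    (("Gates", "Microsoft"), "founder"): 50,
    (("Jobs", "Apple"), "founder"): 8,
}
print(cluster_label_score("founder", list(pair_counts), pair_label_counts,
                          pair_counts, label_counts, 10_000))
```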
An adaptive event stream processing environment
PhD '12 Pub Date: 2012-05-20 DOI: 10.1145/2213598.2213613
Samujjwal Bhandari
Abstract: With the increasing application of Event Stream Processing (ESP) to event pattern detection, it has become important to extend existing ESP capabilities to deal with applications that have dynamic behavior. This dissertation research explores the limitations of current ESP systems that stem from their fixed pattern detection mechanisms and discusses the motivating ideas that call for enhancements to ESP. We propose a solution called adaptive ESP that explores, learns, and updates evolving patterns in dynamic applications. Developing adaptive ESP requires addressing several research issues, such as handling input data streams, enhancing event languages with probabilistic information, using machine learning algorithms, and processing feedback from experts. We discuss these issues together with the proposed system architecture, and present some initial work toward developing adaptive ESP.
Citations: 4
High performance spatial query processing for large scale scientific data
PhD '12 Pub Date: 2012-05-20 DOI: 10.1145/2213598.2213603
Ablimit Aji, Fusheng Wang
Abstract: Analyzing and querying large volumes of spatially derived data from scientific experiments has posed major challenges in the past decade. For example, the systematic analysis of imaged pathology specimens results in rich spatially derived information with GIS characteristics at cellular and sub-cellular scales, with nearly a million derived markups and a hundred million features per image. This provides critical information for evaluating experimental results, supporting biomedical studies, and pathology image based diagnosis. However, the vast amount of spatially oriented morphological information poses major challenges for analytical medical imaging. The major challenges I address include: i) How can we provide cost-effective, scalable spatial query support for medical imaging GIS? ii) How can we provide fast query responses on analytical imaging data to support biomedical research and clinical diagnosis? iii) How can we provide expressive queries to support spatial queries and spatial pattern discovery for end users? In my thesis, I work towards developing MIGIS, a MapReduce-based framework that supports expressive, cost-effective and high performance spatial queries. The framework includes a real-time spatial query engine, RESQUE, consisting of a variety of optimized access methods, boundary and density aware spatial data partitioning, a declarative query language interface, a query translator that automates the translation of spatial queries into MapReduce programs, and an execution engine that parallelizes and executes queries on Hadoop. Our preliminary experiments demonstrate that MIGIS is a cost-effective architecture that achieves high performance spatial query execution. MIGIS is extensible and can be adapted to support similar complex spatial queries for large scale spatial data in other scientific domains.
Citations: 23
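Boundary-aware partitioning, one of the components listed for MIGIS above, can be sketched with a uniform grid: each object is replicated into every tile it overlaps so that tiles can be processed independently, and duplicate results are removed afterwards. This is a generic technique offered for illustration. The uniform grid, the `tile_size` parameter, and the function names are assumptions, and the sketch ignores density awareness (which might, for example, split crowded tiles further).

```python
from typing import Dict, List, Set, Tuple

Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Tile = Tuple[int, int]


def grid_partition(rects: Dict[str, Rect], tile_size: float) -> Dict[Tile, List[str]]:
    """Assign each rectangle to every grid tile it overlaps; boundary objects are replicated."""
    tiles: Dict[Tile, List[str]] = {}
    for oid, (x0, y0, x1, y1) in rects.items():
        for tx in range(int(x0 // tile_size), int(x1 // tile_size) + 1):
            for ty in range(int(y0 // tile_size), int(y1 // tile_size) + 1):
                tiles.setdefault((tx, ty), []).append(oid)
    return tiles


def intersects(a: Rect, b: Rect) -> bool:
    """Closed axis-aligned rectangle overlap test."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def partitioned_intersection_join(
    left: Dict[str, Rect], right: Dict[str, Rect], tile_size: float
) -> Set[Tuple[str, str]]:
    """Join tile by tile (each tile could be one reduce task); the result set drops
    duplicate pairs that replication causes to be found in several tiles."""
    left_tiles = grid_partition(left, tile_size)
    right_tiles = grid_partition(right, tile_size)
    result: Set[Tuple[str, str]] = set()
    for tile, left_ids in left_tiles.items():
        for lid in left_ids:
            for rid in right_tiles.get(tile, []):
                if intersects(left[lid], right[rid]):
                    result.add((lid, rid))
    return result


nuclei = {"n1": (1, 1, 4, 4), "n2": (8, 8, 12, 12)}
regions = {"r1": (3, 3, 5, 5), "r2": (11, 11, 13, 13), "r3": (20, 20, 21, 21)}
print(sorted(partitioned_intersection_join(nuclei, regions, tile_size=10)))
# → [('n1', 'r1'), ('n2', 'r2')]
```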
Efficient optimization and processing for distributed monitoring and control applications
PhD '12 Pub Date: 2012-05-20 DOI: 10.1145/2213598.2213615
Mengmeng Liu
Abstract: In recent years, we have seen an increasing number of applications in networking, sensor networks, cloud computing, and environmental monitoring that aim to monitor, control, and make decisions over large volumes of dynamic data. In my dissertation, we aim to enable a generic framework for these distributed monitoring and control applications, and to address the limitations of prior work such as data stream management systems and adaptive query processing systems. In particular, we make the following contributions: 1) supporting the maintenance of recursive queries over distributed data streams, 2) enabling full-fledged cost-based incremental query re-optimization, and 3) as ongoing work, incorporating the cost estimation of plan switching during query re-optimization. Our solutions are implemented and evaluated using our prototype system Aspen over a variety of workloads and benchmarks. In addition, Aspen provides an end-to-end framework to support control and decision-making over integrated data streams from both the physical world (e.g., sensor streams) and the digital world (e.g., web, streams, databases).
Citations: 4
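Maintaining a recursive query over a stream of insertions, the first contribution listed above, can be illustrated with semi-naive evaluation of reachability (transitive closure): each new edge only triggers derivations that involve newly derived facts. This is a standard textbook technique shown for illustration, not Aspen's implementation. The class name is hypothetical, the sketch runs on a single node rather than over distributed streams, and it handles insertions only, since deletions need additional machinery.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Set, Tuple

Pair = Tuple[Hashable, Hashable]


class IncrementalReachability:
    """Maintains reach(x, y) :- edge(x, y).  reach(x, y) :- reach(x, z), edge(z, y).
    under edge insertions, propagating only newly derived facts (semi-naive)."""

    def __init__(self) -> None:
        self.succ: DefaultDict[Hashable, Set[Hashable]] = defaultdict(set)
        self.reach: Set[Pair] = set()

    def insert_edge(self, u: Hashable, v: Hashable) -> Set[Pair]:
        """Add edge u -> v and return the reach facts that became true."""
        if v in self.succ[u]:
            return set()
        self.succ[u].add(v)
        # Seed: the new edge itself, plus every known path ending at u extended by it.
        delta = {(u, v)} | {(x, v) for (x, y) in self.reach if y == u}
        delta -= self.reach
        added: Set[Pair] = set()
        while delta:
            self.reach |= delta
            added |= delta
            # Semi-naive step: join only the new facts with the edge relation.
            derived = {(x, w) for (x, z) in delta for w in self.succ.get(z, ())}
            delta = derived - self.reach
        return added


r = IncrementalReachability()
r.insert_edge("a", "b")
r.insert_edge("c", "d")
print(sorted(r.insert_edge("b", "c")))  # → [('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')]
```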
RecDB: towards DBMS support for online recommender systems
PhD '12 Pub Date: 2012-05-20 DOI: 10.1145/2213598.2213608
Mohamed Sarwat
Abstract: Recommender systems have become popular in both commercial and academic settings. The main purpose of a recommender system is to suggest useful and interesting items or content (data) to users from a considerably large set of items. Traditional recommender systems do not take system issues (i.e., scalability and query efficiency) into account. In an age of staggering growth in web use and ever-popular social media applications (e.g., Facebook, Google Reader), users are expressing their opinions over a diverse set of data (e.g., news stories, Facebook posts, retail purchases) faster than ever. In this paper, we propose RecDB, a fully fledged database system that provides online recommendations to users. We implement RecDB using the existing open source database system Apache Derby, and we showcase the effectiveness of RecDB by adopting it inside Sindbad, a location-based social networking system developed at the University of Minnesota.
Citations: 3
Foundational aspects of semantic web optimization
PhD '12 Pub Date: 2012-05-20 DOI: 10.1145/2213598.2213611
Sebastian Skritek
Abstract: The goal of the Semantic Web is to make the information available on the web more easily accessible. Its idea is to provide machine-readable metadata that enables the development of tools that support users in finding relevant data.
The goal of the thesis is to shed some light on different foundational aspects of optimization tasks occurring in the field of the Semantic Web. Examples include redundancy elimination in RDF data and static analysis of (well-designed) SPARQL queries. Towards this goal, we have already contributed several results.
Citations: 0
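Redundancy elimination in RDF data, one of the examples above, can be illustrated by computing the core of a graph with blank nodes. A triple is redundant if some mapping of blank nodes to existing terms sends the whole graph into a strict subset of itself; a graph with no such mapping is lean. The brute-force sketch below assumes triples are plain string tuples with `_:`-prefixed blank nodes. It is exponential in the number of blank nodes and is not the algorithm from the thesis; the function names are hypothetical.

```python
from itertools import product
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]


def is_blank(term: str) -> bool:
    """Treat N-Triples-style labels such as "_:x" as blank nodes."""
    return term.startswith("_:")


def proper_endomorphism_image(graph: Set[Triple]) -> Optional[Set[Triple]]:
    """Brute force: find a blank-node mapping whose image is a proper subset of graph."""
    blanks = sorted({t for triple in graph for t in triple if is_blank(t)})
    terms = sorted({t for triple in graph for t in triple})
    for image in product(terms, repeat=len(blanks)):
        mapping = dict(zip(blanks, image))
        mapped = {tuple(mapping.get(t, t) for t in triple) for triple in graph}
        if mapped < graph:  # strictly smaller, so some triple was redundant
            return mapped
    return None


def core(graph: Set[Triple]) -> Set[Triple]:
    """Repeatedly shrink the graph until no proper endomorphism exists (a lean graph)."""
    g = set(graph)
    while (smaller := proper_endomorphism_image(g)) is not None:
        g = smaller
    return g


g = {("ex:alice", "ex:knows", "ex:bob"), ("ex:alice", "ex:knows", "_:someone")}
print(core(g))  # → {('ex:alice', 'ex:knows', 'ex:bob')}
```

The second triple only says "alice knows someone", which the first triple already implies, so the core drops it.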
Data quality and integration in collaborative environments
PhD '12 Pub Date: 2012-05-20 DOI: 10.1145/2213598.2213606
Gregor Endler
Abstract: The trend of merging medical practices into cooperatively operating networks and organizational units such as Medical Supply Centers creates new challenges for adequate IT support. In particular, new use cases arise for common economic planning, controlling, and treatment coordination. This requires consolidating data originating from heterogeneous and autonomous software systems. Heterogeneity and autonomy are core reasons for low data quality. The intuitive approach of first integrating the heterogeneous systems into a federated system requires a very high upfront effort before the system can become operable, and does not adequately consider that data quality requirements might change over time. To remedy this, we propose an approach for continuous data quality improvement that enables demand-driven, step-by-step system integration. By adapting the generic Total Data Quality Management process to healthcare-specific use cases, we are developing an extended model for continuous data quality management in cooperative healthcare settings. The IT tools needed to provide the information that drives this process are currently being developed within a government-supported project involving both industry and academia.
Citations: 2
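The measurement side of a continuous data quality process, as described above, can be sketched as computing per-source completeness and flagging attributes that miss a target; those flags would then drive the next integration or cleanup step. Everything here is an assumption for illustration: records as dictionaries with a `source` key, completeness as the only metric, and the field names and 0.9 target in the example. The abstract does not describe concrete metrics.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def completeness_by_source(
    records: Iterable[Mapping[str, object]], attributes: List[str]
) -> Dict[Tuple[str, str], float]:
    """Share of records per source system in which each attribute is filled in."""
    totals: Dict[str, int] = defaultdict(int)
    filled: Dict[Tuple[str, str], int] = defaultdict(int)
    for record in records:
        source = str(record["source"])
        totals[source] += 1
        for attr in attributes:
            # Missing keys, None and empty strings all count as "not filled".
            if record.get(attr) not in (None, ""):
                filled[(source, attr)] += 1
    return {
        (source, attr): filled[(source, attr)] / count
        for source, count in totals.items()
        for attr in attributes
    }


def below_target(scores: Dict[Tuple[str, str], float], target: float) -> List[Tuple[str, str]]:
    """(source, attribute) pairs whose completeness misses the target, to prioritize for improvement."""
    return sorted(key for key, score in scores.items() if score < target)


records = [
    {"source": "practice_a", "patient_id": "p1", "diagnosis": "J45"},
    {"source": "practice_a", "patient_id": "p2", "diagnosis": ""},
    {"source": "practice_b", "patient_id": None, "diagnosis": "E11"},
]
scores = completeness_by_source(records, ["patient_id", "diagnosis"])
print(below_target(scores, 0.9))  # → [('practice_a', 'diagnosis'), ('practice_b', 'patient_id')]
```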
Towards an extensible efficient event processing kernel
PhD '12 Pub Date: 2012-05-20 DOI: 10.1145/2213598.2213602
Mohammad Sadoghi
Abstract: The efficient processing of large collections of patterns (Boolean expressions, XPath queries, or continuous SQL queries) over data streams plays a central role in major data-intensive applications ranging from user-centric processing and personalization to real-time data analysis. On the one hand, emerging user-centric applications, including computational advertising and selective information dissemination, require determining and presenting to an end user only the most relevant content that is both user-consumable and suitable for the limited screen real estate of target (mobile) devices. We meet these user-centric requirements through novel high-dimensional indexing structures and (parallel) algorithms. On the other hand, applications in real-time data analysis, including computational finance and intrusion detection, must meet stringent subsecond processing requirements and provide high-frequency, low-latency event processing over data streams. We meet these real-time data analysis requirements by leveraging reconfigurable hardware (FPGAs) to sustain line-rate processing, exploiting unprecedented degrees of parallelism and potential for pipelining that are only available through custom-built, application-specific, low-level logic design. Finally, we conduct a comprehensive evaluation to demonstrate the superiority of our proposed techniques compared with state-of-the-art algorithms designed for event processing.
Citations: 3
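Matching an event against a large collection of Boolean expressions, the core task named in the abstract above, can be illustrated with the classic counting algorithm for conjunctions of equality predicates. An inverted index maps each (attribute, value) predicate to the subscriptions that use it, and a subscription matches when every one of its predicates is hit. This is a simple baseline shown for illustration, not the high-dimensional index structures or FPGA designs the abstract describes. `CountingMatcher` is a hypothetical name, and subscription ids are assumed to be unique.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List, Mapping, Set, Tuple


class CountingMatcher:
    """Counting-based matcher for conjunctions of equality predicates."""

    def __init__(self) -> None:
        # Inverted index: (attribute, value) -> ids of subscriptions using that predicate.
        self.index: DefaultDict[Tuple[str, Any], List[str]] = defaultdict(list)
        self.sizes: Dict[str, int] = {}

    def subscribe(self, sub_id: str, predicates: Mapping[str, Any]) -> None:
        """Register a conjunction such as {"symbol": "IBM", "side": "buy"}."""
        if not predicates:
            raise ValueError("a subscription needs at least one predicate")
        self.sizes[sub_id] = len(predicates)
        for attr, value in predicates.items():
            self.index[(attr, value)].append(sub_id)

    def match(self, event: Mapping[str, Any]) -> Set[str]:
        """Return ids of subscriptions whose predicates are all satisfied by event."""
        hits: DefaultDict[str, int] = defaultdict(int)
        for attr, value in event.items():
            for sub_id in self.index.get((attr, value), ()):
                hits[sub_id] += 1
        return {sub_id for sub_id, n in hits.items() if n == self.sizes[sub_id]}


m = CountingMatcher()
m.subscribe("s1", {"symbol": "IBM", "side": "buy"})
m.subscribe("s2", {"symbol": "IBM"})
m.subscribe("s3", {"symbol": "MSFT"})
print(sorted(m.match({"symbol": "IBM", "side": "buy", "qty": 100})))  # → ['s1', 's2']
```

The cost of a match depends on how many subscriptions share the event's predicates rather than on the total number of subscriptions, which is why inverted indexes are a common starting point for this problem.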