Adaptive and Cost-Effective Collection of High-Quality Data for Critical Infrastructure and Emergency Management in Smart Cities—Framework and Challenges
E. Bertino, M. Jahanshahi
{"title":"智慧城市关键基础设施和应急管理高质量数据的自适应和成本效益收集——框架和挑战","authors":"E. Bertino, M. Jahanshahi","doi":"10.1145/3190579","DOIUrl":null,"url":null,"abstract":"ing with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. © 2018 ACM 1936-1955/2018/05-ART1 $15.00 https://doi.org/10.1145/3190579 ACM Journal of Data and Information Quality, Vol. 10, No. 1, Article 1. Publication date: May 2018. 1:2 E. Bertino and M. R. Jahanshahi Fig. 1. A spatially incomplete object. agents (mobile phones, small drones, robots, sensors); 5G networks and edge computing processing [2]; crowdsourcing. In this article, we first briefly discuss relevant data quality requirements related to applications in the area of critical infrastructure and emergency management, although this framework can be extended to other applications. We then present a comprehensive framework for a real-time, adaptive, and cost-effective collection of high-quality data for such applications that leverage many of the above technologies, and elaborate on a few research challenges. 2 DATA QUALITY REQUIREMENTS Data quality is usually characterized by many different dimensions [3]. In our context, e.g., objects extracted from image data, key requirements include: —Spatial Completeness: The objects of interest should be “fully covered” by the image data. For example, an image reporting only half of a building crack would not have satisfactory spatial completeness (see Figure 1 for an example of a spatially incomplete object). —Temporal Completeness: The temporal evolution of the objects of interest should be covered as it is critical for accurate prediction. —Precision: The object images should be sharp and have high resolution. —Traceability: Information about the entire process, according to which data of interest was collected, processed, and transmitted, should be recorded; this is critical for identifying errors that lead to poor quality data about the objects of interest. —Minimality: The presence of non-relevant objects should be minimized. It is, however, important to remark that other quality requirements, such as currentness and consistency, are also relevant in our context. 3 DATA COLLECTION FRAMEWORK Our framework (see Figure 2) is based on two conceptual parties: data collection coordinator (referred to as base station (BS)); and data collectors (e.g., agents in charge of data gathering). The data collection coordinator is the interface system that coordinates the data acquisition tasks and data quality assessment. It interfaces on one side with the data users (e.g., end-users and applications) and on the other with data collectors. Given a data acquisition task and geographical area of interest, it allocates a number of data collectors, based on the capabilities of collectors, for the execution of the task, by also trying to optimize the cost of data acquisition and minimize ACM Journal of Data and Information Quality, Vol. 10, No. 1, Article 1. Publication date: May 2018. Adaptive and Cost-Effective Collection of High-Quality Data 1:3 Fig. 2. Data collection framework. the response time. Such allocation decisions can be basically supported by optimization techniques developed in the area of operation research. The main challenge is to determine the most suitable optimization techniques for dynamic contexts. 
The data collection coordinator must also assess the quality of the data with respect to specific quality requirements provided as input by the data users. Since a data collection task may often be split among data collectors, the coordinator may have to integrate the various collected data to see whether, overall, the data meets the specified quality requirements. The coordinator may also support data enrichment, for example, by using GIS data [5] and data linkage with other sources.

The data collectors carry out the basic tasks of collecting data, assessing the quality of the collected data, and, based on this assessment, collecting more data. Notice that data collectors may have different capabilities. For example, some collectors may have equipment for very high-resolution imagery and powerful computing capabilities, and may be able to run machine learning tools that require large storage and a GPU; these collectors may thus be able to perform a highly accurate data quality analysis. Other collectors are very small and can therefore easily move very close to the objects and take images from very short distances; however, their capability for data quality assessment may be very limited. Finally, other collectors may be equipped with mechanical devices to take samples from the environment, such as a sample of soil or water, or to perform active testing by injecting dynamic disturbances through their actuators at selected locations (e.g., exciting the structure with a hammer and collecting the propagated wave characteristics for damage detection). The decision about the right combination of data collectors for a data acquisition task is taken by the data collection coordinator based on its knowledge of the capabilities of each data collection device. However, as research in the area of distributed decision making for autonomous systems progresses, such decisions could even be taken autonomously by swarms of data collectors.

Our framework is based on the notion of a data collection cycle, which is organized as a continuous loop consisting of two main phases: (a) data collection; (b) data quality assessment. Once data is collected, it is assessed for quality. If the quality is insufficient, further data is collected. Further data collection is typically tailored to improve the quality; for example, a data collector may be required to collect data of higher resolution for a specific object. Data quality assessment is executed at three levels: (1) locally at the data collector; (2) collaboratively within the data collector swarms; (3) globally at the data collection coordinator. Assessments (1) and (2) may not always be possible. Assessment (1) may not be possible if the data collector does not have the capabilities to assess the data. Assessment (2) may not be possible if the swarm does not have the capabilities to assess the data or if a data collector is isolated from the rest of the swarm. However, Assessment (2) may be highly desirable when connections with the BS are fragmented or unreliable. Adaptation capabilities are thus crucial to deal with those situations. It is important to notice that a critical challenge is to develop approaches for automatically assessing the quality of collected data and automatically determining which additional data needs to be collected to refine, complete, or enhance the quality of the initial data.
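The data collection cycle can be summarized by a small control loop. The sketch below is an illustrative outline only; the QualityRequirements/QualityReport structures, the thresholds, and the collector and assessment callables are assumptions introduced for the example, not part of the framework's specification. Some assessment function (local, swarm-level, or at the BS) scores the collected data against the requirements, and the loop re-tasks collection with feedback until the requirements are met or a retry budget is exhausted.

```python
# Illustrative data collection cycle: collect, assess quality, and, if the
# quality is insufficient, collect again with feedback on what was missing.
# All structures and thresholds below are example assumptions.
from dataclasses import dataclass

@dataclass
class QualityRequirements:
    min_spatial_completeness: float  # fraction of the object covered, 0..1
    min_resolution_mpx: float        # minimum acceptable image resolution

@dataclass
class QualityReport:
    spatial_completeness: float
    resolution_mpx: float

    def satisfies(self, req: QualityRequirements) -> bool:
        return (self.spatial_completeness >= req.min_spatial_completeness
                and self.resolution_mpx >= req.min_resolution_mpx)

def collection_cycle(collect, assess, req: QualityRequirements, max_rounds: int = 5):
    """Run the collect/assess loop.

    `collect(req, feedback)` returns raw data; `feedback` is the previous
    QualityReport (None on the first round), so the collector can adapt
    (move closer, raise resolution, re-image the missing part of the object).
    `assess(data)` returns a QualityReport; it could run locally, within the
    swarm, or at the BS -- here it is simply a callable.
    """
    feedback, data = None, None
    for _ in range(max_rounds):
        data = collect(req, feedback)
        feedback = assess(data)
        if feedback.satisfies(req):
            return data, feedback
    return data, None  # quality target not reached within the retry budget

if __name__ == "__main__":
    # Toy stand-ins: a "collector" that doubles its resolution when re-tasked.
    def collect(req, feedback):
        return {"resolution": 6.0 if feedback is None else 12.0, "coverage": 1.0}
    def assess(data):
        return QualityReport(spatial_completeness=data["coverage"],
                             resolution_mpx=data["resolution"])
    print(collection_cycle(collect, assess,
                           QualityRequirements(min_spatial_completeness=0.9,
                                               min_resolution_mpx=10.0)))
```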
In particular, when data collection is performed by a swarm of data collectors, the swarm should automatically assess the data and decide on further data collection.

The development of such a framework requires addressing several challenges:

—Optimized data-quality-driven allocation of data collection tasks to agents: Data collectors are typically heterogeneous with respect to hardware and software capabilities and with respect to special equipment for data acquisition; for example, a drone may have equipment for acquiring images at very high resolution. Also, data collectors may be located in different geographical regions. Data collection also depends on the quality requirements; for example, when performing an initial assessment, data of low quality may be fine. Therefore, it is important to design approaches that are able to support the optimal allocation of data acquisition tasks based on different constraints, requirements, and data collectors’ capabilities and status. Furthermore, it is important that each data collector has the capability of autonomously deciding which data to collect based on its own local assessment of the data that has already been collected. Thus, the allocation of data collection tasks is a combination of centralized decisions with decisions local to the data collectors and/or data collector swarms.

—Automatic (collaborative) data quality assessment: Techniques are needed to automatically assess the quality of the collected data with respect to the specific quality requirements. Techniques based on machine learning are relevant here. The main issue is that such assessment may be carried out at three different levels (see the previous section), and thus a tradeoff may be needed between accuracy and resource usage. For example, at the level of the data collectors, resource usage should be minimized; however, minimizing resource use may lead to less accurate decisions. It is also critical to devise approaches by which such assessments can be carried out by data collector swarms. Finally, for assessments carried out at the BS level, it is important to determine the “optimal data transmission strategy,” namely whether the bulk data should be sent from the data collectors to the BS, or whether the data collectors should perform some local data reduction and then send the reduced data, based on the desired tradeoff between accuracy, communication costs, and the data collectors’ resource usage. We use the term data reduction here with a broad meaning, to indicate techniques that reduce the amount of data to be transmitted. Examples of such techniques include extracting features from images and sending only these features, discarding images that do not include objects of interest, discarding images of poor quality, and selecting relevant frames from videos. Data reduction is important when the computation, memory, power, and transmission bandwidth constraints of the data collectors are considered, particularly for large infrastructures.
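To make the data reduction idea concrete, the sketch below shows one plausible on-collector reduction step for video: sub-sample frames and discard blurry ones before transmission. This is a hypothetical illustration using OpenCV; the sharpness threshold and sampling rate are arbitrary example values, and the authors do not prescribe any specific reduction technique.

```python
# Illustrative local data reduction for a video-carrying collector:
# keep roughly one frame per second and drop blurry frames (variance-of-
# Laplacian sharpness check), so that far less data reaches the BS.
# Threshold and sampling rate are arbitrary example values.
import cv2

def reduce_video(path, sharpness_threshold=100.0, frames_per_second=1.0):
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(int(fps / frames_per_second), 1)   # keep ~frames_per_second frames/s
    kept, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
            if sharpness >= sharpness_threshold:   # discard blurry frames
                kept.append(frame)
        index += 1
    cap.release()
    # Transmit these frames (or features extracted from them) instead of the full video.
    return kept
```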