A districting problem with data reliability constraints for equity analysis
Bingqing Liu, Farnoosh Namdarpour, Joseph Y.J. Chow
Transportation Research Part C: Emerging Technologies. Published 2024-07-10. DOI: 10.1016/j.trc.2024.104759
Abstract
While data plays an important role in transportation research, sampled data is not always reliable, and the reliability problem is especially significant for minority groups. This study proposes a districting approach, adapted from the max-p-regions problem, that improves data reliability by aggregating basic spatial units (BSUs). The model generates as many aggregated zones as possible that minimize intrazonal heterogeneity, while controlling the data margin of error (MOE) of every aggregated zone through an MOE threshold. The problem is first formulated as an integer program that selects an optimal set of zones from a pre-generated set of candidate zones. Because the difficulty of solving this formulation lies in generating the candidate set, a heuristic solution algorithm is proposed. Two case studies illustrate the method and validate its performance by evaluating the resulting data quality in an example subsequent planning model. The first covers an area of Downtown Manhattan with 62 census tracts and compares the aggregated zones with Neighborhood Tabulation Areas (NTAs) and Taxi Zones. The second generates the New York City Equitable Zoning (NYCEZ): 574 Equitable Zones that reduce the average MOE% of demographic data by 48% for seniors, 75% for the low-income population, and 46% for long commuters, while yielding more districts than either NTAs (221) or Taxi Zones (263). NYCEZ and census tracts are then compared in a subsequent model, synthetic population generation, which shows a 6.2% improvement in the standard deviation across simulated populations under the proposed zone design; that is, NYCEZ produces smaller variation in the generated population data. The algorithm can support decision-making by public agencies and service design by mobility providers by producing reliable and equitable data. It can also be applied to data sharing between mobility providers and agencies to alleviate privacy concerns.
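To make the reliability mechanism concrete: aggregating BSUs lowers the relative MOE because estimates add linearly, while, under the U.S. Census Bureau's approximation for sums of ACS estimates, MOEs combine as the square root of the sum of their squares. The sketch below is not the authors' heuristic. It is a hypothetical greedy region-growing routine in the spirit of max-p-regions, assuming that root-sum-of-squares rule, a BSU adjacency dictionary, and an MOE% threshold; it ignores the intrazonal-heterogeneity objective and the integer-programming selection over candidate zones. The function names (`aggregated_moe`, `moe_pct`, `grow_regions`) and the toy data are illustrative only.

```python
import math
from typing import Dict, List, Set, Tuple


def aggregated_moe(moes: List[float]) -> float:
    # Assumed rule: the Census Bureau's root-sum-of-squares approximation for
    # the MOE of a sum of ACS estimates (covariance between units is ignored).
    return math.sqrt(sum(m * m for m in moes))


def moe_pct(units: Set[str], est: Dict[str, float], moe: Dict[str, float]) -> float:
    # MOE% of a zone = aggregated MOE / aggregated estimate * 100.
    total = sum(est[u] for u in units)
    if total <= 0:
        return math.inf
    return 100.0 * aggregated_moe([moe[u] for u in units]) / total


def grow_regions(
    est: Dict[str, float],
    moe: Dict[str, float],
    adj: Dict[str, Set[str]],
    threshold: float,
) -> Tuple[List[Set[str]], Set[str]]:
    """Hypothetical greedy region growing in the spirit of max-p-regions.

    Each region starts from an unassigned seed BSU and absorbs the adjacent
    unassigned BSU that yields the lowest MOE% until MOE% <= threshold.
    Regions that cannot reach the threshold release their BSUs, which are
    returned as leftovers if no later region absorbs them.
    """
    unassigned = set(est)
    regions: List[Set[str]] = []
    for seed in sorted(est):
        if seed not in unassigned:
            continue
        region = {seed}
        unassigned.discard(seed)
        while moe_pct(region, est, moe) > threshold:
            # Unassigned BSUs contiguous to the current region.
            frontier = {n for u in region for n in adj[u] if n in unassigned}
            if not frontier:
                break
            best = min(sorted(frontier), key=lambda n: moe_pct(region | {n}, est, moe))
            region.add(best)
            unassigned.discard(best)
        if moe_pct(region, est, moe) <= threshold:
            regions.append(region)
        else:
            unassigned |= region  # release BSUs of the infeasible region
    return regions, unassigned


if __name__ == "__main__":
    # Toy chain of four BSUs, A - B - C - D, with made-up estimates and MOEs.
    est = {"A": 100, "B": 80, "C": 120, "D": 90}
    moe = {"A": 40, "B": 30, "C": 45, "D": 35}
    adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
    regions, leftovers = grow_regions(est, moe, adj, threshold=30.0)
    print(regions, leftovers)
```

In the toy chain, A (MOE% 40) and B (MOE% 37.5) merge into a zone with an MOE of sqrt(40² + 30²) = 50 on an estimate of 180, about 27.8%, which clears a 30% threshold; C and D then form a second zone. Stopping each zone as soon as it meets the threshold, rather than merging everything into one large zone, reflects the max-p goal of producing as many reliable zones as possible.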
About the journal:
Transportation Research: Part C (TR_C) is dedicated to showcasing high-quality, scholarly research that delves into the development, applications, and implications of transportation systems and emerging technologies. Our focus lies not solely on individual technologies, but rather on their broader implications for the planning, design, operation, control, maintenance, and rehabilitation of transportation systems, services, and components. In essence, the intellectual core of the journal revolves around the transportation aspect rather than the technology itself. We actively encourage the integration of quantitative methods from diverse fields such as operations research, control systems, complex networks, computer science, and artificial intelligence. Join us in exploring the intersection of transportation systems and emerging technologies to drive innovation and progress in the field.