Detailed Methodology: Depave NYC — East Harlem

This document describes the New York City implementation of the Depave pipeline in enough detail for a GIS analyst to reproduce or extend the analysis. The study area here is East Harlem, Manhattan (El Barrio / Spanish Harlem, Community District 11). The pipeline itself is shared with the Jamaica analysis; only the selected neighborhood and the headline numbers differ. Source code and configuration live in a single repository; paths below are relative to the repository root.

Contents
  1. Scope & study area
  2. Lineage: DepaveLA and DepaveNYC
  3. Data sources
  4. Pavement extraction (NYC Land Cover)
  5. Core vs non-core classification
  6. Needs scoring
  7. Stacked composite
  8. Equity overlay
  9. Known limitations
  10. Reproducibility
  11. Credits & citation

Data tier 1 of 4: Planimetric + hydraulic

New York City supplies the richest inputs in the index: expert-labeled 6-inch land cover, DoITT planimetric roadbed, sidewalk, and parking polygons for the core/non-core split, and a calibrated 2-D hydraulic stormwater flood model. This is the highest-reliability tier, the closest any viewer comes to site-level screening. Tiers reflect input-data availability, not effort. Scores are relative within each city, so the tiers and the numbers are not comparable across cities, and reliability decreases from Tier 1 to Tier 4.

1. Scope & study area

The NYC analysis runs neighborhood by neighborhood. All spatial operations are carried out in EPSG:2263 (NAD83 / New York Long Island, US survey feet), the standard projected system for distance and area measurement across the five boroughs. Web exports are reprojected to EPSG:4326 for MapLibre consumption.

A study area is selected by Neighborhood Tabulation Area (NTA) codes from the 2020 vintage. The chosen NTAs are dissolved into a single polygon and buffered outward by 100 ft so that pavement straddling a neighborhood edge is captured. East Harlem is the union of two 2020 NTAs; together they correspond to Manhattan Community District 11.

For tract-level summaries, the 2020 NYC census tracts are loaded, reprojected to EPSG:2263, and clipped to the buffered study-area polygon. Any tract that retains less than 50% of its original area after clipping is dropped as an edge sliver. After this filter, 23 census tracts remain inside the East Harlem study area.
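
The edge-sliver rule can be illustrated on precomputed areas alone. This is a minimal sketch, not the pipeline's code: the function name `keep_tracts`, the dictionary layout, and the sample areas are all hypothetical.

```python
def keep_tracts(original_area, clipped_area, min_retained=0.5):
    """Return IDs of tracts that keep at least `min_retained` of their area.

    Both inputs map tract ID -> area in square feet (EPSG:2263).
    Tracts retaining less than the threshold are treated as edge slivers.
    """
    kept = []
    for tract_id, full in original_area.items():
        clipped = clipped_area.get(tract_id, 0.0)
        if full > 0 and clipped / full >= min_retained:
            kept.append(tract_id)
    return kept


# Hypothetical areas: B retains 45% and is dropped; C retains exactly 50%.
original = {"A": 1000.0, "B": 2000.0, "C": 500.0}
clipped = {"A": 1000.0, "B": 900.0, "C": 250.0}
kept = keep_tracts(original, clipped)
print(kept)
```

A tract at exactly 50% is kept here because the text drops only tracts that retain less than 50%.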

Study-area and location selection is configuration-driven (config/locations/nyc.yaml, config/study_areas_nyc.yaml). The orchestrator dispatches on the DEPAVE_LOCATION environment variable, and a --study-area flag routes each neighborhood's outputs to its own subdirectory under data/nyc/interim/ and data/nyc/processed/, which is how East Harlem and Jamaica coexist in one repository.
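
The routing described above might look roughly like the following. It is a sketch under stated assumptions: `resolve_output_dirs` and `parse_args` are hypothetical names, and the fallback to "nyc" when DEPAVE_LOCATION is unset is an assumption rather than documented behavior.

```python
import argparse
import os
from pathlib import Path


def resolve_output_dirs(location, study_area, root=Path("data")):
    """Build per-study-area output directories (hypothetical helper)."""
    base = root / location
    return {
        "interim": base / "interim" / study_area,
        "processed": base / "processed" / study_area,
    }


def parse_args(argv=None):
    """Parse the --study-area flag named in the text."""
    parser = argparse.ArgumentParser(description="Depave pipeline (sketch)")
    parser.add_argument("--study-area", required=True)
    return parser.parse_args(argv)


# Assumption: fall back to "nyc" if the environment variable is not set.
location = os.environ.get("DEPAVE_LOCATION", "nyc")
args = parse_args(["--study-area", "east_harlem"])
dirs = resolve_output_dirs(location, args.study_area)
print(dirs["interim"].as_posix())
```

Keeping the study area as a path component is what lets two neighborhoods share one tree without overwriting each other's outputs.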

2. Lineage: DepaveLA and DepaveNYC

The Depave methodology originates in DepaveLA's parcel-level impervious-cover analysis and was extended in the DepaveNYC pilot. The NYC pipeline is the data-rich original: it consumes NYC-grade municipal GIS products rather than the machine-learning substitutes the Fort Lauderdale port relies on. The most consequential differences are summarized below.

Step | NYC (this pipeline) | Fort Lauderdale port
--- | --- | ---
Pavement source | NYC Land Cover 2017 (6-inch, 8-class, expert-labeled raster) | NAIP 1 m 4-band imagery classified in-house by a random forest
Core/non-core cut | DoITT planimetric roadbed, sidewalk, and parking polygons, joined to MapPLUTO land use | FDOT measured widths plus OSM class-estimated widths
Flood layer | NYC DEP Stormwater Flood Maps (a published 2-D stormwater flood model) | DEM-derived topographic proxy
Heat layer | NYC DOHMH Heat Vulnerability Index (ZCTA) | NOAA/CAPA afternoon surface-temperature polygons
Canopy layer | Land Cover 2017 tree-canopy class (raster) | University of Miami lidar-derived canopy polygons
Equity overlay | NYS DAC (state) plus EJNYC (derived from Census) | CEJST federal Justice40 designation
Processing CRS | EPSG:2263 (NAD83 / NY Long Island, ftUS) | EPSG:2236 (NAD83 / FL East, ftUS)

Downstream stages (needs scoring, stacked composite, equity overlay, web export) are shared between the two locations and run unmodified.

3. Data sources

All endpoints resolve at acquire time and are taken verbatim from config/data_sources_nyc.yaml (URLs verified against Socrata API metadata, March 2026). Most datasets are hosted on NYC Open Data; the exceptions are NYS DAC, which comes from data.ny.gov, and MapPLUTO, which comes from NYC DCP.

Dataset | Resolution / vintage | Native CRS | Source
--- | --- | --- | ---
NYC Land Cover 2017 | 6-inch (0.5-ft), 8-class raster, 2017 | EPSG:2263 | NYC Open Data he6d-2qns (DoITT LiDAR land cover, ERDAS Imagine .img)
DoITT Planimetric Roadbed | Road-surface polygons, ~105k records | EPSG:4326 | NYC Open Data i36f-5ih7
DoITT Planimetric Sidewalk | Sidewalk polygons, ~51k records | EPSG:4326 | NYC Open Data 52n9-sdep
DoITT Planimetric Parking Lots | Parking-lot polygons | EPSG:4326 | NYC Open Data 7cgt-uhhz
MapPLUTO | Tax-lot polygons, latest release | EPSG:2263 | NYC DCP MapPLUTO
NYC DEP Stormwater Flood Maps | 3 rain scenarios, polygon (FileGDB) | EPSG:2263 | NYC Open Data 9i7c-xyvv (NYC DEP)
DOHMH Heat Vulnerability Index | ZCTA-level, 1–5 scale (CSV) | tabular; ZCTA 2020 | NYC Open Data 4mhf-duep (NYC DOHMH)
NYS DAC (CLCPA) | Tract-level designation | EPSG:4326 | data.ny.gov 2e6c-s6fp
EJNYC (EJ areas, derived) | Tract-level, derived from cdeligibil='E' | EPSG:4326 | Derived from NYC Census Tracts 63ge-mke6
2020 Census Tracts | Tract polygons, 2020 | EPSG:4326 | NYC Open Data 63ge-mke6
NTAs 2020 | Neighborhood Tabulation Areas, 2020 | EPSG:4326 | NYC Open Data 9nt8-h7nd

4. Pavement extraction (NYC Land Cover)

Pavement comes straight from the city's expert-labeled land-cover product, so there is no machine-learning classifier in the NYC pipeline. The input is the NYC Land Cover 2017 raster, an 8-class layer at 6-inch (0.5-ft) resolution derived from LiDAR and high-resolution imagery. The eight classes are:

1 Tree Canopy     2 Grass / Shrub   3 Bare Earth   4 Water
5 Buildings       6 Roads           7 Other Impervious   8 Railroads

Pavement is defined as classes 6, 7, and 8 (roads, other impervious, railroads). The extraction stage (process/pavement.py) proceeds as follows:

  1. Window read & mask. The raster is read in 5,000-pixel tiles to cap peak memory, and each tile is masked to the three pavement classes.
  2. Vectorize. Masked pixels are polygonized with rasterio.features.shapes, preserving the source land-cover class value on each polygon.
  3. Simplify & clean. Geometry is simplified with a 1.0 ft tolerance, and polygons smaller than 1.0 sq ft are dropped as slivers.
  4. Clip. The result is clipped to the buffered study-area polygon.
  5. Type by planimetric overlay. Each pavement polygon is assigned a pavement type by spatial overlay with the DoITT planimetric layers (see §5).
  6. Join to MapPLUTO. Each polygon is joined to the underlying tax lot by a representative-point spatial join, pulling bbl, landuse, bldgclass, and lotarea from MapPLUTO.
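
Steps 1 and 2 above rest on two small ideas: walking the raster in fixed-size windows and masking to classes 6, 7, and 8. The sketch below shows both in plain Python so it runs without a raster on disk. The helper names are hypothetical, square 5,000-pixel tiles are an assumption, and in the real stage each window would presumably be passed to a rasterio windowed read rather than applied to nested lists.

```python
PAVEMENT_CLASSES = {6, 7, 8}  # roads, other impervious, railroads
TILE = 5000                   # pixels per tile side (assumed square tiles)


def iter_windows(width, height, tile=TILE):
    """Yield (col_off, row_off, win_width, win_height) covering the raster.

    Edge windows are truncated so no window extends past the raster.
    """
    for row_off in range(0, height, tile):
        for col_off in range(0, width, tile):
            yield (
                col_off,
                row_off,
                min(tile, width - col_off),
                min(tile, height - row_off),
            )


def mask_pavement(tile_rows):
    """Return a boolean grid that is True where a pixel is a pavement class."""
    return [[value in PAVEMENT_CLASSES for value in row] for row in tile_rows]


# A hypothetical 12,000 x 7,000 pixel raster needs a 3 x 2 grid of windows.
windows = list(iter_windows(12000, 7000))
print(len(windows), windows[-1])

# Tiny stand-in tile: tree canopy, road / other impervious, building.
mask = mask_pavement([[1, 6], [7, 5]])
print(mask)
```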

5. Core vs non-core classification

"Core" pavement is the pavement a city needs for movement: travel lanes, public sidewalks, and rail. "Non-core" is the pool of depave candidates. The classification is implemented in process/core_noncore.py; the default status is non-core, and core is asserted only where a rule fires.

Pavement typing

Before the core split, each vectorized pavement polygon is typed by overlay with the DoITT planimetric layers. A polygon is assigned a type only if at least 50% of its area overlaps the relevant planimetric polygon. Typing runs in this precedence, with later rules overwriting earlier ones:

  1. railroad (from land-cover class 8)
  2. parking_lot (planimetric parking overlay)
  3. sidewalk (planimetric sidewalk overlay)
  4. road (planimetric roadbed overlay)
  5. remainder → other_impervious

Core rules

Non-core rules

6. Needs scoring

All four needs layers score the clipped census-tract set (23 tracts in East Harlem). Each layer produces a *_raw column and a min-max-normalized *_score column in [0, 1], where higher means more need. Normalization is independent per layer.
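
For reference, min-max normalization maps the smallest raw value in the tract set to 0 and the largest to 1. A minimal sketch follows; returning 0 for every tract when all raw values are equal is an assumption, since the text does not say how the pipeline handles a constant layer.

```python
def minmax(values):
    """Rescale values linearly to [0, 1] across the tract set."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant layer carries no signal, so score it 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


print(minmax([2, 4, 6]))
```

Because each layer is rescaled against its own minimum and maximum, a score of 1.0 means "highest in this study area", not a fixed physical threshold.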

Heat — Heat Vulnerability Index

The heat input is the NYC DOHMH Heat Vulnerability Index (HVI), a 1–5 composite index published at ZCTA (2020 ZIP Code Tabulation Area) resolution.

Important caveat on the heat score. The HVI is delivered as a ZCTA-level table without any ZCTA geometry, and census tracts carry no ZCTA key. The current pipeline therefore does not perform a true ZCTA-to-tract spatial join. Instead it assigns each tract a heat value drawn from a seeded pseudo-random normal distribution (np.random.seed(42)) centered on the study-area mean HVI and clipped to the observed HVI range. The result is deterministic but does not reflect real tract-level heat. Treat the NYC heat score as a placeholder pending a proper ZCTA→tract crosswalk. This is the single biggest limitation of the NYC method and is listed again under Known limitations.
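
To make the placeholder concrete, the behavior described above can be sketched as follows. Only the seed (42), the centering on the mean, and the clipping to the observed range come from the text. The function name, the `spread` parameter (the standard deviation is not documented), and the sample HVI values are assumptions.

```python
import numpy as np


def placeholder_heat(n_tracts, hvi_values, spread=1.0, seed=42):
    """Draw one pseudo-random heat value per tract.

    Values are normal around the mean HVI and clipped to the observed
    HVI range. `spread` is an assumed standard deviation.
    """
    hvi = np.asarray(hvi_values, dtype=float)
    np.random.seed(seed)  # fixed seed makes the draw repeatable
    draws = np.random.normal(loc=hvi.mean(), scale=spread, size=n_tracts)
    return np.clip(draws, hvi.min(), hvi.max())


# Hypothetical ZCTA-level HVI values, not the real East Harlem inputs.
heat_raw = placeholder_heat(23, [3, 4, 5, 4])
print(heat_raw.min(), heat_raw.max())
```

The sketch shows why the score is repeatable yet uninformative: tract identity never enters the calculation.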

Stormwater flood risk — NYC DEP Stormwater Flood Maps

Where the Fort Lauderdale port has to derive flood risk from terrain, NYC uses a published flood map. The NYC DEP Stormwater Flood Maps are a 2-D stormwater flood-model product that accounts for drainage and pipe capacity. The dataset ships three rainfall scenarios, referred to in the pipeline as extreme, moderate, and limited.

For each tract, the pipeline computes the fraction of tract area covered by each scenario's flood extent (pct_extreme, pct_moderate, pct_limited), then combines them with fixed weights and rescales across the tract set:

flood_raw   = 0.5 · pct_extreme + 0.3 · pct_moderate + 0.2 · pct_limited
flood_score = minmax(flood_raw)

The extreme scenario carries the most weight because it represents the most severe rainfall the model simulates. The 0.5 / 0.3 / 0.2 weighting and the min-max step are shared with the Fort Lauderdale pipeline; only the flood input differs.
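
As a worked example of the weighting, the sketch below applies the 0.5 / 0.3 / 0.2 weights to three tracts. The coverage fractions are invented for illustration and are not East Harlem results; `flood_raw` as a function name is hypothetical.

```python
FLOOD_WEIGHTS = {"pct_extreme": 0.5, "pct_moderate": 0.3, "pct_limited": 0.2}


def flood_raw(tract):
    """Weighted sum of the three scenario coverage fractions."""
    return sum(weight * tract[key] for key, weight in FLOOD_WEIGHTS.items())


# Illustrative coverage fractions only.
tracts = [
    {"pct_extreme": 0.10, "pct_moderate": 0.05, "pct_limited": 0.00},
    {"pct_extreme": 0.40, "pct_moderate": 0.20, "pct_limited": 0.10},
    {"pct_extreme": 0.00, "pct_moderate": 0.00, "pct_limited": 0.00},
]
raw = [flood_raw(t) for t in tracts]
print(raw)
```

The first tract works out to 0.5 × 0.10 + 0.3 × 0.05 + 0.2 × 0.00 = 0.065 before normalization.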

Canopy deficit

Canopy deficit is computed from the land-cover raster, using class 1 (Tree Canopy) and class 4 (Water). Per tract, the pipeline counts canopy pixels and water pixels in zonal fashion, then computes a non-water canopy percentage and inverts it into a deficit:

canopy_pct   = canopy_pixels / (total_pixels − water_pixels)
canopy_raw   = 1 − canopy_pct
canopy_score = minmax(canopy_raw)

Higher score means greater canopy deficit and more need. Water is removed from the denominator so a tract is not penalized for open water it cannot plant.
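
The effect of removing water from the denominator is easiest to see with numbers. A minimal sketch, with a hypothetical function name and invented pixel counts; returning None for a tract that is entirely water is an assumption.

```python
def canopy_deficit(canopy_pixels, water_pixels, total_pixels):
    """Return 1 - canopy share of non-water pixels, or None if all water."""
    land_pixels = total_pixels - water_pixels
    if land_pixels <= 0:
        return None  # assumption: deficit is undefined without land
    return 1.0 - canopy_pixels / land_pixels


# 200 canopy pixels on 800 land pixels gives 25% canopy, a 0.75 deficit.
with_water_removed = canopy_deficit(200, 200, 1000)
# Treating water as plantable land would overstate the deficit as 0.80.
water_counted = canopy_deficit(200, 0, 1000)
print(with_water_removed, water_counted)
```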

Pavement burden

Per tract, pavement burden is the fraction of tract area covered by mapped pavement, min-max normalized across the tract set:

pavement_raw   = (tract ∩ pavement_union).area / tract.area
pavement_score = minmax(pavement_raw)

Note that the NYC pavement-burden score uses the union of all classified pavement (core and non-core together), so it measures total mapped-pavement fraction per tract rather than non-core fraction. This is recorded under Known limitations.

7. Stacked composite

The four normalized scores are combined into one composite. NYC leaves the needs weights at their default, so each dimension contributes equally:

stacked_score = 0.25·heat + 0.25·flood + 0.25·canopy + 0.25·pavement

A tract is flagged as priority if its stacked_score is at or above the 75th percentile of stacked scores (the top quartile). The pipeline also records n_high_needs, the count of the four dimensions in which a tract is itself in the top quartile, plus a priority_rank. Implemented in process/stacked_needs.py.
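
The composite and the priority flag can be sketched in plain Python as below. Several details are assumptions rather than documented behavior: the linear-interpolation percentile, the use of "at or above" for the per-dimension top quartile, and ranking by descending stacked score. The tract values are invented.

```python
DIMENSIONS = ("heat", "flood", "canopy", "pavement")


def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def stack(tracts):
    """Add stacked_score, priority, n_high_needs, and priority_rank in place."""
    for t in tracts:
        t["stacked_score"] = sum(0.25 * t[d] for d in DIMENSIONS)
    cutoff = percentile([t["stacked_score"] for t in tracts], 75)
    dim_cutoffs = {d: percentile([t[d] for t in tracts], 75) for d in DIMENSIONS}
    for t in tracts:
        t["priority"] = t["stacked_score"] >= cutoff
        t["n_high_needs"] = sum(t[d] >= dim_cutoffs[d] for d in DIMENSIONS)
    ranked = sorted(tracts, key=lambda t: t["stacked_score"], reverse=True)
    for rank, t in enumerate(ranked, start=1):
        t["priority_rank"] = rank  # assumption: rank 1 is the highest score
    return tracts


# Invented normalized scores for four tracts.
tracts = stack([
    {"id": "a", "heat": 1.0, "flood": 1.0, "canopy": 1.0, "pavement": 1.0},
    {"id": "b", "heat": 0.5, "flood": 0.5, "canopy": 0.5, "pavement": 0.5},
    {"id": "c", "heat": 0.0, "flood": 0.0, "canopy": 0.0, "pavement": 0.0},
    {"id": "d", "heat": 0.2, "flood": 0.6, "canopy": 0.4, "pavement": 0.0},
])
for t in tracts:
    print(t["id"], t["stacked_score"], t["priority"], t["n_high_needs"])
```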

8. Equity overlay

The equity overlay (process/equity_overlay.py) compares the priority tracts against two designations, NYS DAC and EJNYC (see §3).

A tract is flagged is_dac or is_ej when it overlaps the respective union by at least 1% of its area; is_dac_or_ej is the logical OR. The headline equity figure is the priority-and-DAC intersection.
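
Given each tract's overlap fraction with the two designation unions, the flags reduce to a threshold test. A minimal sketch, assuming the overlap fractions have already been computed; the function name and inputs are hypothetical.

```python
MIN_OVERLAP = 0.01  # 1% of tract area, per the rule above


def equity_flags(dac_overlap_frac, ej_overlap_frac, min_overlap=MIN_OVERLAP):
    """Flag a tract from its area overlap with the DAC and EJ unions."""
    is_dac = dac_overlap_frac >= min_overlap
    is_ej = ej_overlap_frac >= min_overlap
    return {"is_dac": is_dac, "is_ej": is_ej, "is_dac_or_ej": is_dac or is_ej}


# A 0.5% DAC overlap is treated as boundary noise; the EJ overlap counts.
flags = equity_flags(0.005, 0.5)
print(flags)
```

The small threshold keeps a tract from being flagged on the strength of boundary misalignment between layers alone.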

DAC saturation. In the East Harlem study area every one of the 23 tracts is DAC-designated. Within this study area the DAC overlay therefore does not differentiate tracts, and the priority-and-DAC count simply equals the priority count. Read "all priority tracts are DAC" in that light.

9. Known limitations

10. Reproducibility

Full run from scratch

export DEPAVE_LOCATION=nyc
python3 scripts/run_pipeline.py --study-area east_harlem --stages all

The --stages flag accepts stage names (acquire, pavement, needs, equity, export), the NYC numeric set (0, 1, 2, 3, 4, 5, 6), or all (the default). The --force flag re-downloads raw inputs where supported.

Outputs

Deploy builds build/nyc/east_harlem/ with raster pavement tiles for crisp rendering.

11. Credits & citation

Depave NYC is produced by ONE Architecture & Urbanism, Inc. The methodology builds on the DepaveLA framework and the DepaveNYC pilot.

Data providers: NYC DoITT (land cover and planimetrics), NYC DEP (stormwater flood maps), NYC DCP (MapPLUTO), NYC DOHMH (Heat Vulnerability Index), NYSERDA and New York State (DAC under CLCPA), and the U.S. Census Bureau (tracts and NTAs).

Suggested citation: Depave NYC — East Harlem: A Screening Analysis of Non-Core Pavement and Environmental Need. ONE Architecture & Urbanism, 2026.