Detailed Methodology: Depave NYC — East Harlem

This document describes the New York City implementation of the Depave pipeline in enough detail for a GIS analyst to reproduce or extend the analysis. The study area here is East Harlem, Manhattan (El Barrio / Spanish Harlem, Community District 11). The pipeline itself is shared with the Jamaica analysis; only the selected neighborhood and the headline numbers differ. Source code and configuration live in a single repository; paths below are relative to the repository root.

Contents

1. Scope & study area
2. Lineage: DepaveLA and DepaveNYC
3. Data sources
4. Pavement extraction (NYC Land Cover)
5. Core vs non-core classification
6. Needs scoring
7. Stacked composite
8. Equity overlay
9. Known limitations
10. Reproducibility
11. Credits & citation

Data tier 1 of 4: Planimetric + hydraulic

New York City supplies the richest inputs in the index: expert-labeled 6-inch land cover, DoITT planimetric roadbed, sidewalk, and parking polygons for the core/non-core split, and a calibrated 2-D hydraulic stormwater flood model. This is the highest-reliability tier, the closest any viewer comes to site-level screening. Tiers reflect input-data availability, not effort. Scores are relative within each city, so the tiers and the numbers are not comparable across cities, and reliability decreases from Tier 1 to Tier 4.

1. Scope & study area

The NYC analysis runs neighborhood by neighborhood. All spatial operations are carried out in EPSG:2263 (NAD83 / New York Long Island, US survey feet), the standard projected system for distance and area measurement across the five boroughs. Web exports are reprojected to EPSG:4326 for MapLibre consumption.

A study area is selected by Neighborhood Tabulation Area (NTA) codes from the 2020 vintage. The chosen NTAs are dissolved into a single polygon and buffered outward by 100 ft so that pavement straddling a neighborhood edge is captured. East Harlem is the union of two 2020 NTAs:

MN1101 East Harlem (South)
MN1102 East Harlem (North)

This study area corresponds to Manhattan Community District 11.

For tract-level summaries, the 2020 NYC census tracts are loaded, reprojected to EPSG:2263, and clipped to the buffered study-area polygon. Any tract that retains less than 50% of its original area after clipping is dropped as an edge sliver. After this filter, 23 census tracts remain inside the East Harlem study area.
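The edge-sliver rule reduces to an area-ratio test. The sketch below is a minimal illustration, assuming each tract's original and clipped areas (square feet in EPSG:2263) have already been measured. The function name and the sample areas are hypothetical and are not taken from the repository.

```python
MIN_RETAINED_FRACTION = 0.5  # tracts retaining less than 50% are dropped


def keep_tract(original_area_sqft: float, clipped_area_sqft: float) -> bool:
    """Return True when the clipped tract keeps at least half its original area."""
    if original_area_sqft <= 0:
        return False  # degenerate geometry: treated as a sliver (assumption)
    return clipped_area_sqft / original_area_sqft >= MIN_RETAINED_FRACTION


# Hypothetical areas: (original, clipped) in square feet.
tracts = {
    "tract_a": (2_000_000.0, 2_000_000.0),  # fully inside the study area
    "tract_b": (1_800_000.0, 450_000.0),    # only 25% retained: edge sliver
}
kept = [name for name, (orig, clip) in tracts.items() if keep_tract(orig, clip)]
```

Under this reading a tract that retains exactly 50% of its area is kept, since only tracts with less than 50% are dropped.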

Study-area and location selection is configuration-driven (config/locations/nyc.yaml, config/study_areas_nyc.yaml). The orchestrator dispatches on the DEPAVE_LOCATION environment variable, and a --study-area flag routes each neighborhood's outputs to its own subdirectory under data/nyc/interim/ and data/nyc/processed/, which is how East Harlem and Jamaica coexist in one repository.
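The routing described above can be sketched as a small path builder. This is an illustration only: the function name, the fallback location, and the "jamaica" key are assumptions, and the real orchestrator may resolve paths differently.

```python
import os
from pathlib import Path


def output_dirs(study_area: str, root: Path = Path("data")) -> dict:
    """Build the per-neighborhood interim and processed directories."""
    # Falling back to "nyc" when the variable is unset is an assumption.
    location = os.environ.get("DEPAVE_LOCATION", "nyc")
    base = root / location
    return {
        "interim": base / "interim" / study_area,
        "processed": base / "processed" / study_area,
    }


os.environ["DEPAVE_LOCATION"] = "nyc"
dirs = output_dirs("east_harlem")
other = output_dirs("jamaica")  # hypothetical key for the Jamaica study area
```

Because the study-area key is the last path component, two neighborhoods never write to the same directory.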

2. Lineage: DepaveLA and DepaveNYC

The Depave methodology originates in DepaveLA's parcel-level impervious-cover analysis and was extended in the DepaveNYC pilot. The NYC pipeline is the data-rich original: it consumes NYC-grade municipal GIS products rather than the machine-learning substitutes the Fort Lauderdale port relies on. The most consequential differences are summarized below.

Step	NYC (this pipeline)	Fort Lauderdale port
Pavement source	NYC Land Cover 2017 (6-inch, 8-class, expert-labeled raster)	NAIP 1 m 4-band imagery classified in-house by a random forest
Core/non-core cut	DoITT planimetric roadbed, sidewalk, and parking polygons, joined to MapPLUTO land use	FDOT measured widths plus OSM class-estimated widths
Flood layer	NYC DEP Stormwater Flood Maps (a published 2-D stormwater flood model)	DEM-derived topographic proxy
Heat layer	NYC DOHMH Heat Vulnerability Index (ZCTA)	NOAA/CAPA afternoon surface-temperature polygons
Canopy layer	Land Cover 2017 tree-canopy class (raster)	University of Miami lidar-derived canopy polygons
Equity overlay	NYS DAC (state) plus EJNYC (derived from Census)	CEJST federal Justice40 designation
Processing CRS	EPSG:2263 (NAD83 / NY Long Island, ftUS)	EPSG:2236 (NAD83 / FL East, ftUS)

Downstream stages (needs scoring, stacked composite, equity overlay, web export) are shared between the two locations and run unmodified.

3. Data sources

All endpoints resolve at acquire time and are taken verbatim from config/data_sources_nyc.yaml (URLs verified against Socrata API metadata, March 2026). Every dataset is hosted on NYC Open Data except NYS DAC, which comes from data.ny.gov.

Dataset	Resolution / vintage	Native CRS	Source
NYC Land Cover 2017	6-inch (0.5-ft), 8-class raster, 2017	EPSG:2263	NYC Open Data `he6d-2qns` (DoITT LiDAR land cover, ERDAS Imagine .img)
DoITT Planimetric Roadbed	Road-surface polygons, ~105k records	EPSG:4326	NYC Open Data `i36f-5ih7`
DoITT Planimetric Sidewalk	Sidewalk polygons, ~51k records	EPSG:4326	NYC Open Data `52n9-sdep`
DoITT Planimetric Parking Lots	Parking-lot polygons	EPSG:4326	NYC Open Data `7cgt-uhhz`
MapPLUTO	Tax-lot polygons, latest release	EPSG:2263	NYC DCP MapPLUTO
NYC DEP Stormwater Flood Maps	3 rain scenarios, polygon (FileGDB)	EPSG:2263	NYC Open Data `9i7c-xyvv` (NYC DEP)
DOHMH Heat Vulnerability Index	ZCTA-level, 1–5 scale (CSV)	tabular; ZCTA 2020	NYC Open Data `4mhf-duep` (NYC DOHMH)
NYS DAC (CLCPA)	Tract-level designation	EPSG:4326	data.ny.gov `2e6c-s6fp`
EJNYC (EJ areas, derived)	Tract-level, derived from `cdeligibil='E'`	EPSG:4326	Derived from NYC Census Tracts `63ge-mke6`
2020 Census Tracts	Tract polygons, 2020	EPSG:4326	NYC Open Data `63ge-mke6`
NTAs 2020	Neighborhood Tabulation Areas, 2020	EPSG:4326	NYC Open Data `9nt8-h7nd`

4. Pavement extraction (NYC Land Cover)

Pavement comes straight from the city's expert-labeled land-cover product, so there is no machine-learning classifier in the NYC pipeline. The input is the NYC Land Cover 2017 raster, an 8-class layer at 6-inch (0.5-ft) resolution derived from LiDAR and high-resolution imagery. The eight classes are:

1 Tree Canopy     2 Grass / Shrub   3 Bare Earth   4 Water
5 Buildings       6 Roads           7 Other Impervious   8 Railroads

Pavement is defined as classes 6, 7, and 8 (roads, other impervious, railroads). The extraction stage (process/pavement.py) proceeds as follows:

Window read & mask. The raster is read in 5,000-pixel tiles to cap peak memory, and each tile is masked to the three pavement classes.
Vectorize. Masked pixels are polygonized with rasterio.features.shapes, preserving the source land-cover class value on each polygon.
Simplify & clean. Geometry is simplified with a 1.0 ft tolerance and polygons smaller than 1.0 sqft are dropped as slivers.
Clip. The result is clipped to the buffered study-area polygon.
Type by planimetric overlay. Each pavement polygon is assigned a pavement type by spatial overlay with the DoITT planimetric layers (see §5).
Join to MapPLUTO. Each polygon is joined to the underlying tax lot by a representative-point spatial join, pulling bbl, landuse, bldgclass, and lotarea from MapPLUTO.
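The first step (windowed read and class mask) can be illustrated without the raster itself. The sketch below is not the code in process/pavement.py: the helper names are hypothetical, the tile is a toy array, and the later steps (vectorizing with rasterio.features.shapes, simplifying, clipping, joining) are not shown.

```python
import numpy as np

PAVEMENT_CLASSES = (6, 7, 8)  # roads, other impervious, railroads
TILE_SIZE = 5000              # pixels per tile edge


def iter_windows(height: int, width: int, tile: int = TILE_SIZE):
    """Yield (row_off, col_off, n_rows, n_cols) tiles that cover the raster."""
    for row_off in range(0, height, tile):
        for col_off in range(0, width, tile):
            yield (row_off, col_off,
                   min(tile, height - row_off),
                   min(tile, width - col_off))


def pavement_mask(tile_values: np.ndarray) -> np.ndarray:
    """Boolean mask that is True where a pixel belongs to a pavement class."""
    return np.isin(tile_values, PAVEMENT_CLASSES)


# Toy 2x4 tile of land-cover class codes.
tile = np.array([[1, 6, 7, 4],
                 [5, 8, 2, 6]])
mask = pavement_mask(tile)

# A hypothetical 12,000 x 5,000 pixel raster splits into three tiles,
# the last one shorter than the others.
windows = list(iter_windows(height=12000, width=5000))
```

Tiles on the right and bottom edges are clamped to the raster extent, so no window reads past the data.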

5. Core vs non-core classification

"Core" pavement is the pavement a city needs for movement: travel lanes, public sidewalks, and rail. "Non-core" is the pool of depave candidates. The classification is implemented in process/core_noncore.py; the default status is non-core, and core is asserted only where a rule fires.

Pavement typing

Before the core split, each vectorized pavement polygon is typed by overlay with the DoITT planimetric layers. A polygon is assigned a planimetric type only if at least 50% of its area overlaps the relevant planimetric layer. Rules are applied in the order listed, and a later rule overwrites an earlier one, so when more than one rule matches, the last match wins:

railroad (from land-cover class 8)
parking_lot (planimetric parking overlay)
sidewalk (planimetric sidewalk overlay)
road (planimetric roadbed overlay)
remainder → other_impervious
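A minimal sketch of the typing order, assuming the overlap fractions (share of the polygon's own area covered by each planimetric layer) have already been computed. The function name and the dictionary layout are hypothetical.

```python
OVERLAP_THRESHOLD = 0.5  # at least 50% of the polygon's area


def pavement_type(landcover_class: int, overlap: dict) -> str:
    """Assign a pavement type; later rules overwrite earlier ones."""
    ptype = "other_impervious"  # remainder when no rule fires
    if landcover_class == 8:
        ptype = "railroad"
    # Planimetric overlays in the documented order; the last match wins.
    for layer in ("parking_lot", "sidewalk", "road"):
        if overlap.get(layer, 0.0) >= OVERLAP_THRESHOLD:
            ptype = layer
    return ptype


examples = [
    pavement_type(6, {"road": 0.9}),
    pavement_type(7, {"parking_lot": 0.6, "sidewalk": 0.55}),
    pavement_type(8, {}),
    pavement_type(7, {"road": 0.3}),
]
```

In the second example both overlays clear the threshold, and the polygon ends up typed as sidewalk because that rule runs after the parking rule.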

Core rules

Roads → core.
Railroads → core.
Right-of-way sidewalks → core: a sidewalk whose joined bbl is null (not on a private lot) or whose PLUTO land use is 07 (transportation / utility).
Transportation / utility parcels (PLUTO land use 07) → core.

Non-core rules

Parking lots from the planimetric parking layer. Parking is a depave candidate, which matches the Fort Lauderdale treatment and differs from the original DepaveLA, where parking was core.
Vacant-lot pavement where the PLUTO building class starts with V.
Parking-coded parcels where the PLUTO building class is G6 or G7.
Commercial / industrial excess where the PLUTO land use is 05 or 06.
Interior sidewalks: a sidewalk sitting on a private parcel and not already cored.
Everything else falls through to other_impervious, which is non-core.
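The rules above can be sketched as a single decision function. This is not the code in process/core_noncore.py. The order in which rules are checked (all core rules before any non-core rule), the reason strings, the "core" / "non_core" values, and the zero-padded string land-use codes are all assumptions made for illustration.

```python
def classify_core(ptype, bbl, landuse, bldgclass):
    """Return (core_status, core_reason) for one typed pavement polygon."""
    bldgclass = bldgclass or ""
    # Core rules, checked first (assumed order).
    if ptype == "road":
        return "core", "road"
    if ptype == "railroad":
        return "core", "railroad"
    if ptype == "sidewalk" and (bbl is None or landuse == "07"):
        return "core", "row_sidewalk"
    if landuse == "07":
        return "core", "transport_utility_parcel"
    # Non-core is the default; these branches only record a reason.
    if ptype == "parking_lot":
        return "non_core", "parking_lot"
    if bldgclass.startswith("V"):
        return "non_core", "vacant_lot"
    if bldgclass in ("G6", "G7"):
        return "non_core", "parking_parcel"
    if landuse in ("05", "06"):
        return "non_core", "commercial_industrial"
    if ptype == "sidewalk":
        return "non_core", "interior_sidewalk"
    return "non_core", "other_impervious"


row_sidewalk = classify_core("sidewalk", None, None, None)
lot_sidewalk = classify_core("sidewalk", "1016200001", "02", "D1")  # made-up lot
parking = classify_core("parking_lot", "1016200002", "05", "K1")    # made-up lot
```

One consequence of the assumed order is that a parking lot on a land-use 07 parcel would be classed as core. The source does not say how that conflict is resolved.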

6. Needs scoring

All four needs layers score the clipped census-tract set (23 tracts in East Harlem). Each layer produces a *_raw column and a min-max-normalized *_score column in [0, 1], where higher means more need. Normalization is independent per layer.
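Min-max normalization maps the smallest raw value to 0 and the largest to 1. The sketch below is a plain-Python illustration. Returning zeros for a constant column is an assumption, since the source does not say how a zero range is handled.

```python
def minmax(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumed handling of a zero range
    return [(v - lo) / (hi - lo) for v in values]


scores = minmax([2.0, 4.0, 6.0])    # [0.0, 0.5, 1.0]
flat = minmax([3.0, 3.0, 3.0])      # [0.0, 0.0, 0.0]
```

Because each layer is normalized independently, a score of 1.0 only means "highest among the tracts in this study area" for that layer.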

Heat — Heat Vulnerability Index

The heat input is the NYC DOHMH Heat Vulnerability Index (HVI), a 1–5 composite index published at ZCTA (2020 ZIP Code Tabulation Area) resolution.

Important caveat on the heat score. The HVI is delivered as a ZCTA-level table without any ZCTA geometry, and census tracts carry no ZCTA key. The current pipeline therefore does not perform a true ZCTA-to-tract spatial join. Instead it assigns each tract a heat value drawn from a seeded pseudo-random normal distribution (np.random.seed(42)) centered on the study-area mean HVI and clipped to the observed HVI range. The result is deterministic but does not reflect real tract-level heat. Treat the NYC heat score as a placeholder pending a proper ZCTA→tract crosswalk. This is the single biggest limitation of the NYC method and is listed again under Known limitations.
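The placeholder described in this caveat can be sketched as follows. The standard deviation is not stated in the source, so the spread parameter is an assumption, as are the function name and the sample HVI values.

```python
import numpy as np


def placeholder_heat(hvi_values, n_tracts, spread=0.5):
    """Seeded normal draws around the mean HVI, clipped to the observed range.

    `spread` (the standard deviation) is a hypothetical value.
    """
    np.random.seed(42)  # global seed, as named in the text
    hvi = np.asarray(hvi_values, dtype=float)
    draws = np.random.normal(loc=hvi.mean(), scale=spread, size=n_tracts)
    return np.clip(draws, hvi.min(), hvi.max())


# Hypothetical ZCTA-level HVI values for the study area.
heat_raw = placeholder_heat([3, 4, 5], n_tracts=23)
repeat = placeholder_heat([3, 4, 5], n_tracts=23)
```

Two calls return identical arrays because the seed is reset each time, which is what makes the output deterministic without making it spatially meaningful.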

Stormwater flood risk — NYC DEP Stormwater Flood Maps

Where the Fort Lauderdale port has to derive flood risk from terrain, NYC uses a published flood map. The NYC DEP Stormwater Flood Maps are a 2-D stormwater flood-model product that accounts for drainage and pipe capacity. The dataset ships three rainfall scenarios:

Extreme: 3.66 in/hr rainfall with 2080 sea-level rise.
Moderate: 2.13 in/hr rainfall, with 2050 sea-level rise and with current sea levels.
Limited: 1.77 in/hr rainfall with current sea levels.

For each tract, the pipeline computes the fraction of tract area covered by each scenario's flood extent (pct_extreme, pct_moderate, pct_limited), then combines them with fixed weights and rescales across the tract set:

flood_raw   = 0.5 · pct_extreme + 0.3 · pct_moderate + 0.2 · pct_limited
flood_score = minmax(flood_raw)

The extreme scenario carries the most weight because it represents the most severe rainfall the model simulates. The 0.5 / 0.3 / 0.2 weighting and the min-max step are shared with the Fort Lauderdale pipeline; only the flood input differs.
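As a worked example of the weighting, the sketch below computes flood_raw for one tract. The coverage fractions are made up, and the function and constant names are hypothetical.

```python
FLOOD_WEIGHTS = {"pct_extreme": 0.5, "pct_moderate": 0.3, "pct_limited": 0.2}


def flood_raw(pcts: dict) -> float:
    """Weighted sum of the per-scenario flooded-area fractions."""
    return sum(weight * pcts[name] for name, weight in FLOOD_WEIGHTS.items())


# Hypothetical tract: 20% of its area floods in the extreme scenario,
# 10% in the moderate scenario, 5% in the limited scenario.
example = flood_raw({"pct_extreme": 0.20, "pct_moderate": 0.10, "pct_limited": 0.05})
# 0.5*0.20 + 0.3*0.10 + 0.2*0.05 = 0.10 + 0.03 + 0.01, about 0.14
```

The weights sum to 1, so flood_raw stays in [0, 1] before the min-max step rescales it across the tract set.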

Canopy deficit

Canopy deficit is computed from the land-cover raster, using class 1 (Tree Canopy) and class 4 (Water). Per tract, the pipeline counts canopy pixels and water pixels in zonal fashion, then computes a non-water canopy percentage and inverts it into a deficit:

canopy_pct   = canopy_pixels / (total_pixels − water_pixels)
canopy_raw   = 1 − canopy_pct
canopy_score = minmax(canopy_raw)

Higher score means greater canopy deficit and more need. Water is removed from the denominator so a tract is not penalized for open water it cannot plant.
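A short worked example of the deficit formula, using made-up pixel counts. Returning None for a tract with no non-water pixels is an assumption, since the source does not describe that case.

```python
def canopy_deficit(canopy_pixels: int, water_pixels: int, total_pixels: int):
    """Return 1 - canopy share of non-water pixels, or None if all water."""
    land_pixels = total_pixels - water_pixels
    if land_pixels <= 0:
        return None  # assumed handling: no plantable area to score
    return 1.0 - canopy_pixels / land_pixels


# Hypothetical tract: 1,100 pixels, of which 100 are water and 200 are canopy.
# canopy_pct = 200 / (1100 - 100) = 0.20, so the deficit is about 0.80.
deficit = canopy_deficit(canopy_pixels=200, water_pixels=100, total_pixels=1100)
```

Without the water correction the same tract would show a canopy share of 200 / 1100, about 0.18, and a slightly larger deficit.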

Pavement burden

Per tract, pavement burden is the fraction of tract area covered by mapped pavement, min-max normalized across the tract set:

pavement_raw   = (tract ∩ pavement_union).area / tract.area
pavement_score = minmax(pavement_raw)

Note that the NYC pavement-burden score uses the union of all classified pavement (core and non-core together), so it measures total mapped-pavement fraction per tract rather than non-core fraction. This is recorded under Known limitations.

7. Stacked composite

The four normalized scores are combined into one composite. NYC leaves the needs weights at their default, so each dimension contributes equally:

stacked_score = 0.25·heat + 0.25·flood + 0.25·canopy + 0.25·pavement

A tract is flagged as priority if its stacked_score is at or above the 75th percentile of stacked scores (the top quartile). The pipeline also records n_high_needs, the count of the four dimensions in which a tract is itself in the top quartile, plus a priority_rank. Implemented in process/stacked_needs.py.
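The composite and the priority flag can be sketched in plain Python. This is an illustration, not the code in process/stacked_needs.py. The percentile method (linear interpolation between order statistics), the inclusive threshold for the per-dimension count, and the toy tract values are assumptions.

```python
import math

DIMENSIONS = ("heat", "flood", "canopy", "pavement")


def percentile(values, q):
    """q-th percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def stack(tracts):
    """Add stacked_score, is_priority, and n_high_needs to each tract dict."""
    for t in tracts:
        t["stacked_score"] = sum(0.25 * t[d] for d in DIMENSIONS)
    cut = percentile([t["stacked_score"] for t in tracts], 75)
    dim_cuts = {d: percentile([t[d] for t in tracts], 75) for d in DIMENSIONS}
    for t in tracts:
        t["is_priority"] = t["stacked_score"] >= cut
        t["n_high_needs"] = sum(t[d] >= dim_cuts[d] for d in DIMENSIONS)
    return tracts


# Four hypothetical tracts with already-normalized scores.
tracts = stack([
    {"id": "A", "heat": 1.0, "flood": 1.0, "canopy": 1.0, "pavement": 1.0},
    {"id": "B", "heat": 0.5, "flood": 0.5, "canopy": 0.5, "pavement": 0.5},
    {"id": "C", "heat": 0.0, "flood": 0.0, "canopy": 0.0, "pavement": 0.0},
    {"id": "D", "heat": 0.2, "flood": 0.4, "canopy": 0.6, "pavement": 0.8},
])
priority_ids = [t["id"] for t in tracts if t["is_priority"]]
```

In this toy set only tract A clears the 75th-percentile cut, and it is also the only tract in the top quartile on all four dimensions.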

8. Equity overlay

The equity overlay (process/equity_overlay.py) compares the priority tracts against two designations.

NYS DAC — the New York State Disadvantaged Communities designation under the CLCPA, supplied at census-tract level.
EJNYC — environmental-justice areas. The official EJNYC dataset is not public, so this layer is derived from the NYC 2020 Census Tracts cdeligibil field: a tract is EJ-eligible when cdeligibil starts with "E".

A tract is flagged is_dac or is_ej when it overlaps the respective union by at least 1% of its area; is_dac_or_ej is the logical OR. The headline equity figure is the priority-and-DAC intersection.
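A minimal sketch of the flag logic, assuming the overlap fractions (share of each tract's area inside the DAC union and inside the EJ union) have already been computed. The function names and the sample values are hypothetical.

```python
MIN_OVERLAP = 0.01  # at least 1% of the tract's area


def is_ej_eligible(cdeligibil) -> bool:
    """EJ eligibility derived from the Census Tracts cdeligibil field."""
    return bool(cdeligibil) and cdeligibil.startswith("E")


def equity_flags(dac_overlap_frac: float, ej_overlap_frac: float) -> dict:
    """Flag a tract from its fractional overlap with each designation."""
    is_dac = dac_overlap_frac >= MIN_OVERLAP
    is_ej = ej_overlap_frac >= MIN_OVERLAP
    return {"is_dac": is_dac, "is_ej": is_ej, "is_dac_or_ej": is_dac or is_ej}


flags = equity_flags(dac_overlap_frac=1.0, ej_overlap_frac=0.005)
```

The low threshold means a tract counts as designated even when only a thin strip of it falls inside the designation's union.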

DAC saturation. In the East Harlem study area every one of the 23 tracts is DAC-designated. Within this study area the DAC overlay therefore does not differentiate tracts, and the priority-and-DAC count simply equals the priority count. Read "all priority tracts are DAC" in that light.

9. Known limitations

Heat is not a true spatial join. The HVI is ZCTA-level and supplied without geometry, and tracts have no ZCTA key. Tract heat scores are a seeded (np.random.seed(42)) pseudo-random normal spread around the study-area mean HVI, clipped to the HVI range. The values are deterministic but do not represent a real ZCTA-to-tract assignment. This is a placeholder pending a proper crosswalk.
Pavement burden uses total pavement. The pavement-need score is built from the union of all classified pavement (core plus non-core), so it reflects total mapped-pavement fraction per tract, not the non-core fraction alone.
DAC saturation. Every tract in the study area is DAC-designated (23 of 23), so the DAC overlay adds no within-area discrimination.
EJNYC is derived, not official. The official EJNYC dataset is not public; the EJ flag is reconstructed from the cdeligibil field of the NYC Census Tracts layer (cdeligibil='E').
Small-N normalization. Min-max scaling and quartile cuts run over 23 tracts. A single outlier can move scores noticeably. Read the results as relative rankings.
Pavement typing is area-thresholded. A polygon is typed road, sidewalk, or parking only when at least 50% of its area overlaps that planimetric layer. Narrow or fragmented overlaps fall through to other_impervious.
Mixed vintages. The land cover is from 2017 while the planimetrics and MapPLUTO are the latest releases. New construction can mis-join across that gap.
Class-based pavement cannot distinguish material. Land-cover classes 6, 7, and 8 separate roads, other impervious, and rail, but not asphalt from concrete.

10. Reproducibility

Full run from scratch

export DEPAVE_LOCATION=nyc
python3 scripts/run_pipeline.py --study-area east_harlem --stages all

The --stages flag accepts stage names (acquire, pavement, needs, equity, export), the NYC numeric stages (0 through 6), or all (the default). The --force flag re-downloads raw inputs where supported.

Outputs

data/nyc/interim/east_harlem/pavement_classified.gpkg: typed pavement polygons
data/nyc/interim/east_harlem/pavement_core_noncore.gpkg: polygons with core_status and core_reason
data/nyc/processed/east_harlem/stacked_needs.gpkg: scored tracts
data/nyc/processed/east_harlem/equity_overlay.gpkg: scored tracts plus is_dac, is_ej, is_priority
data/nyc/processed/east_harlem/summary_stats.json: headline numbers
data/nyc/processed/east_harlem/*.geojson: web-ready layers exported in Stage 6

The deploy step builds build/nyc/east_harlem/, which includes raster pavement tiles for crisp rendering.

11. Credits & citation

Depave NYC is produced by ONE Architecture & Urbanism, Inc. The methodology builds on the DepaveLA framework and the DepaveNYC pilot.

Data providers: NYC DoITT (land cover and planimetrics), NYC DEP (stormwater flood maps), NYC DCP (MapPLUTO), NYC DOHMH (Heat Vulnerability Index), NYSERDA and New York State (DAC under CLCPA), and the U.S. Census Bureau (tracts and NTAs).

Suggested citation: Depave NYC — East Harlem: A Screening Analysis of Non-Core Pavement and Environmental Need. ONE Architecture & Urbanism, 2026.