Detailed Methodology: Depave NYC — Jamaica, Queens
This document describes the New York City implementation of the Depave pipeline in enough detail for a GIS analyst to reproduce or extend the analysis. The study area here is Jamaica, Queens. The pipeline itself is shared with the East Harlem analysis; only the selected neighborhood and the headline numbers differ. Source code and configuration live in a single repository; paths below are relative to the repository root.
1. Scope & study area
The NYC analysis runs neighborhood by neighborhood. All spatial operations are carried out in EPSG:2263 (NAD83 / New York Long Island, US survey feet), the standard projected coordinate system for distance and area measurement across the five boroughs. Web exports are reprojected to EPSG:4326 for MapLibre consumption.
A study area is selected by Neighborhood Tabulation Area (NTA) codes from the 2020 vintage. The chosen NTAs are dissolved into a single polygon and buffered outward by 100 ft so that pavement straddling a neighborhood edge is captured. Jamaica is the union of four 2020 NTAs:
| NTA code | Name |
|---|---|
| QN1201 | Jamaica |
| QN1202 | South Jamaica |
| QN0804 | Jamaica Estates-Holliswood |
| QN0805 | Jamaica Hills-Briarwood |
Jamaica was the pilot study area for the DepaveNYC analysis and remains the default NYC location in the orchestrator.
For tract-level summaries, the 2020 NYC census tracts are loaded, reprojected to EPSG:2263, and clipped to the buffered study-area polygon. Any tract that retains less than 50% of its original area after clipping is dropped as an edge sliver. After this filter, 47 census tracts remain inside the Jamaica study area.
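The sliver filter reduces to an area ratio per tract. The following is a minimal sketch in plain Python, assuming each tract's original and clipped areas (in square feet) have already been measured. The function name, the dictionary layout, and the tract IDs are hypothetical; the pipeline itself presumably works on geometries rather than precomputed areas.

```python
def keep_tracts(areas, min_retained=0.5):
    """Return IDs of tracts that retain at least `min_retained` of their area.

    `areas` maps a tract ID to (original_area, clipped_area), both in sq ft.
    """
    kept = []
    for tract_id, (original, clipped) in areas.items():
        if original > 0 and clipped / original >= min_retained:
            kept.append(tract_id)
    return kept


# Illustrative areas only, not real Jamaica tracts.
example = {
    "tract_a": (1_000_000.0, 1_000_000.0),  # fully inside the study area
    "tract_b": (1_000_000.0, 620_000.0),    # 62% retained: kept
    "tract_c": (1_000_000.0, 180_000.0),    # 18% retained: dropped as a sliver
}
print(keep_tracts(example))
```

A tract that retains exactly 50% is kept here, since the text drops only tracts that retain less than 50%.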
Study-area and location selection is configuration-driven (config/locations/nyc.yaml, config/study_areas_nyc.yaml). The orchestrator dispatches on the DEPAVE_LOCATION environment variable, and a --study-area flag routes each neighborhood's outputs to its own subdirectory under data/nyc/interim/ and data/nyc/processed/, which is how Jamaica and East Harlem coexist in one repository.
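The output routing can be pictured as a small path-building step. This is a sketch only: `output_dirs` is a hypothetical helper, and falling back to `nyc` when DEPAVE_LOCATION is unset is an assumption rather than documented orchestrator behavior.

```python
import os
from pathlib import Path


def output_dirs(study_area, location=None, root=Path("data")):
    """Build the interim and processed directories for one study area."""
    # Assumption: read DEPAVE_LOCATION when no location is passed explicitly.
    location = location or os.environ.get("DEPAVE_LOCATION", "nyc")
    base = root / location
    return base / "interim" / study_area, base / "processed" / study_area


interim, processed = output_dirs("jamaica", location="nyc")
print(interim.as_posix())    # data/nyc/interim/jamaica
print(processed.as_posix())  # data/nyc/processed/jamaica
```

Because the study-area name is the last path component, a second neighborhood gets sibling directories and never overwrites Jamaica's outputs.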
2. Lineage: DepaveLA and DepaveNYC
The Depave methodology originates in DepaveLA's parcel-level impervious-cover analysis and was extended in the DepaveNYC pilot, of which Jamaica was the first study area. The NYC pipeline is the data-rich original: it consumes authoritative municipal GIS products where the Fort Lauderdale port relies on in-house classification and proxy layers. The most consequential differences are summarized below.
| Step | NYC (this pipeline) | Fort Lauderdale port |
|---|---|---|
| Pavement source | NYC Land Cover 2017 (6-inch, 8-class, expert-labeled raster) | NAIP 1 m 4-band imagery classified in-house by a random forest |
| Core/non-core cut | DoITT planimetric roadbed, sidewalk, and parking polygons, joined to MapPLUTO land use | FDOT measured widths plus OSM class-estimated widths |
| Flood layer | NYC DEP Stormwater Flood Maps (a published 2-D stormwater flood model) | DEM-derived topographic proxy |
| Heat layer | NYC DOHMH Heat Vulnerability Index (ZCTA) | NOAA/CAPA afternoon surface-temperature polygons |
| Canopy layer | Land Cover 2017 tree-canopy class (raster) | University of Miami lidar-derived canopy polygons |
| Equity overlay | NYS DAC (state) plus EJNYC (derived from Census) | CEJST federal Justice40 designation |
| Processing CRS | EPSG:2263 (NAD83 / NY Long Island, ftUS) | EPSG:2236 (NAD83 / FL East, ftUS) |
Downstream stages (needs scoring, stacked composite, equity overlay, web export) are shared between the two locations and run unmodified.
3. Data sources
All endpoints resolve at acquire time and are taken verbatim from config/data_sources_nyc.yaml (URLs verified against Socrata API metadata, March 2026). Every dataset is hosted on NYC Open Data except NYS DAC, which comes from data.ny.gov.
| Dataset | Resolution / vintage | Native CRS | Source |
|---|---|---|---|
| NYC Land Cover 2017 | 6-inch (0.5-ft), 8-class raster, 2017 | EPSG:2263 | NYC Open Data he6d-2qns (DoITT LiDAR land cover, ERDAS Imagine .img) |
| DoITT Planimetric Roadbed | Road-surface polygons, ~105k records | EPSG:4326 | NYC Open Data i36f-5ih7 |
| DoITT Planimetric Sidewalk | Sidewalk polygons, ~51k records | EPSG:4326 | NYC Open Data 52n9-sdep |
| DoITT Planimetric Parking Lots | Parking-lot polygons | EPSG:4326 | NYC Open Data 7cgt-uhhz |
| MapPLUTO | Tax-lot polygons, latest release | EPSG:2263 | NYC DCP MapPLUTO |
| NYC DEP Stormwater Flood Maps | 3 rain scenarios, polygon (FileGDB) | EPSG:2263 | NYC Open Data 9i7c-xyvv (NYC DEP) |
| DOHMH Heat Vulnerability Index | ZCTA-level, 1–5 scale (CSV) | tabular; ZCTA 2020 | NYC Open Data 4mhf-duep (NYC DOHMH) |
| NYS DAC (CLCPA) | Tract-level designation | EPSG:4326 | data.ny.gov 2e6c-s6fp |
| EJNYC (EJ areas, derived) | Tract-level, derived from cdeligibil='E' | EPSG:4326 | Derived from NYC Census Tracts 63ge-mke6 |
| 2020 Census Tracts | Tract polygons, 2020 | EPSG:4326 | NYC Open Data 63ge-mke6 |
| NTAs 2020 | Neighborhood Tabulation Areas, 2020 | EPSG:4326 | NYC Open Data 9nt8-h7nd |
4. Pavement extraction (NYC Land Cover)
Pavement comes straight from the city's expert-labeled land-cover product, so there is no machine-learning classifier in the NYC pipeline. The input is the NYC Land Cover 2017 raster, an 8-class layer at 6-inch (0.5-ft) resolution derived from LiDAR and high-resolution imagery. The eight classes are:
1 Tree Canopy 2 Grass / Shrub 3 Bare Earth 4 Water
5 Buildings 6 Roads 7 Other Impervious 8 Railroads
Pavement is defined as classes 6, 7, and 8 (roads, other impervious, railroads). The extraction stage (process/pavement.py) proceeds as follows:
- Window read & mask. The raster is read in 5,000-pixel tiles to cap peak memory, and each tile is masked to the three pavement classes.
- Vectorize. Masked pixels are polygonized with rasterio.features.shapes, preserving the source land-cover class value on each polygon.
- Simplify & clean. Geometry is simplified with a 1.0 ft tolerance and polygons smaller than 1.0 sq ft are dropped as slivers.
- Clip. The result is clipped to the buffered study-area polygon.
- Type by planimetric overlay. Each pavement polygon is assigned a pavement type by spatial overlay with the DoITT planimetric layers (see §5).
- Join to MapPLUTO. Each polygon is joined to the underlying tax lot by a representative-point spatial join, pulling bbl, landuse, bldgclass, and lotarea from MapPLUTO.
For Jamaica this extraction produced roughly 34,256 individual pavement polygons across the five pavement types.
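The windowed read and class mask in the first extraction step can be sketched without any GIS dependency. The helper names below are hypothetical and the raster dimensions are made up. In a rasterio-based workflow each tuple would correspond to a `rasterio.windows.Window(col_off, row_off, width, height)`, but that mapping is an assumption about process/pavement.py, not a description of it.

```python
PAVEMENT_CLASSES = {6, 7, 8}  # roads, other impervious, railroads
TILE = 5000                   # tile edge in pixels


def tile_windows(width, height, tile=TILE):
    """Yield (col_off, row_off, win_width, win_height) tiles covering a raster."""
    for row_off in range(0, height, tile):
        for col_off in range(0, width, tile):
            # Edge tiles are truncated so they never run past the raster.
            yield (col_off, row_off,
                   min(tile, width - col_off),
                   min(tile, height - row_off))


def pavement_mask(rows):
    """Return a boolean grid that is True where a pixel is a pavement class."""
    return [[value in PAVEMENT_CLASSES for value in row] for row in rows]


windows = list(tile_windows(12000, 7000))
print(len(windows))   # 3 columns x 2 rows = 6 tiles
print(windows[-1])    # (10000, 5000, 2000, 2000)
print(pavement_mask([[1, 6, 7], [5, 8, 2]]))
```

Only one tile is held in memory at a time, which is what caps peak memory on a 6-inch raster.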
5. Core vs non-core classification
"Core" pavement is the pavement a city needs for movement: travel lanes, public sidewalks, and rail. "Non-core" is the pool of depave candidates. The classification is implemented in process/core_noncore.py; the default status is non-core, and core is asserted only where a rule fires.
Pavement typing
Before the core split, each vectorized pavement polygon is typed by overlay with the DoITT planimetric layers. A polygon is assigned a type only if at least 50% of its area overlaps the relevant planimetric polygon. Typing runs in this precedence, with later rules overwriting earlier ones:
- railroad (from land-cover class 8)
- parking_lot (planimetric parking overlay)
- sidewalk (planimetric sidewalk overlay)
- road (planimetric roadbed overlay)
- remainder → other_impervious
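One way to read the precedence list is sketched below. It takes "later rules overwriting earlier ones" literally, so a polygon that clears the 50% threshold for several layers keeps the last match, and a polygon that matches nothing becomes other_impervious. The function name and the `overlap` dictionary are hypothetical; the overlap fractions are assumed to be precomputed.

```python
OVERLAP_THRESHOLD = 0.5
# Earliest rule first; a later match overwrites an earlier one.
OVERLAY_ORDER = ("parking_lot", "sidewalk", "road")


def pavement_type(landcover_class, overlap):
    """Assign a pavement type to one polygon.

    `overlap` maps a planimetric layer name to the fraction of the polygon's
    area that the layer covers (0 to 1).
    """
    assigned = "railroad" if landcover_class == 8 else None
    for layer in OVERLAY_ORDER:
        if overlap.get(layer, 0.0) >= OVERLAP_THRESHOLD:
            assigned = layer
    return assigned or "other_impervious"


print(pavement_type(6, {"road": 0.9}))         # road
print(pavement_type(7, {"parking_lot": 0.7}))  # parking_lot
print(pavement_type(7, {"sidewalk": 0.3}))     # other_impervious
print(pavement_type(8, {}))                    # railroad
```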
Core rules
- Roads → core.
- Railroads → core.
- Right-of-way sidewalks → core: a sidewalk whose joined bbl is null (not on a private lot) or whose PLUTO land use is 07 (transportation / utility).
- Transportation / utility parcels (PLUTO land use 07) → core.
Non-core rules
- Parking lots from the planimetric parking layer. Parking is a depave candidate, which matches the Fort Lauderdale treatment and differs from the original DepaveLA, where parking was core.
- Vacant-lot pavement where the PLUTO building class starts with V.
- Parking-coded parcels where the PLUTO building class is G6 or G7.
- Commercial / industrial excess where the PLUTO land use is 05 or 06.
- Interior sidewalks: a sidewalk sitting on a private parcel and not already cored.
- Everything else falls through to other_impervious, which is non-core.
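The rules above can be sketched as a single function over one typed polygon. Several details are assumptions: the status and reason strings are invented for illustration, the core checks are applied before any non-core reason is recorded, the land-use 07 rule is applied to every pavement type, and the order of the non-core reasons is a guess. The actual logic in process/core_noncore.py may differ.

```python
def classify(ptype, bbl=None, landuse=None, bldgclass=None):
    """Return (core_status, core_reason) for one typed pavement polygon."""
    bldgclass = bldgclass or ""
    # Core rules: any one of these firing makes the polygon core.
    if ptype == "road":
        return "core", "road"
    if ptype == "railroad":
        return "core", "railroad"
    if landuse == "07":
        return "core", "transportation_utility_parcel"
    if ptype == "sidewalk" and bbl is None:
        return "core", "row_sidewalk"
    # Non-core is the default status, so these branches only record why.
    if ptype == "parking_lot":
        return "non_core", "parking_lot"
    if bldgclass.startswith("V"):
        return "non_core", "vacant_lot"
    if bldgclass in ("G6", "G7"):
        return "non_core", "parking_parcel"
    if landuse in ("05", "06"):
        return "non_core", "commercial_industrial"
    if ptype == "sidewalk":
        return "non_core", "interior_sidewalk"
    return "non_core", "other_impervious"


print(classify("sidewalk"))                                # right-of-way sidewalk
print(classify("sidewalk", bbl="lot-1", landuse="01"))     # interior sidewalk
print(classify("other_impervious", bbl="lot-2", bldgclass="G7"))
```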
6. Needs scoring
All four needs layers score the clipped census-tract set (47 tracts in Jamaica). Each layer produces a *_raw column and a min-max-normalized *_score column in [0, 1], where higher means more need. Normalization is independent per layer.
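Min-max normalization is short enough to show in full. The sketch below is plain Python, and the handling of a constant layer (all tracts equal) is an assumption, since the text does not say what the pipeline does in that case.

```python
def minmax(values):
    """Rescale values to [0, 1]; the largest raw value maps to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant layer carries no ranking information, so score it 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


heat_raw = [2.0, 3.0, 5.0]  # illustrative values only
print(minmax(heat_raw))     # first value maps to 0.0, last to 1.0
```

Because the minimum and maximum come from the tract set itself, a score is only meaningful relative to the other tracts in the same study area.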
Heat — Heat Vulnerability Index
The heat input is the NYC DOHMH Heat Vulnerability Index (HVI), a 1–5 composite index published at ZCTA (2020 ZIP Code Tabulation Area) resolution.
Important caveat on the heat score. The HVI is delivered as a ZCTA-level table without any ZCTA geometry, and census tracts carry no ZCTA key. The current pipeline therefore does not perform a true ZCTA-to-tract spatial join. Instead it assigns each tract a heat value drawn from a seeded pseudo-random normal distribution (np.random.seed(42)) centered on the study-area mean HVI and clipped to the observed HVI range. The result is deterministic but does not reflect real tract-level heat. Treat the NYC heat score as a placeholder pending a proper ZCTA→tract crosswalk. This is the single biggest limitation of the NYC method and is listed again under Known limitations.
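To make the placeholder concrete, the sketch below reproduces its general shape with NumPy. The standard deviation (`spread`) is not stated in the text, so 0.5 is a made-up value, and the function name and input values are hypothetical. It illustrates why the output is repeatable yet unrelated to any tract's actual location.

```python
import numpy as np


def placeholder_heat(n_tracts, hvi_values, spread=0.5):
    """Draw one seeded pseudo-random heat value per tract.

    `spread` is the standard deviation of the draw; the source does not state
    it, so 0.5 is a made-up value for illustration.
    """
    np.random.seed(42)  # fixed seed, so repeated runs give identical values
    hvi = np.asarray(hvi_values, dtype=float)
    draws = np.random.normal(loc=hvi.mean(), scale=spread, size=n_tracts)
    # Clip to the observed HVI range so no tract falls outside it.
    return np.clip(draws, hvi.min(), hvi.max())


heat = placeholder_heat(47, [3, 4, 4, 5])  # illustrative HVI values only
print(heat.min() >= 3, heat.max() <= 5)    # True True
```

Values are assigned by row order, not by geography, so two adjacent tracts can receive very different heat values.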
Stormwater flood risk — NYC DEP Stormwater Flood Maps
Where the Fort Lauderdale port has to derive flood risk from terrain, NYC uses a published flood map. The NYC DEP Stormwater Flood Maps are a 2-D stormwater flood-model product that accounts for drainage and pipe capacity. The dataset ships three rainfall scenarios:
- Extreme — 3.66 in/hr rainfall with 2080 sea-level rise.
- Moderate — 2.13 in/hr with 2050 sea-level rise and current conditions.
- Limited — 1.77 in/hr with current sea levels.
For each tract, the pipeline computes the fraction of tract area covered by each scenario's flood extent (pct_extreme, pct_moderate, pct_limited), then combines them with fixed weights and rescales across the tract set:
flood_raw = 0.5 · pct_extreme + 0.3 · pct_moderate + 0.2 · pct_limited
flood_score = minmax(flood_raw)
The extreme scenario carries the most weight because it represents the most severe rainfall the model simulates. The 0.5 / 0.3 / 0.2 weighting and the min-max step are shared with the Fort Lauderdale pipeline; only the flood input differs.
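A small worked example of the weighting, in plain Python. The three tracts and their flooded-area fractions are invented for illustration, and the constant-input branch in `minmax` is an assumption.

```python
WEIGHTS = {"pct_extreme": 0.5, "pct_moderate": 0.3, "pct_limited": 0.2}


def flood_raw(tract):
    """Weighted sum of the three flooded-area fractions for one tract."""
    return sum(weight * tract[name] for name, weight in WEIGHTS.items())


def minmax(values):
    """Rescale values to [0, 1] across the tract set."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumption: constant input scores 0
    return [(v - lo) / (hi - lo) for v in values]


tracts = [
    {"pct_extreme": 0.20, "pct_moderate": 0.10, "pct_limited": 0.05},
    {"pct_extreme": 0.02, "pct_moderate": 0.00, "pct_limited": 0.00},
    {"pct_extreme": 0.10, "pct_moderate": 0.05, "pct_limited": 0.00},
]
raw = [flood_raw(t) for t in tracts]
scores = minmax(raw)
print([round(r, 3) for r in raw])     # [0.14, 0.01, 0.065]
print([round(s, 3) for s in scores])  # [1.0, 0.0, 0.423]
```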
Canopy deficit
Canopy deficit is computed from the land-cover raster, using class 1 (Tree Canopy) and class 4 (Water). Per tract, the pipeline counts canopy pixels and water pixels in zonal fashion, then computes a non-water canopy percentage and inverts it into a deficit:
canopy_pct = canopy_pixels / (total_pixels − water_pixels)
canopy_raw = 1 − canopy_pct
canopy_score = minmax(canopy_raw)
Higher score means greater canopy deficit and more need. Water is removed from the denominator so a tract is not penalized for open water it cannot plant.
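The effect of removing water from the denominator is easiest to see with numbers. The sketch below assumes the per-tract pixel counts are already available. The function name is hypothetical, the counts are invented, and raising an error for an all-water tract is an assumption.

```python
def canopy_deficit(total_pixels, canopy_pixels, water_pixels):
    """Return 1 minus the canopy share of a tract's non-water pixels."""
    land_pixels = total_pixels - water_pixels
    if land_pixels <= 0:
        raise ValueError("tract has no non-water pixels")
    return 1.0 - canopy_pixels / land_pixels


# Same canopy, with and without a quarter of the tract under water.
print(round(canopy_deficit(1_000_000, 150_000, 0), 3))        # 0.85
print(round(canopy_deficit(1_000_000, 150_000, 250_000), 3))  # 0.8
```

With the water excluded, the same canopy area counts for a larger share of plantable land, so the deficit falls from 0.85 to 0.8.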
Pavement burden
Per tract, pavement burden is the fraction of tract area covered by mapped pavement, min-max normalized across the tract set:
pavement_raw = (tract ∩ pavement_union).area / tract.area
pavement_score = minmax(pavement_raw)
Note that the NYC pavement-burden score uses the union of all classified pavement (core and non-core together), so it measures total mapped-pavement fraction per tract rather than non-core fraction. This is recorded under Known limitations.
7. Stacked composite
The four normalized scores are combined into one composite. NYC leaves the needs weights at their default, so each dimension contributes equally:
stacked_score = 0.25·heat + 0.25·flood + 0.25·canopy + 0.25·pavement
A tract is flagged as priority if its stacked_score is at or above the 75th percentile of stacked scores (the top quartile). The pipeline also records n_high_needs, the count of the four dimensions in which a tract is itself in the top quartile, plus a priority_rank. Implemented in process/stacked_needs.py.
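The composite and the priority flag can be sketched with the standard library. The percentile method is an assumption: `statistics.quantiles` with `method="inclusive"` interpolates linearly between data points, which may or may not match what process/stacked_needs.py uses. The `*_score` key names follow the column convention described earlier, and the five example tracts are invented.

```python
from statistics import quantiles

DIMS = ("heat_score", "flood_score", "canopy_score", "pavement_score")


def top_quartile_cut(values):
    """75th percentile with linear interpolation between data points."""
    return quantiles(values, n=4, method="inclusive")[2]


def stack(tracts, dims=DIMS):
    """Add stacked_score, is_priority, n_high_needs and priority_rank in place."""
    for t in tracts:
        t["stacked_score"] = sum(0.25 * t[d] for d in dims)
    cut = top_quartile_cut([t["stacked_score"] for t in tracts])
    dim_cuts = {d: top_quartile_cut([t[d] for t in tracts]) for d in dims}
    for t in tracts:
        t["is_priority"] = t["stacked_score"] >= cut
        # Count the dimensions in which this tract is itself in the top quartile.
        t["n_high_needs"] = sum(t[d] >= dim_cuts[d] for d in dims)
    ranked = sorted(tracts, key=lambda t: t["stacked_score"], reverse=True)
    for rank, t in enumerate(ranked, start=1):
        t["priority_rank"] = rank
    return tracts


example = [
    {"id": name, **{d: s for d in DIMS}}
    for name, s in [("A", 1.0), ("B", 0.75), ("C", 0.5), ("D", 0.25), ("E", 0.0)]
]
for t in stack(example):
    print(t["id"], t["stacked_score"], t["is_priority"], t["n_high_needs"], t["priority_rank"])
```

Since the test is "at or above" the cut, a tract sitting exactly on the 75th percentile is flagged, so slightly more than a quarter of tracts can be priority (two of five in this example).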
8. Equity overlay
The equity overlay (process/equity_overlay.py) compares the priority tracts against two real designations.
- NYS DAC: the New York State Disadvantaged Communities designation under the CLCPA, supplied at census-tract level.
- EJNYC: environmental-justice areas. The official EJNYC dataset is not public, so this layer is derived from the NYC 2020 Census Tracts cdeligibil field: a tract is EJ-eligible when cdeligibil starts with "E".
A tract is flagged is_dac or is_ej when it overlaps the respective union by at least 1% of its area; is_dac_or_ej is the logical OR. The headline equity figure is the priority-and-DAC intersection.
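The flagging rule is a simple area-ratio test. The sketch below assumes the overlap areas between a tract and each dissolved designation have already been computed. The function name and the example areas are hypothetical.

```python
MIN_OVERLAP = 0.01  # a tract must overlap a designation by at least 1% of its area


def equity_flags(tract_area, dac_overlap_area, ej_overlap_area):
    """Return is_dac, is_ej and is_dac_or_ej for one tract."""
    is_dac = dac_overlap_area / tract_area >= MIN_OVERLAP
    is_ej = ej_overlap_area / tract_area >= MIN_OVERLAP
    return {"is_dac": is_dac, "is_ej": is_ej, "is_dac_or_ej": is_dac or is_ej}


print(equity_flags(2_000_000, 1_500_000, 0))  # 75% DAC overlap, no EJ overlap
print(equity_flags(2_000_000, 10_000, 0))     # 0.5% overlap: below the floor
```

The 1% floor presumably exists so that a tract merely touching a designated area along a shared boundary is not flagged.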
DAC saturation. In the Jamaica study area every one of the 47 tracts is DAC-designated. Within this study area the DAC overlay therefore does not differentiate tracts, and the priority-and-DAC count simply equals the priority count. Read "all priority tracts are DAC" in that light.
9. Known limitations
- Heat is not a true spatial join. The HVI is ZCTA-level and supplied without geometry, and tracts have no ZCTA key. Tract heat scores are a seeded (np.random.seed(42)) pseudo-random normal spread around the study-area mean HVI, clipped to the HVI range. The values are deterministic but do not represent a real ZCTA-to-tract assignment. This is a placeholder pending a proper crosswalk.
- Pavement burden uses total pavement. The pavement-need score is built from the union of all classified pavement (core plus non-core), so it reflects total mapped-pavement fraction per tract, not the non-core fraction alone.
- DAC saturation. Every tract in the study area is DAC-designated (47 of 47), so the DAC overlay adds no within-area discrimination.
- EJNYC is derived, not official. The official EJNYC dataset is not public; the EJ flag is reconstructed from the Census Tracts cdeligibil field (values starting with "E").
- Small-N normalization. Min-max scaling and quartile cuts run over 47 tracts. A single outlier can move scores noticeably. Read the results as relative rankings.
- Pavement typing is area-thresholded. A polygon is typed road, sidewalk, or parking only when at least 50% of its area overlaps that planimetric layer. Narrow or fragmented overlaps fall through to other_impervious.
- Mixed vintages. The land cover is from 2017 while the planimetrics and MapPLUTO are the latest releases. New construction can mis-join across that gap.
- Class-based pavement cannot distinguish material. Land-cover classes 6, 7, and 8 separate roads, other impervious, and rail, but not asphalt from concrete.
10. Reproducibility
Full run from scratch
export DEPAVE_LOCATION=nyc
python3 scripts/run_pipeline.py --study-area jamaica --stages all
The --stages flag accepts the name set acquire, pavement, needs, equity, export; the NYC numeric set 0,1,2,3,4,5,6; or all (the default). --force re-downloads raw inputs where supported.
Outputs
- data/nyc/interim/jamaica/pavement_classified.gpkg: typed pavement polygons
- data/nyc/interim/jamaica/pavement_core_noncore.gpkg: polygons with core_status and core_reason
- data/nyc/processed/jamaica/stacked_needs.gpkg: scored tracts
- data/nyc/processed/jamaica/equity_overlay.gpkg: scored tracts plus is_dac, is_ej, is_priority
- data/nyc/processed/jamaica/summary_stats.json: headline numbers
- data/nyc/processed/jamaica/*.geojson: web-ready layers exported in Stage 6
The deploy step builds build/nyc/jamaica/, which includes raster pavement tiles for crisp rendering.
11. Credits & citation
Depave NYC is produced by ONE Architecture & Urbanism, Inc. The methodology builds on the DepaveLA framework and the DepaveNYC pilot, of which Jamaica was the first study area.
Data providers: NYC DoITT (land cover and planimetrics), NYC DEP (stormwater flood maps), NYC DCP (MapPLUTO), NYC DOHMH (Heat Vulnerability Index), NYSERDA and New York State (DAC under CLCPA), and the U.S. Census Bureau (tracts and NTAs).
Suggested citation: Depave NYC — Jamaica, Queens: A Screening Analysis of Non-Core Pavement and Environmental Need. ONE Architecture & Urbanism, 2026.