Detailed Methodology: Depave Medellín

This document describes the Medellín (El Poblado) implementation of the Depave pipeline in enough detail for a GIS analyst to reproduce or extend the analysis. Medellín is the international pilot. It runs entirely on global open datasets, with no municipal or national GIS inputs, and reuses the generic tiered pipeline that Fort Lauderdale and Bridgeport share. Paths below are relative to the repository root.

Contents
  1. Scope & study area
  2. How Medellín differs from the U.S. sites
  3. Data sources
  4. Pavement extraction (WorldCover minus buildings)
  5. Core vs non-core classification
  6. Stormwater proxy
  7. Needs scoring
  8. Stacked composite & priority
  9. Equity overlay (not available)
  10. Known limitations
  11. Reproducibility
  12. Credits & citation

Data tier 4 of 4: Global open data (transferability demonstration)
Medellín runs entirely on global open datasets: 10-meter ESA WorldCover, 30-meter Copernicus elevation, and OpenStreetMap, with no local equity layer and H3 hexagons in place of census tracts. Read it as a coarse, indicative demonstration that the method travels beyond U.S. municipal GIS, not a precise local analysis. Tiers reflect input-data availability, not effort. Scores are relative within each city, so the tiers and the numbers are not comparable across cities, and reliability decreases from Tier 1 to Tier 4.

1. Scope & study area

The analysis covers the El Poblado pilot area, a dense mixed-use neighborhood in southeast Medellín, Colombia. The study area is defined by a bounding box (config/study_areas_medellin.yaml, study area pilot), grown by a 100-foot edge buffer. All spatial operations are carried out in EPSG:3116 (MAGNA-SIRGAS / Colombia Bogotá zone, meters), which is suited to metric operations such as area, distance, and buffering across the study area. Web exports are reprojected to EPSG:4326 for MapLibre consumption.

There is no census-tract equivalent for Medellín in this pipeline. The tier resolver requires U.S. coverage for TIGER/Line tracts, which Medellín does not have, so study units fall back to an H3 hexagon grid at resolution 8 (roughly 0.74 km² per hexagon, comparable in area to a U.S. census tract). After generating and clipping the grid to the study area, 14 hexagons remain. These are uniform hexagons, not comunas or barrios. Hexagons whose clipped area falls below 50% of their original extent are dropped as edge slivers, but because the grid is generated to cover the study area, none were dropped here.
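The edge-sliver rule can be sketched in a few lines. This is a minimal illustration, assuming each hexagon carries its original and clipped areas; the function name `drop_edge_slivers` and the sample areas are hypothetical and are not taken from global_sources/h3_hexgrid.py.

```python
def drop_edge_slivers(hexes, min_fraction=0.5):
    """Keep hexagons whose clipped area is at least min_fraction of the original."""
    kept = []
    for hex_id, original_m2, clipped_m2 in hexes:
        if clipped_m2 / original_m2 >= min_fraction:
            kept.append(hex_id)
    return kept


# Hypothetical areas in square metres (a resolution-8 cell is roughly 0.74 km²).
sample = [
    ("hex_a", 740_000.0, 740_000.0),  # fully inside the study area
    ("hex_b", 740_000.0, 410_000.0),  # about 55% retained, kept
    ("hex_c", 740_000.0, 120_000.0),  # about 16% retained, dropped as a sliver
]
print(drop_edge_slivers(sample))
```

A hexagon that retains exactly 50% of its area is kept in this sketch, since the rule drops only those that fall below the threshold.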

Location and study-area selection is configuration-driven (config/locations/medellin.yaml, config/study_areas_medellin.yaml); the orchestrator dispatches on the DEPAVE_LOCATION environment variable. Because the location is not nyc, the run takes the generic tiered pipeline path, the same code that drives Fort Lauderdale and Bridgeport.

2. How Medellín differs from the U.S. sites

The Depave methodology originates in DepaveLA's parcel-level impervious-cover analysis and was extended in a Jamaica, Queens pilot (DepaveNYC), then adapted for Fort Lauderdale and Bridgeport on U.S. municipal data. Medellín has no municipal or national GIS available in this pipeline, so every input resolves to a global open dataset through the tier system. This is the coarsest and most fallback-driven configuration of all the sites.

Step | U.S. sites (e.g. Fort Lauderdale) | Medellín (El Poblado)
--- | --- | ---
Pavement source | NAIP 1 m 4-band imagery classified by an in-house random forest | ESA WorldCover 10 m built-up (class 50) minus Microsoft Global Buildings; no classifier (classify stage skipped)
Core/non-core cut | FDOT surveyed widths (state roads) + OSM class widths + OSM sidewalks | OSM only: highway class-estimated widths + OSM sidewalks + OSM parking
Flood layer | 3-component DEM proxy: depression + flow accumulation + inverse SSURGO soil permeability on a lidar DTM | 2-component DEM-only proxy: depression + flow accumulation on the Copernicus GLO-30 30 m DEM (no soil term)
Heat layer | Local mobile-traverse surface-temperature polygons | Landsat 9 summer land-surface-temperature raster, zonal mean per hexagon
Canopy layer | Local lidar-derived canopy polygons | ESA WorldCover tree class (class 10) raster
Study units | TIGER/Line census tracts | H3 hexagon grid, resolution 8 (14 hexagons)
Equity overlay | CEJST federal disadvantaged-community screen | None available; equity stage skipped
Processing CRS | EPSG:2236 (NAD83 / FL East, ftUS) | EPSG:3116 (MAGNA-SIRGAS / Colombia Bogotá zone, meters)

Downstream stages (needs scoring, stacked composite, web export) are shared between locations and unmodified. The classify and equity stages are both skipped for Medellín.

3. Data sources

All inputs are global open datasets resolved at acquire time through the tier system (config/tiers/default.yaml; the resolved tiers are written to interim/resolved_tiers.json). Native CRS values below are read from the on-disk rasters or the acquire modules. Vintages are taken from the acquire-module docstrings; they are not stamped in the Medellín raster band metadata, so treat them as nominal.

Dataset | Resolution / vintage | Native CRS | Source
--- | --- | --- | ---
ESA WorldCover (land cover, built class 50) | 10 m nominal; v200, 2021 | EPSG:4326 | Planetary Computer STAC collection esa-worldcover; built-up class 50
ESA WorldCover tree class (canopy) | 10 m; 2021 | EPSG:4326 | Same raster, tree class 10 (global_sources/worldcover_canopy.py)
Microsoft Global Buildings | Polygon / rolling | EPSG:4326 | Microsoft Global Building Footprints, quadkey-indexed
OpenStreetMap (roads, sidewalks, parks) | Current snapshot | EPSG:4326 | OSM extract: highway ways, footway=sidewalk, land use
Copernicus GLO-30 DEM | 30 m; global | EPSG:4326 (warped to EPSG:3116) | Planetary Computer STAC collection cop-dem-glo-30
Landsat 9 LST (summer thermal) | ~30 m; Jun–Aug median composite | EPSG:4326 (warped to EPSG:3116) | Planetary Computer STAC landsat-c2-l2, ST_B10 band → Celsius
H3 hexgrid (study units) | Resolution 8 (~0.74 km²/hex), 14 hexagons | generated in EPSG:4326 → EPSG:3116 | h3 library polyfill (global_sources/h3_hexgrid.py)

The processing CRS for all metric operations is EPSG:3116 (MAGNA-SIRGAS / Colombia Bogotá zone, meters). Web export is EPSG:4326. There is no equity/disadvantaged-community dataset in this list, because none with the required coverage exists for Medellín in this pipeline.

4. Pavement extraction (WorldCover minus buildings)

There is no per-pixel classifier for Medellín. The classify stage is gated on use_naip_classification in config/locations/medellin.yaml, which is set to false, so the stage is skipped and pavement comes straight from land cover. Extraction is implemented in src/depave/process/pavement_ftl.py:

  1. Land cover. ESA WorldCover at 10 m. Pixels equal to the built-up class (class 50) are polygonized. Built-up captures all sealed surfaces, including rooftops and roads.
  2. Subtract buildings. Microsoft Global Buildings footprints are subtracted from the built-up polygons. Because built-up includes rooftops, removing footprints isolates the ground-level paved surface.
  3. Clean. A buffer(0) repair, explode to single parts, simplify, and drop slivers below a minimum-area threshold.

No airport or park exclusion step runs for Medellín. That logic is specific to the U.S. sites and is not invoked by the generic extractor here.

The simplify tolerance and minimum-area threshold are named in feet and square feet in the code, but they are applied in the processing CRS, which for Medellín is meters. So the nominal "1.0 ft" simplify acts as 1 m, and the "100 sqft" sliver cut acts as roughly 100 m². This affects only sliver cleanup, not the reported acreage, which is computed from true EPSG:3116 areas.
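The size of this unit mismatch is easy to check by hand. The sketch below compares what the thresholds would be if converted to metres with what they become when the bare numbers are read as metres; the variable names are illustrative and are not the identifiers used in pavement_ftl.py.

```python
FT_TO_M = 0.3048  # international foot, exact by definition

# Thresholds as named in the code (feet and square feet).
simplify_ft = 1.0
min_area_sqft = 100.0

# What those values would be if they were converted to metres first.
intended_simplify_m = simplify_ft * FT_TO_M
intended_min_area_m2 = min_area_sqft * FT_TO_M ** 2

# What happens in EPSG:3116: the bare numbers are interpreted as metres.
effective_simplify_m = simplify_ft
effective_min_area_m2 = min_area_sqft

inflation = effective_min_area_m2 / intended_min_area_m2
print(f"simplify: intended {intended_simplify_m:.4f} m, effective {effective_simplify_m:.1f} m")
print(f"min area: intended {intended_min_area_m2:.2f} m², effective {effective_min_area_m2:.0f} m²")
print(f"area threshold inflation: {inflation:.2f}x")
```

Under these assumptions the sliver cut removes polygons nearly 11 times larger than the nominal square-foot threshold suggests.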

5. Core vs non-core classification

No surveyed road widths or planimetric roadbed exist for Medellín, so the core mask is built entirely from OpenStreetMap in classify_core_noncore_ftl (pavement_ftl.py). "Core" pavement is what the city needs: travel lanes, sidewalks, and formal parking. "Non-core" is the pool of depave candidates.

Core mask construction

The core mask is the union of three OSM-derived components:

Each pavement polygon is then cut by the core mask: the intersection is labeled core, the difference is labeled non-core. The result is re-exploded and slivers are dropped.

Core-mask units and parking. The buffer half-widths above are authored in feet and scaled to the processing CRS unit before they are applied, so they are correct in Medellín's metre CRS (EPSG:3116). OSM parking polygons are treated as non-core, the same as the other cities. OSM completeness in El Poblado is now the main limit on the core/non-core split.
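The feet-to-CRS scaling can be illustrated as follows. This is a sketch only: the helper `to_crs_units`, the unit labels, and the two half-width values are hypothetical stand-ins, since the per-class widths are not reproduced in this document.

```python
FT_TO_M = 0.3048  # international foot, exact by definition


def to_crs_units(value_ft, crs_linear_unit):
    """Convert a length authored in feet to the linear unit of the processing CRS."""
    if crs_linear_unit == "metre":
        return value_ft * FT_TO_M
    if crs_linear_unit == "foot":
        return value_ft
    raise ValueError(f"unsupported CRS unit: {crs_linear_unit}")


# Hypothetical half-widths in feet, for illustration only.
half_widths_ft = {"primary": 24.0, "residential": 12.0}

# EPSG:3116 is a metre CRS, so every width is scaled before buffering.
half_widths_m = {k: to_crs_units(v, "metre") for k, v in half_widths_ft.items()}
print(half_widths_m)
```

Scaling at the point of use keeps a single feet-based configuration valid for both the U.S. sites and a metre-based CRS.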

6. Stormwater proxy

The flood-risk layer is a topographic proxy derived from the Copernicus GLO-30 DEM (30 m), implemented in src/depave/process/stormwater_proxy.py. No soil-permeability data (SSURGO) exists for Medellín, so the pipeline runs the 2-component DEM-only variant rather than the 3-component variant the U.S. sites use. Pipeline:

  1. Clip the Copernicus DEM (30 m pixels) to the study area and mask nodata.
  2. Gaussian pre-smoothing (σ = 1.5 px) to suppress micro-cliffs that would otherwise read as false sink edges.
  3. Priority-flood depression fill on the smoothed DEM.
  4. Depression depth = filled − original DEM, clipped to ≥ 0, measured against the un-smoothed original so real sinks stay visible.
  5. D8 flow accumulation on the filled DEM, log-transformed (log1p) because the distribution is heavy-tailed.
  6. Each field is percentile-rank-normalized to [0, 1], then combined: score = 0.6 × depression + 0.4 × flow_acc. Depression carries more weight because surface ponding is the most direct pluvial signal.
  7. Tier breaks at the 50th / 75th / 90th percentile of non-zero composite pixels produce limited / moderate / extreme classes. These are within-area percentiles, so a fixed share of the wettest pixels always lands in each tier. They rank relative wetness inside the pilot area; they do not express an absolute flood depth or return period.
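Steps 3, 4, and 6 above can be sketched on a toy grid in pure Python. This is not the code in stormwater_proxy.py: the smoothing, D8 routing, and tier breaks are omitted, the flow-accumulation values are hypothetical, and the tie handling in `percentile_rank` is an assumption. Because nothing is smoothed here, the same DEM serves as both the fill input and the reference for depth.

```python
import heapq
import math


def priority_flood_fill(dem):
    """Raise each cell to the lowest spill level reachable from the grid edge."""
    rows, cols = len(dem), len(dem[0])
    filled = [row[:] for row in dem]
    seen = [[False] * cols for _ in range(rows)]
    heap = []
    # Seed the priority queue with every border cell.
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                heapq.heappush(heap, (dem[r][c], r, c))
                seen[r][c] = True
    # Always expand from the lowest known cell; neighbours cannot drain below it.
    while heap:
        level, r, c = heapq.heappop(heap)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols and not seen[nr][nc]:
                    seen[nr][nc] = True
                    filled[nr][nc] = max(dem[nr][nc], level)
                    heapq.heappush(heap, (filled[nr][nc], nr, nc))
    return filled


def percentile_rank(values):
    """Rank-normalize to [0, 1]; tied values share their lowest rank (an assumption)."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return [0.0]
    first_index = {}
    for i, v in enumerate(ordered):
        first_index.setdefault(v, i)
    return [first_index[v] / (len(ordered) - 1) for v in values]


def composite(depth, flow_acc, w_dep=0.6, w_flow=0.4):
    """Weighted sum of rank-normalized depression depth and log1p flow accumulation."""
    dep_rank = percentile_rank(depth)
    flow_rank = percentile_rank([math.log1p(a) for a in flow_acc])
    return [w_dep * d + w_flow * f for d, f in zip(dep_rank, flow_rank)]


# Toy DEM: a pit in the centre, with the lowest rim cell (4.0) on the east edge.
dem = [
    [5.0, 5.0, 5.0, 5.0, 5.0],
    [5.0, 3.0, 3.0, 3.0, 5.0],
    [5.0, 3.0, 1.0, 3.0, 5.0],
    [5.0, 3.0, 3.0, 3.0, 4.0],
    [5.0, 5.0, 5.0, 5.0, 5.0],
]
filled = priority_flood_fill(dem)
depth = [[max(f - d, 0.0) for f, d in zip(frow, drow)] for frow, drow in zip(filled, dem)]

# Four hypothetical pixels: depression depth and upstream cell counts.
scores = composite([0.0, 0.0, 1.0, 3.0], [0, 3, 1, 20])
print(depth[2][2], scores)
```

In the toy grid the basin fills to the 4.0 spill level, so the pit has a depth of 3.0 and the surrounding ring a depth of 1.0.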

This layer is a relative ranking of rainfall-ponding potential, built from terrain alone at 30 m. It omits rainfall depth and intensity, storm-drain capacity, pipe networks, tides, and groundwater, all of which shape real pluvial flooding. D8 flow paths degrade on coarse grids. It has not been validated against observed flooding in Medellín. Read the tiers as a screen for where to look first, not a forecast of where water will stand.

7. Needs scoring

All four needs layers score the 14 H3 hexagons. Each returns a raw value and a min-max-normalized score in [0, 1], where higher means more need. Because Medellín has no local polygon layers for heat or canopy, both fall back to raster zonal statistics. This is the expected international behavior, not a failure.

Heat

From the Landsat 9 summer land-surface-temperature raster (raw/landsat_lst_celsius.tif, ST_B10 converted to Celsius, summer median composite with cloud masking). Each hexagon's score is its zonal mean LST, min-max normalized so hotter hexagons score higher.
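For orientation, the ST_B10 conversion and the zonal mean can be sketched as below. The scale factor, offset, and fill value are the published USGS Collection 2 Level-2 constants; whether the acquire module applies them in exactly this form is an assumption, and the digital numbers and hexagon names are hypothetical.

```python
ST_SCALE = 0.00341802  # USGS Collection 2 Level-2 surface-temperature scale factor
ST_OFFSET = 149.0      # additive offset; scaled result is in Kelvin
KELVIN_TO_C = 273.15
FILL = 0               # fill value for pixels with no valid retrieval


def st_b10_to_celsius(dn):
    """Convert one ST_B10 digital number to degrees Celsius, or None for fill."""
    if dn == FILL:
        return None
    return dn * ST_SCALE + ST_OFFSET - KELVIN_TO_C


def zonal_mean(values):
    """Mean of the valid pixels in a zone, ignoring fill (None)."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


# Hypothetical digital numbers for the pixels falling inside two hexagons.
hex_pixels = {
    "hex_a": [44177, 44500, 0],
    "hex_b": [45200, 45350, 45100],
}
hex_lst = {
    h: zonal_mean([st_b10_to_celsius(dn) for dn in dns])
    for h, dns in hex_pixels.items()
}
print(hex_lst)
```

The per-hexagon means produced this way are the raw heat values that are then min-max normalized.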

Stormwater flood risk

From the DEM proxy in section 6. Per-hexagon flood score is the weighted sum of tier coverage within the hexagon, then rescaled across the 14 hexagons.

Canopy deficit

From the ESA WorldCover tree class (class 10), rasterized to a binary tree mask. Per hexagon, canopy_pct is the fraction of tree pixels; the deficit score is 1 − canopy_pct, min-max normalized, so a hexagon with little canopy scores higher.
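A minimal sketch of the raw canopy calculation for one hexagon, assuming the WorldCover pixels inside it are already available as a flat list of class codes. The function name and sample pixels are hypothetical, and the min-max step across hexagons is left out for brevity.

```python
TREE_CLASS = 10  # ESA WorldCover "Tree cover"


def canopy_deficit(landcover_pixels):
    """Return (canopy_pct, raw deficit) for the pixels inside one hexagon."""
    tree_pixels = sum(1 for p in landcover_pixels if p == TREE_CLASS)
    canopy_pct = tree_pixels / len(landcover_pixels)
    return canopy_pct, 1.0 - canopy_pct


# Hypothetical hexagon: 2 tree pixels, 5 built-up (50), 1 grassland (30).
pixels = [10, 50, 50, 10, 30, 50, 50, 50]
canopy_pct, deficit = canopy_deficit(pixels)
print(canopy_pct, deficit)
```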

Pavement burden

Per-hexagon non-core pavement fraction = non-core pavement area within the hexagon divided by hexagon area, min-max normalized.
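The fraction and its normalization can be sketched as follows. The areas are hypothetical, and mapping a constant input to 0 in `min_max` is an assumption about an edge case the text does not cover.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant input maps to 0 (an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical (hexagon area, non-core pavement area) pairs in square metres.
hexes = {
    "hex_a": (740_000.0, 37_000.0),
    "hex_b": (740_000.0, 148_000.0),
    "hex_c": (740_000.0, 74_000.0),
}

fractions = {h: noncore / area for h, (area, noncore) in hexes.items()}
scores = dict(zip(fractions, min_max(list(fractions.values()))))
print(fractions)
print(scores)
```

Because the score is min-max normalized, it says which hexagons carry the most non-core pavement relative to the others in the pilot area, not how much pavement is present in absolute terms.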

8. Stacked composite & priority

The composite is the equal-weight mean of the four normalized scores. Medellín's location YAML defines no custom weights, so each of the four contributes 0.25:

stacked_score = mean(heat_score, flood_score, canopy_score, pavement_score)

Priority hexagons are those at or above the 75th percentile of stacked_score (top quartile). With 14 hexagons, this flags 4 priority hexagons. A second column, n_high_needs, counts the dimensions (out of four) in which a hexagon falls in the top quartile. Implemented in stacked_needs.py. Across the 14 hexagons, observed stacked scores ranged from about 0.01 to 0.83.
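The composite and the priority flag can be sketched without any GIS dependencies. This is an illustration rather than the code in stacked_needs.py: the percentile uses linear interpolation (the rule NumPy applies by default), the per-dimension cut for n_high_needs is assumed to be each dimension's own 75th percentile, and the 14 score rows are synthetic.

```python
DIMS = ("heat", "flood", "canopy", "pavement")


def percentile(values, q):
    """Linear-interpolated percentile of a list, with q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def stack(rows):
    """Add stacked_score, priority, and n_high_needs to each hexagon row."""
    for row in rows:
        row["stacked_score"] = sum(row[d] for d in DIMS) / len(DIMS)
    cut = percentile([r["stacked_score"] for r in rows], 75)
    dim_cuts = {d: percentile([r[d] for r in rows], 75) for d in DIMS}
    for row in rows:
        row["priority"] = row["stacked_score"] >= cut
        row["n_high_needs"] = sum(row[d] >= dim_cuts[d] for d in DIMS)
    return rows


# Synthetic scores for 14 hexagons, each already normalized to [0, 1].
rows = [
    {
        "heat": i / 13,
        "flood": ((i * 5) % 14) / 13,
        "canopy": ((i * 3) % 14) / 13,
        "pavement": i / 13,
    }
    for i in range(14)
]
stack(rows)
print(sum(r["priority"] for r in rows))
```

With 14 distinct scores the interpolated 75th percentile falls between the 10th and 11th ranked values, which is why four hexagons clear it.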

9. Equity overlay (not available)

There is no equity overlay for Medellín. The U.S. sites overlay a federal disadvantaged-community screen (CEJST), gated on a U.S.-coverage dataset. No comparable disadvantaged-community or environmental-justice dataset is available for Medellín in this pipeline, so the equity stage is skipped. The scored hexagons carry no is_dac or is_ej field, and no "priority meets disadvantage" finding is reported. The priority hexagons reflect environmental need alone.

One consequence: the per-site summary_stats.json file is written only inside the equity stage, so it is not emitted for Medellín. The headline figures on these pages are computed directly from the processed GeoJSON outputs in EPSG:3116 rather than read from a stats file.

10. Known limitations

11. Reproducibility

Full run from scratch

export DEPAVE_LOCATION=medellin
python3 scripts/run_pipeline.py \
    --study-area pilot \
    --stages all

Stage names: study_area, acquire, classify, pavement, needs, equity, export. The classify stage is skipped (use_naip_classification: false) and the equity stage is skipped (no equity dataset). The default study area for non-NYC locations is pilot, so --study-area can be omitted. Any comma-separated subset of stages is accepted.

# Re-run only the analysis stages from cached raw inputs:
DEPAVE_LOCATION=medellin python3 scripts/run_pipeline.py \
    --study-area pilot --stages pavement,needs,export
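The stage selection described above can be pictured with a short sketch. It is not the logic in scripts/run_pipeline.py: the function `resolve_stages` and the `has_equity_dataset` flag are hypothetical, and only the stage names and the `use_naip_classification` key come from this document.

```python
ALL_STAGES = ["study_area", "acquire", "classify", "pavement", "needs", "equity", "export"]


def resolve_stages(stages_arg, use_naip_classification, has_equity_dataset):
    """Return the stages that would run, in pipeline order."""
    if stages_arg == "all":
        requested = list(ALL_STAGES)
    else:
        requested = [s.strip() for s in stages_arg.split(",")]
    unknown = [s for s in requested if s not in ALL_STAGES]
    if unknown:
        raise ValueError(f"unknown stages: {unknown}")
    skipped = set()
    if not use_naip_classification:
        skipped.add("classify")  # no per-pixel classifier for this location
    if not has_equity_dataset:
        skipped.add("equity")    # no equity dataset with the required coverage
    return [s for s in ALL_STAGES if s in requested and s not in skipped]


# Medellín: both optional stages are skipped.
print(resolve_stages("all", use_naip_classification=False, has_equity_dataset=False))
print(resolve_stages("pavement,needs,export", False, False))
```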

The web build (python3 scripts/build_deploy.py) reprojects the processed GeoJSON from EPSG:3116 to EPSG:4326 and builds raster tiles for crisp rendering.

12. Credits & citation

Depave Medellín is produced by ONE Architecture & Urbanism, Inc. It is the international pilot of the Depave framework, adapting the DepaveLA/DepaveNYC methodology to a city served entirely by global open data rather than municipal GIS.

Data providers: ESA WorldCover, Microsoft Global Buildings, OpenStreetMap contributors, Copernicus (GLO-30 DEM), USGS/NASA Landsat, and Microsoft Planetary Computer (data access). OpenStreetMap data are licensed under ODbL.

Suggested citation: Depave Medellín (El Poblado): An Open-Data Screening Pilot for Non-Core Pavement and Environmental Need. ONE Architecture & Urbanism, 2026.