Detailed Methodology: Depave Medellín

This document describes the Medellín (El Poblado) implementation of the Depave pipeline in enough detail for a GIS analyst to reproduce or extend the analysis. Medellín is the international pilot. It runs entirely on global open datasets, with no municipal or national GIS inputs, and reuses the generic tiered pipeline that Fort Lauderdale and Bridgeport share. Paths below are relative to the repository root.

Contents

1. Scope & study area
2. How Medellín differs from the U.S. sites
3. Data sources
4. Pavement extraction (WorldCover minus buildings)
5. Core vs non-core classification
6. Stormwater proxy
7. Needs scoring
8. Stacked composite & priority
9. Equity overlay (not available)
10. Known limitations
11. Reproducibility
12. Credits & citation

Data tier 4 of 4: Global open data (transferability demonstration)

Medellín runs entirely on global open datasets: 10-meter ESA WorldCover, 30-meter Copernicus elevation, and OpenStreetMap, with no local equity layer and H3 hexagons in place of census tracts. Read it as a coarse, indicative demonstration that the method travels beyond U.S. municipal GIS, not a precise local analysis. Tiers reflect input-data availability, not effort. Scores are relative within each city, so neither the tiers nor the numbers are comparable across cities, and reliability decreases from Tier 1 to Tier 4.

1. Scope & study area

The analysis covers the El Poblado pilot area, a dense mixed-use neighborhood in southeast Medellín, Colombia. The study area is defined by a bounding box (config/study_areas_medellin.yaml, study area pilot), grown by a 100-foot edge buffer. All spatial operations are carried out in EPSG:3116 (MAGNA-SIRGAS / Colombia Bogotá zone, meters), which is suited to metric operations such as area, distance, and buffering across the study area. Web exports are reprojected to EPSG:4326 for MapLibre consumption.

There is no census-tract equivalent for Medellín in this pipeline. The tier resolver requires U.S. coverage for TIGER/Line tracts, which Medellín does not have, so study units fall back to an H3 hexagon grid at resolution 8 (roughly 0.74 km² per hexagon, comparable in area to a U.S. census tract). After generating and clipping the grid to the study area, 14 hexagons remain. These are uniform hexagons, not comunas or barrios. Hexagons whose clipped area falls below 50% of their original extent are dropped as edge slivers, but because the grid is generated to cover the study area, none were dropped here.
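The 50% edge-sliver rule described above reduces to a ratio test on each hexagon's area before and after clipping. The following is a minimal sketch under stated assumptions: the function name `keep_hexagons`, the tuple layout, and the sample areas are illustrative and are not taken from the repository.

```python
# Minimal sketch of the edge-sliver rule: keep a hexagon only if at least
# half of its original area survives clipping to the study area.
# The function name, tuple layout, and sample areas are illustrative assumptions.

MIN_CLIP_FRACTION = 0.5  # the 50% threshold described in the text


def keep_hexagons(hexagons, min_fraction=MIN_CLIP_FRACTION):
    """Return ids of hexagons whose clipped/original area ratio meets the threshold.

    `hexagons` is an iterable of (hex_id, original_area_m2, clipped_area_m2).
    """
    kept = []
    for hex_id, original_area, clipped_area in hexagons:
        if original_area <= 0:
            continue  # skip degenerate geometry rather than divide by zero
        if clipped_area / original_area >= min_fraction:
            kept.append(hex_id)
    return kept


sample = [
    ("hex_a", 740_000.0, 740_000.0),  # fully inside the study area
    ("hex_b", 740_000.0, 400_000.0),  # about 54% retained, kept
    ("hex_c", 740_000.0, 300_000.0),  # about 41% retained, dropped as a sliver
]
kept_ids = keep_hexagons(sample)
print(kept_ids)
```

In the Medellín run every hexagon passed this test, so the filter is a guard rather than an active step for this study area.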

Location and study-area selection is configuration-driven (config/locations/medellin.yaml, config/study_areas_medellin.yaml); the orchestrator dispatches on the DEPAVE_LOCATION environment variable. Because the location is not nyc, the run takes the generic tiered pipeline path, the same code that drives Fort Lauderdale and Bridgeport.
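The dispatch rule can be pictured as a single branch on the location name. This sketch is an assumption about the shape of that logic, not the orchestrator's actual code: `select_pipeline`, the returned labels, and the fallback to `medellin` when the variable is unset are all hypothetical.

```python
import os


def select_pipeline(location):
    """Return which pipeline path a location takes (labels are illustrative)."""
    # Only "nyc" takes the dedicated path; every other location, including
    # medellin, falls through to the generic tiered pipeline.
    return "nyc" if location.strip().lower() == "nyc" else "generic_tiered"


# Defaulting to "medellin" when the variable is unset is an assumption made
# for this sketch only.
location = os.environ.get("DEPAVE_LOCATION", "medellin")
print(location, "->", select_pipeline(location))
```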

2. How Medellín differs from the U.S. sites

The Depave methodology originates in DepaveLA's parcel-level impervious-cover analysis and was extended in a Jamaica, Queens pilot (DepaveNYC), then adapted for Fort Lauderdale and Bridgeport on U.S. municipal data. Medellín has no municipal or national GIS available in this pipeline, so every input resolves to a global open dataset through the tier system. This is the coarsest and most fallback-driven configuration of all the sites.

Step	U.S. sites (e.g. Fort Lauderdale)	Medellín (El Poblado)
Pavement source	NAIP 1 m 4-band imagery classified by an in-house random forest	ESA WorldCover 10 m built-up (class 50) minus Microsoft Global Buildings; no classifier (`classify` stage skipped)
Core/non-core cut	FDOT surveyed widths (state roads) + OSM class widths + OSM sidewalks	OSM only: highway class-estimated widths + OSM sidewalks; OSM parking is left non-core
Flood layer	3-component DEM proxy: depression + flow accumulation + inverse SSURGO soil permeability on a lidar DTM	2-component DEM-only proxy: depression + flow accumulation on the Copernicus GLO-30 30 m DEM (no soil term)
Heat layer	Local mobile-traverse surface-temperature polygons	Landsat 9 summer land-surface-temperature raster, zonal mean per hexagon
Canopy layer	Local lidar-derived canopy polygons	ESA WorldCover tree class (class 10) raster
Study units	TIGER/Line census tracts	H3 hexagon grid, resolution 8 (14 hexagons)
Equity overlay	CEJST federal disadvantaged-community screen	None available; equity stage skipped
Processing CRS	EPSG:2236 (NAD83 / FL East, ftUS)	EPSG:3116 (MAGNA-SIRGAS / Colombia Bogotá zone, meters)

Downstream stages (needs scoring, stacked composite, web export) are shared between locations and unmodified. The classify and equity stages are both skipped for Medellín.

3. Data sources

All inputs are global open datasets resolved at acquire time through the tier system (config/tiers/default.yaml; the resolved tiers are written to interim/resolved_tiers.json). Native CRS values below are read from the on-disk rasters or the acquire modules. Vintages are taken from the acquire-module docstrings; they are not stamped in the Medellín raster band metadata, so treat them as nominal.

Dataset	Resolution / vintage	Native CRS	Source
ESA WorldCover (land cover, built class 50)	10 m nominal; v200, 2021	EPSG:4326	Planetary Computer STAC collection `esa-worldcover`; built-up class 50
ESA WorldCover tree class (canopy)	10 m; 2021	EPSG:4326	Same raster, tree class 10 (`global_sources/worldcover_canopy.py`)
Microsoft Global Buildings	Polygon / rolling	EPSG:4326	Microsoft Global Building Footprints, quadkey-indexed
OpenStreetMap (roads, sidewalks, parks)	Current snapshot	EPSG:4326	OSM extract: highway ways, `footway=sidewalk`, land use
Copernicus GLO-30 DEM	30 m; global	EPSG:4326 (warped to EPSG:3116)	Planetary Computer STAC collection `cop-dem-glo-30`
Landsat 9 LST (summer thermal)	~30 m; Jun–Aug median composite	EPSG:4326 (warped to EPSG:3116)	Planetary Computer STAC `landsat-c2-l2`, ST_B10 band → Celsius
H3 hexgrid (study units)	Resolution 8 (~0.74 km²/hex), 14 hexagons	generated in EPSG:4326 → EPSG:3116	`h3` library polyfill (`global_sources/h3_hexgrid.py`)

The processing CRS for all metric operations is EPSG:3116 (MAGNA-SIRGAS / Colombia Bogotá zone, meters). Web export is EPSG:4326. There is no equity/disadvantaged-community dataset in this list, because none with the required coverage exists for Medellín in this pipeline.

4. Pavement extraction (WorldCover minus buildings)

There is no per-pixel classifier for Medellín. The classify stage is gated on use_naip_classification in config/locations/medellin.yaml, which is set to false, so the stage is skipped and pavement comes straight from land cover. Extraction is implemented in src/depave/process/pavement_ftl.py:

Land cover. ESA WorldCover at 10 m. Pixels equal to the built-up class (class 50) are polygonized. Built-up captures all sealed surfaces, including rooftops and roads.
Subtract buildings. Microsoft Global Buildings footprints are subtracted from the built-up polygons. Because built-up includes rooftops, removing footprints isolates the ground-level paved surface.
Clean. A buffer(0) repair, explode to single parts, simplify, and drop slivers below a minimum-area threshold.
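The subtraction logic in the steps above can be illustrated on a small grid. This is a raster simplification offered only as a sketch: the pipeline works on polygons (polygonize, subtract footprints, clean), whereas the example below treats building footprints as an already-rasterized boolean mask. The function name and the sample grids are assumptions.

```python
# Raster simplification of "built-up minus buildings". The real extractor
# operates on polygons; this grid version only illustrates the set logic.

BUILT_UP = 50               # ESA WorldCover built-up class
PIXEL_AREA_M2 = 10 * 10     # nominal 10 m WorldCover pixel
M2_PER_ACRE = 4046.8564224  # international acre


def pavement_mask(land_cover, building_mask):
    """True where a pixel is built-up and not covered by a building footprint."""
    return [
        [lc == BUILT_UP and not bld for lc, bld in zip(lc_row, bld_row)]
        for lc_row, bld_row in zip(land_cover, building_mask)
    ]


# Synthetic 3x3 example: 50 = built-up, 10 = tree cover, 30 = grassland.
land_cover = [
    [50, 50, 10],
    [50, 50, 50],
    [30, 50, 10],
]
buildings = [
    [True, False, False],
    [True, False, False],
    [False, False, False],
]

mask = pavement_mask(land_cover, buildings)
pavement_pixels = sum(cell for row in mask for cell in row)
pavement_acres = pavement_pixels * PIXEL_AREA_M2 / M2_PER_ACRE
print(pavement_pixels, round(pavement_acres, 4))
```

Six pixels are built-up and two of them sit under buildings, which leaves four pixels of ground-level pavement in this toy grid.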

No airport or park exclusion step runs for Medellín. That logic is specific to the U.S. sites and is not invoked by the generic extractor here.

The simplify tolerance and minimum-area threshold are named in feet and square feet in the code, but they are applied in the processing CRS, which for Medellín is meters. So the nominal "1.0 ft" simplify acts as 1 m (about 3.3 ft), and the "100 sqft" sliver cut acts as 100 m² (about 1,076 sq ft). Sliver cleanup is therefore more aggressive than the nominal values suggest. Areas themselves are computed from true EPSG:3116 geometry, so no unit error enters the acreage calculation.
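The size of the unit mismatch is easy to quantify. The arithmetic below compares what the nominal thresholds would mean if converted from feet with what they mean when read directly as meters. It assumes the international foot (0.3048 m exactly); the variable names are illustrative.

```python
# Worked arithmetic for the cleanup thresholds, assuming the international foot.
FT_TO_M = 0.3048

nominal_simplify_ft = 1.0
nominal_min_area_sqft = 100.0

# What the thresholds would be if they were converted from feet first.
intended_simplify_m = nominal_simplify_ft * FT_TO_M          # 0.3048 m
intended_min_area_m2 = nominal_min_area_sqft * FT_TO_M ** 2  # about 9.29 m2

# What is actually applied in a meter-based CRS: the same numbers, read as meters.
applied_simplify_m = nominal_simplify_ft    # 1.0 m
applied_min_area_m2 = nominal_min_area_sqft  # 100 m2

area_ratio = applied_min_area_m2 / intended_min_area_m2
print(f"simplify: {intended_simplify_m} m intended vs {applied_simplify_m} m applied")
print(f"sliver cut: {intended_min_area_m2:.2f} m2 intended vs {applied_min_area_m2} m2 applied")
print(f"sliver cut is about {area_ratio:.1f}x larger than the nominal value")
```

The applied sliver cut is roughly eleven times the nominal one, which matters mainly for small isolated patches near the 10 m pixel size.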

5. Core vs non-core classification

No surveyed road widths or planimetric roadbed exist for Medellín, so the core mask is built entirely from OpenStreetMap in classify_core_noncore_ftl (pavement_ftl.py). "Core" pavement is what the city needs for movement: travel lanes and sidewalks. "Non-core" is everything else, including parking, and forms the pool of depave candidates.

Core mask construction

The core mask is the union of two OSM-derived components, highways and sidewalks. Parking is listed below because it is deliberately left out:

Highways. Each OSM highway way is buffered by a per-feature half-width, authored in feet. When the lanes tag is present, the half-width is lanes × 10 ft; otherwise a per-class fallback table is used (for example, motorway/trunk 30 ft, primary 25 ft, secondary 20 ft, tertiary 15 ft, residential 12 ft, pedestrian 8 ft). Ways tagged service, track, driveway, footway, path, cycleway, bridleway, steps, or corridor are excluded from the core mask, so the pavement beneath them becomes a non-core depave candidate.
Sidewalks. OSM footway=sidewalk ways are buffered by a fixed half-width (nominal 4 ft).
Parking. Parking is not part of the core mask, so parking pavement is non-core, as in the other cities.

Each pavement polygon is then cut by the core mask: the intersection is labeled core, the difference is labeled non-core. The result is re-exploded and slivers are dropped.

Core-mask units and parking. The buffer half-widths above are authored in feet and scaled to the processing CRS unit before they are applied, so they are correct in Medellín's meter-based CRS (EPSG:3116). OSM parking polygons are treated as non-core, as in the other cities. With units handled, OSM completeness in El Poblado is the main limit on the core/non-core split.
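The half-width rule for highways can be sketched as a small lookup. This is an illustration under assumptions, not the code in pavement_ftl.py: the function name `half_width_m` is hypothetical, the fallback table holds only the example classes quoted above, and returning `None` for unlisted classes is a choice made for this sketch.

```python
FT_TO_M = 0.3048  # scale from the authored unit (feet) to a meter-based CRS

# Example fallback half-widths quoted in the text, in feet.
# The table in the actual code may contain more classes.
CLASS_HALF_WIDTH_FT = {
    "motorway": 30, "trunk": 30, "primary": 25, "secondary": 20,
    "tertiary": 15, "residential": 12, "pedestrian": 8,
}
# Classes left out of the core mask, so their pavement stays non-core.
EXCLUDED_CLASSES = {
    "service", "track", "driveway", "footway", "path",
    "cycleway", "bridleway", "steps", "corridor",
}
HALF_WIDTH_PER_LANE_FT = 10


def half_width_m(highway, lanes=None, unit_scale=FT_TO_M):
    """Buffer half-width in CRS units, or None if the way is not in the core mask."""
    if highway in EXCLUDED_CLASSES:
        return None
    try:
        lane_count = int(lanes) if lanes is not None else 0
    except (TypeError, ValueError):
        lane_count = 0  # unparseable lanes tag: fall back to the class table
    if lane_count > 0:
        width_ft = lane_count * HALF_WIDTH_PER_LANE_FT
    else:
        width_ft = CLASS_HALF_WIDTH_FT.get(highway)
        if width_ft is None:
            return None  # class not in the example table; this sketch skips it
    return width_ft * unit_scale


print(half_width_m("primary", lanes="4"))   # lanes tag wins over the class table
print(half_width_m("residential"))          # class fallback
print(half_width_m("service"))              # excluded, stays non-core
```

Passing `unit_scale=1.0` would reproduce the feet-based behavior of a U.S. site whose CRS is already in feet.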

6. Stormwater proxy

The flood-risk layer is a topographic proxy derived from the Copernicus GLO-30 DEM (30 m), implemented in src/depave/process/stormwater_proxy.py. No soil-permeability data (SSURGO) exists for Medellín, so the pipeline runs the 2-component DEM-only variant rather than the 3-component variant the U.S. sites use. Pipeline:

Clip the Copernicus DEM (30 m pixels) to the study area and mask nodata.
Gaussian pre-smoothing (σ = 1.5 px) to suppress micro-cliffs that would otherwise read as false sink edges.
Priority-flood depression fill on the smoothed DEM.
Depression depth = filled − original DEM, clipped to ≥ 0, measured against the un-smoothed original so real sinks stay visible.
D8 flow accumulation on the filled DEM, log-transformed (log1p) because the distribution is heavy-tailed.
Each field is percentile-rank-normalized to [0, 1], then combined: score = 0.6 × depression + 0.4 × flow_acc. Depression carries more weight because surface ponding is the most direct pluvial signal.
Tier breaks at the 50th / 75th / 90th percentile of non-zero composite pixels produce limited / moderate / extreme classes. These are within-area percentiles, so a fixed share of the wettest pixels always lands in each tier. They rank relative wetness inside the pilot area; they do not express an absolute flood depth or return period.
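The terrain steps above (depression fill, depression depth, D8 accumulation) can be sketched in pure Python on a toy grid. This is a teaching sketch, not the contents of stormwater_proxy.py: it omits the Gaussian pre-smoothing, uses nested lists instead of rasters, and treats cells with no lower neighbor (including filled flats) as local end points, which is an assumption a production routine would handle with flat resolution.

```python
import heapq
import math

NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def priority_flood_fill(dem):
    """Fill depressions by flooding inward from the grid edge, lowest cell first."""
    rows, cols = len(dem), len(dem[0])
    filled = [row[:] for row in dem]
    seen = [[False] * cols for _ in range(rows)]
    heap = []
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                heapq.heappush(heap, (dem[r][c], r, c))
                seen[r][c] = True
    while heap:
        level, r, c = heapq.heappop(heap)
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not seen[nr][nc]:
                seen[nr][nc] = True
                # A cell can never drain below the level that reached it.
                filled[nr][nc] = max(dem[nr][nc], level)
                heapq.heappush(heap, (filled[nr][nc], nr, nc))
    return filled


def depression_depth(filled, original):
    """Filled minus original elevation, clipped at zero."""
    return [[max(f - o, 0) for f, o in zip(f_row, o_row)]
            for f_row, o_row in zip(filled, original)]


def d8_flow_accumulation(dem):
    """Each cell passes its accumulated count to its steepest downslope neighbor."""
    rows, cols = len(dem), len(dem[0])
    acc = [[1.0] * cols for _ in range(rows)]
    cells = sorted(((dem[r][c], r, c) for r in range(rows) for c in range(cols)),
                   reverse=True)  # highest first, so upstream cells are done first
    for elev, r, c in cells:
        best, best_slope = None, 0.0
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                slope = (elev - dem[nr][nc]) / math.hypot(dr, dc)
                if slope > best_slope:
                    best, best_slope = (nr, nc), slope
        if best is not None:
            acc[best[0]][best[1]] += acc[r][c]
    return acc


# Synthetic DEM: a rim at 9 with an outlet at 4 on the right edge and a pit at 3.
dem = [
    [9, 9, 9, 9, 9],
    [9, 8, 8, 8, 9],
    [9, 8, 3, 8, 4],
    [9, 8, 8, 8, 9],
    [9, 9, 9, 9, 9],
]
filled = priority_flood_fill(dem)
depth = depression_depth(filled, dem)
log_flow = [[math.log1p(v) for v in row] for row in d8_flow_accumulation(filled)]
print(depth[2][2])  # the pit fills to the level of its lowest spill path
```

The pit at elevation 3 is surrounded by cells at 8, so it fills to 8 and reports a depth of 5, while every other cell reports zero.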

This layer is a relative ranking of rainfall-ponding potential, built from terrain alone at 30 m. It omits rainfall depth and intensity, storm-drain capacity, pipe networks, tides, and groundwater, all of which shape real pluvial flooding. D8 flow paths degrade on coarse grids. It has not been validated against observed flooding in Medellín. Read the tiers as a screen for where to look first, not a forecast of where water will stand.
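The relative nature of the tiers follows directly from how the composite and the breaks are computed. The sketch below shows one way to do it on flat lists of pixel values. Two choices are assumptions, since the text does not specify them: the rank normalization (share of values strictly below each one) and the treatment of pixels under the 50th-percentile break as unclassified ("none"). Function names are illustrative.

```python
import bisect
import math


def percentile_rank(values):
    """Rank-normalize to [0, 1]: count of strictly smaller values over (n - 1)."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    ordered = sorted(values)
    return [bisect.bisect_left(ordered, v) / (n - 1) for v in values]


def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def composite_score(depression, flow_acc):
    """score = 0.6 x depression + 0.4 x flow_acc on rank-normalized inputs."""
    dep = percentile_rank(depression)
    flow = percentile_rank(flow_acc)
    return [0.6 * d + 0.4 * f for d, f in zip(dep, flow)]


def assign_tiers(scores):
    """Break non-zero scores at their own 50th / 75th / 90th percentiles."""
    nonzero = [s for s in scores if s > 0]
    if not nonzero:
        return ["none"] * len(scores)
    p50, p75, p90 = (percentile(nonzero, q) for q in (50, 75, 90))
    tiers = []
    for s in scores:
        if s <= 0 or s < p50:
            tiers.append("none")
        elif s < p75:
            tiers.append("limited")
        elif s < p90:
            tiers.append("moderate")
        else:
            tiers.append("extreme")
    return tiers


# Synthetic pixel values: depression depth (m) and log1p flow accumulation.
depression = [0.0, 0.0, 0.5, 1.0, 2.0]
flow_acc = [0.7, 1.1, 1.4, 2.1, 3.0]
scores = composite_score(depression, flow_acc)
tiers = assign_tiers(scores)
print(list(zip([round(s, 3) for s in scores], tiers)))
```

Because the breaks are percentiles of the scores themselves, doubling every depression depth would leave the tiers unchanged, which is why they cannot be read as absolute flood depths.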

7. Needs scoring

All four needs layers score the 14 H3 hexagons. Each returns a raw value and a min-max-normalized score in [0, 1], where higher means more need. Because Medellín has no local polygon layers for heat or canopy, both fall back to raster zonal statistics. This is the expected international behavior, not a failure.

Heat

From the Landsat 9 summer land-surface-temperature raster (raw/landsat_lst_celsius.tif, ST_B10 converted to Celsius, summer median composite with cloud masking). Each hexagon's score is its zonal mean LST, min-max normalized so hotter hexagons score higher.
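A zonal mean followed by min-max scaling can be sketched on two aligned grids, one holding zone labels and one holding temperatures. This is an illustration only: the pipeline works on real rasters and hexagon geometries, and the names `zonal_mean` and `min_max`, the use of `None` for masked pixels, and the sample values are assumptions.

```python
def zonal_mean(zones, values):
    """Mean of `values` per zone label; None marks nodata or cloud-masked pixels."""
    sums, counts = {}, {}
    for zone_row, value_row in zip(zones, values):
        for zone, value in zip(zone_row, value_row):
            if zone is None or value is None:
                continue
            sums[zone] = sums.get(zone, 0.0) + value
            counts[zone] = counts.get(zone, 0) + 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


def min_max(raw_by_id):
    """Rescale raw values to [0, 1]; all zeros if every value is identical."""
    lo, hi = min(raw_by_id.values()), max(raw_by_id.values())
    if hi == lo:
        return {key: 0.0 for key in raw_by_id}
    return {key: (value - lo) / (hi - lo) for key, value in raw_by_id.items()}


# Synthetic 2x3 grids: hexagon label per pixel and LST in degrees Celsius.
zones = [
    ["A", "B", "C"],
    ["A", "B", "C"],
]
lst_celsius = [
    [30.0, 34.0, 38.0],
    [32.0, 36.0, None],  # one masked pixel in hexagon C
]

mean_lst = zonal_mean(zones, lst_celsius)
heat_score = min_max(mean_lst)
print(mean_lst, heat_score)
```

The hottest hexagon always scores 1 and the coolest 0, so the heat score ranks hexagons within the pilot area rather than measuring absolute heat exposure.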

Stormwater flood risk

From the DEM proxy in section 6. Per-hexagon flood score is the weighted sum of tier coverage within the hexagon, then rescaled across the 14 hexagons.

Canopy deficit

From the ESA WorldCover tree class (class 10), reduced to a binary tree mask. Per hexagon, canopy_pct is the fraction of pixels that are tree; the deficit score is 1 − canopy_pct, min-max normalized, so a hexagon with little canopy scores higher.

Pavement burden

Per-hexagon non-core pavement fraction = non-core pavement area within the hexagon divided by hexagon area, min-max normalized.
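The canopy-deficit and pavement-burden scores share the same shape: compute a raw fraction per hexagon, then min-max normalize across hexagons. The sketch below uses made-up pixel counts and areas; the function names and input layouts are assumptions for illustration.

```python
def min_max(raw_by_id):
    """Rescale raw values to [0, 1]; all zeros if every value is identical."""
    lo, hi = min(raw_by_id.values()), max(raw_by_id.values())
    if hi == lo:
        return {key: 0.0 for key in raw_by_id}
    return {key: (value - lo) / (hi - lo) for key, value in raw_by_id.items()}


def canopy_deficit(pixel_counts):
    """1 - canopy_pct per hexagon from (tree_pixels, total_pixels) counts."""
    return {hex_id: 1.0 - tree / total
            for hex_id, (tree, total) in pixel_counts.items()}


def pavement_fraction(areas_m2):
    """Non-core pavement area divided by hexagon area."""
    return {hex_id: noncore / hex_area
            for hex_id, (noncore, hex_area) in areas_m2.items()}


# Synthetic inputs for three hexagons.
tree_counts = {"h1": (20, 100), "h2": (50, 100), "h3": (5, 100)}
pavement_areas = {
    "h1": (74_000.0, 740_000.0),
    "h2": (37_000.0, 740_000.0),
    "h3": (148_000.0, 740_000.0),
}

canopy_score = min_max(canopy_deficit(tree_counts))
pavement_score = min_max(pavement_fraction(pavement_areas))
print(canopy_score)
print(pavement_score)
```

In this toy data h3 has both the least canopy and the most non-core pavement, so it scores 1 on both dimensions.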

8. Stacked composite & priority

The composite is the equal-weight mean of the four normalized scores. Medellín's location YAML defines no custom weights, so each of the four contributes 0.25:

stacked_score = mean(heat_score, flood_score, canopy_score, pavement_score)

Priority hexagons are those at or above the 75th percentile of stacked_score (top quartile). With 14 hexagons, this flags 4 priority hexagons. A second column, n_high_needs, counts the dimensions (out of four) in which a hexagon falls in that dimension's top quartile. Both are implemented in stacked_needs.py. Across the 14 hexagons, observed stacked scores ranged from about 0.01 to 0.83.
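The composite and the priority cut can be reproduced in a few lines. This is a sketch under assumptions rather than the contents of stacked_needs.py: the percentile uses linear interpolation, the `priority` field name is hypothetical, and the 14 hexagons carry synthetic scores chosen only to show that a 75th-percentile cut on 14 distinct values flags 4 of them.

```python
import math

DIMENSIONS = ("heat_score", "flood_score", "canopy_score", "pavement_score")


def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def score_hexagons(hexagons):
    """Add stacked_score, priority, and n_high_needs to each hexagon dict."""
    for h in hexagons:
        # Equal weights: each of the four dimensions contributes 0.25.
        h["stacked_score"] = sum(h[d] for d in DIMENSIONS) / len(DIMENSIONS)
    stacked_cut = percentile([h["stacked_score"] for h in hexagons], 75)
    dim_cuts = {d: percentile([h[d] for h in hexagons], 75) for d in DIMENSIONS}
    for h in hexagons:
        h["priority"] = h["stacked_score"] >= stacked_cut
        h["n_high_needs"] = sum(h[d] >= dim_cuts[d] for d in DIMENSIONS)
    return hexagons


# Synthetic scores for 14 hexagons (not the Medellín results).
hexagons = [
    {
        "hex_id": i,
        "heat_score": i / 13,
        "flood_score": (13 - i) / 13,  # runs opposite to the other three
        "canopy_score": i / 13,
        "pavement_score": i / 13,
    }
    for i in range(14)
]
score_hexagons(hexagons)
priority_ids = [h["hex_id"] for h in hexagons if h["priority"]]
print(priority_ids)
```

With 14 values the interpolated 75th percentile falls between the 10th and 11th ranked scores, so exactly the top four clear it when there are no ties.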

9. Equity overlay (not available)

There is no equity overlay for Medellín. The U.S. sites overlay a federal disadvantaged-community screen (CEJST), gated on a U.S.-coverage dataset. No comparable disadvantaged-community or environmental-justice dataset is available for Medellín in this pipeline, so the equity stage is skipped. The scored hexagons carry no is_dac or is_ej field, and no "priority meets disadvantage" finding is reported. The priority hexagons reflect environmental need alone.

One consequence: the per-site summary_stats.json file is written only inside the equity stage, so it is not emitted for Medellín. The headline figures on these pages are computed directly from the processed GeoJSON outputs in EPSG:3116 rather than read from a stats file.
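Headline areas can be derived from the processed GeoJSON without a stats file, because coordinates in EPSG:3116 are already in meters and planar polygon area applies directly. The sketch below uses the shoelace formula with only the standard library. The property name `pavement_class`, the function names, and the inline sample are assumptions; the actual output schema may differ.

```python
import json

M2_PER_ACRE = 4046.8564224  # international acre in square meters


def ring_area_m2(ring):
    """Absolute shoelace area of a ring of [x, y] positions in a metric CRS."""
    total = 0.0
    for p, q in zip(ring, ring[1:] + ring[:1]):
        total += p[0] * q[1] - q[0] * p[1]
    return abs(total) / 2.0


def geometry_area_m2(geometry):
    """Area of a Polygon or MultiPolygon, subtracting interior rings (holes)."""
    if geometry["type"] == "Polygon":
        polygons = [geometry["coordinates"]]
    elif geometry["type"] == "MultiPolygon":
        polygons = geometry["coordinates"]
    else:
        return 0.0
    area = 0.0
    for rings in polygons:
        area += ring_area_m2(rings[0]) - sum(ring_area_m2(h) for h in rings[1:])
    return area


def area_by_class_m2(collection, key="pavement_class"):
    """Sum polygon area per value of a (hypothetical) class property."""
    totals = {}
    for feature in collection["features"]:
        label = feature["properties"].get(key, "unknown")
        totals[label] = totals.get(label, 0.0) + geometry_area_m2(feature["geometry"])
    return totals


# Synthetic FeatureCollection with coordinates already in meters.
sample = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"pavement_class": "core"},
   "geometry": {"type": "Polygon",
     "coordinates": [[[0, 0], [100, 0], [100, 50], [0, 50], [0, 0]]]}},
  {"type": "Feature", "properties": {"pavement_class": "non-core"},
   "geometry": {"type": "Polygon",
     "coordinates": [[[0, 0], [200, 0], [200, 100], [0, 100], [0, 0]],
                     [[10, 10], [60, 10], [60, 50], [10, 50], [10, 10]]]}}
]}
""")

totals_m2 = area_by_class_m2(sample)
totals_acres = {label: area / M2_PER_ACRE for label, area in totals_m2.items()}
print(totals_m2, totals_acres)
```

Running the same calculation on GeoJSON that has already been reprojected to EPSG:4326 would give areas in square degrees, so the figures must come from the EPSG:3116 outputs.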

10. Known limitations

Global-data coarseness. Pavement comes from ESA WorldCover 10 m built-up minus building footprints, far coarser than the 1 m classifier at the U.S. sites. There is no per-pixel classifier. Alleys, narrow sidewalks, and parking aisles narrower than roughly 10 m are under-resolved. Treat Medellín as a methodology-transfer demonstration with higher uncertainty than the U.S. sites.
No municipal GIS. No surveyed road widths, planimetric roadbed, parcel land-use roll, or city park layer. Core/non-core relies wholly on OSM class-based width estimates, so its accuracy depends on OSM completeness in El Poblado.
No soil data. Without SSURGO, the stormwater proxy is the 2-component DEM-only variant at 30 m. It is coarser than the U.S. version and omits the soil-permeability term.
Topographic proxy, not a hydraulic model. The stormwater layer uses terrain alone, with no rainfall, drainage pipes, tides, or groundwater, and it is unvalidated against observed flooding.
No equity layer. No disadvantaged-community or environmental-justice dataset is available for Medellín in this pipeline. There is no equity overlay and no "priority meets disadvantage" finding.
Coarse, small-N study units. Need is aggregated to 14 uniform H3 hexagons rather than census units. The 75th-percentile priority cut yields only 4 hexagons, a coarse screen at neighborhood scale.
Pre-screening only. As with all sites, parcel selection needs ground-truthing, utility checks, ownership review, and community input.

11. Reproducibility

Full run from scratch

export DEPAVE_LOCATION=medellin
python3 scripts/run_pipeline.py \
    --study-area pilot \
    --stages all

Stage names: study_area, acquire, classify, pavement, needs, equity, export. The classify stage is skipped (use_naip_classification: false) and the equity stage is skipped (no equity dataset). The default study area for non-NYC locations is pilot, so --study-area can be omitted. Any comma-separated subset of stages is accepted.

# Re-run only the analysis stages from cached raw inputs:
DEPAVE_LOCATION=medellin python3 scripts/run_pipeline.py \
    --study-area pilot --stages pavement,needs,export
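How a --stages value might resolve to the stages that actually run for Medellín can be sketched as follows. This is an assumption about behavior, not the parser in scripts/run_pipeline.py: the function name `resolve_stages`, the canonical ordering of the result, and the error on unknown names are all hypothetical.

```python
STAGES = ["study_area", "acquire", "classify", "pavement", "needs", "equity", "export"]

# Stages skipped for Medellín: no NAIP classifier and no equity dataset.
MEDELLIN_SKIPPED = ("classify", "equity")


def resolve_stages(arg, skipped=MEDELLIN_SKIPPED):
    """Turn 'all' or a comma-separated list into the ordered stages that will run."""
    if arg.strip() == "all":
        requested = list(STAGES)
    else:
        requested = [name.strip() for name in arg.split(",") if name.strip()]
    unknown = [name for name in requested if name not in STAGES]
    if unknown:
        raise ValueError(f"unknown stage(s): {', '.join(unknown)}")
    # Keep pipeline order regardless of the order given on the command line.
    return [name for name in STAGES if name in requested and name not in skipped]


print(resolve_stages("all"))
print(resolve_stages("pavement,needs,export"))
```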

The web build (python3 scripts/build_deploy.py) reprojects the processed GeoJSON from EPSG:3116 to EPSG:4326 and builds raster tiles for crisp rendering.

12. Credits & citation

Depave Medellín is produced by ONE Architecture & Urbanism, Inc. It is the international pilot of the Depave framework, adapting the DepaveLA/DepaveNYC methodology to a city served entirely by global open data rather than municipal GIS.

Data providers: ESA WorldCover, Microsoft Global Buildings, OpenStreetMap contributors, Copernicus (GLO-30 DEM), USGS/NASA Landsat, and Microsoft Planetary Computer (data access). OpenStreetMap data are licensed under ODbL.

Suggested citation: Depave Medellín (El Poblado): An Open-Data Screening Pilot for Non-Core Pavement and Environmental Need. ONE Architecture & Urbanism, 2026.