Building a GeoTIFF Object Detection Web App
Running object detection on drone or satellite imagery means working with files that can exceed 10,000 pixels per side—too large to load into GPU memory at once. Standard workflows require command-line scripting and manual coordinate-system conversions, and they offer no feedback during inference runs that can take minutes per image. This project wraps a trained Faster R-CNN model in a browser interface that handles tiled inference, automatic CRS reprojection, real-time progress updates, and interactive false-positive removal—making the full pipeline accessible without touching a terminal.
Tiled Inference and Progress Streaming
The core challenge is processing images that don't fit in memory. The solution is tile-based inference: slice the image into 1024px patches with 128px overlap, run detection on each tile, then merge results. The overlap prevents objects at tile boundaries from being clipped. Each detection's bounding box gets transformed from pixel coordinates back to the original CRS using the GeoTIFF's affine transform, preserving georeferencing throughout.
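A minimal sketch of that loop, assuming a rasterio dataset and a run_model helper that returns pixel-space boxes for a single tile (both names are illustrative, not the project's exact API):

```python
import rasterio
from rasterio.windows import Window

TILE, OVERLAP = 1024, 128
STRIDE = TILE - OVERLAP  # 896 px step, so neighbouring tiles share a 128 px border

def tile_detections(tif_path, run_model):
    """Run a detector over overlapping tiles and return boxes in the GeoTIFF's CRS."""
    boxes_crs = []
    with rasterio.open(tif_path) as src:
        transform = src.transform  # affine map: (col, row) pixel -> CRS coordinates
        for row_off in range(0, src.height, STRIDE):
            for col_off in range(0, src.width, STRIDE):
                window = Window(col_off, row_off,
                                min(TILE, src.width - col_off),
                                min(TILE, src.height - row_off))
                tile = src.read(window=window)  # (bands, h, w) array for this tile only
                for x0, y0, x1, y1 in run_model(tile):  # pixel coords within the tile
                    # Shift into full-image pixel space, then apply the affine transform.
                    # (The y order may flip depending on the transform's orientation.)
                    gx0, gy0 = transform * (col_off + x0, row_off + y0)
                    gx1, gy1 = transform * (col_off + x1, row_off + y1)
                    boxes_crs.append((gx0, gy0, gx1, gy1))
    return boxes_crs
```

Reading through windows keeps only one tile in memory at a time, and reusing the source file's affine transform means the only coordinate bookkeeping is a per-tile pixel offset.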
For user experience, a single HTTP request-response cycle doesn't work: inference on a large image can process hundreds of tiles, and the client would hear nothing until the whole job finished. Instead, the client opens a WebSocket connection and the server streams progress as each tile completes. The implementation detail that matters is that PyTorch inference blocks Python's event loop, so inference runs in a thread pool executor while asyncio.run_coroutine_threadsafe pushes progress updates back to the WebSocket from the worker thread. This keeps the server responsive during long-running jobs.
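A condensed sketch of that pattern, written here against FastAPI (the framework, endpoint, and helper names are assumptions for illustration, not the project's exact code):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from fastapi import FastAPI, WebSocket

app = FastAPI()
executor = ThreadPoolExecutor(max_workers=1)  # one job at a time; the model isn't thread-safe

def run_tiled_inference(path, on_progress):
    """Placeholder for the blocking PyTorch tiling loop; calls on_progress(done, total) per tile."""
    ...

@app.websocket("/ws/detect")
async def detect_ws(ws: WebSocket):
    await ws.accept()
    path = await ws.receive_text()  # e.g. the server-side path of an uploaded GeoTIFF
    loop = asyncio.get_running_loop()

    def on_progress(done, total):
        # Runs in the worker thread; schedule the send back on the event loop.
        asyncio.run_coroutine_threadsafe(
            ws.send_json({"done": done, "total": total}), loop
        )

    # Blocking inference goes to the thread pool so the event loop stays free.
    await loop.run_in_executor(executor, run_tiled_inference, path, on_progress)
    await ws.send_json({"status": "complete"})
```

The key point is that the worker thread never touches the event loop directly; run_coroutine_threadsafe hands each send coroutine back to the loop that owns the socket.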
Coordinate Systems and Post-Processing
GeoTIFFs arrive in arbitrary projections—UTM zones, state plane, Web Mercator. Leaflet
expects WGS84. Rather than require users to reproject before upload, the app generates a
high-resolution overlay reprojected to EPSG:4326 using rasterio's reproject with
Lanczos resampling. Detection coordinates convert to WGS84 for display, but the exported GeoJSON preserves the original CRS for downstream GIS workflows.
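The overlay generation follows the standard rasterio reprojection recipe; a sketch, with the output path and per-band loop as illustrative details:

```python
import rasterio
from rasterio.enums import Resampling
from rasterio.warp import calculate_default_transform, reproject

def reproject_to_wgs84(src_path, dst_path):
    """Write a copy of the GeoTIFF reprojected to EPSG:4326 for the Leaflet overlay."""
    dst_crs = "EPSG:4326"
    with rasterio.open(src_path) as src:
        transform, width, height = calculate_default_transform(
            src.crs, dst_crs, src.width, src.height, *src.bounds
        )
        profile = src.profile.copy()
        profile.update(crs=dst_crs, transform=transform, width=width, height=height)
        with rasterio.open(dst_path, "w", **profile) as dst:
            for band in range(1, src.count + 1):
                reproject(
                    source=rasterio.band(src, band),
                    destination=rasterio.band(dst, band),
                    src_transform=src.transform,
                    src_crs=src.crs,
                    dst_transform=transform,
                    dst_crs=dst_crs,
                    resampling=Resampling.lanczos,  # preserves detail better than nearest
                )
```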
Two post-processing steps clean up results. First, automatic geometry merging: tile-based inference often produces duplicate detections for objects spanning boundaries, so a buffer-union-unbuffer operation consolidates overlapping boxes. Second, interactive editing: users click detection polygons to select them (they turn red), then delete to remove false positives. Deletions only affect the export—the original detections remain visible for comparison.
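The merge step can be expressed in a few lines of shapely; the buffer distance below is an assumed parameter (in the coordinates' units), not the project's exact value:

```python
from shapely.geometry import box
from shapely.ops import unary_union

def merge_detections(boxes, buffer_dist=2.0):
    """Consolidate overlapping boxes from neighbouring tiles into single polygons."""
    polys = [box(x0, y0, x1, y1) for x0, y0, x1, y1 in boxes]
    # Buffer outward, union everything that now touches, then buffer back inward.
    merged = unary_union([p.buffer(buffer_dist) for p in polys]).buffer(-buffer_dist)
    # unary_union returns a single Polygon or a MultiPolygon depending on the input.
    return list(merged.geoms) if merged.geom_type == "MultiPolygon" else [merged]
```

Buffering before the union lets boxes that merely touch, or sit a pixel apart across a tile seam, fuse into one polygon; the negative buffer then restores roughly the original footprint.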
Limitations
The model loads into memory at startup and stays resident. For single-user local deployment that's fine; for shared infrastructure, lazy loading or a separate inference service would scale better. Tile size and overlap are hardcoded at 1024px and 128px—making these configurable would let users optimize for their imagery characteristics and object sizes. The frontend handles one file at a time; batch upload support would streamline processing multiple GeoTIFFs without repeated uploads.
Takeaways
For geospatial machine learning, the infrastructure around inference—handling large files, preserving coordinate systems, providing progress feedback, enabling result correction—often requires more code than the model wrapper itself.
More broadly: deploying ML is less about the model and more about the surrounding experience. A trained network is inert until it's embedded in a workflow where users can get data in, understand what's happening, and act on results. That plumbing is where most of the work lives.
The code is available on GitHub.