Building a GeoTIFF Object Detection Web App
Running object detection on drone or satellite imagery means working with files that can exceed 10,000 pixels per side—too large to load into GPU memory at once. Standard workflows require command-line scripting and manual coordinate-system conversions, and they offer no feedback during inference runs that can take minutes per image. This project wraps a trained Faster R-CNN model in a browser interface that handles tiled inference, automatic CRS reprojection, real-time progress updates, and interactive false-positive removal—making the full pipeline accessible without touching a terminal.
Tiled Inference and Progress Streaming
The core challenge is processing images that don't fit in memory. The solution is tile-based inference: slice the image into 1024px patches with 128px overlap, run detection on each tile, then merge results. The overlap prevents objects at tile boundaries from being clipped. Each detection's bounding box gets transformed from pixel coordinates back to the original CRS using the GeoTIFF's affine transform, preserving georeferencing throughout.
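A minimal sketch of that loop, assuming a rasterio dataset and a run_model helper that returns pixel-space boxes for a single tile (both names are illustrative, not the project's exact API):

```python
import rasterio
from rasterio.windows import Window

TILE, OVERLAP = 1024, 128
STRIDE = TILE - OVERLAP  # 896 px step, so neighbouring tiles share a 128 px border

def tile_detections(tif_path, run_model):
    """Run a detector over overlapping tiles and return boxes in the GeoTIFF's CRS."""
    boxes_crs = []
    with rasterio.open(tif_path) as src:
        transform = src.transform  # affine map: (col, row) pixel -> CRS coordinates
        for row_off in range(0, src.height, STRIDE):
            for col_off in range(0, src.width, STRIDE):
                window = Window(col_off, row_off,
                                min(TILE, src.width - col_off),
                                min(TILE, src.height - row_off))
                tile = src.read(window=window)  # (bands, h, w) array for this tile only
                for x0, y0, x1, y1 in run_model(tile):  # pixel coords within the tile
                    # Shift into full-image pixel space, then apply the affine transform.
                    # (The y order may flip depending on the transform's orientation.)
                    gx0, gy0 = transform * (col_off + x0, row_off + y0)
                    gx1, gy1 = transform * (col_off + x1, row_off + y1)
                    boxes_crs.append((gx0, gy0, gx1, gy1))
    return boxes_crs
```

Reading through windows keeps only one tile in memory at a time, and reusing the source file's affine transform means the only coordinate bookkeeping is a per-tile pixel offset.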
For user experience, a single HTTP request-response cycle doesn't work: inference on a large image can process hundreds of tiles, and the client would hear nothing until the whole job finished. Instead, the client opens a WebSocket connection and the server streams progress as each tile completes. The implementation detail that matters is that PyTorch inference blocks Python's event loop, so inference runs in a thread pool executor while asyncio.run_coroutine_threadsafe pushes progress updates back to the WebSocket from the worker thread. This keeps the server responsive during long-running jobs.
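A condensed sketch of that pattern, written here against FastAPI (the framework, endpoint, and helper names are assumptions for illustration, not the project's exact code):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from fastapi import FastAPI, WebSocket

app = FastAPI()
executor = ThreadPoolExecutor(max_workers=1)  # one job at a time; the model isn't thread-safe

def run_tiled_inference(path, on_progress):
    """Placeholder for the blocking PyTorch tiling loop; calls on_progress(done, total) per tile."""
    ...

@app.websocket("/ws/detect")
async def detect_ws(ws: WebSocket):
    await ws.accept()
    path = await ws.receive_text()  # e.g. the server-side path of an uploaded GeoTIFF
    loop = asyncio.get_running_loop()

    def on_progress(done, total):
        # Runs in the worker thread; schedule the send back on the event loop.
        asyncio.run_coroutine_threadsafe(
            ws.send_json({"done": done, "total": total}), loop
        )

    # Blocking inference goes to the thread pool so the event loop stays free.
    await loop.run_in_executor(executor, run_tiled_inference, path, on_progress)
    await ws.send_json({"status": "complete"})
```

The key point is that the worker thread never touches the event loop directly; run_coroutine_threadsafe hands each send coroutine back to the loop that owns the socket.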
Coordinate Systems and Post-Processing
GeoTIFFs arrive in arbitrary projections—UTM zones, state plane, Web Mercator. Leaflet
expects WGS84. Rather than require users to reproject before upload, the app generates a
high-resolution overlay reprojected to EPSG:4326 using rasterio's reproject with
Lanczos resampling. Detection coordinates convert to WGS84 for display, but the exported GeoJSON preserves the original CRS for downstream GIS workflows.
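The overlay generation follows the standard rasterio reprojection recipe; a sketch, with the output path and per-band loop as illustrative details:

```python
import rasterio
from rasterio.enums import Resampling
from rasterio.warp import calculate_default_transform, reproject

def reproject_to_wgs84(src_path, dst_path):
    """Write a copy of the GeoTIFF reprojected to EPSG:4326 for the Leaflet overlay."""
    dst_crs = "EPSG:4326"
    with rasterio.open(src_path) as src:
        transform, width, height = calculate_default_transform(
            src.crs, dst_crs, src.width, src.height, *src.bounds
        )
        profile = src.profile.copy()
        profile.update(crs=dst_crs, transform=transform, width=width, height=height)
        with rasterio.open(dst_path, "w", **profile) as dst:
            for band in range(1, src.count + 1):
                reproject(
                    source=rasterio.band(src, band),
                    destination=rasterio.band(dst, band),
                    src_transform=src.transform,
                    src_crs=src.crs,
                    dst_transform=transform,
                    dst_crs=dst_crs,
                    resampling=Resampling.lanczos,  # preserves detail better than nearest
                )
```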
Two post-processing steps clean up results. First, automatic geometry merging: tile-based inference often produces duplicate detections for objects spanning boundaries, so a buffer-union-unbuffer operation consolidates overlapping boxes. Second, interactive editing: users click detection polygons to select them (they turn red), then delete to remove false positives. Deletions only affect the export—the original detections remain visible for comparison.
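The merge step can be expressed in a few lines of shapely; the buffer distance below is an assumed parameter (in the coordinates' units), not the project's exact value:

```python
from shapely.geometry import box
from shapely.ops import unary_union

def merge_detections(boxes, buffer_dist=2.0):
    """Consolidate overlapping boxes from neighbouring tiles into single polygons."""
    polys = [box(x0, y0, x1, y1) for x0, y0, x1, y1 in boxes]
    # Buffer outward, union everything that now touches, then buffer back inward.
    merged = unary_union([p.buffer(buffer_dist) for p in polys]).buffer(-buffer_dist)
    # unary_union returns a single Polygon or a MultiPolygon depending on the input.
    return list(merged.geoms) if merged.geom_type == "MultiPolygon" else [merged]
```

Buffering before the union lets boxes that merely touch, or sit a pixel apart across a tile seam, fuse into one polygon; the negative buffer then restores roughly the original footprint.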
Limitations
The model loads into memory at startup and stays resident. For single-user local deployment that's fine; for shared infrastructure, lazy loading or a separate inference service would scale better. Tile size and overlap are hardcoded at 1024px and 128px—making these configurable would let users optimize for their imagery characteristics and object sizes. The frontend handles one file at a time; batch upload support would streamline processing multiple GeoTIFFs without repeated uploads.
Takeaways
For geospatial machine learning, the infrastructure around inference—handling large files, preserving coordinate systems, providing progress feedback, enabling result correction—often requires more code than the model wrapper itself.
More broadly: deploying ML is less about the model and more about the surrounding experience. A trained network is inert until it's embedded in a workflow where users can get data in, understand what's happening, and act on results. That plumbing is where most of the work lives.
The code is available on GitHub.