Object Detection on Drone Orthomosaics with SAM
Drone-based remote sensing has revolutionized how we capture and analyze geospatial data. From agricultural monitoring to construction site surveys, orthomosaics (geometrically corrected aerial images) provide invaluable insights. But manually identifying objects in these high-resolution images is time-consuming and error-prone. That's where Meta's Segment Anything Model (SAM) comes in.
The Challenge
Traditional object detection approaches face several challenges when applied to drone orthomosaics:
- Scale variation: Objects appear at different sizes depending on flight altitude and camera specs
- Dense packing: Agricultural scenes often contain thousands of similar objects (plants, pots, rows)
- Limited training data: Domain-specific datasets are scarce and expensive to annotate
- Computational constraints: Processing gigapixel images requires efficient algorithms
Enter SAM: A Foundation Model Approach
Meta's Segment Anything Model represents a paradigm shift in image segmentation. Trained on over 1 billion masks from 11 million images, SAM demonstrates remarkable zero-shot generalization capabilities. It can segment objects it has never seen before, making it ideal for domain-specific applications like agricultural imagery.
The key insight is that SAM's learned representations capture fundamental visual concepts that transfer across domains. While SAM wasn't trained on drone imagery or plant nurseries, it understands concepts like "circular objects," "repeated patterns," and "boundaries between regions."
Our Approach
In our SciPy 2025 paper, we developed a methodology for applying SAM to drone orthomosaics:
- Tile-based processing: We divide large orthomosaics into overlapping tiles that fit in GPU memory
- Automatic prompt generation: Using grid-based point prompts to generate candidate masks
- Post-processing pipeline: Filtering, merging, and deduplicating detections across tiles
- Georeferencing: Converting pixel coordinates back to real-world coordinates
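The tile-based processing step can be sketched as follows. The tile size and overlap values here are illustrative defaults, not the paper's exact settings:

```python
def _starts(length, tile, step):
    """Start offsets so tiles of size `tile` cover [0, length) with stride `step`."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)  # final tile flush with the image edge
    return starts

def tile_windows(width, height, tile=1024, overlap=128):
    """Return (x0, y0, x1, y1) pixel windows covering the full orthomosaic.

    Adjacent tiles share `overlap` pixels so objects straddling a tile
    boundary appear whole in at least one tile.
    """
    step = tile - overlap
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in _starts(height, tile, step)
            for x in _starts(width, tile, step)]
```

Each window is then read from the raster and passed to SAM independently, which keeps peak GPU memory bounded regardless of the orthomosaic's total size.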
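Grid-based point prompting, the second step, amounts to laying an even lattice of foreground points over each tile. A minimal sketch (the 32-points-per-side density and the checkpoint path in the comments are assumptions, not the paper's settings):

```python
import numpy as np

def grid_points(width, height, points_per_side=32):
    """Evenly spaced (x, y) point prompts over a tile, as an (N, 2) array."""
    xs = np.linspace(0, width - 1, points_per_side)
    ys = np.linspace(0, height - 1, points_per_side)
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx.ravel(), gy.ravel()], axis=1)

points = grid_points(1024, 1024)

# Each point can then be fed to SAM as a foreground prompt, e.g. with the
# segment-anything package (checkpoint filename here is hypothetical):
#   from segment_anything import sam_model_registry, SamPredictor
#   sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
#   predictor = SamPredictor(sam)
#   predictor.set_image(tile_rgb)
#   masks, scores, _ = predictor.predict(point_coords=points[i:i + 1],
#                                        point_labels=np.array([1]))
```

The segment-anything package also ships a `SamAutomaticMaskGenerator` that wraps this grid-prompting loop, with `points_per_side` as a tunable parameter.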
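Because tiles overlap, the same object can be detected twice near tile boundaries. The deduplication step can be sketched as a greedy, NMS-style pass over detection bounding boxes in global pixel coordinates (the 0.5 IoU threshold is an illustrative choice):

```python
def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def deduplicate(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box from each group of overlapping detections."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return [boxes[i] for i in keep]
```

Before this step, each detection's tile-local coordinates must be shifted by its tile's (x0, y0) offset so all boxes live in the same global frame.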
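Finally, georeferencing maps global pixel coordinates back to real-world coordinates using the orthomosaic's affine transform. A minimal sketch using the rasterio-style coefficient order (a, b, c, d, e, f); the 5 cm ground sampling distance and corner coordinates below are made-up example values:

```python
def pixel_to_world(col, row, transform):
    """Apply an affine transform in rasterio's coefficient order:
    x = a*col + b*row + c ;  y = d*col + e*row + f
    """
    a, b, c, d, e, f = transform
    return (a * col + b * row + c, d * col + e * row + f)

# North-up image, 0.05 m/pixel, top-left corner at easting 500000,
# northing 4200000 (hypothetical values). Row 0 is the northern edge,
# so the y pixel size is negative.
t = (0.05, 0.0, 500000.0, 0.0, -0.05, 4200000.0)
x, y = pixel_to_world(100, 200, t)  # -> (500005.0, 4199990.0)
```

With rasterio itself, the same mapping is available as `dataset.transform * (col, row)` or `rasterio.transform.xy`, which also handle rotated transforms.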
Results and Applications
We tested our approach on orthomosaics from plant nurseries, where the goal was to detect and count individual plant pots. The results were impressive:
- Detection accuracy exceeding 95% on well-lit imagery
- Processing speed of approximately 1000 objects per minute on consumer GPU hardware
- Successful generalization to different pot sizes, colors, and arrangements
The practical applications extend far beyond pot counting. This methodology can be adapted for:
- Crop health monitoring and yield estimation
- Infrastructure inspection (solar panels, rooftops)
- Environmental monitoring (tree counting, wildlife surveys)
- Construction progress tracking
Key Takeaways
Foundation models like SAM are changing how we approach computer vision problems. Instead of training specialized models from scratch for each domain, we can leverage pre-trained representations and adapt them with minimal domain-specific data.
The combination of drone technology and foundation models opens new possibilities for automated geospatial analysis. As these models continue to improve, we can expect even more applications in precision agriculture, environmental science, and beyond.
Read the full paper: Performing Object Detection on Drone Orthomosaics with Meta's Segment Anything Model (SAM)