Object Detection on Drone Orthomosaics with SAM
Drone-based remote sensing has revolutionized how we capture and analyze geospatial data. From agricultural monitoring to construction site surveys, orthomosaics (geometrically corrected aerial images) provide invaluable insights. But manually identifying objects in these high-resolution images is time-consuming and error-prone. That's where Meta's Segment Anything Model (SAM) comes in.
The Challenge
Traditional object detection approaches face several challenges when applied to drone orthomosaics:
- Scale variation: Objects appear at different sizes depending on flight altitude and camera specs
- Dense packing: Agricultural scenes often contain thousands of similar objects (plants, pots, rows)
- Limited training data: Domain-specific datasets are scarce and expensive to annotate
- Computational constraints: Processing gigapixel images requires efficient algorithms
Enter SAM: A Foundation Model Approach
Meta's Segment Anything Model represents a paradigm shift in image segmentation. Trained on over 1 billion masks from 11 million images, SAM demonstrates remarkable zero-shot generalization capabilities. It can segment objects it has never seen before, making it ideal for domain-specific applications like agricultural imagery.
The key insight is that SAM's learned representations capture fundamental visual concepts that transfer across domains. While SAM wasn't trained on drone imagery or plant nurseries, it understands concepts like "circular objects," "repeated patterns," and "boundaries between regions."
Our Approach
In our SciPy 2025 paper, we developed a methodology for applying SAM to drone orthomosaics:
- Tile-based processing: We divide large orthomosaics into overlapping tiles that fit in GPU memory
- Automatic prompt generation: Using grid-based point prompts to generate candidate masks
- Post-processing pipeline: Filtering, merging, and deduplicating detections across tiles
- Georeferencing: Converting pixel coordinates back to real-world coordinates
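The tile-based processing step can be sketched as follows. The tile size and overlap values here are illustrative defaults, not the paper's exact settings:

```python
def _starts(length, tile, step):
    """Start offsets so tiles of size `tile` cover [0, length) with stride `step`."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)  # final tile flush with the image edge
    return starts

def tile_windows(width, height, tile=1024, overlap=128):
    """Return (x0, y0, x1, y1) pixel windows covering the full orthomosaic.

    Adjacent tiles share `overlap` pixels so objects straddling a tile
    boundary appear whole in at least one tile.
    """
    step = tile - overlap
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in _starts(height, tile, step)
            for x in _starts(width, tile, step)]
```

Each window is then read from the raster and passed to SAM independently, which keeps peak GPU memory bounded regardless of the orthomosaic's total size.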
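Grid-based point prompting, the second step, amounts to laying an even lattice of foreground points over each tile. A minimal sketch (the 32-points-per-side density and the checkpoint path in the comments are assumptions, not the paper's settings):

```python
import numpy as np

def grid_points(width, height, points_per_side=32):
    """Evenly spaced (x, y) point prompts over a tile, as an (N, 2) array."""
    xs = np.linspace(0, width - 1, points_per_side)
    ys = np.linspace(0, height - 1, points_per_side)
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx.ravel(), gy.ravel()], axis=1)

points = grid_points(1024, 1024)

# Each point can then be fed to SAM as a foreground prompt, e.g. with the
# segment-anything package (checkpoint filename here is hypothetical):
#   from segment_anything import sam_model_registry, SamPredictor
#   sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
#   predictor = SamPredictor(sam)
#   predictor.set_image(tile_rgb)
#   masks, scores, _ = predictor.predict(point_coords=points[i:i + 1],
#                                        point_labels=np.array([1]))
```

The segment-anything package also ships a `SamAutomaticMaskGenerator` that wraps this grid-prompting loop, with `points_per_side` as a tunable parameter.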
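Because tiles overlap, the same object can be detected twice near tile boundaries. The deduplication step can be sketched as a greedy, NMS-style pass over detection bounding boxes in global pixel coordinates (the 0.5 IoU threshold is an illustrative choice):

```python
def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def deduplicate(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box from each group of overlapping detections."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return [boxes[i] for i in keep]
```

Before this step, each detection's tile-local coordinates must be shifted by its tile's (x0, y0) offset so all boxes live in the same global frame.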
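Finally, georeferencing maps global pixel coordinates back to real-world coordinates using the orthomosaic's affine transform. A minimal sketch using the rasterio-style coefficient order (a, b, c, d, e, f); the 5 cm ground sampling distance and corner coordinates below are made-up example values:

```python
def pixel_to_world(col, row, transform):
    """Apply an affine transform in rasterio's coefficient order:
    x = a*col + b*row + c ;  y = d*col + e*row + f
    """
    a, b, c, d, e, f = transform
    return (a * col + b * row + c, d * col + e * row + f)

# North-up image, 0.05 m/pixel, top-left corner at easting 500000,
# northing 4200000 (hypothetical values). Row 0 is the northern edge,
# so the y pixel size is negative.
t = (0.05, 0.0, 500000.0, 0.0, -0.05, 4200000.0)
x, y = pixel_to_world(100, 200, t)  # -> (500005.0, 4199990.0)
```

With rasterio itself, the same mapping is available as `dataset.transform * (col, row)` or `rasterio.transform.xy`, which also handle rotated transforms.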
Results and Applications
We tested our approach on orthomosaics from plant nurseries, where the goal was to detect and count individual plant pots. The results were impressive:
- Detection accuracy exceeding 95% on well-lit imagery
- Processing speed of approximately 1000 objects per minute on consumer GPU hardware
- Successful generalization to different pot sizes, colors, and arrangements
The practical applications extend far beyond pot counting. This methodology can be adapted for:
- Crop health monitoring and yield estimation
- Infrastructure inspection (solar panels, rooftops)
- Environmental monitoring (tree counting, wildlife surveys)
- Construction progress tracking
Key Takeaways
Foundation models like SAM are changing how we approach computer vision problems. Instead of training specialized models from scratch for each domain, we can leverage pre-trained representations and adapt them with minimal domain-specific data.
The combination of drone technology and foundation models opens new possibilities for automated geospatial analysis. As these models continue to improve, we can expect even more applications in precision agriculture, environmental science, and beyond.
Read the full paper: Performing Object Detection on Drone Orthomosaics with Meta's Segment Anything Model (SAM)