
Object Detection on Drone Orthomosaics with SAM

Drone-based remote sensing has revolutionized how we capture and analyze geospatial data. From agricultural monitoring to construction site surveys, orthomosaics (geometrically corrected aerial images) provide invaluable insights. But manually identifying objects in these high-resolution images is time-consuming and error-prone. That's where Meta's Segment Anything Model (SAM) comes in.

The Challenge

Traditional object detection approaches face several challenges when applied to drone orthomosaics: the images are far too large to process in GPU memory at once, labeled training data is scarce and expensive to produce for each new domain, and models trained on everyday photographs transfer poorly to overhead imagery.

Enter SAM: A Foundation Model Approach

Meta's Segment Anything Model represents a paradigm shift in image segmentation. Trained on over 1 billion masks from 11 million images, SAM demonstrates remarkable zero-shot generalization capabilities. It can segment objects it has never seen before, making it ideal for domain-specific applications like agricultural imagery.

The key insight is that SAM's learned representations capture fundamental visual concepts that transfer across domains. While SAM wasn't trained on drone imagery or plant nurseries, it understands concepts like "circular objects," "repeated patterns," and "boundaries between regions."
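To make the promptable-segmentation idea concrete, here is a minimal sketch of prompting SAM with a single foreground point through Meta's `segment_anything` package. The checkpoint filename, tile path, and prompt coordinates are placeholders rather than values from the paper.

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a pretrained SAM checkpoint (variant and path are placeholders).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)  # move `sam` to a GPU first if one is available

# One RGB tile cut from an orthomosaic (path is hypothetical).
image = cv2.cvtColor(cv2.imread("tile.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single (x, y) point prompt marking one object of interest.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[512, 512]]),  # hypothetical pixel location
    point_labels=np.array([1]),           # 1 = foreground point
    multimask_output=True,                # return several candidate masks
)
best_mask = masks[scores.argmax()]        # boolean array the size of the tile
```

Hand-placed prompts like this are only for illustration; the pipeline described below uses automatic, grid-based prompting instead.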

Our Approach

In our SciPy 2025 paper, we developed a methodology for applying SAM to drone orthomosaics (brief illustrative sketches of the steps follow the list):

  1. Tile-based processing: We divide large orthomosaics into overlapping tiles that fit in GPU memory
  2. Automatic prompt generation: Using grid-based point prompts to generate candidate masks
  3. Post-processing pipeline: Filtering, merging, and deduplicating detections across tiles
  4. Georeferencing: Converting pixel coordinates back to real-world coordinates
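A minimal sketch of steps 1 and 2 might look like the following, assuming an 8-bit RGB GeoTIFF and the `rasterio` and `segment_anything` packages; the tile size, overlap, and generator thresholds are illustrative choices, not the settings from the paper.

```python
import numpy as np
import rasterio
from rasterio.windows import Window
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

TILE = 1024      # tile edge in pixels (illustrative)
OVERLAP = 256    # overlap so objects cut by one tile edge appear whole in a neighbor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
generator = SamAutomaticMaskGenerator(
    sam,
    points_per_side=32,         # grid of point prompts laid over each tile
    pred_iou_thresh=0.88,       # drop low-confidence masks
    min_mask_region_area=100,   # drop tiny speckle regions (in pixels)
)

detections = []  # (tile col offset, tile row offset, SAM mask record)
with rasterio.open("orthomosaic.tif") as src:          # path is a placeholder
    for row in range(0, src.height, TILE - OVERLAP):
        for col in range(0, src.width, TILE - OVERLAP):
            window = Window(col, row, TILE, TILE)
            tile = src.read([1, 2, 3], window=window)  # (3, H, W), clipped at edges
            tile = np.moveaxis(tile, 0, -1)            # -> (H, W, 3) RGB for SAM
            for mask in generator.generate(tile):
                detections.append((col, row, mask))
```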
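Steps 3 and 4 could then look like the sketch below, which deduplicates detections that fall in the overlap between tiles and maps box centers into the orthomosaic's coordinate reference system. Reducing post-processing to box-level IoU suppression is a simplification of the paper's pipeline, and the 0.5 threshold is illustrative.

```python
import rasterio

def box_iou(a, b):
    """IoU of two (xmin, ymin, xmax, ymax) boxes in mosaic pixel coordinates."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

# `detections` comes from the previous sketch: (col offset, row offset, mask record).
boxes = []
for col, row, mask in detections:
    x, y, w, h = mask["bbox"]                  # SAM bboxes are tile-relative (x, y, w, h)
    boxes.append((col + x, row + y, col + x + w, row + y + h))

# Greedy deduplication: keep a box only if it does not heavily overlap one already kept.
kept = []
for b in sorted(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]), reverse=True):
    if all(box_iou(b, k) < 0.5 for k in kept):
        kept.append(b)

# Georeference: the raster transform maps pixel (row, col) to CRS (x, y).
with rasterio.open("orthomosaic.tif") as src:
    centers = [src.xy((b[1] + b[3]) / 2, (b[0] + b[2]) / 2) for b in kept]

print(f"Detected {len(centers)} objects")  # e.g. one point per plant pot
```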

Results and Applications

We tested our approach on orthomosaics from plant nurseries, where the goal was to detect and count individual plant pots; the full quantitative results are reported in the paper.

The practical applications extend far beyond pot counting: the same pipeline can be adapted to other repeated-object detection tasks in overhead imagery, from precision agriculture to environmental monitoring.

Key Takeaways

Foundation models like SAM are changing how we approach computer vision problems. Instead of training specialized models from scratch for each domain, we can leverage pre-trained representations and adapt them with minimal domain-specific data.

The combination of drone technology and foundation models opens new possibilities for automated geospatial analysis. As these models continue to improve, we can expect even more applications in precision agriculture, environmental science, and beyond.

Read the full paper: Performing Object Detection on Drone Orthomosaics with Meta's Segment Anything Model (SAM)