Performing Object Detection on Drone Orthomosaics with Meta's Segment Anything Model (SAM)

Our task required feature extraction powerful enough to accurately localize ~19,000 plant pots, each 64 pixels tall and wide, in an image of more than 1 billion pixels.

Abstract

Accurate and efficient object detection and spatial localization in remote sensing imagery remain a persistent challenge. In the context of precision agriculture, the extensive data annotation required by conventional deep learning models adds a further burden. This paper presents a fully open source workflow leveraging Meta AI’s Segment Anything Model (SAM) for zero-shot segmentation, enabling scalable object detection and spatial localization in high-resolution drone orthomosaics without annotated image datasets. No model training or fine-tuning is needed in our precision agriculture use case. The end-to-end workflow takes high-resolution images and quality control (QC) check points as inputs, automatically generates masks corresponding to the objects of interest (empty plant pots, in our case), and outputs their spatial locations in real-world coordinates. Detection accuracy (required here to be within 3 cm) is then quantitatively evaluated against the ground-truth QC check points and benchmarked against object detection output from commercially available software. Results demonstrate that the open source workflow achieves superior spatial accuracy, producing output that is 20% more spatially accurate with 400% greater IoU, while providing a scalable way to perform spatial localization on high-resolution aerial imagery (ground sampling distance, or GSD, under 30 cm).
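To make the workflow concrete, the sketch below shows how the core steps might look in Python: reading the orthomosaic in tiles with rasterio (SAM cannot ingest a billion-pixel image in one pass), generating masks with segment-anything's SamAutomaticMaskGenerator, filtering to roughly pot-sized masks, and converting each mask centroid to real-world coordinates via the affine transform of its tile window. This is a minimal sketch under stated assumptions: the file paths, checkpoint name, tile size, and pixel-area filter are illustrative values, not the ones used in the paper.

```python
# Sketch of the detection/localization loop (paths, tile size, checkpoint,
# and the ~64 x 64 px size filter are illustrative assumptions).
import numpy as np
import rasterio
from rasterio.windows import Window
from rasterio.transform import xy
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

TILE = 1024                        # tile edge length in pixels (assumed)
MIN_AREA, MAX_AREA = 2000, 6000    # plausible mask area range for ~64 px pots (assumed)

# Load SAM once; "vit_h" is the publicly released checkpoint, path is assumed.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

detections = []  # (x, y) pot centroids in the orthomosaic's CRS

with rasterio.open("orthomosaic.tif") as src:
    for row_off in range(0, src.height, TILE):
        for col_off in range(0, src.width, TILE):
            window = Window(col_off, row_off,
                            min(TILE, src.width - col_off),
                            min(TILE, src.height - row_off))
            # Read the RGB bands and reorder to HxWxC uint8, as SAM expects
            # (assumes an 8-bit, 3-band orthomosaic).
            tile = src.read([1, 2, 3], window=window)
            image = np.transpose(tile, (1, 2, 0)).astype(np.uint8)

            transform = src.window_transform(window)
            for m in mask_generator.generate(image):
                if not (MIN_AREA <= m["area"] <= MAX_AREA):
                    continue  # discard masks that are not plausibly a pot
                rows, cols = np.nonzero(m["segmentation"])
                # Convert the mask's pixel centroid to real-world coordinates.
                x, y = xy(transform, rows.mean(), cols.mean())
                detections.append((x, y))

print(f"{len(detections)} candidate pot locations")
```

For brevity the tiles above do not overlap; in practice some overlap and deduplication of centroids near tile seams would be needed so that pots straddling a boundary are neither missed nor counted twice.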

Results

Workflow      Precision   Recall    F1 Score   Mean Deviation (cm)   IoU
Proprietary   0.9990      1.0000    0.9995     1.39                  0.18
Open Source   0.9990      0.9956    0.9973     1.20                  0.74
Improvement   --          -0.0044   -0.0022    20%                   400%
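The point-based figures above could be reproduced with an evaluation along these lines: each detected location is matched to the nearest ground-truth QC check point within the 3 cm tolerance, and precision, recall, F1, and mean deviation follow from the matches. The greedy nearest-neighbour matching rule, the assumption of metre-based CRS units, and the function names below are illustrative assumptions; the paper's exact matching procedure, and its IoU computation (which additionally requires reference pot geometries), may differ.

```python
# Sketch of the point-based evaluation (matching rule and names are assumed).
import numpy as np
from scipy.spatial import cKDTree

TOL = 0.03  # 3 cm tolerance from the abstract, assuming CRS units of metres

def evaluate(detections, qc_points, tol=TOL):
    """Greedy one-to-one matching of detected locations to QC check points."""
    det = np.asarray(detections, dtype=float)
    gt = np.asarray(qc_points, dtype=float)
    tree = cKDTree(gt)
    # Unmatched queries come back with dist = inf and index = len(gt).
    dists, idx = tree.query(det, distance_upper_bound=tol)

    matched_gt = set()
    deviations = []
    for d, i in zip(dists, idx):
        if np.isfinite(d) and i not in matched_gt:
            matched_gt.add(i)
            deviations.append(d)

    tp = len(matched_gt)
    fp = len(det) - tp   # detections with no QC point within tolerance
    fn = len(gt) - tp    # QC points that were never detected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    mean_dev_cm = 100 * float(np.mean(deviations)) if deviations else float("nan")
    return precision, recall, f1, mean_dev_cm
```

Under this scheme, tp / (tp + fp) and tp / (tp + fn) correspond to the Precision and Recall columns, and the mean matched distance (reported in cm) to the Mean Deviation column.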