RarePlanes: Synthetic Data Takes Flight
A collaboration between CosmiQ Works and AI.Reverie.
- Synthetic data alone can train a robust object detection algorithm, as benchmarked against real-world data.
- Fine-tuning the synthetic-only model with just 10% of the observed dataset achieved roughly the same results as training on 100% of the observed dataset, bypassing 90% of the manual labeling and collection effort.
Why It Matters
Hand-labeled training imagery is scarce, expensive, and often difficult to procure. Synthetic data can substitute for much of it, opening opportunities for far more rapid and prolific adoption of computer vision technologies across industries.
Over the past decade, computer vision research and development of new algorithms has been driven largely by open datasets. However, development of such datasets is often labor-intensive, time-consuming and costly. An alternative approach is to create computer generated images and annotations (referred to as synthetic data), a process that can provide thousands of images at very low marginal cost.
Overhead datasets in particular remain one of the best avenues for developing new computer vision methods that can adapt to limited sensor resolution and variable look angles, and that can locate tightly grouped, cluttered objects. Such methods can extend beyond the overhead space and be helpful in domains such as autonomous driving and surveillance.
Varying Conditions and a Difficult Perspective
Creating synthetic datasets from an overhead perspective is a significant challenge, and simulators must attempt to closely mimic the complexities of a spaceborne or aerial sensor as well as Earth’s ever-changing conditions.
For example, to create a large and heterogeneous synthetic dataset, one must account for:
- Each sensor’s varying spatial resolution
- Changes in sensor look angle
- Time of day of collection
- Changes in illumination due to the sun’s location relative to the sensor
- Ground appearance due to seasonal change, weather conditions and varying geographies or biomes
Further, classifying the aircraft is difficult because many types are visually similar: it is a fine-grained classification problem.
What We Tested
The Largest Dataset of Its Kind
RarePlanes is the largest openly available, very-high-resolution dataset built to test the value of synthetic data from an overhead perspective. It consists of:
- Observed data: 253 Maxar WorldView-3 satellite scenes spanning 112 locations and 2,142 km² with 14,700 hand-annotated aircraft. Annotations underwent two rounds of quality control, by a professional service and by the study authors.
- Synthetic data: 50,000 synthetic satellite images with 630,000 aircraft annotations.
- All data includes:
  - 10 fine-grain attributes: aircraft length, wingspan, wing shape, wing position, wingspan class, propulsion, number of engines, number of vertical stabilizers, presence of canards, and aircraft role.
  - 33 sub-attribute choices within the categories above.
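As an illustration, a single aircraft annotation carrying the ten fine-grain attributes above might be represented as a record like the following. This is a hypothetical sketch: the field names and example values are ours, not the dataset's actual schema.

```python
# Hypothetical sketch of one RarePlanes-style annotation record.
# The ten attributes follow the list above; the released dataset's
# actual schema and value vocabulary may differ.
annotation = {
    "length_m": 35.6,               # aircraft length
    "wingspan_m": 34.1,             # wingspan
    "wing_shape": "swept",          # wing shape
    "wing_position": "low",         # wing position
    "wingspan_class": "medium",     # wingspan class
    "propulsion": "jet",            # propulsion type
    "num_engines": 2,               # number of engines
    "num_vertical_stabilizers": 1,  # number of vertical stabilizers
    "canards": False,               # presence of canards
    "role": "civil_transport",      # aircraft role
}

# Ten fine-grain attributes map to ten fields in this sketch.
assert len(annotation) == 10
```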
How We Tested It
For each task, we train on:
- Observed data only (the benchmark)
- Synthetic data only
- Synthetic data, fine-tuned with roughly 10% of the observed data
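The three training regimes above can be sketched as simple dataset configurations. This is a minimal illustration, not the study's actual code: `observed` and `synthetic` are placeholder sample lists, and the ~10% subset is drawn by plain random sampling under a fixed seed.

```python
import random

def build_regimes(observed, synthetic, finetune_frac=0.10, seed=0):
    """Return the three training configurations compared in the study:
    observed-only (benchmark), synthetic-only, and synthetic pre-training
    followed by fine-tuning on ~10% of the observed data."""
    rng = random.Random(seed)
    k = max(1, round(finetune_frac * len(observed)))
    observed_subset = rng.sample(observed, k)
    return {
        "benchmark": {"train": list(observed)},
        "synthetic_only": {"train": list(synthetic)},
        "finetune_10pct": {"pretrain": list(synthetic),
                           "finetune": observed_subset},
    }

# Toy example with placeholder sample IDs.
observed = [f"obs_{i}" for i in range(100)]
synthetic = [f"syn_{i}" for i in range(1000)]
regimes = build_regimes(observed, synthetic)
```

With 100 observed samples, the fine-tuning subset here contains 10 items, mirroring the 10% split described above.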
Synthetic Data Effective Alone and in Combination With Real Data
AI.Reverie is a simulation platform that trains AI to understand the world. It offers a suite of synthetic data and vision APIs to help businesses across different industries train their machine learning algorithms and improve their AI applications, along with benchmarking services to measure the impact.
Founded in 2015 as a technology challenge lab within In-Q-Tel (IQT), CosmiQ Works is focused on developing, prototyping, and evaluating emerging open source artificial intelligence capabilities for geospatial use cases. CosmiQ Works helps accelerate development and adoption of these technologies into deployable products.