IEEE ICRA 2021

Self-Supervised Data Labeling via Simulation for Multi-Agent Planning

We generate training labels automatically from simulated collision events, distill them into a spatial regression model, and use the predicted avoidance-critical regions to build roadmaps that enable planning 5x faster than sampling baselines in dynamic, multi-agent environments.

Felipe Felix Arias · Brian Ichter · Aleksandra Faust · Nancy M. Amato

UIUC Parasol Lab · Google Brain

ACPRM roadmap visual

The Problem

Planning in dynamic environments is slow and fragile.

Multi-agent motion planning is bottlenecked by narrow passages and dynamic obstacles. Traditional roadmaps sample uniformly, wasting nodes in open space and under-sampling critical chokepoints. Labeled data for where collisions actually happen is expensive to collect manually.

Programmatic Data Labeling via Simulation

No manual annotation required.

Instead of hand-labeling, we run thousands of simulated trajectories and automatically extract avoidance-critical regions: locations where agents must deviate from their plans to avoid collisions. This yields large-scale, spatially grounded training labels directly from simulation, a form of self-supervised programmatic data labeling.

Training sample 1
Training sample 2
Training sample 3

From darkest to lightest: static obstacles, free space, agent origin, avoidance-critical region, and goal region.
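Below is a minimal sketch of how such labels could be extracted programmatically, assuming a simulator that returns planned and executed paths for each episode. The `run_episode` interface, the deviation threshold, and the grid resolution are illustrative assumptions, not the authors' actual pipeline.

```python
import numpy as np

# Hypothetical sketch: derive avoidance-critical labels from simulated
# multi-agent runs. `simulator.run_episode`, DEVIATION_THRESH, and
# GRID_SHAPE are illustrative assumptions.

GRID_SHAPE = (64, 64)       # resolution of the label grid (assumed)
DEVIATION_THRESH = 0.25     # deviation (in meters) that counts as avoidance (assumed)

def label_avoidance_critical(planned_paths, executed_paths, world_to_cell):
    """Mark grid cells where agents deviated from their planned path,
    i.e. where avoidance behavior actually occurred."""
    labels = np.zeros(GRID_SHAPE, dtype=np.float32)
    for planned, executed in zip(planned_paths, executed_paths):
        # Compare the nominal plan against what the agent actually executed.
        for p_wp, e_wp in zip(planned, executed):
            if np.linalg.norm(np.asarray(p_wp) - np.asarray(e_wp)) > DEVIATION_THRESH:
                i, j = world_to_cell(e_wp)
                labels[i, j] = 1.0
    return labels

def build_dataset(simulator, num_episodes=1000):
    """Run many simulated episodes and pair each occupancy grid
    with its automatically generated label grid."""
    samples = []
    for _ in range(num_episodes):
        occupancy, planned_paths, executed_paths, world_to_cell = simulator.run_episode()
        labels = label_avoidance_critical(planned_paths, executed_paths, world_to_cell)
        samples.append((occupancy, labels))
    return samples
```

Each sample pairs an occupancy grid with a label grid of avoidance-critical cells, which is the input/output structure the regression model in the next section consumes.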

Model Distillation

Learn structure, then plan with it.

A spatial regression network is trained on the simulation-generated labels to predict avoidance criticality from local occupancy grids. At planning time, the model scores candidate nodes, concentrating the roadmap in regions that matter. The result is sparser graphs with better coverage of critical areas.
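As a rough illustration (not the paper's exact procedure), a trained criticality model can bias where roadmap nodes are placed. The `criticality_model` callable, the patch size, and the mixing weight below are assumptions.

```python
import numpy as np

# Hypothetical sketch of criticality-biased roadmap sampling. The trained
# `criticality_model`, PATCH, and BIAS are illustrative assumptions.

PATCH = 16   # side length of the local occupancy window fed to the model (assumed)
BIAS = 0.8   # fraction of the node budget drawn from high-criticality regions (assumed)

def local_patch(occupancy, cell):
    """Crop a PATCH x PATCH occupancy window centered on `cell`."""
    i, j = cell
    half = PATCH // 2
    padded = np.pad(occupancy, half, constant_values=1.0)  # pad as "occupied"
    return padded[i:i + PATCH, j:j + PATCH]

def sample_roadmap_nodes(occupancy, criticality_model, num_nodes, rng=np.random):
    """Score free cells with the learned model (assumed non-negative outputs,
    e.g. a sigmoid head) and sample nodes with probability proportional to
    predicted criticality, mixed with a uniform component."""
    free_cells = np.argwhere(occupancy == 0)
    scores = np.array([criticality_model(local_patch(occupancy, tuple(c)))
                       for c in free_cells], dtype=np.float64)
    uniform = np.full(len(free_cells), 1.0 / len(free_cells))
    if scores.sum() > 0:
        probs = BIAS * (scores / scores.sum()) + (1.0 - BIAS) * uniform
    else:
        probs = uniform
    idx = rng.choice(len(free_cells), size=num_nodes, replace=False, p=probs)
    return free_cells[idx]
```

The uniform component keeps some nodes in open space so the roadmap stays connected, while the learned bias concentrates the remaining budget at chokepoints.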

Results: ACPRM vs. Baselines

5x faster. Sparser graphs. Better coverage.

ACPRM roadmap

ACPRM: nodes concentrated in critical regions.

MAPRM roadmap

MAPRM baseline: uniform sampling wastes nodes.

Key Takeaways