
Training Data Collection

quad_training provides time-synchronized episode management for reinforcement-learning and behavior-cloning pipelines. This page covers the data-collection rig; policy deployment is covered under Specialized Controllers.

What it does

  • Episode boundaries — automatic reset when the robot falls, exits the world, or hits a step budget
  • Time-synced channels — joint states, body state, contacts, IMU, and command interleaved at a fixed cadence
  • Curriculum hooks — start poses, target velocities, and terrain are sampled per episode from a YAML schedule
  • Headless-friendly — designed for the MuJoCo backend on a GPU box
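The episode-boundary logic above amounts to a simple termination predicate. A minimal sketch; all names and thresholds here are illustrative, not quad_training's actual API:

```python
def episode_done(body_height, body_pos_xy, step, *,
                 min_height=0.12, world_half_extent=5.0, step_budget=1000):
    """Return True when any reset condition from the feature list holds.

    Thresholds are made-up defaults for illustration only.
    """
    fell = body_height < min_height                               # robot fell
    out_of_world = any(abs(c) > world_half_extent for c in body_pos_xy)
    over_budget = step >= step_budget                             # hit step budget
    return fell or out_of_world or over_budget
```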

A minimal collection run

ros2 launch quad_utils quad_mujoco.py
ros2 launch quad_training collect.py \
  episodes:=1000 \
  curriculum:=stand_walk_basic \
  output_dir:=$HOME/datasets/quad_walk_v1

Output is a directory of .parquet shards plus a manifest.json that maps episode ID → terrain seed → outcome. Loaders for PyTorch and JAX live under quad_training/loaders/.
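For quick inspection without the bundled loaders, the manifest can be read with the standard library. A sketch; the per-episode field names (`id`, `terrain_seed`, `outcome`) are assumptions about the manifest layout, so check your own manifest.json for the real keys:

```python
import json
from pathlib import Path

def manifest_index(run_dir):
    """Map episode ID -> manifest entry (terrain seed, outcome, ...)."""
    manifest = json.loads((Path(run_dir) / "manifest.json").read_text())
    # Assumed layout: {"episodes": [{"id": ..., "terrain_seed": ..., "outcome": ...}]}
    return {ep["id"]: ep for ep in manifest["episodes"]}
```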

What gets logged per step

Channel          Shape  Notes
obs.body_state   (13,)  pos (3) + quat (4) + lin_vel (3) + ang_vel (3)
obs.joint_state  (24,)  pos (12) + vel (12)
obs.contacts     (4,)   bool per leg
obs.cmd_vel      (3,)   linear x/y, angular z
action           (12,)  joint commands as logged by the controller
reward           (1,)   from reward_fn in the curriculum YAML
done             (1,)   episode-terminal flag
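As a sanity check, a logged step can be validated against the table above. A pure-Python sketch; the channel names follow the table, the `validate_step` helper itself is not part of quad_training:

```python
EXPECTED_SHAPES = {
    "obs.body_state": 13,   # pos (3) + quat (4) + lin_vel (3) + ang_vel (3)
    "obs.joint_state": 24,  # pos (12) + vel (12)
    "obs.contacts": 4,      # bool per leg
    "obs.cmd_vel": 3,       # linear x/y, angular z
    "action": 12,           # joint commands
    "reward": 1,
    "done": 1,
}

def validate_step(step):
    """Raise if a step record is missing a channel or has the wrong length."""
    for name, length in EXPECTED_SHAPES.items():
        values = step[name]                 # KeyError -> missing channel
        if len(values) != length:
            raise ValueError(f"{name}: expected {length}, got {len(values)}")
```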

Curricula

Curriculum YAMLs live in quad_training/config/curricula/. Each is a list of stages with sampling rules. Example:

- name: stand
  duration_eps: 200
  init_pose:
    sampler: nominal
  cmd_vel:
    sampler: zero
  terrain: flat
  reward: stand_upright

- name: walk_flat
  duration_eps: 500
  init_pose:
    sampler: nominal_jitter
    jitter: 0.05
  cmd_vel:
    sampler: uniform
    range: { vx: [-0.5, 1.0], vy: [-0.3, 0.3], wz: [-0.5, 0.5] }
  terrain: flat
  reward: track_cmd_vel
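The walk_flat stage's uniform cmd_vel sampler amounts to an independent uniform draw per axis over the YAML ranges. A sketch, not quad_training's actual sampler:

```python
import random

def sample_cmd_vel(ranges, rng=random):
    """Draw (vx, vy, wz) uniformly from the per-axis ranges in the stage YAML."""
    return tuple(rng.uniform(lo, hi)
                 for lo, hi in (ranges["vx"], ranges["vy"], ranges["wz"]))
```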

Reproducibility

Each episode logs:

  • Curriculum stage + step within stage
  • RNG seed (used for terrain, init pose, perturbations)
  • Robot URDF SHA + per-robot YAML SHA
  • Quad-SDK git SHA

Together these let you re-run a problematic episode bit-for-bit on the MuJoCo backend.
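Before replaying an episode, it is worth confirming the current checkout matches the logged hashes. A sketch, assuming the manifest entry stores fields named `urdf_sha` and `quad_sdk_sha` (check your manifest for the real keys):

```python
def repro_mismatches(entry, *, urdf_sha, sdk_sha):
    """List which pinned hashes differ between a manifest entry and the
    current checkout. Field names are assumptions about the manifest layout."""
    current = {"urdf_sha": urdf_sha, "quad_sdk_sha": sdk_sha}
    return [key for key, value in current.items() if entry.get(key) != value]
```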

Running on a cluster

The quad_training rig spins up one MuJoCo instance per worker; on a 16-core box you can typically pack ~12 collectors before disk I/O, not CPU, becomes the bottleneck. There's a --launch-mode=detached flag that supervises child processes and writes shards as they complete, so a crash loses at most one episode.

Troubleshooting

Episodes look correct, model trains poorly

Check the reward curve in the manifest. If it's noisier than expected, your reset condition may be triggering early — inspect the done-flag distribution per stage.
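Counting terminal flags per stage is a short pass over the loaded step records. A stdlib sketch, assuming each row carries `stage` and `done` fields:

```python
from collections import Counter

def early_reset_rate(steps):
    """Fraction of steps that terminate an episode, per curriculum stage."""
    total = Counter(s["stage"] for s in steps)
    dones = Counter(s["stage"] for s in steps if s["done"])
    return {stage: dones[stage] / n for stage, n in total.items()}
```

A stage whose rate is far above `1 / mean_episode_length` is resetting earlier than its episodes should run.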

Workers desync after a few hours

Long collection runs occasionally drift on heavily loaded shared boxes. Pin workers with taskset or move to a dedicated machine.

Loader complains about schema mismatch

Curriculum changes that add/remove channels invalidate older shards. The manifest carries a schema hash — gate loaders on it.
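Gating can be as simple as hashing the channel list and comparing against the manifest. A sketch; the manifest key name `schema_hash` and the hashing scheme are assumptions, so match them to what your manifest actually stores:

```python
import hashlib
import json

def schema_hash(channels):
    """Stable hash over a {channel_name: length} mapping."""
    blob = json.dumps(channels, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

def check_schema(manifest, channels):
    """Refuse to load shards whose schema differs from the manifest's."""
    expected = manifest["schema_hash"]
    actual = schema_hash(channels)
    if actual != expected:
        raise RuntimeError(
            f"schema mismatch: shard {actual[:8]} vs manifest {expected[:8]}"
        )
```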