
Training Data Collection

quad_training provides time-synchronized episode management for reinforcement-learning and behavior-cloning pipelines. This page covers the data-collection rig; policy deployment is covered under Specialized Controllers.

What it does

  • Episode boundaries — automatic reset when the robot falls, exits the world, or hits a step budget
  • Time-synced channels — joint states, body state, contacts, IMU, and command interleaved at a fixed cadence
  • Curriculum hooks — start poses, target velocities, and terrain are sampled per episode from a YAML schedule
  • Headless-friendly — designed for the MuJoCo backend on a GPU box
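The episode-boundary logic above amounts to a simple termination predicate. A minimal sketch; all names and thresholds here are illustrative, not quad_training's actual API:

```python
def episode_done(body_height, body_pos_xy, step, *,
                 min_height=0.12, world_half_extent=5.0, step_budget=1000):
    """Return True when any reset condition from the feature list holds.

    Thresholds are made-up defaults for illustration only.
    """
    fell = body_height < min_height                               # robot fell
    out_of_world = any(abs(c) > world_half_extent for c in body_pos_xy)
    over_budget = step >= step_budget                             # hit step budget
    return fell or out_of_world or over_budget
```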

A minimal collection run

ros2 launch quad_utils quad_mujoco.py
ros2 launch quad_training collect.py \
  episodes:=1000 \
  curriculum:=stand_walk_basic \
  output_dir:=$HOME/datasets/quad_walk_v1

Output is a directory of .parquet shards plus a manifest.json that maps episode ID → terrain seed → outcome. Loaders for PyTorch and JAX live under quad_training/loaders/.
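For quick inspection without the bundled loaders, the manifest can be read with the standard library. A sketch; the per-episode field names (`id`, `terrain_seed`, `outcome`) are assumptions about the manifest layout, so check your own manifest.json for the real keys:

```python
import json
from pathlib import Path

def manifest_index(run_dir):
    """Map episode ID -> manifest entry (terrain seed, outcome, ...)."""
    manifest = json.loads((Path(run_dir) / "manifest.json").read_text())
    # Assumed layout: {"episodes": [{"id": ..., "terrain_seed": ..., "outcome": ...}]}
    return {ep["id"]: ep for ep in manifest["episodes"]}
```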

What gets logged per step

Channel          Shape  Notes
obs.body_state   (13,)  pos (3) + quat (4) + lin_vel (3) + ang_vel (3)
obs.joint_state  (24,)  pos (12) + vel (12)
obs.contacts     (4,)   bool per leg
obs.cmd_vel      (3,)   linear x/y, angular z
action           (12,)  joint commands as logged by the controller
reward           (1,)   from reward_fn in the curriculum YAML
done             (1,)   episode-terminal flag
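As a sanity check, a logged step can be validated against the table above. A pure-Python sketch; the channel names follow the table, the `validate_step` helper itself is not part of quad_training:

```python
EXPECTED_SHAPES = {
    "obs.body_state": 13,   # pos (3) + quat (4) + lin_vel (3) + ang_vel (3)
    "obs.joint_state": 24,  # pos (12) + vel (12)
    "obs.contacts": 4,      # bool per leg
    "obs.cmd_vel": 3,       # linear x/y, angular z
    "action": 12,           # joint commands
    "reward": 1,
    "done": 1,
}

def validate_step(step):
    """Raise if a step record is missing a channel or has the wrong length."""
    for name, length in EXPECTED_SHAPES.items():
        values = step[name]                 # KeyError -> missing channel
        if len(values) != length:
            raise ValueError(f"{name}: expected {length}, got {len(values)}")
```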

Curricula

Curriculum YAMLs live in quad_training/config/curricula/. Each is a list of stages with sampling rules. Example:

- name: stand
  duration_eps: 200
  init_pose:
    sampler: nominal
  cmd_vel:
    sampler: zero
  terrain: flat
  reward: stand_upright

- name: walk_flat
  duration_eps: 500
  init_pose:
    sampler: nominal_jitter
    jitter: 0.05
  cmd_vel:
    sampler: uniform
    range: { vx: [-0.5, 1.0], vy: [-0.3, 0.3], wz: [-0.5, 0.5] }
  terrain: flat
  reward: track_cmd_vel
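The walk_flat stage's uniform cmd_vel sampler amounts to an independent uniform draw per axis over the YAML ranges. A sketch, not quad_training's actual sampler:

```python
import random

def sample_cmd_vel(ranges, rng=random):
    """Draw (vx, vy, wz) uniformly from the per-axis ranges in the stage YAML."""
    return tuple(rng.uniform(lo, hi)
                 for lo, hi in (ranges["vx"], ranges["vy"], ranges["wz"]))
```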

Reproducibility

Each episode logs:

  • Curriculum stage + step within stage
  • RNG seed (used for terrain, init pose, perturbations)
  • Robot URDF SHA + per-robot YAML SHA
  • Quad-SDK git SHA

Together these let you re-run a problematic episode bit-for-bit on the MuJoCo backend.
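Before replaying an episode, it is worth confirming the current checkout matches the logged hashes. A sketch, assuming the manifest entry stores fields named `urdf_sha` and `quad_sdk_sha` (check your manifest for the real keys):

```python
def repro_mismatches(entry, *, urdf_sha, sdk_sha):
    """List which pinned hashes differ between a manifest entry and the
    current checkout. Field names are assumptions about the manifest layout."""
    current = {"urdf_sha": urdf_sha, "quad_sdk_sha": sdk_sha}
    return [key for key, value in current.items() if entry.get(key) != value]
```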

Running on a cluster

The quad_training rig spins up one MuJoCo instance per worker; on a 16-core box you can typically pack ~12 collectors before disk I/O, not CPU, becomes the bottleneck. There's a --launch-mode=detached flag that supervises child processes and writes shards as they complete, so a crash loses at most one episode.

Troubleshooting

Episodes look correct, model trains poorly

Check the reward curve in the manifest. If it's noisier than expected, your reset condition may be triggering early — inspect the done-flag distribution per stage.
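Counting terminal flags per stage is a short pass over the loaded step records. A stdlib sketch, assuming each row carries `stage` and `done` fields:

```python
from collections import Counter

def early_reset_rate(steps):
    """Fraction of steps that terminate an episode, per curriculum stage."""
    total = Counter(s["stage"] for s in steps)
    dones = Counter(s["stage"] for s in steps if s["done"])
    return {stage: dones[stage] / n for stage, n in total.items()}
```

A stage whose rate is far above `1 / mean_episode_length` is resetting earlier than its episodes should run.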

Workers desync after a few hours

Long collection runs occasionally drift on heavily loaded shared boxes. Pin workers with taskset or move to a dedicated machine.

Loader complains about schema mismatch

Curriculum changes that add/remove channels invalidate older shards. The manifest carries a schema hash — gate loaders on it.
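Gating can be as simple as hashing the channel list and comparing against the manifest. A sketch; the manifest key name `schema_hash` and the hashing scheme are assumptions, so match them to what your manifest actually stores:

```python
import hashlib
import json

def schema_hash(channels):
    """Stable hash over a {channel_name: length} mapping."""
    blob = json.dumps(channels, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

def check_schema(manifest, channels):
    """Refuse to load shards whose schema differs from the manifest's."""
    expected = manifest["schema_hash"]
    actual = schema_hash(channels)
    if actual != expected:
        raise RuntimeError(
            f"schema mismatch: shard {actual[:8]} vs manifest {expected[:8]}"
        )
```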