A joint reinforcement learning framework that unifies locomotion and discrete gaze control for socially aware robot navigation in unstructured environments. AVSN (Active Vision for Social Navigation) learns where to look and where to drive within a single end-to-end policy, using a latent world model to solve the credit-assignment problem inherent in active sensing.
Paper: Active Vision for Social Navigation
The system spans three repositories under ~/src/:
| Repository | Role |
|---|---|
| RoboTerrain | Gazebo Fortress simulation, Leo Rover with fisheye camera, dynamic human actors, metrics logging |
| attention | Spatiotemporal attention model — predicts near-future human occupancy heatmaps from RGB + YOLO masks |
| dreamerJMV3 | DreamerV3-based RL agent — consumes fused observations (grayscale, depth, attention heatmap) and outputs joint locomotion + gaze actions |
Fisheye camera (6 rectified windows, ~160° FOV)
│
▼
fisheye_ros2_mem_share.py ──► shared memory
│
▼
inference.py
├─ YOLO pedestrian detection
├─ Spatiotemporal attention heatmap
├─ Monocular depth estimation
└─ Fused 3-channel observation ──► shared memory
│
▼
DreamerV3 (dreamerv3/main.py)
├─ World model learns gaze–observation dynamics
├─ Joint action: [linear_vel, angular_vel, pan_window]
└─ Pan selects from 5 yaw offsets: −60°, −30°, 0°, +30°, +60°
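For clarity, a minimal sketch of how the joint action above might be decoded. The exact encoding used by dreamerJMV3 may differ; `decode_action`, `PAN_OFFSETS_DEG`, and the velocity units are illustrative assumptions, while the five yaw offsets come straight from the diagram.

```python
# Illustrative decoding of the joint action [linear_vel, angular_vel, pan_window].
# pan_window is a discrete index into the five yaw offsets; velocity units are
# assumed to be m/s and rad/s.
import numpy as np

PAN_OFFSETS_DEG = (-60, -30, 0, 30, 60)  # five discrete gaze directions

def decode_action(action):
    """action: [linear_vel, angular_vel, pan_window] as emitted by the policy."""
    linear_vel, angular_vel, pan_idx = action
    yaw_offset_deg = PAN_OFFSETS_DEG[int(pan_idx)]  # which fisheye window to attend to
    return float(linear_vel), float(angular_vel), yaw_offset_deg

# e.g. drive forward while gazing one window (+30 degrees) off the heading
print(decode_action(np.array([0.4, 0.0, 3])))
```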
- Ubuntu 22.04 with ROS 2 Humble and Gazebo Fortress
- NVIDIA GPU with CUDA (for YOLO, depth estimation, and JAX/DreamerV3)
- Python 3.10+
The attention pipeline and DreamerV3 have conflicting dependencies (notably different JAX versions), so they must run in separate Python environments. Use whichever environment manager you prefer (venv, conda, virtualenv, etc.). Each repository contains a requirements.txt.
# Attention environment
cd ~/src/attention
# create and activate a Python 3.10+ environment, then:
pip install -r requirements.txt
# DreamerV3 environment
cd ~/src/dreamerJMV3
# create and activate a separate Python 3.10+ environment, then:
pip install -r requirements.txt

# Build the ROS 2 workspace
source /opt/ros/humble/setup.bash
cd ~/src/RoboTerrain/ros2_ws
colcon build
source install/setup.bash

Repository layout:

~/src/
├── RoboTerrain/
│ └── ros2_ws/src/
│ ├── roverrobotics_ros2/ # Leo Rover model, fisheye camera, Gazebo worlds
│ ├── dynamic_obstacles/ # Human actor spawner + trajectory files
│ └── rover_metrics/ # Navigation metrics logger
├── attention/
│ ├── inference/
│ │ └── fisheye_ros2_mem_share.py # ROS2 → shared memory bridge
│ ├── inference.py # YOLO + attention + depth fusion
│ ├── trajectory_model.py # Spatiotemporal attention model (Flax/JAX)
│ ├── preprocess_dataset.py # Dataset preprocessing
│ ├── run_efficient_training.py # Training entry point
│ └── model_output/ # Trained checkpoints
└── dreamerJMV3/
├── dreamerv3/
│ ├── main.py # DreamerV3 entry point
│ └── configs.yaml # Contains `leorover` config
└── logdir/ # Training logs and checkpoints
The system requires five terminals. Source ROS 2 in every terminal that uses ROS:
source /opt/ros/humble/setup.bash
source ~/src/RoboTerrain/ros2_ws/install/setup.bash

Terminal 1 launches the Leo Rover with fisheye camera in one of the available worlds (inspect, construction, island):
# With GUI
ros2 launch roverrobotics_gazebo Leo_rover_fisheye.launch.py
# Headless (for training runs)
ros2 launch roverrobotics_gazebo Leo_rover_fisheye.launch.py headless:=true

Terminal 2 runs the fisheye-to-shared-memory bridge, which subscribes to the fisheye camera topic, rectifies six 320×320 windows, and writes them to shared memory:
cd ~/src/attention/inference
python fisheye_ros2_mem_share.py
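For reference, a minimal sketch of the shared-memory handoff this bridge performs. The block name, dtype, and layout below are assumptions for illustration; fisheye_ros2_mem_share.py defines the actual protocol.

```python
# Hypothetical producer/consumer pattern over a named shared-memory block.
import numpy as np
from multiprocessing import shared_memory

SHAPE = (6, 320, 320, 3)   # six rectified 320x320 RGB windows (assumed layout)
DTYPE = np.uint8

# Producer (bridge) side: allocate a named block once, copy each new frame set in.
shm = shared_memory.SharedMemory(create=True, name="fisheye_windows",
                                 size=int(np.prod(SHAPE)) * np.dtype(DTYPE).itemsize)
frames = np.ndarray(SHAPE, dtype=DTYPE, buffer=shm.buf)
frames[:] = 0              # in the real bridge: rectified windows from the camera callback

# Consumer (inference.py) side: attach to the same named block and read without copying.
reader = shared_memory.SharedMemory(name="fisheye_windows")
windows = np.ndarray(SHAPE, dtype=DTYPE, buffer=reader.buf)
```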
Terminal 3 runs inference.py: YOLO pedestrian detection, spatiotemporal attention, and monocular depth estimation. It outputs the fused observation to shared memory for DreamerV3:

# activate your attention Python environment, then:
cd ~/src/attention
python inference.py --attention_mode ./model_output/checkpoint_epoch_1000.pkl
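A sketch of the three-channel fusion this step produces (grayscale, depth, attention heatmap, per the architecture table above). Channel order, resolution, and normalization here are assumptions; inference.py defines the actual observation layout.

```python
# Illustrative fusion of the three observation channels consumed by DreamerV3.
import numpy as np

def fuse_observation(gray, depth, heatmap):
    """gray, depth, heatmap: float arrays in [0, 1] with the same HxW shape."""
    obs = np.stack([gray, depth, heatmap], axis=-1)  # (H, W, 3) image-like observation
    return (obs * 255.0).astype(np.uint8)

obs = fuse_observation(np.zeros((64, 64)), np.ones((64, 64)) * 0.5, np.zeros((64, 64)))
```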
Terminal 4 launches the DreamerV3 agent with the leorover config. The LD_LIBRARY_PATH filtering prevents environment-installed libraries from conflicting with ROS 2 system libraries:

# activate your DreamerV3 Python environment, then:
# Filter LD_LIBRARY_PATH to ROS-only paths
FILTERED_LD_LIBRARY_PATH=$(echo $LD_LIBRARY_PATH | tr ':' '\n' | grep -E '^/opt/ros' | tr '\n' ':' | sed 's/:$//')
env LD_LIBRARY_PATH="$FILTERED_LD_LIBRARY_PATH" \
CUDA_HOME="" \
XLA_PYTHON_CLIENT_PREALLOCATE=false \
XLA_PYTHON_CLIENT_ALLOCATOR=platform \
python dreamerv3/main.py \
--configs leorover \
--logdir ./logdir/dreamer/{timestamp}

Terminal 5 spawns pedestrian actors with predefined trajectories (examples shown for the inspect world):
cd ~/src/RoboTerrain/ros2_ws/src/dynamic_obstacles
python spawn.py \
--trajectory_file trajectories/inspect_linear.sdf \
--world_name inspect \
--actor_name linear
python spawn.py \
--trajectory_file trajectories/inspect_diag.sdf \
--world_name inspect \
--actor_name diag
python spawn.py \
--trajectory_file trajectories/inspect_corner_triangle.sdf \
--world_name inspect \
--actor_name triangle

The spatiotemporal attention model is trained separately on datasets of RGB frames with YOLO-generated pedestrian masks. Training produces checkpoint files consumed by inference.py.
# activate your attention Python environment, then:
cd ~/src/attention
python run_efficient_training.py \
--dataset_path /path/to/data \
--output_dir ./model_output \
--preprocessed_dir ./preprocessed_data \
--num_epochs 1000 \
--batch_size 8 \
--sequence_length 5 \
--learning_rate 1e-4 \
--target_width 320 \
--target_height 320 \
--yolo_model_path /path/to/yolo11n.onnx \
--embedding_dim 128 \
--num_heads 4 \
--tensorboard_dir ./log_dir
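For orientation, a heavily simplified sketch of what a spatiotemporal attention module with these hyperparameters (sequence_length 5, 320×320 inputs, embedding_dim 128, num_heads 4) could look like in Flax. The real architecture lives in trajectory_model.py; the layer choices, input channels, and output shapes below are assumptions.

```python
# A minimal, hypothetical spatiotemporal attention block in Flax/JAX, not the
# actual trajectory_model.py architecture.
import jax
import jax.numpy as jnp
import flax.linen as nn

class SpatioTemporalAttentionSketch(nn.Module):
    embedding_dim: int = 128
    num_heads: int = 4

    @nn.compact
    def __call__(self, frames):
        # frames: (T, H, W, C), e.g. (5, 320, 320, 4) for RGB + YOLO mask (assumed)
        x = nn.Conv(32, (5, 5), strides=(4, 4))(frames)
        x = nn.relu(x)
        x = nn.Conv(self.embedding_dim, (3, 3), strides=(4, 4))(x)
        x = nn.relu(x)                                   # (T, h, w, D)
        t, h, w, d = x.shape
        tokens = x.reshape(t * h * w, d)                 # flatten space-time into tokens
        tokens = nn.SelfAttention(num_heads=self.num_heads)(tokens[None])[0]
        x = tokens.reshape(t, h, w, d)
        # decode the last frame's features into a near-future occupancy heatmap
        heatmap = nn.ConvTranspose(1, (16, 16), strides=(16, 16))(x[-1])
        return nn.sigmoid(heatmap)                       # (320, 320, 1)

# Usage sketch
model = SpatioTemporalAttentionSketch()
dummy = jnp.zeros((5, 320, 320, 4))
params = model.init(jax.random.PRNGKey(0), dummy)
heatmap = model.apply(params, dummy)
```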
Resume from a checkpoint:

python run_efficient_training.py \
--dataset_path /path/to/data \
--resume_checkpoint ./model_output/checkpoint_epoch_500.pkl \
# ... other args as above

Monitor training:
tensorboard --logdir ./log_dir

Navigation metrics are logged via the ROS 2 metrics node:
ros2 run rover_metrics metrics_node

Visualize and compare runs:
cd ~/src/RoboTerrain/metrics_analyzer
python cli_main.py \
data/metric_logs/run1.csv \
data/metric_logs/run2.csv \
-m CV IM -p time_series

Available metrics: Total Collisions (TC), Obstacle Clearance (OC), Current Velocity (CV), IMU Acceleration (IM), Rough Terrain (RT), Distance Traveled (DT), Smoothness (SM).
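If you want to inspect a run outside cli_main.py, the logs are plain CSV. A minimal pandas sketch follows; the column names depend on what metrics_node writes, so this only summarizes whatever numeric columns are present.

```python
# Quick look at a metrics log without the analyzer CLI.
import pandas as pd

df = pd.read_csv("data/metric_logs/run1.csv")
print(df.columns.tolist())   # discover the logged metric columns
print(df.describe())         # per-column summary statistics
```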
Troubleshooting:

| Symptom | Fix |
|---|---|
| No camera frames in shared memory | Confirm Gazebo is running and fisheye_ros2_mem_share.py is receiving on the camera topic |
| YOLO ONNX errors | Ensure onnxruntime-gpu is installed in the attention environment |
| DreamerV3 CUDA OOM | Set XLA_PYTHON_CLIENT_PREALLOCATE=false (already in the launch command above) |
| libstdc++ version conflicts | The LD_LIBRARY_PATH filter in Terminal 4 resolves most Python-environment/ROS library clashes |
| Actors not visible | Verify the --world_name flag matches the launched Gazebo world |
| Checkpoint load failure | Checkpoints are Flax .pkl files; ensure the config (embedding_dim, num_heads) matches |
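When a checkpoint fails to load, a quick way to see what is actually inside it. The pickled structure shown here is an assumption; adjust the key handling to whatever trajectory_model.py actually saves.

```python
# Inspect an attention checkpoint to verify it matches the configured
# embedding_dim / num_heads before pointing inference.py at it.
import pickle

with open("model_output/checkpoint_epoch_1000.pkl", "rb") as f:
    ckpt = pickle.load(f)

print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))  # e.g. params / optimizer state / stored hyperparameters
```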
If you use this work, please cite:

@article{vice2025avsn,
title = {Active Vision for Social Navigation},
author = {Vice, Jack M. and Sukthankar, Gita},
journal = {Preprints},
year = {2025},
doi = {10.20944/preprints202604.0068}
}
@article{vice2025dune,
title = {DUnE: A Versatile Dynamic Unstructured Environment for Off-Road Navigation},
author = {Vice, Jack M. and Sukthankar, Gita},
journal = {Robotics},
volume = {14},
number = {4},
pages = {35},
year = {2025},
publisher = {MDPI},
doi = {10.3390/robotics14040035}
}

RoboTerrain is Apache 2.0. See individual repositories for third-party license details (ONNX Runtime, YOLO, Leo Rover MIT, Clearpath BSD).