# cpp-reinforcement-learning

> C++ Reinforcement Learning best practices using libtorch (PyTorch C++ frontend) and modern C++17/20. Use when:
> - Implementing RL algorithms in C++ for performance-critical applications
> - Building production RL systems with libtorch
> - Creating replay buffers and experience storage
> - Optimizing RL training with GPU acceleration
> - Deploying RL models with ONNX Runtime

- Author: Antony Zaki
- Repository: Aznatkoiny/zAI-Skills
- Version: 20260208095015
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-08
- Source: https://github.com/Aznatkoiny/zAI-Skills
- Web: https://mule.run/skillshub/@@Aznatkoiny/zAI-Skills~cpp-reinforcement-learning:20260208095015

---

# C++ Reinforcement Learning

## Overview

This skill covers implementing reinforcement learning algorithms in C++ using LibTorch (PyTorch C++ frontend) and modern C++17/20 features. It provides patterns for building high-performance RL systems suitable for production deployment, robotics, game AI, and real-time applications.

## When to Use

- Implementing DQN, PPO, SAC, or other RL algorithms in C++
- Building performance-critical RL training pipelines
- Creating efficient replay buffers with proper memory management
- Deploying trained models with ONNX Runtime
- Parallelizing environment rollouts across threads
- Integrating RL with existing C++ codebases (games, robotics, simulations)

## Core Libraries

### Primary: LibTorch (PyTorch C++ Frontend)

LibTorch provides the same tensor operations and autograd capabilities as PyTorch in C++.

**Installation**: Download from https://pytorch.org/get-started/locally (select C++/LibTorch)

**CMake Integration**:

```cmake
cmake_minimum_required(VERSION 3.18)
project(rl_project)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Point CMake at the unpacked LibTorch directory,
# e.g. -DCMAKE_PREFIX_PATH=/path/to/libtorch
find_package(Torch REQUIRED)

add_executable(train_agent src/main.cpp)
target_link_libraries(train_agent "${TORCH_LIBRARIES}")
```

### Secondary Libraries

- **ONNX Runtime** - Cross-platform inference deployment
- **cpprl** (mhubii/cpprl) - Reference PPO implementation
- **Gymnasium C++ bindings** - Environment interfaces

## Quick Start: DQN Agent

```cpp
#include <torch/torch.h>

#include <memory>
#include <tuple>

struct DQNNet : torch::nn::Module {
    torch::nn::Linear fc1{nullptr}, fc2{nullptr}, fc3{nullptr};

    DQNNet(int64_t state_dim, int64_t action_dim) {
        fc1 = register_module("fc1", torch::nn::Linear(state_dim, 128));
        fc2 = register_module("fc2", torch::nn::Linear(128, 128));
        fc3 = register_module("fc3", torch::nn::Linear(128, action_dim));
    }

    torch::Tensor forward(torch::Tensor x) {
        x = torch::relu(fc1->forward(x));
        x = torch::relu(fc2->forward(x));
        return fc3->forward(x);
    }
};

// Training loop (state_dim, action_dim, lr, and gamma are set by the caller)
auto policy_net = std::make_shared<DQNNet>(state_dim, action_dim);
auto target_net = std::make_shared<DQNNet>(state_dim, action_dim);
torch::optim::Adam optimizer(policy_net->parameters(),
                             torch::optim::AdamOptions(lr));

// Compute loss. states, actions (int64, shape [B, 1]), rewards, next_states,
// and dones (float, 1.0 where the episode ended) are batched tensors.
auto q_values = policy_net->forward(states).gather(1, actions);
// In C++, Tensor::max(dim) returns a (values, indices) tuple.
auto next_q = std::get<0>(target_net->forward(next_states).max(1)).detach();
auto target = rewards + gamma * next_q * (1 - dones);
// squeeze(1) drops only the action dimension, so a batch of size 1 stays 1-D.
auto loss = torch::mse_loss(q_values.squeeze(1), target);

// Backward pass
optimizer.zero_grad();
loss.backward();
optimizer.step();
```

## Essential Patterns

### Replay Buffer (Ring Buffer)

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Experience is a user-defined transition type
// (state, action, reward, next_state, done).
class ReplayBuffer {
public:
    explicit ReplayBuffer(size_t capacity)
        : capacity_(capacity), position_(0), size_(0) {
        buffer_.reserve(capacity);
    }

    // Append until full, then overwrite the oldest entry.
    void push(Experience exp) {
        if (buffer_.size() < capacity_) {
            buffer_.push_back(std::move(exp));
        } else {
            buffer_[position_] = std::move(exp);
        }
        position_ = (position_ + 1) % capacity_;
        size_ = std::min(size_ + 1, capacity_);
    }

    std::vector<Experience> sample(size_t batch_size);

private:
    std::vector<Experience> buffer_;
    size_t capacity_, position_, size_;
    std::mt19937 rng_{std::random_device{}()};
};
```

### GPU Device Management

```cpp
torch::Device device = torch::cuda::is_available() ? torch::kCUDA : torch::kCPU;
model->to(device);

// Create tensors on device
auto tensor = torch::zeros({batch_size, state_dim},
    torch::TensorOptions().device(device).dtype(torch::kFloat32));
```

### Inference Mode

```cpp
{
    // Gradient tracking is disabled for the lifetime of this guard.
    torch::NoGradGuard no_grad;
    auto action_values = model->forward(state);
    auto action = action_values.argmax(1);
}
```

## Common Pitfalls

1. **Forgetting train/eval mode** - Call `model->train()` before training and `model->eval()` before evaluation or inference
2. **Missing NoGradGuard** - Wrap inference in `torch::NoGradGuard` so autograd state is not recorded, which saves memory
3. **Tensor accumulation** - Call `.detach()` on tensors you store so they do not keep their autograd graph alive
4. **Thread safety** - Clone models so each parallel thread works on its own copy
5. **Device mismatch** - Verify that the model and all tensors are on the same device

## Reference Files

- [references/libtorch.md](references/libtorch.md) - LibTorch setup and API guide
- [references/algorithms.md](references/algorithms.md) - DQN, PPO, SAC implementations
- [references/memory-management.md](references/memory-management.md) - Replay buffers, smart pointers, RAII
- [references/performance.md](references/performance.md) - Optimization, parallelization, GPU
- [references/testing.md](references/testing.md) - Testing and debugging strategies
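
## Sketch: Replay Buffer Sampling

The `ReplayBuffer` pattern above declares `sample` without defining it. One way to fill it in is uniform random sampling with replacement, sketched below using only the standard library. The `Experience` struct, its field names, and the `size()` accessor are assumptions made for illustration and are not part of the original skill; a LibTorch-based buffer would more likely store tensors.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical transition record; the field names are illustrative assumptions.
struct Experience {
    std::vector<float> state;
    std::int64_t action = 0;
    float reward = 0.0f;
    std::vector<float> next_state;
    bool done = false;
};

class ReplayBuffer {
public:
    explicit ReplayBuffer(std::size_t capacity) : capacity_(capacity) {
        if (capacity_ == 0) {
            throw std::invalid_argument("capacity must be > 0");
        }
        buffer_.reserve(capacity_);
    }

    // Append until full, then overwrite the oldest entry (ring buffer).
    void push(Experience exp) {
        if (buffer_.size() < capacity_) {
            buffer_.push_back(std::move(exp));
        } else {
            buffer_[position_] = std::move(exp);
        }
        position_ = (position_ + 1) % capacity_;
    }

    // Uniform sampling with replacement; returns copies of stored entries.
    std::vector<Experience> sample(std::size_t batch_size) {
        if (buffer_.empty()) {
            throw std::runtime_error("cannot sample from an empty buffer");
        }
        std::uniform_int_distribution<std::size_t> pick(0, buffer_.size() - 1);
        std::vector<Experience> batch;
        batch.reserve(batch_size);
        for (std::size_t i = 0; i < batch_size; ++i) {
            batch.push_back(buffer_[pick(rng_)]);
        }
        return batch;
    }

    // Number of transitions currently stored (illustrative accessor).
    std::size_t size() const { return buffer_.size(); }

private:
    std::vector<Experience> buffer_;
    std::size_t capacity_;
    std::size_t position_ = 0;
    std::mt19937 rng_{std::random_device{}()};
};
```

Sampling with replacement keeps the sketch short and works even when the batch is larger than the buffer. If duplicates within a batch are undesirable, sampling distinct indices (for example with `std::sample` from `<algorithm>`) is one alternative when the batch is no larger than the buffer.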