rLLM is an open-source framework that makes reinforcement learning accessible to any developer working with LLMs. Instead of wrestling with complex RL infrastructure, you simply wrap your existing agent code with @rllm.rollout and let the framework automatically trace every LLM call. Whether you're using LangGraph, SmolAgent, Strands, OpenAI Agents SDK, or plain OpenAI, rLLM integrates seamlessly — with near-zero code changes required. The framework supports multiple RL algorithms (GRPO, REINFORCE, RLOO, rejection sampling) and two training backends: verl for distributed multi-GPU setups and tinker for single-machine or CPU environments.

What makes rLLM particularly compelling is its battle-tested performance: models trained with rLLM have beaten models 50x their size, with a 4B parameter model outperforming 235B models on finance benchmarks, and a 1.5B model surpassing OpenAI's O1-Preview on math tasks. The CLI-first workflow with 50+ built-in benchmarks means you can evaluate and train with a single command like rllm eval gsm8k or rllm train gsm8k.

  • Framework Agnostic: Works with LangGraph, SmolAgent, Strands, OpenAI Agents SDK, Google ADK, and any OpenAI-compatible client
  • Minimal Code Changes: Add the @rllm.rollout decorator to existing agent code; rLLM traces LLM calls automatically
  • CLI-First Workflow: 50+ built-in benchmarks; rllm eval and rllm train work out of the box
  • Proven Results: 4B models trained with rLLM beat 235B models on finance; 1.5B surpasses O1-Preview on math
  • Multiple RL Algorithms: GRPO, REINFORCE, RLOO, rejection sampling, and more
  • Flexible Backends: verl for distributed GPU training, tinker for single-machine / CPU setups
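Two of the algorithms listed above, GRPO and RLOO, differ mainly in how they turn a group of sampled rewards for one prompt into per-sample advantages. The sketch below shows the textbook formulas only, not rLLM's code; the function names are hypothetical, and the use of population standard deviation and the `eps` value are assumptions, since implementations vary on both.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


def rloo_advantages(rewards):
    """Leave-one-out: each reward minus the mean of the other samples."""
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


# Four sampled answers to one prompt, scored 1.0 if correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
print(grpo_advantages(rewards))
print(rloo_advantages(rewards))
```

In both cases correct samples get a positive advantage and incorrect ones a negative advantage, with no learned value model needed; the baseline comes from the other samples in the same group.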

rLLM has an active GitHub community, with 111 open issues and ongoing discussion. Here are highlights from a few threads:

QuyAnh2005: "Congratulations!! If you don't mind, can I ask a question? How many hours did the whole process take you, or 1 epoch?"

jumptoliujj: "Could you share the trend of the val test score during training?"

Discussion centered around reproducing the published results (avg score 56.4) and understanding the computational cost of training runs.

michaelzhiluo: "Have you tried 64K context and the correct temperature? ./scripts/eval/eval_model.sh --model agentica-org/DeepCoder-14B-Preview --datasets test_livecodebench --max-length 65536"

deepdata-foundation: "Same --max-length 65536 and temperature=0.6 but got 0.5699 and 0.2186 instead of expected results."

This thread explores evaluation methodology for coding tasks, with users debugging discrepancies in benchmark scores.

michaelzhiluo: "Looking like you are OOMing during actor update. You might want to increase sequence parallelism. This will distribute the activations of the KV blocks across GPUs."

zjhthu: "I tried setting actor_rollout_ref.actor.ulysses_sequence_parallel_size to 2 or 4, but it still runs OOM."

A practical troubleshooting thread where maintainers guide users through distributed training memory issues, recommending Ray >= 2.41.

rLLM bridges the gap between reinforcement learning research and practical LLM development. By providing a unified, framework-agnostic interface with battle-tested algorithms, it enables developers to train smarter AI agents without rebuilding RL infrastructure from scratch. The active community and responsive maintainers make it an excellent choice for anyone exploring RL-enhanced LLM applications.

Quick Install:

# Single-machine / CPU training
uv pip install "rllm @ git+https://github.com/rllm-org/rllm.git"

# Distributed GPU training
uv pip install "rllm[verl] @ git+https://github.com/rllm-org/rllm.git"

@rllm-org/rllm · ⭐ 5,470 · MIT License