AI & ML interests

None defined yet.

Recent Activity

lewtun submitted a paper 19 days ago

Single-minus gluon tree amplitudes are nonzero

sergiopaniego new activity 20 days ago

trl-lib/documentation-images:Upload 2 files

lewtun submitted a paper 20 days ago

Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL

View all activity

sergiopaniego

posted an update 1 day ago

Post

203

did you know you can train agentic models with RL deploying the environments on HF Spaces? 🤗

with TRL + OpenEnv, your training script connects to remote environments hosted as Spaces

want to train faster? → just add more Spaces (TRL handles the parallelization natively)

we used this to train a model to solve the trolley problem in CARLA. 2 HF Spaces running a full driving simulator, each on a T4 GPU

full write-up with code and results → https://huggingface.co/blog/sergiopaniego/bringing-carla-to-openenv-trl

sergiopaniego

posted an update 3 days ago

Post

225

Qwen3.5 dense (smol 🤏) models just dropped

- natively multimodal
- 0.8B · 2B · 4B · 9B (+ base variants)
- 262K context extensible to 1M
- built-in thinking

fine-tune them with TRL out of the box → SFT, GRPO, DPO and more!

examples: https://huggingface.co/docs/trl/example_overview
collection: https://huggingface.co/collections/Qwen/qwen35

sergiopaniego

posted an update 6 days ago

Post

2245

What happens when you make an LLM drive a car where physics are real and actions can't be undone?

I ported CARLA, the autonomous driving simulator, to OpenEnv and added training support via TRL + Hugging Face Spaces.

The model interacts with the simulator through tool calls (observe, brake, change lane) and learns from a reward signal.

In 50 training steps, Qwen 0.6B learns to swerve and brake to avoid pedestrians in emergency situations.

The project supports text and vision (VLMs can see through a camera sensor), open-world driving with traffic, and multiple driving scenarios.

This builds on the carla-env project by sinatras, which originally placed LLMs inside CARLA for evaluation. We extended it with vision, new scenarios, rubric-based rewards, and made it trainable end-to-end.

Blog: https://huggingface.co/blog/sergiopaniego/bringing-carla-to-openenv-trl/
CARLA env in OpenEnv: https://github.com/meta-pytorch/OpenEnv/tree/main/envs/carla_env
Training script: https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/carla.py

albertvillanova

posted an update 7 days ago

Post

1713

🚀 TRL v0.29.0 introduces trl-training: an agent-native training skill.

This makes the TRL CLI a structured, agent-readable capability, allowing AI agents to reliably execute training workflows such as:
- Supervised Fine-Tuning (SFT)
- Direct Preference Optimization (DPO)
- Group Relative Policy Optimization (GRPO)

We’re excited to see what the community builds on top of this.

If you’re working on AI agents, alignment research, or scalable RL training infrastructure: give TRL v0.29.0 a try! 🤗

The future of ML tooling is agent-native.
🔗 https://github.com/huggingface/trl/releases/tag/v0.29.0

qgallouedec

posted an update 14 days ago

Post

2645

@CohereLabs just released 🌿 Tiny Aya: a fully open-source 3B parameter model that speaks 70+ languages 🌍! But there’s a catch:

Tiny Aya is just a language model. It doesn’t support tool calling, the key capability that turns frontier models into powerful *agents*.
So the real question is:

How hard is it to turn Tiny Aya into an agent?

Turns out… it’s simple, thanks to Hugging Face TRL.
We’re sharing a hands-on example showing how to train Tiny Aya to turn it into a tool-calling agent using TRL, unlocking what could become the first *massively multilingual open agent*.

Small model. Global reach. Agent capabilities.

👉 https://github.com/huggingface/trl/blob/main/examples/notebooks/sft_tool_calling.ipynb

1 reply

sergiopaniego

posted an update 14 days ago

Post

1419

Tiny Aya 🌿 just dropped from @CohereLabs , a really powerful multilingual small model!

To celebrate, we cooked up fresh resources to train it for tool calling 🔧

> Free Google Colab guide: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_tool_calling.ipynb
> Standalone training script: https://github.com/huggingface/trl/blob/main/examples/scripts/sft_tiny_aya_tool_calling.py

lewtun

submitted a paper to Daily Papers 19 days ago

Single-minus gluon tree amplitudes are nonzero

Paper • 2602.12176 • Published 20 days ago • 8

sergiopaniego

posted an update 20 days ago

Post

527

The latest piece by @MiniMax-AI is a must-read.

It tries to break the impossible triangle of agent RL: throughput × stability × flexibility.

A lot to learn here, go read it 🫵
https://huggingface.co/blog/MiniMax-AI/forge-scalable-agent-rl-framework-and-algorithm

sergiopaniego

in trl-lib/documentation-images 20 days ago

Upload 2 files

#4 opened 20 days ago by

cmunley1

lewtun

submitted a paper to Daily Papers 20 days ago

Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL

Paper • 2602.03773 • Published 29 days ago • 11

albertvillanova

posted an update 21 days ago

Post

1716

5 years already working in democratizing AI 🤗
Grateful to be part of such an awesome team making it happen every day.

sergiopaniego

posted an update 24 days ago

Post

481

if you're looking for a good first issue to get your open-source journey started, you could contribute to this TRL issue by documenting one impactful paper in the docs

we have a broad list to cover!! 🧐

https://github.com/huggingface/trl/issues/4407