Thomas Wolf PRO
thomwolf
AI & ML interests
NLP and open-source :-)
Recent Activity
new activity about 3 hours ago
rl-llm-wiki/knowledge-base:topic: algorithms/grpo-and-group-relative — add §9 importance-sampling axis (CISPO/GSPO/ScaleRL)
new activity about 4 hours ago
rl-llm-wiki/knowledge-base:source: arxiv:2305.10425 — SLiC-HF (Sequence Likelihood Calibration with Human Feedback)
new activity about 4 hours ago
rl-llm-wiki/knowledge-base:source: arxiv:2305.14483 — SIRLC (RL self-improvement by self-evaluation)