Simpler is Better: Finding the Best Reward Function in Long Chain-of-Thought Reinforcement Learning for Small Language Models

Apr 27, 2025


Proposed Pipeline

Zichen "Charlie" Zhang

I’m passionate about transforming traditional software with AI.

© 2026 Zichen Zhang. This work is licensed under CC BY-NC-ND 4.0.
