Projects

<em>Simpler is Better:</em> Finding the Best Reward Function in Long Chain-of-Thought Reinforcement Learning for Small Language Models

Luning Wang*, Zichen Zhang*, Junkuan Liu*.

We study three types of reward functions (normal, cosine, and dynamic) for long chain-of-thought reinforcement learning in small language models. We find that the simple normal reward consistently outperforms the more complex designs, suggesting that simpler rewards are sufficient for eliciting reasoning in smaller models.
MIA-Sort: Multiplex Chromatin Interaction Analysis by Efficiently Sorting Chromatin Complexes
MIA-Sort is a Python bioinformatics tool for efficiently extracting and sorting chromatin complexes from large datasets like Hi-C and Pore-C, enabling researchers to analyze chromatin loops, stripes, jets, and hubs to study loop extrusion.