Hacker News | at2005's comments

I didn't compare with the harness (focused on distillation) but the original ToT paper has a section on it: https://arxiv.org/abs/2305.10601

Ah, I meant that MCTS uses more inference-time compute than GRPO to produce a training sample

Btw, the whole motivation for this was algorithms like Grover's, which need "oracles" to be specified. You can only imagine trying to code adders and greater-than circuits with QASM...

