at2005's comments

at2005 · 2026-03-16T21:47:19

I didn't compare against the harness (I focused on distillation), but the original ToT paper has a section on it: https://arxiv.org/abs/2305.10601

at2005 · 2026-03-15T06:50:01

Ah, I meant that MCTS uses more inference-time compute than GRPO to produce a training sample.

at2005 · on Feb 23, 2021

Btw, the whole motivation for this was algorithms like Grover's, which need "oracles" to be specified. You can only imagine trying to code adders and greater-than circuits in QASM...