
> Compilers are deterministic, making their generated assembly code verifiable

People keep saying this like it is an absolute fact, whereas in reality it is a spectrum.

Compilers are more deterministic than LLMs in general, but no, they are not completely deterministic. That's why making reproducible builds is hard!

https://stackoverflow.com/questions/52974259/what-are-some-e... and https://github.com/mgrang/non-determinism give some good examples of this.

This leads to the point: in general do we care about this non-determinism?

Most of the time, no we don't.

Once you accept that, the next stage is accepting that most of the time the non-deterministic output of an LLM is good enough!

This leads to the question of how to verify it is good enough, which leads to testing, and then suddenly you have a working agentic loop...



>> Compilers are deterministic, making their generated assembly code verifiable

> People keep saying this like it is an absolute fact, whereas in reality it is a scale.

My statement is of course a generalization due to its terseness and focuses on the expectation of repeatable results given constant input, excluding pathological definitions of nondeterminism such as compiler-defined macro values or implementation defects. Modern compilers are complex systems and not really my point.

> This leads to the point: in general do we care about this non-determinism?

> Most of the time, no we don't.

Not generally the type of nondeterminism I described, no. Nor the nondeterministic value of the `__DATE__` macro referenced in the StackOverflow link you provided.

> Once you accept that the next stage is accepting that most of the time the non-deterministic output of an LLM is good enough!

This is where the wheels fall off.

First, "most of the time" only makes sense when there is another disjoint group of "other times." Second, the preferred group is defined as when the "non-deterministic [sic] output of an LLM is good enough", which means the "other times" are when LLM use is not good enough. Third, and finally, when use of an approach (or a tool) is unpredictable given the same input (again, excluding pathological cases), it requires an open set of tests to verify correctness over time.

That last point may not be obvious, so I will elaborate on why it holds.

Assuming the LLM in use has, or is reasonably expected to have, model evolution, documents generated by it will diverge unpredictably given a constant prompt. This implies prompt evolution will also be required, at a frequency almost certainly different from that of the unpredictable document generation intrinsic to LLMs. This in turn implies test expectations and/or definitions having to evolve over time with nothing changing other than undetectable model evolution. Which means any testing which exists at one point in time cannot be relied upon to provide the same verifications at a later point in time. Thus the requirement of an open set of tests to verify correctness over time.

Finally, to answer your question of:

  how do I verify it is good enough

You can't, because what you describe is a multi-story brick house built on a sand dune.


> Assuming the LLM in use has, or is reasonably expected to have, model evolution, documents generated by same will diverge unpredictably given a constant prompt.

So what?

You tell it once. It writes code.

You test that code, not the prompt.


> This leads to the point: in general do we care about this non-determinism?

> Most of the time, no we don't.

well that’s a sweeping generalisation. i think this is a better generalised answer to your question.

> It depends on the problem we’re trying to solve and the surrounding conditions and constraints.

software engineering is primarily about understanding the problem space.

are 99% of us building a pacemaker? no. but that doesn’t mean we can automatically make the leap to assuming a set of tools known for being non-deterministic are good enough for our use case.

it depends.

> Once you accept that the next stage is accepting that most of the time the non-deterministic output of an LLM is good enough!

the next stage is working with whatever tool(s) is/are best suited to solve the problem.

and that depends on the problem you are solving.


> are 99% of us building a pacemaker? no. but that doesn’t mean we can automatically make the leap to assuming a set of tools known for being non-deterministic are good enough for our use case.

This seems irrelevant?

Either way, hopefully you test the pacemaker code comprehensively!

That's pretty much the best case for llm generated code: comprehensive tests of the desired behaviour.



