
AI is a real-world example of Zeno’s Paradox. Getting to 90% accuracy is where we’ve been for years, and that’s Uncanny Valley territory. Getting to 95% accuracy is not “just” another 5%. Phrased that way, it sounds like it’s about 6% as hard as getting to 90%. What you’re actually doing is cutting the error rate in half, which is really difficult. So 97% isn’t 2% harder than 95%, or even 40% harder, it’s almost twice as hard: you’re cutting the remaining error from 5% to 3%, which is most of the way to another halving.
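
A quick sketch of that framing, measuring difficulty as halvings of the error rate (the “effort doubles per halving” model is my assumption; the accuracy numbers are the ones above):

    # Illustrative only: treat difficulty as halvings of the error rate,
    # i.e. effort ~ log2(old_error / new_error).
    import math

    def halvings(old_acc, new_acc):
        old_err, new_err = 1 - old_acc, 1 - new_acc
        return math.log2(old_err / new_err)

    print(halvings(0.90, 0.95))   # ~1.0  (10% error -> 5%: one full halving)
    print(halvings(0.95, 0.97))   # ~0.74 (5% -> 3%: most of another halving)
    print(halvings(0.95, 0.975))  # ~1.0  (5% -> 2.5%: exactly another halving)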

The long tail is an expensive beast. And if people used Siri or Alexa as much as those companies would like, every user would run into one ridiculous answer per day. There’s a psychology around failure clusters that leads people to claim a failure mode happens “all the time”, and I’ve seen that kick in a lot in the twice-a-week to once-a-day range. There’s another around clusters that happen when the stakes are high, where the characterization becomes even more unfair. There are others around Dunbar numbers: public policy changes when everyone knows someone who was affected.
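
To put a rough number on “one ridiculous answer per day” (both figures below are assumptions for illustration, not measurements):

    # Illustrative only: a hypothetical 2% per-query failure rate
    # and ~50 assistant queries per day for a heavy user.
    error_rate = 0.02
    queries_per_day = 50

    expected_failures = error_rate * queries_per_day          # 1.0 per day on average
    p_at_least_one = 1 - (1 - error_rate) ** queries_per_day  # ~0.64 each day

    print(expected_failures, round(p_at_least_one, 2))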



I think this is starting to look accurate. The sudden progress of AI is more of an illusion. It is most readily apparent in the field of image generation: if you stand back far enough, the images look outstanding, but any close inspection reveals small errors everywhere, because the AI doesn't actually understand the structure of things.

So it is with data as well, just not as easily perceptible at first, since you sometimes have to know the domain to realize just how bad it is.

I've seen some online discussions starting to emerge that suggest this is indeed an architectural flaw in LLMs. That would imply the fix is not something that is just around the corner, but a significant effort that might even require rethinking the approach.


> but a significant effort that might even require rethinking the approach.

There’s probably a Turing award for whatever comes next, and for whatever comes after that.

And I don’t think that AI will replace developers at any rate. All it might do is show us how futile some of the work we get saddled with is. A new kind of framework for dealing with the sorts of things management believes are important but that actually have a high material cost for the value they provide. We all know people who are good at talking, and some of them are good at talking people into unpaid overtime. That’s how they make the numbers work: by chewing developers up and spitting them out. Until we get smart and say no.


I don't think it's an illusion, there has been progress.

And I also agree that the AI-like thing we have is nowhere near AGI.

And I also agree about rethinking the approach. The problem is that human intelligence is deeply entwined with, and optimized for, the problems of living things. Before we had humanlike intelligence we had 'do not get killed' and 'do not starve' intelligence. The general issue is that AI doesn't have these concerns, which creates a set of alignment gaps between human behavior and AI behavior. AI doesn't have any 'this causes death' filter inherent to its architecture; we'll try to tack one on poorly and wonder why it fails.


My professional opinion is that we should be using AI like Bloom filters: can we cheaply detect whether the expensive calculation needs to be made or not? A 2% error rate in that situation is just an opex issue, not a publicity nightmare.
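
A minimal sketch of that pattern, assuming a hypothetical cheap_model_score() and threshold (neither comes from the comment above):

    # Use a cheap, fallible model as a pre-check before the expensive path.
    # Like a Bloom filter, it is biased so that mistakes cost extra compute
    # (a false "maybe"), not missed answers.

    MAYBE_THRESHOLD = 0.05  # assumed cutoff: below this, skip the expensive step

    def cheap_model_score(item) -> float:
        """Placeholder: inexpensive model estimating P(item needs full processing)."""
        raise NotImplementedError

    def expensive_calculation(item):
        """Placeholder: the slow, costly computation."""
        raise NotImplementedError

    def process(item):
        # When in doubt, run the expensive path, so a 2% error rate shows up
        # as wasted compute (an opex issue) rather than wrong output.
        if cheap_model_score(item) >= MAYBE_THRESHOLD:
            return expensive_calculation(item)
        return None  # confidently skipped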


Yes, I didn't mean to imply there has been no progress, just that some people perceive, from their first impressions of ChatGPT, that we are all of a sudden getting close to AGI.




