OpenClaw had a huge viral marketing campaign. It wasn't a coincidence that everyone on Twitter was suddenly talking about it at the same time. To its credit, it also executed well enough in a few areas to capture people's imagination. Most of the concepts are ideas people have been toying with for years, though.
Steinberg funded or directed a campaign? It looked to me like unconnected parties liked it and promoted it so they could offer their own solutions and services on top of it. You saw that they were paid by Steinberg/his affiliates?
What kind of small tasks do you find it's good at? My non-coding use of agents has been related to server admin, and my local-llm use-case is for 24/7 tasks that would be cost-prohibitive. So my best guess for this would be monitoring logs, security cameras, and general home automation tasks.
That's about it. The harness is still pretty rudimentary so I'm sure the system could be more capable, and that might reveal more interesting opportunities. I don't really know.
So far I've got it orchestrating a few instances to dig through logs, local emails, git repositories, and github to figure out what I've been doing and what I need to do. Opus is waayyy better at it, but Qwen does a good enough job to actually be useful.
I tried having it parse orders in emails and create a CSV of expenses, and that went pretty badly. I'm not sure why. The CSV was invalid and full of bunk entries by the end, almost every time. It also missed a lot of expenses; it would parse out only 5 or 6 of 7 items, for example. Opus and Sonnet do a spectacular job on tasks like this, and do cool things like create a list of emails with orders and then systematically ensure each line item within each email is accounted for, even without being prompted to do so. It's an entirely different category of performance.
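One way to catch that failure mode is to never trust the model's CSV wholesale: validate every row deterministically so bunk entries and missed items surface instead of silently corrupting the file. A minimal sketch, assuming a hypothetical three-column expense format (`date`, `vendor`, `amount`) — the column names and date format here are illustrative, not anything from the thread:

```python
import csv
import io
from datetime import datetime

def validate_expense_csv(text, expected_cols=("date", "vendor", "amount")):
    """Check model-produced CSV: header must match, dates and amounts
    must parse. Returns (good_rows, bad_rows) so bunk entries are
    flagged rather than silently included."""
    reader = csv.DictReader(io.StringIO(text))
    if tuple(reader.fieldnames or ()) != expected_cols:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    good, bad = [], []
    for row in reader:
        try:
            datetime.strptime(row["date"], "%Y-%m-%d")  # date must parse
            float(row["amount"])                        # amount must be numeric
            good.append(row)
        except (ValueError, TypeError, KeyError):
            bad.append(row)
    return good, bad

sample = "date,vendor,amount\n2024-05-01,Acme,19.99\nnot-a-date,Acme,oops\n"
good, bad = validate_expense_csv(sample)
```

Pairing a weaker model with a strict validator like this narrows the gap somewhat: the model only has to get individual rows right, and anything malformed gets kicked back instead of ending up in the final file.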
Automation is something I'd like to dabble in next, but all I can think of it being useful for is mapping commands (probably from voice) to tool calls, and the reality is I'd rather tap a button on my phone. My family might like being able to use voice commands, though. Otherwise, having it parse logs to determine how to act based on thresholds or something would also be far better implemented with simple algorithms. It's hard to find truly useful and clear fits for LLMs.
Oh man you just gave me an idea to use something like qwen 3.5 to categorize a lot of emails. You can keep the context small, do it per email and just churn through a lot of crap.
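The per-email approach keeps each call's context tiny no matter how large the mailbox is. A minimal sketch of that loop — the `classify` callable would normally hit a local model endpoint (e.g. an OpenAI-compatible llama.cpp or Ollama server); here a trivial keyword stand-in keeps the sketch self-contained and runnable, and all names are illustrative:

```python
LABELS = ("order", "newsletter", "personal", "other")

def keyword_classify(subject, body):
    """Stand-in for a local-LLM call: returns one label per message.
    A real version would send subject+body to the model with a short
    prompt constraining output to LABELS."""
    text = (subject + " " + body).lower()
    if "order" in text or "receipt" in text:
        return "order"
    if "unsubscribe" in text:
        return "newsletter"
    return "other"

def categorize(emails, classify=keyword_classify):
    # One call per email: context stays small regardless of mailbox size,
    # and a bad answer only affects that one message.
    return [(e["subject"], classify(e["subject"], e["body"])) for e in emails]

inbox = [
    {"subject": "Your order has shipped", "body": "Receipt attached."},
    {"subject": "Weekly digest", "body": "Click unsubscribe to stop."},
]
results = categorize(inbox)
```

Because each message is independent, this also parallelizes trivially and any single misclassification is cheap to re-run.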
I've been learning to apply these lately and it has been pretty eye opening. Combined with Fourier analysis (for example) you can do what seems kind of like magic, in my opinion. But it has been possible since long before LLMs showed up.
Totally different categories and different use cases, but the more I learn about LLMs the more I discover there's a powerful, deterministic, well-established statistical model or two to do the same thing.
Really, LLMs are kind of like convenient, wildly inefficient proxies for useful processes. But I'm not convinced they should often end up as permanent fixtures of logical pipelines. Unless you're making a chat bot, I guess.
> Really, LLMs are kind of like convenient, wildly inefficient proxies for useful processes. But I'm not convinced they should often end up as permanent fixtures of logical pipelines. Unless you're making a chat bot, I guess.
I think I agree with this. It's made me realise LLMs are great for prototyping processes in the same way that 3D printers are great at prototyping physical things. They make it quick and easy to get something close enough to see the unforeseen problems a proper solution might have.
3d printing is a great analog because there are so many critical considerations that are often missed or can't be accounted for in the prototype, but it's alright because it's a prototype. The strain testing, durability, manufacturing at scale; none of that is properly addressed. Those might involve some serious, expensive challenges, too. But it's alright, because you've got something in your hand that informs you whether or not those challenges are worth contending with. I really love this about LLMs and 3d printing.
IMO the fact that spam detection has devolved into reputation management, rather than working on the content itself, makes me think there's a lot of alpha in an LLM process vs. the most traditional processes we have now.
I was just chatting with a co-worker that wanted to run a LLM locally to classify a bunch of text. He was worried about spending too many tokens though.
I asked him why he didn't just have the LLM build him a python ML library based classifier instead.
The LLMs are great but you can also build supporting tools so that:
- you use fewer tokens
- it's deterministic
- you as the human can also use the tools
- it's faster b/c the LLM isn't "shamboozling" every time you need to do the same task.
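As a sketch of the kind of supporting tool that list describes: a tiny multinomial naive Bayes text classifier, the sort of thing an LLM could write for you once and that then runs deterministically, for free, on every message. This is an illustrative stdlib-only implementation (the `TinyNB` name and the example labels are made up for the sketch); in practice the LLM would more likely hand you a scikit-learn pipeline:

```python
import math
from collections import Counter, defaultdict

class TinyNB:
    """Minimal multinomial naive Bayes over whitespace tokens,
    with Laplace smoothing. Deterministic, fast, token-free."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # smoothing strength
        self.word_counts = defaultdict(Counter) # label -> token counts
        self.class_counts = Counter()           # label -> doc counts
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.class_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best, best_lp = None, float("-inf")
        for label in self.class_counts:
            # log prior + sum of smoothed log likelihoods
            lp = math.log(self.class_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for tok in tokens:
                lp += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if lp > best_lp:
                best, best_lp = label, lp
        return best

clf = TinyNB().fit(
    ["invoice attached please pay", "meeting at noon tomorrow"],
    ["billing", "scheduling"],
)
```

The trade-off is exactly the one above: unlike an LLM, this needs labeled training examples, but once fit it costs nothing per message and gives the same answer every time.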
I use Haiku to classify my mail - it's way overkill, but unlike a classifier it doesn't require training. I receive many dozens of e-mails a day, and it's burned on average ~$3 worth of tokens per month. I'll probably switch to a cheaper model soon, but it's cheap enough that the payoff from spending time optimizing it is a long way off.
Remember when Netflix almost split its brand with "Qwikster"? It was the dying DVD-by-mail service, and the whole debacle did nothing but confuse people.
True, although Netflix knew the DVD business had no permanent future anyway, so they really didn't care. If they'd picked a less silly name like "DVDflix" or something, it wouldn't have become a viral story, but either way it wouldn't have changed NFLX's fortunes.
Everyone has their own hill to die on, that's the thing about personal computing. It's the same if you ask why they can't switch mobile OS. It's some seemingly trivial app or feature that almost nobody cares about.
"Support" just means the manufacturer still releases OS updates. It says absolutely nothing about the quality of those updates: what if those updates simply degrade the situation? Every iPhone user I know says the same without conspiring with each other: on older iPhones, it's better to stop updating to newer major OS releases.
I was really looking for tangible, actionable advice since I'm facing slow adoption in my org. This post seems to hide behind the "secret sauce" that it claims made all of the difference.
Once local models are good enough there will be a $20 cloud provider that can give you more context, parameters, and t/s than you could dream of at home. This is true today with services like Groq.
Not exactly. Those pricing models are based on intermittent usage. If you're an AI engineer using a sophisticated agent flow, the usage is constant and continuous. Over 2 years, that can add up to the price of a dedicated machine at home.
I had 3 projects running today. I hit my Claude Max Pro session limits twice today in about 90 minutes. I'm now keeping it down to 1 project, and I may interrupt it until the evening when I don't need Claude Web. If I could run it passively on my laptop, I would.
Anthropic used to have unlimited subscriptions, then people started running agents 24/7.
Now they have 5 hour buckets of limited use.
Groq most likely stays afloat because they're a bit player - and propped up by VC money.
With a local system I can run it at full blast all the time, nobody can suddenly make it stupid by reallocating resources to training their new model, nobody can censor it or do stealth updates that make it perform worse.
Groq and Cerebras definitely have the t/s, but their hardware is tremendously expensive, even compared to the standard data center GPUs. Worth keeping in mind if we're talking about a $20 subscription.