So what's the state of research in breaking out of the Von Neumann approach and going with a RAM-free architecture where the CPU has m(b)illions of registers you just do everything in? Of course it's expensive, but let's say you have effectively infinite dollars, is this a good idea?
Where would your processor fetch its program code from, if not RAM?
Assuming you place code also into registers...
If you squint hard enough registers are also a form of RAM, just closer to the processor and faster. A machine with only instruction execute and registers would still have a Harvard/Von Neumann architecture.
The reason why processors don't have more registers is because they are quite power hungry and they are not very dense. For a given chip area, D-RAM gives you more than 6x the capacity for less than half the power use. And no, you can't make registers with the same technology as D-RAM.
Right, registers are a kind of very small working memory, the only place where "work" operations can happen. Most program code eventually has to go through the register bank anyway, except it all has to be MOV in and out of the registers, eating up unbelievable amount of time.
I've always viewed RAM as a kind of register cache, necessitated because registers are expensive to build and RAM, though expensive, is cheaper. I've heard registers these days are just a small bit of SRAM, but reaching into my way back machine in college, I seem to remember them being a different kind of memory element.
But RAM and all the caches these days leading up to registers are all require fetch from somewhere, store in the register, do the work, then write back the result somewhere (even if the instruction set obfuscates that). If you had enough registers, the fetch and store parts of that work are pretty much gone, turning something like
where each mov we do today introduces a cascade down the cache and memory stack (perhaps even dipping into on-disk VM) just to copy a few bytes into a register. And we have to do that 3 times here. The number of adds we could do in the time it takes to do a mov is probably pretty high, but we simply can't do them because we're waiting on bits moving from one place to another.
So suppose money, power etc. weren't considered issues and engineering effort was put into a register-only approach, how much faster would that be? (one the reasons that the Von Neumann architecture became "the" way to do things was that registers were considered expensive to build, but what if we didn't care about money?)
I'd bet a general purpose system built this way would be an order of magnitude faster than anything we have today. But you're right, it would be an enormous resource hog and be expensive as a medium-sized mega yacht.
The problem with modern chips is not money, or the price of energy. The poor things would simply fry (or explode) if we were to make them like you suggest and clock them at competitive frequency.