Note that the number of physical registers in the CPU is often considerably larger than the number you can access through the instruction set. The extra ones are used for register renaming and for optimizing complex instructions.


Yes. As other commenters have said, if you are doing out-of-order execution well, the CPU will have many more 'hidden' registers and do register renaming to use them. But this has an interesting interaction with compilers.

Say you have a simple function that is going to add 1 to a bunch of variables. In an ARM-like assembly code, this could be written as:

  LDR r1, [r0, #0]
  ADD r1, r1, #1
  STR r1, [r0, #0]
  LDR r1, [r0, #4]
  ADD r1, r1, #1
  STR r1, [r0, #4]
  LDR r1, [r0, #8]
  ADD r1, r1, #1
  STR r1, [r0, #8]

Now, if your CPU can do OoOE, it can spot that register r1 is used for three independent loads, adds and stores, and can internally use three different registers for them, allowing the operations to be done in parallel. But, equally, the compiler could have written the code as:

  LDR r1, [r0, #0]
  ADD r1, r1, #1
  STR r1, [r0, #0]
  LDR r2, [r0, #4]
  ADD r2, r2, #1
  STR r2, [r0, #4]
  LDR r3, [r0, #8]
  ADD r3, r3, #1
  STR r3, [r0, #8]

Compilers and register renaming are fighting each other. In traditional compiler writing, you try to minimise the register usage and output the first code listing. But if you have plenty of registers, you could output the second code instead, and let the CPU do parallel execution without the need for register renaming.

In other words, once you have enough 'real' registers, does that get rid of the need for register renaming? Intel added it with the Pentium Pro to improve existing x86 code, but I wonder if it has that much of a benefit with newer ISAs that have 'enough' registers and properly tuned compilers?
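
To make the renaming idea concrete, here is a minimal sketch of what the hardware does to the first listing. It is a toy model, not how any particular CPU implements it: the tuple encoding `(op, dest, srcs)`, the `rename` function, and the `p0`, `p1`, ... physical register names are all made up for illustration, and the pool of physical registers is assumed to be unlimited (real hardware has a fixed pool and recycles entries as instructions retire).

```python
from itertools import count

def rename(program, num_arch_regs=4):
    """Give every register write a fresh physical register (toy model)."""
    fresh = count()  # assumed unlimited supply of physical register numbers
    # Each architectural register starts out mapped to its own physical one.
    mapping = {f"r{i}": f"p{next(fresh)}" for i in range(num_arch_regs)}
    renamed = []
    for op, dest, srcs in program:
        # Sources read whichever physical register currently holds the value.
        new_srcs = [mapping[s] for s in srcs]
        new_dest = None
        if dest is not None:
            # A write allocates a new physical register, which breaks the
            # false dependency on earlier uses of the same architectural name.
            mapping[dest] = f"p{next(fresh)}"
            new_dest = mapping[dest]
        renamed.append((op, new_dest, new_srcs))
    return renamed

# The first listing: r1 is reused for all three load/add/store chains.
listing1 = [
    ("LDR", "r1", ["r0"]), ("ADD", "r1", ["r1"]), ("STR", None, ["r1", "r0"]),
    ("LDR", "r1", ["r0"]), ("ADD", "r1", ["r1"]), ("STR", None, ["r1", "r0"]),
    ("LDR", "r1", ["r0"]), ("ADD", "r1", ["r1"]), ("STR", None, ["r1", "r0"]),
]

renamed = rename(listing1)
for instr in renamed:
    print(instr)
```

After renaming, the three chains no longer share a destination register, so the result has the same dependency structure as the second listing, where the compiler picked r1, r2 and r3 itself.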


You still need OoOe to execute your second example optimally since you didn't schedule the instructions, which points to why OoOe isn't going away - there are going to be code sequences that the compiler cannot schedule optimally, particularly around branches. Additionally, cache misses are impossible to predict statically, and OoOe helps hide those.

And no one does OoOe without register renaming.
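
The scheduling point can be sketched with a toy timing model. Everything here is an assumption for illustration: a single-issue, strictly in-order pipeline, a 3-cycle load, 1-cycle add and store, and the hypothetical `in_order_cycles` helper. It ignores caches, branches and memory dependencies, which are exactly the things a compiler cannot see statically.

```python
# Assumed latencies in cycles; real numbers vary by CPU and cache behaviour.
LATENCY = {"LDR": 3, "ADD": 1, "STR": 1}

def in_order_cycles(program):
    """Cycle at which the last instruction completes on an in-order, single-issue core."""
    ready = {}   # register -> cycle at which its value becomes available
    cycle = 0    # earliest cycle the next instruction may issue
    finish = 0
    for op, dest, srcs in program:
        # Stall until every source operand is ready; no reordering allowed.
        issue = max([cycle] + [ready.get(s, 0) for s in srcs])
        done = issue + LATENCY[op]
        if dest is not None:
            ready[dest] = done
        cycle = issue + 1
        finish = max(finish, done)
    return finish

def chain(reg):
    """One load/add/store chain using the given register."""
    return [("LDR", reg, ["r0"]), ("ADD", reg, [reg]), ("STR", None, [reg, "r0"])]

# Second listing as written: distinct registers, but chains back to back.
unscheduled = chain("r1") + chain("r2") + chain("r3")

# Same instructions, interleaved by a (hypothetical) scheduling compiler.
c1, c2, c3 = chain("r1"), chain("r2"), chain("r3")
scheduled = [c1[0], c2[0], c3[0], c1[1], c2[1], c3[1], c1[2], c2[2], c3[2]]

print(in_order_cycles(unscheduled))  # 15 under the assumed latencies
print(in_order_cycles(scheduled))    # 9 under the assumed latencies
```

Under these assumptions, using distinct registers alone does not help an in-order core: each add still waits on the load in front of it. The instructions have to be reordered either by the compiler or by the hardware.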


Yeah, I left out any other changes to avoid confusing the issue. But any reordering I could have done, the compiler could have done too. Your point about branches is fair though, as the 'active' renamed registers after a branch can only be known at runtime.

Still, I wonder whether some of the features of modern CPUs could be dropped if it weren't for legacy code. On the other hand, Itanium tried to push the parallelism work onto the compiler and look where that ended up!



