Speaking from my own use, 9 times out of 10, the 10x-30x speedup from going CPU to GPU is what I'm looking for. The extra 1.5-2x speedup at the end is not something I care about.

I think this is the major thing holding back GPGPU adoption. NVIDIA does an amazing job tweaking libraries for ultrafast performance for Torch/TensorFlow/video games, where it really does matter. In the process, they miss everyone else: people who have embarrassingly parallel problems, but for whom it's just not worth the time to code up different versions for CUDA and OpenCL (or whatever ATI/Intel happen to be swinging that week).

I think if they did this /well/, they'd overtake Intel as the major value-add in computers.