HUGE congrats to Mike Pall. No amount of praise to him is enough.
Hopefully, sponsors will continue to donate to keep Mike fully employed and ensure the project's long-term future.
I encourage sponsors to consider donating [1].
On a side note, I wish the Programming Language Shootout [2] would add back in LuaJIT to demonstrate just how fast LuaJIT is in comparison to other languages. It's just amazing.
When the Programming Language Shootout used to include LuaJIT, it had benchmarks against Java.
If you want to see just LuaJIT vs Lua, [1] is a good comparison.
The reason I liked the inclusion of LuaJIT in the Programming Language Shootout was that it compared LuaJIT to other language implementations not just on speed, but also on memory consumption and lines of code, all of which LuaJIT typically dominated.
https://github.com/darius/superbench is not a big fancy cross-language shootout, at all, but I tried a few little programs I actually cared about. LuaJIT 'won' in the sense that of the languages I tried it had the best combo of performance and pleasant productivity, informally. I didn't try to quantify the latter.
(Also, no Java because I wouldn't seriously consider it for hacking-for-fun.)
Yeah I agree it's a very interesting piece of software.
But do you understand the prerequisites -- namely processor microarchitecture? I don't, and I know that's why I don't understand it. I imagine if I did, it might not be that hard to understand.
Most people's knowledge of the stack stops somewhere around C. C hasn't changed for 40 years. It's a portable assembly language, i.e. the common mental model is that each construct in C is a handful of assembly instructions on any processor. But to get LuaJIT-type performance you have to understand exactly how the CPU works, i.e. pipelining and reordering, internal caching algorithms, internal branch prediction algorithms, and maybe to some extent multicore, although I guess LuaJIT is single-threaded like Lua.
Mike Pall's mental model is for sure not C (since a lot of it isn't even written in C), but the lower level CPU architecture. And that has changed a lot in the last 10 years. I think even if you coded a lot of assembly language in the 80's it wouldn't necessarily translate to working on something like LuaJIT. My impression from skimming some online posts is that he has read hundreds of pages of Intel/ARM reference documentation from cover to cover, for multiple CPUs. I think there are a limited number of people with the patience to do that, since it's non-portable and changes relatively quickly.
Another problem is I think that the open source tools that are available don't work at the right level. Like when you are running profilers, they are helping you profile assembly code generated by a C compiler. The performance characteristics he's taking advantage of are at a lower level. I think you need CPU-specific performance counters and so forth, and there's a different way of getting those for each make/model. Are there even open source tools that allow you to get this information? I'd appreciate a pointer.
It's just a different level of abstraction than most programmers are working with. Not that many people are even writing "low level" C these days, e.g. something like redis. A lot of "application level" C code you see these days is old and/or not particularly fast.
I would be interested if anyone has any pointers for this style of code other than "read a bunch of CPU manuals". :)
I think that most sampling profilers like oprofile (http://oprofile.sourceforge.net/news/) or perf (https://perf.wiki.kernel.org/index.php/Main_Page) will give you CPU-level profiling and performance counters. The UI of those tools isn't super-friendly; if you're on OS X the "Instruments" app gives the same information with a far superior UI.
I have written a JIT that uses LuaJIT's dynamic assembly engine DynASM (http://luajit.org/dynasm.html). DynASM is an incredible piece of engineering: it lets you write really readable code for generating machine code at runtime, and it is extremely small and low-overhead.
I keep meaning to write an article that demonstrates how to use LuaJIT for a small but interesting JIT. I just never quite get around to it. I always wish that I could use it to implement the Universal Machine from ICFP 2006 (http://www.boundvariable.org/task.shtml), which is absolutely the most delightful problem ever, but as I recall it makes extensive use of self-modifying code, which makes JITting much more difficult and less effective.
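The starting point for an article like that would be a plain interpreter loop, which is what a JIT then specializes. Here's a minimal sketch of the shape of such a loop — a toy register VM in Python, with made-up opcodes and encoding that have nothing to do with the actual UM spec:

```python
# A toy register VM: the kind of interpreter loop you would target with a JIT.
# Opcodes, encoding, and the sample program are all made up for illustration.
LOADI, ADD, JNZ, HALT = range(4)

def run(program):
    regs = [0] * 4
    pc = 0
    while True:
        op, a, b, c = program[pc]
        if op == LOADI:            # regs[a] = immediate b
            regs[a] = b
        elif op == ADD:            # regs[a] = regs[b] + regs[c]
            regs[a] = regs[b] + regs[c]
        elif op == JNZ:            # if regs[a] != 0: jump to instruction b
            if regs[a] != 0:
                pc = b
                continue
        elif op == HALT:
            return regs
        pc += 1

# Sum 5+4+3+2+1 into r0 by counting r1 down to zero.
prog = [
    (LOADI, 0, 0, 0),    # r0 = 0   (accumulator)
    (LOADI, 1, 5, 0),    # r1 = 5   (counter)
    (LOADI, 2, -1, 0),   # r2 = -1  (decrement)
    (ADD, 0, 0, 1),      # r0 += r1
    (ADD, 1, 1, 2),      # r1 -= 1
    (JNZ, 1, 3, 0),      # loop while r1 != 0
    (HALT, 0, 0, 0),
]
print(run(prog)[0])  # 15
```

A tracing JIT would watch the hot loop at pc 3-5 and emit straight-line machine code for it; self-modifying code breaks this, because the compiled trace can become stale when the program rewrites itself.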
- Linux's perf tool allows you to read hardware performance counters. It's pretty self-explanatory; some interesting counters are platform-neutral, whereas for others you have to specify the CPU manufacturer's hex event code (see the Intel manuals, for example).
- For pipelining and HW/CPU details, I suggest you grab a copy of Hennessy and Patterson's Computer Architecture: A Quantitative Approach. There is also Computer Systems: A Programmer's Perspective, which is good, but I think for what you're interested in, CAAQA is the better book.
- If you just want to play around with hardware performance counters, you might want to try Intel's VTune, which comes (or at least did, two years ago) as an Eclipse plugin/RCP workbench.
> Another problem is I think that the open source tools that are available don't work at the right level. Like when you are running profilers, they are helping you profile assembly code generated by a C compiler. The performance characteristics he's taking advantage of are at a lower level. I think you need CPU-specific performance counters and so forth, and there's a different way of getting those for each make/model. Are there even open source tools that allow you to get this information? I'd appreciate a pointer.
The way forward is a sampling profiler.
By sampling the stack every millisecond or so, you can build a picture of what is taking the longest.
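A toy version of the idea in pure Python, peeking at another thread's stack with `sys._current_frames()` (all names here are made up; real samplers like perf work at the CPU level, below the interpreter):

```python
import collections
import sys
import threading
import time

def busy():
    # A deliberately hot function for the profiler to catch.
    deadline = time.time() + 0.3
    while time.time() < deadline:
        sum(range(1000))

def sample(thread_id, counts, stop, interval=0.001):
    # Every `interval` seconds, record which function the target
    # thread is currently executing.
    while not stop.is_set():
        frame = sys._current_frames().get(thread_id)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)

counts = collections.Counter()
stop = threading.Event()
worker = threading.Thread(target=busy)
worker.start()
sampler = threading.Thread(target=sample, args=(worker.ident, counts, stop))
sampler.start()
worker.join()
stop.set()
sampler.join()
print(counts.most_common(3))  # busy() should dominate the samples
```

The per-function sample counts approximate where wall-clock time is going, which is exactly the picture a sampling profiler builds.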
A sampling profiler still doesn't tell you what specific instructions are taking longer, how much memory contention is happening between CPUs, how many cache and TLB misses there are, etc.
> But do you understand the prerequisites -- namely processor microarchitecture?
I do not, but I'm trying to learn more about it (Coursera FTW).
LuaJIT is awesome on many levels. First, as you have pointed out, the attention to detail at the processor level is quite special, and not many people can do that.
But good use of the processor is not the only reason LuaJIT is fast. At the compiler level, the optimizations LuaJIT performs are extensive, and there are plenty of novel ideas in there that people writing papers should read about (Mike Pall has posted a list of things he considers somewhat novel). The nature of the trace compiler makes many of these optimizations less complicated than they would otherwise be.
There are many compiler books, but no books about JITs in general. There are some good blog posts about LuaJIT, but not that many. Following Mike Pall (MikeMike) on HN or Reddit is interesting too.
Particularly interested in Android integration - has anyone used it for any projects? From my admittedly small experience using it, I love the language, and would really like to use it in some way practically.
Thank you! I'm not developing the site in it, I'm using it for allowing the users to write their own analytics reports in an easy, fast and secure manner. It's just the bare LuaJIT interpreter with some simple libraries I'm developing for analyzing reports more efficiently.
Something tells me there is an MC Hammer joke in here regarding being 2.0-Lua-Jit-To-Quit, but I'm not sure I can find the phrasing to excite the level of humor I was hoping for...
If you're referring to LuaJIT 1 vs 2, the big differences are that LuaJIT 2 has a trace-tree JIT, and interpreters for each platform written in assembly. The improvement in performance is significant: LuaJIT2's interpreter alone is faster than LuaJIT1's JIT.
It's hard to compare LuaJIT1 and 2, since LuaJIT2 is the most advanced JIT for a dynamically-typed language on the planet. The performance it can achieve with zero type annotations still blows me away.
I think you're forgetting the gold standard in dynamic languages from the 80s: Common Lisp. It's still around, and it's still faster than almost every other dynamically typed language out there.
Yes, on synthetic benchmarks for LuaJIT-2.0.0-beta10 and PyPy 1.9. Those numbers need updating for the latest versions, but the basics are as follows. LuaJIT is incredibly fast to warm-up. On short benchmarks (around a second or less), it is way ahead of PyPy. As code runs for longer and longer, PyPy tends to catch up, and sometimes overtake, LuaJIT.
My experience of RPython and PyPy suggests that there is a fair bit of scope for reducing some of the warm-up cost. I don't think PyPy will ever match LuaJIT's warm-up time, but it may well move quite a bit closer over time. It'll be interesting to see.
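One simple way to see warm-up effects like this is to time each run separately instead of only reporting a total. A minimal harness sketch (the function names and workload are made up; under a JIT, the first runs come out slower than the later ones):

```python
import time

def bench(f, reps=5):
    """Time each run of f separately; with a JIT underneath, warm-up
    shows up as the first runs being slower than the later ones."""
    times = []
    for _ in range(reps):
        t0 = time.perf_counter()
        f()
        times.append(time.perf_counter() - t0)
    return times

runs = bench(lambda: sum(i * i for i in range(100_000)))
print(["%.3f ms" % (t * 1e3) for t in runs])
```

This also shows why short benchmarks favor fast-warming JITs like LuaJIT: if the program finishes in one "run", the first column is the whole story.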
Are you sure PyPy can overtake LuaJIT after warmup? I think maybe it can in a micro-benchmark, but I doubt that PyPy has any chance in a complex benchmark.
LuaJIT uses more advanced elimination of loads and other optimizations; also, Lua is easier to optimize in general.
I don't know the numbers, but I would be really astonished if that were true.
To me, the real deciding factor that tells me that LuaJIT will win in the long run is that LuaJIT traces the high-level semantics of Lua, whereas PyPy has to work with Python bytecode, which may have a considerable amount of lost information. There are some good examples in the lambda-the-ultimate thread (for example, optimizing based on the knowledge that HLOAD and ASTORE will never alias).
PyPy can't feasibly optimize the high-level Python AST, because the language is very large compared to Lua, and more or less behaviorally defined by Python's bytecode compiler.
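You can get a feel for how much structure the bytecode loses by looking at CPython's bytecode with the `dis` module (PyPy's bytecode is roughly CPython-compatible, so the picture is similar):

```python
import dis

def total(xs):
    acc = 0
    for x in xs:
        acc += x
    return acc

# The high-level "for" loop is flattened into GET_ITER/FOR_ITER plus
# jumps; the bytecode no longer carries the source-level facts (what is
# being iterated, what can alias what) that a tracer could exploit.
names = [ins.opname for ins in dis.get_instructions(total)]
print(names)
```

A tracer working on this stream has to rediscover the loop and re-infer the types, which is part of the information loss being described above.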
This is one of the interesting things about LuaJIT: it demonstrates the value of a small, well-designed language in a shockingly awesome way. The care that the Lua designers put into making Lua simple and regular is one of the reasons why a single programmer (a demigod, yes, but still — one demigod) is able to make so impressive an implementation.
> Are you sure PyPy can overtake LuaJIT after warmup? I think maybe it can in a micro
> benchmark, but I doubt that PyPy has any chance in a complex benchmark.
When comparing the performance of different languages, micro benchmarks are about all we have, for better or worse. The best we can do is to run a fair number of different such benchmarks and make the comparison over them. I made my statement on the basis of such a comparison.
Both LuaJIT and RPython / PyPy are very clever systems and there's a surprising amount of overlap between the way they do things. But every system has its own strengths and weaknesses, and it doesn't surprise me personally that neither one is a winner 100% of the time.
I don't think PyPy is faster. Their design is significantly more complicated and thus takes longer to tune. Additionally, I think Python is a bit more dynamic than Lua, which complicates things.
I remember seeing a comparison between LuaJIT and V8, both of which are highly tuned. LuaJIT won some of the benchmarks while V8 won others, and there was actually quite a bit of difference in performance. That's the problem with JITting: it's all tradeoffs. What's good for a JIT in a browser is different from what's good in a server environment. So you have to run a lot of benchmarks, testing widely different things, while at the same time not spending more time optimizing one benchmark implementation in one language than in another.
Also the addition of the FFI library. It totally changes the way you integrate with C code (for the better); it's the best FFI I've seen in any language.
I'd be surprised if it were easier than LLVM, which is pretty easy.
But if you want to make a language you should absolutely not get hung up on technology. Even though it's very dated, Crenshaw's "Let's Build a Compiler" can teach you everything you need to get started building languages from scratch and by hand, and it moves pretty fast too:
There are other tutorials out there which will get you places. The first thing to do, really, is make a sketch of the language as you'd like to use it, and get excited about that, and then sit down and implement bits of it at a time. Compiler/interpreter writing is extremely rewarding—you get a lot of payback for your investment. Good luck!
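As a concrete starting point, here is roughly what "implement bits of it at a time" looks like for the smallest possible language, a calculator, as a hand-written recursive-descent evaluator (in Python for brevity; everything here is illustrative, not from any particular tutorial):

```python
import re

# A tiny expression language: integers, + and *, parentheses.
# Recursive descent: one function per grammar rule.
TOKENS = re.compile(r"\s*(\d+|[()+*])")

def tokenize(src):
    return TOKENS.findall(src) + ["<eof>"]

def parse_expr(toks, i):             # expr := term ('+' term)*
    val, i = parse_term(toks, i)
    while toks[i] == "+":
        rhs, i = parse_term(toks, i + 1)
        val += rhs
    return val, i

def parse_term(toks, i):             # term := atom ('*' atom)*
    val, i = parse_atom(toks, i)
    while toks[i] == "*":
        rhs, i = parse_atom(toks, i + 1)
        val *= rhs
    return val, i

def parse_atom(toks, i):             # atom := number | '(' expr ')'
    if toks[i] == "(":
        val, i = parse_expr(toks, i + 1)
        assert toks[i] == ")", "expected )"
        return val, i + 1
    return int(toks[i]), i + 1

def evaluate(src):
    val, _ = parse_expr(tokenize(src), 0)
    return val

print(evaluate("2 + 3 * (4 + 1)"))  # 17
```

From here you grow the sketch a rule at a time: variables, then functions, then an AST instead of immediate evaluation, and eventually code generation — the same incremental path Crenshaw's tutorial takes.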
It's actually really easy to take the JIT-compilation engine from LuaJIT and use it as the backend for your own programming language. The LuaJIT code is surprisingly modular, and you can pretty much take what you need.
For scanner/parser generation Lemon and Ragel are terrific. That takes care of building the AST for you. LuaJIT does the heavy lifting and takes care of the architecture specific edge cases. That just leaves the fun part in the middle where you decide on the semantics and syntax of your language.
I got curious and looked into this. The relevant effort seems to be this, a new backend for the ClojureScript compiler which compiles to Lua instead of JavaScript:
Clojure's data structures do not require multiple threads. Clojure separates the shape of the data from the way the data is processed. All of Clojure's data structures are completely feasible in a single-threaded language.
I'd like to use Lua, partly to take advantage of LuaJIT and partly because it's an interesting language, but it drives me crazy that it lacks things like 'continue' and '+='. Maybe I should use MoonScript, but that completely discards Lua's aesthetics, which is hardly better.
[1] http://luajit.org/sponsors.html
[2] http://shootout.alioth.debian.org/