Show HN: Lockstep – A data-oriented programming language (github.com/seanwevans)
8 points by goosethe 1 day ago | 4 comments
https://github.com/seanwevans/lockstep

I want to share Lockstep, my work-in-progress systems language, with its v0.1.0 release. It is a data-oriented systems programming language designed for high-throughput, deterministic compute pipelines.

I built Lockstep to bridge the gap between the productivity of C and the execution efficiency of GPU compute shaders. Instead of traditional control flow, Lockstep enforces straight-line SIMD execution. You will not find any if, for, or while statements inside compute kernels; branching is entirely replaced by hardware-native masking and stream-splitting.
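To make the masking idea concrete, here is a scalar C sketch (my illustration, not Lockstep syntax) of how a branchy per-element computation becomes a mask-and-blend, which is what every SIMD lane executes regardless of the condition:

```c
#include <stdint.h>

/* Branchy form:    out[i] = x[i] > 0.0f ? x[i] : 0.0f;
 * Branchless form: build an all-ones/all-zeros mask and AND it in.
 * The outer loop stands in for the host driving the pipeline; the
 * loop body itself is the straight-line "kernel". */
static void relu_branchless(const float *x, float *out, int n) {
    for (int i = 0; i < n; i++) {
        uint32_t mask = -(uint32_t)(x[i] > 0.0f); /* 0x0 or 0xFFFFFFFF */
        union { float f; uint32_t u; } v = { .f = x[i] };
        v.u &= mask;   /* keep x[i] or zero it, no jump taken */
        out[i] = v.f;
    }
}
```

Every element pays the same fixed cost, which is the trade masking makes: no mispredicted branches, but no skipped work either.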

Memory is handled via a static arena provided by the Host. There is no malloc, no hidden threads, and no garbage collection, which guarantees predictable performance and eliminates race conditions by construction.

Under the hood, Lockstep targets LLVM IR directly to leverage industrial-grade optimization passes. It also generates a C-compatible header for easy integration with host applications written in C, C++, Rust, or Zig.

v0.1.0 includes a compiler with LLVM IR and C header emission, a CLI simulator for validating pipeline wiring and cardinality on small datasets, and an opt-in LSP server for real-time editor diagnostics, hover type info, and autocompletion.

You can check out the repository to see the syntax, and the roadmap outlines where the project is heading next, including parameterized SIMD widths and multi-stage pipeline composition.

I would love to hear feedback on the language semantics, the type system, and the overall architecture!




For similar languages / compilers in this area:

1. ISPC [1], the Intel® Implicit SPMD Program Compiler also compiles SIMD programs with branches and other control flow efficiently using predication/masking etc.

2. Futhark [2] compiles nice-looking functional programs into efficient parallel GPU/CPU code.

[1]: https://github.com/ispc/ispc

[2]: https://futhark-lang.org/


It's a bit philosophically different from ISPC. When SIMD lanes diverge, the ISPC compiler implicitly handles the execution masks and lane disabling behind the scenes.

We take a more draconian approach. We completely ban if and else inside compute kernels. If you want conditional logic, you must explicitly use branchless intrinsics like mix, step, or select. The goal is to make the cost of divergence mathematically explicit to the programmer rather than hiding it in the compiler. If a pathway is truly divergent, you handle it at the pipeline level using a filter node to split the stream.

We also ban arbitrary pointers entirely. All memory is handled via a Host-Owned Static Arena, and structs are automatically decomposed into Struct-of-Arrays layouts. Because the compiler controls the exact byte offset and knows there are no arbitrary pointers, it can aggressively decorate every LLVM IR pointer with noalias.
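For readers unfamiliar with those intrinsics, here is a scalar C model using the GLSL-style definitions of step and mix (assumed semantics; check the repo for Lockstep's exact signatures). On real hardware, the comparison lowers to a compare mask, not a branch:

```c
/* Scalar models of the branchless intrinsics (GLSL-style, assumed). */
static float ls_step(float edge, float x) { return (x < edge) ? 0.0f : 1.0f; }
static float ls_mix(float a, float b, float t) { return a + t * (b - a); }

/* The branchy "y = (x >= threshold) ? hi : lo;" becomes: */
static float gate(float x, float threshold, float lo, float hi) {
    return ls_mix(lo, hi, ls_step(threshold, x));
}
```

Because both arms of the conditional are always evaluated, the divergence cost is visible right in the source, which is the point the comment above is making.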


What about real performance? Does parallelization via SIMD instructions work well? What about supporting older CPUs with fewer SIMD instructions?

Does it have some loophole to allow loops? Or it just allows linear execution?

Can it read or write external memory (allocated within other language like C++)?


LLVM is able to auto-vectorize the generated IR extremely well. There are no branches to mispredict, so, theoretically, it just blasts through the data.

Since it emits standard LLVM IR, LLVM handles the actual instruction set targeting. Right now in v0.1.0, the compiler hardcodes a SIMD width of 8 (assuming AVX2). However, parameterized SIMD widths are already on the roadmap for v0.4.0. Once that is added, you will be able to pass a --target-width flag to compile down to narrower vector units (like SSE on older CPUs) or up to AVX-512 and ARM NEON.

There are strictly no loopholes for loops inside the compute kernels. Inside a shader block, execution is 100% linear. However, the host application calling the pipeline effectively acts as the loop over the data elements. To help, we allow linear accumulators: you consume these with a fold operation, which the compiler lowers into a lock-free parallel reduction tree rather than a traditional for loop.
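A minimal C sketch of that lowering (my illustration, not Lockstep's actual codegen): a pairwise reduction tree runs in log2(n) levels, and within each level every addition touches disjoint slots, so the adds are independent and vectorize cleanly:

```c
/* Sum via a pairwise reduction tree. For this sketch, n must be a
 * power of two and buf is reduced in place. Each level halves the
 * active range; adds within a level have no dependence on each other. */
static float fold_sum_tree(float *buf, int n) {
    for (int stride = n / 2; stride > 0; stride /= 2)
        for (int i = 0; i < stride; i++)
            buf[i] += buf[i + stride];  /* independent across i */
    return buf[0];
}
```

A sequential fold carries a dependence from one iteration to the next; the tree form trades that chain for log2(n) fully parallel levels.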

The memory model is a host-owned static arena where your host application allocates a flat, contiguous block of memory and passes that pointer to Lockstep_BindMemory(ptr). Lockstep does all its reads and writes exclusively within that allocated buffer. Because it doesn't have arbitrary pointers, it can't reach outside that arena, which is exactly how we mathematically guarantee the noalias pointer optimizations in LLVM.
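Host-side, the handshake looks roughly like this in C. Lockstep_BindMemory is the call named above; here it is stubbed with a mock recorder so the sketch is self-contained, and the real declaration would come from the generated C header:

```c
#include <stddef.h>
#include <stdlib.h>

/* Mock of the runtime binding call; the real Lockstep_BindMemory is
 * declared in the compiler-generated C header. */
static void *g_arena = NULL;
static void Lockstep_BindMemory(void *ptr) { g_arena = ptr; }

/* Host side: allocate one flat, contiguous block and hand it to the
 * pipeline. All kernel reads and writes stay inside this buffer. */
static void *host_setup(size_t bytes) {
    void *arena = malloc(bytes);
    if (arena)
        Lockstep_BindMemory(arena);
    return arena;
}
```

The host owns the allocation and its lifetime; the kernels only ever see offsets into the bound block.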



