Kataan

A high-performance JavaScript (ECMAScript) engine written in pure Rust, with no foreign code on the critical path. Kataan is usable three ways — as a Rust library, as a C library, and as a standalone command-line tool — the same tri-modal model proven out in the sibling projects purecrypto (cryptography) and rsurl (HTTP/curl).

Status: running and broadly conformant; advanced tiers in active build-out. The lexer and the full ECMAScript parser are complete, and two execution engines run real programs and are checked to agree on every test:

  • a tree-walking interpreter (the default / corpus engine), and
  • a register bytecode VM (the primary path for kataan run and the C ABI), compiling nearly all of the common language directly — every operator, objects/arrays, method calls with call/apply/bind, new/new.target, all loops + for-of/for-in/switch/try-catch-finally, closures (incl. mutual recursion), destructuring, rest/spread, classes with extends/super and getters/setters, generators (incl. yield* and .throw()), and async/await — falling back to the tree-walker for the handful of constructs it doesn't yet compile.

A dual-path Test262-style conformance corpus (520/520) passes on both engines, covering closures, classes/inheritance (incl. extends of native errors), optional chaining, the iterator protocol, Map/Set/WeakMap, Symbol (incl. Symbol.hasInstance), BigInt, Promise + async/await, Proxy/Reflect (incl. the ownKeys trap driving Object.keys/values/entries/for-in), typed arrays, Date, an in-house RegExp, and a large standard library (Math, JSON, Object/Array/String/Number). Compiled bytecode can be serialized, reloaded, and run without the source.

Three advanced tiers are real and tested, though each has named work remaining:

  • a machine-code JIT (x86-64 / Linux, behind jit) with an optimizing integer path (four-pass optimizer + register allocator) and a float path covering + - * / %, comparisons, control flow, and the SSE-expressible Math intrinsics (sqrt/abs/min/max/floor/ceil/trunc), emitting into W^X memory via raw syscalls; object/string ops stay interpreted;
  • a pure-Rust, no_std WebAssembly engine — full MVP plus sign-extension, saturating conversion, bulk-memory, multi-value, and typed structured control — with a JS↔WASM boundary (validate/compile/instantiate, the Module/Instance/Global/Memory objects, host-function imports, and stateful instances), driven by a .wast/WAT spec harness (a spec-derived corpus, not yet the full upstream suite);
  • a zero-copy "D′" snapshot tier atop the moving GC: a verified codec that mmap-reloads a heap (eleven reference cell kinds, cross-kind cycles, insertion-order-preserving) and runs a restored closure both in place and reloaded into a fresh runtime.

Kataan works as a CLI/REPL, a Rust library, and a C library (kt_eval). See the roadmap for the remaining road to a complete engine.

Why

Modern JavaScript engines (V8, JavaScriptCore, SpiderMonkey) all rely on the same handful of techniques. Kataan commits to the full set from the architecture stage rather than retrofitting them:

  • NaN-boxed values — every JS value in 64 bits, Copy, dense on the stack.
  • Hidden classes (shapes) + inline caches — property access becomes a slot load, not a hash probe; the single biggest lever for real-world JS speed.
  • Register-based bytecode VM — fewer instructions than a stack VM, and JIT-friendly by construction.
  • Interned atoms + rope strings — O(1) key comparison, non-quadratic string building.
  • A precise, generational, moving GC — bump allocation makes new nearly free.
  • Tiered execution — a fast interpreter first, then a baseline JIT, then an optimizing JIT driven by inline-cache type feedback.
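
As an illustration of the first item, here is a minimal NaN-boxing sketch in Rust. The bit layout, the tag values, and every name in it (Value, BOXED, TAG_INT, and so on) are assumptions chosen for this example; the README does not describe Kataan's actual encoding. Doubles are stored as their raw bits, and everything else lives in a NaN bit pattern that a canonicalized double can never take.

```rust
// Illustrative NaN-boxing sketch. The layout is an assumption made for this
// example and is not taken from Kataan's source.

/// A value packed into 64 bits: either a raw f64 or a tagged payload.
#[derive(Clone, Copy, Debug)]
pub struct Value(u64);

// Boxed values have the top 14 bits set (sign + exponent + two mantissa bits).
const BOXED: u64 = 0xFFFC_0000_0000_0000;
// Every NaN double is rewritten to this pattern, which is not in BOXED space.
const CANONICAL_NAN: u64 = 0x7FF8_0000_0000_0000;
const TAG_SHIFT: u32 = 48;
const PAYLOAD_MASK: u64 = (1 << TAG_SHIFT) - 1;
const TAG_INT: u64 = 0;
const TAG_BOOL: u64 = 1;
const TAG_UNDEFINED: u64 = 2;

impl Value {
    pub fn from_f64(x: f64) -> Self {
        // Canonicalize NaNs so no double can collide with a boxed pattern.
        if x.is_nan() { Value(CANONICAL_NAN) } else { Value(x.to_bits()) }
    }
    pub fn from_i32(x: i32) -> Self {
        Value(BOXED | (TAG_INT << TAG_SHIFT) | (x as u32 as u64))
    }
    pub fn from_bool(b: bool) -> Self {
        Value(BOXED | (TAG_BOOL << TAG_SHIFT) | b as u64)
    }
    pub fn undefined() -> Self {
        Value(BOXED | (TAG_UNDEFINED << TAG_SHIFT))
    }

    fn is_boxed(self) -> bool { self.0 & BOXED == BOXED }
    fn tag(self) -> u64 { (self.0 >> TAG_SHIFT) & 0b11 }

    pub fn as_f64(self) -> Option<f64> {
        if self.is_boxed() { None } else { Some(f64::from_bits(self.0)) }
    }
    pub fn as_i32(self) -> Option<i32> {
        if self.is_boxed() && self.tag() == TAG_INT {
            Some((self.0 & PAYLOAD_MASK) as u32 as i32)
        } else {
            None
        }
    }
    pub fn as_bool(self) -> Option<bool> {
        if self.is_boxed() && self.tag() == TAG_BOOL {
            Some(self.0 & 1 == 1)
        } else {
            None
        }
    }
    pub fn is_undefined(self) -> bool {
        self.is_boxed() && self.tag() == TAG_UNDEFINED
    }
}

fn main() {
    // The whole value fits in one machine word and is Copy.
    assert_eq!(std::mem::size_of::<Value>(), 8);
    let v = Value::from_i32(-7);
    let copy = v;
    println!("{:?} {:?}", v.as_i32(), copy.as_f64());
}
```

A production encoding would also reserve tags for heap pointers, null, and symbols; those are left out here to keep the sketch free of unsafe code.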

The language core is sans-I/O and no_std + alloc; the host runtime (event loop, timers, fetch, crypto, modules) is a separate layer on top, so the engine stays embeddable. See ROADMAP.md for the road ahead — the remaining work to a complete JS+WASM engine and the design invariants behind it.

Pure Rust, no foreign code

Kataan depends on no C libraries. Where it needs cryptography or networking it reuses sibling pure-Rust Karpelès Lab crates:

  • purecrypto: crypto.subtle / WebCrypto, crypto.getRandomValues, randomUUID, and TLS.
  • rsurl: HTTP/HTTPS transport behind fetch and the Node http(s) compatibility layer.

unsafe is quarantined: the crate sets the unsafe_code lint to "deny" (not "forbid"), so only the ffi module plus a small, audited set of VM hot-path primitives can opt back in, each with a scoped #[allow(unsafe_code)] and a safety comment.
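
The deny-versus-forbid distinction matters because Rust lets an inner #[allow] override an outer deny, but never an outer forbid. The sketch below shows the pattern on a single module so that it compiles as a standalone file. The module and function names (engine, hot_path, first_slot, sum) are hypothetical, and a real crate would set the lint once at the crate root or in its manifest rather than on a module.

```rust
// Sketch of the "deny, then opt back in" pattern. All names are hypothetical.
// The lint is applied to a module here so the example is self-contained; a
// crate-wide setting would behave the same way for everything below it.
#[deny(unsafe_code)]
mod engine {
    pub mod hot_path {
        /// Returns the first slot, skipping the bounds check on the read.
        // The scoped allow is what `deny` permits and `forbid` would reject.
        #[allow(unsafe_code)]
        pub fn first_slot(slots: &[u64]) -> Option<u64> {
            if slots.is_empty() {
                return None;
            }
            // SAFETY: the slice was just checked to be non-empty, so index 0
            // is in bounds.
            Some(unsafe { *slots.get_unchecked(0) })
        }
    }

    /// Ordinary code in the module stays safe; adding an `unsafe` block here
    /// would be a compile error because no allow covers it.
    pub fn sum(slots: &[u64]) -> u64 {
        slots.iter().sum()
    }
}

fn main() {
    let slots = [7, 8, 9];
    println!("{:?} {}", engine::hot_path::first_slot(&slots), engine::sum(&slots));
}
```

Keeping each allow on the smallest possible item, next to its SAFETY comment, makes the set of unsafe sites easy to enumerate during review.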

Try it

The CLI runs JavaScript today:

$ cargo run -- run -e '
class Animal { constructor(n){ this.n = n } speak(){ return `${this.n} makes a sound` } }
class Dog extends Animal { speak(){ return `${this.n} barks` } }
console.log(new Dog("Rex").speak());
console.log([1,2,3,4].filter(x => x % 2).map(x => x*x).reduce((a,b)=>a+b, 0));
console.log(JSON.stringify({ ok: true, items: [...new Set([1,1,2,3])] }));
'
Rex barks
10
{"ok":true,"items":[1,2,3]}

It also exposes each pipeline stage, and an interactive REPL:

$ cargo run -- lex    -e 'x => x * 2'  # token stream
$ cargo run -- parse  -e 'x => x * 2'  # AST dump
$ cargo run -- disasm -e '1 + 2 * 3'   # register bytecode
$ cargo run -- repl                    # interactive session
$ cargo run -- --help

The disasm command shows the register bytecode the compiler emits:

$ cargo run -- disasm -e 'let s = 0; let i = 0; while (i < 3) { s += i; i += 1; } s'
chunk #0 "<main>"  (regs=14, params=0)
     0  LoadInt     r0, 0
     ...
     6  Lt          r6, r4, r5
     7  JumpIfFalse r6, +9
     ...
    16  Jump        -13
    18  Return      r13
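
To make the shape of such a listing concrete, here is a toy register interpreter in Rust. It borrows the mnemonics LoadInt, Lt, JumpIfFalse, Jump, and Return from the output above and adds an assumed Add. The operand encoding, the rule that jump offsets are relative to the next instruction, and all names (Op, run, demo_program) are assumptions for illustration and do not reflect Kataan's real instruction format.

```rust
// Toy register VM. The instruction set is hypothetical and only loosely
// modeled on the mnemonics shown in the disassembly.
#[derive(Clone, Copy, Debug)]
enum Op {
    LoadInt(usize, i64),       // r[dst] = immediate
    Add(usize, usize, usize),  // r[dst] = r[a] + r[b]
    Lt(usize, usize, usize),   // r[dst] = 1 if r[a] < r[b], else 0
    JumpIfFalse(usize, isize), // if r[cond] == 0, pc += offset
    Jump(isize),               // pc += offset
    Return(usize),             // stop and yield r[src]
}

/// Executes `code` with `nregs` integer registers, all starting at zero.
fn run(code: &[Op], nregs: usize) -> i64 {
    let mut r = vec![0i64; nregs];
    let mut pc: usize = 0;
    loop {
        let op = code[pc];
        pc += 1; // assumed convention: offsets are relative to the next op
        match op {
            Op::LoadInt(d, imm) => r[d] = imm,
            Op::Add(d, a, b) => r[d] = r[a] + r[b],
            Op::Lt(d, a, b) => r[d] = (r[a] < r[b]) as i64,
            Op::JumpIfFalse(c, off) => {
                if r[c] == 0 {
                    pc = (pc as isize + off) as usize;
                }
            }
            Op::Jump(off) => pc = (pc as isize + off) as usize,
            Op::Return(s) => return r[s],
        }
    }
}

/// Hand-assembled equivalent of:
/// let s = 0; let i = 0; while (i < 3) { s += i; i += 1; } s
fn demo_program() -> Vec<Op> {
    vec![
        Op::LoadInt(0, 0),     // 0: s = 0
        Op::LoadInt(1, 0),     // 1: i = 0
        Op::LoadInt(2, 3),     // 2: limit = 3
        Op::LoadInt(3, 1),     // 3: step = 1
        Op::Lt(4, 1, 2),       // 4: r4 = i < limit
        Op::JumpIfFalse(4, 3), // 5: exit loop, to index 9
        Op::Add(0, 0, 1),      // 6: s += i
        Op::Add(1, 1, 3),      // 7: i += 1
        Op::Jump(-5),          // 8: back to index 4
        Op::Return(0),         // 9: result is s
    ]
}

fn main() {
    println!("{}", run(&demo_program(), 5));
}
```

Because every operand names a register directly, the loop body needs only two Add instructions, with no separate push or pop traffic as a stack machine would require.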

Use as a Rust library

use kataan::parser::Parser;
use kataan::interp::Interp;

let program = Parser::parse_program("const sq = x => x * x; sq(8)").unwrap();
let mut interp = Interp::new();
assert_eq!(interp.run(&program).unwrap().to_js_string(), "64");

The lower stages are available directly too:

use kataan::lexer::{Lexer, TokenKind};

let tokens = Lexer::new("let answer = 42;").tokenize().unwrap();
assert_eq!(tokens.first().unwrap().text("let answer = 42;"), "let");
assert_eq!(tokens.last().unwrap().kind, TokenKind::Eof);

Feature flags

Feature   Default   Description
std                 Standard library; implies alloc. Needed by the host runtime/CLI.
alloc               Heap-backed types; the minimum for the pure language core.
regex               In-house regular-expression engine.
intl                In-house Intl-lite (collation, number/date formatting).
module              ESM + CommonJS module loader.
host                Host runtime: event loop, timers, console, encoding, URL, streams.
fetch               fetch / Node http(s) over rsurl.
crypto              crypto.getRandomValues / WebCrypto over purecrypto.
jit                 Machine-code JIT (x86-64/Linux): optimizing integer + float paths.
ffi                 The C ABI (the only place broad unsafe is allowed).
cli                 The kataan command-line tool.

Build the bare no_std language core with:

cargo build --no-default-features --features alloc

Use as a C library

cargo rustc --lib --release --features ffi --crate-type staticlib   # libkataan.a
cargo rustc --lib --release --features ffi --crate-type cdylib      # libkataan.so

The header is include/kataan.h; a runnable example lives in tests/ffi_smoke.c. The C ABI follows the purecrypto conventions — KtStatus return codes, the in/out length convention, opaque handles, and a panic catch at every boundary.
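
The four conventions named above can be sketched from the Rust side as follows. This is not Kataan's API: DemoStatus, DemoEngine, and the demo_engine_* functions are hypothetical stand-ins, the status values are invented, and the symbol-export attribute a real C ABI needs is omitted so the example runs as an ordinary Rust program. See include/kataan.h for the actual declarations.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Status codes returned across the boundary (hypothetical values).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DemoStatus {
    Ok = 0,
    InvalidArgument = 1,
    BufferTooSmall = 2,
    Panic = 3,
}

/// Opaque handle: C callers only ever hold a pointer to this.
pub struct DemoEngine {
    last_result: String,
}

pub extern "C" fn demo_engine_new() -> *mut DemoEngine {
    Box::into_raw(Box::new(DemoEngine { last_result: String::from("64") }))
}

/// # Safety
/// `e` must be null or a pointer from `demo_engine_new` not yet freed.
pub unsafe extern "C" fn demo_engine_free(e: *mut DemoEngine) {
    if !e.is_null() {
        // SAFETY: per the contract above, `e` came from Box::into_raw.
        drop(unsafe { Box::from_raw(e) });
    }
}

/// In/out length: on entry `*len_io` is the capacity of `out`; on exit it is
/// the number of bytes the result needs, whether or not they were written.
///
/// # Safety
/// `e` and `len_io` must be null or valid; `out` must be null or point to at
/// least `*len_io` writable bytes.
pub unsafe extern "C" fn demo_engine_result(
    e: *const DemoEngine,
    out: *mut u8,
    len_io: *mut usize,
) -> DemoStatus {
    let body = || -> DemoStatus {
        // SAFETY: the caller guarantees both pointers are null or valid.
        let (engine, len) = match unsafe { (e.as_ref(), len_io.as_mut()) } {
            (Some(engine), Some(len)) => (engine, len),
            _ => return DemoStatus::InvalidArgument,
        };
        let bytes = engine.last_result.as_bytes();
        let capacity = *len;
        *len = bytes.len();
        if out.is_null() || capacity < bytes.len() {
            return DemoStatus::BufferTooSmall;
        }
        // SAFETY: `out` has room for `capacity >= bytes.len()` bytes.
        unsafe { std::ptr::copy_nonoverlapping(bytes.as_ptr(), out, bytes.len()) };
        DemoStatus::Ok
    };
    // Panic catch at the boundary: unwinding never crosses into C.
    catch_unwind(AssertUnwindSafe(body)).unwrap_or(DemoStatus::Panic)
}

fn main() {
    let e = demo_engine_new();
    let mut buf = [0u8; 8];
    let mut len = buf.len();
    let status = unsafe { demo_engine_result(e, buf.as_mut_ptr(), &mut len) };
    println!("{:?} {:?}", status, std::str::from_utf8(&buf[..len]));
    unsafe { demo_engine_free(e) };
}
```

Reporting the required size even on BufferTooSmall lets a caller retry once with a correctly sized buffer instead of guessing.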

License

MIT © 2026 Karpelès Lab Inc. See LICENSE.
