HashLink bytecode (and hlboot.dat)

Note: Based on the documentation at the excellent hlbc wiki, and improved with information from the development of crashlink.

When compiling Haxe code for HashLink, the compiler generates a single binary file, usually with the .hl extension. If the app is distributed with a HashLink VM, the bytecode file is named hlboot.dat and is found automatically by the main VM executable.

The binary contains everything needed to run: data, type definitions and code. HashLink generates native code for every function on startup (effectively an ahead-of-time JIT, if such a term exists: everything is compiled on startup, not lazily as on the JVM).

The bytecode is entirely typed and contains a definition for every possible type in the program (including primitives and function types). The code is register based: each function declares the typed registers that its instructions use.

There are constant pools for signed integers, double precision floating point numbers, strings and bytes.

The bytecode also optionally contains debug info like source file names, lines and variable assignments.

Because most of the values in the structures are of variable size, it is impossible to know the position of anything without parsing the whole file beforehand.

Structures

var is a variable size integer.

Main structure

| size (bytes/struct) | name | description |
|---|---|---|
| 3 | magic | "HLB" |
| 1 | version | |
| var | flags | has debug info |
| var | nints | |
| var | nfloats | |
| var | nstrings | |
| var | nbytes | if version >= 5 |
| var | ntypes | |
| var | nglobals | |
| var | nnatives | |
| var | nfunctions | |
| var | nconstants | if version >= 4 |
| var | entrypoint | findex |
| 4 * nints | ints | i32 constant pool |
| 8 * nfloats | floats | f64 constant pool |
| strings | strings | string constant pool |
| 4 | bytes_size | amount to read for bytes data |
| bytes_size | bytes_data | byte strings data |
| var * nbytes | bytes_pos | pos of byte strings in bytes_data |
| 4 | ndebugfiles | if has debug info, number of debug files entries |
| strings | debugfiles | if has debug info |
| ntypes * type | types | types definitions |
| var * nglobals | globals | types of each globals |
| nnatives * native | natives | Native functions to be loaded from external libraries |
| nfunctions * function | functions | Function definitions |
| nconstants * constant | constants | Constant definitions |
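
The only fixed-size fields at the start of the file are the magic and the version, so they can be checked before any variable size integer is decoded. The following is a minimal sketch in Python; the function name `read_fixed_header` is hypothetical, and the input is a synthetic 4-byte prefix rather than a complete file.

```python
def read_fixed_header(data: bytes) -> int:
    """Check the 3-byte magic and return the 1-byte version that follows it."""
    if len(data) < 4:
        raise ValueError("too short to be a HashLink bytecode file")
    if data[:3] != b"HLB":
        raise ValueError("bad magic, expected b'HLB'")
    return data[3]


# Usage with a synthetic prefix (not a complete bytecode file).
version = read_fixed_header(b"HLB\x05")
has_bytes_pool = version >= 5   # nbytes is only present from version 5
has_constants = version >= 4    # nconstants is only present from version 4
```

Everything after these four bytes depends on decoding var fields in order, which is why the version checks are computed up front.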

Strings block

| size | name | description |
|---|---|---|
| 4 | strings_size | amount to read next |
| strings_size | strings_data | strings list, zero terminated strings |
| var * nstrings | strings_sizes | sizes of each string |
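
As an illustration of how the two parts of a strings block fit together, here is a small Python sketch that splits strings_data using already-decoded sizes. It assumes each size excludes the terminating zero byte and that strings are UTF-8; the helper name `split_strings` is hypothetical.

```python
def split_strings(strings_data: bytes, sizes: list[int]) -> list[str]:
    """Cut strings_data into strings, using one decoded size per string."""
    out = []
    pos = 0
    for size in sizes:
        end = pos + size
        # Assumption: the size does not count the zero terminator.
        if end >= len(strings_data) or strings_data[end] != 0:
            raise ValueError("string is not zero terminated where expected")
        out.append(strings_data[pos:end].decode("utf-8"))
        pos = end + 1  # skip the terminator
    return out


# Synthetic block holding two strings.
names = split_strings(b"Main\x00hello\x00", [4, 5])
```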

Types

| size | name | description |
|---|---|---|
| 1 | kind | type kind |
| ? | definition | empty or some data based on the type kind |

Type kinds

| kind | name | description |
|---|---|---|
| 0 | void | no data |
| 1 | u8 | no data |
| 2 | u16 | no data |
| 3 | i32 | no data |
| 4 | i64 | no data |
| 5 | f32 | no data |
| 6 | f64 | no data |
| 7 | bool | no data |
| 8 | bytes | no data |
| 9 | Dyn | no data, dynamic type, can be anything |
| 10 | Fun | function type / signature |
| 11 | Obj | object type (haxe class) |
| 12 | Array | no data, arrays are dynamic |
| 13 | Type | no data, the Type type, for reflection |
| 14 | Ref | reference |
| 15 | Virtual | like an anonymous class |
| 16 | DynObj | no data, like Dyn but it is an object |
| 17 | Abstract | abstract class |
| 18 | Enum | enum with its variants |
| 19 | Null | Can possibly be null type, types aren't nullable by default |
| 20 | Method | like Fun, but different |
| 21 | Struct | like Obj, but different |
| 22 | Packed | Packs the inner structure |
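
A parser needs to know, from the kind byte alone, whether a definition follows. The table above could be mirrored in Python roughly as follows; the names `TypeKind`, `WRAPPER_KINDS`, `STRUCTURED_KINDS` and `has_definition` are hypothetical, and the grouping simply restates which kinds are marked "no data".

```python
from enum import IntEnum


class TypeKind(IntEnum):
    VOID = 0
    U8 = 1
    U16 = 2
    I32 = 3
    I64 = 4
    F32 = 5
    F64 = 6
    BOOL = 7
    BYTES = 8
    DYN = 9
    FUN = 10
    OBJ = 11
    ARRAY = 12
    TYPE = 13
    REF = 14
    VIRTUAL = 15
    DYNOBJ = 16
    ABSTRACT = 17
    ENUM = 18
    NULL = 19
    METHOD = 20
    STRUCT = 21
    PACKED = 22


# Kinds whose definition is a single inner type reference.
WRAPPER_KINDS = {TypeKind.REF, TypeKind.NULL, TypeKind.PACKED}
# Kinds whose definition is a larger structure.
STRUCTURED_KINDS = {
    TypeKind.FUN, TypeKind.OBJ, TypeKind.VIRTUAL, TypeKind.ABSTRACT,
    TypeKind.ENUM, TypeKind.METHOD, TypeKind.STRUCT,
}


def has_definition(kind: int) -> bool:
    """True when data follows the kind byte for this type kind."""
    k = TypeKind(kind)  # raises ValueError for an unknown kind
    return k in WRAPPER_KINDS or k in STRUCTURED_KINDS
```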

Fun

| size | name | description |
|---|---|---|
| var | nargs | |
| var * nargs | args | type references of the function args |
| var | ret | return type |

Function names

As you can see, a function has no name. Its name can be inferred from where it is used, though: functions are bound as class methods (protos) or to class fields (bindings), so you can name them from those once the whole file is parsed.

Obj

| size | name | description |
|---|---|---|
| var | name | string ref |
| var | super | type ref, supertype (can be < 0, -> no supertype) |
| var | global | global ref, global representing the static fields |
| var | nfields | number of fields |
| var | nprotos | number of methods |
| var | nbindings | number of field bindings |
| nfields * field | fields | class fields |
| nprotos * proto | protos | class methods |
| nbindings * binding | bindings | bindings between fields and functions (usually static functions) |

Field

Field definition for an Obj or a Virtual

| size | name | description |
|---|---|---|
| var | name | string ref, name |
| var | type | type ref of the field |

Field references

Field references are non-negative numbers, only valid within a given type hierarchy. Each one is the field's index in an array of all the fields in the hierarchy, in order. Example:

class A {
    var a: Int;
}

class B extends A {
    var b: Int;
}

In this example, the index of the field b is 1, while the index of a is 0.

A consequence of this system is that you need to traverse the entire type hierarchy to gather all fields to know their indexes.
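
The traversal can be sketched in Python as below. The dictionary layout (`super`, `fields`) and the function name `field_indexes` are hypothetical stand-ins for whatever a real parser produces; the ordering (supertype fields first) follows the example above.

```python
def field_indexes(classes: dict, name: str) -> list[str]:
    """Return the field names of class `name` in index order."""
    # Walk up the hierarchy, from the class to its root supertype.
    chain = []
    current = name
    while current is not None:
        chain.append(current)
        current = classes[current]["super"]
    # Fields of the root supertype come first, the class's own fields last.
    fields = []
    for cls in reversed(chain):
        fields.extend(classes[cls]["fields"])
    return fields


classes = {
    "A": {"super": None, "fields": ["a"]},
    "B": {"super": "A", "fields": ["b"]},
}
order = field_indexes(classes, "B")
```

Here `order.index("b")` gives the field reference for b as seen from B.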

Proto

A class instance method.

| size | name | description |
|---|---|---|
| var | name | string ref, name |
| var | findex | global function index |
| var | pindex | ? |

Binding

Binds a field to a function, usually for static functions or constructors.

| size | name | description |
|---|---|---|
| var | field | field ref |
| var | findex | global function index |

See field references.
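
Since functions carry no name of their own, a tool can derive display names from protos and bindings after parsing. The sketch below assumes a hypothetical in-memory layout in which each object already has its name resolved and an `all_fields` list holding the field names of its whole hierarchy in index order; none of these key names come from the format itself.

```python
def name_functions(objs: list[dict]) -> dict[int, str]:
    """Map findex -> "Class.member" using protos first, then bindings."""
    names = {}
    for obj in objs:
        for proto in obj["protos"]:
            names[proto["findex"]] = f'{obj["name"]}.{proto["name"]}'
        for binding in obj["bindings"]:
            # binding["field"] is a field reference into the hierarchy's fields.
            field_name = obj["all_fields"][binding["field"]]
            names.setdefault(binding["findex"], f'{obj["name"]}.{field_name}')
    return names


# Synthetic example: one method and one bound static function.
objs = [{
    "name": "Main",
    "all_fields": ["main"],
    "protos": [{"name": "update", "findex": 12}],
    "bindings": [{"field": 0, "findex": 7}],
}]
names = name_functions(objs)
```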

Ref

A reference type: Ref<T>. Same structure as the other wrapper types.

| size | name | description |
|---|---|---|
| var | type | type behind the reference |

Virtual

An anonymous data class with fields.

| size | name | description |
|---|---|---|
| var | nfields | number of fields |
| nfields * field | fields | the fields |

Abstract

| size | name | description |
|---|---|---|
| var | name | string ref |

Enum

| size | name | description |
|---|---|---|
| var | name | string ref |
| var | global | |
| var | nconstructs | number of constructs (variants) |

Construct

| size | name | description |
|---|---|---|
| var | name | string ref |
| var | nparams | number of parameters |
| var * nparams | params | type ref, parameters type |

Null

Same structure as other wrapper types.

| size | name | description |
|---|---|---|
| var | type | inner type |

Method

Same structure as Fun.

Struct

Same structure as Obj.

Packed

Same structure as other wrapper types.

| size | name | description |
|---|---|---|
| var | type | inner type |

Natives

| size | name | description |
|---|---|---|
| var | lib | string ref, lib name |
| var | name | string ref, function name |
| var | type | type ref, function type |
| var | findex | global function index |

lib is the name of the external library to load. The exception is std, which points directly to the standard library built into HashLink. If lib contains a ?, the library is optional and may be absent.

Functions

| size | name | description |
|---|---|---|
| var | type | type ref, function type |
| var | findex | global function index |
| var | nregs | number of registers |
| var | nops | number of instructions |
| var * nregs | regs | registers types |
| nops * opcode | ops | instructions |
| ? * nops | debuginfo | if has debug info, complicated encoding for file/line info for each instruction |
| var | nassigns | if has debug info && version >= 3 |
| 2 * var * nassigns | assigns | tuples (variable name ref, opcode number) |

Opcodes

An opcode consists of a variable size integer for the opcode index and a piece of data for each argument, determined by the argument type. Here is a list of all opcodes:

Basic Operations:

  • Mov: Copy value from src into dst register (dst = src)
  • Int: Load i32 constant from pool into dst (dst = @ptr)
  • Float: Load f64 constant from pool into dst (dst = @ptr)
  • Bool: Set boolean value in dst (dst = true/false)
  • Bytes: Load byte array from constant pool into dst (dst = @ptr)
  • String: Load string from constant pool into dst (dst = @ptr)
  • Null: Set dst register to null (dst = null)

Arithmetic:

  • Add: Add two numbers (dst = a + b)
  • Sub: Subtract two numbers (dst = a - b)
  • Mul: Multiply two numbers (dst = a * b)
  • SDiv: Signed division (dst = a / b)
  • UDiv: Unsigned division (dst = a / b)
  • SMod: Signed modulo (dst = a % b)
  • UMod: Unsigned modulo (dst = a % b)

Bitwise:

  • Shl: Left shift (dst = a << b)
  • SShr: Signed right shift (dst = a >> b)
  • UShr: Unsigned right shift (dst = a >>> b)
  • And: Bitwise AND (dst = a & b)
  • Or: Bitwise OR (dst = a | b)
  • Xor: Bitwise XOR (dst = a ^ b)
  • Neg: Negate value (dst = -src)
  • Not: Boolean NOT (dst = !src)

Increment/Decrement:

  • Incr: Increment value (dst++)
  • Decr: Decrement value (dst--)

Function Calls:

  • Call0: Call function with no args (dst = fun())
  • Call1: Call function with 1 arg (dst = fun(arg0))
  • Call2: Call function with 2 args (dst = fun(arg0, arg1))
  • Call3: Call function with 3 args (dst = fun(arg0, arg1, arg2))
  • Call4: Call function with 4 args (dst = fun(arg0, arg1, arg2, arg3))
  • CallN: Call function with N args (dst = fun(args...))
  • CallMethod: Call method with N args (dst = obj.field(args...))
  • CallThis: Call this method with N args (dst = this.field(args...))
  • CallClosure: Call closure with N args (dst = fun(args...))

Closures:

  • StaticClosure: Create closure from function (dst = fun)
  • InstanceClosure: Create closure from object method (dst = obj.fun)
  • VirtualClosure: Create closure from object field (dst = obj.field)

Global Variables:

  • GetGlobal: Get global value (dst = @global)
  • SetGlobal: Set global value (@global = src)

Fields:

  • Field: Get object field (dst = obj.field)
  • SetField: Set object field (obj.field = src)
  • GetThis: Get this field (dst = this.field)
  • SetThis: Set this field (this.field = src)
  • DynGet: Get dynamic field (dst = obj[field])
  • DynSet: Set dynamic field (obj[field] = src)

Control Flow:

  • JTrue: Jump if true (if cond jump by offset)
  • JFalse: Jump if false (if !cond jump by offset)
  • JNull: Jump if null (if reg == null jump by offset)
  • JNotNull: Jump if not null (if reg != null jump by offset)
  • JSLt/JSGte/JSGt/JSLte: Signed comparison jumps
  • JULt/JUGte: Unsigned comparison jumps
  • JNotLt/JNotGte: Negated comparison jumps
  • JEq: Jump if equal (if a == b jump by offset)
  • JNotEq: Jump if not equal (if a != b jump by offset)
  • JAlways: Unconditional jump
  • Label: Target for backward jumps (loops)
  • Switch: Multi-way branch based on integer value

Type Conversions:

  • ToDyn: Convert to dynamic type (dst = (dyn)src)
  • ToSFloat: Convert a signed integer to float (dst = (float)src)
  • ToUFloat: Convert an unsigned integer to float (dst = (float)src)
  • ToInt: Convert to int (dst = (int)src)
  • SafeCast: Safe type cast with runtime check
  • UnsafeCast: Unchecked type cast
  • ToVirtual: Convert to virtual type

Exception Handling:

  • Ret: Return from function (return ret)
  • Throw: Throw exception
  • Rethrow: Rethrow exception
  • Trap: Setup try-catch block
  • EndTrap: End try-catch block
  • NullCheck: Throw if null (if reg == null throw)

Memory Operations:

  • GetI8: Read i8 from bytes (dst = bytes[index])
  • GetI16: Read i16 from bytes (dst = bytes[index])
  • GetMem: Read from memory (dst = bytes[index])
  • GetArray: Get array element (dst = array[index])
  • SetI8: Write i8 to bytes (bytes[index] = src)
  • SetI16: Write i16 to bytes (bytes[index] = src)
  • SetMem: Write to memory (bytes[index] = src)
  • SetArray: Set array element (array[index] = src)

Objects:

  • New: Allocate new object (dst = new typeof(dst))
  • ArraySize: Get array length (dst = len(array))
  • Type: Get type object (dst = type ty)
  • GetType: Get value's type (dst = typeof src)
  • GetTID: Get type ID (dst = typeof src)

References:

  • Ref: Create reference (dst = &src)
  • Unref: Read reference (dst = *src)
  • Setref: Write reference (*dst = src)
  • RefData: Get reference data
  • RefOffset: Get reference with offset

Enums:

  • MakeEnum: Create enum variant (dst = construct(args...))
  • EnumAlloc: Create enum with defaults (dst = construct())
  • EnumIndex: Get enum tag (dst = variant of value)
  • EnumField: Get enum field (dst = (value as construct).field)
  • SetEnumField: Set enum field (value.field = src)

Other:

  • Assert: Debug break
  • Nop: No operation
  • Prefetch: CPU memory prefetch hint
  • Asm: Inline x86 assembly

Opcodes can have one of the following argument types:

  • Reg: Register index (VarInt)
  • Regs: Register indexes (VarInt for count, followed by count VarInts)
  • RefInt: i32 constant index (VarInt)
  • RefFloat: f64 constant index (VarInt)
  • InlineBool: Boolean value - 0 or 1 (VarInt)
  • RefBytes: Byte array constant index (VarInt)
  • RefString: String constant index (VarInt)
  • RefFun: Function index (VarInt)
  • RefField: Field index (VarInt)
  • RefGlobal: Global index (VarInt)
  • JumpOffset: Jump offset (VarInt)
  • JumpOffsets: Jump offsets (VarInt for count, followed by count VarInts)
  • RefType: Type index (VarInt)
  • RefEnumConstant: Enum constant index (VarInt)
  • RefEnumConstruct: Enum construct index (VarInt)
  • InlineInt: Inline integer value (VarInt)

Jump offsets

Jump offsets count instructions, not bytes. They may be negative for a backward jump; in that case they always target a Label instruction.
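
A small Python sketch of resolving a jump follows. It assumes the offset is counted from the instruction after the jump, which is not stated above and should be checked against a real VM or disassembler; the names `jump_target` and `check_jump` are hypothetical.

```python
def jump_target(op_index: int, offset: int) -> int:
    """Index of the instruction that a jump at op_index lands on."""
    # Assumption: offsets are relative to the instruction after the jump.
    return op_index + 1 + offset


def check_jump(ops: list[str], op_index: int, offset: int) -> int:
    """Resolve a jump and verify that a backward jump lands on a Label."""
    target = jump_target(op_index, offset)
    if not 0 <= target < len(ops):
        raise ValueError("jump leaves the function")
    if offset < 0 and ops[target] != "Label":
        raise ValueError("backward jump does not target a Label")
    return target


# Synthetic loop: the JSLt at index 2 jumps back to the Label at index 0.
ops = ["Label", "Incr", "JSLt", "Ret"]
loop_start = check_jump(ops, 2, -3)
```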

Constants

Constants are used to initialize globals without code.

| size | name | description |
|---|---|---|
| var | global | global to initialize |
| var | nfields | number of fields |
| var * nfields | fields | field initializers |

About bytecode data types

Variable sized integers

Since much of the linking in the bytecode happens with integer indexes, most of the integers are encoded in a variable byte size format spanning 1, 2 or 4 bytes (var). This was probably done to save space. You can find an example on how to decode one here and to encode one here.
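
For reference, a decoder and encoder could look like the Python sketch below. It assumes the bit layout used by the HashLink reference reader as I understand it: a clear top bit means a 1-byte value of 7 bits, the next bit selects 2 or 4 bytes, the third bit is a sign flag, and the remaining bits are a big-endian magnitude. The names `read_var` and `write_var` are hypothetical.

```python
def read_var(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one var at pos; return (value, position after it)."""
    b = data[pos]
    if (b & 0x80) == 0:
        return b & 0x7F, pos + 1                      # 1 byte, 0..127
    if (b & 0x40) == 0:
        value = ((b & 0x1F) << 8) | data[pos + 1]     # 2 bytes, 13-bit magnitude
        return (-value if b & 0x20 else value), pos + 2
    value = (((b & 0x1F) << 24) | (data[pos + 1] << 16)
             | (data[pos + 2] << 8) | data[pos + 3])  # 4 bytes, 29-bit magnitude
    return (-value if b & 0x20 else value), pos + 4


def write_var(value: int) -> bytes:
    """Encode value using the shortest form that can hold it."""
    sign = 0x20 if value < 0 else 0
    mag = abs(value)
    if value >= 0 and mag < 0x80:
        return bytes([mag])
    if mag < 0x2000:
        return bytes([0x80 | sign | (mag >> 8), mag & 0xFF])
    if mag >= 0x20000000:
        raise ValueError("value does not fit in 29 bits")
    return bytes([0xC0 | sign | (mag >> 24), (mag >> 16) & 0xFF,
                  (mag >> 8) & 0xFF, mag & 0xFF])
```

Note that negative values never use the 1-byte form in this sketch, since that form has no sign bit.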

Function indexes

A function reference is a function index (findex) but unlike other indexes it does not reference a function in the pool directly, as natives and functions share the findex space. We need to parse the whole file to map findexes to regular function or native indexes.
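
One way to build that mapping once both pools are parsed is sketched below in Python. The dictionary layout and the name `build_findex_map` are hypothetical; the only property relied on is that each function and native entry carries its own findex.

```python
def build_findex_map(functions: list[dict], natives: list[dict]) -> dict:
    """Map each findex to ("function", pool index) or ("native", pool index)."""
    table = {}
    for i, fn in enumerate(functions):
        table[fn["findex"]] = ("function", i)
    for i, nat in enumerate(natives):
        if nat["findex"] in table:
            raise ValueError(f"findex {nat['findex']} is used twice")
        table[nat["findex"]] = ("native", i)
    return table


# Synthetic pools: one native and two functions sharing the findex space.
functions = [{"findex": 1}, {"findex": 2}]
natives = [{"findex": 0}]
table = build_findex_map(functions, natives)
```

Resolving a RefFun argument or a proto's findex is then a single lookup in `table`.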

See also