# Instruction encoding

Detailed bytecode structure for all instructions (currently - from **Cairo**). Proving of program considerations.

In our processor architecture instructions have variable length: they may take one or two words. Two-word instructions are called immediate and can be distinguished by the value of bits 50-52 of the first word being `000` - it means that next word within instruction will be used as immediate value. One-word instructions are called relative in our terminology. Instruction size will be referred later as `insn_size` and is equal to 1 for one-word instructions and 2 for two-word instructions.

**DST\_OUT** is a 1-bit field, determining if res should be written to `*dst`

* `0`: res would not be written
* `1`: res would be written, `*dst <- res`

**AP\_MODE** is a 2-bit field, representing ruleset for updating AP register during instruction execution:

* `00`: Same, `AP <- AP`.
* `01`: Increment AP, `AP <- AP + 1`.&#x20;
* `10`: Increment AP by res, `AP <- AP + res`.

**PC\_MODE** is a 3-bit field, representing ruleset for updating PC register during instruction execution:

* `000`: PC moving to the next instruction, `PC <- PC + insn_size`.
* `001`: Absolute jump, `PC <- res`.
* `010`: Relative jump, `PC <- PC + res`.
* `100`: Conditional relative jump, `PC <- PC + insn_size` if `*dst` == 0, `PC <- PC + *op1` otherwise

**OPCODE** is a 4-bit field, representing primary operation of instruction and, therefore, main constraint it generates:

* `0000`: ADD, `res = *op0 + *op1`. ADD and SUB instructions will use this opcode, as constraint is working both ways.
* `0001`: MUL, `res = *op0 * *op1`. MUL and DIV instructions will use this opcode, as constraint is working both ways.
* `0010`: CALL, `*ap = SP; *(AP + 1) = PC; AP <- AP + 2; SP <- AP + 2`. CALL, CALL ABS, CALL REL instructions will use this opcode, each with corresponding **PC\_MODE**
* `0100`: RET, `SP <- *dst`
* `1000`: MOV, `res = *op`

**OP0\_MODE** is a 1-bit field, representing addressing mode for op0:

* `0`: use SP as the offset base.
* `1`: use AP as the offset base.

**OP1\_MODE** is a 3-bit field, representing addressing mode for op1:

* `000`: use PC as the offset base. **This instruction is immediate and therefore is 2 words long**.
* `001`: use SP as the offset base.
* `010`: use AP as the offset base.
* `100`: user \*op0 as the offset base.

**DST\_MODE** is a 1-bit field, representing addressing mode for dst:

* `0`: use SP as the offset base.
* `1`: use AP as the offset base.

**DUMMY\_BIT** is a 1-bit field that shows if the current instruction is a dummy instruction.

* `0`: Normal instruction.
* `1`: Dummy instruction. For dummy instructions all bits except LSB and **DUMMY\_BIT** should be zero. Normally, compiler should not explicitly use dummy instructions, which are needed for public memory mechanism and trace padding.

**DST** is a 16-bit field, containing biased-representation of a memory address offset for the destination operand. Address meaning depends on the value of **DST\_MODE**, but technically any value is permitted here and should not cause Undefined Behavior.

**OP1** is a 16-bit field, containing biased-representation of a memory address offset for the second operand. Address meaning depends on the value of **OP1\_MODE**, but technically any value is permitted here and should not cause Undefined Behavior.

**OP0** is a 16-bit field, containing biased-representation of a memory address offset for the first operand. Address meaning depends on the value of **OP0\_MODE**, but technically any value is permitted here and should not cause Undefined Behavior.

Any field value not listed before this point should cause Undefined Behavior.

The first word of every instruction have following field structure with LE encoding (i.e. bits 0-15 means the 16 LSB of the word):

* Bits 00-15: **OP0**
* Bits 16-31: **OP1**
* Bits 32-47: **DST**
* Bits 48-48: **DUMMY\_BIT**
* Bits 49-49: **DST\_MODE**
* Bits 50-52: **OP1\_MODE**
* Bits 53-53: **OP0\_MODE**
* Bits 54-57: **OPCODE**
* Bits 58-60: **PC\_MODE**
* Bits 61-62: **AP\_MODE**
* Bits 63-63: **DST\_OUT**

Note that DUMMY\_BIT is placed in a very particular position to make instruction always fit the modulus of Goldilock field.

We will need two dummy instructions: Dummy Access Instruction and Dummy Padding Instruction. Neither of these instructions are included in the bytecode. These instructions don’t have to be “unpacked”, as they are not a part of the program itself. They are added by prover and more like auxiliary rows added to the trace.

Dummy Access Instruction is a one-word instruction having only **DUMMY\_BIT** and LSB set as 1, other fields must be 0. It does nothing and enforces no state transitions other than:

* addresses in PC, OP0, OP1, and DST are zero.
* values at PC, OP0, OP1, and DST are zero.
* if the previous instruction is dummy, `res <- res`, else `res <- pc`.&#x20;

This allows us to add dummy `(0, 0)`  access for the public memory trace (i.e. `(addr, value)` pairs to be shared with verifier) to the un-sorted memory trace while still including the actual `(addr, value)` pairs in sorted memory trace.

Dummy Padding Instruction is a one-word instruction having only DUMMY\_BIT set as 1, other fields must be 0. It does absolutely nothing and enforces that all registers values stay exactly the same as after the previous instruction. It is used to pad the trace so the number of rows is a power of two.
