## Intel's x86 Instruction Encoding
Today I learned just how complex Intel's x86 instruction encoding format (IEF) is. Below I'll try to give a brief overview of what I picked up.
Instructions generally follow the layout shown below, but not strictly: some fields are optional, depending on which bits are set and which prefixes and opcodes are used.
```
                  opcode         Mod    REG     R/M     SS    Index   Base      Disp      Immediate
[ prefixes ][ _ _ _ _ _ _ _ _ ][ _ _ | _ _ _ | _ _ _ ][ _ _ | _ _ _ | _ _ _ ][ _ _ _ _ ][ _ _ _ _ _ ]
                          | |
                          d |
                            s
```
### Prefixes
Several kinds of prefixes can be combined with the different encoding schemes. In this TIL, I'll only be talking about the legacy prefixes and opcodes. Legacy prefixes modify the behavior of the instruction. There are 4 prefix groups, and an instruction can carry up to 4 prefixes, at most one from each group, for its behavior to be defined.
- LOCK, REPNE/REPNZ, REP or REPE/REPZ
- Segment override
- Operand-size override
- Address-size override
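
As a rough illustration of the four groups above, here is a minimal Python sketch that maps legacy prefix bytes to their group and peels them off the front of an instruction. The byte values are the standard legacy prefix encodings; the names `LEGACY_PREFIXES` and `split_prefixes` are made up for this example, and the sketch does not check that each group appears only once.

```python
# Legacy prefix bytes, grouped the same way as the list above.
LEGACY_PREFIXES = {
    0xF0: (1, "LOCK"),
    0xF2: (1, "REPNE/REPNZ"),
    0xF3: (1, "REP or REPE/REPZ"),
    0x2E: (2, "CS segment override"),
    0x36: (2, "SS segment override"),
    0x3E: (2, "DS segment override"),
    0x26: (2, "ES segment override"),
    0x64: (2, "FS segment override"),
    0x65: (2, "GS segment override"),
    0x66: (3, "operand-size override"),
    0x67: (4, "address-size override"),
}


def split_prefixes(code: bytes):
    """Return ([(group, name), ...] for the leading legacy prefixes, remaining bytes)."""
    found = []
    i = 0
    while i < len(code) and code[i] in LEGACY_PREFIXES:
        found.append(LEGACY_PREFIXES[code[i]])
        i += 1
    return found, code[i:]


# 66 01 38 is `add [rax], di` in 64-bit mode: the 0x66 prefix selects 16-bit operands.
prefixes, rest = split_prefixes(bytes([0x66, 0x01, 0x38]))
print(prefixes, rest.hex())
```
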
Another important prefix is the REX prefix, which is only available in 64-bit mode. Its `W` bit sets the operand size to 64 bits for instructions that do not already default to a 64-bit operand size. This only applies when the `s` bit in the opcode is set; when `s` = 0, the operand size stays 8 bits.
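
To make the REX prefix a bit more concrete, here is a small sketch that unpacks one. It relies on the REX byte having the bit layout `0100WRXB` (so it is always in the range `0x40` to `0x4F`) and assumes 64-bit mode, since those byte values mean something else in 32-bit code. The function name `decode_rex` is hypothetical.

```python
def decode_rex(byte: int):
    """Decode a REX prefix (bit layout 0100WRXB); return None if the byte is not one."""
    if byte & 0xF0 != 0x40:
        return None
    return {
        "W": (byte >> 3) & 1,  # 1 = 64-bit operand size
        "R": (byte >> 2) & 1,  # extra high bit for the ModR/M REG field
        "X": (byte >> 1) & 1,  # extra high bit for the SIB Index field
        "B": byte & 1,         # extra high bit for ModR/M R/M or SIB Base
    }


# 0x48 is the REX prefix in 48 01 38 (`add [rax], rdi`): only W is set.
print(decode_rex(0x48))
```
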
### Opcode
Opcodes are one, two, or three bytes long and encode the operation type, the operand size, and the direction of data transfer. The `d` bit specifies the direction of data transfer by swapping the source and destination operands: with `d` = 0 the REG field is the source and R/M is the destination, and with `d` = 1 it is the other way round.

- If `d` = 0: `add [rax], rdi`
- If `d` = 1: `add rdi, [rax]`

The `s` bit selects the operand size. If `s` = 0, the operands are 8 bits wide. If `s` = 1, the operands are 16, 32, or 64 bits wide: the default is 32 bits, the operand-size override prefix (`0x66`) selects 16 bits, and the REX prefix with its `W` bit set selects 64 bits.
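
Here is a minimal sketch of pulling the `d` and `s` bits out of an opcode byte, using the four register/memory forms of `add` (`0x00` to `0x03`) as the example. Not every opcode follows this layout, so treat `opcode_bits` (a made-up helper name) as an illustration for opcodes like `add`, not as a general decoder.

```python
def opcode_bits(opcode: int):
    """Return (d, s) for a one-byte opcode that follows the d/s layout."""
    d = (opcode >> 1) & 1  # direction: 0 = REG is the source, 1 = REG is the destination
    s = opcode & 1         # size: 0 = 8-bit operands, 1 = 16/32/64-bit operands
    return d, s


# 0x01 is the opcode in 48 01 38 (`add [rax], rdi`),
# 0x03 is the opcode in 48 03 38 (`add rdi, [rax]`).
for opcode in (0x00, 0x01, 0x02, 0x03):
    print(hex(opcode), opcode_bits(opcode))
```
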
### ModR/M
The Mod bits specify the addressing mode. There are 4 addressing [modes](https://wiki.osdev.org/X86-64_Instruction_Encoding#ModR/M).
The REG and R/M fields each specify a source or destination operand. REG always names a register, while R/M, combined with the Mod field, selects either a register, register-indirect addressing (with or without a displacement), or a SIB byte. To learn more about this, check [here](https://wiki.osdev.org/X86-64_Instruction_Encoding#Immediate_opcode_byte) and [here](http://www.c-jump.com/CIS77/CPU/x86/lecture.html#X77_0350_intel_manual_opcode_bytes_cont).
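
Splitting a ModR/M byte into its three fields is just shifting and masking. The sketch below assumes the usual 2/3/3 bit split (Mod in the top two bits, then REG, then R/M); `decode_modrm` and `MOD_MEANING` are names invented for this example, and the special cases (R/M = 100 meaning "a SIB byte follows", R/M = 101 with Mod = 00 meaning displacement-only) are only mentioned in the comments.

```python
# What the 2-bit Mod field means in general terms (special R/M values aside).
MOD_MEANING = {
    0b00: "register indirect, no displacement",
    0b01: "register indirect + 8-bit displacement",
    0b10: "register indirect + 32-bit displacement",
    0b11: "register direct",
}


def decode_modrm(byte: int):
    """Split a ModR/M byte into (mod, reg, rm)."""
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm


# 0x38 is the ModR/M byte in 48 01 38 (`add [rax], rdi`):
# mod = 00 (indirect), reg = 7 (rdi), rm = 0 (rax).
mod, reg, rm = decode_modrm(0x38)
print(mod, reg, rm, MOD_MEANING[mod])
```
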
### SIB
The Scale-Index-Base (SIB) byte specifies the scaled index addressing mode. In this mode, the index is scaled by a factor of 1, 2, 4, or 8 before it is added to the base, e.g.:
```
[ reg1 + reg*n] ; scaled index mode without disp
[ disp + reg1 + reg*n] ; scaled index mode with disp
```
- SS: a 2-bit field that encodes the scale factor (00 = 1, 01 = 2, 10 = 4, 11 = 8).
- Index: specifies the register to be scaled.
- Base: specifies the base address/register that the scaled index will be added to.
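
The three fields above can be sketched in Python as follows. The scale factor is `1 << SS`, which gives 1, 2, 4, or 8. The helper names `decode_sib` and `effective_address` are hypothetical, and the sketch ignores the special cases (Index = 100 meaning "no index", Base = 101 with Mod = 00 meaning "no base, 32-bit displacement") as well as the REX extension bits.

```python
def decode_sib(byte: int):
    """Split a SIB byte into (scale factor, index register number, base register number)."""
    ss = (byte >> 6) & 0b11
    index = (byte >> 3) & 0b111
    base = byte & 0b111
    return 1 << ss, index, base


def effective_address(base_value: int, index_value: int, scale: int, disp: int = 0) -> int:
    """Compute disp + base + index * scale, i.e. the address the SIB form refers to."""
    return disp + base_value + index_value * scale


# 0x88 = 10 001 000: scale 4, index 1 (rcx), base 0 (rax), i.e. [rax + rcx*4].
scale, index, base = decode_sib(0x88)
print(scale, index, base)
print(hex(effective_address(0x1000, 3, scale, disp=8)))
```
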
There's more to the SIB byte: it can also be used together with the REX prefix, whose `X` and `B` bits extend the Index and Base fields so they can reach registers `r8` to `r15`. I won't go into detail, as there are better sources out there that expand on this. Check the [sources section](#_sources_).
THE END...
THANKS FOR READING!
---
If you notice any errors or an incorrect statement, pls and pls do reach out to me @temitayo.v.ojo.gr@dartmouth.edu.
The ending might seem very abrupt; I just don't know how else to end. I decided not to include anything about displacement and immediate because they're pretty self-explanatory once you understand all the other fields. Also, I don't want to make this too lengthy. If you aren't satisfied with this overview, check the list of sources I've compiled below.
##### _sources_:
- Best [source](https://wiki.osdev.org/X86-64_Instruction_Encoding#ModR/M) so far with really good tables
- [Stack Overflow yappings](https://stackoverflow.com/questions/35379820/what-do-instruction-prefixes-mean-in-modern-x86)
- Intel's x86_32/64 [Instruction reference](http://ref.x86asm.net/coder64.html)
- Lecture on x86_32 [encoding](http://www.c-jump.com/CIS77/CPU/x86/lecture.html). Really good resource.
- [Short info](https://blog.kenanb.com/code/low-level/2024/01/06/encoding-diagram-attempt.html) but has a really good diagram that helps with visualizing how the fields connect.
- [List of more instruction encoding resources](https://blog.kenanb.com/code/low-level/2024/01/07/x86-insn-encoding-resources.html)
- Cool [disassembler](https://github.com/zyantific/zydis/tree/master) tool that visualizes how instructions are encoded
- [Cool Trick](https://github.com/Theldus/stelf). Using knowledge of instruction encoding, you can encode data in your instructions. cool!
- [Handcrafting ELF](https://medium.com/@dassomnath/handcrafting-x64-elf-from-specification-to-bytes-9986b342eb89). Doesn't really touch on instruction encoding, but writing your own ELF file is a good way to understand it.