Extended Memory Interface (EMI)
Exceptions Interrupts (FIQ & IRQ)
https://people.kernel.org/linusw/how-the-arm32-kernel-starts
https://opensource.com/article/20/12/52-bit-arm64-kernel
http://research.tedneward.com/languages/assembly/arm/index/index.html

- Data accesses can be either little-endian or big-endian, as controlled by bit 9 (the E bit) of the Current Program Status Register (CPSR)
- ARM processors fetch two instructions ahead of the instruction currently being executed. During execution, PC holds the address of the current instruction plus 8 (two ARM instructions) in ARM state, and the address of the current instruction plus 4 (two Thumb instructions) in Thumb state. This differs from x86, where the instruction pointer always points to the next instruction to be executed
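
The two points above can be modelled with a few lines of Python. This is only a sketch of the arithmetic, not of real hardware; the helper names `word_bytes` and `pc_read_value` are made up for these notes.

```python
def word_bytes(value: int, big_endian: bool) -> bytes:
    """Return the 4 bytes of a 32-bit word in the order they sit in memory.

    `big_endian` stands in for the CPSR E bit (bit 9): set means big-endian data.
    """
    return value.to_bytes(4, "big" if big_endian else "little")


def pc_read_value(instr_addr: int, thumb: bool) -> int:
    """Value an instruction sees when it reads PC (R15).

    ARM state:   address of the current instruction + 8
    Thumb state: address of the current instruction + 4
    """
    return instr_addr + (4 if thumb else 8)


print(word_bytes(0x11223344, big_endian=False).hex())  # 44332211
print(word_bytes(0x11223344, big_endian=True).hex())   # 11223344
print(hex(pc_read_value(0x8000, thumb=False)))         # 0x8008
print(hex(pc_read_value(0x8000, thumb=True)))          # 0x8004
```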


ARM processors can operate in several different states. A processor that is executing ARM instructions is operating in ARM state. A processor that is executing Thumb instructions is operating in Thumb state. A processor that is executing ThumbEE instructions is operating in ThumbEE state. A processor can also operate in a fourth state called the Jazelle® state. These are called instruction set states.
The assembler cannot directly assemble code for the Jazelle state. A processor in one instruction set state cannot execute instructions from another instruction set.
The processor tracks Thumb state with the T bit of the CPSR. An interworking branch such as BX or BLX uses the least-significant bit of the target address to select the state: an odd target address (bit 0 set) switches to Thumb state, an even one switches to ARM state. This bit is ignored for the purpose of instruction fetching, so you can switch between ARM and Thumb state by setting or clearing it in the branch target.
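
A minimal sketch of how the low bit of a branch target selects the state, assuming a BX/BLX-style interworking branch. The function name `interworking_branch` is hypothetical and exists only for illustration.

```python
def interworking_branch(target: int) -> tuple:
    """Model state selection from a branch target address.

    Bit 0 of the target picks the state (1 = Thumb, 0 = ARM) and is
    cleared before the address is used for instruction fetching.
    Returns (fetch_address, thumb_state).
    """
    thumb = bool(target & 1)
    fetch_address = target & ~1
    return fetch_address, thumb


addr, thumb = interworking_branch(0x8001)
print(hex(addr), thumb)   # 0x8000 True

addr, thumb = interworking_branch(0x9000)
print(hex(addr), thumb)   # 0x9000 False
```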
ARM:
The ARM instruction set is a RISC set of 32-bit instructions providing a comprehensive range of operations. RISC designs aim to execute most instructions in a single clock cycle, which often results in faster execution than on a CISC processor with the same clock speed.
THUMB and ThumbEE:
Whereas ARM instructions are 32 bits wide, ARMv4T and later define a 16-bit instruction set called Thumb. Most of the functionality of the 32-bit ARM instruction set is available, but some operations require more instructions. The Thumb instruction set provides better code density, at the expense of performance.
ARMv7 defines the Thumb Execution Environment (ThumbEE). The ThumbEE instruction set is based on Thumb, with some changes and additions to make it a better target for dynamically generated code, that is, code compiled on the device either shortly before or during execution.
The difference between two equivalent instructions (ARM and Thumb) lies in how the instructions are fetched and interpreted prior to execution, not in how they function. Because the expansion from 16-bit to 32-bit instructions is done by dedicated hardware within the chip, the expansion itself does not slow execution, while the narrower 16-bit instructions take less memory.
When operating in the 16-bit Thumb state, the application sees a more restricted register set: most instructions can only access R0-R7, and the high registers R8-R12 are reachable from only a few instructions.
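
To make the size difference and the register restriction concrete, here is a rough sketch that encodes the same `ADDS Rd, Rn, Rm` operation both ways. The bit layouts are written from my understanding of the classic ARM and 16-bit Thumb encodings, so treat them as an assumption to check against the architecture manual; `thumb_adds` and `arm_adds` are made-up helper names.

```python
def thumb_adds(rd: int, rn: int, rm: int) -> int:
    """Encode 16-bit Thumb `ADDS Rd, Rn, Rm` (register fields are 3 bits wide)."""
    for r in (rd, rn, rm):
        if not 0 <= r <= 7:
            raise ValueError("this 16-bit encoding can only name R0-R7")
    return 0x1800 | (rm << 6) | (rn << 3) | rd


def arm_adds(rd: int, rn: int, rm: int) -> int:
    """Encode 32-bit ARM `ADDS Rd, Rn, Rm`, condition AL (register fields are 4 bits wide)."""
    for r in (rd, rn, rm):
        if not 0 <= r <= 15:
            raise ValueError("register must be R0-R15")
    return 0xE0900000 | (rn << 16) | (rd << 12) | rm


print(hex(thumb_adds(0, 1, 2)))  # 0x1888      (2 bytes)
print(hex(arm_adds(0, 1, 2)))    # 0xe0910002  (4 bytes)
```

The 3-bit register fields are the reason the high registers are out of reach for most 16-bit Thumb instructions: there is simply no room in the encoding to name them.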
**JAZELLE (not used anyway, android rekt it)**:
Jazelle is a third execution state that allows some ARM processors to execute Java bytecode in hardware. It “provides architectural support for hardware acceleration of bytecode execution by a Java Virtual Machine (JVM)”.
The Jazelle extension uses low-level binary translation, implemented as an extra stage between the fetch and decode stages in the processor instruction pipeline. Recognised bytecodes are converted into a string of one or more native ARM instructions.
Bytecodes are decoded by the hardware in two stages (versus a single stage for Thumb and ARM code) and switching between hardware and software decoding (Jazelle mode and ARM mode) takes ~4 clock cycles.
Java bytecode is indicated as the current instruction set by a combination of two bits in the ARM CPSR (Current Program Status Register). The “T”-bit must be cleared and the “J”-bit set. The BXJ instruction attempts to switch to Jazelle state, and if allowed and successful, sets the “J” bit in the CPSR; otherwise, it “falls through” and acts as a standard BX (Branch and Exchange) instruction. The Java program counter (PC) pointing to the next instruction must be placed in the Link Register (R14) before executing the BXJ branch request, as regardless of hardware or software processing, the system must know where to begin decoding.
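
The J/T bit combination can be decoded with a small lookup. This sketch assumes the ARMv7 CPSR layout (T at bit 5, J at bit 24); the constant and function names are invented for these notes.

```python
CPSR_T_BIT = 1 << 5    # Thumb state bit (assumed position: bit 5)
CPSR_J_BIT = 1 << 24   # Jazelle state bit (assumed position: bit 24)


def instruction_set_state(cpsr: int) -> str:
    """Map the J and T bits of a CPSR value to an instruction set state name."""
    j = bool(cpsr & CPSR_J_BIT)
    t = bool(cpsr & CPSR_T_BIT)
    return {
        (False, False): "ARM",
        (False, True): "Thumb",
        (True, False): "Jazelle",
        (True, True): "ThumbEE",
    }[(j, t)]


print(instruction_set_state(0x00000010))               # ARM
print(instruction_set_state(0x00000010 | CPSR_T_BIT))  # Thumb
print(instruction_set_state(0x00000010 | CPSR_J_BIT))  # Jazelle
```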

ARM uses a load-store model for memory access, which means that only load/store instructions (such as LDR and STR) can access memory. While on x86 most instructions are allowed to operate directly on data in memory, on ARM data must be moved from memory into registers before being operated on.
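
As a toy illustration of the load-store model, the sketch below adds a value to a word in memory using separate load, add, and store steps. The dict-based memory and the `add_to_memory` helper are purely hypothetical; the comments show the ARM instruction each step stands for.

```python
def add_to_memory(memory: dict, addr: int, value: int) -> None:
    """Add `value` to the 32-bit word at `addr`, load-store style."""
    regs = {"r0": 0, "r1": value}
    regs["r0"] = memory[addr]                            # LDR r0, [addr]
    regs["r0"] = (regs["r0"] + regs["r1"]) & 0xFFFFFFFF  # ADD r0, r0, r1
    memory[addr] = regs["r0"]                            # STR r0, [addr]


mem = {0x1000: 40}
add_to_memory(mem, 0x1000, 2)
print(mem[0x1000])  # 42
```

On x86 the same update could be a single instruction with a memory operand, for example `add dword ptr [addr], 2`.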

https://www.youtube.com/watch?v=ViNnfoE56V8
https://azeria-labs.com/writing-arm-assembly-part-1/
https://www.youtube.com/watch?v=gfmRrPjnEw4