https://www.slideshare.net/tklengyel/hacktivity-2016-stealthy-hypervisor-based-malware-analysis
https://github.com/sergej-proskurin/xen
https://github.com/sergej-proskurin/xen/blob/arm-altp2m-v5/xen/arch/arm/altp2m.c
https://github.com/sergej-proskurin/drakvuf/tree/master
https://github.com/drakvuf-on-arm
https://www.youtube.com/watch?v=EZPXy314q3E
(…) while Intel as well as the AArch64 execution state of ARMv8 CPUs allow hiding memory artifacts by marking memory pages as execute-only, the second-level translation tables of both the AArch32 execution state of ARMv8 and the ARMv7 architecture prohibit execute-only memory and thus impede stealthy VMI.
Both AArch32 and AArch64 support up to 16 configurable breakpoints. Every breakpoint can be set by means of the Breakpoint Control Register DBGBCR in conjunction with one of the Breakpoint Value Registers DBGBVR.
Likewise, ARM supports up to 16 watchpoints, which function in a similar way. In the simplest case, a configured breakpoint or watchpoint holds an address and generates an associated debug event whenever the instruction at that address is fetched for execution (breakpoint) or the data at that address is accessed (watchpoint).
On top of that, ARM features software breakpoint instructions. The CPU generates a debug event whenever it executes one of these instructions.
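The number of breakpoints and watchpoints a given CPU actually implements can be read from a debug feature register. The following is a minimal sketch, assuming the AArch64 register ID_AA64DFR0_EL1, which is not named in the text above. Its BRPs field (bits [15:12]) and WRPs field (bits [23:20]) each encode the count minus one. The helper names are hypothetical, and the register value is passed in as a plain integer rather than read with an MRS instruction.

```c
#include <stdint.h>

/* Field positions in ID_AA64DFR0_EL1; each field stores "count minus 1". */
#define DFR0_BRPS_SHIFT 12u
#define DFR0_WRPS_SHIFT 20u
#define DFR0_FIELD_MASK 0xFull

/* Number of implemented hardware breakpoints (1..16). */
unsigned num_breakpoints(uint64_t id_aa64dfr0)
{
    return (unsigned)((id_aa64dfr0 >> DFR0_BRPS_SHIFT) & DFR0_FIELD_MASK) + 1u;
}

/* Number of implemented hardware watchpoints (1..16). */
unsigned num_watchpoints(uint64_t id_aa64dfr0)
{
    return (unsigned)((id_aa64dfr0 >> DFR0_WRPS_SHIFT) & DFR0_FIELD_MASK) + 1u;
}
```

Because a 4-bit field stores the count minus one, the largest encodable value corresponds to the architectural maximum of 16.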
To single-step past a hit breakpoint, a monitor can configure DBGBCR to mismatch the breakpoint address held in the associated DBGBVR register: since the addresses of the instructions following the hit breakpoint do not match the address in DBGBVR, the CPU generates a debug event on execution of every following instruction.
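The mismatch technique can be illustrated with a simplified software model of a single breakpoint comparator. This is a sketch under stated assumptions: the field layout follows the AArch32 DBGBCR encoding (E in bit 0, PMC in bits [2:1], BAS in bits [8:5], BT in bits [23:20], with BT = 0b0100 selecting an unlinked address mismatch), the function names are hypothetical, and privilege filtering, linking, and byte-address selection are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

#define DBGBCR_E                (1u << 0)    /* breakpoint enable */
#define DBGBCR_PMC_PL0_PL1      (0x3u << 1)  /* match at PL0 and PL1 */
#define DBGBCR_BAS_WORD         (0xFu << 5)  /* compare the whole word */
#define DBGBCR_BT_SHIFT         20u
#define DBGBCR_BT_ADDR_MATCH    0x0u         /* unlinked address match */
#define DBGBCR_BT_ADDR_MISMATCH 0x4u         /* unlinked address mismatch */

/* Build a control value for an enabled breakpoint of the given type. */
uint32_t dbgbcr_encode(unsigned bt)
{
    return DBGBCR_E | DBGBCR_PMC_PL0_PL1 | DBGBCR_BAS_WORD |
           ((uint32_t)(bt & 0xFu) << DBGBCR_BT_SHIFT);
}

/* Simplified comparator: does executing the instruction at pc raise a
 * debug event for this control/value register pair? */
bool breakpoint_fires(uint32_t dbgbcr, uint32_t dbgbvr, uint32_t pc)
{
    if (!(dbgbcr & DBGBCR_E))
        return false;

    bool equal = (pc & ~0x3u) == (dbgbvr & ~0x3u);
    bool mismatch = ((dbgbcr >> DBGBCR_BT_SHIFT) & DBGBCR_BT_ADDR_MISMATCH) != 0;

    /* A mismatch breakpoint fires on every address except the one in DBGBVR. */
    return mismatch ? !equal : equal;
}
```

With DBGBVR left at the address of the breakpoint that was just hit, switching the type to mismatch lets that one instruction execute and traps on the next one, which is the single-step behavior described above.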
Alternatively, AArch64 can generate Software Step exceptions when the SS bit is set in both the Monitor Debug System Control Register MDSCR_EL1 and the Saved Program Status Register SPSR of the target exception level.
For instance, to single-step past a hit breakpoint in EL1, the monitor must set the MDSCR_EL1.SS and SPSR_EL1.SS bits. On return to the trapped instruction, the SPSR is written to the process state PSTATE in EL1. Consequently, the CPU executes the next instruction and then generates a Software Step exception.
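The two bits involved can be sketched on a software model of the relevant registers. This is an illustration only, assuming the architectural bit positions MDSCR_EL1.SS = bit 0 and SPSR_ELx.SS = bit 21. The struct and function names are hypothetical, and real code would access the system registers with MSR/MRS at a sufficiently privileged exception level instead of touching plain variables.

```c
#include <stdbool.h>
#include <stdint.h>

#define MDSCR_EL1_SS (1ull << 0)   /* software step enable */
#define SPSR_ELX_SS  (1ull << 21)  /* software step bit restored into PSTATE */

/* Plain-variable stand-ins for the two system registers. */
struct el1_debug_state {
    uint64_t mdscr_el1;
    uint64_t spsr_el1;
};

/* Arm a single step: both bits must be set before the exception return. */
void arm_software_step(struct el1_debug_state *s)
{
    s->mdscr_el1 |= MDSCR_EL1_SS;
    s->spsr_el1  |= SPSR_ELX_SS;
}

/* Disarm stepping again, leaving all other bits untouched. */
void disarm_software_step(struct el1_debug_state *s)
{
    s->mdscr_el1 &= ~MDSCR_EL1_SS;
    s->spsr_el1  &= ~SPSR_ELX_SS;
}

bool software_step_armed(const struct el1_debug_state *s)
{
    return (s->mdscr_el1 & MDSCR_EL1_SS) && (s->spsr_el1 & SPSR_ELX_SS);
}
```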
To prevent disclosure of the analysis system, the VMM can intercept (and emulate) guest accesses to debug registers and hence conceal, e.g., breakpoints set by the VMM. Yet, we highlight that adversaries can use the finite number of breakpoint and watchpoint registers as a side channel to reveal the analysis framework. Also, in-guest debugging cannot be perfectly emulated.
Software Step exceptions on AArch64 are also visible to guest VMs. The VMM can intercept accesses to MDSCR_EL1 and hide the SS bit. Also, ARM forbids direct access to the PSTATE.SS bit in all exception levels, complicating the discovery of analysis systems.
Still, an adversary with control over the guest’s exception handlers in EL1 can reveal the analysis by provoking an exception in EL1 that is also taken to EL1: on entry to the exception handler, the PSTATE holding the set SS bit is written to SPSR_EL1, which in turn is accessible to the guest. We have validated this behavior as part of our evaluation (Section 4.4). Since accesses to SPSR_EL1 cannot be intercepted, the VMM would need to trap and emulate every instruction in the exception handlers to cloak the analysis. This, however, is not a viable alternative, as the overhead of handling exceptions would rise enormously. As such, we need stealthy single-stepping alternatives, upon which we place great emphasis in this paper.
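The leak described above can be modeled in a few lines. The sketch below is a deliberately reduced model and not the evaluation code from Section 4.4: exception entry is represented solely by copying PSTATE into SPSR_EL1, the SS bit is assumed to sit at bit 21, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSTATE_SS (1ull << 21)  /* software step bit as saved in SPSR_EL1 */

/* Reduced CPU model: only the state relevant to the leak. */
struct cpu_model {
    uint64_t pstate;
    uint64_t spsr_el1;
};

/* An exception taken from EL1 to EL1 saves the interrupted PSTATE. */
void take_exception_to_el1(struct cpu_model *cpu)
{
    cpu->spsr_el1 = cpu->pstate;
}

/* Check a guest-controlled EL1 handler could perform on SPSR_EL1. */
bool handler_detects_single_step(const struct cpu_model *cpu)
{
    return (cpu->spsr_el1 & PSTATE_SS) != 0;
}
```

In this model, the handler observes the bit exactly when the interrupted context was being single-stepped, which is why hiding MDSCR_EL1.SS alone does not suffice.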
Stealthy monitoring requires:
- (R1) a mechanism to intercept the guest in EL1 (guest kernel),
- (R2) a single-stepping mechanism that cannot be discovered by in-guest artifacts, and
- (R3) execute-only memory.
Sadly, ARM does not support stealthy single-stepping (Section 2). On AArch32, the finite number of hardware breakpoints helps the attacker infer the presence of an analysis framework; on AArch64, the attacker can spill the set PSTATE.SS bit into the SPSR_EL1 register to uncover the analysis framework. Besides, AArch32 lacks execute-only memory (Section 2). As such, neither architecture meets (R2), and AArch32 additionally fails to comply with (R3).
To tackle these shortcomings, we present VMI primitives that enable monitoring of VMs without relying on the intended hardware mechanisms. Instead, we employ virtualization extensions to intercept the guest kernel at arbitrary locations (also known as tap points [23]) and leverage such tap points to single-step trapped instructions in a stealthy way. Next, we extend Xen to control the guest’s physical memory, which provides the foundation for stealthy monitoring on ARM. Finally, we combine these primitives to lay the groundwork for stealthy VMI on ARM. Figure 2 illustrates the overall architecture of our system, whose components are described in this section. In the following, we present primitives that, when combined, form stealthy monitoring systems on both AArch64 and AArch32.

We implement altp2m for ARM on top of the Xen p2m subsystem. In Xen, p2m stands for physical-to-machine; the subsystem leverages SLAT to manage and isolate memory between guest domains (Xen’s term for VMs) and the VMM. On ARM, the Virtualization Translation Table Base Register (VTTBR) holds the base address of the SLAT tables, similar to the EPTP on Intel. Xen p2m maintains only one VTTBR per domain. As such, the p2m subsystem maintains a single view of the guest’s physical memory, even for VMs with multiple vCPUs.
Xen altp2m on ARM makes it possible to dynamically define and switch among different VTTBRs per domain and vCPU. The interaction with the altp2m interface takes place through dedicated hypercalls, called HVMOPs. These enable the privileged domain Dom0 to create, switch, and destroy individual memory views that are then applied to unprivileged domains DomU (Figure 2). In addition, the altp2m interface makes it possible to define the memory access permissions of individual guest physical page frames per view and to remap individual guest frames to different machine frames.
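The operations listed above (create, switch, and destroy views, set per-view permissions, remap frames) can be illustrated with a toy model. This is a sketch under stated assumptions and not the Xen implementation: each view is a flat array instead of SLAT tables, all type and function names are hypothetical, and the functions stand in for what the HVMOP hypercalls would request from the hypervisor.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_VIEWS 4u
#define NUM_GFNS  8u

enum { ACC_R = 1, ACC_W = 2, ACC_X = 4 };

struct p2m_entry { uint64_t mfn; uint8_t access; };
struct p2m_view  { bool valid; struct p2m_entry e[NUM_GFNS]; };
struct domain_model { struct p2m_view views[MAX_VIEWS]; };
struct vcpu_model   { struct domain_model *d; unsigned view; };

/* View 0 plays the role of the default p2m: identity-mapped, full access. */
void domain_init(struct domain_model *d)
{
    memset(d, 0, sizeof(*d));
    d->views[0].valid = true;
    for (unsigned g = 0; g < NUM_GFNS; g++) {
        d->views[0].e[g].mfn = g;
        d->views[0].e[g].access = ACC_R | ACC_W | ACC_X;
    }
}

/* Create a new view as a copy of view 0; returns its index or -1. */
int view_create(struct domain_model *d)
{
    for (unsigned i = 1; i < MAX_VIEWS; i++) {
        if (!d->views[i].valid) {
            d->views[i] = d->views[0];
            return (int)i;
        }
    }
    return -1;
}

int view_destroy(struct domain_model *d, unsigned idx)
{
    if (idx == 0 || idx >= MAX_VIEWS || !d->views[idx].valid)
        return -1;
    d->views[idx].valid = false;
    return 0;
}

static struct p2m_entry *lookup(struct domain_model *d, unsigned idx, unsigned gfn)
{
    if (idx >= MAX_VIEWS || gfn >= NUM_GFNS || !d->views[idx].valid)
        return NULL;
    return &d->views[idx].e[gfn];
}

/* Restrict or widen the permissions of one guest frame in one view. */
int view_set_access(struct domain_model *d, unsigned idx, unsigned gfn, uint8_t access)
{
    struct p2m_entry *e = lookup(d, idx, gfn);
    if (!e)
        return -1;
    e->access = access;
    return 0;
}

/* Point one guest frame at a different machine frame in one view. */
int view_remap(struct domain_model *d, unsigned idx, unsigned gfn, uint64_t mfn)
{
    struct p2m_entry *e = lookup(d, idx, gfn);
    if (!e)
        return -1;
    e->mfn = mfn;
    return 0;
}

/* Per-vCPU switch; models loading a different VTTBR for this vCPU. */
int vcpu_switch_view(struct vcpu_model *v, unsigned idx)
{
    if (idx >= MAX_VIEWS || !v->d->views[idx].valid)
        return -1;
    v->view = idx;
    return 0;
}

/* Translate gfn in the vCPU's current view; fails on a permission fault. */
bool vcpu_translate(struct vcpu_model *v, unsigned gfn, uint8_t need, uint64_t *mfn)
{
    struct p2m_entry *e = lookup(v->d, v->view, gfn);
    if (!e || (e->access & need) != need)
        return false;
    *mfn = e->mfn;
    return true;
}
```

In this model, a monitor could keep an unmodified view for ordinary execution and a second view in which selected frames are execute-only or remapped, switching a vCPU between them as needed.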