Syscalls (bp injection)

------ DRAKVUF ------
 
syscalls::syscalls 
    -> create_trap_config()
    -> drakvuf_add_trap()
 
drakvuf_add_trap() -> inject_trap_sw()
 
inject_trap_sw() -> inject_trap_pa()
 
inject_trap_pa() -> vmi_write_sw_trap()
 
vmi_write_sw_trap() -> 
 
------ LibVMI ------
 
-> vmi_write_32_pa() -> vmi_write_pa()
 
vmi_write_pa() -> vmi_write()
 
vmi_write() -> driver_write()
 
driver_write() 
    -> driver.write_ptr = &xen_write 
            -> xen_write()
 
xen_write() -> xen_put_memory()
 
xen_put_memory()
 
    -> xen_get_memory_pfn()
            -> libxcw.xc_map_foreign_range()
 
    -> memcpy()
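The call chain above ends in a plain memcpy() into a guest page that Xen has mapped into Dom0. The following minimal C sketch simulates only that last step in user space. It assumes a static array stands in for the foreign mapping; `guest_mem`, `map_frame()`, `write_trap_pa()` and the 4 KiB page size are hypothetical, not LibVMI code. It splits the physical address into a frame number and an in-page offset, saves the original instruction so the trap can be removed later, and copies a 32-bit trap opcode in, matching the 32-bit write done by vmi_write_32_pa().

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT  12
#define PAGE_SIZE   (1u << PAGE_SHIFT)
#define GUEST_PAGES 4

/* Hypothetical stand-in for guest RAM; in LibVMI the page would come
 * from a foreign mapping of the guest frame. */
static uint8_t guest_mem[GUEST_PAGES * PAGE_SIZE];

/* Hypothetical "map" step: return a pointer to the start of frame pfn. */
static uint8_t *map_frame(uint64_t pfn)
{
    if (pfn >= GUEST_PAGES)
        return NULL;
    return guest_mem + pfn * PAGE_SIZE;
}

/* Write a 32-bit trap opcode at physical address pa and save the original
 * instruction in *backup. Returns 0 on success, -1 on failure. */
static int write_trap_pa(uint64_t pa, uint32_t trap_opcode, uint32_t *backup)
{
    uint64_t pfn = pa >> PAGE_SHIFT;
    uint64_t offset = pa & (PAGE_SIZE - 1);
    uint8_t *page;

    /* Refuse writes that would straddle a page boundary. */
    if (offset + sizeof(trap_opcode) > PAGE_SIZE)
        return -1;

    page = map_frame(pfn);
    if (page == NULL)
        return -1;

    memcpy(backup, page + offset, sizeof(*backup));           /* save original */
    memcpy(page + offset, &trap_opcode, sizeof(trap_opcode)); /* inject trap */
    return 0;
}
```

Keeping the backup next to the write is what later allows the original instruction to be put back when the trap is hit or removed.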

The syscalls plugin is responsible for tracking the execution of the entry points of the functions that handle system calls on Windows and Linux. The plugin accomplishes this by looping through the Rekall profile (legacy) and placing a BREAKPOINT trap on each function whose name starts with Nt on Windows or sys_ on Linux.

The recommended way to generate JSON profiles is now Volatility3/dwarf2json.
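The profile loop can be sketched as a prefix filter over symbol names. This is a simplified illustration, not the plugin's code: the `profile_symbol` struct, `collect_syscall_traps()` and the idea of adding each symbol's offset to a kernel base address are assumptions made for the example, and parsing the JSON profile itself is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of one profile entry: a symbol name and
 * its offset relative to the kernel base. */
struct profile_symbol {
    const char *name;
    uint64_t rva;
};

/* Return nonzero if name starts with prefix. */
static int has_prefix(const char *name, const char *prefix)
{
    return strncmp(name, prefix, strlen(prefix)) == 0;
}

/* Collect the addresses of all symbols whose name starts with prefix
 * ("Nt" on Windows, "sys_" on Linux) into out[], up to max entries.
 * Returns the number of addresses written. */
static size_t collect_syscall_traps(const struct profile_symbol *syms, size_t nsyms,
                                    const char *prefix, uint64_t kernel_base,
                                    uint64_t *out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < nsyms && n < max; i++) {
        if (has_prefix(syms[i].name, prefix))
            out[n++] = kernel_base + syms[i].rva; /* one trap per match */
    }
    return n;
}
```

Each collected address would then be handed to the trap-injection path shown in the call chain at the top of this note.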

[~ Legacy Rekall ~]

How we can achieve this on Intel x64:

Intel’s Extended Page Tables (EPTs) allow defining execute-only memory, which gives a VMM the ability to hide code instrumentation.

Yes, on ARM there is a similar way: it can be done by changing memory attributes.

But there are some caveats. We can use the “memory permissions switch” trick to intercept and do our magic, but this may cause considerable overhead, because a lot of uninteresting memory accesses will trigger an interception. And, in contrast to Intel and AArch64, SLAT on AArch32 does not support this functionality (and we have a lot of devices running AArch32).
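The difference between the platforms can be expressed as a small permission model. In this sketch, stage-2 permissions are a hypothetical R/W/X bitmask: `perms_supported()` rejects execute-only mappings (X without R) when the translation regime cannot express them, as stated above for AArch32, and `access_traps()` shows which accesses would hand control to the VMM, which is what the permission-switch trick relies on. The names are illustrative only.

```c
#include <stdbool.h>

/* Hypothetical permission bits for a second-stage (SLAT) mapping. */
enum { PERM_R = 1u << 0, PERM_W = 1u << 1, PERM_X = 1u << 2 };

enum access { ACCESS_READ, ACCESS_WRITE, ACCESS_FETCH };

/* Can this permission set be expressed? Execute-only (X without R) needs
 * hardware support (EPT, AArch64 stage 2); on AArch32 executable pages
 * must also be readable. */
static bool perms_supported(unsigned perms, bool supports_xo)
{
    bool execute_only = (perms & PERM_X) && !(perms & PERM_R);
    return supports_xo || !execute_only;
}

/* Does this access violate the mapping, i.e. trap to the VMM? */
static bool access_traps(unsigned perms, enum access a)
{
    switch (a) {
    case ACCESS_READ:  return !(perms & PERM_R);
    case ACCESS_WRITE: return !(perms & PERM_W);
    case ACCESS_FETCH: return !(perms & PERM_X);
    }
    return true;
}
```

Every access that `access_traps()` reports as trapping costs a VMM round trip, which is where the overhead of the switch trick comes from.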

Instead of changing permissions of a single memory view at run-time, altp2m allows allocating a set of views beforehand. This way, a monitor can individually assign a specific memory view to each vCPU of DomU. As such, on memory access violations, for instance, VMI tools can switch the view of the affected vCPU to a less restrictive view instead of explicitly relaxing permissions of the view that led to the trap; switching views is as simple as switching the domain’s VTTBR.

Advantages over the “memory permissions switch” trick:

Less overhead.
We do not relax permissions of other vCPUs and thus avoid race conditions.

On a potentially malicious write attempt, instead of relaxing permissions, the monitor switches the view of the trapped vCPU to the less restrictive original-view. This allows us to record the event, satisfy the write request, and keep the target application from becoming suspicious.

Xen altp2m on ARM allows dynamically defining and switching among different VTTBRs per domain and vCPU. The interaction with the altp2m interface takes place through dedicated hypercalls, called HVMOPs. These allow the privileged domain Dom0 to create, switch, and destroy individual memory views that are then applied to unprivileged domains DomU (Figure 2). In addition, the altp2m interface allows defining memory access permissions of individual guest physical page frames per view and also remapping individual guest frames to different machine frames.

On ARM the Virtualization Translation Table Base Register (VTTBR) holds the base address of SLAT tables, similarly to the EPTP on Intel.

To ensure the monitor regains control immediately after the write, we single-step the trapped vCPU and switch back to the execute-view.
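The per-vCPU view switching described above can be modeled as a small state machine. The sketch assumes two views, an unrestricted original-view and a restricted monitoring view, and only tracks state; in Xen the actual switches would go through the altp2m HVMOPs, and `struct vcpu`, `on_write_violation()` and `on_single_step()` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical view identifiers. */
enum view_id { VIEW_ORIGINAL = 0, VIEW_EXECUTE = 1 };

struct vcpu {
    enum view_id active_view; /* which SLAT view (VTTBR) this vCPU uses */
    bool single_step;         /* single-stepping armed? */
    unsigned events;          /* number of recorded write attempts */
};

/* Write violation on one vCPU: record it, move only that vCPU to the less
 * restrictive original-view and arm single-stepping so the monitor
 * regains control right after the write completes. */
static void on_write_violation(struct vcpu *v)
{
    v->events++;
    v->active_view = VIEW_ORIGINAL;
    v->single_step = true;
}

/* Single-step completed: return the vCPU to the restricted execute-view. */
static void on_single_step(struct vcpu *v)
{
    v->single_step = false;
    v->active_view = VIEW_EXECUTE;
}
```

Only the trapped vCPU leaves the restricted view; the other vCPUs keep it, so no permissions are relaxed for them.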

Sadly, **AArch32 is not capable of enforcing execute-only pages**; every code page has to be both readable and executable, or instruction fetches will fail.

To overcome this limitation we explore the TLB organization on ARM.

Contrary to x86, on ARM a TLB tag corresponds to a specific memory view instead of a vCPU; by switching altp2m views we activate the associated VMID, without having to flush differently tagged mappings of the same guest physical memory.

On the other hand, if different altp2m views shared a VMID, the guest would be susceptible to using stale translations in the TLB, even though the active view contained the most recent mappings in memory.

We choose to employ this architectural feature to hide modified code pages from data fetches and thus mimic execute-only memory.

To minimize TLB maintenance, the TLBs are associated or tagged with an identifier, which organizes the TLB entries based on a specific context. This means TLB entries with the same Address Space Identifier tag refer to a specific process. Similarly, entries tagged with the same Virtual Machine Identifier (VMID) refer to a specific VM or rather to a specific second level translation table (the VMID is part of the Virtualization Translation Table Base Register VTTBR). As such, the CPU does not need to flush the TLBs on context switches, thus significantly increasing the overall system performance.
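VMID tagging can be illustrated with a toy TLB. The sketch below is a direct-mapped model with hypothetical names (real TLBs are typically set-associative): an entry only hits when both the guest frame and the VMID match, so switching to a view with another VMID misses without any flush, while views sharing a VMID also share cached translations.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 8

/* Hypothetical TLB entry tagged with the VMID of the second-stage
 * translation it came from. */
struct tlb_entry {
    bool valid;
    uint16_t vmid;
    uint64_t gfn; /* guest frame number */
    uint64_t mfn; /* machine frame number */
};

struct tlb {
    struct tlb_entry e[TLB_ENTRIES];
};

/* Cache a translation, replacing whatever occupied the slot. */
static void tlb_fill(struct tlb *t, uint16_t vmid, uint64_t gfn, uint64_t mfn)
{
    struct tlb_entry *slot = &t->e[gfn % TLB_ENTRIES];
    slot->valid = true;
    slot->vmid = vmid;
    slot->gfn = gfn;
    slot->mfn = mfn;
}

/* A hit requires both the frame and the VMID tag to match; entries with
 * other VMIDs are simply ignored instead of being flushed. */
static bool tlb_lookup(const struct tlb *t, uint16_t vmid, uint64_t gfn, uint64_t *mfn)
{
    const struct tlb_entry *slot = &t->e[gfn % TLB_ENTRIES];
    if (slot->valid && slot->vmid == vmid && slot->gfn == gfn) {
        *mfn = slot->mfn;
        return true;
    }
    return false;
}
```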

As such, we extend the altp2m interface to pair the VMIDs of altp2m views to de-synchronize the physically separated iTLB and dTLB.

The TLB organization on x86 and ARM evolved to a split TLB architecture. A split TLB separates the TLB into two disjoint sets comprising the instruction TLB (iTLB) and data TLB (dTLB); the iTLB caches translated instruction fetches, whereas the dTLB holds translated data fetches.

On ARM, the introduced cache is called unified TLB (uniTLB) and is comparable to the shared TLB (sTLB) on x86. The uniTLB holds evicted entries from the iTLB and dTLB and is first consulted before walking the page tables.

To cause an inconsistent state in the TLBs that we require for hiding code pages, we prime the iTLB so that it holds guest frame mappings that translate to different machine frames than those cached in the dTLB. That is, we require a mechanism that allows us to translate one guest frame to two physically different machine frames; only one of both mappings will be exclusively cached either in the iTLB or dTLB.
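The lookup order and the desired inconsistent state can be sketched as follows. The model is a simplification built on assumptions: each TLB holds at most one entry per guest frame, MFN 0 means "empty", VMID tags are omitted because both views share one VMID, and `walk_mfn` stands for whatever the table walk in the currently active view would return. All names are hypothetical. Once the iTLB and dTLB are primed from different views, fetches and reads of the same guest frame resolve to different machine frames.

```c
#include <stdbool.h>
#include <stdint.h>

#define NFRAMES 16 /* guest frames covered by this toy model */

/* Cached gfn -> mfn translations; 0 means "no entry". */
struct split_tlb {
    uint64_t itlb[NFRAMES];   /* instruction fetches */
    uint64_t dtlb[NFRAMES];   /* data accesses */
    uint64_t unitlb[NFRAMES]; /* unified TLB holding evicted entries */
};

/* Translate gfn for an instruction fetch (is_fetch) or a data access:
 * first-level TLB, then unified TLB, then a table walk in the active view. */
static uint64_t translate(struct split_tlb *t, uint64_t gfn, bool is_fetch, uint64_t walk_mfn)
{
    uint64_t *l1;

    if (gfn >= NFRAMES)
        return 0; /* outside the toy model */
    l1 = is_fetch ? t->itlb : t->dtlb;
    if (l1[gfn] != 0)
        return l1[gfn];           /* first-level hit */
    if (t->unitlb[gfn] != 0) {
        l1[gfn] = t->unitlb[gfn]; /* refill from the unified TLB */
        return l1[gfn];
    }
    l1[gfn] = walk_mfn;           /* miss everywhere: table walk */
    return walk_mfn;
}

/* Evict an iTLB entry into the unified TLB. */
static void evict_itlb(struct split_tlb *t, uint64_t gfn)
{
    if (gfn >= NFRAMES)
        return;
    t->unitlb[gfn] = t->itlb[gfn];
    t->itlb[gfn] = 0;
}
```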

It is essential that both memory views are tagged with the same VMID; the system will ignore the primed iTLB entry, if it switches to a memory view with a different VMID. We grant the original page read-only permissions. We withdraw write permissions from the original page in the original-view to intercept write attempts. This allows us to monitor any change to the original page and propagate the modifications to the shadow-copy as required.

Since AArch32 does not support execute-only mappings, we grant the shadow-copy read-execute permissions. Also, we withdraw the execute right from all other mappings in the execute-view, as to limit the execution in this view to the page of interest (Figure 6).

We configure the original-view to be active by default. On the first execution of the function to be monitored, the guest hands over control to the VMM: the instruction fetch violates the permissions of the read-only mapping. As such, the translation result does not get cached in the TLBs. The monitor leverages this architectural property to intercept the guest upon permission violation and switch to the execute-view, which grants execution access to the requested guest frame. This time, upon the successful SLAT table walk using the execute-view, the translation mechanism populates the iTLB with the machine frame that is associated with the execute-view (MFN2 in Figure 5). Consequently, further instruction fetches from the page in question will directly consult the primed iTLB entry until it gets evicted. When the primed iTLB entry gets evicted, it will need to be primed again. Upon execution of the SMC in the execute-view, the monitor can single-step the original instruction as described in Sections 3.2 and 3.3. After single-stepping the monitored instruction, the monitor switches back to the original-view.
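The priming sequence can be simulated for a single monitored guest frame. In this sketch (hypothetical types and names; the SMC handling itself is left out), the original-view maps the frame read-only to MFN 1 and the execute-view maps it read-execute to the shadow-copy at MFN 2. The first fetch faults without filling the TLB, the monitor switches to the execute-view, the retried fetch primes the iTLB, and after the switch back, data reads still resolve to MFN 1.

```c
#include <stdbool.h>
#include <stdint.h>

enum view { ORIGINAL_VIEW, EXECUTE_VIEW };
enum { P_R = 1u, P_W = 2u, P_X = 4u };

/* Hypothetical per-view mapping of the single monitored guest frame. */
struct mapping { uint64_t mfn; unsigned perms; };

struct machine {
    struct mapping views[2]; /* indexed by enum view */
    enum view active;        /* currently active altp2m view */
    uint64_t itlb, dtlb;     /* cached MFN for the frame, 0 = empty */
    unsigned vmm_traps;      /* permission faults handled by the VMM */
};

/* Instruction fetch from the monitored frame. A permission fault caches
 * nothing; the monitor switches to the execute-view and the retried
 * fetch (assumed to succeed there) primes the iTLB. */
static uint64_t fetch(struct machine *m)
{
    if (m->itlb != 0)
        return m->itlb;              /* primed: no VMM involvement */
    if (!(m->views[m->active].perms & P_X)) {
        m->vmm_traps++;              /* fault, no TLB fill */
        m->active = EXECUTE_VIEW;    /* monitor switches the view */
    }
    m->itlb = m->views[m->active].mfn;
    return m->itlb;
}

/* After single-stepping, the monitor returns to the original-view. */
static void finish_single_step(struct machine *m)
{
    m->active = ORIGINAL_VIEW;
}

/* Data read: served from the dTLB or by a walk in the active view. */
static uint64_t read_data(struct machine *m)
{
    if (m->dtlb == 0)
        m->dtlb = m->views[m->active].mfn;
    return m->dtlb;
}
```

Usage: initialize `views` with `{1, P_R}` for the original-view and `{2, P_R | P_X}` for the execute-view, then call `fetch()`, `finish_single_step()` and `read_data()` in that order; fetches keep returning MFN 2 while reads return MFN 1, and the VMM is entered only once.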

We prepare the iTLB so that it holds guest frame mappings that translate to different machine frames than those cached in the dTLB:
1. We require a mechanism that allows us to translate one guest frame to two physically different machine frames.
2. Only one of the two mappings will be cached, exclusively in either the iTLB or the dTLB.
3. It is essential that both memory views are tagged with the same VMID.
4. The system will ignore the prepared iTLB entry if it switches to a memory view with a different VMID.
5. We withdraw write permissions from the original page in the original-view to intercept write attempts. This allows us to monitor any change to the original page and propagate the modifications to the shadow-copy as required.
6. We configure the original-view to be active by default.
7. The first instruction fetch violates the permissions of the read-only mapping; as such, the translation result does not get cached in the TLBs.
8. The monitor leverages this architectural property to intercept the guest upon the permission violation and switch to the execute-view, which grants execution access to the requested guest frame.
9. This time, upon the successful SLAT table walk using the execute-view, the translation mechanism populates the iTLB with the machine frame that is associated with the execute-view (MFN2 in Figure 5).
10. Consequently, further instruction fetches from the page in question will directly consult the primed iTLB entry until it gets evicted; once evicted, it will need to be primed again.
11. Upon execution of the SMC in the execute-view, the monitor can single-step the original instruction.
12. After single-stepping the monitored instruction, the monitor switches back to the original-view.

The setup configures two views that map one guest frame to two machine physical frames with different access permissions. By priming the iTLB, the target instruction is fetched from the execute-view, while reads use the original-view. As the iTLB and dTLB hold mappings from two different views, the primed system does not need the VMM to switch the views. This setup satisfies reads initiated, for example, by integrity checkers. At the same time, it transparently causes the guest to execute the SMC instruction in the shadow-copy of the original page (R3). Thus, split-TLB incurs minimal overhead, as it traps to the VMM only for the purpose of priming the TLBs or on execution of SMC instructions. Nevertheless, this setup entails a limitation that we discuss in detail in Section 5.2.

The original-view maps the target to the original machine frame; the execute-view maps it to the shadow-copy.

In the following discussion, the x86 INT3 instruction (0xCC) will be used to trap execution. For ARM, as shown in “Hiding in the Shadows: Empowering ARM for Stealthy Virtual Machine Introspection”, the SMC instruction is a good choice:

The SMC instruction is the ideal candidate as it can be configured to trap to the VMM and thus employed as a trigger to switch the execution-flow of the guest OS to the VMM. The benefit of using the SMC instruction is that the guest is architecturally unable to subscribe to SMC traps. In contrast to software breakpoints, SMC traps can only be directed to TrustZone or to the VMM. This property reduces complexity of the monitor, as the execution of an SMC instruction never has to be re-injected into the guest. A limitation of the SMC in place of a software breakpoint is that it can only be executed in EL1, that is the guest kernel.

By reading the first two instructions from the prologue of the target kernel function into a backup buffer and then overwriting the function’s first instruction with an SMC, the monitor will intercept the guest kernel (R1) on execution of the marked kernel function. Upon execution of the SMC, the monitor can place the original first instruction back into memory while writing a second SMC in place of the immediately following instruction. When execution is resumed, the original instruction is executed, followed by the second SMC. Now the monitor can put the original second instruction back and restore the SMC at the function entry without losing control over the guest kernel, which concludes single-stepping of the first instruction in the system-call handler. On AArch64, we achieve stealthy single-stepping of the guest (R2) by configuring the instrumented page, holding the system-call handler, as execute-only (R3); reads and writes will trap into the VMM.
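The two-SMC single-stepping scheme can be written down as a small state machine over an array of 32-bit instructions standing in for guest memory. `SMC_0` is the AArch64 encoding of `SMC #0` (0xD4000003); everything else (`struct smc_bp`, `bp_arm()`, `bp_on_smc()`) is a hypothetical illustration that assumes the monitor is told the index of the trapping instruction, and it leaves out everything a real monitor needs beyond the instruction swapping.

```c
#include <stddef.h>
#include <stdint.h>

#define SMC_0 0xD4000003u /* AArch64 encoding of "SMC #0" */

/* Hypothetical breakpoint state for one instrumented function entry. */
struct smc_bp {
    uint32_t *code;     /* guest instructions (stand-in for guest memory) */
    size_t entry;       /* index of the function's first instruction */
    uint32_t backup[2]; /* original first two instructions */
    int stepping;       /* 1 while the second SMC is in place */
};

/* Back up the first two instructions and arm the entry with an SMC. */
static void bp_arm(struct smc_bp *bp)
{
    bp->backup[0] = bp->code[bp->entry];
    bp->backup[1] = bp->code[bp->entry + 1];
    bp->code[bp->entry] = SMC_0;
    bp->stepping = 0;
}

/* Called when the SMC at index pc traps to the monitor. Returns the index
 * at which the guest resumes. */
static size_t bp_on_smc(struct smc_bp *bp, size_t pc)
{
    if (pc == bp->entry && !bp->stepping) {
        /* Function entered: restore the first instruction and place a
         * second SMC on the following one. */
        bp->code[bp->entry] = bp->backup[0];
        bp->code[bp->entry + 1] = SMC_0;
        bp->stepping = 1;
    } else if (pc == bp->entry + 1 && bp->stepping) {
        /* First instruction done: restore the second one and re-arm
         * the SMC at the function entry. */
        bp->code[bp->entry + 1] = bp->backup[1];
        bp->code[bp->entry] = SMC_0;
        bp->stepping = 0;
    }
    /* Either way, resume at pc, which now holds an original instruction. */
    return pc;
}
```

Because the guest cannot subscribe to SMC traps, no SMC ever has to be re-injected into the guest, which keeps this state machine this small.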

The only disadvantage of injecting SMCs is that user space is not allowed to execute them; we can therefore only trap code running in kernel space.

To create a breakpoint, we insert an SMC instruction where the control flow should be interrupted, e.g., at the entry of a function. When this instruction is executed, it traps into Xen, which sends an event on the event channel that is handled by libvmtrace.
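On the monitoring side, each event ultimately has to be matched to the breakpoint that raised it. The sketch below assumes the event carries the trapping guest address and looks it up in a table of registered breakpoints; `bp_entry`, `dispatch_trap()` and `count_hits()` are hypothetical and not part of libvmtrace's API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type invoked when a breakpoint is hit. */
typedef void (*bp_callback)(uint64_t pc, void *ctx);

struct bp_entry {
    uint64_t addr; /* guest address where the SMC was inserted */
    bp_callback cb;
    void *ctx;
};

/* Example callback: increments the unsigned counter passed as ctx. */
static void count_hits(uint64_t pc, void *ctx)
{
    (void)pc;
    (*(unsigned *)ctx)++;
}

/* Dispatch one trap event: run the callback registered for the trapping
 * address. Returns 1 if a breakpoint matched, 0 otherwise. */
static int dispatch_trap(const struct bp_entry *bps, size_t n, uint64_t pc)
{
    for (size_t i = 0; i < n; i++) {
        if (bps[i].addr == pc) {
            bps[i].cb(pc, bps[i].ctx);
            return 1;
        }
    }
    return 0;
}
```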

Xen abstracts the interrupt/exception handling for us, but at the low level, the execution flow starts with the ***Exception Handler*** set in the Exception Vector Table (VBAR_ELn).

https://developer.arm.com/documentation/100933/0100/AArch64-Exception-and-Interrupt-Handling
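The handler address follows directly from VBAR_ELn: an AArch64 vector table consists of four groups of 0x200 bytes (current EL with SP_EL0, current EL with SP_ELx, lower EL in AArch64, lower EL in AArch32), each holding four 0x80-byte entries (synchronous, IRQ, FIQ, SError). A guest SMC trapped from an AArch64 kernel at EL1 to Xen at EL2 is a synchronous exception from a lower EL, so it lands at VBAR_EL2 + 0x400. The helper below only computes that address; its names are illustrative.

```c
#include <stdint.h>

/* Where the exception was taken from, in the order of the four
 * 0x200-byte groups of an AArch64 vector table. */
enum vec_source {
    CUR_EL_SP0 = 0,   /* current EL, using SP_EL0 */
    CUR_EL_SPX = 1,   /* current EL, using SP_ELx */
    LOWER_EL_A64 = 2, /* lower EL running AArch64 */
    LOWER_EL_A32 = 3, /* lower EL running AArch32 */
};

/* Exception type, in the order of the four 0x80-byte entries per group. */
enum vec_type { VEC_SYNC = 0, VEC_IRQ = 1, VEC_FIQ = 2, VEC_SERROR = 3 };

/* Address of the vector entry selected for an exception. */
static uint64_t vector_entry(uint64_t vbar, enum vec_source src, enum vec_type type)
{
    return vbar + (uint64_t)src * 0x200 + (uint64_t)type * 0x80;
}
```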

sargx digital garden
