
feat(x86_64): boot Asterinas as zone1 via Multiboot2, with virtio-blk/net/console #322

Open
yydawx wants to merge 4 commits into syswonder:dev-asterinas from yydawx:ccf-asterinas

yydawx commented Jun 3, 2026

Summary
Adds Multiboot2 protocol support to boot Asterinas OS as a zone1 guest, using a minimal ASM bootloader. Minimal changes to core code — all x86-specific logic stays under arch/x86_64/.

Changes
Multiboot2 Boot Support (Commit 1: feat)

- New mb2_boot.S bootloader: 16→32-bit transition with GDT + TSS setup, jumps to kernel entry
- Loaded via boot_filepath in zone1 config, with GPA→HPA offset translation in hvisor-tool
- ELF segment loading with kernel_entry_gpa passed to bootloader via ESI
- multiboot_info_paddr/multiboot_enabled added to HvArchZoneConfig (x86-specific)
- Multiboot path gated behind multiboot_enabled flag; Linux zone1 paths unaffected
- Removed unused print_memory_map
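
To illustrate the GPA→HPA offset translation mentioned above, here is a minimal sketch. It is not the hvisor-tool implementation (which is written in C); the `MemRegion` type, its field names, and the `gpa_to_hpa` helper are hypothetical and only show the idea of finding the containing region and applying its offset.

```rust
/// One GPA→HPA mapping. Hypothetical type: the field names are
/// illustrative and do not mirror the real zone config schema.
#[derive(Clone, Copy, Debug)]
pub struct MemRegion {
    pub gpa: u64,
    pub hpa: u64,
    pub size: u64,
}

/// Translate a guest-physical address to a host-physical one by locating
/// the region that contains it and applying that region's offset.
/// Returns `None` when no region covers the address.
pub fn gpa_to_hpa(regions: &[MemRegion], gpa: u64) -> Option<u64> {
    regions
        .iter()
        .find(|r| gpa >= r.gpa && gpa - r.gpa < r.size)
        .map(|r| r.hpa + (gpa - r.gpa))
}
```

Returning `Option` rather than a raw address makes an unmapped GPA an explicit case for the caller to handle, for example when placing ELF segments.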
Exception Handling (Commit 2: fix)

- S2PT (EPT) violation handler via MMIO dispatch
- GS_BASE/FS_BASE MSR read/write support for 64-bit guests
- x2APIC MSR fallback for unrecognized registers in x2APIC range
- TSC frequency reporting via CPUID
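
As a rough sketch of how an MSR exit handler might tell these cases apart, the snippet below classifies an MSR index. The MSR numbers are the architectural ones (IA32_FS_BASE, IA32_GS_BASE, and the x2APIC range 0x800 to 0x8FF); the `MsrClass` enum and `classify_msr` function are hypothetical names, and what the fallback does with an unrecognized x2APIC register is not shown because the PR text does not specify it.

```rust
pub const IA32_FS_BASE: u32 = 0xC000_0100;
pub const IA32_GS_BASE: u32 = 0xC000_0101;
pub const X2APIC_MSR_FIRST: u32 = 0x800;
pub const X2APIC_MSR_LAST: u32 = 0x8FF;

/// Hypothetical classification used only for this illustration.
#[derive(Debug, PartialEq, Eq)]
pub enum MsrClass {
    FsBase,
    GsBase,
    /// `reg` is the register index inside the x2APIC MSR window.
    X2Apic { reg: u32 },
    Unknown,
}

/// Decide which handler an intercepted MSR access belongs to.
pub fn classify_msr(msr: u32) -> MsrClass {
    match msr {
        IA32_FS_BASE => MsrClass::FsBase,
        IA32_GS_BASE => MsrClass::GsBase,
        X2APIC_MSR_FIRST..=X2APIC_MSR_LAST => MsrClass::X2Apic {
            reg: msr - X2APIC_MSR_FIRST,
        },
        _ => MsrClass::Unknown,
    }
}
```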
Virtio Robustness (Commit 3: fix)

- NULL guard for VIRTIO_BRIDGE.res_agent(): returns gracefully instead of panicking
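
The guard pattern can be sketched as follows. This is not the hvisor code: `Bridge`, `ResAgent`, `with_agent`, and `finish_request` are hypothetical names, and `std::sync::Mutex` stands in for whatever lock the `no_std` hypervisor actually uses.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the bridge's result agent.
pub struct ResAgent {
    pub pending: u32,
}

/// Hypothetical bridge whose agent may not be set up yet.
pub struct Bridge {
    agent: Mutex<Option<ResAgent>>,
}

impl Bridge {
    pub fn new() -> Self {
        Bridge { agent: Mutex::new(None) }
    }

    pub fn attach(&self, agent: ResAgent) {
        *self.agent.lock().unwrap() = Some(agent);
    }

    /// Run `f` on the agent if one is attached; return `None` otherwise
    /// instead of panicking on a missing agent.
    pub fn with_agent<R>(&self, f: impl FnOnce(&mut ResAgent) -> R) -> Option<R> {
        let mut guard = self.agent.lock().unwrap();
        guard.as_mut().map(f)
    }
}

/// Caller side: a missing agent becomes an ordinary error value.
pub fn finish_request(bridge: &Bridge) -> Result<u32, &'static str> {
    bridge
        .with_agent(|a| {
            a.pending = a.pending.saturating_sub(1);
            a.pending
        })
        .ok_or("virtio bridge not initialized")
}
```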
Struct & Config Fixes (Commit 4: feat)

- Added v_bus/v_device/v_function to HvPciDevConfig to match C side (fixes 128-byte zone_config size mismatch)
- Bumped CONFIG_MAGIC_VERSION to 0x7 on both C and Rust sides
- Zone0 memory layout and virtio config adjustments for zone1 coexistence
- Example zone1 config: zone1-asterinas.json
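
One way to catch this kind of C/Rust layout drift early is a compile-time size assertion plus a runtime version check. The sketch below is an assumption-laden illustration: `PciDevConfigSketch` is a hypothetical struct, not the real `HvPciDevConfig` layout, and `check_magic` is a made-up helper. Only the value 0x7 comes from this PR.

```rust
use core::mem::size_of;

/// Version both sides must agree on; 0x7 is the value named in this PR.
pub const CONFIG_MAGIC_VERSION: usize = 0x7;

/// Hypothetical stand-in for a PCI device entry shared with the C tool.
/// The real HvPciDevConfig has a different set of fields.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default)]
pub struct PciDevConfigSketch {
    pub bus: u8,
    pub device: u8,
    pub function: u8,
    pub v_bus: u8,
    pub v_device: u8,
    pub v_function: u8,
    pub dev_type: u32,
}

// Six u8 fields, two bytes of padding, then a u32: 12 bytes under repr(C).
// If a field is added on one side only, this fails at build time.
const _: () = assert!(size_of::<PciDevConfigSketch>() == 12);

/// Reject a config blob produced by a tool built against another version.
pub fn check_magic(found: usize) -> Result<(), String> {
    if found == CONFIG_MAGIC_VERSION {
        Ok(())
    } else {
        Err(format!(
            "config version mismatch: expected {:#x}, found {:#x}",
            CONFIG_MAGIC_VERSION, found
        ))
    }
}
```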
Requires
yydawx/hvisor-tool#98 — Multiboot2 loading with GPA→HPA offset translation

github-actions bot added the x86_64 and feature (New feature or request) labels Jun 3, 2026

yydawx commented Jun 3, 2026
Author

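Verifying this is quite tedious, so here is an agent-generated guide. Feel free to reach out if you run into any problems:

Running Asterinas on hvisor (x86_64 QEMU)

Overview

This document explains how to boot the Asterinas kernel as a zone1 virtual machine on hvisor via the Multiboot2 protocol.

Tested versions:

  • hvisor: based on upstream d3260d0 (the v0.4 release baseline) plus the ccf-asterinas patches
  • hvisor-tool: based on upstream b45971a plus the ccf-asterinas patches
  • Asterinas: OSDK 0.17.2, SMP=2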

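Asterinas kernel requirements

Asterinas kernel build parameters:

make kernel SMP=2 BENCHMARK=sysbench/cpu_lat ENABLE_REGRESSION_TEST=true

Where:

  • SMP=2: matches the number of CPUs assigned to zone1
  • BENCHMARK=sysbench/cpu_lat: packs the benchmark tools into the initramfs
  • ENABLE_REGRESSION_TEST=true: packs the regression tests into the initramfs

Note: Asterinas does not need our kernel modifications to boot under hvisor. The optional iface/init.rs patch is only needed for the virtio-net eth0 interface to appear; without it, the console, block device I/O, benchmarks, and regression tests all work normally.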

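hvisor-side changes (ccf-asterinas branch)

4 commits in total, based on upstream d3260d0:

  1. Multiboot2 boot protocol support: sets up 32-bit protected-mode guest state for Multiboot2-compliant kernels. Adds ELF segment loading, Multiboot2 info structure construction, CPUID leaf 0x15 support, and a fix for the EFER LME/LMA bits.

  2. Exception handling and interrupt routing improvements: an EOI stuck-interrupt watchdog (drops a stuck interrupt after 5000 attempts), IRQ deduplication (the same vector is not injected twice), forwarding of device interrupts that arrive on non-zone0 CPUs back to zone0, and INS/OUTS instruction support.

  3. EPT PCI mappings and virtio robustness: ECAM passthrough mapping so the guest can access PCI configuration space, PCI MMIO window mapping, DMA region mapping, the virtio bridge returning an Option instead of panicking, and skipping the IOMMU for zones with no PCI devices.

  4. Example configuration files: virtio-asterinas-example.json and zone1-asterinas-example.json, located in platform/x86_64/qemu/configs/.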
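
To make the "Multiboot2 info structure" step more concrete, here is a minimal sketch of the layout defined by the Multiboot2 specification: an 8-byte header (`total_size`, `reserved`) followed by 8-byte-aligned tags and a terminating end tag. It is not the code from this PR. `build_mb2_info` and `read_u32` are hypothetical helpers, and only a command-line tag is emitted; a real boot would also need tags such as the memory map and the initramfs module.

```rust
/// Value a Multiboot2 loader passes to the kernel in EAX.
pub const MB2_BOOTLOADER_MAGIC: u32 = 0x36d7_6289;
const TAG_END: u32 = 0;
const TAG_CMDLINE: u32 = 1;

/// Append one tag: type, size (header + payload, excluding padding),
/// payload, then zero padding up to the next 8-byte boundary.
fn push_tag(buf: &mut Vec<u8>, typ: u32, payload: &[u8]) {
    let size = 8 + payload.len() as u32;
    buf.extend_from_slice(&typ.to_le_bytes());
    buf.extend_from_slice(&size.to_le_bytes());
    buf.extend_from_slice(payload);
    while buf.len() % 8 != 0 {
        buf.push(0);
    }
}

/// Build a minimal info blob containing only a command-line tag.
pub fn build_mb2_info(cmdline: &str) -> Vec<u8> {
    let mut buf = vec![0u8; 8]; // total_size + reserved, patched below
    let mut s = cmdline.as_bytes().to_vec();
    s.push(0); // the command line is a NUL-terminated string
    push_tag(&mut buf, TAG_CMDLINE, &s);
    push_tag(&mut buf, TAG_END, &[]);
    let total = buf.len() as u32;
    buf[0..4].copy_from_slice(&total.to_le_bytes());
    buf
}

/// Read a little-endian u32 at `off` (helper for inspecting the blob).
pub fn read_u32(buf: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([buf[off], buf[off + 1], buf[off + 2], buf[off + 3]])
}
```

The resulting bytes would be copied to the guest address given by multiboot_info_paddr, with that address placed in EBX and the magic in EAX at kernel entry.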

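hvisor-tool-side changes (ccf-asterinas branch)

4 commits in total, based on upstream b45971a:

  1. Virtio device fixes: removed the bounds check on queue_sel writes (so the guest can iterate over the queues), added a bounds check on QUEUE_NUM_MAX reads that returns 0, added a NULL pointer check after GPA translation, and initialized the blk_size field.

  2. Terminal newline fix: \n to \r\n conversion in virtio-console TX handling, plus complete error handling for PTY initialization.

  3. Multiboot2 zone loading: ELF segment parsing, Multiboot2 info structure construction, and the zone_config struct extension.

  4. Example configuration files: same as hvisor commit 4.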
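
The newline conversion in item 2 can be sketched as below. The real fix lives in hvisor-tool's C code; this Rust version only illustrates the idea, and the choice to leave an existing \r\n pair untouched is an assumption rather than something the commit description states.

```rust
/// Convert bare LF to CRLF for terminal output.
/// A LF that already follows a CR is left alone, so "\r\n" is not doubled.
pub fn lf_to_crlf(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len() + 8);
    let mut prev = 0u8;
    for &b in input {
        if b == b'\n' && prev != b'\r' {
            out.push(b'\r');
        }
        out.push(b);
        prev = b;
    }
    out
}
```

Note that `prev` is reset on every call, so a \r\n pair split across two TX buffers would still gain an extra \r in this sketch.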

Host environment

QEMU:      qemu-system-x86_64 with KVM acceleration
Machine:   q35, kernel-irqchip=split
CPU:       host,+x2apic,+invtsc,+vmx
Memory:    12 GB (adjustable; at least 8 GB recommended)
IOMMU:     Intel VT-d (intel-iommu,caching-mode=on,device-iotlb=on)
Disk:      virtio-blk-pci, attached to PCIe bus 1
Network:   user-mode NIC

Memory layout

Asterinas zone1 8 GB example (non-contiguous EPT regions that skip the ECAM hole):

GPA 0x00000000-0x1ff00000  (511 MB)     low RAM
GPA 0x1ff00000-0x20000000  (1 MB)       ACPI tables
GPA 0x20000000-0xb0000000  (2.25 GB)    middle RAM (up to the ECAM hole)
GPA 0x100000000-0x150000000 (1.25 GB)   high RAM
GPA 0x150000000-0x250000000 (4 GB)      extended RAM
GPA 0xfeb00000-0xfeb02000               virtio MMIO region

About 8 GB in total, spread across 5 RAM regions in the EPT.
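
As a sanity check on the layout above, the five RAM ranges can be summed and checked for overlap. The sketch below uses the addresses from the listing; the `GpaRange` type and the helper functions are hypothetical and exist only for this illustration.

```rust
/// One guest-physical range, half-open: [start, end).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct GpaRange {
    pub start: u64,
    pub end: u64,
}

/// The five RAM ranges from the layout (the virtio MMIO window is not RAM).
pub const ZONE1_RAM: [GpaRange; 5] = [
    GpaRange { start: 0x0000_0000, end: 0x1ff0_0000 },
    GpaRange { start: 0x1ff0_0000, end: 0x2000_0000 },
    GpaRange { start: 0x2000_0000, end: 0xb000_0000 },
    GpaRange { start: 0x1_0000_0000, end: 0x1_5000_0000 },
    GpaRange { start: 0x1_5000_0000, end: 0x2_5000_0000 },
];

/// Total number of bytes covered by the ranges.
pub fn total_bytes(ranges: &[GpaRange]) -> u64 {
    ranges.iter().map(|r| r.end - r.start).sum()
}

/// True if every range is non-empty and the ranges are in ascending
/// order without overlapping each other.
pub fn is_sorted_and_disjoint(ranges: &[GpaRange]) -> bool {
    ranges.iter().all(|r| r.start < r.end)
        && ranges.windows(2).all(|w| w[0].end <= w[1].start)
}
```

The ranges add up to exactly 8 GiB, and the gap between 0xb0000000 and 0x100000000 is where the ECAM hole and the virtio MMIO window sit.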

Build and deploy

Full build

1. Build Asterinas (requires the Docker container):

docker exec syswand-build bash -c 'cd /root/syswand_asterinas/asterinas && make kernel SMP=2 BENCHMARK=sysbench/cpu_lat ENABLE_REGRESSION_TEST=true'

2. Build hvisor:

cd /home/yyda/workspace/syswand_asterinas/hvisor
make clean && make ARCH=x86_64 BOARD=qemu LOG=off

3. Build the hvisor daemon:

cd /home/yyda/workspace/syswand_asterinas/hvisor-tool
make all ARCH=x86_64 LOG=LOG_INFO KDIR=/home/yyda/workspace/syswand_asterinas/linux

4. Deploy to the rootfs:

cd /home/yyda/workspace/syswand_asterinas
sudo mount rootfs1.img -t ext4 /mnt
sudo rm -f /mnt/hvisor /mnt/hvisor.ko
sudo cp asterinas/target/osdk/iso_root/boot/aster-kernel-osdk-bin /mnt/
sudo cp asterinas/target/osdk/iso_root/boot/initramfs.cpio.gz /mnt/
sudo cp zone1-asterinas.json /mnt/
sudo cp virtio_cfg.json /mnt/
sudo cp hvisor-tool/output/hvisor /mnt/
sudo cp hvisor-tool/output/hvisor.ko /mnt/
sudo umount /mnt
sudo cp rootfs1.img ./hvisor/platform/x86_64/qemu/image/virtdisk/

Quick rebuild (hvisor only)

cd hvisor && make clean && make ARCH=x86_64 BOARD=qemu LOG=off

Quick rebuild (daemon only)

# Build in a subshell so the paths below stay relative to the workspace root
(cd hvisor-tool && make all ARCH=x86_64 LOG=LOG_INFO KDIR=/path/to/linux)
sudo mount rootfs1.img -t ext4 /mnt
sudo rm -f /mnt/hvisor /mnt/hvisor.ko
sudo cp hvisor-tool/output/hvisor /mnt/
sudo cp hvisor-tool/output/hvisor.ko /mnt/
sudo umount /mnt
sudo rm -f ./hvisor/platform/x86_64/qemu/image/virtdisk/rootfs1.img
sudo cp rootfs1.img ./hvisor/platform/x86_64/qemu/image/virtdisk/
sudo chown $(whoami):$(whoami) ./hvisor/platform/x86_64/qemu/image/virtdisk/rootfs1.img

Running

Starting hvisor

cd /home/yyda/workspace/syswand_asterinas/hvisor
make ARCH=x86_64 BOARD=qemu run LOG=off

LOG=off turns off hvisor log output to keep the terminal clean.

Starting the services and zone1 from zone0 (root Linux)

# 1. Optional: create a TAP device for virtio-net
ip tuntap add tap0 mode tap
ip link set tap0 up
ip addr add 192.168.100.1/24 dev tap0

# 2. Start the virtio daemon

nohup ./hvisor virtio start virtio_cfg.json > /daemon.log 2>&1 &
sleep 2

# 3. Start Asterinas zone1

./hvisor zone start ./zone1-asterinas.json

Working inside zone1 (Asterinas)

# List the filesystem
ls

# Run the regression tests

/test/run_regression_test.sh

# Run the benchmarks

mkdir -p /ext2
mount -t ext2 /dev/vda /ext2
sh /benchmark/run_all.sh

# Write to the mounted persistent disk

echo "hello" > /ext2/test.txt
cat /ext2/test.txt

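Configuration files

zone1-asterinas.json

Defines zone1's memory regions, CPUs, kernel path, initramfs, and Multiboot2 parameters. Key fields:

Field                  Description
memory_regions         GPA→HPA mappings, covering the RAM regions and the virtio MMIO window
multiboot_enabled      Must be set to true
multiboot_info_paddr   GPA of the Multiboot2 info structure
kernel_cmdline         Command-line arguments passed to the Asterinas kernel
initramfs_filepath     Path to the initramfs file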

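Known issues

  1. pty_blocking test deadlock: this test has been skipped in the device regression tests.

  2. pivot_root errno differences: in two pivot_root edge cases, the Asterinas VFS returns a different errno from Linux (EBUSY vs EINVAL, ENOENT vs EINVAL).

  3. Missing cgroup filesystem: process regression tests that need /sys/fs/cgroup fail because cgroupfs is not mounted.

  4. /proc/self/exe unavailable: the setup phase of the memory and security regression tests fails because the Asterinas procfs does not implement the /proc/self/exe symlink.

  5. fio slab allocation failure: when zone1 has less than 8 GB of memory, fio tests may trigger an "Allocating a slot from a full slab" panic. Giving zone1 at least 8 GB is recommended.

  6. QEMU memory must cover every zone1 HPA: if QEMU's -m is smaller than zone1's highest HPA, EPT violations occur (a #GP storm). Make sure -m ≥ highest HPA + largest region size.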
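
The rule of thumb in item 6 can be written down as a small check. This is only a sketch of the rule as stated in the guide (highest HPA plus largest region size); the `HpaRegion` type, the helper names, and the example addresses in the usage note are hypothetical, and the sketch does not model how QEMU places RAM around the PCI hole.

```rust
/// One zone1 region as seen from the host. Hypothetical type.
pub struct HpaRegion {
    pub hpa: u64,
    pub size: u64,
}

/// Minimum QEMU memory according to the guide's rule:
/// highest region start address + largest region size.
pub fn min_qemu_mem_bytes(regions: &[HpaRegion]) -> u64 {
    let highest_hpa = regions.iter().map(|r| r.hpa).max().unwrap_or(0);
    let largest = regions.iter().map(|r| r.size).max().unwrap_or(0);
    highest_hpa + largest
}

/// True if the configured `-m` value satisfies the rule.
pub fn qemu_mem_is_enough(qemu_mem_bytes: u64, regions: &[HpaRegion]) -> bool {
    qemu_mem_bytes >= min_qemu_mem_bytes(regions)
}
```

For example, with made-up regions at HPA 0x100000000 (1 GiB long) and 0x200000000 (4 GiB long), the rule asks for at least 12 GiB.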

yydawx commented Jun 3, 2026
Author

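I have found that configuring virtio for Asterinas currently requires modifying the Asterinas source code. Is that acceptable, or do we need to come up with a better approach?
Asterinas appears to build a kernel with the device information baked in from some hard-coded settings, so changing the virtio setup always means modifying part of the Asterinas source.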

caodg requested a review from Solicey June 3, 2026 21:32

Solicey commented Jun 6, 2026
Collaborator

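Quoting yydawx: I have found that configuring virtio for Asterinas currently requires modifying the Asterinas source code. Is that acceptable, or do we need to come up with a better approach? Asterinas appears to build a kernel with the device information baked in from some hard-coded settings, so changing the virtio setup always means modifying part of the Asterinas source.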

I also encountered this problem when configuring virtio, and I think it is acceptable to make a few changes to Asterinas.

Comment thread src/device/irqchip/pic/ioapic.rs Outdated
let zone = this_zone_arc.read();
// The guest IOAPIC RTE may route to a CPU outside this zone.
// If so, redirect to the zone's first CPU so the interrupt
// reaches the correct guest.

Collaborator

I added a CPU redirect fix in VirtIoApic::write() (lines 136-142) in my last commit, so you can remove the redundant fixes.

Comment thread src/arch/x86_64/trap.rs Outdated

/// Walk guest page tables for virtual address `vaddr` using CR3 as the PML4 base.
/// Prints the full page table hierarchy for debugging.
fn walk_guest_page_table(vaddr: usize, cr3_gpa: usize) {

Collaborator

You could reuse function gva_to_gpa() in mmio.rs for page walking.

Comment thread src/config.rs Outdated
pub name: [u8; CONFIG_NAME_MAXLEN],
// Multiboot support (NEW)
pub multiboot_info_paddr: u64,
pub multiboot_enabled: u32,

Collaborator

Consider putting multiboot_info_paddr and multiboot_enabled inside arch_config, since they are x86 specific configs, which shall not be shared by other archs.

Comment thread src/hypercall/mod.rs Outdated
Comment on lines +150 to +165
Some(zone_arc) => {
    let target_cpu = get_target_cpu(irq_id as _, target_zone as _);
    // Verify target_cpu belongs to target_zone.
    // The guest IOAPIC may route IRQs to an APIC ID that now
    // belongs to a different zone, which would cause the IRQ
    // to be injected into the wrong guest.
    let zone = zone_arc.read();
    if zone.cpu_set.bitmap & (1u64 << target_cpu) != 0 {
        target_cpu
    } else {
        trace!("virtio: IRQ {} for zone {} routed to CPU {} outside zone, falling back to CPU {}",
            irq_id, target_zone, target_cpu,
            zone.cpu_set.first_cpu().unwrap());
        zone.cpu_set.first_cpu().unwrap()
    }
}

Collaborator

This is another redundant IOAPIC redirect fix that should be removed. Also, we should avoid adding arch-specific content to code and files shared by all architectures.

Author

The IOAPIC redirect in VirtIoApic::write() only takes effect when the guest actively reconfigures IOAPIC entries, but the initial RTE state is inherited from zone0 on zone1 startup and may point to CPUs outside zone1. Without the fallback in handle_hvc_finish_req, virtio IRQs are delivered to the wrong guest. Tested: removing this breaks virtio console input.
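
The fallback described here can be reduced to a self-contained sketch. The `CpuSet` type below is a simplified stand-in that mirrors the `bitmap` and `first_cpu()` names visible in the quoted diff; `contains` and `pick_irq_target` are hypothetical helpers and not part of the PR.

```rust
/// Simplified stand-in for the zone's CPU set (one bit per CPU).
#[derive(Clone, Copy)]
pub struct CpuSet {
    pub bitmap: u64,
}

impl CpuSet {
    /// True if `cpu` belongs to this set.
    pub fn contains(&self, cpu: u32) -> bool {
        cpu < 64 && self.bitmap & (1u64 << cpu) != 0
    }

    /// Lowest-numbered CPU in the set, if any.
    pub fn first_cpu(&self) -> Option<u32> {
        if self.bitmap == 0 {
            None
        } else {
            Some(self.bitmap.trailing_zeros())
        }
    }
}

/// Keep the CPU the RTE routes to if it belongs to the zone;
/// otherwise fall back to the zone's first CPU.
pub fn pick_irq_target(zone_cpus: CpuSet, routed_cpu: u32) -> Option<u32> {
    if zone_cpus.contains(routed_cpu) {
        Some(routed_cpu)
    } else {
        zone_cpus.first_cpu()
    }
}
```

Unlike the quoted diff, this version returns `None` for an empty CPU set instead of calling `unwrap()`.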

Comment thread src/arch/x86_64/zone.rs Outdated
Comment thread src/arch/x86_64/trap.rs Outdated
Comment on lines +100 to +118
}
IdtVector::I8042_KEYBOARD_VECTOR => {}
IdtVector::APIC_SPURIOUS_VECTOR | IdtVector::APIC_ERROR_VECTOR => {}
_ => {
if vector >= 0x20 && this_cpu_data().arch_cpu.power_on {
inject_vector(this_cpu_id(), vector, None, false);
IdtVector::APIC_SPURIOUS_VECTOR
| IdtVector::APIC_ERROR_VECTOR => {}
// programmed the LAPIC. They belong to the CURRENT zone,
// not zone0. Device interrupts (0x20-0xdf) always belong to
// zone0 and must be forwarded if they arrive on a non-zone0 CPU.
// Check if this is a LAPIC-local interrupt.
// The guest's timer vector is dynamically allocated and may be < 0xe0,
// so we also check against the tracked LAPIC timer vector.
let is_lapic_local = vector >= 0xe0
|| vector == this_cpu_data().arch_cpu.virt_lapic.virt_timer_vector as u8;
if zone_id == 0 || is_lapic_local {
inject_vector(cpu_id, vector, None, false);
} else {
// Forward device interrupt to zone0.
let zone0 = crate::zone::find_zone(0).unwrap();
let zone0_cpu = zone0.read().cpu_set.first_cpu().unwrap_or(0);
inject_vector(zone0_cpu, vector, None, false);
}

Collaborator

Non-root zones should also be able to receive vectors injected by real hardware. Sometimes we may let zone1 use real devices instead of virtio devices.

Solicey left a comment
Collaborator

I suggest making as few changes as possible to get Asterinas booting. You can look at my previous commit to see what has already been fixed, so that you do not need to add redundant fixes in your PR.

Comment thread src/device/irqchip/pic/lapic.rs Outdated
Comment thread src/device/irqchip/pic/ioapic.rs Outdated
Comment on lines +233 to +264
/// When a non-root zone starts on a set of CPUs, ensure critical physical
/// interrupts (UART, etc.) are not routed to those CPUs. If they are, re-route
/// them to CPU 0 which stays in the root zone. Without this, zone0 can become
/// unresponsive because physical interrupts get injected into a guest that has
/// no handler for them.
pub fn ioapic_reroute_from_cpus(cpu_set: &crate::cpu_data::CpuSet) {
    // Critical IRQs that the root zone needs for interactive console.
    const CRITICAL_IRQS: &[u8] = &[irqs::UART_COM1_IRQ];

    let mut io_apic = IO_APIC.lock();
    for &irq in CRITICAL_IRQS {
        // table_entry returns RedirectionTableEntry, transmute to u64 for
        // bit-field manipulation.
        let entry = unsafe { io_apic.table_entry(irq) };
        let raw: u64 = unsafe { core::mem::transmute(entry) };
        let dest_apic_id = raw.get_bits(56..=63) as usize;
        let dest_cpu = get_cpu_id(dest_apic_id);
        if cpu_set.bitmap & (1u64 << dest_cpu) != 0 {
            // Re-route to CPU 0 which is always in the root zone.
            let cpu0_apic_id = get_apic_id(0) as u64;
            let mut new_raw = raw;
            new_raw.set_bits(56..=63, cpu0_apic_id);
            let new_entry = unsafe { core::mem::transmute(new_raw) };
            unsafe { io_apic.set_table_entry(irq, new_entry) };
            warn!(
                "ioapic: rerouted IRQ {} from CPU {} (APIC {:#x}) to CPU 0 (APIC {:#x})",
                irq, dest_cpu, dest_apic_id, cpu0_apic_id
            );
        }
    }
}

Collaborator

I don't think it is necessary to handle IOAPIC rerouting here. As mentioned earlier, this issue was fixed in my last commit. You can make your own modifications on top of my fix, but please avoid fixing the same problem again with redundant code.

Comment thread src/arch/x86_64/zone.rs Outdated
Comment thread src/device/irqchip/pic/mod.rs Outdated
yydawx commented Jun 6, 2026
Author

Most of the redundant code was added to work around problems I hit while booting, but some of it may indeed be unnecessary. I will find out which parts are useless. Thanks for your review!

yydawx force-pushed the ccf-asterinas branch 2 times, most recently from 138a5b2 to a589ab3 June 8, 2026 03:39
yydawx marked this pull request as draft June 8, 2026 03:39
yydawx force-pushed the ccf-asterinas branch 3 times, most recently from 0cb3ac4 to dd456e1 June 8, 2026 08:03
yydawx requested a review from Solicey June 8, 2026 08:04

yydawx commented Jun 8, 2026
Author

@Solicey Hi! I removed most of the redundant code, debug output, and comments. I also avoided hard-coding.
It should be better now.

yydawx marked this pull request as ready for review June 8, 2026 10:45
Comment thread src/arch/x86_64/boot.rs Outdated
Comment thread src/arch/x86_64/cpu.rs Outdated
Comment thread src/arch/x86_64/trap.rs Outdated
yydawx added 3 commits June 12, 2026 10:19
- Add mb2_boot.S bootloader for 16-bit to 32-bit mode transition
- Bootloader sets up GDT with TSS and jumps to kernel entry
- Pass kernel entry via ESI to bootloader on VM entry
- Add multiboot_info_paddr/multiboot_enabled to HvArchZoneConfig
- Remove unused print_memory_map
- Add v_bus/v_device/v_function to HvPciDevConfig
- Add S2PT violation handler via MMIO dispatch
- Add GS_BASE/FS_BASE MSR read/write support
- Add NULL guard for VIRTIO_BRIDGE res_agent
- Adjust zone0 memory layout for zone1 coexistence
- Update virtio configuration for multi-zone setup
yydawx requested a review from Solicey June 14, 2026 14:45

yydawx commented Jun 14, 2026
Author

@Solicey Multiboot2 is ready now. I forgot to mention you for review :)

By the way, it seems that Asterinas produces a bzImage-type kernel when the legacy-32 parameter is on. But we need setup.bin and vmlinux.bin, right?
