feat(x86_64): boot Asterinas as zone1 via Multiboot2, with virtio-blk/net/console#322
yydawx wants to merge 4 commits into
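Because verifying this by hand is quite tedious, I am providing an agent-generated guide. Feel free to reach out if anything is unclear:

Running Asterinas on hvisor (x86_64 QEMU)

Overview
This document explains how to boot the Asterinas kernel as a zone1 virtual machine on hvisor via the Multiboot2 protocol. Tested versions:

Asterinas kernel requirements
Asterinas kernel build parameters: Where:
Note: Asterinas can boot under hvisor without our kernel modifications.

Optional hvisor-side changes (ccf-asterinas branch), 4 commits in total, based on upstream d3260d0:
hvisor-tool-side changes (ccf-asterinas branch), 4 commits in total, based on upstream b45971a:

Host environment

Memory layout
Asterinas zone1 8 GB example (non-contiguous EPT regions that work around the ECAM hole): About 8 GB in total, spread across 5 RAM regions in the EPT.

Build and deployment
Full build
1. Build Asterinas (requires a Docker container):
2. Build hvisor:
3. Build the hvisor daemon:
4. Deploy to rootfs:
Quick rebuild (hvisor only)
Quick rebuild (daemon only)

Running
Start hvisor
Start the services and zone1 from zone0 (root Linux)
Operate inside zone1 (Asterinas)

Configuration file reference
zone1-asterinas.json
Defines zone1's memory regions, CPUs, kernel path, initramfs, and Multiboot2 parameters. Key fields:

Known issues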
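So far I have found that configuring virtio for Asterinas requires modifying the Asterinas source code. Is that acceptable, or do we need to come up with a better approach?
I also ran into this problem when configuring virtio, and I think it is acceptable to make a few changes to Asterinas.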
| let zone = this_zone_arc.read(); | ||
| // The guest IOAPIC RTE may route to a CPU outside this zone. | ||
| // If so, redirect to the zone's first CPU so the interrupt | ||
| // reaches the correct guest. |
I added a CPU redirect fix in VirtIoApic::write() (lines 136-142) in the last commit, so you can remove the redundant fixes.
|
|
||
| /// Walk guest page tables for virtual address `vaddr` using CR3 as the PML4 base. | ||
| /// Prints the full page table hierarchy for debugging. | ||
| fn walk_guest_page_table(vaddr: usize, cr3_gpa: usize) { |
You could reuse the gva_to_gpa() function in mmio.rs for the page walk.
| pub name: [u8; CONFIG_NAME_MAXLEN], | ||
| // Multiboot support (NEW) | ||
| pub multiboot_info_paddr: u64, | ||
| pub multiboot_enabled: u32, |
Consider putting multiboot_info_paddr and multiboot_enabled inside arch_config, since they are x86-specific settings that should not be shared with other architectures.
| Some(zone_arc) => { | ||
| let target_cpu = get_target_cpu(irq_id as _, target_zone as _); | ||
| // Verify target_cpu belongs to target_zone. | ||
| // The guest IOAPIC may route IRQs to an APIC ID that now | ||
| // belongs to a different zone, which would cause the IRQ | ||
| // to be injected into the wrong guest. | ||
| let zone = zone_arc.read(); | ||
| if zone.cpu_set.bitmap & (1u64 << target_cpu) != 0 { | ||
| target_cpu | ||
| } else { | ||
| trace!("virtio: IRQ {} for zone {} routed to CPU {} outside zone, falling back to CPU {}", | ||
| irq_id, target_zone, target_cpu, | ||
| zone.cpu_set.first_cpu().unwrap()); | ||
| zone.cpu_set.first_cpu().unwrap() | ||
| } | ||
| } |
This is another redundant IOAPIC redirect fix that should be removed. Also, we should avoid adding arch-specific content to code and files shared by all architectures.
The IOAPIC redirect in VirtIoApic::write() only takes effect when the guest actively reconfigures IOAPIC entries, but the initial RTE state is inherited from zone0 when zone1 starts and may point to CPUs outside zone1. Without the fallback in handle_hvc_finish_req, virtio IRQs are delivered to the wrong guest. Tested: removing this breaks virtio console input.
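The fallback described in this reply can be sketched in isolation as follows. This is a minimal illustration, not the hvisor implementation: `CpuSet` here is a stand-in type, and `resolve_irq_target` is a hypothetical helper name. It keeps the routed CPU when it belongs to the zone and otherwise falls back to the zone's first CPU.

```rust
/// Stand-in for a zone's CPU set (hypothetical; only the bitmap is modeled).
#[derive(Clone, Copy)]
pub struct CpuSet {
    pub bitmap: u64,
}

impl CpuSet {
    /// True if `cpu` is part of this set. The bound check avoids an
    /// out-of-range shift for CPU ids >= 64.
    pub fn contains(&self, cpu: usize) -> bool {
        cpu < 64 && self.bitmap & (1u64 << cpu) != 0
    }

    /// Lowest-numbered CPU in the set, if any.
    pub fn first_cpu(&self) -> Option<usize> {
        if self.bitmap == 0 {
            None
        } else {
            Some(self.bitmap.trailing_zeros() as usize)
        }
    }
}

/// Hypothetical helper: pick the CPU that should receive the IRQ.
/// Returns the routed CPU if it is inside the zone, otherwise the
/// zone's first CPU, or `None` when the zone owns no CPU at all.
pub fn resolve_irq_target(zone_cpus: CpuSet, routed_cpu: usize) -> Option<usize> {
    if zone_cpus.contains(routed_cpu) {
        Some(routed_cpu)
    } else {
        zone_cpus.first_cpu()
    }
}
```

Returning an `Option` instead of calling `unwrap()` leaves the empty-zone case to the caller, which may be preferable in an interrupt path.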
| // Map PCI ECAM region into guest EPT so the guest can access PCI config space. | ||
| // The MCFG table tells the guest ECAM is at HPA 0xb0000000, which becomes | ||
| // GPA 0xb0000000 when copied into guest ACPI tables. Without this mapping, | ||
| // any PCI config access causes an EPT violation. | ||
| if self.id == 0 { | ||
| // Zone0: full ECAM identity-map | ||
| let ecam_base = 0xb000_0000usize; | ||
| let ecam_size = 0x20_0000usize; | ||
| self.gpm.insert(MemoryRegion::new_with_offset_mapper( | ||
| ecam_base as GuestPhysAddr, | ||
| ecam_base as HostPhysAddr, | ||
| ecam_size, | ||
| MemFlags::READ | MemFlags::WRITE, | ||
| ))?; | ||
| } else { | ||
| // Non-root zone: identity-map ECAM except for the page containing | ||
| // the virtio-blk device (01:00.0). That page gets an MMIO handler | ||
| // that returns 0xffffffff for the vendor-ID read, hiding the device | ||
| // so the guest never tries to access its BAR and corrupt IOMMU state. | ||
| let ecam_base = 0xb000_0000usize; | ||
| let virtio_blk_ecam_gpa = ecam_base + 0x10_0000; // bus 1, dev 0, func 0 | ||
| let ecam_page = 0x1000usize; | ||
|
|
||
| // ECAM before virtio-blk page: 0xb0000000..0xb0100000 | ||
| if virtio_blk_ecam_gpa > ecam_base { | ||
| self.gpm.insert(MemoryRegion::new_with_offset_mapper( | ||
| ecam_base as GuestPhysAddr, | ||
| ecam_base as HostPhysAddr, | ||
| virtio_blk_ecam_gpa - ecam_base, | ||
| MemFlags::READ | MemFlags::WRITE, | ||
| ))?; | ||
| } | ||
| // Virtio-blk ECAM page: MMIO-handler that hides the device | ||
| self.mmio_region_register( | ||
| virtio_blk_ecam_gpa, | ||
| ecam_page, | ||
| ecam_virtio_blk_hide_handler, | ||
| virtio_blk_ecam_gpa, | ||
| ); | ||
| // ECAM after virtio-blk page: 0xb0101000..0xb0200000 | ||
| let after_gpa = virtio_blk_ecam_gpa + ecam_page; | ||
| let ecam_end = ecam_base + 0x20_0000usize; | ||
| if after_gpa < ecam_end { | ||
| self.gpm.insert(MemoryRegion::new_with_offset_mapper( | ||
| after_gpa as GuestPhysAddr, | ||
| after_gpa as HostPhysAddr, | ||
| ecam_end - after_gpa, | ||
| MemFlags::READ | MemFlags::WRITE, | ||
| ))?; | ||
| } | ||
| } | ||
|
|
||
| // Map PCI 32-bit MMIO window so the guest can access PCI device BARs. | ||
| let pci_mmio_base = 0xC000_0000usize; | ||
| let pci_mmio_size = 0x3EB0_0000usize; // up to 0xFEB00000 | ||
| self.gpm.insert(MemoryRegion::new_with_offset_mapper( | ||
| pci_mmio_base as GuestPhysAddr, | ||
| pci_mmio_base as HostPhysAddr, | ||
| pci_mmio_size, | ||
| MemFlags::READ | MemFlags::WRITE, | ||
| ))?; | ||
|
|
||
| // Continue PCI MMIO after the virtio MMIO hole | ||
| let pci_mmio2_base = 0xFEB0_1000usize; | ||
| let pci_mmio2_size = 0xFF000usize; // ~1MB, up to IOAPIC at 0xFEC00000 | ||
| self.gpm.insert(MemoryRegion::new_with_offset_mapper( | ||
| pci_mmio2_base as GuestPhysAddr, | ||
| pci_mmio2_base as HostPhysAddr, | ||
| pci_mmio2_size, | ||
| MemFlags::READ | MemFlags::WRITE, | ||
| ))?; | ||
|
|
||
| // Map 64-bit PCI BAR window for non-root zones (zone0 maps it below too, | ||
| // but via the RAM regions which cover all of HPA). | ||
| let pci_bar64_base = 0x8_0000_0000usize; | ||
| let pci_bar64_size = 0x1000_0000usize; | ||
| self.gpm.insert(MemoryRegion::new_with_offset_mapper( | ||
| pci_bar64_base as GuestPhysAddr, | ||
| pci_bar64_base as HostPhysAddr, | ||
| pci_bar64_size, | ||
| MemFlags::READ | MemFlags::WRITE, | ||
| ))?; | ||
|
|
||
| // Map DMA memory region for non-root zones: cover the guest's I/O memory | ||
| // allocator low range (0x20000000..0xB0000000, i.e. up to ECAM) using | ||
| // HPA 0x1_0000_0000 (4GB, reserved for zone1 by zone0). | ||
| if self.id != 0 { | ||
| let dma_gpa_base = 0x2000_0000usize; | ||
| let dma_hpa_base = 0x1_0000_0000usize; | ||
| let dma_size = 0x9000_0000usize; // 2.25GB, up to ECAM at 0xB0000000 | ||
| self.gpm.try_insert(MemoryRegion::new_with_offset_mapper( | ||
| dma_gpa_base as GuestPhysAddr, | ||
| dma_hpa_base as HostPhysAddr, | ||
| dma_size, | ||
| MemFlags::READ | MemFlags::WRITE, | ||
| )); | ||
| } | ||
|
|
It is not a good idea to hard-code configuration values in code. Also, we have already added a memory mapping for the PCI config space; take a look at the code under pci.
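As one way to avoid the literal `ecam_base + 0x10_0000` in the hunk above, the ECAM address of a function can be derived from its bus/device/function numbers. The sketch below uses the standard PCIe ECAM layout (4 KiB of config space per function); the function names are hypothetical and the base address would come from the zone configuration rather than from a constant.

```rust
/// Size of one function's configuration space in ECAM.
pub const ECAM_FUNCTION_SIZE: usize = 0x1000;

/// Offset of a PCI function inside the ECAM window:
/// bus << 20 | device << 15 | function << 12.
/// Returns `None` for out-of-range device (>= 32) or function (>= 8) numbers.
pub fn ecam_offset(bus: u8, device: u8, function: u8) -> Option<usize> {
    if device >= 32 || function >= 8 {
        return None;
    }
    Some(((bus as usize) << 20) | ((device as usize) << 15) | ((function as usize) << 12))
}

/// Guest-physical address of a function's config page, given an ECAM base
/// that is assumed to be supplied by configuration.
pub fn ecam_address(ecam_base: usize, bus: u8, device: u8, function: u8) -> Option<usize> {
    ecam_offset(bus, device, function).map(|off| ecam_base + off)
}
```

With a base of 0xb000_0000, device 01:00.0 resolves to 0xb010_0000, which matches the value used in the quoted code.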
| } | ||
| IdtVector::I8042_KEYBOARD_VECTOR => {} | ||
| IdtVector::APIC_SPURIOUS_VECTOR | IdtVector::APIC_ERROR_VECTOR => {} | ||
| _ => { | ||
| if vector >= 0x20 && this_cpu_data().arch_cpu.power_on { | ||
| inject_vector(this_cpu_id(), vector, None, false); | ||
| IdtVector::APIC_SPURIOUS_VECTOR | ||
| | IdtVector::APIC_ERROR_VECTOR => {} | ||
| // programmed the LAPIC. They belong to the CURRENT zone, | ||
| // not zone0. Device interrupts (0x20-0xdf) always belong to | ||
| // zone0 and must be forwarded if they arrive on a non-zone0 CPU. | ||
| // Check if this is a LAPIC-local interrupt. | ||
| // The guest's timer vector is dynamically allocated and may be < 0xe0, | ||
| // so we also check against the tracked LAPIC timer vector. | ||
| let is_lapic_local = vector >= 0xe0 | ||
| || vector == this_cpu_data().arch_cpu.virt_lapic.virt_timer_vector as u8; | ||
| if zone_id == 0 || is_lapic_local { | ||
| inject_vector(cpu_id, vector, None, false); | ||
| } else { | ||
| // Forward device interrupt to zone0. | ||
| let zone0 = crate::zone::find_zone(0).unwrap(); | ||
| let zone0_cpu = zone0.read().cpu_set.first_cpu().unwrap_or(0); | ||
| inject_vector(zone0_cpu, vector, None, false); | ||
| } |
Non-root zones should also be able to receive vectors injected by real hardware. Sometimes we may let zone1 use real devices instead of virtio devices.
Solicey left a comment
I suggest making as few changes as possible to get Asterinas booting. You can take a look at my previous commit to see what has already been fixed, so that you do not need to add redundant fixes in your PR.
| IA32_X2APIC_APICID => { | ||
| // info!("apicid: {:x}", this_cpu_id()); | ||
| Ok(this_apic_id() as u64) | ||
| } | ||
| IA32_X2APIC_LDR => Ok(this_apic_id() as u64), // logical apic id | ||
| IA32_X2APIC_APICID => Ok(this_apic_id() as u64), | ||
| IA32_X2APIC_VERSION => Ok(0x1415), // version 0x14, max LVT entry 0x15 | ||
| IA32_X2APIC_LDR => Ok(this_apic_id() as u64), | ||
| IA32_X2APIC_SIVR => Ok(self.virt_svr as u64), | ||
| IA32_X2APIC_ISR0 | IA32_X2APIC_ISR1 | IA32_X2APIC_ISR2 | IA32_X2APIC_ISR3 | ||
| | IA32_X2APIC_ISR4 | IA32_X2APIC_ISR5 | IA32_X2APIC_ISR6 | IA32_X2APIC_ISR7 => { | ||
| // info!("isr!"); | ||
| Ok(0) | ||
| } | ||
| | IA32_X2APIC_ISR4 | IA32_X2APIC_ISR5 | IA32_X2APIC_ISR6 | IA32_X2APIC_ISR7 => Ok(0), | ||
| IA32_X2APIC_IRR0 | IA32_X2APIC_IRR1 | IA32_X2APIC_IRR2 | IA32_X2APIC_IRR3 | ||
| | IA32_X2APIC_IRR4 | IA32_X2APIC_IRR5 | IA32_X2APIC_IRR6 | IA32_X2APIC_IRR7 => { | ||
| // info!("irr!"); | ||
| Ok(0) | ||
| } | ||
| IA32_X2APIC_LVT_TIMER => Ok(self.virt_lvt_timer_bits as _), | ||
| _ => hv_result_err!(ENOSYS), | ||
| | IA32_X2APIC_IRR4 | IA32_X2APIC_IRR5 | IA32_X2APIC_IRR6 | IA32_X2APIC_IRR7 => Ok(0), | ||
| IA32_X2APIC_ESR => Ok(0), | ||
| IA32_X2APIC_LVT_TIMER => Ok(self.virt_lvt_timer_bits as u64), | ||
| IA32_X2APIC_LVT_THERMAL | IA32_X2APIC_LVT_PMI | IA32_X2APIC_LVT_LINT0 | ||
| | IA32_X2APIC_LVT_LINT1 | IA32_X2APIC_LVT_ERROR => Ok(1 << 16), // masked | ||
| IA32_X2APIC_INIT_COUNT => Ok(0), | ||
| IA32_X2APIC_CUR_COUNT => Ok(0), | ||
| IA32_X2APIC_DIV_CONF => Ok(0), | ||
| IA32_TSC_DEADLINE => Ok(0), | ||
| _ => Ok(0), // safe default for unknown MSRs |
Could you explain why we need to add more x2APIC handlers?
| /// When a non-root zone starts on a set of CPUs, ensure critical physical | ||
| /// interrupts (UART, etc.) are not routed to those CPUs. If they are, re-route | ||
| /// them to CPU 0 which stays in the root zone. Without this, zone0 can become | ||
| /// unresponsive because physical interrupts get injected into a guest that has | ||
| /// no handler for them. | ||
| pub fn ioapic_reroute_from_cpus(cpu_set: &crate::cpu_data::CpuSet) { | ||
| // Critical IRQs that the root zone needs for interactive console. | ||
| const CRITICAL_IRQS: &[u8] = &[irqs::UART_COM1_IRQ]; | ||
|
|
||
| let mut io_apic = IO_APIC.lock(); | ||
| for &irq in CRITICAL_IRQS { | ||
| // table_entry returns RedirectionTableEntry, transmute to u64 for | ||
| // bit-field manipulation. | ||
| let entry = unsafe { io_apic.table_entry(irq) }; | ||
| let raw: u64 = unsafe { core::mem::transmute(entry) }; | ||
| let dest_apic_id = raw.get_bits(56..=63) as usize; | ||
| let dest_cpu = get_cpu_id(dest_apic_id); | ||
| if cpu_set.bitmap & (1u64 << dest_cpu) != 0 { | ||
| // Re-route to CPU 0 which is always in the root zone. | ||
| let cpu0_apic_id = get_apic_id(0) as u64; | ||
| let mut new_raw = raw; | ||
| new_raw.set_bits(56..=63, cpu0_apic_id); | ||
| let new_entry = unsafe { core::mem::transmute(new_raw) }; | ||
| unsafe { io_apic.set_table_entry(irq, new_entry) }; | ||
| warn!( | ||
| "ioapic: rerouted IRQ {} from CPU {} (APIC {:#x}) to CPU 0 (APIC {:#x})", | ||
| irq, dest_cpu, dest_apic_id, cpu0_apic_id | ||
| ); | ||
| } | ||
| } | ||
| } | ||
|
|
I don't think it is necessary to handle IOAPIC rerouting here. As mentioned earlier, this issue was fixed in my last commit. You can build your own modifications on top of my fix, but please avoid fixing the same problem again with redundant code.
| // Use kernel's existing GDT at GPA 0x80014f0 | ||
| // The kernel's GDT has: | ||
| // - Selector 0x08: 64-bit code segment | ||
| // - Selector 0x10: Data segment | ||
| // - Selector 0x18: 32-bit code segment | ||
| // | ||
| // We need a TSS for VMX. Put it at GPA 0x8048000 (below stack at 0x804a000) | ||
| // DO NOT use 0x8009000 - that's kernel boot code! | ||
| let tss_gpa = 0x8048000usize; | ||
| if let Ok((tss_hpa, _, _)) = unsafe { self.gpm.page_table_query(tss_gpa) } { | ||
| let tss_ptr = tss_hpa as *mut u8; | ||
| unsafe { | ||
| // Zero out 104 bytes (32-bit TSS size) | ||
| for i in 0..104 { | ||
| core::ptr::write_volatile(tss_ptr.add(i), 0); | ||
| } | ||
| } | ||
| info!("[ZONE{}] TSS written to GPA {:#x} (HPA {:#x})", zone_id, tss_gpa, tss_hpa); | ||
| } else { | ||
| warn!("[ZONE{}] Failed to write TSS: GPA {:#x} not mapped", zone_id, tss_gpa); |
Could you explain why we are adding a TSS entry to the GDT in the Multiboot path?
Intel SDM Vol 3C §26.3.1.2 requires the TR selector to point to a valid TSS descriptor on VM entry. The multiboot2 guest starts in 32-bit protected mode and only sets up its own TSS after transitioning to 64-bit mode. A minimal blank TSS at a safe GPA is needed to satisfy the VMX check during the early boot window.
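For illustration, a TSS descriptor of the kind this reply refers to can be encoded as below. This is a sketch of the legacy 8-byte segment descriptor format, not code from the PR; `tss_descriptor` is a hypothetical helper. The access byte 0x8B (present, type 0xB: busy 32-bit TSS) corresponds to the low byte of the TR access rights 0x008B written in the quoted VMCS setup.

```rust
/// Access byte for a present, busy 32-bit TSS (type 0xB).
pub const TSS32_BUSY_PRESENT: u8 = 0x8B;

/// Encode a legacy 8-byte system-segment descriptor with byte granularity.
///
/// Layout: limit[15:0] | base[23:0] << 16 | access << 40
///         | limit[19:16] << 48 | base[31:24] << 56
/// The flags nibble (bits 52..55) is left at zero.
pub fn tss_descriptor(base: u32, limit: u32, access: u8) -> u64 {
    let base = base as u64;
    let limit = limit as u64;
    (limit & 0xFFFF)
        | ((base & 0x00FF_FFFF) << 16)
        | ((access as u64) << 40)
        | (((limit >> 16) & 0xF) << 48)
        | (((base >> 24) & 0xFF) << 56)
}
```

For a 104-byte TSS the limit is 103 (0x67), so a TSS at GPA 0x8048000 would be described by `tss_descriptor(0x0804_8000, 103, TSS32_BUSY_PRESENT)`.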
If it is necessary to place a TSS entry in the GDT, take a look at platform/x86_64/qemu/image/bootloader/boot.S, where we set up the GDT the first time we enter the guest OS.
Besides, hard-coding numeric configuration values is not a good idea, because it hurts readability and makes future maintenance more difficult. It would be better to move them into a dedicated config so the code stays cleaner and easier to manage.
Most of the redundant code was added to work around problems I hit while booting, but some of it may indeed have no effect. I will find out which parts are unnecessary. Thanks for your review!
138a5b2 to a589ab3
0cb3ac4 to dd456e1
@Solicey Hi! I removed most of the redundant code and the debug output/comments. I also avoided hard-coding.
| @@ -536,7 +538,7 @@ pub fn print_memory_map() { | |||
|
|
|||
| /// copy kernel modules to the right place | |||
| pub fn module_init(info_addr: usize) { | |||
| println!("module_init"); | |||
| info!("module_init"); | |||
The logger has not been initialized at this point, so we had better keep using println.
| fn setup_multiboot_guest_state(&mut self, entry: GuestPhysAddr) -> HvResult { | ||
| let cr0_fixed0 = Msr::IA32_VMX_CR0_FIXED0.read(); | ||
| let cr0_fixed1 = Msr::IA32_VMX_CR0_FIXED1.read(); | ||
| let mut cr0_guest = Cr0Flags::PROTECTED_MODE_ENABLE.bits(); | ||
| let cr0_fixed1_excluding_pe_pg = | ||
| cr0_fixed1 | Cr0Flags::PAGING.bits() | Cr0Flags::PROTECTED_MODE_ENABLE.bits(); | ||
| let cr0_fixed0_excluding_pe_pg = | ||
| cr0_fixed0 & !(Cr0Flags::PAGING.bits() | Cr0Flags::PROTECTED_MODE_ENABLE.bits()); | ||
| cr0_guest = (cr0_guest | cr0_fixed0_excluding_pe_pg) & cr0_fixed1_excluding_pe_pg; | ||
|
|
||
| let cr4_fixed0 = Msr::IA32_VMX_CR4_FIXED0.read(); | ||
| let cr4_fixed1 = Msr::IA32_VMX_CR4_FIXED1.read(); | ||
| let cr4_guest = (cr4_fixed0 & cr4_fixed1) as usize; | ||
|
|
||
| VmcsGuestNW::CR0.write(cr0_guest as usize)?; | ||
| VmcsControlNW::CR0_READ_SHADOW.write(cr0_guest as usize)?; | ||
| let cr0_mask = Cr0Flags::CACHE_DISABLE.bits() | ||
| | Cr0Flags::NOT_WRITE_THROUGH.bits() | ||
| | Cr0Flags::NUMERIC_ERROR.bits() | ||
| | Cr0Flags::EXTENSION_TYPE.bits(); | ||
| VmcsControlNW::CR0_GUEST_HOST_MASK.write(cr0_mask as usize)?; | ||
|
|
||
| VmcsGuestNW::CR3.write(0)?; | ||
|
|
||
| VmcsGuestNW::CR4.write(cr4_guest)?; | ||
| VmcsControlNW::CR4_READ_SHADOW.write(cr4_guest)?; | ||
| VmcsControlNW::CR4_GUEST_HOST_MASK | ||
| .write(Cr4Flags::VIRTUAL_MACHINE_EXTENSIONS.bits() as usize)?; | ||
|
|
||
| // CS: 32-bit code at selector 0x18 (kernel's GDT) | ||
| VmcsGuestNW::CS_BASE.write(0)?; | ||
| VmcsGuest32::CS_LIMIT.write(0xFFFFFFFF)?; | ||
| VmcsGuest16::CS_SELECTOR.write(0x18)?; | ||
| VmcsGuest32::CS_ACCESS_RIGHTS.write(0xC09B)?; | ||
|
|
||
| // DS, ES, SS: data at selector 0x10 | ||
| VmcsGuestNW::DS_BASE.write(0)?; | ||
| VmcsGuest32::DS_LIMIT.write(0xFFFFFFFF)?; | ||
| VmcsGuest16::DS_SELECTOR.write(0x10)?; | ||
| VmcsGuest32::DS_ACCESS_RIGHTS.write(0xC093)?; | ||
| VmcsGuestNW::ES_BASE.write(0)?; | ||
| VmcsGuest32::ES_LIMIT.write(0xFFFFFFFF)?; | ||
| VmcsGuest16::ES_SELECTOR.write(0x10)?; | ||
| VmcsGuest32::ES_ACCESS_RIGHTS.write(0xC093)?; | ||
| VmcsGuestNW::SS_BASE.write(0)?; | ||
| VmcsGuest32::SS_LIMIT.write(0xFFFFFFFF)?; | ||
| VmcsGuest16::SS_SELECTOR.write(0x10)?; | ||
| VmcsGuest32::SS_ACCESS_RIGHTS.write(0xC093)?; | ||
|
|
||
| // FS, GS: unusable | ||
| VmcsGuestNW::FS_BASE.write(0)?; | ||
| VmcsGuest32::FS_LIMIT.write(0)?; | ||
| VmcsGuest16::FS_SELECTOR.write(0)?; | ||
| VmcsGuest32::FS_ACCESS_RIGHTS.write(0x10000)?; | ||
| VmcsGuestNW::GS_BASE.write(0)?; | ||
| VmcsGuest32::GS_LIMIT.write(0)?; | ||
| VmcsGuest16::GS_SELECTOR.write(0)?; | ||
| VmcsGuest32::GS_ACCESS_RIGHTS.write(0x10000)?; | ||
|
|
||
| // TR: TSS at MB2_TSS_GPA | ||
| VmcsGuestNW::TR_BASE.write(MB2_TSS_GPA)?; | ||
| VmcsGuest32::TR_LIMIT.write(MB2_TSS_SIZE as u32 - 1)?; | ||
| VmcsGuest16::TR_SELECTOR.write((MB2_GDT_TSS_ENTRY * 8) as u16)?; | ||
| VmcsGuest32::TR_ACCESS_RIGHTS.write(0x008B)?; | ||
|
|
||
| // LDTR: unusable | ||
| VmcsGuestNW::LDTR_BASE.write(0)?; | ||
| VmcsGuest32::LDTR_LIMIT.write(0)?; | ||
| VmcsGuest16::LDTR_SELECTOR.write(0)?; | ||
| VmcsGuest32::LDTR_ACCESS_RIGHTS.write(0x10000)?; | ||
|
|
||
| VmcsGuestNW::GDTR_BASE.write(MB2_GDT_BASE_GPA)?; | ||
| VmcsGuest32::GDTR_LIMIT.write(((MB2_GDT_TSS_ENTRY + 2) * 8 - 1) as u32)?; | ||
| VmcsGuestNW::IDTR_BASE.write(0)?; | ||
| VmcsGuest32::IDTR_LIMIT.write(0xffff)?; | ||
| VmcsGuest32::IDTR_LIMIT.write(0)?; | ||
|
|
||
| VmcsGuestNW::DR7.write(0x400)?; | ||
| VmcsGuestNW::RSP.write(rsp)?; | ||
| VmcsGuestNW::RSP.write(MB2_STACK_GPA)?; | ||
| VmcsGuestNW::RIP.write(entry)?; | ||
| VmcsGuestNW::RFLAGS.write(0x2)?; | ||
| VmcsGuestNW::PENDING_DBG_EXCEPTIONS.write(0)?; | ||
| VmcsGuestNW::IA32_SYSENTER_ESP.write(0)?; | ||
| VmcsGuestNW::IA32_SYSENTER_EIP.write(0)?; | ||
| VmcsGuest32::IA32_SYSENTER_CS.write(0)?; | ||
|
|
||
| VmcsGuest32::INTERRUPTIBILITY_STATE.write(0)?; | ||
| VmcsGuest32::ACTIVITY_STATE.write(0)?; | ||
| VmcsGuest32::VMX_PREEMPTION_TIMER_VALUE.write(0)?; | ||
|
|
||
| VmcsGuest64::LINK_PTR.write(u64::MAX)?; // SDM Vol. 3C, Section 24.4.2 | ||
| VmcsGuest64::LINK_PTR.write(u64::MAX)?; | ||
| VmcsGuest64::IA32_DEBUGCTL.write(0)?; | ||
| VmcsGuest64::IA32_PAT.write(Msr::IA32_PAT.read())?; | ||
| VmcsGuest64::IA32_EFER.write(0)?; | ||
|
|
||
| // for AP start up, set CS_BASE to entry address, and RIP to 0. | ||
| if self.power_on && !this_cpu_data().boot_cpu { | ||
| VmcsGuestNW::RIP.write(0)?; | ||
| VmcsGuestNW::CS_BASE.write(entry)?; | ||
| } | ||
| info!( | ||
| "[MULTIBOOT] 32-bit guest: CR0={:#x}, CR4={:#x}, RIP={:#x}, GDT={:#x}", | ||
| cr0_guest, cr4_guest, entry, MB2_GDT_BASE_GPA | ||
| ); | ||
|
|
||
| Ok(()) | ||
| } |
I suggest putting this part of the code in an ASM file. You can find a reference in platform/x86_64/qemu/image/bootloader/boot.S. By setting boot_filepath and boot_load_paddr inside arch_config, you can load this boot code into memory when booting zone1. By setting entry_point, you can point the guest entry at this file.
This way, we make as few changes as possible to our main code while introducing Multiboot2.
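For readers following the quoted setup_multiboot_guest_state, the CR0/CR4 handling reduces to the usual VMX fixed-bit rule: bits set in the FIXED0 MSR must be 1, and bits clear in the FIXED1 MSR must be 0. The sketch below restates that rule as pure functions. The names are hypothetical, and the PE/PG relaxation assumes the unrestricted-guest control is in use, which is my reading of the quoted code rather than something the PR states.

```rust
/// CR0.PE (protected mode enable) and CR0.PG (paging) bit masks.
pub const CR0_PE: u64 = 1 << 0;
pub const CR0_PG: u64 = 1 << 31;

/// Apply the VMX fixed-bit constraints to a desired control-register value:
/// force on every bit set in `fixed0`, force off every bit clear in `fixed1`.
pub fn apply_fixed_bits(desired: u64, fixed0: u64, fixed1: u64) -> u64 {
    (desired | fixed0) & fixed1
}

/// Guest CR0 when PE and PG are exempt from the fixed-bit rule
/// (assumed here to correspond to an unrestricted guest).
pub fn guest_cr0_relaxed(desired: u64, fixed0: u64, fixed1: u64) -> u64 {
    let relaxed = CR0_PE | CR0_PG;
    apply_fixed_bits(desired, fixed0 & !relaxed, fixed1 | relaxed)
}
```

With illustrative MSR values FIXED0 = 0x8000_0021 and FIXED1 = 0xFFFF_FFFF, a request for protected mode without paging yields PE plus the mandatory NE bit, and paging stays off.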
| } | ||
|
|
||
| let res = match exit_info.exit_reason { | ||
| VmxExitReason::EXCEPTION_NMI => handle_exception(arch_cpu, &exit_info), |
Could you explain why we are adding an NMI handler?
- Add mb2_boot.S bootloader for 16-bit to 32-bit mode transition
- Bootloader sets up GDT with TSS and jumps to kernel entry
- Pass kernel entry via ESI to bootloader on VM entry
- Add multiboot_info_paddr/multiboot_enabled to HvArchZoneConfig
- Remove unused print_memory_map
- Add v_bus/v_device/v_function to HvPciDevConfig

- Add S2PT violation handler via MMIO dispatch
- Add GS_BASE/FS_BASE MSR read/write support

- Add NULL guard for VIRTIO_BRIDGE res_agent

- Adjust zone0 memory layout for zone1 coexistence
- Update virtio configuration for multi-zone setup
Summary
Adds Multiboot2 protocol support to boot Asterinas OS as a zone1 guest, using a minimal ASM bootloader. Minimal changes to core code — all x86-specific logic stays under arch/x86_64/.
Changes
Multiboot2 Boot Support (Commit 1: feat)
New mb2_boot.S bootloader: 16→32-bit transition with GDT + TSS setup, jumps to kernel entry
Loaded via boot_filepath in zone1 config, with GPA→HPA offset translation in hvisor-tool
ELF segment loading with kernel_entry_gpa passed to bootloader via ESI
multiboot_info_paddr/multiboot_enabled added to HvArchZoneConfig (x86-specific)
Multiboot path gated behind multiboot_enabled flag — Linux zone1 paths unaffected
Removed unused print_memory_map
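The GPA→HPA offset translation mentioned above can be sketched as a lookup over the zone's memory regions. This is an illustration under assumed types, not the hvisor-tool code: `MemRegion` and `gpa_to_hpa` are hypothetical names, and each region is assumed to map a contiguous GPA range onto a contiguous HPA range.

```rust
/// Hypothetical description of one guest RAM region.
#[derive(Clone, Copy)]
pub struct MemRegion {
    pub gpa_start: u64,
    pub hpa_start: u64,
    pub size: u64,
}

/// Translate a guest-physical address to a host-physical address by
/// finding the region that contains it and applying that region's offset.
/// Returns `None` when no region covers `gpa`.
pub fn gpa_to_hpa(regions: &[MemRegion], gpa: u64) -> Option<u64> {
    regions.iter().find_map(|r| {
        // Offset of `gpa` inside the region; `None` if gpa is below it.
        let off = gpa.checked_sub(r.gpa_start)?;
        if off < r.size {
            r.hpa_start.checked_add(off)
        } else {
            None
        }
    })
}
```

A loader would call this once per ELF segment, using the segment's physical load address as `gpa`, and skip or reject segments for which the lookup returns `None`.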
Exception Handling (Commit 2: fix)
S2PT (EPT) violation handler via MMIO dispatch
GS_BASE/FS_BASE MSR read/write support for 64-bit guests
x2APIC MSR fallback for unrecognized registers in x2APIC range
TSC frequency reporting via CPUID
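To make the x2APIC fallback item concrete, here is a minimal sketch of limiting a "return 0" default to the architectural x2APIC MSR range (0x800 through 0x8FF) instead of applying it to every unknown MSR. The function and error names are hypothetical; only the range and the offset-to-MSR mapping are architectural.

```rust
/// First and last MSR of the architectural x2APIC range.
pub const X2APIC_MSR_FIRST: u32 = 0x800;
pub const X2APIC_MSR_LAST: u32 = 0x8FF;

/// Error for MSR reads that no handler claims (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum MsrReadError {
    Unhandled(u32),
}

/// True if `msr` lies in the x2APIC MSR range.
pub fn is_x2apic_msr(msr: u32) -> bool {
    (X2APIC_MSR_FIRST..=X2APIC_MSR_LAST).contains(&msr)
}

/// Map an xAPIC MMIO register offset to its x2APIC MSR number.
pub fn xapic_offset_to_msr(offset: u32) -> u32 {
    X2APIC_MSR_FIRST + (offset >> 4)
}

/// Fallback for reads not matched by a specific handler: x2APIC
/// registers read as zero, everything else is reported as unhandled.
pub fn read_msr_fallback(msr: u32) -> Result<u64, MsrReadError> {
    if is_x2apic_msr(msr) {
        Ok(0)
    } else {
        Err(MsrReadError::Unhandled(msr))
    }
}
```

Keeping the error path for MSRs outside the range preserves the ability to notice genuinely unsupported accesses during bring-up.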
Virtio Robustness (Commit 3: fix)
NULL guard for VIRTIO_BRIDGE.res_agent() — returns gracefully instead of panic
Struct & Config Fixes (Commit 4: feat)
Added v_bus/v_device/v_function to HvPciDevConfig to match C side (fixes 128-byte zone_config size mismatch)
Bumped CONFIG_MAGIC_VERSION to 0x7 on both C and Rust sides
Zone0 memory layout and virtio config adjustments for zone1 coexistence
Example zone1 config: zone1-asterinas.json
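A size mismatch between the C and Rust views of a shared config struct can be caught at compile time. The sketch below shows the pattern only: `PciDevConfigSketch` and its field layout are hypothetical and do not reproduce the real HvPciDevConfig, and the expected size is whatever the C side reports for the real struct.

```rust
use core::mem::size_of;

/// Hypothetical stand-in for a config struct shared with C code.
/// `#[repr(C)]` makes the field order and padding follow the C ABI.
#[repr(C)]
#[derive(Clone, Copy, Default)]
pub struct PciDevConfigSketch {
    pub domain: u8,
    pub bus: u8,
    pub device: u8,
    pub function: u8,
    pub v_bus: u8,
    pub v_device: u8,
    pub v_function: u8,
    pub _pad: u8,
    pub dev_type: u32,
}

/// Size the C side is assumed to report for this illustrative layout.
pub const EXPECTED_SIZE: usize = 12;

// Compile-time check: the build fails if the Rust layout drifts
// from the size expected by the C side.
const _: () = assert!(size_of::<PciDevConfigSketch>() == EXPECTED_SIZE);
```

Pairing such a check with the CONFIG_MAGIC_VERSION bump means a stale tool binary is rejected at runtime and a stale struct definition is rejected at build time.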
Requires
yydawx/hvisor-tool#98 — Multiboot2 loading with GPA→HPA offset translation