sandlock-oci: single-sandbox OCI exec via sandlock-init #110
Merged
Signed-off-by: Cong Wang <cwang@multikernel.io>
…through Signed-off-by: Cong Wang <cwang@multikernel.io>
19500ea to efaaa76
… memfd init Signed-off-by: Cong Wang <cwang@multikernel.io>
Summary
Re-architects OCI `exec` so the container workload and every exec'd process run in one sandbox: one seccomp filter, one notify listener, one supervisor with shared runtime state. This replaces the clone-per-exec model (PR #109), where each exec'd process was an independent clone of the policy with its own supervisor (identical rules, but a separate runtime world: independent port remapping, no shared process view).
Supersedes #109.
Why a clone is not enough
A cloned sandbox enforces identical rules but does not share the supervisor's per-sandbox runtime state. Two independent supervisors mean port remapping maps the main process's bind and an exec'd process's connect independently (so the exec'd process cannot reach a service the main process runs), and there is no single supervisor that knows the container's full process set. "Same sandbox" requires all processes to share one seccomp listener, and a listener is shared only by fork-inheritance from the installing task.
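The port-remapping consequence can be illustrated with a toy model. This is a minimal sketch, not sandlock's supervisor: the `Supervisor` type, its `on_bind`/`on_connect` methods, and the port numbers are all hypothetical, and it assumes remapping is a per-supervisor table from the port a confined process asks for to the host port it actually gets.

```rust
use std::collections::HashMap;

/// Toy stand-in for one supervisor's per-sandbox runtime state.
/// Hypothetical type for illustration; not the real sandlock supervisor.
struct Supervisor {
    next_host_port: u16,
    /// Port requested inside the sandbox -> host port actually used.
    port_map: HashMap<u16, u16>,
}

impl Supervisor {
    fn new(first_host_port: u16) -> Self {
        Supervisor {
            next_host_port: first_host_port,
            port_map: HashMap::new(),
        }
    }

    /// A confined process binds `virt_port`; record and return the host port.
    fn on_bind(&mut self, virt_port: u16) -> u16 {
        if let Some(&host_port) = self.port_map.get(&virt_port) {
            return host_port;
        }
        let host_port = self.next_host_port;
        self.next_host_port += 1;
        self.port_map.insert(virt_port, host_port);
        host_port
    }

    /// A confined process connects to `virt_port`; translate it only if
    /// this supervisor has seen the matching bind.
    fn on_connect(&self, virt_port: u16) -> Option<u16> {
        self.port_map.get(&virt_port).copied()
    }
}
```

In this model, a second `Supervisor` created for an exec'd process has an empty `port_map`, so its `on_connect(8080)` returns `None` even after the main process's supervisor handled `on_bind(8080)`. Routing both calls through one `Supervisor` is the only way the lookup succeeds, which is the property the single-sandbox design provides.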
Architecture
The container's confined PID-1 is a small static `sandlock-init`, launched from a `memfd` (via a new core `execveat(AT_EMPTY_PATH)` primitive, so it never needs to exist in the rootfs). `sandlock-init` reads a control channel and fork-execs the workload (`RunMain`) and each exec'd command (`RunExec`, whose stdin/stdout/stderr arrive over `SCM_RIGHTS`). Because the workload and exec'd processes are fork-children of one `sandlock-init`, they inherit its one seccomp filter and Landlock ruleset and are serviced by the one supervisor. There is exactly one confined-process launch per container, so "same sandbox" is structural, not asserted.
The daemon launches `sandlock-init`, hosts the single supervisor, and relays OCI verbs (`start` -> `RunMain`, `exec` -> `RunExec`, `kill --all` / `delete --force` -> group teardown via the daemon, since `sandlock-init` is the group leader) over an `InitLink` demux. `state.pid` remains the workload pid, so `kill`/`state`/liveness are unchanged for callers.
What changed
- `sandlock-core`: `Sandbox.exec_fd` -> `execveat(AT_EMPTY_PATH)` to launch the confined process from an fd; an `AT_EMPTY_PATH`-only-when-pathname-empty guard in the chroot exec handler (so a confined process cannot exec a host binary outside the virtual root); `checkpoint_pid()` to capture a specific fork-descendant.
- `sandlock-init` (new crate): a static, synchronous PID-1 agent, embedded into `sandlock-oci` via `include_bytes!` and run from a memfd.
- `sandlock-oci`: the daemon launches `sandlock-init` and relays OCI verbs; the CLI `exec` surface; group teardown routed through the daemon; `checkpoint` re-targeted to the workload.
Non-TTY only: `-t`/`--console-socket` are accepted for runc compatibility but ignored (no PTY yet).
Testing
- `oci_exec_same_sandbox`: create+start a container, `exec` a command into it, assert it ran confined inside the container; passed 11/11 stress runs.
- `oci_checkpoint_of_running_container` passes with the workload (a fork-child of init) captured.
- `AT_EMPTY_PATH` passthrough to empty-pathname-only.
Known follow-ups (not blocking exec)
- `kill --all`/`delete` (currently fire-and-forget)
- bounds-check argv[0] vs PATH_MAX in the exec buffers
- EAGAIN loop in `send_with_fds`
A hypothetical kernel `SECCOMP_FILTER_FLAG_USE_LISTENER` (bind a new filter to an existing listener) would let the daemon spawn exec'd processes directly and retire `sandlock-init`; it does not exist in mainline, so `sandlock-init` is the portable mechanism today.
🤖 Generated with Claude Code
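The `AT_EMPTY_PATH`-only-when-pathname-empty guard listed under What changed can be sketched as a pure predicate. This is an illustration under assumptions, not the code in `sandlock-core`: `ExecRoute` and `route_execveat` are hypothetical names, and the fallback branch (treating every other call as a pathname exec resolved inside the virtual root) is an assumed behavior. The only fact relied on is that Linux defines `AT_EMPTY_PATH` as `0x1000` and honors it only when the pathname is the empty string.

```rust
/// Linux value of AT_EMPTY_PATH from <fcntl.h>.
const AT_EMPTY_PATH: i32 = 0x1000;

/// Hypothetical routing decision for an intercepted execveat call.
#[derive(Debug, PartialEq)]
enum ExecRoute {
    /// Execute the file the fd already refers to (the memfd launch case).
    PassThroughFd,
    /// Assumed fallback: handle as a pathname exec inside the virtual root.
    ResolveInVirtualRoot,
}

/// Allow fd passthrough only when AT_EMPTY_PATH is set AND the pathname is
/// empty. A non-empty pathname with the flag set must not bypass path
/// resolution, otherwise it could name a host binary outside the root.
fn route_execveat(pathname: &[u8], flags: i32) -> ExecRoute {
    let empty_path_flag = (flags & AT_EMPTY_PATH) != 0;
    if empty_path_flag && pathname.is_empty() {
        ExecRoute::PassThroughFd
    } else {
        ExecRoute::ResolveInVirtualRoot
    }
}
```

Checking the flag alone would not be enough: a caller could set `AT_EMPTY_PATH` and still pass a real path, so the predicate requires both conditions before skipping path resolution.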