Skip to content

feat: S3 envelope encryption for data-at-rest protection#459

Closed
GenerQAQ wants to merge 2 commits intodevfrom
feat/s3-envelope-encryption
Closed

feat: S3 envelope encryption for data-at-rest protection#459
GenerQAQ wants to merge 2 commits intodevfrom
feat/s3-envelope-encryption

Conversation

@GenerQAQ
Copy link
Copy Markdown
Contributor

Why we need this PR?

Protect sensitive data stored in S3 (user files, session messages, attachments) with envelope encryption. Direct S3 bucket access only exposes ciphertext; decryption requires either a user's project API key or the admin master key.

Describe your solution

Implement AES-256-GCM envelope encryption with a two-tier key hierarchy:

  • DEK (Data Encryption Key): random 32B per S3 object, encrypts the data
  • KEK (Key Encryption Key): derived via HKDF from either user API key or admin master key, wraps the DEK
  • Both wrapped DEKs stored in S3 object metadata, enabling decryption by either user or admin
  • File downloads changed from presigned URL to API-proxied streaming (server-side decryption)
  • Backward compatible: unencrypted legacy objects still readable (no enc-algo metadata = passthrough)

Implementation Tasks

  • Crypto package (internal/infra/crypto/) with HKDF, AES-256-GCM wrap/unwrap
  • Config: encryption.master_key and encryption.enabled
  • DI container injection of EncryptionService
  • S3 upload paths encrypt data + store wrapped DEKs in metadata (Go API & Python Core)
  • S3 download paths detect encryption + decrypt transparently (Go API & Python Core)
  • Auth middleware derives user KEK from API key, injects into gin context
  • Replace presigned URL artifact/skill downloads with API proxy streaming
  • Update TS SDK: remove presigned URL fetch, use API response body
  • Update PY SDK: remove presigned URL fetch, use API response body
  • Key rotation endpoint (Phase 7 — deferred)
  • Admin dashboard adaptation (Phase 5 — separate repo)

Impact Areas

  • Client SDK (Python)
  • Client SDK (TypeScript)
  • Core Service
  • API Server
  • Dashboard
  • CLI Tool
  • Documentation
  • Other: ...

Checklist

  • Open your pull request against the dev branch.
  • All tests pass in available continuous integration systems (e.g., GitHub Actions).
  • Tests are added or modified as needed to cover code changes.

🤖 Generated with Claude Code

Implement envelope encryption (AES-256-GCM) for all S3-stored data including
user files, session messages, and attachments. Each object gets a unique DEK
wrapped by both user KEK (derived from API key) and admin KEK (derived from
master key), enabling per-project key rotation without re-encrypting data.

Key changes:
- Add crypto package with HKDF key derivation and AES-256-GCM envelope encryption
- Modify S3 upload/download paths in both Go API and Python Core to encrypt/decrypt
- Auth middleware derives user KEK from API key and injects into gin context
- Replace presigned URL downloads with API-proxied streaming (server-side decryption)
- Update TS/PY SDKs to use new API proxy download instead of presigned URLs
- Backward compatible: unencrypted legacy objects still readable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@GenerQAQ GenerQAQ force-pushed the feat/s3-envelope-encryption branch from d2e24dd to 5805c1c Compare March 19, 2026 07:45
@GenerQAQ
Copy link
Copy Markdown
Contributor Author

Code review

Found 3 issues:

  1. Authorization bypass in DownloadSessionAsset — the handler accepts an arbitrary s3_key query parameter and passes it directly to s3.DownloadFile without verifying the key belongs to the authenticated project. The session_id path param is parsed but never used. Every other session endpoint validates project ownership via the auth context. Any valid API key can download any S3 object in the bucket.

func (h *SessionHandler) DownloadSessionAsset(c *gin.Context) {
s3Key := c.Query("s3_key")
if s3Key == "" {
c.JSON(http.StatusBadRequest, serializer.ParamErr("", errors.New("s3_key is required")))
return
}
content, err := h.svc.DownloadAsset(c.Request.Context(), s3Key)
if err != nil {
c.JSON(http.StatusInternalServerError, serializer.Err(http.StatusInternalServerError, "download asset failed", err))
return
}
c.Data(http.StatusOK, "application/octet-stream", content)
}
// SessionFlush godoc
//
// @Summary Flush session

  1. Content-addressed dedup breaks with per-user encryptionuploadWithDedup hashes the plaintext to produce sumHex and checks for existing S3 objects with that hash. The dedup early return (line 237) happens before encryptAndMergeMetadata (line 270). When two different users upload identical plaintext, user B's upload matches user A's encrypted object. The returned metadata has user A's wrapped DEK in enc-dek-user, so user B cannot decrypt. The admin KEK path works, but user-key isolation is broken for deduplicated objects.

// Check for existing object with pagination support
listInput := &s3.ListObjectsV2Input{
Bucket: &u.Bucket,
Prefix: &keyPrefix,
}
var continuationToken *string
for {
listInput.ContinuationToken = continuationToken
result, err := u.Client.ListObjectsV2(ctx, listInput)
if err != nil {
break
}
if result.Contents != nil {
for _, obj := range result.Contents {
if obj.Key != nil && strings.Contains(*obj.Key, sumHex) {
if headResult, herr := u.Client.HeadObject(ctx, &s3.HeadObjectInput{
Bucket: &u.Bucket,
Key: obj.Key,
}); herr == nil {
return &model.Asset{
Bucket: u.Bucket,
S3Key: *obj.Key,
ETag: cleanETag(*headResult.ETag),
SHA256: sumHex,
MIME: contentType,
SizeB: aws.ToInt64(headResult.ContentLength),
}, nil
}
}
}
}
// Check if there are more pages
if !aws.ToBool(result.IsTruncated) {
break
}
continuationToken = result.NextContinuationToken
}

  1. Binary file downloads unconditionally proxy through API serverGetFile in agentSkillsService replaced the presigned-URL path for non-text files with DownloadRawContent, which downloads the entire file to the API server and returns it inline as base64. This happens regardless of whether encryption is enabled. Large binary files that were previously served efficiently via presigned S3 URLs now consume API server memory and bandwidth for all deployments, including those with encryption disabled (the default).

} else {
// For non-text files: try downloading raw content (works for both encrypted and non-encrypted)
rawContent, rawMIME, err := s.artifactSvc.DownloadRawContent(ctx, artifact)
if err != nil {
return nil, fmt.Errorf("failed to download file content: %w", err)
}
output.RawContent = rawContent
output.ContentMIME = rawMIME
}

Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

- Add project ownership check in DownloadSessionAsset to prevent
  unauthorized access to other projects' S3 objects via arbitrary s3_key
- Skip content-addressed dedup when encryption is enabled to avoid
  cross-user DEK conflicts (different users need different wrapped DEKs)
- Restore presigned URL path for binary files when encryption is disabled,
  only proxy through API server when encryption requires server-side decrypt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@GenerQAQ GenerQAQ marked this pull request as ready for review March 19, 2026 09:34
@GenerQAQ GenerQAQ requested a review from a team as a code owner March 19, 2026 09:34
@GenerQAQ
Copy link
Copy Markdown
Contributor Author

Superseded by new PR from fix/encryption-review-issues branch (includes merge with latest dev + admin config fix)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant