From 3111811687761a0d20f81a76e6c2a826f3768694 Mon Sep 17 00:00:00 2001 From: Min Lim Date: Wed, 27 May 2026 15:56:01 -0700 Subject: [PATCH] Add initial draft of claude skill files Signed-off-by: Min Lim --- .claude/skills/architecture/SKILL.md | 111 +++++++++ .claude/skills/arm-support/SKILL.md | 162 ++++++++++++ .claude/skills/ci-status/SKILL.md | 90 +++++++ .claude/skills/debug/SKILL.md | 79 ++++++ .claude/skills/dependency-update/SKILL.md | 160 ++++++++++++ .claude/skills/docker-troubleshoot/SKILL.md | 105 ++++++++ .claude/skills/heartbeat/SKILL.md | 182 ++++++++++++++ .claude/skills/java-support/SKILL.md | 139 +++++++++++ .claude/skills/lint/SKILL.md | 108 ++++++++ .claude/skills/new-profiler/SKILL.md | 261 ++++++++++++++++++++ .claude/skills/pr-create/SKILL.md | 73 ++++++ .claude/skills/review-code/SKILL.md | 65 +++++ .claude/skills/test/SKILL.md | 152 ++++++++++++ .claude/skills/version-release/SKILL.md | 127 ++++++++++ 14 files changed, 1814 insertions(+) create mode 100644 .claude/skills/architecture/SKILL.md create mode 100644 .claude/skills/arm-support/SKILL.md create mode 100644 .claude/skills/ci-status/SKILL.md create mode 100644 .claude/skills/debug/SKILL.md create mode 100644 .claude/skills/dependency-update/SKILL.md create mode 100644 .claude/skills/docker-troubleshoot/SKILL.md create mode 100644 .claude/skills/heartbeat/SKILL.md create mode 100644 .claude/skills/java-support/SKILL.md create mode 100644 .claude/skills/lint/SKILL.md create mode 100644 .claude/skills/new-profiler/SKILL.md create mode 100644 .claude/skills/pr-create/SKILL.md create mode 100644 .claude/skills/review-code/SKILL.md create mode 100644 .claude/skills/test/SKILL.md create mode 100644 .claude/skills/version-release/SKILL.md diff --git a/.claude/skills/architecture/SKILL.md b/.claude/skills/architecture/SKILL.md new file mode 100644 index 000000000..1ba137061 --- /dev/null +++ b/.claude/skills/architecture/SKILL.md @@ -0,0 +1,111 @@ +--- +name: architecture +description: 
Understand gProfiler architecture and codebase structure. Use when the user asks how gProfiler works, wants to understand the codebase, or needs architectural guidance. +context: fork +user-invocable: true +--- + +## gProfiler Architecture Overview + +### High-Level Architecture + +``` +┌─────────────────────────────────────────────────────────────┐ +│ gprofiler/main.py │ +│ (Orchestration Layer) │ +├─────────────────────────────────────────────────────────────┤ +│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌────────┐│ +│ │ perf │ │ Java │ │ Python │ │ Ruby │ │ .NET ││ +│ │profiler │ │profiler │ │profiler │ │profiler │ │profiler││ +│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ └───┬────┘│ +│ └──────────┴──────────┴──────────┴───────────┘ │ +│ ▼ │ +│ gprofiler/merge.py │ +│ (Profile Data Aggregation) │ +├─────────────────────────────────────────────────────────────┤ +│ Output Layer │ +│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │ +│ │ Flamegraph │ │ Upload │ │ Local Output │ │ +│ │ (HTML) │ │ (Studio) │ │ (collapsed) │ │ +│ └─────────────┘ └─────────────┘ └─────────────────────┘ │ +└─────────────────────────────────────────────────────────────┘ +``` + +### Key Components + +#### 1. Profiler Registry (`gprofiler/profilers/registry.py`) +- Decorator-based profiler registration +- Runtime discovery of available profilers +- Configuration-based profiler selection + +#### 2. Profiler Base (`gprofiler/profilers/profiler_base.py`) +- Abstract base class for all profilers +- Lifecycle: `start()` → `snapshot()` → `stop()` +- Common utilities for process discovery + +#### 3. 
Individual Profilers (`gprofiler/profilers/*.py`) + +| Profiler | Backend Tool | Key Features | +|----------|--------------|--------------| +| `perf.py` | Linux perf | System-wide, kernel stacks | +| `java.py` | async-profiler | JVM attach, allocation profiling | +| `python.py` | py-spy | No instrumentation needed | +| `python_ebpf.py` | PyPerf | eBPF-based, lower overhead | +| `ruby.py` | rbspy | Ruby VM sampling | +| `php.py` | phpspy | PHP process profiling | +| `dotnet.py` | dotnet-trace | .NET Core/5+ support | +| `node.py` | perf | V8 JavaScript profiling | + +#### 4. Merge Layer (`gprofiler/merge.py`) +- Combines samples from multiple profilers +- Handles symbol resolution +- Produces unified stack traces + +#### 5. Metadata Collection (`gprofiler/metadata/`) +- `application_identifiers.py` - Extracts app names from processes +- `system_metadata.py` - Collects host information +- Enriches profiles with context + +### Data Flow + +``` +1. Process Discovery + └── Scan /proc for target processes + +2. Profiler Selection + └── Match processes to appropriate profilers + +3. Sampling + └── Each profiler collects stacks independently + +4. Aggregation + └── merge.py combines all samples + +5. Output + └── Generate flamegraph or upload to Studio +``` + +### Key Files to Understand + +| File | Lines | Purpose | +|------|-------|---------| +| `main.py` | ~1500 | Entry point, CLI, orchestration | +| `profilers/perf.py` | ~500 | Core perf integration | +| `profilers/java.py` | ~1800 | Complex JVM profiling | +| `merge.py` | ~400 | Profile aggregation | +| `utils/perf_process.py` | ~200 | perf subprocess management | + +### Extension Points + +1. **Add new profiler**: Implement `ProfilerBase`, use `@register_profiler` +2. **Add metadata**: Extend `application_identifiers.py` +3. **New output format**: Modify `main.py` output handling +4. **New deployment**: Add to `deploy/` directory + +### Instructions + +When user asks about architecture: +1. 
Start with high-level overview above +2. Dive into specific component if asked +3. Reference actual code files with line numbers +4. Explain data flow through the system diff --git a/.claude/skills/arm-support/SKILL.md b/.claude/skills/arm-support/SKILL.md new file mode 100644 index 000000000..5d3a8bbbb --- /dev/null +++ b/.claude/skills/arm-support/SKILL.md @@ -0,0 +1,162 @@ +--- +name: arm-support +description: Debug and fix ARM/Aarch64 compatibility issues. Use when the user encounters ARM-specific bugs or needs to add ARM support for a feature. +--- + +## ARM/Aarch64 Support Guide + +**Context from history:** ARM issues are recurring (6+ commits fixing ARM-specific bugs in 3 years). + +### Architecture Support Status + +| Runtime | x86_64 | Aarch64 | +|---------|--------|---------| +| perf (native, Golang) | ✅ | ✅ | +| Java (async-profiler) | ✅ | ✅ | +| Python (py-spy) | ✅ | ✅ | +| Python (PyPerf eBPF) | ✅ | ❌ | +| Ruby (rbspy) | ✅ | ✅ | +| PHP (phpspy) | ✅ | ✅ (experimental) | +| NodeJS (perf) | ✅ | ✅ | +| .NET (dotnet-trace) | ✅ (exp) | ✅ (exp) | + +### Common ARM Issues from History + +**1. gProfiler ARM Build Failures** +``` +# Commit: [Reliability] Fix gProfiler arm (#988) +# Commit: Fixes #977 gprofiler does not work on aarch64 (#978) +``` + +**2. Architecture Metadata** +``` +# Commit: Fix arch metadata with Aarch64 (#837) +# Issue: Incorrect architecture reporting +``` + +**3. getaddrinfo() EBUSY** +``` +# Commit: Fix getaddrinfo() EBUSY on ARM (#825) +# Issue: Network resolution fails intermittently on ARM +``` + +**4. 
Perf Build on ARM**
+```
+# Commit: Perf: fix v6.7 build on aarch64 (#891)
+# Issue: perf tool compilation differences
+```
+
+### Checking Architecture
+
+```bash
+# Check current architecture
+uname -m
+# x86_64 or aarch64
+
+# Same check from Python
+python3 -c 'import platform; print(platform.machine())'
+```
+
+### Building for ARM
+
+```bash
+# Native ARM build (on ARM host)
+./scripts/build_aarch64_container.sh -t gprofiler:arm64
+./scripts/build_aarch64_executable.sh
+
+# Cross-build from x86_64 (slow, requires QEMU)
+docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
+docker buildx create --name multiarch --driver docker-container --use
+./scripts/build_aarch64_container.sh -t gprofiler:arm64
+```
+
+### Testing on ARM
+
+```bash
+# Run tests on ARM host
+cd tests && sudo python3 -m pytest -v
+
+# Skip x86_64-only tests
+cd tests && sudo python3 -m pytest -v -k "not x86_64_only"
+
+# Check for ARM-specific test markers
+grep -r "aarch64\|arm64" tests/
+```
+
+### ARM-Specific Code Patterns
+
+```python
+import platform
+
+def is_aarch64() -> bool:
+    return platform.machine() in ("aarch64", "arm64")
+
+# Conditional logic for ARM
+if is_aarch64():
+    # ARM-specific handling
+    pass
+```
+
+### Key Files for ARM Support
+
+```
+gprofiler/utils/ # Architecture detection utilities
+scripts/build_aarch64_* # ARM build scripts
+tests/conftest.py # Test fixtures with arch detection
+```
+
+### Debugging ARM Issues
+
+```bash
+# Check kernel version (ARM may have different capabilities)
+uname -r
+
+# Check perf support
+perf list
+
+# Check CPU info
+head -20 /proc/cpuinfo
+
+# Check if running in container
+cat /proc/1/cgroup
+```
+
+### PyPerf/eBPF on ARM
+
+**Note:** PyPerf (eBPF-based Python profiler) is NOT supported on ARM.
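
A minimal sketch of how such a fallback could be chosen at startup. The helper name `choose_python_profiler` and the mode strings `pyperf` and `pyspy` are illustrative assumptions, not gProfiler's confirmed API; only the `is_aarch64()` check follows the pattern shown earlier in this guide.

```python
import platform
from typing import Optional


def is_aarch64(machine: Optional[str] = None) -> bool:
    """Return True for ARM64 machine strings; defaults to the current host."""
    value = machine if machine is not None else platform.machine()
    return value in ("aarch64", "arm64")


def choose_python_profiler(requested: str, machine: Optional[str] = None) -> str:
    """Hypothetical helper: downgrade an eBPF (PyPerf) request to py-spy on ARM."""
    if requested == "pyperf" and is_aarch64(machine):
        # PyPerf is x86_64-only per the note above, so fall back.
        return "pyspy"
    # Any other request is passed through unchanged.
    return requested
```

Passing `machine` explicitly keeps the decision testable on any host; in real use it would be omitted so the current architecture is detected.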
+ +```python +# From gprofiler/profilers/python_ebpf.py +# PyPerf requires x86_64 +if is_aarch64(): + # Fall back to py-spy + pass +``` + +### CI/CD for ARM + +`.github/workflows/build-test-deploy.yml`: +- Separate jobs for x86_64 and aarch64 +- ARM builds use different runner labels +- Some tests may be skipped on ARM + +### Commit Message Pattern + +``` +Fix on ARM/Aarch64 (#PR_NUMBER) +[Reliability] Fix gProfiler arm (#PR_NUMBER) +``` + +--- + +## TODO: Skill Content to Add + +- [ ] **Add complete ARM compatibility table** - All profilers × ARM status +- [ ] **Add ARM kernel requirements** - Minimum kernel versions for features +- [ ] **Add Graviton-specific notes** - AWS Graviton 2/3 specific issues +- [ ] **Add ARM perf event support** - Which perf events work on ARM +- [ ] **Add ARM Docker tips** - Multi-arch build best practices +- [ ] **Add ARM test infrastructure** - How to run ARM tests locally +- [ ] **Add cross-compile troubleshooting** - Common QEMU issues +- [ ] **Add ARM cloud provider notes** - AWS, GCP, Azure ARM instances diff --git a/.claude/skills/ci-status/SKILL.md b/.claude/skills/ci-status/SKILL.md new file mode 100644 index 000000000..225b6ce76 --- /dev/null +++ b/.claude/skills/ci-status/SKILL.md @@ -0,0 +1,90 @@ +--- +name: ci-status +description: Check CI pipeline status and troubleshoot failures. Use when the user wants to check build status, see why CI failed, or monitor workflow runs. +allowed-tools: Bash(gh *) Bash(git *) +--- + +## gProfiler CI/CD Status + +### Current PR/Branch Status + +```! 
+BRANCH=$(git branch --show-current) +echo "Branch: $BRANCH" +echo "" +# Try to get PR status +gh pr view --json state,statusCheckRollup,url 2>/dev/null || echo "No PR found for this branch" +``` + +### CI Workflows + +| Workflow | Triggers | Purpose | +|----------|----------|---------| +| `linters.yml` | PR, push | Python/Shell/Dockerfile linting | +| `build-test-deploy.yml` | PR, push, schedule | Build & test executables/containers | +| `codeql.yml` | PR, push, schedule | Security scanning | +| `build-base-images.yml` | Manual | Base Docker image builds | + +### Check Commands + +```bash +# View PR checks +gh pr checks + +# Watch checks until complete +gh pr checks --watch + +# List recent workflow runs +gh run list --limit 10 + +# View specific run details +gh run view + +# View failed run logs +gh run view --log-failed + +# Re-run failed jobs +gh run rerun --failed +``` + +### Common CI Failures + +#### Linting Failures +```bash +# Fix locally +./lint.sh # Auto-format +./lint.sh --ci # Check mode (what CI runs) +mypy . # Type check +``` + +#### Build Failures +- Check Docker layer caching +- Verify base image availability +- Check for network/download issues + +#### Test Failures +```bash +# Run specific failing test locally +sudo python3 -m pytest tests/test_.py -v -k "test_name" +``` + +### Instructions + +1. Check current CI status with commands above +2. If failed, identify which job failed +3. Fetch logs with `gh run view --log-failed` +4. Suggest fixes based on error messages +5. 
Help re-run if it was a flaky failure + +--- + +## TODO: Skill Content to Add + +- [ ] **Add workflow file documentation** - Explain each workflow file +- [ ] **Add job dependency diagram** - Visual CI pipeline flow +- [ ] **Add runner specifications** - What runs on which runner type +- [ ] **Add common failure patterns** - Categorized CI failure examples +- [ ] **Add cache configuration** - How Docker layer caching works +- [ ] **Add artifact retention** - Where build artifacts are stored +- [ ] **Add required checks list** - Which checks must pass for merge +- [ ] **Add workflow dispatch guide** - How to manually trigger workflows diff --git a/.claude/skills/debug/SKILL.md b/.claude/skills/debug/SKILL.md new file mode 100644 index 000000000..68f7713d2 --- /dev/null +++ b/.claude/skills/debug/SKILL.md @@ -0,0 +1,79 @@ +--- +name: debug +description: Debug profiler issues and analyze errors. Use when the user encounters errors, crashes, or unexpected behavior in gProfiler. +allowed-tools: Bash(python3 *) Bash(strace *) Bash(sudo *) Bash(dmesg *) Bash(journalctl *) Read Grep Glob +--- + +## gProfiler Debugging Guide + +### System Context + +```! +uname -a +python3 --version +cat /etc/os-release 2>/dev/null | head -5 +``` + +### Common Debug Commands + +#### Check Profiler Dependencies +```bash +# Verify perf is available +which perf && perf --version + +# Check kernel capabilities +cat /proc/sys/kernel/perf_event_paranoid +cat /proc/sys/kernel/kptr_restrict + +# Check CAP_SYS_ADMIN capability +capsh --print 2>/dev/null | grep -i sys_admin +``` + +#### Debug Running gProfiler +```bash +# Run with verbose logging +sudo python3 -m gprofiler -v --output-dir /tmp/gprofiler-debug + +# Trace system calls +sudo strace -f -o /tmp/gprofiler.strace python3 -m gprofiler ... 
+ +# Check for permission issues +sudo dmesg | tail -50 | grep -i "permission\|denied\|perf" +``` + +#### Profiler-Specific Debugging + +| Profiler | Debug Flag | Log Location | +|----------|------------|--------------| +| perf | `--perf-mode=fp` | stderr | +| Java | `--java-async-profiler-mode=cpu` | `/tmp/async-profiler.log` | +| Python | `--python-mode=pyperf` | stderr | +| Ruby | `--ruby-mode=rbspy` | stderr | + +### Common Issues + +1. **"perf_event_open failed"** - Check `/proc/sys/kernel/perf_event_paranoid` (should be ≤1) +2. **"Permission denied"** - Run with sudo or check capabilities +3. **"No samples collected"** - Process may be idle or profiler incompatible +4. **Container issues** - Check `--privileged` or `SYS_ADMIN` capability + +### Instructions + +1. Gather error messages and stack traces +2. Check system requirements above +3. Identify which profiler is failing +4. Review relevant profiler code in `gprofiler/profilers/` +5. Suggest fixes or workarounds + +--- + +## TODO: Skill Content to Add + +- [ ] **Add error message catalog** - Common errors with solutions +- [ ] **Add kernel version requirements** - Minimum kernel for each feature +- [ ] **Add container runtime matrix** - Docker/containerd/cri-o compatibility +- [ ] **Add perf_event_paranoid guide** - All levels and their effects +- [ ] **Add capability requirements** - Required Linux capabilities per profiler +- [ ] **Add log file locations** - Where to find all log files +- [ ] **Add strace interpretation** - How to read strace output for gProfiler +- [ ] **Add profiler-specific debug flags** - Debug options per profiler diff --git a/.claude/skills/dependency-update/SKILL.md b/.claude/skills/dependency-update/SKILL.md new file mode 100644 index 000000000..d4601183d --- /dev/null +++ b/.claude/skills/dependency-update/SKILL.md @@ -0,0 +1,160 @@ +--- +name: dependency-update +description: Safely update Python dependencies with security and compatibility checks. 
Use when the user wants to update packages, fix CVEs, or upgrade dependencies. +allowed-tools: Bash(pip *) Bash(pip3 *) Bash(python3 -m pip *) Bash(safety *) Bash(pip-audit *) Read Edit Grep +--- + +## gProfiler Dependency Management + +**Context from history:** 26% of all commits (41/156) are dependency updates. Security-first approach. + +### Current Dependencies + +```! +echo "=== requirements.txt (runtime) ===" +head -15 requirements.txt 2>/dev/null || echo "File not found" +echo "" +echo "=== dev-requirements.txt (development) ===" +head -15 dev-requirements.txt 2>/dev/null || echo "File not found" +``` + +### CVE Fixes from History + +| CVE | Package | Fix Commit | +|-----|---------|------------| +| CVE-2025-71176 | pytest | Update pytest to 9.0.3 (#1035) | +| CVE (black) | black | Fix CVE issue from black package (#1023) | +| Security | requests | Update requests to 2.33.0 (#1032) | +| Security | grpcio | Update granulate-utils reference (#989) | + +### Security Scanning + +```bash +# Using pip-audit (recommended) +pip install pip-audit +pip-audit -r requirements.txt +pip-audit -r dev-requirements.txt + +# Using safety +pip install safety +safety check -r requirements.txt + +# Using bandit for code security +pip install bandit +bandit -r gprofiler/ +``` + +### Update Workflow + +**Step 1: Check for vulnerabilities** +```bash +pip-audit -r requirements.txt +``` + +**Step 2: Check outdated packages** +```bash +pip list --outdated +``` + +**Step 3: Update specific package** +```bash +# Check current version +pip show + +# Check available versions +pip index versions + +# Edit requirements file +# Then test: +pip install -r requirements.txt -r dev-requirements.txt +``` + +**Step 4: Verify compatibility** +```bash +./lint.sh +mypy . 
+cd tests && sudo python3 -m pytest -v test_sanity.py +``` + +### Key Dependencies + +| Package | File | Purpose | Update Frequency | +|---------|------|---------|------------------| +| `psutil` | requirements.txt | Process utilities | Stable | +| `requests` | requirements.txt | HTTP client | Security-sensitive | +| `granulate-utils` | requirements.txt | Shared utilities | Internal | +| `pytest` | dev-requirements.txt | Testing | Security-sensitive | +| `black` | dev-requirements.txt | Formatting | CVE history | +| `mypy` | dev-requirements.txt | Type checking | Regular | +| `flake8` | dev-requirements.txt | Linting | Stable | + +### Base Image Dependencies + +From history, these also get updated: +- Alpine version (v3.22 → v3.23) +- OpenSSL version (v3.0.11) +- zlib version (1.3 → 1.3.1) + +Check Dockerfiles: +```bash +grep -E "FROM|apk add|apt-get" container.Dockerfile executable.Dockerfile +``` + +### Commit Message Pattern + +For security updates: +``` +Update to to fix CVE-XXXX-XXXXX (#PR) +``` + +For regular updates: +``` +Update to (#PR) +Bump from X.Y.Z to A.B.C (#PR) +``` + +### Version Pinning Strategy + +- **Runtime deps (requirements.txt):** Pin major.minor, allow patch +- **Dev deps (dev-requirements.txt):** Pin exact version for reproducibility +- **Example:** + ``` + requests>=2.33.0,<3.0 + pytest==9.0.3 + ``` + +### Testing After Updates + +```bash +# Quick validation +./lint.sh +mypy . 
+ +# Full test (if significant update) +sudo ./tests/test.sh + +# Build test (if build-related dep) +./scripts/build_x86_64_executable.sh +``` + +### PR Checklist for Dependency Updates + +- [ ] CVE number referenced (if security fix) +- [ ] Compatibility verified with Python 3.10+ +- [ ] Linters pass +- [ ] Type checking passes +- [ ] Tests pass (or document known issues) +- [ ] No breaking API changes + +--- + +## TODO: Skill Content to Add + +- [ ] **Add full dependency list** - Complete list with purposes +- [ ] **Add version constraint explanations** - Why specific versions pinned +- [ ] **Add transitive dependency notes** - Important indirect dependencies +- [ ] **Add security advisory links** - Links to CVE databases +- [ ] **Add test matrix for updates** - What to test for each dep type +- [ ] **Add granulate-utils documentation** - Internal package details +- [ ] **Add Docker base image deps** - Alpine/Ubuntu package lists +- [ ] **Add breaking change indicators** - How to identify risky updates diff --git a/.claude/skills/docker-troubleshoot/SKILL.md b/.claude/skills/docker-troubleshoot/SKILL.md new file mode 100644 index 000000000..e18885317 --- /dev/null +++ b/.claude/skills/docker-troubleshoot/SKILL.md @@ -0,0 +1,105 @@ +--- +name: docker-troubleshoot +description: Troubleshoot Docker build and container issues. Use when the user has Docker build failures, container runtime issues, or needs help with Dockerfiles. +allowed-tools: Bash(docker *) Bash(cat *Dockerfile*) Read Grep +--- + +## gProfiler Docker Troubleshooting + +### Docker Environment + +```! 
+docker --version 2>/dev/null || echo "Docker not found" +docker info 2>/dev/null | grep -E "Server Version|Storage Driver|Operating System" || echo "Docker daemon not running" +``` + +### Key Dockerfiles + +| File | Purpose | Build Script | +|------|---------|--------------| +| `container.Dockerfile` | Full container with bundled profilers | `scripts/build_x86_64_container.sh` | +| `executable.Dockerfile` | PyInstaller executable build | `scripts/build_x86_64_executable.sh` | + +### Common Build Commands + +```bash +# Build container (x86_64) +./scripts/build_x86_64_container.sh -t gprofiler:dev + +# Build executable +./scripts/build_x86_64_executable.sh + +# Build with no cache (for debugging) +docker build --no-cache -f container.Dockerfile -t gprofiler:test . + +# Build specific stage +docker build --target builder -f container.Dockerfile -t gprofiler:builder . +``` + +### Troubleshooting Steps + +#### Build Failures + +1. **Check Dockerfile syntax** + ```bash + ./dockerfile_lint.sh + ``` + +2. **Build with verbose output** + ```bash + docker build --progress=plain -f container.Dockerfile . + ``` + +3. **Debug specific stage** + ```bash + # Build up to failing stage + docker build --target -f container.Dockerfile -t debug:stage . 
+ # Inspect + docker run -it debug:stage /bin/bash + ``` + +#### Runtime Issues + +```bash +# Run with full privileges (for profiling) +docker run --privileged -it gprofiler:dev /bin/bash + +# Check required capabilities +docker run --cap-add SYS_ADMIN --cap-add SYS_PTRACE -it gprofiler:dev + +# Mount host for debugging +docker run -v /tmp:/host-tmp --privileged -it gprofiler:dev +``` + +#### Image Size Issues + +```bash +# Analyze image layers +docker history gprofiler:dev + +# Check image size +docker images gprofiler:dev + +# Find large files +docker run --rm gprofiler:dev du -sh /* 2>/dev/null | sort -h +``` + +### Multi-Architecture Builds + +```bash +# x86_64 +./scripts/build_x86_64_container.sh -t gprofiler:x86 + +# ARM64 (requires QEMU or ARM host) +./scripts/build_aarch64_container.sh -t gprofiler:arm64 + +# Buildx for multi-arch (if configured) +docker buildx build --platform linux/amd64,linux/arm64 -f container.Dockerfile . +``` + +### Instructions + +1. Identify the specific error message +2. Determine if it's build-time or runtime +3. Check relevant Dockerfile section +4. Suggest minimal fix or workaround diff --git a/.claude/skills/heartbeat/SKILL.md b/.claude/skills/heartbeat/SKILL.md new file mode 100644 index 000000000..7dc32e05a --- /dev/null +++ b/.claude/skills/heartbeat/SKILL.md @@ -0,0 +1,182 @@ +--- +name: heartbeat +description: Work with the gProfiler heartbeat system for dynamic profiling control. Use when the user asks about heartbeat mode, Performance Studio integration, or command-driven profiling. +--- + +## gProfiler Heartbeat System + +The heartbeat system enables centralized profiling control where Performance Studio can dynamically issue start/stop commands to gProfiler agents. 
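
As a rough illustration of the agent side of this control loop, the sketch below models an inbox that ignores commands it has already seen and always hands out stop commands first. The class and field names (`Command`, `CommandInbox`, `command_id`, `continuous`) are hypothetical and are not taken from `gprofiler/command_control.py`; the queue names, sizes and priority order follow the tables later in this skill, and dropping the oldest entry when a queue is full is an assumption.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional, Set


@dataclass(frozen=True)
class Command:
    command_id: str            # unique ID assigned by the backend
    command_type: str          # "start" or "stop"
    continuous: bool = False   # long-running start command if True


class CommandInbox:
    """Hypothetical agent-side inbox: idempotent, priority-ordered."""

    PRIORITY = ("stop", "adhoc", "continuous")
    MAX_SIZE = {"stop": 1, "adhoc": 10, "continuous": 1}

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        # deque(maxlen=n) silently drops the oldest entry once n is exceeded
        # (an assumed overflow policy, not confirmed by the source).
        self._queues: Dict[str, Deque[Command]] = {
            name: deque(maxlen=self.MAX_SIZE[name]) for name in self.PRIORITY
        }

    @staticmethod
    def _queue_name(command: Command) -> str:
        if command.command_type == "stop":
            return "stop"
        return "continuous" if command.continuous else "adhoc"

    def offer(self, command: Command) -> bool:
        """Enqueue a command; return False if its ID was already received."""
        if command.command_id in self._seen:
            return False
        self._seen.add(command.command_id)
        self._queues[self._queue_name(command)].append(command)
        return True

    def next_command(self) -> Optional[Command]:
        """Pop the highest-priority pending command, or None if idle."""
        for name in self.PRIORITY:
            if self._queues[name]:
                return self._queues[name].popleft()
        return None
```

Keeping the seen-ID set separate from the queues means a command repeated in a later heartbeat response is skipped even after it has been executed.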
+ +### System Architecture + +``` +┌─────────────────────┐ Heartbeat ┌──────────────────────┐ +│ Performance Studio │ ◄──────────────► │ gProfiler Agent │ +│ Backend │ Commands │ │ +└─────────────────────┘ ────────────────► └──────────────────────┘ +``` + +### Running in Heartbeat Mode + +**Basic:** +```bash +python gprofiler/main.py \ + --enable-heartbeat-server \ + --upload-results \ + --token "your-token" \ + --service-name "web-service" \ + --api-server "http://performance-studio:8000" \ + --heartbeat-interval 30 \ + --output-dir /tmp/profiles \ + --verbose +``` + +**Production:** +```bash +export GPROFILER_TOKEN="my_token" +export GPROFILER_SERVICE="your-service-name" +export GPROFILER_SERVER="http://localhost:8080" + +/opt/gprofiler/gprofiler \ + -u \ + --token=$GPROFILER_TOKEN \ + --service-name=$GPROFILER_SERVICE \ + --server-host $GPROFILER_SERVER \ + --dont-send-logs \ + --server-upload-timeout 10 \ + -c \ + --disable-metrics-collection \ + --java-safemode= \ + -d 60 \ + --java-no-version-check +``` + +### Command Flow + +``` +1. User submits profiling request to backend + ↓ +2. Backend creates command with unique ID + ↓ +3. Agent sends heartbeat to backend + ↓ +4. Backend responds with pending command + ↓ +5. Agent checks idempotency (skip if already received) + ↓ +6. Agent enqueues command in priority queue + ↓ +7. Agent executes command (start/stop profiling) + ↓ +8. 
Agent reports completion to backend +``` + +### Command Priority Queues + +| Queue | Purpose | Max Size | +|-------|---------|----------| +| `stop_queue` | Immediate stop commands | 1 | +| `adhoc_queue` | Single-run start commands | 10 | +| `continuous_queue` | Long-running start commands | 1 | + +Priority: `stop > adhoc > continuous` + +### API Endpoints + +**Submit Profiling Request:** +```bash +curl -X POST http://localhost:8000/api/metrics/profile_request \ + -H "Content-Type: application/json" \ + -d '{ + "service_name": "web-service", + "command_type": "start", + "duration": 60, + "frequency": 11, + "profiling_mode": "cpu", + "target_hostnames": ["host1", "host2"] + }' +``` + +**Stop Profiling:** +```bash +curl -X POST http://localhost:8000/api/metrics/profile_request \ + -H "Content-Type: application/json" \ + -d '{ + "service_name": "web-service", + "command_type": "stop", + "stop_level": "host", + "target_hostnames": ["host1"] + }' +``` + +### PerfSpect Hardware Metrics + +Enable Intel PerfSpect for hardware metrics: +```bash +curl -X POST http://localhost:8000/api/metrics/profile_request \ + -H "Content-Type: application/json" \ + -d '{ + "service_name": "web-service", + "command_type": "start", + "duration": 60, + "additional_args": { + "enable_perfspect": true + } + }' +``` + +Requirements: +- Linux x86_64 (Intel architecture) +- Root access +- Internet for auto-install + +### Key Files + +``` +gprofiler/main.py # Heartbeat integration +gprofiler/command_control.py # CommandManager class +docs/HEARTBEAT_SYSTEM_README.md # Full documentation +``` + +### Troubleshooting + +**Agent not receiving commands:** +- Check network connectivity +- Verify authentication token +- Check service name matching + +**Commands not executing:** +- Check agent logs for errors +- Verify command parameters +- Check system permissions + +**PerfSpect not working:** +- Verify Linux x86_64 platform +- Check root permissions +- Check `/tmp/gprofiler_perfspect/perfspect/` + +### 
CLI Options Reference + +```bash +--enable-heartbeat-server # Enable heartbeat mode +--heartbeat-interval 30 # Heartbeat frequency (seconds) +--api-server URL # Backend server URL +--upload-results # Required for heartbeat mode +--token TOKEN # Authentication token +--service-name NAME # Service identifier +--enable-hw-metrics-collection # Enable PerfSpect +--perfspect-path PATH # PerfSpect binary path +``` + +--- + +## TODO: Skill Content to Add + +- [ ] **Add complete API reference** - All heartbeat API endpoints with examples +- [ ] **Add command_control.py documentation** - CommandManager class details +- [ ] **Add authentication flow** - Token validation and refresh +- [ ] **Add error response codes** - All possible error responses +- [ ] **Add deployment examples** - K8s, Docker Compose, systemd configs +- [ ] **Add PerfSpect output examples** - Sample hardware metrics output +- [ ] **Add monitoring integration** - How to monitor heartbeat health +- [ ] **Add scaling guidance** - Multi-agent deployment patterns diff --git a/.claude/skills/java-support/SKILL.md b/.claude/skills/java-support/SKILL.md new file mode 100644 index 000000000..406781359 --- /dev/null +++ b/.claude/skills/java-support/SKILL.md @@ -0,0 +1,139 @@ +--- +name: java-support +description: Add or update Java JDK version support. Use when the user needs to add support for a new JDK version or fix Java profiling issues. +user-invocable: true +--- + +## Adding Java JDK Version Support + +**Context from history:** Java support is the most frequently updated profiler (17 commits in 3 years). New JDK releases require quick gProfiler updates. + +### Supported JDK Versions + +Currently supported: JDK 7+ (HotSpot-based JVMs) +- Oracle JDK +- OpenJDK (AdoptOpenJDK, Azul Zulu, Temurin, etc.) 
+ +Recent additions (from commit history): +- JDK 17, 18, 19, 21, 25 +- Temurin JDK support + +### Key Files to Modify + +``` +gprofiler/profilers/java.py # Main Java profiler (67KB, 19 changes) +tests/test_java.py # Java tests (51KB, 11 changes) +gprofiler/resources/ # async-profiler binaries +``` + +### Adding New JDK Version Support + +**Step 1: Update Version Detection** + +In `gprofiler/profilers/java.py`, find the JDK version detection logic: +```python +# Look for patterns like: +MIN_JDK_VERSION = ... +SUPPORTED_JDK_VERSIONS = [...] +``` + +**Step 2: Update async-profiler Compatibility** + +Check if async-profiler supports the new JDK: +- async-profiler v3.0 added in recent updates +- DSO storage directory issues fixed for newer JDKs + +**Step 3: Add Tests** + +In `tests/test_java.py`: +```python +@pytest.mark.parametrize("jdk_version", [..., "new_version"]) +def test_java_profiling_jdk_versions(jdk_version): + ... +``` + +**Step 4: Update Test Docker Images** + +From history: `openjdk:11-jdk` deprecated → moved to Temurin +```python +# In conftest.py or test_java.py +JAVA_TEST_IMAGES = { + "8": "eclipse-temurin:8-jdk", + "11": "eclipse-temurin:11-jdk", + "17": "eclipse-temurin:17-jdk", + "21": "eclipse-temurin:21-jdk", + # Add new version +} +``` + +### Common Issues from History + +**1. Directory Ownership for AsyncProfiler DSO** +``` +# Commit: Support java profiling without root privilege (#987) +# Issue: DSO storage directory permissions +``` + +**2. Version Detection from Process Names** +``` +# Some JDKs report version differently +# Check /proc//cmdline parsing +``` + +**3. 
Rootless Container Profiling** +``` +# Commit: Support java profiling without root privilege in a container (#987) +# Solution: Check process namespace and permissions +``` + +### Testing New JDK Support + +```bash +# Test specific JDK version +cd tests && sudo python3 -m pytest -v test_java.py -k "jdk21" + +# Test latest JVMs sanity +cd tests && sudo python3 -m pytest -v -k "test_sanity_latest_jvms" + +# Full Java test suite +cd tests && sudo python3 -m pytest -v test_java.py +``` + +### Commit Message Pattern + +From history, Java commits follow this pattern: +``` +java: Add JDK support (#PR_NUMBER) +java: Update min JDK version (#PR_NUMBER) +java: Fix for JDK (#PR_NUMBER) +``` + +### Java Profiling Options (from README) + +```bash +# Disable Java profiling +--no-java +--java-mode disabled + +# Disable buildid embedding +--no-java-async-profiler-buildids + +# Skip version check (for testing) +--java-no-version-check + +# Safe mode (empty to disable) +--java-safemode= +``` + +--- + +## TODO: Skill Content to Add + +- [ ] **Add JDK version detection code** - Actual code snippets from java.py +- [ ] **Add async-profiler flags reference** - All async-profiler options used +- [ ] **Add JVM vendor compatibility matrix** - Oracle, Temurin, Azul, etc. +- [ ] **Add allocation profiling guide** - How to enable allocation profiling +- [ ] **Add JFR comparison** - When to use async-profiler vs JFR +- [ ] **Add container Java detection** - How Java processes are found in containers +- [ ] **Add frame format examples** - Example Java stack frames +- [ ] **Add troubleshooting decision tree** - Flowchart for Java profiling issues diff --git a/.claude/skills/lint/SKILL.md b/.claude/skills/lint/SKILL.md new file mode 100644 index 000000000..5cc2e3098 --- /dev/null +++ b/.claude/skills/lint/SKILL.md @@ -0,0 +1,108 @@ +--- +name: lint +description: Run all code quality checks and auto-fix formatting issues. 
Use when the user wants to lint, format, or check code style before committing. +allowed-tools: Bash(isort *) Bash(black *) Bash(flake8 *) Bash(mypy *) Bash(./lint.sh *) Bash(./shell_lint.sh) Bash(./dockerfile_lint.sh) Bash(pip *) Bash(bandit *) +--- + +## gProfiler Code Quality Tools + +**Project standards:** Line length 120, Python 3.10+, strict mypy + +### Quick Commands + +```bash +# Run all Python linters with auto-fix +./lint.sh + +# CI mode (check only, no auto-formatting) +./lint.sh --ci + +# Shell script linting (uses Docker) +./shell_lint.sh + +# Dockerfile linting (uses Docker) +./dockerfile_lint.sh +``` + +### Individual Tools + +| Tool | Command | Purpose | +|------|---------|---------| +| **isort** | `isort --settings-path .isort.cfg .` | Import sorting (line_length: 120) | +| **black** | `black --line-length 120 .` | Code formatting | +| **flake8** | `flake8 --config .flake8 .` | Style linting | +| **mypy** | `mypy .` | Type checking (strict mode) | +| **shellcheck** | `./shell_lint.sh` | Shell script linting | +| **hadolint** | `./dockerfile_lint.sh` | Dockerfile linting | + +### Configuration Files + +- `.isort.cfg` - Import sorting config (line_length=120, multi_line_output=3) +- `.flake8` - Flake8 config (max_line_length=120) +- `mypy.ini` - Type checking (strict mode, no_implicit_optional=False) + +### Common Issues from History + +**1. Import Order (isort)** +```python +# Correct format (3-line grouped with trailing comma) +import logging +import os +from pathlib import Path + +from granulate_utils import ... +from psutil import ... + +from gprofiler.utils import ... +``` + +**2. Line Length (120 chars)** +- Use parentheses for line continuation +- Break long strings with implicit concatenation + +**3. Type Hints (mypy)** +- All public functions need type hints +- Use `Optional[T]` for nullable parameters +- Use `from __future__ import annotations` for forward refs + +**4. 
Black Formatting** +- CVE-2024 issue with black package fixed in recent versions +- Ensure dev-requirements.txt has latest black + +### Pre-Commit Checklist + +1. `./lint.sh` - Auto-fix formatting +2. `mypy .` - Check types +3. `./shell_lint.sh` - If shell scripts modified +4. `./dockerfile_lint.sh` - If Dockerfiles modified + +### CI Workflow + +The `.github/workflows/linters.yml` runs: +1. Python linters (isort, black, flake8) +2. mypy type checking +3. shellcheck via Docker +4. hadolint via Docker + +### Bandit Security Linting + +For security-sensitive changes: +```bash +pip install bandit +bandit -r gprofiler/ +``` + +Note: B404 (subprocess import) is a known finding - use subprocess safely. + +--- + +## TODO: Skill Content to Add + +- [ ] **Add example error messages** - Common linting errors and how to fix them +- [ ] **Add IDE integration section** - VS Code, PyCharm settings for auto-formatting +- [ ] **Add mypy error explanations** - Common mypy errors specific to gProfiler patterns +- [ ] **Add flake8 ignore patterns** - Document which rules are ignored and why +- [ ] **Add shell lint examples** - Common shellcheck warnings in build scripts +- [ ] **Add Dockerfile lint examples** - hadolint warnings and fixes +- [ ] **Add CI failure troubleshooting** - How to debug linting failures in CI +- [ ] **Add git hooks setup** - Optional pre-commit hook configuration diff --git a/.claude/skills/new-profiler/SKILL.md b/.claude/skills/new-profiler/SKILL.md new file mode 100644 index 000000000..18536fb05 --- /dev/null +++ b/.claude/skills/new-profiler/SKILL.md @@ -0,0 +1,261 @@ +--- +name: new-profiler +description: Guide for adding a new profiler to gProfiler. Use when the user wants to add support for a new language runtime or profiling tool. 
+user-invocable: true +disable-model-invocation: true +--- + +## Adding a New Profiler to gProfiler + +### Architecture Overview + +``` +gprofiler/profilers/ +├── profiler_base.py # Base class - extend this +├── registry.py # @register_profiler decorator +├── factory.py # Profiler instantiation +├── perf.py # System profiler (reference: 19KB) +├── java.py # Java profiler (reference: 67KB) +├── python.py # Python py-spy profiler +├── python_ebpf.py # Python PyPerf profiler +├── ruby.py # Ruby rbspy profiler +├── php.py # PHP phpspy profiler +├── dotnet.py # .NET dotnet-trace profiler +└── node.py # NodeJS profiler +``` + +### Step 1: Create Profiler Class + +Create `gprofiler/profilers/.py`: + +```python +# +# Copyright (C) 2022 Intel Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +import logging +from typing import Optional + +from gprofiler.gprofiler_types import ProcessToProfileData, ProfileData +from gprofiler.log import get_logger_adapter +from gprofiler.profilers.profiler_base import ProfilerBase +from gprofiler.profilers.registry import register_profiler + +logger = get_logger_adapter(__name__) + + +@register_profiler( + "MyRuntime", + # Possible profiler names for CLI selection + possible_modes=["myruntime", "disabled"], + # Default mode + default_mode="myruntime", + # Supported architectures + supported_archs=["x86_64", "aarch64"], + # Profiler-specific CLI arguments + profiler_arguments=[ + # Add any runtime-specific arguments + ], +) +class MyRuntimeProfiler(ProfilerBase): + """ + Profiler for MyRuntime applications. + """ + + # Frame suffix for this profiler (shown in flamegraph) + FRAME_SUFFIX = "_[myrt]" + + def __init__( + self, + frequency: int, + duration: int, + stop_event, + storage_dir: str, + profile_spawned_processes: bool, + *args, + **kwargs, + ): + super().__init__( + frequency=frequency, + duration=duration, + stop_event=stop_event, + storage_dir=storage_dir, + profile_spawned_processes=profile_spawned_processes, + ) + # Initialize profiler-specific state + + def start(self) -> None: + """Start the profiler. Called once at the beginning.""" + logger.info("Starting MyRuntime profiler") + # Initialize profiling tools, attach to processes, etc. + + def stop(self) -> None: + """Stop the profiler. Called once at the end.""" + logger.info("Stopping MyRuntime profiler") + # Cleanup resources, detach from processes + + def snapshot(self) -> ProcessToProfileData: + """ + Collect profiling data for one interval. + Called periodically during profiling. 
+ + Returns: + Dict mapping process info to ProfileData + """ + results: ProcessToProfileData = {} + + # Collect samples from your profiling tool + # Parse stack traces + # Build ProfileData for each process + + return results +``` + +### Step 2: Add CLI Arguments + +In `gprofiler/main.py`, add runtime-specific arguments: + +```python +# In the argument parser section +parser.add_argument( + "--myruntime-mode", + choices=["myruntime", "disabled"], + default="myruntime", + help="MyRuntime profiling mode", +) +parser.add_argument( + "--no-myruntime", + action="store_true", + help="Disable MyRuntime profiling", +) +``` + +### Step 3: Add Tests + +Create `tests/test_myruntime.py`: + +```python +import pytest +from tests.conftest import AssertInCollapsed + +class TestMyRuntimeProfiler: + @pytest.mark.parametrize("version", ["1.0", "2.0"]) + def test_myruntime_profiling( + self, + version: str, # must be accepted here, or pytest rejects the parametrize mark + application_docker_container, + runtime_specific_args, + assert_collapsed: AssertInCollapsed, + ): + """Test profiling of MyRuntime applications.""" + # Test implementation + pass +``` + +### Step 4: Add Resources (if needed) + +If your profiler requires external tools: + +``` +gprofiler/resources/ +└── myruntime/ + ├── x86_64/ + │ └── profiler_tool + └── aarch64/ + └── profiler_tool +``` + +Update build scripts in `scripts/` to include the tool. + +### Step 5: Update Documentation + +1. Update `README.md` with: + - Runtime support in architecture table + - Profiling options section + - Frame format documentation + +2. 
Add to frame suffix table: + ``` + | MyRuntime | Per tool | `_[myrt]` | + ``` + +### Key Patterns from Existing Profilers + +**Process Discovery:** +```python +from gprofiler.utils.process import search_for_process + +processes = search_for_process( + lambda p: "myruntime" in p.cmdline(), + self._stop_event, +) +``` + +**Resource Cleanup:** +```python +def stop(self) -> None: + try: + # Cleanup + pass + except Exception: + logger.exception("Error stopping profiler") + finally: + # Ensure resources released + pass +``` + +**Handling Stop Event:** +```python +def snapshot(self) -> ProcessToProfileData: + if self._stop_event.is_set(): + return {} + # Continue profiling... +``` + +### Testing Your Profiler + +```bash +# Run profiler tests +cd tests && sudo python3 -m pytest -v test_myruntime.py + +# Test with verbose output +sudo python3 -m gprofiler --myruntime-mode=myruntime -v -d 30 -o /tmp/output + +# Verify frame format in output (-F matches the brackets literally, not as a character class) +grep -F "_[myrt]" /tmp/output/last_profile.col +``` + +### Commit Message Pattern + +``` +Add profiler support (#PR_NUMBER) +: Add version X.Y support (#PR_NUMBER) +``` + +--- + +## TODO: Skill Content to Add + +- [ ] **Add complete profiler lifecycle diagram** - Visual flow of start/snapshot/stop +- [ ] **Add ProfilerBase method documentation** - All methods with signatures +- [ ] **Add example profiler implementations** - Annotated code from simple profilers +- [ ] **Add process discovery patterns** - Different ways to find target processes +- [ ] **Add stack trace parsing examples** - How to parse different profiler outputs +- [ ] **Add resource bundling guide** - How to add external binaries to build +- [ ] **Add profiler configuration patterns** - How CLI args flow to profiler +- [ ] **Add frame format specification** - Detailed frame suffix requirements + +### Candidate New Profilers to Document + +- [ ] **Rust** - Native profiling with symbolication +- [ ] **Erlang/BEAM** - Erlang VM profiling +- [ ] **Lua/LuaJIT** - Lua runtime profiling 
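The frame-suffix check from "Testing Your Profiler" can also be sketched in Python. This is a minimal illustration, not gProfiler code: it assumes the collapsed file uses the common folded-stack layout (`frame;frame;frame count` per line) and that lines starting with `#` are metadata. The helper name `count_suffix_samples` and the sample data are hypothetical. Read as a regular expression, `_[myrt]` is a character class, so the sketch compares literal suffixes instead.

```python
import re

# Hypothetical sample in folded-stack form; real profiler output may differ.
SAMPLE = """\
# metadata line (assumed to be skippable)
app;main_[myrt];handler_[myrt] 7
app;main_m;other 3
app;worker_[myrt] 2
"""


def count_suffix_samples(collapsed_text: str, suffix: str) -> int:
    """Sum the sample counts of stacks that contain a frame ending in `suffix`."""
    total = 0
    for line in collapsed_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and metadata lines
        # The count is the last space-separated field; frames may contain spaces.
        stack, _, count = line.rpartition(" ")
        if not stack or not count.isdigit():
            continue  # not a "stack count" line
        if any(frame.endswith(suffix) for frame in stack.split(";")):
            total += int(count)
    return total


def regex_matches(pattern: str, text: str) -> bool:
    """Small wrapper used to show the character-class pitfall."""
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    print(count_suffix_samples(SAMPLE, "_[myrt]"))
    # Unescaped, "_[myrt]" matches "_m"; escaped, it does not.
    print(regex_matches("_[myrt]", "main_m"))
    print(regex_matches(re.escape("_[myrt]"), "main_m"))
```

Matching on the literal suffix keeps frames such as `main_m` from being counted as MyRuntime frames.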
diff --git a/.claude/skills/pr-create/SKILL.md b/.claude/skills/pr-create/SKILL.md new file mode 100644 index 000000000..e4e21d424 --- /dev/null +++ b/.claude/skills/pr-create/SKILL.md @@ -0,0 +1,73 @@ +--- +name: pr-create +description: Create a well-formatted pull request for gProfiler. Use when the user wants to submit changes, create a PR, or push their work. +disable-model-invocation: true +allowed-tools: Bash(git *) Bash(gh *) +--- + +## Create gProfiler Pull Request + +### Current Branch Status + +```! +echo "Branch: $(git branch --show-current)" +echo "Base: master" +echo "" +echo "Commits to include:" +git log master..HEAD --oneline 2>/dev/null || echo "No commits ahead of master" +echo "" +echo "Files changed:" +git diff --stat master..HEAD 2>/dev/null || git diff --stat HEAD +``` + +### PR Checklist + +Before creating PR, verify: +- [ ] Code formatted: `./lint.sh` passes +- [ ] Type hints: `mypy .` passes +- [ ] Tests pass locally (if applicable) +- [ ] Commit messages are descriptive +- [ ] No secrets or credentials in diff + +### PR Template + +**Title format:** `: ` +- Types: `feat`, `fix`, `docs`, `refactor`, `test`, `build`, `ci` + +**Body sections:** +```markdown +## Summary +Brief description of what this PR does. + +## Changes +- Bullet points of specific changes + +## Testing +How the changes were tested. + +## Related Issues +Fixes #123 (if applicable) +``` + +### Instructions + +1. Review the commits and changes above +2. Generate appropriate PR title and description +3. Run: `gh pr create --title "..." --body "..."` +4. If tests need to run: `gh pr checks --watch` + +### Example Commands + +```bash +# Create PR with editor +gh pr create + +# Create PR inline +gh pr create --title "fix: handle missing perf binary gracefully" --body "..." 
+ +# Create draft PR +gh pr create --draft --title "wip: add Go profiler support" + +# Push and create PR +git push -u origin $(git branch --show-current) && gh pr create +``` diff --git a/.claude/skills/review-code/SKILL.md b/.claude/skills/review-code/SKILL.md new file mode 100644 index 000000000..edae7198a --- /dev/null +++ b/.claude/skills/review-code/SKILL.md @@ -0,0 +1,65 @@ +--- +name: review-code +description: Review code changes against gProfiler coding standards. Use when the user asks to review code, check a PR, or validate changes before committing. +--- + +## gProfiler Code Review Checklist + +### Current Changes + +!`git diff --stat HEAD 2>/dev/null || echo "No git changes"` + +### Coding Standards + +#### Python Style +- [ ] Line length ≤ 120 characters +- [ ] Type hints on all public functions +- [ ] Apache 2.0 license header on new files +- [ ] Imports sorted (stdlib → external → local) with trailing comma +- [ ] Uses `get_logger_adapter(__name__)` for logging + +#### Security +- [ ] No hardcoded credentials or secrets +- [ ] Input validation on external data +- [ ] Safe subprocess usage (no shell=True with user input) +- [ ] Proper error handling for privileged operations + +#### Profiler-Specific +- [ ] Handles process termination gracefully +- [ ] Cleans up resources in `stop()` method +- [ ] Respects `stop_event` for cancellation +- [ ] Handles missing/unavailable profiler tools + +#### Testing +- [ ] Tests added for new functionality +- [ ] Tests run with root privileges considered +- [ ] Docker fixtures used for runtime testing + +#### Documentation +- [ ] README updated for user-facing changes +- [ ] Docstrings on complex functions +- [ ] CLI help text for new arguments + +### Review Instructions + +1. Check the diff above for violations +2. Run `./lint.sh --ci` to verify formatting +3. Verify type hints with `mypy .` +4. 
Look for common issues: + - Missing error handling in profiler code + - Unclosed file handles or processes + - Missing cleanup in exception paths + - Hardcoded paths that should be configurable + +--- + +## TODO: Skill Content to Add + +- [ ] **Add code pattern examples** - Good vs bad code patterns +- [ ] **Add profiler-specific review rules** - Per-profiler considerations +- [ ] **Add performance review criteria** - Overhead and efficiency checks +- [ ] **Add security checklist expansion** - More security review items +- [ ] **Add test coverage requirements** - Minimum coverage expectations +- [ ] **Add backwards compatibility checks** - CLI and API stability +- [ ] **Add resource cleanup patterns** - File handle and process cleanup +- [ ] **Add logging review guidelines** - What and how to log diff --git a/.claude/skills/test/SKILL.md b/.claude/skills/test/SKILL.md new file mode 100644 index 000000000..981becc4e --- /dev/null +++ b/.claude/skills/test/SKILL.md @@ -0,0 +1,152 @@ +--- +name: test +description: Run gProfiler test suite. Use when the user wants to run tests, check if tests pass, or test specific functionality. +allowed-tools: Bash(sudo *) Bash(pytest *) Bash(python3 -m pytest *) Bash(./tests/test.sh *) Bash(cd *) Read +--- + +## gProfiler Test Suite + +**Important:** Tests require root privileges (sudo) for profiling system resources. 
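
A minimal sketch of a root check that could run before the suite, assuming a Unix host. The helper names `running_as_root` and `root_hint` are hypothetical; they are not part of gProfiler or its test fixtures.

```python
import os
from typing import Optional


def running_as_root(euid: Optional[int] = None) -> bool:
    """Return True when the effective user ID is 0 (root).

    `euid` can be passed explicitly for testing; by default it is read
    from the operating system (Unix only).
    """
    if euid is None:
        euid = os.geteuid()
    return euid == 0


def root_hint(euid: Optional[int] = None) -> str:
    """Build a short message saying whether sudo is needed."""
    if running_as_root(euid):
        return "running as root: profiler tests can be started"
    return "not root: re-run with sudo, for example `sudo ./tests/test.sh`"


if __name__ == "__main__":
    print(root_hint())
```

Passing `euid` explicitly keeps the check testable without actually switching users.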
+ +### Quick Commands + +```bash +# Full test suite +sudo ./tests/test.sh + +# Executable-only tests (no Docker resources needed) +sudo ./tests/test.sh --executable + +# Specific test file +cd tests && sudo python3 -m pytest -v test_perf.py + +# Specific test by name +cd tests && sudo python3 -m pytest -v -k "test_java_profiling" + +# Run with verbose output +cd tests && sudo python3 -m pytest -v -s test_sanity.py +``` + +### Test Categories + +| Test File | Purpose | Notes | +|-----------|---------|-------| +| `test_sanity.py` | Basic smoke tests | Quick validation | +| `test_java.py` | Java profiler tests | Tests JDK 8-25, largest file (51KB) | +| `test_python.py` | Python profiler tests | py-spy and PyPerf | +| `test_perf.py` | System profiler tests | perf integration | +| `test_merge.py` | Profile merging | Data aggregation | +| `test_appids.py` | App identification | Metadata extraction | +| `test_executable.py` | Binary mode tests | PyInstaller build | +| `test_app_metadata.py` | Metadata collection | Frequently modified | + +### Known Flaky Tests (from 3-year history) + +**PyPerf/Python eBPF tests:** +- TMPDIR setup issues +- Non-blocking I/O problems +- Solution: Tests use `@pytest.mark.flaky(reruns=3, reruns_delay=2)` + +**Java tests:** +- JDK version compatibility +- AsyncProfiler DSO directory permissions +- Temurin vs OpenJDK differences + +**Dotnet tests:** +- patchelf installation required +- Timeout increases needed for newer .NET SDK + +### Test Infrastructure + +**conftest.py fixtures (most modified file - 13 changes in 3 years):** +- Docker container management +- Test application builders +- Resource cleanup +- Architecture detection (x86_64/aarch64) + +**pytest plugins:** +- `pytest-rerunfailures` - Retry flaky tests +- `pytest-timeout` - Prevent hanging tests + +### Running Tests for Specific Profilers + +```bash +# Java profiler +cd tests && sudo python3 -m pytest -v test_java.py + +# Python profiler (py-spy) +cd tests && sudo python3 -m 
pytest -v test_python.py -k "pyspy" + +# Python profiler (PyPerf/eBPF) +cd tests && sudo python3 -m pytest -v test_python.py -k "pyperf" + +# System profiler (perf) +cd tests && sudo python3 -m pytest -v test_perf.py + +# Sanity tests with latest JVMs +cd tests && sudo python3 -m pytest -v -k "test_sanity_latest_jvms" +``` + +### ARM/Aarch64 Testing + +From history: ARM tests have recurring issues +- `in_container` fixture for container tests +- Platform-specific test skips +- getaddrinfo() EBUSY errors on ARM + +```bash +# Check architecture +uname -m + +# Run with architecture awareness +cd tests && sudo python3 -m pytest -v --ignore=test_bigdata.py +``` + +### Test Environment Setup + +```bash +# Install dev dependencies +pip3 install -r dev-requirements.txt + +# Ensure root has same packages +sudo pip3 install -r dev-requirements.txt + +# Copy resources (if testing from source) +./scripts/copy_resources_from_image.sh +``` + +### Debugging Test Failures + +```bash +# Run with full output +cd tests && sudo python3 -m pytest -v -s --tb=long test_name.py + +# Run specific test with debugging +cd tests && sudo python3 -m pytest -v -s -k "test_specific_name" --pdb + +# Check Docker containers +docker ps -a | grep gprofiler +docker logs +``` + +### CI Test Workflow + +`.github/workflows/build-test-deploy.yml`: +1. Build executable (x86_64, aarch64) +2. Run executable tests +3. Build container with profilers +4. Run container tests +5. 
Deploy on tag push + +--- + +## TODO: Skill Content to Add + +- [ ] **Add test fixture documentation** - Explain each conftest.py fixture in detail +- [ ] **Add example test patterns** - Copy-paste templates for new profiler tests +- [ ] **Add Docker test image list** - Complete list of runtime test images +- [ ] **Add test environment variables** - Document all test-related env vars +- [ ] **Add local test setup guide** - Step-by-step for first-time test runners +- [ ] **Add test output interpretation** - How to read test results and logs +- [ ] **Add flaky test retry patterns** - Document retry decorator usage +- [ ] **Add CI test matrix details** - What tests run on which platforms diff --git a/.claude/skills/version-release/SKILL.md b/.claude/skills/version-release/SKILL.md new file mode 100644 index 000000000..c7ea83cee --- /dev/null +++ b/.claude/skills/version-release/SKILL.md @@ -0,0 +1,127 @@ +--- +name: version-release +description: Release a new version of gProfiler. Use when the user wants to bump the version, create a release, or tag a new version. +disable-model-invocation: true +allowed-tools: Bash(git *) Bash(gh *) Read Edit +--- + +## gProfiler Version Release Process + +**Context from history:** 22 version bumps in 3 years (1.34.0 → 1.56.2) + +### Current Version + +```! +grep -E "^__version__" gprofiler/__init__.py 2>/dev/null || echo "Version not found" +``` + +### Release Steps + +**Step 1: Update Version** + +Edit `gprofiler/__init__.py`: +```python +__version__ = "X.Y.Z" +``` + +**Step 2: Create PR** +```bash +git checkout -b bump-version-X.Y.Z +git add gprofiler/__init__.py +git commit -m "Bump version to X.Y.Z" +git push -u origin bump-version-X.Y.Z +gh pr create --title "Bump version to X.Y.Z" --body "Version bump for release X.Y.Z" +``` + +**Step 3: Merge PR** + +Wait for CI to pass, then merge. 
+ +**Step 4: Create Tag** +```bash +git checkout master +git pull +git tag vX.Y.Z +git push origin vX.Y.Z +``` + +### CI/CD on Tag Push + +When a tag is pushed, `.github/workflows/build-test-deploy.yml`: +1. Builds executables (x86_64, aarch64) +2. Builds Docker containers +3. Runs full test suite +4. Deploys to Docker Hub (`intel/gprofiler:X.Y.Z`, `intel/gprofiler:latest`) +5. Creates GitHub release with executables + +### Version Numbering + +Semantic versioning: `MAJOR.MINOR.PATCH` +- **MAJOR:** Breaking changes +- **MINOR:** New features, backward compatible +- **PATCH:** Bug fixes, security updates + +### Recent Version History + +From git history: +``` +1.56.2 - Current +1.56.1 +1.56.0 +... +1.34.0 - ~3 years ago +``` + +### Pre-Release Checklist + +- [ ] All tests pass +- [ ] Linters pass +- [ ] CHANGELOG updated (if maintained) +- [ ] README updated for new features +- [ ] Version number follows semantic versioning +- [ ] No pending security issues + +### Hotfix Release + +For urgent fixes: +```bash +# Branch from tag +git checkout vX.Y.Z +git checkout -b hotfix-X.Y.Z+1 + +# Make fix +# ... + +# Update version +# Edit gprofiler/__init__.py + +# Create PR to master AND tag +git push -u origin hotfix-X.Y.Z+1 +gh pr create --title "Hotfix: " --body "..." 
+``` + +### Docker Hub Images + +After release: +- `intel/gprofiler:latest` +- `intel/gprofiler:X.Y.Z` + +### GitHub Release + +Created automatically with: +- `gprofiler_x86_64` executable +- `gprofiler_aarch64` executable +- Release notes (from tag message or PR description) + +--- + +## TODO: Skill Content to Add + +- [ ] **Add version history table** - Recent versions with highlights +- [ ] **Add release artifact checksums** - How to verify downloads +- [ ] **Add Docker Hub tag convention** - Tagging strategy explanation +- [ ] **Add rollback procedure** - How to revert a bad release +- [ ] **Add release announcement template** - Standard release notes format +- [ ] **Add compatibility notes** - What to check for breaking changes +- [ ] **Add post-release verification** - Steps to verify successful release +- [ ] **Add emergency hotfix process** - Detailed hotfix workflow
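
The bump rules under "Version Numbering" and the `__version__` edit from Step 1 can be sketched as follows. This is an illustration under stated assumptions, not a script from the repository: `bump` and `bump_init_source` are hypothetical helpers, and the sketch assumes the version line has exactly the form `__version__ = "X.Y.Z"` with numeric parts.

```python
import re

# Matches a line such as: __version__ = "1.56.2"
VERSION_LINE = re.compile(
    r'^__version__\s*=\s*"(\d+)\.(\d+)\.(\d+)"[ \t]*$', re.MULTILINE
)


def bump(version: str, part: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for the given part."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"  # breaking changes reset minor and patch
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # new features reset patch
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"  # bug fixes, security updates
    raise ValueError(f"unknown version part: {part!r}")


def bump_init_source(source: str, part: str) -> str:
    """Rewrite the __version__ line inside the text of an __init__.py file."""
    match = VERSION_LINE.search(source)
    if match is None:
        raise ValueError("no __version__ line found")
    new_version = bump(".".join(match.groups()), part)
    return (
        source[: match.start()]
        + f'__version__ = "{new_version}"'
        + source[match.end():]
    )


if __name__ == "__main__":
    print(bump("1.56.2", "patch"))
    print(bump_init_source('__version__ = "1.56.2"\n', "minor"), end="")
```

The function works on the file's text rather than on a path, so it can be tried out without touching `gprofiler/__init__.py`.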