111 changes: 111 additions & 0 deletions .claude/skills/architecture/SKILL.md
@@ -0,0 +1,111 @@
---
name: architecture
description: Understand gProfiler architecture and codebase structure. Use when the user asks how gProfiler works, wants to understand the codebase, or needs architectural guidance.
context: fork
user-invocable: true
---

## gProfiler Architecture Overview

### High-Level Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                      gprofiler/main.py                      │
│                    (Orchestration Layer)                    │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌────────┐ │
│  │  perf   │ │  Java   │ │ Python  │ │  Ruby   │ │  .NET  │ │
│  │profiler │ │profiler │ │profiler │ │profiler │ │profiler│ │
│  └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ └───┬────┘ │
│       └───────────┴───────────┴───────────┴──────────┘      │
│                               ▼                             │
│                      gprofiler/merge.py                     │
│                  (Profile Data Aggregation)                 │
├─────────────────────────────────────────────────────────────┤
│                        Output Layer                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │ Flamegraph  │  │   Upload    │  │    Local Output     │  │
│  │   (HTML)    │  │  (Studio)   │  │     (collapsed)     │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```

### Key Components

#### 1. Profiler Registry (`gprofiler/profilers/registry.py`)
- Decorator-based profiler registration
- Runtime discovery of available profilers
- Configuration-based profiler selection
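
A minimal sketch of decorator-based registration, assuming a module-level dictionary keyed by profiler name. `PROFILER_REGISTRY`, `select_profilers`, and the single-argument form of `register_profiler` are illustrative assumptions, not copied from `registry.py`; the real decorator may accept more parameters.

```python
from typing import Callable, Dict, Iterable, List, Type

# Hypothetical registry: maps a profiler name to its class.
PROFILER_REGISTRY: Dict[str, Type] = {}


def register_profiler(name: str) -> Callable[[Type], Type]:
    """Class decorator that records the class under `name` at import time."""

    def decorator(cls: Type) -> Type:
        if name in PROFILER_REGISTRY:
            raise ValueError(f"profiler {name!r} is already registered")
        PROFILER_REGISTRY[name] = cls
        return cls

    return decorator


def select_profilers(enabled: Iterable[str]) -> List[Type]:
    """Return the registered classes whose names appear in the configuration."""
    return [PROFILER_REGISTRY[n] for n in enabled if n in PROFILER_REGISTRY]


@register_profiler("perf")
class PerfSketch:
    """Placeholder class standing in for a real profiler."""


@register_profiler("java")
class JavaSketch:
    """Placeholder class standing in for a real profiler."""
```

Because registration happens as a side effect of importing the module, runtime discovery reduces to a dictionary lookup once all profiler modules have been imported.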

#### 2. Profiler Base (`gprofiler/profilers/profiler_base.py`)
- Abstract base class for all profilers
- Lifecycle: `start()` → `snapshot()` → `stop()`
- Common utilities for process discovery
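
The lifecycle above can be sketched as an abstract class plus a small driver. `ProfilerSketch`, `FakeProfiler`, `run_once`, and the `pid -> Counter` result shape are assumptions made for illustration; check `profiler_base.py` for the real signatures.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Dict

# Assumed result shape for this sketch: pid -> Counter of collapsed stacks.
ProcessToStackCounts = Dict[int, Counter]


class ProfilerSketch(ABC):
    """Hypothetical stand-in for the real base class."""

    @abstractmethod
    def start(self) -> None:
        """Launch or attach the backend tool."""

    @abstractmethod
    def snapshot(self) -> ProcessToStackCounts:
        """Return the stacks collected since the previous snapshot."""

    @abstractmethod
    def stop(self) -> None:
        """Tear down the backend tool and release resources."""


class FakeProfiler(ProfilerSketch):
    """Toy implementation that returns canned data."""

    def __init__(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True

    def snapshot(self) -> ProcessToStackCounts:
        if not self.running:
            raise RuntimeError("snapshot() called before start()")
        return {1234: Counter({"main;work": 3})}

    def stop(self) -> None:
        self.running = False


def run_once(profiler: ProfilerSketch) -> ProcessToStackCounts:
    """Drive one start -> snapshot -> stop cycle, stopping even on error."""
    profiler.start()
    try:
        return profiler.snapshot()
    finally:
        profiler.stop()
```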

#### 3. Individual Profilers (`gprofiler/profilers/*.py`)

| Profiler | Backend Tool | Key Features |
|----------|--------------|--------------|
| `perf.py` | Linux perf | System-wide, kernel stacks |
| `java.py` | async-profiler | JVM attach, allocation profiling |
| `python.py` | py-spy | No instrumentation needed |
| `python_ebpf.py` | PyPerf | eBPF-based, lower overhead |
| `ruby.py` | rbspy | Ruby VM sampling |
| `php.py` | phpspy | PHP process profiling |
| `dotnet.py` | dotnet-trace | .NET Core/5+ support |
| `node.py` | perf | V8 JavaScript profiling |

#### 4. Merge Layer (`gprofiler/merge.py`)
- Combines samples from multiple profilers
- Handles symbol resolution
- Produces unified stack traces
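
As a rough illustration of the "combine samples" step only, the sketch below sums per-process stack counts from several profilers and renders them in collapsed format (`frame;frame count`). The function names and the `pid -> Counter` shape are assumptions; symbol resolution and anything else `merge.py` does are left out.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed shape: pid -> Counter mapping a collapsed stack to its sample count.
ProcessToStackCounts = Dict[int, Counter]


def merge_profiles(results: Iterable[ProcessToStackCounts]) -> ProcessToStackCounts:
    """Sum sample counts per pid and per stack across profiler results."""
    merged: ProcessToStackCounts = {}
    for result in results:
        for pid, stacks in result.items():
            merged.setdefault(pid, Counter()).update(stacks)
    return merged


def to_collapsed(merged: ProcessToStackCounts) -> str:
    """Flatten all processes into sorted 'stack count' lines."""
    total: Counter = Counter()
    for stacks in merged.values():
        total.update(stacks)
    return "\n".join(f"{stack} {count}" for stack, count in sorted(total.items()))
```

For example, merging `{1: Counter({"a;b": 2})}` with `{1: Counter({"a;b": 1})}` gives a count of 3 for stack `a;b` in process 1.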

#### 5. Metadata Collection (`gprofiler/metadata/`)
- `application_identifiers.py` - Extracts app names from processes
- `system_metadata.py` - Collects host information
- Enriches profiles with context

### Data Flow

```
1. Process Discovery
   └── Scan /proc for target processes

2. Profiler Selection
   └── Match processes to appropriate profilers

3. Sampling
   └── Each profiler collects stacks independently

4. Aggregation
   └── merge.py combines all samples

5. Output
   └── Generate flamegraph or upload to Studio
```
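
Steps 1 and 2 can be sketched as a scan of `/proc/<pid>/comm` followed by a name lookup. This is an assumption-heavy illustration: `COMM_TO_PROFILER` and `discover_processes` are hypothetical, and real detection is likely more involved than matching the process name.

```python
import os
from typing import Dict, List

# Hypothetical mapping from a process name (/proc/<pid>/comm) to a profiler name.
COMM_TO_PROFILER = {
    "java": "java",
    "python3": "python",
    "ruby": "ruby",
    "node": "node",
}


def discover_processes(proc_root: str = "/proc") -> Dict[str, List[int]]:
    """Group the pids found under `proc_root` by the profiler that would handle them."""
    matches: Dict[str, List[int]] = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # process exited or is unreadable
        profiler = COMM_TO_PROFILER.get(comm)
        if profiler is not None:
            matches.setdefault(profiler, []).append(int(entry))
    return matches
```

Taking `proc_root` as a parameter keeps the function testable against a temporary directory instead of the live system.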

### Key Files to Understand

| File | Lines | Purpose |
|------|-------|---------|
| `main.py` | ~1500 | Entry point, CLI, orchestration |
| `profilers/perf.py` | ~500 | Core perf integration |
| `profilers/java.py` | ~1800 | Complex JVM profiling |
| `merge.py` | ~400 | Profile aggregation |
| `utils/perf_process.py` | ~200 | perf subprocess management |

### Extension Points

1. **Add new profiler**: Implement `ProfilerBase`, use `@register_profiler`
2. **Add metadata**: Extend `application_identifiers.py`
3. **New output format**: Modify `main.py` output handling
4. **New deployment**: Add to `deploy/` directory

### Instructions

When the user asks about architecture:
1. Start with the high-level overview above
2. Dive into a specific component if asked
3. Reference actual code files with line numbers
4. Explain data flow through the system
162 changes: 162 additions & 0 deletions .claude/skills/arm-support/SKILL.md
@@ -0,0 +1,162 @@
---
name: arm-support
description: Debug and fix ARM/Aarch64 compatibility issues. Use when the user encounters ARM-specific bugs or needs to add ARM support for a feature.
---

## ARM/Aarch64 Support Guide

**Context from history:** ARM issues are recurring (6+ commits fixing ARM-specific bugs in 3 years).

### Architecture Support Status

| Runtime | x86_64 | Aarch64 |
|---------|--------|---------|
| perf (native, Golang) | ✅ | ✅ |
| Java (async-profiler) | ✅ | ✅ |
| Python (py-spy) | ✅ | ✅ |
| Python (PyPerf eBPF) | ✅ | ❌ |
| Ruby (rbspy) | ✅ | ✅ |
| PHP (phpspy) | ✅ | ✅ (experimental) |
| NodeJS (perf) | ✅ | ✅ |
| .NET (dotnet-trace) | ✅ (experimental) | ✅ (experimental) |

### Common ARM Issues from History

**1. gProfiler ARM Build Failures**
```
# Commit: [Reliability] Fix gProfiler arm (#988)
# Commit: Fixes #977 gprofiler does not work on aarch64 (#978)
```

**2. Architecture Metadata**
```
# Commit: Fix arch metadata with Aarch64 (#837)
# Issue: Incorrect architecture reporting
```

**3. getaddrinfo() EBUSY**
```
# Commit: Fix getaddrinfo() EBUSY on ARM (#825)
# Issue: Network resolution fails intermittently on ARM
```

**4. Perf Build on ARM**
```
# Commit: Perf: fix v6.7 build on aarch64 (#891)
# Issue: perf tool compilation differences
```

### Checking Architecture

```bash
# Check current architecture
uname -m
# x86_64 or aarch64

# Same check from Python
python3 -c 'import platform; print(platform.machine())'
```

### Building for ARM

```bash
# Native ARM build (on ARM host)
./scripts/build_aarch64_container.sh -t gprofiler:arm64
./scripts/build_aarch64_executable.sh

# Cross-build from x86_64 (slow, requires QEMU)
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
docker buildx create --name multiarch --driver docker-container --use
./scripts/build_aarch64_container.sh -t gprofiler:arm64
```

### Testing on ARM

```bash
# Run tests on ARM host
cd tests && sudo python3 -m pytest -v

# Skip x86_64-only tests
cd tests && sudo python3 -m pytest -v -k "not x86_64_only"

# Check for ARM-specific test markers
grep -r "aarch64\|arm64" tests/
```

### ARM-Specific Code Patterns

```python
import platform

def is_aarch64() -> bool:
    return platform.machine() in ("aarch64", "arm64")

# Conditional logic for ARM
if is_aarch64():
    # ARM-specific handling
    pass
```

### Key Files for ARM Support

```
gprofiler/utils/ # Architecture detection utilities
scripts/build_aarch64_* # ARM build scripts
tests/conftest.py # Test fixtures with arch detection
```

### Debugging ARM Issues

```bash
# Check kernel version (ARM may have different capabilities)
uname -r

# Check perf support
perf list

# Check CPU info
head -20 /proc/cpuinfo

# Check if running in container
cat /proc/1/cgroup
```

### PyPerf/eBPF on ARM

**Note:** PyPerf (eBPF-based Python profiler) is NOT supported on ARM.

```python
# From gprofiler/profilers/python_ebpf.py
# PyPerf requires x86_64
if is_aarch64():
    # Fall back to py-spy
    pass
```
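
One way the fallback could be expressed as a pure function is sketched below. The mode names `auto`, `pyperf`, and `pyspy` and the function `select_python_backend` are assumptions for illustration and may not match gProfiler's actual option names.

```python
def select_python_backend(machine: str, requested: str = "auto") -> str:
    """Pick a Python profiler backend for the given platform.machine() value.

    `requested` is a hypothetical mode: "auto", "pyperf", or "pyspy".
    """
    is_arm = machine in ("aarch64", "arm64")
    if requested == "pyspy":
        return "pyspy"
    if requested == "pyperf":
        if is_arm:
            # PyPerf is not supported on ARM, so an explicit request cannot be met.
            raise ValueError("PyPerf is not supported on Aarch64")
        return "pyperf"
    if requested == "auto":
        # Prefer the eBPF profiler where available, otherwise fall back to py-spy.
        return "pyspy" if is_arm else "pyperf"
    raise ValueError(f"unknown mode: {requested!r}")
```

Passing the machine string in, rather than calling `platform.machine()` inside, lets the ARM branch be exercised on an x86_64 development host.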

### CI/CD for ARM

`.github/workflows/build-test-deploy.yml`:
- Separate jobs for x86_64 and aarch64
- ARM builds use different runner labels
- Some tests may be skipped on ARM

### Commit Message Pattern

```
Fix <issue> on ARM/Aarch64 (#PR_NUMBER)
[Reliability] Fix gProfiler arm (#PR_NUMBER)
```

---

## TODO: Skill Content to Add

- [ ] **Add complete ARM compatibility table** - All profilers × ARM status
- [ ] **Add ARM kernel requirements** - Minimum kernel versions for features
- [ ] **Add Graviton-specific notes** - AWS Graviton 2/3 specific issues
- [ ] **Add ARM perf event support** - Which perf events work on ARM
- [ ] **Add ARM Docker tips** - Multi-arch build best practices
- [ ] **Add ARM test infrastructure** - How to run ARM tests locally
- [ ] **Add cross-compile troubleshooting** - Common QEMU issues
- [ ] **Add ARM cloud provider notes** - AWS, GCP, Azure ARM instances
90 changes: 90 additions & 0 deletions .claude/skills/ci-status/SKILL.md
@@ -0,0 +1,90 @@
---
name: ci-status
description: Check CI pipeline status and troubleshoot failures. Use when the user wants to check build status, see why CI failed, or monitor workflow runs.
allowed-tools: Bash(gh *) Bash(git *)
---

## gProfiler CI/CD Status

### Current PR/Branch Status

```!
BRANCH=$(git branch --show-current)
echo "Branch: $BRANCH"
echo ""
# Try to get PR status
gh pr view --json state,statusCheckRollup,url 2>/dev/null || echo "No PR found for this branch"
```

### CI Workflows

| Workflow | Triggers | Purpose |
|----------|----------|---------|
| `linters.yml` | PR, push | Python/Shell/Dockerfile linting |
| `build-test-deploy.yml` | PR, push, schedule | Build & test executables/containers |
| `codeql.yml` | PR, push, schedule | Security scanning |
| `build-base-images.yml` | Manual | Base Docker image builds |

### Check Commands

```bash
# View PR checks
gh pr checks

# Watch checks until complete
gh pr checks --watch

# List recent workflow runs
gh run list --limit 10

# View specific run details
gh run view <run-id>

# View failed run logs
gh run view <run-id> --log-failed

# Re-run failed jobs
gh run rerun <run-id> --failed
```

### Common CI Failures

#### Linting Failures
```bash
# Fix locally
./lint.sh # Auto-format
./lint.sh --ci # Check mode (what CI runs)
mypy . # Type check
```

#### Build Failures
- Check Docker layer caching
- Verify base image availability
- Check for network/download issues

#### Test Failures
```bash
# Run specific failing test locally
sudo python3 -m pytest tests/test_<name>.py -v -k "test_name"
```

### Instructions

1. Check current CI status with the commands above
2. If failed, identify which job failed
3. Fetch logs with `gh run view <id> --log-failed`
4. Suggest fixes based on error messages
5. Help re-run if it was a flaky failure
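
To identify which checks failed (step 2), the JSON from `gh pr view --json statusCheckRollup` can be filtered with a short script. This sketch assumes check runs expose `name` and `conclusion` while commit statuses expose `context` and `state`; confirm the field names against real output before relying on it. `failed_checks` and `FAILED_OUTCOMES` are hypothetical names.

```python
import json
from typing import Any, Dict, List

# Outcomes treated as failures in this sketch (an assumption, adjust as needed).
FAILED_OUTCOMES = {"FAILURE", "TIMED_OUT", "CANCELLED", "ERROR"}


def failed_checks(pr: Dict[str, Any]) -> List[str]:
    """Return the names of checks whose outcome counts as a failure."""
    failed = []
    for check in pr.get("statusCheckRollup") or []:
        name = check.get("name") or check.get("context") or "<unnamed>"
        outcome = str(check.get("conclusion") or check.get("state") or "")
        if outcome.upper() in FAILED_OUTCOMES:
            failed.append(name)
    return failed


# Hand-written sample input, not real CI output.
sample = json.loads(
    '{"statusCheckRollup": ['
    '{"name": "lint", "conclusion": "SUCCESS"},'
    '{"name": "build-executable", "conclusion": "FAILURE"},'
    '{"context": "external/status", "state": "ERROR"}]}'
)
print(failed_checks(sample))
```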

---

## TODO: Skill Content to Add

- [ ] **Add workflow file documentation** - Explain each workflow file
- [ ] **Add job dependency diagram** - Visual CI pipeline flow
- [ ] **Add runner specifications** - What runs on which runner type
- [ ] **Add common failure patterns** - Categorized CI failure examples
- [ ] **Add cache configuration** - How Docker layer caching works
- [ ] **Add artifact retention** - Where build artifacts are stored
- [ ] **Add required checks list** - Which checks must pass for merge
- [ ] **Add workflow dispatch guide** - How to manually trigger workflows