Skip to content

Latest commit

 

History

History
634 lines (489 loc) · 17.4 KB

File metadata and controls

634 lines (489 loc) · 17.4 KB

AI-Powered Database Analysis

PDOdb provides AI-powered database analysis capabilities, allowing you to get intelligent recommendations for query optimization, schema improvements, and performance tuning using various AI providers (OpenAI, Anthropic, Google, Microsoft, Ollama).

Table of Contents

Overview

The AI analysis feature combines traditional database analysis (EXPLAIN plans, schema inspection) with AI-powered recommendations to provide comprehensive optimization suggestions. This helps developers:

  • Optimize SQL queries - Get AI suggestions for improving query performance
  • Improve database schema - Receive recommendations for indexes, constraints, and table structure
  • Understand query execution - Combine EXPLAIN analysis with AI explanations
  • Get optimization suggestions - Receive actionable recommendations based on your database structure

Supported AI Providers

PDOdb supports multiple AI providers, each with their own strengths:

Provider Models Best For API Key Required
OpenAI gpt-4o-mini, gpt-4, gpt-3.5-turbo General analysis, fast responses Yes
Anthropic claude-3-5-sonnet, claude-3-opus Detailed analysis, long context Yes
Google gemini-2.5-flash, gemini-2.5-pro, gemini-2.0-flash-001, gemini-flash-latest, gemini-pro-latest Multimodal analysis, large context Yes
Microsoft Azure OpenAI models Enterprise environments Yes
DeepSeek deepseek-chat, deepseek-reasoner Cost-effective, OpenAI-compatible Yes
Yandex gpt-oss-120b/latest Russian market, Yandex Cloud Yes (API key + folder ID)
Ollama Any local model (llama2, deepseek-coder, etc.) Privacy, offline use, no API costs No

Configuration

Environment Variables

Configure AI providers using environment variables:

# Default provider
export PDODB_AI_PROVIDER=openai

# OpenAI
export PDODB_AI_OPENAI_KEY=sk-...

# Anthropic
export PDODB_AI_ANTHROPIC_KEY=sk-ant-...

# Google
export PDODB_AI_GOOGLE_KEY=...
export PDODB_AI_GOOGLE_MODEL=gemini-2.5-flash  # Optional: gemini-2.5-pro, gemini-2.0-flash-001, gemini-flash-latest, gemini-pro-latest

# Microsoft Azure OpenAI
export PDODB_AI_MICROSOFT_KEY=...
export PDODB_AI_MICROSOFT_MODEL=gpt-4  # Optional: model name
# Also configure endpoint via config array (see below)

# DeepSeek
export PDODB_AI_DEEPSEEK_KEY=...
export PDODB_AI_DEEPSEEK_MODEL=deepseek-chat  # Optional: deepseek-reasoner (thinking mode)

# Yandex Cloud
export PDODB_AI_YANDEX_KEY=...
export PDODB_AI_YANDEX_FOLDER_ID=b1ge9k4rdlck8g72slht  # Required: Yandex Cloud folder ID
export PDODB_AI_YANDEX_MODEL=gpt-oss-120b/latest  # Optional: model name

# Ollama (local, no API key needed)
export PDODB_AI_OLLAMA_URL=http://localhost:11434
export PDODB_AI_OLLAMA_MODEL=llama3.2  # Optional: model name

# OpenAI
export PDODB_AI_OPENAI_MODEL=gpt-4o-mini  # Optional: gpt-4, gpt-3.5-turbo
export PDODB_AI_OPENAI_TEMPERATURE=0.7  # Optional: 0.0-2.0, default: 0.7
export PDODB_AI_OPENAI_MAX_TOKENS=2000  # Optional: default: 2000
export PDODB_AI_OPENAI_TIMEOUT=30  # Optional: seconds, default: 30

# Anthropic
export PDODB_AI_ANTHROPIC_MODEL=claude-3-5-sonnet-20241022  # Optional: claude-3-opus
export PDODB_AI_ANTHROPIC_TEMPERATURE=0.7  # Optional: 0.0-2.0, default: 0.7
export PDODB_AI_ANTHROPIC_MAX_TOKENS=2000  # Optional: default: 2000
export PDODB_AI_ANTHROPIC_TIMEOUT=30  # Optional: seconds, default: 30

# Google
export PDODB_AI_GOOGLE_TEMPERATURE=0.7  # Optional: 0.0-2.0, default: 0.7
export PDODB_AI_GOOGLE_MAX_TOKENS=8192  # Optional: default: 8192
export PDODB_AI_GOOGLE_TIMEOUT=30  # Optional: seconds, default: 30

# Microsoft
export PDODB_AI_MICROSOFT_TEMPERATURE=0.7  # Optional: 0.0-2.0, default: 0.7
export PDODB_AI_MICROSOFT_MAX_TOKENS=2000  # Optional: default: 2000
export PDODB_AI_MICROSOFT_TIMEOUT=30  # Optional: seconds, default: 30

# DeepSeek
export PDODB_AI_DEEPSEEK_TEMPERATURE=0.7  # Optional: 0.0-2.0, default: 0.7
export PDODB_AI_DEEPSEEK_MAX_TOKENS=2000  # Optional: default: 2000
export PDODB_AI_DEEPSEEK_TIMEOUT=30  # Optional: seconds, default: 30

# Yandex
export PDODB_AI_YANDEX_TEMPERATURE=0.7  # Optional: 0.0-2.0, default: 0.7
export PDODB_AI_YANDEX_MAX_TOKENS=4000  # Optional: default: 4000
export PDODB_AI_YANDEX_TIMEOUT=30  # Optional: seconds, default: 30

# Ollama
export PDODB_AI_OLLAMA_TEMPERATURE=0.7  # Optional: 0.0-2.0, default: 0.7
export PDODB_AI_OLLAMA_MAX_TOKENS=2000  # Optional: default: 2000
export PDODB_AI_OLLAMA_TIMEOUT=30  # Optional: seconds, default: 30

Configuration Array

You can also configure AI providers in your application config:

$db = new PdoDb('mysql', [
    'host' => 'localhost',
    'dbname' => 'mydb',
    'username' => 'user',
    'password' => 'pass',
    'ai' => [
        'provider' => 'openai',
        'openai_key' => 'sk-...',
        'anthropic_key' => 'sk-ant-...',
        'google_key' => '...',
        'microsoft_key' => '...',
        'deepseek_key' => '...',
        'yandex_key' => '...',
        'yandex_folder_id' => 'b1ge9k4rdlck8g72slht',  # Optional: can also be in providers.yandex.folder_id
        'ollama_url' => 'http://localhost:11434',
        'providers' => [
            'openai' => [
                'model' => 'gpt-4o-mini',
                'temperature' => 0.7,
                'max_tokens' => 2000,
                'timeout' => 30,
            ],
            'anthropic' => [
                'model' => 'claude-3-5-sonnet-20241022',
                'temperature' => 0.7,
                'max_tokens' => 2000,
                'timeout' => 30,
            ],
            'google' => [
                'model' => 'gemini-2.5-flash',  # or gemini-2.5-pro, gemini-2.0-flash-001, gemini-flash-latest, gemini-pro-latest
                'temperature' => 0.7,
                'max_tokens' => 8192,
                'timeout' => 30,
            ],
            'microsoft' => [
                'endpoint' => 'https://your-resource.openai.azure.com',
                'deployment' => 'gpt-4',
                'temperature' => 0.7,
                'max_tokens' => 2000,
                'timeout' => 30,
            ],
            'deepseek' => [
                'model' => 'deepseek-chat',  # or deepseek-reasoner (thinking mode)
                'temperature' => 0.7,
                'max_tokens' => 2000,
                'timeout' => 30,
            ],
            'yandex' => [
                'folder_id' => 'b1ge9k4rdlck8g72slht',  # Required: Yandex Cloud folder ID
                'model' => 'gpt-oss-120b/latest',  # Optional: model name
                'temperature' => 0.7,
                'max_tokens' => 4000,
                'timeout' => 30,
            ],
            'ollama' => [
                'model' => 'llama3.2',  # or any local model name
                'temperature' => 0.7,
                'max_tokens' => 2000,
                'timeout' => 30,
            ],
        ],
    ],
]);

Available Google Gemini Models

Google provides multiple model variants. Recommended models:

  • gemini-2.5-flash (default) - Stable, fast, versatile model with 1M token context
  • gemini-2.5-pro - Best for complex tasks, 1M token context, 65K output tokens
  • gemini-2.0-flash-001 - Stable version of Gemini 2.0 Flash (January 2025)
  • gemini-flash-latest - Always uses the latest Flash model
  • gemini-pro-latest - Always uses the latest Pro model

For a complete list of available models, use the provided script:

php check-google-models.php

This script will show all models that support generateContent with their token limits and descriptions.

Priority

Configuration priority (highest to lowest):

  1. Environment variables
  2. Config array
  3. Default values

Code-Level Integration

Using explainAiAdvice()

The explainAiAdvice() method extends the standard explainAdvice() with AI-powered recommendations:

use tommyknocker\pdodb\PdoDb;

$db = PdoDb::fromEnv();

// Get AI-enhanced analysis
$result = $db->find()
    ->from('users')
    ->where('email', 'user@example.com')
    ->explainAiAdvice(
        tableName: 'users',
        provider: 'openai',
        options: [
            'temperature' => 0.5,
            'max_tokens' => 2000,
        ]
    );

// Access base analysis (traditional EXPLAIN)
$baseAnalysis = $result->baseAnalysis;
echo "Issues found: " . count($baseAnalysis->issues) . "\n";
echo "Recommendations: " . count($baseAnalysis->recommendations) . "\n";

// Access AI analysis
echo "AI Analysis:\n";
echo $result->aiAnalysis . "\n";

// Provider information
echo "Provider: {$result->provider}\n";
echo "Model: {$result->model}\n";

Using AiAnalysisService Directly

For more control, use AiAnalysisService directly:

use tommyknocker\pdodb\PdoDb;
use tommyknocker\pdodb\ai\AiAnalysisService;

$db = PdoDb::fromEnv();

$aiService = new AiAnalysisService($db);

// Analyze a SQL query
$sql = "SELECT * FROM users WHERE email = ? AND status = 'active'";
$analysis = $aiService->analyzeQuery(
    sql: $sql,
    tableName: 'users',
    provider: 'anthropic',
    options: ['max_tokens' => 3000]
);

echo $analysis;

// Analyze database schema
$schemaAnalysis = $aiService->analyzeSchema(
    tableName: 'users',
    provider: 'openai'
);

echo $schemaAnalysis;

// Get optimization suggestions based on existing analysis
$baseAnalysis = [
    'issues' => [...],
    'recommendations' => [...],
];

$suggestions = $aiService->suggestOptimizations(
    analysis: $baseAnalysis,
    context: ['table_stats' => [...]],
    provider: 'google'
);

CLI Commands

pdodb ai query

Analyze query using explainAiAdvice() (includes both base and AI analysis):

pdodb ai query "SELECT * FROM orders o JOIN users u ON o.user_id = u.id" \
    --provider=anthropic \
    --table=orders

pdodb ai schema

Analyze database schema with AI:

# Analyze specific table
pdodb ai schema --table=users --provider=openai

# Analyze all tables
pdodb ai schema --provider=ollama --format=json

pdodb ai optimize

Get AI-powered optimization suggestions:

pdodb ai optimize --table=users --provider=openai --temperature=0.5

MCP Server

PDOdb includes an MCP (Model Context Protocol) server for integration with AI agents and IDEs that support MCP.

Starting the MCP Server

vendor/bin/pdodb-mcp

The server reads database configuration from environment variables (same as PdoDb::fromEnv()).

Available Tools

The MCP server exposes the following tools:

  1. get_schema - Get database schema information

    {
      "name": "get_schema",
      "arguments": {
        "table": "users"  // optional, omit for all tables
      }
    }
  2. explain_plan - Analyze SQL query with EXPLAIN plan and AI

    {
      "name": "explain_plan",
      "arguments": {
        "sql": "SELECT * FROM users WHERE id = 1",
        "table": "users",
        "provider": "openai"
      }
    }
  3. suggest_indexes - Get AI-powered index suggestions

    {
      "name": "suggest_indexes",
      "arguments": {
        "table": "users",
        "provider": "anthropic"
      }
    }
  4. explain_plan - Get AI analysis of query execution plan

    {
      "name": "explain_plan",
      "arguments": {
        "sql": "SELECT * FROM users",
        "table": "users",
        "provider": "openai"
      }
    }

Resources

The server also exposes database tables as resources:

  • table://users - Schema information for the users table
  • table://orders - Schema information for the orders table

Prompts

The server provides prompt templates:

  • explain_plan - Template for analyzing SQL queries with EXPLAIN plan
  • optimize_schema - Template for schema optimization

Examples

Example 1: Optimize a Slow Query

$db = PdoDb::fromEnv();

// Get AI analysis for a slow query
$result = $db->find()
    ->from('orders')
    ->join('users', 'orders.user_id = users.id')
    ->where('orders.status', 'pending')
    ->orderBy('orders.created_at', 'DESC')
    ->limit(100)
    ->explainAiAdvice(
        tableName: 'orders',
        provider: 'openai'
    );

// Check for critical issues
if ($result->baseAnalysis->hasCriticalIssues()) {
    echo "⚠️ Critical issues found!\n";
    foreach ($result->baseAnalysis->issues as $issue) {
        if ($issue->severity === 'critical') {
            echo "- {$issue->message}\n";
        }
    }
}

// Read AI recommendations
echo "\n🤖 AI Recommendations:\n";
echo $result->aiAnalysis . "\n";

Example 2: Schema Optimization

use tommyknocker\pdodb\ai\AiAnalysisService;

$db = PdoDb::fromEnv();
$aiService = new AiAnalysisService($db);

// Analyze entire schema
$analysis = $aiService->analyzeSchema(
    tableName: null,  // null = all tables
    provider: 'anthropic'
);

echo "Schema Analysis:\n";
echo $analysis . "\n";

Example 3: Using Ollama (Local, No API Key)

$db = new PdoDb('mysql', [
    'host' => 'localhost',
    'dbname' => 'mydb',
    'username' => 'user',
    'password' => 'pass',
    'ai' => [
        'provider' => 'ollama',
        'ollama_url' => 'http://localhost:11434',
        'providers' => [
            'ollama' => [
                'model' => 'deepseek-coder:6.7b',
                'max_tokens' => 1000,
            ],
        ],
    ],
]);

$result = $db->find()
    ->from('users')
    ->where('status', 'active')
    ->explainAiAdvice(provider: 'ollama');

echo $result->aiAnalysis;

Example 4: CLI Usage

# Get schema optimization suggestions
pdodb ai schema --table=orders --provider=anthropic

# Analyze with custom model and temperature
pdodb ai query "SELECT COUNT(*) FROM users" \
    --provider=openai \
    --model=gpt-4 \
    --temperature=0.3 \
    --max-tokens=1500

Best Practices

1. Choose the Right Provider

  • OpenAI (gpt-4o-mini): Fast, cost-effective for general analysis
  • Anthropic (claude-3-5-sonnet): Best for detailed, comprehensive analysis
  • Google (gemini-pro): Good balance of speed and quality
  • Ollama: Use for privacy-sensitive environments or when API costs are a concern

2. Provide Context

Always provide table names when analyzing queries:

// Good - provides context
$result = $db->find()
    ->from('users')
    ->where('email', 'user@example.com')
    ->explainAiAdvice(tableName: 'users');

// Less optimal - no table context
$result = $db->rawQuery('SELECT * FROM users WHERE email = ?')
    ->explainAiAdvice();

3. Combine with Base Analysis

AI analysis works best when combined with traditional EXPLAIN analysis:

$result = $db->find()
    ->from('orders')
    ->explainAiAdvice(tableName: 'orders');

// Check base analysis first
if ($result->baseAnalysis->hasCriticalIssues()) {
    // Handle critical issues
}

// Then review AI suggestions
echo $result->aiAnalysis;

4. Adjust Parameters for Your Use Case

  • Temperature: Lower (0.3-0.5) for precise, factual analysis. Higher (0.7-1.0) for creative suggestions.
  • Max Tokens: Increase for complex schemas or detailed analysis (2000-4000).

5. Cache AI Responses (Optional)

For frequently analyzed queries, consider caching AI responses:

// Simple caching example
$cacheKey = 'ai_analysis_' . md5($sql . $tableName);
$cached = $cache->get($cacheKey);

if ($cached === null) {
    $result = $db->find()->from('users')->explainAiAdvice();
    $cache->set($cacheKey, $result->aiAnalysis, 3600); // 1 hour
    $cached = $result->aiAnalysis;
}

6. Use MCP Server for IDE Integration

Integrate with AI-powered IDEs (Cursor, VS Code with MCP extensions) for real-time database analysis:

{
  "mcpServers": {
    "pdodb": {
      "command": "vendor/bin/pdodb-mcp",
      "env": {
        "PDODB_DRIVER": "mysql",
        "PDODB_HOST": "localhost",
        "PDODB_DATABASE": "mydb",
        "PDODB_USERNAME": "user",
        "PDODB_PASSWORD": "pass"
      }
    }
  }
}

Troubleshooting

Provider Not Available

If a provider is not available, check:

  1. API key is set correctly (for cloud providers)
  2. Ollama server is running (for Ollama)
  3. Network connectivity (for cloud providers)
use tommyknocker\pdodb\ai\providers\OpenAiProvider;
use tommyknocker\pdodb\ai\AiConfig;

$config = new AiConfig();
$provider = new OpenAiProvider($config);

if (!$provider->isAvailable()) {
    echo "OpenAI provider is not available. Check PDODB_AI_OPENAI_KEY.\n";
}

Model Not Found (Ollama)

If using Ollama and getting "model not found" errors:

# List available models
curl http://localhost:11434/api/tags

# Pull a model if needed
ollama pull deepseek-coder:6.7b

Rate Limiting

Cloud providers may rate limit requests. Consider:

  • Using Ollama for local development
  • Implementing request throttling
  • Caching AI responses

Cost Management

To manage API costs:

  • Use Ollama for local development
  • Use smaller models (gpt-4o-mini instead of gpt-4)
  • Reduce max_tokens for simpler queries
  • Cache AI responses for repeated queries

See Also