README

PHP SDK for Opik - an LLM observability and evaluation platform.

NOTE: This is a community-maintained SDK, not an official Comet ML product. The official SDKs are available for Python and TypeScript.

SDK Comparison
Installation
Quick Start
Configuration
Features
- Tracing
- Feedback Scores
- Threads
- Datasets
- Experiments
- Prompts
- Attachments
- Evaluation Metrics
- Evaluation Function
API Reference
Development

SDK Comparison

This table compares feature coverage between the official SDKs and this community PHP SDK.

Category	Feature	Python	TypeScript	PHP	Notes
Tracing	Traces & Spans	✅	✅	✅	Full support
	Nested Spans	✅	✅	✅	Full support
	Search (OQL)	✅	✅	✅	Full support
	Span Types	✅	✅	✅	Full support
	Usage Tracking	✅	✅	✅	Full support
	Cost Calculation	✅	✅	✅	User-provided pricing
	`@track` Decorator	✅	✅	❌	PHP lacks decorators
Feedback	Feedback Scores	✅	✅	✅	Full support
	Batch Feedback	✅	✅	✅	Full support
	Threads	✅	❌	✅	Full support
Datasets	CRUD Operations	✅	✅	✅	Full support
	Flexible Schema	✅	✅	✅	Full support
	JSON Import/Export	✅	✅	✅	Full support
Experiments	Create & Manage	✅	✅	✅	Full support
	Log Items	✅	✅	✅	Full support
Prompts	Text Prompts	✅	✅	✅	Full support
	Chat Prompts	✅	✅	✅	Full support
	Version History	✅	✅	✅	Full support
Attachments	Upload/Download	✅	❌	✅	Full support
Evaluation	Heuristic Metrics	✅	✅	✅	ExactMatch, Contains, RegexMatch, IsJson, Equals, LevenshteinRatio
	LLM Judge Metrics	✅	✅	❌	Not implemented
	`evaluate()`	✅	✅	✅	Full support
Integrations	OpenAI	✅	✅	❌	Not implemented
	LangChain	✅	✅	❌	Not implemented
	Other Frameworks	✅	✅	❌	Not implemented
Advanced	Guardrails	✅	❌	❌	Not implemented
	Simulation	✅	❌	❌	Not implemented
	CLI Commands	✅	❌	❌	Not implemented

Coverage Summary

SDK	Core Features	Advanced Features	Overall
Python (Official)	100%	100%	100%
TypeScript (Official)	~90%	~60%	~80%
PHP (Community)	~95%	~25%	~75%

What's Missing in PHP SDK

High Priority (Core Functionality):

- LLM Judge Metrics (AnswerRelevance, Hallucination, etc.)

Medium Priority (Integrations):

- OpenAI integration for automatic tracing
- Other LLM provider integrations

Low Priority (Advanced):

- Guardrails (PII detection, topic filtering)
- Simulation framework
- CLI commands
- Local recording for testing

Contributing

Contributions are welcome! If you'd like to help implement missing features, please see the Development section.

Installation

Requirements: PHP 8.1+, Composer

composer require klipitkas/opik-php

Quick Start

<?php

use Opik\OpikClient;
use Opik\Tracer\SpanType;

$client = new OpikClient();

// Create a trace
$trace = $client->trace(
    name: 'chat-completion',
    input: ['messages' => [['role' => 'user', 'content' => 'Hello!']]],
);

// Create an LLM span within the trace
$span = $trace->span(name: 'openai-call', type: SpanType::LLM);
$span->update(
    output: ['response' => 'Hi there!'],
    model: 'gpt-4',
    provider: 'openai',
    usage: new \Opik\Tracer\Usage(promptTokens: 10, completionTokens: 5, totalTokens: 15),
);
$span->end();

// End trace and flush
$trace->update(output: ['response' => 'Hi there!']);
$trace->end();
$client->flush();

Configuration

Environment Variables

Variable	Description	Required	Default
`OPIK_API_KEY`	API key	Yes (cloud)	-
`OPIK_WORKSPACE`	Workspace name	Yes (cloud)	-
`OPIK_PROJECT_NAME`	Project name	No	`Default Project`
`OPIK_URL_OVERRIDE`	Custom API URL	No	-
`OPIK_DEBUG`	Enable debug mode	No	`false`
`OPIK_ENABLE_COMPRESSION`	Enable gzip compression	No	`true`
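
To make the defaults in the table concrete, here is a minimal sketch of resolving these variables by hand. It is not the SDK's own configuration loader: the function name `readOpikEnv`, the array keys, and the choice to treat an empty value as unset are all assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative only: resolve the documented OPIK_* variables with their
 * documented defaults. Function name and array keys are hypothetical.
 */
function readOpikEnv(): array
{
    // getenv() returns false when a variable is unset; an empty value is
    // treated as unset here (an assumption, not documented SDK behaviour).
    $get = static function (string $name, ?string $default = null): ?string {
        $value = getenv($name);
        return ($value === false || $value === '') ? $default : $value;
    };

    // Accepts "true"/"false"/"1"/"0"/"on"/"off"; anything else keeps the default.
    $bool = static function (string $name, bool $default) use ($get): bool {
        $raw = $get($name);
        if ($raw === null) {
            return $default;
        }
        $parsed = filter_var($raw, FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
        return $parsed ?? $default;
    };

    return [
        'apiKey'      => $get('OPIK_API_KEY'),
        'workspace'   => $get('OPIK_WORKSPACE'),
        'projectName' => $get('OPIK_PROJECT_NAME', 'Default Project'),
        'urlOverride' => $get('OPIK_URL_OVERRIDE'),
        'debug'       => $bool('OPIK_DEBUG', false),
        'compression' => $bool('OPIK_ENABLE_COMPRESSION', true),
    ];
}
```

A helper like this can be handy for checking what a deployment will pick up before constructing the client.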

Setup Methods

# Cloud (recommended)
export OPIK_API_KEY=your-api-key
export OPIK_WORKSPACE=your-workspace
export OPIK_PROJECT_NAME=your-project-name

// From environment (recommended)
$client = new OpikClient();

// Explicit parameters
$client = new OpikClient(
    apiKey: 'your-api-key',
    workspace: 'your-workspace',
    projectName: 'my-project',
);

// Local development
$client = new OpikClient(baseUrl: 'http://localhost:5173/api/');

// Verify credentials
if ($client->authCheck()) {
    echo "Connected!";
}

Features

Tracing

Basic Trace with Spans

$trace = $client->trace(name: 'my-trace', input: ['query' => 'Hello']);

$span = $trace->span(name: 'process', type: SpanType::LLM);
$span->update(output: ['result' => 'Done']);
$span->end();

$trace->end();
$client->flush();

Nested Spans

$trace = $client->trace(name: 'multi-step');
$parent = $trace->span(name: 'parent');

$child1 = $parent->span(name: 'step-1', type: SpanType::TOOL);
$child1->end();

$child2 = $parent->span(name: 'step-2', type: SpanType::LLM);
$child2->end();

$parent->end();
$trace->end();
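
Because every span has to be ended explicitly, a small wrapper can help keep nested spans balanced when the wrapped code throws. The sketch below is a hypothetical helper, not part of the SDK; it assumes only that the parent object exposes `span(name: ...)` and that the returned span exposes `end()`, as in the examples above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the SDK): open a child span on $parent,
 * run $work with that span, and always end the span, even if $work throws.
 */
function withSpan(object $parent, string $name, callable $work): mixed
{
    // Relies only on the span(name: ...) and end() calls shown in this README.
    $span = $parent->span(name: $name);
    try {
        return $work($span);
    } finally {
        $span->end();
    }
}
```

With such a helper, `withSpan($trace, 'parent', fn ($parent) => withSpan($parent, 'step-1', $doWork))` would mirror the nesting shown above, where `$doWork` stands for your own callable.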

Search Traces and Spans

// Search traces with OQL filter
$traces = $client->searchTraces(
    projectName: 'my-project',
    filter: 'name = "chat-completion"',
);

// Get specific trace/span
$trace = $client->getTraceContent('trace-id');
$span = $client->getSpanContent('span-id');

Span Types

Type	Description
`SpanType::GENERAL`	General purpose span
`SpanType::LLM`	LLM API call
`SpanType::TOOL`	Tool/function call
`SpanType::GUARDRAIL`	Guardrail check

Cost Calculation

Calculate and track LLM costs using your own pricing:

use Opik\Cost\CostCalculator;
use Opik\Tracer\Usage;

$usage = new Usage(promptTokens: 1000, completionTokens: 500);

// Using per-million token pricing (common format)
$cost = CostCalculator::calculateFromMillionPricing(
    $usage,
    inputCostPerMillion: 2.50,   // $2.50 per 1M input tokens
    outputCostPerMillion: 10.00, // $10.00 per 1M output tokens
);

// Or using per-token pricing
$cost = CostCalculator::calculate(
    $usage,
    inputCostPerToken: 0.0000025,
    outputCostPerToken: 0.00001,
);

// Attach cost to span
$span->update(totalCost: $cost);
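
For reference, the arithmetic behind per-million pricing can be written out by hand. This is a standalone sketch with a hypothetical function name, not the SDK's `CostCalculator`; it only shows how the example figures above combine.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative arithmetic only; the function name is hypothetical and this
 * is not the SDK's CostCalculator.
 */
function costFromMillionPricing(
    int $promptTokens,
    int $completionTokens,
    float $inputCostPerMillion,
    float $outputCostPerMillion,
): float {
    $inputCost  = $promptTokens * $inputCostPerMillion / 1_000_000;
    $outputCost = $completionTokens * $outputCostPerMillion / 1_000_000;
    return $inputCost + $outputCost;
}

// 1000 prompt tokens at $2.50 per 1M plus 500 completion tokens at $10.00 per 1M:
// 0.0025 + 0.0050 = 0.0075
$cost = costFromMillionPricing(1000, 500, 2.50, 10.00);
printf("%.4f\n", $cost); // prints 0.0075
```

The per-token prices in the second example above are the same rates divided by one million (2.50 / 1,000,000 = 0.0000025), so both calls describe the same pricing.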

Feedback Scores

On Traces and Spans

$trace = $client->trace(name: 'scored-trace');

// Numeric score
$trace->logFeedbackScore(name: 'relevance', value: 0.95, reason: 'Good answer');

// Categorical score
$span = $trace->span(name: 'llm-call', type: SpanType::LLM);
$span->logFeedbackScore(name: 'sentiment', value: 1.0, categoryName: 'positive');

Batch Feedback Scores

use Opik\Feedback\FeedbackScore;

// For traces
$client->logTracesFeedbackScores([
    FeedbackScore::forTrace('trace-1', 'quality', value: 0.9),
    FeedbackScore::forTrace('trace-2', 'quality', value: 0.85, reason: 'Good'),
]);

// For spans
$client->logSpansFeedbackScores([
    FeedbackScore::forSpan('span-1', 'accuracy', value: 0.95),
    FeedbackScore::forSpan('span-2', 'accuracy', categoryName: 'high'),
]);

// Delete feedback scores
$client->deleteTraceFeedbackScore('trace-id', 'quality');
$client->deleteSpanFeedbackScore('span-id', 'accuracy');

Threads

Group related traces into conversations:

use Opik\Feedback\FeedbackScore;

// Create traces in a thread
$trace1 = $client->trace(name: 'user-msg-1', threadId: 'conversation-123');
$trace1->end();

$trace2 = $client->trace(name: 'user-msg-2', threadId: 'conversation-123');
$trace2->end();
$client->flush();

// Close thread before scoring
$client->closeThread('conversation-123');

// Score the thread
$client->logThreadsFeedbackScores([
    FeedbackScore::forThread('conversation-123', 'satisfaction', value: 0.95),
]);

Datasets

Create and Populate

use Opik\Dataset\DatasetItem;

$dataset = $client->getOrCreateDataset(
    name: 'eval-dataset',
    description: 'Test cases',
);

// Standard schema
$dataset->insert([
    new DatasetItem(
        input: ['question' => 'What is PHP?'],
        expectedOutput: ['answer' => 'A programming language'],
        metadata: ['difficulty' => 'easy'],
    ),
]);

// Flexible schema
$dataset->insert([
    new DatasetItem(data: [
        'prompt' => 'Translate: Hello',
        'expected' => 'Bonjour',
    ]),
]);

Read and Manage

// Get items
$items = $dataset->getItems(page: 1, size: 100);
foreach ($items as $item) {
    $input = $item->getInput();
    $output = $item->getExpectedOutput();
}

// Update/delete
$dataset->update($items);
$dataset->delete(['item-id-1', 'item-id-2']);
$dataset->clear(); // Delete all

// List/delete datasets
$datasets = $client->getDatasets();
$client->deleteDataset('dataset-name');

JSON Import/Export

// Import from JSON string
$json = '[{"input": "question 1", "output": "answer 1"}, {"input": "question 2", "output": "answer 2"}]';
$dataset->insertFromJson($json);

// Import with key mapping (rename keys)
$json = '[{"Question": "What is PHP?", "Expected Answer": "A language"}]';
$dataset->insertFromJson($json, keysMapping: [
    'Question' => 'input',
    'Expected Answer' => 'expected_output',
]);

// Import while ignoring certain keys
$dataset->insertFromJson($json, ignoreKeys: ['internal_id', 'debug_info']);

// Export to JSON string
$json = $dataset->toJson();

// Export with key mapping
$json = $dataset->toJson(keysMapping: [
    'input' => 'Question',
    'expected_output' => 'Expected Answer',
]);
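
The idea behind `keysMapping` and `ignoreKeys` can be sketched on plain decoded JSON. The helper below is hypothetical and SDK-free; `insertFromJson()` may differ in details such as the order in which mapping and ignoring are applied.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper showing the idea behind keysMapping and ignoreKeys.
 *
 * @param array<string, string> $keysMapping source key => target key
 * @param list<string>          $ignoreKeys  source keys to drop
 * @return list<array<string, mixed>>
 */
function remapJsonRows(string $json, array $keysMapping = [], array $ignoreKeys = []): array
{
    // JSON_THROW_ON_ERROR turns malformed input into a JsonException.
    $rows = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    $result = [];
    foreach ($rows as $row) {
        $mapped = [];
        foreach ($row as $key => $value) {
            if (in_array($key, $ignoreKeys, true)) {
                continue; // dropped entirely
            }
            // Rename when a mapping exists, otherwise keep the original key.
            $mapped[$keysMapping[$key] ?? $key] = $value;
        }
        $result[] = $mapped;
    }
    return $result;
}

$rows = remapJsonRows(
    '[{"Question": "What is PHP?", "Expected Answer": "A language", "internal_id": 7}]',
    keysMapping: ['Question' => 'input', 'Expected Answer' => 'expected_output'],
    ignoreKeys: ['internal_id'],
);
// $rows[0] is ['input' => 'What is PHP?', 'expected_output' => 'A language']
```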

Experiments

use Opik\Experiment\ExperimentItem;

// Create experiment
$experiment = $client->createExperiment(
    name: 'gpt-4-eval',
    datasetName: 'eval-dataset',
);

// Log results
$experiment->logItems([
    new ExperimentItem(
        datasetItemId: 'item-1',
        traceId: 'trace-1',
        output: ['result' => 'Answer'],
        feedbackScores: [['name' => 'accuracy', 'value' => 0.9]],
    ),
]);

// Manage experiments
$experiment = $client->getExperimentById('experiment-id');
$client->updateExperiment(id: 'experiment-id', name: 'new-name');
$client->deleteExperiment('experiment-name');

Prompts

Opik supports two types of prompts: text prompts (simple string templates) and chat prompts (an array of messages following OpenAI's chat format).

Text Prompts

// Create a text prompt
$prompt = $client->createPrompt(
    name: 'greeting',
    template: 'Hello {{name}}, you asked: {{question}}',
);

// Get and format
$prompt = $client->getPrompt('greeting');
$text = $prompt->format(['name' => 'John', 'question' => 'How are you?']);
// Returns: "Hello John, you asked: How are you?"
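
To illustrate what `{{placeholder}}` substitution involves, here is a minimal standalone sketch. It is a hypothetical stand-in, not the SDK's `format()` implementation; in particular, leaving unknown placeholders untouched is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for prompt formatting: replaces each {{key}}
 * placeholder with the matching value and leaves unknown placeholders as-is.
 */
function renderTemplate(string $template, array $variables): string
{
    $rendered = preg_replace_callback(
        '/\{\{\s*(\w+)\s*\}\}/',
        static fn (array $m): string => array_key_exists($m[1], $variables)
            ? (string) $variables[$m[1]]
            : $m[0],
        $template,
    );

    // preg_replace_callback() returns null only on a regex engine error.
    return $rendered ?? $template;
}

echo renderTemplate(
    'Hello {{name}}, you asked: {{question}}',
    ['name' => 'John', 'question' => 'How are you?'],
), "\n";
// prints "Hello John, you asked: How are you?"
```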

Chat Prompts

use Opik\Prompt\ChatMessage;

// Create a chat prompt with messages array
$prompt = $client->createPrompt(
    name: 'assistant-prompt',
    template: [
        ChatMessage::system('You are a helpful assistant specializing in {{domain}}.'),
        ChatMessage::user('{{question}}'),
    ],
);

// Format returns array of messages
$messages = $prompt->format(['domain' => 'physics', 'question' => 'What is gravity?']);
// Returns:
// [
//     ['role' => 'system', 'content' => 'You are a helpful assistant specializing in physics.'],
//     ['role' => 'user', 'content' => 'What is gravity?'],
// ]

ChatMessage Factory Methods

Method	Description
`ChatMessage::system($content)`	Create a system message
`ChatMessage::user($content)`	Create a user message
`ChatMessage::assistant($content)`	Create an assistant message
`ChatMessage::tool($content)`	Create a tool message

Prompt Versions

// Get version history
$history = $client->getPromptHistory('greeting');

// Get specific version
$version = $prompt->getVersion('commit-hash');

// Check prompt type
if ($version->isChat()) {
    $messages = $version->format($variables);
} else {
    $text = $version->format($variables);
}

Delete Prompts

$client->deletePrompts(['prompt-id-1', 'prompt-id-2']);

Attachments

Upload files to traces or spans:

use Opik\Attachment\AttachmentEntityType;

$attachmentClient = $client->getAttachmentClient();

// Upload
$attachmentClient->uploadAttachment(
    projectName: 'my-project',
    entityType: AttachmentEntityType::TRACE,
    entityId: $trace->getId(),
    filePath: '/path/to/file.pdf',
);

// List
$attachments = $attachmentClient->getAttachmentList(
    projectName: 'my-project',
    entityType: AttachmentEntityType::TRACE,
    entityId: $trace->getId(),
);

// Download
$content = $attachmentClient->downloadAttachment(
    projectName: 'my-project',
    entityType: AttachmentEntityType::TRACE,
    entityId: $trace->getId(),
    fileName: 'file.pdf',
    mimeType: 'application/pdf',
);

Evaluation Metrics

The SDK provides heuristic metrics for evaluating LLM outputs:

use Opik\Evaluation\Metrics\ExactMatch;
use Opik\Evaluation\Metrics\Contains;
use Opik\Evaluation\Metrics\RegexMatch;
use Opik\Evaluation\Metrics\IsJson;

// ExactMatch - checks for exact equality
$metric = new ExactMatch();
$result = $metric->score([
    'output' => 'hello world',
    'expected' => 'hello world',
]);
echo $result->value; // 1.0 (match) or 0.0 (no match)

// Contains - checks if output contains expected substring
$metric = new Contains(caseSensitive: false);
$result = $metric->score([
    'output' => 'Hello World',
    'expected' => 'hello',
]);
echo $result->value; // 1.0

// RegexMatch - checks if output matches a regex pattern
$metric = new RegexMatch();
$result = $metric->score([
    'output' => 'Contact: test@example.com',
    'pattern' => '/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/',
]);
echo $result->value; // 1.0

// IsJson - checks if output is valid JSON
$metric = new IsJson();
$result = $metric->score([
    'output' => '{"key": "value"}',
]);
echo $result->value; // 1.0

Available Metrics

Metric	Description
`ExactMatch`	Checks if output exactly equals expected (strict comparison)
`Contains`	Checks if output contains expected substring (supports case-insensitive)
`RegexMatch`	Checks if output matches a regex pattern
`IsJson`	Checks if output is valid JSON
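
Conceptually, each of these checks reduces to a small predicate that yields 1.0 or 0.0. The functions below are a hypothetical, SDK-free sketch of that idea; the names, signatures, and the case-sensitive default are assumptions and do not describe the metric classes themselves.

```php
<?php
declare(strict_types=1);

// Hypothetical plain-function versions of the four checks; each returns
// 1.0 on success and 0.0 otherwise.

function exactMatch(string $output, string $expected): float
{
    return $output === $expected ? 1.0 : 0.0;
}

function containsText(string $output, string $expected, bool $caseSensitive = true): float
{
    $found = $caseSensitive
        ? str_contains($output, $expected)
        : stripos($output, $expected) !== false;
    return $found ? 1.0 : 0.0;
}

function regexMatch(string $output, string $pattern): float
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $output) === 1 ? 1.0 : 0.0;
}

function isJson(string $output): float
{
    json_decode($output);
    return json_last_error() === JSON_ERROR_NONE ? 1.0 : 0.0;
}
```

Writing a check as a pure function of the output and the expected value keeps it easy to unit test before wiring it into an evaluation run.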

Evaluation Function

Run evaluations against datasets with automatic experiment tracking:

use Opik\Dataset\DatasetItem;
use Opik\Evaluation\Metrics\ExactMatch;
use Opik\Evaluation\Metrics\Contains;

// Get or create a dataset
$dataset = $client->getOrCreateDataset('qa-dataset');
$dataset->insert([
    new DatasetItem(data: [
        'input' => 'What is PHP?',
        'expected' => 'programming language',
    ]),
    new DatasetItem(data: [
        'input' => 'What is Python?',
        'expected' => 'programming language',
    ]),
]);

// Define your task function ($llm is a placeholder for your own LLM client)
$task = function (array $item) use ($llm): array {
    // Your LLM call or processing logic here
    $response = $llm->complete($item['input']);
    return ['output' => $response];
};

// Run evaluation
$result = $client->evaluate(
    dataset: $dataset,
    task: $task,
    scoringMetrics: [
        new ExactMatch(),
        new Contains(),
    ],
    experimentName: 'my-evaluation',
);

// Access results
echo "Evaluated {$result->count()} items in {$result->durationSeconds}s\n";
echo "Average exact_match: {$result->getAverageScore('exact_match')}\n";
echo "Average contains: {$result->getAverageScore('contains')}\n";

// Get all average scores
$averages = $result->getAverageScores();
foreach ($averages as $metric => $score) {
    echo "{$metric}: {$score}\n";
}

The `evaluate()` method:

- Creates an experiment for tracking results
- Runs the task function on each dataset item
- Calculates scores using the provided metrics
- Logs feedback scores to traces
- Returns detailed results with averages
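
The run-score-average part of these steps can be outlined without the SDK. The sketch below is hypothetical: it creates no experiment and logs nothing, and the function name, parameter names, and the choice to merge the task output into each item before scoring are assumptions for illustration only.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical, SDK-free outline of an evaluation loop: it only runs the
 * task on each item and averages the metric scores.
 *
 * @param list<array<string, mixed>>            $items
 * @param callable(array): array                $task
 * @param array<string, callable(array): float> $metrics name => scorer
 * @return array<string, float> average score per metric
 */
function runEvaluation(array $items, callable $task, array $metrics): array
{
    $totals = array_fill_keys(array_keys($metrics), 0.0);

    foreach ($items as $item) {
        // Merge the task output into the item so scorers see both.
        $scored = array_merge($item, $task($item));
        foreach ($metrics as $name => $scorer) {
            $totals[$name] += $scorer($scored);
        }
    }

    $count = count($items);
    return array_map(
        static fn (float $total): float => $count > 0 ? $total / $count : 0.0,
        $totals,
    );
}

// A case-insensitive substring check used as the only metric.
$contains = static fn (array $row): float =>
    stripos($row['output'], $row['expected']) !== false ? 1.0 : 0.0;

$averages = runEvaluation(
    items: [
        ['input' => 'What is PHP?', 'expected' => 'programming language'],
        ['input' => 'What is Python?', 'expected' => 'snake'],
    ],
    task: static fn (array $item): array => ['output' => 'A programming language'],
    metrics: ['contains' => $contains],
);
// $averages is ['contains' => 0.5]
```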

API Reference

OpikClient Methods

Category	Method	Description
Tracing	`trace(...)`	Create a trace
	`span(...)`	Create a standalone span
	`searchTraces(...)`	Search traces with OQL
	`searchSpans(...)`	Search spans with OQL
	`getTraceContent(id)`	Get trace by ID
	`getSpanContent(id)`	Get span by ID
Feedback	`logTracesFeedbackScores(scores)`	Batch log trace scores
	`logSpansFeedbackScores(scores)`	Batch log span scores
	`logThreadsFeedbackScores(scores)`	Batch log thread scores
	`deleteTraceFeedbackScore(id, name)`	Delete trace score
	`deleteSpanFeedbackScore(id, name)`	Delete span score
Threads	`closeThread(id)`	Close a thread
	`closeThreads(ids)`	Close multiple threads
Datasets	`getDataset(name)`	Get dataset
	`getDatasets()`	List datasets
	`createDataset(name)`	Create dataset
	`getOrCreateDataset(name)`	Get or create dataset
	`deleteDataset(name)`	Delete dataset
Experiments	`createExperiment(name, datasetName)`	Create experiment
	`getExperiment(name)`	Get by name
	`getExperimentById(id)`	Get by ID
	`updateExperiment(id, ...)`	Update experiment
	`deleteExperiment(name)`	Delete experiment
Prompts	`createPrompt(name, template)`	Create text or chat prompt
	`getPrompt(name)`	Get prompt
	`getPrompts()`	List prompts
	`getPromptHistory(name)`	Get versions
	`deletePrompts(ids)`	Delete prompts
Attachments	`getAttachmentClient()`	Get attachment client
Evaluation	`evaluate(dataset, task, ...)`	Run evaluation with metrics
Utilities	`authCheck()`	Verify credentials
	`flush()`	Send pending data
	`getConfig()`	Get configuration
	`getProjectUrl()`	Get project URL

Trace Methods

Method	Description
`span(name, type?, ...)`	Create child span
`update(output?, ...)`	Update trace data
`end()`	End the trace
`logFeedbackScore(name, value, ...)`	Log feedback score
`getId()`	Get trace ID

Span Methods

Method	Description
`span(name, type?, ...)`	Create child span
`update(output?, model?, usage?, ...)`	Update span data
`end()`	End the span
`logFeedbackScore(name, value, ...)`	Log feedback score
`getId()`	Get span ID

Development

# Install dependencies
composer install

# Run tests
composer test

# Run with coverage (requires pcov/xdebug)
composer test:coverage

# Static analysis
composer analyse

# Code formatting
composer format
composer format:check

License

MIT

Trademarks

Opik and Comet ML are trademarks of Comet ML, Inc. This project is not affiliated with, endorsed by, or sponsored by Comet ML, Inc.
