klipitkas / opik-php
Opik PHP SDK - LLM observability and evaluation platform
pkg:composer/klipitkas/opik-php
Requires
- php: ^8.1
- guzzlehttp/guzzle: ^7.9
- psr/log: ^3.0
- ramsey/uuid: ^4.7
Requires (Dev)
- friendsofphp/php-cs-fixer: ^3.64
- phpstan/phpstan: ^2.0
- phpunit/phpunit: ^10.5 || ^11.0
README
PHP SDK for Opik - an LLM observability and evaluation platform.
NOTE: This is a community-maintained SDK, not an official Comet ML product. For official SDKs, see Python and TypeScript.
SDK Comparison
This table compares feature coverage between the official SDKs and this community PHP SDK.
| Category | Feature | Python | TypeScript | PHP | Notes |
|---|---|---|---|---|---|
| Tracing | Traces & Spans | ✅ | ✅ | ✅ | Full support |
| Tracing | Nested Spans | ✅ | ✅ | ✅ | Full support |
| Tracing | Search (OQL) | ✅ | ✅ | ✅ | Full support |
| Tracing | Span Types | ✅ | ✅ | ✅ | Full support |
| Tracing | Usage Tracking | ✅ | ✅ | ✅ | Full support |
| Tracing | Cost Calculation | ✅ | ✅ | ✅ | User-provided pricing |
| Tracing | `@track` Decorator | ✅ | ✅ | ❌ | PHP lacks decorators |
| Feedback | Feedback Scores | ✅ | ✅ | ✅ | Full support |
| Feedback | Batch Feedback | ✅ | ✅ | ✅ | Full support |
| Feedback | Threads | ✅ | ❌ | ✅ | Full support |
| Datasets | CRUD Operations | ✅ | ✅ | ✅ | Full support |
| Datasets | Flexible Schema | ✅ | ✅ | ✅ | Full support |
| Datasets | JSON Import/Export | ✅ | ✅ | ✅ | Full support |
| Experiments | Create & Manage | ✅ | ✅ | ✅ | Full support |
| Experiments | Log Items | ✅ | ✅ | ✅ | Full support |
| Prompts | Text Prompts | ✅ | ✅ | ✅ | Full support |
| Prompts | Chat Prompts | ✅ | ✅ | ✅ | Full support |
| Prompts | Version History | ✅ | ✅ | ✅ | Full support |
| Attachments | Upload/Download | ✅ | ❌ | ✅ | Full support |
| Evaluation | Heuristic Metrics | ✅ | ✅ | ✅ | ExactMatch, Contains, RegexMatch, IsJson, Equals, LevenshteinRatio |
| Evaluation | LLM Judge Metrics | ✅ | ✅ | ❌ | Not implemented |
| Evaluation | `evaluate()` | ✅ | ✅ | ✅ | Full support |
| Integrations | OpenAI | ✅ | ✅ | ❌ | Not implemented |
| Integrations | LangChain | ✅ | ✅ | ❌ | Not implemented |
| Integrations | Other Frameworks | ✅ | ✅ | ❌ | Not implemented |
| Advanced | Guardrails | ✅ | ❌ | ❌ | Not implemented |
| Advanced | Simulation | ✅ | ❌ | ❌ | Not implemented |
| Advanced | CLI Commands | ✅ | ❌ | ❌ | Not implemented |
Coverage Summary
| SDK | Core Features | Advanced Features | Overall |
|---|---|---|---|
| Python (Official) | 100% | 100% | 100% |
| TypeScript (Official) | ~90% | ~60% | ~80% |
| PHP (Community) | ~95% | ~25% | ~75% |
What's Missing in PHP SDK
High Priority (Core Functionality):
- LLM Judge Metrics (AnswerRelevance, Hallucination, etc.)
Medium Priority (Integrations):
- OpenAI integration for automatic tracing
- Other LLM provider integrations
Low Priority (Advanced):
- Guardrails (PII detection, topic filtering)
- Simulation framework
- CLI commands
- Local recording for testing
Contributing
Contributions are welcome! If you'd like to help implement missing features, please see the Development section.
Installation
Requirements: PHP 8.1+, Composer

    composer require klipitkas/opik-php

Quick Start

    <?php

    use Opik\OpikClient;
    use Opik\Tracer\SpanType;

    $client = new OpikClient();

    // Create a trace
    $trace = $client->trace(
        name: 'chat-completion',
        input: ['messages' => [['role' => 'user', 'content' => 'Hello!']]],
    );

    // Create an LLM span within the trace
    $span = $trace->span(name: 'openai-call', type: SpanType::LLM);
    $span->update(
        output: ['response' => 'Hi there!'],
        model: 'gpt-4',
        provider: 'openai',
        usage: new \Opik\Tracer\Usage(promptTokens: 10, completionTokens: 5, totalTokens: 15),
    );
    $span->end();

    // End trace and flush
    $trace->update(output: ['response' => 'Hi there!']);
    $trace->end();
    $client->flush();

Configuration
Environment Variables
| Variable | Description | Required | Default |
|---|---|---|---|
| `OPIK_API_KEY` | API key | Yes (cloud) | - |
| `OPIK_WORKSPACE` | Workspace name | Yes (cloud) | - |
| `OPIK_PROJECT_NAME` | Project name | No | Default Project |
| `OPIK_URL_OVERRIDE` | Custom API URL | No | - |
| `OPIK_DEBUG` | Enable debug mode | No | false |
| `OPIK_ENABLE_COMPRESSION` | Enable gzip compression | No | true |
Setup Methods

    # Cloud (recommended)
    export OPIK_API_KEY=your-api-key
    export OPIK_WORKSPACE=your-workspace
    export OPIK_PROJECT_NAME=your-project-name

In PHP, construct the client in one of the following ways:

    // From environment (recommended)
    $client = new OpikClient();

    // Explicit parameters
    $client = new OpikClient(
        apiKey: 'your-api-key',
        workspace: 'your-workspace',
        projectName: 'my-project',
    );

    // Local development
    $client = new OpikClient(baseUrl: 'http://localhost:5173/api/');

    // Verify credentials
    if ($client->authCheck()) {
        echo "Connected!";
    }

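How explicit parameters, environment variables, and the documented defaults might combine can be sketched in plain PHP. This is an illustration only, not the SDK's resolution code: the function `resolveOpikSettings`, its array keys, the mapping of `OPIK_URL_OVERRIDE` to `baseUrl`, and the rule that explicit values win over the environment are all assumptions.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the SDK): merge explicit options with
 * OPIK_* environment variables and the defaults from the table above.
 * Assumption: explicit options take precedence over environment variables.
 *
 * @param array<string, string> $explicit
 * @param array<string, string> $env
 * @return array<string, string|bool|null>
 */
function resolveOpikSettings(array $explicit, array $env): array
{
    // Explicit value first, then the environment variable, then the default
    $pick = static fn (string $key, string $var, ?string $default = null): ?string
        => $explicit[$key] ?? $env[$var] ?? $default;

    return [
        'apiKey' => $pick('apiKey', 'OPIK_API_KEY'),
        'workspace' => $pick('workspace', 'OPIK_WORKSPACE'),
        'projectName' => $pick('projectName', 'OPIK_PROJECT_NAME', 'Default Project'),
        'baseUrl' => $pick('baseUrl', 'OPIK_URL_OVERRIDE'),
        'debug' => filter_var($env['OPIK_DEBUG'] ?? 'false', FILTER_VALIDATE_BOOLEAN),
        'compression' => filter_var($env['OPIK_ENABLE_COMPRESSION'] ?? 'true', FILTER_VALIDATE_BOOLEAN),
    ];
}

$settings = resolveOpikSettings(
    ['projectName' => 'my-project'],
    ['OPIK_API_KEY' => 'your-api-key', 'OPIK_DEBUG' => '1'],
);
```

Passing the environment in as an array (for example `getenv()` with no arguments) rather than reading it inside the function keeps the sketch easy to test.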
Features
Tracing
Basic Trace with Spans

    $trace = $client->trace(name: 'my-trace', input: ['query' => 'Hello']);

    $span = $trace->span(name: 'process', type: SpanType::LLM);
    $span->update(output: ['result' => 'Done']);
    $span->end();

    $trace->end();
    $client->flush();

Nested Spans

    $trace = $client->trace(name: 'multi-step');

    $parent = $trace->span(name: 'parent');

    $child1 = $parent->span(name: 'step-1', type: SpanType::TOOL);
    $child1->end();

    $child2 = $parent->span(name: 'step-2', type: SpanType::LLM);
    $child2->end();

    $parent->end();
    $trace->end();

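Every span in the example above is ended by hand, so an exception between `span()` and `end()` would leave a span open. One way to guard against that is a small `try`/`finally` wrapper. The helper `withSpan` below is hypothetical and not part of the SDK; it assumes only that the parent exposes `span(name: ...)` and that the returned object exposes `end()`, as in the examples above. `FakeSpan` is a stand-in so the sketch runs without the SDK installed.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the SDK): open a child span, run $work,
 * and always call end(), even if $work throws.
 */
function withSpan(object $parent, string $name, callable $work): mixed
{
    $span = $parent->span(name: $name);
    try {
        return $work($span);
    } finally {
        $span->end();
    }
}

/** Stand-in for a trace or span; records start/end order for inspection. */
final class FakeSpan
{
    /** @var list<string> */
    public static array $log = [];

    public function __construct(private string $name = 'root')
    {
    }

    public function span(string $name): self
    {
        self::$log[] = "start {$name}";
        return new self($name);
    }

    public function end(): void
    {
        self::$log[] = "end {$this->name}";
    }
}

// Nested usage: children end before the parent does
$result = withSpan(new FakeSpan(), 'parent', function (FakeSpan $parent): string {
    withSpan($parent, 'step-1', fn (FakeSpan $s): string => 'a');
    return withSpan($parent, 'step-2', fn (FakeSpan $s): string => 'b');
});
```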
Search Traces and Spans

    // Search traces with OQL filter
    $traces = $client->searchTraces(
        projectName: 'my-project',
        filter: 'name = "chat-completion"',
    );

    // Get specific trace/span
    $trace = $client->getTraceContent('trace-id');
    $span = $client->getSpanContent('span-id');

Span Types
| Type | Description |
|---|---|
| `SpanType::GENERAL` | General purpose span |
| `SpanType::LLM` | LLM API call |
| `SpanType::TOOL` | Tool/function call |
| `SpanType::GUARDRAIL` | Guardrail check |
Cost Calculation
Calculate and track LLM costs using your own pricing:

    use Opik\Cost\CostCalculator;
    use Opik\Tracer\Usage;

    $usage = new Usage(promptTokens: 1000, completionTokens: 500);

    // Using per-million token pricing (common format)
    $cost = CostCalculator::calculateFromMillionPricing(
        $usage,
        inputCostPerMillion: 2.50,   // $2.50 per 1M input tokens
        outputCostPerMillion: 10.00, // $10.00 per 1M output tokens
    );

    // Or using per-token pricing
    $cost = CostCalculator::calculate(
        $usage,
        inputCostPerToken: 0.0000025,
        outputCostPerToken: 0.00001,
    );

    // Attach cost to span
    $span->update(totalCost: $cost);

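The arithmetic behind per-million pricing can be checked by hand. The sketch below is plain PHP and is not the SDK's `CostCalculator`; the function name `costFromMillionPricing` is hypothetical. It uses the same token counts and prices as the example above.

```php
<?php

declare(strict_types=1);

/**
 * Illustrative arithmetic only; not the SDK's CostCalculator.
 * cost = promptTokens * inputPrice / 1e6 + completionTokens * outputPrice / 1e6
 */
function costFromMillionPricing(
    int $promptTokens,
    int $completionTokens,
    float $inputCostPerMillion,
    float $outputCostPerMillion,
): float {
    return $promptTokens * $inputCostPerMillion / 1_000_000
        + $completionTokens * $outputCostPerMillion / 1_000_000;
}

$cost = costFromMillionPricing(1000, 500, 2.50, 10.00);
// 1000 * 2.50 / 1e6 = 0.0025 and 500 * 10.00 / 1e6 = 0.0050, so 0.0075 in total
printf("%.4f\n", $cost); // prints 0.0075
```

The per-token prices in the example above (`0.0000025` and `0.00001`) are the same prices divided by one million, so both calls describe the same pricing.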
Feedback Scores
On Traces and Spans

    $trace = $client->trace(name: 'scored-trace');

    // Numeric score
    $trace->logFeedbackScore(name: 'relevance', value: 0.95, reason: 'Good answer');

    // Categorical score
    $span = $trace->span(name: 'llm-call', type: SpanType::LLM);
    $span->logFeedbackScore(name: 'sentiment', value: 1.0, categoryName: 'positive');

Batch Feedback Scores

    use Opik\Feedback\FeedbackScore;

    // For traces
    $client->logTracesFeedbackScores([
        FeedbackScore::forTrace('trace-1', 'quality', value: 0.9),
        FeedbackScore::forTrace('trace-2', 'quality', value: 0.85, reason: 'Good'),
    ]);

    // For spans
    $client->logSpansFeedbackScores([
        FeedbackScore::forSpan('span-1', 'accuracy', value: 0.95),
        FeedbackScore::forSpan('span-2', 'accuracy', categoryName: 'high'),
    ]);

    // Delete feedback scores
    $client->deleteTraceFeedbackScore('trace-id', 'quality');
    $client->deleteSpanFeedbackScore('span-id', 'accuracy');

Threads
Group related traces into conversations:

    use Opik\Feedback\FeedbackScore;

    // Create traces in a thread
    $trace1 = $client->trace(name: 'user-msg-1', threadId: 'conversation-123');
    $trace1->end();

    $trace2 = $client->trace(name: 'user-msg-2', threadId: 'conversation-123');
    $trace2->end();

    $client->flush();

    // Close thread before scoring
    $client->closeThread('conversation-123');

    // Score the thread
    $client->logThreadsFeedbackScores([
        FeedbackScore::forThread('conversation-123', 'satisfaction', value: 0.95),
    ]);

Datasets
Create and Populate

    use Opik\Dataset\DatasetItem;

    $dataset = $client->getOrCreateDataset(
        name: 'eval-dataset',
        description: 'Test cases',
    );

    // Standard schema
    $dataset->insert([
        new DatasetItem(
            input: ['question' => 'What is PHP?'],
            expectedOutput: ['answer' => 'A programming language'],
            metadata: ['difficulty' => 'easy'],
        ),
    ]);

    // Flexible schema
    $dataset->insert([
        new DatasetItem(data: [
            'prompt' => 'Translate: Hello',
            'expected' => 'Bonjour',
        ]),
    ]);

Read and Manage

    // Get items
    $items = $dataset->getItems(page: 1, size: 100);
    foreach ($items as $item) {
        $input = $item->getInput();
        $output = $item->getExpectedOutput();
    }

    // Update/delete
    $dataset->update($items);
    $dataset->delete(['item-id-1', 'item-id-2']);
    $dataset->clear(); // Delete all

    // List/delete datasets
    $datasets = $client->getDatasets();
    $client->deleteDataset('dataset-name');

JSON Import/Export

    // Import from JSON string
    $json = '[{"input": "question 1", "output": "answer 1"}, {"input": "question 2", "output": "answer 2"}]';
    $dataset->insertFromJson($json);

    // Import with key mapping (rename keys)
    $json = '[{"Question": "What is PHP?", "Expected Answer": "A language"}]';
    $dataset->insertFromJson($json, keysMapping: [
        'Question' => 'input',
        'Expected Answer' => 'expected_output',
    ]);

    // Import while ignoring certain keys
    $dataset->insertFromJson($json, ignoreKeys: ['internal_id', 'debug_info']);

    // Export to JSON string
    $json = $dataset->toJson();

    // Export with key mapping
    $json = $dataset->toJson(keysMapping: [
        'input' => 'Question',
        'expected_output' => 'Expected Answer',
    ]);

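What key mapping and key ignoring do to each row can be sketched without the SDK. The helper `mapJsonRows` below is hypothetical and is not how `insertFromJson` is implemented; it assumes that ignored keys are dropped before renaming and that unmapped keys are kept under their original names.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not the SDK's insertFromJson): decode a JSON array of
 * objects, drop ignored keys, then rename keys according to $keysMapping.
 *
 * @param array<string, string> $keysMapping source key => target key
 * @param list<string> $ignoreKeys
 * @return list<array<string, mixed>>
 */
function mapJsonRows(string $json, array $keysMapping = [], array $ignoreKeys = []): array
{
    $rows = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($rows) || !array_is_list($rows)) {
        throw new InvalidArgumentException('Expected a JSON array of objects.');
    }

    $mapped = [];
    foreach ($rows as $row) {
        if (!is_array($row)) {
            throw new InvalidArgumentException('Each row must be a JSON object.');
        }
        $item = [];
        foreach ($row as $key => $value) {
            $key = (string) $key; // numeric JSON keys decode to ints
            if (in_array($key, $ignoreKeys, true)) {
                continue;
            }
            $item[$keysMapping[$key] ?? $key] = $value;
        }
        $mapped[] = $item;
    }

    return $mapped;
}

$items = mapJsonRows(
    '[{"Question": "What is PHP?", "Expected Answer": "A language", "internal_id": 7}]',
    ['Question' => 'input', 'Expected Answer' => 'expected_output'],
    ['internal_id'],
);
// $items is [['input' => 'What is PHP?', 'expected_output' => 'A language']]
```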
Experiments

    use Opik\Experiment\ExperimentItem;

    // Create experiment
    $experiment = $client->createExperiment(
        name: 'gpt-4-eval',
        datasetName: 'eval-dataset',
    );

    // Log results
    $experiment->logItems([
        new ExperimentItem(
            datasetItemId: 'item-1',
            traceId: 'trace-1',
            output: ['result' => 'Answer'],
            feedbackScores: [['name' => 'accuracy', 'value' => 0.9]],
        ),
    ]);

    // Manage experiments
    $experiment = $client->getExperimentById('experiment-id');
    $client->updateExperiment(id: 'experiment-id', name: 'new-name');
    $client->deleteExperiment('experiment-name');

Prompts
Opik supports two types of prompts: text prompts (simple string templates) and chat prompts (array of messages following OpenAI's chat format).
Text Prompts

    // Create a text prompt
    $prompt = $client->createPrompt(
        name: 'greeting',
        template: 'Hello {{name}}, you asked: {{question}}',
    );

    // Get and format
    $prompt = $client->getPrompt('greeting');
    $text = $prompt->format(['name' => 'John', 'question' => 'How are you?']);
    // Returns: "Hello John, you asked: How are you?"

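The `{{name}}` placeholder substitution shown above can be approximated in a few lines of plain PHP. This sketch is not the SDK's `format()` implementation; the function `renderTemplate` is hypothetical, and leaving unknown placeholders untouched is an assumption about the desired behaviour.

```php
<?php

declare(strict_types=1);

/**
 * Illustrative only; not the SDK's format() method.
 * Replaces each {{name}} placeholder with the matching variable and leaves
 * placeholders without a matching variable unchanged.
 *
 * @param array<string, string> $variables
 */
function renderTemplate(string $template, array $variables): string
{
    return preg_replace_callback(
        '/\{\{\s*(\w+)\s*\}\}/',
        static fn (array $m): string => (string) ($variables[$m[1]] ?? $m[0]),
        $template,
    );
}

$text = renderTemplate(
    'Hello {{name}}, you asked: {{question}}',
    ['name' => 'John', 'question' => 'How are you?'],
);
// $text is "Hello John, you asked: How are you?"
```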
Chat Prompts

    use Opik\Prompt\ChatMessage;

    // Create a chat prompt with messages array
    $prompt = $client->createPrompt(
        name: 'assistant-prompt',
        template: [
            ChatMessage::system('You are a helpful assistant specializing in {{domain}}.'),
            ChatMessage::user('{{question}}'),
        ],
    );

    // Format returns array of messages
    $messages = $prompt->format(['domain' => 'physics', 'question' => 'What is gravity?']);
    // Returns:
    // [
    //     ['role' => 'system', 'content' => 'You are a helpful assistant specializing in physics.'],
    //     ['role' => 'user', 'content' => 'What is gravity?'],
    // ]

ChatMessage Factory Methods
| Method | Description |
|---|---|
| `ChatMessage::system($content)` | Create a system message |
| `ChatMessage::user($content)` | Create a user message |
| `ChatMessage::assistant($content)` | Create an assistant message |
| `ChatMessage::tool($content)` | Create a tool message |
Prompt Versions

    // Get version history
    $history = $client->getPromptHistory('greeting');

    // Get specific version
    $version = $prompt->getVersion('commit-hash');

    // Check prompt type
    if ($version->isChat()) {
        $messages = $version->format($variables);
    } else {
        $text = $version->format($variables);
    }

Delete Prompts

    $client->deletePrompts(['prompt-id-1', 'prompt-id-2']);

Attachments
Upload files to traces or spans:

    use Opik\Attachment\AttachmentEntityType;

    $attachmentClient = $client->getAttachmentClient();

    // Upload
    $attachmentClient->uploadAttachment(
        projectName: 'my-project',
        entityType: AttachmentEntityType::TRACE,
        entityId: $trace->getId(),
        filePath: '/path/to/file.pdf',
    );

    // List
    $attachments = $attachmentClient->getAttachmentList(
        projectName: 'my-project',
        entityType: AttachmentEntityType::TRACE,
        entityId: $trace->getId(),
    );

    // Download
    $content = $attachmentClient->downloadAttachment(
        projectName: 'my-project',
        entityType: AttachmentEntityType::TRACE,
        entityId: $trace->getId(),
        fileName: 'file.pdf',
        mimeType: 'application/pdf',
    );

Evaluation Metrics
The SDK provides heuristic metrics for evaluating LLM outputs:

    use Opik\Evaluation\Metrics\ExactMatch;
    use Opik\Evaluation\Metrics\Contains;
    use Opik\Evaluation\Metrics\RegexMatch;
    use Opik\Evaluation\Metrics\IsJson;

    // ExactMatch - checks for exact equality
    $metric = new ExactMatch();
    $result = $metric->score([
        'output' => 'hello world',
        'expected' => 'hello world',
    ]);
    echo $result->value; // 1.0 (match) or 0.0 (no match)

    // Contains - checks if output contains expected substring
    $metric = new Contains(caseSensitive: false);
    $result = $metric->score([
        'output' => 'Hello World',
        'expected' => 'hello',
    ]);
    echo $result->value; // 1.0

    // RegexMatch - checks if output matches a regex pattern
    $metric = new RegexMatch();
    $result = $metric->score([
        'output' => 'Contact: test@example.com',
        'pattern' => '/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/',
    ]);
    echo $result->value; // 1.0

    // IsJson - checks if output is valid JSON
    $metric = new IsJson();
    $result = $metric->score([
        'output' => '{"key": "value"}',
    ]);
    echo $result->value; // 1.0

Available Metrics
| Metric | Description |
|---|---|
| `ExactMatch` | Checks if output exactly equals expected (strict comparison) |
| `Contains` | Checks if output contains the expected substring (case-insensitive matching is supported) |
| `RegexMatch` | Checks if output matches a regex pattern |
| `IsJson` | Checks if output is valid JSON |
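The four checks in the table are simple enough to sketch as plain functions. These are illustrative re-implementations of the descriptions above, not the SDK's metric classes; the function names are hypothetical, and the case-sensitive default for the substring check is an assumption.

```php
<?php

declare(strict_types=1);

// Illustrative only; each function returns 1.0 on success and 0.0 otherwise.

function exactMatchScore(string $output, string $expected): float
{
    return $output === $expected ? 1.0 : 0.0;
}

// Assumption: case-sensitive unless stated otherwise
function containsScore(string $output, string $expected, bool $caseSensitive = true): float
{
    $found = $caseSensitive
        ? str_contains($output, $expected)
        : stripos($output, $expected) !== false;

    return $found ? 1.0 : 0.0;
}

function regexMatchScore(string $output, string $pattern): float
{
    // $pattern must include delimiters, e.g. '/\d+/'
    return preg_match($pattern, $output) === 1 ? 1.0 : 0.0;
}

function isJsonScore(string $output): float
{
    json_decode($output);

    return json_last_error() === JSON_ERROR_NONE ? 1.0 : 0.0;
}
```

`isJsonScore` uses `json_decode` with `json_last_error` rather than `json_validate`, because the latter requires PHP 8.3 and this package targets PHP 8.1+.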
Evaluation Function
Run evaluations against datasets with automatic experiment tracking:

    use Opik\Dataset\DatasetItem;
    use Opik\Evaluation\Metrics\ExactMatch;
    use Opik\Evaluation\Metrics\Contains;

    // Get or create a dataset
    $dataset = $client->getOrCreateDataset('qa-dataset');
    $dataset->insert([
        new DatasetItem(data: [
            'input' => 'What is PHP?',
            'expected' => 'programming language',
        ]),
        new DatasetItem(data: [
            'input' => 'What is Python?',
            'expected' => 'programming language',
        ]),
    ]);

    // Define your task function.
    // $llm is a placeholder for your own LLM client; it must be captured
    // with use (...) because PHP closures do not inherit the outer scope.
    $task = function (array $item) use ($llm): array {
        // Your LLM call or processing logic here
        $response = $llm->complete($item['input']);
        return ['output' => $response];
    };

    // Run evaluation
    $result = $client->evaluate(
        dataset: $dataset,
        task: $task,
        scoringMetrics: [
            new ExactMatch(),
            new Contains(),
        ],
        experimentName: 'my-evaluation',
    );

    // Access results
    echo "Evaluated {$result->count()} items in {$result->durationSeconds}s\n";
    echo "Average exact_match: {$result->getAverageScore('exact_match')}\n";
    echo "Average contains: {$result->getAverageScore('contains')}\n";

    // Get all average scores
    $averages = $result->getAverageScores();
    foreach ($averages as $metric => $score) {
        echo "{$metric}: {$score}\n";
    }

The evaluate() function:
- Creates an experiment for tracking results
- Runs the task function on each dataset item
- Calculates scores using the provided metrics
- Logs feedback scores to traces
- Returns detailed results with averages
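The core of the steps listed above (run the task on each item, score the output, average per metric) can be sketched without the SDK. This is not how `evaluate()` is implemented: the function `averageScores` is hypothetical, metrics are plain callables rather than the SDK's metric classes, and experiment creation and feedback logging are left out.

```php
<?php

declare(strict_types=1);

/**
 * Illustrative only; not the SDK's evaluate(). Runs $task on every item,
 * applies each scoring callable, and returns the per-metric averages.
 *
 * @param list<array<string, mixed>> $items
 * @param callable(array<string, mixed>): array<string, mixed> $task
 * @param array<string, callable(array<string, mixed>): float> $metrics
 * @return array<string, float>
 */
function averageScores(array $items, callable $task, array $metrics): array
{
    $totals = array_fill_keys(array_keys($metrics), 0.0);

    foreach ($items as $item) {
        // Merge the task output into the item so metrics can see both
        $scored = $task($item) + $item;
        foreach ($metrics as $name => $metric) {
            $totals[$name] += $metric($scored);
        }
    }

    $count = max(count($items), 1); // avoid division by zero on empty input

    return array_map(static fn (float $total): float => $total / $count, $totals);
}

$items = [
    ['input' => 'What is PHP?', 'expected' => 'programming language'],
    ['input' => 'What is Python?', 'expected' => 'programming language'],
];

// Stand-in task: a canned answer instead of a real LLM call
$task = static fn (array $item): array => ['output' => 'A programming language'];

$averages = averageScores($items, $task, [
    'exact_match' => static fn (array $s): float => $s['output'] === $s['expected'] ? 1.0 : 0.0,
    'contains' => static fn (array $s): float => str_contains($s['output'], $s['expected']) ? 1.0 : 0.0,
]);
// $averages is ['exact_match' => 0.0, 'contains' => 1.0]
```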
API Reference
OpikClient Methods
| Category | Method | Description |
|---|---|---|
| Tracing | `trace(...)` | Create a trace |
| Tracing | `span(...)` | Create a standalone span |
| Tracing | `searchTraces(...)` | Search traces with OQL |
| Tracing | `searchSpans(...)` | Search spans with OQL |
| Tracing | `getTraceContent(id)` | Get trace by ID |
| Tracing | `getSpanContent(id)` | Get span by ID |
| Feedback | `logTracesFeedbackScores(scores)` | Batch log trace scores |
| Feedback | `logSpansFeedbackScores(scores)` | Batch log span scores |
| Feedback | `logThreadsFeedbackScores(scores)` | Batch log thread scores |
| Feedback | `deleteTraceFeedbackScore(id, name)` | Delete trace score |
| Feedback | `deleteSpanFeedbackScore(id, name)` | Delete span score |
| Threads | `closeThread(id)` | Close a thread |
| Threads | `closeThreads(ids)` | Close multiple threads |
| Datasets | `getDataset(name)` | Get dataset |
| Datasets | `getDatasets()` | List datasets |
| Datasets | `createDataset(name)` | Create dataset |
| Datasets | `getOrCreateDataset(name)` | Get or create dataset |
| Datasets | `deleteDataset(name)` | Delete dataset |
| Experiments | `createExperiment(name, datasetName)` | Create experiment |
| Experiments | `getExperiment(name)` | Get by name |
| Experiments | `getExperimentById(id)` | Get by ID |
| Experiments | `updateExperiment(id, ...)` | Update experiment |
| Experiments | `deleteExperiment(name)` | Delete experiment |
| Prompts | `createPrompt(name, template)` | Create text or chat prompt |
| Prompts | `getPrompt(name)` | Get prompt |
| Prompts | `getPrompts()` | List prompts |
| Prompts | `getPromptHistory(name)` | Get versions |
| Prompts | `deletePrompts(ids)` | Delete prompts |
| Attachments | `getAttachmentClient()` | Get attachment client |
| Evaluation | `evaluate(dataset, task, ...)` | Run evaluation with metrics |
| Utilities | `authCheck()` | Verify credentials |
| Utilities | `flush()` | Send pending data |
| Utilities | `getConfig()` | Get configuration |
| Utilities | `getProjectUrl()` | Get project URL |
Trace Methods
| Method | Description |
|---|---|
| `span(name, type?, ...)` | Create child span |
| `update(output?, ...)` | Update trace data |
| `end()` | End the trace |
| `logFeedbackScore(name, value, ...)` | Log feedback score |
| `getId()` | Get trace ID |
Span Methods
| Method | Description |
|---|---|
| `span(name, type?, ...)` | Create child span |
| `update(output?, model?, usage?, ...)` | Update span data |
| `end()` | End the span |
| `logFeedbackScore(name, value, ...)` | Log feedback score |
| `getId()` | Get span ID |
Development

    # Install dependencies
    composer install

    # Run tests
    composer test

    # Run with coverage (requires pcov/xdebug)
    composer test:coverage

    # Static analysis
    composer analyse

    # Code formatting
    composer format
    composer format:check

License
MIT
Trademarks
Opik and Comet ML are trademarks of Comet ML, Inc. This project is not affiliated with, endorsed by, or sponsored by Comet ML, Inc.