meirdick / laravel-cf-workersai
Cloudflare Workers AI provider for Laravel AI SDK with AI Gateway support.
Requires
- php: ^8.3
- laravel/ai: ^0.7 || ^0.8
Requires (Dev)
- orchestra/testbench: ^10.0 || ^11.0
- pestphp/pest: ^3.0 || ^4.0
- pestphp/pest-plugin-laravel: ^3.0 || ^4.0
README
A native Laravel AI provider for Cloudflare Workers AI with first-class support for Cloudflare AI Gateway.
- Text generation, embeddings, structured output, tool calling, streaming.
- Three URL shapes: direct Workers AI, AI Gateway routed, or arbitrary
/compatendpoint. - Reasoning content replay across tool-call turns.
#[Strict]JSON schema opt-in.- Provider options pass-through.
- Sub-agent tools, and MCP tools on laravel/ai
^0.8. - Streamed usage summed across tool-call steps.
- Retry policy and AI Gateway session affinity.
- Failover-ready: 429/402/502/503/504 map to laravel/ai's failoverable exceptions.
Requirements
- PHP
^8.3 laravel/ai ^0.7 || ^0.8
Installation
composer require meirdick/laravel-cf-workersai
The service provider auto-registers via package discovery. No manual wiring needed.
Configuration
Add a workers-ai provider to config/ai.php:
'providers' => [ 'workers-ai' => [ 'key' => env('CLOUDFLARE_AI_API_TOKEN'), 'account_id' => env('CLOUDFLARE_ACCOUNT_ID'), 'gateway' => env('CLOUDFLARE_AI_GATEWAY'), // optional // 'url' => env('CLOUDFLARE_AI_URL'), // optional escape hatch // 'default_max_tokens' => 4096, // override the package default ], ],
key is a Cloudflare API token with the Workers AI: Read permission, matching the credential key name every first-party laravel/ai provider uses. The api_key name from earlier releases of this package is still accepted as a fallback.
default_max_tokens
Cloudflare's /v1/chat/completions defaults to 256 tokens when max_completion_tokens is omitted — far too small for any non-trivial structured output, which then arrives mid-JSON with a misreported finish_reason: "stop". The package sends 4096 by default to defuse this. Override the value per-provider, or set it to null to fall back to Cloudflare's endpoint default. Per-call #[MaxTokens(...)] (or TextGenerationOptions::$maxTokens) always wins.
The package also normalizes Cloudflare's misreported stop-at-budget into FinishReason::Length so laravel/ai's length-aware retry primitives can react to truncated completions.
Endpoint resolution
There are three ways to configure the endpoint, in priority order:
url(explicit). When set, all requests go to this URL. Useful for/compator self-hosted gateways.account_id+gateway. Routes throughhttps://gateway.ai.cloudflare.com/v1/<account_id>/<gateway>/workers-ai/v1/...— get AI Gateway's caching, retries, cost tracking.account_idonly. Hits the direct Workers AI API athttps://api.cloudflare.com/client/v4/accounts/<account_id>/ai/v1/....
Quickstart
use function Laravel\Ai\agent; $response = agent('helper')->prompt('Say hi.', provider: 'workers-ai'); echo $response->text;
Use any Workers AI model — pass it as model::
agent('helper') ->prompt('Summarize this in one sentence.', provider: 'workers-ai', model: '@cf/meta/llama-3.3-70b-instruct-fp8-fast');
Embeddings
use Laravel\Ai\Embeddings; $vectors = Embeddings::for(['hello', 'world']) ->generate(provider: 'workers-ai', model: '@cf/baai/bge-base-en-v1.5');
Forward arbitrary fields with providerOptions:
Embeddings::for(['hello']) ->generate(provider: 'workers-ai', providerOptions: ['encoding_format' => 'base64']);
Streaming
foreach (agent('helper')->streamed('Tell me a story.', provider: 'workers-ai') as $event) { if ($event instanceof \Laravel\Ai\Events\TextDelta) { echo $event->text; } }
Reasoning-capable models (e.g. Kimi K2.5) emit ReasoningStart → ReasoningDelta → ReasoningEnd events before text.
Tools
use Laravel\Ai\Attributes\Tool; #[Tool(description: 'Look up the current weather.')] function getWeather(string $city): string { return "Sunny in {$city}."; } agent('helper')->withTools([getWeather(...)])->prompt('Weather in Tokyo?', provider: 'workers-ai');
Reasoning content from the tool-call turn is preserved and replayed in the follow-up automatically (providerContentBlocks).
Model choice matters for tool calling
Verified live against the production API (2026-06-11): @cf/meta/llama-3.3-70b-instruct-fp8-fast — the package's default text model — does not emit tool calls on the /v1 endpoint; it answers in prose instead. @cf/meta/llama-4-scout-17b-16e-instruct and @cf/openai/gpt-oss-120b tool-call correctly, but under tool_choice: auto open-weight models only choose to call a tool some of the time. When the tool must run, force it via provider options:
public function providerOptions(Lab|string $provider): array { // Custom drivers arrive as a plain string, not a Lab enum case. return $provider === 'workers-ai' ? ['tool_choice' => 'required'] : []; }
The package automatically relaxes a forced tool_choice back to auto on tool-result follow-up turns — otherwise the model is forced to call a tool again instead of answering, looping until max-steps with empty text.
Timeouts
laravel/ai resolves a 60-second timeout by default. Large models, structured output, and reasoning models on Workers AI can exceed it — observed live: a structured llama-3.3-70b request taking 60s+, and kimi-k2.6 taking 45s on a small prompt. Raise it per agent or per call:
use Laravel\Ai\Attributes\Timeout; #[Timeout(120)] class ExtractionAgent implements Agent { /* ... */ } // or per call: $agent->prompt('...', provider: 'workers-ai', timeout: 120);
A request that exceeds the timeout fails after a single attempt with a ConnectionException. (Before v0.3.0 the retry policy re-ran timed-out requests, turning a 60s timeout into ~3 minutes of wall time before failing.) Connect-phase failures and transient 502/503/504 responses are still retried with backoff.
Structured output
use Laravel\Ai\Attributes\Strict; #[Strict] // opt-in to strict JSON schema enforcement final class TaskAgent extends \Laravel\Ai\Agent {}
When #[Strict] is applied, strict: true is forwarded to Workers AI's /compat endpoint and the generated JSON schema requires all properties.
Reasoning models
Some Workers AI models (Kimi K2.5/K2.6, QwQ, Gemma) emit a chain of thought before their answer. Following the laravel/ai convention — the same one used to pass Anthropic thinking or Gemini thinkingConfig — reasoning is controlled through HasProviderOptions, not a dedicated attribute. The returned options are merged into the request body verbatim. Kimi uses chat_template_kwargs.thinking:
use Laravel\Ai\Contracts\HasProviderOptions; use Laravel\Ai\Enums\Lab; class AnalysisAgent implements Agent, HasProviderOptions { public function providerOptions(Lab|string $provider): array { // Workers AI is a custom driver, so $provider arrives as the string. return $provider === 'workers-ai' ? ['chat_template_kwargs' => ['thinking' => false]] : []; } }
Disabling reasoning (thinking => false) is the right default for structured-output and extraction work: reasoning is incompatible with response_format (the model spends its token budget thinking instead of conforming to the schema) and roughly triples latency. Verified live on Kimi K2.6 — structured calls return valid JSON in ~4s with thinking off versus busting both the schema and the 60s timeout with it on.
Enabling reasoning (thinking => true) suits free-form, latency-tolerant judgment tasks. The package then:
- captures the model's reasoning (under either the
reasoning_contentor the K2.6reasoningfield) and replays it across tool-call turns so multi-step tool loops stay coherent; - raises
max_completion_tokensto a 2048 floor when it would otherwise be lower, so the model isn't starved of answer tokens after reasoning (a small budget returnscontent: null/finish_reason: "length"). Pair it with a raised#[Timeout](see above).
AI Gateway
Set the gateway config key to route through Cloudflare AI Gateway. You get free caching, retries, cost analytics, and request logs in the Cloudflare dashboard.
'workers-ai' => [ 'key' => env('CLOUDFLARE_AI_API_TOKEN'), 'account_id' => env('CLOUDFLARE_ACCOUNT_ID'), 'gateway' => 'my-gateway', ],
A session-affinity header is sent automatically so successive related requests hit the same cache shard.
Models
Workers AI hosts dozens of open-weight models. See the Cloudflare Workers AI models catalog for current options. Common prefixes:
@cf/meta/...— Llama variants@cf/openai/...— OpenAI open-weight models on Cloudflare@cf/google/...— Gemma@cf/qwen/...,@cf/mistralai/...,@cf/microsoft/..., etc.@cf/baai/...— embedding models
Provider keys
The provider can be referenced as workers-ai (primary) or workersai (alias).
Versioning
This package follows Semantic Versioning. Compatible with laravel/ai ^0.7 || ^0.8 — the test suite runs against both bounds.
License
MIT