brynj-digital / laravel-scout-vectorize
Cloudflare Vectorize driver for Laravel Scout
Installs: 3
Dependents: 0
Suggesters: 0
Security: 0
Stars: 6
Watchers: 0
Forks: 0
Open Issues: 0
pkg:composer/brynj-digital/laravel-scout-vectorize
Requires
- php: ^8.1
- guzzlehttp/guzzle: ^7.0
- illuminate/support: ^10.0|^11.0|^12.0
- laravel/scout: ^10.0|^11.0
Requires (Dev)
- orchestra/testbench: ^8.0|^9.0|^10.0
- phpunit/phpunit: ^10.0
This package is auto-updated.
Last update: 2025-12-03 17:45:09 UTC
README
A Laravel Scout driver for Cloudflare Vectorize, enabling semantic search using vector embeddings in your Laravel applications.
Features
- Semantic Search: Search by meaning, not just keywords
- Native Scout Integration: Works seamlessly with Laravel Scout
- Cloudflare Workers AI: Automatic embedding generation using Cloudflare's AI models
- Easy Setup: Simple configuration and migration from other Scout drivers
- Batch Operations: Efficient bulk indexing and deletion
- Multiple Models: Support for searching across different Eloquent models
Requirements
- PHP 8.1 or higher
- Laravel 10.x, 11.x, or 12.x
- Laravel Scout 10.x or 11.x
- A Cloudflare account with Vectorize enabled
- Cloudflare API token with Vectorize permissions
Installation
Install the package via Composer:
composer require brynj-digital/laravel-scout-vectorize
Publish the configuration file:
php artisan vendor:publish --tag=scout-vectorize-config
Configuration
1. Create a Vectorize Index
Create a Vectorize index in your Cloudflare dashboard or via the API:
# Using Wrangler CLI
npx wrangler vectorize create my-index --dimensions=768 --metric=cosine
The dimensions must match your chosen embedding model:
@cf/baai/bge-small-en-v1.5: 384 dimensions@cf/baai/bge-base-en-v1.5: 768 dimensions (default)@cf/baai/bge-large-en-v1.5: 1024 dimensions
Create Metadata Indexes
Create metadata indexes to enable efficient filtering. The model and key indexes are required for the driver to function correctly:
# Required: Create metadata index for model filtering npx wrangler vectorize create-metadata-index my-index --property-name=model --type=string # Required: Create metadata index for key filtering npx wrangler vectorize create-metadata-index my-index --property-name=key --type=number
Optional: Additional Metadata Indexes for where() Clauses
You can create additional metadata indexes for any custom fields you want to filter on using Scout's where() method:
# Example: Create index for filtering by status npx wrangler vectorize create-metadata-index my-index --property-name=status --type=string # Example: Create index for filtering by category_id npx wrangler vectorize create-metadata-index my-index --property-name=category_id --type=number # Example: Create index for boolean fields npx wrangler vectorize create-metadata-index my-index --property-name=in_stock --type=boolean
To use these filters, include the fields in your model's toSearchableArray():
public function toSearchableArray(): array { return [ // These fields are used for both embeddings AND metadata 'name' => $this->name, 'description' => $this->description, // These fields are stored as metadata for filtering // (included in embeddings but primarily for where() clauses) 'status' => $this->status, 'category_id' => $this->category_id, 'in_stock' => $this->in_stock, ]; }
Then use where() in your searches:
Product::search('laptop') ->where('status', 'active') ->where('in_stock', true) ->get();
How it works: All fields from toSearchableArray() are:
- Converted to text and used to generate the embedding vector for semantic search
- Stored as metadata for filtering with
where()clauses
This means you can search semantically while also applying exact-match filters.
2. Create API Token
You'll need a Cloudflare API token with Vectorize permissions to allow Laravel to interact with your Vectorize index.
Create the token in Cloudflare Dashboard:
- Log in to your Cloudflare Dashboard
- Navigate to My Profile (click your user icon in the top right)
- Select API Tokens from the left sidebar
- Click Create Token
- Choose Create Custom Token
- Configure your token:
- Token name: Give it a descriptive name (e.g., "Laravel Scout Vectorize")
- Permissions: Add the following two permissions:
- Account → Vectorize → Read
- Account → Vectorize → Write
- Account Resources: Select your specific account (or "All accounts" if needed)
- TTL: Set an expiration date or leave as default
- Click Continue to summary
- Review the permissions and click Create Token
- Important: Copy the token immediately - it will only be shown once
- Store the token securely (you'll add it to your
.envfile in the next step)
Token Permissions Summary
Your token must have these permissions:
- ✅ Vectorize Read - Allows reading from your Vectorize indexes
- ✅ Vectorize Write - Allows creating, updating, and deleting vectors
Security Note: Avoid using tokens with broader permissions (like "Account Settings: Read" or "Workers: Edit") unless absolutely necessary.
3. Environment Variables
Add the following to your .env file:
SCOUT_DRIVER=vectorize CLOUDFLARE_ACCOUNT_ID=your_account_id CLOUDFLARE_API_TOKEN=your_api_token CLOUDFLARE_VECTORIZE_INDEX=my-index CLOUDFLARE_EMBEDDING_MODEL=@cf/baai/bge-base-en-v1.5
4. Scout Configuration
Ensure Scout is configured in config/scout.php:
'driver' => env('SCOUT_DRIVER', 'vectorize'),
Usage
Basic Model Setup
Add the Searchable trait to your model:
use Laravel\Scout\Searchable; class Product extends Model { use Searchable; /** * Get the indexable data array for the model. */ public function toSearchableArray(): array { return [ 'name' => $this->name, 'description' => $this->description, 'brand' => $this->brand, 'category' => $this->category, ]; } }
Custom Text Conversion (Optional)
For more control over how your model is converted to searchable text, implement a toSearchableText() method:
class Product extends Model { use Searchable; /** * Convert the model to searchable text. * This method takes precedence over toSearchableArray(). */ public function toSearchableText(): string { return implode('. ', [ $this->name, $this->brand, $this->description, implode(' ', $this->tags ?? []), ]); } public function toSearchableArray(): array { return [ 'name' => $this->name, 'description' => $this->description, ]; } }
Searching
// Simple search $products = Product::search('wireless headphones')->get(); // Limit results $products = Product::search('laptop')->take(20)->get(); // Paginate results $products = Product::search('smartphone')->paginate(15); // Get raw search results with scores $results = Product::search('tablet')->raw();
Indexing
// Index a single model $product = Product::find(1); $product->searchable(); // Index all models Product::makeAllSearchable(); // Using artisan command php artisan scout:import "App\Models\Product"
Removing from Index
// Remove a single model $product->unsearchable(); // Remove all models of a type Product::removeAllFromSearch(); // Using artisan command php artisan scout:flush "App\Models\Product"
Model Observers
Scout automatically syncs your models when you create, update, or delete them:
// Automatically indexed $product = Product::create([ 'name' => 'Wireless Headphones', 'description' => 'High-quality Bluetooth headphones', ]); // Automatically re-indexed $product->update(['name' => 'Premium Wireless Headphones']); // Automatically removed from index $product->delete();
Practical Examples
E-commerce Product Search
use Laravel\Scout\Searchable; class Product extends Model { use Searchable; public function toSearchableArray(): array { return [ 'name' => $this->name, 'brand' => $this->brand, 'description' => $this->description, 'category' => $this->category->name, 'features' => implode(', ', $this->features ?? []), // Metadata for filtering 'status' => $this->status, 'price' => $this->price, 'in_stock' => $this->in_stock, ]; } } // Search with semantic understanding $results = Product::search('laptop for programming and gaming') ->where('in_stock', true) ->where('status', 'published') ->take(20) ->get();
Blog Article Search
class Article extends Model { use Searchable; public function toSearchableArray(): array { return [ 'title' => $this->title, 'excerpt' => $this->excerpt, 'content' => strip_tags($this->content), 'author' => $this->author->name, 'tags' => $this->tags->pluck('name')->join(', '), // Metadata 'category_id' => $this->category_id, 'published_at' => $this->published_at, 'status' => $this->status, ]; } public function toSearchableText(): string { // Custom text format for better embeddings return sprintf( '%s. %s. Written by %s. Tags: %s', $this->title, $this->excerpt, $this->author->name, $this->tags->pluck('name')->join(', ') ); } } // Find related articles $related = Article::search('introduction to machine learning') ->where('status', 'published') ->where('category_id', $article->category_id) ->take(5) ->get();
Documentation Search
class Documentation extends Model { use Searchable; public function toSearchableArray(): array { return [ 'title' => $this->title, 'content' => strip_tags($this->content), 'section' => $this->section, 'version' => $this->version, ]; } } // Semantic search in docs $docs = Documentation::search('how to handle file uploads') ->where('version', config('app.docs_version')) ->get();
Customer Support Ticket Search
class SupportTicket extends Model { use Searchable; public function toSearchableArray(): array { return [ 'subject' => $this->subject, 'description' => $this->description, 'customer_name' => $this->customer->name, 'category' => $this->category, // Metadata 'status' => $this->status, 'priority' => $this->priority, ]; } } // Find similar support tickets $similar = SupportTicket::search($newTicket->description) ->where('status', 'resolved') ->take(10) ->get();
Advanced Usage
Custom Search Callbacks
For advanced search requirements, use a callback:
$results = Product::search('laptop', function ($client, $query, $options) { // $client is the VectorizeClient instance return $client->search($query, 50, [ 'model' => Product::class, 'in_stock' => true, ]); })->get();
Using Where Clauses for Filtering
You can combine semantic search with metadata filtering:
// Search with filters $products = Product::search('gaming laptop') ->where('status', 'published') ->where('price', '< 2000') ->get(); // Multiple filters $articles = Article::search('machine learning') ->where('category', 'technology') ->where('published_at', '>', now()->subDays(30)) ->get();
Note: Filters are applied to metadata stored in Vectorize. Make sure the fields you filter on are:
- Included in your model's
toSearchableArray() - Have corresponding metadata indexes created in Vectorize (see Configuration section)
Querying the Client Directly
use ScoutVectorize\VectorizeClient; $client = app(VectorizeClient::class); // Get index information $info = $client->getIndexInfo(); // Manual search with filters $results = $client->search( query: 'wireless headphones', topK: 10, filter: ['status' => 'active'] ); // Generate embedding for text $embedding = $client->generateEmbedding('sample text'); // Batch upsert documents $client->batchUpsert([ [ 'id' => 'doc_1', 'text' => 'Document content', 'metadata' => ['category' => 'tech'], ], // ... more documents ]); // Delete vectors by IDs $client->deleteVectors(['doc_1', 'doc_2']);
Queueing Scout Operations
For better performance in production, queue your Scout operations:
// In config/scout.php 'queue' => true, // Specify queue connection and queue name 'queue' => [ 'connection' => env('SCOUT_QUEUE_CONNECTION', 'redis'), 'queue' => env('SCOUT_QUEUE_NAME', 'default'), ],
This will queue all indexing operations, preventing API rate limits and improving response times.
Available Commands
This package uses the standard Laravel Scout commands:
# Import all records of a model php artisan scout:import "App\Models\Product" # Flush all vectors for a specific model php artisan scout:flush "App\Models\Product"
How It Works
-
Indexing: When a model is indexed, the driver:
- Calls
toSearchableText()or flattenstoSearchableArray()to text - Generates an embedding using Cloudflare Workers AI
- Stores the vector in Cloudflare Vectorize with metadata
- Calls
-
Searching: When you search:
- Your query text is converted to an embedding
- Vectorize finds the most similar vectors
- Results are mapped back to your Eloquent models
- Models are fetched from your database and returned
-
Vector IDs: The driver prefixes vector IDs with the model class name to support multiple model types in one index (e.g.,
App_Models_Product_123)
Limitations
- No traditional filters: Vector search doesn't support WHERE clauses like traditional search engines. Apply filters in PHP after retrieval or use metadata filtering (which may not work reliably in all cases)
- No offset-based pagination: Vector search returns top-K results. Use cursor-based pagination or retrieve more results upfront
- Metadata filtering: Cloudflare Vectorize metadata filtering may not be reliable for all use cases. Consider filtering in your application layer
- Eventual consistency: There may be a slight delay between indexing/deletion and seeing changes in search results
Configuration Reference
// config/scout-vectorize.php return [ 'cloudflare' => [ 'account_id' => env('CLOUDFLARE_ACCOUNT_ID'), 'api_token' => env('CLOUDFLARE_API_TOKEN'), ], 'index' => env('CLOUDFLARE_VECTORIZE_INDEX', 'default'), 'embedding_model' => env('CLOUDFLARE_EMBEDDING_MODEL', '@cf/baai/bge-base-en-v1.5'), ];
Troubleshooting
Search returns no results
- Ensure your models are indexed: Run
php artisan scout:import "App\Models\Product" - Check your Vectorize index has vectors: Use the Cloudflare dashboard or API to verify
- Verify your API credentials: Double-check
CLOUDFLARE_ACCOUNT_IDandCLOUDFLARE_API_TOKENin your.env - Check model filters: The driver automatically filters by model class. Ensure you're searching the right model
Indexing is slow
- API overhead: Vector embedding generation requires API calls to Cloudflare Workers AI
- Use batch operations: Use
makeAllSearchable()for bulk indexing (more efficient than individual saves) - Enable queuing: Set
'queue' => trueinconfig/scout.phpto process indexing in the background - Rate limits: Cloudflare has rate limits on API calls. Implement throttling or use queues
Errors about dimensions
- Dimension mismatch: Ensure your Vectorize index dimensions match your embedding model
@cf/baai/bge-small-en-v1.5: 384 dimensions@cf/baai/bge-base-en-v1.5: 768 dimensions (default)@cf/baai/bge-large-en-v1.5: 1024 dimensions
- Recreate index: If you changed embedding models, you'll need to create a new index with the correct dimensions
Authentication errors
- Invalid API token: Verify your
CLOUDFLARE_API_TOKENhas Vectorize permissions - Incorrect account ID: Double-check your
CLOUDFLARE_ACCOUNT_ID - Token permissions: Ensure your API token has
Vectorizeread and write permissions
Metadata filtering not working
- Create metadata indexes: Metadata filters require indexes. Run:
npx wrangler vectorize create-metadata-index my-index --property-name=your_field --type=string
- Check field types: Ensure the metadata index type matches your data (string, number, boolean)
- Include in searchable array: The field must be in your model's
toSearchableArray()
Performance optimization
- Limit result size: Use
take()orpaginate()to limit results - Cache frequent queries: Cache search results for common queries
- Use metadata filters wisely: Filters can reduce the search space and improve performance
- Optimize text conversion: Keep
toSearchableText()concise to reduce embedding generation time
Architecture
Package Structure
src/
├── Engines/
│ └── VectorizeEngine.php # Scout engine implementation
├── VectorizeClient.php # Cloudflare API client
└── VectorizeServiceProvider.php # Service provider
tests/
├── TestCase.php # Base test case
└── VectorizeEngineTest.php # Engine tests
How Embeddings Work
This package uses Cloudflare Workers AI to generate embeddings:
- Text Preparation: Your model data is converted to text using
toSearchableText()or by flatteningtoSearchableArray() - Embedding Generation: The text is sent to Cloudflare Workers AI which returns a vector (array of floats)
- Vector Storage: The vector is stored in Vectorize along with metadata (model class, key, and searchable data)
- Semantic Search: When you search, your query is also converted to a vector and compared against stored vectors using cosine similarity
Supported Embedding Models
| Model | Dimensions | Best For |
|---|---|---|
@cf/baai/bge-small-en-v1.5 |
384 | Faster processing, lower memory |
@cf/baai/bge-base-en-v1.5 |
768 | Balanced (default) |
@cf/baai/bge-large-en-v1.5 |
1024 | Higher accuracy, slower |
Vector ID Format
Vectors are stored with IDs in the format: {ModelClass}_{ModelKey}
Example: App_Models_Product_123
This allows multiple model types to coexist in the same Vectorize index.
Testing
The package includes comprehensive tests covering all engine functionality:
# Run all tests composer test # Run with coverage vendor/bin/phpunit --coverage-html coverage # Run specific test vendor/bin/phpunit tests/VectorizeEngineTest.php
Test Coverage
The test suite includes 23+ tests covering:
- Update operations: Empty collections, valid models, custom text conversion, array values
- Delete operations: Empty collections, model deletion
- Search operations: Default limits, custom limits, filters, callbacks, pagination
- Result mapping: ID extraction, model mapping, ordering
- Flush operations: Batch deletion, different embedding models
- Index operations: Create/delete (no-op for Vectorize)
Running Tests
Tests use Orchestra Testbench to simulate a Laravel environment and Mockery to mock the VectorizeClient, ensuring tests run without making actual API calls.
# Install dependencies composer install # Run tests ./vendor/bin/phpunit # Run tests with detailed output ./vendor/bin/phpunit --testdox
Best Practices
Optimizing Search Quality
-
Use descriptive text: Include context in your searchable content
public function toSearchableText(): string { // Good: Includes context return "Product: {$this->name}. Brand: {$this->brand}. {$this->description}"; // Not ideal: Just raw values return "{$this->name} {$this->brand} {$this->description}"; }
-
Avoid overly long text: Embeddings work best with focused, relevant content
public function toSearchableArray(): array { return [ 'title' => $this->title, 'excerpt' => Str::limit($this->content, 500), // Limit long content 'category' => $this->category->name, ]; }
-
Include relevant metadata: Add fields you'll filter on
public function toSearchableArray(): array { return [ 'content' => $this->content, // Always include filterable fields 'status' => $this->status, 'created_at' => $this->created_at, 'author_id' => $this->author_id, ]; }
Performance Tips
-
Enable queueing for production: Prevent blocking requests
// config/scout.php 'queue' => env('SCOUT_QUEUE', true),
-
Use batch operations: Import in bulk rather than one-by-one
# Efficient php artisan scout:import "App\Models\Product" # Less efficient Product::all()->each->searchable();
-
Limit search results: Only fetch what you need
// Good: Limited results Product::search('laptop')->take(20)->get(); // Avoid: Fetching everything Product::search('laptop')->get();
-
Cache frequent queries: Use Laravel's cache for popular searches
$results = Cache::remember( "search:{$query}", now()->addMinutes(10), fn() => Product::search($query)->take(20)->get() );
Security Considerations
-
Sanitize user input: Always validate and sanitize search queries
$query = request()->validate(['q' => 'required|string|max:255'])['q']; $results = Product::search($query)->get();
-
Protect API credentials: Never commit API tokens to version control
# .env (not in version control) CLOUDFLARE_API_TOKEN=your_secret_token
-
Use scopes for access control: Filter by user permissions
$results = Article::search('security') ->where('visibility', 'public') ->orWhere('author_id', auth()->id()) ->get();
Comparison with Other Search Solutions
| Feature | Vectorize (this package) | Algolia | Meilisearch | Elasticsearch |
|---|---|---|---|---|
| Semantic Search | ✅ Built-in | ❌ Keyword only | ⚠️ Limited | ⚠️ Via plugins |
| Setup Complexity | ⭐⭐ Easy | ⭐ Very Easy | ⭐⭐ Easy | ⭐⭐⭐⭐ Complex |
| Cost | 💰 Cloudflare pricing | 💰💰💰 Premium | 💰 Free/Cheap | 💰💰 Moderate |
| Latency | Fast (edge network) | Very Fast | Fast | Moderate |
| Filtering | ⚠️ Basic metadata | ✅ Advanced | ✅ Good | ✅ Advanced |
| Typo Tolerance | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes |
| Relevance by Keywords | ❌ No | ✅ Excellent | ✅ Good | ✅ Excellent |
| Relevance by Meaning | ✅ Excellent | ❌ No | ⚠️ Limited | ⚠️ Via plugins |
| Infrastructure | Serverless | Managed | Self-host/Managed | Self-host/Managed |
When to Use Vectorize
Good fit:
- Semantic/conceptual search (finding by meaning, not keywords)
- Multi-language search (embeddings understand concepts across languages)
- Finding similar content or recommendations
- Applications already using Cloudflare
- Budget-conscious projects needing semantic search
Not ideal for:
- Exact keyword matching
- Complex filtering and faceting requirements
- Typo-tolerant search
- Traditional full-text search
- Applications requiring instant consistency
FAQ
Q: Can I use multiple models in the same index? A: Yes! The driver automatically namespaces vectors by model class, so multiple models can coexist in one index.
Q: How accurate is semantic search compared to keyword search? A: Semantic search excels at understanding intent and meaning, but may miss exact keyword matches. Consider your use case.
Q: Can I migrate from Algolia/Meilisearch to Vectorize? A: Yes, but be aware that Vectorize uses semantic search, which behaves differently from keyword-based search engines.
Q: What happens if I change the embedding model? A: You'll need to create a new index with the correct dimensions and re-index all your data.
Q: Is there a limit on the number of vectors? A: Check Cloudflare's Vectorize pricing and limits for your account tier.
Q: Can I use this with multilingual content? A: Yes! The BGE embedding models support multiple languages and can find semantically similar content across languages.
Contributing
Contributions are welcome! Please submit pull requests or open issues on GitHub.
Development Setup
# Clone the repository git clone https://github.com/brynj-digital/laravel-scout-vectorize.git cd laravel-scout-vectorize # Install dependencies composer install # Run tests composer test # Run code style checks composer format
License
This package is open-source software licensed under the MIT license.
Credits
- Built for use with Cloudflare Vectorize
- Integrates with Laravel Scout
Support
For issues, questions, or contributions, please visit the GitHub repository.