README

A Laravel Scout driver for Cloudflare Vectorize, enabling semantic search using vector embeddings in your Laravel applications.

Features

Semantic Search: Search by meaning, not just keywords
Native Scout Integration: Works seamlessly with Laravel Scout
Cloudflare Workers AI: Automatic embedding generation using Cloudflare's AI models
Easy Setup: Simple configuration and migration from other Scout drivers
Batch Operations: Efficient bulk indexing and deletion
Multiple Models: Support for searching across different Eloquent models

Requirements

PHP 8.1 or higher
Laravel 10.x, 11.x, or 12.x
Laravel Scout 10.x or 11.x
A Cloudflare account with Vectorize enabled
Cloudflare API token with Vectorize permissions

Installation

Install the package via Composer:

composer require brynj-digital/laravel-scout-vectorize

Publish the configuration file:

php artisan vendor:publish --tag=scout-vectorize-config

Configuration

1. Create a Vectorize Index

Use the provided artisan command to create a Vectorize index:

# Recommended: Using artisan command
php artisan vectorize:create-index my-index

# Or with custom dimensions and metric
php artisan vectorize:create-index my-index --dimensions=1024 --metric=euclidean --embedding-model=@cf/baai/bge-large-en-v1.5

Alternative: Using Wrangler CLI

npx wrangler vectorize create my-index --dimensions=768 --metric=cosine

The dimensions must match your chosen embedding model:

@cf/baai/bge-small-en-v1.5: 384 dimensions
@cf/baai/bge-base-en-v1.5: 768 dimensions (default)
@cf/baai/bge-large-en-v1.5: 1024 dimensions
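
A dimension mismatch is easy to catch before the index is created. The following is a minimal sketch, not part of this package: expectedDimensions() is a hypothetical helper that hard-codes the three models listed above.

```php
/**
 * Hypothetical helper (not part of this package): look up the vector
 * dimensions of the embedding models listed in this README.
 */
function expectedDimensions(string $model): int
{
    $dimensions = [
        '@cf/baai/bge-small-en-v1.5' => 384,
        '@cf/baai/bge-base-en-v1.5'  => 768,
        '@cf/baai/bge-large-en-v1.5' => 1024,
    ];

    if (!isset($dimensions[$model])) {
        throw new InvalidArgumentException("Unknown embedding model: {$model}");
    }

    return $dimensions[$model];
}

// Compare the planned --dimensions value with the chosen model.
$model = '@cf/baai/bge-large-en-v1.5';
$plannedDimensions = 1024;

echo expectedDimensions($model) === $plannedDimensions
    ? "Dimensions match\n"
    : "Dimensions do not match the model\n";
```

Running a check like this before vectorize:create-index avoids having to drop and recreate an index later.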

2. Create Metadata Indexes

Create metadata indexes to enable efficient filtering using the artisan commands:

# Required: Create metadata index for model filtering
php artisan vectorize:create-metadata-index model string --index-name=my-index

Note: Recent versions of this package no longer require a key metadata index, as model keys are now extracted directly from the vector ID format. This provides cleaner metadata and reduced storage requirements.

Optional: Additional Metadata Indexes for `where()` Clauses

You can create additional metadata indexes for any custom fields you want to filter on using Scout's where() method:

# Example: Create index for filtering by status
php artisan vectorize:create-metadata-index status string --index-name=my-index

# Example: Create index for filtering by category_id
php artisan vectorize:create-metadata-index category_id number --index-name=my-index

# Example: Create index for boolean fields
php artisan vectorize:create-metadata-index in_stock boolean --index-name=my-index

Alternative: Using Wrangler CLI

# Required: Create metadata index for model filtering
npx wrangler vectorize create-metadata-index my-index --property-name=model --type=string

# Optional: Additional metadata indexes
npx wrangler vectorize create-metadata-index my-index --property-name=status --type=string
npx wrangler vectorize create-metadata-index my-index --property-name=category_id --type=number
npx wrangler vectorize create-metadata-index my-index --property-name=in_stock --type=boolean

Managing Metadata Indexes

Use the provided commands to manage your metadata indexes:

# List all metadata indexes for an index
php artisan vectorize:list-metadata-indexes --index-name=my-index

# Delete a metadata index
php artisan vectorize:delete-metadata-index status --index-name=my-index

To use these filters, include the fields in your model's toSearchableArray():

public function toSearchableArray(): array
{
    return [
        // These fields are used for both embeddings AND metadata
        'name' => $this->name,
        'description' => $this->description,

        // These fields are stored as metadata for filtering
        // (included in embeddings but primarily for where() clauses)
        'status' => $this->status,
        'category_id' => $this->category_id,
        'in_stock' => $this->in_stock,
    ];
}

Then use where() in your searches:

Product::search('laptop')
    ->where('status', 'active')
    ->where('in_stock', true)
    ->get();

How it works: All fields from toSearchableArray() are:

Converted to text and used to generate the embedding vector for semantic search (unless the model defines toSearchableText(), which takes precedence for the embedding text)
Stored as metadata for filtering with where() clauses

This means you can search semantically while also applying exact-match filters.
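
To make the dual use of the searchable array concrete, here is a rough sketch of how one array could feed both the embedding text and the metadata. It is an illustration only: splitSearchable() is a hypothetical function, and the driver's actual flattening rules (separators, handling of nested values) may differ.

```php
/**
 * Hypothetical illustration (not the driver's real code): derive the
 * embedding text and the filterable metadata from one searchable array.
 * Assumes values are scalars or one-level lists of scalars.
 */
function splitSearchable(array $data): array
{
    $parts = [];
    $metadata = [];

    foreach ($data as $field => $value) {
        if ($value === null || $value === '') {
            continue; // empty values add nothing to the text or the filters
        }

        if (is_array($value)) {
            $value = implode(', ', $value); // flatten a one-level list
        }

        $metadata[$field] = $value;
        $parts[] = is_bool($value) ? ($value ? 'true' : 'false') : (string) $value;
    }

    return [
        'text' => implode('. ', $parts),
        'metadata' => $metadata,
    ];
}

$result = splitSearchable([
    'name' => 'Laptop',
    'description' => null,
    'status' => 'active',
    'in_stock' => true,
]);

echo $result['text'], "\n"; // Laptop. active. true
```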

3. Create API Token

You'll need a Cloudflare API token with Vectorize permissions to allow Laravel to interact with your Vectorize index.

Create the token in Cloudflare Dashboard:

Log in to your Cloudflare Dashboard
Navigate to My Profile (click your user icon in the top right)
Select API Tokens from the left sidebar
Click Create Token
Choose Create Custom Token
Configure your token:
- Token name: Give it a descriptive name (e.g., "Laravel Scout Vectorize")
- Permissions: Add the following two permissions:
  - Account → Vectorize → Read
  - Account → Vectorize → Write
- Account Resources: Select your specific account (or "All accounts" if needed)
- TTL: Set an expiration date or leave as default
Click Continue to summary
Review the permissions and click Create Token
Important: Copy the token immediately - it will only be shown once
Store the token securely (you'll add it to your .env file in the next step)

Token Permissions Summary

Your token must have these permissions:

✅ Vectorize Read - Allows reading from your Vectorize indexes
✅ Vectorize Write - Allows creating, updating, and deleting vectors

Security Note: Avoid using tokens with broader permissions (like "Account Settings: Read" or "Workers: Edit") unless absolutely necessary.

4. Environment Variables

Add the following to your .env file:

SCOUT_DRIVER=vectorize

CLOUDFLARE_ACCOUNT_ID=your_account_id
CLOUDFLARE_API_TOKEN=your_api_token
CLOUDFLARE_VECTORIZE_INDEX=my-index
CLOUDFLARE_EMBEDDING_MODEL=@cf/baai/bge-base-en-v1.5
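
A missing credential usually surfaces later as an authentication error, so it can help to check the settings early. The sketch below is an assumption-laden example rather than package code: missingVectorizeSettings() is a hypothetical helper, and it only checks the two variables that have no default in the configuration reference.

```php
/**
 * Hypothetical helper (not part of this package): report which required
 * Cloudflare settings are absent or blank in the given environment array.
 */
function missingVectorizeSettings(array $env): array
{
    $required = ['CLOUDFLARE_ACCOUNT_ID', 'CLOUDFLARE_API_TOKEN'];

    return array_values(array_filter(
        $required,
        fn (string $name): bool => trim((string) ($env[$name] ?? '')) === ''
    ));
}

// Example with a hand-written array; in practice pass your loaded environment.
$missing = missingVectorizeSettings([
    'CLOUDFLARE_ACCOUNT_ID' => 'your_account_id',
    'CLOUDFLARE_API_TOKEN' => '',
]);

echo $missing === []
    ? "All required settings are present\n"
    : 'Missing: ' . implode(', ', $missing) . "\n";
```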

5. Scout Configuration

Ensure Scout is configured in config/scout.php:

'driver' => env('SCOUT_DRIVER', 'vectorize'),

Usage

Basic Model Setup

Add the Searchable trait to your model:

use Laravel\Scout\Searchable;

class Product extends Model
{
    use Searchable;

    /**
     * Get the indexable data array for the model.
     */
    public function toSearchableArray(): array
    {
        return [
            'name' => $this->name,
            'description' => $this->description,
            'brand' => $this->brand,
            'category' => $this->category,
        ];
    }
}

Custom Text Conversion (Optional)

For more control over how your model is converted to searchable text, implement a toSearchableText() method:

class Product extends Model
{
    use Searchable;

    /**
     * Convert the model to searchable text.
     * This method takes precedence over toSearchableArray().
     */
    public function toSearchableText(): string
    {
        return implode('. ', [
            $this->name,
            $this->brand,
            $this->description,
            implode(' ', $this->tags ?? []),
        ]);
    }

    public function toSearchableArray(): array
    {
        return [
            'name' => $this->name,
            'description' => $this->description,
        ];
    }
}

Searching

// Simple search
$products = Product::search('wireless headphones')->get();

// Limit results
$products = Product::search('laptop')->take(20)->get();

// Paginate results
$products = Product::search('smartphone')->paginate(15);

// Get raw search results with scores
$results = Product::search('tablet')->raw();

Indexing

// Index a single model
$product = Product::find(1);
$product->searchable();

// Index all models
Product::makeAllSearchable();

// Using artisan command
php artisan scout:import "App\Models\Product"

Removing from Index

// Remove a single model
$product->unsearchable();

// Remove all models of a type
Product::removeAllFromSearch();

// Using artisan command
php artisan scout:flush "App\Models\Product"

Model Observers

Scout automatically syncs your models when you create, update, or delete them:

// Automatically indexed
$product = Product::create([
    'name' => 'Wireless Headphones',
    'description' => 'High-quality Bluetooth headphones',
]);

// Automatically re-indexed
$product->update(['name' => 'Premium Wireless Headphones']);

// Automatically removed from index
$product->delete();

Practical Examples

E-commerce Product Search

use Laravel\Scout\Searchable;

class Product extends Model
{
    use Searchable;

    public function toSearchableArray(): array
    {
        return [
            'name' => $this->name,
            'brand' => $this->brand,
            'description' => $this->description,
            'category' => $this->category->name,
            'features' => implode(', ', $this->features ?? []),
            // Metadata for filtering
            'status' => $this->status,
            'price' => $this->price,
            'in_stock' => $this->in_stock,
        ];
    }
}

// Search with semantic understanding
$results = Product::search('laptop for programming and gaming')
    ->where('in_stock', true)
    ->where('status', 'published')
    ->take(20)
    ->get();

Blog Article Search

class Article extends Model
{
    use Searchable;

    public function toSearchableArray(): array
    {
        return [
            'title' => $this->title,
            'excerpt' => $this->excerpt,
            'content' => strip_tags($this->content),
            'author' => $this->author->name,
            'tags' => $this->tags->pluck('name')->join(', '),
            // Metadata
            'category_id' => $this->category_id,
            'published_at' => $this->published_at,
            'status' => $this->status,
        ];
    }

    public function toSearchableText(): string
    {
        // Custom text format for better embeddings
        return sprintf(
            '%s. %s. Written by %s. Tags: %s',
            $this->title,
            $this->excerpt,
            $this->author->name,
            $this->tags->pluck('name')->join(', ')
        );
    }
}

// Find related articles
$related = Article::search('introduction to machine learning')
    ->where('status', 'published')
    ->where('category_id', $article->category_id)
    ->take(5)
    ->get();

Documentation Search

class Documentation extends Model
{
    use Searchable;

    public function toSearchableArray(): array
    {
        return [
            'title' => $this->title,
            'content' => strip_tags($this->content),
            'section' => $this->section,
            'version' => $this->version,
        ];
    }
}

// Semantic search in docs
$docs = Documentation::search('how to handle file uploads')
    ->where('version', config('app.docs_version'))
    ->get();

Customer Support Ticket Search

class SupportTicket extends Model
{
    use Searchable;

    public function toSearchableArray(): array
    {
        return [
            'subject' => $this->subject,
            'description' => $this->description,
            'customer_name' => $this->customer->name,
            'category' => $this->category,
            // Metadata
            'status' => $this->status,
            'priority' => $this->priority,
        ];
    }
}

// Find similar support tickets
$similar = SupportTicket::search($newTicket->description)
    ->where('status', 'resolved')
    ->take(10)
    ->get();

Advanced Usage

Custom Search Callbacks

For advanced search requirements, use a callback:

$results = Product::search('laptop', function ($client, $query, $options) {
    // $client is the VectorizeClient instance
    return $client->search($query, 50, [
        'model' => Product::class,
        'in_stock' => true,
    ]);
})->get();

Using Where Clauses for Filtering

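You can combine semantic search with metadata filtering. Scout's where() takes a field and a value and matches on equality only; it does not accept comparison operators.

// Search with filters
$products = Product::search('gaming laptop')
    ->where('status', 'published')
    ->where('in_stock', true)
    ->get();

// Range conditions: filter the returned collection in PHP
$products = Product::search('gaming laptop')
    ->where('status', 'published')
    ->get()
    ->filter(fn ($product) => $product->price < 2000);

// Multiple filters
$articles = Article::search('machine learning')
    ->where('category_id', 5)
    ->where('status', 'published')
    ->get();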

Note: Filters are applied to metadata stored in Vectorize. Make sure the fields you filter on are:

Included in your model's toSearchableArray()
Have corresponding metadata indexes created in Vectorize (see Configuration section)

Querying the Client Directly

use ScoutVectorize\VectorizeClient;

$client = app(VectorizeClient::class);

// Get index information
$info = $client->getIndexInfo();

// Manual search with filters
$results = $client->search(
    query: 'wireless headphones',
    topK: 10,
    filter: ['status' => 'active']
);

// Generate embedding for text
$embedding = $client->generateEmbedding('sample text');

// Batch upsert documents
$client->batchUpsert([
    [
        'id' => 'doc_1',
        'text' => 'Document content',
        'metadata' => ['category' => 'tech'],
    ],
    // ... more documents
]);

// Delete vectors by IDs
$client->deleteVectors(['doc_1', 'doc_2']);

Queueing Scout Operations

For better performance in production, queue your Scout operations:

// In config/scout.php
'queue' => true,

// Or, instead of true, specify the queue connection and queue name
'queue' => [
    'connection' => env('SCOUT_QUEUE_CONNECTION', 'redis'),
    'queue' => env('SCOUT_QUEUE_NAME', 'default'),
],

This queues all indexing operations, which keeps web requests fast and makes it easier to stay within API rate limits.

Available Commands

This package provides custom commands for managing Vectorize indexes and metadata indexes, plus the standard Laravel Scout commands:

Vectorize Index Management

# Create a new Vectorize index
php artisan vectorize:create-index

# Create index with custom dimensions and metric
php artisan vectorize:create-index my-index --dimensions=1024 --metric=euclidean --embedding-model=@cf/baai/bge-large-en-v1.5

# Drop (delete) a Vectorize index
php artisan vectorize:drop-index my-index

# Force drop without confirmation (use with caution)
php artisan vectorize:drop-index my-index --force

Options for vectorize:create-index:

name (optional): Index name (uses config value if not provided)
--dimensions: Vector dimensions (default: 768)
--metric: Distance metric - cosine, euclidean, or dotproduct (default: cosine)
--embedding-model: Cloudflare embedding model (default: @cf/baai/bge-base-en-v1.5)

Options for vectorize:drop-index:

name (optional): Index name (uses config value if not provided)
--force: Skip confirmation prompts

Metadata Index Management

# Create a metadata index for filtering
php artisan vectorize:create-metadata-index property-name type --index-name=my-index

# List all metadata indexes
php artisan vectorize:list-metadata-indexes --index-name=my-index

# Delete a metadata index
php artisan vectorize:delete-metadata-index property-name --index-name=my-index

# Force delete without confirmation
php artisan vectorize:delete-metadata-index property-name --index-name=my-index --force

Arguments for vectorize:create-metadata-index:

property-name: The metadata property to index
type: Property type (string, number, boolean)

Arguments for vectorize:delete-metadata-index:

property-name: The metadata property to delete

Options for metadata index commands:

--index-name: Vectorize index name (uses config value if not provided)
--force: Skip confirmation prompts (delete command only)

Standard Scout Commands

# Import all records of a model
php artisan scout:import "App\Models\Product"

# Flush all vectors for a specific model
php artisan scout:flush "App\Models\Product"

How It Works

Indexing: When a model is indexed, the driver:
- Calls toSearchableText() or flattens toSearchableArray() to text
- Generates an embedding using Cloudflare Workers AI
- Stores the vector in Cloudflare Vectorize with metadata
Searching: When you search:
- Your query text is converted to an embedding
- Vectorize finds the most similar vectors
- Results are mapped back to your Eloquent models
- Models are fetched from your database and returned
Vector IDs: The driver prefixes vector IDs with the model class name to support multiple model types in one index (e.g., App_Models_Product_123)
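
The last two search steps (mapping matches back to models and keeping Vectorize's ranking) can be sketched in plain PHP. This is a simplified illustration under stated assumptions, not the engine's code: orderRowsByMatches() is a hypothetical function, rows are plain arrays rather than Eloquent models, and the model key is taken to be everything after the last underscore of the vector ID.

```php
/**
 * Hypothetical sketch: return database rows in the order of the vector
 * matches, skipping matches whose row no longer exists.
 *
 * @param array<int, array{id: string, score: float}> $matches ranked best-first
 * @param array<int|string, array> $rowsByKey rows keyed by model key
 */
function orderRowsByMatches(array $matches, array $rowsByKey): array
{
    $ordered = [];

    foreach ($matches as $match) {
        // Assumed ID format: {ModelClass}_{ModelKey}
        $position = strrpos($match['id'], '_');
        $key = $position === false ? $match['id'] : substr($match['id'], $position + 1);

        if (isset($rowsByKey[$key])) {
            $ordered[] = $rowsByKey[$key];
        }
    }

    return $ordered;
}

$matches = [
    ['id' => 'App_Models_Product_7', 'score' => 0.91],
    ['id' => 'App_Models_Product_3', 'score' => 0.84],
    ['id' => 'App_Models_Product_9', 'score' => 0.80], // deleted from the database
];
$rows = [3 => ['name' => 'Tablet'], 7 => ['name' => 'Laptop']];

print_r(array_column(orderRowsByMatches($matches, $rows), 'name')); // Laptop, then Tablet
```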

Limitations

Limited filtering: Vector search has no equivalent of the rich WHERE clauses offered by traditional search engines. The where() clauses described in this README rely on Vectorize metadata filtering (see the metadata filtering item below); for anything beyond exact matches, apply filters in PHP after retrieval
No offset-based pagination: Vector search returns top-K results. Use cursor-based pagination or retrieve more results upfront
Metadata filtering: Cloudflare Vectorize metadata filtering may not be reliable for all use cases, and it requires a metadata index for each filtered field. If results look wrong, consider filtering in your application layer
Eventual consistency: There may be a slight delay between indexing/deletion and seeing changes in search results
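
Filtering in the application layer can be as simple as the sketch below. It is an illustration on plain arrays, and filterRows() is a hypothetical helper; with Eloquent collections the collection's filter() method plays the same role.

```php
/**
 * Hypothetical helper: keep only the rows whose fields strictly equal
 * every value in $criteria.
 */
function filterRows(array $rows, array $criteria): array
{
    return array_values(array_filter($rows, function (array $row) use ($criteria): bool {
        foreach ($criteria as $field => $expected) {
            if (!array_key_exists($field, $row) || $row[$field] !== $expected) {
                return false;
            }
        }

        return true;
    }));
}

$rows = [
    ['name' => 'Laptop A', 'status' => 'published', 'price' => 1500],
    ['name' => 'Laptop B', 'status' => 'draft', 'price' => 900],
    ['name' => 'Laptop C', 'status' => 'published', 'price' => 2500],
];

// Exact match first, then a range condition that where() cannot express.
$published = filterRows($rows, ['status' => 'published']);
$affordable = array_values(array_filter($published, fn (array $row): bool => $row['price'] < 2000));

print_r(array_column($affordable, 'name')); // Laptop A
```

Because the filter runs after retrieval, request more results than you need (for example with take()) so that enough rows survive the filter.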

Configuration Reference

// config/scout-vectorize.php

return [
    'cloudflare' => [
        'account_id' => env('CLOUDFLARE_ACCOUNT_ID'),
        'api_token' => env('CLOUDFLARE_API_TOKEN'),
    ],

    'index' => env('CLOUDFLARE_VECTORIZE_INDEX', 'default'),

    'embedding_model' => env('CLOUDFLARE_EMBEDDING_MODEL', '@cf/baai/bge-base-en-v1.5'),
];

Troubleshooting

Search returns no results

Ensure your models are indexed: Run php artisan scout:import "App\Models\Product"
Check your Vectorize index has vectors: Use the Cloudflare dashboard or API to verify
Verify your API credentials: Double-check CLOUDFLARE_ACCOUNT_ID and CLOUDFLARE_API_TOKEN in your .env
Check model filters: The driver automatically filters by model class. Ensure you're searching the right model

Indexing is slow

API overhead: Vector embedding generation requires API calls to Cloudflare Workers AI
Use batch operations: Use makeAllSearchable() for bulk indexing (more efficient than individual saves)
Enable queuing: Set 'queue' => true in config/scout.php to process indexing in the background
Rate limits: Cloudflare has rate limits on API calls. Implement throttling or use queues
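
Where queues are not an option, a simple form of throttling is to process records in fixed-size batches with a pause between them. The sketch below shows the idea in plain PHP; processInBatches() is a hypothetical helper, and the batch size and pause are arbitrary values to tune against Cloudflare's limits for your account.

```php
/**
 * Hypothetical helper: pass $items to $handler in batches, sleeping
 * between batches. Returns the number of batches processed.
 */
function processInBatches(array $items, int $batchSize, callable $handler, int $pauseMs = 0): int
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    $batches = array_chunk($items, $batchSize);

    foreach ($batches as $index => $batch) {
        $handler($batch);

        // No need to wait after the final batch.
        if ($pauseMs > 0 && $index < count($batches) - 1) {
            usleep($pauseMs * 1000);
        }
    }

    return count($batches);
}

$ids = range(1, 5);
$count = processInBatches($ids, 2, function (array $batch): void {
    // In a real application this is where the batch would be indexed.
    echo 'Indexing IDs: ', implode(', ', $batch), "\n";
}, 100);

echo "Processed {$count} batches\n";
```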

Errors about dimensions

Dimension mismatch: Ensure your Vectorize index dimensions match your embedding model
- @cf/baai/bge-small-en-v1.5: 384 dimensions
- @cf/baai/bge-base-en-v1.5: 768 dimensions (default)
- @cf/baai/bge-large-en-v1.5: 1024 dimensions
Recreate index: If you changed embedding models, you'll need to create a new index with the correct dimensions and re-import your data

Authentication errors

Invalid API token: Verify your CLOUDFLARE_API_TOKEN has Vectorize permissions
Incorrect account ID: Double-check your CLOUDFLARE_ACCOUNT_ID
Token permissions: Ensure your API token has Vectorize read and write permissions

Metadata filtering not working

Create metadata indexes: Metadata filters require indexes. Run:

npx wrangler vectorize create-metadata-index my-index --property-name=your_field --type=string

Check field types: Ensure the metadata index type matches your data (string, number, boolean)
Include in searchable array: The field must be in your model's toSearchableArray()
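
A type mismatch often comes from values that look numeric but are stored as strings. As a rough aid, the sketch below maps a PHP value to one of the three metadata index types named above; metadataIndexType() is a hypothetical helper, not a function provided by this package.

```php
/**
 * Hypothetical helper: suggest the metadata index type (string, number,
 * boolean) for a value returned from toSearchableArray().
 */
function metadataIndexType(mixed $value): string
{
    return match (true) {
        is_bool($value) => 'boolean',
        is_int($value), is_float($value) => 'number',
        is_string($value) => 'string',
        default => throw new InvalidArgumentException(
            'Unsupported metadata value of type ' . get_debug_type($value)
        ),
    };
}

$fields = ['status' => 'active', 'category_id' => 12, 'in_stock' => true, 'sku' => '42'];

foreach ($fields as $field => $value) {
    echo $field, ': ', metadataIndexType($value), "\n";
}
```

Note that the string '42' maps to string, not number; cast such values in toSearchableArray() if the index was created with type number.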

Performance optimization

Limit result size: Use take() or paginate() to limit results
Cache frequent queries: Cache search results for common queries
Use metadata filters wisely: Filters can reduce the search space and improve performance
Optimize text conversion: Keep toSearchableText() concise to reduce embedding generation time

Architecture

Package Structure

src/
├── Engines/
│   └── VectorizeEngine.php    # Scout engine implementation
├── VectorizeClient.php         # Cloudflare API client
└── VectorizeServiceProvider.php # Service provider

tests/
├── TestCase.php                # Base test case
└── VectorizeEngineTest.php     # Engine tests

How Embeddings Work

This package uses Cloudflare Workers AI to generate embeddings:

Text Preparation: Your model data is converted to text using toSearchableText() or by flattening toSearchableArray()
Embedding Generation: The text is sent to Cloudflare Workers AI which returns a vector (array of floats)
Vector Storage: The vector is stored in Vectorize along with metadata (model class and searchable data)
Semantic Search: When you search, your query is also converted to a vector and compared against stored vectors using the index's distance metric (cosine similarity by default)
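
Vectorize performs the vector comparison on Cloudflare's side, so none of this runs in your application. Purely to illustrate what cosine similarity measures, here is a small self-contained sketch; cosineSimilarity() is an illustrative function and not part of this package.

```php
/**
 * Illustrative only: cosine similarity between two equal-length vectors.
 * Returns a value from -1 (opposite) to 1 (same direction).
 */
function cosineSimilarity(array $a, array $b): float
{
    $a = array_values($a);
    $b = array_values($b);

    if ($a === [] || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0; // a zero vector has no direction
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

// Toy 3-dimensional vectors; real embeddings have 384 to 1024 dimensions.
$query = [0.9, 0.1, 0.0];
$similarDoc = [0.8, 0.2, 0.1];
$unrelatedDoc = [0.0, 0.1, 0.9];

var_dump(cosineSimilarity($query, $similarDoc) > cosineSimilarity($query, $unrelatedDoc)); // bool(true)
```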

Supported Embedding Models

Model	Dimensions	Best For
`@cf/baai/bge-small-en-v1.5`	384	Faster processing, lower memory
`@cf/baai/bge-base-en-v1.5`	768	Balanced (default)
`@cf/baai/bge-large-en-v1.5`	1024	Higher accuracy, slower

Vector ID Format

Vectors are stored with IDs in the format: {ModelClass}_{ModelKey}

Example: App_Models_Product_123

This allows multiple model types to coexist in the same Vectorize index.
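
Going by the example above, the ID appears to be the class name with namespace separators replaced by underscores, followed by the model key. The sketch below encodes that reading; it is an assumption inferred from the single example, and makeVectorId() and extractModelKey() are hypothetical names rather than the package's API.

```php
/**
 * Hypothetical: build a vector ID in the {ModelClass}_{ModelKey} format,
 * assuming namespace separators become underscores.
 */
function makeVectorId(string $modelClass, int|string $key): string
{
    return str_replace('\\', '_', $modelClass) . '_' . $key;
}

/**
 * Hypothetical: read the model key back as the part after the last
 * underscore. Keys that themselves contain underscores would be cut short.
 */
function extractModelKey(string $vectorId): string
{
    $position = strrpos($vectorId, '_');

    if ($position === false) {
        throw new InvalidArgumentException("Not a vector ID: {$vectorId}");
    }

    return substr($vectorId, $position + 1);
}

$id = makeVectorId('App\\Models\\Product', 123);

echo $id, "\n";                  // App_Models_Product_123
echo extractModelKey($id), "\n"; // 123
```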

Testing

The package includes comprehensive tests covering all engine functionality:

# Run all tests
composer test

# Run with coverage
vendor/bin/phpunit --coverage-html coverage

# Run specific test
vendor/bin/phpunit tests/VectorizeEngineTest.php

Test Coverage

The test suite includes 23+ tests covering:

Update operations: Empty collections, valid models, custom text conversion, array values
Delete operations: Empty collections, model deletion
Search operations: Default limits, custom limits, filters, callbacks, pagination
Result mapping: ID extraction, model mapping, ordering
Flush operations: Batch deletion, different embedding models
Index operations: Create/delete (no-op for Vectorize)

Running Tests

Tests use Orchestra Testbench to simulate a Laravel environment and Mockery to mock the VectorizeClient, ensuring tests run without making actual API calls.

# Install dependencies
composer install

# Run tests
./vendor/bin/phpunit

# Run tests with detailed output
./vendor/bin/phpunit --testdox

Best Practices

Optimizing Search Quality

Use descriptive text: Include context in your searchable content

public function toSearchableText(): string
{
    // Good: Includes context
    return "Product: {$this->name}. Brand: {$this->brand}. {$this->description}";

    // Not ideal: Just raw values (shown commented out for comparison)
    // return "{$this->name} {$this->brand} {$this->description}";
}

Avoid overly long text: Embeddings work best with focused, relevant content

public function toSearchableArray(): array
{
    return [
        'title' => $this->title,
        'excerpt' => Str::limit($this->content, 500), // Limit long content (Str is Illuminate\Support\Str)
        'category' => $this->category->name,
    ];
}

Include relevant metadata: Add fields you'll filter on

public function toSearchableArray(): array
{
    return [
        'content' => $this->content,
        // Always include filterable fields
        'status' => $this->status,
        'created_at' => $this->created_at,
        'author_id' => $this->author_id,
    ];
}

Performance Tips

Enable queueing for production: Prevent blocking requests

// config/scout.php
'queue' => env('SCOUT_QUEUE', true),

Use batch operations: Import in bulk rather than one-by-one

# Efficient
php artisan scout:import "App\Models\Product"

# Less efficient
Product::all()->each->searchable();

Limit search results: Only fetch what you need

// Good: Limited results
Product::search('laptop')->take(20)->get();

// Avoid: No explicit limit, so you rely on the driver's default
Product::search('laptop')->get();

Cache frequent queries: Use Laravel's cache for popular searches

$results = Cache::remember(
    "search:{$query}",
    now()->addMinutes(10),
    fn() => Product::search($query)->take(20)->get()
);
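
Using the raw query as the cache key means that "Laptop" and " laptop " are cached separately. One option is to normalize the query before building the key, as in this sketch; searchCacheKey() is a hypothetical helper, and including the result limit in the key is a design choice rather than a requirement.

```php
/**
 * Hypothetical helper: build a cache key that is identical for queries
 * differing only in letter case or whitespace.
 */
function searchCacheKey(string $query, int $limit = 20): string
{
    // Collapse runs of whitespace, trim the ends, and lowercase.
    $normalized = strtolower(trim((string) preg_replace('/\s+/', ' ', $query)));

    // Hashing keeps the key short and free of awkward characters.
    return 'search:' . sha1($normalized . '|' . $limit);
}

var_dump(searchCacheKey('  Gaming   Laptop ') === searchCacheKey('gaming laptop')); // bool(true)
var_dump(searchCacheKey('laptop', 20) === searchCacheKey('laptop', 50));            // bool(false)
```

The resulting key can replace "search:{$query}" in the Cache::remember() call.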

Security Considerations

Sanitize user input: Always validate and sanitize search queries

$query = request()->validate(['q' => 'required|string|max:255'])['q'];
$results = Product::search($query)->get();

Protect API credentials: Never commit API tokens to version control

# .env (not in version control)
CLOUDFLARE_API_TOKEN=your_secret_token

Use scopes for access control: Filter by user permissions

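// Scout's query builder has no orWhere(), so run one search per condition and merge
$public = Article::search('security')->where('visibility', 'public')->get();
$own = Article::search('security')->where('author_id', auth()->id())->get();
$results = $public->merge($own);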

Comparison with Other Search Solutions

Feature	Vectorize (this package)	Algolia	Meilisearch	Elasticsearch
Semantic Search	✅ Built-in	❌ Keyword only	⚠️ Limited	⚠️ Via plugins
Setup Complexity	⭐⭐ Easy	⭐ Very Easy	⭐⭐ Easy	⭐⭐⭐⭐ Complex
Cost	💰 Cloudflare pricing	💰💰💰 Premium	💰 Free/Cheap	💰💰 Moderate
Latency	Fast (edge network)	Very Fast	Fast	Moderate
Filtering	⚠️ Basic metadata	✅ Advanced	✅ Good	✅ Advanced
Typo Tolerance	❌ No	✅ Yes	✅ Yes	✅ Yes
Relevance by Keywords	❌ No	✅ Excellent	✅ Good	✅ Excellent
Relevance by Meaning	✅ Excellent	❌ No	⚠️ Limited	⚠️ Via plugins
Infrastructure	Serverless	Managed	Self-host/Managed	Self-host/Managed

When to Use Vectorize

Good fit:

Semantic/conceptual search (finding by meaning, not keywords)
Multi-language search, provided the embedding model is multilingual (the bge-*-en models listed in this README are English-focused)
Finding similar content or recommendations
Applications already using Cloudflare
Budget-conscious projects needing semantic search

Not ideal for:

Exact keyword matching
Complex filtering and faceting requirements
Typo-tolerant search
Traditional full-text search
Applications requiring instant consistency

FAQ

Q: Can I use multiple models in the same index? A: Yes! The driver automatically namespaces vectors by model class, so multiple models can coexist in one index.

Q: How accurate is semantic search compared to keyword search? A: Semantic search excels at understanding intent and meaning, but may miss exact keyword matches. Consider your use case.

Q: Can I migrate from Algolia/Meilisearch to Vectorize? A: Yes, but be aware that Vectorize uses semantic search, which behaves differently from keyword-based search engines.

Q: What happens if I change the embedding model? A: You'll need to create a new index with the correct dimensions and re-index all your data.

Q: Is there a limit on the number of vectors? A: Check Cloudflare's Vectorize pricing and limits for your account tier.

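Q: Can I use this with multilingual content? A: With caution. The BGE models listed in this README carry an -en suffix and are trained primarily on English text, so cross-language matching may be weak. For multilingual content, check Cloudflare's Workers AI model catalog for a multilingual embedding model and create your index with that model's dimensions.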

Contributing

Contributions are welcome! Please submit pull requests or open issues on GitHub.

Development Setup

# Clone the repository
git clone https://github.com/brynj-digital/laravel-scout-vectorize.git
cd laravel-scout-vectorize

# Install dependencies
composer install

# Run tests
composer test

# Run code style checks
composer format

License

This package is open-source software licensed under the MIT license.

Credits

Built for use with Cloudflare Vectorize
Integrates with Laravel Scout

Support

For issues, questions, or contributions, please visit the GitHub repository.
