crell/serde

A general purpose serialization and deserialization library

Fund package maintenance!
Crell

Installs: 55 848

Dependents: 6

Suggesters: 1

Security: 0

Stars: 295

Watchers: 6

Forks: 14

Open Issues: 12

1.3.1 2024-09-25 13:44 UTC

README

Latest Version on Packagist Software License Total Downloads

Serde (pronounced "seer-dee") is a fast, flexible, powerful, and easy to use serialization and deserialization library for PHP that supports a number of standard formats. It draws inspiration from both Rust's Serde crate and Symfony Serializer, although it is not directly based on either.

At this time, Serde supports serializing PHP objects to and from PHP arrays, JSON, YAML, TOML, and CSV files. It also supports serializing to JSON or CSV via a stream. Further support is planned, but by design can also be extended by anyone.

Install

Via Composer

$ composer require crell/serde

Usage

Serde is designed to be both quick to start using and robust in more advanced cases. In its most basic form, you can do the following:

use Crell\Serde\SerdeCommon;

$serde = new SerdeCommon();

$object = new SomeClass();
// Populate $object somehow;

$jsonString = $serde->serialize($object, format: 'json');

$deserializedObject = $serde->deserialize($jsonString, from: 'json', to: SomeClass::class);

(The named arguments are optional, but recommended.)

Serde is highly configurable, but common cases are supported by just using the SerdeCommon class as provided. For most basic cases, that is all you need.

Key features

Supported formats

Serde can serialize to:

  • PHP arrays (array)
  • JSON (json)
  • Streaming JSON (json-stream)
  • YAML (yaml)
  • TOML (toml)
  • CSV (csv)
  • Streaming CSV (csv-stream)

Serde can deserialize from:

  • PHP arrays (array)
  • JSON (json)
  • YAML (yaml)
  • TOML (toml)
  • CSV (csv)

YAML support requires the Symfony/Yaml library.

TOML support requires the Vanodevium/Toml library.

XML support is in progress.

Robust object support

Serde automatically supports nested objects in properties of other objects, which will be handled recursively as long as there are no circular references.

Serde handles public, private, protected, and readonly properties, both reading and writing, with optional default values.

If you try to serialize or deserialize an object that implements PHP's __serialize() or __unserialize() hooks, those will be respected. (If you want to read/write from PHP's internal serialization format, just call serialize()/unserialize() directly.)

Serde also supports post-load callbacks that allow you to re-initialize derived information if necessary without storing it in the serialized format.

PHP objects can be mutated to and from a serialized format. Nested objects can be flattened or collected, classes with common interfaces can be mapped to the appropriate object, and array values can be imploded into a string for serialization and exploded back into an array when reading.

Configuration

Serde's behavior is driven almost entirely through attributes. Any class may be serialized from or deserialized to as-is with no additional configuration, but there is a great deal of configuration that may be opted-in to.

Attribute handling is provided by Crell/AttributeUtils. It is worth looking into as well.

The main attribute is the Crell\Serde\Attributes\Field attribute, which may be placed on any object property. (Static properties are ignored.) All of its arguments are optional, as is the Field itself. (That is, adding #[Field] with no arguments is the same as not specifying it at all.) The meaning of the available arguments is listed below.

Although not required, it is strongly recommended that you always use named arguments with attributes. The precise order of arguments is not guaranteed.

In the examples below, the Field is generally referenced directly. However, you may also import the namespace and then use namespaced versions of the attributes, like so:

use Crell\Serde\Attributes as Serde;

#[Serde\ClassSettings(includeFieldsByDefault: false)]
class Person
{
    #[Serde\Field(serializedName: 'callme')]
    protected string $name = 'Larry';
}

Which you do is mostly a matter of preference, although if you are mixing Serde attributes with attributes from other libraries then the namespaced approach is advisable.

There is also a ClassSettings attribute that may be placed on classes to be serialized. At this time it has four arguments:

  • includeFieldsByDefault, which defaults to true. If set to false, a property with no #[Field] attribute will be ignored. It is equivalent to setting exclude: true on all properties implicitly.
  • requireValues, which defaults to false. If set to true, then when deserializing any field that is not provided in the incoming data will result in an exception. This may also be turned on or off on a per-field level. (See requireValue below.) The class-level setting applies to any field that does not specify its behavior.
  • renameWith. If set, the specified renaming strategy will be used for all properties of the class, unless a property specifies its own. (See renameWith below.) The class-level setting applies to any field that does not specify its behavior.
  • omitNullFields, which defaults to false. If set to true, any property on the class that is null will be omitted when serializing. It has on effect on deserialization. This may also be turned on or off on a per-field level. (See omitIfNull below.)
  • scopes, which sets the scope of a given class definition attribute. See the section on Scopes below.

exclude (bool, default false)

If set to true, Serde will ignore the property entirely on both serializing and deserializing.

serializedName (string, default null)

If provided, this string will be used as the name of a property when serialized out to a format and when reading it back in. for example:

use Crell\Serde\Attributes\Field;

class Person
{
    #[Field(serializedName: 'callme')]
    protected string $name = 'Larry';
}

Round trips to/from:

{
    "callme": "Larry"
}

renameWith (RenamingStrategy, default null)

The renameWith key specifies a way to mangle the name of the property to produce a serializedName. The most common examples here would be case folding, say if serializing to a format that uses a different convention than PHP does.

The value of renameWith can be any object that implements the RenamingStrategy interface. The most common versions are already provided via the Cases enum and Prefix class, but you are free to provide your own.

The Cases enum implements RenamingStrategy and provides a series of instances (cases) for common renaming. For example:

use Crell\Serde\Attributes\Field;
use Crell\Serde\Renaming\Cases;

class Person
{
    #[Field(renameWith: Cases::snake_case)]
    public string $firstName = 'Larry';

    #[Field(renameWith: Cases::CamelCase)]
    public string $lastName = 'Garfield';
}

Serializes to/from:

{
    "first_name": "Larry",
    "LastName": "Garfield"
}

Available cases are:

  • Cases::UPPERCASE
  • Cases::lowercase
  • Cases::snake_case
  • Cases::kebab_case (renders with dashes, not underscores)
  • Cases::CamelCase
  • Cases::lowerCamelCase

The Prefix class attaches a prefix to values when serialized, but otherwise leaves the property name intact.

use Crell\Serde\Attributes\Field;
use Crell\Serde\Renaming\Prefix;

class MailConfig
{
    #[Field(renameWith: new Prefix('mail_')]
    protected string $host = 'smtp.example.com';

    #[Field(renameWith: new Prefix('mail_')]
    protected int $port = 25;

    #[Field(renameWith: new Prefix('mail_')]
    protected string $user = 'me';

    #[Field(renameWith: new Prefix('mail_')]
    protected string $password = 'sssh';
}

Serializes to/from:

{
    "mail_host": "smtp.example.com",
    "mail_port": 25,
    "mail_user": "me",
    "mail_password": "sssh"
}

If both serializedName and renameWith are specified, serializedName will be used and renameWith ignored.

alias (array, default [])

When deserializing (only), if the expected serialized name is not found in the incoming data, these additional property names will be examined to see if the value can be found. If so, the value will be read from that key in the incoming data. If not, it will behave the same as if the value was simply not found in the first place.

use Crell\Serde\Attributes\Field;

class Person
{
    #[Field(alias: ['layout', 'design'])]
    protected string $format = '';
}

All three of the following JSON strings would be read into an identical object:

{
    "format": "3-column-layout"
}
{
    "layout": "3-column-layout"
}
{
    "design": "3-column-layout"
}

This is mainly useful when an API key has changed, and legacy incoming data may still have an old key name.

omitIfNull (bool, default false)

This key only applies on serialization. If set to true, and the value of this property is null when an object is serialized, it will be omitted from the output entirely. If false, a null will be written to the output, however that looks for the particular format.

useDefault (bool, default true)

This key only applies on deserialization. If a property of a class is not found in the incoming data, and this property is true, then a default value will be assigned instead. If false, the value will be skipped entirely. Whether the deserialized object is now in an invalid state depends on the object.

The default value to use is derived from a number of different locations. The priority order of defaults is:

  1. The value provided by the default argument to the Field attribute.
  2. The default value provided by the code, as reported by Reflection.
  3. The default value of an identically named constructor argument, if any.

So for example, the following class:

use Crell\Serde\Attributes\Field;

class Person
{
    #[Field(default: 'Hidden')]
    public string $location;

    #[Field(useDefault: false)]
    public int $age;

    public function __construct(
        public string $name = 'Anonymous',
    ) {}
}

if deserialized from an empty source (such as {} in JSON), will result in an object with location set to Hidden, name set to Anonymous, and age still uninitialized.

default (mixed, default null)

This key only applies on deserialization. If specified, then if a value is missing in the incoming data being deserialized this value will be used instead, regardless of what the default in the source code itself is.

strict (bool, default true)

This key only applies on deserialization. If set to true, a type mismatch in the incoming data will be rejected and an exception thrown. If false, a deformatter will attempt to cast an incoming value according to PHP's normal casting rules. That means, for example, "1" is a valid value for an integer property if strict is false, but will throw an exception if set to true.

For sequence fields, strict set to true will reject a non-sequence value. (It must pass an array_is_list() check.) If strict is false, any array-ish value will be accepted but passed through array_values() to discard any keys and reindex it.

Additionally, in non-strict mode, numeric strings in the incoming array will be cast to ints or floats as appropriate in both sequence fields and dictionary fields. In strict mode, numeric strings will still be rejected.

The exact handling of this setting may vary slightly depending on the incoming format, as some formats handle their own types differently. (For instance, everything is a string in XML.)

requireValue (bool, default false)

This key only applies on deserialization. If set to true, if the incoming data does not include a value for this field and there is no default specified, a MissingRequiredValueWhenDeserializing exception will be thrown. If not set, and there is no default value, then the property will be left uninitialized.

If a field has a default value, then the default value will always be used for missing data and this setting has no effect.

flatten (bool, default false)

The flatten keyword can only be applied on an array or object property. A property that is "flattened" will have all of its properties injected into the parent directly on serialization, and will have values from the parent "collected" into it on deserialization.

Multiple objects and arrays may be flattened (serialized), but on deserialization only the lexically last array property marked flatten will collect remaining keys. Any number of objects may "collect" their properties, however.

As an example, consider pagination. It may be very helpful to represent pagination information in PHP as an object property of a result set, but in the serialized JSON or XML you may want the extra object removed.

Given this set of classes:

use Crell\Serde\Attributes as Serde;

class Results
{
    public function __construct(
        #[Serde\Field(flatten: true)]
        public Pagination $pagination,
        #[Serde\SequenceField(arrayType: Product::class)]
        public array $products,
    ) {}
}

class Pagination
{
    public function __construct(
        public int $total,
        public int $offset,
        public int $limit,
    ) {}
}

class Product
{
    public function __construct(
        public string $name,
        public float $price,
    ) {}
}

When serialized, the $pagination object will get "flattened," meaning its three properties will be included directly in the properties of Results. Therefore, a JSON-serialized copy of this object may look like:

{
    "total": 100,
    "offset": 20,
    "limit": 10,
    "products": [
        {
            "name": "Widget",
            "price": 9.99
        },
        {
            "name": "Gadget",
            "price": 4.99
        }
    ]
}

The extra "layer" of the Pagination object has been removed. When deserializing, those extra properties will be "collected" back into a Pagination object.

Now consider this more complex example:

use Crell\Serde\Attributes as Serde;

class DetailedResults
{
    public function __construct(
        #[Serde\Field(flatten: true)]
        public NestedPagination $pagination,
        #[Serde\Field(flatten: true)]
        public ProductType $type,
        #[Serde\SequenceField(arrayType: Product::class)]
        public array $products,
        #[Serde\Field(flatten: true)]
        public array $other = [],
    ) {}
}

class NestedPagination
{
    public function __construct(
        public int $total,
        public int $limit,
        #[Serde\Field(flatten: true)]
        public PaginationState $state,
    ) {}
}

class PaginationState
{
    public function __construct(
        public int $offset,
    ) {
    }
}

class ProductType
{
    public function __construct(
        public string $name = '',
        public string $category = '',
    ) {}
}

In this example, both NestedPagination and PaginationState will be flattened when serializing. NestedPagination itself also has a field that should be flattened. Both will flatten and collect cleanly, as long as none of them share a property name.

Additionally, there is an extra array property, $other. $other may contain whatever associative array is desired, and its values will also get flattened into the output.

When collecting, only the lexically last flattened array will get any data, and will get all properties not already accounted for by some other property. For example, an instance of DetailedResults may serialize to JSON as:

{
    "total": 100,
    "offset": 20,
    "limit": 10,
    "products": [
        {
            "name": "Widget",
            "price": 9.99
        },
        {
            "name": "Gadget",
            "price": 4.99
        }
    ],
    "foo": "beep",
    "bar": "boop"
}

In this case, the $other property has two keys, foo and bar, with values beep and boop, respectively. The same JSON will deserialize back to the same object as before.

Value objects

Flattening can also be used in conjunction with renaming to silently translate value objects. Consider:

class Person
{
    public function __construct(
        public string $name,
        #[Field(flatten: true)]
        public Age $age,
        #[Field(flatten: true)]
        public Email $email,
    ) {}
}

readonly class Email
{
    public function __construct(
        #[Field(serializedName: 'email')] public string $value,
    ) {}
}

readonly class Age
{
    public function __construct(
        #[Field(serializedName: 'age')] public int $value
    ) {
        $this->validate();
    }

    #[PostLoad]
    private function validate(): void
    {
        if ($this->value < 0) {
            throw new \InvalidArgumentException('Age cannot be negative.');
        }
    }
}

In this example, Email and Age are value objects, in the latter case with extra validation. However, both are marked flatten: true, so their properties will be moved up a level to Person when serializing. However, they both use the same property name, so both have a custom serialization name specified. The above object will serialize to (and deserialize from) something like this:

{
    "name": "Larry",
    "age": 21,
    "email": "me@example.com"
}

Note that because deserialization bypasses the constructor, the extra validation in Age must be placed in a separate method that is called from the constructor and flagged to run automatically after deserialization.

It is also possible to specify a prefix for a flattened value, which will also be applied recursively. For example, assuming the same Age class above:

readonly class JobDescription
{
    public function __construct(
        #[Field(flatten: true, flattenPrefix: 'min_')]
        public Age $minAge,
        #[Field(flatten: true, flattenPrefix: 'max_')]
        public Age $maxAge,
    ) {}
}

class JobEntry
{
    public function __construct(
        #[Field(flatten: true, flattenPrefix: 'desc_')]
        public JobDescription $description,
    ) {}
}

In this case, serializing JobEntry will first flatten the $description property, with desc_ as a prefix. Then, JobDescription will flatten both of its age fields, giving each a separate prefix. That will result in a serialized output something like this:

{
    "desc_min_age": 18,
    "desc_max_age": 65,
}

And it will deserialize back to the same original 3-layer-object structure.

flattenPrefix (string, default '')

When an object or array property is flattened, by default its properties will be flattened using their existing name (or serializedName, if specified). That may cause issues if the same class is included in a parent class twice, or if there is some other name collission. Instead, flattened fields may be given a flattenPrefix value. That string will be prepended to the name of the property when serializing.

If set on a non-flattened field, this value is meaningless and has no effect.

Sequences and Dictionaries

In most languages, and many serialization formats, there is a difference between a sequential list of values (called variously an array, sequence, or list) and a map of arbitrary size of arbitrary values to other arbitrary values (called a dictionary or map). PHP does not make a distinction, and shoves both data types into a single associative array variable type.

Sometimes that works out, but other times the distinction between the two greatly matters. To support those cases, Serde allows you to flag an array property as either a #[SequenceField] or #[DictionaryField] (and it is recommended that you always do so). Doing so ensures that the correct serialization pathway is used for the property, and also opens up a number of additional features.

arrayType

On both a #[SequenceField] and #[DictionaryField], the arrayType argument lets you specify the type that all values in that structure are. For example, a sequence of integers can easily be serialized to and deserialized from most formats without any additional help. However, an ordered list of Product objects could be serialized, but there's no way to tell then how to deserialize that data back to Product objects rather than just a nested associative array (which would also be legal). The arrayType argument solves that issue.

If arrayType is specified, then all values of that array are assumed to be of that type. It may either be a class-string to specify all values are a class, or a value of the ValueType enum to indicate one of the four supported scalars.

On deserialization, then, Serde will either validate that all incoming values are of the right scalar type, or look for nested object-like structures (depending on the specific format), and convert those into the specified object type.

For example:

use Crell\Serde\Attributes\SequenceField;

class Order
{
    public string $orderId;

    public int $userId;

    #[SequenceField(arrayType: Product::class)]
    public array $products;
}

In this case, the attribute tells Serde that $products is an indexed, sequential list of Product objects. When serializing, that may be represented as an array of dictionaries (in JSON or YAML) or perhaps with some additional metadata in other formats.

When deserializing, the otherwise object-ignorant data will be upcast back to Product objects.

arrayType works the exact same way on a DictionaryField.

keyType

On DictionaryField only, it's possible to restrict the array to only allowing integer or string keys. It has two legal values, KeyType::Int and KeyType::String (an enum). If set to KeyType::Int, then deserialization will reject any arrays that have string keys, but will accept numeric strings. If set to KeyType::String, then deserialization will reject any arrays that have integer keys, including numeric strings.

(PHP auto-casts integer string array keys to actual integers, so there is no way to allow them in string-based dictionaries.)

If no value is set, then either key type will be accepted.

implodeOn

The implodeOn argument to SequenceField, if present, indicates that the value should be joined into a string serialization, using the provided value as glue. For example:

use Crell\Serde\Attributes\SequenceField;

class Order
{
    #[SequenceField(implodeOn: ',')]
    protected array $productIds = [5, 6, 7];
}

Will serialize in JSON to:

{
    "productIds": "5,6,7"
}

On deserialization, that string will get automatically get exploded back into an array when placed into the object.

By default, on deserialization the individual values will be trim()ed to remove excess whitespace. That can be disabled by setting the trim attribute argument to false.

joinOn

DictionaryFields also support imploding/exploding on serialization, but require two keys. implodeOn specifies the string to use between distinct values. joinOn specifies the string to use between the key and value.

For example:

use Crell\Serde\Attributes\DictionaryField;

class Settings
{
    #[DictionaryField(implodeOn: ',', joinOn: '=')]
    protected array $dimensions = [
        'height' => 40,
        'width' => 20,
    ];
}

Will serialize/deserialize to this JSON:

{
    "dimensions": "height=40,width=20"
}

As with SequenceField, values will automatically be trim()ed unless trim: false is specified in the attribute's argument list.

Date and Time fields

DateTime and DateTimeImmutable fields can also be serialized, and you can control how they are serialized using the DateField or the UnixTimeField attribute. DateField has two arguments, which may be used individually or together. Specifying neither is the same as not specifying the DateField attribute at all.

use Crell\Serde\Attributes\DateField;

class Settings
{
    #[DateField(format: 'Y-m-d')]
    protected DateTimeImmutable $date = new DateTimeImmutable('4 July 2022-07-04 14:22);
}

Will serialize to this JSON:

{
    "date": "2022-07-04"
}

timezone

The timezone argument may be any timezone string legal in PHP, such as America/Chicago or UTC. If specified, the value will be cast to this timezone first before it is serialized. If not specified, the value will be left in whatever timezone it is in before being serialized. Whether that makes a difference to the output depends on the format.

On deserializing, the timezone has no effect. If the incoming value has a timezone specified, the resulting DateTime[Immutable] object will use that timezone. If not, the system default timezone will be used.

format

This argument lets you specify the format that will be used when serializing. It may be any string accepted by PHP's date_format syntax, including one of the various constants defined on DateTimeInterface. If not specified, the default format is RFC3339_EXTENDED, or Y-m-d\TH:i:s.vP. While not the most human-friendly, it is the default format used by Javascript/JSON so makes for reasonable compatibility.

On deserializing, the format has no effect. Serde will pass the string value to a DateTime or DateTimeImmutable constructor, so any format recognized by PHP will be parsed according to PHP's standard date-parsing rules.

Unix Time

In cases where you need to serialize the date to/from Unix Time, you can use UnixTimeField, which supports a resolution parameter that can handle up to microsecond resolution:

use Crell\Serde\Attributes\UnixTimeField;
use Crell\Serde\Attributes\Enums\UnixTimeResolution;

class Jwt
{
    #[UnixTimeField]
    protected DateTimeImmutable $exp;
    
    #[UnixTimeField(resolution: UnixTimeResolution::Milliseconds)]
    protected DateTimeImmutable $iss;
}

Will serialize to this JSON:

{
    "exp": 1707764358,
    "iss": 1707764358000
}

The serialized integer should be read as "this many seconds since the epoc" or "this many milliseconds since the epoc," etc. ("The epoc" being 1 January 1970, the first year after humans first walked on the moon.)

Note that the permissible range of milliseconds and microseconds is considerably smaller than that for seconds, since there is a limit on the size of an integer that we can represent. For timestamps in the early 21st century there should be no issue, but trying to record the microseconds since the epoc for the setting of Dune (somewhere in the 10,000s) won't work.

Generators, Iterables, and Traversables

PHP has a number of "lazy list" options. Generally, they are all objects that implement the \Traversable interface. However, there are several syntax options available with their own subtleties. Serde supports them in different ways.

If a property is defined to be an iterable, then regardless of whether it's a Traversable object or a Generator the iterable will be "run out" and converted to an array by the serialization process. Note that if the iterable is an infinite iterator, the process will continue forever and your program will freeze. Don't do that.

Also, when using an iterable property the property MUST be marked with either #[SequenceField] or #[DictionaryField] as appropriate. Serde cannot deduce which it is on its own the way it (usually) can with arrays.

On deserializing, the incoming values will always be assigned to an array. As an array is an iterable, that is still type safe. While in theory it would be possible to build a dynamic generator on the fly to materialize the values lazily, that would not actually save any memory.

Note this does mean that serializing and deserializing an object will not be fully symmetric. The initial object may have properties that are generators, but the deserialized object will have arrays instead.

If a property is typed to be some other Traversable object (usually because it implements either \Iterator or \IteratorAggregate), then it will be serialized and deserialized as a normal object. Its iterable-ness is ignored. In this case, the #[SequenceField] and #[DictionaryField] attributes are forbidden.

CSV Formatter

Serde includes support for serializing/deserializing CSV files. However, because CSV is a more limited type of format only certain object structures are supported.

Specifically, the object in question must have a single property that is marked #[SequenceField], and it must have an explicit arrayType that is a class. That class, in turn, may contain only int, float, or string properties. Anything else will throw an error.

For example:

namespace Crell\Serde\Records;

use Crell\Serde\Attributes\SequenceField;

class CsvTable
{
    public function __construct(
        #[SequenceField(arrayType: CsvRow::class)]
        public array $people,
    ) {}
}

class CsvRow
{
    public function __construct(
        public string $name,
        public int $age,
        public float $balance,
    ) {}
}

This combination will result in a three-column CSV file, and also deserialize from a three-column CSV file.

The CSV formatter uses PHP's native CSV parsing and writing tools. If you want to control the delimiters used, pass those as constructor arguments to a CsvFormatter instance and inject that into the Serde class instead of the default.

Note that the lone property may be a generator. That allows a CSV to be generated on the fly off of arbitrary data. When deserialized, it will still deserialize to an array.

Streams

Serde includes two stream-based formatters (but not deformatters, yet), one for JSON and one for CSV. They work nearly the same way as any other formatter, but when calling $serde->serialize() you may (and should) pass an extra init argument. $init should be an instance of Serde\Formatter\FormatterStream, which wraps a writeable PHP stream handle.

The value returned will then be that same stream handle, after the object to be serialized has been written to it.

For example:

// The JsonStreamFormatter and CsvStreamFormatter are not included by default.
$s = new SerdeCommon(formatters: [new JsonStreamFormatter()]);

// You may use any PHP supported stream here, including files, network sockets,
// stdout, an in-memory temp stream, etc.
$init = FormatterStream::new(fopen('/tmp/output.json', 'wb'));

$result = $serde->serialize($data, format: 'json-stream', init: $init);

// $result is a FormatterStream object that wraps the same handle as before.
// What you can now do with the stream depends on what kind of stream it is.

In this example, the $data object (whatever it is) gets serialized to JSON piecemeal and streamed out to the specified file handle.

The CsvStreamFormatter works in the exact same way, but outputs CSV data and has the same restrictions as the CsvFormatter in terms of the objects it accepts.

In many cases that won't actually offer much benefit, as the whole object must be in memory anyway. However, it may be combined with the support for lazy iterators to have a property that produces objects lazily, say from a database query or read from some other source.

Consider this example:

use Crell\Serde\Attributes\SequenceField;

class ProductList
{
    public function __construct(
        #[SequenceField(arrayType: Product::class)]
        private iterable $products,
    ) {}
}

class Product
{
    public function __construct(
        public readonly string $name,
        public readonly string $color,
        public readonly float $price,
    ) {}
}

$databaseConn = ...;

$callback = function() use ($databaseConn) {
    $result = $databaseConn->query("SELECT name, color, price FROM products ORDER BY name");

    // Assuming $record is an associative array.
    foreach ($result as $record) {
        yield new Product(...$record);
    }
};

// This is a lazy list of products, which will be pulled from the database.
$products = new ProductList($callback());

// Use the CSV formatter this time, but JsonStream works just as well.
$s = new SerdeCommon(formatters: [new CsvStreamFormatter()]);

// Write to stdout, aka, back to the browser.
$init = FormatterStream::new(fopen('php://output', 'wb'));

$result = $serde->serialize($products, format: 'csv-stream', init: $init);

This setup will lazily pull records out of the database and instantiate an object from them, then lazily stream that data out to stdout. No matter how many product records are in the database, the memory usage remains roughly constant. (Note the database driver may do its own buffering of the entire result set, which could cause memory issues. That's a separate matter, however.)

While likely overkill for CSV, it can work very well for more involved objects being serialized to JSON.

TypeMaps

Type maps are a powerful feature of Serde that allows precise control over how objects with inheritance are serialized and deserialized. Type Maps translate between the class of an object and some unique identifier that is included in the serialized data.

In the abstract, a Type Map is any object that implements the TypeMap interface. TypeMaps may be provided as an attribute on a property, or on a class or interface, or provided to Serde when it is set up to allow for arbitrary maps.

Consider the following example, which will be used for the remaining explanations of Type Maps:

use Crell\Serde\Attributes\SequenceField;

interface Product {}

interface Book extends Product {}

class PaperBook implements Book
{
    protected string $title;
    protected int $pages;
}

class DigitalBook implements Book
{
    protected string $title;
    protected int $bytes;
}

class Sale
{
    protected Book $book;

    protected float $discountRate;
}

class Order
{
    protected string $orderId;

    #[SequenceField(arrayType: Book::class)]
    protected array $products;
}

Both Sale and Order reference Book, but that value could be a PaperBook, DigitalBook, or any other class that implements Book. Type Maps provide a way for Serde to tell which concrete type it is.

Class name maps

The simplest case of a class map is to include a #[ClassNameTypeMap] attribute on an object property. For example,

use Crell\Serde\ClassNameTypeMap;

class Sale
{
    #[ClassNameTypeMap(key: 'type')]
    protected Book $book;

    protected float $discountRate;
}

Now when a Sale is serialized, an extra property will be included named type that contains the class name. So a sale on a digital book would serialize like so:

{
    "book": {
        "type": "Your\\App\\DigitalBook",
        "title": "Thinking Functionally in PHP",
        "bytes": 45000
    },
    "discountRate": 0.2
}

On deserialization, the "type" property will be read and used to determine that the remaining values should be used to construct a DigitalBook instance, specifically.

Class name maps have the advantage that they are very simple, and will work with any class that implements that interface, even those you haven't thought of yet. The downside is that they put a PHP implementation detail (the class name) into the output, which may not be desirable.

Static Maps

Static maps allow you to provide a fixed map from classes to meaningful keys.

use Crell\Serde\Attributes\StaticTypeMap;

class Sale
{
    #[StaticTypeMap(key: 'type', map: [
        'paper' => Book::class,
        'ebook' => DigitalBook::class,
    ])]
    protected Book $book;

    protected float $discountRate;
}

Now, if a Sale object is serialized it will look like this:

{
    "book": {
        "type": "ebook",
        "title": "Thinking Functionally in PHP",
        "bytes": 45000
    },
    "discountRate": 0.2
}

Static maps have the advantage of simplicity and not polluting the output with PHP-specific implementation details. The downside is that they are static: They can only handle the classes you know about at code time, and will throw an exception if they encounter any other class.

Type maps on collections

Type Maps may also be applied to array properties, either sequence or dictionary. In that case, they will apply to all values in that collection. For example:

use Crell\Serde\Attributes as Serde;

class Order
{
    protected string $orderId;

    #[Serde\SequenceField(arrayType: Book::class)]
    #[Serde\StaticTypeMap(key: 'type', map: [
        'paper' => Book::class,
        'ebook' => DigitalBook::class,
    ])]
    protected array $books;
}

$products is an array of objects that implement Book, but could be either PaperBook or DigitalBook. A serialized copy of this object may look like:

{
    "orderId": "abc123",
    "products": [
        {
            "type": "ebook",
            "title": "Thinking Functionally in PHP",
            "bytes": 45000
        },
        {
            "type": "paper",
            "title": "Category Theory for Programmers",
            "pages": 335
        }
    ]
}

On deserialization, the type property will again be used to determine the class that the rest of the properties should be hydrated into.

Type mapped classes

In addition to putting a type map on a property, you may also place it on the class or interface that the property references.

use Crell\Serde\Attributes\StaticTypeMap;

#[StaticTypeMap(key: 'type', map: [
    'paper' => Book::class,
    'ebook' => DigitalBook::class,
])]
interface Book {}

Now, that Type Map will apply to both Sale::$book and to Order::$books with no further work on our part.

Type Maps also inherit. That means we can put a type map on Product instead if we wanted:

use Crell\Serde\Attributes\StaticTypeMap;

#[StaticTypeMap(key: 'type', map: [
    'paper' => Book::class,
    'ebook' => DigitalBook::class,
    'toy' => Gadget::class,
])]
interface Product {}

And both Sale and Order will still serialize with the appropriate key.

Dynamic type maps

Type Maps may also be provided directly to the Serde object when it is created. Any object that implements TypeMap may be used. This is most useful when the list of possible classes is dynamic based on user configuration, database values, what plugins are installed in your application, etc.

use Crell\Serde\TypeMap;

class ProductTypeMap implements TypeMap
{
    public function __construct(protected readonly Connection $db) {}

    public function keyField(): string
    {
        return 'type';
    }

    public function findClass(string $id): ?string
    {
        return $this->db->someLookup($id);
    }

    public function findIdentifier(string $class): ?string
    {
        return $this->db->someMappingLogic($class);
    }
}

$typeMap = new ProductTypeMap($dbConnection);

$serde = new SerdeCommon(typeMaps: [
    Your\App\Product::class => $typeMap,
]);

$json = $serde->serialize($aBook, to: 'json');

In practice, you would likely set that up via your Dependency Injection system.

Note that ClassNameTypeMap and StaticTypeMap may be injected as well, as can any other class that implements TypeMap.

Custom type maps

You may also write your own Type Maps as attributes. The only requirements are:

  1. The class implements the TypeMap interface.
  2. The class is marked as an #[\Attribute].
  3. The class is legal on both classes and properties. That is, #[\Attribute(\Attribute::TARGET_CLASS | \Attribute::TARGET_PROPERTY)]

Scopes

Serde supports "scopes" for having different versions of an attribute recognized in different contexts.

Any attribute (Field, TypeMap, SequenceField, DictionaryField, PostLoad, etc.) may take a scopes argument, which accepts an array of strings. If specified, that attribute is only valid if serializing or deserializing in that scope. If no scoped attribute is specified, then the behavior will fall back to an unscoped attribute or an omitted attribute.

For example, given this class:

class User
{
    private string $username;

    #[Field(exclude: true)]
    private string $password;

    #[Field(exclude: true)]
    #[Field(scope: 'admin')]
    private string $role;
}

If you serialize it like so:

$json = $serde->serialize($user, 'json');

It will result in this JSON response:

{
    "username": "Larry"
}

That's because, in an unscoped request, the first Field on $role is used, which excludes it from the output. However, if you specify a scope:

$json = $serde->serialize($user, 'json', scopes: ['admin']);

Then the admin version of $role's Field will be used, which is not excluded, and get this result:

{
    "username": "Larry",
    "role": "Developer"
}

When using scopes, it may be helpful to disable automatic property inclusion and require that each be specified explicitly. For example:

#[ClassSettings(includeFieldsByDefault: false)]
class Product
{
    #[Field]
    private int $id = 5;

    #[Field]
    #[Field(scopes: ['legacy'], serializedName: 'label')]
    private string $name = 'Fancy widget';

    #[Field(scopes: ['newsystem'])]
    private float $price = '9.99';

    #[Field(scopes: ['legacy'], serializedName: 'cost')]
    private float $legacyPrice = 9.99;

    #[Field(serializedName: 'desc')]
    private string $description = 'A fancy widget';

    private int $stock = 50;
}

If serialized with no scope specified, it will result in this:

{
    "id": 5,
    "name": "Fancy widget",
    "desc": "A fancy widget"
}

As those are the only fields that are "in scope" when no scope is specified.

If serialized with the legacy scope:

{
    "id": 5,
    "label": "Fancy widget",
    "cost": 9.99,
    "desc": "A fancy widget"
}

The scope-specific Field on $name gets used instead, which changes the serialized name. The $legacyPrice property is also included now, but renamed to "cost".

If serialized with the newsystem scope:

{
    "id": 5,
    "name": "Fancy widget",
    "price": "9.99",
    "desc": "A fancy widget"
}

In this case, the $name property uses the unscoped version of Field, and so is not renamed. The string-based $price is now in-scope, but the float-based $legacyPrice is not. Note that in none of these cases is the current $stock included, as it has no attribute at all.

Finally, it's also possible to serialize multiple scopes simultaneously. This is an OR operation, so any field marked for any specified scope will be included.

$json = $serde->serialize($product, 'json', scopes: ['legacy', 'newsystem']);
{
    "id": 5,
    "name": "Fancy widget",
    "price": "9.99",
    "cost": 9.99,
    "desc": "A fancy widget"
}

Note that since there is both an unscoped and a scoped version of the Field on $name, the scoped one wins and the property gets renamed.

If multiple attribute variants could apply for the specified scope, the lexically first in a scope will take precedence over later ones, and a scoped attribute will take precedence over an unscoped one.

Note that when deserializing, specifying a scope will exclude not only out-of-scope properties but their defaults as well. That is, they will not be set, even to a default value, and so may be "uninitialized." That is rarely desirable, so it may be preferable to deserialize without a scope, even if a value was serialized with a scope. That will depend on your use case.

For more on scopes, see the AttributeUtils documentation.

Validation with #[PostLoad]

It is important to note that when deserializing, __construct() is not called at all. That means any validation present in the constructor will not be run on deserialization.

Instead, Serde will look for any method or methods that have a #[\Crell\Serde\Attributes\PostLoad] attribute on them. This attribute takes no arguments other than scopes. After an object is populated, any PostLoad methods will be invoked with no arguments in lexical order. The main use case for this feature is validation, in which case the method should throw an exception if the populated data is invalid in some way. (For instance, some integer must be positive.)

The visibilty of the method is irrelevant. Serde will call public, private, or protected methods the same. Note, however, that a private method in a parent class of the class being deserialized to will not get called, as it is not accessible to PHP from that scope.

Extending Serde

Internally, Serde has six types of extensions that work in concert to produce a serialized or deserialized product.

  • Type Maps, as discussed above, are optional and translate a class name to a lookup identifier and back.
  • A Exporter is responsible for pulling values off of an object, processing them if necessary, and then passing them on to a Formatter. This is part of the Serialization pipeline.
  • A Importer is responsible for using a Deformatter to extract data from incoming data and then translate it as necessary to be written to an object. This is part of the Deserialization pipeline.
  • A Formatter is responsible for writing to a specific output format, like JSON or YAML. This is part of the Serialization pipeline.
  • A Deformatter is responsible for reading data off of an incoming format and passing it back to an Importer. This is part of the Deserialization pipeline.
  • A TypeField is a custom per-type field that can be added to a property to provide more type-specific logic to a corresponding Exporter or Importer. In a sense, they are "extra arguments" to an Exporter or Importer, and if you implement a custom one you will almost certainly implement your own Exporter or Importer to go with it. DateTimeField, DictionaryField, and SequenceField are examples of Type Fields. TypeFields are also Transitive, so they can be put on a custom class to apply to anywhere that class is used on a property (unless overridden locally).

Collectively, Importer and Exporter instances are called "handlers."

In general, Importers and Exporters are PHP-type specific, while Formatters and Deformatters are serialized-format specific. Custom Importers and Exporters can also declare themselves to be format-specific if they contain format-sensitive optimizations.

Importer and Exporter may be implemented on the same object, or not. Similarly, Formatter and Deformatter may be implemented together or not. That is up to whatever seems easiest for the particular implementation, and the provided extensions do a little of each depending on the use case.

The interfaces linked above provide more precise explanations of how to use them. In most cases, you would only need to implement a Formatter or Deformatter to support a new format. You would only need to implement an Importer or Exporter when dealing with a specific class that needs extra special handling for whatever reason, such as its serialized representation having little or no relationship with its object representation.

As an example, a few custom handlers are included to deal with common cases.

  • DateTimeExporter: This object will translate DateTime and DateTimeImmutable objects to and from a serialized form as a string. Specifically, it will use the \DateTimeInterface::RFC3339_EXTENDED format for the string when serializing. The timestamp will then appear in the serialized output as a normal string. When deserializing, it will accept any datetime format supported by DateTime's constructor.
  • DateTimeZoneExporter: This object will translate DateTimeZone objects to and from a serialized form as a timezone string. That is, DateTimeZone('America/Chicago)will be represented in the format as the stringAmerica/Chicago`.
  • NativeSerializeExporter: This object will apply to any class that has a __serialize() method (when serializing) or __unserialize() method (when deserializing). These PHP magic methods provide alternate representations of an object intended for use with PHP's native serialize() and unserialize() methods, but can also be used for any other format. If __serialize() is defined, it will be invoked and whatever associative array it returns will be written to the selected format as a dictionary. If __unserialize() is defined, this object will read a dictionary from the incoming data and then pass it to that method on a newly created object, which will then be responsible for populating the object as appropriate. No further processing will be done in either direction.
  • EnumOnArrayImporter: Serde natively supports PHP Enums and can serialize them as ints or strings as appropriate. However, in the special case of reading from a PHP array format this object will take over and support reading an Enum literal in the incoming data. That allows, for example, a configuration array to include hand-inserted Enum values and still be cleanly imported into a typed, defined object.

Architecture diagrams

Serialization works approximately like this:

And deserialization looks very similar:

In both cases, note that nearly all behavior is controlled by a one-off serializer/deserializer object, not by Serde itself. Serde itself is just a wrapper that configures the context for the runner object.

Dependency Injection configuration

Serde is designed to be usable "out of the box" without any additional setup. However, when included in a larger system it is best to configure it properly via Dependency Injection.

There are three ways you can set up Serde.

  1. The SerdeCommon class includes most available handlers and formatters out of the box, ready to go, although you can add additional ones via the constructor.
  2. The SerdeBasic class has no pre-built configuration whatsoever; you will need to provide all Handlers, Formatters, or Type Maps you want yourself, in the order you want them applied.
  3. You may also extend the Serde base class itself and create your own custom pre-made configuration, with just the Handlers or Formatters (provided or custom) that you want.

Both SerdeCommon and SerdeBasic take four arguments: The ClassAnalyzer to use, an array of Handlers, an array of Formatters, and an array of Type Maps. If no analyzer is provided, Serde creates a memory-cached Analyzer by default so that it will always work. However, in a DI configuration it is strongly recommended that you configure the Analyzer yourself, with appropriate caching, and inject that into Serde as a dependency to avoid duplicate Analyzers (and duplicate caches). If you have multiple different Serde configurations in different services, it may also be beneficial to make all handlers and formatters services as well and explicitly inject them into SerdeBasic rather than relying on SerdeCommon.

Change log

Please see CHANGELOG for more information on what has changed recently.

Testing

$ composer test

Contributing

Please see CONTRIBUTING and CODE_OF_CONDUCT for details.

Security

If you discover any security related issues, please use the GitHub security reporting form rather than the issue queue.

Credits

Initial development of this library was sponsored by TYPO3 GmbH.

License

The Lesser GPL version 3 or later. Please see License File for more information.