crell / serde
A general purpose serialization and deserialization library
Requires
- php: ~8.1
- crell/attributeutils: ~1.2
- crell/fp: ~1.0
Requires (Dev)
- devium/toml: ^1.0.5
- phpbench/phpbench: ^1.3.0
- phpstan/phpstan: ^1.11
- phpunit/phpunit: ~10.5
- symfony/yaml: ^5.4
Suggests
- devium/toml: Enables serializing to/from TOML files.
- symfony/yaml: Enables serializing to/from YAML files.
Serde (pronounced "seer-dee") is a fast, flexible, powerful, and easy to use serialization and deserialization library for PHP that supports a number of standard formats. It draws inspiration from both Rust's Serde crate and Symfony Serializer, although it is not directly based on either.
At this time, Serde supports serializing PHP objects to and from PHP arrays, JSON, YAML, TOML, and CSV files. It also supports serializing to JSON or CSV via a stream. Further support is planned, but by design can also be extended by anyone.
Install
Via Composer
$ composer require crell/serde
Usage
Serde is designed to be both quick to start using and robust in more advanced cases. In its most basic form, you can do the following:
use Crell\Serde\SerdeCommon;

$serde = new SerdeCommon();

$object = new SomeClass();
// Populate $object somehow;

$jsonString = $serde->serialize($object, format: 'json');

$deserializedObject = $serde->deserialize($jsonString, from: 'json', to: SomeClass::class);
(The named arguments are optional, but recommended.)
Serde is highly configurable, but common cases are supported by just using the SerdeCommon class as provided. For most basic cases, that is all you need.
Key features
Supported formats
Serde can serialize to:
- PHP arrays (array)
- JSON (json)
- Streaming JSON (json-stream)
- YAML (yaml)
- TOML (toml)
- CSV (csv)
- Streaming CSV (csv-stream)
Serde can deserialize from:
- PHP arrays (array)
- JSON (json)
- YAML (yaml)
- TOML (toml)
- CSV (csv)
YAML support requires the Symfony/Yaml library.
TOML support requires the Vanodevium/Toml library.
XML support is in progress.
Robust object support
Serde automatically supports nested objects in properties of other objects, which will be handled recursively as long as there are no circular references.
Serde handles public, private, protected, and readonly properties, both reading and writing, with optional default values.
If you try to serialize or deserialize an object that implements PHP's __serialize() or __unserialize() hooks, those will be respected. (If you want to read/write from PHP's internal serialization format, just call serialize()/unserialize() directly.)
Serde also supports post-load callbacks that allow you to re-initialize derived information if necessary without storing it in the serialized format.
PHP objects can be mutated to and from a serialized format. Nested objects can be flattened or collected, classes with common interfaces can be mapped to the appropriate object, and array values can be imploded into a string for serialization and exploded back into an array when reading.
Configuration
Serde's behavior is driven almost entirely through attributes. Any class may be serialized from or deserialized to as-is with no additional configuration, but there is a great deal of configuration that may be opted-in to.
Attribute handling is provided by Crell/AttributeUtils. It is worth looking into as well.
The main attribute is the Crell\Serde\Attributes\Field attribute, which may be placed on any object property. (Static properties are ignored.) All of its arguments are optional, as is the Field itself. (That is, adding #[Field] with no arguments is the same as not specifying it at all.) The meaning of the available arguments is listed below.
Although not required, it is strongly recommended that you always use named arguments with attributes. The precise order of arguments is not guaranteed.
In the examples below, the Field is generally referenced directly. However, you may also import the namespace and then use namespaced versions of the attributes, like so:
use Crell\Serde\Attributes as Serde;

#[Serde\ClassSettings(includeFieldsByDefault: false)]
class Person
{
    #[Serde\Field(serializedName: 'callme')]
    protected string $name = 'Larry';
}
Which you do is mostly a matter of preference, although if you are mixing Serde attributes with attributes from other libraries then the namespaced approach is advisable.
There is also a ClassSettings attribute that may be placed on classes to be serialized. At this time it has five arguments:
- includeFieldsByDefault, which defaults to true. If set to false, a property with no #[Field] attribute will be ignored. It is equivalent to setting exclude: true on all properties implicitly.
- requireValues, which defaults to false. If set to true, then when deserializing, any field that is not provided in the incoming data will result in an exception. This may also be turned on or off on a per-field level. (See requireValue below.) The class-level setting applies to any field that does not specify its behavior.
- renameWith. If set, the specified renaming strategy will be used for all properties of the class, unless a property specifies its own. (See renameWith below.) The class-level setting applies to any field that does not specify its behavior.
- omitNullFields, which defaults to false. If set to true, any property on the class that is null will be omitted when serializing. It has no effect on deserialization. This may also be turned on or off on a per-field level. (See omitIfNull below.)
- scopes, which sets the scope of a given class definition attribute. See the section on Scopes below.
exclude (bool, default false)
If set to true, Serde will ignore the property entirely on both serializing and deserializing.
serializedName (string, default null)
If provided, this string will be used as the name of a property when serialized out to a format and when reading it back in. For example:
use Crell\Serde\Attributes\Field;

class Person
{
    #[Field(serializedName: 'callme')]
    protected string $name = 'Larry';
}
Round trips to/from:
{ "callme": "Larry" }
renameWith (RenamingStrategy, default null)
The renameWith key specifies a way to mangle the name of the property to produce a serializedName. The most common examples here would be case folding, say if serializing to a format that uses a different convention than PHP does.
The value of renameWith can be any object that implements the RenamingStrategy interface. The most common versions are already provided via the Cases enum and Prefix class, but you are free to provide your own.
The Cases enum implements RenamingStrategy and provides a series of instances (cases) for common renaming. For example:
use Crell\Serde\Attributes\Field;
use Crell\Serde\Renaming\Cases;

class Person
{
    #[Field(renameWith: Cases::snake_case)]
    public string $firstName = 'Larry';

    #[Field(renameWith: Cases::CamelCase)]
    public string $lastName = 'Garfield';
}
Serializes to/from:
{ "first_name": "Larry", "LastName": "Garfield" }
Available cases are:
- Cases::UPPERCASE
- Cases::lowercase
- Cases::snake_case
- Cases::kebab_case (renders with dashes, not underscores)
- Cases::CamelCase
- Cases::lowerCamelCase
The Prefix class attaches a prefix to values when serialized, but otherwise leaves the property name intact.
use Crell\Serde\Attributes\Field;
use Crell\Serde\Renaming\Prefix;

class MailConfig
{
    #[Field(renameWith: new Prefix('mail_'))]
    protected string $host = 'smtp.example.com';

    #[Field(renameWith: new Prefix('mail_'))]
    protected int $port = 25;

    #[Field(renameWith: new Prefix('mail_'))]
    protected string $user = 'me';

    #[Field(renameWith: new Prefix('mail_'))]
    protected string $password = 'sssh';
}
Serializes to/from:
{ "mail_host": "smtp.example.com", "mail_port": 25, "mail_user": "me", "mail_password": "sssh" }
If both serializedName and renameWith are specified, serializedName will be used and renameWith ignored.
alias (array, default [])
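To make the idea of a renaming strategy more concrete, here is a minimal sketch in plain PHP of the kind of name mangling described above. The functions toSnakeCase() and withPrefix() are hypothetical helpers written for illustration only; they are not part of Serde, and the library's own implementation may differ.

```php
// Hypothetical helper (not Serde's code): fold a camelCase name to snake_case.
function toSnakeCase(string $name): string
{
    // Put an underscore before each uppercase letter that follows
    // a lowercase letter or digit, then lowercase the whole string.
    return strtolower(preg_replace('/(?<=[a-z0-9])([A-Z])/', '_$1', $name));
}

// Hypothetical helper (not Serde's code): prepend a fixed prefix.
function withPrefix(string $prefix, string $name): string
{
    return $prefix . $name;
}

echo toSnakeCase('firstName'), "\n";    // first_name
echo withPrefix('mail_', 'host'), "\n"; // mail_host
```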
When deserializing (only), if the expected serialized name is not found in the incoming data, these additional property names will be examined to see if the value can be found. If so, the value will be read from that key in the incoming data. If not, it will behave the same as if the value was simply not found in the first place.
use Crell\Serde\Attributes\Field;

class Person
{
    #[Field(alias: ['layout', 'design'])]
    protected string $format = '';
}
All three of the following JSON strings would be read into an identical object:
{ "format": "3-column-layout" }
{ "layout": "3-column-layout" }
{ "design": "3-column-layout" }
This is mainly useful when an API key has changed, and legacy incoming data may still have an old key name.
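The lookup order described above can be sketched in plain PHP as follows. readWithAliases() is a hypothetical function used only to illustrate the fallback logic; it is not Serde's API.

```php
// Hypothetical helper (not Serde's code): try the expected name first,
// then each alias in order; return null if none is present.
function readWithAliases(array $data, string $name, array $aliases): mixed
{
    foreach ([$name, ...$aliases] as $key) {
        if (array_key_exists($key, $data)) {
            return $data[$key];
        }
    }
    return null;
}

$legacy = json_decode('{ "layout": "3-column-layout" }', true);
echo readWithAliases($legacy, 'format', ['layout', 'design']), "\n"; // 3-column-layout
```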
omitIfNull (bool, default false)
This key only applies on serialization. If set to true, and the value of this property is null when an object is serialized, it will be omitted from the output entirely. If false, a null will be written to the output, however that looks for the particular format.
useDefault (bool, default true)
This key only applies on deserialization. If a property of a class is not found in the incoming data, and this property is true, then a default value will be assigned instead. If false, the value will be skipped entirely. Whether the deserialized object is now in an invalid state depends on the object.
The default value to use is derived from a number of different locations. The priority order of defaults is:
- The value provided by the default argument to the Field attribute.
- The default value provided by the code, as reported by Reflection.
- The default value of an identically named constructor argument, if any.
So for example, the following class:
use Crell\Serde\Attributes\Field;

class Person
{
    #[Field(default: 'Hidden')]
    public string $location;

    #[Field(useDefault: false)]
    public int $age;

    public function __construct(
        public string $name = 'Anonymous',
    ) {}
}
if deserialized from an empty source (such as {} in JSON), will result in an object with location set to Hidden, name set to Anonymous, and age still uninitialized.
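The last two sources of defaults both come from PHP's Reflection API. The sketch below shows how Reflection reports them. It uses a hypothetical PersonWithDefaults class, and illustrates only the underlying PHP behavior, not Serde's internal code.

```php
// Hypothetical class for illustration only.
class PersonWithDefaults
{
    public string $location = 'Unknown';
    public int $age;

    public function __construct(
        public string $name = 'Anonymous',
    ) {}
}

$location = new ReflectionProperty(PersonWithDefaults::class, 'location');
$age = new ReflectionProperty(PersonWithDefaults::class, 'age');
$name = new ReflectionProperty(PersonWithDefaults::class, 'name');

var_dump($location->hasDefaultValue()); // bool(true)
var_dump($age->hasDefaultValue());      // bool(false)
// A promoted property has no property-level default...
var_dump($name->hasDefaultValue());     // bool(false)

// ...its default lives on the matching constructor parameter instead.
$param = (new ReflectionMethod(PersonWithDefaults::class, '__construct'))->getParameters()[0];
var_dump($param->getDefaultValue());    // string(9) "Anonymous"
```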
default (mixed, default null)
This key only applies on deserialization. If specified, then if a value is missing in the incoming data being deserialized this value will be used instead, regardless of what the default in the source code itself is.
strict (bool, default true)
This key only applies on deserialization. If set to true, a type mismatch in the incoming data will be rejected and an exception thrown. If false, a deformatter will attempt to cast an incoming value according to PHP's normal casting rules. That means, for example, "1" is a valid value for an integer property if strict is false, but will throw an exception if set to true.
For sequence fields, strict set to true will reject a non-sequence value. (It must pass an array_is_list() check.) If strict is false, any array-ish value will be accepted but passed through array_values() to discard any keys and reindex it.
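The sequence rule can be illustrated with the two PHP functions named above. normalizeSequence() is a hypothetical helper that mimics the described behavior; it is a sketch, not Serde's actual code path.

```php
// Hypothetical helper (not Serde's code): accept lists as-is, and either
// reject or reindex keyed arrays depending on the strict flag.
function normalizeSequence(array $value, bool $strict): array
{
    if (array_is_list($value)) {
        return $value;
    }
    if ($strict) {
        throw new InvalidArgumentException('Expected a list, got a keyed array.');
    }
    // Non-strict: discard the keys and reindex from 0.
    return array_values($value);
}

$keyed = [3 => 'a', 7 => 'b', 'x' => 'c'];

var_dump(array_is_list(['a', 'b', 'c'])); // bool(true)
var_dump(array_is_list($keyed));          // bool(false)
var_dump(normalizeSequence($keyed, strict: false) === ['a', 'b', 'c']); // bool(true)
```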
Additionally, in non-strict mode, numeric strings in the incoming array will be cast to ints or floats as appropriate in both sequence fields and dictionary fields. In strict mode, numeric strings will still be rejected.
The exact handling of this setting may vary slightly depending on the incoming format, as some formats handle their own types differently. (For instance, everything is a string in XML.)
requireValue (bool, default false)
This key only applies on deserialization. If set to true, and the incoming data does not include a value for this field and there is no default specified, a MissingRequiredValueWhenDeserializing exception will be thrown. If not set, and there is no default value, then the property will be left uninitialized.
If a field has a default value, then the default value will always be used for missing data and this setting has no effect.
flatten (bool, default false)
The flatten keyword can only be applied on an array or object property. A property that is "flattened" will have all of its properties injected into the parent directly on serialization, and will have values from the parent "collected" into it on deserialization.
Multiple objects and arrays may be flattened (serialized), but on deserialization only the lexically last array property marked flatten will collect remaining keys. Any number of objects may "collect" their properties, however.
As an example, consider pagination. It may be very helpful to represent pagination information in PHP as an object property of a result set, but in the serialized JSON or XML you may want the extra object removed.
Given this set of classes:
use Crell\Serde\Attributes as Serde;

class Results
{
    public function __construct(
        #[Serde\Field(flatten: true)]
        public Pagination $pagination,
        #[Serde\SequenceField(arrayType: Product::class)]
        public array $products,
    ) {}
}

class Pagination
{
    public function __construct(
        public int $total,
        public int $offset,
        public int $limit,
    ) {}
}

class Product
{
    public function __construct(
        public string $name,
        public float $price,
    ) {}
}
When serialized, the $pagination object will get "flattened," meaning its three properties will be included directly in the properties of Results. Therefore, a JSON-serialized copy of this object may look like:
{ "total": 100, "offset": 20, "limit": 10, "products": [ { "name": "Widget", "price": 9.99 }, { "name": "Gadget", "price": 4.99 } ] }
The extra "layer" of the Pagination object has been removed. When deserializing, those extra properties will be "collected" back into a Pagination object.
Now consider this more complex example:
use Crell\Serde\Attributes as Serde;

class DetailedResults
{
    public function __construct(
        #[Serde\Field(flatten: true)]
        public NestedPagination $pagination,
        #[Serde\Field(flatten: true)]
        public ProductType $type,
        #[Serde\SequenceField(arrayType: Product::class)]
        public array $products,
        #[Serde\Field(flatten: true)]
        public array $other = [],
    ) {}
}

class NestedPagination
{
    public function __construct(
        public int $total,
        public int $limit,
        #[Serde\Field(flatten: true)]
        public PaginationState $state,
    ) {}
}

class PaginationState
{
    public function __construct(
        public int $offset,
    ) {}
}

class ProductType
{
    public function __construct(
        public string $name = '',
        public string $category = '',
    ) {}
}
In this example, both NestedPagination and ProductType will be flattened when serializing. NestedPagination itself also has a field, PaginationState, that should be flattened. All of them will flatten and collect cleanly, as long as none of them share a property name.
Additionally, there is an extra array property, $other. $other may contain whatever associative array is desired, and its values will also get flattened into the output.
When collecting, only the lexically last flattened array will get any data, and will get all properties not already accounted for by some other property. For example, an instance of DetailedResults may serialize to JSON as:
{ "total": 100, "offset": 20, "limit": 10, "products": [ { "name": "Widget", "price": 9.99 }, { "name": "Gadget", "price": 4.99 } ], "foo": "beep", "bar": "boop" }
In this case, the $other property has two keys, foo and bar, with values beep and boop, respectively. The same JSON will deserialize back to the same object as before.
Value objects
Flattening can also be used in conjunction with renaming to silently translate value objects. Consider:
use Crell\Serde\Attributes\Field;
use Crell\Serde\Attributes\PostLoad;

class Person
{
    public function __construct(
        public string $name,
        #[Field(flatten: true)]
        public Age $age,
        #[Field(flatten: true)]
        public Email $email,
    ) {}
}

readonly class Email
{
    public function __construct(
        #[Field(serializedName: 'email')]
        public string $value,
    ) {}
}

readonly class Age
{
    public function __construct(
        #[Field(serializedName: 'age')]
        public int $value
    ) {
        $this->validate();
    }

    // Runs from the constructor and again after deserialization.
    #[PostLoad]
    private function validate(): void
    {
        if ($this->value < 0) {
            throw new \InvalidArgumentException('Age cannot be negative.');
        }
    }
}
In this example, Email and Age are value objects, in the latter case with extra validation. However, both are marked flatten: true, so their properties will be moved up a level to Person when serializing. However, they both use the same property name, so both have a custom serialization name specified. The above object will serialize to (and deserialize from) something like this:
{ "name": "Larry", "age": 21, "email": "me@example.com" }
Note that because deserialization bypasses the constructor, the extra validation in Age must be placed in a separate method that is called from the constructor and flagged to run automatically after deserialization.
It is also possible to specify a prefix for a flattened value, which will also be applied recursively. For example, assuming the same Age class above:
readonly class JobDescription
{
    public function __construct(
        #[Field(flatten: true, flattenPrefix: 'min_')]
        public Age $minAge,
        #[Field(flatten: true, flattenPrefix: 'max_')]
        public Age $maxAge,
    ) {}
}

class JobEntry
{
    public function __construct(
        #[Field(flatten: true, flattenPrefix: 'desc_')]
        public JobDescription $description,
    ) {}
}
In this case, serializing JobEntry will first flatten the $description property, with desc_ as a prefix. Then, JobDescription will flatten both of its age fields, giving each a separate prefix. That will result in a serialized output something like this:
{ "desc_min_age": 18, "desc_max_age": 65 }
And it will deserialize back to the same original 3-layer-object structure.
flattenPrefix (string, default '')
When an object or array property is flattened, by default its properties will be flattened using their existing name (or serializedName, if specified). That may cause issues if the same class is included in a parent class twice, or if there is some other name collision. Instead, flattened fields may be given a flattenPrefix value. That string will be prepended to the name of the property when serializing.
If set on a non-flattened field, this value is meaningless and has no effect.
Sequences and Dictionaries
In most languages, and many serialization formats, there is a difference between a sequential list of values (called variously an array, sequence, or list) and a map of arbitrary size of arbitrary values to other arbitrary values (called a dictionary or map). PHP does not make a distinction, and shoves both data types into a single associative array variable type.
Sometimes that works out, but other times the distinction between the two greatly matters. To support those cases, Serde allows you to flag an array property as either a #[SequenceField] or #[DictionaryField] (and it is recommended that you always do so). Doing so ensures that the correct serialization pathway is used for the property, and also opens up a number of additional features.
arrayType
On both a #[SequenceField] and #[DictionaryField], the arrayType argument lets you specify the type that all values in that structure are. For example, a sequence of integers can easily be serialized to and deserialized from most formats without any additional help. However, an ordered list of Product objects could be serialized, but there's no way to tell then how to deserialize that data back to Product objects rather than just a nested associative array (which would also be legal). The arrayType argument solves that issue.
If arrayType is specified, then all values of that array are assumed to be of that type. It may either be a class-string to specify all values are a class, or a value of the ValueType enum to indicate one of the four supported scalars.
On deserialization, then, Serde will either validate that all incoming values are of the right scalar type, or look for nested object-like structures (depending on the specific format), and convert those into the specified object type.
For example:
use Crell\Serde\Attributes\SequenceField;

class Order
{
    public string $orderId;

    public int $userId;

    #[SequenceField(arrayType: Product::class)]
    public array $products;
}
In this case, the attribute tells Serde that $products is an indexed, sequential list of Product objects. When serializing, that may be represented as an array of dictionaries (in JSON or YAML) or perhaps with some additional metadata in other formats.
When deserializing, the otherwise object-ignorant data will be upcast back to Product objects.
arrayType works the exact same way on a DictionaryField.
keyType
On DictionaryField only, it's possible to restrict the array to only allowing integer or string keys. It has two legal values, KeyType::Int and KeyType::String (an enum). If set to KeyType::Int, then deserialization will reject any arrays that have string keys, but will accept numeric strings. If set to KeyType::String, then deserialization will reject any arrays that have integer keys, including numeric strings.
(PHP auto-casts integer string array keys to actual integers, so there is no way to allow them in string-based dictionaries.)
If no value is set, then either key type will be accepted.
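The key auto-casting mentioned above is standard PHP behavior and is easy to see in isolation. This short example uses no Serde code at all:

```php
// PHP converts array keys that are canonical decimal integer strings to ints.
$data = ['1' => 'a', '02' => 'b', 'x' => 'c'];

var_dump(array_keys($data));
// The key '1' is now int 1; '02' (leading zero) and 'x' remain strings.

var_dump(is_int(array_key_first($data))); // bool(true)
```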
implodeOn
The implodeOn argument to SequenceField, if present, indicates that the value should be joined into a string serialization, using the provided value as glue. For example:
use Crell\Serde\Attributes\SequenceField;

class Order
{
    #[SequenceField(implodeOn: ',')]
    protected array $productIds = [5, 6, 7];
}
Will serialize in JSON to:
{ "productIds": "5,6,7" }
On deserialization, that string will automatically get exploded back into an array when placed into the object.
By default, on deserialization the individual values will be trim()ed to remove excess whitespace. That can be disabled by setting the trim attribute argument to false.
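In plain PHP terms, the round trip resembles the following sketch. It shows only the implode/explode/trim idea using built-in functions. How Serde casts the exploded values back to their target type is not shown here and is left to the library.

```php
$productIds = [5, 6, 7];

// Serializing: join the values with the glue string.
$serialized = implode(',', $productIds);
echo $serialized, "\n"; // 5,6,7

// Deserializing: split on the glue, then trim stray whitespace from each piece.
$restored = array_map('trim', explode(',', '5, 6 ,7'));
var_dump($restored === ['5', '6', '7']); // bool(true); note the values are strings at this point
```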
joinOn
DictionaryFields also support imploding/exploding on serialization, but require two keys. implodeOn specifies the string to use between distinct values. joinOn specifies the string to use between the key and value.
For example:
use Crell\Serde\Attributes\DictionaryField;

class Settings
{
    #[DictionaryField(implodeOn: ',', joinOn: '=')]
    protected array $dimensions = [
        'height' => 40,
        'width' => 20,
    ];
}
Will serialize/deserialize to this JSON:
{ "dimensions": "height=40,width=20" }
As with SequenceField, values will automatically be trim()ed unless trim: false is specified in the attribute's argument list.
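The two-separator scheme can be sketched with a pair of hypothetical helpers, implodeDict() and explodeDict(). These are illustrations of the idea only and are not part of Serde.

```php
// Hypothetical helper (not Serde's code): "key<joinOn>value" pairs glued by implodeOn.
function implodeDict(array $dict, string $implodeOn, string $joinOn): string
{
    $pairs = [];
    foreach ($dict as $key => $value) {
        $pairs[] = $key . $joinOn . $value;
    }
    return implode($implodeOn, $pairs);
}

// Hypothetical helper (not Serde's code): reverse of implodeDict(), trimming each part.
function explodeDict(string $serialized, string $implodeOn, string $joinOn): array
{
    $dict = [];
    foreach (explode($implodeOn, $serialized) as $pair) {
        // Limit to 2 parts so a value may itself contain the join string.
        [$key, $value] = explode($joinOn, $pair, 2);
        $dict[trim($key)] = trim($value);
    }
    return $dict;
}

echo implodeDict(['height' => 40, 'width' => 20], ',', '='), "\n"; // height=40,width=20
var_dump(explodeDict('height=40, width=20', ',', '='));
```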
Date and Time fields
DateTime and DateTimeImmutable fields can also be serialized, and you can control how they are serialized using the DateField or the UnixTimeField attribute. DateField has two arguments, which may be used individually or together. Specifying neither is the same as not specifying the DateField attribute at all.
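use Crell\Serde\Attributes\DateField;

class Settings
{
    #[DateField(format: 'Y-m-d')]
    protected DateTimeImmutable $date;

    public function __construct()
    {
        // PHP does not allow `new` in a property default, so initialize here.
        $this->date = new DateTimeImmutable('2022-07-04 14:22');
    }
}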
Will serialize to this JSON:
{ "date": "2022-07-04" }
timezone
The timezone argument may be any timezone string legal in PHP, such as America/Chicago or UTC. If specified, the value will be cast to this timezone first before it is serialized. If not specified, the value will be left in whatever timezone it is in before being serialized. Whether that makes a difference to the output depends on the format.
On deserializing, the timezone has no effect. If the incoming value has a timezone specified, the resulting DateTime[Immutable] object will use that timezone. If not, the system default timezone will be used.
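Why the timezone can change the output is easiest to see with PHP's own date classes. The example below uses only built-in functionality and an arbitrarily chosen instant. It shows the same moment rendering as a different calendar date once it is cast to another timezone.

```php
$utc = new DateTimeImmutable('2022-07-04 02:00:00', new DateTimeZone('UTC'));

// Casting to another timezone keeps the instant but changes the wall-clock reading.
$chicago = $utc->setTimezone(new DateTimeZone('America/Chicago'));

echo $utc->format('Y-m-d'), "\n";     // 2022-07-04
echo $chicago->format('Y-m-d'), "\n"; // 2022-07-03 (Chicago is 5 hours behind UTC in July)

var_dump($utc->getTimestamp() === $chicago->getTimestamp()); // bool(true)
```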
format
This argument lets you specify the format that will be used when serializing. It may be any string accepted by PHP's date_format syntax, including one of the various constants defined on DateTimeInterface. If not specified, the default format is RFC3339_EXTENDED, or Y-m-d\TH:i:s.vP. While not the most human-friendly, it is the default format used by JavaScript/JSON, so it makes for reasonable compatibility.
On deserializing, the format has no effect. Serde will pass the string value to a DateTime or DateTimeImmutable constructor, so any format recognized by PHP will be parsed according to PHP's standard date-parsing rules.
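For reference, this is what the default format looks like when produced and re-parsed with PHP's built-in date classes. The snippet does not involve Serde; it only demonstrates the format string and the constructor-based parsing described above.

```php
$date = new DateTimeImmutable('2022-07-04 14:22:00.123', new DateTimeZone('UTC'));

// RFC3339_EXTENDED is the constant for 'Y-m-d\TH:i:s.vP'.
$string = $date->format(DateTimeInterface::RFC3339_EXTENDED);
echo $string, "\n"; // 2022-07-04T14:22:00.123+00:00

// The constructor parses the string back using PHP's standard date-parsing rules.
$parsed = new DateTimeImmutable($string);
var_dump($parsed == $date); // bool(true)
```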
Unix Time
In cases where you need to serialize the date to/from Unix Time, you can use UnixTimeField, which supports a resolution parameter that can handle up to microsecond resolution:
use Crell\Serde\Attributes\UnixTimeField;
use Crell\Serde\Attributes\Enums\UnixTimeResolution;

class Jwt
{
    #[UnixTimeField]
    protected DateTimeImmutable $exp;

    #[UnixTimeField(resolution: UnixTimeResolution::Milliseconds)]
    protected DateTimeImmutable $iss;
}
Will serialize to this JSON:
{ "exp": 1707764358, "iss": 1707764358000 }
The serialized integer should be read as "this many seconds since the epoch" or "this many milliseconds since the epoch," etc. ("The epoch" being 1 January 1970, the first year after humans first walked on the moon.)
Note that the permissible range of milliseconds and microseconds is considerably smaller than that for seconds, since there is a limit on the size of an integer that we can represent. For timestamps in the early 21st century there should be no issue, but trying to record the microseconds since the epoch for the setting of Dune (somewhere in the 10,000s) won't work.
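The relationship between the three resolutions can be reproduced with PHP's built-in date formatting. This is a sketch of the arithmetic only, assuming a 64-bit PHP build and a timestamp after the epoch; it is not how Serde necessarily computes the values.

```php
// '@<seconds>' creates a date from a Unix timestamp (in UTC).
$date = new DateTimeImmutable('@1707764358');

$seconds = $date->getTimestamp();
// 'U' is whole seconds; 'v' and 'u' are zero-padded milli/microseconds.
$millis = (int) $date->format('Uv');
$micros = (int) $date->format('Uu');

echo $seconds, "\n"; // 1707764358
echo $millis, "\n";  // 1707764358000
echo $micros, "\n";  // 1707764358000000
```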
Generators, Iterables, and Traversables
PHP has a number of "lazy list" options. Generally, they are all objects that implement the \Traversable interface. However, there are several syntax options available with their own subtleties. Serde supports them in different ways.
If a property is defined to be an iterable, then regardless of whether it's a Traversable object or a Generator the iterable will be "run out" and converted to an array by the serialization process. Note that if the iterable is an infinite iterator, the process will continue forever and your program will freeze. Don't do that.
Also, when using an iterable property the property MUST be marked with either #[SequenceField] or #[DictionaryField] as appropriate. Serde cannot deduce which it is on its own the way it (usually) can with arrays.
On deserializing, the incoming values will always be assigned to an array. As an array is an iterable, that is still type safe. While in theory it would be possible to build a dynamic generator on the fly to materialize the values lazily, that would not actually save any memory.
Note this does mean that serializing and deserializing an object will not be fully symmetric. The initial object may have properties that are generators, but the deserialized object will have arrays instead.
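"Running out" an iterable is ordinary PHP behavior. The sketch below uses a hypothetical productNames() generator and the built-in iterator_to_array() to show a lazy sequence becoming a plain array, which is the same shape a deserialized property would have.

```php
// Hypothetical generator for illustration only.
function productNames(): Generator
{
    yield 'Widget';
    yield 'Gadget';
}

$lazy = productNames();
var_dump($lazy instanceof Traversable); // bool(true)

// Consume the generator completely and reindex the results as a list.
$array = iterator_to_array($lazy, preserve_keys: false);
var_dump($array === ['Widget', 'Gadget']); // bool(true)
```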
If a property is typed to be some other Traversable object (usually because it implements either \Iterator or \IteratorAggregate), then it will be serialized and deserialized as a normal object. Its iterable-ness is ignored. In this case, the #[SequenceField] and #[DictionaryField] attributes are forbidden.
CSV Formatter
Serde includes support for serializing/deserializing CSV files. However, because CSV is a more limited type of format only certain object structures are supported.
Specifically, the object in question must have a single property that is marked #[SequenceField], and it must have an explicit arrayType that is a class. That class, in turn, may contain only int, float, or string properties. Anything else will throw an error.
For example:
namespace Crell\Serde\Records;

use Crell\Serde\Attributes\SequenceField;

class CsvTable
{
    public function __construct(
        #[SequenceField(arrayType: CsvRow::class)]
        public array $people,
    ) {}
}

class CsvRow
{
    public function __construct(
        public string $name,
        public int $age,
        public float $balance,
    ) {}
}
This combination will result in a three-column CSV file, and also deserialize from a three-column CSV file.
The CSV formatter uses PHP's native CSV parsing and writing tools. If you want to control the delimiters used, pass those as constructor arguments to a CsvFormatter instance and inject that into the Serde class instead of the default.
Note that the lone property may be a generator. That allows a CSV to be generated on the fly off of arbitrary data. When deserialized, it will still deserialize to an array.
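PHP's native CSV tools, which the formatter is described as building on, can be exercised directly. The following sketch uses a hypothetical rows() generator and an in-memory stream to show rows being written lazily with fputcsv(). The row data is made up, and this is not the formatter's actual code.

```php
// Hypothetical generator producing one row at a time.
function rows(): Generator
{
    yield ['Larry', 21, 10.5];
    yield ['Anna', 30, 99.25];
}

$handle = fopen('php://memory', 'w+b');
foreach (rows() as $row) {
    // Separator, enclosure, and escape are passed explicitly.
    fputcsv($handle, $row, ',', '"', '\\');
}

rewind($handle);
$csv = stream_get_contents($handle);
fclose($handle);

echo $csv;
// Larry,21,10.5
// Anna,30,99.25
```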
Streams
Serde includes two stream-based formatters (but not deformatters, yet), one for JSON and one for CSV. They work nearly the same way as any other formatter, but when calling $serde->serialize() you may (and should) pass an extra init argument. $init should be an instance of Serde\Formatter\FormatterStream, which wraps a writeable PHP stream handle.
The value returned will then be that same stream handle, after the object to be serialized has been written to it.
For example:
// The JsonStreamFormatter and CsvStreamFormatter are not included by default.
$serde = new SerdeCommon(formatters: [new JsonStreamFormatter()]);

// You may use any PHP supported stream here, including files, network sockets,
// stdout, an in-memory temp stream, etc.
$init = FormatterStream::new(fopen('/tmp/output.json', 'wb'));

$result = $serde->serialize($data, format: 'json-stream', init: $init);

// $result is a FormatterStream object that wraps the same handle as before.
// What you can now do with the stream depends on what kind of stream it is.
In this example, the $data object (whatever it is) gets serialized to JSON piecemeal and streamed out to the specified file handle.
The CsvStreamFormatter works in the exact same way, but outputs CSV data and has the same restrictions as the CsvFormatter in terms of the objects it accepts.
In many cases that won't actually offer much benefit, as the whole object must be in memory anyway. However, it may be combined with the support for lazy iterators to have a property that produces objects lazily, say from a database query or read from some other source.
Consider this example:
use Crell\Serde\Attributes\SequenceField;

class ProductList
{
    public function __construct(
        #[SequenceField(arrayType: Product::class)]
        private iterable $products,
    ) {}
}

class Product
{
    public function __construct(
        public readonly string $name,
        public readonly string $color,
        public readonly float $price,
    ) {}
}

$databaseConn = ...;

$callback = function() use ($databaseConn) {
    $result = $databaseConn->query("SELECT name, color, price FROM products ORDER BY name");

    // Assuming $record is an associative array.
    foreach ($result as $record) {
        yield new Product(...$record);
    }
};

// This is a lazy list of products, which will be pulled from the database.
$products = new ProductList($callback());

// Use the CSV formatter this time, but JsonStream works just as well.
$serde = new SerdeCommon(formatters: [new CsvStreamFormatter()]);

// Write to stdout, aka, back to the browser.
$init = FormatterStream::new(fopen('php://output', 'wb'));

$result = $serde->serialize($products, format: 'csv-stream', init: $init);
This setup will lazily pull records out of the database and instantiate an object from them, then lazily stream that data out to stdout. No matter how many product records are in the database, the memory usage remains roughly constant. (Note the database driver may do its own buffering of the entire result set, which could cause memory issues. That's a separate matter, however.)
While likely overkill for CSV, it can work very well for more involved objects being serialized to JSON.
TypeMaps
Type maps are a powerful feature of Serde that allows precise control over how objects with inheritance are serialized and deserialized. Type Maps translate between the class of an object and some unique identifier that is included in the serialized data.
In the abstract, a Type Map is any object that implements the TypeMap interface. TypeMaps may be provided as an attribute on a property, or on a class or interface, or provided to Serde when it is set up to allow for arbitrary maps.
Consider the following example, which will be used for the remaining explanations of Type Maps:
use Crell\Serde\Attributes\SequenceField;

interface Product {}

interface Book extends Product {}

class PaperBook implements Book
{
    protected string $title;
    protected int $pages;
}

class DigitalBook implements Book
{
    protected string $title;
    protected int $bytes;
}

class Sale
{
    protected Book $book;

    protected float $discountRate;
}

class Order
{
    protected string $orderId;

    #[SequenceField(arrayType: Book::class)]
    protected array $products;
}
Both Sale and Order reference Book, but that value could be a PaperBook, DigitalBook, or any other class that implements Book. Type Maps provide a way for Serde to tell which concrete type it is.
Class name maps
The simplest Type Map is a #[ClassNameTypeMap] attribute on an object property. For example:
use Crell\Serde\Attributes\ClassNameTypeMap;

class Sale
{
    #[ClassNameTypeMap(key: 'type')]
    protected Book $book;

    protected float $discountRate;
}
Now when a Sale is serialized, an extra property named type will be included that contains the class name. So a sale on a digital book would serialize like so:
{ "book": { "type": "Your\\App\\DigitalBook", "title": "Thinking Functionally in PHP", "bytes": 45000 }, "discountRate": 0.2 }
On deserialization, the "type" property will be read and used to determine that the remaining values should be used to construct a DigitalBook instance, specifically.
Class name maps have the advantage that they are very simple, and will work with any class that implements that interface, even those you haven't thought of yet. The downside is that they put a PHP implementation detail (the class name) into the output, which may not be desirable.
Static Maps
Static maps allow you to provide a fixed map from classes to meaningful keys.
use Crell\Serde\Attributes\StaticTypeMap;

class Sale
{
    #[StaticTypeMap(key: 'type', map: [
        'paper' => PaperBook::class,
        'ebook' => DigitalBook::class,
    ])]
    protected Book $book;

    protected float $discountRate;
}
Now, if a Sale object is serialized it will look like this:
{ "book": { "type": "ebook", "title": "Thinking Functionally in PHP", "bytes": 45000 }, "discountRate": 0.2 }
Static maps have the advantage of simplicity and not polluting the output with PHP-specific implementation details. The downside is that they are static: They can only handle the classes you know about at code time, and will throw an exception if they encounter any other class.
Type maps on collections
Type Maps may also be applied to array properties, either sequence or dictionary. In that case, they will apply to all values in that collection. For example:
use Crell\Serde\Attributes as Serde;

class Order
{
    protected string $orderId;

    #[Serde\SequenceField(arrayType: Book::class)]
    #[Serde\StaticTypeMap(key: 'type', map: [
        'paper' => PaperBook::class,
        'ebook' => DigitalBook::class,
    ])]
    protected array $products;
}
$products is an array of objects that implement Book, but each could be either a PaperBook or a DigitalBook. A serialized copy of this object may look like:
{ "orderId": "abc123", "products": [ { "type": "ebook", "title": "Thinking Functionally in PHP", "bytes": 45000 }, { "type": "paper", "title": "Category Theory for Programmers", "pages": 335 } ] }
On deserialization, the type property will again be used to determine the class that the rest of the properties should be hydrated into.
Type mapped classes
In addition to putting a type map on a property, you may also place it on the class or interface that the property references.
use Crell\Serde\Attributes\StaticTypeMap;

#[StaticTypeMap(key: 'type', map: [
    'paper' => PaperBook::class,
    'ebook' => DigitalBook::class,
])]
interface Book {}
Now, that Type Map will apply to both Sale::$book and to Order::$products with no further work on our part.
Type Maps also inherit. That means we can put a type map on Product instead if we wanted:
use Crell\Serde\Attributes\StaticTypeMap;

#[StaticTypeMap(key: 'type', map: [
    'paper' => PaperBook::class,
    'ebook' => DigitalBook::class,
    'toy' => Gadget::class,
])]
interface Product {}
And both Sale and Order will still serialize with the appropriate key.
Dynamic type maps
Type Maps may also be provided directly to the Serde object when it is created. Any object that implements TypeMap may be used. This is most useful when the list of possible classes is dynamic based on user configuration, database values, what plugins are installed in your application, etc.
use Crell\Serde\TypeMap;

class ProductTypeMap implements TypeMap
{
    public function __construct(protected readonly Connection $db) {}

    public function keyField(): string
    {
        return 'type';
    }

    public function findClass(string $id): ?string
    {
        return $this->db->someLookup($id);
    }

    public function findIdentifier(string $class): ?string
    {
        return $this->db->someMappingLogic($class);
    }
}

$typeMap = new ProductTypeMap($dbConnection);

$serde = new SerdeCommon(typeMaps: [
    Your\App\Product::class => $typeMap,
]);

$json = $serde->serialize($aBook, format: 'json');
In practice, you would likely set that up via your Dependency Injection system.
Note that ClassNameTypeMap and StaticTypeMap may be injected as well, as can any other class that implements TypeMap.
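As a rough illustration of what such an injected map has to do, the sketch below backs the three TypeMap methods with a plain array instead of a database. `TypeMapLike` is a local stand-in that mirrors the method signatures shown above so the snippet runs without the library installed, and `ConfiguredTypeMap` is a hypothetical name; in real code you would implement `Crell\Serde\TypeMap` directly.

```php
<?php

declare(strict_types=1);

// Local stand-in mirroring the three TypeMap methods shown above.
// In real code, implement Crell\Serde\TypeMap instead.
interface TypeMapLike
{
    public function keyField(): string;
    public function findClass(string $id): ?string;
    public function findIdentifier(string $class): ?string;
}

final class PaperBook {}
final class DigitalBook {}

// Hypothetical map built from runtime configuration, e.g. enabled plugins.
final class ConfiguredTypeMap implements TypeMapLike
{
    /** @param array<string, class-string> $map identifier => class name */
    public function __construct(
        private readonly array $map,
        private readonly string $key = 'type',
    ) {}

    public function keyField(): string
    {
        return $this->key;
    }

    // Identifier found in the serialized data => class to instantiate.
    public function findClass(string $id): ?string
    {
        return $this->map[$id] ?? null;
    }

    // Class of the object being serialized => identifier to write out.
    public function findIdentifier(string $class): ?string
    {
        $id = array_search($class, $this->map, true);
        return $id === false ? null : (string) $id;
    }
}

$typeMap = new ConfiguredTypeMap([
    'paper' => PaperBook::class,
    'ebook' => DigitalBook::class,
]);
```

Both lookups return `null` for anything the map does not know about, which matches the nullable return types in the interface.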
Custom type maps
You may also write your own Type Maps as attributes. The only requirements are:
- The class implements the TypeMap interface.
- The class is marked as an #[\Attribute].
- The class is legal on both classes and properties. That is, #[\Attribute(\Attribute::TARGET_CLASS | \Attribute::TARGET_PROPERTY)]
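The three requirements above could be met by something like the following sketch. `SuffixTypeMap` is a hypothetical attribute that maps identifiers to classes by naming convention, and `TypeMapLike` is a local stand-in for `Crell\Serde\TypeMap` so that the snippet runs on its own. Reflection is used only to show that PHP accepts the attribute on both targets.

```php
<?php

declare(strict_types=1);

// Local stand-in for Crell\Serde\TypeMap so this sketch runs on its own.
interface TypeMapLike
{
    public function keyField(): string;
    public function findClass(string $id): ?string;
    public function findIdentifier(string $class): ?string;
}

// Hypothetical attribute: 'digital' <=> DigitalBook, by naming convention.
#[\Attribute(\Attribute::TARGET_CLASS | \Attribute::TARGET_PROPERTY)]
final class SuffixTypeMap implements TypeMapLike
{
    public function __construct(
        private readonly string $suffix,
        private readonly string $key = 'type',
    ) {}

    public function keyField(): string
    {
        return $this->key;
    }

    public function findClass(string $id): ?string
    {
        $class = ucfirst($id) . $this->suffix;
        return class_exists($class) ? $class : null;
    }

    public function findIdentifier(string $class): ?string
    {
        if (!str_ends_with($class, $this->suffix) || strlen($class) === strlen($this->suffix)) {
            return null;
        }
        return lcfirst(substr($class, 0, -strlen($this->suffix)));
    }
}

#[SuffixTypeMap(suffix: 'Book')]
interface Book {}

final class PaperBook implements Book {}
final class DigitalBook implements Book {}

final class Sale
{
    #[SuffixTypeMap(suffix: 'Book', key: 'kind')]
    public ?Book $book = null;
}

// newInstance() builds the attribute and enforces the declared targets.
$onInterface = (new ReflectionClass(Book::class))
    ->getAttributes(SuffixTypeMap::class)[0]->newInstance();
$onProperty = (new ReflectionProperty(Sale::class, 'book'))
    ->getAttributes(SuffixTypeMap::class)[0]->newInstance();
```

A convention-based map like this keeps class names out of the output while still accepting classes added later, at the cost of tying identifiers to a naming scheme.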
Scopes
Serde supports "scopes" for having different versions of an attribute recognized in different contexts.
Any attribute (Field, TypeMap, SequenceField, DictionaryField, PostLoad, etc.) may take a scopes argument, which accepts an array of strings. If specified, that attribute is only valid if serializing or deserializing in that scope. If no scoped attribute is specified, then the behavior will fall back to an unscoped attribute or an omitted attribute.
For example, given this class:
use Crell\Serde\Attributes\Field;

class User
{
    private string $username;

    #[Field(exclude: true)]
    private string $password;

    #[Field(exclude: true)]
    #[Field(scopes: ['admin'])]
    private string $role;
}
If you serialize it like so:
$json = $serde->serialize($user, 'json');
It will result in this JSON response:
{ "username": "Larry" }
That's because, in an unscoped request, the first Field on $role is used, which excludes it from the output. However, if you specify a scope:
$json = $serde->serialize($user, 'json', scopes: ['admin']);
Then the admin version of $role's Field will be used, which is not excluded, producing this result:
{ "username": "Larry", "role": "Developer" }
When using scopes, it may be helpful to disable automatic property inclusion and require that each be specified explicitly. For example:
use Crell\Serde\Attributes\ClassSettings;
use Crell\Serde\Attributes\Field;

#[ClassSettings(includeFieldsByDefault: false)]
class Product
{
    #[Field]
    private int $id = 5;

    #[Field]
    #[Field(scopes: ['legacy'], serializedName: 'label')]
    private string $name = 'Fancy widget';

    #[Field(scopes: ['newsystem'])]
    private string $price = '9.99';

    #[Field(scopes: ['legacy'], serializedName: 'cost')]
    private float $legacyPrice = 9.99;

    #[Field(serializedName: 'desc')]
    private string $description = 'A fancy widget';

    private int $stock = 50;
}
If serialized with no scope specified, it will result in this:
{ "id": 5, "name": "Fancy widget", "desc": "A fancy widget" }
Those are the only fields that are "in scope" when no scope is specified.
If serialized with the legacy scope:
{ "id": 5, "label": "Fancy widget", "cost": 9.99, "desc": "A fancy widget" }
The scope-specific Field on $name gets used instead, which changes the serialized name. The $legacyPrice property is also included now, but renamed to "cost".
If serialized with the newsystem scope:
{ "id": 5, "name": "Fancy widget", "price": "9.99", "desc": "A fancy widget" }
In this case, the $name property uses the unscoped version of Field, and so is not renamed. The string-based $price is now in-scope, but the float-based $legacyPrice is not. Note that in none of these cases is the current $stock included, as it has no attribute at all.
Finally, it's also possible to serialize multiple scopes simultaneously. This is an OR operation, so any field marked for any specified scope will be included.
$json = $serde->serialize($product, 'json', scopes: ['legacy', 'newsystem']);
{ "id": 5, "label": "Fancy widget", "price": "9.99", "cost": 9.99, "desc": "A fancy widget" }
Note that since there is both an unscoped and a scoped version of the Field on $name, the scoped one wins and the property gets renamed.
If multiple attribute variants could apply for the specified scope, the lexically first in a scope will take precedence over later ones, and a scoped attribute will take precedence over an unscoped one.
Note that when deserializing, specifying a scope will exclude not only out-of-scope properties but their defaults as well. That is, they will not be set, even to a default value, and so may be "uninitialized." That is rarely desirable, so it may be preferable to deserialize without a scope, even if a value was serialized with a scope. That will depend on your use case.
For more on scopes, see the AttributeUtils documentation.
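The precedence rules described in this section can be modeled with a few lines of plain PHP. This is only a simplified model of one reading of those rules, not the AttributeUtils implementation; `FieldVariant` and `pickVariant()` are hypothetical names, and a `null` result stands for a property that is omitted when `includeFieldsByDefault` is false.

```php
<?php

declare(strict_types=1);

// Hypothetical, simplified stand-in for one Field attribute on a property.
final class FieldVariant
{
    /** @param list<string> $scopes an empty list means "unscoped" */
    public function __construct(
        public readonly string $serializedName,
        public readonly array $scopes = [],
    ) {}
}

/**
 * Picks the variant that applies for the requested scopes, or null if the
 * property is out of scope.
 *
 * @param list<FieldVariant> $variants in declaration (lexical) order
 * @param list<string> $requested
 */
function pickVariant(array $variants, array $requested): ?FieldVariant
{
    // A scoped variant matching any requested scope wins; first declared first.
    foreach ($variants as $variant) {
        if (array_intersect($variant->scopes, $requested) !== []) {
            return $variant;
        }
    }
    // Otherwise fall back to the first unscoped variant, if there is one.
    foreach ($variants as $variant) {
        if ($variant->scopes === []) {
            return $variant;
        }
    }
    return null;
}

// The attributes on Product::$name and Product::$price from the example above.
$name = [new FieldVariant('name'), new FieldVariant('label', ['legacy'])];
$price = [new FieldVariant('price', ['newsystem'])];

$unscoped = pickVariant($name, [])?->serializedName;                    // 'name'
$legacy = pickVariant($name, ['legacy'])?->serializedName;              // 'label'
$newsystem = pickVariant($name, ['newsystem'])?->serializedName;        // 'name'
$both = pickVariant($name, ['legacy', 'newsystem'])?->serializedName;   // 'label'
$priceInLegacy = pickVariant($price, ['legacy']);                       // null
```

Checking scoped variants before unscoped ones is what makes the multi-scope request rename `$name`, since the `legacy` variant matches even though an unscoped `Field` is declared first.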
Validation with #[PostLoad]
It is important to note that when deserializing, __construct() is not called at all. That means any validation present in the constructor will not be run on deserialization.
Instead, Serde will look for any method or methods that have a #[\Crell\Serde\Attributes\PostLoad] attribute on them. This attribute takes no arguments other than scopes. After an object is populated, any PostLoad methods will be invoked with no arguments in lexical order. The main use case for this feature is validation, in which case the method should throw an exception if the populated data is invalid in some way. (For instance, some integer must be positive.)
The visibility of the method is irrelevant. Serde will call public, private, or protected methods the same. Note, however, that a private method in a parent class of the class being deserialized to will not get called, as it is not accessible to PHP from that scope.
Extending Serde
Internally, Serde has six types of extensions that work in concert to produce a serialized or deserialized product.
- Type Maps, as discussed above, are optional and translate a class name to a lookup identifier and back.
- An Exporter is responsible for pulling values off of an object, processing them if necessary, and then passing them on to a Formatter. This is part of the Serialization pipeline.
- An Importer is responsible for using a Deformatter to extract data from incoming data and then translate it as necessary to be written to an object. This is part of the Deserialization pipeline.
- A Formatter is responsible for writing to a specific output format, like JSON or YAML. This is part of the Serialization pipeline.
- A Deformatter is responsible for reading data off of an incoming format and passing it back to an Importer. This is part of the Deserialization pipeline.
- A TypeField is a custom per-type field that can be added to a property to provide more type-specific logic to a corresponding Exporter or Importer. In a sense, they are "extra arguments" to an Exporter or Importer, and if you implement a custom one you will almost certainly implement your own Exporter or Importer to go with it. DateTimeField, DictionaryField, and SequenceField are examples of Type Fields. Type Fields are also transitive, so they can be put on a custom class to apply anywhere that class is used on a property (unless overridden locally).
Collectively, Importer and Exporter instances are called "handlers."
In general, Importers and Exporters are PHP-type specific, while Formatters and Deformatters are serialized-format specific. Custom Importers and Exporters can also declare themselves to be format-specific if they contain format-sensitive optimizations.
Importer and Exporter may be implemented on the same object, or not. Similarly, Formatter and Deformatter may be implemented together or not. That is up to whatever seems easiest for the particular implementation, and the provided extensions do a little of each depending on the use case.
The interfaces linked above provide more precise explanations of how to use them. In most cases, you would only need to implement a Formatter or Deformatter to support a new format. You would only need to implement an Importer or Exporter when dealing with a specific class that needs extra special handling for whatever reason, such as its serialized representation having little or no relationship with its object representation.
As an example, a few custom handlers are included to deal with common cases.
- DateTimeExporter: This object will translate DateTime and DateTimeImmutable objects to and from a serialized form as a string. Specifically, it will use the \DateTimeInterface::RFC3339_EXTENDED format for the string when serializing. The timestamp will then appear in the serialized output as a normal string. When deserializing, it will accept any datetime format supported by DateTime's constructor.
- DateTimeZoneExporter: This object will translate DateTimeZone objects to and from a serialized form as a timezone string. That is, DateTimeZone('America/Chicago') will be represented in the format as the string America/Chicago.
- NativeSerializeExporter: This object will apply to any class that has a __serialize() method (when serializing) or __unserialize() method (when deserializing). These PHP magic methods provide alternate representations of an object intended for use with PHP's native serialize() and unserialize() functions, but can also be used for any other format. If __serialize() is defined, it will be invoked and whatever associative array it returns will be written to the selected format as a dictionary. If __unserialize() is defined, this object will read a dictionary from the incoming data and then pass it to that method on a newly created object, which will then be responsible for populating the object as appropriate. No further processing will be done in either direction.
- EnumOnArrayImporter: Serde natively supports PHP Enums and can serialize them as ints or strings as appropriate. However, in the special case of reading from a PHP array format this object will take over and support reading an Enum literal in the incoming data. That allows, for example, a configuration array to include hand-inserted Enum values and still be cleanly imported into a typed, defined object.
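For reference, the string forms mentioned for the two date-related handlers can be reproduced with PHP's standard date classes alone. This sketch only shows the data shapes involved, under the assumption that the handlers behave as described above; it does not call Serde.

```php
<?php

declare(strict_types=1);

$moment = new DateTimeImmutable('2022-07-04 14:22:00', new DateTimeZone('UTC'));

// The string form used when serializing, per the description above.
$serialized = $moment->format(DateTimeInterface::RFC3339_EXTENDED);
// $serialized is '2022-07-04T14:22:00.000+00:00'

// On the way back in, anything the DateTime constructor understands works.
$restored = new DateTimeImmutable($serialized);
$alsoAccepted = new DateTimeImmutable('2022-07-04 14:22:00 UTC');

// A DateTimeZone reduces to its name and can be rebuilt from it.
$zoneName = (new DateTimeZone('America/Chicago'))->getName();
$zone = new DateTimeZone($zoneName);
```

Note that RFC3339_EXTENDED carries millisecond precision, so any microseconds beyond that are dropped from the string form.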
Architecture diagrams
Serialization works approximately like this:
And deserialization looks very similar:
In both cases, note that nearly all behavior is controlled by a one-off serializer/deserializer object, not by Serde itself. Serde itself is just a wrapper that configures the context for the runner object.
Dependency Injection configuration
Serde is designed to be usable "out of the box" without any additional setup. However, when included in a larger system it is best to configure it properly via Dependency Injection.
There are three ways you can set up Serde.
- The SerdeCommon class includes most available handlers and formatters out of the box, ready to go, although you can add additional ones via the constructor.
- The SerdeBasic class has no pre-built configuration whatsoever; you will need to provide all Handlers, Formatters, or Type Maps you want yourself, in the order you want them applied.
- You may also extend the Serde base class itself and create your own custom pre-made configuration, with just the Handlers or Formatters (provided or custom) that you want.
Both SerdeCommon and SerdeBasic take four arguments: the ClassAnalyzer to use, an array of Handlers, an array of Formatters, and an array of Type Maps. If no analyzer is provided, Serde creates a memory-cached Analyzer by default so that it will always work. However, in a DI configuration it is strongly recommended that you configure the Analyzer yourself, with appropriate caching, and inject that into Serde as a dependency to avoid duplicate Analyzers (and duplicate caches). If you have multiple different Serde configurations in different services, it may also be beneficial to make all handlers and formatters services as well and explicitly inject them into SerdeBasic rather than relying on SerdeCommon.
Change log
Please see CHANGELOG for more information on what has changed recently.
Testing
$ composer test
Contributing
Please see CONTRIBUTING and CODE_OF_CONDUCT for details.
Security
If you discover any security related issues, please use the GitHub security reporting form rather than the issue queue.
Credits
Initial development of this library was sponsored by TYPO3 GmbH.
License
The Lesser GPL version 3 or later. Please see License File for more information.