butschster / proto-parser
Proto parser is a library for parsing Protocol Buffers files into AST
Installs: 366
Dependents: 0
Suggesters: 0
Security: 0
Stars: 47
Watchers: 1
Forks: 3
Open Issues: 2
pkg:composer/butschster/proto-parser
Requires
- php: >=8.3
- phplrt/runtime: ^3.6
Requires (Dev)
- buggregator/trap: ^1.10
- mockery/mockery: ^1.6
- phplrt/phplrt: ^3.6
- phpunit/phpunit: ^11.3
README
A powerful and flexible Protocol Buffers parser for PHP, capable of generating an Abstract Syntax Tree (AST) from
.proto files. This package provides a robust foundation for working with Protocol Buffers in PHP projects.
Key Features
- Full support for Protocol Buffer syntax (proto3)
- Parses messages, enums, services, and RPC definitions
- Handles imports, packages, and options
- Preserves comments for documentation purposes
- Generates an Abstract Syntax Tree (AST) for easy manipulation
- Built with PHP 8.3
Use Cases
- Generating code from protobuf definitions
- Analyzing and validating protobuf schemas
- Building tools for working with protocol buffers
- Integrating protobuf support into existing PHP applications
Installation
Install the package via Composer:
composer require butschster/proto-parser
Usage
Here's a quick example of how to use the Proto Parser:
use Butschster\ProtoParser\ProtoParserFactory; $parser = ProtoParserFactory::create(); $ast = $parser->parse(<<<'PROTO' syntax = "proto3"; package example; message Person { string name = 1; int32 id = 2; string email = 3; } PROTO); // Now you can work with the AST echo $ast->package->name; // Outputs: example echo $ast->topLevelDefs[0]->name; // Outputs: Person
Understanding AST (Abstract Syntax Tree)
An Abstract Syntax Tree (AST) is a tree representation of the abstract syntactic structure of source code. In the context of this package, the AST represents the structure of a Protocol Buffer definition file.
Here's an example of what an AST looks like for a simple .proto file:
syntax = "proto3"; package example; message Person { string name = 1; int32 id = 2; string email = 3; }
The corresponding AST would look something like this:
ProtoNode
├── SyntaxDeclNode (syntax: "proto3")
├── PackageDeclNode (name: "example")
└── MessageDefNode (name: "Person")
├── FieldDeclNode (name: "name", type: "string", number: 1)
├── FieldDeclNode (name: "id", type: "int32", number: 2)
└── FieldDeclNode (name: "email", type: "string", number: 3)
Each node in the AST corresponds to a specific element in the Protocol Buffer definition. This structure allows for easy traversal and manipulation of the Protocol Buffer structure programmatically.
AST Class Diagram
The following class diagram provides an overview of the AST node classes available in the Protobuf parser. This diagram shows the relationships between different node classes and their properties.
classDiagram
ProtoNode *-- SyntaxDeclNode
ProtoNode *-- "0..*" ImportDeclNode
ProtoNode *-- "0..1" PackageDeclNode
ProtoNode *-- "0..*" OptionDeclNode
ProtoNode *-- "0..*" MessageDefNode
ProtoNode *-- "0..*" EnumDefNode
ProtoNode *-- "0..*" ServiceDefNode
MessageDefNode *-- "0..*" FieldDeclNode
MessageDefNode *-- "0..*" ReservedNode
MessageDefNode *-- "0..*" MessageDefNode
MessageDefNode *-- "0..*" EnumDefNode
MessageDefNode *-- "0..*" CommentNode
EnumDefNode *-- "0..*" EnumFieldNode
EnumDefNode *-- "0..*" CommentNode
ServiceDefNode *-- "0..*" RpcDeclNode
ServiceDefNode *-- "0..*" OptionDeclNode
ServiceDefNode *-- "0..*" CommentNode
FieldDeclNode *-- FieldType
FieldDeclNode *-- "0..1" FieldModifier
FieldDeclNode *-- "0..*" OptionDeclNode
FieldDeclNode *-- "0..*" CommentNode
EnumFieldNode *-- "0..*" OptionDeclNode
EnumFieldNode *-- "0..*" CommentNode
RpcDeclNode *-- RpcMessageType
RpcDeclNode *-- "0..*" OptionDeclNode
RpcDeclNode *-- "0..*" CommentNode
OptionDeclNode *-- "0..*" OptionNode
OptionDeclNode *-- "0..*" CommentNode
class ProtoNode {
+SyntaxDeclNode syntax
+ImportDeclNode[] imports
+PackageDeclNode? package
+OptionDeclNode[] options
+MessageDefNode[] messages
+EnumDefNode[] enums
+ServiceDefNode[] services
}
class MessageDefNode {
+string name
+FieldDeclNode[] fields
+ReservedNode[] reserved
+MessageDefNode[] nestedMessages
+EnumDefNode[] nestedEnums
+CommentNode[] comments
}
class EnumDefNode {
+string name
+EnumFieldNode[] fields
+CommentNode[] comments
}
class ServiceDefNode {
+string name
+RpcDeclNode[] methods
+OptionDeclNode[] options
+CommentNode[] comments
}
class FieldDeclNode {
+FieldType type
+string name
+int number
+FieldModifier? modifier
+OptionDeclNode[] options
+CommentNode[] comments
}
class EnumFieldNode {
+string name
+int number
+OptionDeclNode[] options
+CommentNode[] comments
}
class RpcDeclNode {
+string name
+RpcMessageType inputType
+RpcMessageType outputType
+OptionDeclNode[] options
+CommentNode[] comments
}
class SyntaxDeclNode {
+string syntax
+CommentNode[] comments
}
class ImportDeclNode {
+string path
+ImportModifier? modifier
+CommentNode[] comments
}
class PackageDeclNode {
+string name
+CommentNode[] comments
}
class OptionDeclNode {
+string? name
+OptionNode[] options
+CommentNode[] comments
}
class CommentNode {
+string text
}
class FieldType {
+string type
}
class RpcMessageType {
+string name
+bool isStream
}
class OptionNode {
+string name
+mixed value
+CommentNode[] comments
}
class ReservedNode {
+ReservedRange[]|ReservedNumber[] ranges
+CommentNode[] comments
}
Loading
Detailed Usage Examples
Parsing Messages
$ast = $parser->parse(<<<'PROTO' syntax = "proto3"; message Person { string name = 1; int32 id = 2; string email = 3; enum PhoneType { MOBILE = 0; HOME = 1; WORK = 2; } message PhoneNumber { string number = 1; PhoneType type = 2; } repeated PhoneNumber phones = 4; } PROTO); $personMessage = $ast->topLevelDefs[0]; echo $personMessage->name; // Outputs: Person foreach ($personMessage->fields as $field) { echo "{$field->name}: {$field->type->type}\n"; } $phoneTypeEnum = $personMessage->enums[0]; echo $phoneTypeEnum->name; // Outputs: PhoneType $phoneNumberMessage = $personMessage->messages[0]; echo $phoneNumberMessage->name; // Outputs: PhoneNumber
Parsing Services
$ast = $parser->parse(<<<'PROTO' syntax = "proto3"; service Greeter { rpc SayHello (HelloRequest) returns (HelloReply) {} rpc SayHelloAgain (HelloRequest) returns (HelloReply) {} } message HelloRequest { string name = 1; } message HelloReply { string message = 1; } PROTO); $service = $ast->topLevelDefs[0]; echo $service->name; // Outputs: Greeter foreach ($service->methods as $method) { echo "{$method->name}: {$method->inputType->name} -> {$method->outputType->name}\n"; }
Handling Imports and Options
$ast = $parser->parse(<<<'PROTO' syntax = "proto3"; import "google/protobuf/timestamp.proto"; import public "other.proto"; package mypackage; option java_package = "com.example.foo"; option optimize_for = SPEED; message MyMessage { string my_string = 1; google.protobuf.Timestamp time = 2; } PROTO); foreach ($ast->imports as $import) { echo "Import: {$import->path} (Modifier: {$import->modifier})\n"; } echo "Package: {$ast->package->name}\n"; foreach ($ast->options as $option) { echo "Option: {$option->name} = {$option->options[0]->value}\n"; }
Example: Generating DTO Classes
One powerful use case for the AST is generating Data Transfer Object (DTO) classes from Protocol Buffer definitions. You
can use a code generator (like nette/php-generator) to create PHP classes based on the AST.
Here's an example of how you might use the AST to generate a PHP class:
use Butschster\ProtoParser\ProtoParser; use Butschster\ProtoParser\ProtoParserFactory; use Nette\PhpGenerator\ClassType; use Nette\PhpGenerator\PhpFile; $parser = ProtoParserFactory::create(); $ast = $parser->parse(<<<'PROTO' syntax = "proto3"; message Person { string name = 1; int32 age = 2; repeated string hobbies = 3; } PROTO, ); // Generate PHP class $file = new PhpFile(); $file->setStrictTypes(); $namespace = $file->addNamespace('App\Dto'); $class = $namespace->addClass('Person'); $class->setFinal() ->setReadOnly(); foreach ($ast->topLevelDefs[0]->fields as $field) { $type = match ($field->type->type) { 'string' => 'string', 'int32' => 'int', default => 'mixed', }; $property = $class->addProperty($field->name) ->setType($type) ->setPublic(); if ($field->modifier === \Butschster\ProtoParser\Ast\FieldModifier::Repeated) { $property->setType('array'); } } echo $file;
This would generate a PHP class like this:
<?php declare(strict_types=1); namespace App\Dto; final readonly class Person { public string $name; public int $age; public array $hobbies; }
There are many possibilities for code generation using the AST. More use cases:
-
Generate DTOs with Validation Attributes: You can use the AST to generate DTOs with built-in validation using Symfony Validator attributes.
-
Generate OpenAPI Schema: By parsing the service RPC options, you can automatically generate OpenAPI (Swagger) schema documentation for your API endpoints, keeping your API documentation in sync with your protobuf definitions.
-
Generate Clean JSON-Serializable DTOs: Create DTOs that are optimized for JSON serialization, ensuring efficient data transfer between your PHP application and other systems or frontends.
-
Static Analysis Tool Integration: Use the AST to create custom PHPStan or Psalm rules that can analyze your protobuf usage within your PHP codebase, enhancing type safety and catching potential issues early.
-
Code Migration Assistance: Leverage the AST to help automate code migrations when your protobuf definitions change, ensuring that your PHP code stays in sync with evolving protobuf schemas.
AST Node Classes
This document provides an overview of the available node classes in the Protobuf parser. These classes represent different elements of a Protobuf definition file.
ProtoNode
Represents the root node of a Protobuf file.
final readonly class ProtoNode implements NodeInterface { public function __construct( public SyntaxDeclNode $syntax, public array $imports = [], public ?PackageDeclNode $package = null, public array $options = [], public array $topLevelDefs = [], ) {} }
$syntax: The syntax declaration of the Protobuf file.$imports: An array ofImportDeclNodeobjects representing imported files.$package: An optionalPackageDeclNoderepresenting the package declaration.$options: An array ofOptionDeclNodeobjects representing file-level options.$topLevelDefs: An array of top-level definitions (messages, enums, services).
SyntaxDeclNode
Represents the syntax declaration of a Protobuf file.
final readonly class SyntaxDeclNode implements NodeInterface { public function __construct( public string $syntax, public array $comments = [], ) {} }
$syntax: The syntax version (e.g., "proto3").$comments: An array ofCommentNodeobjects associated with the syntax declaration.
ImportDeclNode
Represents an import statement in a Protobuf file.
final readonly class ImportDeclNode implements NodeInterface { public function __construct( public string $path, public ?ImportModifier $modifier = null, public array $comments = [], ) {} }
$path: The path of the imported file.$modifier: An optionalImportModifierenum value (Weak or Public).$comments: An array ofCommentNodeobjects associated with the import statement.
PackageDeclNode
Represents a package declaration in a Protobuf file.
final readonly class PackageDeclNode implements NodeInterface { public function __construct( public string $name, public array $comments = [], ) {} }
$name: The name of the package.$comments: An array ofCommentNodeobjects associated with the package declaration.
OptionDeclNode
Represents an option declaration in a Protobuf file.
final readonly class OptionDeclNode implements NodeInterface { public function __construct( public ?string $name, public array $comments = [], public array $options = [], ) {} }
$name: The name of the option.$comments: An array ofCommentNodeobjects associated with the option.$options: An array ofOptionNodeobjects representing nested options.
MessageDefNode
Represents a message definition in a Protobuf file.
final readonly class MessageDefNode implements NodeInterface { public function __construct( public string $name, public array $fields = [], public array $messages = [], public array $enums = [], public array $comments = [], ) {} }
$name: The name of the message.$fields: An array ofFieldDeclNodeobjects representing the message fields.$messages: An array of nestedMessageDefNodeobjects.$enums: An array ofEnumDefNodeobjects defined within the message.$comments: An array ofCommentNodeobjects associated with the message.
FieldDeclNode
Represents a field declaration within a message.
final readonly class FieldDeclNode implements NodeInterface { public function __construct( public FieldType $type, public string $name, public int $number, public ?FieldModifier $modifier = null, public array $options = [], public array $comments = [], ) {} }
$type: AFieldTypeobject representing the type of the field.$name: The name of the field.$number: The field number.$modifier: An optionalFieldModifierenum value (Required, Optional, or Repeated).$options: An array ofOptionDeclNodeobjects representing field options.$comments: An array ofCommentNodeobjects associated with the field.
EnumDefNode
Represents an enum definition in a Protobuf file.
final readonly class EnumDefNode implements NodeInterface { public function __construct( public string $name, public array $fields = [], public array $comments = [], ) {} }
$name: The name of the enum.$fields: An array ofEnumFieldNodeobjects representing the enum values.$comments: An array ofCommentNodeobjects associated with the enum.
EnumFieldNode
Represents a field within an enum definition.
final readonly class EnumFieldNode implements NodeInterface { public function __construct( public string $name, public int $number, public array $options = [], public array $comments = [], ) {} }
$name: The name of the enum value.$number: The numeric value associated with the enum field.$options: An array ofOptionDeclNodeobjects representing field options.$comments: An array ofCommentNodeobjects associated with the enum field.
ServiceDefNode
Represents a service definition in a Protobuf file.
final readonly class ServiceDefNode implements NodeInterface { public function __construct( public string $name, public array $methods = [], public array $options = [], public array $comments = [], ) {} }
$name: The name of the service.$methods: An array ofRpcDeclNodeobjects representing the service methods.$options: An array ofOptionDeclNodeobjects representing service options.$comments: An array ofCommentNodeobjects associated with the service.
RpcDeclNode
Represents an RPC (Remote Procedure Call) declaration within a service.
final readonly class RpcDeclNode implements NodeInterface { public function __construct( public string $name, public RpcMessageType $inputType, public RpcMessageType $outputType, public array $options = [], public array $comments = [], ) {} }
$name: The name of the RPC method.$inputType: AnRpcMessageTypeobject representing the input message type.$outputType: AnRpcMessageTypeobject representing the output message type.$options: An array ofOptionDeclNodeobjects representing RPC options.$comments: An array ofCommentNodeobjects associated with the RPC.
MapFieldDeclNode
Represents a map field declaration within a message.
final readonly class MapFieldDeclNode implements NodeInterface { public function __construct( public FieldType $keyType, public FieldType $valueType, public string $name, public int $number, public array $options = [], public array $comments = [], ) {} }
$keyType: AFieldTypeobject representing the key type of the map.$valueType: AFieldTypeobject representing the value type of the map.$name: The name of the map field.$number: The field number.$options: An array ofOptionDeclNodeobjects representing field options.$comments: An array ofCommentNodeobjects associated with the map field.
OneofFieldNode
Represents a field within a oneof group.
final readonly class OneofFieldNode implements NodeInterface { public function __construct( public FieldType $type, public string $name, public int $number, public array $options = [], ) {} }
$type: AFieldTypeobject representing the type of the field.$name: The name of the field.$number: The field number.$options: An array ofOptionDeclNodeobjects representing field options.
CommentNode
Represents a comment in the Protobuf file.
final readonly class CommentNode { public function __construct( public string $text, ) {} }
$text: The text content of the comment.
ReservedNode
Represents a reserved statement in a message or enum.
final readonly class ReservedNode implements NodeInterface { public function __construct( public array $ranges, ) {} }
$ranges: An array ofReservedRangeorReservedNumberobjects representing the reserved fields or numbers.
OptionNode
Represents an individual option within an option declaration.
final readonly class OptionNode { public function __construct( public string $name, public mixed $value, public array $comments = [], ) {} }
$name: The name of the option.$value: The value of the option (can be a mixed type or anotherOptionDeclNode).$comments: An array ofCommentNodeobjects associated with the option.
Contributing
Contributions are welcome! Please follow these guidelines when contributing to the project:
- Fork the repository and create your branch from
master. - Ensure that your code follows the PSR-12 coding standard.
- If you're modifying the parser grammar:
- Edit the
ebnf.pp2file to make your changes. - Run the unit tests to regenerate the
src/grammar.phpfile. - Ensure that all existing tests pass and add new tests to cover your changes.
- Edit the
- Update the documentation in the README if you're adding or changing functionality.
- Write clear and concise commit messages.
- Submit a pull request with a detailed description of your changes.
License
This project is licensed under the MIT License.