yarri/email-parser

Parses emails, parses them well :)

v0.2 2025-05-30 13:44 UTC

This package is auto-updated.

Last update: 2025-05-30 14:18:04 UTC


README

Build Status

Parses emails, parses them well :)

EmailParser tries to simplify some of the pains in the email parsing process:

  • All headers, text/plain and text/html parts (which are not attachments) and filenames of attachments are converted into UTF-8 encoding.
  • In these components, illegal UTF-8 characters are replaced.
  • Attachment filenames are properly sanitized.
  • Email can be parsed by its source or by its filename.
  • The email source file can be gzipped.
  • EmailParser itself determines mime types of all attachments found in the message.
  • An email sent as an attachment in another email can be easily accessed by calling $part->getAttachedEmail().
  • A caching mechanism is built-in in EmailParser.
  • Contents of attachments can be accessd via StringBuffer which has a positive impact on memory consumption.

Usage

$parser = new \Yarri\EmailParser();

// Parsing email
$email = $parser->parse($email_content);
// or
$email = $parser->parseFile("/path/to/email.eml");
// or
$email = $parser->parseFile("/path/to/email.eml.gz");

// Getting headers
$email->getSubject();
$email->getFrom();
$email->getTo();
$email->getDate(); // returns date in the ISO format (YYYY-mm-dd H:i:s) in the current timezone (set via date_default_timezone_set()); e.g. "2025-05-25 12:40:22"
$email->getHeader("Date"); // e.g. "Sun, 25 May 2025 06:37:33 +0200 (CEST)"
$email->getHeader("Return-Path");
$email->getHeader("Subject"); // same as $email->getSubject()
$email->getHeader("Received"); // returns string
$email->getHeader("Received",["as_array" => true]); // returns array of strings
$email->hasAttachment(); // true of false

// Displaying the message
$part = $email->getFirstReadablePart();
// or
$part = $email->getFirstReadablePart(["prefer_html" => true]);
//
header(sprintf(
  "Content-Type: %s; charset=%s",
  $part->getMimeType(), // "text/plain" or "text/html"
  $part->getCharset() // always "UTF-8"
));
echo $part->getContent();

// Traversing email structure
$parts = $email->getParts();
foreach($parts as $part){
  $id = $part->getId(); // 1,2,3...
  $level = $part->getLevel(); // 1,2,3..
  $padding = str_repeat(" ",$level); // " ","  ","   "...
  $mime_type = $part->getMimeType();
  if($part->hasContent()){
    $content_info = $part->getSize()." bytes";
    if($part->getFilename()){
      $content_info .= ", ".$part->getFilename();
    }
  }else{
    $content_info = "no content";
  }
  echo "$id.$padding$mime_type ($content_info)\n";
}

// Something like this can be printed:
/*
1. multipart/related (no content)
2.  multipart/alternative (no content)
3.   text/plain (55 bytes)
4.   text/html (107 bytes)
5.  image/png (11462 bytes, dungeon-master.png)
6.  image/jpeg (9123 bytes, pigeon.jpg)
// */

// Getting parts
$part = $email->getPartById(5);
$part->isAttachment(); // true
$part->getMimeType(); // "image/png"
$part->getContent(); // binary content

// Email sent as an attachment
$email = $parser->parseFile("/path/to/email_with_message_rfc822_part.eml");

$parts = $email->getParts();
$part_message_rfc822 = $parts[2]; // for instance the 3rd part is message/rfc822, i.e. an attached email
$part_message_rfc822->isAttachedEmail(); // true
$attached_email = $part_message_rfc822->getAttachedEmail();

$attached_email->getSubject();
$attached_email->getFrom();
$attached_email->getTo();
$attached_email_parts = $attached_email->getParts();
// etc.

// Caching mechanism
// (you are responsible for providing specific cache path for every email you want to parse)
$email = $parser->parse($email_1_content,"/path/to/cache/for_email_1/");
// or
$email = $parser->parseFile("/path/to/email_2.eml","/path/to/cache/for_email_2/");
// or
$email = $parser->parseFile("/path/to/email_3.eml.gz","/path/to/cache/for_email_3/");

// Displaying attachment via StringBuffer which is memory more efficient
// (only takes effect when caching is active)
header(sprintf('Content-Type: %s',$part->getMimeType());
header(sprintf('Content-Disposition: attachment; filename="%s"',$part->getFilename()));
$buffer = $part->getContentBuffer();
$buffer->printOut();

Installation

Just use the Composer:

composer require yarri/email-parser

Testing

The EmailParser is tested automatically using Travis CI in PHP 7.0 to PHP 8.4.

For the tests execution, the package atk14/tester is used. It is just a wrapping script for phpunit/phpunit.

Install required dependencies for development:

composer update --dev

Run tests:

cd test
../vendor/bin/run_unit_tests

License

EmailParser is free software distributed under the terms of the MIT license