willdurand / email-reply-parser
Port of the cool GitHub's EmailReplyParser library in PHP
Installs: 4 506 191
Dependents: 14
Suggesters: 2
Security: 0
Stars: 640
Watchers: 29
Forks: 79
Open Issues: 10
Requires
- php: >=7.4.0
Requires (Dev)
- symfony/phpunit-bridge: ^5.0
README
EmailReplyParser is a PHP library for parsing plain text email content, based on GitHub's email_reply_parser library written in Ruby.
Installation
The recommended way to install EmailReplyParser is through Composer:
composer require willdurand/email-reply-parser
Usage
Instantiate an EmailParser
object and parse your email:
<?php use EmailReplyParser\Parser\EmailParser; $email = (new EmailParser())->parse($emailContent);
You get an Email
object that contains a set of Fragment
objects. The Email
class exposes two methods:
getFragments()
: returns all fragments;getVisibleText()
: returns a string which represents the content considered as "visible".
The Fragment
represents a part of the full email content, and has the
following API:
<?php $fragment = current($email->getFragments()); $fragment->getContent(); $fragment->isSignature(); $fragment->isQuoted(); $fragment->isHidden(); $fragment->isEmpty();
Alternatively, you can rely on the EmailReplyParser
to either parse an email
or get its visible content in a single line of code:
$email = \EmailReplyParser\EmailReplyParser::read($emailContent); $visibleText = \EmailReplyParser\EmailReplyParser::parseReply($emailContent);
Known Issues
Quoted Headers
Quoted headers aren't picked up if there's an extra line break:
On <date>, <author> wrote:
> blah
Also, they're not picked up if the email client breaks it up into multiple lines. GMail breaks up any lines over 80 characters for you.
On <date>, <author>
wrote:
> blah
The above On ....wrote:
can be cleaned up with the following regex:
$fragment_without_date_author = preg_replace( '/\nOn(.*?)wrote:(.*?)$/si', "", $fragment->getContent() );
Note though that we're search for "on" and "wrote". Therefore, it won't work with other languages.
Possible solution: Remove "reply@reply.github.com" lines...
Weird Signatures
Lines starting with -
or _
sometimes mark the beginning of
signatures:
Hello
--
Rick
Not everyone follows this convention:
Hello
Mr Rick Olson
Galactic President Superstar Mc Awesomeville
GitHub
**********************DISCLAIMER***********************************
* Note: blah blah blah *
**********************DISCLAIMER***********************************
Strange Quoting
Apparently, prefixing lines with >
isn't universal either:
Hello
--
Rick
________________________________________
From: Bob [reply@reply.github.com]
Sent: Monday, March 14, 2011 6:16 PM
To: Rick
Unit Tests
Setup the test suite using Composer:
$ composer install
Run it using PHPUnit:
$ ./vendor/bin/simple-phpunit
Contributing
See CONTRIBUTING file.
Credits
- GitHub
- William Durand
License
EmailReplyParser is released under the MIT License. See the bundled LICENSE file for details.