v1.1.0 2024-04-04 20:02 UTC

Last update: 2024-04-04 20:07:21 UTC


A PHP HTML-to-pure-text transformer that gracefully handles varied and malformed HTML.

Hypertext extracts the text content from any HTML-based document and automatically:

  • Removes CSS
  • Removes scripts
  • Removes headers
  • Removes non-HTML content
  • Preserves spacing
  • Preserves links (optional)
  • Preserves new lines (optional)

It is intended for use in LLM-related tasks, such as building prompts and generating embeddings.
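Because the output is meant for prompts and embeddings, a common next step is splitting the text into size-limited chunks. The following is a minimal sketch of that step, using only standard PHP functions. The `chunkText` helper and the 40-byte limit are assumptions made for illustration and are not part of Hypertext.

```php
<?php

/**
 * Split plain text into chunks of at most $maxLength bytes (measured with
 * strlen), breaking on whitespace so that words stay intact. A single word
 * longer than $maxLength becomes its own oversized chunk.
 *
 * Hypothetical helper for illustration; not part of Hypertext.
 *
 * @return string[]
 */
function chunkText(string $text, int $maxLength = 1000): array
{
    // Split on any run of whitespace and drop empty pieces.
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    $chunks = [];
    $current = '';

    foreach ($words as $word) {
        // Start a new chunk when appending this word would exceed the limit.
        if ($current !== '' && strlen($current) + 1 + strlen($word) > $maxLength) {
            $chunks[] = $current;
            $current = $word;
        } else {
            $current = $current === '' ? $word : $current . ' ' . $word;
        }
    }

    if ($current !== '') {
        $chunks[] = $current;
    }

    return $chunks;
}

// In practice $text would come from (new Transformer)->toText($html).
$text = 'Welcome to My Blog This is a paragraph of text on my webpage. Click here to view my posts.';

foreach (chunkText($text, 40) as $index => $chunk) {
    echo $index . ': ' . $chunk . PHP_EOL;
}
```

Real embedding pipelines usually count tokens rather than bytes, so treat the limit here as a stand-in for whatever unit your model uses.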


composer require stevebauman/hypertext


use Stevebauman\Hypertext\Transformer;

$transformer = new Transformer();

// (Optional) Filter out specific elements by their XPath.

// (Optional) Retain new line characters.
$transformer->keepNewLines();

// (Optional) Retain anchor tags and their href attribute.
$transformer->keepLinks();

$text = $transformer->toText($html);
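The filter option above takes an XPath expression. For readers less familiar with XPath, here is a small sketch of what such an expression selects, written against PHP's built-in DOM extension. It is an illustration only and not how Hypertext is implemented; the `sidebar` id and the sample markup are made up for this example.

```php
<?php

// Illustration only: uses PHP's DOM extension to show which nodes an
// XPath expression selects. This is not Hypertext's implementation.
$html = <<<'HTML'
<html>
<body>
    <div id="sidebar">Related posts</div>
    <p>This is a paragraph of text on my webpage.</p>
</body>
</html>
HTML;

$document = new DOMDocument();

// Collect parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$document->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($document);

// Hypothetical expression: every element whose id attribute is "sidebar".
$expression = "//*[@id='sidebar']";

// Copy the matches into an array first, then detach them from the tree.
foreach (iterator_to_array($xpath->query($expression)) as $node) {
    $node->parentNode->removeChild($node);
}

// Collapse whitespace in the text that remains.
$remaining = trim(preg_replace('/\s+/', ' ', $document->documentElement->textContent));

echo $remaining . PHP_EOL; // This is a paragraph of text on my webpage.
```

Testing an expression this way before handing it to the transformer makes it easier to confirm that it matches only the elements you want dropped.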


For larger examples, please view the tests/Fixtures directory.


Input:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>My Blog</title>
</head>
<body>
    <h1>Welcome to My Blog</h1>
    <p>This is a paragraph of text on my webpage.</p>
    <a href="">Click here</a> to view my posts.
</body>
</html>

Output (Pure Text):

echo (new Transformer)->toText($html);
Welcome to My Blog This is a paragraph of text on my webpage. Click here to view my posts.

Output (Keep New Lines):

echo (new Transformer)->keepNewLines()->toText($html);
Welcome to My Blog
This is a paragraph of text on my webpage.
Click here to view my posts.

Output (Keep Links):

echo (new Transformer)->keepLinks()->toText($html);
Welcome to My Blog This is a paragraph of text on my webpage. <a href="">Click here</a> to view my posts.

Output (Keep Both):

echo (new Transformer)->keepLinks()->keepNewLines()->toText($html);
Welcome to My Blog
This is a paragraph of text on my webpage.
<a href="">Click here</a> to view my posts.
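When links are kept, the output still contains `<a>` tags. If a prompt reads better with Markdown-style links, the anchors can be rewritten afterwards. The sketch below is one way to do that with a regular expression. The `anchorsToMarkdown` helper and the example URL are hypothetical, and the pattern assumes anchors shaped like the ones in the output above, with a double-quoted `href` as the only attribute.

```php
<?php

/**
 * Rewrite <a href="...">label</a> tags as Markdown-style [label](url) links.
 *
 * Hypothetical helper for illustration; not part of Hypertext. It assumes
 * each anchor has a double-quoted href as its only attribute.
 */
function anchorsToMarkdown(string $text): string
{
    $result = preg_replace_callback(
        '/<a href="([^"]*)">(.*?)<\/a>/s',
        static function (array $match): string {
            [, $url, $label] = $match;

            // An empty href carries no target, so keep only the label.
            return $url === '' ? $label : "[{$label}]({$url})";
        },
        $text
    );

    // preg_replace_callback returns null on failure; fall back to the input.
    return $result ?? $text;
}

// The URL below is a placeholder chosen for this example.
$text = 'Welcome to My Blog <a href="https://example.com/posts">Click here</a> to view my posts.';

echo anchorsToMarkdown($text) . PHP_EOL;
// Welcome to My Blog [Click here](https://example.com/posts) to view my posts.
```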