caco / un-markdown
A PHP library which removes markdown "formatting" from text.
Requires (Dev)
- devster/ubench: ^2.1
- erusev/parsedown: ^1.7
- phpunit/phpunit: ^8.5
This package is auto-updated.
Last update: 2024-11-08 09:06:33 UTC
README
A simple PHP library to convert Markdown back to plain text.
The purpose of this library is to convert Markdown to plain text, e.g. for chat notifications.
- Does not lose the whole text structure, as would happen with the following call:
  strip_tags(Parsedown::instance()->text('…'))
- It is more performant than using a full-featured Markdown parser with AST support.
- Decorates some content with prefixes, e.g.:
  - 🔗 for links
  - 💬 for comments (blockquotes)
  - • for unordered list items
- Uses only regular expressions for text transformation.
- Provides a good level (not 100% 🤷‍♂️) of compliance with the GitHub Flavored Markdown Spec.
- Test-driven, with over 65 unit tests 💪 and more than 370 assertions.
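To make the "regular expressions only" idea concrete, here is a minimal sketch of an ordered list of regex rules applied one after another. It is not the library's implementation: the function name `stripMarkdownSketch` and the five patterns are assumptions chosen for illustration, and they cover only a tiny subset of Markdown.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): strips a small subset of
 * Markdown by applying an ordered list of regex rules.
 */
function stripMarkdownSketch(string $markdown): string
{
    // Pattern => replacement. Order matters: list markers are rewritten
    // before a leading "*" could be mistaken for emphasis.
    $rules = [
        '/^[ \t]*[-*+][ \t]+/m' => '• ', // unordered list markers
        '/^#{1,6}[ \t]+/m'      => '',   // ATX heading markers
        '/(\*\*|__)(.+?)\1/s'   => '$2', // strong emphasis
        '/(\*|_)(.+?)\1/s'      => '$2', // emphasis
        '/~~(.+?)~~/s'          => '$1', // strikethrough
    ];

    // With array arguments, preg_replace applies each rule in turn to the
    // result of the previous one; it returns null only on a regex error.
    return preg_replace(array_keys($rules), array_values($rules), $markdown) ?? $markdown;
}

echo stripMarkdownSketch("# Title\n\n* first **bold** item\n- second _item_\n");
```

Line breaks and list items survive because each rule only touches the markers it matches. A sketch this small has obvious gaps, for example underscores inside identifiers such as `snake_case` would be treated as emphasis.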
Usage
Basic usage is as simple as with any common Markdown parser library.
$markdownRemover = new MarkdownRemover();
echo $markdownRemover->strip('Hello **World**');
This would produce:
Hello World
You can change the prefixes easily when constructing an instance.
$markdownRemover = new MarkdownRemover('"Link prefix" ', '"Image prefix" ', '"Comment prefix" ', '… ');
// The apostrophe is escaped because the string is single-quoted.
echo $markdownRemover->strip('Wow look at this link [example.com](https://example.com/) isn\'t it **awesome**?');
This would produce:
Wow look at this link example.com "Link prefix" https://example.com/ isn't it awesome?
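The output format above (link text, a space, the prefix, then the URL) can be reproduced with a single regex callback. The following is a sketch under assumptions, not the library's code: `decorateLinksSketch` and its pattern are hypothetical, and link titles and reference-style links are deliberately ignored.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: rewrites inline links "[text](url)" to
 * "text <prefix>url", mirroring the output format shown above.
 */
function decorateLinksSketch(string $markdown, string $linkPrefix): string
{
    // (?<!!) skips images ("![alt](src)"); group 1 is the link text,
    // group 2 the URL (no whitespace, so titles are not handled here).
    $pattern = '/(?<!!)\[([^\]]+)\]\(([^)\s]+)\)/';

    return preg_replace_callback(
        $pattern,
        static function (array $m) use ($linkPrefix): string {
            return $m[1] . ' ' . $linkPrefix . $m[2];
        },
        $markdown
    ) ?? $markdown;
}

echo decorateLinksSketch('See [example.com](https://example.com/) now', '"Link prefix" ');
```

A callback is used instead of a replacement string so that a prefix containing `$1` or a backslash is inserted literally rather than being interpreted as a back-reference.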
You can change specific rules, remove or replace them easily.
$classUnderTest = new MarkdownRemover();

// Render **strong** text with Unicode bold letters.
$classUnderTest
    ->getReplacements()[8]
    ->setReplace(function ($matches) {
        return ReEmphasis::toBold($matches[2]);
    });

// Render *emphasized* text with Unicode italic letters.
$classUnderTest
    ->getReplacements()[9]
    ->setReplace(function ($matches) {
        return ReEmphasis::toItalic($matches[2]);
    });

// Render `inline code` with Unicode monospaced letters.
$classUnderTest
    ->getReplacements()[16]
    ->setReplace(function ($matches) {
        return ReEmphasis::toMonospaced($matches[1]);
    });

echo $classUnderTest->strip('**Test** *italic* `replacement`');
This would produce:
𝗧𝗲𝘀𝘁 𝘪𝘵𝘢𝘭𝘪𝘤 𝚛𝚎𝚙𝚕𝚊𝚌𝚎𝚖𝚎𝚗𝚝
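The idea of swapping a rule's replacement callback can be sketched without the library. In the example below, `toBoldSketch` and the `$strongRule` array are hypothetical stand-ins, not the library's `ReEmphasis` class or rule objects. The sketch assumes the mbstring extension for `mb_chr` and maps ASCII letters to the Unicode "Mathematical Sans-Serif Bold" letters (U+1D5D4 for "A", U+1D5EE for "a").

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for a "to bold" helper: maps ASCII letters to
 * Mathematical Sans-Serif Bold letters and leaves other characters alone.
 */
function toBoldSketch(string $text): string
{
    return preg_replace_callback(
        '/[A-Za-z]/',
        static function (array $m): string {
            $code = ord($m[0]);
            // Offset from the ASCII letter to its bold counterpart.
            $offset = $code >= 97 ? 0x1D5EE - 97 : 0x1D5D4 - 65;

            return mb_chr($code + $offset, 'UTF-8');
        },
        $text
    ) ?? $text;
}

// A rule pairs a pattern with a replaceable callback, so the behaviour can
// be changed without touching the pattern.
$strongRule = [
    'pattern' => '/(\*\*|__)(.+?)\1/s',
    'replace' => static function (array $m): string {
        return $m[2]; // default: drop the markers
    },
];

// Swap in the bold renderer, analogous to calling setReplace() on a rule.
$strongRule['replace'] = static function (array $m): string {
    return toBoldSketch($m[2]);
};

echo preg_replace_callback($strongRule['pattern'], $strongRule['replace'], '**Test** done');
```

Keeping pattern and callback separate is what makes a rule cheap to customize: only the rendering changes, while the matching logic stays as tested.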
Transformation example
The following Markdown:
# Headings

Heading with `#` or as setext are supported.

Alt-H1 (Setext)
======

Alt-H2 (Setext)
------

## Emphasis, Strong emphasis & Strikethrough

Emphasis, aka italics, with *asterisks* or _underscores_.

Strong emphasis, aka bold, with **asterisks** or __underscores__.

Combined emphasis with **asterisks and _underscores_**.

Strikethrough uses two tildes. ~~Scratch this.~~

### Lists

1. Ordered lists gets
2. passed as they are
4. As you can see the numbering
5. is not correct

* Unordered
+ lists
- gets
+ converted
- to the bullet UTF-8 char

- [ ] Task
- [x] List
- [ ] are
- [X] supported!

#### Links and images

[I'm an inline-style link](https://www.google.com)

[I'm an inline-style link with title](https://www.google.com "Google's Homepage")

[I'm a reference-style link][Arbitrary case-insensitive reference text]

[I'm a relative reference to a repository file](../blob/master/LICENSE)

[You can use numbers for reference-style link definitions][1]

![alt text](https://github.com/adam-p/markdown-here/raw/master/src/common/images/icon48.png "Logo Title Text 1")

![alt text][logo]

##### Code

Inline `code` and block code is supported, too.

\`\`\`no-highlight
This is a code block, **MD** is ~~not~~ *interpreted*.
\`\`\`

###### Blockquotes

> Blockquotes are very handy in **email** to emulate reply text.
>> This line is part of the same quote.

###### Escaping

You can use the \\ character to escape MD. So you can escape the asterisk in strong e.g. \\\* to archive this \*\*Not strong\*\*.

###### Thematic breaks aka <hr>

All hr gets stripped, you should not see any chars below this line:

---

***

___

[arbitrary case-insensitive reference text]: https://www.mozilla.org
[1]: http://slashdot.org
[logo]: https://github.com/adam-p/markdown-here/raw/master/src/common/images/icon48.png "Logo Title Text 2"
Will be converted to this plain text:
Headings
Heading with # or as setext are supported.
Alt-H1 (Setext)
Alt-H2 (Setext)
Emphasis, Strong emphasis & Strikethrough
Emphasis, aka italics, with asterisks or underscores.
Strong emphasis, aka bold, with asterisks or underscores.
Combined emphasis with asterisks and underscores.
Strikethrough uses two tildes. Scratch this.
Lists
1. Ordered lists gets
2. passed as they are
4. As you can see the numbering
5. is not correct
• Unordered
• lists
• gets
• converted
• to the bullet UTF-8 char
• ☐ Task
• ☑ List
• ☐ are
• ☑ supported!
Links and images
I'm an inline-style link 🔗 https://www.google.com
I'm an inline-style link with title 🔗 https://www.google.com "Google's Homepage"
I'm a reference-style link 🔗 https://www.mozilla.org
I'm a relative reference to a repository file 🔗 ../blob/master/LICENSE
You can use numbers for reference-style link definitions 🔗 http://slashdot.org
🖼️ alt text
🖼️ alt text
Code
Inline code and block code is supported, too.
This is a code block, **MD** is ~~not~~ *interpreted*.
Blockquotes
💬 Blockquotes are very handy in email to emulate reply text.
💬 This line is part of the same quote.
Escaping
You can use the \ character to escape MD. So you can escape the asterisk in strong e.g. \* to archive this **Not strong**.
Thematic breaks aka <hr>
All hr gets stripped, you should not see any chars below this line:
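In the example above, reference-style links are resolved to the URLs given in the definitions at the end of the document, and the definition lines themselves disappear. One way this could be done with regular expressions alone is a two-pass approach, sketched below. The function `resolveReferenceLinksSketch` and both patterns are assumptions for illustration and not the library's implementation. Titles after the URL are dropped, and labels are compared with ASCII-only case folding.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical two-pass sketch: collect "[label]: url" definitions, then
 * rewrite "[text][label]" references to "text <prefix>url".
 */
function resolveReferenceLinksSketch(string $markdown, string $linkPrefix): string
{
    $definitions = [];

    // Pass 1: remove definition lines and remember each URL under its
    // lower-cased label.
    $body = preg_replace_callback(
        '/^\[([^\]]+)\]:[ \t]*(\S+).*$\R?/m',
        static function (array $m) use (&$definitions): string {
            $definitions[strtolower($m[1])] = $m[2];

            return '';
        },
        $markdown
    ) ?? $markdown;

    // Pass 2: replace references with a known label; leave others untouched.
    return preg_replace_callback(
        '/(?<!!)\[([^\]]+)\]\[([^\]]+)\]/',
        static function (array $m) use ($definitions, $linkPrefix): string {
            $url = $definitions[strtolower($m[2])] ?? null;

            return $url === null ? $m[0] : $m[1] . ' ' . $linkPrefix . $url;
        },
        $body
    ) ?? $body;
}

$markdown = "[Mozilla][Ref One] and [Slashdot][1]\n\n"
    . "[ref one]: https://www.mozilla.org\n"
    . "[1]: http://slashdot.org\n";

echo resolveReferenceLinksSketch($markdown, '🔗 ');
```

Collecting the definitions first means a reference can appear before its definition, which is the usual layout in Markdown documents.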