paulyg/wikicreole

A library to convert Creole, a common & standardized wiki markup, into HTML.

0.1.5 2012-07-13 02:46 UTC

This package is not auto-updated.

Last update: 2024-05-11 10:41:30 UTC


README

Creole is a standardized specification for wiki markup syntax that attempts to be as universal as possible across all of the different wiki engines out there (WikiMedia, MoinMoin, etc). The 1.0 stable version of the specification was released in 2008. You can find out more about Creole, including syntax cheat sheets and the history behind the project at the website http://wikiceole.org.

The intent of this library is provide a general pupose Wiki Creole-to-(X)HTML converter that can be uses in any CMS, wiki, blog, etc software. The library is implemented as a single class. An MIT license allows it's reuse in just about any type of software.

Motivation

I began using Creole syntax inside of TiddlyWiki, a personal wiki written in Javascript. I find the Creole syntax to be the most natural and intuitive wiki syntax out there. I prefer it to Markdown (despite it's popularity these days), maybe just because I am so used it it. I wrote this class/library because I wanted to use Creole markup in a CMS I am (very slowly) writing. At the time the only other general purpose Wiki Creole parser written in PHP was Pear_Text_Wiki. I decided to write my own that used PHP5 class structure and did not depend on PEAR. Later on I found out about the little known ezcDocumentCreoleWiki.

Requirements

  • PHP 5.2.4 or greater (I use htmlspecialchar's fourth $double_encode argument to keep > from turning into >).
  • PCRE enabled (it usually is)

Basic Usage

include 'WikiCreole.php';
$parser = new WikiCreole(array(
    'urlBase' => '/wiki/',
    'imgBase' => '/media/',
    ),
    $list_of_existing_page_slugs
);
$html = $parser->parse($wiki_markup);

Features

  • Parses everything in the Creole 1.0 spec.
  • Parses creole additions for --strikethrough--, __underline__, and ##monospace##. See discussion below about these tags.
  • Automatic prepending of a base URL or path to link or image slug. See urlBase and imgBase option keys below.
  • Automatic conversion of slugs to links to other wiki pages. This removes spaces and other characters not allowed in a URL path.
  • Control over markup applied to all link tags, in case you would like to style links internal to and external to your site differently. See the linkFormat* option keys below.
  • Macros. Run a user defined function on text inside the macro markup tag <<macro>>.

Setting Options

The class requires an array of options to be passed as the first argument in it's constructor. The options array follows a 'keyname' => 'value' format. There are currently six valid key names.

  • urlBase: A URL string that will be prepended to any wiki links, link tags where the target is just a page name and is not a full URL with scheme & host name. An empty string will prepend nothing. You can pass a path for absolute path URLs or a scheme + host name + optionally a path for full absolute URLs.
  • imgBase: A URL string that will be prepended to any wiki images, image tags where the target is just a filename or path & filename, not a full URL with scheme & host name. The same rules for absolute paths & full URLs as urlBase apply.
  • linkFormatExternal: Specifies a format to use when creating <a> tags for link tags that have a full URL. This is most usefull if you want to apply a special class to external links so they are formatted differently. The format string will be passed to PHP's sprintf function. The format can use two numbered placeholders: %1$s - The URL, %2$s - The text part of the link. The default format is <a href="%1$s" class="external">%2$s</a>. As an alternative you can pass a valid PHP callback (Closures work here if you are on PHP 5.3) and your function will be called with the URL as the first argument and the text as the second.
  • linkFormatInternal: Same as linkFormatExternal except this format will be used for all links to other wiki/blog/CMS pages, where just the page title was supplied. The default format is <a href="%1$s">%2$s</a>. You can also pass a PHP callback or Closure here.
  • linkFormatMissing: Same as linkFormatInternal except if you supply a list of existing pages and the page in the link tag does not yet exist, this format will be applied. This is especially usefull for specifcally styling pages that are in need of being created. The default format is <a href="%1$s" class="notcreated" title="This wiki page does not exist yet. Click to create it.">%2$s</a>. You can also pass a PHP callback or Closure here.
  • linkFormatFree: is applied to URLS which appear in a body of text but are not contained in a tag. Unlike the above three formats this one is a format string for preg_replace. There is also only one placeholder, $0, though it can be repeated. The default format is <a href="$0" class="external">$0</a>. Callback and Closures don't work here yet.
  • debug: Off by default. Allows extra information to be printed in case parsing goes wrong. If the parsing of a block level element goes wrong you will get back text with placeholders instead of that block level element. Setting debug to true will print out all of the strings which could not be remerged at the end of the HTML string inside a PRE tag. This options is really only useful to developers and hopefully not needed much anymore.

Any of the above options can also be set with the setOption($key, $value) method.

A second, optional, constructor argument $list_of_existing_page_slugs accepts an array containing a list of existing pages. The values of the array (keys are ignored) must be just the URL slug for a page after any illegal characters have stripped and formatting been applied (i.e. spaces to dashes). See the linkCallback() method and $url_special_chars property for more info on the URL formatting. I realize different software packages may have different rules for URL formatting/preparation. This is something that may become pluggable in the future.

Known issues

  • Multi-line list items do not work.
  • Putting }}} inside a no wiki tag will trip up the parser.

Creole Additions

  • I like to use strikethrough so I added --strikethrough-- and then also __underline__ as markup formats. The rendered HTML uses a <span> tag with style applied to be compatible with HTML4/XHTML1 Strict. Note however that <s> and <u> are valid HTML5 tags. I may add a flag to toggle between these two options in the future.
  • I've also implemented the ##monospace## addition which wraps the text in <code> tags (this was supposed to be <tt> but that was dropped in HTML5).

Block Level Macros

You register your marco by calling the registerMacro($macroName, $callback) method. Again the $callback can be any valid PHP callable including a Closure. The parser will match a block level macro with the following sytax:

<<macroName arg1 arg2
This is some paragraph text.
>>

Arguments for your macro function should be separated by spaces and must appear on the first line. The body of the macro follows on the next lines until the closing marker >> is found. When your macro is called the arguments will be passed in the order they were presented. The body will always be last. Whatever your function returns will be placed in the markup in place of the macro tag. Inline macros are next on my development list.

Plans for the future

  • Fix the stuff in known issues.
  • Inline macros.
  • Possible automatic creation of anchors on headings (this is a common usage pattern).
  • Implement a strict/non-strict mode that would toggle between HTML4/XHTML1 Strict compliance and tags allowed in HTML 5.
  • Pluggable formatting for free link.
  • Pluggable transformation of plain text wiki links into page slugs.