wiki-connect / parsewiki
A library that helps parse wikitext template data
2.0
2025-07-18 22:35 UTC
Requires
- php: >=8.1.0
Requires (Dev)
- phpstan/phpstan: ^2.1
- phpunit/phpunit: ^10
README
A powerful PHP library for parsing MediaWiki-style content from raw wiki text.
Overview
This library allows you to extract:
- Templates (single, multiple, nested)
- Internal wiki links
- External links
- Citations (references)
- Categories (with or without display text)
- Tables (complete MediaWiki table syntax support)
Perfect for handling wiki-formatted text in PHP projects.
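As a rough illustration of what extracting internal links, categories and external links involves, here is a standalone PHP sketch based on regular expressions. It is not the library's implementation or API: extractLinks() and the array keys it returns are hypothetical, and nested markup inside link text is not handled.

```php
<?php
// Hypothetical sketch, not the wiki-connect/parsewiki API: pulls internal
// links, categories and bracketed external links out of wikitext with
// regular expressions. Nested markup inside link labels is not handled.
function extractLinks(string $wikitext): array
{
    $result = ['internal' => [], 'categories' => [], 'external' => []];

    // [[Target]] or [[Target|Display text]]
    preg_match_all('/\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]/', $wikitext, $m, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL);
    foreach ($m as $match) {
        $target = trim($match[1]);
        $text = isset($match[2]) ? trim($match[2]) : null;
        if (preg_match('/^category\s*:\s*(.+)$/i', $target, $cat)) {
            // In a category link the text after "|" is a sort key, not a label.
            $result['categories'][] = ['name' => trim($cat[1]), 'sortKey' => $text];
        } else {
            $result['internal'][] = ['target' => $target, 'text' => $text];
        }
    }

    // [https://example.org] or [https://example.org Label]
    preg_match_all('/\[(https?:\/\/[^\s\]]+)(?:\s+([^\]]*))?\]/', $wikitext, $m, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL);
    foreach ($m as $match) {
        $label = isset($match[2]) ? trim($match[2]) : null;
        $result['external'][] = ['url' => $match[1], 'label' => $label];
    }

    return $result;
}
```

A category link with a leading colon, such as [[:Category:Cities]], stays an ordinary internal link in this sketch, which matches how MediaWiki treats it.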
Project Structure
- ParserTemplates: Parses multiple templates.
- ParserTemplate: Parses a single template.
- ParserInternalLinks: Parses internal wiki links.
- ParserExternalLinks: Parses external links.
- ParserCitations: Parses citations and references.
- ParserCategories: Parses categories from wiki text.
- ParserTable: Parses MediaWiki tables with full syntax support.
- DataModel classes:
  - Attribute
  - Citation
  - ExternalLink
  - InternalLink
  - Parameters
  - Template
  - Table
  - Cell
  - LegacyTableCompatibility (backward compatibility trait)
- tests/: Contains PHPUnit test files:
  - ParserCategoriesTest
  - ParserCitationsTest
  - ParserExternalLinksTest
  - ParserInternalLinksTest
  - ParserSectionTest
  - ParserTableTest (39 tests)
  - ParserTemplatesTest
  - ParserTemplateTest
  - DataModel tests:
    - AttributeTest
    - CellTest
    - ParametersTest
    - TableTest (46 tests with 106 assertions)
    - TemplateTest
- demo/: Live HTML testing interface:
  - index.html - Interactive demo frontend
  - parser.php - Backend API for real-time parsing
Features
- Parse single and multiple templates.
- Support nested templates.
- Handle named and unnamed template parameters.
- Extract internal links with or without display text.
- Extract external links with or without labels.
- Parse citations, including attributes and special characters.
- Parse categories, with support for custom namespaces, whitespace, and special characters.
- Full MediaWiki table syntax support with advanced features:
  - Multi-line cell content (paragraphs, lists, complex markup)
  - Rowspan and colspan attributes with proper span matrix handling
  - HTML wrapper detection (automatically strips <div> wrappers)
  - Accessibility support (scope attributes for screen readers)
  - Complex attribute parsing (style, alignment, etc.)
  - Nested table support
  - Caption and header attributes
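Handling nested templates and named versus unnamed parameters mostly comes down to splitting on | only at the top nesting level. The sketch below shows one way to do that in plain PHP; splitTemplate() is a hypothetical helper, not part of ParserTemplate, and it assumes a single well-formed {{...}} string.

```php
<?php
// Hypothetical helper, not part of wiki-connect/parsewiki: splits one
// well-formed template "{{Name|a|key=value}}" into its name and parameters.
// Pipes inside nested {{...}} templates or [[...]] links are not separators.
function splitTemplate(string $wikitext): array
{
    $inner = substr(trim($wikitext), 2, -2); // drop the outer "{{" and "}}"
    $parts = [];
    $current = '';
    $depth = 0;
    $len = strlen($inner);
    for ($i = 0; $i < $len; $i++) {
        $pair = substr($inner, $i, 2);
        if ($pair === '{{' || $pair === '[[') {
            $depth++;               // entering nested markup
            $current .= $pair;
            $i++;
        } elseif (($pair === '}}' || $pair === ']]') && $depth > 0) {
            $depth--;               // leaving nested markup
            $current .= $pair;
            $i++;
        } elseif ($inner[$i] === '|' && $depth === 0) {
            $parts[] = $current;    // top-level separator
            $current = '';
        } else {
            $current .= $inner[$i];
        }
    }
    $parts[] = $current;

    $name = trim(array_shift($parts));
    $params = [];
    $position = 1; // unnamed parameters are numbered from 1, as in MediaWiki
    foreach ($parts as $part) {
        $eq = strpos($part, '=');
        // An "=" that only appears inside nested markup does not make a name.
        $nested = strcspn($part, '{[');
        if ($eq === false || $nested < $eq) {
            $params[$position++] = $part; // unnamed values keep their whitespace
        } else {
            $params[trim(substr($part, 0, $eq))] = trim(substr($part, $eq + 1));
        }
    }

    return ['name' => $name, 'params' => $params];
}
```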
Wikitext Feature Support
| Feature | Read | Modify | Replace |
|---|---|---|---|
| Templates | Yes | Yes | Yes |
| Parameters | Yes | Yes | Yes |
| Citations | Yes | Yes | Yes |
| Citations > Attributes | Yes | Yes | Yes |
| Internal Links | Yes | | |
| External Links | Yes | | |
| Categories | Yes | | |
| Tables | Yes | Yes | Yes |
| Tables > Attributes | Yes | Yes | Yes |
| Tables > Cells | Yes | Yes | Yes |
| Tables > Headers | Yes | Yes | Yes |
| Tables > Captions | Yes | Yes | Yes |
| Tables > Rowspan/Colspan | Yes | Yes | Yes |
| Tables > Multi-line | Yes | Yes | Yes |
| Tables > HTML Wrappers | Yes | Yes | Yes |
| Tables > Nested | Yes | | |
| HTML Tags | | | |
| Parser Functions | | | |
| Sections | | | |
| Magic Words | | | |
Note: Some features are partially supported or under development. Contributions are welcome!
Table Usage Examples
Basic Table Parsing
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader (script in project root)
use WikiConnect\ParseWiki\ParserTable;
$wikitext = <<<WIKI
{| class="wikitable"
|+ Table Caption
|-
! Header 1 !! Header 2
|-
| Cell 1 || Cell 2
|-
| Cell 3 || Cell 4
|}
WIKI;
$parser = new ParserTable($wikitext);
$table = $parser->parse();
// Access table properties
echo $table->getCaption(); // "Table Caption"
echo $table->getAttributes()['class']; // "wikitable"
// Access headers
$headers = $table->getHeaders();
echo $headers[0]->getContent(); // "Header 1"
// Access data
$rows = $table->getRows();
echo $rows[0][0]->getContent(); // "Cell 1"
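In this example each row sits on one line, with data cells separated by || and header cells by !!. A minimal sketch of that splitting step, independent of ParserTable (splitRowLine() is a hypothetical name), could look like this; it ignores cell attributes and would mis-split a || that appears inside a nested template or link.

```php
<?php
// Hypothetical sketch (not the library's API): splits a single-line
// MediaWiki row such as "| a || b" or "! h1 !! h2" into trimmed cell texts.
function splitRowLine(string $line): array
{
    $line = trim($line);
    // Row separators, captions and the table end are not cell lines.
    if (in_array(substr($line, 0, 2), ['|-', '|+', '|}'], true)) {
        return [];
    }
    $marker = $line[0] ?? '';
    if ($marker !== '|' && $marker !== '!') {
        return [];
    }
    $body = substr($line, 1);
    // Data rows separate cells with "||"; header rows also accept "!!".
    $pattern = $marker === '!' ? '/\|\||!!/' : '/\|\|/';
    return array_map('trim', preg_split($pattern, $body));
}
```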
Advanced Table Features
// Table with attributes, spans, and accessibility
$complexTable = <<<WIKI
{| class="wikitable" style="width: 100%;"
! scope="col" colspan="2" | Product Info
! scope="col" | Price
|-
! scope="row" | Bread
| 2 loaves
| style="text-align: right;" | $3.50
|-
| rowspan="2" | Dairy
| Milk (1 gallon)
| align="right" | $4.25
|}
WIKI;
$parser = new ParserTable($complexTable);
$table = $parser->parse();
// Check spans
$headers = $table->getHeaders();
echo $headers[0]->getColSpan(); // 2
echo $headers[0]->getScope(); // "col"
// Check alignment
$rows = $table->getRows();
echo $rows[0][2]->getAlign(); // "right"
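Span handling can be pictured as laying cells onto a grid: each cell takes the next free column in its row and also reserves the positions covered by its rowspan and colspan. The following sketch illustrates that idea, assuming cells have already been parsed into arrays with text, rowspan and colspan keys; buildSpanMatrix() is hypothetical and is not the library's span matrix code.

```php
<?php
// Hypothetical sketch, not the library's code: expands rowspan/colspan into
// a dense grid so every (row, column) position maps to the covering cell text.
// Assumes $rows is a list of rows, each a list of
// ['text' => string, 'rowspan' => int, 'colspan' => int] (spans optional).
function buildSpanMatrix(array $rows): array
{
    $grid = [];
    foreach ($rows as $r => $cells) {
        $c = 0;
        foreach ($cells as $cell) {
            // Skip positions already filled by a rowspan from an earlier row.
            while (isset($grid[$r][$c])) {
                $c++;
            }
            $rowspan = $cell['rowspan'] ?? 1;
            $colspan = $cell['colspan'] ?? 1;
            for ($dr = 0; $dr < $rowspan; $dr++) {
                for ($dc = 0; $dc < $colspan; $dc++) {
                    $grid[$r + $dr][$c + $dc] = $cell['text'];
                }
            }
            $c += $colspan;
        }
    }
    ksort($grid);
    foreach ($grid as &$row) {
        ksort($row); // keep columns in left-to-right order
    }
    unset($row);
    return $grid;
}
```

A rowspan that runs past the last row simply creates extra grid rows here; a real parser may need to clip or report it.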
Multi-line Cell Content
// Table with complex multi-line content
$multiLineTable = <<<WIKI
{|
|Lorem ipsum dolor sit amet,
consetetur sadipscing elitr,
sed diam nonumy eirmod tempor invidunt
ut labore et dolore magna aliquyam erat.
At vero eos et accusam et justo duo dolores
et ea rebum. Stet clita kasd gubergren.
|
* Lorem ipsum dolor sit amet
* consetetur sadipscing elitr
* sed diam nonumy eirmod tempor invidunt
|}
WIKI;
$parser = new ParserTable($multiLineTable);
$table = $parser->parse();
// Access multi-line content
$rows = $table->getRows();
echo $rows[0][0]->getRawContent(); // Full paragraph content
echo $rows[0][1]->getRawContent(); // List content with bullets
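Multi-line cells work because a new cell only begins on a line that starts with | or !; every other line continues the cell that is already open. The sketch below illustrates that rule with a hypothetical collectCells() function; it is not how ParserTable is implemented, and it skips captions and does not split inline || cells.

```php
<?php
// Hypothetical sketch, not the library's parser: groups table lines into
// rows of raw cell text. A line starting with "|" or "!" opens a new cell;
// any other line is appended to the open cell, so paragraphs and lists can
// span several lines. Captions are skipped and inline "||" is not split.
function collectCells(string $wikitext): array
{
    $rows = [];
    $row = [];
    $cell = null;
    $lines = preg_split('/\R/', $wikitext);
    $lines[] = '|}'; // sentinel so the last row is always flushed
    foreach ($lines as $line) {
        $structural = in_array(substr($line, 0, 2), ['{|', '|-', '|+', '|}'], true);
        $opensCell = !$structural && $line !== '' && ($line[0] === '|' || $line[0] === '!');
        if (($structural || $opensCell) && $cell !== null) {
            $row[] = trim($cell); // close the previous cell
            $cell = null;
        }
        if ($structural) {
            if ($row !== []) {
                $rows[] = $row; // close the current row
                $row = [];
            }
        } elseif ($opensCell) {
            $cell = substr($line, 1);
        } elseif ($cell !== null) {
            $cell .= "\n" . $line; // continuation of a multi-line cell
        }
    }
    return $rows;
}
```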
HTML Wrapper Support
// Parser automatically handles HTML wrappers
$wrappedTable = <<<HTML
<div class="noresize">
{| class="wikitable"
! colspan="6" |Shopping List
|-
| rowspan="2" |Bread & Butter
| Pie
| Buns
|}
</div>
HTML;
$parser = new ParserTable($wrappedTable);
$table = $parser->parse(); // Automatically strips <div> wrapper
// Access span attributes
$headers = $table->getHeaders();
echo $headers[0]->getAttributes()['colspan']; // "6"
$rows = $table->getRows();
echo $rows[0][0]->getAttributes()['rowspan']; // "2"
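One simple way to strip such a wrapper is to keep only the lines from the first one starting with {| to the last one starting with |}. The sketch below assumes the wrapper tags sit on their own lines; extractTableMarkup() is a hypothetical helper, not the library's wrapper detection.

```php
<?php
// Hypothetical sketch, not the library's code: returns only the "{| ... |}"
// block from text that may wrap the table in HTML such as
// <div class="noresize"> ... </div>. Keeps everything from the first line
// starting with "{|" to the last line starting with "|}".
function extractTableMarkup(string $text): ?string
{
    $lines = preg_split('/\R/', $text);
    $first = null;
    $last = null;
    foreach ($lines as $i => $line) {
        $trimmed = ltrim($line);
        if ($first === null && str_starts_with($trimmed, '{|')) {
            $first = $i;
        }
        if (str_starts_with($trimmed, '|}')) {
            $last = $i;
        }
    }
    if ($first === null || $last === null || $last < $first) {
        return null; // no complete table found
    }
    return implode("\n", array_slice($lines, $first, $last - $first + 1));
}
```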
Demo Interface
The library includes HTML demo interface files for testing:
- demo/index.html - Interactive frontend interface
- demo/parser.php - Backend API for parsing
Demo Features:
- Real-time parsing for all parser types
- Tabbed output (Parsed Data, JSON, Methods)
- Example library with pre-built syntax examples
- Responsive design for desktop and mobile
- Statistics display (cell counts, attributes, etc.)
- Error handling with detailed messages
Supported Parsers in Demo:
- Templates Parser
- Single Template Parser
- Table Parser (with full span and multi-line support)
- Internal Links Parser
- External Links Parser
- Citations Parser
- Categories Parser
- Sections Parser
Recent Enhancements
Table Parser Improvements (2025)
- Multi-line cell support: Handles paragraphs, lists, and complex content spanning multiple lines
- Rowspan/colspan detection: Proper span matrix handling for advanced table layouts
- HTML wrapper stripping: Automatically detects and removes <div>, <span>, and other HTML wrappers
- Enhanced attribute parsing: Support for complex CSS styles and accessibility attributes
- Content consistency: Unified getRawContent() and getContent() methods across all data models
- MediaWiki table compatibility: Tested against the official MediaWiki table examples
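The attribute parsing described above can be approximated by checking whether the text before a cell's first single | consists only of name=value pairs. The sketch below is a hypothetical illustration (parseCellAttributes() is not a library method) that accepts double-quoted, single-quoted and unquoted values.

```php
<?php
// Hypothetical sketch, not the library's implementation: splits a cell such
// as 'style="text-align: right;" | $3.50' into an attribute map and content.
// The prefix before the first "|" only counts as attributes when it consists
// entirely of name=value pairs, so pipes in links like [[Page|label]] stay
// part of the content.
function parseCellAttributes(string $cell): array
{
    $attributes = [];
    $content = trim($cell);
    $pipe = strpos($cell, '|');
    if ($pipe !== false) {
        $prefix = substr($cell, 0, $pipe);
        // Value forms: "double quoted", 'single quoted' or unquoted.
        $value = '(?:"([^"]*)"|\'([^\']*)\'|([^\s"\']+))';
        if (preg_match('/^\s*(?:[\w-]+\s*=\s*' . $value . '\s*)+$/', $prefix)) {
            preg_match_all('/([\w-]+)\s*=\s*' . $value . '/', $prefix, $m, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL);
            foreach ($m as $match) {
                // Exactly one of the three value groups matched; the others are null.
                $attributes[strtolower($match[1])] = $match[2] ?? $match[3] ?? $match[4] ?? '';
            }
            $content = trim(substr($cell, $pipe + 1));
        }
    }
    return ['attributes' => $attributes, 'content' => $content];
}
```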
Testing & Development
- Comprehensive test suite: 169 tests with 533 assertions (100% success rate)
- TableTest.php: 46 dedicated tests for the Table data model and the LegacyTableCompatibility trait
- Live demo interface: Real-time testing with JSON output and method documentation
- Official examples: Validated against MediaWiki.org documentation examples
Requirements
- PHP 8.1 or higher
- PHPUnit 10 (development dependency, for running the tests)
Installation
composer require wiki-connect/parsewiki
Make sure PSR-4 autoloading is set up for the WikiConnect\ParseWiki namespace; when the package is installed via Composer, this is normally handled by including vendor/autoload.php.
Running Tests
vendor/bin/phpunit tests