shaarli/netscape-bookmark-parser

Generic Netscape bookmark parser

v3.2.0 2021-01-31 09:39 UTC

About

This library provides a generic NetscapeBookmarkParser class capable of parsing Netscape bookmarks as exported by common web browsers and bookmarking services.

The motivations behind developing this parser are the following:

  • the Netscape format has a very loose specification: there is no DTD or XSL stylesheet to constrain how data is formatted
  • software and web services export bookmarks using a wide variety of attribute names and values
  • using standard SAX or DOM parsers is thus not straightforward

How it works:

  • the input bookmark file is trimmed and sanitized to improve parsing results
  • the resulting data is then parsed using PCRE patterns that match the most likely:
    • attribute names: description vs. note, tags vs. labels, date vs. time, etc.
    • data formats: comma,separated,tags vs. space separated labels, UNIX epochs vs. human-readable dates, newlines & carriage returns, etc.
  • an associative array containing all successfully parsed links with their attributes is returned
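
To make the approach above more concrete, here is a minimal sketch of pattern-based attribute matching. It is not the library's actual code: the function name sketchParseLinkAttributes, the attribute aliases it accepts, and the shape of the returned array are assumptions made for illustration only.

```php
<?php

/**
 * Illustrative sketch only: extracts a few attributes from a single <A ...> tag.
 * The function name and the accepted attribute aliases are assumptions,
 * not the library's implementation.
 */
function sketchParseLinkAttributes(string $line): array
{
    $link = ['uri' => '', 'tags' => [], 'time' => null];

    // The link target is read from the HREF attribute (case-insensitive).
    if (preg_match('/href="([^"]*)"/i', $line, $m)) {
        $link['uri'] = $m[1];
    }

    // Tags may be exported as "tags" or "labels",
    // separated by commas, whitespace, or both.
    if (preg_match('/\b(?:tags|labels?)="([^"]*)"/i', $line, $m)) {
        $link['tags'] = preg_split('/[\s,]+/', trim($m[1]), -1, PREG_SPLIT_NO_EMPTY);
    }

    // Dates may be UNIX epochs or human-readable strings.
    if (preg_match('/\b(?:add_date|date|time)="([^"]*)"/i', $line, $m)) {
        $link['time'] = ctype_digit($m[1])
            ? (int) $m[1]
            : (strtotime($m[1]) ?: null);
    }

    return $link;
}

$line = '<DT><A HREF="http://public.tld" ADD_DATE="1456433748" TAGS="public,hello world">Public stuff</A>';
var_dump(sketchParseLinkAttributes($line));
```

Trying several attribute names in one alternation keeps the matching tolerant of exporter differences, at the cost of having to decide which alias wins when a file uses more than one.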

Shaarli community fork

This friendly fork of the original netscape-bookmark-parser project by Kafene is maintained by the Shaarli community at https://github.com/shaarli/netscape-bookmark-parser and is used by the open-source Shaarli bookmarking service.

Installation

Using Composer (package):

composer require shaarli/netscape-bookmark-parser

Example

Script:

<?php

require_once 'vendor/autoload.php';

use Shaarli\NetscapeBookmarkParser\NetscapeBookmarkParser;

$parser = new NetscapeBookmarkParser();
$bookmarks = $parser->parseFile('./tests/input/netscape_basic.htm');
var_dump($bookmarks);

Output:

array(2) {
  [0] =>
  array(6) {
    'uri' =>
    string(19) "https://private.tld"
    'title' =>
    string(12) "Secret stuff"
    'note' =>
    string(52) "Super-secret stuff you're not supposed to know about"
    'tags' =>
    array(2) {
      [0] =>
      string(7) "private"
      [1] =>
      string(6) "secret"
    }
    'time' =>
    int(971175336)
    'pub' =>
    int(0)
  }
  [1] =>
  array(6) {
    'uri' =>
    string(17) "http://public.tld"
    'title' =>
    string(12) "Public stuff"
    'note' =>
    string(0) ""
    'tags' =>
    array(3) {
      [0] =>
      string(6) "public"
      [1] =>
      string(5) "hello"
      [2] =>
      string(5) "world"
    }
    'time' =>
    int(1456433748)
    'pub' =>
    int(1)
  }
}
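
Since the result is a plain array, it can be post-processed with ordinary PHP. The sketch below keeps only public bookmarks and groups their URIs by tag. The helper groupPublicUrisByTag is hypothetical and not part of the library; the sample data is hard-coded to mirror the output above so that the snippet runs on its own.

```php
<?php

/**
 * Hypothetical helper (not provided by the library): keeps only public
 * bookmarks and groups their URIs by tag, sorted by tag name.
 */
function groupPublicUrisByTag(array $bookmarks): array
{
    $byTag = [];
    foreach ($bookmarks as $bookmark) {
        if (empty($bookmark['pub'])) {
            continue; // skip private entries
        }
        foreach ($bookmark['tags'] as $tag) {
            $byTag[$tag][] = $bookmark['uri'];
        }
    }
    ksort($byTag);

    return $byTag;
}

// Sample data shaped like the parser output shown above.
$bookmarks = [
    [
        'uri' => 'https://private.tld',
        'title' => 'Secret stuff',
        'note' => "Super-secret stuff you're not supposed to know about",
        'tags' => ['private', 'secret'],
        'time' => 971175336,
        'pub' => 0,
    ],
    [
        'uri' => 'http://public.tld',
        'title' => 'Public stuff',
        'note' => '',
        'tags' => ['public', 'hello', 'world'],
        'time' => 1456433748,
        'pub' => 1,
    ],
];

$grouped = groupPublicUrisByTag($bookmarks);
print_r($grouped);
```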