diggin/diggin-http-charset

Detects the charset from the HTTP header and the HTML meta charset, and automatically converts the content to UTF-8.

Language: PHP

v0.8.0 2013-03-09 13:38 UTC

README

Automatically converts content to UTF-8.

Detects the charset from the charset parameter of the HTTP Content-Type header and from the HTML meta charset.

(Several charsets, such as SJIS-win and TIS-620, are handled with particular care.)
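The detect-then-convert idea can be illustrated with a minimal sketch in modern PHP (7.1+ for nullable types). `detectCharset()` and `toUtf8()` are hypothetical helpers, not part of this library's API. Preferring the header over the meta tag, and mapping a `Shift_JIS` label to `SJIS-win`, are assumptions made for this sketch only.

```php
<?php
// Hypothetical helper, not part of diggin-http-charset's API.
// Returns the declared charset, or null if none is found.
function detectCharset(?string $contentType, string $html): ?string
{
    // 1. Prefer the charset parameter of the Content-Type header,
    //    e.g. "text/html; charset=Shift_JIS" (assumed priority).
    if ($contentType !== null
        && preg_match('/charset\s*=\s*["\']?([\w.:-]+)/i', $contentType, $m)) {
        return $m[1];
    }
    // 2. Fall back to <meta charset="..."> or
    //    <meta http-equiv="Content-Type" content="...; charset=...">.
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', $html, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: converts $body to UTF-8 using the detected charset.
function toUtf8(string $body, ?string $charset): string
{
    if ($charset === null || strcasecmp($charset, 'UTF-8') === 0) {
        return $body; // nothing declared, or already UTF-8
    }
    // Assumption: pages labelled Shift_JIS often use Windows vendor
    // extensions, so decode them as SJIS-win instead.
    if (strcasecmp($charset, 'Shift_JIS') === 0) {
        $charset = 'SJIS-win';
    }
    return mb_convert_encoding($body, 'UTF-8', $charset);
}
```

A real implementation would also have to cope with byte-order marks and with pages whose declared charset is wrong.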

This library is intended for use in web scraping.

Requirements

  • PHP 5.3 or later
  • mbstring and iconv extensions

Usage

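  1. Wrap the response object:
<?php
use Diggin\Http\Charset\WrapperFactory;
use Zend\Http\Client;

require 'vendor/autoload.php'; // Composer autoloader
$url = 'http://example.com/'; // page to fetch
$client = new Client($url);
$response = $client->send();
// After wrapping, getBody() returns the body converted to UTF-8.
$response = WrapperFactory::wrapResponse($response);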

See demos/Diggin/Http/Charset for more examples.
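One way to picture the wrapping step is as a decorator: the wrapper forwards calls to the original response and converts the body lazily on first access. This is a minimal sketch, assuming a hypothetical `BodyResponse` interface and a caller-supplied converter. It is not how `WrapperFactory` is actually implemented, and real `Zend\Http\Response` objects expose a much larger API.

```php
<?php
// Hypothetical minimal response contract used only in this sketch.
interface BodyResponse
{
    public function getBody(): string;
    public function getContentType(): ?string;
}

// Decorator: delegates to the wrapped response but returns a UTF-8 body.
final class Utf8BodyWrapper implements BodyResponse
{
    private $inner;
    private $convert;       // callable(string $body, ?string $contentType): string
    private $cache = null;  // converted body, filled on first getBody() call

    public function __construct(BodyResponse $inner, callable $convert)
    {
        $this->inner = $inner;
        $this->convert = $convert;
    }

    public function getContentType(): ?string
    {
        return $this->inner->getContentType();
    }

    public function getBody(): string
    {
        // Convert once, on first access, and reuse the result afterwards.
        if ($this->cache === null) {
            $this->cache = ($this->convert)(
                $this->inner->getBody(),
                $this->getContentType()
            );
        }
        return $this->cache;
    }
}
```

Caching the converted string means repeated getBody() calls pay the conversion cost only once.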

Guzzle & Goutte

guzzle-plugin-AutoCharsetEncodingPlugin provides support for Guzzle 3.

Usage with Behat, by @MugeSo

Technical Information

Diggin_Http_Charset is based on HTMLScraping.

License

Diggin_Http_Charset is licensed under the LGPL (GNU Lesser General Public License).

Similar library

TODOs

  • Handle content types other than text/html.
  • Better APIs that follow the ZF2 coding standard.
  • Support more charsets :-\