diggin/diggin-http-charset

Detects the charset from the HTTP header and the HTML meta charset, then automatically converts the body to UTF-8.

1.0.0 2019-05-06 04:57 UTC

This package is auto-updated.

Last update: 2022-10-06 11:58:06 UTC


README

Automatically converts HTTP response bodies to UTF-8.


Detection is based on the charset in the HTTP header and the charset in the HTML meta tag.

(Several charsets are handled with extra care: SJIS-win, TIS-620, and others.)

This library is intended for use in web scraping.
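The detection order described above can be illustrated with a minimal sketch. This is not the library's implementation: the function names `sketch_detect_charset` and `sketch_to_utf8` are hypothetical, the regular expressions are simplified, and none of the special handling for charsets such as SJIS-win or TIS-620 is attempted. It assumes the detected name is one that mbstring accepts.

```php
<?php
// Hypothetical helper, NOT part of Diggin\Http\Charset.
// Looks at the Content-Type header first, then at the HTML meta tag.
function sketch_detect_charset($contentType, $body)
{
    // 1. charset parameter of the header, e.g. "text/html; charset=Shift_JIS"
    if ($contentType !== null
        && preg_match('/charset\s*=\s*["\']?([\w.:-]+)/i', $contentType, $m)) {
        return $m[1];
    }
    // 2. <meta charset="..."> or <meta http-equiv="Content-Type" content="...; charset=...">
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', $body, $m)) {
        return $m[1];
    }
    return null; // nothing declared
}

// Hypothetical helper: converts the body to UTF-8 when a charset was declared.
function sketch_to_utf8($contentType, $body)
{
    $charset = sketch_detect_charset($contentType, $body);
    if ($charset === null || strcasecmp($charset, 'UTF-8') === 0) {
        return $body; // assumption: leave undeclared or already-UTF-8 bodies untouched
    }
    // Assumption: $charset is a name mbstring knows; real pages often need alias handling.
    return mb_convert_encoding($body, 'UTF-8', $charset);
}
```

For example, `sketch_to_utf8(null, $html)` falls back to the meta tag when the server sends no charset in the header. Real pages often declare a charset that does not match the actual bytes, which is the kind of case the library's careful handling is meant to address.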

Requirements

  • PHP 5.3 or later
  • mbstring and iconv
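A quick way to confirm that an environment meets these requirements is a small pre-flight check. The sketch below is only an illustration: `sketch_missing_requirements` is a hypothetical name, not something the package provides.

```php
<?php
// Hypothetical pre-flight check, NOT part of the library.
// Returns a list of unmet requirements (empty when everything is available).
function sketch_missing_requirements()
{
    $missing = array();
    if (version_compare(PHP_VERSION, '5.3.0', '<')) {
        $missing[] = 'PHP >= 5.3';
    }
    foreach (array('mbstring', 'iconv') as $ext) {
        if (!extension_loaded($ext)) {
            $missing[] = $ext;
        }
    }
    return $missing;
}
```

Calling `sketch_missing_requirements()` before installing or running a scraper gives a readable list of what still needs to be enabled.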

Usage

  1. Wrap the response object:
<?php
use Diggin\Http\Charset\WrapperFactory;

// $url is the address of the page to fetch.
$client = new Zend\Http\Client($url);
$response = $client->send();

// After wrapping, $response->getBody() returns the body converted to UTF-8.
$response = WrapperFactory::factory($response);

See demos/Diggin/Http/Charset for more examples.

Guzzle & Goutte

guzzle-plugin-AutoCharsetEncodingPlugin adds support for use with Guzzle 3.

Usage with Behat, by @MugeSo

Technical Information

Diggin_Http_Charset is based on HTMLScraping.

License

Diggin_Http_Charset is licensed under the LGPL (GNU Lesser General Public License).

Similar library

TODOs

  • Handle content types other than text/html.
  • Better APIs that follow the ZF2 coding standard.
  • Handle more charsets :-\