diggin / diggin-http-charset
Detects the charset from the HTTP header and the HTML meta charset, then automatically converts the body to UTF-8.
Installs: 3 055
Dependents: 4
Suggesters: 0
Security: 0
Stars: 6
Watchers: 2
Forks: 5
Open Issues: 0
Requires
- php: >=5.3.3
Requires (Dev)
- phpunit/phpunit: ~4.3
- symfony/browser-kit: ~2.4
- zendframework/zend-http: ~2.2
This package is auto-updated.
Last update: 2024-11-06 16:42:22 UTC
README
Automatically converts response bodies to UTF-8.
Detection is based on the charset in the HTTP header and the charset in the HTML meta tag.
(Several charsets are handled with extra care: SJIS-win, TIS-620 and others.)
This library is intended for use in web scraping.
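To make the idea concrete, here is a minimal sketch of header-then-meta detection followed by conversion to UTF-8. It is not the library's implementation: the function names sketch_detect_charset and sketch_to_utf8, the precedence order (header first, then meta, then mbstring's detector), and the alias table are all assumptions chosen for illustration. It relies only on the mbstring and iconv extensions.

```php
<?php
// Illustrative sketch only. These helpers are hypothetical and are NOT part
// of the Diggin\Http\Charset API.

/**
 * Guess the charset of an HTML response.
 * Assumed order: Content-Type header, then <meta>, then mbstring detection.
 */
function sketch_detect_charset($contentType, $body)
{
    // 1. charset parameter of the header, e.g. "text/html; charset=Shift_JIS"
    if (preg_match('/charset\s*=\s*["\']?([A-Za-z0-9_\-]+)/i', (string) $contentType, $m)) {
        return $m[1];
    }
    // 2. <meta charset="..."> or <meta http-equiv=... content="...; charset=...">
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_\-]+)/i', $body, $m)) {
        return $m[1];
    }
    // 3. Last resort: let mbstring guess from a short candidate list (strict mode)
    $guess = mb_detect_encoding($body, array('UTF-8', 'SJIS-win', 'eucJP-win', 'ISO-8859-1'), true);
    return $guess !== false ? $guess : 'UTF-8';
}

/**
 * Convert a response body to UTF-8 using the detected charset.
 */
function sketch_to_utf8($contentType, $body)
{
    $charset = sketch_detect_charset($contentType, $body);

    // Assumed alias table: pages that declare Shift_JIS or EUC-JP often contain
    // vendor-extended characters, so prefer the "-win" variants.
    $aliases = array('shift_jis' => 'SJIS-win', 'x-sjis' => 'SJIS-win', 'euc-jp' => 'eucJP-win');
    $key = strtolower($charset);
    if (isset($aliases[$key])) {
        $charset = $aliases[$key];
        $key = strtolower($charset);
    }
    if ($key === 'utf-8' || $key === 'utf8') {
        return $body; // nothing to convert
    }

    // Use mbstring when it lists the encoding by name; otherwise try iconv.
    $known = array_map('strtolower', mb_list_encodings());
    if (in_array($key, $known, true)) {
        return mb_convert_encoding($body, 'UTF-8', $charset);
    }
    $converted = @iconv($charset, 'UTF-8//IGNORE', $body);
    return $converted !== false ? $converted : $body; // leave the body untouched on failure
}
```

Splitting detection from conversion keeps the precedence rules easy to change. A real scraper would likely also want to validate the result and handle conflicting header and meta declarations, which this sketch ignores.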
Requirements
- PHP 5.3 or later
- mbstring and iconv
Usage
- Wrap the response object:

    <?php
    use Diggin\Http\Charset\WrapperFactory;

    $client = new Zend\Http\Client($url);
    $response = $client->send();
    $response = WrapperFactory::factory($response);
    // From here on, $response->getBody() returns the body converted to UTF-8.

For more examples, see demos/Diggin/Http/Charset.
Guzzle & Goutte
guzzle-plugin-AutoCharsetEncodingPlugin adds support for use with Guzzle 3.
Usage with Behat, by @MugeSo
Technical Information
Diggin_Http_Charset is based on HTMLScraping.
License
Diggin_Http_Charset is licensed under the LGPL (GNU Lesser General Public License).
Similar libraries
- Perl: HTTP::Response::Encoding
- Python: Universal Encoding Detector
TODOs
- Handle non-text/html content types.
- Provide better APIs and follow the ZF2 coding standard.
- Cope with more charsets :-\