simbiat / htmlcache
Caching library specifically designed for web-pages.
Requires
- php: ^8
Suggests
- simbiat/http20: Provides 'zEcho' for direct output of cached data
This package is auto-updated.
Last update: 2025-01-17 00:06:18 UTC
README
This is a caching library designed specifically for caching web pages. It is not compliant with PSR-6 in terms of format, although some functionality may be similar. It is also not meant as a replacement for other caching solutions like Symfony's Cache, since it is a niche solution rather than a more generic one.
Why?
While caching seems to be a long-solved problem with lots of good solutions, I've found 2 points that did not sit right with my desire to serve consistent web pages:
- If the codebase serving a page changed, the cached version of the page was not invalidated automatically, meaning a chance to serve stale content.
- If I am to serve identical headers, I would need to cache them separately and link them somehow to the cached content through extra logic.
Some may say that these are minor issues, and they may be right that these are unlikely to cause an actual problem, but for me both mean that unless I add extra logic accompanying Symfony\Cache (used previously), I will serve stale content with new headers, which can worsen user experience. I do not want that.
What?
That's why I've decided to write my own solution, designed specifically for web pages. The first problem is solved by calculating the version of the code base like this:
$usedFiles = get_included_files();
$this->version = count($usedFiles).'.'.max(max(array_map('filemtime', array_filter($usedFiles, 'is_file'))),
Then this version is stored in the cache for each designated key. When data is read from the cache, the stored version is compared with the current one, and if they do not match, it is treated as a cache miss. This is not a 100% guarantee, but since I have a main "gateway" file that handles all the page generation, it works for me just fine.
The second problem is solved by caching the headers sent along with the data. Once the data is retrieved, the same headers are sent again.
That's not all. It has other features that may be useful:
- Optional zipping of the data in cache to reduce footprint
- Direct output using the zEcho function from the HTTP20 library, if available (otherwise regular echo will be used)
- Validation of cache using hash to reduce chances of serving corrupted cache
- Techniques to negate cache-slamming (at least, to an extent)
- Automatically tries to use either APCu or file storage. Memcache and the like are not supported, since I do not have appropriate infrastructure to test them.
- Transparent in case of failures, so that your webpage will still be displayed (generally)
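As a rough illustration of the version check described above, the sketch below derives a version string from the files loaded by the current request and compares it with the one stored next to a cached page. The function names codebaseVersion and isFresh and the entry layout are hypothetical, and the formula (file count plus newest modification time) is a simplification, not necessarily the exact one the library uses.

```php
<?php
// Hypothetical helper: build a version string from the files PHP has loaded so far.
function codebaseVersion(): string
{
    // Keep only paths that still exist as regular files.
    $usedFiles = array_filter(get_included_files(), 'is_file');
    // Newest modification time among them (0 if the list is empty).
    $newest = $usedFiles === [] ? 0 : max(array_map('filemtime', $usedFiles));
    return count($usedFiles).'.'.$newest;
}

// Hypothetical check: a cached entry is usable only if it was stored under the same version.
function isFresh(array $entry, string $currentVersion): bool
{
    return ($entry['version'] ?? null) === $currentVersion;
}

$version = codebaseVersion();
// Assumed entry layout: body and headers are stored together with the version.
$entry = [
    'version' => $version,
    'headers' => ['Content-Type: text/html'],
    'body' => '<p>Hello</p>',
];

var_dump(isFresh($entry, $version));      // same code base: usable
var_dump(isFresh($entry, $version.'-x')); // changed code base: treated as a miss
```

Because headers live in the same entry as the body, a version mismatch discards both at once, so old content is never paired with new headers.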
How?
Here's a simple example of how I'm using it. Scroll further for more details on the functions.
#Create HTMLCache object
$HTMLCache = (new \Simbiat\HTMLCache($siteconfig['cachedir'].'html/'));
#Attempt to use cache
$HTMLCache->get('', true, true, true);
#Do some processing in case cache was not hit, to get $output
#Save to cache and output directly
if ($uri[1] === 'statistics') {
    $HTMLCache->set($output, '', 604800, 600, true, true);
} elseif ($uri[1] === 'achievement') {
    $HTMLCache->set($output, '', 259200, 600, true, true);
}
#Output page if not required to cache
(new \Simbiat\HTTP20\Common)->zEcho($output);
Details on usage
Construct
__construct(string $filesPool = '', bool $apcu = false, int $maxRandom = 1)
When creating the object, you can specify the path where the cache files will be stored using $filesPool. If it is empty, the class will not use file storage for caching. In that case you need to explicitly enable APCu caching by setting $apcu to true. It is set to false by default due to potential limitations in the resources you may have.
In order to negate cache slamming, the class reduces the expiration date during validation by a random amount from 0 to $maxRandom, which defaults to 1 minute. You can adjust this number or use 0 to essentially disable this feature (not advisable).
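The random reduction can be pictured with a small sketch. The helper name jitteredExpiry is hypothetical, and the unit of $maxRandom is assumed here to match the unit of the timestamp; the library's internals may differ.

```php
<?php
// Hypothetical helper: shorten an expiry timestamp by a random amount
// between 0 and $maxRandom (same unit as the timestamp; an assumption).
function jitteredExpiry(int $expires, int $maxRandom = 1): int
{
    if ($maxRandom <= 0) {
        // 0 disables the jitter entirely.
        return $expires;
    }
    return $expires - random_int(0, $maxRandom);
}

$expires = time() + 3600;
// Each request may see a slightly different expiry, so concurrent requests
// are less likely to regenerate the page at the same instant.
$effective = jitteredExpiry($expires, 60);
echo $expires - $effective, PHP_EOL; // some value from 0 to 60
```

Spreading the effective expiry over a small window is what keeps many simultaneous visitors from all missing the cache together.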
Set
set(string $string, string $key = '', int $ttl = 60, int $grace = 100, bool $zip = true, bool $direct = true, string $cacheStrat = '')
Use set to write to cache. $string is the only mandatory value. Since the class is designed for HTML pages, the type of the value is restricted to string only.
$key is an optional ID under which the value will be stored. If empty, the current REQUEST_URI will be used (if that is empty too, index.php will be used). Regardless, the value will be hashed for consistency.
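The fallback order for the key can be sketched as follows. The helper name cacheKey is hypothetical, the server array is passed in explicitly so the example runs outside a web request, and SHA-256 is an assumption: the hash algorithm the library actually uses is not stated here.

```php
<?php
// Hypothetical helper mirroring the fallback order described above.
// The hash algorithm (SHA-256 here) is an assumption.
function cacheKey(string $key = '', array $server = []): string
{
    if ($key === '') {
        // Fall back to the request URI, then to index.php.
        $key = $server['REQUEST_URI'] ?? '';
        if ($key === '') {
            $key = 'index.php';
        }
    }
    return hash('sha256', $key);
}

echo cacheKey('', ['REQUEST_URI' => '/statistics']), PHP_EOL;
echo cacheKey('', []), PHP_EOL; // same as hashing 'index.php'
```

Hashing gives every key the same length and character set, which is convenient when the key doubles as a file name.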
$ttl is the time to live for the cached value. After it expires, the value will be considered stale. Defaults to 60, that is 1 hour.
$grace is an optional grace period to help with cache slamming. When a cache hit is successful but the entry has expired, the class updates the expiration value to time()+$grace and sets $grace = 0. This helps with concurrent requests: they will still receive the stale data for some extra seconds after its expiration, while the initial hit updates the cache. Default is 1 minute.
$zip will GZIP the body and headers of the page to save some space. With current processing power and the average size of HTML pages, this is a very fast operation, which can help you cache more in memory and on disk. You can disable it by setting it to false.
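To give a feel for the effect of zipping, here is a small round trip using gzencode() and gzdecode() from PHP's zlib extension. The payload layout (headers and body encoded together as JSON) is an assumption made for the example, not the library's storage format.

```php
<?php
// Assumed payload layout: headers and body serialized together before compression.
$payload = json_encode([
    'headers' => ['Content-Type: text/html; charset=utf-8'],
    'body' => str_repeat('<p>Hello, world!</p>', 200),
]);

// gzencode()/gzdecode() require the zlib extension.
$zipped = gzencode($payload, 9);
$restored = json_decode(gzdecode($zipped), true);

printf("raw: %d bytes, gzipped: %d bytes\n", strlen($payload), strlen($zipped));
```

Repetitive markup compresses very well, which is why HTML pages are good candidates for this option.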
$direct, if set to true, will output the webpage right after the cache is written. Since we are dealing with webpages, there is not much reason to do something after we have a generated page, but you can disable this behaviour and, instead, receive a boolean value representing the result of the function.
$cacheStrat is used to set up the Cache-Control header (using the appropriate function) if you are using zEcho. This value will also be cached.
Get
get(string $key = '', bool $scriptVersion = true, bool $direct = true, bool $staleReturn = false)
Use get to get the cached data.
$key is an optional ID under which the value was stored. If empty, the current REQUEST_URI will be used (if that is empty too, index.php will be used). Regardless, the value will be hashed for consistency.
$scriptVersion, if set to true, will force validation of the codebase version, as described above. Since this is more of a personal preference, you can disable this feature.
$direct, if set to true, will output the webpage right after getting the page. Since we are dealing with webpages, there is not much reason to disable this, but you can do so and, instead, receive an array representing all the cached data. I doubt it will be useful outside the class, though.
$staleReturn, if set to true, allows you to manually serve stale content while generating new content. In a way, this is similar to how stale-while-revalidate in the Cache-Control header works.
cacheOutput
cacheOutput(array $data, bool $exit = true)
Use to manually output the data returned by get.
$data is the array returned by get.
$exit flag allows canceling the automatic exit after output (default), in case you plan to do something after it (very useful when using get with $staleReturn = true).
Delete
delete(string $key = '')
Use delete to remove a cached item.
$key is an optional ID under which the value was stored. If empty, the current REQUEST_URI will be used (if that is empty too, index.php will be used). Regardless, the value will be hashed for consistency.
Garbage collection
gc(int $maxAge = 60, int $maxSize = 1024)
This garbage collection function removes entries older than $maxAge minutes (60 by default), or the oldest entries until the total size is less than $maxSize megabytes.
Modification time is checked for this, meaning that only cache entries that have not been used for that amount of time will be affected. You should adjust this value based on the longest cache time you have in your project. Alternatively, you can disable the feature by setting the value to 0.
Will also remove empty directories, when file storage is used.
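The selection logic can be sketched as a pure function over file metadata. The name selectForRemoval and the array layout are hypothetical. For brevity the sketch uses seconds and bytes instead of minutes and megabytes, and it applies both limits in sequence, which is one possible reading of the description above.

```php
<?php
// Hypothetical selection step for garbage collection.
// $files maps a path to ['mtime' => Unix timestamp, 'size' => bytes].
// Assumed units: $maxAge in seconds, $maxSize in bytes; 0 disables a limit.
function selectForRemoval(array $files, int $now, int $maxAge, int $maxSize): array
{
    $remove = [];
    // Pass 1: anything not modified within $maxAge goes.
    if ($maxAge > 0) {
        foreach ($files as $path => $info) {
            if ($now - $info['mtime'] > $maxAge) {
                $remove[] = $path;
                unset($files[$path]);
            }
        }
    }
    // Pass 2: drop the oldest survivors until the total size is below the limit.
    if ($maxSize > 0) {
        uasort($files, fn(array $a, array $b): int => $a['mtime'] <=> $b['mtime']);
        $total = array_sum(array_column($files, 'size'));
        foreach ($files as $path => $info) {
            if ($total < $maxSize) {
                break;
            }
            $remove[] = $path;
            $total -= $info['size'];
        }
    }
    return $remove;
}

$files = [
    'a.cache' => ['mtime' => 100, 'size' => 50],
    'b.cache' => ['mtime' => 900, 'size' => 70],
    'c.cache' => ['mtime' => 950, 'size' => 40],
];
// a.cache is too old; b.cache is the oldest survivor and is dropped to fit the size limit.
print_r(selectForRemoval($files, 1000, 500, 100));
```

Keeping the selection separate from the actual deletion makes the policy easy to test without touching the file system.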