andrew-svirin / replace-files-duplicates-php
PHP library to scan for duplicate files and replace them with soft links or hard links.
1.0.0
2019-09-26 15:41 UTC
Requires
- php: ^7.2
Requires (Dev)
- phpunit/phpunit: ^6.5
This package is auto-updated.
Last update: 2024-12-27 03:50:02 UTC
README
Replace duplicate files with links.
Overview
The script scans directories for identical files and replaces the newer copies with a hard or soft link to the older one. A hard link allows the original file to be removed without affecting the linked instance, but modifying either the file or the linked instance affects both. This makes it a useful tool for reducing storage size by removing copies.
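The difference between the two link types can be checked with PHP's built-in `link()` and `symlink()` functions. The following is a minimal sketch, assuming a Unix-like filesystem and a writable temporary directory. The file names are made up for illustration and are not part of the library.

```php
<?php
// Demonstrates hard-link vs. soft-link behaviour on a Unix-like filesystem.
$dir = sys_get_temp_dir() . '/links_demo_' . uniqid();
mkdir($dir);

$original = $dir . '/original.txt';
$hard = $dir . '/hard.txt';
$soft = $dir . '/soft.txt';

file_put_contents($original, 'v1');
link($original, $hard);    // hard link: a second name for the same inode
symlink($original, $soft); // soft link: a pointer to the original path

// Writing through the hard link changes what the original path shows.
file_put_contents($hard, 'v2');
$seenViaOriginal = file_get_contents($original);

// Removing the original keeps the hard link readable,
// while the soft link is left dangling.
unlink($original);
clearstatcache();
$hardSurvives = file_get_contents($hard) === 'v2';
$softDangling = is_link($soft) && !file_exists($soft);

// Clean up the demo files.
unlink($hard);
unlink($soft);
rmdir($dir);
```

Hard links only work within one filesystem, so a soft link is the usual choice when the duplicate and the original live on different mounts.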
Usage
Define the scan storage, the cache storage, and the service.
    $scanStorage = new UnixScanStorage([__DIR_PATH_FOR_SCAN__]);
    $cacheStorage = new FileIndexStorage(__DIR_PATH_FOR_CACHE_STORAGE__);
    $replacementService = new ReplacementService($scanStorage, $cacheStorage);
Scan directories to build the index.
    $replacementService->scan(function (Record $file) {
        // Hash is the concatenation of the file size, the byte at offset 10,
        // and the byte 10 bytes before the end of the file.
        $fp = fopen($file->path, 'r');
        fseek($fp, 10);
        $firstChar = fgetc($fp);
        fseek($fp, -10, SEEK_END);
        $lastChar = fgetc($fp);
        fclose($fp);
        $fileSize = filesize($file->path);
        $hash = $fileSize . ord($firstChar) . ord($lastChar);
        return $hash;
    }, function (?string $hashA = null, ?string $hashB = null) {
        // Compare hashes.
        $result = strnatcmp($hashA, $hashB);
        return $result;
    }, function (Record $file) {
        // Filter only txt files for scan.
        $ext = pathinfo($file->path, PATHINFO_EXTENSION);
        return in_array($ext, ['txt'], true);
    });
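A hash built from the file size and two sampled bytes is cheap, but two files that differ elsewhere can share it. One way to guard against that is to treat matching hashes as candidates only and confirm them with a full-content hash. The sketch below is standalone and does not use the library's API: `cheapFingerprint()` and `findIdenticalFiles()` are hypothetical helper names, and the clamped offsets and `:` separator are assumptions added so that short files and ambiguous concatenations are handled.

```php
<?php
declare(strict_types=1);

/**
 * Cheap fingerprint: file size plus one byte sampled near each end.
 * The ':' separator keeps "12" . "3" distinct from "1" . "23".
 */
function cheapFingerprint(string $path): string
{
    $size = filesize($path);
    $fp = fopen($path, 'rb');

    // Clamp the offsets so files shorter than 11 bytes still work.
    $headOffset = min(10, max(0, $size - 1));
    $tailOffset = max(0, $size - 10);

    fseek($fp, $headOffset);
    $head = fgetc($fp);
    fseek($fp, $tailOffset);
    $tail = fgetc($fp);
    fclose($fp);

    // fgetc() returns false on an empty file; encode that as -1.
    return implode(':', [
        $size,
        $head === false ? -1 : ord($head),
        $tail === false ? -1 : ord($tail),
    ]);
}

/**
 * Groups paths by cheap fingerprint, then confirms every candidate
 * group with a SHA-256 hash of the full content.
 *
 * @param string[] $paths
 * @return string[][] groups of paths with identical content
 */
function findIdenticalFiles(array $paths): array
{
    $candidates = [];
    foreach ($paths as $path) {
        $candidates[cheapFingerprint($path)][] = $path;
    }

    $groups = [];
    foreach ($candidates as $sameFingerprint) {
        if (count($sameFingerprint) < 2) {
            continue; // unique fingerprint, cannot be a duplicate
        }
        $byContent = [];
        foreach ($sameFingerprint as $path) {
            $byContent[hash_file('sha256', $path)][] = $path;
        }
        foreach ($byContent as $sameContent) {
            if (count($sameContent) > 1) {
                $groups[] = $sameContent;
            }
        }
    }

    return $groups;
}
```

The full hash is computed only for files whose fingerprint already collides, so most files are never read in full.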
Find duplicates after the scan and replace them with hard links.
    $duplicatesGen = $replacementService->findDuplicates();
    while (($records = $duplicatesGen->current())) {
        $replacementService->replaceDuplicatesHard($records);
        $duplicatesGen->next();
    }
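How `replaceDuplicatesHard()` works internally is not shown in this README. As an illustration only, the sketch below shows one way a single duplicate could be swapped for a hard link without the duplicate path ever going missing: create the link under a temporary name, then rename it over the duplicate. `replaceWithHardLink()` is a hypothetical helper, and the sketch assumes a Unix-like system with both files on the same filesystem.

```php
<?php
declare(strict_types=1);

/**
 * Replaces $duplicate with a hard link to $original.
 * The link is first created under a temporary name and then renamed
 * over the duplicate, so the duplicate path exists at every moment.
 */
function replaceWithHardLink(string $original, string $duplicate): void
{
    $temp = $duplicate . '.' . uniqid('link', true);

    if (!link($original, $temp)) {
        throw new RuntimeException("Cannot link $original to $temp");
    }

    if (!rename($temp, $duplicate)) {
        unlink($temp); // roll back the temporary link
        throw new RuntimeException("Cannot replace $duplicate");
    }
}
```

After the call both paths refer to the same inode, so the storage used by the former copy can be released by the filesystem.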
Calculate the size of duplicates in bytes.
    $duplicatesGen = $replacementService->findDuplicates();
    $duplicateSize = 0;
    $linkBlock = 1;
    while (($records = $duplicatesGen->current())) {
        /** @var Record $record */
        $record = reset($records);
        $stat = stat($record->path);
        if (0 < $stat['blocks']) {
            $duplicateSize += ($stat['blocks'] * 512) - $stat['blksize'];
        }
        $duplicatesGen->next();
    }
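PHP's `stat()` reports `blocks` as the number of 512-byte blocks allocated to a file, which is why the snippet above multiplies by 512. As a variation, the sketch below estimates how many bytes could be reclaimed from one group of duplicate paths: it keeps the first file, counts the allocated size of every other distinct inode, and skips paths that are already hard links to a file it has seen. `reclaimableBytes()` is a hypothetical helper, not part of the library, and it assumes a Unix-like system where `blocks` is available.

```php
<?php
declare(strict_types=1);

/**
 * Estimates the bytes freed if every path in the group were replaced
 * by a hard link to the first one.
 *
 * @param string[] $paths paths of files with identical content
 */
function reclaimableBytes(array $paths): int
{
    clearstatcache();
    $seenInodes = [];
    $bytes = 0;

    foreach ($paths as $path) {
        $stat = stat($path);
        $inodeKey = $stat['dev'] . ':' . $stat['ino'];

        if (isset($seenInodes[$inodeKey])) {
            continue; // already a hard link to a file seen earlier
        }
        $seenInodes[$inodeKey] = true;

        // The first distinct inode is the copy that is kept.
        if (count($seenInodes) > 1 && $stat['blocks'] > 0) {
            $bytes += $stat['blocks'] * 512;
        }
    }

    return $bytes;
}
```

Counting by inode avoids reporting savings for files that a previous run has already linked.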