andrew-svirin/replace-files-duplicates-php

PHP library that scans for duplicate files and replaces them with soft links or hard links.

1.0.0 2019-09-26 15:41 UTC

This package is auto-updated.

Last update: 2024-12-27 03:50:02 UTC


README

Replace duplicate files with links.

Overview

This script scans directories for identical files and replaces the newer copies with a hard or soft link to the older one. With a hard link, the original file can be removed without affecting the linked instance, but a modification made through either the file or the linked instance affects both. It is a useful tool for reducing storage usage by removing copies.
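
The difference between the two link types can be seen with PHP's built-in `link()` and `symlink()` functions. The following is a minimal sketch, independent of this library, that assumes a Unix-like filesystem and a writable temporary directory; the file names are chosen only for illustration.

```php
<?php
// Sketch only: demonstrates link semantics on a Unix-like filesystem.
$dir = sys_get_temp_dir() . '/links-demo-' . uniqid();
mkdir($dir);

$original = $dir . '/original.txt';
$hard = $dir . '/hard.txt';
$soft = $dir . '/soft.txt';

file_put_contents($original, 'first');
link($original, $hard);     // hard link: a second name for the same inode
symlink($original, $soft);  // soft link: a pointer to the path of the original

// A write through one name is visible through the other names.
file_put_contents($hard, 'second');
$viaOriginal = file_get_contents($original);

// Removing the original keeps the hard link readable, but the soft link dangles.
unlink($original);
$hardSurvives = file_get_contents($hard) === 'second';
$softDangles = is_link($soft) && !file_exists($soft);

// Clean up the demo files.
unlink($hard);
unlink($soft);
rmdir($dir);
```

Because a soft link stores only a path, it breaks when the target is removed or moved, whereas a hard link keeps the data alive until the last name is deleted.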

Usage

Define the scan storage, the cache storage, and the service.

      $scanStorage = new UnixScanStorage([__DIR_PATH_FOR_SCAN__]);
      $cacheStorage = new FileIndexStorage(__DIR_PATH_FOR_CACHE_STORAGE__);
      $replacementService = new ReplacementService($scanStorage, $cacheStorage);

Scan the directories to build the index.

      $replacementService->scan(function (Record $file)
      {
         // Hash is the concatenation of the file size, the byte at offset 10,
         // and the byte 10 positions before the end of the file.
         $fp = fopen($file->path, 'r');
         fseek($fp, 10);
         $firstChar = fgetc($fp);
         fseek($fp, -10, SEEK_END);
         $lastChar = fgetc($fp);
         fclose($fp);
         $fileSize = filesize($file->path);
         // fgetc() returns false past the end of short files; cast so ord() yields 0.
         $hash = $fileSize . ord((string) $firstChar) . ord((string) $lastChar);
         return $hash;
      }, function (?string $hashA = null, ?string $hashB = null)
      {
         // Compare hashes in natural order; null is treated as an empty string.
         $result = strnatcmp((string) $hashA, (string) $hashB);
         return $result;
      }, function (Record $file)
      {
         // Restrict the scan to txt files.
         $ext = pathinfo($file->path, PATHINFO_EXTENSION);
         return in_array($ext, ['txt'], true);
      });
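
The hash callback above samples only the file size and two bytes, so two different files can produce the same hash. Because replacement discards one of the copies, a digest of the full content may be a safer choice, at the cost of reading every file. The sketch below shows one way to build such a hash with PHP's `hash_file()`; the helper name `contentHash` is hypothetical and not part of this library.

```php
<?php
/**
 * Hypothetical helper: builds a hash string from the file size and a
 * SHA-256 digest of the whole file content.
 */
function contentHash(string $path): string
{
    $size = filesize($path);
    $digest = hash_file('sha256', $path);
    if ($size === false || $digest === false) {
        throw new RuntimeException("Cannot read file: {$path}");
    }
    // The separator keeps the size and the digest unambiguous.
    return $size . ':' . $digest;
}

// Demo: $a and $b are identical; $c has the same size but different content.
$a = tempnam(sys_get_temp_dir(), 'dup');
$b = tempnam(sys_get_temp_dir(), 'dup');
$c = tempnam(sys_get_temp_dir(), 'dup');
file_put_contents($a, 'hello world, hello world!');
file_put_contents($b, 'hello world, hello world!');
file_put_contents($c, 'hello world; hello world!');

$hashA = contentHash($a);
$hashB = contentHash($b);
$hashC = contentHash($c);

// Clean up the demo files.
unlink($a);
unlink($b);
unlink($c);
```

Assuming the scan callback only needs to return a string, as in the example above, its body could be reduced to `return contentHash($file->path);`.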

Find duplicates after the scan and replace them with hard links.

      $duplicatesGen = $replacementService->findDuplicates();
      while (($records = $duplicatesGen->current()))
      {
         $replacementService->replaceDuplicatesHard($records);
         $duplicatesGen->next();
      }
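
To check that a replacement took effect, the device and inode numbers reported by `stat()` can be compared: two hard-linked names share both. This is a minimal sketch that assumes a Unix-like filesystem; the helper `isSameInode` is hypothetical and not part of this library.

```php
<?php
/**
 * Hypothetical helper: true when both paths refer to the same file on disk.
 * stat() follows soft links, so a soft link and its target also match.
 */
function isSameInode(string $pathA, string $pathB): bool
{
    $a = stat($pathA);
    $b = stat($pathB);
    if ($a === false || $b === false) {
        return false;
    }
    return $a['dev'] === $b['dev'] && $a['ino'] === $b['ino'];
}

// Demo: a hard link shares the inode, a plain copy does not.
$dir = sys_get_temp_dir() . '/inode-demo-' . uniqid();
mkdir($dir);
file_put_contents($dir . '/a.txt', 'data');
link($dir . '/a.txt', $dir . '/a-link.txt');
copy($dir . '/a.txt', $dir . '/a-copy.txt');

$linked = isSameInode($dir . '/a.txt', $dir . '/a-link.txt');
$copied = isSameInode($dir . '/a.txt', $dir . '/a-copy.txt');
$linkCount = stat($dir . '/a.txt')['nlink'];

// Clean up the demo files.
unlink($dir . '/a.txt');
unlink($dir . '/a-link.txt');
unlink($dir . '/a-copy.txt');
rmdir($dir);
```

The `nlink` field gives the number of names that point at the inode, which is another quick indicator that a file has been hard-linked.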

Calculate the size of the duplicates in bytes.

      $duplicatesGen = $replacementService->findDuplicates();
      $duplicateSize = 0;
      while (($records = $duplicatesGen->current()))
      {
         /** @var Record $record */
         $record = reset($records);
         $stat = stat($record->path);
         // 'blocks' is the number of 512-byte blocks allocated to the file.
         if (0 < $stat['blocks'])
         {
            $duplicateSize += ($stat['blocks'] * 512) - $stat['blksize'];
         }
         $duplicatesGen->next();
      }