buonzz / scalp
Command line tool to analyze and store your media files' metadata
Requires
- php: >=5.4
- cwhite92/b2-sdk-php: ^1.2
- guzzlehttp/guzzle: ^6.2
- james-heinrich/getid3: 1.9.*
- masterexploder/phpthumb: ^2.1
- monolog/monolog: ^1.19
- symfony/console: ^3.0
- symfony/stopwatch: ^3.0
- vlucas/phpdotenv: ^2.2
Requires (Dev)
- phpunit/phpunit: 5.2.*
README
Do you or your organization have a bunch of images and videos lumped onto a hard drive? If they are stored without any real organizing scheme (by date, by album, etc.), analyzing them and retrieving a particular set of files is tough.
Scalp lets you analyze those media files, extract a lot of information from them, and build "something" out of that data. It is purely a backend tool, designed to support whatever application you develop on top of the extracted info.
A few situations where it could be useful:
- Build a private search engine that lets users enter keywords and returns a list of files matching them
- Use as a backend store for your CMS
- Generate thumbnails (small/medium/large) and host the processed files on a CDN
Below is an overview of how Scalp works:
- Your media files can be stored on the same server or on a dedicated NAS server (it needs to be mounted at a location Scalp can access)
- Scalp reads the files and extracts metadata from them (represented as JSON objects)
- Scalp generates thumbnails (small/medium/large)
- The processed data can then be forwarded to any of the supported backend storages (File, ElasticSearch, S3, BackBlaze, etc.)
- Your application accesses the processed data through that backend storage
Advantages
- You can continuously reorganize how the files are presented to your users without physically moving the actual files, since the presentation is abstracted by the application itself
- Your raw images and videos are left untouched (Scalp never moves, resizes or modifies them)
- It saves a lot of bandwidth: instead of serving the raw images while users browse, you can serve the thumbnails created by Scalp and only fetch the raw (usually large) image when the user requests it
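To illustrate the first advantage, here is a minimal Python sketch of an application-side "virtual album" view built purely from extracted metadata, so no file is ever moved. It is not part of Scalp: the function name `build_albums` is hypothetical, and it assumes each record carries the `filepath` and `exif.DateTimeDigitized` fields that appear in Scalp's extracted metadata.

```python
from collections import defaultdict
from datetime import datetime


def build_albums(records):
    """Group file paths into year-month albums without touching the files.

    Assumes (hypothetically) that each record is a dict shaped like the
    JSON metadata Scalp extracts, with an optional exif.DateTimeDigitized.
    """
    albums = defaultdict(list)
    for record in records:
        taken = record.get("exif", {}).get("DateTimeDigitized")
        if taken:
            # ISO 8601 timestamp with offset, e.g. 2016-05-17T16:19:26+00:00
            key = datetime.fromisoformat(taken).strftime("%Y-%m")
        else:
            key = "undated"
        albums[key].append(record["filepath"])
    return dict(albums)


# Illustrative records only; the second file name is made up.
records = [
    {"filepath": "IMG_1123.JPG",
     "exif": {"DateTimeDigitized": "2016-05-17T16:19:26+00:00"}},
    {"filepath": "clip.mp4"},
]
print(build_albums(records))
```

Changing the grouping key (by year, by camera, by tag) changes what users see while the raw files stay where they are.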
Requirements
- Linux-based Server (Debian/RHEL etc)
- PHP 5.4 or greater
Install
It is very easy to install Scalp as a CLI utility:
via wget
wget https://downloads.buonzz.com/scalp.phar
sudo mv scalp.phar /usr/local/bin/scalp
sudo chmod +x /usr/local/bin/scalp
via curl
curl -o scalp.phar 'https://downloads.buonzz.com/scalp.phar'
sudo mv scalp.phar /usr/local/bin/scalp
sudo chmod +x /usr/local/bin/scalp
After this, the scalp command is available anywhere on your computer. To check that scalp is installed properly, run:
scalp -V
This should output
Scalp Metadata Extraction Tool by Darwin Biler version v2
via Composer - Globally
You can install scalp globally on your machine:
composer global require 'buonzz/scalp=dev-master'
Then add Composer's global vendor/bin directory to your PATH in your ~/.bash_profile (or ~/.bashrc) like this:
export PATH=~/.composer/vendor/bin:$PATH
via Composer - per-project
Just require buonzz/scalp in your Composer project:
{
    "require": {
        "buonzz/scalp": "1.*"
    }
}
Usage
Scalp can extract the metadata from your media files and export it to the following:
- Static JSON files
- Thumbnails
First, you need to create a configuration file called .env. Scalp uses it to retrieve certain information:
Metadata Extraction
To generate static JSON files
scalp metadata:extract
Sample extracted metadata
{
    "last_modified": "2017-03-25T12:10:39+00:00",
    "last_accessed": "2017-04-09T12:07:11+00:00",
    "file_permissions": "0644",
    "date_indexed": "2017-04-10T03:04:23+00:00",
    "human_filesize": "974.83 kB",
    "filepath": "IMG_1123.JPG",
    "path_tags": [],
    "filesize": "998225",
    "fileformat": "jpg",
    "filename": "IMG_1123.JPG",
    "mime_type": "image\/jpeg",
    "exif": {
        "DateTimeDigitized": "2016-05-17T16:19:26+00:00",
        "ExposureTime": 0.016666666666667,
        "FNumber": 4,
        "ISOSpeedRatings": 200,
        "ShutterSpeedValue": 6,
        "ApertureValue": 4,
        "FocalLength": 20
    },
    "date_tags": [
        "Tue",
        "17th",
        "Tuesday",
        "May",
        "May",
        "2016",
        "4pm",
        "UTC",
        "17"
    ],
    "file_contents_hash": "4c7e796bc250b14fe7964694c4db5852eca34ddee24991371f848c3e8097436d",
    "width": "1920",
    "height": "1280"
}
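As a rough illustration of the private search engine use case, the following Python sketch searches extracted records by keyword. It is an assumption-laden example, not part of Scalp: the helper names (`load_records`, `matches`, `search`) are hypothetical, and it assumes each record is stored as its own .json file with the `filename`, `filepath`, `date_tags` and `path_tags` fields shown in the sample above.

```python
import json
import tempfile
from pathlib import Path


def load_records(folder):
    """Load every *.json file under `folder` as one metadata record."""
    records = []
    for path in sorted(Path(folder).rglob("*.json")):
        with open(path, encoding="utf-8") as handle:
            records.append(json.load(handle))
    return records


def matches(record, keyword):
    """True if `keyword` equals a tag or is contained in the file name."""
    needle = keyword.lower()
    tags = record.get("date_tags", []) + record.get("path_tags", [])
    if any(needle == tag.lower() for tag in tags):
        return True
    return needle in record.get("filename", "").lower()


def search(records, keyword):
    """Return the filepath of every record matching `keyword`."""
    return [record["filepath"] for record in records if matches(record, keyword)]


# Demo with a trimmed copy of the sample record, written to a temp folder.
sample = {
    "filepath": "IMG_1123.JPG",
    "filename": "IMG_1123.JPG",
    "path_tags": [],
    "date_tags": ["Tue", "17th", "Tuesday", "May", "May", "2016", "4pm", "UTC", "17"],
}
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "IMG_1123.JPG.json").write_text(json.dumps(sample), encoding="utf-8")
    demo_records = load_records(folder)

print(search(demo_records, "may"))
print(search(demo_records, "2017"))
```

Tags are compared for exact (case-insensitive) equality so that "17" does not match "2017"; a real application would more likely delegate this to a search backend such as ElasticSearch.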
Thumbnail Creation
scalp thumbnail:create
Running in the background
If you have a relatively large collection of media files, the process might take hours to complete. You can run it in the background so that it continues to execute even after you log out of the terminal.
nohup scalp metadata:extract > scalp-metadata.log 2>&1 &
nohup scalp thumbnail:create > scalp-thumbnail.log 2>&1 &
Backend Storage
The processed data can be stored in a multitude of ways, allowing you to use it freely for whatever purpose serves your use case.
File
The simplest backend is file storage. It simply dumps everything into the folder defined by the OUTPUT_FOLDER setting in your .env file. You can either host the generated files on your web server/CDN or pass them to your application for further processing.
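If your application consumes the file backend directly, it needs to find the same folder Scalp wrote to. The Python sketch below shows one way to do that under stated assumptions: the .env file uses plain KEY=VALUE lines, the generated metadata files end in .json, and the value `dist` is only an example. The helpers `parse_env` and `list_metadata_files` are hypothetical and not part of Scalp.

```python
import tempfile
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        settings[key.strip()] = value
    return settings


def list_metadata_files(env_text, base_dir="."):
    """Return the .json files found under OUTPUT_FOLDER (assumed layout)."""
    output_folder = parse_env(env_text).get("OUTPUT_FOLDER")
    if output_folder is None:
        raise KeyError("OUTPUT_FOLDER is not set in the .env contents")
    return sorted(Path(base_dir, output_folder).rglob("*.json"))


# Demo: a throwaway project folder with an example OUTPUT_FOLDER of "dist".
with tempfile.TemporaryDirectory() as project:
    Path(project, "dist").mkdir()
    Path(project, "dist", "IMG_1123.JPG.json").write_text("{}", encoding="utf-8")
    found = list_metadata_files('OUTPUT_FOLDER="dist"\n', base_dir=project)
    found_names = [path.name for path in found]

print(found_names)
```

This parser deliberately ignores features such as variable expansion or multi-line values; it is only meant to show where the application picks up the output location.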
ElasticSearch
When using ElasticSearch as a backend, make sure to provide the following entries in your .env file:
Load it into ElasticSearch
scalp es:index
Save thumbnails to ElasticSearch
scalp thumb:save
Viewing the thumbnails
You can preview those thumbnails using PHP's built-in web server:
php -S localhost:8080 /usr/local/bin/scalp
Then, in your browser, append the URL given by the search command to localhost:8080.
BackBlaze's B2 Cloud Storage
B2 is a great way to store both your metadata and thumbnails if you plan to serve them from a high-traffic website. It is considerably cheaper than the alternatives, and it offers 10GB of free initial storage, so you don't need to pay anything unless you decide to upload more content.
Provide the following entries in your .env file before using BackBlaze:
Then execute the following to extract the metadata, create the thumbnails and upload the data to B2:
scalp b2:upload scalp-test dist
The first parameter (scalp-test) is the name of the bucket you created in your BackBlaze account, and the second parameter is the folder containing the files to be uploaded.
AWS S3
TODO