silinternational / s3-expand
CLI tool to expand files from S3 into a Docker container
Installs: 471
Dependents: 0
Suggesters: 0
Security: 0
Stars: 4
Watchers: 12
Forks: 2
Open Issues: 0
Language:Shell
This package is auto-updated.
Last update: 2024-10-28 21:19:22 UTC
README
s3-expand
is a wrapper bash shell script, intended for use in a docker
container. It provides functionality to generate and edit files:
- Directing the contents of environmental variables into files
- Pulling data from Amazon S3:
- Individual files
- Tar archives, which are extracted into a directory
- File synchronization
- Running
sed
on existing files
Note: for local development you can pass in these environmental variables using
docker's --env-file
switch; see ENV-FILE.example
for an example of what
this file might look like.
Usage
s3-expand <actual_cmd> [args ...]
s3-expand -c <filename>
actual_cmd
is exec'ed by the wrapper once it has completed its run.
Operation is determined by the setting one or more of the following
environmental variables. If none are set, the script exec's the actual_cmd
.
EXPAND_FILES
EXPAND_SED_FILES
EXPAND_S3_TARS
EXPAND_S3_FILES
EXPAND_S3_FOLDERS
Details on the workings of each are in the following sections.
The wrapper was written under the assumption that it is run within a container as root, and so has arbitrary ability to create files and change their owners and modes. Consequently, it is possible that not everything will work correctly if it is run as another user.
The second form runs the script as a conversion utility to translate all the
newlines in a file to octal 032, which can then be placed into ENV variables for
use with the EXPAND_FILES
mode.
EXPAND_FILES
The value of EXPAND_FILES
must be a space-delimited list of key-value
pairs, each separated by an equals sign (=
). For each pair, the key is the
name of a referenced environmental variable, and the value is the path to a new
file, whose contents will be the current value of the referenced variable.
All parent directories in the path will be created if they do not exist, and
if the referenced environmental variable is not set, that particular
key-value pair will be ignored. You can also append [mode|]
,
[mode|owner]
, or [|owner]
to the path to set the numerical file
permissions, and/or the file owner. Since newlines are not allowed in
environmental variables, the script will replace any ASCII SUB character
(\032
or \x1a
) in the value of the referenced environmental variable
with a newline in the created file.
So, as an example, suppose the following are set for the container:
EXPAND_FILES= ISSUE=/etc/issue SPECIFIC=/home/foo/.bashrc[0664|foo] FORGOT=/data/my_file
ISSUE=Linux, running in Docker!
SPECIFIC=cd ~
The wrapper script will then, when the container is started, overwrite
/etc/issue
with "Linux, running in Docker!", create /home/foo/.bashrc
with
contents "cd ~" (no newline at the end), permissions 664 and owner
user 'foo', and do nothing for /data/my_file
, since FORGOT was not set.
EXPAND_SED_FILES
The value of EXPAND_SED_FILES
must be a space-delimited list of key-value
pairs, each separated by a pipe (|
). The key is the path to an existing file,
while the value is the path to a sed script.
sed -i "<key>" -f "<value>"
will be run for every key-value pair.
Suppose the following ENV variable is set for the container:
EXPAND_SED_FILES="/etc/issue|/data/issue.sed"
Given that /etc/issue
already exists, with contents:
Ubuntu 14.04.2 LTS \n \l
...and that /data/issue.sed
has previously been placed into the container, with
contents:
1s/$/ \n \t/
The wrapper will run sed
, and the resulting contents of /etc/issue
will be:
Ubuntu 14.04.2 LTS \n \l \n \t
S3 Access
Three modes are available for pulling data from Amazon S3: file, sync, and archive. Each require two other environmental variables to be set in order to work. They are:
EXPAND_S3_KEY
EXPAND_S3_SECRET
They must be set to the AWS access key, and AWS secret key, respectively, which have sufficient permissions to access the specified S3 targets. After use, they will be scrubbed from the environment.
EXPAND_S3_FILES
The value of EXPAND_S3_FILES
must be a space-delimited list of key-value
pairs, each separated by a pipe (|
). The key is the location of a file in S3,
in the format bucket/path-to-file
. The value is the path where the file
should be placed, either a filename, or a directory (if ending with a /
).
Parent directories will be created if they do not already exist.
You can also append [mode,]
, [mode,owner]
, or [,owner]
to the path to set
the numerical file permissions, and/or the file owner.
Suppose the following ENV variable is set for the container (in addition to the S3 credentials):
EXPAND_S3_FILES="DXCmEdg4gb/database.sql|/archive.sql yBO8IJ/homes/foo/authorized_keys|/home/foo/.ssh/"
The wrapper will pull the files thus:
s3://DXCmEdg4gb/database.sql -> /archive.sql
s3://yBO8IJ/homes/foo/authorized_keys -> /home/foo/.ssh/authorized_keys
EXPAND_S3_TARS
The value of EXPAND_S3_TARS
must be a space-delimited list of key-value
pairs, each separated by a pipe (|
). The key is the location of a tar archive
in S3, in the format bucket/path-to-archive
. The value is is the directory in
which the archive should be extracted. This target directory will be created if
it does not already exist. You can also append [owner]
to the expansion
directory path to set the ownership of all the files in that directory.
Suppose the following ENV variable is set for the container (in addition to the S3 credentials):
EXPAND_S3_TARS="DXCmEdg4gb/data.tar|/data yBO8IJ/homes/foo/special.tar|/home/foo/my_dir/"
Given contents of data.tar
:
toplevelfile
newdir/
newdir/foo
newdir/bar
Given contents of special.tar
:
a
b
c
The wrapper will extract the archives thus:
s3://DXCmEdg4gb/data.tar -> /data/toplevelfile
-> /data/newdir/foo
-> /data/newdir/bar
s3://yBO8IJ/homes/foo/special.tar -> /home/foo/my_dir/a
-> /home/foo/my_dir/b
-> /home/foo/my_dir/c
EXPAND_S3_FOLDERS
The value of EXPAND_S3_FOLDERS
must be a space-delimited list of key-value
pairs, each separated by a pipe (|
). The key is the location of a folder in
S3, in the format bucket/path-to-folder
. The value is the path to a directory
(which will be created if it does not already exist.
If the key ends in a slash (/
), the contents of the S3 folder will be
synchronized; if it does not end in slash, the S3 folder itself will be
synchronized in the target directory
You can also append [owner]
to the expansion directory path to set the
ownership of all the files in that directory.
Suppose the following ENV variable is set for the container (in addition to the S3 credentials):
EXPAND_S3_FOLDERS="DXCmEdg4gb/proj|/data yBO8IJ/homes/foo/|/home/foo"
Given that the folder layout in S3 is:
s3://DXCmEdg4gb/proj/Makefile
s3://DXCmEdg4gb/proj/source.c
s3://DXCmEdg4gb/proj/source.h
s3://yBO8IJ/homes/foo/.ssh/
s3://yBO8IJ/homes/foo/.ssh/authorized_keys
The wrapper will synchronize thus:
s3://DXCmEdg4gb/proj/Makefile -> /data/proj/Makefile
s3://DXCmEdg4gb/proj/source.c -> /data/proj/source.c
s3://DXCmEdg4gb/proj/source.h -> /data/proj/source.h
s3://yBO8IJ/homes/foo/.ssh/authorized_keys -> /home/foo/.ssh/authorized_keys
Testing
A simple shell testing framework is included in tests
for verifying the
operation of the wrapper. It is designed to be run from within a container as
root; the included Dockerfile is for this purpose.
Just create an file called env.local
with contents similar to:
EXPAND_S3_KEY=OTGJTJBPGPXVHUKOUBTY
EXPAND_S3_SECRET=NUId1Ar6nnQ/ah4Y27q5bskVHxhJHPipvC3kEitb
S3_TEST_PATH=random-bucket-OyQ3Qu/randomfolder-xtyD2C
Then, to run the tests, use these commands:
docker build -t s3-expand-testing .
docker run --rm -it --env-file env.local s3-expand-testing bash
The S3 folder formed from the url s3://$S3_TEST_PATH
will be used as a staging
area for testing the wrapper modes that pull from S3.