S3cmd is another easy-to-use command line client, which can be used for accessing S3 protocol-based storage systems or backups. The tool itself is free for Linux and Mac-based systems; for Windows, there is a paid version available called S3Express. Here is a quick tutorial on working with pilw.io StorageVault using the S3cmd client.
This article describes the Linux-based setup (Ubuntu 18.04 LTS); for now, we cover S3cmd only.
To install S3cmd, we have several options:
- Download from Open Source community-driven repository SourceForge
- Download from GitHub
- Or install from Linux repositories
If you choose the latter option, be aware that the version there is not the latest. It is recommended to install from one of the first two options instead. S3cmd requires Python, and one more reason to go with the latest S3cmd version is its support for Python 3.
SourceForge-Based Installation
Let’s go first with the SourceForge-based installation routine. I created a directory s3cmd to download the software into and run the installation from. First, you need to download the latest version of S3cmd. The latest version available at the time this article was written is 2.0.2, so the command is:
~/s3cmd$ wget https://sourceforge.net/projects/s3tools/files/s3cmd/2.0.2/s3cmd-2.0.2.tar.gz
We need to unpack the source:
~/s3cmd$ tar zxvf s3cmd-2.0.2.tar.gz
The files were unpacked into the s3cmd-2.0.2 directory. We can now remove the archive, since we do not need it anymore, and change into the unpacked directory:
~/s3cmd$ rm s3cmd-2.0.2.tar.gz
~/s3cmd$ cd s3cmd-2.0.2/
Having got that far, you need to install the S3cmd software:
~/s3cmd/s3cmd-2.0.2$ sudo python3 setup.py install
Using xml.etree.ElementTree for XML processing
running install
. . .
Finished processing dependencies for s3cmd==2.0.2
And we should be good to go with using S3cmd.
GitHub-Based Installation
For example purposes, I created a new directory github_s3cmd. Just download the latest master from GitHub:
~/github_s3cmd$ git clone git://github.com/s3tools/s3cmd
The command will copy the latest master into the s3cmd directory. You need to run the installation from that directory:
~/github_s3cmd$ cd s3cmd
~/github_s3cmd/s3cmd$ sudo python3 setup.py install
Linux Repository Based Installation
Ubuntu uses the Advanced Packaging Tool (apt) to manage software. Here is how s3cmd can be installed on Ubuntu or Debian systems:
~$ sudo apt-get update
~$ sudo apt-get install s3cmd
All necessary dependencies are installed during the process. At the time this article was written, the S3cmd version installed from the Linux repository was 2.0.1.
Using S3cmd is fairly simple. We will not go through all the possible options and features; if needed, documentation for the commands and options can be seen with the s3cmd -h command. Instead, we will just show a couple of use cases here with explanations. First, we need to configure s3cmd with the following command:
~$ s3cmd --configure
...
Access Key: LY6PSWJKNZPPFAT8QVPP
Secret Key: pfJuhreDHPNrQhqCiyaGoDvWT3RmTHmuh8XLomLL
Default Region [US]:
S3 Endpoint [s3.amazonaws.com]: s3.pilw.io:8080
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: %(bucket)s.s3.pilw.io:8080
Encryption password:
Path to GPG program [/usr/bin/gpg]:
Use HTTPS protocol [Yes]: Yes
HTTP Proxy server name:

New settings:
  Access Key: LY6PSWJKNZPPFAT8QVPP
  Secret Key: pfJuhreDHPNrQhqCiyaGoDvWT3RmTHmuh8XLomLL
  Default Region: US
  S3 Endpoint: s3.pilw.io:8080
  DNS-style bucket+hostname:port template for accessing a bucket: %(bucket)s.s3.pilw.io:8080
  Encryption password:
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: True
  HTTP Proxy server name:
  HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n] y
Please wait, attempting to list all buckets...
Success. Your access key and secret key worked fine :-)

Now verifying that encryption works...
Not configured. Never mind.

Save settings? [y/N] y
Configuration saved to '~/.s3cfg'
For the sake of simplicity, the output of the configuration command has been shortened. For Access Key and Secret Key, use your own keys; the keys in the example here are for illustration only and are not valid for use. We have a blog article explaining how to get keys. Also important is the S3 endpoint value s3.pilw.io:8080. The rest of the answers can be left at their defaults.
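The answers are saved to ~/.s3cfg. For reference, the relevant entries in that file look roughly like the excerpt below; the values shown are the placeholder examples from above, not valid credentials.

```ini
; Excerpt of ~/.s3cfg after running s3cmd --configure
; (placeholder values, not valid credentials)
access_key = LY6PSWJKNZPPFAT8QVPP
secret_key = pfJuhreDHPNrQhqCiyaGoDvWT3RmTHmuh8XLomLL
host_base = s3.pilw.io:8080
host_bucket = %(bucket)s.s3.pilw.io:8080
use_https = True
```

You can also edit this file directly instead of re-running the configuration dialog.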
If you need to make changes to the configuration later, just run the command s3cmd --configure again.
Once done, you can run the s3cmd ls command to test. Here is an example output:
~$ s3cmd ls
2018-08-20 04:57  s3://nextcloud-demo
2018-09-08 07:25  s3://orchesto
2018-08-15 15:25  s3://prodbucket
2018-06-15 08:01  s3://s3fsmount
If you have some buckets there, they will be listed. Otherwise the list is empty, but you should not get any error messages if everything is properly configured.
New buckets can be created with the following command:
~$ s3cmd mb s3://s3cmd-test
Bucket 's3://s3cmd-test/' created
Now let's put some files into the newly created bucket:
~$ s3cmd put testset/*.txt s3://s3cmd-test/
upload: 'testset/read-test.txt' -> 's3://s3cmd-test/read-test.txt'  [1 of 3]
upload: 'testset/sfile_1.txt' -> 's3://s3cmd-test/sfile_1.txt'  [2 of 3]
upload: 'testset/sfile_2.txt' -> 's3://s3cmd-test/sfile_2.txt'  [3 of 3]
We can list the bucket contents like this:
~$ s3cmd ls s3://s3cmd-test
2018-10-06 17:44     58  s3://s3cmd-test/read-test.txt
2018-10-06 17:44   4717  s3://s3cmd-test/sfile_1.txt
2018-10-06 17:44  11700  s3://s3cmd-test/sfile_2.txt
To retrieve a file from the bucket:
~$ s3cmd get s3://s3cmd-test/read-test.txt
download: 's3://s3cmd-test/read-test.txt' -> './read-test.txt'  [1 of 1]
When you want to retrieve the file and save it locally under a different name, just add the new name to the end of the command above, e.g. s3cmd get s3://s3cmd-test/read-test.txt local-copy.txt.
To delete a bucket, you must empty it first:
~$ s3cmd del s3://s3cmd-test/*
delete: 's3://s3cmd-test/read-test.txt'
delete: 's3://s3cmd-test/sfile_1.txt'
delete: 's3://s3cmd-test/sfile_2.txt'
~$ s3cmd rb s3://s3cmd-test/
Bucket 's3://s3cmd-test/' removed
Files can also be deleted by name, e.g. by replacing the wildcard (*) with the file name, or by matching part of the file name with the wildcard.
Now that we have played with single files, s3cmd can also be used for somewhat more complicated tasks, like file syncing or even backup. The sync operation uploads only files that exist in the source but not yet in the bucket, and can optionally remove files from the bucket that no longer exist in the source. This way you can build a simple backup solution, which can be automated, for example, with a task scheduler. In other words, besides the simple get and put operations, there is also a sync operation.
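The backup idea above can be sketched as a small wrapper script. The bucket name and source path below are hypothetical placeholders; the script only builds and prints the command so it is safe to run without credentials, and in a real script you would execute the command instead of echoing it.

```shell
#!/bin/sh
# Sketch of a backup wrapper around `s3cmd sync` (placeholder paths).
SRC="${HOME}/testset/"
DEST="s3://s3cmd-sync/backup/"

# --delete-removed mirrors local deletions to the bucket;
# drop it if you want an append-only archive instead.
CMD="s3cmd sync --delete-removed ${SRC} ${DEST}"

# Print the command instead of running it, so this sketch has no
# side effects; replace `echo` with the command itself in real use.
echo "${CMD}"
```

Run a dry run first (add --dry-run to the command) to verify what would be transferred or deleted.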
Here is the initial list of files in a source local directory:
./sfile_2.txt
./testdir1
./testdir1/sfile_1_1.txt
./testdir1/sfile_1_3.txt
./testdir1/sfile_1_2.txt
./sfile_1.txt
./testdir2
./testdir2/sfile_2_2.txt
./testdir2/sfile_2_3.txt
./testdir2/sfile_2_1.txt
./read-test.txt
Let's put some files into the bucket s3://s3cmd-sync; note that the name of the uploaded file was changed in the bucket. One directory was uploaded to the same bucket as well. To upload a directory, I need to use the --recursive flag with the command.
~/testset$ s3cmd put read-test.txt s3://s3cmd-sync/uploaded-dir/uploaded-read-test.txt
upload: 'read-test.txt' -> 's3://s3cmd-sync/uploaded-dir/uploaded-read-test.txt'  [1 of 1]
~/testset$ s3cmd put --recursive testdir1 s3://s3cmd-sync/uploaded-dir/
upload: 'testdir1/sfile_1_1.txt' -> 's3://s3cmd-sync/uploaded-dir/testdir1/sfile_1_1.txt'  [1 of 3]
upload: 'testdir1/sfile_1_2.txt' -> 's3://s3cmd-sync/uploaded-dir/testdir1/sfile_1_2.txt'  [2 of 3]
upload: 'testdir1/sfile_1_3.txt' -> 's3://s3cmd-sync/uploaded-dir/testdir1/sfile_1_3.txt'  [3 of 3]
As you can see, the name of the uploaded directory was not changed, but it was given a new location in the tree. Now, let's run the sync command:
~/testset$ s3cmd sync ./ s3://s3cmd-sync/uploaded-dir/
upload: './sfile_1.txt' -> 's3://s3cmd-sync/uploaded-dir/sfile_1.txt'  [1 of 5]
upload: './sfile_2.txt' -> 's3://s3cmd-sync/uploaded-dir/sfile_2.txt'  [2 of 5]
upload: './testdir2/sfile_2_1.txt' -> 's3://s3cmd-sync/uploaded-dir/testdir2/sfile_2_1.txt'  [3 of 5]
upload: './testdir2/sfile_2_2.txt' -> 's3://s3cmd-sync/uploaded-dir/testdir2/sfile_2_2.txt'  [4 of 5]
upload: './testdir2/sfile_2_3.txt' -> 's3://s3cmd-sync/uploaded-dir/testdir2/sfile_2_3.txt'  [5 of 5]
remote copy: 'uploaded-read-test.txt' -> 'read-test.txt'
Done. Uploaded 38850 bytes in 1.0 seconds, 37.94 kB/s.
As seen, only the files that had not been uploaded yet were synchronised to the bucket s3cmd-sync/uploaded-dir; S3cmd checks file sizes and checksums. To test this, make changes to the read-test.txt file and run the sync command again. Another neat feature is the --dry-run option, for when you are not sure what will be deleted or added. This option prints all the changes without actually executing them.
~/testset$ ls -l read-test.txt
-rw-rw-r-- 1 user user 58 Sep 30 15:53 read-test.txt
~/testset$ echo "this is a test" >> read-test.txt
~/testset$ ls -l read-test.txt
-rw-rw-r-- 1 user user 73 Oct  7 04:29 read-test.txt
~/testset$ rm sfile_1.txt
~/testset$ s3cmd sync --dry-run --delete-removed ./ s3://s3cmd-sync/uploaded-dir/
delete: 's3://s3cmd-sync/uploaded-dir/sfile_1.txt'
upload: './read-test.txt' -> 's3://s3cmd-sync/uploaded-dir/read-test.txt'
WARNING: Exiting now because of --dry-run
The result shows that read-test.txt will be uploaded to the bucket. There are also a couple of interesting options:
--skip-existing – does not check files that are already present in the destination; no file checksum will be tested.
--delete-removed – deletes files from the bucket that no longer exist locally.
Sometimes there are files you do not want to transfer at all, like temporary files, hidden files, etc. For that there are a few options to exclude or include files:
--exclude / --include – describe files or directories to exclude from or include in the transfer to the remote S3 bucket. Standard shell wildcards also work, for example *.tmp.
--rexclude / --rinclude – the excluded or included file list can be defined with regular-expression patterns.
--exclude-from / --include-from – while the options above take patterns on the command line, the exclude and include patterns can also be listed in a file and provided as an argument to these options. The file is a plain text file and can contain multiple lines describing which files will and will not be transferred. An example of such a file is shown below.
Assuming you have a file with all exclusion patterns passed via --exclude-from, but you would still like one of the excluded files uploaded to S3, the --rinclude option can be added to the command line with the file name or pattern that must be uploaded despite the exclusion.
As a use case example, a set of various files was created. The test file list is:
./sfile_1.txt
./sfile_2.txt
./sfile_3.txt
./sfile_1.tmp
./sfile_2.tmp
./sfile_1.log
./read-test.txt
./testdir1
./testdir1/sfile_1_1.tmp
./testdir1/sfile_1_2.tmp
./testdir1/sfile_1_1.txt
./testdir1/sfile_1_2.txt
./testdir1/sfile_1_3.txt
./testdir1/sfile_1_1.log
./testdir2
./testdir2/sfile_2_1.txt
./testdir2/sfile_2_2.txt
./testdir2/sfile_2_3.txt
./testdir2/sfile_2_1.tmp
./testdir2/sfile_2_2.tmp
./testdir2/sfile_2_1.log
Here are the contents of the exclude-from file, named s3cmd_exclude.lst:
# These pattern matches will be excluded from my daily backup
*.log
*.tmp
And here is the command (with --dry-run enabled) to run the backup:
~$ s3cmd sync --dry-run --exclude-from s3cmd_exclude.lst --include 'sfile_2_*.log' testset s3://s3cmd-sync/backup-demo/
exclude: testset/sfile_1.log
exclude: testset/sfile_1.tmp
exclude: testset/sfile_2.tmp
exclude: testset/testdir1/sfile_1_1.log
exclude: testset/testdir1/sfile_1_1.tmp
exclude: testset/testdir1/sfile_1_2.tmp
exclude: testset/testdir2/sfile_2_1.tmp
exclude: testset/testdir2/sfile_2_2.tmp
upload: 'testset/read-test.txt' -> 's3://s3cmd-sync/backup-demo/testset/read-test.txt'
upload: 'testset/sfile_1.txt' -> 's3://s3cmd-sync/backup-demo/testset/sfile_1.txt'
upload: 'testset/sfile_2.txt' -> 's3://s3cmd-sync/backup-demo/testset/sfile_2.txt'
upload: 'testset/sfile_3.txt' -> 's3://s3cmd-sync/backup-demo/testset/sfile_3.txt'
upload: 'testset/testdir1/sfile_1_1.txt' -> 's3://s3cmd-sync/backup-demo/testset/testdir1/sfile_1_1.txt'
upload: 'testset/testdir1/sfile_1_2.txt' -> 's3://s3cmd-sync/backup-demo/testset/testdir1/sfile_1_2.txt'
upload: 'testset/testdir1/sfile_1_3.txt' -> 's3://s3cmd-sync/backup-demo/testset/testdir1/sfile_1_3.txt'
upload: 'testset/testdir2/sfile_2_1.log' -> 's3://s3cmd-sync/backup-demo/testset/testdir2/sfile_2_1.log'
upload: 'testset/testdir2/sfile_2_1.txt' -> 's3://s3cmd-sync/backup-demo/testset/testdir2/sfile_2_1.txt'
upload: 'testset/testdir2/sfile_2_2.txt' -> 's3://s3cmd-sync/backup-demo/testset/testdir2/sfile_2_2.txt'
upload: 'testset/testdir2/sfile_2_3.txt' -> 's3://s3cmd-sync/backup-demo/testset/testdir2/sfile_2_3.txt'
WARNING: Exiting now because of --dry-run
Just as an explanation: the .tmp and .log files will not be uploaded to the bucket. However, since --include 'sfile_2_*.log' was given on the command line, 'testset/testdir2/sfile_2_1.log' matched that shell-like pattern, meaning the file will still be uploaded.
This is a way to pretty much get low-cost backup software: add some kind of scheduler to automate the backups, and you are set. It is still not comparable to the functionality of real backup software, but for simple backup routines it might work. How cool is that?
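For the scheduler part, a cron entry is enough. A hypothetical crontab line (added with crontab -e) might look as follows; the source path, bucket, and log file location are placeholders to adapt:

```
# Run the sync every night at 02:30; paths and bucket are placeholders.
30 2 * * * s3cmd sync --delete-removed "$HOME/testset/" s3://s3cmd-sync/backup/ >> "$HOME/s3cmd-backup.log" 2>&1
```

Redirecting the output to a log file lets you check later what each nightly run transferred.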