Encrypted Open-Source Linux Backup to Amazon S3 using Amanda 3.2.x

Written by max on January 25, 2011

Overview

NOTE: This post was never quite finished… hopefully these partial instructions help someone in the future…

You want to back up a Linux box using the open-source version of Zmanda's Amanda. You'd like to use Amazon S3 for storage in the cloud, have the backups encrypted, and use reduced-redundancy storage. You'd like to pay no more in S3 fees than you have to. You'd like to be able to do a "bare-metal" restore in case the sky falls. And you'd like to avoid storing the encryption key's passphrase on the server, using an "agent" like gpg-agent/ssh-agent instead.

Versions

I used the following versions at the time of writing:

  • Server (backing-up) CentOS 5.5 running GnuPG (gpg) 1.4.5
  • Server (restoring) Ubuntu 10.10 running GnuPG (gpg) 1.4.10
  • Amanda – 3.2.2
  • GnuPG (gpg) – Whatever’s on the Distro
  • Keychain – whatever version the package manager installs

Download Amanda either in RPM form or in source form and install it. For Amanda 3.x on Ubuntu the S3 device does not appear to be included in the normal apt-get version of Amanda, so download the .deb file from here. For Ubuntu I also had to switch to xinetd in order for the .deb to install correctly: apt-get install xinetd. You may get a conflict message; just follow the on-screen instructions to fix it. If you are missing any other dependencies, apt-get install them on Ubuntu or yum install them on Fedora/RedHat/CentOS. You may need to create the amanda user and group on your own, and perhaps fix some permissions for that user; a rough sketch follows below. I assume throughout this guide that the user name is amanda; change as needed.
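
If the package didn't create the backup user for you, the manual setup is roughly the following (run as root). This is only a sketch: the /etc/amanda home directory and /backups working area match what the configs below assume, and some packages use a different user name such as amandabackup.

groupadd amanda
useradd -g amanda -d /etc/amanda -s /bin/bash amanda   # the home dir matters later for .gnupg and .bash_profile
mkdir -p /etc/amanda /backups
chown -R amanda:amanda /etc/amanda /backups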

Amanda Setup

  1. Create the directories /etc/amanda and /etc/amanda/JOBNAME, where JOBNAME is whatever you want to call your backup inside Amanda (DailySet1 is the default, I think). I used the name of my server.
  2. Copy the dumptypes and tapetypes files from the Amanda install into /etc/amanda. These are in the example/template.d directory for source installs, or probably somewhere under /usr/share/doc/amanda*/ for packaged installs, as sketched below.
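
    Something along these lines, with the source path adjusted to wherever your install actually put them (the /etc/amanda destination matches the includefile lines in amanda.conf below):

    cp /path/to/example/template.d/dumptypes /etc/amanda/
    cp /path/to/example/template.d/tapetypes /etc/amanda/
    chown amanda:amanda /etc/amanda/dumptypes /etc/amanda/tapetypes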

    Here is the dumptype I use in dumptypes:

    define dumptype gpg-encrypt-nocomp {
       root-tar
       comment "server public-key encryption, dumped with tar"
       compress none
       encrypt server
       property "GNUTAR-LISTDIR" "/backups/gnutar_list_dir"
       server_encrypt "/usr/local/sbin/amgpgcrypt"
       server_decrypt_option "-d"
    }
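
    The GNUTAR-LISTDIR property above points at /backups/gnutar_list_dir, which needs to exist and be writable by the amanda user before the first run:

    mkdir -p /backups/gnutar_list_dir
    chown amanda:amanda /backups/gnutar_list_dir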
  3. Create the setup files in /etc/amanda/JOBNAME
    From the same spot as above copy the advanced.conf, amanda.conf from the distro.

    I also create a Makefile so it's easier to remember how to run the commands. KEY is your Amazon S3 access key, BUCKET is your S3 bucket name, and GPGKEY is the GPG key ID you'll set up below.

    CONFIG=JOBNAME
    RUN=daily
    KEY=ASDFASDFDADSFDASDF12324
    BUCKET=JOBNAME
    GPGKEY=ABCD1234
     
    check:
            amdevcheck $(CONFIG) s3:$(KEY)-$(BUCKET)/$(RUN)/slot-10
     
    dump:
            amdump $(CONFIG)
     
    report:
            amreport $(CONFIG)
     
    keychain:
            keychain $(GPGKEY)
     
    slots:
            for i in 1 2 3 4 5 6 7 8 9 10; do \
                amlabel $(CONFIG) $(RUN)-$$i slot $$i; done

    Now you can use make dump, make check, and make report without having to remember much. Just remember to run them as the amanda user so you don't end up with GPG and file-permission issues.
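
    For example, assuming the Makefile lives next to the config in /etc/amanda/JOBNAME:

    sudo -i -u amanda
    cd /etc/amanda/JOBNAME
    make keychain   # start gpg-agent via keychain (see the GnuPG setup below)
    make check      # runs amdevcheck against one of the S3 slots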

    Here is my amanda.conf. Note that I DO NOT understand the whole dumpcycle/tapecycle thing; the more I read, the more confused I get. Maybe someone will set me straight on that one. Note also that I am using S3's Reduced Redundancy Storage; please read up on that before deciding to use it. It's cheaper!

    #
    # Created from amanda-S3.conf from source compile 3.2.2
    #
    org "Secret Headquarter"      # your organization name for reports
    mailto "root"   # space separated list of operators at your site
     
    #
    # Backup twice a week over a month using 10 "tapes"
    #
    # Storage Cost (reduced) = $0.093 * 2G * 10 = $1.86/month
    # Transfer Cost          = $0.100 * 2G * 10 = $2.00/month
    #
    dumpcycle 4 weeks       # the number of days in the normal dump cycle
    runspercycle 10         # the number of amdump runs in dumpcycle days
    tapecycle 10 tapes
    runtapes 1              # number of tapes to be used in a single run of amdump
     
    #
    # amazonaws S3
    #
    define tapetype S3 {
        comment "S3 Bucket"
        length 2 gigabytes
    }
    device_property "S3_ACCESS_KEY" "ASDFASDFDADSFDASDF12324"
    device_property "S3_SECRET_KEY" "blahblahblhabhabcopypastefroms3whenyousetitup"
    device_property "S3_STORAGE_CLASS" "REDUCED_REDUNDANCY"
    # Curl needs to have S3 Certification Authority (Verisign today)
    # in its CA list. If connection fails, try setting this to NO
    device_property "S3_SSL" "YES"
    tpchanger "chg-multi:s3:ASDFASDFDADSFDASDF12324-JOBNAME/daily/slot-{01,02,03,04,05,06,07,08,09,10}"
    changerfile  "s3-statefile"
    tapetype S3
     
    autolabel "daily-%%%%" empty
    labelstr "^daily-[0-9][0-9]*$"  # label constraint regex: all tapes must match
     
    dtimeout 1800           # number of idle seconds before a dump is aborted.
    ctimeout 30             # maximum number of seconds that amcheck waits for each client host
    etimeout 300            # number of seconds per filesystem for estimates.
     
    define dumptype global {
        comment "Global definitions"
        auth "bsdtcp"
        exclude list "/etc/amanda/JOBNAME/exclude-list"
    }
     
    define application-tool app_amgtar {
        comment "amgtar"
        plugin  "amgtar"
    }
     
    define dumptype gui-base {
            global
            program "APPLICATION"
            application "app_amgtar"
            comment "gui base dumptype dumped with tar"
            compress none
            index yes
    }
     
    includefile "./advanced.conf"
    includefile "/etc/amanda/dumptypes"
    includefile "/etc/amanda/tapetypes"

    Now for my /etc/amanda/JOBNAME/advanced.conf. I think this is almost all straight from the example one; the only difference is that I store everything in /backups instead of /var.

    dumpuser "amanda"
    inparallel 4            # maximum dumpers that will run in parallel (max 63)
                            # this maximum can be increased at compile-time,
                            # modifying MAX_DUMPERS in server-src/driverio.h
     
    dumporder "sssS"        # specify the priority order of each dumper
                            #   s -> smallest size
                            #   S -> biggest size
                            #   t -> smallest time
                            #   T -> biggest time
                            #   b -> smallest bandwidth
                            #   B -> biggest bandwidth
                            # try "BTBTBTBTBTBT" if you are not holding
                            # disk constrained
     
    taperalgo first         # The algorithm used to choose which dump image to send
                            # to the taper.
     
                            # Possible values: [first|firstfit|largest|largestfit|smallest|last]
                            # Default: first.
     
                            # first         First in - first out.
                            # firstfit      The first dump image that will fit on the current tape.
                            # largest       The largest dump image.
                            # largestfit    The largest dump image that will fit on the current tape.
                            # smallest      The smallest dump image.
                            # last          Last in - first out.
    displayunit "g"         # Possible values: "k|m|g|t"
                            # Default: k.
                            # The unit used to print many numbers.
                            # k=kilo, m=mega, g=giga, t=tera
    netusage  8000 Kbps     # maximum net bandwidth for Amanda, in KB per sec
     
    bumpsize 20 Mb          # minimum savings (threshold) to bump level 1 -> 2
    bumppercent 20          # minimum savings (threshold) to bump level 1 -> 2
    bumpdays 1              # minimum days at each level
    # By default, Amanda can only track at most one run per calendar day. When
    # the usetimestamps option is enabled, however, Amanda can track as many
    # runs as you care to make.
    # WARNING: This option is not backward-compatible. Do not enable it if you
    #          intend to downgrade your server installation to any version
    #          earlier than Amanda 2.5.1
    usetimestamps yes
     
    device_output_buffer_size 1280k
                            # amount of buffer space to use when writing to devices
     
    # If you want Amanda to automatically label any non-Amanda tapes it
    # encounters, uncomment the line below. Note that this will ERASE any
    # non-Amanda tapes you may have, and may also ERASE any near-failing tapes.
    # Use with caution.
    ## autolabel "DailySet1-%%%" empty
     
    maxdumpsize -1          # Maximum total size the planner will schedule
                            # for a run (default: runtapes * tape_length) (kbytes).
    bumpmult 4              # threshold = bumpsize * bumpmult^(level-1)
     
     
     
    amrecover_changer "changer"     # amrecover will use the changer if you restore
        # from this device. It could be a string like 'changer' and amrecover will use your
        # changer if you set your tape to 'changer' with 'setdevice changer' or via
        # 'tapedev "changer"' in amanda-client.conf
    autoflush no
    infofile "/backups/info"      # database DIRECTORY
    logdir   "/backups/log"              # log directory
    indexdir "/backups/index"        # index directory
    holdingdisk hd1 {
        directory "/backups/holding"
        use 1000 Mb
    }
    define interface local {
        use 8000 kbps
    }
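
    Since advanced.conf points everything at /backups, those directories have to exist and belong to the amanda user. Roughly (run as root):

    mkdir -p /backups/info /backups/log /backups/index /backups/holding
    chown -R amanda:amanda /backups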
  4. Now it's time for the include and exclude files! For me this is easy: I'm backing up /home and /etc and trying not to back up any BS in those spots.

    Here is /etc/amanda/JOBNAME/exclude-list. Note that this is a VirtualMin server, so some of these paths may seem strange to others.

    /home/*/log*
    /home/*/old*
    /home/*/*old
    /home/*/awstats
    /home/*/tmp
    /home/*/back*
    /home/*.zip
    /home/*.gz
    /home/*.bz2
    /home/*.tgz
    /home/*.tbz
    /home/*.tbz2
    /home/*/homes/*/Maildir/.spam
    /home/*/homes/*/Maildir/.Trash*
    /home/*/Maildir/.spam
    /home/*/Maildir/.Trash*
    *.o
    */core

    And the list of what actually gets backed up, the disklist at /etc/amanda/JOBNAME/disklist:

    localhost /etc gpg-encrypt-nocomp
    localhost /home gpg-encrypt-nocomp
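
At this point it is worth sanity-checking the whole configuration with amcheck, run as the amanda user. It should catch typos, permission problems, and unreachable S3 buckets early, though it will likely complain about unlabeled volumes until you run make slots:

sudo -i -u amanda
amcheck JOBNAME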

GnuPG and Keychain Setup

To avoid storing your encryption key's passphrase in clear text, you can use gpg-agent instead. And to use gpg-agent for this, you really want keychain.

  1. Install: for CentOS I just ran yum install keychain, et voilà.
  2. Add the following lines to /etc/amanda/.bash_profile, assuming your amanda user's home directory is set to /etc/amanda (check /etc/passwd to verify this).

    export PATH="/usr/local/sbin:/usr/local/bin:$PATH"
    host=`uname -n`
    [ -f $HOME/.keychain/$host-sh-gpg ] && \
            . $HOME/.keychain/$host-sh-gpg

    Now every time the backup user logs in, it should pick up the running gpg-agent. Big CAVEAT here: you will need to re-enter the passphrase for the GPG key EVERY TIME the server is rebooted. I think that is a price worth paying for the extra security around the key that protects all your files.

  3. Now let’s create a key :
    $ sudo -i -u amanda
    $ gpg --gen-key
    ...
    $ gpg --list-keys
    /etc/amanda/.gnupg/pubring.gpg
    ------------------------------
    pub   1024D/ABCD1234 2011-01-17
    ...

    OK, so you followed the prompts, created a key, and now you know its ID is ABCD1234, so you can copy that into your Makefile (and export a copy of the key for safekeeping; see the sketch after this list).

  4. Next, test keychain with sudo -u amanda make keychain. Again, I'm assuming you're using my Makefile.
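
One more thing before moving on: the bare-metal restore section below is useless without a copy of this key, so export it now and stash it somewhere safe off the server. A quick sketch; the output file names are just examples, and ABCD1234 is the key ID from above:

sudo -i -u amanda
gpg --export --armor ABCD1234 > amanda-backup-public.asc
gpg --export-secret-keys --armor ABCD1234 > amanda-backup-secret.asc   # guard this file like a password
chmod 600 amanda-backup-secret.asc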

Running the Backup

First let’s test the backup. I’m assuming you’re using my makefile from above, or you enjoy typing more than you have to.

  1. sudo -i -u amanda
  2. make slots
  3. make check
  4. make dump
  5. make report

You will now be tracking down lots of stupid setup issues for 3-4 hours, but hopefully this guide saved you an additional 3-4.

Scheduling the Backup

Once you are certain that everything is working, you can turn on backups! Add the following line to /etc/crontab.

0 3 * * * amanda /usr/local/sbin/amdump JOBNAME
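
One caveat: cron does not read /etc/amanda/.bash_profile, so the keychain/gpg-agent environment set up earlier is not automatically available to this job. Since amgpgcrypt encrypts with the public key, the dump itself should not need the passphrase, but if your runs do turn out to need the agent, one option is a small wrapper script; the /usr/local/sbin/amdump-keychain path here is just an example.

#!/bin/sh
# Hypothetical wrapper: give a cron'd amdump the same keychain environment
# that interactive logins get from .bash_profile.
host=`uname -n`
[ -f /etc/amanda/.keychain/$host-sh-gpg ] && \
        . /etc/amanda/.keychain/$host-sh-gpg
exec /usr/local/sbin/amdump "$@"

The crontab line then points at the wrapper instead: 0 3 * * * amanda /usr/local/sbin/amdump-keychain JOBNAME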

Bare-Metal Restoring

OK, so what good are backups if you can't restore anything? This is important: you need to safely store copies of the Amanda setup and the encryption key, or you are just paying Amazon for nothing.

  1. Create a copy of the /etc/amanda directory and make sure it has .gnupg/ in it.
    tar cvfz amanda_restore.tar.gz -C /etc amanda
  2. Copy that archive to a safe place (or several)
  3. Set up Amanda on the restore server and untar amanda_restore.tar.gz somewhere
  4. Set up /etc/amanda/restore

    Here is /etc/amanda/restore/amanda.conf

    dumpcycle 4 weeks       # the number of days in the normal dump cycle
    runspercycle 10         # the number of amdump runs in dumpcycle days
    tapecycle 10 tapes
    runtapes 1              # number of tapes to be used in a single run of amdump
    autolabel "daily-%%%%" empty
    labelstr "^daily-[0-9][0-9]*$"  # label constraint regex: all tapes must match
     
    define tapetype S3 {
        comment "S3 Bucket"
        length 2 gigabytes
    }
    device_property "S3_ACCESS_KEY" "ADFSASDFDASDFD1234"
    device_property "S3_SECRET_KEY" "biglongkeycopypastehere"
    device_property "S3_STORAGE_CLASS" "REDUCED_REDUNDANCY"
    # Curl needs to have S3 Certification Authority (Verisign today)
    # in its CA list. If connection fails, try setting this to NO
    device_property "S3_SSL" "YES"
     
    # 3.1.x
    #tpchanger "chg-multi"  # the tape-changer glue script
    #tapedev "S3:"          # the no-rewind tape device to be used
    #tapetype S3    # what kind of tape it is (see tapetypes below)
    #changerfile "changer.conf"
     
     
    # 3.2.x
    tpchanger "chg-multi:s3:ADSFADSDFFADDF1234-JOBNAME/daily/slot-{01,02,03,04,05,06,07,08,09,10}"
    changerfile  "s3-statefile"
    tapetype S3
  5. Let's try a download from S3
    #!/bin/sh
     
    amrestore  \
        --config test \
        -r \
        chg-multi:s3:ADFASDSFDADSFDASDFD1234-JOBNAME/daily/slot-01 \
        -h localhost

    Note that this only fetches one slot, and since these are partial dumps split across slots, you would need to fetch all of them; a loop sketch follows below. This saves files named like $HOSTNAME.$DIRNAME.$TIMESTAMP.$LEVEL.$PART.RAW, as used in the next step.
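
    A rough loop over all ten slots, reusing the exact amrestore invocation from above:

    #!/bin/sh
    # Same placeholders as the script above; fetch every slot, not just slot-01.
    for i in 01 02 03 04 05 06 07 08 09 10; do
        amrestore \
            --config test \
            -r \
            chg-multi:s3:ADFASDSFDADSFDASDFD1234-JOBNAME/daily/slot-$i \
            -h localhost
    done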

  6. Now let's decrypt it. Note that I've un-tarred the backup into /root/restore in order to get the key in the .gnupg directory.
    #!/bin/sh
     
    export GNUPGHOME=/root/restore/.gnupg
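    # skip=1 with bs=32k strips the 32 KB Amanda file header before decrypting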
    dd if=localhost._etc.20110117152337.0.0000001.RAW bs=32k skip=1 | gpg --decrypt > restore.tar
  7. And then pull individual files back out with tar xvf restore.tar ./some/file/in/the/tar

To reiterate: if you cannot make the above restore steps work, there's no point in backing up. If you don't save the key and config files somewhere, there's no point in backing up. There are more user-friendly restore processes and scripts that come with Amanda, but I couldn't make them work with the S3+GPG combination; hence the step-by-step approach above.

Partial Restore on a Working Server

If you want to restore some accidentally lost data, you can use the methods outlined in the Amanda wiki. I haven't gotten that far yet; I just wanted to make sure the bare-metal version worked, since that's the harder case.

Links

I found the following pages helpful when trying to figure this out. Please note that a lot of the setup directives and config files they show are out of date for the newest versions of Amanda.
