Backup of a Local Drive

From Wiki99

A Local Backup Script

To use this script,

  1. copy and paste it into a text file,
  2. make modifications appropriate for your needs,
  3. save the file as something like ~/bin/BackupPhotos.sh,
  4. make it executable (chmod +x ~/bin/BackupPhotos.sh),
  5. run it using sudo (sudo ~/bin/BackupPhotos.sh).

This script uses quite a few bash features that we have not discussed before, but this is not meant to be a tutorial on bash. Each feature should be pretty obvious, and if not, there are plenty of places on the web that cover bash scripting.
Two good starting points are the official bash manual and a guide to bash scripting that starts simple but gets to advanced material without wasting time.

The script

#!/bin/bash

#This script performs a backup.
# The variables you'd need to set to modify it for your needs are clustered
# below.
#Two non-obvious things you should be aware of are:
# * The script must be run as superuser, ie sudo backupScript
# * It's not a good idea to run two instances of this script at the same 
#   time if they write to the same hard drive. This is because the diskutil
#   stage of the backup, where the destination drive is checked for file 
#   system consistency, requires the drive to be unmounted, which you don't want
#   to happen if the other script is busy writing to it.
#   In theory two copies of the script that write to different drives should 
#   run in parallel without a problem, but I've never had a reason to try 
#   that so I can't promise anything.
#===============================================================================

HOME_DIR=/Users/mjh
RSYNC_LOCAL=$HOME_DIR/bin/rsync
POWERSHIFT=$HOME_DIR/bin/powershift
BACKUP_EXCLUDES=$HOME_DIR/bin/backup_excludes.txt

MAIL_ADDR=mjh@bluecloud.com
LOG_FILE=$HOME_DIR/Library/Logs/backup.log
#Using $$ below uses the process ID in the file name and thus makes it unique.
TMP_FILE=/tmp/myBak.$$.txt

DST_VOL=/Volumes/Backup1

BAK_NAME=Photos
SRC_DIR=/Volumes/Photos/
DST_DIR=$DST_VOL/Backups/

#===============================================================================

ReportErrorAndExit()
{
    #This function reports an error. 
    # It takes a compulsory argument, $1, a string that describes the error, and
    # an optional argument, $2. 
    # If $2 is anything, the error string is only logged to stderr, otherwise 
    # it is also logged to the log file.
    # The error string is also mailed to $MAIL_ADDR.

    #Set bash "word" separator to newline only.
    # (If we didn't do this, the string argument passed in would not be 
    # treated as a single $1 argument.)
    # Normally you'd want to restore this after you're done, but we're exiting
    # at the end of this function so that's not necessary.
    IFS=$'\n'
    ERROR_STRING="BACKUP $BAK_NAME: $1"

    if [[ $2 ]]; then
        echo    $ERROR_STRING >&2
    else
        echo    $ERROR_STRING >&2
        echo    $ERROR_STRING >> $LOG_FILE
    fi
    mail -s $ERROR_STRING $MAIL_ADDR </dev/null &> /dev/null
    
    exit 1
}
#===============================================================================

#Test user is root

if [[ `id -u` != 0 ]]; then 
    ReportErrorAndExit "user is not root" DONT_LOG
fi

#...............................................................................

echo "============================================================" >> $LOG_FILE
echo `date`                                                         >> $LOG_FILE
echo "Start backup $BAK_NAME"                                       >> $LOG_FILE

#...............................................................................

#Test src volume exists
if [[ ! -d $SRC_DIR ]]; then
    ReportErrorAndExit "$SRC_DIR does not exist"
fi

#Test dst directory exists
if [[ ! -d $DST_DIR ]]; then
    ReportErrorAndExit "$DST_DIR does not exist"
fi

#...............................................................................
#Force the backup drive to have permissions enabled

#This (helpfully non-documented, no-built in help --- thanks Apple) command will
# force permissions to be enabled for the backup drive.
# http://www.macosxhints.com/article.php?story=20020925051644480
vsdbutil -a $DST_VOL
#Test that the backup drive has permissions enabled (otherwise not only do we 
# have the obvious problem of permissions not stored correctly), we also have 
# the dest-link stuff won't generate correct links for anything with ownership
# that's not the current user.
PERMISSIONS_ENABLED_ON_BACKUP=`diskutil info $DST_VOL | grep "Owners" | awk '{print $2}'`
if [[ $PERMISSIONS_ENABLED_ON_BACKUP != "Enabled" ]]; then
    ReportErrorAndExit "backup drive does not have permissions enabled"
fi

#-------------------------------------------------------------------------------
#Do the actual backup. If a prior backup exists, use that as a link source.

   INITIAL_SIZE=`df -k $DST_VOL | grep "^/" | awk '{print $4}'`
INITIAL_SECONDS=`date "+%s"`

sync
if [[ ! -d $DST_DIR/1 ]]; then
     $RSYNC_LOCAL -axHEy  --delete --delete-after                               \
      --delete-excluded --exclude-from=$BACKUP_EXCLUDES                         \
      --ea-checksum                                                             \
      --stats --progress                                                        \
      $SRC_DIR      $DST_DIR/0/
else
     $RSYNC_LOCAL -axHEy  --delete --delete-after                               \
      --delete-excluded --exclude-from=$BACKUP_EXCLUDES                         \
      --ea-checksum                                                             \
      --link-dest=$DST_DIR/1/                                                   \
      --stats                                                                   \
      $SRC_DIR      $DST_DIR/0/                                                 \
    >>$LOG_FILE  2>&1
fi
RSYNC_ERROR_CODE=$?
RSYNC_PHASE=1
# An error code of 24 (some files vanished before they could be transferred) 
# is common and not worth treating as an error.
if [[ $RSYNC_ERROR_CODE == 24 ]]; then RSYNC_ERROR_CODE=0; fi

#Copy over the boot file (which has special permissions and can't be 
# multiple-hard-linked to).
#We run this step twice to cope with either Intel or PPC boot.
if [[ $RSYNC_ERROR_CODE == 0 ]]; then
    BOOT=/System/Library/CoreServices/BootX
    if [[ -e $SRC_DIR/$BOOT ]]; then
        $RSYNC_LOCAL -aEW  --delete --delete-after                              \
          --ea-checksum                                                         \
          $SRC_DIR/$BOOT    $DST_DIR/0/$BOOT                                    \
        >>$LOG_FILE  2>&1
        RSYNC_ERROR_CODE=$?
        RSYNC_PHASE=2
    fi
fi
if [[ $RSYNC_ERROR_CODE == 0 ]]; then
    BOOT=/System/Library/CoreServices/boot.efi
    if [[ -e $SRC_DIR/$BOOT ]]; then
        $RSYNC_LOCAL -aEW  --delete --delete-after                              \
          --ea-checksum                                                         \
          $SRC_DIR/$BOOT    $DST_DIR/0/$BOOT                                    \
        >>$LOG_FILE  2>&1
        RSYNC_ERROR_CODE=$?
        RSYNC_PHASE=3
    fi
fi

#...............................................................................

   FINAL_SIZE=`df -k $DST_VOL | grep "^/" | awk '{print $4}'`
FINAL_SECONDS=`date "+%s"`
let   CHANGE_IN_SIZE=$(($INITIAL_SIZE  - $FINAL_SIZE     ))
let DURATION_SECONDS=$(($FINAL_SECONDS - $INITIAL_SECONDS))
let   DURATION_HOURS=$(($DURATION_SECONDS/3600))
let DURATION_SECONDS=$(($DURATION_SECONDS-$DURATION_HOURS*3600))
let DURATION_MINUTES=$(($DURATION_SECONDS/60))
let DURATION_SECONDS=$(($DURATION_SECONDS-$DURATION_MINUTES*60))
echo 
echo "Backup Duration hr min s =" $DURATION_HOURS $DURATION_MINUTES $DURATION_SECONDS >> $LOG_FILE
echo "Backup Change in size  KB MB GB =" $CHANGE_IN_SIZE                        \
  $(( ($CHANGE_IN_SIZE+512)/1024 ))                                             \
  $(( ($CHANGE_IN_SIZE+512*1024)/1024/1024 )) >> $LOG_FILE

if [[ $RSYNC_ERROR_CODE != 0 ]]; then 
    ReportErrorAndExit "*** rsync reported error $RSYNC_ERROR_CODE in phase $RSYNC_PHASE"
fi

#-------------------------------------------------------------------------------
#Proactively repair the backup volume.

#1 Get the device node for the backup volume.
#  We will need this later.
DST_VOLUME_DEV=`diskutil info $DST_VOL | grep "Device Identifier:  " |  awk '{ print $3 }'`

#2 Loop trying to unmount the backup volume. 
#  This may take a few tries because Spotlight may be busy indexing the volume.
COUNTER=0
while [[ $COUNTER -lt 3 ]]; do
    diskutil unmount $DST_VOL &> /dev/null
    UNMOUNT_ERROR_CODE=$?
    if [[ $UNMOUNT_ERROR_CODE == 0 ]]; then 
        let COUNTER=3;
    else
        let COUNTER=$COUNTER+1
        echo "Could not unmount. Waiting 60 seconds. Attempt $COUNTER of 3."
        sleep 60
    fi
done
#3 Once we unmounted successfully, remount the drive
#  We should now be cleared to run diskutil repairVolume without problems
#  when the repair tries to unmount the volume.
diskutil mount $DST_VOLUME_DEV &> /dev/null

#...............................................................................

   INITIAL_SIZE=`df -k $DST_VOL | grep "^/" | awk '{print $4}'`
INITIAL_SECONDS=`date "+%s"`

rm $TMP_FILE &> /dev/null
touch $TMP_FILE
tail -f $TMP_FILE &   #Follow the temp file so repair progress shows on stdout.
TAIL_PID=$!
echo "dm rv"
diskutil repairVolume $DST_VOL &> $TMP_FILE
REPAIR_ERROR_CODE=$?
kill $TAIL_PID  #Kill the tail command above.

if [[ $REPAIR_ERROR_CODE != 0 ]]; then
    cat $TMP_FILE >> $LOG_FILE
    rm $TMP_FILE &> /dev/null
    ReportErrorAndExit "*** diskutil reported error $REPAIR_ERROR_CODE"
else
    rm $TMP_FILE &> /dev/null
fi
sync

#...............................................................................

   FINAL_SIZE=`df -k $DST_VOL | grep "^/" | awk '{print $4}'`
FINAL_SECONDS=`date "+%s"`
let   CHANGE_IN_SIZE=$(($INITIAL_SIZE  - $FINAL_SIZE     ))
let DURATION_SECONDS=$(($FINAL_SECONDS - $INITIAL_SECONDS))
let   DURATION_HOURS=$(($DURATION_SECONDS/3600))
let DURATION_SECONDS=$(($DURATION_SECONDS-$DURATION_HOURS*3600))
let DURATION_MINUTES=$(($DURATION_SECONDS/60))
let DURATION_SECONDS=$(($DURATION_SECONDS-$DURATION_MINUTES*60))
echo "Repair Duration hr min s =" $DURATION_HOURS $DURATION_MINUTES $DURATION_SECONDS >> $LOG_FILE
echo "Repair Change in size KB MB =" $CHANGE_IN_SIZE                            \
  $(( ($CHANGE_IN_SIZE+512)/1024 )) >> $LOG_FILE

#-------------------------------------------------------------------------------
#Shift all the backup names down by 1.

$POWERSHIFT $DST_DIR 5 >> $LOG_FILE   2>&1
echo "============================================================" >> $LOG_FILE

#===============================================================================

Discussion of the script

Script variables

The script starts off defining the variables that you are most likely to want to change. I have a few backup scripts, one for each drive that I back up, and they all follow this same pattern with the details changed.

We start off defining where to find the copy of rsync that will be used for this backup. Remember that this script will run as root, so defining paths as RSYNC_LOCAL=~/bin/rsync is not going to work very well!
The file referred to by BACKUP_EXCLUDES is a text file that lists files and directories we do not want backed up. After we've discussed everything else in the script, we'll discuss what you should put in this file.

Next are three variables to report status as things go along. Change these as you wish. You will certainly want to change the email address. (If the script encounters any errors, mail will be sent to this address.) You can change the location of the log file, and of a temp file the script uses, but there's really no need to do so.

With DST_VOL we get to the real stuff. This is the name of the volume (ie the disk partition) used for your backups. Or, to put it simply, this is the name of the hard drive that Finder displays. It's probably best to give your backup drive a name that is simple in UNIX terms, with no spaces, punctuation marks and so on. While fancy names can be escaped, it's difficult, especially when writing scripts, to be sure that you've got the details exactly correct. You might want to call your backup drive Backups or Backup_Seagate_400GiB, or whatever.
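A quick check along these lines can catch an awkward volume name before it ever reaches the backup script. This is only a sketch: the function name is_simple_name and the exact set of allowed characters are my own assumptions about what counts as "simple", not anything rsync or diskutil requires.

```shell
# Hypothetical helper: succeeds only if the name consists of letters, digits,
# underscore, dot, and hyphen (an assumed definition of a "simple" name).
is_simple_name()
{
    case $1 in
        ''|*[!A-Za-z0-9._-]*) return 1 ;;   # empty, or contains another character
        *)                    return 0 ;;
    esac
}

for NAME in "Backup_Seagate_400GiB" "My Backup (old)"; do
    if is_simple_name "$NAME"; then
        echo "ok:     $NAME"
    else
        echo "rename: $NAME"
    fi
done
```

The second name is flagged because of its spaces and parentheses, which are exactly the characters that need escaping in a script.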

BAK_NAME is simply a name for this particular backup routine, to be used when reporting errors.

SRC_DIR is the name of the hard drive that is being backed up. Once again, this is the name the Finder displays for the hard drive. If you are backing up the boot drive for a computer, you can write this as SRC_DIR=/, but if you are backing up any other drive, eg an external drive you connect using FireWire that holds your photos or home videos, then it will have a name that looks like SRC_DIR=/Volumes/driveName. You can get a list of the drives connected to your computer by typing

cd /Volumes
ls

or, once again, just by looking at the drives mounted in Finder.

The SRC_DIR drive should be a local drive, ie not a drive connected over the network using SMB or AFP or whatever. The next article will describe what's different when performing a network backup.

Finally DST_DIR specifies where on the backup hard drive this backup is to be placed.

ReportErrorAndExit() function

This is a simple utility function we define to simplify error reporting at various points in the script.

All errors are reported in two ways:

  • the error is printed to stderr and, in all but one case, also written to the log file
  • you are sent an email notification of the error

The very first error condition checked for is only reported using stderr. This is because that first error is that the script is not running as root, and you cannot write to the log file unless you are root. All subsequent errors are reported both in the log file and using stderr.
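The logging half of this behaviour can be sketched on its own. The version below is a cut-down illustration, not the script's function: it is called report_error (a made-up name), it sends no mail, it does not exit, and it writes to a throwaway temp file instead of the real log. It also quotes "$ERROR_STRING" rather than changing IFS, which is another way to keep the message together as a single argument.

```shell
# Sketch of the reporting logic only: no mail is sent and nothing exits.
# LOG_FILE is a throwaway temp file here, not the real backup log.
LOG_FILE=$(mktemp /tmp/report_demo.XXXXXX)
BAK_NAME=Photos

report_error()
{
    # $1 = description of the error (compulsory)
    # $2 = if non-empty, skip the log file and write to stderr only
    ERROR_STRING="BACKUP $BAK_NAME: $1"
    echo "$ERROR_STRING" >&2
    if [ -z "${2:-}" ]; then
        echo "$ERROR_STRING" >> "$LOG_FILE"
    fi
}

report_error "user is not root" DONT_LOG        # stderr only
report_error "/Volumes/Photos/ does not exist"  # stderr and log file

# Only the second message should have reached the log.
cat "$LOG_FILE"
```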

To see the errors (along with various statistics about each backup), view the backup log:

  1. open Console.app (in /Applications/Utilities)
  2. if you don't see a list of logs available for viewing on the left side of the window, then click the button in the top left of the window labelled Logs

  3. click the triangle next to ~/Library/Logs to see a list of logs in your home directory

  4. right now you will probably not see backup.log in this list because no such log exists yet. One will be created the first time the script runs. Subsequently each time the script runs it will add more material to the log. You may want to keep Console.app open at least the first time you run the script to see what is recorded in the log. Occasionally you should look at it to see that there are no surprises.

Test that user is root

Nothing much to say here. We perform the test and, if the user is not root, report the error and exit.

Test that the source and the destination exist

rsync will not make directories on your backup drive that do not exist. Thus, if you want your backups to be created on hard drive Backup_400GiB in directory Backups/Photos/, you need to create the directory Backups and then the directory Photos, either through the command line or using Finder. If you forgot to make these directories, you'll get an error at this point. (Obviously you only have to create these directories once. On each subsequent run of the script, the directories will already exist.)

The most common error this code detects is that you have a backup drive that you only switch on or plug into FireWire/USB when backing up, and you forgot to switch it on or plug it in.
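If you prefer the command line to Finder for the one-time setup, something like the following would do it. It is a sketch that uses a temp directory as a stand-in for the backup drive, so that it can be tried without any drive attached; on a real system DST_VOL would be something like /Volumes/Backup_400GiB.

```shell
# Stand-in paths: a temp directory plays the role of the backup drive.
DST_VOL=$(mktemp -d /tmp/backup_demo.XXXXXX)
DST_DIR=$DST_VOL/Backups/Photos

# One-time setup: -p creates Backups and then Photos in one go, and is
# harmless to repeat if the directories already exist.
mkdir -p "$DST_DIR"

# The same kind of test the script performs at the start of every run.
if [ ! -d "$DST_DIR" ]; then
    echo "$DST_DIR does not exist" >&2
else
    echo "$DST_DIR is ready"
fi
```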

Force the backup drive to have permissions enabled

We discussed this earlier. vsdbutil should force the permissions to be enabled, but just to be safe, we then check that they are enabled using diskutil.
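The check itself is just text processing on the output of diskutil info, so it can be tried without a backup drive. In the sketch below the function name owners_state is made up, and the sample text is illustrative only (the device identifier is invented, and real diskutil output has many more lines); the grep and awk steps are the ones the script uses.

```shell
# Hypothetical helper: reads "diskutil info" style text on stdin and prints
# the word that follows "Owners:".
owners_state()
{
    grep "Owners" | awk '{print $2}'
}

# Illustrative input only, not captured from a real drive.
SAMPLE="   Device Identifier:   disk1s3
   Owners:   Enabled"

STATE=$(echo "$SAMPLE" | owners_state)
if [ "$STATE" != "Enabled" ]; then
    echo "backup drive does not have permissions enabled" >&2
else
    echo "permissions are enabled"
fi
```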

Spotlight

With the most recent implementations of Spotlight (10.4.8+) the problems described below seem to be a lot less of an issue. Spotlight remains flaky if you try to control it with mdutil, but it no longer seems to affect rsync performance. I tried two backups with it both active and inactive, and they were identical in terms of time taken.
The price you pay for this is that after you think the backup is done, the drive may spin updating Spotlight info for up to an hour or so after the backup is done, and may (or may not, seems to be luck of the draw) allow you to unmount it during this period. So the bottom line is that, for now, I'll retain this info, but if things remain good in Leopard, I'll delete it.

The current implementation of Spotlight is a real problem for backups.

Recall that what Spotlight does is track all changes to files as files are written, so that at any point the index that allows searching in files is up to date.
Ideally you do not want Spotlight running during backups because this tracking of changes and updating of the index slows down the backup. (The disk head keeps jumping around between where you are writing data to the disk, and where Spotlight is writing changes to the disk index.)

This might seem like it shouldn't be a problem. Apple provides a command, mdutil, that is supposed to allow switching off Spotlight for a particular drive and then, later, switching it on again.
Unfortunately there appear to be bugs in Spotlight that mean this does not work very well (ie not at all in any useful sense). If you try to switch Spotlight indexing off before doing the backup, then switch it on after the backup, chances are extremely high that either the index that Spotlight creates is corrupt (meaning that Spotlight doesn't work properly on the backup volume) or, even worse, the entire file system will be corrupted while Spotlight is rebuilding its index, and so badly that Apple Disk Utility can't repair it.

The problem appears to be related to having many many hard links on a volume. My experience, when I was playing around with this, was that I could run four or five backups in this way (switch Spotlight off during the backup), and things would be fine, but around the sixth or seventh backup, after switching Spotlight back on and letting the indexing proceed, the result was a corrupted file system. After this happened twice, I gave up and decided to just accept slower backups that have Spotlight active during the backup. (If you run Activity Monitor during the backup and look at what programs are using the CPU, programs like mdimport and mds are pieces of Spotlight; you'll see them occasionally taking up a fair bit of CPU.)

Spotlight is not only a problem on the destination side. It is also, and rather more strangely, a problem on the source side.

When you run rsync, the first stage consists of scanning the source disk to build an internal representation of all files and directories and their properties. When this stage is running, you will see that there is a vast amount of write activity occurring, every bit as much, or more, as the read activity. You can get a rough sense of this by using Activity Monitor to see the disk IO rates and observing that, along with rsync, the other program taking a lot of CPU is mds. You can get a more accurate view by running sudo fs_usage which will list the commands being generated by the file system; there are many many writes interspersed with the rsync reads, and the writes are all generated by mds. (You may find it easier to see exactly what is going on, or find the results more believable, when you run rsync over a network, and view the behavior of the source drive.)

Naturally whatever write activity is generated by mds slows down disk activity tremendously. Once again it would be very nice to be able to switch off Spotlight for the source volume (it would be even nicer if Spotlight acted sanely in this situation and didn't need to be switched off); once again doing so appears to be a risky business that may result in Spotlight never again working satisfactorily on that volume.

When I was playing around, and in 10.3 before Spotlight, I wrote code to scan my entire hard drive (using UNIX primitives and stat'ing each file) in about 4 minutes. (Using the Carbon calls that directly walk the catalog B-tree would presumably be rather faster.) I have a rough recollection, but I may be misremembering, that rsync could do it in about 10 minutes. On 10.4.8 rsync takes about 20 minutes.

There is obviously room here for rsync to be smarter about how it walks, but the primary onus is really on Apple to get all aspects of its random disk IO act together.

rsync

Now we get to the actual rsync command.

There are two versions of the rsync command line for the main backup; the first is used the very first time a backup is performed, the second for each subsequent backup.

They differ in the following ways:

  • The first-time version reports its progress to stdout (--progress), while the subsequent-time version does not bother with a detailed progress report and instead writes a summary of what happened (--stats) to the log file (>>$LOG_FILE 2>&1).
  • The subsequent-time version uses the directory $DST_DIR/1/, ie the previous backup, as a base directory, using the --link-dest directive. Before any file is copied to the new backup directory, it is compared with its equivalent in this base directory, and if they match in all respects then, instead of copying over the file, a hard link is created in the new backup directory referencing the same inode as is referenced by this base directory.

Apart from these, the command lines are the same.
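If hard links are new to you, the effect of --link-dest on a single unchanged file can be imitated by hand. The sketch below uses plain ln rather than rsync, in a temp directory; the directory names 0 and 1 follow the script, and everything else (the file name and its contents) is made up.

```shell
# Imitate what --link-dest does for one unchanged file, using ln, not rsync.
DEMO=$(mktemp -d /tmp/linkdest_demo.XXXXXX)
mkdir "$DEMO/1" "$DEMO/0"
echo "pretend this is a photo" > "$DEMO/1/photo.jpg"

# Unchanged since the previous backup, so link it instead of copying it.
ln "$DEMO/1/photo.jpg" "$DEMO/0/photo.jpg"

# -ef is true when both names refer to the same file (same device and inode).
if [ "$DEMO/0/photo.jpg" -ef "$DEMO/1/photo.jpg" ]; then
    echo "0/photo.jpg and 1/photo.jpg share one copy of the data"
fi

# Removing the older name does not remove the data; the newer name still works.
rm "$DEMO/1/photo.jpg"
cat "$DEMO/0/photo.jpg"
```

This is why each numbered backup directory looks like a complete copy of the source while only changed files take up new space, and why deleting an old backup directory never damages a newer one.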

Let's look at each flag and directive:

  • The -a flag tells rsync to copy over the standard UNIX metadata when it copies files (eg permissions and dates), not just the file data; and to recurse over directories.
  • The -x flag tells rsync not to recurse across mount points, ie to traverse only a single file system. If you don't know what that means, don't worry, the important thing is that you really want this flag set.

  • The -H flag tells rsync to preserve, on the destination volume, the hard link structure of the source volume. That is, if you have a file on your source volume with a hard link that refers to it, this will be copied to the destination volume as a single copy of the file and a hard link, not as two copies of the file (which is what a Finder copy would do).

  • The -E flag tells rsync to copy over Apple specific metadata like the resource fork and extended attributes. This flag is the main thing that separates an HFS+ specific version of rsync from a general version of rsync.

  • The -y flag does not exist in the Apple version of rsync, only in the one I suggested you download. If you are using the Apple version of rsync, leave it out. This flag tells rsync to perform fuzzy matching.

    Without fuzzy matching, when rsync looks for the equivalent of a file in the base directory (the --link-dest directory), it looks for a file with the exact same path and exact same name, then tests if all attributes are identical. This means that if all you did was change the name of a file, it will no longer match its equivalent in the base directory.

    With fuzzy matching, rsync will look for a file that has all the same attributes, even if the names do not match. If it finds a match, it will create the hard link using the new name you gave the file when you renamed it.

  • The next set of directives, --delete and --delete-after, tell rsync that if it finds a file in the destination directory that is not in the source directory, it should delete it (--delete) and that this deleting should be done at the very end of rsync's work, not as soon as rsync encounters the file (--delete-after).

    This might seem a bit pointless. Given how I described our backup strategy, the usual scheme is to back up into directory $DST_DIR/0/ which didn't exist until the backup started and so is clearly empty.

    The reason for these directives is to be safer and more flexible. Suppose you start a backup, it runs for two hours, gets 90% done, but then something comes up and you have to stop it for some reason (you can stop the backup script just by typing control-C in the terminal window, like you can stop any other unix command-line program). You will now have a $DST_DIR/0/ directory on your backup drive that is 90% full, and when, in a few days, you have a chance to continue the backup, you'd really like the backup to use all that material you've already copied over, rather than starting from scratch. But now, of course, if you deleted some files between then and now, you want the new backup to delete those files from $DST_DIR/0/.
    Hence the --delete directive.

    Whether to use --delete-after or not is a matter of taste. To my mind it seems safer that rsync only start deleting files at the very end of its run, when it's quite sure that everything went fine. Other people prefer that rsync delete things the moment it finds files that should not be in the destination directory on the grounds that that frees up space which might be needed later in the backup. It's your choice.

  • The directive --exclude-from=$BACKUP_EXCLUDES specifies that the files and directories listed in text file $BACKUP_EXCLUDES should not be copied over, while the directive --delete-excluded specifies that if a file or directory should somehow be found in the destination that is on the list of excluded files and directories, it should be deleted.

  • The directive --ea-checksum is somewhat technical and has to do with how rsync was retrofitted to fit extended-attributes into its existing code base; the main point is that by having it you will avoid rsync generating a number of warnings and complaints about files named ._someRandomFileName not existing. As I've said before, extended attributes are new to rsync, and things don't yet work completely smoothly --- even with this flag you may occasionally see such warnings which you can ignore.

    Note that the Apple version of rsync does not provide this flag and, in consequence, will generate a lot of warnings about these ._someRandomFileName not existing. With luck Leopard will bring us improvements on this front; until then you have to simply put up with the warnings.
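Since the two command lines differ only in a couple of arguments, one way to see the difference is to build the argument list conditionally and print it. The sketch below does only that: rsync is never run and nothing is copied. The function name print_rsync_args is made up, the flags are the ones from the script (so -E, -y and --ea-checksum assume the non-Apple rsync), and --exclude-from is left out to keep it short.

```shell
# Print, one per line, the rsync arguments for a given source and destination.
# Nothing is copied; rsync is not run.
print_rsync_args()
{
    SRC=$1
    DST=$2
    set -- -axHEy --delete --delete-after --delete-excluded --ea-checksum --stats
    if [ -d "$DST/1" ]; then
        # A previous backup exists, so use it as the link source.
        set -- "$@" "--link-dest=$DST/1/"
    else
        # First backup: ask for a detailed progress report instead.
        set -- "$@" --progress
    fi
    set -- "$@" "$SRC" "$DST/0/"
    printf '%s\n' "$@"
}

# Demo against a temp directory standing in for $DST_DIR.
DEMO=$(mktemp -d /tmp/rsync_args_demo.XXXXXX)
echo "first run:"
print_rsync_args /Volumes/Photos/ "$DEMO"
mkdir "$DEMO/1"
echo "later runs:"
print_rsync_args /Volumes/Photos/ "$DEMO"
```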

After the rsync command runs, we check for, and perhaps report errors, along with noting in the log file how long the backup took and how much disk space it required.
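The arithmetic behind those log lines is easy to check in isolation. The helper names below are made up for this sketch; the calculations mirror the script (whole hours, then whole minutes, then leftover seconds; adding 512 before dividing by 1024 so the result rounds to the nearest MB; and treating rsync's exit code 24 as success).

```shell
# Split a number of seconds into hours, minutes and seconds.
format_duration()
{
    TOTAL=$1
    HOURS=$(( $TOTAL / 3600 ))
    MINUTES=$(( ($TOTAL % 3600) / 60 ))
    SECONDS_LEFT=$(( $TOTAL % 60 ))
    echo "$HOURS $MINUTES $SECONDS_LEFT"
}

# Convert KB to MB, rounding to the nearest MB by adding 512 first.
kb_to_mb()
{
    echo $(( ($1 + 512) / 1024 ))
}

# rsync exit code 24 means some files vanished mid-run; treat it as success.
normalize_rsync_code()
{
    if [ "$1" -eq 24 ]; then echo 0; else echo "$1"; fi
}

echo "Backup Duration hr min s = $(format_duration 3725)"
echo "Backup Change in size MB = $(kb_to_mb 1536)"
echo "rsync code 24 becomes $(normalize_rsync_code 24)"
```

Here 3725 seconds comes out as 1 hour, 2 minutes and 5 seconds, and 1536 KB rounds up to 2 MB.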

rsync the boot file (if necessary)

The file used to boot MacOS X is /System/Library/CoreServices/BootX (on PPC) or /System/Library/CoreServices/boot.efi (on Intel). This file has special permissions that seem to cause problems when rsync tries to create a hard link to it.

The easiest way to deal with this problem is to

  • not copy the file during the main rsync pass and then
  • specially copy over the file (not using hard links) in a second pass

and this is what we do. The file is prevented from being copied by being in the $BACKUP_EXCLUDES file.

The code to do this tests whether either the PPC or the Intel boot file exists and, if so, copies it over in the appropriate manner. If you are backing up a non-boot volume, this part of the script will, of course, find no boot file and so do nothing.
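The existence test can be tried out against a fake volume. In this sketch the function name existing_boot_files is made up and the "volume" is a temp directory containing an empty boot.efi; the two paths are the ones the script checks.

```shell
# List the boot files present under a given root directory.
existing_boot_files()
{
    ROOT=$1
    for BOOT in /System/Library/CoreServices/BootX \
                /System/Library/CoreServices/boot.efi; do
        if [ -e "$ROOT$BOOT" ]; then
            echo "$BOOT"
        fi
    done
}

# Demo with a fake Intel boot volume built in a temp directory.
DEMO=$(mktemp -d /tmp/boot_demo.XXXXXX)
mkdir -p "$DEMO/System/Library/CoreServices"
touch "$DEMO/System/Library/CoreServices/boot.efi"
existing_boot_files "$DEMO"
```

For a non-boot volume such as a photo drive the function prints nothing, which corresponds to the script skipping both extra rsync passes.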

It appears to be Apple's goal for Leopard to have a single unified OS file that boots both PPC and Intel. When that occurs, presumably this boot file's name will change to something different from both the above cases, and the code here and the $BACKUP_EXCLUDES file will have to be changed as appropriate.

diskutil

We then run diskutil (the command line version of Apple Disk Utility) to repair any possible filesystem inconsistencies on our backup drive. This check, for me, usually takes about as long as the backup. (The first backup of a large drive takes a few hours, but subsequent backups take about half an hour.) You can, of course, skip it if you like, but I'd rather be safe and catch any possible corruption of the backup drive right away, not when I'm trying to restore from the backup drive.

Note that before we run the diskutil command that repairs file system inconsistency, we perform a loop that tries three times to unmount the backup volume and then, once it has been successfully unmounted, remounts it. This seems a bit strange. Why do it?

The reason is that it is not uncommon for Spotlight to lag somewhat behind rsync with all the file writing that occurs during a backup. When diskutil then tries to unmount the backup drive, the first step in scanning it for errors, the unmount fails because Spotlight will not allow the drive to be unmounted. Rather than report an error for this rather petty reason, the script tries to make sure the disk has calmed down enough to allow for unmounting before it tries to check the file system.
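The try, wait, try again pattern is not specific to unmounting, so it can be written as a small general-purpose function. The sketch below is one way to do that; the name retry and its argument order are my own choices, and true and false stand in for an unmount that succeeds or fails so that it can be run anywhere.

```shell
# Run a command up to $1 times, sleeping $2 seconds after each failure.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry()
{
    ATTEMPTS=$1
    DELAY=$2
    shift 2
    COUNTER=0
    while [ "$COUNTER" -lt "$ATTEMPTS" ]; do
        if "$@"; then
            return 0
        fi
        COUNTER=$(( $COUNTER + 1 ))
        echo "Attempt $COUNTER of $ATTEMPTS failed." >&2
        sleep "$DELAY"
    done
    return 1
}

# In the backup script the call would look like:
#   retry 3 60 diskutil unmount "$DST_VOL"
# Here "true" and "false" stand in for the unmount command.
if retry 3 0 true; then echo "unmounted"; fi
if ! retry 2 0 false; then echo "gave up"; fi
```

Unlike the loop in the script, this version hands back a status, so the caller can tell whether the volume ever did unmount.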

If you find that a drive just will not unmount, even after this delay, the most common reason is that you have a terminal window open with the current directory set to a directory on that hard drive. It's pretty pathetic (IMHO) that hard drives won't unmount in that situation, but that's the way it is.

powershift

Finally, assuming everything went well, which is usually the case, we run powershift to renumber the immediate backup directory, along with all the older backup directories.
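To make "renumber" concrete, here is a deliberately simplified stand-in. It is NOT the real powershift: it just renames 0 to 1, 1 to 2 and so on, and throws away whatever would pass the limit. The function name shift_backups and the meaning given to its second argument are assumptions for this sketch; the rules powershift actually uses to decide which backups to keep are not shown here.

```shell
# Simplified stand-in, NOT the real powershift: rename 0 -> 1, 1 -> 2, ...
# and discard the directory that would pass the limit.
shift_backups()
{
    DIR=$1
    KEEP=$2
    [ -d "$DIR" ] || return 1        # refuse to run on a missing directory
    I=$KEEP
    rm -rf "$DIR/$I"                 # the oldest backup falls off the end
    while [ "$I" -gt 0 ]; do
        PREV=$(( $I - 1 ))
        if [ -d "$DIR/$PREV" ]; then
            mv "$DIR/$PREV" "$DIR/$I"
        fi
        I=$PREV
    done
}

# Demo in a temp directory with two fake backups.
DEMO=$(mktemp -d /tmp/shift_demo.XXXXXX)
mkdir "$DEMO/0" "$DEMO/1"
touch "$DEMO/0/newest" "$DEMO/1/older"
shift_backups "$DEMO" 5
ls "$DEMO"
```

After the shift there is no directory 0, which is what lets the next run of the backup script start a fresh 0 and use 1 as its --link-dest.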

The BackupExcludes Text File

You want to

  1. copy the material below,
  2. modify it as appropriate for your needs
  3. save it in a file named something like ~/bin/backup_excludes.txt

The file contents

/private/var/launchd/**/sock
/opt/local/var/run/mysql5/**
/private/var/spool/postfix/private/**
/private/var/spool/postfix/public/**

/Volumes/
/Network/
/afs/
/automount/
/.vol/
/dev/**

/.hotfiles.btree
/.Spotlight-V100
/private/var/db/Spotlight-V100/
/.journal
/.journal_info_block
/private/var/db/BootCache.playlist
/System/Library/Extensions.kextcache
/System/Library/Extensions.mkext

/System/Library/CoreServices/BootX
/System/Library/CoreServices/boot.efi

/private/var/run/
**/.TemporaryItems/
/private/tmp/
/private/var/tmp/
/private/var/vm/

**/.FBCIndex
**/.FBCLockFolder/
**/.DS_Store
/Desktop DB
/Desktop DF
/TheVolumeSettingsFolder/
/mach.sym
/cores/

**/Library/Caches/
/Users/*/Library/Preferences/Metrowerks/CW Debugging Cache/
/Users/*/Library/Icons/**/*.cache
/Users/*/Library/Safari/Icons/**/*.cache

/.Trashes/
**/.Trash/
**/Library/Mail/**/Deleted Messages*.mbox
**/Library/Mail/**/Junk*.mbox

Discussion of the file

As mentioned, you can supply to rsync a file that lists items you do not want backed up. The kinds of things you want to put in this file take two forms. There are:

  • items that will cause problems if backed up and
  • items that you are not interested in backing up

The syntax for the list is fairly obvious, but you can read man rsync if you want to learn more details.
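One detail from man rsync that is handy here: in a file given to --exclude-from, blank lines and lines starting with # or ; are ignored, so you can group and annotate your patterns. The sketch below writes a small annotated file (to a temp location, with a handful of the patterns from the list above) and then shows only the lines rsync would treat as patterns.

```shell
# Write a small annotated excludes file to a temp location.
EXCLUDES=$(mktemp /tmp/backup_excludes_demo.XXXXXX)
cat > "$EXCLUDES" <<'EOF'
# temporary items
/private/tmp/
/private/var/tmp/

# trash folders
/.Trashes/
**/.Trash/
EOF

# Show only the lines rsync would treat as patterns
# (drop blank lines and lines starting with # or ;).
grep -v -e '^$' -e '^[#;]' "$EXCLUDES"
```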

Problematic items

The smaller set is the problem-causing items. These are various types of special items that are visible in the file system as files or directories, but which are not really files or directories.

unix domain sockets

/private/var/launchd/**/sock
/opt/local/var/run/mysql5/**
/private/var/spool/postfix/private/**
/private/var/spool/postfix/public/**

These directories hold unix sockets which are used for interprocess communication and disappear when the machine powers down; they are not real files.

directories that are special mount points

/Volumes/
/Network/
/afs/
/automount/
/.vol/
/dev/**

These directories make local hard drives, network hard drives, and other hardware visible through the file system, but again they are not standard directories.

files that are tied to a specific volume

/.hotfiles.btree
/.Spotlight-V100
/private/var/db/Spotlight-V100/
/.journal
/.journal_info_block
/private/var/db/BootCache.playlist
/System/Library/Extensions.kextcache
/System/Library/Extensions.mkext

These are files on a hard drive that are very much tied to the specifics of that hard drive. The Spotlight files are used for Spotlight on that drive, the .hotfiles.btree and BootCache.playlist are used to optimize performance of the drive, the .journal files hold a queue of file system operations that have not yet been fully written out to the drive, and the Extensions.kextcache and the Extensions.mkext files hold device drivers in a form optimized for boot on this specific computer with its specific hardware, and are rebuilt when the hardware changes.

the boot file

/System/Library/CoreServices/BootX
/System/Library/CoreServices/boot.efi

These files are used to boot the computer (BootX for PPC, boot.efi for Intel). We want to back them up, but they have special permissions, so we need some special code to back them up.

Uninteresting items

Now we have the files that we are not interested in backing up:

temporary items

/private/var/run/
**/.TemporaryItems/
/private/tmp/
/private/var/tmp/
/private/var/vm/

These directories hold various temporary material that only makes sense during a particular boot and which is normally cleared on reboot.

obsolete files (mostly only of interest to Classic)

**/.FBCIndex
**/.FBCLockFolder/
**/.DS_Store
/Desktop DB
/Desktop DF
/TheVolumeSettingsFolder/
/mach.sym
/cores/

These are various pieces of junk used by the OS for one reason or another, but mostly obsolete and of no interest anymore. If you are still running MacOS Classic you might want to keep these.

mach.sym sounds important, but it's in fact created by MacOS X at boot time, so there's no need to back it up.

caches in various places

**/Library/Caches/
/Users/*/Library/Preferences/Metrowerks/CW Debugging Cache/
/Users/*/Library/Icons/**/*.cache
/Users/*/Library/Safari/Icons/**/*.cache

The whole point of caches is that they don't hold anything essential, just temporary data that can be recreated when necessary.

trash folders of various sorts

/.Trashes/
**/.Trash/
**/Library/Mail/**/Deleted Messages*.mbox
**/Library/Mail/**/Junk*.mbox

I feel no need to waste time and space storing copies of what's in my Trash or in my Junk mailbox. If you use your trash as a storage locker and frequently have in there files you actually care about, you may want to remove these from the list above.

What's Left?

You don't want to quit reading yet.
The next article will discuss backing up over a network. Even if this does not sound like it is of interest to you, you should read it to learn about using multiple link-dest directories which may be of interest to you.

Backing up is nice, but if disaster strikes, you need to be ready to recover from a backup, so we need to discuss this part of the problem.

Finally, as I've already mentioned, backing up databases has its own special set of problems. We'll talk about those after we've discussed MySQL and some web apps that utilize it.

