
archivemati.ca


A New Backup Routine for My Personal Digital Archives

January 13th, 2006 · 20 Comments · Personal Digital Archives

I hate to break this to you but your computer’s hard disk will likely fail. If your hard disk is a trooper and keeps spinning away long beyond the manufacturer’s warranty (3-5 years), there’s still the chance that your computer will be stolen or damaged by virus, fire, electrical storm, flood, kids, pets or a plague of locusts. If you don’t take some action now, you will eventually lose your digital documents, pictures, music, and videos…and that would suck.

Therefore, anyone that is serious about maintaining personal digital archives must have a backup plan. In fact, I would argue that if you don’t have a backup routine you are not maintaining a digital archives but you are just storing some digital data. Anyone that is actually managing a digital archives is committed to maintaining long-term accessibility to their digital collections. This includes providing ways to organize, find and use your digital information but also having a backup plan to deal with storage disaster and a preservation plan to deal with technological obsolescence and incompatibility. I will investigate the organization, retrieval and preservation of digital information over the course of my research. Right now I want to talk about backup because I’ve just implemented a new backup routine for my own digital archives.

Backup Options

There are different ways to organize a backup job (e.g. snapshots, incremental, differential, synchronization). Each of these is supported by different types of backup software and different types of storage media. For my personal digital archives I prefer to use a synchronization technique to replicate my digital archives on another hard disk. All this means is that I maintain an exact copy of my files in as close to real time as possible. When I add, edit or erase a file, the same action is automatically taken on the matching file in my backup. I prefer this to other types of backup because when I rename or reorganize my file directories or weed out files from my collections, I don’t want to have to repeat that work should I need to restore from my backup.
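To make the synchronization idea concrete, here is a minimal Python sketch of a one-way mirror: files that are new or changed in the source are copied, and files that were erased from the source are erased from the backup too. It is an illustration only, not how any particular backup product works. The names `mirror` and `_differs`, and the choice to detect changes by comparing size and timestamp, are assumptions made for this example.

```python
import os
import shutil


def _differs(src_file, dst_file):
    """Cheap change check (an assumption): compare size and modification time."""
    a, b = os.stat(src_file), os.stat(dst_file)
    return a.st_size != b.st_size or abs(a.st_mtime - b.st_mtime) > 1


def mirror(src, dst):
    """Make dst match src: copy new or changed files, delete removed ones.

    Returns a list of (action, relative_path) tuples describing what was done.
    """
    actions = []
    os.makedirs(dst, exist_ok=True)

    # Pass 1: copy anything that is new or changed in the source.
    for root, _dirs, files in os.walk(src):
        rel_dir = os.path.relpath(root, src)
        target_dir = os.path.normpath(os.path.join(dst, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(target_dir, name)
            if not os.path.exists(d) or _differs(s, d):
                shutil.copy2(s, d)  # copy2 also carries over the timestamp
                actions.append(("copy", os.path.normpath(os.path.join(rel_dir, name))))

    # Pass 2: delete anything in the backup that no longer exists in the source.
    for root, _dirs, files in os.walk(dst, topdown=False):
        rel_dir = os.path.relpath(root, dst)
        source_dir = os.path.normpath(os.path.join(src, rel_dir))
        for name in files:
            if not os.path.exists(os.path.join(source_dir, name)):
                os.remove(os.path.join(root, name))
                actions.append(("delete", os.path.normpath(os.path.join(rel_dir, name))))
        if not os.path.isdir(source_dir):
            os.rmdir(root)  # this directory was removed from the source
    return actions
```

Because deletions are replicated as well, a mirror like this protects against a dead disk but not against erasing a file by mistake; that is the trade-off of synchronization compared with snapshot or incremental backups.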

Anyway, I don’t want to go too much further into the boring details of the different ways to organize backups. I’ll summarize by saying that the primary options for the average user include:

  1. the cumbersome chore of burning gigabytes of digital data to stacks of CDs or DVDs
  2. backing up to an external hard drive
  3. paying through the nose per gigabyte for an online backup service
  4. using all those spare gigabytes on your Mom’s, friend’s or work computer to replicate your digital data

I recently chose to implement option #4. Here’s why.

1. Burning backup to CD or DVD

Home computers don’t ship with tape cartridge drives (most commonly used in enterprise settings for backup) so most users that are concerned about losing files have at one time or another backed up their data to CDs or DVDs. Given the huge size of our ever-growing personal digital collections, this can take a lot of discs and is a manual process that requires patience and diligence. Not too many people can keep up this routine. I kept it up sporadically for a couple of years. Thankfully, I never had to rely on any of these CD backups because the long gaps between them would have meant losing a lot of data.

2. External Hard Drive

I eventually got a Maxtor OneTouch external hard drive. It plugs into your USB port and comes with backup/replication software. You press the button on the front of the external disk drive and it automatically copies over any new or changed files from your computer since the last time you pressed the button. Very handy. This was much easier to maintain than a set of backup CDs. Anytime that I downloaded baby pictures from my camera to my computer I just hit the OneTouch button and I felt somewhat comforted by the fact that our precious memories would survive if the PC hard disk gave out (which I should clarify, it never has, but obviously I live in perpetual fear that one day it will).

However, I was always aware that this routine did not protect me in case someone broke into my house and stole my PC and external hard disk (which was sitting right next to it) or if our house were to be wiped out in a fire or flood. A colourful way to summarize the need for maintaining a backup copy of your information in a second, off-site location is The Story of Enoch presented in this quirky short animation by Nick Fox-Gieg.


To avoid having all of my digital archives knocked out by flood, fire or some other type of location-specific disaster, biblical or otherwise, I started lugging the Maxtor external hard disk to the office every so often to replicate my duplicate on a work computer. Although it was an external hard disk, I would be hard pressed to call it portable. It was pretty heavy and awkward to transport and I had to fight the wire jungle behind my computer at both locations to unplug and re-plug the USB cable and AC adapter. Also, I was not being consistent about how often I was taking the backup offsite and started to do it less and less considering how much of a hassle it was. This meant that I still stood to lose a significant portion of our digital memories should anything have happened to my home PC.

I was reluctant to look at any other options given the close to $300 I had shelled out for the Maxtor drive. Then the Maxtor helped me to make up my mind when it simply stopped working. For some inexplicable reason the driver software stopped working with all of my Windows XP machines. I messed around for about a week with support, forums, tracking down and updating drivers, all to no avail. So it was on to investigating the next setup, preferably one with as little intervention on my part as possible. The most obvious solution to meet my requirements was an online backup service.

3. Online Backup Service

I was already using a reliable, stable service (from my ISP Telus) to back up my company data. I pay about $25 per month to back up as much as 4GB of data. I can justify this as a business expense. I need to maintain this data for my livelihood. I also have no more than 3GB of business data which I consider a fairly manageable amount for 5 years’ worth of administrative files, email backups, project files, resource documents, and software application install files from my consulting business.

However, I have at least 10 Gigabytes of digital music and pictures on my personal PC and I’ve only been collecting digital music and taking digital pictures for a couple of years. Since I am also now beginning to shoot short (but of course sizeable) digital video clips I expect this number to keep growing at a much faster rate over the next couple of years.

About $10-15 per Gigabyte (on average) for a decent online backup service is a tough monthly expense to stomach. I would have trouble shelling out over $500 per year for backup services especially considering that I could buy consumer-grade hard disks at about $1 per Gigabyte, for a one-time cost, not one that repeats month after month. This got me to thinking…what about all that extra storage I have on my work PCs…
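A quick back-of-the-envelope calculation shows the gap. The sketch below uses figures quoted in this post (the $25-per-month, 4GB business plan, 10GB of personal data, and about $1 per gigabyte for a consumer hard disk). It assumes, purely for illustration, that the online price scales linearly with the amount of data; real services price in tiers. The function names are made up for the example.

```python
def monthly_rate_per_gb(plan_price, plan_gb):
    """Price per gigabyte per month implied by a flat plan."""
    return plan_price / plan_gb


def annual_online_cost(data_gb, plan_price, plan_gb):
    """Yearly cost of storing data_gb online, assuming linear pricing."""
    return data_gb * monthly_rate_per_gb(plan_price, plan_gb) * 12


def one_time_disk_cost(data_gb, price_per_gb):
    """One-off cost of buying enough hard disk space for data_gb."""
    return data_gb * price_per_gb


online = annual_online_cost(10, 25.0, 4)   # 10 GB at the $25-per-4GB rate
disk = one_time_disk_cost(10, 1.0)         # 10 GB at about $1 per GB
print(f"online: ${online:.2f} per year, disk: ${disk:.2f} once")
# prints: online: $750.00 per year, disk: $10.00 once
```

Under those assumptions the online service costs $6.25 per gigabyte per month, so 10GB comes to $750 a year, every year, against a one-time $10 worth of disk space.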

4. Replicating data to a spare, remote PC

Most PCs these days ship with hard disks of about 80-120GB. I typically don’t use more than 50GB per PC and all I need is about 30 spare Gigabytes to back up my personal digital archives for the next couple of years.

I figured that it would be easy to find some software that would allow me to synchronize directories on two PCs remotely. I had been using ViceVersa to replicate machines on my local network at my office. However, it turns out that to use this or any similar product between remote locations outside of a local network I would basically have to set up my own Virtual Private Network (VPN) to do it reliably and securely and that seemed like overkill to me.

I thought that someone would have created a solution like this on top of a peer-to-peer network like Gnutella but at the time that I was investigating this (November 2005) that didn’t seem to be the case. I briefly considered developing my own solution using the open-source iFolder framework or the Unison File Synchronizer. However, I simply didn’t have the time to go into custom development on this. I needed something that I could install quickly and cheaply.

I did eventually track down a pay service that provided the ability to synchronize files on remote PCs but at this point I was really adamant that I shouldn’t have to pay anything for backup storage since I already own sufficient hard disk storage myself and I pay enough already to my ISPs for moving bits and bytes over the phone lines.

The only two free services I could find were BuddyBackup and Foldershare. The first was a brand new beta launch but the second was a former commercial service that had just been bought out by Microsoft and was now offered for free. Unfortunately, the BuddyBackup beta crashed on me a couple of times and since the creators of BuddyBackup were not willing to release it as an open-source project and were hinting that it was to become a fee-based service I was reluctant to invest too much time in debugging and being a testing guinea pig.

This left me with the Microsoft solution (FolderShare). I was (and still am) distrustful about Microsoft providing a free service and having them move my data around. I expect them to start charging for this service at some point in the future and I half imagine Bill Gates’ evil twin in some dark Redmond lab mining everyone’s backup data to build the ultimate global mind control system. But that’s just being delusionally paranoid. The most likely sinister scenario would be some type of targeted advertising scheme along the lines of the Google Gmail controversy.

Anyway, for the time being I’ll believe that the RSA authentication and AES encryption via SSL that the service offers are secure and trustworthy. I have also disabled web access to my files so that they can only be accessed and used in decrypted form on the PCs that are the peer nodes in my own network.

To be fair, the FolderShare setup and configuration was a breeze and it just works very simply and very well, as advertised.

This setup has taken all the manual hassle out of keeping my backups up to date. Any changes I make to my files are automatically updated in the background and uploaded to the other, off-site computer when both are on (I leave my work computer running). I have also found it handy now to make changes or additions to my personal digital archives files while I am at work rather than having to wait until I am at home to download pictures from my cellphone or move a new MP3 file into my music library, for example.

I myself have decided to share the extra storage space on my home and office PCs for my remote backup but I expect that most people using this type of service will be making arrangements with trusted family and friends to share spare storage on their respective computers.

The first restriction on the Foldershare service is that each account is limited to 10 libraries with 10,000 files each. This meant that I had to create a few top-level libraries for my main content groups rather than being able to just set the service to work on the My Documents folder as a whole. This is, in fact, where I expect them to introduce ‘premium, pay-for-use service’ at some point in the future (i.e. allowing you to increase the number of libraries and files per library for a fee). Right now my pictures library is at 4000 files and my music library is at 2000 files so I should be o.k. for a while.

The other restriction is that the service only moves one file at a time per library. This means that the initial synchronization of the libraries took about 2.5 days. However, once this baseline was established the libraries sync almost instantly once new files are added.
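Before carving a folder into libraries it helps to know how many files sit under each top-level subfolder. The following Python sketch counts them and flags any subfolder that would not fit in a single library. The limits come from the figures quoted above; the function names and the idea of one library per top-level subfolder are assumptions for illustration, not part of the Foldershare software.

```python
import os

FILE_LIMIT = 10_000    # files per library, as quoted above
LIBRARY_LIMIT = 10     # libraries per account, as quoted above


def count_files(folder):
    """Count every file under folder, including those in subfolders."""
    return sum(len(files) for _root, _dirs, files in os.walk(folder))


def plan_libraries(parent, file_limit=FILE_LIMIT):
    """Map each top-level subfolder of parent to (file_count, fits_in_one_library)."""
    plan = {}
    for entry in sorted(os.listdir(parent)):
        path = os.path.join(parent, entry)
        if os.path.isdir(path):
            n = count_files(path)
            plan[entry] = (n, n <= file_limit)
    return plan


def within_account_limits(plan, library_limit=LIBRARY_LIMIT):
    """True if every subfolder fits in one library and there are few enough of them."""
    return len(plan) <= library_limit and all(fits for _n, fits in plan.values())
```

Calling `plan_libraries` on a documents folder returns something like `{"Music": (2000, True), "Pictures": (4000, True)}` for the collection sizes mentioned above, which leaves plenty of headroom under both limits.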

Conclusion

Using peer-to-peer and synchronization technology to maintain a near-to-real-time backup of my personal digital archives is the only way to go for me. I essentially set it and forget it and I am not paying someone month after month for storage that I already own. My data is also stored on machines that I have full control over rather than servers in a data center that I have to trust exists somewhere and is being run securely and effectively.

I will be keeping a close, skeptical eye on the Foldershare business model and policies and may move on to (or develop if I have to) a true open-source and free alternative in the future. However, for the time being I am very satisfied with this service.

[UPDATE (FEB, 23/06): I have recently run into a Foldershare glitch when I included my Microsoft Outlook .pst file in one of my Foldershare synchronization libraries. Unfortunately, Foldershare keeps trying to overwrite my current .pst file on my primary computer with an older version that was synced to one of my secondary computers. I believe it is because the secondary computer assigns the copy of the .pst file a new modified date/timestamp when it is finished writing on that machine. That timestamp is usually more current than the one on the primary computer, mainly because I open the file on my primary computer first thing in the morning and it usually stays open until I log off at the end of the day. This means its timestamp doesn't get updated until that point in time. This wouldn't be such a big deal except Foldershare basically goes into a crazy loop trying to sync the file with the secondary copy but my system won't allow it because the same file is currently open on my primary computer. This means that all other Foldershare activity is suspended and I keep getting a nagging pop-up displayed on my screen about this 'error' that Foldershare is experiencing.

My Outlook .pst file is obviously one of my critical day-to-day business files as it handles all my email correspondence, contacts and scheduling information. Unfortunately, this means I have to use a second set of tools (a combination of ViceVersa and my Telus online backup service) to synchronize and do an off-site backup of this file.

According to the FolderShare FAQ, this problem occurs when additional copies of MS-Outlook are opened on the secondary computers but I am experiencing this problem even though the two secondary computers to which this library is synced do not have an open copy of Outlook running. Now that Foldershare is a Microsoft company you would think and hope that they address this issue shortly as .pst files are a common and important file for most personal and professional desktop users. I have submitted this issue to FolderShare support, let's see what the response will be...]
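The timestamp theory in the update above can be illustrated with a small Python sketch. It is only a model of the hypothesis, not a description of how Foldershare actually resolves conflicts: it assumes a naive "newest timestamp wins" rule (the hypothetical `newest_wins` function) and shows that a copy stamped with its arrival time looks newer than the original, while a copy that keeps the original timestamp does not.

```python
import os
import shutil
import tempfile
import time


def newest_wins(path_a, path_b):
    """Return the copy a naive 'newest timestamp wins' rule would keep."""
    return path_a if os.path.getmtime(path_a) >= os.path.getmtime(path_b) else path_b


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        primary = os.path.join(tmp, "primary.pst")
        secondary = os.path.join(tmp, "secondary.pst")
        with open(primary, "wb") as f:
            f.write(b"current mail data")
        # Pretend the primary file was last modified an hour ago.
        an_hour_ago = time.time() - 3600
        os.utime(primary, (an_hour_ago, an_hour_ago))

        # copyfile copies contents only, so the copy is stamped "now".
        shutil.copyfile(primary, secondary)
        stamped_on_arrival = newest_wins(primary, secondary) == secondary

        # copystat carries the original timestamp across to the copy.
        shutil.copystat(primary, secondary)
        timestamp_preserved = newest_wins(primary, secondary) == primary
        return stamped_on_arrival, timestamp_preserved


print(demo())
```

In the first case the rule picks the secondary copy even though its contents are no newer than the primary's, which matches the overwrite behaviour described in the update.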

20 responses so far ↓

  • 1 Boris Mann // Jan 14, 2006 at 2:29 pm

    I’ve been struggling with this as well. My friend Lloyd has long suggested that we form a loose group of friends that use rsync/BitTorrent/whatever to maintain encrypted dupes of data. Sounds painful unless someone automates it :P

    What I’ve now decided is to collapse storage needs and backup needs into one device. I’m going to be purchasing an external device that is an embedded RAID (mirrored) enclosure, attached via Firewire. I purchase a third drive, all the drives are in hot-swappable trays, so all I have to do is take out a drive and drop it off in our safety deposit box every week or so.

    More on this when I finally make the purchase and set it up.

  • 2 Peter Van Garderen // Jan 16, 2006 at 11:57 am

    Boris,

    Nice setup. Of course, RAID (Redundant Array of Independent Disks) is the standard way to do synchronization or ‘mirroring’ in enterprise or professional settings. However, I think this might be a bit too expensive or complicated for the average personal computer user.

    I also wonder how consistent you are going to be with dropping off the third drive in your safety deposit box. I predict that the time between swapping these will increase exponentially after a few weeks because it sounds like a bit of a hassle (just like I found with taking my external hard drive off-site).

  • 3 Philip M. Howard // Jan 25, 2006 at 1:07 pm

    When I have the funds I, too, would like to set up a RAID-1 (mirror) array. I have lots of video and audio, so total amount of space will be an issue.

    I’m concerned that my “archival” DVD data may not survive. I keep hearing horror stories about how long burned DVD disks will last.

    Of course, every time one removes a RAID-1 disk from the array the system will have to reconstruct the mirror. I wonder how long that will take.

    Since I still have 20 year old VHS tapes that play I’m thinking that a tape backup (in a popular common DAT format) would probably be the most ‘archival’ storage medium.

  • 4 Simon // Jan 31, 2006 at 10:26 pm

    Whilst we’re waiting for that ‘p2p’ solution to come, I’ve released this free backup software that you may be interested in. Though even with software like that, I’ve got to agree that backing up is quite annoying.

  • 5 Nick Fox-Gieg // Feb 3, 2006 at 1:29 pm

    Hey, I noticed the Zed link is broken–you can watch the short here instead:
    http://www.fox-gieg.com/shorts-enoch.html

    Great essay!

    Nick

  • 6 Peter Van Garderen // Feb 3, 2006 at 1:51 pm

    Thanks Nick. Love your movie. I’ve updated the link in my article to the copy on your own website.

  • 7 Philip M. Howard // Feb 18, 2006 at 6:34 pm

    Is “this free software” a broken link?

  • 8 Peter Van Garderen // Feb 18, 2006 at 9:12 pm

    Philip,

    Simon’s link never worked at all. He put anchor tags around the phrase ‘free backup software’ but didn’t include a URL. He also did not leave any way to reach him so I am not 100% sure whether Simon or the free backup software is for real…

  • 9 Philip M. Howard // Feb 24, 2006 at 11:03 am

    Something about the idea of a “loose group of friends” backing up data over the net to each other struck a chord with me… after searching for a bit I “remembered” LOCKSS:

    ‘LOCKSS is open source, peer-to-peer software that functions as a persistent access preservation system. Information is delivered via the web, and stored using a sophisticated but easy to use caching system. Simply put, LOCKSS provides for Jefferson’s “multiplication of copies,” but with an electronic twist.’

    http://lockss.stanford.edu/

    A good model. Now we just need the consumer level open source software to form peer-to-peer backup networks. I’m sure there are some contenders out there, but I haven’t come across them yet myself.

  • 10 Zoli Erdos // Mar 24, 2006 at 6:35 pm

    My Foldershare / Outlook experience is even worse, and I have received no response from Foldershare. The issue is not Outlook on the secondary computer, but on the primary. Win XP seems to lock the .pst file used by the default email profile (which you set in Control Panel) at bootup. Even if I don’t open Outlook at all, the file is locked and Foldershare can’t copy it to the other PC. The only rather nasty workaround I’ve found was:
    - create a dummy email profile
    - Open outlook with this dummy profile
    - Quit Outlook
    - reboot PC – now XP remembers the dummy, so won’t lock my real pst
    - Copy the pst file
    - Change Outlook to the real profile.

    This is pretty baaaad :-(

  • 11 Peter Van Garderen // Mar 26, 2006 at 10:17 am

    Zoli, I also had to invent a work-around for Foldershare’s lack of adequate .pst support. I moved my master Outlook .pst file to a directory that is outside the scope of any Foldershare Library. Then I manually sync (using ViceVersa) the master .pst file to a directory that is inside a Foldershare library. Even though it is a one-click work-around it is a bit of a pain in the butt (having to remember to manually sync every now and then) but I haven’t been able to find any solution other than abandoning Foldershare altogether.

  • 12 Zoli Erdos // Mar 26, 2006 at 10:29 am

    Peter, interestingly enough I found the solution after posting my comment. It wasn’t WinXP that kept the pst file open, but Copernic desktop search (I believe other desktop search products do the same). So my half-manual, half-auto solution is now to start the PC in the morning, close desktop search, let the Foldershare sync take care of it while I have breakfast.. this way it works well with the default Outlook data directories.

    A similar solution to yours that I used to apply was to have a startup command run that copies the Outlook master pst to a directory inside the My Documents path – that way the copy gets synchronized, and the worst case is that it’s a day behind.

  • 13 Zoli Erdos // Apr 21, 2006 at 9:32 am

    Finally I’ve found the convenient, free online backup solution at Mozy. The free starter pack is 2GB which can be expanded by referrals – 256MB per person. If you use the link above, I get an extra 256MB, and so do you, so your starter space is 2.25GB – then you can start your own referral links.
    I still like Foldershare though for PC to PC sync.

  • 14 Zoli Erdos // Apr 25, 2006 at 6:14 am

    Hm, is it used as a backup device? :-)

  • 15 Dan Bell // Aug 3, 2006 at 10:14 am

    Hi, I’ve used Mozy, great product, upload speeds are slow and restores may take a bit longer than expected but it’s worth it!

    Other than the sync option, can Foldershare be used just to back up? So, can my 20 devices back up to one main device without each device getting one another’s folders?

    So far I have set up Foldershare on a server ‘main repository’ with 2 satellite devices. However, I can’t find a way to get the 2 devices to just upload their info to the main repository without copying info across to each device.

  • 16 Peter Van Garderen // Aug 3, 2006 at 3:31 pm

    Dan, Foldershare is a syncing application. So, yes, it will always copy to all the devices in the library. It sounds like you need an online backup service like Mozy for what you are trying to do.

  • 17 Dan Bell // Aug 12, 2006 at 3:02 pm

    Hi Peter,

    Thanks for the response, yes, Mozy would be my best bet at this point. However Foldershare support suggested doing the following. Unfortunately it wouldn’t help me, I need to back up more than 10 units:

    ‘As of the moment the structure that you want to do is not yet possible
    in FolderShare. FolderShare is not designed to determine the `Master
    computer` and `Slave` computers. What it allows you to do is sync files
    across all computers associated to a particular library.

    I do have an idea on how you can do the structure that you want but I
    have not yet tested this process. Let us say you have four computers
    namely: `PC1 (the master computer)`, `PC2 (the 1st slave computer)`,
    `PC3 (the 2nd slave computer)` and `PC4 (the 4th slave computer)`. Since
    you are allowed to create up to 10 libraries in one FolderShare account,
    I was thinking that you create separate libraries for each computer.
    Refer to this information below:

    `PC2 (the 1st slave computer)` has a library named `Backup PC2`
    `PC3 (the 2nd slave computer)` has a library named `Backup PC3`
    `PC4 (the 4th slave computer)` has a library named `Backup PC4`

    Now, you are going to sync these libraries one by one to `PC1 (the
    master computer)` and the destination of these files are on your
    `Desktop` having their respective folder names. Here is how to do about
    this:

    `PC2 (the 1st slave computer)` has a library named `Backup PC2` will
    sync the files to `PC1 (the master computer)` on `Desktop` using the
    same library/ folder name `Backup PC2`. In short, you are now
    transferring the files from `PC2 (the 1st slave computer)` to `PC1 (the
    master computer)` on the folder named `Backup PC2` on your `Desktop`.

    1. Create a folder named `Backup PC2` under `My Documents` on the
    computer named `PC2 (the 1st slave computer)`
    2. Sign into your FolderShare account on both machines the `PC2 (the 1st
    slave computer)` and `PC1 (the master computer)`
    3. Click the ‘Sync My Folders’ option on the source computer. (Hint:
    `PC2 (the 1st slave computer)`)
    4. Click ‘Specify folders to sync’.
    5. Select the machine where your created folder is located. (Hint: `PC2
    (the 1st slave computer)`)
    6. Click the ‘Specify a folder’ option.
    7. Locate the folder you want to share or synchronize from the locations
    (hint: folder named `Backup PC2` under `My Documents`) that are
    displayed.
    8. Select the folder you want to synchronize, then click ‘Next’.
    9. Select the device on the left pane where you want to synchronize your
    created folder. [Hint:`PC1 (the master computer)`]
    10. Click the ‘Specify a folder’ option.
    11. Click ‘Desktop’ to easily locate the folder that you will
    synchronize.
    12. Create a folder name putting a dot on the ‘Create a new folder’
    option.
    13. Enter a folder name `Backup PC2` on the box, then click ‘Next’.
    14. Click ‘Finish’.
    15. Select the option to ‘Automatically Sync’ before clicking the
    ‘Complete Setup and Start Syncing’.’

  • 18 Jeremy // Feb 7, 2007 at 4:40 am

    You gotta test IBackup for Windows. It’s a trusted and pioneer online backup and storage application.

    IBackup applications have lots of features for secure online storage plus other options like network drive, sharing, collaboration, Sub-Accounts and mobile access. All applications have 128-bit SSL encryption as the default option. IBackup has both browser-based and downloadable applications for Windows, Linux, Unix and Mac platforms. IBackup for Windows is also compatible with newly released Windows Vista. This software is very flexible, allowing you to run backup operations in a manner you require.

    You can either backup and restore interactively or schedule regular online backups. Incremental and compressed backups greatly reduce your network bandwidth. You can also restore files from the ‘Snapshots’ of files maintained in your IBackup account. You can also backup open files with IBackup for Windows. Currently this feature is restricted to Windows XP and Windows 2003 servers and not for all Windows operating systems. Mac users can try IDrive for Mac, an excellent desktop interface for working with the IBackup account and Mac.

    There are options to receive custom emails after a backup is successfully over. You can also share important business documents, photos, music and videos with friends or family. IBackup lets you have direct control over access to the space, folders and the files. You have control over who can view, edit, save and upload folders and files stored in your IBackup account.

    Securely back up all your photos to your online account and access them from anywhere. You can view them as thumbnails or as an animated slide show using the Media Gallery feature in Web-Manager. You can also play your favorite audio or video files stored in your IBackup account using Media Gallery. You can collaborate with your employees and business partners by creating Sub-Accounts for them on different folders in your IBackup account. You can try these features to get a hang of them by signing up for a free trial.

  • 19 Peter Van Garderen // Feb 8, 2007 at 9:53 am

    Hi Jeremy,

    Thanks for dumping a commercial on my blog. I won’t delete it because it is actually on topic.

    I am using Mozy for my online backup. It has all the features that IBackup has except the ability to browse Media files online. IBackup is one of these interesting cross-overs between backup and hosting services.

    Anyway, Mozy offers UNLIMITED storage for $4.99!!! That’s just insane! Your average family collection will be running close to 50GB of digital media (mp3s, digital photos, and digital video). That would cost $49.95 a month on IBackup.

    My only complaint about Mozy is that the restore function is slow and clunky. The interface for selecting your files for restore is pretty slow and then it breaks down and packages your restore into 2GB zip files which you have to download separately (with no option to set the directory tree to relative – it unzips using the absolute directory path).

  • 20 Jeremy Hauger // Mar 12, 2008 at 6:12 pm

    So has anyone come up with a free or low cost purchase/no subscription p2p backup solution for sync between friends that works?