Jacob Budin is a 29-year-old software developer. He works at Kettle and lives in Jersey City. He graduated Phi Beta Kappa from Penn State with degrees in English and marketing.

Thursday, September 5, 2013

An Introduction to File Synchronization with Unison

Unison is a cross-platform file synchronization tool. Unlike rsync, which copies in one direction, Unison is intended for two-way synchronization: changes made in either location are propagated to the other. For example, it will alert you to conflicting changes in text files and, if possible, can merge them on your behalf. And unlike traditional version control systems, Unison handles large binary files, like movies, with ease. Although it is written in OCaml, using the tool requires no knowledge of that language. Originally written in 1998, Unison has the benefit of being both very stable and actively maintained.

Installation

Unix-like

First, determine whether your distribution already provides a binary of Unison through its package manager. If it does not, use the directions below.

Mac OS X

For more up-to-date versions of Unison, Homebrew offers a “formula” for installing the package (brew install unison). MacPorts also has the package available.

Other

Install OCaml and GNU make from your package manager, then download the source archive from Unison’s download directory. Build and install it with make as you normally would.
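Whichever route you take on a Unix-like system, it helps to confirm that the binary is actually on your PATH before going further. The snippet below is a minimal sketch of that check; the `unison_status` variable is just an illustrative name. Since Unison has historically been strict about version mismatches between machines, it is worth running the same check on each one and comparing the output.

```shell
# Report whether a unison binary is on the PATH and, if so, print its version.
if command -v unison >/dev/null 2>&1; then
    unison_status="installed"
    unison -version
else
    unison_status="missing"
    echo "unison not found on PATH; install it using the directions above"
fi
```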

Windows

Download the binary from Alan Schmitt’s Web site.

Terminology 101

A replica is the set of files (and folders) you would like synced. On each machine that is synced, the replica has a root, which is the folder path (e.g., /home/jacob/) where the synced files are stored. (For example, different roots allow you to keep the contents of /home/jacob/ on a server in sync with its logical desktop counterpart, /Users/jacob/.)

Additionally, you may specify paths, which limit the synchronization to the specified folders within your root. For example, you may only want to sync your music and books (/home/bob/music/, /home/bob/books/); using paths instead of separate replicas simplifies syncing and is easier to manage.

Lastly, Unison features profiles, which are text files that allow you to save roots, their paths, and the configuration settings to apply to the sync.

Your First Sync

Like all software of this variety, Unison’s power is matched by its potential danger. The tool keeps hidden “archive” files that record the previous state of your directory tree; this allows Unison to keep track of deletions and propagate them when you sync your replicas.[1] So while familiarizing yourself with the tool, be careful when experimenting with its many options; some of them override its built-in safeguards.

On your system, create two dummy folders: dummy1 and dummy2. Use your text editor to create a couple of dummy text files, and save them to dummy1. Now, let’s try to sync these folders (copying the files in dummy1 to the empty directory dummy2):

$ unison dummy1 dummy2
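If you would rather prepare the test folders from the terminal, the setup described above could be scripted roughly as follows. Run it before the unison command. The file names are chosen to match the prompts shown in this section, and the file contents are arbitrary placeholders.

```shell
# Create the two roots; dummy2 starts out empty.
mkdir -p dummy1 dummy2

# Two small placeholder files (names match the prompts shown in this section).
printf 'first dummy file\n' > dummy1/1.rtf
printf 'second dummy file\n' > dummy1/2.rtf

# List what is about to be synced.
ls dummy1
```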

Since this is the first time you are synchronizing these roots, Unison will warn you. Why? Unison does this to protect you from accidentally syncing the wrong paths. Since Unison has a “memory” of what roots were previously synchronized (via its “archive” files), you will only see this message on your initial sync between two roots. If you see this message while performing a routine sync, stop the sync (type q) and review your command line input; you probably made a typo.

Since this is your first sync and you’ve reviewed the input you entered, press the space bar to proceed. Now you’ll be presented, one by one, with the files that have changed:

dummy1         dummy2
new file ---->            1.rtf  [f]

This tells us there’s a new file 1.rtf that’s going to our destination. The character in the brackets (in this case, f) is the default action. In this case, the default action is to copy the file from dummy1 to dummy2, which is exactly what one would expect. Press f. To see a full list of actions you can take at this prompt, type ?. Once you’re satisfied, press f again at the next prompt to sync the second file.

If you’re satisfied you completed the steps above correctly, press y to propagate your changes. Otherwise, press n and follow the steps again.

Be aware: You do not need to manually “authorize” each action, which would be arduous for even a relatively modest number of files, but Unison will request permission like this by default. The next section will show you how to speed this up.

To illustrate how Unison works, let’s try deleting a file and syncing again. At this point, dummy2 should contain two text files. Let’s delete one of them from dummy2, and re-sync using the same command as before:

$ unison dummy1 dummy2

You should be presented with this:

dummy1         dummy2
         <---- deleted    1.rtf  [f]

Unison uses its “archive” files to determine the state of your previous sync (in this case, both roots previously contained 1.rtf), and it recognizes that one file has been deleted in one root. Consequently, its default action is to delete the file in dummy1. Please note: The command we used has not changed, and just as importantly, the order of the roots does not affect the synchronization.
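The whole experiment can also be reproduced non-interactively. The following is a sketch under a few assumptions: it works in a temporary directory, it sets the UNISON environment variable so the archive files land in a scratch location instead of ~/.unison, and it passes -batch so that Unison asks no questions. The `work` variable and the file contents are illustrative only.

```shell
# Scratch experiment: sync two folders, delete a file on one side, sync again.
work="$(mktemp -d)"
export UNISON="$work/unison-state"   # keep archive files out of ~/.unison
mkdir -p "$work/dummy1" "$work/dummy2" "$UNISON"
printf 'one\n' > "$work/dummy1/1.rtf"
printf 'two\n' > "$work/dummy1/2.rtf"

if command -v unison >/dev/null 2>&1; then
    # First sync: copies both files into dummy2 without prompting.
    unison "$work/dummy1" "$work/dummy2" -batch
    # Delete a file on one side only, then sync again with the same command.
    rm "$work/dummy2/1.rtf"
    unison "$work/dummy1" "$work/dummy2" -batch
    # The deletion should now have been propagated to dummy1.
    ls "$work/dummy1"
else
    echo "unison not installed; skipping the experiment"
fi
```

Because -batch skips the confirmation prompts, it is best kept to throwaway folders like these until you are comfortable with what Unison will do.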

Now that you understand the basics, let’s review a more complex example.

Let’s Get Serious

Syncing a couple of text files on a local machine is wholly unimpressive, and admittedly unhelpful. In the real world, Unison can sync large filesets using complex configurations across many machines, locally and remotely. For many uses, it can stand in for a hosted service such as Dropbox.

In this example, I have two local computers and one remote server, in this case, a low-end VPS. I want to keep my user media in sync for reasons of convenience and data backup. To do this, I’ll use a Unison profile to store my settings.

Since my VPS is powered on 24 hours a day (and has the fastest Internet connection), I will be content to sync any local changes solely to the server. Since I’m regularly syncing both local machines to the server, this will have the effect of all three machines being in sync. Plus, in case of a fire or theft, my latest changes will be securely stored in a datacenter.

To follow along: First, install Unison on all of the machines. Next, familiarize yourself with ssh. Ideally, you have generated an identity key for the server (that is, you log in without a password), and you have a stored ssh configuration (in ~/.ssh/config) on your local machines.
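For readers who have not set up an ssh host alias before, here is a rough sketch of what such a configuration might contain. It writes the sample stanza to a scratch file rather than touching ~/.ssh/config. The alias coolserver is the one used in the profile later in this post, while the HostName, User, and IdentityFile values are placeholders you would replace with your own.

```shell
# Sketch only: write a sample ssh stanza to a scratch file for inspection.
sample_config="$(mktemp)"
cat > "$sample_config" <<'EOF'
Host coolserver
    HostName server.example.com
    User jacob
    IdentityFile ~/.ssh/id_ed25519
EOF
cat "$sample_config"

# With a real stanza in ~/.ssh/config, the usual next steps would be
# (not executed here):
#   ssh-keygen -t ed25519            # generate a key pair
#   ssh-copy-id coolserver           # install the public key on the server
#   ssh coolserver unison -version   # confirm Unison runs on the remote side
```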

On Unix-like OSs, Unison stores its archive files and seeks profiles in ~/.unison/. Let’s create a new profile media.prf in this folder, and save the following text:

root = /Users/jacob
root = ssh://coolserver//home/jacob/

path = Movies
path = Music
path = Books
path = .unison/media.prf

ignore = Name .DS_Store

diff = ksdiff
auto = true
times = true

Let’s go through it section by section:

  1. I establish the roots: my home folders. For my server (the second line), I’m connecting to it via ssh using my ~/.ssh/config file. The config file contains the hostname (in this case it’s coolserver) and all of the credentials necessary to connect. Also notice the extra / preceding the path: this is necessary when specifying an absolute path.
  2. I limit the synchronization to just my media, in this case the folders Movies, Music, and Books, as well as the Unison profile itself.[2]
  3. I ask Unison to ignore files named .DS_Store. (These are non-essential Mac OS X files that store folder metadata.)
  4. I tell Unison to use the ksdiff command (part of Kaleidoscope) to display the differences between two versions of a file. I also allow Unison to automatically accept the default action for non-conflicting changes (auto) and to synchronize file modification times (times). These final two lines allow you to sync regularly with minimal human intervention.

Since I named my Unison profile media.prf, I can now run the following command on my local machine to perform the sync:

$ unison media

Assuming the media is stored locally (and the server paths are empty), this will simply upload the contents from my machine to my server. If I bought a new machine, I could run this command to download my media to it. Using this command on my local machines will keep my latest additions, deletions, and modifications current across all my machines.
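If you eventually want to run this sync unattended (from a scheduler, say), one possible approach is a small wrapper like the one below. This is a sketch, not part of the setup described above: the function name sync_media is hypothetical, and it adds the -batch flag, which tells Unison to ask no questions at all and to skip conflicting changes rather than resolve them.

```shell
# sync_media: hypothetical wrapper for unattended runs of the "media" profile.
sync_media() {
    if ! command -v unison >/dev/null 2>&1; then
        echo "unison not found on PATH" >&2
        return 127
    fi
    # -batch: ask no questions; conflicts are left for a later interactive run.
    unison media -batch
}
```

You would call sync_media from your scheduler of choice, and still run a plain unison media by hand now and then to deal with any conflicts that batch mode skipped.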

Conclusion

For those who require more power in their file synchronization or more privacy for their data, Unison is a powerful and unique tool. Its reference guide is comprehensive, and the command line interface is relatively intuitive. To learn more, visit Unison’s Web site.


  1. You can disable this feature. To stop deletions from propagating, use the -nodeletion flag, which takes the root on which deletions should be prevented.

  2. If the Unison profile you are syncing is the one currently in use, you will want to sync twice on the non-originating replicas: the first time to pull in the profile changes (e.g., an added path), and the second time to run the sync with the updated profile (e.g., so that the added path is included).