Tag Archives: os.path

Calling rsync with Python’s Subprocess module

I was recently trying to script multiple file transfers via rsync, but unfortunately, I was unable to control file names.  I chose to use python and issue commands to the OS to initiate transfer.  Initially, everything was working great, but as soon as I encountered a space or parenthesis, the script blew up!

In this tuturial, I’m showing how to transfer a single file, but rsync is a very powerful tool capable of much more.  The principles discussed in this post can be adapted with other uses of rsync, but I’m considering rsync usage to be out of scope here.  There is a very good online man page here: (http://linux.die.net/man/1/rsync).  I’ve chosen to initiate transfers one file at a time, so I can easily have multiple connection streams running vs a single connection stream that transmits all files in sequence.  It is much faster this way with my current ISP, as I suspect they shape my traffic.  Also note, these methods can be applied to scp file transfers as well.

We will start with a very basic rsync command to copy /tmp/test.txt from a remote location to my local pc.  Before starting, I’ve set up public key authentication between my home pc and the remote server.  I initiate the connection from my home pc, as I don’t care to put private keys in remote locations that could provide access to my home network.

/usr/bin/rsync -va myusername@host.domain.com:/tmp/test.txt

This works very well, but what happens when the file has a space? With most commands, we can just wrap quotes around it, and it works.

rsync myusername@host.domain.com:'/tmp/with space.txt' '/tmp/with space.txt'
rsync: link_stat "/tmp/with" failed: No such file or directory (2)
rsync: link_stat "/home/myusername/space.txt" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1637) [Receiver=3.1.0]

Unfortunately, in this case, the remote system sees “/tmp/with space.txt” as two separate files, /tmp/with and $HOME/space.txt. What we need to do for the remote location is both wrap it with quotes and escape it.  We could also double escape the filename, but I chose to keep things looking a little bit sane.

/usr/bin/rsync myusername@host.domain.com:'/tmp/with space.txt' '/tmp/with space.txt'

That is fine, but we need a good way to do this on the fly when we are given file names in bulk.  There are three key libraries I like to use when doing this:

  • subprocess – This is an extremely powerful library for spawning processes on the OS.
  • os.path – This submodule of os contains very useful tools for manipulating filesystem object strings.
  • re – Regular expression operations provides an easy to use escape function.

In a nutshell, here is the operation that needs to happen to create the command and execute it:

import re
import subprocess

full_remote_path = "/tmp/filename with space.txt"
full_local_path = "/tmp/filename with space.txt"
remote_username = "myusername"
remote_hostname = "host.domain.com"

# Here we use re.escape to escape the paths.
escaped_remote = re.escape(full_remote_path)
escaped_local = re.escape(full_local_path)

# I've chosen to just escape the local path and leave off the quotes.
cmd = "/usr/bin/rsync -va %s@%s:'%s' %s" % (remote_username, remote_hostname, escaped_remote, escaped_local)
print cmd

p = subprocess.Popen(cmd, shell=True).wait()

Here is the rsync command that is sent to the os:

/usr/bin/rsync -va myusername@host.domain.com:'/tmp/filename with space.txt' /tmp/filename with space.txt

Now that we have this working, now, I get to explain how os.path fits in.  Should you be copying /tmp/mydirectory/afile.txt on the remote system to /tmp on your local system, but /tmp/mydirectory does not exist, you will receive an error:

rsync -qv myusername@host.domain.com:/tmp/test.txt /tmp/mydirectory/test.txt
rsync: change_dir#3 "/tmp/mydirectory" failed: No such file or directory (2)
rsync error: errors selecting input/output files, dirs (code 3) at main.c(694) [Receiver=3.1.0]

The easiest way to do this would be to run a simple mkdir -p command on /tmp/mydirectory before beginning.  Should the directory exist, the command does nothing.  Should it be missing, it will be created with the necessary parent directories.  In a case where you are copying a file to a remote machine, you can pass this command to the remote machine via ssh.

To do this in python, I like to take the full filename, and split it to receive the complete directory path.

import os
import re
import subprocess

local = "/tmp/mydirectory/test.txt"

localdir = os.path.split(local)[0]
localdir = "%s/" % localdir
localdir = re.escape(localdir)

mkdir_cmd = '/bin/mkdir -p %s' % localdir
p = subprocess.Popen(mkdir_cmd, shell=True).wait()

Here is my full example code that I created to test and demo this technique:

#! /usr/bin/python

import subprocess
import os
import re

def do_rsync(rh, ru, rd, rf, ld):

 # The full file path is the directory plus file.
 remote = os.path.join(rd, rf)

 # escape all characters in the full file path.
 remote = re.escape(remote)

 # here we format the remote location as 'username@hostname:'location'
 remote = "%s@%s:'%s'" % (ru, rh, remote)

 # here we define the desired full path of the new file.
 local = os.path.join(ld, rf)

 # This statement will provide the containing directory of the file
 # this is useful in case the file passed as rf contains a directory
 localdir = os.path.split(local)[0]

 # os.path.split always returns a directory without the trailing /
 # We add it back here
 localdir = "%s/" % localdir

 # escape all characters in the local filename/directory
 local = re.escape(local)
 localdir = re.escape(localdir)

 # before issuing the rsync command, I've been running a mkdir command
 # Without this, if the directory did not exist, rsync would fail.
 # If the directory exists, then the mkdir command does nothing.
 # If you are copying the file to the remote directoy, the mkdir command can be passed by ssh
 mkdir_cmd = '/bin/mkdir -p %s' % localdir

 # create the rsync command
 rsync_cmd = '/usr/bin/rsync -va %s %s' % (remote, local)

 # Now we run the commands.
 # shell=True is used as the excaped characters would cause failures.
 p1 = subprocess.Popen(mkdir_cmd, shell=True).wait()
 p2 = subprocess.Popen(rsync_cmd, shell=True).wait()
 print ""
 return 0

rh = "host.domain.com"
ru = "myusername"
rd = "/tmp"
rf = "test.txt"
ld = "/tmp"

print "Here we do a simple test with test.dat"
do_rsync(rh, ru, rd, rf, ld)

rf = "this is a filename - with (stuff) in it.dat"

print "Here is a filename with a bit more character."
do_rsync(rh, ru, rd, rf, ld)

exit()

A function like this could be put into place very easily, but a few changes would be necessary in order to make this production ready.  The rsync cleanup can be minimized by changing the -v to a -q, but in doing this, you will want to check the exit status using subprocess to determine if the transfer was successful.  In my case, I chose to use the process and queue functions from the multiprocessing module to manage multiple streams.