Category Archives: Tutorials

Calling rsync with Python’s Subprocess module

I was recently trying to script multiple file transfers via rsync, but unfortunately, I had no control over the file names. I chose to use Python and issue commands to the OS to initiate each transfer. Initially, everything worked great, but as soon as I encountered a space or a parenthesis in a file name, the script blew up!

In this tutorial, I’m showing how to transfer a single file, but rsync is a very powerful tool capable of much more. The principles discussed in this post can be adapted to other uses of rsync, but I consider general rsync usage to be out of scope here. There is a very good online man page here: http://linux.die.net/man/1/rsync. I’ve chosen to initiate transfers one file at a time, so I can easily have multiple connection streams running instead of a single connection stream that transmits all files in sequence. It is much faster this way with my current ISP, as I suspect they shape my traffic. Also note that these methods can be applied to scp file transfers as well.

We will start with a very basic rsync command to copy /tmp/test.txt from a remote location to my local PC. Before starting, I set up public key authentication between my home PC and the remote server. I initiate the connection from my home PC, as I don’t care to put private keys in remote locations that could provide access to my home network.

/usr/bin/rsync -va myusername@host.domain.com:/tmp/test.txt /tmp/test.txt

This works very well, but what happens when the file name contains a space? With most commands, we can just wrap the name in quotes, and it works. Not so here:

rsync myusername@host.domain.com:'/tmp/with space.txt' '/tmp/with space.txt'
rsync: link_stat "/tmp/with" failed: No such file or directory (2)
rsync: link_stat "/home/myusername/space.txt" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1637) [Receiver=3.1.0]

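Unfortunately, in this case the remote system sees “/tmp/with space.txt” as two separate files, /tmp/with and $HOME/space.txt. This happens because rsync hands the remote path to a shell on the remote machine, which splits it on the space again after the local shell has already removed the quotes. For the remote location, we need to both wrap the path in quotes and escape it. We could also double escape the file name, but I chose to keep things looking a little bit sane.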

/usr/bin/rsync myusername@host.domain.com:'/tmp/with\ space.txt' '/tmp/with space.txt'
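To see why both layers are needed, here is a small simulation. It is only a sketch: it uses Python's shlex module (in its default POSIX mode) as a stand-in for the local and remote shells, and the helper name remote_words is made up for illustration.

```python
import shlex

def remote_words(command_arg):
    # First pass: the local shell parses the rsync argument and strips the quotes.
    local_word = shlex.split(command_arg)[0]
    # rsync hands everything after "host:" to the remote shell...
    remote_path = local_word.split(":", 1)[1]
    # ...which parses it a second time.
    return shlex.split(remote_path)

# Quoted only: the remote shell sees two words.
print(remote_words("myusername@host.domain.com:'/tmp/with space.txt'"))
# Quoted and escaped: the remote shell sees one word.
print(remote_words(r"myusername@host.domain.com:'/tmp/with\ space.txt'"))
```

The first call yields ['/tmp/with', 'space.txt'], which matches the error output above: the second word is a relative path, so the remote rsync looked for it in the home directory.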

That works, but we need a good way to do this on the fly when we are given file names in bulk. There are three key modules I like to use when doing this:

  • subprocess – This is an extremely powerful library for spawning processes on the OS.
  • os.path – This submodule of os contains very useful tools for manipulating path strings.
  • re – The regular expression module provides an easy-to-use escape function.

In a nutshell, here is the operation that needs to happen to create the command and execute it:

import re
import subprocess

full_remote_path = "/tmp/filename with space.txt"
full_local_path = "/tmp/filename with space.txt"
remote_username = "myusername"
remote_hostname = "host.domain.com"

# Here we use re.escape to escape the paths.
escaped_remote = re.escape(full_remote_path)
escaped_local = re.escape(full_local_path)

# I've chosen to just escape the local path and leave off the quotes.
cmd = "/usr/bin/rsync -va %s@%s:'%s' %s" % (remote_username, remote_hostname, escaped_remote, escaped_local)
print(cmd)

p = subprocess.Popen(cmd, shell=True).wait()
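A caveat on re.escape: it is meant for regular expressions, not shells. Since Python 3.7 it only escapes characters that are special in a regex, so an apostrophe (as in "Ryan's notes.txt") passes through untouched and breaks the quoting above. As an alternative I did not use in my original script, here is a sketch that swaps in shlex.quote (Python 3.3+) for the remote path and passes an argument list to subprocess, so no local shell is involved and the local path needs no escaping at all. The function names are hypothetical.

```python
import shlex
import subprocess

def build_rsync_cmd(user, host, remote_path, local_path):
    # The remote path is still parsed by a shell on the remote machine,
    # so it is quoted for that shell with shlex.quote.
    remote = "%s@%s:%s" % (user, host, shlex.quote(remote_path))
    # The local path is a separate list element; with no local shell
    # involved it reaches rsync exactly as written.
    return ["/usr/bin/rsync", "-va", remote, local_path]

def run_rsync(user, host, remote_path, local_path):
    # subprocess.call waits for rsync and returns its exit status (0 = success).
    return subprocess.call(build_rsync_cmd(user, host, remote_path, local_path))
```

For example, build_rsync_cmd("myusername", "host.domain.com", "/tmp/filename with space.txt", "/tmp/filename with space.txt") produces the remote argument myusername@host.domain.com:'/tmp/filename with space.txt'. One more hedge: if I read the rsync 3.2.4 release notes correctly, newer rsync releases escape remote arguments themselves by default (the --old-args option restores the old behavior), and older releases offer -s (--protect-args) for a similar purpose, so check your rsync version before relying on manual quoting of either kind.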

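Here is the rsync command that is sent to the OS (as printed by Python 3.7 or later; older versions of re.escape also put a backslash in front of every slash, which the shells simply drop):

/usr/bin/rsync -va myusername@host.domain.com:'/tmp/filename\ with\ space\.txt' /tmp/filename\ with\ space\.txt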

Now that we have this working, I get to explain how os.path fits in. If you copy /tmp/test.txt from the remote system to /tmp/mydirectory/test.txt on your local system and /tmp/mydirectory does not exist, rsync fails with an error:

rsync -qv myusername@host.domain.com:/tmp/test.txt /tmp/mydirectory/test.txt
rsync: change_dir#3 "/tmp/mydirectory" failed: No such file or directory (2)
rsync error: errors selecting input/output files, dirs (code 3) at main.c(694) [Receiver=3.1.0]

The easiest fix is to run a simple mkdir -p command on /tmp/mydirectory before starting the transfer. If the directory exists, the command does nothing. If it is missing, it will be created along with any necessary parent directories. If you are copying a file to a remote machine instead, you can pass this command to the remote machine via ssh.
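For the case where the file is pushed to a remote machine, a minimal sketch of sending mkdir -p over ssh, assuming the same public key setup as above. It builds an argument list for ssh and quotes the directory with shlex.quote, because ssh joins its command arguments into a single string that the remote shell parses. The function names are hypothetical.

```python
import posixpath
import shlex
import subprocess

def build_remote_mkdir(user, host, remote_file):
    # The directory that must exist on the remote side before pushing remote_file.
    # posixpath is used because the remote paths are POSIX paths.
    remote_dir = posixpath.dirname(remote_file)
    # ssh passes this string to the remote shell, so quote the directory for it.
    return ["ssh", "%s@%s" % (user, host), "mkdir -p %s" % shlex.quote(remote_dir)]

def ensure_remote_dir(user, host, remote_file):
    # ssh exits with the remote command's status, or 255 if ssh itself fails.
    return subprocess.call(build_remote_mkdir(user, host, remote_file))
```

Calling ensure_remote_dir before an rsync push plays the same role as the local mkdir -p in the pull direction.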

To do this in Python, I like to split the full file name to get the complete directory path.

import os
import re
import subprocess

local = "/tmp/mydirectory/test.txt"

localdir = os.path.split(local)[0]
localdir = "%s/" % localdir
localdir = re.escape(localdir)

mkdir_cmd = '/bin/mkdir -p %s' % localdir
p = subprocess.Popen(mkdir_cmd, shell=True).wait()
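If the destination is local, Python can also create the directory itself without a shell, which sidesteps escaping entirely. A sketch using os.makedirs with exist_ok=True (Python 3.2+), which behaves like mkdir -p; the function name is hypothetical:

```python
import os

def ensure_local_dir(local_file):
    # Containing directory of the destination file (no trailing slash needed).
    localdir = os.path.dirname(local_file)
    # Like mkdir -p: creates missing parents and does nothing if it exists.
    # No shell is involved, so spaces and parentheses need no escaping.
    if localdir:
        os.makedirs(localdir, exist_ok=True)
    return localdir
```

For example, ensure_local_dir("/tmp/mydirectory/test.txt") creates /tmp/mydirectory if needed and returns that path.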

Here is my full example code that I created to test and demo this technique:

#! /usr/bin/python

import subprocess
import os
import re

def do_rsync(rh, ru, rd, rf, ld):
    # The full file path is the directory plus file.
    remote = os.path.join(rd, rf)

    # Escape all special characters in the full file path.
    remote = re.escape(remote)

    # Here we format the remote location as username@hostname:'location'
    remote = "%s@%s:'%s'" % (ru, rh, remote)

    # Here we define the desired full path of the new file.
    local = os.path.join(ld, rf)

    # This statement will provide the containing directory of the file.
    # This is useful in case the file passed as rf contains a directory.
    localdir = os.path.split(local)[0]

    # os.path.split normally returns a directory without the trailing /
    # We add it back here.
    localdir = "%s/" % localdir

    # Escape all special characters in the local filename/directory.
    local = re.escape(local)
    localdir = re.escape(localdir)

    # Before issuing the rsync command, I've been running a mkdir command.
    # Without this, if the directory did not exist, rsync would fail.
    # If the directory exists, then the mkdir command does nothing.
    # If you are copying the file to the remote directory, the mkdir command can be passed by ssh.
    mkdir_cmd = '/bin/mkdir -p %s' % localdir

    # Create the rsync command.
    rsync_cmd = '/usr/bin/rsync -va %s %s' % (remote, local)

    # Now we run the commands.
    # shell=True is needed because the backslash escapes only make sense to a shell.
    subprocess.call(mkdir_cmd, shell=True)
    status = subprocess.call(rsync_cmd, shell=True)
    print("")

    # Return rsync's exit status (0 means success).
    return status

rh = "host.domain.com"
ru = "myusername"
rd = "/tmp"
rf = "test.txt"
ld = "/tmp"

print("Here we do a simple test with test.txt")
do_rsync(rh, ru, rd, rf, ld)

rf = "this is a filename - with (stuff) in it.dat"

print("Here is a filename with a bit more character.")
do_rsync(rh, ru, rd, rf, ld)

A function like this could be put into place very easily, but a few changes would be necessary to make it production ready. The rsync output can be reduced by changing -v to -q, but if you do this, you will want to check the exit status returned by subprocess to determine whether the transfer was successful. In my case, I chose to use the Process and Queue classes from the multiprocessing module to manage multiple streams.
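I used multiprocessing's Process and Queue to manage the streams. Because each stream is really a separate rsync process, a thread pool is also enough to keep several transfers running at once, so the sketch below swaps in concurrent.futures.ThreadPoolExecutor for brevity. It assumes each transfer is given as an argument list (for example, an rsync -q command built without a shell), and the names are hypothetical. A nonzero status, such as the code 23 shown earlier, marks a failed transfer.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_transfer(cmd):
    # Run one transfer command (a list of arguments); return its exit status.
    return subprocess.call(cmd)

def run_parallel(commands, streams=4):
    # Keep up to `streams` transfers running at once. pool.map returns
    # the statuses in the same order as `commands`.
    with ThreadPoolExecutor(max_workers=streams) as pool:
        statuses = list(pool.map(run_transfer, commands))
    # Collect the commands that did not exit with status 0.
    failures = [(cmd, status)
                for cmd, status in zip(commands, statuses) if status != 0]
    return statuses, failures
```

An empty failures list means every transfer succeeded; otherwise the failed commands can be logged or retried.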

DIY Raspberry PI Camera Case

I’m currently working on an image analysis program using my Raspberry Pi and the Raspberry Pi camera. Once that is complete and running full time, I plan to purchase a commercially produced mount. In the meantime, I decided to make a little DIY Raspberry Pi camera mount.

Most hobby electronics ship in small electrostatic bags.  I keep every one of these I get.  I used one, trimmed it down, then cut a small hole for the camera to peek out.   This particular bag contained a Raspberry Pi.

Step 1 - Electrostatic bag

(forgive my crappy cameraphone focus, but I’m not going to take it apart again just to get more pictures.)

Next, I cut some cardboard to an approximate width of the camera board.  I used very thick cardboard and a boxcutter.

Step 2 - Cut some cardboard


Then, I taped one end of the electrostatic bag to the cardboard so that I could wrap the bag around the cardboard. Be sure to position it so the camera board will be centered on the cardboard. I decided to use Scotch tape for this project as it would stick well to the bag. I feared that duct tape would leave residue.

Step 3 - Tape the bag

Then, just wrap the bag around the cardboard and tape the other edge.

Step 4 - Wrap the bag

I used more tape to reinforce the hole I had cut. It may be difficult to see.

2013-08-28 11.36.30

I’m personally mounting my camera to look out the window, so I used more cardboard to sit in between the camera and the window.

2013-08-28 11.40.22

Then, I just taped the whole thing to the window. As you may have noticed, I added more cardboard below the camera to tilt it downward.

Tape to the window

This will do just fine! Again, I’ll be ordering a commercially produced mount to replace this once my code is ready. At that point, I’ll be mounting the camera higher on the window to avoid looking through the screen. I like the mount currently for sale on Adafruit, but I’m also hoping for more variety in the coming weeks or months.

While I’m at it, I reviewed the Raspberry Pi here on Element 14’s community site: http://www.element14.com/community/roadTestReviews/1520

Aspell Custom Dictionary

I’ve been playing around a bit with Google’s Go language and the aspell package. It’s been working great, except that I haven’t found a good way to tell aspell to exclude or include words as needed.

I was able to find ways to add words to a custom dictionary, but I did not find a good way to exclude words. In this particular project, I found that many small two-letter combinations were marked as correct, even though they are not defined as words according to the dictionary. Perhaps they are abbreviations, but I did not want them marked as correct for this project.

Instead of messing with the existing aspell dictionary, I decided to create a new language dictionary in aspell.

The command “aspell dicts” lists the existing dictionaries, so you can see which names are already taken. I chose the name rv_EN. The dictionary files are kept in /usr/lib/aspell on my system.

First, I created the file rv_EN.multi in that directory; it contains only a single line: “add rv_EN.rws”. The command “aspell dicts” will confirm that aspell can now see the rv_EN dictionary.
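If you want to script this step as well, a tiny Python helper can write the .multi file. This is only a sketch: the function name is hypothetical, dict_dir defaults to /usr/lib/aspell as on my system (it may differ elsewhere), and writing there needs root.

```python
import os

def write_multi(name, dict_dir="/usr/lib/aspell"):
    # A .multi file lists the word lists that make up a dictionary;
    # here it points at a single file, name.rws, in the same directory.
    path = os.path.join(dict_dir, name + ".multi")
    with open(path, "w") as f:
        f.write("add %s.rws\n" % name)
    return path
```

Running write_multi("rv_EN") as root reproduces the one-line file described above.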

Now, we need to create the rv_EN.rws file that defines our dictionary. This is essentially a three-step process.

  1. Dump existing dictionary into a text file
    /usr/bin/aspell -d en dump master | aspell -l en expand > /home/ryan/cust_dict/words.txt
  2. Add or remove words as needed
    I created remove_bad.py for this
  3. Convert the text file into rv_EN.rws
    sudo aspell --lang=en create master /usr/lib/aspell/rv_EN.rws < /home/ryan/cust_dict/goodwords.txt

I’ve scripted this process and put all necessary files in /home/ryan/cust_dict/. The scripts use three files:

  • exclude.txt – this contains a list of the words I would like to remove from the dictionary
  • remove_bad.py – This is a python script that generates a new word list.
  • update.sh – Shell script that runs all the commands. It should be run as root, since you need root privileges to write to /usr/lib/aspell/.

Here are my scripts:

#! /bin/bash
# update.sh - run this as root.
# This could be entered into cron, but I have not done so, as I just run
# the script manually after editing the excluded words text file.
#
# creates words.txt by dumping the english dictionary from aspell
# calls remove_bad.py, which generates goodwords.txt
#    goodwords.txt is all words in words.txt except for those listed in
#    exclude.txt
# creates /usr/lib/aspell/rv_EN.rws from goodwords.txt
# rv_EN.multi is already configured to use rv_EN.rws only

# export english dictionaries to words.txt
echo "Exporting words to text file."
/usr/bin/aspell -d en dump master | aspell -l en expand > /home/ryan/cust_dict/words.txt

# remove the bad words
/home/ryan/cust_dict/remove_bad.py

echo "Converting word list into dictionary file."
aspell --lang=en create master /usr/lib/aspell/rv_EN.rws < /home/ryan/cust_dict/goodwords.txt

echo "Cleaning up!"
rm /home/ryan/cust_dict/words.txt
rm /home/ryan/cust_dict/goodwords.txt


#! /usr/bin/python
# remove_bad.py - this script generates a text file containing a list of
# good words to include in the aspell dictionary.
# remove_bad.py is called by update.sh

# Open the list of words to remove from the aspell dictionary.
# The file should contain one word per line.
with open("/home/ryan/cust_dict/exclude.txt") as f:
    # Strip newlines so the last word matches even without a trailing newline.
    badw = set(line.strip() for line in f if line.strip())

# Status message showing how many words are in the exclude list.
print("%d words in the exclude list." % len(badw))

# Open the text dump of the existing dictionary.
with open("/home/ryan/cust_dict/words.txt") as f:
    lines = f.readlines()

# Number of words in the original dictionary.
print("%d words in the original dictionary." % len(lines))

# Create the file of good words: write each dictionary word into
# goodwords.txt if it does not appear on the exclude list.
with open("/home/ryan/cust_dict/goodwords.txt", "w") as f:
    for line in lines:
        if line.strip() not in badw:
            f.write(line)

    # A function could be added here to add words to goodwords.txt
    # if desired.
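remove_bad.py leaves a placeholder for adding words. One way to handle both directions, sketched as a hypothetical helper that is not part of my scripts: it takes the dumped dictionary lines, an exclude list, and an optional include list, and returns the final word list, with exclusions winning if a word appears in both.

```python
def build_word_list(dictionary_lines, exclude_lines, include_lines=()):
    # Normalize entries by stripping whitespace and dropping blank lines.
    exclude = set(w.strip() for w in exclude_lines if w.strip())
    include = [w.strip() for w in include_lines if w.strip()]

    # Keep dictionary words that are not excluded, in their original order.
    words = [w.strip() for w in dictionary_lines
             if w.strip() and w.strip() not in exclude]

    # Append custom words that are not already present. A word listed in
    # both include and exclude stays out.
    seen = set(words)
    for w in include:
        if w not in seen and w not in exclude:
            words.append(w)
            seen.add(w)
    return words
```

Writing the result to goodwords.txt is then a matter of writing each word followed by a newline.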

Yes, I realize that I could probably do this with just a shell script, but I prefer coding the file operations in Python. Either way, I have my new dictionary, so I just specify it when I create my speller in Go:

speller, err := aspell.NewSpeller(map[string]string{"lang": "rv_EN",})
// code from the go-aspell documentation.

Reference links: golang.org | go-aspell on GitHub | aspell

Edit: I found that my custom localization of English wasn’t working as expected, so I created a new language/locale titled rv_EN. I’ve updated my post to reflect my changes.