Category Archives: Python

Calling rsync with Python’s Subprocess module

I was recently trying to script multiple file transfers via rsync, but unfortunately, I was unable to control the file names.  I chose to use Python and issue commands to the OS to initiate the transfers.  Initially, everything was working great, but as soon as I encountered a space or parenthesis, the script blew up!

In this tutorial, I’m showing how to transfer a single file, but rsync is a very powerful tool capable of much more.  The principles discussed in this post can be adapted to other uses of rsync, but I’m considering general rsync usage to be out of scope here.  There is a very good online man page here: (http://linux.die.net/man/1/rsync).  I’ve chosen to initiate transfers one file at a time, so I can easily have multiple connection streams running instead of a single connection stream that transmits all files in sequence.  It is much faster this way with my current ISP, as I suspect they shape my traffic.  Also note that these methods can be applied to scp file transfers as well.

We will start with a very basic rsync command to copy /tmp/test.txt from a remote location to my local pc.  Before starting, I’ve set up public key authentication between my home pc and the remote server.  I initiate the connection from my home pc, as I don’t care to put private keys in remote locations that could provide access to my home network.

/usr/bin/rsync -va myusername@host.domain.com:/tmp/test.txt /tmp/test.txt

This works very well, but what happens when the file has a space? With most commands, we can just wrap quotes around it, and it works.

rsync myusername@host.domain.com:'/tmp/with space.txt' '/tmp/with space.txt'
rsync: link_stat "/tmp/with" failed: No such file or directory (2)
rsync: link_stat "/home/myusername/space.txt" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1637) [Receiver=3.1.0]

Unfortunately, in this case, the remote system sees “/tmp/with space.txt” as two separate files, /tmp/with and $HOME/space.txt. What we need to do for the remote location is both wrap it with quotes and escape it.  We could also double escape the filename, but I chose to keep things looking a little bit sane.

/usr/bin/rsync myusername@host.domain.com:'/tmp/with\ space.txt' '/tmp/with space.txt'

That is fine, but we need a good way to do this on the fly when we are given file names in bulk.  There are three key libraries I like to use when doing this:

  • subprocess – This is an extremely powerful library for spawning processes on the OS.
  • os.path – This submodule of os contains very useful tools for manipulating filesystem object strings.
  • re – The regular expression module provides an easy-to-use escape function.

In a nutshell, here is the operation that needs to happen to create the command and execute it:

import re
import subprocess

full_remote_path = "/tmp/filename with space.txt"
full_local_path = "/tmp/filename with space.txt"
remote_username = "myusername"
remote_hostname = "host.domain.com"

# Here we use re.escape to escape the paths.
escaped_remote = re.escape(full_remote_path)
escaped_local = re.escape(full_local_path)

# I've chosen to just escape the local path and leave off the quotes.
cmd = "/usr/bin/rsync -va %s@%s:'%s' %s" % (remote_username, remote_hostname, escaped_remote, escaped_local)
print cmd

p = subprocess.Popen(cmd, shell=True).wait()

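Here is the rsync command that is sent to the OS.  Under Python 2, re.escape puts a backslash in front of every non-alphanumeric character, so the slashes and the dot are escaped along with the spaces:

/usr/bin/rsync -va myusername@host.domain.com:'\/tmp\/filename\ with\ space\.txt' \/tmp\/filename\ with\ space\.txt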
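As an aside, the same command can be built without a local shell at all.  The sketch below is not the method used in this post: it assumes Python 3, swaps re.escape for shlex.quote (the standard library's shell quoting helper), and passes the arguments as a list so the local path needs no escaping.  It also assumes the remote shell still word-splits the remote path, as in the rsync 3.1.0 errors shown earlier.  The name build_rsync_args is made up for illustration.

```python
import shlex

def build_rsync_args(user, host, remote_path, local_path):
    """Return an argument list for rsync (hypothetical helper, not from the post)."""
    # Only the remote path is quoted, because the remote shell will parse it.
    remote_spec = "%s@%s:%s" % (user, host, shlex.quote(remote_path))
    # The local path is its own list element, so no local shell ever sees it
    # and it needs no escaping.
    return ["/usr/bin/rsync", "-va", remote_spec, local_path]

args = build_rsync_args("myusername", "host.domain.com",
                        "/tmp/filename with space.txt",
                        "/tmp/filename with space.txt")
print(args)
# To run it: subprocess.call(args)  (no shell=True needed)
```

Because the list is handed straight to the OS, only the remote side of the transfer depends on quoting.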

Now that we have this working, I get to explain how os.path fits in.  Should you be copying /tmp/test.txt on the remote system to /tmp/mydirectory on your local system, but /tmp/mydirectory does not exist, you will receive an error:

rsync -qv myusername@host.domain.com:/tmp/test.txt /tmp/mydirectory/test.txt
rsync: change_dir#3 "/tmp/mydirectory" failed: No such file or directory (2)
rsync error: errors selecting input/output files, dirs (code 3) at main.c(694) [Receiver=3.1.0]

The easiest way to avoid this is to run a simple mkdir -p command on /tmp/mydirectory before beginning.  Should the directory exist, the command does nothing.  Should it be missing, it will be created along with any necessary parent directories.  In a case where you are copying a file to a remote machine, you can pass this command to the remote machine via ssh.

To do this in Python, I like to take the full filename and split it to get the complete directory path.

import os
import re
import subprocess

local = "/tmp/mydirectory/test.txt"

localdir = os.path.split(local)[0]
localdir = "%s/" % localdir
localdir = re.escape(localdir)

mkdir_cmd = '/bin/mkdir -p %s' % localdir
p = subprocess.Popen(mkdir_cmd, shell=True).wait()
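For the local side, the directory can also be created without spawning a process.  This is a minimal sketch rather than the approach used in this post, assuming Python 3.2 or newer (where os.makedirs accepts exist_ok).  The helper name ensure_parent_dir is hypothetical, and a temporary directory stands in for /tmp so the example is safe to run anywhere.

```python
import os
import tempfile

def ensure_parent_dir(local_path):
    """Create the directory that will contain local_path, like mkdir -p."""
    parent = os.path.dirname(local_path)
    if parent:
        # exist_ok=True makes this a no-op when the directory already exists.
        os.makedirs(parent, exist_ok=True)
    return parent

base = tempfile.mkdtemp()
target = os.path.join(base, "mydirectory", "test.txt")
created = ensure_parent_dir(target)
print(os.path.isdir(created))
```

No escaping is involved here, since the path never passes through a shell.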

Here is my full example code that I created to test and demo this technique:

#! /usr/bin/python

import subprocess
import os
import re

def do_rsync(rh, ru, rd, rf, ld):

 # The full file path is the directory plus file.
 remote = os.path.join(rd, rf)

 # escape all characters in the full file path.
 remote = re.escape(remote)

 # here we format the remote location as username@hostname:'location'
 remote = "%s@%s:'%s'" % (ru, rh, remote)

 # here we define the desired full path of the new file.
 local = os.path.join(ld, rf)

 # This statement will provide the containing directory of the file
 # this is useful in case the file passed as rf contains a directory
 localdir = os.path.split(local)[0]

 # os.path.split always returns a directory without the trailing /
 # We add it back here
 localdir = "%s/" % localdir

 # escape all characters in the local filename/directory
 local = re.escape(local)
 localdir = re.escape(localdir)

 # before issuing the rsync command, I've been running a mkdir command
 # Without this, if the directory did not exist, rsync would fail.
 # If the directory exists, then the mkdir command does nothing.
 # If you are copying the file to the remote directory, the mkdir command can be passed by ssh
 mkdir_cmd = '/bin/mkdir -p %s' % localdir

 # create the rsync command
 rsync_cmd = '/usr/bin/rsync -va %s %s' % (remote, local)

 # Now we run the commands.
 # shell=True is needed because each command is a single string that
 # relies on the shell to interpret the escaped characters.
 p1 = subprocess.Popen(mkdir_cmd, shell=True).wait()
 p2 = subprocess.Popen(rsync_cmd, shell=True).wait()
 print ""
 return 0

rh = "host.domain.com"
ru = "myusername"
rd = "/tmp"
rf = "test.txt"
ld = "/tmp"

print "Here we do a simple test with test.txt"
do_rsync(rh, ru, rd, rf, ld)

rf = "this is a filename - with (stuff) in it.dat"

print "Here is a filename with a bit more character."
do_rsync(rh, ru, rd, rf, ld)

exit()

A function like this could be put into place very easily, but a few changes would be necessary to make it production ready.  The rsync output can be minimized by changing the -v to a -q, but in doing this, you will want to check the exit status returned through subprocess to determine whether the transfer was successful.  In my case, I chose to use the Process and Queue classes from the multiprocessing module to manage multiple streams.
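Checking the exit status could look something like the sketch below.  It assumes Python 3 and a command passed as an argument list.  The helper name run_and_check is made up, and the commands are stand-ins (a Python one-liner that exits with a chosen code) so the example runs without rsync or a remote host.  The value 23 is borrowed from the rsync error shown earlier in this post.

```python
import subprocess
import sys

def run_and_check(args):
    """Run a command given as an argument list; return (ok, exit_code)."""
    code = subprocess.call(args)
    # By convention, an exit status of zero means success.
    return code == 0, code

# Stand-in commands that simply exit with a given status.
ok, code = run_and_check([sys.executable, "-c", "import sys; sys.exit(0)"])
failed_ok, failed_code = run_and_check(
    [sys.executable, "-c", "import sys; sys.exit(23)"])
print(ok, code, failed_ok, failed_code)
```

With -q in place, the returned code is the only signal left, so it is worth logging alongside the file name.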

Safe SCP and Delete

My current project involves creating many files on a Raspberry Pi, then immediately transferring them to a more traditional Linux server with normal spinning disk hard drives and more system resources.

In order to reduce writes on the Raspberry Pi’s SD card, I intend to store these files on a ramdisk, so I need a safe way to copy the files to the remote server.  In the event a connection is unavailable, this script will detect that the transfer failed and move the file to a fallback directory (on the SD card) instead of deleting it after the scp command completes.  If the file is moved to the fallback directory, an accompanying json file is also created to store the necessary information.

Assumptions:

  • Script will be run from the source computer.
  • Both computers will be running current versions of OpenSSH.
  • Public key authentication is set up so that it does not require a password to ssh from the source to the destination.
  • scp is installed on the source, and located at /usr/bin/scp. (Verify using the command: which scp).

Variables required for the function:

  • source_file – This is the absolute location of the file to be transferred
    Example: /home/user/temp/transfer_this_file.jpg
  • destination_directory – This is the destination location as would be specified when using the scp command.
    Example: user@remotehost:/home/user/destination_directory/
  • fallback_directory – Absolute location of the fallback directory on the source computer.
    Example: /home/user/failback/

Python modules imported:

  • subprocess – Yes, I realize that os could be used instead of subprocess, but I’m already using subprocess for other functions in the same file.
  • os
  • uuid
  • json

The variable p, the exit status of scp, is returned to report on the status of the original transfer.  A zero is returned if the transfer was successful.  A non-zero value, typically one, is returned if the transfer failed.

#! /usr/bin/python

import subprocess
import os
import uuid
import json

destination_directory = "user@remotehost:/home/user/incoming"
fail_directory = "user@notahost:/notadirectory"
fail_user = "bobert@remotehost:/home/ryan/temp"
fallback = "/home/pi/motion/fallback/"

def safe_scp(source_file, destination_directory, fallback_directory):
	# /usr/bin/scp -qB /source_filesystem/source_file.jpg user@host:/destination/
	# build the argument list directly so file names containing spaces stay intact
	cmd = ["/usr/bin/scp", "-qB", str(source_file), str(destination_directory)]
	p = subprocess.Popen(cmd).wait()
	if p != 0:
		# upload failed (any non-zero exit status), move it to the fallback directory
		filename = os.path.split(source_file)[1]
		#
		# Using os.path to create an absolute filename underneath the fallback
		# directory for rename.
		fallback_file = os.path.abspath(os.path.join(fallback_directory, filename))
		os.rename(source_file, fallback_file)
		# generate filename with info to upload
		json_data = {
			"source_file" : fallback_file,
			"destination_directory" : destination_directory,
			"fallback_directory" : fallback_directory }
		# using uuid to ensure a truly unique filename
		json_filename = "failed_upload-"+str(uuid.uuid4())+".json"
		# creating the absolute filename under the fallback directory
		json_filename = os.path.join(fallback_directory, json_filename)
		f = open(json_filename, "w")
		f.write(json.dumps(json_data))
		f.close()
	else:
		# delete source file in case of a successful transfer.
		os.remove(source_file)
	return p

# usage: I've created a bunch of blank text documents to test the transfer.

# simulating errors
print safe_scp("/home/pi/motion/1.txt", fail_directory, fallback)
#print safe_scp("/home/pi/motion/2.txt", fail_user, fallback)
# above line was commented out later in testing after a conflict with denyhosts
print safe_scp("/home/pi/motion/2.txt", fail_directory, fallback)

# normal operation
print safe_scp("/home/pi/motion/3.txt", destination_directory, fallback)
print safe_scp("/home/pi/motion/4.txt", destination_directory, fallback)
print safe_scp("/home/pi/motion/5.txt", destination_directory, fallback)
exit()
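The script above stops at writing the json file.  One possible way to pick those records up again later is sketched here, assuming Python 3 and the failed_upload-*.json naming used above.  The helper load_failed_uploads is hypothetical, and what to do with each record (for example, calling safe_scp again) is left open.

```python
import glob
import json
import os
import tempfile

def load_failed_uploads(fallback_directory):
    """Return the records stored in failed_upload-*.json files, sorted by file name."""
    pattern = os.path.join(fallback_directory, "failed_upload-*.json")
    records = []
    for json_filename in sorted(glob.glob(pattern)):
        with open(json_filename) as f:
            records.append(json.load(f))
    return records

# Demo with a throwaway directory and one hand-written record.
demo_dir = tempfile.mkdtemp()
record = {"source_file": os.path.join(demo_dir, "1.txt"),
          "destination_directory": "user@remotehost:/home/user/incoming",
          "fallback_directory": demo_dir}
with open(os.path.join(demo_dir, "failed_upload-demo.json"), "w") as f:
    json.dump(record, f)

pending = load_failed_uploads(demo_dir)
print(len(pending))
```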

A quick note: Be careful when playing around with intentionally failed authentication.  My computer is set up to block an IP after a number of failed login attempts.  I actually locked myself out from my Raspberry Pi when testing, and it took me longer than I care to admit to figure out what the exact problem was.  Hope this helps!

PiFace Digital Review

I was selected by Element14 to write a review for the PiFace Digital.

They were kind enough to ship me a free one, so I’m only going to link to the review on their page:
http://www.element14.com/community/roadTestReviews/1457

Here is the purchase link:
http://www.newark.com/piface/piface-digital/daugther-card-pi-face-expansion/dp/48W3976?COM=rasp-accessory-group

I’m currently working on a project to control and monitor the garage doors in my home.  I will document that project here once I’m able to work on it more, and get something working.

Aspell Custom Dictionary

I’ve been playing around a bit with Google Go and the aspell package.  It’s been working great, except I haven’t found a good way to tell aspell to exclude or include words as needed.

I was able to find ways to add words to your custom dictionary, but I did not find a good way to exclude the custom words.  In this particular project, I found that many small two letter combinations were marked as correct, but they were not defined as words according to the dictionary.  Perhaps they are abbreviations, but I did not want them marked as correct for this particular project.

Instead of messing with the existing aspell dictionary, I decided to create a new language dictionary in aspell.

The command “aspell dicts” will dump the existing dictionaries so you can see what already exists.  I chose rv_EN to use.  The dictionary files are kept in /usr/lib/aspell on my system.

First, I created the file rv_EN.multi, which contains only a single line: “add rv_EN.rws”.  The command “aspell dicts” will confirm that aspell can now see the rv_EN dictionary.

Now, we will need to create the rv_EN.rws file that defines our dictionary.  This is essentially a three step process.

  1. Dump existing dictionary into a text file
    /usr/bin/aspell -d en dump master | aspell -l en expand > /home/ryan/cust_dict/words.txt
  2. Add or remove words as needed
    I created remove_bad.py for this
  3. Convert text file into rv_EN.rws
    sudo aspell --lang=en create master /usr/lib/aspell/rv_EN.rws < /home/ryan/cust_dict/goodwords.txt

I’ve scripted this process, and have put all necessary files in /home/ryan/cust_dict/.  When running the scripts, I have three files:

  • exclude.txt – this contains a list of the words I would like to remove from the dictionary
  • remove_bad.py – This is a python script that generates a new word list.
  • update.sh – Shell script that will execute all commands.  It should be run as root, as you will need root privileges to write to /usr/lib/aspell/.

Here are my scripts:

#! /bin/bash
# update.sh - run this as root.
# This could be entered into cron, but I have not done so, as I just run
# the script manually after editing the excluded words text file.
#
# creates words.txt by dumping the english dictionary from aspell
# calls remove_bad.py, which generates goodwords.txt
#    goodwords.txt is all words in words.txt except for those listed in
#    exclude.txt
# creates /usr/lib/aspell/rv_EN.rws from goodwords.txt
# rv_EN.multi is already configured to load rv_EN.rws only

# export english dictionaries to words.txt
echo "Exporting words to text file."
/usr/bin/aspell -d en dump master | aspell -l en expand > /home/ryan/cust_dict/words.txt

# remove the bad words
/home/ryan/cust_dict/remove_bad.py

echo "Converting word list into dictionary file."
aspell --lang=en create master /usr/lib/aspell/rv_EN.rws < /home/ryan/cust_dict/goodwords.txt

echo "Cleaning up!"
rm /home/ryan/cust_dict/words.txt
rm /home/ryan/cust_dict/goodwords.txt

 

#! /usr/bin/python
# remove_bad.py - this script generates a text file containing a list of
# good words to include into the aspell dictionary.
# remove_bad.py is called by update.sh

# open up list of words to remove from aspell dictionary.
# File should contain one word per line.
f = open("/home/ryan/cust_dict/exclude.txt")
badw = f.readlines()
f.close()

# status message showing how many words are in exclude list
print len(badw), "words in the exclude list."

# opens up the text dump of existing dictionary
f = open("/home/ryan/cust_dict/words.txt")
lines = f.readlines()
f.close()

# number of words in original dictionary
print len(lines), "words in the original dictionary."

# create file of good words
f = open("/home/ryan/cust_dict/goodwords.txt","w")

# this will write the dictionary words into goodwords.txt
# if they do not exist on the exclude list
for line in lines:
    if line not in badw:
        f.write(line)

# a function could be added here to add words to goodwords.txt
# if desired.

f.close()
exit()
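One subtle point in the comparison above: readlines keeps the trailing newline, so the last word in exclude.txt will only match if that file ends with a newline.  A possible variation, sketched here for Python 3 and not part of the script I run, compares stripped words and keeps the exclude list in a set.  The function name filter_words is made up for the example.

```python
def filter_words(lines, excluded_lines):
    """Return the entries of lines whose stripped text is not in excluded_lines."""
    # Strip whitespace and ignore blank lines in the exclude list.
    excluded = set(word.strip() for word in excluded_lines if word.strip())
    return [line for line in lines if line.strip() not in excluded]

words = ["aa\n", "apple\n", "zz\n", "zebra\n"]
exclude = ["aa\n", "zz"]   # note: the last entry has no trailing newline
good = filter_words(words, exclude)
print(good)
```

The kept lines retain their newlines, so they can be written to goodwords.txt as they are.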

Yes, I realize that I could probably do this with just a shell script, but I prefer coding the file operations with Python.  Either way, I have my new dictionary, so I just specify it when I’m creating my speller in Go:

speller, err := aspell.NewSpeller(map[string]string{"lang": "rv_EN",})
// code from the go-aspell documentation.

Reference links: golang.org | go-aspell on GitHub | aspell

Edit: I found that my custom localization of English wasn’t working as expected, so I created a new language/locale titled rv_EN.  I’ve updated my post to reflect my changes.

Cleverbot vs. Cleverbot

So, I stumbled upon pycleverbot, a nice little module to interface with the Cleverbot website.

Of course, what is the first thing everybody wants to do?  Make Cleverbot talk to itself!

With the module, coding is quite simple.  The version I’m running outputs the code to an html page on my server, but the syntax was screwing up in WordPress, so I’ve left that out.

import cleverbot

# beginning two different cleverbot sessions
# I use steve and bob to keep things separate while coding.
# This is not reflected in the html output.
steve = cleverbot.Session()
bob = cleverbot.Session()

# Gathering info....boring stuff
convo_start = raw_input("How would you like to begin the conversation? : ")
print ""
cycles = int(raw_input("How many cycles would you like to run? : "))
print ""

# Starting the conversation
print "Bob: "+convo_start
reply = steve.Ask(convo_start)
print "Steve: "+reply

i = 0

# continuing the conversation in the loop.
while i <= cycles:
    reply = bob.Ask(reply)
    print "Bob: "+reply

    reply = steve.Ask(reply)
    print "Steve: "+reply

    i = i + 1

# ....and now to tie up my loose ends.
exit()
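The ping-pong structure itself does not depend on Cleverbot.  Here is a small sketch of the same loop, assuming Python 3 and using two plain functions as stand-in "bots" instead of cleverbot sessions.  The converse helper is hypothetical.  Note that range(cycles) runs exactly cycles rounds, one fewer than the while loop above, which also runs when i equals cycles.

```python
def converse(ask_first, ask_second, opener, cycles):
    """Bounce replies between two callables and return the transcript."""
    transcript = [("Bob", opener)]
    reply = ask_first(opener)
    transcript.append(("Steve", reply))
    for _ in range(cycles):
        reply = ask_second(reply)
        transcript.append(("Bob", reply))
        reply = ask_first(reply)
        transcript.append(("Steve", reply))
    return transcript

# Stand-in bots: one shouts, the other turns everything into a question.
def steve(text):
    return text.upper()

def bob(text):
    return text + "?"

transcript = converse(steve, bob, "hello", 2)
for speaker, line in transcript:
    print("%s: %s" % (speaker, line))
```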

The output is definitely interesting.  For one run, I grabbed text from the website’s “Think For Me” as a true study of what this would yield.  At one point, they started quoting the song “Still Alive” from Portal!

I’ve also found that this bot seems to be loop resistant.  I groaned when I saw “Yes you did!; No I didn’t!” cycle about 3 times, but the bot actually recovered.  I’m surprised that the bot also seems to be great at bringing up its own subjects.

Then every once in a while, the bot will give me a facepalm moment of just utter stupidity.  I’ve learned that as soon as one of them announces that they are Cleverbot that I should just ignore the next 10 lines.  Regardless, I accomplished what I wanted to, and I can live with that.

Now for some examples!

I’d be more than happy to start the script with any examples given to me, just let me know!